chunked data reading #543

Open
wants to merge 4 commits into
base: master

@jsbucy jsbucy commented Feb 28, 2025

What do these changes do?

Add a new handler hook, DATA_CHUNK, which is invoked from the data reading loop. This allows streaming the data to a file or other storage API without having to buffer the whole message in memory first.

Are there changes in behavior for the user?

The new hook is opt-in, so there should be no behavior changes for existing users.

Related issue number

This may also improve or fix #293, which I suspect is caused by the old implementation decoding dotstuff for the whole message at once in a tight loop, hogging the GIL.

Checklist

  • I think the code is well written
  • Unit tests for the changes exist
  • tox testenvs have been executed in the following environments:
    • Linux (Ubuntu 18.04, Ubuntu 20.04, Arch): {py36,py37,py38,py39}-{nocov,cov,diffcov}, qa, docs
    • Windows (7, 10): {py36,py37,py38,py39}-{nocov,cov,diffcov}
    • WSL 1.0 (Ubuntu 18.04): {py36,py37,py38,py39}-{nocov,cov,diffcov}, pypy3-{nocov,cov}, qa, docs
    • FreeBSD (12.2, 12.1, 11.4): {py36,pypy3}-{nocov,cov,diffcov}, qa
    • Cygwin: py36-{nocov,cov,diffcov}, qa, docs
  • Documentation reflects the changes
  • Add a news fragment into the NEWS.rst file
    • Add under the "aiosmtpd-next" section, creating one if necessary
      • You may create subsections to group the changes, if you like
    • Use full sentences with correct case and punctuation
    • Refer to relevant Issue if applicable

smtp_DATA() could buffer an unbounded amount of data in line_fragments until
it got a CRLF, even though we're going to throw that data away anyway.

Drop TestSMTPWithController.test_long_line_leak, which is now moot.
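
The idea of not accumulating data that will be discarded can be sketched as follows. This is an illustration only, not the code from this PR: `collect_line` and its `limit` parameter are hypothetical names, and the fragments are passed in as a plain iterable rather than read from a stream.

```python
from typing import Iterable, List, Tuple


def collect_line(fragments: Iterable[bytes], limit: int) -> Tuple[bytes, bool]:
    """Join the fragments of one line, keeping nothing once limit is exceeded.

    Hypothetical helper: returns (line, too_long). When too_long is True the
    returned line is empty, because the caller would discard it anyway.
    """
    kept: List[bytes] = []
    size = 0
    too_long = False
    for fragment in fragments:
        size += len(fragment)
        if too_long or size > limit:
            # Stop buffering: memory use stays bounded by the limit
            # no matter how much more data arrives before the CRLF.
            too_long = True
            kept.clear()
            continue
        kept.append(fragment)
    return b"".join(kept), too_long
```

With this shape, an over-long line costs at most `limit` bytes of buffering instead of growing with the length of the line.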
…g loop.

DATA_CHUNK takes 3 parameters:
data: bytes, decoded_data: Optional[str], last: bool
and returns an Optional[bytes] response.

If the hook returns a response prior to the last=True chunk, smtp_DATA
will read/discard the remaining data from the client without invoking
the hook again.
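
A minimal sketch of how a handler might use such a hook, under stated assumptions: the method name `handle_DATA_CHUNK` follows aiosmtpd's `handle_` naming convention but is not confirmed by the text above, and the `server`, `session`, and `envelope` arguments that aiosmtpd's existing hooks receive are omitted for brevity. `SpoolingHandler`, `max_size`, and the `feed` driver are hypothetical; `feed` only imitates the described behavior of the data reading loop so the example runs without an SMTP server.

```python
import asyncio
import tempfile
from typing import List, Optional


class SpoolingHandler:
    """Hypothetical handler that spools message data to a temporary file."""

    def __init__(self, max_size: int = 1024) -> None:
        self.max_size = max_size
        self.received = 0
        self.spool = tempfile.TemporaryFile()

    # Assumed name and argument list; the description above only names
    # data, decoded_data and last.
    async def handle_DATA_CHUNK(
        self, data: bytes, decoded_data: Optional[str], last: bool
    ) -> Optional[bytes]:
        self.received += len(data)
        if self.received > self.max_size:
            # Early response: the hook should not be invoked again.
            return b"552 Message too large"
        self.spool.write(data)
        if last:
            self.spool.flush()
            return b"250 OK"
        return None  # no response yet, keep reading


async def feed(handler: SpoolingHandler, chunks: List[bytes]) -> Optional[bytes]:
    """Stand-in for the data reading loop, for illustration only."""
    response: Optional[bytes] = None
    for index, chunk in enumerate(chunks):
        last = index == len(chunks) - 1
        if response is None:
            response = await handler.handle_DATA_CHUNK(chunk, None, last)
        # Otherwise the remaining data is read and discarded.
    return response
```

Calling `asyncio.run(feed(SpoolingHandler(max_size=4), [b"abc", b"def", b"ghi"]))` exercises the early-response path: the second chunk exceeds the limit, so the third chunk never reaches the hook.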

This allows streaming the data to a file or other storage API without
having to buffer the whole message in memory first.

Move dotstuff and UTF-8 decoding into the data reading loop to support this.
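
For illustration, per-line dot-unstuffing (as defined by SMTP: a leading "." is removed from each line, and a line consisting of a single "." ends the data) might look like the sketch below. `unstuff_lines` is a hypothetical name and this is not the code from this PR; it assumes each item is one complete CRLF-terminated line.

```python
from typing import Iterable, Iterator


def unstuff_lines(lines: Iterable[bytes]) -> Iterator[bytes]:
    """Yield message lines one at a time with dot-stuffing removed.

    Hypothetical helper: stops at the end-of-data marker, a lone ".".
    """
    for line in lines:
        if line == b".\r\n":
            return  # end of DATA
        if line.startswith(b"."):
            line = line[1:]  # drop the extra dot the client added
        yield line
```

Because each line is handled as it arrives, the work is spread across the read loop rather than done in one pass over the complete message.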

This may also improve or fix aio-libs#293, which I suspect is caused by the
old implementation decoding dotstuff for the whole message at once in a
tight loop, hogging the GIL.
otherwise do it at the end as before so we don't ~double the memory
while we're reading from the client
Development

Successfully merging this pull request may close these issues.

Entire python process stalls when receiving multiple large inbound emails