email.parser.BytesParser.parse() cannot handle binary data that include \x0d \x0a correctly. #128949

Open
tnakamot opened this issue Jan 17, 2025 · 0 comments
Labels
stdlib Python modules in the Lib dir topic-email type-bug An unexpected behavior, bug, or error

Bug report

Bug description:

I would like to extract a binary file from a multipart MIME file using email.parser.BytesParser, but every "0x0d 0x0a" (CR + LF) byte sequence in the binary file is replaced by "0x0a" (LF). Below is a minimal reproducible example.

from email.parser import BytesParser
from email.policy import default
from io import BytesIO

mime_file_byte_array = b'MIME-Version: 1.0\r\nContent-Type: multipart/mixed; boundary="MIME\
_boundary-1";\r\n\r\n--MIME_boundary-1\r\nContent-Type: application/octet-stream\r\nContent\
-Location: test.bin\r\n\r\na\r\nb\r\n--MIME_boundary-1--\r\n\r\n'
fp = BytesIO(mime_file_byte_array)
parser = BytesParser(policy=default)
msg = parser.parse(fp)

parts = list(msg.walk())
binary_data = parts[1].get_payload(decode=True)

print('===== Beginning of Original MIME File =====')
print(mime_file_byte_array.decode())
print('===== End of Original MIME File =====')
print('')
print('===== test.bin after parse =====')
print(binary_data)
print('===== End of test.bin after parse =====')

As the mime_file_byte_array literal shows, the multipart MIME file includes a binary file "test.bin" whose content is b"a\r\nb".
Therefore, binary_data should contain b"a\r\nb", but it actually contains b"a\nb".

This is probably because the TextIOWrapper created in BytesParser.parse() uses the default newline=None, which translates CR+LF to LF on input (observed here on Linux).

fp = TextIOWrapper(fp, encoding='ascii', errors='surrogateescape')
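The suspected translation can be checked in isolation, without the email package. The following is a minimal sketch; the sample bytes and the variable names are made up for illustration and are not part of the stdlib code.

```python
from io import BytesIO, TextIOWrapper

raw = b"a\r\nb\r\n"  # illustrative sample data containing CR+LF

# Default newline=None enables universal newlines mode:
# "\r\n" is translated to "\n" when reading.
translated = TextIOWrapper(
    BytesIO(raw), encoding="ascii", errors="surrogateescape"
).read()

# newline="" disables the translation, so line endings come back untouched.
preserved = TextIOWrapper(
    BytesIO(raw), encoding="ascii", errors="surrogateescape", newline=""
).read()

print(repr(translated))  # 'a\nb\n'
print(repr(preserved))   # 'a\r\nb\r\n'
```

The same wrapper arguments as in BytesParser.parse() are used here, so the only difference between the two reads is the newline parameter.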

When I replaced the TextIOWrapper line in BytesParser.parse() with the line below, the problem went away. However, this change may have side effects that I cannot foresee.

fp = TextIOWrapper(fp, encoding='ascii', errors='surrogateescape', newline='')
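Until the stdlib behavior is settled, the same idea can be tried from user code without patching email/parser.py. The sketch below assumes that wrapping the binary stream yourself with newline='' and handing it to email.parser.Parser mirrors what BytesParser.parse() would do with the change above. The helper name parse_preserving_crlf is hypothetical, and the possible side effects mentioned above apply here as well.

```python
from email.parser import Parser
from email.policy import default
from io import BytesIO, TextIOWrapper


def parse_preserving_crlf(binary_fp):
    """Hypothetical helper: parse a binary file object without newline translation."""
    text_fp = TextIOWrapper(
        binary_fp, encoding="ascii", errors="surrogateescape", newline=""
    )
    try:
        return Parser(policy=default).parse(text_fp)
    finally:
        # Detach so the caller's binary file object is not closed with the wrapper.
        text_fp.detach()


# Same message as in the reproducer, split over several literals for readability.
mime_bytes = (
    b'MIME-Version: 1.0\r\n'
    b'Content-Type: multipart/mixed; boundary="MIME_boundary-1";\r\n'
    b'\r\n'
    b'--MIME_boundary-1\r\n'
    b'Content-Type: application/octet-stream\r\n'
    b'Content-Location: test.bin\r\n'
    b'\r\n'
    b'a\r\nb\r\n'
    b'--MIME_boundary-1--\r\n'
    b'\r\n'
)

msg = parse_preserving_crlf(BytesIO(mime_bytes))
binary_data = list(msg.walk())[1].get_payload(decode=True)
print(binary_data)  # per the report, the CR+LF should survive: b'a\r\nb'
```

This keeps the workaround local to the calling code, at the cost of duplicating the wrapper setup that BytesParser.parse() normally hides.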

CPython versions tested on:

3.10

Operating systems tested on:

Linux

@tnakamot tnakamot added the type-bug An unexpected behavior, bug, or error label Jan 17, 2025
@picnixz picnixz added stdlib Python modules in the Lib dir topic-email labels Jan 17, 2025