email.parser.BytesParser.parse() cannot handle binary data that include \x0d \x0a correctly. #128949

tnakamot · 2025-01-17T15:37:00Z

Bug report

Bug description:

I would like to extract a binary file from a multipart MIME file using email.parser.BytesParser, but every "0x0d 0x0a" (CR + LF) byte sequence in the binary file is replaced by "0x0a" (LF). Below is a minimal reproducible example.

from email.parser import BytesParser
from email.policy import default
from io import BytesIO

mime_file_byte_array = (
    b'MIME-Version: 1.0\r\n'
    b'Content-Type: multipart/mixed; boundary="MIME_boundary-1";\r\n\r\n'
    b'--MIME_boundary-1\r\nContent-Type: application/octet-stream\r\n'
    b'Content-Location: test.bin\r\n\r\na\r\nb\r\n--MIME_boundary-1--\r\n\r\n'
)
fp = BytesIO(mime_file_byte_array)
parser = BytesParser(policy=default)
msg = parser.parse(fp)

parts = [part for part in msg.walk()]
binary_data = parts[1].get_payload(decode=True)

print('===== Beginning of Original MIME File =====')
print(mime_file_byte_array.decode())
print('===== End of Original MIME File =====')
print('')
print('===== test.bin after parse =====')
print(binary_data)
print('===== test.bin after parse =====')

As can be seen in the fifth line, the multipart MIME file includes a binary file "test.bin" whose content is b"a\r\nb".
Therefore, the variable binary_data should contain b"a\r\nb", but it actually contains b"a\nb".

This is probably because the TextIOWrapper created in BytesParser.parse() uses the default newline=None (universal newlines mode), which translates CR+LF to LF when reading.
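
The suspected translation can be seen in isolation with a small sketch that does not involve the email package at all. The variable names are only for illustration; the encoding and errors arguments mirror the ones BytesParser.parse() passes, and newline is the only thing that differs between the two wrappers.

```python
from io import BytesIO, TextIOWrapper

raw = b"a\r\nb"

# Default newline=None: universal newlines mode, so CR+LF is read back as LF.
translated = TextIOWrapper(
    BytesIO(raw), encoding="ascii", errors="surrogateescape"
).read()

# newline='': line endings are passed through to the caller untranslated.
untranslated = TextIOWrapper(
    BytesIO(raw), encoding="ascii", errors="surrogateescape", newline=""
).read()

print(repr(translated))    # 'a\nb'
print(repr(untranslated))  # 'a\r\nb'
```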

cpython/Lib/email/parser.py

Line 103 in 767c89b

fp = TextIOWrapper(fp, encoding='ascii', errors='surrogateescape')

When I replaced the above line with the line below, this problem was fixed. However, this fix may have a side effect which I cannot foresee.

fp = TextIOWrapper(fp, encoding='ascii', errors='surrogateescape', newline='')
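
The same idea can be tried without editing parser.py by doing the wrapping by hand and passing the resulting text stream to email.parser.Parser. This is only a sketch under that assumption: parse_preserving_newlines is a hypothetical helper name, not part of the standard library, and it has only been reasoned through against the example message above.

```python
from email.parser import Parser
from email.policy import default
from io import BytesIO, TextIOWrapper


def parse_preserving_newlines(binary_fp):
    """Hypothetical helper: parse a binary file without newline translation."""
    # Same wrapper as in parser.py, plus newline='' to keep CR+LF intact.
    text_fp = TextIOWrapper(
        binary_fp, encoding="ascii", errors="surrogateescape", newline=""
    )
    try:
        return Parser(policy=default).parse(text_fp)
    finally:
        # Detach so the caller's binary file is not closed with the wrapper.
        text_fp.detach()


raw = (
    b'MIME-Version: 1.0\r\n'
    b'Content-Type: multipart/mixed; boundary="MIME_boundary-1";\r\n\r\n'
    b'--MIME_boundary-1\r\nContent-Type: application/octet-stream\r\n'
    b'Content-Location: test.bin\r\n\r\na\r\nb\r\n--MIME_boundary-1--\r\n\r\n'
)

msg = parse_preserving_newlines(BytesIO(raw))
payload = list(msg.walk())[1].get_payload(decode=True)
print(payload)
```

With the newline='' wrapper, payload is expected to keep the CR+LF pair, matching what I observed after patching the line in parser.py.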

CPython versions tested on:

3.10

Operating systems tested on:

Linux

tnakamot added the type-bug label Jan 17, 2025

picnixz added the stdlib and topic-email labels Jan 17, 2025
