I would like to extract a binary file in a multipart MIME file using email.parser.BytesParser, but a byte sequence "0x0d 0x0a" (CR + LF) in the binary file is replaced by "0x0a" (LF). Below is a minimal reproducible example.
from email.parser import BytesParser
from email.policy import default
from io import BytesIO

mime_file_byte_array = b'MIME-Version: 1.0\r\nContent-Type: multipart/mixed; boundary="MIME_boundary-1";\r\n\r\n--MIME_boundary-1\r\nContent-Type: application/octet-stream\r\nContent-Location: test.bin\r\n\r\na\r\nb\r\n--MIME_boundary-1--\r\n\r\n'
fp = BytesIO(mime_file_byte_array)
parser = BytesParser(policy=default)
msg = parser.parse(fp)
parts = [part for part in msg.walk()]
binary_data = parts[1].get_payload(decode=True)
print('===== Beginning of Original MIME File =====')
print(mime_file_byte_array.decode())
print('===== End of Original MIME File =====')
print('')
print('===== test.bin after parse =====')
print(binary_data)
print('===== test.bin after parse =====')
As can be seen in the fifth line (the assignment to mime_file_byte_array), the multipart MIME file includes a binary file "test.bin". The contents of the binary file are b"a\r\nb".
Therefore, the variable binary_data is supposed to contain b"a\r\nb", but it actually contains b"a\nb".
This is probably because the TextIOWrapper in BytesParser.parse(), with its default newline handling, translates CR+LF to LF on Linux.
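To illustrate the suspected cause in isolation, here is a minimal sketch that reads the same bytes through TextIOWrapper twice, once with the default newline setting and once with newline=''. The helper name read_text is hypothetical and only exists for this illustration; the wrapper arguments mirror the ones discussed in this report.

```python
from io import BytesIO, TextIOWrapper

raw = b"a\r\nb"


def read_text(data, **kwargs):
    # Hypothetical helper: wrap the bytes the way BytesParser.parse() is
    # believed to, optionally overriding the newline argument.
    wrapper = TextIOWrapper(BytesIO(data), encoding="ascii",
                            errors="surrogateescape", **kwargs)
    return wrapper.read()


# Default (newline=None): universal newlines mode, CR+LF is read as LF.
translated = read_text(raw)
# newline='': line endings are returned to the caller untranslated.
preserved = read_text(raw, newline="")

print(repr(translated))
print(repr(preserved))
```

If the first value has lost the CR while the second has kept it, that is consistent with the translation happening in the wrapper rather than in the MIME parsing itself.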
Bug report
Bug description:
cpython/Lib/email/parser.py, line 103 in 767c89b
When I replaced the above line with the line below, this problem was fixed. However, this fix may have a side effect which I cannot foresee.
fp = TextIOWrapper(fp, encoding='ascii', errors='surrogateescape', newline='')
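For anyone who wants to try the same idea without editing the standard library, the following is a rough sketch that builds the wrapper in user code and hands it to email.parser.Parser. The helper name parse_preserving_newlines is hypothetical, and this assumes that wrapping the stream with newline='' and calling Parser.parse() behaves like the patched BytesParser.parse(); it has the same unknown side effects as the change above.

```python
from email.parser import Parser
from email.policy import default
from io import BytesIO, TextIOWrapper


def parse_preserving_newlines(binary_fp, policy=default):
    """Hypothetical helper: parse a binary file object without newline translation."""
    # Same wrapper arguments as the proposed fix; newline='' disables translation.
    text_fp = TextIOWrapper(binary_fp, encoding="ascii",
                            errors="surrogateescape", newline="")
    try:
        return Parser(policy=policy).parse(text_fp)
    finally:
        # Detach so the caller's binary file object is not closed with the wrapper.
        text_fp.detach()


mime_bytes = (
    b'MIME-Version: 1.0\r\n'
    b'Content-Type: multipart/mixed; boundary="MIME_boundary-1";\r\n'
    b'\r\n'
    b'--MIME_boundary-1\r\n'
    b'Content-Type: application/octet-stream\r\n'
    b'Content-Location: test.bin\r\n'
    b'\r\n'
    b'a\r\nb\r\n'
    b'--MIME_boundary-1--\r\n'
    b'\r\n'
)

msg = parse_preserving_newlines(BytesIO(mime_bytes))
parts = list(msg.walk())
payload = parts[1].get_payload(decode=True)
print(payload)
```

The payload printed here can be compared against b"a\r\nb" to check whether the CR+LF inside test.bin survives parsing.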
CPython versions tested on:
3.10
Operating systems tested on:
Linux