Get unused data from end of DecompressionStream? #39
This is a difficult problem. In the special case of the packfile format, we could solve it with a mode which continues decompressing after it reaches the end. In the general case, something like…
We also got internal feedback about this API (https://bugzilla.mozilla.org/show_bug.cgi?id=1901316#c3). Could we throw something like a DecompressionError that extends TypeError, with additional fields carrying the data decompressed so far plus the unused data?
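As a rough sketch only: if such an error existed, callers might use it as below. `DecompressionError`, `decompressedData`, and `unusedData` are hypothetical names taken from the suggestion above; no engine implements them today.

```js
// Hypothetical API sketch -- DecompressionError and its fields do not exist;
// the names follow the suggestion in the comment above.
async function inflateAllowingTrailing(input) { // input: Uint8Array of deflate data + junk
  const ds = new DecompressionStream('deflate');
  const writer = ds.writable.getWriter();
  writer.write(input).catch(() => {}); // any rejection surfaces via the readable below
  writer.close().catch(() => {});
  try {
    const output = new Uint8Array(await new Response(ds.readable).arrayBuffer());
    return { output, unused: new Uint8Array(0) };
  } catch (e) {
    if (e instanceof DecompressionError) {
      // e.decompressedData: output produced before the trailing junk was hit
      // e.unusedData: input bytes after the end of the compressed stream
      return { output: e.decompressedData, unused: e.unusedData };
    }
    throw e;
  }
}
```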
I feel like there should be an enum, something like an extraData option on the DecompressionStream constructor with four possible values, where each value selects what happens to input left over after the end of the compressed data. My idea for "something-else", the fourth value, is that the leftover bytes are exposed on the stream object as a separate readable. For example, suppose your input consisted of gzip-compressed data followed by uncompressed data and you just wanted to concatenate it. You would do something like

```js
const ds = new DecompressionStream('gzip', { extraData: 'use' });
response.body.pipeTo(ds.writable);
await ds.readable.pipeTo(destination, { preventClose: true });
ds.extraData.pipeTo(destination);
```

What do you think? We could potentially implement the first three options while still bikeshedding the fourth option.
Hmm, throwing an exception certainly wouldn't work well with pipes. I wonder if we could make pipes reusable, allowing the destination to be closed gracefully so that the source side can still be piped elsewhere. That could be more general than something decompression-specific, and could also deal with the "exotic" case you mentioned. But then it's not clear what would be done with the partially consumed chunk.
I'm not very familiar with streams and compression, but hopefully this is understandable.
For `deflate`, the spec states "It is an error if there is additional input data after the ADLER32 checksum." For `gzip`, the spec says "It is an error if there is additional input data after the end of the "member"." As expected, Chrome's current implementation throws a TypeError ("Junk found after end of compressed data.") when extra data is written to a DecompressionStream.
This error can be caught and ignored, but there doesn't seem to be a way of retrieving the already-written-but-not-used "junk" data. There seems to be an assumption here that developers already know the length of the compressed data, and can provide exactly that data and nothing more. On the contrary, this "junk" data can be very important in cases where the compressed data is embedded in another stream and you don't know the length of the compressed data.
A good example of this is Git's PackFile format, which only tells you the size of the uncompressed data, not the compressed size. In such a case you must rely on the decompressor to tell you when it's done decompressing data, and then handle the remaining data.
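To make the packfile case concrete, here is a sketch of reading an object entry header, based on Git's pack format documentation (simplified, no bounds checking): the header encodes the type and the uncompressed size, and the zlib data that follows has no recorded length.

```js
// Parse a packfile object entry header. The varint encodes the *uncompressed*
// size; the compressed length is not stored anywhere, so the only way to find
// the next object is to inflate and see where the zlib stream ends.
function readPackObjectHeader(bytes, offset) {
  let byte = bytes[offset++];
  const type = (byte >> 4) & 0x7; // object type lives in bits 4-6 of the first byte
  let size = byte & 0x0f;         // low 4 bits of the uncompressed size
  let shift = 4;
  while (byte & 0x80) {           // high bit set means another size byte follows
    byte = bytes[offset++];
    size += (byte & 0x7f) * 2 ** shift; // multiply instead of << to avoid 32-bit overflow
    shift += 7;
  }
  return { type, size, dataStart: offset }; // compressed data starts at dataStart
}
```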
My attempt at putting together an example:
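(A reconstructed sketch of that situation; it assumes deflate-compressed data followed by arbitrary trailing bytes, and uses CompressionStream only to manufacture valid input.)

```js
async function demo() {
  // Manufacture some deflate-compressed bytes.
  const cs = new CompressionStream('deflate');
  const w = cs.writable.getWriter();
  w.write(new TextEncoder().encode('hello packfile'));
  w.close();
  const compressed = new Uint8Array(await new Response(cs.readable).arrayBuffer());

  // Embed them in a larger buffer, the way a packfile does.
  const trailing = new TextEncoder().encode('NEXT OBJECT');
  const input = new Uint8Array(compressed.length + trailing.length);
  input.set(compressed);
  input.set(trailing, compressed.length);

  // Decompressing the whole buffer errors once the extra bytes are seen.
  const ds = new DecompressionStream('deflate');
  const writer = ds.writable.getWriter();
  writer.write(input).catch(() => {});
  writer.close().catch(() => {});
  try {
    console.log(new TextDecoder().decode(await new Response(ds.readable).arrayBuffer()));
  } catch (e) {
    console.log(String(e)); // TypeError: Junk found after end of compressed data. (Chrome)
    // ...and there is no way to learn that `trailing` was the unused part.
  }
}
```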
Now, as a workaround, I could write the data to my first stream one byte at a time, saving the most recently written byte and carrying it over when the writer throws that specific exception. But writing one byte at a time feels very inefficient and adds a lot of complexity, and checking for that specific error message seems fragile (it might change, and other implementations might use a different message).
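A sketch of that workaround, with the fragile parts called out (the error-message check, and the assumption that the rejected write pinpoints the first unused byte):

```js
// Fragile workaround sketch: feed one byte at a time so that the failing
// write tells us (to within a byte) where the compressed data ended.
async function inflateByteAtATime(input) { // input: Uint8Array
  const ds = new DecompressionStream('deflate');
  const writer = ds.writable.getWriter();
  const reader = ds.readable.getReader();

  // Read output concurrently, keeping chunks received before any error.
  const chunks = [];
  const readAll = (async () => {
    try {
      for (;;) {
        const { value, done } = await reader.read();
        if (done) break;
        chunks.push(value);
      }
    } catch {
      // The readable errors together with the writable on trailing junk.
    }
  })();

  let consumed = 0;
  try {
    for (; consumed < input.length; consumed++) {
      await writer.write(input.subarray(consumed, consumed + 1));
    }
    await writer.close();
  } catch (e) {
    // Brittle: matches Chrome's current message, which no spec guarantees.
    if (!String(e.message).includes('Junk found')) throw e;
  }
  await readAll;
  return { chunks, unused: input.subarray(consumed) };
}
```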
Zlib itself provides a way to know which bytes weren't used (though I don't know the details of how).
Python's zlib API provides an `unused_data` property that contains the unused bytes. Node's zlib API provides a `bytesWritten` property that can be used to calculate the unused data. It would be great to have something similar available in the DecompressionStream API.
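For instance, a sketch of the Node.js route (assuming, as suggested above, that an Inflate stream finishes at the end of the compressed data and that `zlib.bytesWritten` then reflects the input bytes actually consumed):

```js
// Node.js sketch: use bytesWritten to locate the end of the deflate data.
const zlib = require('node:zlib');

function inflateWithTrailing(input, callback) {
  const inflate = zlib.createInflate();
  const chunks = [];
  inflate.on('data', (chunk) => chunks.push(chunk));
  inflate.on('error', callback);
  inflate.on('end', () => {
    const unused = input.subarray(inflate.bytesWritten); // everything after the consumed bytes
    callback(null, Buffer.concat(chunks), unused);
  });
  inflate.end(input);
}
```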