chore(dataobj): Download pages in 16MB batches #16689
Open
+506
−106
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Previously, each page in a call to
ReadPages
would result in one request to storage. This added a lot of latency when data objects were backed by object storage, with the roundtrip time accumulating.This PR enables pages in a call to ReadPages to be batched into 16MB windows (from S3's recommendation of using 8MB or 16MB chunks; 16MB was chosen to further reduce roundtrips). Windows are currently downloaded sequentially, though this could be updated to use concurrency if desired.
The effectiveness of this code depends on reading multiple columns and pages at once; this only happens when using dataset.Reader from #16429.
Additionally: