This repository has been archived by the owner on Nov 16, 2023. It is now read-only.

Problem when using optimized Reader #396

Open
tom-slb opened this issue Oct 31, 2022 · 2 comments

Assignees: achille-roussel
Labels: question (Further information is requested)

@tom-slb
tom-slb commented Oct 31, 2022

Hi, I'm trying to optimize reading from Parquet files as outlined here: https://github.com/segmentio/parquet-go#optimizing-reads
I am using a schema-less reading approach, so there are no Go struct types or schemas defined in my application. I am trying to read column data page by page directly into Go slices, e.g. []float64.

My problem is this: all columns in my Parquet file are defined as optional, so I get a *parquet.optionalPageValues when reading a page. This type does not implement DoubleReader, and there does not seem to be any way to get at the underlying "base" ValueReader (which presumably is a DoubleReader). As things stand, it is not possible to use optimized reads into Go slices directly.
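To make the dispatch concrete, here is a self-contained sketch of the fast-path-or-fallback pattern that the optimized-read approach relies on. It does not import parquet-go: `value`, `valueReader`, `doubleReader`, `requiredDoubles`, `optionalDoubles` and `readDoubles` are hypothetical stand-ins, with `doubleReader` modeled on the `ReadDoubles(values []float64) (int, error)` method of the library's `DoubleReader` interface. It also assumes readers return `io.EOF` together with their last batch. A wrapper that exposes only generic value reads, as the optional page reader does here, can never satisfy the type assertion and always lands on the slow path.

```go
package main

import (
	"errors"
	"fmt"
	"io"
)

// value is a hypothetical, simplified stand-in for parquet.Value.
type value struct {
	double float64
	null   bool
}

// valueReader is the generic path: one value (possibly null) at a time.
type valueReader interface {
	ReadValues(values []value) (int, error)
}

// doubleReader is modeled on parquet-go's DoubleReader fast path.
type doubleReader interface {
	ReadDoubles(values []float64) (int, error)
}

// requiredDoubles models a required DOUBLE page; it offers both paths.
type requiredDoubles struct{ data []float64 }

func (r *requiredDoubles) ReadDoubles(out []float64) (int, error) {
	n := copy(out, r.data)
	r.data = r.data[n:]
	if len(r.data) == 0 {
		return n, io.EOF // assumption: EOF arrives with the last batch
	}
	return n, nil
}

func (r *requiredDoubles) ReadValues(out []value) (int, error) {
	buf := make([]float64, len(out))
	n, err := r.ReadDoubles(buf)
	for i, d := range buf[:n] {
		out[i] = value{double: d} // required values are never null
	}
	return n, err
}

// optionalDoubles models the optional wrapper: it has no ReadDoubles method.
type optionalDoubles struct{ vals []value }

func (r *optionalDoubles) ReadValues(out []value) (int, error) {
	n := copy(out, r.vals)
	r.vals = r.vals[n:]
	if len(r.vals) == 0 {
		return n, io.EOF
	}
	return n, nil
}

// readDoubles uses the doubleReader fast path when r implements it, else
// ReadValues with nulls dropped; the bool reports whether the fast path ran.
func readDoubles(r valueReader) ([]float64, bool, error) {
	var out []float64
	if dr, ok := r.(doubleReader); ok {
		buf := make([]float64, 4)
		for {
			n, err := dr.ReadDoubles(buf)
			out = append(out, buf[:n]...)
			if err != nil {
				if errors.Is(err, io.EOF) {
					err = nil
				}
				return out, true, err
			}
		}
	}
	buf := make([]value, 4)
	for {
		n, err := r.ReadValues(buf)
		for _, v := range buf[:n] {
			if !v.null {
				out = append(out, v.double)
			}
		}
		if err != nil {
			if errors.Is(err, io.EOF) {
				err = nil
			}
			return out, false, err
		}
	}
}

func main() {
	got, fast, _ := readDoubles(&requiredDoubles{data: []float64{1, 2, 3, 4, 5}})
	fmt.Println(got, fast) // [1 2 3 4 5] true
	opt := &optionalDoubles{vals: []value{{double: 1}, {null: true}, {double: 3}}}
	got, fast, _ = readDoubles(opt)
	fmt.Println(got, fast) // [1 3] false
}
```

Note that the fallback in this sketch simply drops nulls, so it loses track of which rows were null; a real workaround would need to keep that information alongside the values.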

Is there any override I could use, for example to make the reader ignore the fact that the columns are optional in the Parquet schema?

Many thanks.

@achille-roussel achille-roussel self-assigned this Oct 31, 2022
@achille-roussel

Hello @tom-slb, thanks for opening the issue.

This is a known limitation of the current APIs: they only work for required columns, not for optional or repeated ones.

We have plans to revisit these APIs but nothing has yet been implemented.
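One way to see why typed reads stop at required columns: Parquet encodes only the non-null values of an optional column, plus one definition level per row, so a bare []float64 cannot say which rows are null. The sketch below uses a hypothetical `expandOptional` helper and assumes a top-level optional column (maximum definition level 1); it rebuilds per-row nullable values from those two streams in plain Go and is not how parquet-go implements it internally.

```go
package main

import (
	"errors"
	"fmt"
)

// expandOptional is a hypothetical helper for a top-level optional column
// (maximum definition level 1). dense holds only the non-null values, in order;
// defLevels holds one level per row: 1 = value present, 0 = null.
func expandOptional(dense []float64, defLevels []byte) ([]*float64, error) {
	rows := make([]*float64, len(defLevels)) // nil entries mean null
	next := 0
	for i, lvl := range defLevels {
		switch lvl {
		case 0:
			// null row: leave rows[i] as nil
		case 1:
			if next >= len(dense) {
				return nil, errors.New("more present rows than dense values")
			}
			v := dense[next] // copy so rows do not alias the input slice
			rows[i] = &v
			next++
		default:
			return nil, fmt.Errorf("row %d: unexpected definition level %d", i, lvl)
		}
	}
	if next != len(dense) {
		return nil, errors.New("dense values left over after the last row")
	}
	return rows, nil
}

func main() {
	rows, err := expandOptional([]float64{1.5, 3.5}, []byte{1, 0, 1})
	if err != nil {
		panic(err)
	}
	for i, r := range rows {
		if r == nil {
			fmt.Printf("row %d: null\n", i)
		} else {
			fmt.Printf("row %d: %v\n", i, *r)
		}
	}
}
```

Repeated columns add repetition levels on top of this, which is one more reason a single typed slice cannot represent them.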

@achille-roussel achille-roussel added the question Further information is requested label Oct 31, 2022
@hhoughgg
Contributor

Does the same issue apply to optimized writes? It seems that it's not possible to write optional fields, even with parquet.Value, since the underlying type of the ColumnBuffer for an int64 column is []int64 instead of []*int64.
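For writes, the same split applies in reverse: a nullable Go column such as []*int64 has to become a dense []int64 of the non-null values plus one definition level per row. The hypothetical `flattenOptional` below sketches that conversion for a top-level optional column (maximum definition level 1). It only illustrates the data layout and makes no claim about which parquet-go API, if any, would accept the two streams.

```go
package main

import "fmt"

// flattenOptional is a hypothetical helper that splits a nullable column
// (nil = null) into the dense non-null values and one definition level per
// row (1 = present, 0 = null), as stored for a top-level optional column.
func flattenOptional(rows []*int64) (values []int64, defLevels []byte) {
	values = make([]int64, 0, len(rows))
	defLevels = make([]byte, len(rows)) // zero value 0 already means null
	for i, p := range rows {
		if p != nil {
			values = append(values, *p)
			defLevels[i] = 1
		}
	}
	return values, defLevels
}

func main() {
	a, c := int64(10), int64(30)
	values, levels := flattenOptional([]*int64{&a, nil, &c})
	fmt.Println(values, levels) // [10 30] [1 0 1]
}
```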
