Question: Too much memory to write parquet files #412
Comments
Are you familiar with using pprof to profile a Go program? The first thing we would do is try to reproduce the results you're seeing and then analyze the quantity and size of memory allocations, but given that you can reproduce it consistently, it may be easier for you to generate a profile: https://pkg.go.dev/runtime/pprof
Actually, I have already used pprof to analyze the program. The alloc space of parquet.Write is as follows. The alloc space of the red file is 700 MB.
Had similar issues, and we were never able to determine where the memory went, even with those traces. Never managed to fix them, either. See #118. Indeed, playing with the GOGC environment variable did seem to help a bit, which may indicate that the problem lies in how the Go runtime deals with garbage collection rather than in this library. Definitely not an expert in this. But in the end, the amount of RAM needed was on the order of magnitude of the combined size of all the data (uncompressed!) that had to go in the file, whereas I thought you could theoretically write a parquet file using very little memory by flushing things often.
Thank you, that's really helpful.
I believe the issue may come from using We could try modifying the I'm also curious whether you are calling
When I write a new parquet file (about 100 MB of data before compression), 1 GB of memory is requested. I'm wondering why it takes up so much memory.
I use the following method to write the file, and the struct of the data is as below:
err := parquet.Write[*model.x](data, rawParquetData, compressionType)