Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/11/03 11:34:49 UTC

[GitHub] [arrow-rs] pier-oliviert commented on pull request #3002: Parquet Writer: Make column descriptor public on the writer

pier-oliviert commented on PR #3002:
URL: https://github.com/apache/arrow-rs/pull/3002#issuecomment-1301970648

   @tustvold Maybe it's possible to do what I'm trying to do with a different API, but what I'm currently working on is a way to aggregate events that come in from a different thread and package the data into a parquet file using the `SerializedColumnWriter`.
   
   What this means is that I hold the events in an in-memory cache and construct the parquet file (one row group) whenever that cache reaches a certain size.
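
A minimal, std-only sketch of the buffering pattern described above, assuming the "certain size" is a row count (it could equally be a byte size). `Event`, `EventBuffer` and `max_rows` are hypothetical names, and no parquet calls are made: the point where a full batch is returned is where the real code would write one row group.

```rust
use std::sync::mpsc;
use std::thread;

/// Hypothetical event type; the comment does not describe the real payload.
#[derive(Debug)]
struct Event {
    id: i64,
}

/// Hypothetical in-memory cache that releases a full batch (one future
/// row group) once `max_rows` events have accumulated.
struct EventBuffer {
    max_rows: usize,
    rows: Vec<Event>,
}

impl EventBuffer {
    fn new(max_rows: usize) -> Self {
        Self { max_rows, rows: Vec::with_capacity(max_rows) }
    }

    /// Buffers one event; returns the whole batch once the threshold is reached.
    fn push(&mut self, event: Event) -> Option<Vec<Event>> {
        self.rows.push(event);
        if self.rows.len() >= self.max_rows {
            Some(std::mem::take(&mut self.rows))
        } else {
            None
        }
    }

    /// Returns any leftover events, e.g. when the producer shuts down.
    fn flush(&mut self) -> Option<Vec<Event>> {
        if self.rows.is_empty() {
            None
        } else {
            Some(std::mem::take(&mut self.rows))
        }
    }
}

fn main() {
    // Events arrive from another thread over a channel.
    let (tx, rx) = mpsc::channel();
    let producer = thread::spawn(move || {
        for id in 0..10 {
            tx.send(Event { id }).unwrap();
        }
        // `tx` is dropped here, which ends the receiving loop below.
    });

    let mut buffer = EventBuffer::new(4);
    let mut row_groups: Vec<Vec<Event>> = Vec::new();
    for event in rx {
        if let Some(batch) = buffer.push(event) {
            // The real code would write `batch` out as one row group here.
            row_groups.push(batch);
        }
    }
    if let Some(batch) = buffer.flush() {
        row_groups.push(batch);
    }
    producer.join().unwrap();

    let sizes: Vec<usize> = row_groups.iter().map(|g| g.len()).collect();
    let first_ids: Vec<i64> = row_groups.iter().map(|g| g[0].id).collect();
    println!("{sizes:?}"); // [4, 4, 2]
    println!("{first_ids:?}"); // [0, 4, 8]
}
```

Returning the batch from `push` keeps the buffer free of any writer state, so the same type works whether the caller writes synchronously or hands the batch to another thread.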
   
   Since the `ColumnDescriptor` isn't exposed, as far as I can tell, I have to maintain an index myself to figure out which column I'm writing each time I call `writer.next_column()`.
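
A std-only sketch of the bookkeeping this forces on the caller, assuming a two-column schema chosen purely for illustration. `Event`, `ColumnValues`, `SCHEMA_ORDER` and `columns_for_row_group` are hypothetical; the `for` loop stands in for the loop around `next_column()`, which hands columns back in schema order, so the position `col_idx` is the only thing telling the caller which field to write.

```rust
/// Hypothetical event type (the comment does not describe the real schema).
struct Event {
    id: i64,
    name: String,
}

/// Values for one column, ready to hand to a typed column writer.
#[derive(Debug, PartialEq)]
enum ColumnValues {
    Int64(Vec<i64>),
    Utf8(Vec<String>),
}

/// Hypothetical column names in schema order. Columns come back in this same
/// order, so this array is the caller's only record of which one is current.
const SCHEMA_ORDER: [&str; 2] = ["id", "name"];

/// Stands in for the loop over `next_column()`: `col_idx` is the
/// hand-maintained index the comment describes.
fn columns_for_row_group(events: &[Event]) -> Vec<(&'static str, ColumnValues)> {
    let mut columns = Vec::new();
    for (col_idx, &column_name) in SCHEMA_ORDER.iter().enumerate() {
        // Without a descriptor on the column writer, the position is all we have.
        let values = match col_idx {
            0 => ColumnValues::Int64(events.iter().map(|e| e.id).collect()),
            1 => ColumnValues::Utf8(events.iter().map(|e| e.name.clone()).collect()),
            _ => unreachable!("schema has only {} columns", SCHEMA_ORDER.len()),
        };
        columns.push((column_name, values));
    }
    columns
}

fn main() {
    let events = vec![
        Event { id: 1, name: "login".to_string() },
        Event { id: 2, name: "logout".to_string() },
    ];
    for (name, values) in columns_for_row_group(&events) {
        println!("{name}: {values:?}");
    }
    // id: Int64([1, 2])
    // name: Utf8(["login", "logout"])
}
```

The `match` on a bare position is the fragile part: if the schema's column order changes, the index silently points at the wrong field, which is the information a descriptor on the column writer would supply directly.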
   
   If I could get some of that information from the column writer itself, it would tell me which column I'm in and save me that bookkeeping on my side.
   
   Does that make sense?
   
   I apologize for the breaking change; I guess I misinterpreted what "User-facing changes" meant.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.