Posted to github@arrow.apache.org by "westonpace (via GitHub)" <gi...@apache.org> on 2023/02/02 23:13:00 UTC

[GitHub] [arrow] westonpace commented on issue #33710: [C++][Parquet] Add WriteRecordBatchAsync to parquet writer

westonpace commented on issue #33710:
URL: https://github.com/apache/arrow/issues/33710#issuecomment-1414497267

   > In short, WriteRecordBatch is subject to max number of rows allowed in a row group. So it may slice the input record batch and write the sliced batches into different row groups in order.
   
   I'm personally not too worried about that feature, as much of that behavior can be obtained with the [dataset writer](https://github.com/apache/arrow/blob/master/cpp/src/arrow/dataset/dataset_writer.cc), which has `max_rows_per_file` and `max_rows_per_group` and is format-independent.  It also handles multiple parallel writes across multiple files.
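   To illustrate the slicing behavior described above, here is a minimal standalone sketch (not the actual Arrow implementation; the function name and signature are hypothetical) of how a batch larger than the row-group limit gets cut into ordered `(offset, length)` slices, each destined for its own row group:

   ```cpp
   #include <algorithm>
   #include <cstdint>
   #include <utility>
   #include <vector>

   // Hypothetical helper: given a batch of `num_rows` rows and a
   // `max_rows_per_group` limit, compute the (offset, length) slices that a
   // WriteRecordBatch-style call would write to successive row groups, in order.
   std::vector<std::pair<int64_t, int64_t>> SliceForRowGroups(
       int64_t num_rows, int64_t max_rows_per_group) {
     std::vector<std::pair<int64_t, int64_t>> slices;
     for (int64_t offset = 0; offset < num_rows; offset += max_rows_per_group) {
       // The final slice may be shorter than the limit.
       slices.emplace_back(offset,
                           std::min(max_rows_per_group, num_rows - offset));
     }
     return slices;
   }
   ```

   For example, a 10-row batch with a 4-row limit yields three slices, and the last row group holds only the 2 remaining rows.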
   
   > CMIW, we don't have the utility to support this yet.
   
   We have decent utilities for working with async tasks.  For example, you could use a [throttled async task scheduler](https://github.com/apache/arrow/blob/master/cpp/src/arrow/util/async_util.h) if you want to execute them in order.
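   The idea of a throttled scheduler that preserves submission order can be sketched without Arrow at all (this is a simplified standalone illustration, not the `async_util.h` API): with a throttle of one task at a time and a FIFO queue, tasks run strictly in the order they were added, even though callers submit them asynchronously.

   ```cpp
   #include <condition_variable>
   #include <functional>
   #include <mutex>
   #include <queue>
   #include <thread>

   // Simplified sketch of a throttled async task scheduler (throttle = 1):
   // a single worker thread drains a FIFO queue, so tasks execute one at a
   // time, in submission order. The real Arrow scheduler is more general.
   class ThrottledScheduler {
    public:
     ThrottledScheduler() : worker_([this] { Run(); }) {}
     ~ThrottledScheduler() {
       // Signal shutdown; the worker drains any remaining tasks first.
       {
         std::lock_guard<std::mutex> lock(mutex_);
         done_ = true;
       }
       cv_.notify_one();
       worker_.join();
     }
     void AddTask(std::function<void()> task) {
       {
         std::lock_guard<std::mutex> lock(mutex_);
         tasks_.push(std::move(task));
       }
       cv_.notify_one();
     }

    private:
     void Run() {
       for (;;) {
         std::function<void()> task;
         {
           std::unique_lock<std::mutex> lock(mutex_);
           cv_.wait(lock, [this] { return done_ || !tasks_.empty(); });
           if (tasks_.empty()) return;  // done_ and nothing left to run
           task = std::move(tasks_.front());
           tasks_.pop();
         }
         task();  // Runs outside the lock; order matches submission order.
       }
     }
     std::mutex mutex_;
     std::condition_variable cv_;
     std::queue<std::function<void()>> tasks_;
     bool done_ = false;
     std::thread worker_;
   };
   ```

   An async write path could submit each "write row group N" task to such a scheduler: submissions return immediately, but the writes still hit the file in order.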
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org