Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/03/18 01:13:26 UTC

[GitHub] [arrow] westonpace commented on issue #12653: Conversion from one dataset to another that will not fit in memory?

westonpace commented on issue #12653:
URL: https://github.com/apache/arrow/issues/12653#issuecomment-1071923418


   At the moment we generally use too much memory when scanning Parquet. This is because the scanner's readahead is unfortunately based on the row-group size rather than the batch size, so large row groups inflate peak memory. Using smaller row groups in your source files will help. #12228 changes the readahead to be based on the batch size, but it's been on my back burner for a bit. I'm still optimistic I will get to it for the 8.0.0 release.
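   As a rough sketch of the smaller-row-groups workaround in PyArrow (the file name, column, and sizes here are made up for illustration — tune `row_group_size` to your own data):

   ```python
   import pyarrow as pa
   import pyarrow.parquet as pq

   # Hypothetical example table; in practice this is your source data.
   table = pa.table({"x": list(range(100_000))})

   # Writing with a smaller row_group_size limits how much data the
   # scanner reads ahead per file, reducing peak memory during a scan.
   pq.write_table(table, "data.parquet", row_group_size=10_000)

   # Verify the layout: 100_000 rows / 10_000 per group = 10 row groups.
   pf = pq.ParquetFile("data.parquet")
   print(pf.num_row_groups)
   ```

   The trade-off is that very small row groups add per-group metadata and I/O overhead, so this is a workaround until the readahead change lands rather than a general recommendation.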

