
Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/07/19 22:02:18 UTC

[GitHub] [arrow-rs] tustvold commented on issue #2110: Parallel fetching of column chunks in ParquetRecordBatchStream

tustvold commented on issue #2110:
URL: https://github.com/apache/arrow-rs/issues/2110#issuecomment-1189594490

   Before fetching in parallel, I would suggest the following:
   
   * Add a min_fetch_bytes parameter: if the row group or file is smaller than this threshold, fall back to fetching it in its entirety
   * Add the metadata_size_hint you suggest
   * Coalesce adjacent byte ranges into a single request, potentially also merging ranges separated by a gap smaller than some configurable threshold
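
To make the first and third points concrete, here is a rough sketch of what I have in mind. It is only an illustration: the function names `coalesce_ranges` and `plan_fetch`, the `max_gap` parameter and the `enclosing` argument (the byte span of the row group or file) are hypothetical and not existing arrow-rs APIs; only `min_fetch_bytes` comes from the suggestion above.

```rust
use std::ops::Range;

/// Hypothetical helper: merge byte ranges that overlap, touch, or are
/// separated by at most `max_gap` bytes, so each merged range can be
/// served by a single request.
fn coalesce_ranges(mut ranges: Vec<Range<usize>>, max_gap: usize) -> Vec<Range<usize>> {
    // Sort by start offset so that only neighbouring ranges need comparing.
    ranges.sort_by_key(|r| r.start);

    let mut merged: Vec<Range<usize>> = Vec::with_capacity(ranges.len());
    for range in ranges {
        if let Some(last) = merged.last_mut() {
            // Close enough to the previous range: extend it instead of
            // issuing a separate request.
            if range.start <= last.end.saturating_add(max_gap) {
                last.end = last.end.max(range.end);
                continue;
            }
        }
        merged.push(range);
    }
    merged
}

/// Hypothetical helper: if the enclosing row group or file is smaller than
/// `min_fetch_bytes`, fetch all of it in one request; otherwise fetch only
/// the coalesced column chunk ranges.
fn plan_fetch(
    ranges: Vec<Range<usize>>,
    enclosing: Range<usize>,
    min_fetch_bytes: usize,
    max_gap: usize,
) -> Vec<Range<usize>> {
    if enclosing.end.saturating_sub(enclosing.start) < min_fetch_bytes {
        return vec![enclosing];
    }
    coalesce_ranges(ranges, max_gap)
}

fn main() {
    // Two chunks 2 bytes apart are merged; the distant one stays separate.
    let plan = plan_fetch(vec![0..10, 12..20, 100..110], 0..1_000, 64, 4);
    println!("{:?}", plan);
}
```

The trade-off is between request count and wasted bytes: a larger gap threshold means fewer requests but more data fetched that is then discarded, so both thresholds would likely need to be configurable per object store.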
   
   

