Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2021/08/13 02:35:47 UTC

[GitHub] [arrow-datafusion] andygrove opened a new issue #867: ParquetExec should parallelize statistics scan operations

andygrove opened a new issue #867:
URL: https://github.com/apache/arrow-datafusion/issues/867


   **Is your feature request related to a problem or challenge? Please describe what you are trying to do.**
   ParquetExec currently uses a single thread to iterate over all input files and collect statistics. At scale this can be very slow (more than 30 seconds), so it would be good to parallelize the work.
   
   **Describe the solution you'd like**
   Use tokio to scan each file in parallel.
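
As a rough illustration of the idea, here is a minimal sketch that fans the per-file scan out across worker threads and returns the results in input order. It is not the ParquetExec code: `FileStats`, `scan_file` and `scan_all` are hypothetical names, the "scan" only reads file-system metadata instead of a Parquet footer, and it uses `std::thread::scope` rather than tokio so that it has no dependencies. With tokio, the blocking per-file work would more likely be handed to something like `tokio::task::spawn_blocking`.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};
use std::thread;

/// Hypothetical stand-in for the statistics collected per file.
#[derive(Debug, Clone, PartialEq, Default)]
pub struct FileStats {
    pub total_byte_size: u64,
}

/// Placeholder scan: reads only file-system metadata.
/// A real implementation would read the Parquet footer here.
fn scan_file(path: &Path) -> io::Result<FileStats> {
    let meta = fs::metadata(path)?;
    Ok(FileStats {
        total_byte_size: meta.len(),
    })
}

/// Scan `paths` on at most `max_threads` worker threads.
/// Results come back in the same order as `paths`, one per file.
pub fn scan_all(paths: &[PathBuf], max_threads: usize) -> Vec<io::Result<FileStats>> {
    if paths.is_empty() {
        return Vec::new();
    }
    // At least one worker, and never more workers than files.
    let workers = max_threads.clamp(1, paths.len());
    // Ceiling division so every file lands in some chunk.
    let chunk_size = (paths.len() + workers - 1) / workers;

    thread::scope(|s| {
        // Spawn every worker first so the chunks are scanned concurrently.
        let handles: Vec<_> = paths
            .chunks(chunk_size)
            .map(|chunk| {
                s.spawn(move || chunk.iter().map(|p| scan_file(p)).collect::<Vec<_>>())
            })
            .collect();

        // Join in spawn order, which preserves the input order.
        handles
            .into_iter()
            .flat_map(|h| h.join().expect("statistics worker panicked"))
            .collect()
    })
}
```

Calling `scan_all(&paths, 8)` would scan the listed files on up to eight threads. Returning one `io::Result` per file keeps a single unreadable file from hiding the statistics of the others; whether ParquetExec should fail fast instead is a separate design choice.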
   
   **Describe alternatives you've considered**
   N/A
   
   **Additional context**
   N/A
   

