Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/10/24 18:52:55 UTC

[GitHub] [arrow-rs] tustvold commented on issue #2916: Perf about ParquetRecordBatchStream vs ParquetRecordBatchReader

tustvold commented on issue #2916:
URL: https://github.com/apache/arrow-rs/issues/2916#issuecomment-1289456560

   This is expected; see the investigation under https://github.com/apache/arrow-rs/issues/1473.
   
   The TL;DR is that, in the absence of resource contention, synchronous blocking code will often outperform the equivalent asynchronous code. This is especially true of file IO: there aren't stable non-blocking operating system APIs for it, so tokio implements async file reads by offloading them to a separate blocking thread pool. Eventually, projects like [tokio-uring](https://github.com/tokio-rs/tokio-uring) may address this.
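   
   As a rough illustration of that offloading pattern (not tokio's actual implementation), here is a minimal std-only sketch. The names `read_sync`, `read_offloaded` and `write_temp_file` are hypothetical, and a real blocking pool would reuse its threads rather than spawn one per read. Both functions perform the same blocking `fs::read`; the offloaded one just adds a thread hand-off and a channel round trip on top:
   
   ```rust
   use std::fs;
   use std::io::{self, Write};
   use std::path::{Path, PathBuf};
   use std::sync::mpsc;
   use std::thread;
   
   /// Synchronous path: the calling thread performs the blocking read itself.
   fn read_sync(path: &Path) -> io::Result<Vec<u8>> {
       fs::read(path)
   }
   
   /// Offloaded path (hypothetical sketch): a separate thread performs the same
   /// blocking read and sends the result back over a channel.
   fn read_offloaded(path: PathBuf) -> io::Result<Vec<u8>> {
       let (tx, rx) = mpsc::channel();
       thread::spawn(move || {
           // Ignore the send error: it only occurs if the receiver went away.
           let _ = tx.send(fs::read(&path));
       });
       rx.recv().expect("worker thread exited without sending a result")
   }
   
   /// Helper for the demo: writes `contents` to a file in the temp directory.
   fn write_temp_file(name: &str, contents: &[u8]) -> io::Result<PathBuf> {
       let path = std::env::temp_dir().join(name);
       let mut file = fs::File::create(&path)?;
       file.write_all(contents)?;
       Ok(path)
   }
   
   fn main() -> io::Result<()> {
       let path = write_temp_file("offload_sketch.bin", &[7u8; 4096])?;
       let direct = read_sync(&path)?;
       let offloaded = read_offloaded(path.clone())?;
       // Same bytes either way; only the amount of coordination differs.
       assert_eq!(direct, offloaded);
       fs::remove_file(&path)?;
       Ok(())
   }
   ```
   
   With a single reader and nothing else to run in the meantime, the hand-off is pure overhead, which is consistent with the synchronous reader coming out ahead.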
   
   The advantage of async comes where either:
   
   * You are communicating over some network connection, e.g. to object storage
   * There is resource contention, where instead of blocking the thread on IO, you could be getting on with processing some other part of the query
   
   Async is about efficiently multiplexing work; if you don't have anything to multiplex, you aren't going to see a return from it.
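   
   To make the multiplexing point concrete, here is a hedged std-only sketch. It uses plain threads and `thread::sleep` as a stand-in for IO latency, so it is not async code and not a benchmark of the parquet readers; `simulated_request`, `run_sequential` and `run_overlapped` are hypothetical names. It only shows that overlapping waits pays off when there is more than one thing to wait on:
   
   ```rust
   use std::thread;
   use std::time::{Duration, Instant};
   
   /// Stand-in for an IO request: waits for `latency`, then returns a value.
   fn simulated_request(id: u32, latency: Duration) -> u32 {
       thread::sleep(latency);
       id * 10
   }
   
   /// One request at a time: the waits add up.
   fn run_sequential(n: u32, latency: Duration) -> (Vec<u32>, Duration) {
       let start = Instant::now();
       let results = (0..n).map(|id| simulated_request(id, latency)).collect();
       (results, start.elapsed())
   }
   
   /// All requests in flight at once: the waits overlap.
   fn run_overlapped(n: u32, latency: Duration) -> (Vec<u32>, Duration) {
       let start = Instant::now();
       let handles: Vec<_> = (0..n)
           .map(|id| thread::spawn(move || simulated_request(id, latency)))
           .collect();
       let results = handles
           .into_iter()
           .map(|handle| handle.join().expect("worker panicked"))
           .collect();
       (results, start.elapsed())
   }
   
   fn main() {
       let latency = Duration::from_millis(50);
       for n in [1, 8] {
           let (seq, seq_time) = run_sequential(n, latency);
           let (ovl, ovl_time) = run_overlapped(n, latency);
           // Both strategies produce the same results; only the timing differs.
           assert_eq!(seq, ovl);
           println!("n={}: sequential {:?}, overlapped {:?}", n, seq_time, ovl_time);
       }
   }
   ```
   
   With a single request there is nothing to overlap, so the second strategy can only add spawn and join cost. With several requests the waits overlap and the total time should drop toward one request's latency.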
   
   
   
   

