Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/06/08 16:25:12 UTC

[GitHub] [arrow-datafusion] tustvold commented on pull request #2677: Switch to object_store crate (#2489)

tustvold commented on PR #2677:
URL: https://github.com/apache/arrow-datafusion/pull/2677#issuecomment-1150132924

   I think this is now ready for review. I've created https://github.com/apache/arrow-datafusion/pull/2711, which uses currently unreleased functionality in arrow-rs to do byte range fetches to object storage.
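
For readers unfamiliar with the term, a byte range fetch reads only a requested span of bytes instead of the whole object. The sketch below illustrates the idea against a local file using only the Rust standard library. It is not the object_store or arrow-rs API: `read_byte_range` and the demo file name are hypothetical names chosen for illustration.

```rust
use std::fs::File;
use std::io::{self, Read, Seek, SeekFrom, Write};
use std::ops::Range;
use std::path::Path;

/// Read only the bytes in `range` from the file at `path`.
/// Hypothetical helper for illustration; not part of object_store or arrow-rs.
fn read_byte_range(path: &Path, range: Range<u64>) -> io::Result<Vec<u8>> {
    let mut file = File::open(path)?;
    // Jump straight to the start of the requested range.
    file.seek(SeekFrom::Start(range.start))?;
    let len = range.end.saturating_sub(range.start);
    let mut buf = Vec::new();
    // `take` caps the read at `len` bytes; a range running past EOF yields fewer bytes.
    file.take(len).read_to_end(&mut buf)?;
    Ok(buf)
}

fn main() -> io::Result<()> {
    // Write a small demo file, then fetch bytes 2..6 of it.
    let path = std::env::temp_dir().join("byte_range_demo.bin");
    File::create(&path)?.write_all(b"0123456789")?;
    let bytes = read_byte_range(&path, 2..6)?;
    println!("{}", String::from_utf8_lossy(&bytes)); // prints "2345"
    std::fs::remove_file(&path)?;
    Ok(())
}
```

A columnar format such as parquet benefits from this access pattern because a reader can fetch the file metadata and only the column chunks a query needs.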
   
   This PR does represent a 10-20% performance regression in the parquet SQL benchmarks when operating on local files. This largely results from moving from spawn_blocking and the corresponding scheduler implications documented in https://github.com/apache/arrow-rs/issues/1473. 
   
   However, I am inclined to think this is fine for a couple of reasons:
   
   * The new scheduler, work on which is currently blocked by this PR, was designed specifically to address this scheduling disparity
   * The difference becomes inconsequential for any non-trivial query
   * The ongoing work by @Ted-Jiang will help to reduce the IO costs of parquet
   * I think this lays a solid foundation on which we can iterate

