Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/01/11 11:25:46 UTC

[GitHub] [arrow-datafusion] tustvold commented on issue #1532: Discussion: Switch DataFusion to using arrow2?

tustvold commented on issue #1532:
URL: https://github.com/apache/arrow-datafusion/issues/1532#issuecomment-1009868992


   > Will arrow-rs eventually support async file IO? Requiring a synchronous ChunkReader is currently a major limitation in supporting alternate ObjectStores.
   
   FWIW it would be relatively straightforward to support async IO within arrow-rs. You need buffered fetching to get reasonable IO performance anyway, so you can do an async fetch into a buffer and then use the sync decoders to decode it. I believe this is what arrow2 does as well? I quickly cobbled together something showing how this can be done with parquet [here](https://github.com/apache/arrow-rs/pull/1154).
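
To make the "async fetch into a buffer, then sync decode" idea concrete, here is a rough std-only sketch. It is not the code from the linked PR and uses no arrow-rs or parquet APIs: `InMemoryStore`, `decode_u32_le`, `read_values` and `block_on` are all hypothetical names made up for illustration, and the "decoder" just reads little-endian `u32` values in place of a real Parquet page decoder.

```rust
use std::future::Future;
use std::io::{Cursor, Read};
use std::ops::Range;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// Hypothetical stand-in for a remote object store (not an arrow-rs type).
struct InMemoryStore {
    data: Vec<u8>,
}

impl InMemoryStore {
    /// Async fetch of a byte range into an owned buffer. A real store would
    /// await network IO here; this one only copies from memory.
    async fn fetch(&self, range: Range<usize>) -> Vec<u8> {
        self.data[range].to_vec()
    }
}

/// Synchronous decoder over any `Read`, standing in for the existing sync
/// decoders. Trailing bytes that do not fill a 4-byte word are ignored.
fn decode_u32_le<R: Read>(mut reader: R) -> std::io::Result<Vec<u32>> {
    let mut bytes = Vec::new();
    reader.read_to_end(&mut bytes)?;
    Ok(bytes
        .chunks_exact(4)
        .map(|c| u32::from_le_bytes([c[0], c[1], c[2], c[3]]))
        .collect())
}

/// The only async step is the fetch; decoding runs synchronously on the
/// buffer once it is fully in memory.
async fn read_values(store: &InMemoryStore, range: Range<usize>) -> std::io::Result<Vec<u32>> {
    let buffer = store.fetch(range).await;
    decode_u32_le(Cursor::new(buffer))
}

/// Waker that does nothing; enough for the busy-polling executor below.
struct NoopWake;

impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

/// Minimal executor so the sketch runs without an async runtime. Real code
/// would use a proper runtime instead of polling in a loop.
fn block_on<F: Future>(fut: F) -> F::Output {
    let mut fut = Box::pin(fut);
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    loop {
        if let Poll::Ready(value) = fut.as_mut().poll(&mut cx) {
            return value;
        }
        std::thread::yield_now();
    }
}
```

With a store holding the little-endian bytes of `1u32, 2, 3, 4`, calling `block_on(read_values(&store, 4..12))` fetches the middle eight bytes and decodes them as two values. The point of the split is that the decoder never needs to know the bytes came from an async source.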
   
   FWIW I have some optimisations to the arrow-rs parquet reader in flight that yield some pretty significant speedups: https://github.com/apache/arrow-rs/pull/1054 and https://github.com/apache/arrow-rs/pull/1082. I am also planning to work on dictionary preservation next, which should yield orders of magnitude speedups for string dictionaries.
   
   I would _personally_ prefer an approach that sees the great work on arrow2 cherry-picked into arrow-rs, with `arrow2` serving as an incubator for new ideas. I am happy to help out with this if there are things people would particularly like to see ported across. The current ecosystem fragmentation is just unfortunate for both users and contributors imo.

