Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/08/31 14:44:53 UTC

[GitHub] [arrow-datafusion] tustvold commented on issue #3214: Don't scan first column on empty projection

tustvold commented on issue #3214:
URL: https://github.com/apache/arrow-datafusion/issues/3214#issuecomment-1233034720

   I think there are two different optimisations being discussed here:
   
   * Skip interacting with the file based on catalog statistics if available
   * Remove projection "hack" and delegate to file readers
   
   Parquet has supported the latter since https://github.com/apache/arrow-rs/pull/1560, and CSV/JSON will support it once https://github.com/apache/arrow-rs/pull/2604 is released. I think it should then be possible to remove the workaround, as it will no longer be necessary.
   
   As to the former, I think it should be fairly straightforward to implement a physical optimiser pass that uses statistics to simplify counts into projections. I had thought we had already implemented this tbh... :thinking: 
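   
   To make the statistics-based idea concrete, here is a rough sketch of what such a pass might look like. It is not DataFusion's actual API: `Statistics`, `Plan`, and `optimize_count` are hypothetical, heavily simplified stand-ins invented for illustration. The assumption is that a count over a bare scan can be replaced by a literal only when the row count is known and exact.
   
   ```rust
   /// Hypothetical stand-in for per-table statistics from a catalog.
   #[derive(Debug, Clone, Copy, PartialEq)]
   struct Statistics {
       /// Row count, if the catalog knows it.
       num_rows: Option<usize>,
       /// Whether `num_rows` is exact rather than an estimate.
       is_exact: bool,
   }
   
   /// Hypothetical, heavily simplified physical plan.
   #[derive(Debug, Clone, PartialEq)]
   enum Plan {
       /// Scan of a table; reading it touches the file.
       Scan { table: String, stats: Statistics },
       /// COUNT(*) over the input plan.
       CountStar { input: Box<Plan> },
       /// A constant result that needs no file access.
       Literal(u64),
   }
   
   /// Rewrite COUNT(*) over a scan with exact statistics into a literal.
   /// Any other plan shape is returned unchanged.
   fn optimize_count(plan: Plan) -> Plan {
       match plan {
           Plan::CountStar { input } => match *input {
               // Exact row count is known: answer without scanning.
               Plan::Scan {
                   stats: Statistics { num_rows: Some(n), is_exact: true },
                   ..
               } => Plan::Literal(n as u64),
               // Otherwise keep the count, but try to optimise below it.
               other => Plan::CountStar {
                   input: Box::new(optimize_count(other)),
               },
           },
           other => other,
       }
   }
   ```
   
   With this sketch, a `CountStar` over a `Scan` whose statistics are `num_rows: Some(42), is_exact: true` becomes `Plan::Literal(42)`, while a scan with missing or inexact statistics is left alone and still has to go through the file reader.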

