Posted to github@arrow.apache.org by "tustvold (via GitHub)" <gi...@apache.org> on 2023/04/13 17:04:55 UTC

[GitHub] [arrow-datafusion] tustvold commented on issue #5942: Poor reported performance of DataFusion against DuckDB and Hyper

tustvold commented on issue #5942:
URL: https://github.com/apache/arrow-datafusion/issues/5942#issuecomment-1507307503

   I wonder if running [parquet-layout](https://github.com/apache/arrow-rs/blob/master/parquet/src/bin/parquet-layout.rs) against the parquet file might prove insightful. 
   
   DataFusion is currently limited to row group level parallelism, and there certainly are parquet writers that write very large row groups. A file with only a handful of huge row groups can leave most cores idle during the scan, which would cause issues here. See https://github.com/apache/arrow/issues/34280
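
To make the row group concern concrete, here is a rough back-of-the-envelope sketch rather than anything taken from DataFusion itself. It assumes each row group is an indivisible unit of work and that scan cost is proportional to row count (both simplifications), and the function name `max_row_group_speedup` is hypothetical. The row counts passed in would be the kind of per-row-group numbers that inspecting the file's layout reveals.

```rust
/// Upper bound on scan speedup when a row group cannot be split across workers.
///
/// Assumptions (not taken from DataFusion): cost is proportional to row count,
/// and scheduling is otherwise perfect. Real speedups will be lower.
fn max_row_group_speedup(row_group_rows: &[u64], workers: usize) -> f64 {
    let total: u64 = row_group_rows.iter().sum();
    if total == 0 || workers == 0 {
        return 0.0;
    }
    let largest = row_group_rows.iter().copied().max().unwrap_or(0);

    // Wall-clock time cannot drop below the largest single row group,
    // nor below an even split of all rows across the workers.
    let even_split = total as f64 / workers as f64;
    let best_time = (largest as f64).max(even_split);

    // Speedup relative to scanning every row on one worker.
    total as f64 / best_time
}
```

Under this model a file holding all 10 million rows in a single row group gets no benefit from 8 workers, while the same rows split into ten equal row groups could approach an 8x speedup. A file dominated by one very large row group stays close to 1x no matter how many small row groups accompany it.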

