Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/01/24 09:54:23 UTC

[GitHub] [arrow-datafusion] tustvold commented on issue #1652: ARROW2: Performance benchmark

tustvold commented on issue #1652:
URL: https://github.com/apache/arrow-datafusion/issues/1652#issuecomment-1019913842


   Big :+1: to this, getting some concrete numbers would be really nice.
   
   FWIW some ideas for whoever picks this up that I at least would be very interested in:
   
   * Performance against current arrow-rs master: a number of non-trivial performance improvements have landed in the last month, with more currently in progress
   
   * Performance of floating point aggregates: I seem to remember that all the TPCH queries exercising these fail to run correctly, but I could be mistaken
   
   * Performance of dictionary arrays: this has historically been poor, and a substantial amount of work to improve it has been completed or is in flight (still WIP) - https://github.com/apache/arrow-datafusion/issues/1610, https://github.com/apache/arrow-datafusion/issues/1474, https://github.com/apache/arrow-rs/pull/1180, etc...
   
   * Performance on parquet files with reasonable row group sizes: the OOM would suggest they aren't teeny, but I wanted to clarify, as there is currently a limitation in arrow-rs's parquet writer that makes it produce impractically small row groups - https://github.com/apache/arrow-rs/issues/1211
   
   * Performance of specific operators, e.g. FilterExec, SortPreservingMerge or ParquetExec: I'd basically be interested in where the performance gains are, and where we might gain or lose performance from a potential switch. Perhaps some execution plan metrics, or a perf dump or something :thinking:
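
To make the per-operator idea a bit more concrete, here is a rough, std-only sketch of the kind of breakdown I have in mind. Everything in it is hypothetical: `StageTimings`, `filter_positive` and `sum` are made-up stand-ins, not DataFusion or arrow-rs APIs, and a real comparison would read the engine's own execution plan metrics rather than wrapping closures like this.

```rust
use std::collections::BTreeMap;
use std::time::{Duration, Instant};

/// Hypothetical helper (not a DataFusion API): accumulates wall-clock
/// time and call counts per named stage.
#[derive(Default)]
struct StageTimings {
    totals: BTreeMap<String, Duration>,
    calls: BTreeMap<String, u64>,
}

impl StageTimings {
    /// Runs `f`, charges its elapsed time to `stage`, and returns its result.
    fn time<T>(&mut self, stage: &str, f: impl FnOnce() -> T) -> T {
        let start = Instant::now();
        let out = f();
        let elapsed = start.elapsed();
        *self.totals.entry(stage.to_string()).or_default() += elapsed;
        *self.calls.entry(stage.to_string()).or_default() += 1;
        out
    }

    /// Returns (stage, call count, total time), sorted by stage name.
    fn report(&self) -> Vec<(String, u64, Duration)> {
        self.totals
            .iter()
            .map(|(stage, total)| {
                let calls = self.calls.get(stage).copied().unwrap_or(0);
                (stage.clone(), calls, *total)
            })
            .collect()
    }
}

/// Stand-in for a filter operator: keeps strictly positive values.
fn filter_positive(values: &[f64]) -> Vec<f64> {
    values.iter().copied().filter(|v| *v > 0.0).collect()
}

/// Stand-in for a floating point aggregate.
fn sum(values: &[f64]) -> f64 {
    values.iter().sum()
}

fn main() {
    // Synthetic input; a real benchmark would use TPCH or similar data.
    let data: Vec<f64> = (0..1_000_000).map(|i| (i % 7) as f64 - 3.0).collect();
    let mut timings = StageTimings::default();

    // Repeat the pipeline a few times so one noisy run matters less.
    for _ in 0..5 {
        let filtered = timings.time("filter", || filter_positive(&data));
        let total = timings.time("aggregate", || sum(&filtered));
        // Keep the optimizer from discarding the aggregate.
        std::hint::black_box(total);
    }

    for (stage, calls, total) in timings.report() {
        println!("{stage}: {calls} calls, {total:?} total");
    }
}
```

Running the same stages against both implementations and diffing the two reports would show which operators account for any gain or loss, which is more actionable than a single end-to-end number.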
   

