Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/01/24 01:10:16 UTC

[GitHub] [arrow-datafusion] houqp commented on issue #1652: ARROW2: Performance benchmark

houqp commented on issue #1652:
URL: https://github.com/apache/arrow-datafusion/issues/1652#issuecomment-1019622028


   Here are some of the TPC-H results I got from running our tpch benchmark suite on an 8-core x86-64 Linux machine.
   
   The commit I used as the baseline for arrow-rs is 2008b1dc06d5030f572634c7f8f2ba48562fa636. The commit for arrow2 is c0c9c7231f9c5685fda5fc9294fdc1711384b6fb.
   
   Default single partition CSV files generated from our [tpch gen script](https://github.com/apache/arrow-datafusion/blob/6ec18bb4a53f684efd8d97443c55035eb37bda14/benchmarks/entrypoint.sh#L21) (--batch-size 4096):
   
   ![image](https://user-images.githubusercontent.com/670302/150705996-a61ab73e-6be6-4734-917d-7423b4df7f32.png)
   
   CSV tables partitioned into 16 files and processed with 8 datafusion partitions (--batch-size 4096 --partitions 8):
   
   ![image](https://user-images.githubusercontent.com/670302/150706108-710bdb9c-cf48-478c-8851-40b2ea688af6.png)
   
   Parquet tables partitioned into 8 files and processed with 8 datafusion partitions (--batch-size 4096 --partitions 8):
   
   ![image](https://user-images.githubusercontent.com/670302/150706222-381ea1ef-9061-41e1-aa2b-5d2912cdbe22.png)
   
   Note that query 7 was not able to complete with arrow2 due to OOM. It is a known issue that the arrow2 parquet reader currently consumes almost double the memory.
   
   Q1 is significantly slower with arrow2, compared to the other queries.
   
   Both of these regressions require a deep dive.
   

