Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/06/07 03:45:32 UTC

[GitHub] [arrow-datafusion] artorias1024 opened a new issue, #2706: Questions about benchmarking performance

artorias1024 opened a new issue, #2706:
URL: https://github.com/apache/arrow-datafusion/issues/2706

   ### Hardware Resources
   #### CPU and Memory
   24 vCPUs, 96 GiB
   #### Operating System
   CentOS 7.9 64-bit
   
   ### Data set size
   ```
   $ du -h
   100K	./nation
   5.3G	./lineitem
   266M	./customer
   1.3G	./orders
   866M	./partsupp
   19M	./supplier
   100K	./region
   139M	./part
   7.8G	.
   ```
   
   ### Performance Test Results
   ```
   cargo run --release --features "simd mimalloc" --bin tpch -- benchmark datafusion --iterations 10 --path ./tpch-parquet --format parquet --query 1 --batch-size 8192 --partitions 20
       Finished release [optimized] target(s) in 0.23s
        Running `/home/zhoupu/yukunpeng/arrow-ballista-master/target/release/tpch benchmark datafusion --iterations 10 --path ./tpch-parquet --format parquet --query 1 --batch-size 8192 --partitions 20`
   Running benchmarks with the following options: DataFusionBenchmarkOpt { query: 1, debug: false, iterations: 10, partitions: 20, batch_size: 8192, path: "./tpch-parquet", file_format: "parquet", mem_table: false, output_path: None }
   Query 1 iteration 0 took 2630.6 ms and returned 4 rows
   Query 1 iteration 1 took 2581.2 ms and returned 4 rows
   Query 1 iteration 2 took 2593.9 ms and returned 4 rows
   Query 1 iteration 3 took 2585.9 ms and returned 4 rows
   Query 1 iteration 4 took 2559.2 ms and returned 4 rows
   Query 1 iteration 5 took 2570.1 ms and returned 4 rows
   Query 1 iteration 6 took 2554.7 ms and returned 4 rows
   Query 1 iteration 7 took 2567.0 ms and returned 4 rows
   Query 1 iteration 8 took 2592.3 ms and returned 4 rows
   Query 1 iteration 9 took 2653.5 ms and returned 4 rows
   Query 1 avg time: 2588.83 ms
   ```
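   
   To read the run above beyond the single average the tool prints, the ten per-iteration timings can be summarized directly. The following is a minimal sketch and not part of the `tpch` benchmark binary: the timings are copied from the log above, and the names `TIMINGS_MS`, `mean`, and `sample_std_dev` are illustrative.
   
   ```rust
   /// Per-iteration timings in milliseconds, copied from the benchmark log above.
   const TIMINGS_MS: [f64; 10] = [
       2630.6, 2581.2, 2593.9, 2585.9, 2559.2, 2570.1, 2554.7, 2567.0, 2592.3, 2653.5,
   ];
   
   /// Arithmetic mean of the samples.
   fn mean(xs: &[f64]) -> f64 {
       xs.iter().sum::<f64>() / xs.len() as f64
   }
   
   /// Sample standard deviation (n - 1 in the denominator).
   fn sample_std_dev(xs: &[f64]) -> f64 {
       let m = mean(xs);
       let sum_sq: f64 = xs.iter().map(|x| (x - m).powi(2)).sum();
       (sum_sq / (xs.len() as f64 - 1.0)).sqrt()
   }
   
   fn main() {
       let min = TIMINGS_MS.iter().cloned().fold(f64::INFINITY, f64::min);
       let max = TIMINGS_MS.iter().cloned().fold(f64::NEG_INFINITY, f64::max);
       let avg = mean(&TIMINGS_MS);
       let sd = sample_std_dev(&TIMINGS_MS);
       println!("mean {avg:.2} ms, min {min:.1} ms, max {max:.1} ms, std dev {sd:.2} ms");
       // Relative spread: a small value suggests the iterations are stable.
       println!("std dev / mean = {:.2}%", 100.0 * sd / avg);
   }
   ```
   
   The spread across iterations is small relative to the mean, so the reported average is a fair summary of this run. A noticeably slower first iteration would instead hint at cold-cache effects.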
   
   ### Questions 
   1. Do the results of this benchmark performance test meet expectations?
   2. What are some ways to optimize performance?
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] alamb closed issue #2706: Questions about benchmarking performance

Posted by "alamb (via GitHub)" <gi...@apache.org>.
alamb closed issue #2706: Questions about benchmarking performance
URL: https://github.com/apache/arrow-datafusion/issues/2706




[GitHub] [arrow-datafusion] alamb commented on issue #2706: Questions about benchmarking performance

Posted by GitBox <gi...@apache.org>.
alamb commented on issue #2706:
URL: https://github.com/apache/arrow-datafusion/issues/2706#issuecomment-1153121960

   > Do the results of this benchmark performance test meet expectations?
   
   Hi @artorias1024, these results look roughly in line with what I have seen.
   
   > What are some ways to optimize performance?
   
   There are some thoughts here:
   
   https://github.com/apache/arrow-datafusion/blob/b9f6e6b7c353c1109bd7b306008e006db29b46f8/docs/source/user-guide/library.md#optimized-configuration
   
   Beyond that, I think "profile and then improve the bottlenecks" is the best advice we have.
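   
   A coarse first step before reaching for a real profiler is to time individual stages by hand. The sketch below uses only the Rust standard library; `time_stage` is a hypothetical helper, not a DataFusion API, and the closures are placeholder workloads standing in for whatever stages are being measured.
   
   ```rust
   use std::time::{Duration, Instant};
   
   /// Runs `f` once, prints the elapsed wall-clock time, and returns the
   /// result together with the measured duration.
   fn time_stage<T>(label: &str, f: impl FnOnce() -> T) -> (T, Duration) {
       let start = Instant::now();
       let out = f();
       let elapsed = start.elapsed();
       println!("{label} took {:.1} ms", elapsed.as_secs_f64() * 1000.0);
       (out, elapsed)
   }
   
   fn main() {
       // Placeholder workloads; in practice these would wrap the real stages.
       let (data, _) = time_stage("build input", || (0..1_000_000u64).collect::<Vec<_>>());
       let (total, _) = time_stage("sum", || data.iter().sum::<u64>());
       println!("total = {total}");
   }
   ```
   
   Wall-clock timing like this only shows which stage is slow, not why; a sampling profiler is still needed to find the hot code inside that stage.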




[GitHub] [arrow-datafusion] alamb commented on issue #2706: Questions about benchmarking performance

Posted by "alamb (via GitHub)" <gi...@apache.org>.
alamb commented on issue #2706:
URL: https://github.com/apache/arrow-datafusion/issues/2706#issuecomment-1458360376

   I think this question has been answered, and the issue has been inactive for some time, so I am closing it.

