Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/06/07 03:45:32 UTC
[GitHub] [arrow-datafusion] artorias1024 opened a new issue, #2706: Questions about benchmarking performance
artorias1024 opened a new issue, #2706:
URL: https://github.com/apache/arrow-datafusion/issues/2706
### Hardware Resources
#### CPU and Memory
24 vCPUs, 96 GiB
#### Operating System
CentOS 7.9 64-bit
### Data set size
```
$ du -h
100K ./nation
5.3G ./lineitem
266M ./customer
1.3G ./orders
866M ./partsupp
19M ./supplier
100K ./region
139M ./part
7.8G .
```
### Performance Test Results
```
cargo run --release --features "simd mimalloc" --bin tpch -- benchmark datafusion --iterations 10 --path ./tpch-parquet --format parquet --query 1 --batch-size 8192 --partitions 20
Finished release [optimized] target(s) in 0.23s
Running `/home/zhoupu/yukunpeng/arrow-ballista-master/target/release/tpch benchmark datafusion --iterations 10 --path ./tpch-parquet --format parquet --query 1 --batch-size 8192 --partitions 20`
Running benchmarks with the following options: DataFusionBenchmarkOpt { query: 1, debug: false, iterations: 10, partitions: 20, batch_size: 8192, path: "./tpch-parquet", file_format: "parquet", mem_table: false, output_path: None }
Query 1 iteration 0 took 2630.6 ms and returned 4 rows
Query 1 iteration 1 took 2581.2 ms and returned 4 rows
Query 1 iteration 2 took 2593.9 ms and returned 4 rows
Query 1 iteration 3 took 2585.9 ms and returned 4 rows
Query 1 iteration 4 took 2559.2 ms and returned 4 rows
Query 1 iteration 5 took 2570.1 ms and returned 4 rows
Query 1 iteration 6 took 2554.7 ms and returned 4 rows
Query 1 iteration 7 took 2567.0 ms and returned 4 rows
Query 1 iteration 8 took 2592.3 ms and returned 4 rows
Query 1 iteration 9 took 2653.5 ms and returned 4 rows
Query 1 avg time: 2588.83 ms
```
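To judge how stable these timings are, it can help to look at the spread as well as the average. The following is a minimal sketch, not part of the benchmark tool: the `Summary` struct and `summarize` function are hypothetical names introduced here, and the timings are copied by hand from the output above.

```rust
/// Summary statistics for a set of iteration timings, in milliseconds.
/// (Hypothetical helper type, not part of the tpch benchmark binary.)
#[derive(Debug)]
struct Summary {
    mean: f64,
    min: f64,
    max: f64,
    std_dev: f64,
}

/// Compute mean, min, max and sample standard deviation.
/// Returns None for fewer than two samples, where the sample
/// standard deviation is undefined.
fn summarize(samples: &[f64]) -> Option<Summary> {
    if samples.len() < 2 {
        return None;
    }
    let n = samples.len() as f64;
    let mean = samples.iter().sum::<f64>() / n;
    let min = samples.iter().copied().fold(f64::INFINITY, f64::min);
    let max = samples.iter().copied().fold(f64::NEG_INFINITY, f64::max);
    // Sample variance uses n - 1 in the denominator.
    let var = samples.iter().map(|x| (x - mean).powi(2)).sum::<f64>() / (n - 1.0);
    Some(Summary { mean, min, max, std_dev: var.sqrt() })
}

fn main() {
    // Per-iteration timings (ms) copied from the benchmark output.
    let timings = [
        2630.6, 2581.2, 2593.9, 2585.9, 2559.2,
        2570.1, 2554.7, 2567.0, 2592.3, 2653.5,
    ];
    let s = summarize(&timings).expect("need at least two samples");
    println!(
        "mean {:.2} ms, min {:.1} ms, max {:.1} ms, std dev {:.1} ms",
        s.mean, s.min, s.max, s.std_dev
    );
}
```

The fastest and slowest iterations (2554.7 ms and 2653.5 ms) differ by under 4% of the mean, so the run-to-run noise here is small and the first iteration shows no large warm-up penalty.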
### Questions
1. Do the results of this benchmark performance test meet expectations?
2. What are some ways to optimize performance?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow-datafusion] alamb closed issue #2706: Questions about benchmarking performance
Posted by "alamb (via GitHub)" <gi...@apache.org>.
alamb closed issue #2706: Questions about benchmarking performance
URL: https://github.com/apache/arrow-datafusion/issues/2706
[GitHub] [arrow-datafusion] alamb commented on issue #2706: Questions about benchmarking performance
Posted by GitBox <gi...@apache.org>.
alamb commented on issue #2706:
URL: https://github.com/apache/arrow-datafusion/issues/2706#issuecomment-1153121960
> Do the results of this benchmark performance test meet expectations?
Hi @artorias1024 -- these results look approximately in line with what I have seen.
> What are some ways to optimize performance?
There are some thoughts here:
https://github.com/apache/arrow-datafusion/blob/b9f6e6b7c353c1109bd7b306008e006db29b46f8/docs/source/user-guide/library.md#optimized-configuration
Beyond that, I think "profile and then improve the bottlenecks" is the best advice we have.
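As a rough illustration of the "measure first" step, here is a minimal sketch using only the Rust standard library. It is not DataFusion code: `time_iterations` is a hypothetical helper, and the closure in `main` is a stand-in workload rather than a real query stage. A sampling profiler gives far more detail, but a harness like this is often enough to confirm which stage dominates.

```rust
use std::time::{Duration, Instant};

/// Run `work` `iterations` times and return the wall-clock time of each run.
/// (Hypothetical helper for illustration only.)
fn time_iterations<F: FnMut()>(iterations: usize, mut work: F) -> Vec<Duration> {
    let mut timings = Vec::with_capacity(iterations);
    for _ in 0..iterations {
        let start = Instant::now();
        work();
        timings.push(start.elapsed());
    }
    timings
}

fn main() {
    // Stand-in workload; replace with the stage you want to measure.
    let mut sink = 0u64;
    let timings = time_iterations(5, || {
        sink = (0..1_000_000u64).fold(0, |acc, x| acc.wrapping_add(x * x));
    });
    for (i, t) in timings.iter().enumerate() {
        println!("iteration {} took {:.1} ms", i, t.as_secs_f64() * 1000.0);
    }
    // Print the result so the work is observably used.
    println!("checksum {}", sink);
}
```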
[GitHub] [arrow-datafusion] alamb commented on issue #2706: Questions about benchmarking performance
Posted by "alamb (via GitHub)" <gi...@apache.org>.
alamb commented on issue #2706:
URL: https://github.com/apache/arrow-datafusion/issues/2706#issuecomment-1458360376
I think this question has been answered, and the issue has been inactive for some time, so I am closing it.