Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2020/08/13 15:09:29 UTC

[GitHub] [arrow] alamb commented on a change in pull request #7946: ARROW-9711: [Rust] Add new benchmark derived from TPC-H

alamb commented on a change in pull request #7946:
URL: https://github.com/apache/arrow/pull/7946#discussion_r470023745



##########
File path: rust/benchmarks/README.md
##########
@@ -19,12 +19,27 @@
 
 # Apache Arrow Rust Benchmarks
 
-This crate contains benchmarks based on the [New York Taxi and Limousine Commission](https://www1.nyc.gov/site/tlc/about/tlc-trip-record-data.page) data set.
+This crate contains benchmarks based on popular public data sets and open source benchmark suites, making it easy to
+run real-world benchmarks to help with performance and scalability testing and for comparing performance with other Arrow
+implementations as well as other query engines.
 
-Currently, only DataFusion benchmarks exist, but the plan is to add benchmarks for the arrow, flight, and parquet crates as well.
+Currently, only DataFusion benchmarks exist, but the plan is to add benchmarks for the arrow, flight, and parquet 
+crates as well. 
+
+## Benchmark derived from TPC-H
+
+These benchmarks are derived from the [TPC-H](http://www.tpc.org/tpch/) benchmark.
+
+```bash
+cargo run --release --bin tpch -- --iterations 3 --path /mnt/tpch/csv --format csv --query 1 --batch-size 4096

Review comment:
       I may have missed it, but if you could include instructions (or links to instructions) on how to actually create the TPC-H data, that would be super helpful.



