Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2020/08/12 21:32:38 UTC

[GitHub] [arrow] andygrove opened a new pull request #7946: ARROW-9711: [Rust] Add new benchmark derived from TPC-H [DRAFT]

andygrove opened a new pull request #7946:
URL: https://github.com/apache/arrow/pull/7946


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] alamb commented on a change in pull request #7946: ARROW-9711: [Rust] Add new benchmark derived from TPC-H

Posted by GitBox <gi...@apache.org>.
alamb commented on a change in pull request #7946:
URL: https://github.com/apache/arrow/pull/7946#discussion_r470023745



##########
File path: rust/benchmarks/README.md
##########
@@ -19,12 +19,27 @@
 
 # Apache Arrow Rust Benchmarks
 
-This crate contains benchmarks based on the [New York Taxi and Limousine Commission](https://www1.nyc.gov/site/tlc/about/tlc-trip-record-data.page) data set.
+This crate contains benchmarks based on popular public data sets and open source benchmark suites, making it easy to
+run real-world benchmarks to help with performance and scalability testing and for comparing performance with other Arrow
+implementations as well as other query engines.
 
-Currently, only DataFusion benchmarks exist, but the plan is to add benchmarks for the arrow, flight, and parquet crates as well.
+Currently, only DataFusion benchmarks exist, but the plan is to add benchmarks for the arrow, flight, and parquet 
+crates as well. 
+
+## Benchmark derived from TPC-H
+
+These benchmarks are derived from the [TPC-H](http://www.tpc.org/tpch/) benchmark.
+
+```bash
+cargo run --release --bin tpch -- --iterations 3 --path /mnt/tpch/csv --format csv --query 1 --batch-size 4096

Review comment:
       I may have missed it, but if you could include instructions (or links to instructions) on how to actually create the TPC-H data, that would be super helpful.







[GitHub] [arrow] github-actions[bot] commented on pull request #7946: ARROW-9711: [Rust] Add new benchmark derived from TPC-H [DRAFT]

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #7946:
URL: https://github.com/apache/arrow/pull/7946#issuecomment-673126940


   https://issues.apache.org/jira/browse/ARROW-9711





[GitHub] [arrow] andygrove commented on pull request #7946: ARROW-9711: [Rust] Add new benchmark derived from TPC-H [DRAFT]

Posted by GitBox <gi...@apache.org>.
andygrove commented on pull request #7946:
URL: https://github.com/apache/arrow/pull/7946#issuecomment-673126018


   @jorgecarleitao @alamb fyi





[GitHub] [arrow] andygrove commented on a change in pull request #7946: ARROW-9711: [Rust] Add new benchmark derived from TPC-H

Posted by GitBox <gi...@apache.org>.
andygrove commented on a change in pull request #7946:
URL: https://github.com/apache/arrow/pull/7946#discussion_r470312676



##########
File path: rust/benchmarks/README.md
##########
@@ -19,12 +19,27 @@
 
 # Apache Arrow Rust Benchmarks
 
-This crate contains benchmarks based on the [New York Taxi and Limousine Commission](https://www1.nyc.gov/site/tlc/about/tlc-trip-record-data.page) data set.
+This crate contains benchmarks based on popular public data sets and open source benchmark suites, making it easy to
+run real-world benchmarks to help with performance and scalability testing and for comparing performance with other Arrow
+implementations as well as other query engines.
 
-Currently, only DataFusion benchmarks exist, but the plan is to add benchmarks for the arrow, flight, and parquet crates as well.
+Currently, only DataFusion benchmarks exist, but the plan is to add benchmarks for the arrow, flight, and parquet 
+crates as well. 
+
+## Benchmark derived from TPC-H
+
+These benchmarks are derived from the [TPC-H](http://www.tpc.org/tpch/) benchmark.
+
+```bash
+cargo run --release --bin tpch -- --iterations 3 --path /mnt/tpch/csv --format csv --query 1 --batch-size 4096

Review comment:
       Done.







[GitHub] [arrow] andygrove closed pull request #7946: ARROW-9711: [Rust] Add new benchmark derived from TPC-H

Posted by GitBox <gi...@apache.org>.
andygrove closed pull request #7946:
URL: https://github.com/apache/arrow/pull/7946


   





[GitHub] [arrow] andygrove commented on pull request #7946: ARROW-9711: [Rust] Add new benchmark derived from TPC-H

Posted by GitBox <gi...@apache.org>.
andygrove commented on pull request #7946:
URL: https://github.com/apache/arrow/pull/7946#issuecomment-673156833


   @wesm I believe that we can use TPC tests under their fair use policy [1] but we need to be careful to refer to them as "derived from" TPC since they are not official TPC tests.
   
   [1] http://www.tpc.org/tpc_documents_current_versions/pdf/tpc_fair_use_quick_reference_v1.0.0.pdf





[GitHub] [arrow] andygrove commented on pull request #7946: ARROW-9711: [Rust] Add new benchmark derived from TPC-H

Posted by GitBox <gi...@apache.org>.
andygrove commented on pull request #7946:
URL: https://github.com/apache/arrow/pull/7946#issuecomment-673157623


   @jorgecarleitao @alamb fyi





[GitHub] [arrow] andygrove commented on pull request #7946: ARROW-9711: [Rust] Add new benchmark derived from TPC-H

Posted by GitBox <gi...@apache.org>.
andygrove commented on pull request #7946:
URL: https://github.com/apache/arrow/pull/7946#issuecomment-674191354


   @jorgecarleitao I can document it more. It is actually pretty simple, but perhaps a good topic for a blog post.





[GitHub] [arrow] andygrove commented on pull request #7946: ARROW-9711: [Rust] Add new benchmark derived from TPC-H

Posted by GitBox <gi...@apache.org>.
andygrove commented on pull request #7946:
URL: https://github.com/apache/arrow/pull/7946#issuecomment-674162568


   @alamb @jorgecarleitao This is ready for review and is my last planned PR for the moment.

