Posted to dev@arrow.apache.org by "Andy Grove (Jira)" <ji...@apache.org> on 2020/05/13 13:00:12 UTC

[jira] [Created] (ARROW-8782) [Rust] [DataFusion] Add benchmarks based on NYC Taxi data set

Andy Grove created ARROW-8782:
---------------------------------

             Summary: [Rust] [DataFusion] Add benchmarks based on NYC Taxi data set
                 Key: ARROW-8782
                 URL: https://issues.apache.org/jira/browse/ARROW-8782
             Project: Apache Arrow
          Issue Type: Improvement
          Components: Rust, Rust - DataFusion
            Reporter: Andy Grove
            Assignee: Andy Grove
             Fix For: 1.0.0


I plan to add a new benchmarks folder beneath the datafusion crate, containing benchmarks based on the NYC Taxi data set. The benchmark will be a CLI that supports running a number of different queries against CSV and Parquet files.
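
To make the CLI idea concrete, here is a rough sketch of how such options might be parsed using only the Rust standard library. The flag names (--format, --path, --query, --iterations) and the BenchmarkOpts type are assumptions made for illustration; the issue does not specify any of them, and the final benchmark may well use an argument-parsing crate instead.

```rust
use std::str::FromStr;

/// Input formats the benchmark is meant to read (CSV and Parquet, per the description).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum FileFormat {
    Csv,
    Parquet,
}

impl FromStr for FileFormat {
    type Err = String;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s.to_ascii_lowercase().as_str() {
            "csv" => Ok(FileFormat::Csv),
            "parquet" => Ok(FileFormat::Parquet),
            other => Err(format!("unsupported format: {}", other)),
        }
    }
}

/// Hypothetical benchmark options; none of these field names come from the issue.
#[derive(Debug, PartialEq, Eq)]
pub struct BenchmarkOpts {
    pub format: FileFormat,
    pub path: String,
    pub query: String,
    pub iterations: usize,
}

/// Parse `--flag value` pairs from an argument list (program name already removed).
/// `--iterations` is optional and defaults to 1; the other flags are required.
pub fn parse_opts<I: IntoIterator<Item = String>>(args: I) -> Result<BenchmarkOpts, String> {
    let mut format = None;
    let mut path = None;
    let mut query = None;
    let mut iterations = 1usize;

    let mut it = args.into_iter();
    while let Some(flag) = it.next() {
        // Every flag in this sketch takes exactly one value.
        let value = it
            .next()
            .ok_or_else(|| format!("missing value for {}", flag))?;
        match flag.as_str() {
            "--format" => format = Some(value.parse::<FileFormat>()?),
            "--path" => path = Some(value),
            "--query" => query = Some(value),
            "--iterations" => {
                iterations = value
                    .parse::<usize>()
                    .map_err(|e| format!("bad --iterations: {}", e))?
            }
            other => return Err(format!("unknown flag: {}", other)),
        }
    }

    Ok(BenchmarkOpts {
        format: format.ok_or("missing --format")?,
        path: path.ok_or("missing --path")?,
        query: query.ok_or("missing --query")?,
        iterations,
    })
}
```

In a binary this could be called as parse_opts(std::env::args().skip(1)), with the resulting options deciding which query to run and which file format to read.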

The README will contain instructions for downloading the data set.

The benchmark will produce CSV files containing results.
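
As a rough illustration of what producing CSV results could look like, the sketch below times a closure several times and writes one row per run. The column names, the BenchmarkResult type, and the use of a closure in place of a real query are all assumptions for illustration; no DataFusion API is used, and the actual output layout is not defined in this issue.

```rust
use std::io::{self, Write};
use std::time::Instant;

/// One timed run of one query (hypothetical result layout).
#[derive(Debug, Clone, PartialEq)]
pub struct BenchmarkResult {
    pub query: String,
    pub format: String,
    pub iteration: usize,
    pub elapsed_ms: u128,
}

/// Run `work` `iterations` times and record the wall-clock time of each run.
/// `work` stands in for executing a query against the data set.
pub fn time_query<F: FnMut()>(
    query: &str,
    format: &str,
    iterations: usize,
    mut work: F,
) -> Vec<BenchmarkResult> {
    (0..iterations)
        .map(|i| {
            let start = Instant::now();
            work();
            BenchmarkResult {
                query: query.to_string(),
                format: format.to_string(),
                iteration: i + 1, // 1-based for readability in the output
                elapsed_ms: start.elapsed().as_millis(),
            }
        })
        .collect()
}

/// Write results as CSV with a header row.
/// Fields are assumed not to contain commas or quotes, so no escaping is done.
pub fn write_csv<W: Write>(mut out: W, results: &[BenchmarkResult]) -> io::Result<()> {
    writeln!(out, "query,format,iteration,elapsed_ms")?;
    for r in results {
        writeln!(
            out,
            "{},{},{},{}",
            r.query, r.format, r.iteration, r.elapsed_ms
        )?;
    }
    Ok(())
}
```

Because write_csv accepts any Write implementation, the same function can target a file created with std::fs::File::create or an in-memory buffer in tests. Recording every iteration rather than only an average keeps warm-up effects visible in the output.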

These benchmarks will allow us to manually verify performance before major releases and on an ongoing basis as we make changes to Arrow/Parquet/DataFusion.

I will be basing this on existing benchmarks I recently built in Ballista [1] (I am the only contributor to these benchmarks so far).

A Dockerfile will be provided, making it easy to restrict CPU and RAM when running these benchmarks.

[1] https://github.com/ballista-compute/ballista/tree/master/rust/benchmarks
