Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2020/10/09 22:38:19 UTC
[GitHub] [arrow] andygrove commented on pull request #8409: ARROW-10240: [Rust] Optionally load data into memory before running benchmark query
andygrove commented on pull request #8409:
URL: https://github.com/apache/arrow/pull/8409#issuecomment-706429236
The results on my machine are pretty interesting.
Without `--mem-table`:
```
Running benchmarks with the following options: TpchOpt { query: 1, debug: false, iterations: 3, concurrency: 24, batch_size: 4096, path: "/mnt/tpch/s1/parquet", file_format: "parquet", mem_table: false }
Query 1 iteration 0 took 241 ms
Query 1 iteration 1 took 164 ms
Query 1 iteration 2 took 167 ms
```
With `--mem-table`:
```
Running benchmarks with the following options: TpchOpt { query: 1, debug: false, iterations: 3, concurrency: 24, batch_size: 4096, path: "/mnt/tpch/s1/parquet", file_format: "parquet", mem_table: true }
Loading data into memory
Loaded data into memory in 11240 ms
Query 1 iteration 0 took 353 ms
Query 1 iteration 1 took 302 ms
Query 1 iteration 2 took 322 ms
```
I filed https://issues.apache.org/jira/browse/ARROW-10251 to fix the single-threaded loading in `MemTable`, but I'm not sure why the actual query time is slower for the in-memory table than for Parquet.
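For what it's worth, a quick bit of arithmetic on the numbers above (just a sanity check on the posted results, not part of the benchmark code) shows the in-memory path is consistently slower per iteration, so the 11240 ms load cost never amortizes in this run:

```rust
// Average the three iteration times for each configuration, using the
// numbers from the benchmark output above.
fn avg_ms(times: &[f64]) -> f64 {
    times.iter().sum::<f64>() / times.len() as f64
}

fn main() {
    let parquet = [241.0, 164.0, 167.0]; // without --mem-table
    let mem_table = [353.0, 302.0, 322.0]; // with --mem-table

    let parquet_avg = avg_ms(&parquet);
    let mem_avg = avg_ms(&mem_table);

    // The MemTable path is ~135 ms slower *per iteration*, on top of the
    // 11240 ms load time, so it never breaks even here.
    println!("parquet avg:   {:.1} ms", parquet_avg);
    println!("mem-table avg: {:.1} ms", mem_avg);
    println!("per-iteration delta: {:.1} ms", mem_avg - parquet_avg);
}
```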
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org