Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2020/10/09 22:38:19 UTC

[GitHub] [arrow] andygrove commented on pull request #8409: ARROW-10240: [Rust] Optionally load data into memory before running benchmark query

andygrove commented on pull request #8409:
URL: https://github.com/apache/arrow/pull/8409#issuecomment-706429236


   The results are pretty interesting to me.
   
   Without `--mem-table`:
   
   ```
   Running benchmarks with the following options: TpchOpt { query: 1, debug: false, iterations: 3, concurrency: 24, batch_size: 4096, path: "/mnt/tpch/s1/parquet", file_format: "parquet", mem_table: false }
   Query 1 iteration 0 took 241 ms
   Query 1 iteration 1 took 164 ms
   Query 1 iteration 2 took 167 ms
   ```
   
   With `--mem-table`:
   
   ```
   Running benchmarks with the following options: TpchOpt { query: 1, debug: false, iterations: 3, concurrency: 24, batch_size: 4096, path: "/mnt/tpch/s1/parquet", file_format: "parquet", mem_table: true }
   Loading data into memory
   Loaded data into memory in 11240 ms
   Query 1 iteration 0 took 353 ms
   Query 1 iteration 1 took 302 ms
   Query 1 iteration 2 took 322 ms
   ```
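   
   For reference, the `TpchOpt { ... }` lines in the output above are just the Debug print of the benchmark's command-line options. Here is a rough reconstruction of that struct with structopt, using the field names from the output (the types, attributes, and flag names other than `--mem-table` are my guesses, not necessarily what the benchmark binary actually declares):
   
   ```rust
   use structopt::StructOpt;
   
   /// TPC-H benchmark options, reconstructed from the Debug output above.
   /// Sketch only: types and structopt attributes are approximate.
   #[derive(Debug, StructOpt)]
   struct TpchOpt {
       /// Query number to run.
       #[structopt(long)]
       query: usize,
       /// Print debug output such as the query plan.
       #[structopt(long)]
       debug: bool,
       /// Number of iterations of the query to run.
       #[structopt(long)]
       iterations: usize,
       /// Number of partitions to process in parallel.
       #[structopt(long)]
       concurrency: usize,
       /// Batch size when reading data.
       #[structopt(long, default_value = "4096")]
       batch_size: usize,
       /// Path to the TPC-H data set.
       #[structopt(long)]
       path: String,
       /// Input file format, e.g. "parquet".
       #[structopt(long)]
       file_format: String,
       /// Load the data into memory before running the query.
       #[structopt(long)]
       mem_table: bool,
   }
   
   fn main() {
       let opt = TpchOpt::from_args();
       println!("Running benchmarks with the following options: {:?}", opt);
   }
   ```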
   
   I filed https://issues.apache.org/jira/browse/ARROW-10251 to fix the single-threaded loading in MemTable, but I'm not sure why the actual query time is slower against the in-memory tables than against Parquet.
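   
   For anyone following along, the `--mem-table` path essentially materializes each table as Arrow RecordBatches up front and registers the in-memory copy instead of the Parquet files. A minimal sketch of that pattern (DataFusion module paths and signatures here are from memory and may not match the code in this PR exactly):
   
   ```rust
   use datafusion::datasource::parquet::ParquetTable;
   use datafusion::datasource::MemTable;
   use datafusion::error::Result;
   use datafusion::execution::context::ExecutionContext;
   
   /// Register a TPC-H table either directly as Parquet or, when `mem_table`
   /// is set, as RecordBatches pre-loaded into memory. Sketch only: the
   /// constructor and register_table signatures are approximate.
   fn register_tpch_table(
       ctx: &mut ExecutionContext,
       name: &str,
       path: &str,
       batch_size: usize,
       mem_table: bool,
   ) -> Result<()> {
       let parquet = ParquetTable::try_new(path)?;
       if mem_table {
           // MemTable::load scans every partition of the source table and
           // buffers the resulting batches in memory. Today that scan runs
           // sequentially on one thread, which is what ARROW-10251 tracks.
           let mem = MemTable::load(&parquet, batch_size)?;
           ctx.register_table(name, Box::new(mem));
       } else {
           ctx.register_table(name, Box::new(parquet));
       }
       Ok(())
   }
   ```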

