Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2020/12/27 13:58:25 UTC

[GitHub] [arrow] Dandandan opened a new pull request #9021: ARROW-11042: [Rust][DataFusion] Increase default batch size

Dandandan opened a new pull request #9021:
URL: https://github.com/apache/arrow/pull/9021


   This increases the default batch size 8x, from `4096` to `32768`, because it improves the performance of quite a few operations.
   I kept increasing the size until performance stopped improving on my machine. Note that CSV reading is also faster with bigger batches.
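
   The PR does not say why larger batches help. One common explanation is that each batch carries some fixed overhead, so fewer batches means less overhead in total. The sketch below only counts batches for the two sizes; `batches_needed` and the row count of 6,000,000 are hypothetical and are not taken from DataFusion or from the benchmark data.

   ```rust
   /// Number of batches needed to hold `rows` rows when each batch holds at
   /// most `batch_size` rows (the last batch may be partially filled).
   /// Hypothetical helper for illustration; not part of DataFusion.
   fn batches_needed(rows: usize, batch_size: usize) -> usize {
       assert!(batch_size > 0, "batch size must be non-zero");
       // Ceiling division without floating point.
       (rows + batch_size - 1) / batch_size
   }

   fn main() {
       // Assumed row count, chosen only to make the comparison concrete.
       let rows = 6_000_000;

       let old_batches = batches_needed(rows, 4096);
       let new_batches = batches_needed(rows, 32768);

       println!("batch size  4096: {} batches", old_batches);
       println!("batch size 32768: {} batches", new_batches);
   }
   ```

   With the larger size, every operator that does per-batch work runs that work roughly 8x less often for the same number of rows.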
   
   This PR:
   ```
   Loading table 'part' into memory
   Loaded table 'part' into memory in 125 ms
   Loading table 'supplier' into memory
   Loaded table 'supplier' into memory in 10 ms
   Loading table 'partsupp' into memory
   Loaded table 'partsupp' into memory in 381 ms
   Loading table 'customer' into memory
   Loaded table 'customer' into memory in 126 ms
   Loading table 'orders' into memory
   Loaded table 'orders' into memory in 961 ms
   Loading table 'lineitem' into memory
   Loaded table 'lineitem' into memory in 6382 ms
   Loading table 'nation' into memory
   Loaded table 'nation' into memory in 2 ms
   Loading table 'region' into memory
   Loaded table 'region' into memory in 2 ms
   Query 12 iteration 0 took 220.2 ms
   Query 12 iteration 1 took 223.2 ms
   Query 12 iteration 2 took 222.4 ms
   Query 12 iteration 3 took 222.2 ms
   Query 12 iteration 4 took 221.8 ms
   Query 12 iteration 5 took 222.0 ms
   Query 12 iteration 6 took 223.1 ms
   Query 12 iteration 7 took 223.7 ms
   Query 12 iteration 8 took 222.5 ms
   Query 12 iteration 9 took 222.9 ms
   Query 12 avg time: 222.40 ms
   ```
   
   Master:
   ```
   Loading table 'part' into memory
   Loaded table 'part' into memory in 116 ms
   Loading table 'supplier' into memory
   Loaded table 'supplier' into memory in 7 ms
   Loading table 'partsupp' into memory
   Loaded table 'partsupp' into memory in 386 ms
   Loading table 'customer' into memory
   Loaded table 'customer' into memory in 115 ms
   Loading table 'orders' into memory
   Loaded table 'orders' into memory in 1048 ms
   Loading table 'lineitem' into memory
   Loaded table 'lineitem' into memory in 7673 ms
   Loading table 'nation' into memory
   Loaded table 'nation' into memory in 0 ms
   Loading table 'region' into memory
   Loaded table 'region' into memory in 0 ms
   Query 12 iteration 0 took 596.1 ms
   Query 12 iteration 1 took 602.0 ms
   Query 12 iteration 2 took 608.1 ms
   Query 12 iteration 3 took 607.9 ms
   Query 12 iteration 4 took 613.5 ms
   Query 12 iteration 5 took 615.3 ms
   Query 12 iteration 6 took 611.6 ms
   Query 12 iteration 7 took 609.8 ms
   Query 12 iteration 8 took 615.7 ms
   Query 12 iteration 9 took 616.9 ms
   Query 12 avg time: 609.68 ms
   ```
   
   Query 1 also improves, though by a smaller margin (average time drops from 716.85 ms to 659.19 ms, about 8%).
   
   Master:
   
   ```
   Query 1 iteration 0 took 708.8 ms
   Query 1 iteration 1 took 714.5 ms
   Query 1 iteration 2 took 700.4 ms
   Query 1 iteration 3 took 713.7 ms
   Query 1 iteration 4 took 707.5 ms
   Query 1 iteration 5 took 727.8 ms
   Query 1 iteration 6 took 727.9 ms
   Query 1 iteration 7 took 721.3 ms
   Query 1 iteration 8 took 717.3 ms
   Query 1 iteration 9 took 729.4 ms
   Query 1 avg time: 716.85 ms
   ```
   
   This PR:
   ```
   Query 1 iteration 0 took 653.0 ms
   Query 1 iteration 1 took 653.4 ms
   Query 1 iteration 2 took 652.3 ms
   Query 1 iteration 3 took 658.9 ms
   Query 1 iteration 4 took 655.1 ms
   Query 1 iteration 5 took 662.0 ms
   Query 1 iteration 6 took 659.7 ms
   Query 1 iteration 7 took 662.7 ms
   Query 1 iteration 8 took 669.0 ms
   Query 1 iteration 9 took 665.7 ms
   Query 1 avg time: 659.19 ms
   ```
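
   To put the two comparisons side by side, the reported averages can be turned into a speedup factor and a percentage reduction. This is a small sketch using only the `avg time` values from the logs above; the helper names `speedup` and `reduction_pct` are made up for illustration.

   ```rust
   /// How many times faster the new run is than the baseline.
   fn speedup(baseline_ms: f64, new_ms: f64) -> f64 {
       baseline_ms / new_ms
   }

   /// Percentage of the baseline run time that was saved.
   fn reduction_pct(baseline_ms: f64, new_ms: f64) -> f64 {
       (baseline_ms - new_ms) / baseline_ms * 100.0
   }

   fn main() {
       // (label, master avg in ms, PR avg in ms), copied from the logs.
       let runs = [
           ("Query 12", 609.68, 222.40),
           ("Query 1", 716.85, 659.19),
       ];

       for &(label, master_ms, pr_ms) in runs.iter() {
           println!(
               "{}: {:.2}x faster, {:.1}% less time",
               label,
               speedup(master_ms, pr_ms),
               reduction_pct(master_ms, pr_ms)
           );
       }
   }
   ```

   On these numbers, Query 12 comes out at roughly 2.7x faster, while Query 1 improves by about 8%.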
   
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] andygrove closed pull request #9021: ARROW-11042: [Rust][DataFusion] Increase default batch size

Posted by GitBox <gi...@apache.org>.
andygrove closed pull request #9021:
URL: https://github.com/apache/arrow/pull/9021


   





[GitHub] [arrow] github-actions[bot] commented on pull request #9021: ARROW-11042: [Rust][DataFusion] Increase default batch size

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #9021:
URL: https://github.com/apache/arrow/pull/9021#issuecomment-751471914


   https://issues.apache.org/jira/browse/ARROW-11042

