You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by "rtpsw (via GitHub)" <gi...@apache.org> on 2023/06/17 12:02:39 UTC
[GitHub] [arrow] rtpsw opened a new pull request, #36146: GH-36144: [C++] Improve future as-of-join algorithmic complexity
rtpsw opened a new pull request, #36146:
URL: https://github.com/apache/arrow/pull/36146
### What changes are included in this PR?
The `MemoStore` algorithms are improved to be linear-time using updated data-structures. Also, the benchmark is enhanced to cover various tolerance values.
### Are these changes tested?
Yes, by existing tests. In addition, the benchmark shows improved running time.
### Are there any user-facing changes?
No.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] rtpsw commented on pull request #36146: GH-36144: [C++] Improve future as-of-join algorithmic complexity
Posted by "rtpsw (via GitHub)" <gi...@apache.org>.
rtpsw commented on PR #36146:
URL: https://github.com/apache/arrow/pull/36146#issuecomment-1596099853
<details>
<summary>Baseline results:</summary>
```
Running ./release/arrow-acero-asof-join-benchmark
Run on (8 X 2304 MHz CPU s)
CPU Caches:
L1 Data 32 KiB (x8)
L1 Instruction 32 KiB (x8)
L2 Unified 256 KiB (x8)
L3 Unified 16384 KiB (x8)
Load Average: 0.07, 0.21, 1.47
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Benchmark Time CPU Iterations UserCounters...
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
AsOfJoinOverhead/left_freq:200/left_cols:20/left_ids:500/left_batch_size:4000/num_right_tables:1/right_freq:200/right_cols:20/right_ids:500/right_batch_size:4000/tolerance:0/real_time 211922079 ns 660678 ns 3 bytes_per_second=130.671M/s input_rows_per_second=759.713k/s maximum_peak_memory=29.0547M
AsOfJoinOverhead/left_freq:400/left_cols:20/left_ids:500/left_batch_size:4000/num_right_tables:1/right_freq:400/right_cols:20/right_ids:500/right_batch_size:4000/tolerance:0/real_time 114079272 ns 460190 ns 6 bytes_per_second=122.126M/s input_rows_per_second=710.033k/s maximum_peak_memory=29.0547M
AsOfJoinOverhead/left_freq:1000/left_cols:20/left_ids:500/left_batch_size:4000/num_right_tables:1/right_freq:1000/right_cols:20/right_ids:500/right_batch_size:4000/tolerance:0/real_time 48980588 ns 271786 ns 12 bytes_per_second=115.883M/s input_rows_per_second=673.736k/s maximum_peak_memory=29.0547M
AsOfJoinOverhead/left_freq:400/left_cols:10/left_ids:500/left_batch_size:4000/num_right_tables:1/right_freq:400/right_cols:10/right_ids:500/right_batch_size:4000/tolerance:0/real_time 96047373 ns 337427 ns 6 bytes_per_second=77.5867M/s input_rows_per_second=843.334k/s maximum_peak_memory=29.0547M
AsOfJoinOverhead/left_freq:400/left_cols:20/left_ids:500/left_batch_size:4000/num_right_tables:1/right_freq:400/right_cols:20/right_ids:500/right_batch_size:4000/tolerance:0/real_time 110910426 ns 455473 ns 5 bytes_per_second=125.615M/s input_rows_per_second=730.319k/s maximum_peak_memory=29.0547M
AsOfJoinOverhead/left_freq:400/left_cols:100/left_ids:500/left_batch_size:4000/num_right_tables:1/right_freq:400/right_cols:100/right_ids:500/right_batch_size:4000/tolerance:0/real_time 230659601 ns 1459631 ns 3 bytes_per_second=285.147M/s input_rows_per_second=351.167k/s maximum_peak_memory=72.2598M
AsOfJoinOverhead/left_freq:400/left_cols:20/left_ids:100/left_batch_size:4000/num_right_tables:1/right_freq:400/right_cols:20/right_ids:100/right_batch_size:4000/tolerance:0/real_time 8913949 ns 199234 ns 67 bytes_per_second=312.589M/s input_rows_per_second=1.81738M/s maximum_peak_memory=72.2598M
AsOfJoinOverhead/left_freq:400/left_cols:20/left_ids:500/left_batch_size:4000/num_right_tables:1/right_freq:400/right_cols:20/right_ids:500/right_batch_size:4000/tolerance:0/real_time 116265624 ns 477030 ns 6 bytes_per_second=119.829M/s input_rows_per_second=696.681k/s maximum_peak_memory=72.2598M
AsOfJoinOverhead/left_freq:400/left_cols:20/left_ids:1000/left_batch_size:4000/num_right_tables:1/right_freq:400/right_cols:20/right_ids:1000/right_batch_size:4000/tolerance:0/real_time 358051162 ns 745842 ns 2 bytes_per_second=77.8213M/s input_rows_per_second=452.449k/s maximum_peak_memory=72.2598M
AsOfJoinOverhead/left_freq:400/left_cols:20/left_ids:500/left_batch_size:4000/num_right_tables:1/right_freq:400/right_cols:20/right_ids:500/right_batch_size:4000/tolerance:0/real_time 108821232 ns 559738 ns 6 bytes_per_second=128.026M/s input_rows_per_second=744.34k/s maximum_peak_memory=72.2598M
AsOfJoinOverhead/left_freq:400/left_cols:20/left_ids:500/left_batch_size:4000/num_right_tables:10/right_freq:400/right_cols:20/right_ids:500/right_batch_size:4000/tolerance:0/real_time 1064261524 ns 2046008 ns 1 bytes_per_second=71.9992M/s input_rows_per_second=418.6k/s maximum_peak_memory=83.9027M
AsOfJoinOverhead/left_freq:400/left_cols:20/left_ids:500/left_batch_size:4000/num_right_tables:50/right_freq:400/right_cols:20/right_ids:500/right_batch_size:4000/tolerance:0/real_time 5091578748 ns 10242650 ns 1 bytes_per_second=69.7752M/s input_rows_per_second=405.67k/s maximum_peak_memory=388.827M
AsOfJoinOverhead/left_freq:400/left_cols:20/left_ids:500/left_batch_size:1000/num_right_tables:1/right_freq:400/right_cols:20/right_ids:500/right_batch_size:1000/tolerance:0/real_time 116616300 ns 1283586 ns 6 bytes_per_second=119.469M/s input_rows_per_second=694.586k/s maximum_peak_memory=388.827M
AsOfJoinOverhead/left_freq:400/left_cols:20/left_ids:500/left_batch_size:4000/num_right_tables:1/right_freq:400/right_cols:20/right_ids:500/right_batch_size:4000/tolerance:0/real_time 104528444 ns 547224 ns 6 bytes_per_second=133.284M/s input_rows_per_second=774.909k/s maximum_peak_memory=388.827M
AsOfJoinOverhead/left_freq:400/left_cols:20/left_ids:500/left_batch_size:32000/num_right_tables:1/right_freq:400/right_cols:20/right_ids:500/right_batch_size:32000/tolerance:0/real_time 153004093 ns 249713 ns 4 bytes_per_second=91.0564M/s input_rows_per_second=529.398k/s maximum_peak_memory=388.827M
AsOfJoinOverhead/left_freq:400/left_cols:20/left_ids:500/left_batch_size:4000/num_right_tables:1/right_freq:400/right_cols:20/right_ids:500/right_batch_size:4000/tolerance:-1500/real_time 115176504 ns 469413 ns 6 bytes_per_second=120.962M/s input_rows_per_second=703.268k/s maximum_peak_memory=388.827M
AsOfJoinOverhead/left_freq:400/left_cols:20/left_ids:500/left_batch_size:4000/num_right_tables:1/right_freq:400/right_cols:20/right_ids:500/right_batch_size:4000/tolerance:-500/real_time 114855370 ns 555388 ns 6 bytes_per_second=121.3M/s input_rows_per_second=705.235k/s maximum_peak_memory=388.827M
AsOfJoinOverhead/left_freq:400/left_cols:20/left_ids:500/left_batch_size:4000/num_right_tables:1/right_freq:400/right_cols:20/right_ids:500/right_batch_size:4000/tolerance:0/real_time 117198144 ns 470146 ns 6 bytes_per_second=118.876M/s input_rows_per_second=691.137k/s maximum_peak_memory=388.827M
AsOfJoinOverhead/left_freq:400/left_cols:20/left_ids:500/left_batch_size:4000/num_right_tables:1/right_freq:400/right_cols:20/right_ids:500/right_batch_size:4000/tolerance:500/real_time 109962907 ns 487912 ns 7 bytes_per_second=126.697M/s input_rows_per_second=736.612k/s maximum_peak_memory=388.827M
AsOfJoinOverhead/left_freq:400/left_cols:20/left_ids:500/left_batch_size:4000/num_right_tables:1/right_freq:400/right_cols:20/right_ids:500/right_batch_size:4000/tolerance:1500/real_time 117599868 ns 501521 ns 6 bytes_per_second=118.47M/s input_rows_per_second=688.776k/s maximum_peak_memory=388.827M
```
</details>
<details>
<summary>PR results</summary>
```
Running ./release/arrow-acero-asof-join-benchmark
Run on (8 X 2304 MHz CPU s)
CPU Caches:
L1 Data 32 KiB (x8)
L1 Instruction 32 KiB (x8)
L2 Unified 256 KiB (x8)
L3 Unified 16384 KiB (x8)
Load Average: 1.95, 6.53, 5.09
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Benchmark Time CPU Iterations UserCounters...
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
AsOfJoinOverhead/left_freq:200/left_cols:20/left_ids:500/left_batch_size:4000/num_right_tables:1/right_freq:200/right_cols:20/right_ids:500/right_batch_size:4000/tolerance:0/real_time 76385205 ns 691089 ns 7 bytes_per_second=362.531M/s input_rows_per_second=2.10774M/s maximum_peak_memory=29.0547M
AsOfJoinOverhead/left_freq:400/left_cols:20/left_ids:500/left_batch_size:4000/num_right_tables:1/right_freq:400/right_cols:20/right_ids:500/right_batch_size:4000/tolerance:0/real_time 40956330 ns 460700 ns 18 bytes_per_second=340.167M/s input_rows_per_second=1.97772M/s maximum_peak_memory=29.0547M
AsOfJoinOverhead/left_freq:1000/left_cols:20/left_ids:500/left_batch_size:4000/num_right_tables:1/right_freq:1000/right_cols:20/right_ids:500/right_batch_size:4000/tolerance:0/real_time 18057196 ns 322803 ns 38 bytes_per_second=314.335M/s input_rows_per_second=1.82753M/s maximum_peak_memory=29.0547M
AsOfJoinOverhead/left_freq:400/left_cols:10/left_ids:500/left_batch_size:4000/num_right_tables:1/right_freq:400/right_cols:10/right_ids:500/right_batch_size:4000/tolerance:0/real_time 26092684 ns 344036 ns 25 bytes_per_second=285.597M/s input_rows_per_second=3.10432M/s maximum_peak_memory=29.0547M
AsOfJoinOverhead/left_freq:400/left_cols:20/left_ids:500/left_batch_size:4000/num_right_tables:1/right_freq:400/right_cols:20/right_ids:500/right_batch_size:4000/tolerance:0/real_time 39741214 ns 472783 ns 18 bytes_per_second=350.568M/s input_rows_per_second=2.03819M/s maximum_peak_memory=29.0547M
AsOfJoinOverhead/left_freq:400/left_cols:100/left_ids:500/left_batch_size:4000/num_right_tables:1/right_freq:400/right_cols:100/right_ids:500/right_batch_size:4000/tolerance:0/real_time 150683536 ns 1454621 ns 5 bytes_per_second=436.491M/s input_rows_per_second=537.55k/s maximum_peak_memory=72.2598M
AsOfJoinOverhead/left_freq:400/left_cols:20/left_ids:100/left_batch_size:4000/num_right_tables:1/right_freq:400/right_cols:20/right_ids:100/right_batch_size:4000/tolerance:0/real_time 8025929 ns 208403 ns 78 bytes_per_second=347.175M/s input_rows_per_second=2.01846M/s maximum_peak_memory=72.2598M
AsOfJoinOverhead/left_freq:400/left_cols:20/left_ids:500/left_batch_size:4000/num_right_tables:1/right_freq:400/right_cols:20/right_ids:500/right_batch_size:4000/tolerance:0/real_time 39462927 ns 454937 ns 18 bytes_per_second=353.04M/s input_rows_per_second=2.05256M/s maximum_peak_memory=72.2598M
AsOfJoinOverhead/left_freq:400/left_cols:20/left_ids:1000/left_batch_size:4000/num_right_tables:1/right_freq:400/right_cols:20/right_ids:1000/right_batch_size:4000/tolerance:0/real_time 76068165 ns 781966 ns 8 bytes_per_second=366.303M/s input_rows_per_second=2.12967M/s maximum_peak_memory=72.2598M
AsOfJoinOverhead/left_freq:400/left_cols:20/left_ids:500/left_batch_size:4000/num_right_tables:1/right_freq:400/right_cols:20/right_ids:500/right_batch_size:4000/tolerance:0/real_time 38436892 ns 483746 ns 18 bytes_per_second=362.464M/s input_rows_per_second=2.10735M/s maximum_peak_memory=72.2598M
AsOfJoinOverhead/left_freq:400/left_cols:20/left_ids:500/left_batch_size:4000/num_right_tables:10/right_freq:400/right_cols:20/right_ids:500/right_batch_size:4000/tolerance:0/real_time 235748713 ns 2376327 ns 3 bytes_per_second=325.033M/s input_rows_per_second=1.88972M/s maximum_peak_memory=83.9027M
AsOfJoinOverhead/left_freq:400/left_cols:20/left_ids:500/left_batch_size:4000/num_right_tables:50/right_freq:400/right_cols:20/right_ids:500/right_batch_size:4000/tolerance:0/real_time 1183713083 ns 10726631 ns 1 bytes_per_second=300.128M/s input_rows_per_second=1.74493M/s maximum_peak_memory=388.827M
AsOfJoinOverhead/left_freq:400/left_cols:20/left_ids:500/left_batch_size:1000/num_right_tables:1/right_freq:400/right_cols:20/right_ids:500/right_batch_size:1000/tolerance:0/real_time 41843732 ns 1371985 ns 17 bytes_per_second=332.953M/s input_rows_per_second=1.93577M/s maximum_peak_memory=388.827M
AsOfJoinOverhead/left_freq:400/left_cols:20/left_ids:500/left_batch_size:4000/num_right_tables:1/right_freq:400/right_cols:20/right_ids:500/right_batch_size:4000/tolerance:0/real_time 38562434 ns 458523 ns 18 bytes_per_second=361.284M/s input_rows_per_second=2.10049M/s maximum_peak_memory=388.827M
AsOfJoinOverhead/left_freq:400/left_cols:20/left_ids:500/left_batch_size:32000/num_right_tables:1/right_freq:400/right_cols:20/right_ids:500/right_batch_size:32000/tolerance:0/real_time 76335078 ns 240058 ns 8 bytes_per_second=182.511M/s input_rows_per_second=1061.11k/s maximum_peak_memory=388.827M
AsOfJoinOverhead/left_freq:400/left_cols:20/left_ids:500/left_batch_size:4000/num_right_tables:1/right_freq:400/right_cols:20/right_ids:500/right_batch_size:4000/tolerance:-1500/real_time 38884782 ns 461708 ns 18 bytes_per_second=358.289M/s input_rows_per_second=2.08308M/s maximum_peak_memory=388.827M
AsOfJoinOverhead/left_freq:400/left_cols:20/left_ids:500/left_batch_size:4000/num_right_tables:1/right_freq:400/right_cols:20/right_ids:500/right_batch_size:4000/tolerance:-500/real_time 38414866 ns 497694 ns 18 bytes_per_second=362.672M/s input_rows_per_second=2.10856M/s maximum_peak_memory=388.827M
AsOfJoinOverhead/left_freq:400/left_cols:20/left_ids:500/left_batch_size:4000/num_right_tables:1/right_freq:400/right_cols:20/right_ids:500/right_batch_size:4000/tolerance:0/real_time 39200118 ns 473774 ns 18 bytes_per_second=355.407M/s input_rows_per_second=2.06632M/s maximum_peak_memory=388.827M
AsOfJoinOverhead/left_freq:400/left_cols:20/left_ids:500/left_batch_size:4000/num_right_tables:1/right_freq:400/right_cols:20/right_ids:500/right_batch_size:4000/tolerance:500/real_time 39321300 ns 492033 ns 15 bytes_per_second=354.312M/s input_rows_per_second=2.05995M/s maximum_peak_memory=388.827M
AsOfJoinOverhead/left_freq:400/left_cols:20/left_ids:500/left_batch_size:4000/num_right_tables:1/right_freq:400/right_cols:20/right_ids:500/right_batch_size:4000/tolerance:1500/real_time 39055427 ns 509834 ns 18 bytes_per_second=356.724M/s input_rows_per_second=2.07398M/s maximum_peak_memory=388.827M
```
</details>
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] icexelloss commented on pull request #36146: GH-36144: [C++] Improve future as-of-join algorithmic complexity
Posted by "icexelloss (via GitHub)" <gi...@apache.org>.
icexelloss commented on PR #36146:
URL: https://github.com/apache/arrow/pull/36146#issuecomment-1613443042
Agree with @pitrou. (We are actually doing some exploring/work internally)
@rtpsw Let us close this for now and open PR when it's ready.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] rtpsw commented on pull request #36146: GH-36144: [C++] Improve future as-of-join algorithmic complexity
Posted by "rtpsw (via GitHub)" <gi...@apache.org>.
rtpsw commented on PR #36146:
URL: https://github.com/apache/arrow/pull/36146#issuecomment-1595724081
cc @westonpace, @icexelloss
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] pitrou commented on pull request #36146: GH-36144: [C++] Improve future as-of-join algorithmic complexity
Posted by "pitrou (via GitHub)" <gi...@apache.org>.
pitrou commented on PR #36146:
URL: https://github.com/apache/arrow/pull/36146#issuecomment-1613438367
While the perf improvements look impressive, can we prioritize making the tests more reliable?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] rtpsw commented on pull request #36146: GH-36144: [C++] Improve future as-of-join algorithmic complexity
Posted by "rtpsw (via GitHub)" <gi...@apache.org>.
rtpsw commented on PR #36146:
URL: https://github.com/apache/arrow/pull/36146#issuecomment-1613471759
I converted to a draft until it's picked up.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org