Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/07/21 21:32:06 UTC
[GitHub] [arrow-rs] tustvold commented on pull request #2123: Faster parquet DictEncoder
tustvold commented on PR #2123:
URL: https://github.com/apache/arrow-rs/pull/2123#issuecomment-1191955893
Running benchmarks with just the change to ahash shows no significant performance change. This is not entirely surprising, as the current implementation uses crc32, which is very cheap to compute (although not DoS resistant).
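For context, Rust's `HashMap` takes the hash function as a `BuildHasher` type parameter, which is what makes swapping crc32 for ahash (or anything else) a local change. The sketch below plugs in FNV-1a as a stand-in for a cheap, unkeyed, non-DoS-resistant hash, in place of std's default `RandomState` (randomly keyed SipHash). It is not the crc32 hashing the parquet crate uses; `FnvHasher` and `FnvMap` are hypothetical names defined only for illustration.

```rust
use std::collections::HashMap;
use std::hash::{BuildHasherDefault, Hasher};

/// Illustrative FNV-1a (64-bit) hasher: very cheap per byte, but unkeyed,
/// so adversarial input can force collisions (not DoS resistant).
struct FnvHasher(u64);

impl Default for FnvHasher {
    fn default() -> Self {
        // FNV-1a 64-bit offset basis
        FnvHasher(0xcbf2_9ce4_8422_2325)
    }
}

impl Hasher for FnvHasher {
    fn write(&mut self, bytes: &[u8]) {
        for &b in bytes {
            // XOR the byte in, then multiply by the 64-bit FNV prime
            self.0 ^= b as u64;
            self.0 = self.0.wrapping_mul(0x0000_0100_0000_01b3);
        }
    }

    fn finish(&self) -> u64 {
        self.0
    }
}

/// A HashMap that uses the cheap hasher instead of std's keyed SipHash.
type FnvMap<K, V> = HashMap<K, V, BuildHasherDefault<FnvHasher>>;

fn main() {
    let mut counts: FnvMap<&str, u32> = HashMap::default();
    for word in ["a", "b", "a", "c", "a"] {
        *counts.entry(word).or_insert(0) += 1;
    }
    println!("{:?}", counts.get("a"));
}
```

Only the map's type changes; lookups and inserts are written exactly as with the default hasher.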
The change to hashbrown, however, yields a non-trivial improvement (up to a ~21% reduction in time) in the benchmarks where value encoding is the major bottleneck.
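As background on why the hash table matters here: a dictionary encoder assigns each distinct value a small integer index and writes the indices, so every value written costs one hash-table lookup. The sketch below is a simplified illustration using std's `HashMap` (itself built on hashbrown's implementation); `DictEncoder` and `put` are hypothetical names and do not mirror the parquet crate's actual encoder.

```rust
use std::collections::HashMap;
use std::hash::Hash;

/// Hypothetical, simplified dictionary encoder: each distinct value is stored
/// once in `dictionary`, and every written value becomes an index into it.
struct DictEncoder<T> {
    dictionary: Vec<T>,
    // Maps a value to its position in `dictionary`.
    lookup: HashMap<T, u32>,
    indices: Vec<u32>,
}

impl<T: Hash + Eq + Clone> DictEncoder<T> {
    fn new() -> Self {
        DictEncoder {
            dictionary: Vec::new(),
            lookup: HashMap::new(),
            indices: Vec::new(),
        }
    }

    /// One hash-table lookup per value: the hot path that a faster
    /// hash table speeds up.
    fn put(&mut self, value: &T) {
        let idx = match self.lookup.get(value) {
            Some(&i) => i,
            None => {
                // First occurrence: append to the dictionary and remember its index.
                let i = self.dictionary.len() as u32;
                self.dictionary.push(value.clone());
                self.lookup.insert(value.clone(), i);
                i
            }
        };
        self.indices.push(idx);
    }
}

fn main() {
    let mut enc = DictEncoder::new();
    for v in [10_i64, 20, 10, 30, 20] {
        enc.put(&v);
    }
    println!("dictionary = {:?}", enc.dictionary); // [10, 20, 30]
    println!("indices = {:?}", enc.indices); // [0, 1, 0, 2, 1]
}
```

This sketch stores each distinct value twice (in `dictionary` and as a key in `lookup`); a production encoder would likely avoid that duplication.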
```
write_batch primitive/4096 values primitive
time: [1.5325 ms 1.5331 ms 1.5338 ms]
thrpt: [115.02 MiB/s 115.07 MiB/s 115.12 MiB/s]
change:
time: [-20.677% -20.632% -20.590%] (p = 0.00 < 0.05)
thrpt: [+25.929% +25.995% +26.068%]
Performance has improved.
Found 3 outliers among 100 measurements (3.00%)
3 (3.00%) high mild
Benchmarking write_batch primitive/4096 values primitive non-null: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 7.5s, enable flat sampling, or reduce sample count to 50.
write_batch primitive/4096 values primitive non-null
time: [1.4838 ms 1.4847 ms 1.4857 ms]
thrpt: [116.44 MiB/s 116.52 MiB/s 116.59 MiB/s]
change:
time: [-12.080% -12.017% -11.954%] (p = 0.00 < 0.05)
thrpt: [+13.577% +13.659% +13.739%]
Performance has improved.
Found 4 outliers among 100 measurements (4.00%)
2 (2.00%) high mild
2 (2.00%) high severe
write_batch primitive/4096 values bool
time: [111.01 us 111.09 us 111.19 us]
thrpt: [10.224 MiB/s 10.233 MiB/s 10.240 MiB/s]
change:
time: [-0.8794% -0.6831% -0.4488%] (p = 0.00 < 0.05)
thrpt: [+0.4508% +0.6878% +0.8872%]
Change within noise threshold.
Found 3 outliers among 100 measurements (3.00%)
3 (3.00%) high mild
write_batch primitive/4096 values bool non-null
time: [52.931 us 53.012 us 53.094 us]
thrpt: [21.411 MiB/s 21.444 MiB/s 21.477 MiB/s]
change:
time: [-2.2177% -2.1085% -1.9913%] (p = 0.00 < 0.05)
thrpt: [+2.0318% +2.1539% +2.2680%]
Performance has improved.
Found 15 outliers among 100 measurements (15.00%)
5 (5.00%) high mild
10 (10.00%) high severe
write_batch primitive/4096 values string
time: [891.20 us 891.52 us 891.88 us]
thrpt: [89.239 MiB/s 89.275 MiB/s 89.306 MiB/s]
change:
time: [-8.4838% -8.4391% -8.3955%] (p = 0.00 < 0.05)
thrpt: [+9.1650% +9.2170% +9.2703%]
Performance has improved.
Found 3 outliers among 100 measurements (3.00%)
3 (3.00%) high mild
Benchmarking write_batch primitive/4096 values string non-null: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 5.2s, enable flat sampling, or reduce sample count to 60.
write_batch primitive/4096 values string non-null
time: [1.0208 ms 1.0213 ms 1.0218 ms]
thrpt: [77.889 MiB/s 77.931 MiB/s 77.970 MiB/s]
change:
time: [+0.0730% +0.1746% +0.2545%] (p = 0.00 < 0.05)
thrpt: [-0.2538% -0.1743% -0.0730%]
Change within noise threshold.
Found 2 outliers among 100 measurements (2.00%)
1 (1.00%) high mild
1 (1.00%) high severe
Benchmarking write_batch nested/4096 values primitive list: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 9.8s, enable flat sampling, or reduce sample count to 50.
write_batch nested/4096 values primitive list
time: [1.9798 ms 2.0064 ms 2.0368 ms]
thrpt: [80.409 MiB/s 81.627 MiB/s 82.725 MiB/s]
change:
time: [+0.9435% +1.8832% +3.0013%] (p = 0.00 < 0.05)
thrpt: [-2.9139% -1.8484% -0.9347%]
Change within noise threshold.
Found 19 outliers among 100 measurements (19.00%)
1 (1.00%) high mild
18 (18.00%) high severe
write_batch nested/4096 values primitive list non-null
time: [2.4385 ms 2.4696 ms 2.5038 ms]
thrpt: [76.896 MiB/s 77.959 MiB/s 78.952 MiB/s]
change:
time: [-0.1096% +1.1302% +2.5102%] (p = 0.10 > 0.05)
thrpt: [-2.4488% -1.1176% +0.1097%]
No change in performance detected.
```
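Reading the Criterion output: each iteration processes the same amount of data, so throughput is inversely proportional to time, and a relative time change dt corresponds to a throughput change of 1 / (1 + dt) - 1. That is why a ~20.6% drop in time appears as a ~26% throughput gain. The short check below recomputes the reported throughput changes from the median time changes; `throughput_change` is an illustrative helper, not part of Criterion.

```rust
/// For a fixed amount of data per iteration, throughput = bytes / time,
/// so a relative time change `dt` implies a throughput change of 1/(1+dt) - 1.
fn throughput_change(time_change: f64) -> f64 {
    1.0 / (1.0 + time_change) - 1.0
}

fn main() {
    // Median time changes reported above, expressed as fractions.
    for (name, dt) in [
        ("primitive", -0.20632),
        ("primitive non-null", -0.12017),
        ("string", -0.084391),
    ] {
        println!(
            "{name}: time {:+.3}% -> thrpt {:+.3}%",
            dt * 100.0,
            throughput_change(dt) * 100.0
        );
    }
}
```

The recomputed values agree with the reported median throughput changes (+25.995%, +13.659%, +9.2170%) to within rounding of the inputs.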
--
This is an automated message from the Apache Git Service.