Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2020/06/22 23:37:10 UTC
[GitHub] [arrow] wesm edited a comment on pull request #7521: ARROW-9210: [C++] Use BitBlockCounter in array/visitor_inline.h
wesm edited a comment on pull request #7521:
URL: https://github.com/apache/arrow/pull/7521#issuecomment-647821487
Here's a benchmark run with gcc-8
```
------------------------------------------------------------------------------------------
Benchmark                        Time           CPU Iterations
------------------------------------------------------------------------------------------
BuildDictionary            3219443 ns    3219440 ns        218 1.21215GB/s
BuildStringDictionary      3692881 ns    3692881 ns        192 81.7532MB/s
UniqueInt64/0             14413456 ns   14413251 ns         48 null_percent=0 2.16814GB/s
UniqueInt64/1             15516052 ns   15515737 ns         45 null_percent=0.1 2.01408GB/s
UniqueInt64/2             17031282 ns   17031266 ns         41 null_percent=1 1.83486GB/s
UniqueInt64/3             20680114 ns   20680064 ns         34 null_percent=10 1.51112GB/s
UniqueInt64/4             12018069 ns   12017844 ns         57 null_percent=99 2.6003GB/s
UniqueInt64/5              9179953 ns    9179946 ns         77 null_percent=100 3.40416GB/s
UniqueInt64/6             15501523 ns   15501496 ns         45 null_percent=0 2.01593GB/s
UniqueInt64/7             16482935 ns   16482300 ns         41 null_percent=0.1 1.89597GB/s
UniqueInt64/8             18349988 ns   18349317 ns         38 null_percent=1 1.70306GB/s
UniqueInt64/9             21439268 ns   21439244 ns         32 null_percent=10 1.45761GB/s
UniqueInt64/10            12530067 ns   12529871 ns         55 null_percent=99 2.49404GB/s
UniqueInt64/11             9167314 ns    9167365 ns         75 null_percent=100 3.40883GB/s
UniqueString10bytes/0     43535899 ns   43535846 ns         16 null_percent=0 918.783MB/s
UniqueString10bytes/1     45130595 ns   45129634 ns         16 null_percent=0.1 886.336MB/s
UniqueString10bytes/2     45249034 ns   45247983 ns         15 null_percent=1 884.017MB/s
UniqueString10bytes/3     45101533 ns   45100209 ns         16 null_percent=10 886.914MB/s
UniqueString10bytes/4      4316048 ns    4316019 ns        163 null_percent=99 9.05059GB/s
UniqueString10bytes/5      1435781 ns    1435763 ns        485 null_percent=100 27.2068GB/s
UniqueString10bytes/6     59100344 ns   59098817 ns         12 null_percent=0 676.832MB/s
UniqueString10bytes/7     59797544 ns   59795857 ns         12 null_percent=0.1 668.943MB/s
UniqueString10bytes/8     61024697 ns   61023090 ns         11 null_percent=1 655.49MB/s
UniqueString10bytes/9     59817211 ns   59816339 ns         12 null_percent=10 668.714MB/s
UniqueString10bytes/10     4950387 ns    4950242 ns        134 null_percent=99 7.89103GB/s
UniqueString10bytes/11     1443482 ns    1443434 ns        446 null_percent=100 27.0622GB/s
UniqueString100bytes/0    95609006 ns   95606132 ns          7 null_percent=0 4.08577GB/s
UniqueString100bytes/1    96850582 ns   96849441 ns          7 null_percent=0.1 4.03332GB/s
UniqueString100bytes/2    95404742 ns   95404634 ns          7 null_percent=1 4.0944GB/s
UniqueString100bytes/3    89401775 ns   89401006 ns          8 null_percent=10 4.36936GB/s
UniqueString100bytes/4     4705868 ns    4705746 ns        148 null_percent=99 83.0102GB/s
UniqueString100bytes/5     1434077 ns    1434055 ns        486 null_percent=100 272.392GB/s
UniqueString100bytes/6   206155133 ns  206148425 ns          3 null_percent=0 1.89487GB/s
UniqueString100bytes/7   204661287 ns  204653659 ns          3 null_percent=0.1 1.90871GB/s
UniqueString100bytes/8   205941884 ns  205941271 ns          3 null_percent=1 1.89678GB/s
UniqueString100bytes/9   192074501 ns  192073431 ns          4 null_percent=10 2.03373GB/s
UniqueString100bytes/10    6180349 ns    6180227 ns        111 null_percent=99 63.2056GB/s
UniqueString100bytes/11    1474565 ns    1474564 ns        482 null_percent=100 264.909GB/s
UniqueUInt8/0              1990025 ns    1990023 ns        348 null_percent=0 1.96292GB/s
UniqueUInt8/1              2594146 ns    2594089 ns        272 null_percent=0.1 1.50583GB/s
UniqueUInt8/2              4726027 ns    4726053 ns        145 null_percent=1 846.372MB/s
UniqueUInt8/3              9465222 ns    9465126 ns         75 null_percent=10 422.604MB/s
UniqueUInt8/4              3557141 ns    3557135 ns        195 null_percent=99 1124.5MB/s
UniqueUInt8/5              2259664 ns    2259664 ns        314 null_percent=100 1.72869GB/s
```
Here is the % diff versus the baseline.
* Cases 1 and 7 are the mostly-not-null cases (`null_percent=0.1`). For `UniqueInt64`, these show a 15-20% perf improvement.
* Cases 5 and 11 are the all-null cases.
* Cases 4 and 10 are the 99%-null cases.
* The "BuildDictionary" case at the bottom, with the perf regression, is one of the "worst case scenarios": 89% of the values are null, so we almost never observe an all-null or all-not-null block. Using `BitUtil::GetBit` instead of `BitmapReader` causes this slight regression, since nearly every validity bit must be checked separately. I don't think it's worth optimizing for this case, since the others are more representative of real-world data.
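
To make the block fast path concrete, here is a minimal, self-contained sketch of the idea. It is not Arrow's `BitBlockCounter` or the code in this PR: `VisitByBlocks`, `BlockStats`, and the local `GetBit` are hypothetical names used for illustration, and the sketch assumes an LSB-first validity bitmap with a zero bit offset. A block that is all-valid or all-null is handled without testing individual bits; only mixed blocks fall back to a per-bit check.

```cpp
#include <algorithm>
#include <bitset>
#include <cassert>
#include <cstdint>
#include <cstring>

// Per-bit read from an LSB-first validity bitmap: slot i lives in
// bit (i % 8) of byte (i / 8).
inline bool GetBit(const uint8_t* bits, int64_t i) {
  return ((bits[i / 8] >> (i % 8)) & 1) != 0;
}

// Hypothetical bookkeeping so callers can see which path each block took.
struct BlockStats {
  int64_t all_valid = 0;
  int64_t all_null = 0;
  int64_t mixed = 0;
};

// Hypothetical helper (not an Arrow API): visit `length` slots in blocks of
// up to 64 bits, calling on_valid(i) or on_null(i) exactly once per slot.
template <typename OnValid, typename OnNull>
BlockStats VisitByBlocks(const uint8_t* bitmap, int64_t length,
                         OnValid&& on_valid, OnNull&& on_null) {
  assert(length >= 0);
  BlockStats stats;
  for (int64_t pos = 0; pos < length; pos += 64) {
    const int64_t block_len = std::min<int64_t>(64, length - pos);
    int64_t set_bits = 0;
    if (block_len == 64) {
      // Full block: one 8-byte load plus a popcount.
      uint64_t word;
      std::memcpy(&word, bitmap + pos / 8, sizeof(word));
      set_bits = static_cast<int64_t>(std::bitset<64>(word).count());
    } else {
      // Trailing partial block: count bit by bit.
      for (int64_t i = 0; i < block_len; ++i) set_bits += GetBit(bitmap, pos + i);
    }
    if (set_bits == block_len) {
      // All valid: no per-bit checks needed.
      ++stats.all_valid;
      for (int64_t i = 0; i < block_len; ++i) on_valid(pos + i);
    } else if (set_bits == 0) {
      // All null: no per-bit checks needed.
      ++stats.all_null;
      for (int64_t i = 0; i < block_len; ++i) on_null(pos + i);
    } else {
      // Mixed block: every validity bit is tested individually.
      ++stats.mixed;
      for (int64_t i = 0; i < block_len; ++i) {
        if (GetBit(bitmap, pos + i)) {
          on_valid(pos + i);
        } else {
          on_null(pos + i);
        }
      }
    }
  }
  return stats;
}
```

With randomly placed nulls at a rate such as 89%, almost every 64-bit block lands in the mixed branch, which is consistent with the BuildDictionary result: the popcount is paid for but rarely lets the loop skip the per-bit test.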
```
                  benchmark          baseline        contender  change %  regression
8    UniqueString100bytes/5    40.668 GiB/sec  272.392 GiB/sec   569.787       False
37    UniqueString10bytes/5     4.064 GiB/sec   27.207 GiB/sec   569.456       False
33   UniqueString10bytes/11     4.065 GiB/sec   27.062 GiB/sec   565.751       False
12  UniqueString100bytes/11    40.578 GiB/sec  264.909 GiB/sec   552.841       False
0     UniqueString10bytes/4     3.568 GiB/sec    9.051 GiB/sec   153.692       False
36   UniqueString100bytes/4    34.408 GiB/sec   83.010 GiB/sec   141.252       False
19   UniqueString10bytes/10     3.375 GiB/sec    7.891 GiB/sec   133.794       False
24            UniqueUInt8/1   677.981 MiB/sec    1.506 GiB/sec   127.435       False
5   UniqueString100bytes/10    30.775 GiB/sec   63.206 GiB/sec   105.381       False
27            UniqueUInt8/5  1000.163 MiB/sec    1.729 GiB/sec    76.989       False
13            UniqueUInt8/2   650.819 MiB/sec  846.372 MiB/sec    30.047       False
29           UniqueInt64/11     2.703 GiB/sec    3.409 GiB/sec    26.126       False
7             UniqueInt64/5     2.704 GiB/sec    3.404 GiB/sec    25.903       False
18            UniqueUInt8/4   932.926 MiB/sec    1.098 GiB/sec    20.535       False
23            UniqueInt64/1     1.681 GiB/sec    2.014 GiB/sec    19.840       False
21            UniqueInt64/7     1.628 GiB/sec    1.896 GiB/sec    16.476       False
31            UniqueInt64/2     1.658 GiB/sec    1.835 GiB/sec    10.651       False
20    UniqueString10bytes/7   612.647 MiB/sec  668.943 MiB/sec     9.189       False
16            UniqueInt64/3     1.386 GiB/sec    1.511 GiB/sec     9.053       False
38    UniqueString10bytes/8   601.259 MiB/sec  655.490 MiB/sec     9.019       False
1             UniqueUInt8/0     1.808 GiB/sec    1.963 GiB/sec     8.588       False
41            UniqueInt64/9     1.355 GiB/sec    1.458 GiB/sec     7.562       False
14    UniqueString10bytes/1   830.614 MiB/sec  886.336 MiB/sec     6.709       False
4             UniqueInt64/8     1.603 GiB/sec    1.703 GiB/sec     6.260       False
32    UniqueString10bytes/2   847.018 MiB/sec  884.017 MiB/sec     4.368       False
42            UniqueInt64/4     2.508 GiB/sec    2.600 GiB/sec     3.701       False
39    UniqueString10bytes/3   855.985 MiB/sec  886.914 MiB/sec     3.613       False
28           UniqueInt64/10     2.413 GiB/sec    2.494 GiB/sec     3.360       False
34   UniqueString100bytes/3     4.254 GiB/sec    4.369 GiB/sec     2.722       False
11   UniqueString100bytes/2     3.993 GiB/sec    4.094 GiB/sec     2.544       False
9     UniqueString10bytes/9   654.257 MiB/sec  668.714 MiB/sec     2.210       False
35    UniqueString10bytes/6   662.915 MiB/sec  676.832 MiB/sec     2.099       False
6     BuildStringDictionary    80.971 MiB/sec   81.753 MiB/sec     0.966       False
22   UniqueString100bytes/1     4.002 GiB/sec    4.033 GiB/sec     0.783       False
25            UniqueInt64/0     2.153 GiB/sec    2.168 GiB/sec     0.697       False
17    UniqueString10bytes/0   917.726 MiB/sec  918.783 MiB/sec     0.115       False
43            UniqueInt64/6     2.017 GiB/sec    2.016 GiB/sec    -0.071       False
40   UniqueString100bytes/0     4.091 GiB/sec    4.086 GiB/sec    -0.130       False
3    UniqueString100bytes/7     1.938 GiB/sec    1.909 GiB/sec    -1.519       False
26   UniqueString100bytes/8     1.954 GiB/sec    1.897 GiB/sec    -2.935       False
2    UniqueString100bytes/9     2.114 GiB/sec    2.034 GiB/sec    -3.782       False
30   UniqueString100bytes/6     2.008 GiB/sec    1.895 GiB/sec    -5.649        True
10            UniqueUInt8/3   474.468 MiB/sec  422.604 MiB/sec   -10.931        True
15          BuildDictionary     1.776 GiB/sec    1.212 GiB/sec   -31.742        True
```
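
For anyone reading the `change %` and `regression` columns, the sketch below shows one way they can be reproduced. The relative-change formula is the standard one. The 5% cutoff in `IsRegression` is an assumption inferred from the rows above (-3.782 is not flagged, -5.649 is); the comparison tool's actual rule is not stated in this comment. `PercentChange` and `IsRegression` are hypothetical names, and both throughputs must be in the same unit before comparing.

```cpp
#include <cassert>
#include <cmath>

// Relative change of the contender's throughput versus the baseline, in
// percent. Positive means the contender is faster.
inline double PercentChange(double baseline, double contender) {
  assert(baseline > 0.0);
  return (contender - baseline) / baseline * 100.0;
}

// Assumed rule (not confirmed by the source): flag a regression when
// throughput drops by more than `threshold_pct` percent.
inline bool IsRegression(double change_pct, double threshold_pct = 5.0) {
  return change_pct < -threshold_pct;
}
```

Because the table prints throughputs rounded to three decimals, recomputing from the printed values gives results that differ from the `change %` column in the second decimal place (for example, about 569.79 rather than 569.787 for `UniqueString100bytes/5`).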