Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2020/06/01 07:55:33 UTC

[GitHub] [arrow] jianxind commented on pull request #7314: ARROW-8996: [C++] SSE runtime support for aggregate sum dense kernel

jianxind commented on pull request #7314:
URL: https://github.com/apache/arrow/pull/7314#issuecomment-636677517


   Benchmark data:
   
   Before:
   ```
   SumKernelFloat/32768/0         2.96 us         2.96 us       236912 bytes_per_second=10.3227G/s null_percent=0 size=32.768k
   SumKernelFloat/32768/1         4.88 us         4.88 us       143527 bytes_per_second=6.25439G/s null_percent=1 size=32.768k
   SumKernelFloat/32768/10        5.13 us         5.13 us       136839 bytes_per_second=5.95117G/s null_percent=10 size=32.768k
   SumKernelFloat/32768/50        7.82 us         7.81 us        87129 bytes_per_second=3.9054G/s null_percent=50 size=32.768k
   SumKernelDouble/32768/0        1.97 us         1.97 us       356786 bytes_per_second=15.4906G/s null_percent=0 size=32.768k
   SumKernelDouble/32768/1        2.11 us         2.11 us       331511 bytes_per_second=14.4975G/s null_percent=1 size=32.768k
   SumKernelDouble/32768/10       2.39 us         2.38 us       291292 bytes_per_second=12.7966G/s null_percent=10 size=32.768k
   SumKernelDouble/32768/50       2.60 us         2.60 us       268800 bytes_per_second=11.7462G/s null_percent=50 size=32.768k
   SumKernelInt8/32768/0          11.7 us         11.7 us        59926 bytes_per_second=2.61569G/s null_percent=0 size=32.768k
   SumKernelInt8/32768/1          11.0 us         10.9 us        63640 bytes_per_second=2.78831G/s null_percent=1 size=32.768k
   SumKernelInt8/32768/10         14.8 us         14.8 us        46573 bytes_per_second=2.05848G/s null_percent=10 size=32.768k
   SumKernelInt8/32768/50         14.6 us         14.6 us        47840 bytes_per_second=2.08905G/s null_percent=50 size=32.768k
   SumKernelInt16/32768/0         7.06 us         7.06 us        99354 bytes_per_second=4.3245G/s null_percent=0 size=32.768k
   SumKernelInt16/32768/1         4.76 us         4.75 us       147305 bytes_per_second=6.41928G/s null_percent=1 size=32.768k
   SumKernelInt16/32768/10        5.64 us         5.63 us       122737 bytes_per_second=5.42002G/s null_percent=10 size=32.768k
   SumKernelInt16/32768/50        6.71 us         6.70 us       104192 bytes_per_second=4.55206G/s null_percent=50 size=32.768k
   SumKernelInt32/32768/0         3.92 us         3.92 us       178798 bytes_per_second=7.79042G/s null_percent=0 size=32.768k
   SumKernelInt32/32768/1         3.27 us         3.27 us       214296 bytes_per_second=9.332G/s null_percent=1 size=32.768k
   SumKernelInt32/32768/10        3.41 us         3.40 us       204944 bytes_per_second=8.9683G/s null_percent=10 size=32.768k
   SumKernelInt32/32768/50        3.69 us         3.69 us       190248 bytes_per_second=8.27705G/s null_percent=50 size=32.768k
   SumKernelInt64/32768/0         1.92 us         1.91 us       368662 bytes_per_second=15.9508G/s null_percent=0 size=32.768k
   SumKernelInt64/32768/1         2.05 us         2.05 us       340168 bytes_per_second=14.8684G/s null_percent=1 size=32.768k
   SumKernelInt64/32768/10        2.16 us         2.16 us       323585 bytes_per_second=14.1164G/s null_percent=10 size=32.768k
   SumKernelInt64/32768/50        2.41 us         2.41 us       291073 bytes_per_second=12.6873G/s null_percent=50 size=32.768k
   ```
   
   After:
   ```
   SumKernelFloat/32768/0         2.27 us         2.27 us       307928 bytes_per_second=13.438G/s null_percent=0 size=32.768k
   SumKernelFloat/32768/1         4.59 us         4.59 us       152827 bytes_per_second=6.6508G/s null_percent=1 size=32.768k
   SumKernelFloat/32768/10        5.30 us         5.29 us       132106 bytes_per_second=5.76658G/s null_percent=10 size=32.768k
   SumKernelFloat/32768/50        5.80 us         5.80 us       114378 bytes_per_second=5.26584G/s null_percent=50 size=32.768k
   SumKernelDouble/32768/0        1.42 us         1.42 us       494426 bytes_per_second=21.5265G/s null_percent=0 size=32.768k
   SumKernelDouble/32768/1        2.12 us         2.12 us       330890 bytes_per_second=14.4268G/s null_percent=1 size=32.768k
   SumKernelDouble/32768/10       2.44 us         2.43 us       286310 bytes_per_second=12.5441G/s null_percent=10 size=32.768k
   SumKernelDouble/32768/50       2.72 us         2.71 us       257105 bytes_per_second=11.2507G/s null_percent=50 size=32.768k
   SumKernelInt8/32768/0          5.35 us         5.34 us       130751 bytes_per_second=5.71315G/s null_percent=0 size=32.768k
   SumKernelInt8/32768/1          9.80 us         9.79 us        71384 bytes_per_second=3.11589G/s null_percent=1 size=32.768k
   SumKernelInt8/32768/10         13.9 us         13.9 us        49729 bytes_per_second=2.19116G/s null_percent=10 size=32.768k
   SumKernelInt8/32768/50         12.5 us         12.5 us        55929 bytes_per_second=2.43479G/s null_percent=50 size=32.768k
   SumKernelInt16/32768/0         3.20 us         3.19 us       218923 bytes_per_second=9.55594G/s null_percent=0 size=32.768k
   SumKernelInt16/32768/1         5.31 us         5.31 us       131394 bytes_per_second=5.75174G/s null_percent=1 size=32.768k
   SumKernelInt16/32768/10        6.20 us         6.19 us       113037 bytes_per_second=4.92965G/s null_percent=10 size=32.768k
   SumKernelInt16/32768/50        7.25 us         7.24 us        96604 bytes_per_second=4.21535G/s null_percent=50 size=32.768k
   SumKernelInt32/32768/0         2.18 us         2.18 us       321572 bytes_per_second=14.0037G/s null_percent=0 size=32.768k
   SumKernelInt32/32768/1         3.32 us         3.32 us       209911 bytes_per_second=9.18857G/s null_percent=1 size=32.768k
   SumKernelInt32/32768/10        3.59 us         3.58 us       195106 bytes_per_second=8.51472G/s null_percent=10 size=32.768k
   SumKernelInt32/32768/50        3.83 us         3.82 us       182739 bytes_per_second=7.98056G/s null_percent=50 size=32.768k
   SumKernelInt64/32768/0         1.37 us         1.37 us       514237 bytes_per_second=22.3564G/s null_percent=0 size=32.768k
   SumKernelInt64/32768/1         2.09 us         2.09 us       333678 bytes_per_second=14.5962G/s null_percent=1 size=32.768k
   SumKernelInt64/32768/10        2.18 us         2.18 us       320094 bytes_per_second=13.9904G/s null_percent=10 size=32.768k
   SumKernelInt64/32768/50        2.41 us         2.40 us       289766 bytes_per_second=12.6907G/s null_percent=50 size=32.768k
   ```
   
   Every data type improves in the dense case (null_percent=0); for example, Double rises from 15.4906G/s to 21.5265G/s, roughly a 39% gain.
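
   For readers unfamiliar with the idea, the dense case can be illustrated with a minimal SSE2 sketch. This is not the code from the PR: `DenseSumDouble` is a hypothetical name, and the sketch assumes an x86 target with SSE2 and an input with no nulls.

   ```cpp
   #include <emmintrin.h>  // SSE2 intrinsics
   #include <cstddef>

   // Hypothetical helper (not Arrow's API): sums `length` doubles that are all valid.
   double DenseSumDouble(const double* values, size_t length) {
     __m128d acc = _mm_setzero_pd();  // two running partial sums
     size_t i = 0;
     for (; i + 2 <= length; i += 2) {
       // Unaligned load of two doubles, then one packed add.
       acc = _mm_add_pd(acc, _mm_loadu_pd(values + i));
     }
     double lanes[2];
     _mm_storeu_pd(lanes, acc);
     double sum = lanes[0] + lanes[1];  // horizontal reduction
     for (; i < length; ++i) sum += values[i];  // scalar tail
     return sum;
   }
   ```

   Because the two lanes accumulate separately, the floating-point result can differ in the last bits from a strictly sequential sum.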
   
   I will look into the sparse cases later, since they need an additional step to remove the invalid values before the data is passed to the SIMD add operations: some shuffle op is needed to replace the invalid values with zero.
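
   One possible shape for that zeroing step is sketched below. It is only an illustration under stated assumptions, not the planned implementation: `MaskedSumInt32` is a hypothetical name, the validity bitmap is assumed to be LSB-first with bit 0 matching `values[0]` (no bit offset), and it builds the mask with a compare and an AND rather than the shuffle mentioned above.

   ```cpp
   #include <emmintrin.h>  // SSE2 intrinsics
   #include <cstddef>
   #include <cstdint>

   // Hypothetical helper (not Arrow's API): sums the int32 values whose
   // validity bit is 1. Bit i of `valid_bits` (LSB-first) maps to values[i].
   int64_t MaskedSumInt32(const int32_t* values, const uint8_t* valid_bits,
                          size_t length) {
     const __m128i lane_bit = _mm_set_epi32(8, 4, 2, 1);  // lane k tests bit k
     __m128i acc = _mm_setzero_si128();                    // two int64 partial sums
     size_t i = 0;
     for (; i + 4 <= length; i += 4) {
       // i is a multiple of 4, so the four bits are one nibble of a single byte.
       const int nibble = (valid_bits[i / 8] >> (i % 8)) & 0xF;
       const __m128i bits = _mm_set1_epi32(nibble);
       // All-ones in lanes whose validity bit is set, zero elsewhere.
       const __m128i mask =
           _mm_cmpeq_epi32(_mm_and_si128(bits, lane_bit), lane_bit);
       __m128i v = _mm_loadu_si128(reinterpret_cast<const __m128i*>(values + i));
       v = _mm_and_si128(v, mask);  // null slots become 0
       // Sign-extend the four int32 lanes to int64 before accumulating.
       const __m128i sign = _mm_srai_epi32(v, 31);
       acc = _mm_add_epi64(acc, _mm_unpacklo_epi32(v, sign));
       acc = _mm_add_epi64(acc, _mm_unpackhi_epi32(v, sign));
     }
     int64_t lanes[2];
     _mm_storeu_si128(reinterpret_cast<__m128i*>(lanes), acc);
     int64_t sum = lanes[0] + lanes[1];
     for (; i < length; ++i) {  // scalar tail
       if ((valid_bits[i / 8] >> (i % 8)) & 1) sum += values[i];
     }
     return sum;
   }
   ```

   Zeroing the null slots keeps the inner loop branch-free, at the cost of extra mask-building work per group of four values.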
   
   The dense case can also be sped up further with AVX2/AVX512, which is left for later work as well.
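
   As a rough illustration of what such a follow-up could look like, here is a sketch of an AVX2 dense sum selected at runtime. All function names are hypothetical, and the sketch assumes GCC or Clang on x86-64, since it relies on the `target` attribute and `__builtin_cpu_supports`; it does not reflect how Arrow dispatches kernels.

   ```cpp
   #include <immintrin.h>  // AVX2 intrinsics
   #include <cstddef>
   #include <cstdint>

   // Compiled with AVX2 enabled for this function only (GCC/Clang attribute).
   __attribute__((target("avx2")))
   static int64_t SumInt64Avx2(const int64_t* values, size_t length) {
     __m256i acc = _mm256_setzero_si256();  // four int64 partial sums
     size_t i = 0;
     for (; i + 4 <= length; i += 4) {
       acc = _mm256_add_epi64(
           acc, _mm256_loadu_si256(reinterpret_cast<const __m256i*>(values + i)));
     }
     int64_t lanes[4];
     _mm256_storeu_si256(reinterpret_cast<__m256i*>(lanes), acc);
     int64_t sum = lanes[0] + lanes[1] + lanes[2] + lanes[3];
     for (; i < length; ++i) sum += values[i];  // scalar tail
     return sum;
   }

   // Portable fallback for CPUs without AVX2.
   static int64_t SumInt64Scalar(const int64_t* values, size_t length) {
     int64_t sum = 0;
     for (size_t i = 0; i < length; ++i) sum += values[i];
     return sum;
   }

   // Picks the widest implementation the running CPU supports.
   int64_t SumInt64(const int64_t* values, size_t length) {
     if (__builtin_cpu_supports("avx2")) return SumInt64Avx2(values, length);
     return SumInt64Scalar(values, length);
   }
   ```

   Checking CPU support at runtime lets a single binary carry both code paths, so the wider instructions are used only where they are available.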


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org