Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2020/05/25 08:29:42 UTC
[GitHub] [arrow] jianxind commented on pull request #7267: ARROW-8772: [C++] Unrooled aggregate dense for better speculative execution
jianxind commented on pull request #7267:
URL: https://github.com/apache/arrow/pull/7267#issuecomment-633449893
We found that the SumKernel benchmark results for the dense case (null percent 0) are relatively low compared to the sparse case for the float and double types. Below are the results before unrolling the loop.
SumKernelFloat/32768/0 9.12 us 9.11 us 76809 bytes_per_second=3.35021G/s null_percent=0 size=32.768k
SumKernelFloat/32768/1 4.38 us 4.38 us 159861 bytes_per_second=6.97134G/s null_percent=1 size=32.768k
SumKernelDouble/32768/0 4.99 us 4.99 us 140190 bytes_per_second=6.11937G/s null_percent=0 size=32.768k
SumKernelDouble/32768/1 2.14 us 2.14 us 327897 bytes_per_second=14.246G/s null_percent=1 size=32.768k
With the unroll change, the dense SumKernel benchmark throughput improves by about 3.7x for float (3.35 to 12.43 G/s) and 2.6x for double (6.12 to 16.08 G/s).
SumKernelFloat/32768/0 2.46 us 2.46 us 285185 bytes_per_second=12.4269G/s null_percent=0 size=32.768k
SumKernelDouble/32768/0 1.90 us 1.90 us 370921 bytes_per_second=16.0846G/s null_percent=0 size=32.768k
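For readers unfamiliar with the technique, here is a minimal sketch of what unrolling a dense sum can look like. It is not the code from this pull request: the function name SumDenseUnrolled, the unroll factor of 4, and the use of separate accumulators are all assumptions made for illustration. The idea is that independent accumulators break the single dependency chain of a naive loop, so the CPU can keep several additions in flight.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper for illustration only; not Arrow's SumKernel.
// Sums a dense array (no nulls) with four independent accumulators so that
// consecutive additions do not all depend on the previous one.
template <typename T>
T SumDenseUnrolled(const T* values, std::size_t length) {
  T acc0 = 0, acc1 = 0, acc2 = 0, acc3 = 0;

  // Largest multiple of 4 that fits in length (unroll factor is an assumption).
  const std::size_t unrolled_end = length - (length % 4);

  std::size_t i = 0;
  for (; i < unrolled_end; i += 4) {
    acc0 += values[i];
    acc1 += values[i + 1];
    acc2 += values[i + 2];
    acc3 += values[i + 3];
  }

  // Combine the partial sums, then handle the 0 to 3 leftover elements.
  T sum = (acc0 + acc1) + (acc2 + acc3);
  for (; i < length; ++i) {
    sum += values[i];
  }
  return sum;
}
```

Note that for float and double this changes the order of additions, so the result may differ from a strictly sequential sum in the last bits.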
It could probably get even higher performance by using intrinsics; I'd like to work on that at a later point.