Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2020/05/25 08:29:42 UTC

[GitHub] [arrow] jianxind commented on pull request #7267: ARROW-8772: [C++] Unrooled aggregate dense for better speculative execution

jianxind commented on pull request #7267:
URL: https://github.com/apache/arrow/pull/7267#issuecomment-633449893


   We found that the SumKernel benchmark results for the dense case (null percent 0) are relatively low compared to the sparse case for the float and double types. Below are the results before unrolling the loop.
   SumKernelFloat/32768/0         9.12 us         9.11 us        76809 bytes_per_second=3.35021G/s null_percent=0 size=32.768k
   SumKernelFloat/32768/1         4.38 us         4.38 us       159861 bytes_per_second=6.97134G/s null_percent=1 size=32.768k
   SumKernelDouble/32768/0        4.99 us         4.99 us       140190 bytes_per_second=6.11937G/s null_percent=0 size=32.768k
   SumKernelDouble/32768/1        2.14 us         2.14 us       327897 bytes_per_second=14.246G/s null_percent=1 size=32.768k
   
   With the unrolling change, the dense SumKernel benchmark throughput improves by about 3.7x for float and 2.6x for double.
   SumKernelFloat/32768/0         2.46 us         2.46 us       285185 bytes_per_second=12.4269G/s null_percent=0 size=32.768k
   SumKernelDouble/32768/0        1.90 us         1.90 us       370921 bytes_per_second=16.0846G/s null_percent=0 size=32.768k
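
For readers unfamiliar with the technique, here is a minimal sketch of an unrolled dense sum with independent partial accumulators. It is not the code from this pull request: the function name `SumUnrolled` and the unroll factor of 8 are assumptions chosen for illustration. Splitting the sum across several accumulators breaks the serial dependency on a single running total, which gives the CPU and compiler more room to overlap or vectorize the additions.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper for illustration only; not Arrow's SumKernel code.
template <typename T>
T SumUnrolled(const T* values, std::size_t length) {
  constexpr std::size_t kUnroll = 8;  // assumed unroll factor
  T partial[kUnroll] = {};            // independent accumulators, zero-initialized

  // Largest multiple of kUnroll that fits in length.
  const std::size_t rounded = length - length % kUnroll;

  // Main loop: each accumulator only depends on its own previous value.
  for (std::size_t i = 0; i < rounded; i += kUnroll) {
    for (std::size_t k = 0; k < kUnroll; ++k) {
      partial[k] += values[i + k];
    }
  }

  // Combine the partial sums, then handle the leftover tail elements.
  T sum = T{};
  for (std::size_t k = 0; k < kUnroll; ++k) {
    sum += partial[k];
  }
  for (std::size_t i = rounded; i < length; ++i) {
    sum += values[i];
  }
  return sum;
}
```

Note that for floating-point types this changes the order of additions, so the result may differ in the last bits from a strictly sequential sum.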
   
   Performance could likely be improved further by using intrinsics; I'd like to work on that at a later point.

