Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/06/10 06:58:41 UTC

[GitHub] [arrow-rs] tatsuya6502 commented on issue #1829: AVX512 + simd binary and/or kernels slower than autovectorized version

tatsuya6502 commented on issue #1829:
URL: https://github.com/apache/arrow-rs/issues/1829#issuecomment-1152036764

   > For some reason the second benchmark is always significantly slower when run together, running them separately gives the same (higher) performance and the assembly looks identical except for the and/or. I'm guessing branch predictor or allocator related.
   
   You might want to sample CPU frequencies with `lscpu -e` or a similar tool while running these benchmarks. AVX-512 SIMD instructions operate on 512-bit registers, eight times wider than regular 64-bit ones, and consume much more power, so they can produce more heat and CPU cores may lower their clock frequency in response.
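
A minimal sketch of such sampling, assuming a Linux x86 machine where `/proc/cpuinfo` exposes per-core `cpu MHz` lines (some platforms do not). The helper names `parse_cpu_mhz` and `sample_frequencies` are hypothetical and not part of arrow-rs or `lscpu`. Run it in a second terminal while the benchmarks execute:

```python
import re
import time
from pathlib import Path

# Matches lines such as "cpu MHz\t\t: 2400.000" (assumed x86 Linux format).
_MHZ_LINE = re.compile(r"^cpu MHz\s*:\s*([0-9.]+)", re.MULTILINE)


def parse_cpu_mhz(cpuinfo_text):
    """Return the per-core 'cpu MHz' readings found in /proc/cpuinfo-style text."""
    return [float(value) for value in _MHZ_LINE.findall(cpuinfo_text)]


def sample_frequencies(samples=10, interval_s=0.5, path="/proc/cpuinfo"):
    """Yield (elapsed seconds, [MHz per core]) once per sampling interval."""
    start = time.monotonic()
    for _ in range(samples):
        yield time.monotonic() - start, parse_cpu_mhz(Path(path).read_text())
        time.sleep(interval_s)


if __name__ == "__main__":
    for elapsed, mhz in sample_frequencies():
        if mhz:  # empty when the platform reports no "cpu MHz" lines
            print(f"{elapsed:6.2f}s  min={min(mhz):.0f} MHz  max={max(mhz):.0f} MHz")
```

If the reported frequencies drop while the AVX-512 benchmark runs and recover afterwards, that would be consistent with throttling, although it does not prove it.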
   
   This article by Cloudflare explains accidental AVX-512 throttling:
   https://blog.cloudflare.com/on-the-dangers-of-intels-frequency-scaling/
   
   Note that the article was written in 2017, and today's processors may handle this better. I am not sure whether it still applies.

