
Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/07/21 20:11:30 UTC

[GitHub] [arrow] save-buffer commented on pull request #13661: [C++][DONOTMERGE] Use -O2 instead of -O3 for RELEASE builds

save-buffer commented on PR #13661:
URL: https://github.com/apache/arrow/pull/13661#issuecomment-1191888800

I've been trying to get caught up on the context here - I took a look at #13654. My current understanding is:
- The problem we are trying to solve is the insanely large functions generated by the codegen framework when using -O3
- The theory is that -O3 applies tons of aggressive optimizations that lead to a lot of bloat from too much vectorized code
Does that sound right?

So looking at the results, -O3 adds about 1 MB (to ~22 MB) to the total binary size, so I don't think that's an issue in itself. However, there is something to be said about bloating individual kernels. Reading the other PR, it seems like one of the kernels was 40 KB? That's quite alarming, as chips these days have about 32 KB of icache, so a single kernel would not even fit. In the worst case, that's quite a bit of thrashing.
That particular disassembly looks to me like the compiler is vectorizing _and_ unrolling the loop after vectorizing it.

As for solutions: Looking at the benchmarks, it seems like the code the compiler generates for the current kernels is pretty unstable with regard to flags. I'm not sure messing with compiler flags will be one-size-fits-all, as each combination of flags causes large changes in the generated code. I did like the changes in #13654.

I really liked this point, which very much aligns with my experience and intuition that abstract templates lead to unstable code generation:
> our approach (so much for "zero cost abstractions") for generalizing to abstract between writing to an array versus packing a bitmap is causing too much code to be generated.

So two solutions we could have are:
- Keep the existing code and compilation flags but explicitly disable them for problematic kernels (using something like `#pragma GCC push_options` and `#pragma GCC pop_options`), though I'm not sure if there's a way to do this on MSVC.
- Change the code to use fewer templates and more raw for loops. If we're feeling really adventurous, we could write a Python or Jinja script that generates the kernels as the simplest possible for loop (I know this is the approach used in a lot of databases). I have never seen a problem with this style of code even on -O3.
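
To make the two options above a bit more concrete, here is a minimal sketch that combines them. It is not Arrow code: the kernel names `GreaterThanToBytes` and `GreaterThanToBitmap` are hypothetical, and the choice of `-O2` for the guarded region is just an assumption for illustration. The pragmas are GCC-specific, so they are guarded to be skipped on other compilers.

```cpp
#include <cstddef>
#include <cstdint>

// GCC-only: lower the optimization level for the kernels in this region.
// Other compilers (including clang) skip the pragmas via the guard.
#if defined(__GNUC__) && !defined(__clang__)
#pragma GCC push_options
#pragma GCC optimize("O2")
#endif

// Hypothetical kernel: writes one byte (0 or 1) per input element.
void GreaterThanToBytes(const std::int32_t* values, std::int32_t threshold,
                        std::uint8_t* out, std::size_t length) {
  for (std::size_t i = 0; i < length; ++i) {
    out[i] = values[i] > threshold ? 1 : 0;
  }
}

// Hypothetical kernel: packs one bit per input element, least significant
// bit first. Bits past `length` in the last byte are left untouched.
void GreaterThanToBitmap(const std::int32_t* values, std::int32_t threshold,
                         std::uint8_t* bitmap, std::size_t length) {
  for (std::size_t i = 0; i < length; ++i) {
    const std::uint8_t mask = static_cast<std::uint8_t>(1u << (i % 8));
    if (values[i] > threshold) {
      bitmap[i / 8] |= mask;
    } else {
      bitmap[i / 8] &= static_cast<std::uint8_t>(~mask);
    }
  }
}

#if defined(__GNUC__) && !defined(__clang__)
#pragma GCC pop_options
#endif
```

The array and bitmap variants are written out as two separate plain loops rather than one template with an output abstraction, which is the kind of code a generator script could emit per type and per operation.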
