Posted to dev@arrow.apache.org by Micah Kornfield <em...@gmail.com> on 2020/06/22 04:27:17 UTC

[DISCUSS][C++] Performance work and compiler standardization for linux

There has been significant recent effort to optimize our C++
code.  One thing that comes up frequently is differing benchmark
results between GCC and Clang.  Even different versions of the same
compiler can yield significantly different results on the same code.

I would like to propose that we choose a specific compiler and version on
Linux for evaluating performance-related PRs.  PRs would only be accepted
if they improve the benchmarks under the selected version.  Other
performance-related PRs would still be acceptable if they improve
benchmarks for a different compiler, as long as they don't make the primary
one worse and don't hurt the maintainability of the code.  I also don't think
this should change our CI matrix in any way.
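
For concreteness, here is a minimal sketch of the kind of microbenchmark
whose numbers would be compared under the selected compiler and version.
It uses Google Benchmark, which the Arrow C++ benchmarks are built on; the
benchmark name and workload are hypothetical, not taken from the Arrow tree.

    #include <cstdint>
    #include <numeric>
    #include <vector>

    #include <benchmark/benchmark.h>

    // Hypothetical microbenchmark: sum a large int64 vector.  The same source
    // can produce noticeably different numbers depending on which compiler
    // (and version) builds it, which is what this proposal tries to pin down.
    static void BM_SumInt64(benchmark::State& state) {
      std::vector<int64_t> values(state.range(0), 1);
      for (auto _ : state) {
        int64_t sum = std::accumulate(values.begin(), values.end(), int64_t{0});
        benchmark::DoNotOptimize(sum);  // keep the result live so the loop isn't elided
      }
      state.SetItemsProcessed(state.iterations() * state.range(0));
    }
    BENCHMARK(BM_SumInt64)->Arg(1 << 20);

    BENCHMARK_MAIN();

Built with the agreed-upon compiler and version, numbers from benchmarks
like this would be the baseline a performance PR is judged against.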

There are other variables that potentially come into play for evaluating
benchmarks (architecture, OS Version, etc) but if we need to limit these in
a similar way, I think they should be handled as separate discussions.

I'm not clear on the limitations that our distributions place on compilers,
but ideally we'd pick a compiler that can produce most/all of our binary
artifacts.

Could someone more familiar with the release toolchain comment on good
options to consider?

Any other criteria we should consider? Other thoughts?

Thanks,
Micah

Re: [DISCUSS][C++] Performance work and compiler standardization for linux

Posted by Antoine Pitrou <an...@python.org>.
Also, in my experience, it is much easier to install another clang
version than another gcc version on Linux (gcc is more or less married
to a given libstdc++ version, AFAICT).

Regards

Antoine.


On 22/06/2020 at 09:06, Uwe L. Korn wrote:
> With my conda-forge background, I would suggest using clang as the performance baseline, because it's currently the only compiler that works reliably on all platforms. Most Linux distributions are nowadays built with gcc, which is also a strong argument for gcc, but on macOS and Windows the picture is a bit different.
> 
> Linux: One can use gcc or clang; as long as you use the right flags, they are interchangeable.
> macOS: There are gcc builds, but in recent times I have seen people using clang exclusively, either the one that comes with Xcode or a newer build.
> Windows (MSVC toolchain): You can use MSVC or clang; while most people prefer MSVC, clang is well tested (e.g. Chrome uses it).
> Windows (MSYS2 toolchain): This mostly uses GCC, but clang is also supported. There have been informal mentions that clang performs better here when using AVX2.
> 
> Cheers
> Uwe 
> 
> 
> On Mon, Jun 22, 2020, at 6:27 AM, Micah Kornfield wrote:
>> There has been significant effort recently trying to optimize our C++
>> code.  One  thing that seems to come up frequently is different benchmark
>> results between GCC and Clang.  Even different versions of the same
>> compiler can yield significantly different results on the same code.
>>
>> I would like to propose that we choose a specific compiler and version on
>> Linux for evaluating performance related PRs.  PRs would only be accepted
>> if they improve the benchmarks under the selected version. Other
>> performance related PRs would still be acceptable if they improve
>> benchmarks for a different compiler as long as they don't make the primary
>> one worse and don't hurt maintainability of the code.  I also don't think
>> this should change our CI matrix in any way.
>>
>> There are other variables that potentially come into play for evaluating
>> benchmarks (architecture, OS Version, etc) but if we need to limit these in
>> a similar way, I think they should be handled as separate discussions.
>>
>> I'm not clear on the limitations that our distributions place on compilers,
>> but ideally we'd pick a compiler that can produce most/all of our binary
>> artifacts.
>>
>> Could someone more familiar with the release toolchain comment on good
>> options to consider?
>>
>> Any other criteria we should consider? Other thoughts?
>>
>> Thanks,
>> Micah
>>

Re: [DISCUSS][C++] Performance work and compiler standardization for linux

Posted by "Uwe L. Korn" <uw...@xhochy.com>.
With my conda-forge background, I would suggest using clang as the performance baseline, because it's currently the only compiler that works reliably on all platforms. Most Linux distributions are nowadays built with gcc, which is also a strong argument for gcc, but on macOS and Windows the picture is a bit different.

Linux: One can use gcc or clang; as long as you use the right flags, they are interchangeable.
macOS: There are gcc builds, but in recent times I have seen people using clang exclusively, either the one that comes with Xcode or a newer build.
Windows (MSVC toolchain): You can use MSVC or clang; while most people prefer MSVC, clang is well tested (e.g. Chrome uses it).
Windows (MSYS2 toolchain): This mostly uses GCC, but clang is also supported. There have been informal mentions that clang performs better here when using AVX2.

Cheers
Uwe 


On Mon, Jun 22, 2020, at 6:27 AM, Micah Kornfield wrote:
> There has been significant effort recently trying to optimize our C++
> code.  One  thing that seems to come up frequently is different benchmark
> results between GCC and Clang.  Even different versions of the same
> compiler can yield significantly different results on the same code.
> 
> I would like to propose that we choose a specific compiler and version on
> Linux for evaluating performance related PRs.  PRs would only be accepted
> if they improve the benchmarks under the selected version. Other
> performance related PRs would still be acceptable if they improve
> benchmarks for a different compiler as long as they don't make the primary
> one worse and don't hurt maintainability of the code.  I also don't think
> this should change our CI matrix in any way.
> 
> There are other variables that potentially come into play for evaluating
> benchmarks (architecture, OS Version, etc) but if we need to limit these in
> a similar way, I think they should be handled as separate discussions.
> 
> I'm not clear on the limitations that our distributions place on compilers,
> but ideally we'd pick a compiler that can produce most/all of our binary
> artifacts.
> 
> Could someone more familiar with the release toolchain comment on good
> options to consider?
> 
> Any other criteria we should consider? Other thoughts?
> 
> Thanks,
> Micah
>

Re: [DISCUSS][C++] Performance work and compiler standardization for linux

Posted by Micah Kornfield <em...@gmail.com>.
I think there is a lot of good discussion on this thread.  Let me try to
summarize and give my thoughts:

1.  Will this be a hard rule?
No, based on the feedback, I think there will always be some subjectivity
(e.g. trade-offs in code size and maintainability).  My intent
here was to reduce the cognitive load on reviewers and give contributors
clear guidance on how performance PRs will be evaluated.

2.  Why don't we track all of them?
We should be tracking all of them.  I think the question is what happens
when compilers diverge on benchmarks or real-world use: how do we decide
whether the PR should be merged?  This goes back to point #1; having some
basic guidelines is useful.  I would imagine we might already be "nerfing"
users on older Linux distros with older C++ toolchains.  Going back to the
point above, it would be good to have criteria for how much effort we invest
to avoid or fix regressions in that case.

I would hope the pattern we see is that modern compilers converge.
Looking at GCC/Clang specifically, it appears GCC 8/Clang 8 is perhaps
the wrong comparison: Clang 8 appears to have been released 8 months
after GCC 8, and GCC 9 was released 2 months after Clang 8.

3.  Microbenchmarks don't tell the whole story? (we shouldn't overfit)
I agree, but we should have at least a few more macro-style benchmarks that
can help judge overall impact.  This is also where the other criteria come
into play when deciding whether to merge.

4.  Which compiler to choose?
I agree with the sentiment that we should choose a compiler that reflects
the performance of our prepackaged binaries.  Based on this thread, I'm a
little confused as to whether we can build more of our code with a single
compiler across the distribution channels (it sounds like it might be
viable with clang).  But IIUC most of our builds, on Linux at least, use
GCC.  Are we even using consistent versions of GCC across our packages?  I
think it is a bad outcome for our users (and potentially a headache for us)
if we announce that feature X improved by Y% and users actually see a
regression.
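
As one way to audit that, here is a small generic C++ snippet (not something
in the Arrow tree) that reports which compiler and version compiled it, using
the standard predefined macros; checking these macros in a build, or the
equivalent information in build logs, is a simple way to verify which
toolchain each package actually uses.

    #include <iostream>

    int main() {
    #if defined(__clang__)
      std::cout << "clang " << __clang_major__ << "." << __clang_minor__ << "\n";
    #elif defined(__GNUC__)
      std::cout << "gcc " << __GNUC__ << "." << __GNUC_MINOR__ << "\n";
    #else
      std::cout << "other compiler\n";
    #endif
    #ifdef __VERSION__
      std::cout << __VERSION__ << "\n";  // full version string, where provided
    #endif
      return 0;
    }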

Thanks,
Micah

On Mon, Jun 22, 2020 at 10:08 AM Wes McKinney <we...@gmail.com> wrote:

> For the curious, I ran benchmarks for both gcc-8 and clang-8 on my
> laptop for the ARROW-9197 patch
>
> * Clang https://github.com/apache/arrow/pull/7506#issuecomment-647633470
> * GCC https://github.com/apache/arrow/pull/7506#issuecomment-647649670
>
> It's apparent in this particular instance that GCC was applying some
> optimizations pretty well outside of what Clang was doing. The
> baseline performance was 5-6x faster on GCC for certain cases than
> Clang. Very strange indeed
>
> On Mon, Jun 22, 2020 at 11:48 AM Neal Richardson
> <ne...@gmail.com> wrote:
> >
> > FTR, by default R on macOS uses system clang, not homebrew; for Windows,
> it
> > is gcc (unless you're adventurous like Uwe and can hack it to work with
> > clang ;); and on Linux, CRAN checks both gcc and clang.
> >
> > Is it unreasonable to check *both* gcc and clang when there's a
> potentially
> > significant performance change? It sounded like recently there have been
> > cases where a change makes a big speedup with one compiler but a slowdown
> > with the other, so if we only test with one, we'll be degrading with the
> > other over time. But if we test both (simplifying of course, since there
> > are multiple versions, etc.), then at least we'd be more aware of what
> the
> > tradeoff is.
> >
> > Neal
> >
> >
> > On Mon, Jun 22, 2020 at 8:42 AM Francois Saint-Jacques <
> > fsaintjacques@gmail.com> wrote:
> >
> > > We should aim to improve the performance of the most widely used
> > > *default* packages, which are python pip, python conda and R (all
> > > platforms). AFAIK, both pip (manywheel) and conda use gcc on Linux by
> > > default. R uses gcc on Linux and mingw (gcc) on Windows. I suppose
> > > (haven't checked) that clang is used on OSX via brew. Thus, by
> > > default, almost all users are going to use a gcc compiled version of
> > > arrow on Linux.
> > >
> > > François
> > >
> > > On Mon, Jun 22, 2020 at 9:47 AM Wes McKinney <we...@gmail.com>
> wrote:
> > > >
> > > > Based on some of my performance work recently, I'm growing
> > > > uncomfortable with using gcc as the performance baseline since the
> > > > results can be significantly different (sometimes 3-4x or more on
> > > > certain fast algorithms) from clang and MSVC. The perf results on
> > > > https://github.com/apache/arrow/pull/7506 were really surprising --
> > > > some benchmarks that showed 2-5x performance improvement on both
> clang
> > > > and MSVC shows small regressions (20-30%) with gcc.
> > > >
> > > > I don't think we need a hard-and-fast rule about whether to accept
> PRs
> > > > based on benchmarks but there are a few guiding criteria:
> > > >
> > > > * How much binary size does the new code add? I think many of us
> would
> > > > agree that a 20% performance increase on some algorithm might not be
> > > > worth adding 500KB to libarrow.so
> > > > * Is the code generally faster across the major compiler targets
> (gcc,
> > > > clang, MSVC)?
> > > >
> > > > I think that using clang as a baseline for informational benchmarks
> > > > would be good, but ultimately we need to be systematically collecting
> > > > data on all the major compilers. Some time ago I proposed building a
> > > > Continuous Benchmarking framework
> > > > (
> https://github.com/conbench/conbench/blob/master/doc/REQUIREMENTS.md)
> > > > for use with Arrow (and outside of Arrow, too) so I hope that this
> > > > will be able to help.
> > > >
> > > > - Wes
> > > >
> > > > On Mon, Jun 22, 2020 at 5:12 AM Yibo Cai <yi...@arm.com> wrote:
> > > > >
> > > > > On 6/22/20 5:07 PM, Antoine Pitrou wrote:
> > > > > >
> > > > > > On 22/06/2020 at 06:27, Micah Kornfield wrote:
> > > > > >> There has been significant effort recently trying to optimize
> our
> > > C++
> > > > > >> code.  One  thing that seems to come up frequently is different
> > > benchmark
> > > > > >> results between GCC and Clang.  Even different versions of the
> same
> > > > > >> compiler can yield significantly different results on the same
> code.
> > > > > >>
> > > > > >> I would like to propose that we choose a specific compiler and
> > > version on
> > > > > >> Linux for evaluating performance related PRs.  PRs would only be
> > > accepted
> > > > > >> if they improve the benchmarks under the selected version.
> > > > > >
> > > > > > Would this be a hard rule or just a guideline?  There are many
> ways
> > > in
> > > > > > which benchmark numbers can be improved or deteriorated by a PR,
> and
> > > in
> > > > > > some cases that doesn't matter (benchmarks are not always
> realistic,
> > > and
> > > > > > they are not representative of every workload).
> > > > > >
> > > > >
> > > > > I agree that microbenchmark is not always useful, focusing too
> much on
> > > > > improving microbenchmark result gives me feeling of "overfit" (to
> some
> > > > > specific microarchitecture, compiler, or use case).
> > > > >
> > > > > > Regards
> > > > > >
> > > > > > Antoine.
> > > > > >
> > >
>

Re: [DISCUSS][C++] Performance work and compiler standardization for linux

Posted by Wes McKinney <we...@gmail.com>.
For the curious, I ran benchmarks for both gcc-8 and clang-8 on my
laptop for the ARROW-9197 patch

* Clang https://github.com/apache/arrow/pull/7506#issuecomment-647633470
* GCC https://github.com/apache/arrow/pull/7506#issuecomment-647649670

It's apparent in this particular instance that GCC was applying some
optimizations well beyond what Clang was doing. For certain cases, the
baseline performance was 5-6x faster with GCC than with Clang. Very
strange indeed.
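
As a generic illustration (this is not the ARROW-9197 code), the kind of
tight loop where codegen can diverge like this is a bitmap popcount: one
compiler may vectorize it or recognize the idiom, while another emits a
scalar byte-at-a-time loop.

    #include <cstddef>
    #include <cstdint>

    // Count the set bits in a validity-bitmap-like buffer.  Schematic example
    // only: depending on the compiler and flags, this may compile down to
    // vectorized population-count code or to a plain scalar loop, which can
    // easily account for multi-x differences in a microbenchmark.
    int64_t CountSetBits(const uint8_t* bitmap, size_t num_bytes) {
      int64_t count = 0;
      for (size_t i = 0; i < num_bytes; ++i) {
        count += __builtin_popcount(bitmap[i]);  // GCC/Clang builtin
      }
      return count;
    }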

On Mon, Jun 22, 2020 at 11:48 AM Neal Richardson
<ne...@gmail.com> wrote:
>
> FTR, by default R on macOS uses system clang, not homebrew; for Windows, it
> is gcc (unless you're adventurous like Uwe and can hack it to work with
> clang ;); and on Linux, CRAN checks both gcc and clang.
>
> Is it unreasonable to check *both* gcc and clang when there's a potentially
> significant performance change? It sounded like recently there have been
> cases where a change makes a big speedup with one compiler but a slowdown
> with the other, so if we only test with one, we'll be degrading with the
> other over time. But if we test both (simplifying of course, since there
> are multiple versions, etc.), then at least we'd be more aware of what the
> tradeoff is.
>
> Neal
>
>
> On Mon, Jun 22, 2020 at 8:42 AM Francois Saint-Jacques <
> fsaintjacques@gmail.com> wrote:
>
> > We should aim to improve the performance of the most widely used
> > *default* packages, which are python pip, python conda and R (all
> > platforms). AFAIK, both pip (manywheel) and conda use gcc on Linux by
> > default. R uses gcc on Linux and mingw (gcc) on Windows. I suppose
> > (haven't checked) that clang is used on OSX via brew. Thus, by
> > default, almost all users are going to use a gcc compiled version of
> > arrow on Linux.
> >
> > François
> >
> > On Mon, Jun 22, 2020 at 9:47 AM Wes McKinney <we...@gmail.com> wrote:
> > >
> > > Based on some of my performance work recently, I'm growing
> > > uncomfortable with using gcc as the performance baseline since the
> > > results can be significantly different (sometimes 3-4x or more on
> > > certain fast algorithms) from clang and MSVC. The perf results on
> > > https://github.com/apache/arrow/pull/7506 were really surprising --
> > > some benchmarks that showed 2-5x performance improvement on both clang
> > > and MSVC shows small regressions (20-30%) with gcc.
> > >
> > > I don't think we need a hard-and-fast rule about whether to accept PRs
> > > based on benchmarks but there are a few guiding criteria:
> > >
> > > * How much binary size does the new code add? I think many of us would
> > > agree that a 20% performance increase on some algorithm might not be
> > > worth adding 500KB to libarrow.so
> > > * Is the code generally faster across the major compiler targets (gcc,
> > > clang, MSVC)?
> > >
> > > I think that using clang as a baseline for informational benchmarks
> > > would be good, but ultimately we need to be systematically collecting
> > > data on all the major compilers. Some time ago I proposed building a
> > > Continuous Benchmarking framework
> > > (https://github.com/conbench/conbench/blob/master/doc/REQUIREMENTS.md)
> > > for use with Arrow (and outside of Arrow, too) so I hope that this
> > > will be able to help.
> > >
> > > - Wes
> > >
> > > On Mon, Jun 22, 2020 at 5:12 AM Yibo Cai <yi...@arm.com> wrote:
> > > >
> > > > On 6/22/20 5:07 PM, Antoine Pitrou wrote:
> > > > >
> > > > > On 22/06/2020 at 06:27, Micah Kornfield wrote:
> > > > >> There has been significant effort recently trying to optimize our
> > C++
> > > > >> code.  One  thing that seems to come up frequently is different
> > benchmark
> > > > >> results between GCC and Clang.  Even different versions of the same
> > > > >> compiler can yield significantly different results on the same code.
> > > > >>
> > > > >> I would like to propose that we choose a specific compiler and
> > version on
> > > > >> Linux for evaluating performance related PRs.  PRs would only be
> > accepted
> > > > >> if they improve the benchmarks under the selected version.
> > > > >
> > > > > Would this be a hard rule or just a guideline?  There are many ways
> > in
> > > > > which benchmark numbers can be improved or deteriorated by a PR, and
> > in
> > > > > some cases that doesn't matter (benchmarks are not always realistic,
> > and
> > > > > they are not representative of every workload).
> > > > >
> > > >
> > > > I agree that microbenchmark is not always useful, focusing too much on
> > > > improving microbenchmark result gives me feeling of "overfit" (to some
> > > > specific microarchitecture, compiler, or use case).
> > > >
> > > > > Regards
> > > > >
> > > > > Antoine.
> > > > >
> >

Re: [DISCUSS][C++] Performance work and compiler standardization for linux

Posted by Neal Richardson <ne...@gmail.com>.
FTR, by default R on macOS uses system clang, not homebrew; for Windows, it
is gcc (unless you're adventurous like Uwe and can hack it to work with
clang ;); and on Linux, CRAN checks both gcc and clang.

Is it unreasonable to check *both* gcc and clang when there's a potentially
significant performance change? It sounded like recently there have been
cases where a change makes a big speedup with one compiler but a slowdown
with the other, so if we only test with one, we'll be degrading with the
other over time. But if we test both (simplifying of course, since there
are multiple versions, etc.), then at least we'd be more aware of what the
tradeoff is.

Neal


On Mon, Jun 22, 2020 at 8:42 AM Francois Saint-Jacques <
fsaintjacques@gmail.com> wrote:

> We should aim to improve the performance of the most widely used
> *default* packages, which are python pip, python conda and R (all
> platforms). AFAIK, both pip (manywheel) and conda use gcc on Linux by
> default. R uses gcc on Linux and mingw (gcc) on Windows. I suppose
> (haven't checked) that clang is used on OSX via brew. Thus, by
> default, almost all users are going to use a gcc compiled version of
> arrow on Linux.
>
> François
>
> On Mon, Jun 22, 2020 at 9:47 AM Wes McKinney <we...@gmail.com> wrote:
> >
> > Based on some of my performance work recently, I'm growing
> > uncomfortable with using gcc as the performance baseline since the
> > results can be significantly different (sometimes 3-4x or more on
> > certain fast algorithms) from clang and MSVC. The perf results on
> > https://github.com/apache/arrow/pull/7506 were really surprising --
> > some benchmarks that showed 2-5x performance improvement on both clang
> > and MSVC shows small regressions (20-30%) with gcc.
> >
> > I don't think we need a hard-and-fast rule about whether to accept PRs
> > based on benchmarks but there are a few guiding criteria:
> >
> > * How much binary size does the new code add? I think many of us would
> > agree that a 20% performance increase on some algorithm might not be
> > worth adding 500KB to libarrow.so
> > * Is the code generally faster across the major compiler targets (gcc,
> > clang, MSVC)?
> >
> > I think that using clang as a baseline for informational benchmarks
> > would be good, but ultimately we need to be systematically collecting
> > data on all the major compilers. Some time ago I proposed building a
> > Continuous Benchmarking framework
> > (https://github.com/conbench/conbench/blob/master/doc/REQUIREMENTS.md)
> > for use with Arrow (and outside of Arrow, too) so I hope that this
> > will be able to help.
> >
> > - Wes
> >
> > On Mon, Jun 22, 2020 at 5:12 AM Yibo Cai <yi...@arm.com> wrote:
> > >
> > > On 6/22/20 5:07 PM, Antoine Pitrou wrote:
> > > >
> > > > On 22/06/2020 at 06:27, Micah Kornfield wrote:
> > > >> There has been significant effort recently trying to optimize our
> C++
> > > >> code.  One  thing that seems to come up frequently is different
> benchmark
> > > >> results between GCC and Clang.  Even different versions of the same
> > > >> compiler can yield significantly different results on the same code.
> > > >>
> > > >> I would like to propose that we choose a specific compiler and
> version on
> > > >> Linux for evaluating performance related PRs.  PRs would only be
> accepted
> > > >> if they improve the benchmarks under the selected version.
> > > >
> > > > Would this be a hard rule or just a guideline?  There are many ways
> in
> > > > which benchmark numbers can be improved or deteriorated by a PR, and
> in
> > > > some cases that doesn't matter (benchmarks are not always realistic,
> and
> > > > they are not representative of every workload).
> > > >
> > >
> > > I agree that microbenchmark is not always useful, focusing too much on
> > > improving microbenchmark result gives me feeling of "overfit" (to some
> > > specific microarchitecture, compiler, or use case).
> > >
> > > > Regards
> > > >
> > > > Antoine.
> > > >
>

Re: [DISCUSS][C++] Performance work and compiler standardization for linux

Posted by Wes McKinney <we...@gmail.com>.
On Wed, Jun 24, 2020 at 9:48 PM Micah Kornfield <em...@gmail.com> wrote:
>
> In that case I would propose the following:
> 1.  Standardize on clang for generating performance numbers for
> performance-related PRs
> 2.  Adjust our binary artifact builds to use clang where feasible (I think
> this should wait until after our next release).

Note that the manylinux* standards mandate gcc so we'll be stuck on
gcc for the Python wheels and perhaps some other builds. Otherwise I'd
be in favor of using clang when possible.

> 3.  Add to the contributors guide summarizing the discussion.
>
> Sound reasonable?
>
> Thanks,
> Micah
>
> On Tue, Jun 23, 2020 at 12:53 AM Uwe L. Korn <uw...@xhochy.com> wrote:
>
> > FTR: We can use the latest(!) clang for all platform for conda and wheels.
> > It isn't probably even that much of a complicated setup.
> >
> > On Mon, Jun 22, 2020, at 5:42 PM, Francois Saint-Jacques wrote:
> > > We should aim to improve the performance of the most widely used
> > > *default* packages, which are python pip, python conda and R (all
> > > platforms). AFAIK, both pip (manywheel) and conda use gcc on Linux by
> > > default. R uses gcc on Linux and mingw (gcc) on Windows. I suppose
> > > (haven't checked) that clang is used on OSX via brew. Thus, by
> > > default, almost all users are going to use a gcc compiled version of
> > > arrow on Linux.
> > >
> > > François
> > >
> > > On Mon, Jun 22, 2020 at 9:47 AM Wes McKinney <we...@gmail.com>
> > wrote:
> > > >
> > > > Based on some of my performance work recently, I'm growing
> > > > uncomfortable with using gcc as the performance baseline since the
> > > > results can be significantly different (sometimes 3-4x or more on
> > > > certain fast algorithms) from clang and MSVC. The perf results on
> > > > https://github.com/apache/arrow/pull/7506 were really surprising --
> > > > some benchmarks that showed 2-5x performance improvement on both clang
> > > > and MSVC shows small regressions (20-30%) with gcc.
> > > >
> > > > I don't think we need a hard-and-fast rule about whether to accept PRs
> > > > based on benchmarks but there are a few guiding criteria:
> > > >
> > > > * How much binary size does the new code add? I think many of us would
> > > > agree that a 20% performance increase on some algorithm might not be
> > > > worth adding 500KB to libarrow.so
> > > > * Is the code generally faster across the major compiler targets (gcc,
> > > > clang, MSVC)?
> > > >
> > > > I think that using clang as a baseline for informational benchmarks
> > > > would be good, but ultimately we need to be systematically collecting
> > > > data on all the major compilers. Some time ago I proposed building a
> > > > Continuous Benchmarking framework
> > > > (https://github.com/conbench/conbench/blob/master/doc/REQUIREMENTS.md)
> > > > for use with Arrow (and outside of Arrow, too) so I hope that this
> > > > will be able to help.
> > > >
> > > > - Wes
> > > >
> > > > On Mon, Jun 22, 2020 at 5:12 AM Yibo Cai <yi...@arm.com> wrote:
> > > > >
> > > > > On 6/22/20 5:07 PM, Antoine Pitrou wrote:
> > > > > >
> > > > > > On 22/06/2020 at 06:27, Micah Kornfield wrote:
> > > > > >> There has been significant effort recently trying to optimize our
> > C++
> > > > > >> code.  One  thing that seems to come up frequently is different
> > benchmark
> > > > > >> results between GCC and Clang.  Even different versions of the
> > same
> > > > > >> compiler can yield significantly different results on the same
> > code.
> > > > > >>
> > > > > >> I would like to propose that we choose a specific compiler and
> > version on
> > > > > >> Linux for evaluating performance related PRs.  PRs would only be
> > accepted
> > > > > >> if they improve the benchmarks under the selected version.
> > > > > >
> > > > > > Would this be a hard rule or just a guideline?  There are many
> > ways in
> > > > > > which benchmark numbers can be improved or deteriorated by a PR,
> > and in
> > > > > > some cases that doesn't matter (benchmarks are not always
> > realistic, and
> > > > > > they are not representative of every workload).
> > > > > >
> > > > >
> > > > > I agree that microbenchmark is not always useful, focusing too much
> > on
> > > > > improving microbenchmark result gives me feeling of "overfit" (to
> > some
> > > > > specific microarchitecture, compiler, or use case).
> > > > >
> > > > > > Regards
> > > > > >
> > > > > > Antoine.
> > > > > >
> > >
> >

Re: [DISCUSS][C++] Performance work and compiler standardization for linux

Posted by Micah Kornfield <em...@gmail.com>.
In that case I would propose the following:
1.  Standardize on clang for generating performance numbers for
performance-related PRs
2.  Adjust our binary artifact builds to use clang where feasible (I think
this should wait until after our next release).
3.  Add a note to the contributors guide summarizing the discussion.

Sound reasonable?

Thanks,
Micah

On Tue, Jun 23, 2020 at 12:53 AM Uwe L. Korn <uw...@xhochy.com> wrote:

> FTR: We can use the latest(!) clang for all platform for conda and wheels.
> It isn't probably even that much of a complicated setup.
>
> On Mon, Jun 22, 2020, at 5:42 PM, Francois Saint-Jacques wrote:
> > We should aim to improve the performance of the most widely used
> > *default* packages, which are python pip, python conda and R (all
> > platforms). AFAIK, both pip (manywheel) and conda use gcc on Linux by
> > default. R uses gcc on Linux and mingw (gcc) on Windows. I suppose
> > (haven't checked) that clang is used on OSX via brew. Thus, by
> > default, almost all users are going to use a gcc compiled version of
> > arrow on Linux.
> >
> > François
> >
> > On Mon, Jun 22, 2020 at 9:47 AM Wes McKinney <we...@gmail.com>
> wrote:
> > >
> > > Based on some of my performance work recently, I'm growing
> > > uncomfortable with using gcc as the performance baseline since the
> > > results can be significantly different (sometimes 3-4x or more on
> > > certain fast algorithms) from clang and MSVC. The perf results on
> > > https://github.com/apache/arrow/pull/7506 were really surprising --
> > > some benchmarks that showed 2-5x performance improvement on both clang
> > > and MSVC shows small regressions (20-30%) with gcc.
> > >
> > > I don't think we need a hard-and-fast rule about whether to accept PRs
> > > based on benchmarks but there are a few guiding criteria:
> > >
> > > * How much binary size does the new code add? I think many of us would
> > > agree that a 20% performance increase on some algorithm might not be
> > > worth adding 500KB to libarrow.so
> > > * Is the code generally faster across the major compiler targets (gcc,
> > > clang, MSVC)?
> > >
> > > I think that using clang as a baseline for informational benchmarks
> > > would be good, but ultimately we need to be systematically collecting
> > > data on all the major compilers. Some time ago I proposed building a
> > > Continuous Benchmarking framework
> > > (https://github.com/conbench/conbench/blob/master/doc/REQUIREMENTS.md)
> > > for use with Arrow (and outside of Arrow, too) so I hope that this
> > > will be able to help.
> > >
> > > - Wes
> > >
> > > On Mon, Jun 22, 2020 at 5:12 AM Yibo Cai <yi...@arm.com> wrote:
> > > >
> > > > On 6/22/20 5:07 PM, Antoine Pitrou wrote:
> > > > >
> > > > > On 22/06/2020 at 06:27, Micah Kornfield wrote:
> > > > >> There has been significant effort recently trying to optimize our
> C++
> > > > >> code.  One  thing that seems to come up frequently is different
> benchmark
> > > > >> results between GCC and Clang.  Even different versions of the
> same
> > > > >> compiler can yield significantly different results on the same
> code.
> > > > >>
> > > > >> I would like to propose that we choose a specific compiler and
> version on
> > > > >> Linux for evaluating performance related PRs.  PRs would only be
> accepted
> > > > >> if they improve the benchmarks under the selected version.
> > > > >
> > > > > Would this be a hard rule or just a guideline?  There are many
> ways in
> > > > > which benchmark numbers can be improved or deteriorated by a PR,
> and in
> > > > > some cases that doesn't matter (benchmarks are not always
> realistic, and
> > > > > they are not representative of every workload).
> > > > >
> > > >
> > > > I agree that microbenchmark is not always useful, focusing too much
> on
> > > > improving microbenchmark result gives me feeling of "overfit" (to
> some
> > > > specific microarchitecture, compiler, or use case).
> > > >
> > > > > Regards
> > > > >
> > > > > Antoine.
> > > > >
> >
>

Re: [DISCUSS][C++] Performance work and compiler standardization for linux

Posted by "Uwe L. Korn" <uw...@xhochy.com>.
FTR: We can use the latest(!) clang on all platforms for conda and wheels. It probably isn't even that complicated a setup.

On Mon, Jun 22, 2020, at 5:42 PM, Francois Saint-Jacques wrote:
> We should aim to improve the performance of the most widely used
> *default* packages, which are python pip, python conda and R (all
> platforms). AFAIK, both pip (manywheel) and conda use gcc on Linux by
> default. R uses gcc on Linux and mingw (gcc) on Windows. I suppose
> (haven't checked) that clang is used on OSX via brew. Thus, by
> default, almost all users are going to use a gcc compiled version of
> arrow on Linux.
> 
> François
> 
> On Mon, Jun 22, 2020 at 9:47 AM Wes McKinney <we...@gmail.com> wrote:
> >
> > Based on some of my performance work recently, I'm growing
> > uncomfortable with using gcc as the performance baseline since the
> > results can be significantly different (sometimes 3-4x or more on
> > certain fast algorithms) from clang and MSVC. The perf results on
> > https://github.com/apache/arrow/pull/7506 were really surprising --
> > some benchmarks that showed 2-5x performance improvement on both clang
> > and MSVC shows small regressions (20-30%) with gcc.
> >
> > I don't think we need a hard-and-fast rule about whether to accept PRs
> > based on benchmarks but there are a few guiding criteria:
> >
> > * How much binary size does the new code add? I think many of us would
> > agree that a 20% performance increase on some algorithm might not be
> > worth adding 500KB to libarrow.so
> > * Is the code generally faster across the major compiler targets (gcc,
> > clang, MSVC)?
> >
> > I think that using clang as a baseline for informational benchmarks
> > would be good, but ultimately we need to be systematically collecting
> > data on all the major compilers. Some time ago I proposed building a
> > Continuous Benchmarking framework
> > (https://github.com/conbench/conbench/blob/master/doc/REQUIREMENTS.md)
> > for use with Arrow (and outside of Arrow, too) so I hope that this
> > will be able to help.
> >
> > - Wes
> >
> > On Mon, Jun 22, 2020 at 5:12 AM Yibo Cai <yi...@arm.com> wrote:
> > >
> > > On 6/22/20 5:07 PM, Antoine Pitrou wrote:
> > > >
> > > > On 22/06/2020 at 06:27, Micah Kornfield wrote:
> > > >> There has been significant effort recently trying to optimize our C++
> > > >> code.  One  thing that seems to come up frequently is different benchmark
> > > >> results between GCC and Clang.  Even different versions of the same
> > > >> compiler can yield significantly different results on the same code.
> > > >>
> > > >> I would like to propose that we choose a specific compiler and version on
> > > >> Linux for evaluating performance related PRs.  PRs would only be accepted
> > > >> if they improve the benchmarks under the selected version.
> > > >
> > > > Would this be a hard rule or just a guideline?  There are many ways in
> > > > which benchmark numbers can be improved or deteriorated by a PR, and in
> > > > some cases that doesn't matter (benchmarks are not always realistic, and
> > > > they are not representative of every workload).
> > > >
> > >
> > > I agree that microbenchmark is not always useful, focusing too much on
> > > improving microbenchmark result gives me feeling of "overfit" (to some
> > > specific microarchitecture, compiler, or use case).
> > >
> > > > Regards
> > > >
> > > > Antoine.
> > > >
>

Re: [DISCUSS][C++] Performance work and compiler standardization for linux

Posted by Francois Saint-Jacques <fs...@gmail.com>.
We should aim to improve the performance of the most widely used
*default* packages, which are Python pip, Python conda, and R (all
platforms). AFAIK, both pip (manylinux) and conda use gcc on Linux by
default. R uses gcc on Linux and MinGW (gcc) on Windows. I suppose
(I haven't checked) that clang is used on OSX via brew. Thus, by
default, almost all users are going to use a gcc-compiled version of
Arrow on Linux.

François

On Mon, Jun 22, 2020 at 9:47 AM Wes McKinney <we...@gmail.com> wrote:
>
> Based on some of my performance work recently, I'm growing
> uncomfortable with using gcc as the performance baseline since the
> results can be significantly different (sometimes 3-4x or more on
> certain fast algorithms) from clang and MSVC. The perf results on
> https://github.com/apache/arrow/pull/7506 were really surprising --
> some benchmarks that showed 2-5x performance improvement on both clang
> and MSVC shows small regressions (20-30%) with gcc.
>
> I don't think we need a hard-and-fast rule about whether to accept PRs
> based on benchmarks but there are a few guiding criteria:
>
> * How much binary size does the new code add? I think many of us would
> agree that a 20% performance increase on some algorithm might not be
> worth adding 500KB to libarrow.so
> * Is the code generally faster across the major compiler targets (gcc,
> clang, MSVC)?
>
> I think that using clang as a baseline for informational benchmarks
> would be good, but ultimately we need to be systematically collecting
> data on all the major compilers. Some time ago I proposed building a
> Continuous Benchmarking framework
> (https://github.com/conbench/conbench/blob/master/doc/REQUIREMENTS.md)
> for use with Arrow (and outside of Arrow, too) so I hope that this
> will be able to help.
>
> - Wes
>
> On Mon, Jun 22, 2020 at 5:12 AM Yibo Cai <yi...@arm.com> wrote:
> >
> > On 6/22/20 5:07 PM, Antoine Pitrou wrote:
> > >
> > > On 22/06/2020 at 06:27, Micah Kornfield wrote:
> > >> There has been significant effort recently trying to optimize our C++
> > >> code.  One  thing that seems to come up frequently is different benchmark
> > >> results between GCC and Clang.  Even different versions of the same
> > >> compiler can yield significantly different results on the same code.
> > >>
> > >> I would like to propose that we choose a specific compiler and version on
> > >> Linux for evaluating performance related PRs.  PRs would only be accepted
> > >> if they improve the benchmarks under the selected version.
> > >
> > > Would this be a hard rule or just a guideline?  There are many ways in
> > > which benchmark numbers can be improved or deteriorated by a PR, and in
> > > some cases that doesn't matter (benchmarks are not always realistic, and
> > > they are not representative of every workload).
> > >
> >
> > I agree that microbenchmark is not always useful, focusing too much on
> > improving microbenchmark result gives me feeling of "overfit" (to some
> > specific microarchitecture, compiler, or use case).
> >
> > > Regards
> > >
> > > Antoine.
> > >

Re: [DISCUSS][C++] Performance work and compiler standardization for linux

Posted by Wes McKinney <we...@gmail.com>.
Based on some of my performance work recently, I'm growing
uncomfortable with using gcc as the performance baseline since the
results can be significantly different (sometimes 3-4x or more on
certain fast algorithms) from clang and MSVC. The perf results on
https://github.com/apache/arrow/pull/7506 were really surprising --
some benchmarks that showed 2-5x performance improvements on both clang
and MSVC show small regressions (20-30%) with gcc.

I don't think we need a hard-and-fast rule about whether to accept PRs
based on benchmarks but there are a few guiding criteria:

* How much binary size does the new code add? I think many of us would
agree that a 20% performance increase on some algorithm might not be
worth adding 500KB to libarrow.so
* Is the code generally faster across the major compiler targets (gcc,
clang, MSVC)?

I think that using clang as a baseline for informational benchmarks
would be good, but ultimately we need to be systematically collecting
data on all the major compilers. Some time ago I proposed building a
Continuous Benchmarking framework
(https://github.com/conbench/conbench/blob/master/doc/REQUIREMENTS.md)
for use with Arrow (and outside of Arrow, too) so I hope that this
will be able to help.

- Wes

On Mon, Jun 22, 2020 at 5:12 AM Yibo Cai <yi...@arm.com> wrote:
>
> On 6/22/20 5:07 PM, Antoine Pitrou wrote:
> >
> > On 22/06/2020 at 06:27, Micah Kornfield wrote:
> >> There has been significant effort recently trying to optimize our C++
> >> code.  One  thing that seems to come up frequently is different benchmark
> >> results between GCC and Clang.  Even different versions of the same
> >> compiler can yield significantly different results on the same code.
> >>
> >> I would like to propose that we choose a specific compiler and version on
> >> Linux for evaluating performance related PRs.  PRs would only be accepted
> >> if they improve the benchmarks under the selected version.
> >
> > Would this be a hard rule or just a guideline?  There are many ways in
> > which benchmark numbers can be improved or deteriorated by a PR, and in
> > some cases that doesn't matter (benchmarks are not always realistic, and
> > they are not representative of every workload).
> >
>
> I agree that microbenchmark is not always useful, focusing too much on
> improving microbenchmark result gives me feeling of "overfit" (to some
> specific microarchitecture, compiler, or use case).
>
> > Regards
> >
> > Antoine.
> >

Re: [DISCUSS][C++] Performance work and compiler standardization for linux

Posted by Yibo Cai <yi...@arm.com>.
On 6/22/20 5:07 PM, Antoine Pitrou wrote:
> 
> On 22/06/2020 at 06:27, Micah Kornfield wrote:
>> There has been significant effort recently trying to optimize our C++
>> code.  One  thing that seems to come up frequently is different benchmark
>> results between GCC and Clang.  Even different versions of the same
>> compiler can yield significantly different results on the same code.
>>
>> I would like to propose that we choose a specific compiler and version on
>> Linux for evaluating performance related PRs.  PRs would only be accepted
>> if they improve the benchmarks under the selected version.
> 
> Would this be a hard rule or just a guideline?  There are many ways in
> which benchmark numbers can be improved or deteriorated by a PR, and in
> some cases that doesn't matter (benchmarks are not always realistic, and
> they are not representative of every workload).
> 

I agree that microbenchmarks are not always useful; focusing too much on
improving microbenchmark results gives me a feeling of "overfitting" (to some
specific microarchitecture, compiler, or use case).
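
As a schematic example of what that kind of overfitting can look like (this
is not Arrow code), consider a kernel with a hand-written AVX2 path: a
benchmark run only on an AVX2-capable machine with one particular compiler
says little about how the code behaves on other CPUs or toolchains.

    #include <cstddef>
    #include <cstdint>
    #if defined(__AVX2__)
    #include <immintrin.h>
    #endif

    // Sum an int64 buffer, with an AVX2-specific fast path when the build
    // targets AVX2 and a plain scalar loop otherwise.
    int64_t SumInt64(const int64_t* values, size_t n) {
      int64_t sum = 0;
      size_t i = 0;
    #if defined(__AVX2__)
      __m256i acc = _mm256_setzero_si256();
      for (; i + 4 <= n; i += 4) {
        acc = _mm256_add_epi64(
            acc, _mm256_loadu_si256(reinterpret_cast<const __m256i*>(values + i)));
      }
      alignas(32) int64_t lanes[4];
      _mm256_store_si256(reinterpret_cast<__m256i*>(lanes), acc);
      sum = lanes[0] + lanes[1] + lanes[2] + lanes[3];
    #endif
      for (; i < n; ++i) sum += values[i];  // scalar tail (or full scalar path)
      return sum;
    }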

> Regards
> 
> Antoine.
> 

Re: [DISCUSS][C++] Performance work and compiler standardization for linux

Posted by Antoine Pitrou <an...@python.org>.
On 22/06/2020 at 06:27, Micah Kornfield wrote:
> There has been significant effort recently trying to optimize our C++
> code.  One  thing that seems to come up frequently is different benchmark
> results between GCC and Clang.  Even different versions of the same
> compiler can yield significantly different results on the same code.
> 
> I would like to propose that we choose a specific compiler and version on
> Linux for evaluating performance related PRs.  PRs would only be accepted
> if they improve the benchmarks under the selected version.

Would this be a hard rule or just a guideline?  There are many ways in
which benchmark numbers can be improved or deteriorated by a PR, and in
some cases that doesn't matter (benchmarks are not always realistic, and
they are not representative of every workload).

Regards

Antoine.