Posted to dev@mxnet.apache.org by Anton Chernov <me...@gmail.com> on 2018/11/22 12:35:55 UTC

[Discussion] Remove bundled llvm OpenMP

Dear MXNet community,

I would like to draw attention to an important issue present in the
MXNet CMake build: the usage of the bundled llvm OpenMP library.

I have opened a PR to remove it:
https://github.com/apache/incubator-mxnet/pull/12160

The PR was closed, but I am firm in my opinion that it's the right
thing to do.

*Background*
If you want to use OpenMP pragmas in your code for parallelization, you
supply a special flag to the compiler:

- Clang / -fopenmp
https://openmp.llvm.org/

- GCC / -fopenmp
https://gcc.gnu.org/onlinedocs/libgomp/Enabling-OpenMP.html

- Intel / [Q]openmp
https://software.intel.com/en-us/node/522689#6E24682E-F411-4AE3-A04D-ECD81C7008D1

- Visual Studio: /openmp (Enable OpenMP 2.0 Support)
https://msdn.microsoft.com/en-us/library/tt15eb9t.aspx

Each of these compilers enables the '#pragma omp' directive during C/C++
compilation and arranges for automatic linking of its own OpenMP runtime
library.

Thus, to take advantage of a particular OpenMP implementation, one has to
compile the code with the corresponding compiler.
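
As a minimal illustration (assuming a hypothetical file omp_test.c that
contains a '#pragma omp parallel for' loop), the runtime that gets linked
follows the compiler and can be inspected with ldd:

$ gcc -fopenmp omp_test.c -o test_gcc
$ ldd test_gcc | grep -i omp     # GNU runtime: libgomp
$ clang -fopenmp omp_test.c -o test_clang
$ ldd test_clang | grep -i omp   # llvm runtime: libomp
$ icc -qopenmp omp_test.c -o test_icc
$ ldd test_icc | grep -i omp     # Intel runtime: libiomp5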

Currently, the MXNet CMake build scripts use a bundled version of llvm
OpenMP ([1] and [2]) to replace the OpenMP library supplied by the compiler.
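
For reference, here is a minimal sketch of the alternative this proposal
argues for: letting CMake pick up the OpenMP runtime that ships with the
chosen compiler. It only illustrates the standard CMake mechanism; the
target name 'mxnet' is hypothetical and this is not a copy of the actual
MXNet CMakeLists.txt changes:

# Hypothetical CMakeLists.txt fragment (requires CMake >= 3.9)
find_package(OpenMP REQUIRED)
target_link_libraries(mxnet PUBLIC OpenMP::OpenMP_CXX)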

I will quote the README of MKL-DNN (Intel(R) Math Kernel Library for Deep
Neural Networks):

"Intel MKL-DNN uses OpenMP* for parallelism and requires an OpenMP runtime
library to work. As different OpenMP runtimes may not be binary compatible
it's important to ensure that only one OpenMP runtime is used throughout
the application. Having more than one OpenMP runtime initialized may lead
to undefined behavior resulting in incorrect results or crashes." [3]

And:

"Using GNU compiler with -fopenmp and -liomp5 options will link the
application with both Intel and GNU OpenMP runtime libraries. This will
lead to undefined behavior of the application." [4]

As can be seen from running ldd on an MXNet test binary, two different
OpenMP runtimes end up being linked:

$ ldd build/tests/mxnet_unit_tests | grep omp
    libomp.so => /.../mxnet/build/3rdparty/openmp/runtime/src/libomp.so (0x00007f697bc55000)
    libgomp.so.1 => /usr/lib/x86_64-linux-gnu/libgomp.so.1 (0x00007f69660cd000)
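
Since ldd only shows link-time dependencies, one can additionally check
which OpenMP runtimes actually get mapped into a running process. A
purely illustrative, Linux-only sketch (assuming the mxnet Python package
is installed):

$ python -c 'import mxnet, os; print(open("/proc/%d/maps" % os.getpid()).read())' | grep -i omp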

*Performance*

The only performance data related to OpenMP in MXNet that I was able to
find is here:
https://github.com/apache/incubator-mxnet/issues/9744#issuecomment-367711172

To my understanding, it tests the impact of different environment
variables on the same setup (using the same bundled OpenMP library).

The libraries may differ in implementation, and the Thread Affinity
Interface [5] may have a significant impact on performance.

All compilers support it (an example follows this list):

- Clang / KMP_AFFINITY
https://github.com/clang-ykt/openmp/blob/master/runtime/src/kmp_affinity.cpp

- GCC / GOMP_CPU_AFFINITY
https://gcc.gnu.org/onlinedocs/gcc-4.7.1/libgomp/GOMP_005fCPU_005fAFFINITY.html

- Intel / KMP_AFFINITY
https://software.intel.com/en-us/node/522689#6E24682E-F411-4AE3-A04D-ECD81C7008D1

- Visual Studio / SetThreadAffinityMask
https://docs.microsoft.com/en-us/windows/desktop/api/winbase/nf-winbase-setthreadaffinitymask
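
For example (an illustrative sketch only, on Linux, with a hypothetical
train.py script), thread pinning is expressed differently depending on
which runtime ends up loaded:

# GNU libgomp (GCC builds):
$ GOMP_CPU_AFFINITY="0-15" OMP_NUM_THREADS=16 python train.py
# llvm libomp / Intel libiomp5 (Clang or ICC builds):
$ KMP_AFFINITY=granularity=fine,compact OMP_NUM_THREADS=16 python train.py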

*Issues*

Failed OpenMP assertion when loading MXNet compiled with DEBUG=1
https://github.com/apache/incubator-mxnet/issues/10856

libomp.so dependency (need REAL fix)
https://github.com/apache/incubator-mxnet/issues/11417

mxnet-mkl (v0.12.0) crash when using (conda-installed) numpy with MKL
https://github.com/apache/incubator-mxnet/issues/8532

Performance regression when OMP_NUM_THREADS environment variable is not set
https://github.com/apache/incubator-mxnet/issues/9744

Poor concat CPU performance on CUDA builds
https://github.com/apache/incubator-mxnet/issues/11905

I would appreciate hearing your thoughts.


Best
Anton

[1]
https://github.com/apache/incubator-mxnet/blob/master/CMakeLists.txt#L400-L405
[2] https://github.com/apache/incubator-mxnet/tree/master/3rdparty
[3] https://github.com/intel/mkl-dnn/blame/master/README.md#L261-L265
[4] https://github.com/intel/mkl-dnn/blame/master/README.md#L278-L280
[5] https://software.intel.com/en-us/node/522691

RE: [Discussion] Remove bundled llvm OpenMP

Posted by "Lv, Tao A" <ta...@intel.com>.
Thanks for the great summary, Anton. I'm curious: has anybody managed to build mxnet successfully with ICC/ICPC?


Re: [Discussion] Remove bundled llvm OpenMP

Posted by Anton Chernov <me...@gmail.com>.
We are now waiting for a committer's review and merge.

Wed, 22 May 2019 at 22:14, Pedro Larroy <pe...@gmail.com>:

> Thanks Aaron and Anton!   Can we rebase to update the PR?  Let me know
> how I can help further if you find some problems.
>
> On Wed, May 22, 2019 at 6:49 AM Aaron Markham <aa...@gmail.com>
> wrote:
> >
> > I reopened it for you.
> >
> > On Wed, May 22, 2019, 05:25 Anton Chernov <me...@gmail.com> wrote:
> >
> > > I don't have necessary rights to reopen this PR.
> > >
> > > Mon, 20 May 2019 at 08:00, Pedro Larroy <pedro.larroy.lists@gmail.com
> >:
> > >
> > > > Hi Anton, Stas.
> > > >
> > > > Can we reopen this PR and get it merged as per the data collected by
> > > Stas?
> > > >
> > > > https://github.com/apache/incubator-mxnet/pull/12160
> > > >
> > > >
> > > >
> > >
> https://cwiki.apache.org/confluence/display/MXNET/Benchmarking+MXNet+with+different+OpenMP+implementations
> > > >
> > > > There are multiple issues that will be fixed by solving this problem.
> > > >
> > > >
> > > > Pedro
> > > >
> > > > On Tue, Feb 12, 2019 at 4:54 AM Anton Chernov <me...@gmail.com>
> > > wrote:
> > > > >
> > > > > I would like to propose a possible alternative solution for
> > > > consideration.
> > > > >
> > > > > If keeping llvm OpenMP as a submodule is inevitable, one could make
> > > > > the following adjustments:
> > > > >
> > > > > Since compilers try to find their own OpenMP library implicitly,
> MXNet
> > > > > needs to ensure that only the bundled version is found. Therefore
> > > during
> > > > > the build and also during deployment this library has to provide
> > > symlinks
> > > > > for each possible compiler that would link to the built artifact
> i.e.
> > > > >
> > > > > libiomp.so -> libgomp.so -> libomp.so
> > > > >
> > > > > The MKLML iomp would need to be hidden and removed as well.
> > > > >
> > > > > On Windows it would be a different story, but as can be seen [1]
> > > bundled
> > > > > OpenMP was not included in the Windows build anyway.
> > > > >
> > > > > Alternatively: always use iomp (with same symlinking trick though)
> > > > provided
> > > > > by MKLML distribution [2]. This potentially could work on Windows
> as
> > > > well.
> > > > >
> > > > > Best
> > > > > Anton
> > > > >
> > > > > [1]
> > > > >
> > > >
> > >
> https://github.com/apache/incubator-mxnet/blob/8a63bdecf2d9f12d34fe5874957ae4c867eb5f5b/CMakeLists.txt#L408-L410
> > > > > [2] https://github.com/intel/mkl-dnn/releases
> > > > >
> > > > > Tue, 12 Feb 2019 at 11:22, Anton Chernov <me...@gmail.com>:
> > > > >
> > > > > > Recent benchmarking results have been published here [1].
> Experiments
> > > > > > compare different OpenMP implementations as well as binaries
> compiled
> > > > with
> > > > > > different compilers including GCC, Clang and ICC.
> > > > > >
> > > > > > During experimentation another issue with mixing up libraries
> was
> > > > > > identified and described here [2].
> > > > > >
> > > > > > Best
> > > > > > Anton
> > > > > >
> > > > > > [1] https://cwiki.apache.org/confluence/x/2wclBg
> > > > > > [2]
> > > > > >
> > > >
> > >
> https://github.com/apache/incubator-mxnet/issues/14087#issuecomment-461734041
> > > > > >
> > > > > >
> > > > > > Sun, 9 Dec 2018 at 16:28, Anton Chernov <me...@gmail.com>:
> > > > > >
> > > > > >> Hi Chris,
> > > > > >>
> > > > > >> Following up on the issue, are all things resolved in the
> > > discussion?
> > > > > >>
> > > > > >> If yes, I kindly ask you to reopen this PR and remove
> ‘requesting
> > > > > >> changes’ status:
> > > > > >> https://github.com/apache/incubator-mxnet/pull/12160
> > > > > >>
> > > > > >> Thank you.
> > > > > >>
> > > > > >>
> > > > > >> Best
> > > > > >> Anton
> > > > > >>
> > > > > >>
> > > > > >> Tue, 27 Nov 2018 at 17:15, Anton Chernov <
> mechernov@gmail.com>:
> > > > > >>
> > > > > >>> Another thing to take into consideration:
> > > > > >>>
> > > > > >>> All python artefacts that are created (PyPi) are built with
> make
> > > and
> > > > are
> > > > > >>> not using the bundled OpenMP library.
> > > > > >>>
> > > > > >>> One step for the switch to CMake to happen is the approval and
> > > > merging
> > > > > >>> of the mentioned PR:
> > > > > >>>
> > > > > >>> https://github.com/apache/incubator-mxnet/pull/12160
> > > > > >>>
> > > > > >>> If there are no other objections I kindly ask Chris Olivier to
> > > remove
> > > > > >>> his 'requesting changes' veto on it to unblock the CMake
> overhaul
> > > > work.
> > > > > >>>
> > > > > >>> Thank you.
> > > > > >>>
> > > > > >>> Best
> > > > > >>> Anton
> > > > > >>>
> > > > > >>> Thu, 22 Nov 2018 at 17:11, Anton Chernov <
> mechernov@gmail.com>:
> > > > > >>>
> > > > > >>>>
> > > > > >>>> Thank you for your answer, Chris.
> > > > > >>>>
> > > > > >>>> > The whole “mixing omp libraries” is something that occurs in
> > > > > >>>> production
> > > > > >>>> every day and certainly in everything that uses mkl.
> > > > > >>>>
> > > > > >>>> I'm afraid this statement is wrong. Intel MKL-DNN strictly
> ensures
> > > > that
> > > > > >>>> this mixture is not happening:
> > > > > >>>>
> > > > > >>>> "Intel MKL-DNN uses OpenMP* for parallelism and requires an
> OpenMP
> > > > > >>>> runtime library to work. As different OpenMP runtimes may not
> be
> > > > binary
> > > > > >>>> compatible it's important to ensure that only one OpenMP
> runtime
> > > is
> > > > used
> > > > > >>>> throughout the application. Having more than one OpenMP
> runtime
> > > > initialized
> > > > > >>>> may lead to undefined behavior resulting in incorrect results
> or
> > > > crashes."
> > > > > >>>> [1]
> > > > > >>>>
> > > > > >>>> That is why 2 different MKLML libraries are provided:
> > > > > >>>>
> > > > > >>>> lib/libmklml_gnu.so  | Intel MKL small library for GNU* OpenMP
> > > > runtime
> > > > > >>>> lib/libmklml_intel.so | Intel MKL small library for Intel(R)
> > > OpenMP
> > > > > >>>> runtime
> > > > > >>>>
> > > > > >>>> > is the suggestion that libiomp be removed from mkl?
> > > > > >>>>
> > > > > >>>> That is certainly not my suggestion.
> > > > > >>>>
> > > > > >>>> > have you spoken with intel? have you consulted Intel at all?
> > > > > >>>>
> > > > > >>>> Yes, I have asked for comments on the issue.
> > > > > >>>>
> > > > > >>>> > “hard to debug random crash”. you’re seeing an assertion
> which
> > > is
> > > > > >>>> probably ...
> > > > > >>>>
> > > > > >>>> I'm seeing the result of undefined behaviour. And I want to
> put
> > > > > >>>> emphasis on the following statement:
> > > > > >>>>
> > > > > >>>> Regardless of whether there is a particular reason for the
> > > assert
> > > > -
> > > > > >>>> it is a result of behaviour that should not happen. There are
> > > valid
> > > > ways
> > > > > >>>> how to use llvm OpenMP in MXNet and the current way is not
> one of
> > > > them.
> > > > > >>>>
> > > > > >>>> > The lack of root-causing the problem and knee-jerk solution
> here
> > > > > >>>> makes me
> > > > > >>>> uncomfortable.
> > > > > >>>>
> > > > > >>>> I hope that my efforts highlighting the problems reach you to
> > > > mitigate
> > > > > >>>> your discomfort.
> > > > > >>>>
> > > > > >>>> > if you want to see performance differences there’s an
> > > environment
> > > > > >>>> variable
> > > > > >>>> you can set in the mxnet omp tuning code that will print
> overhead
> > > > and
> > > > > >>>> execution times for the current omp library.
> > > > > >>>>
> > > > > >>>> I don't want to see performance differences in the current
> OpenMP
> > > > > >>>> library. I want to remove the current OpenMP library and use
> the
> > > one
> > > > > >>>> provided by the compiler.
> > > > > >>>>
> > > > > >>>>
> > > > > >>>>
> > > > > >>>> Best
> > > > > >>>> Anton
> > > > > >>>>
> > > > > >>>> [1]
> > > > https://github.com/intel/mkl-dnn/blame/master/README.md#L261-L265
> > > > > >>>>
> > > > > >>>> Thu, 22 Nov 2018 at 16:50, Chris Olivier <
> > > cjolivier01@apache.org
> > > > >:
> > > > > >>>>
> > > > > >>>>> Do you not work on CI mostly? My apologies for thinking that
> was
> > > > some
> > > > > >>>>> sort
> > > > > >>>>> of team effort between you and a few others that were
> passionate
> > > > about
> > > > > >>>>> CI
> > > > > >>>>> keeping the CI system running smoothly.
> > > > > >>>>>
> > > > > >>>>> You have source code, you have the line the assertion is on.
> If
> > > you
> > > > > >>>>> can’t
> > > > > >>>>> describe what’s going wrong that causes the assertion, then I
> > > don’t
> > > > > >>>>> really
> > > > > >>>>> have anything more to add to this conversation beyond what’s
> > > below:
> > > > > >>>>>
> > > > > >>>>> The whole “mixing omp libraries” is something that occurs in
> > > > production
> > > > > >>>>> every day and certainly in everything that uses mkl.  It may
> > > > > >>>>> occasionally
> > > > > >>>>> cause problems for some edge cases when there is
> super-complex
> > > > linking
> > > > > >>>>> strategies and dynamic loading.  But this is not one of those
> > > edge
> > > > > >>>>> cases.
> > > > > >>>>> Mostly blaming this is a red herring for other thread-related
> > > > problems
> > > > > >>>>> and
> > > > > >>>>> people switch omp library and the timing of their code
> changes
> > > and
> > > > they
> > > > > >>>>> stop seeing the problem. I’ve spent my entire career doing
> > > heavily
> > > > > >>>>> multithreaded c++ development and i’ve seen that a million
> times.
> > > > is
> > > > > >>>>> the
> > > > > >>>>> suggestion that libiomp be removed from mkl? have you spoken
> with
> > > > > >>>>> intel?
> > > > > >>>>> have you consulted Intel at all?
> > > > > >>>>>
> > > > > >>>>> and what you are seeing isn’t some “hard to debug random
> crash”.
> > > > you’re
> > > > > >>>>> seeing an assertion which is probably related to omp trying
> to
> > > > create a
> > > > > >>>>> thread pool after a fork and something was done in the mxnet
> code
> > > > to
> > > > > >>>>> make
> > > > > >>>>> that sketchy to do. I’d suggest filing an issue with the llvm
> > > > openmp
> > > > > >>>>> just
> > > > > >>>>> like you’d file with any other not-well-understood behavior
> in
> > > > mxnet.
> > > > > >>>>>
> > > > > >>>>> The lack of root-causing the problem and knee-jerk solution
> here
> > > > makes
> > > > > >>>>> me
> > > > > >>>>> uncomfortable.
> > > > > >>>>>
> > > > > >>>>> if you want to see performance differences there’s an
> environment
> > > > > >>>>> variable
> > > > > >>>>> you can set in the mxnet omp tuning code that will print
> overhead
> > > > and
> > > > > >>>>> execution times for the current omp library.
> > > > > >>>>>
> > > > > >>>>>
> > > > > >>>>>
> > > > > >>>>>
> > > > > >>>>>
> > > > > >>>>>
> > > > > >>>>>
> > > > > >>>>> On Thu, Nov 22, 2018 at 7:12 AM Anton Chernov <
> > > mechernov@gmail.com
> > > > >
> > > > > >>>>> wrote:
> > > > > >>>>>
> > > > > >>>>> > Hi Chris,
> > > > > >>>>> >
> > > > > >>>>> > Thank you for your answer. If you have noticed the initial
> > > email
> > > > > >>>>> comes from
> > > > > >>>>> > me, Anton Chernov (@lebeg on Github) and thus the proposal
> is
> > > not
> > > > > >>>>> from any
> > > > > >>>>> > 'Ci' team that you've mentioned, but from me personally.
> > > > > >>>>> >
> > > > > >>>>> > You are writing:
> > > > > >>>>> >
> > > > > >>>>> > > someone is doing something unhealthy when they fork ...
> > > > > >>>>> >
> > > > > >>>>> > I'm missing any context to understand what you mean.
> > > > > >>>>> >
> > > > > >>>>> > > we get a lot of performance gain from OMP ...
> > > > > >>>>> >
> > > > > >>>>> > There is no data that would prove this statement and
> therefore
> > > > it is
> > > > > >>>>> a
> > > > > >>>>> > random guess.
> > > > > >>>>> >
> > > > > >>>>> > > in many months, no investigation has occurred as to WHY
> the
> > > > > >>>>> assertion is
> > > > > >>>>> > failing.
> > > > > >>>>> >
> > > > > >>>>> > The investigation has concluded that this is happening due
> to
> > > > > >>>>> undefined
> > > > > >>>>> > behaviour which is, in my opinion, a sufficient answer that
> does
> > > > not
> > > > > >>>>> require
> > > > > >>>>> > to go any deeper.
> > > > > >>>>> >
> > > > > >>>>> > > the pr is vetoed until such a time that the actual root
> cause
> > > > of
> > > > > >>>>> the
> > > > > >>>>> > problem is known.
> > > > > >>>>> >
> > > > > >>>>> > And considering the statements above there is no valid
> reason
> > > to
> > > > > >>>>> veto the
> > > > > >>>>> > PR.
> > > > > >>>>> >
> > > > > >>>>> >
> > > > > >>>>> > Best
> > > > > >>>>> > Anton
> > > > > >>>>> >
> > > > > >>>>> > Thu, 22 Nov 2018 at 15:38, Chris Olivier <
> > > > cjolivier01@gmail.com>:
> > > > > >>>>> >
> > > > > >>>>> > > 3x less overhead*
> > > > > >>>>> > >
> > > > > >>>>> > > On Thu, Nov 22, 2018 at 6:25 AM Chris Olivier <
> > > > > >>>>> cjolivier01@gmail.com>
> > > > > >>>>> > > wrote:
> > > > > >>>>> > >
> > > > > >>>>> > > > someone is doing something unhealthy when they fork,
> which
> > > is
> > > > > >>>>> causing
> > > > > >>>>> > an
> > > > > >>>>> > > > assertion in the openmp library. the same assertion
> that
> > > > would
> > > > > >>>>> fire in
> > > > > >>>>> > > mkl,
> > > > > >>>>> > > > which is linked to libiomp5 (exact same omp library).
> this
> > > > is new
> > > > > >>>>> > > behavior
> > > > > >>>>> > > > and most likely due to an error or suboptimal approach
> in
> > > the
> > > > > >>>>> forking
> > > > > >>>>> > > logic
> > > > > >>>>> > > > in mxnet.
> > > > > >>>>> > > >
> > > > > >>>>> > > > in order to circumvent the assert, the Ci team is
> proposing
> > > > to
> > > > > >>>>> remove
> > > > > >>>>> > the
> > > > > >>>>> > > > library completely which is equivalent to cutting off
> your
> > > > leg
> > > > > >>>>> to make
> > > > > >>>>> > > the
> > > > > >>>>> > > > pain from stubbing your toe go away.
> > > > > >>>>> > > >
> > > > > >>>>> > > > we get a lot of performance gain from OMP. it has
> about a
> > > 1/3
> > > > > >>>>> less
> > > > > >>>>> > > > overhead for entering omp regions and also supports omp
> > > > regions
> > > > > >>>>> after a
> > > > > >>>>> > > > fork, which libgomp does not.
> > > > > >>>>> > > >
> > > > > >>>>> > > > in many months, no investigation has occurred as to
> WHY the
> > > > > >>>>> assertion
> > > > > >>>>> > is
> > > > > >>>>> > > > failing.
> > > > > >>>>> > > >
> > > > > >>>>> > > > the pr is vetoed until such a time that the actual root
> > > > cause of
> > > > > >>>>> the
> > > > > >>>>> > > > problem is known.
> > > > > >>>>> > > >
> > > > > >>>>> > > >
> > > > > >>>>> > > > thanks,
> > > > > >>>>> > > >
> > > > > >>>>> > > > -Chris.

Re: [Discussion] Remove bundled llvm OpenMP

Posted by Pedro Larroy <pe...@gmail.com>.
Thanks Aaron and Anton!   Can we rebase to update the PR?  Let me know
how I can help further if you find some problems.


Re: [Discussion] Remove bundled llvm OpenMP

Posted by Anton Chernov <me...@gmail.com>.
Great! Thank you, Aaron. I have rebased it.

ср, 22 мая 2019 г. в 15:49, Aaron Markham <aa...@gmail.com>:

> I reopened it for you.
>
> On Wed, May 22, 2019, 05:25 Anton Chernov <me...@gmail.com> wrote:
>
> > I don't have necessary rights to reopen this PR.
> >
> > пн, 20 мая 2019 г. в 08:00, Pedro Larroy <pe...@gmail.com>:
> >
> > > Hi Anton, Stas.
> > >
> > > Can we reopen this PR and get it merged as per the data collected by
> > Stas?
> > >
> > > https://github.com/apache/incubator-mxnet/pull/12160
> > >
> > >
> > >
> >
> https://cwiki.apache.org/confluence/display/MXNET/Benchmarking+MXNet+with+different+OpenMP+implementations
> > >
> > > There are multiple issues that will be fixed by solving this problem.
> > >
> > >
> > > Pedro
> > >
> > > On Tue, Feb 12, 2019 at 4:54 AM Anton Chernov <me...@gmail.com>
> > wrote:
> > > >
> > > > I would like to propose a possible alternative solution for
> > > consideration.
> > > >
> > > > If keeping llvm OpenMP as a submodule is inevitable one could make
> > > > following adjustments:
> > > >
> > > > Since compilers try to find their own OpenMP library implicitly,
> MXNet
> > > > needs to ensure that only the bundled version is found. Therefore
> > during
> > > > the build and also during deployment this library has to provide
> > symlinks
> > > > for each possible compiler that would link to the built artifact ie.
> > > >
> > > > libiomp.so -> libgomp.so -> libomp.so
> > > >
> > > > The MKLML iomp would need to be hidden and removed as well.
> > > >
> > > > On Windows it would be a different story, but as can be seen [1]
> > bundled
> > > > OpenMP was not included in the Windows build anyway.
> > > >
> > > > Alternatively: always use iomp (with same symlinking trick though)
> > > provided
> > > > by MKLML distribution [2]. This potentially could work on Windows as
> > > well.
> > > >
> > > > Best
> > > > Anton
> > > >
> > > > [1]
> > > >
> > >
> >
> https://github.com/apache/incubator-mxnet/blob/8a63bdecf2d9f12d34fe5874957ae4c867eb5f5b/CMakeLists.txt#L408-L410
> > > > [2] https://github.com/intel/mkl-dnn/releases
> > > >
> > > > вт, 12 февр. 2019 г. в 11:22, Anton Chernov <me...@gmail.com>:
> > > >
> > > > > Recent benchmarking results have been published here [1].
> Experiments
> > > > > compare different OpenMP implementations as well as binaries
> compiled
> > > with
> > > > > different compilers including GCC, Clang and ICC.
> > > > >
> > > > > During experimentation another issues with mixing up libraries was
> > > > > identified and described here [2].
> > > > >
> > > > > Best
> > > > > Anton
> > > > >
> > > > > [1] https://cwiki.apache.org/confluence/x/2wclBg
> > > > > [2]
> > > > >
> > >
> >
> https://github.com/apache/incubator-mxnet/issues/14087#issuecomment-461734041
> > > > >
> > > > >
> > > > > вс, 9 дек. 2018 г. в 16:28, Anton Chernov <me...@gmail.com>:
> > > > >
> > > > >> Hi Chris,
> > > > >>
> > > > >> Following up on the issue, are all things resolved in the
> > discussion?
> > > > >>
> > > > >> If yes, I kindly ask you to reopen this PR and remove ‘requesting
> > > > >> changes’ status:
> > > > >> https://github.com/apache/incubator-mxnet/pull/12160
> > > > >>
> > > > >> Thank you.
> > > > >>
> > > > >>
> > > > >> Best
> > > > >> Anton
> > > > >>
> > > > >>
> > > > >> Tue, 27 Nov 2018 at 17:15, Anton Chernov <mechernov@gmail.com
> >:
> > > > >>
> > > > >>> Another thing to take into consideration:
> > > > >>>
> > > > >>> All python artefacts that are created (PyPi) are built with make
> > and
> > > are
> > > > >>> not using the bundled OpenMP library.
> > > > >>>
> > > > >>> One step for the switch to CMake to happen is the approval and
> > > merging
> > > > >>> of the mentioned PR:
> > > > >>>
> > > > >>> https://github.com/apache/incubator-mxnet/pull/12160
> > > > >>>
> > > > >>> If there are no other objections I kindly ask Chris Olivier to
> > remove
> > > > >>> his 'requesting changes' veto on it to unblock the CMake overhaul
> > > work.
> > > > >>>
> > > > >>> Thank you.
> > > > >>>
> > > > >>> Best
> > > > >>> Anton
> > > > >>>
> > > > >>> Thu, 22 Nov 2018 at 17:11, Anton Chernov <mechernov@gmail.com
> >:
> > > > >>>
> > > > >>>>
> > > > >>>> Thank you for your answer, Chris.
> > > > >>>>
> > > > >>>> > The whole “mixing omp libraries” is something that occurs in
> > > > >>>> production
> > > > >>>> every day and certainly in everything that uses mkl.
> > > > >>>>
> > > > >>>> I'm afraid this statement is wrong. Intel MKL-DNN strictly
> ensures
> > > that
> > > > >>>> this mixture is not happening:
> > > > >>>>
> > > > >>>> "Intel MKL-DNN uses OpenMP* for parallelism and requires an
> OpenMP
> > > > >>>> runtime library to work. As different OpenMP runtimes may not be
> > > binary
> > > > >>>> compatible it's important to ensure that only one OpenMP runtime
> > is
> > > used
> > > > >>>> throughout the application. Having more than one OpenMP runtime
> > > initialized
> > > > >>>> may lead to undefined behavior resulting in incorrect results or
> > > crashes."
> > > > >>>> [1]
> > > > >>>>
> > > > >>>> That is why 2 different MKLML libraries are provided:
> > > > >>>>
> > > > >>>> lib/libmklml_gnu.so  | Intel MKL small library for GNU* OpenMP
> > > runtime
> > > > >>>> lib/libmklml_intel.so | Intel MKL small library for Intel(R)
> > OpenMP
> > > > >>>> runtime
> > > > >>>>
> > > > >>>> > is the suggestion that libiomp be removed from mkl?
> > > > >>>>
> > > > >>>> That is certainly not my suggestion.
> > > > >>>>
> > > > >>>> > have you spoken with intel? have you consulted Intel at all?
> > > > >>>>
> > > > >>>> Yes, I have asked for comments on the issue.
> > > > >>>>
> > > > >>>> > “hard to debug random crash”. you’re seeing an assertion which
> > is
> > > > >>>> probably ...
> > > > >>>>
> > > > >>>> I'm seeing the result of undefined behaviour. And I want to put
> > > > >>>> emphasis on the following statement:
> > > > >>>>
> > > > >>>> Regardless of whether there is a particular reason for the
> > assert
> > > -
> > > > >>>> it is a result of behaviour that should not happen. There are
> > valid
> > > ways
> > > > >>>> to use llvm OpenMP in MXNet and the current way is not one
> of
> > > them.
> > > > >>>>
> > > > >>>> > The lack of root-causing the problem and knee-jerk solution
> here
> > > > >>>> makes me
> > > > >>>> uncomfortable.
> > > > >>>>
> > > > >>>> I hope that my efforts highlighting the problems help to mitigate
> > > > >>>> your discomfort.
> > > > >>>>
> > > > >>>> > if you want to see performance differences there’s an
> > environment
> > > > >>>> variable
> > > > >>>> you can set in the mxnet omp tuning code that will print
> overhead
> > > and
> > > > >>>> execution times for the current omp library.
> > > > >>>>
> > > > >>>> I don't want to see performance differences in the current
> OpenMP
> > > > >>>> library. I want to remove the current OpenMP library and use the
> > one
> > > > >>>> provided by the compiler.
> > > > >>>>
> > > > >>>>
> > > > >>>>
> > > > >>>> Best
> > > > >>>> Anton
> > > > >>>>
> > > > >>>> [1]
> > > https://github.com/intel/mkl-dnn/blame/master/README.md#L261-L265
> > > > >>>>
> > > > >>>> Thu, 22 Nov 2018 at 16:50, Chris Olivier <
> > cjolivier01@apache.org
> > > >:
> > > > >>>>
> > > > >>>>> Do you not work on CI mostly? My apologies for thinking that
> was
> > > some
> > > > >>>>> sort
> > > > >>>>> of team effort between you and a few others that were
> passionate
> > > about
> > > > >>>>> CI
> > > > >>>>> keeping the CI system running smoothly.
> > > > >>>>>
> > > > >>>>> You have source code, you have the line the assertion is on. If
> > you
> > > > >>>>> can’t
> > > > >>>>> describe what’s going wrong that causes the assertion, then I
> > don’t
> > > > >>>>> really
> > > > >>>>> have anything more to add to this conversation beyond what’s
> > below:
> > > > >>>>>
> > > > >>>>> The whole “mixing omp libraries” is something that occurs in
> > > production
> > > > >>>>> every day and certainly in everything that uses mkl.  It may
> > > > >>>>> occasionally
> > > > >>>>> cause problems for some edge cases when there is super-complex
> > > linking
> > > > >>>>> strategies and dynamic loading.  But this is not one of those
> > edge
> > > > >>>>> cases.
> > > > >>>>> Mostly blaming this is a red herring for other thread-related
> > > problems
> > > > >>>>> and
> > > > >>>>> people switch omp library and the timing of their code changes
> > and
> > > they
> > > > >>>>> stop seeing the problem. I’ve spent my entire career doing
> > heavily
> > > > >>>>> multithreaded c++ development and i’ve seen that a million
> times.
> > > is
> > > > >>>>> the
> > > > >>>>> suggestion that libiomp be removed from mkl? have you spoken
> with
> > > > >>>>> intel?
> > > > >>>>> have you consulted Intel at all?
> > > > >>>>>
> > > > >>>>> and what you are seeing isn’t some “hard to debug random
> crash”.
> > > you’re
> > > > >>>>> seeing an assertion which is probably related to omp trying to
> > > create a
> > > > >>>>> thread pool after a fork and something was done in the mxnet
> code
> > > to
> > > > >>>>> make
> > > > >>>>> that sketchy to do. I’d suggest filing an issue with the llvm
> > > openmp
> > > > >>>>> just
> > > > >>>>> like you’d file with any other not-well-understood behavior in
> > > mxnet.
> > > > >>>>>
> > > > >>>>> The lack of root-causing the problem and knee-jerk solution
> here
> > > makes
> > > > >>>>> me
> > > > >>>>> uncomfortable.
> > > > >>>>>
> > > > >>>>> if you want to see performance differences there’s an
> environment
> > > > >>>>> variable
> > > > >>>>> you can set in the mxnet omp tuning code that will print
> overhead
> > > and
> > > > >>>>> execution times for the current omp library.
> > > > >>>>>
> > > > >>>>>
> > > > >>>>>
> > > > >>>>>
> > > > >>>>>
> > > > >>>>>
> > > > >>>>>
> > > > >>>>> On Thu, Nov 22, 2018 at 7:12 AM Anton Chernov <
> > mechernov@gmail.com
> > > >
> > > > >>>>> wrote:
> > > > >>>>>
> > > > >>>>> > Hi Chris,
> > > > >>>>> >
> > > > >>>>> > Thank you for your answer. If you have noticed the initial
> > email
> > > > >>>>> comes from
> > > > >>>>> > me, Anton Chernov (@lebeg on Github) and thus the proposal is
> > not
> > > > >>>>> from any
> > > > >>>>> > 'Ci' team that you've mentioned, but from me personally.
> > > > >>>>> >
> > > > >>>>> > You are writing:
> > > > >>>>> >
> > > > >>>>> > > someone is doing something unhealthy when they fork ...
> > > > >>>>> >
> > > > >>>>> > I'm missing any context to understand what you mean.
> > > > >>>>> >
> > > > >>>>> > > we get a lot of performance gain from OMP ...
> > > > >>>>> >
> > > > >>>>> > There is no data that would prove this statement and
> therefore
> > > it is
> > > > >>>>> a
> > > > >>>>> > random guess.
> > > > >>>>> >
> > > > >>>>> > > in many months, no investigation has occurred as to WHY the
> > > > >>>>> assertion is
> > > > >>>>> > failing.
> > > > >>>>> >
> > > > >>>>> > The investigation has concluded that this is happening due to
> > > > >>>>> undefined
> > > > >>>>> > behaviour which is, in my opinion, a sufficient answer that
> does
> > > not
> > > > >>>>> require
> > > > >>>>> > to go any deeper.
> > > > >>>>> >
> > > > >>>>> > > the pr is vetoed until such a time that the actual root
> cause
> > > of
> > > > >>>>> the
> > > > >>>>> > problem is known.
> > > > >>>>> >
> > > > >>>>> > And considering the statements above there is no valid reason
> > to
> > > > >>>>> veto the
> > > > >>>>> > PR.
> > > > >>>>> >
> > > > >>>>> >
> > > > >>>>> > Best
> > > > >>>>> > Anton
> > > > >>>>> >
> > > > >>>>> > Thu, 22 Nov 2018 at 15:38, Chris Olivier <
> > > cjolivier01@gmail.com>:
> > > > >>>>> >
> > > > >>>>> > > 3x less overhead*
> > > > >>>>> > >
> > > > >>>>> > > On Thu, Nov 22, 2018 at 6:25 AM Chris Olivier <
> > > > >>>>> cjolivier01@gmail.com>
> > > > >>>>> > > wrote:
> > > > >>>>> > >
> > > > >>>>> > > > someone is doing something unhealthy when they fork,
> which
> > is
> > > > >>>>> causing
> > > > >>>>> > an
> > > > >>>>> > > > assertion in the openmp library. the same assertion that
> > > would
> > > > >>>>> fire in
> > > > >>>>> > > mkl,
> > > > >>>>> > > > which is linked to libiomp5 (exact same omp library).
> this
> > > is new
> > > > >>>>> > > behavior
> > > > >>>>> > > > and most likely due to an error or suboptimal approach in
> > the
> > > > >>>>> forking
> > > > >>>>> > > logic
> > > > >>>>> > > > in mxnet.
> > > > >>>>> > > >
> > > > >>>>> > > > in order to circumvent the assert, the Ci team is
> proposing
> > > to
> > > > >>>>> remove
> > > > >>>>> > the
> > > > >>>>> > > > library completely which is equivalent to cutting off
> your
> > > leg
> > > > >>>>> to make
> > > > >>>>> > > the
> > > > >>>>> > > > pain from stubbing your toe go away.
> > > > >>>>> > > >
> > > > >>>>> > > > we get a lot of performance gain from OMP. it has about a
> > 1/3
> > > > >>>>> less
> > > > >>>>> > > > overhead for entering omp regions and also supports omp
> > > regions
> > > > >>>>> after a
> > > > >>>>> > > > fork, which libgomp does not.
> > > > >>>>> > > >
> > > > >>>>> > > > in many months, no investigation has occurred as to WHY
> the
> > > > >>>>> assertion
> > > > >>>>> > is
> > > > >>>>> > > > failing.
> > > > >>>>> > > >
> > > > >>>>> > > > the pr is vetoed until such a time that the actual root
> > > cause of
> > > > >>>>> the
> > > > >>>>> > > > problem is known.
> > > > >>>>> > > >
> > > > >>>>> > > >
> > > > >>>>> > > > thanks,
> > > > >>>>> > > >
> > > > >>>>> > > > -Chris.
> > > > >>>>> > > >
> > > > >>>>> > > >
> > > > >>>>> > > >
> > > > >>>>> > > >
> > > > >>>>> > > > On Thu, Nov 22, 2018 at 4:36 AM Anton Chernov <
> > > > >>>>> mechernov@gmail.com>
> > > > >>>>> > > wrote:
> > > > >>>>> > > >
> > > > >>>>> > > >> Dear MXNet community,
> > > > >>>>> > > >>
> > > > >>>>> > > >> I would like to drive attention to an important issue
> that
> > > is
> > > > >>>>> present
> > > > >>>>> > in
> > > > >>>>> > > >> the MXNet CMake build: usage of bundled llvm OpenMP
> > library.
> > > > >>>>> > > >>
> > > > >>>>> > > >> I have opened a PR to remove it:
> > > > >>>>> > > >> https://github.com/apache/incubator-mxnet/pull/12160
> > > > >>>>> > > >>
> > > > >>>>> > > >> The issue was closed, but I am strong in my opinion
> that
> > > it's
> > > > >>>>> the
> > > > >>>>> > right
> > > > >>>>> > > >> thing to do.
> > > > >>>>> > > >>
> > > > >>>>> > > >> *Background*
> > > > >>>>> > > >> If you want to use OpenMP pragmas in your code for
> > > > >>>>> parallelization you
> > > > >>>>> > > >> would supply a special flag to the compiler:
> > > > >>>>> > > >>
> > > > >>>>> > > >> - Clang / -fopenmp
> > > > >>>>> > > >> https://openmp.llvm.org/
> > > > >>>>> > > >>
> > > > >>>>> > > >> - GCC / -fopenmp
> > > > >>>>> > > >>
> > https://gcc.gnu.org/onlinedocs/libgomp/Enabling-OpenMP.html
> > > > >>>>> > > >>
> > > > >>>>> > > >> - Intel / [Q]openmp
> > > > >>>>> > > >>
> > > > >>>>> > > >>
> > > > >>>>> > >
> > > > >>>>> >
> > > > >>>>>
> > >
> >
> https://software.intel.com/en-us/node/522689#6E24682E-F411-4AE3-A04D-ECD81C7008D1
> > > > >>>>> > > >>
> > > > >>>>> > > >> - Visual Studio: /openmp (Enable OpenMP 2.0 Support)
> > > > >>>>> > > >> https://msdn.microsoft.com/en-us/library/tt15eb9t.aspx
> > > > >>>>> > > >>
> > > > >>>>> > > >> Each of the compilers would enable the '#pragma omp'
> > > directive
> > > > >>>>> during
> > > > >>>>> > > >> C/C++
> > > > >>>>> > > >> compilation and arrange for automatic linking of the
> > OpenMP
> > > > >>>>> runtime
> > > > >>>>> > > >> library
> > > > >>>>> > > >> supplied by each compiler separately.
> > > > >>>>> > > >>
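> > > > >>>>> > > >> As a minimal illustration (a toy example, not MXNet code), the
> > > > >>>>> > > >> same source file containing a '#pragma omp parallel' region ends
> > > > >>>>> > > >> up linked against a different runtime depending on the compiler:
> > > > >>>>> > > >>
> > > > >>>>> > > >>     $ gcc   -fopenmp hello_omp.c -o hello_gcc    # links libgomp
> > > > >>>>> > > >>     $ clang -fopenmp hello_omp.c -o hello_clang  # links libomp
> > > > >>>>> > > >>     $ icc   -qopenmp hello_omp.c -o hello_icc    # links libiomp5
> > > > >>>>> > > >>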
> > > > >>>>> > > >> Thus, to use the advantages of an OpenMP implementation
> > one
> > > has
> > > > >>>>> to
> > > > >>>>> > > compile
> > > > >>>>> > > >> the code with the corresponding compiler.
> > > > >>>>> > > >>
> > > > >>>>> > > >> Currently, in MXNet CMake build scripts a bundled
> version
> > of
> > > > >>>>> llvm
> > > > >>>>> > OpenMP
> > > > >>>>> > > >> is
> > > > >>>>> > > >> used ([1] and [2]) to replace the OpenMP library
> supplied
> > > by the
> > > > >>>>> > > compiler.
> > > > >>>>> > > >>
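> > > > >>>>> > > >> What I am proposing essentially amounts to relying on CMake's
> > > > >>>>> > > >> standard OpenMP detection instead, roughly (a sketch of the idea,
> > > > >>>>> > > >> not the exact diff of the PR):
> > > > >>>>> > > >>
> > > > >>>>> > > >>     # let CMake find the OpenMP runtime shipped with the chosen compiler
> > > > >>>>> > > >>     find_package(OpenMP REQUIRED)
> > > > >>>>> > > >>     # target name used for illustration only
> > > > >>>>> > > >>     target_link_libraries(mxnet PUBLIC OpenMP::OpenMP_CXX)
> > > > >>>>> > > >>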
> > > > >>>>> > > >> I will quote here the README from the MKL-DNN (Intel(R)
> > Math
> > > > >>>>> Kernel
> > > > >>>>> > > >> Library
> > > > >>>>> > > >> for Deep Neural Networks):
> > > > >>>>> > > >>
> > > > >>>>> > > >> "Intel MKL-DNN uses OpenMP* for parallelism and requires
> > an
> > > > >>>>> OpenMP
> > > > >>>>> > > runtime
> > > > >>>>> > > >> library to work. As different OpenMP runtimes may not be
> > > binary
> > > > >>>>> > > compatible
> > > > >>>>> > > >> it's important to ensure that only one OpenMP runtime is
> > > used
> > > > >>>>> > throughout
> > > > >>>>> > > >> the application. Having more than one OpenMP runtime
> > > > >>>>> initialized may
> > > > >>>>> > > lead
> > > > >>>>> > > >> to undefined behavior resulting in incorrect results or
> > > > >>>>> crashes." [3]
> > > > >>>>> > > >>
> > > > >>>>> > > >> And:
> > > > >>>>> > > >>
> > > > >>>>> > > >> "Using GNU compiler with -fopenmp and -liomp5 options
> will
> > > link
> > > > >>>>> the
> > > > >>>>> > > >> application with both Intel and GNU OpenMP runtime
> > > libraries.
> > > > >>>>> This
> > > > >>>>> > will
> > > > >>>>> > > >> lead to undefined behavior of the application." [4]
> > > > >>>>> > > >>
> > > > >>>>> > > >> As can be seen from ldd for MXNet:
> > > > >>>>> > > >>
> > > > >>>>> > > >> $ ldd build/tests/mxnet_unit_tests | grep omp
> > > > >>>>> > > >>     libomp.so =>
> > > > >>>>> > /.../mxnet/build/3rdparty/openmp/runtime/src/libomp.so
> > > > >>>>> > > >> (0x00007f697bc55000)
> > > > >>>>> > > >>     libgomp.so.1 =>
> /usr/lib/x86_64-linux-gnu/libgomp.so.1
> > > > >>>>> > > >> (0x00007f69660cd000)
> > > > >>>>> > > >>
> > > > >>>>> > > >> *Performance*
> > > > >>>>> > > >>
> > > > >>>>> > > >> The only performance data related to OpenMP in MXNet I
> was
> > > able
> > > > >>>>> to
> > > > >>>>> > find
> > > > >>>>> > > is
> > > > >>>>> > > >> here:
> > > > >>>>> > > >>
> > > > >>>>> > > >>
> > > > >>>>> > >
> > > > >>>>> >
> > > > >>>>>
> > >
> >
> https://github.com/apache/incubator-mxnet/issues/9744#issuecomment-367711172
> > > > >>>>> > > >>
> > > > >>>>> > > >> Which in my understanding is testing the impact of different
> > > > >>>>> environment
> > > > >>>>> > > >> variables for the same setup (using same bundled OpenMP
> > > > >>>>> library).
> > > > >>>>> > > >>
> > > > >>>>> > > >> The libraries may differ in implementation and the
> Thread
> > > > >>>>> Affinity
> > > > >>>>> > > >> Interface [5] may have significant impact on
> performance.
> > > > >>>>> > > >>
> > > > >>>>> > > >> All compilers support it (a usage sketch follows this list):
> > > > >>>>> > > >>
> > > > >>>>> > > >> - Clang / KMP_AFFINITY
> > > > >>>>> > > >>
> > > > >>>>> > > >>
> > > > >>>>> > >
> > > > >>>>> >
> > > > >>>>>
> > >
> >
> https://github.com/clang-ykt/openmp/blob/master/runtime/src/kmp_affinity.cpp
> > > > >>>>> > > >>
> > > > >>>>> > > >> - GCC / GOMP_CPU_AFFINITY
> > > > >>>>> > > >>
> > > > >>>>> > > >>
> > > > >>>>> > >
> > > > >>>>> >
> > > > >>>>>
> > >
> >
> https://gcc.gnu.org/onlinedocs/gcc-4.7.1/libgomp/GOMP_005fCPU_005fAFFINITY.html
> > > > >>>>> > > >>
> > > > >>>>> > > >> - Intel / KMP_AFFINITY
> > > > >>>>> > > >>
> > > > >>>>> > > >>
> > > > >>>>> > >
> > > > >>>>> >
> > > > >>>>>
> > >
> >
> https://software.intel.com/en-us/node/522689#6E24682E-F411-4AE3-A04D-ECD81C7008D1
> > > > >>>>> > > >>
> > > > >>>>> > > >> - Visual Studio / SetThreadAffinityMask
> > > > >>>>> > > >>
> > > > >>>>> > > >>
> > > > >>>>> > >
> > > > >>>>> >
> > > > >>>>>
> > >
> >
> https://docs.microsoft.com/en-us/windows/desktop/api/winbase/nf-winbase-setthreadaffinitymask
> > > > >>>>> > > >>
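> > > > >>>>> > > >> For example, hypothetical benchmark runs with thread pinning
> > > > >>>>> > > >> would be configured differently depending on the runtime in use:
> > > > >>>>> > > >>
> > > > >>>>> > > >>     # llvm / Intel OpenMP runtimes
> > > > >>>>> > > >>     $ OMP_NUM_THREADS=8 KMP_AFFINITY=granularity=fine,compact python benchmark.py
> > > > >>>>> > > >>
> > > > >>>>> > > >>     # GNU libgomp
> > > > >>>>> > > >>     $ OMP_NUM_THREADS=8 GOMP_CPU_AFFINITY="0-7" python benchmark.py
> > > > >>>>> > > >>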
> > > > >>>>> > > >> *Issues*
> > > > >>>>> > > >>
> > > > >>>>> > > >> Failed OpenMP assertion when loading MXNet compiled with
> > > DEBUG=1
> > > > >>>>> > > >> https://github.com/apache/incubator-mxnet/issues/10856
> > > > >>>>> > > >>
> > > > >>>>> > > >> libomp.so dependency (need REAL fix)
> > > > >>>>> > > >> https://github.com/apache/incubator-mxnet/issues/11417
> > > > >>>>> > > >>
> > > > >>>>> > > >> mxnet-mkl (v0.12.0) crash when using (conda-installed)
> > numpy
> > > > >>>>> with MKL
> > > > >>>>> > > >> https://github.com/apache/incubator-mxnet/issues/8532
> > > > >>>>> > > >>
> > > > >>>>> > > >> Performance regression when OMP_NUM_THREADS environment
> > > > >>>>> variable is
> > > > >>>>> > not
> > > > >>>>> > > >> set
> > > > >>>>> > > >> https://github.com/apache/incubator-mxnet/issues/9744
> > > > >>>>> > > >>
> > > > >>>>> > > >> Poor concat CPU performance on CUDA builds
> > > > >>>>> > > >> https://github.com/apache/incubator-mxnet/issues/11905
> > > > >>>>> > > >>
> > > > >>>>> > > >> I would appreciate hearing your thoughts.
> > > > >>>>> > > >>
> > > > >>>>> > > >>
> > > > >>>>> > > >> Best
> > > > >>>>> > > >> Anton
> > > > >>>>> > > >>
> > > > >>>>> > > >> [1]
> > > > >>>>> > > >>
> > > > >>>>> > > >>
> > > > >>>>> > >
> > > > >>>>> >
> > > > >>>>>
> > >
> >
> https://github.com/apache/incubator-mxnet/blob/master/CMakeLists.txt#L400-L405
> > > > >>>>> > > >> [2]
> > > > >>>>> https://github.com/apache/incubator-mxnet/tree/master/3rdparty
> > > > >>>>> > > >> [3]
> > > > >>>>>
> > https://github.com/intel/mkl-dnn/blame/master/README.md#L261-L265
> > > > >>>>> > > >> [4]
> > > > >>>>>
> > https://github.com/intel/mkl-dnn/blame/master/README.md#L278-L280
> > > > >>>>> > > >> [5] https://software.intel.com/en-us/node/522691
> > > > >>>>> > > >>
> > > > >>>>> > > >
> > > > >>>>> > >
> > > > >>>>> >
> > > > >>>>>
> > > > >>>>
> > >
> >
>

Re: [Discussion] Remove bundled llvm OpenMP

Posted by Aaron Markham <aa...@gmail.com>.
I reopened it for you.

On Wed, May 22, 2019, 05:25 Anton Chernov <me...@gmail.com> wrote:

> I don't have the necessary rights to reopen this PR.
>
> Mon, 20 May 2019 at 08:00, Pedro Larroy <pe...@gmail.com>:
>
> > Hi Anton, Stas.
> >
> > Can we reopen this PR and get it merged as per the data collected by
> Stas?
> >
> > https://github.com/apache/incubator-mxnet/pull/12160
> >
> >
> >
> https://cwiki.apache.org/confluence/display/MXNET/Benchmarking+MXNet+with+different+OpenMP+implementations
> >
> > There are multiple issues that will be fixed by solving this problem.
> >
> >
> > Pedro
> >
> > On Tue, Feb 12, 2019 at 4:54 AM Anton Chernov <me...@gmail.com>
> wrote:
> > >
> > > I would like to propose a possible alternative solution for
> > consideration.
> > >
> > > If keeping llvm OpenMP as a submodule is inevitable one could make
> > > following adjustments:
> > >
> > > Since compilers try to find their own OpenMP library implicitly, MXNet
> > > needs to ensure that only the bundled version is found. Therefore
> during
> > > the build and also during deployment this library has to provide
> symlinks
> > > for each possible compiler that would link to the built artifact ie.
> > >
> > > libiomp.so -> libgomp.so -> libomp.so
> > >
> > > The MKLML iomp would need to be hidden and removed as well.
> > >
> > > On Windows it would be a different story, but as can be seen [1]
> bundled
> > > OpenMP was not included in the Windows build anyway.
> > >
> > > Alternatively: always use iomp (with same symlinking trick though)
> > provided
> > > by MKLML distribution [2]. This potentially could work on Windows as
> > well.
> > >
> > > Best
> > > Anton
> > >
> > > [1]
> > >
> >
> https://github.com/apache/incubator-mxnet/blob/8a63bdecf2d9f12d34fe5874957ae4c867eb5f5b/CMakeLists.txt#L408-L410
> > > [2] https://github.com/intel/mkl-dnn/releases
> > >
> > > Tue, 12 Feb 2019 at 11:22, Anton Chernov <me...@gmail.com>:
> > >
> > > > Recent benchmarking results have been published here [1]. Experiments
> > > > compare different OpenMP implementations as well as binaries compiled
> > with
> > > > different compilers including GCC, Clang and ICC.
> > > >
> > > > During experimentation another issue with mixing up libraries was
> > > > identified and described here [2].
> > > >
> > > > Best
> > > > Anton
> > > >
> > > > [1] https://cwiki.apache.org/confluence/x/2wclBg
> > > > [2]
> > > >
> >
> https://github.com/apache/incubator-mxnet/issues/14087#issuecomment-461734041
> > > >
> > > >
> > > > Sun, 9 Dec 2018 at 16:28, Anton Chernov <me...@gmail.com>:
> > > >
> > > >> Hi Chris,
> > > >>
> > > >> Following up on the issue, are all things resolved in the
> discussion?
> > > >>
> > > >> If yes, I kindly ask you to reopen this PR and remove ‘requesting
> > > >> changes’ status:
> > > >> https://github.com/apache/incubator-mxnet/pull/12160
> > > >>
> > > >> Thank you.
> > > >>
> > > >>
> > > >> Best
> > > >> Anton
> > > >>
> > > >>
> > > >> Tue, 27 Nov 2018 at 17:15, Anton Chernov <me...@gmail.com>:
> > > >>
> > > >>> Another thing to take into consideration:
> > > >>>
> > > >>> All python artefacts that are created (PyPi) are built with make
> and
> > are
> > > >>> not using the bundled OpenMP library.
> > > >>>
> > > >>> One step for the switch to CMake to happen is the approval and
> > merging
> > > >>> of the mentioned PR:
> > > >>>
> > > >>> https://github.com/apache/incubator-mxnet/pull/12160
> > > >>>
> > > >>> If there are no other objections I kindly ask Chris Olivier to
> remove
> > > >>> his 'requesting changes' veto on it to unblock the CMake overhaul
> > work.
> > > >>>
> > > >>> Thank you.
> > > >>>
> > > >>> Best
> > > >>> Anton
> > > >>>
> > > >>> Thu, 22 Nov 2018 at 17:11, Anton Chernov <me...@gmail.com>:
> > > >>>
> > > >>>>
> > > >>>> Thank you for your answer, Chris.
> > > >>>>
> > > >>>> > The whole “mixing omp libraries” is something that occurs in
> > > >>>> production
> > > >>>> every day and certainly in everything that uses mkl.
> > > >>>>
> > > >>>> I'm afraid this statement is wrong. Intel MKL-DNN strictly ensures
> > that
> > > >>>> this mixture is not happening:
> > > >>>>
> > > >>>> "Intel MKL-DNN uses OpenMP* for parallelism and requires an OpenMP
> > > >>>> runtime library to work. As different OpenMP runtimes may not be
> > binary
> > > >>>> compatible it's important to ensure that only one OpenMP runtime
> is
> > used
> > > >>>> throughout the application. Having more than one OpenMP runtime
> > initialized
> > > >>>> may lead to undefined behavior resulting in incorrect results or
> > crashes."
> > > >>>> [1]
> > > >>>>
> > > >>>> That is why 2 different MKLML libraries are provided:
> > > >>>>
> > > >>>> lib/libmklml_gnu.so  | Intel MKL small library for GNU* OpenMP
> > runtime
> > > >>>> lib/libmklml_intel.so | Intel MKL small library for Intel(R)
> OpenMP
> > > >>>> runtime
> > > >>>>
> > > >>>> > is the suggestion that libiomp be removed from mkl?
> > > >>>>
> > > >>>> That is certainly not my suggestion.
> > > >>>>
> > > >>>> > have you spoken with intel? have you consulted Intel at all?
> > > >>>>
> > > >>>> Yes, I have asked for comments on the issue.
> > > >>>>
> > > >>>> > “hard to debug random crash”. you’re seeing an assertion which
> is
> > > >>>> probably ...
> > > >>>>
> > > >>>> I'm seeing the result of undefined behaviour. And I want to put
> > > >>>> emphasis on the following statement:
> > > >>>>
> > > >>>> Regardless of whether there is a particular reason for the
> assert
> > -
> > > >>>> it is a result of behaviour that should not happen. There are
> valid
> > ways
> > > >>>> to use llvm OpenMP in MXNet and the current way is not one of
> > them.
> > > >>>>
> > > >>>> > The lack of root-causing the problem and knee-jerk solution here
> > > >>>> makes me
> > > >>>> uncomfortable.
> > > >>>>
> > > >>>> I hope that my efforts highlighting the problems help to mitigate
> > > >>>> your discomfort.
> > > >>>>
> > > >>>> > if you want to see performance differences there’s an
> environment
> > > >>>> variable
> > > >>>> you can set in the mxnet omp tuning code that will print overhead
> > and
> > > >>>> execution times for the current omp library.
> > > >>>>
> > > >>>> I don't want to see performance differences in the current OpenMP
> > > >>>> library. I want to remove the current OpenMP library and use the
> one
> > > >>>> provided by the compiler.
> > > >>>>
> > > >>>>
> > > >>>>
> > > >>>> Best
> > > >>>> Anton
> > > >>>>
> > > >>>> [1]
> > https://github.com/intel/mkl-dnn/blame/master/README.md#L261-L265
> > > >>>>
> > > >>>> Thu, 22 Nov 2018 at 16:50, Chris Olivier <
> cjolivier01@apache.org
> > >:
> > > >>>>
> > > >>>>> Do you not work on CI mostly? My apologies for thinking that was
> > some
> > > >>>>> sort
> > > >>>>> of team effort between you and a few others that were passionate
> > about
> > > >>>>> CI
> > > >>>>> keeping the CI system running smoothly.
> > > >>>>>
> > > >>>>> You have source code, you have the line the assertion is on. If
> you
> > > >>>>> can’t
> > > >>>>> describe what’s going wrong that causes the assertion, then I
> don’t
> > > >>>>> really
> > > >>>>> have anything more to add to this conversation beyond what’s
> below:
> > > >>>>>
> > > >>>>> The whole “mixing omp libraries” is something that occurs in
> > production
> > > >>>>> every day and certainly in everything that uses mkl.  It may
> > > >>>>> occasionally
> > > >>>>> cause problems for some edge cases when there is super-complex
> > linking
> > > >>>>> strategies and dynamic loading.  But this is not one of those
> edge
> > > >>>>> cases.
> > > >>>>> Mostly blaming this is a red herring for other thread-related
> > problems
> > > >>>>> and
> > > >>>>> people switch omp library and the timing of their code changes
> and
> > they
> > > >>>>> stop seeing the problem. I’ve spent my entire career doing
> heavily
> > > >>>>> multithreaded c++ development and i’ve seen that a million times.
> > is
> > > >>>>> the
> > > >>>>> suggestion that libiomp be removed from mkl? have you spoken with
> > > >>>>> intel?
> > > >>>>> have you consulted Intel at all?
> > > >>>>>
> > > >>>>> and what you are seeing isn’t some “hard to debug random crash”.
> > you’re
> > > >>>>> seeing an assertion which is probably related to omp trying to
> > create a
> > > >>>>> thread pool after a fork and something was done in the mxnet code
> > to
> > > >>>>> make
> > > >>>>> that sketchy to do. I’d suggest filing an issue with the llvm
> > openmp
> > > >>>>> just
> > > >>>>> like you’d file with any other not-well-understood behavior in
> > mxnet.
> > > >>>>>
> > > >>>>> The lack of root-causing the problem and knee-jerk solution here
> > makes
> > > >>>>> me
> > > >>>>> uncomfortable.
> > > >>>>>
> > > >>>>> if you want to see performance differences there’s an environment
> > > >>>>> variable
> > > >>>>> you can set in the mxnet omp tuning code that will print overhead
> > and
> > > >>>>> execution times for the current omp library.
> > > >>>>>
> > > >>>>>
> > > >>>>>
> > > >>>>>
> > > >>>>>
> > > >>>>>
> > > >>>>>
> > > >>>>> On Thu, Nov 22, 2018 at 7:12 AM Anton Chernov <
> mechernov@gmail.com
> > >
> > > >>>>> wrote:
> > > >>>>>
> > > >>>>> > Hi Chris,
> > > >>>>> >
> > > >>>>> > Thank you for your answer. If you have noticed the initial
> email
> > > >>>>> comes from
> > > >>>>> > me, Anton Chernov (@lebeg on Github) and thus the proposal is
> not
> > > >>>>> from any
> > > >>>>> > 'Ci' team that you've mentioned, but from me personally.
> > > >>>>> >
> > > >>>>> > You are writing:
> > > >>>>> >
> > > >>>>> > > someone is doing something unhealthy when they fork ...
> > > >>>>> >
> > > >>>>> > I'm missing any context to understand what you mean.
> > > >>>>> >
> > > >>>>> > > we get a lot of performance gain from OMP ...
> > > >>>>> >
> > > >>>>> > There is no data that would prove this statement and therefore
> > it is
> > > >>>>> a
> > > >>>>> > random guess.
> > > >>>>> >
> > > >>>>> > > in many months, no investigation has occurred as to WHY the
> > > >>>>> assertion is
> > > >>>>> > failing.
> > > >>>>> >
> > > >>>>> > The investigation has concluded that this is happening due to
> > > >>>>> undefined
> > > >>>>> > behaviour which is, in my opinion, a sufficient answer that does
> > not
> > > >>>>> require
> > > >>>>> > to go any deeper.
> > > >>>>> >
> > > >>>>> > > the pr is vetoed until such a time that the actual root cause
> > of
> > > >>>>> the
> > > >>>>> > problem is known.
> > > >>>>> >
> > > >>>>> > And considering the statements above there is no valid reason
> to
> > > >>>>> veto the
> > > >>>>> > PR.
> > > >>>>> >
> > > >>>>> >
> > > >>>>> > Best
> > > >>>>> > Anton
> > > >>>>> >
> > > >>>>> > Thu, 22 Nov 2018 at 15:38, Chris Olivier <
> > cjolivier01@gmail.com>:
> > > >>>>> >
> > > >>>>> > > 3x less overhead*
> > > >>>>> > >
> > > >>>>> > > On Thu, Nov 22, 2018 at 6:25 AM Chris Olivier <
> > > >>>>> cjolivier01@gmail.com>
> > > >>>>> > > wrote:
> > > >>>>> > >
> > > >>>>> > > > someone is doing something unhealthy when they fork, which
> is
> > > >>>>> causing
> > > >>>>> > an
> > > >>>>> > > > assertion in the openmp library. the same assertion that
> > would
> > > >>>>> fire in
> > > >>>>> > > mkl,
> > > >>>>> > > > which is linked to libiomp5 (exact same omp library). this
> > is new
> > > >>>>> > > behavior
> > > >>>>> > > > and most likely due to an error or suboptimal approach in
> the
> > > >>>>> forking
> > > >>>>> > > logic
> > > >>>>> > > > in mxnet.
> > > >>>>> > > >
> > > >>>>> > > > in order to circumvent the assert, the Ci team is proposing
> > to
> > > >>>>> remove
> > > >>>>> > the
> > > >>>>> > > > library completely which is equivalent to cutting off your
> > leg
> > > >>>>> to make
> > > >>>>> > > the
> > > >>>>> > > > pain from stubbing your toe go away.
> > > >>>>> > > >
> > > >>>>> > > > we get a lot of performance gain from OMP. it has about a 1/3
> 1/3
> > > >>>>> less
> > > >>>>> > > > overhead for entering omp regions and also supports omp
> > regions
> > > >>>>> after a
> > > >>>>> > > > fork, which libgomp does not.
> > > >>>>> > > >
> > > >>>>> > > > in many months, no investigation has occurred as to WHY the
> > > >>>>> assertion
> > > >>>>> > is
> > > >>>>> > > > failing.
> > > >>>>> > > >
> > > >>>>> > > > the pr is vetoed until such a time that the actual root
> > cause of
> > > >>>>> the
> > > >>>>> > > > problem is known.
> > > >>>>> > > >
> > > >>>>> > > >
> > > >>>>> > > > thanks,
> > > >>>>> > > >
> > > >>>>> > > > -Chris.
> > > >>>>> > > >
> > > >>>>> > > >
> > > >>>>> > > >
> > > >>>>> > > >
> > > >>>>> > > > On Thu, Nov 22, 2018 at 4:36 AM Anton Chernov <
> > > >>>>> mechernov@gmail.com>
> > > >>>>> > > wrote:
> > > >>>>> > > >
> > > >>>>> > > >> Dear MXNet community,
> > > >>>>> > > >>
> > > >>>>> > > >> I would like to drive attention to an important issue that
> > is
> > > >>>>> present
> > > >>>>> > in
> > > >>>>> > > >> the MXNet CMake build: usage of bundled llvm OpenMP
> library.
> > > >>>>> > > >>
> > > >>>>> > > >> I have opened a PR to remove it:
> > > >>>>> > > >> https://github.com/apache/incubator-mxnet/pull/12160
> > > >>>>> > > >>
> > > >>>>> > > >> The issue was closed, but I am strong in my opinion that
> > it's
> > > >>>>> the
> > > >>>>> > right
> > > >>>>> > > >> thing to do.
> > > >>>>> > > >>
> > > >>>>> > > >> *Background*
> > > >>>>> > > >> If you want to use OpenMP pragmas in your code for
> > > >>>>> parallelization you
> > > >>>>> > > >> would supply a special flag to the compiler:
> > > >>>>> > > >>
> > > >>>>> > > >> - Clang / -fopenmp
> > > >>>>> > > >> https://openmp.llvm.org/
> > > >>>>> > > >>
> > > >>>>> > > >> - GCC / -fopenmp
> > > >>>>> > > >>
> https://gcc.gnu.org/onlinedocs/libgomp/Enabling-OpenMP.html
> > > >>>>> > > >>
> > > >>>>> > > >> - Intel / [Q]openmp
> > > >>>>> > > >>
> > > >>>>> > > >>
> > > >>>>> > >
> > > >>>>> >
> > > >>>>>
> >
> https://software.intel.com/en-us/node/522689#6E24682E-F411-4AE3-A04D-ECD81C7008D1
> > > >>>>> > > >>
> > > >>>>> > > >> - Visual Studio: /openmp (Enable OpenMP 2.0 Support)
> > > >>>>> > > >> https://msdn.microsoft.com/en-us/library/tt15eb9t.aspx
> > > >>>>> > > >>
> > > >>>>> > > >> Each of the compilers would enable the '#pragma omp'
> > directive
> > > >>>>> during
> > > >>>>> > > >> C/C++
> > > >>>>> > > >> compilation and arrange for automatic linking of the
> OpenMP
> > > >>>>> runtime
> > > >>>>> > > >> library
> > > >>>>> > > >> supplied by each compiler separately.
> > > >>>>> > > >>
> > > >>>>> > > >> Thus, to use the advantages of an OpenMP implementation
> one
> > has
> > > >>>>> to
> > > >>>>> > > compile
> > > >>>>> > > >> the code with the corresponding compiler.
> > > >>>>> > > >>
> > > >>>>> > > >> Currently, in MXNet CMake build scripts a bundled version
> of
> > > >>>>> llvm
> > > >>>>> > OpenMP
> > > >>>>> > > >> is
> > > >>>>> > > >> used ([1] and [2]) to replace the OpenMP library supplied
> > by the
> > > >>>>> > > compiler.
> > > >>>>> > > >>
> > > >>>>> > > >> I will quote here the README from the MKL-DNN (Intel(R)
> Math
> > > >>>>> Kernel
> > > >>>>> > > >> Library
> > > >>>>> > > >> for Deep Neural Networks):
> > > >>>>> > > >>
> > > >>>>> > > >> "Intel MKL-DNN uses OpenMP* for parallelism and requires
> an
> > > >>>>> OpenMP
> > > >>>>> > > runtime
> > > >>>>> > > >> library to work. As different OpenMP runtimes may not be
> > binary
> > > >>>>> > > compatible
> > > >>>>> > > >> it's important to ensure that only one OpenMP runtime is
> > used
> > > >>>>> > throughout
> > > >>>>> > > >> the application. Having more than one OpenMP runtime
> > > >>>>> initialized may
> > > >>>>> > > lead
> > > >>>>> > > >> to undefined behavior resulting in incorrect results or
> > > >>>>> crashes." [3]
> > > >>>>> > > >>
> > > >>>>> > > >> And:
> > > >>>>> > > >>
> > > >>>>> > > >> "Using GNU compiler with -fopenmp and -liomp5 options will
> > link
> > > >>>>> the
> > > >>>>> > > >> application with both Intel and GNU OpenMP runtime
> > libraries.
> > > >>>>> This
> > > >>>>> > will
> > > >>>>> > > >> lead to undefined behavior of the application." [4]
> > > >>>>> > > >>
> > > >>>>> > > >> As can be seen from ldd for MXNet:
> > > >>>>> > > >>
> > > >>>>> > > >> $ ldd build/tests/mxnet_unit_tests | grep omp
> > > >>>>> > > >>     libomp.so =>
> > > >>>>> > /.../mxnet/build/3rdparty/openmp/runtime/src/libomp.so
> > > >>>>> > > >> (0x00007f697bc55000)
> > > >>>>> > > >>     libgomp.so.1 => /usr/lib/x86_64-linux-gnu/libgomp.so.1
> > > >>>>> > > >> (0x00007f69660cd000)
> > > >>>>> > > >>
> > > >>>>> > > >> *Performance*
> > > >>>>> > > >>
> > > >>>>> > > >> The only performance data related to OpenMP in MXNet I was
> > able
> > > >>>>> to
> > > >>>>> > find
> > > >>>>> > > is
> > > >>>>> > > >> here:
> > > >>>>> > > >>
> > > >>>>> > > >>
> > > >>>>> > >
> > > >>>>> >
> > > >>>>>
> >
> https://github.com/apache/incubator-mxnet/issues/9744#issuecomment-367711172
> > > >>>>> > > >>
> > > >>>>> > > >> Which in my understanding is testing the impact of different
> > > >>>>> environment
> > > >>>>> > > >> variables for the same setup (using same bundled OpenMP
> > > >>>>> library).
> > > >>>>> > > >>
> > > >>>>> > > >> The libraries may differ in implementation and the Thread
> > > >>>>> Affinity
> > > >>>>> > > >> Interface [5] may have significant impact on performance.
> > > >>>>> > > >>
> > > >>>>> > > >> All compilers support it:
> > > >>>>> > > >>
> > > >>>>> > > >> - Clang / KMP_AFFINITY
> > > >>>>> > > >>
> > > >>>>> > > >>
> > > >>>>> > >
> > > >>>>> >
> > > >>>>>
> >
> https://github.com/clang-ykt/openmp/blob/master/runtime/src/kmp_affinity.cpp
> > > >>>>> > > >>
> > > >>>>> > > >> - GCC / GOMP_CPU_AFFINITY
> > > >>>>> > > >>
> > > >>>>> > > >>
> > > >>>>> > >
> > > >>>>> >
> > > >>>>>
> >
> https://gcc.gnu.org/onlinedocs/gcc-4.7.1/libgomp/GOMP_005fCPU_005fAFFINITY.html
> > > >>>>> > > >>
> > > >>>>> > > >> - Intel / KMP_AFFINITY
> > > >>>>> > > >>
> > > >>>>> > > >>
> > > >>>>> > >
> > > >>>>> >
> > > >>>>>
> >
> https://software.intel.com/en-us/node/522689#6E24682E-F411-4AE3-A04D-ECD81C7008D1
> > > >>>>> > > >>
> > > >>>>> > > >> - Visual Studio / SetThreadAffinityMask
> > > >>>>> > > >>
> > > >>>>> > > >>
> > > >>>>> > >
> > > >>>>> >
> > > >>>>>
> >
> https://docs.microsoft.com/en-us/windows/desktop/api/winbase/nf-winbase-setthreadaffinitymask
> > > >>>>> > > >>
> > > >>>>> > > >> *Issues*
> > > >>>>> > > >>
> > > >>>>> > > >> Failed OpenMP assertion when loading MXNet compiled with
> > DEBUG=1
> > > >>>>> > > >> https://github.com/apache/incubator-mxnet/issues/10856
> > > >>>>> > > >>
> > > >>>>> > > >> libomp.so dependency (need REAL fix)
> > > >>>>> > > >> https://github.com/apache/incubator-mxnet/issues/11417
> > > >>>>> > > >>
> > > >>>>> > > >> mxnet-mkl (v0.12.0) crash when using (conda-installed)
> numpy
> > > >>>>> with MKL
> > > >>>>> > > >> https://github.com/apache/incubator-mxnet/issues/8532
> > > >>>>> > > >>
> > > >>>>> > > >> Performance regression when OMP_NUM_THREADS environment
> > > >>>>> variable is
> > > >>>>> > not
> > > >>>>> > > >> set
> > > >>>>> > > >> https://github.com/apache/incubator-mxnet/issues/9744
> > > >>>>> > > >>
> > > >>>>> > > >> Poor concat CPU performance on CUDA builds
> > > >>>>> > > >> https://github.com/apache/incubator-mxnet/issues/11905
> > > >>>>> > > >>
> > > >>>>> > > >> I would appreciate hearing your thoughts.
> > > >>>>> > > >>
> > > >>>>> > > >>
> > > >>>>> > > >> Best
> > > >>>>> > > >> Anton
> > > >>>>> > > >>
> > > >>>>> > > >> [1]
> > > >>>>> > > >>
> > > >>>>> > > >>
> > > >>>>> > >
> > > >>>>> >
> > > >>>>>
> >
> https://github.com/apache/incubator-mxnet/blob/master/CMakeLists.txt#L400-L405
> > > >>>>> > > >> [2]
> > > >>>>> https://github.com/apache/incubator-mxnet/tree/master/3rdparty
> > > >>>>> > > >> [3]
> > > >>>>>
> https://github.com/intel/mkl-dnn/blame/master/README.md#L261-L265
> > > >>>>> > > >> [4]
> > > >>>>>
> https://github.com/intel/mkl-dnn/blame/master/README.md#L278-L280
> > > >>>>> > > >> [5] https://software.intel.com/en-us/node/522691
> > > >>>>> > > >>
> > > >>>>> > > >
> > > >>>>> > >
> > > >>>>> >
> > > >>>>>
> > > >>>>
> >
>

Re: [Discussion] Remove bundled llvm OpenMP

Posted by Anton Chernov <me...@gmail.com>.
I don't have the necessary rights to reopen this PR.

Mon, 20 May 2019 at 08:00, Pedro Larroy <pe...@gmail.com>:

> Hi Anton, Stas.
>
> Can we reopen this PR and get it merged as per the data collected by Stas?
>
> https://github.com/apache/incubator-mxnet/pull/12160
>
>
> https://cwiki.apache.org/confluence/display/MXNET/Benchmarking+MXNet+with+different+OpenMP+implementations
>
> There are multiple issues that will be fixed by solving this problem.
>
>
> Pedro
>
> On Tue, Feb 12, 2019 at 4:54 AM Anton Chernov <me...@gmail.com> wrote:
> >
> > I would like to propose a possible alternative solution for
> consideration.
> >
> > If keeping llvm OpenMP as a submodule is inevitable one could make
> > following adjustments:
> >
> > Since compilers try to find their own OpenMP library implicitly, MXNet
> > needs to ensure that only the bundled version is found. Therefore during
> > the build and also during deployment this library has to provide symlinks
> > for each possible compiler that would link to the built artifact ie.
> >
> > libiomp.so -> libgomp.so -> libomp.so
> >
> > The MKLML iomp would need to be hidden and removed as well.
> >
> > On Windows it would be a different story, but as can be seen [1] bundled
> > OpenMP was not included in the Windows build anyway.
> >
> > Alternatively: always use iomp (with same symlinking trick though)
> provided
> > by MKLML distribution [2]. This potentially could work on Windows as
> well.
> >
> > Best
> > Anton
> >
> > [1]
> >
> https://github.com/apache/incubator-mxnet/blob/8a63bdecf2d9f12d34fe5874957ae4c867eb5f5b/CMakeLists.txt#L408-L410
> > [2] https://github.com/intel/mkl-dnn/releases
> >
> > Tue, 12 Feb 2019 at 11:22, Anton Chernov <me...@gmail.com>:
> >
> > > Recent benchmarking results have been published here [1]. Experiments
> > > compare different OpenMP implementations as well as binaries compiled
> with
> > > different compilers including GCC, Clang and ICC.
> > >
> > > During experimentation another issue with mixing up libraries was
> > > identified and described here [2].
> > >
> > > Best
> > > Anton
> > >
> > > [1] https://cwiki.apache.org/confluence/x/2wclBg
> > > [2]
> > >
> https://github.com/apache/incubator-mxnet/issues/14087#issuecomment-461734041
> > >
> > >
> > > Sun, 9 Dec 2018 at 16:28, Anton Chernov <me...@gmail.com>:
> > >
> > >> Hi Chris,
> > >>
> > >> Following up on the issue, are all things resolved in the discussion?
> > >>
> > >> If yes, I kindly ask you to reopen this PR and remove ‘requesting
> > >> changes’ status:
> > >> https://github.com/apache/incubator-mxnet/pull/12160
> > >>
> > >> Thank you.
> > >>
> > >>
> > >> Best
> > >> Anton
> > >>
> > >>
> > >> Tue, 27 Nov 2018 at 17:15, Anton Chernov <me...@gmail.com>:
> > >>
> > >>> Another thing to take into consideration:
> > >>>
> > >>> All python artefacts that are created (PyPi) are built with make and
> are
> > >>> not using the bundled OpenMP library.
> > >>>
> > >>> One step for the switch to CMake to happen is the approval and
> merging
> > >>> of the mentioned PR:
> > >>>
> > >>> https://github.com/apache/incubator-mxnet/pull/12160
> > >>>
> > >>> If there are no other objections I kindly ask Chris Olivier to remove
> > >>> his 'requesting changes' veto on it to unblock the CMake overhaul
> work.
> > >>>
> > >>> Thank you.
> > >>>
> > >>> Best
> > >>> Anton
> > >>>
> > >>> Thu, 22 Nov 2018 at 17:11, Anton Chernov <me...@gmail.com>:
> > >>>
> > >>>>
> > >>>> Thank you for your answer, Chris.
> > >>>>
> > >>>> > The whole “mixing omp libraries” is something that occurs in
> > >>>> production
> > >>>> every day and certainly in everything that uses mkl.
> > >>>>
> > >>>> I'm afraid this statement is wrong. Intel MKL-DNN strictly ensures
> that
> > >>>> this mixture is not happening:
> > >>>>
> > >>>> "Intel MKL-DNN uses OpenMP* for parallelism and requires an OpenMP
> > >>>> runtime library to work. As different OpenMP runtimes may not be
> binary
> > >>>> compatible it's important to ensure that only one OpenMP runtime is
> used
> > >>>> throughout the application. Having more than one OpenMP runtime
> initialized
> > >>>> may lead to undefined behavior resulting in incorrect results or
> crashes."
> > >>>> [1]
> > >>>>
> > >>>> That is why 2 different MKLML libraries are provided:
> > >>>>
> > >>>> lib/libmklml_gnu.so  | Intel MKL small library for GNU* OpenMP
> runtime
> > >>>> lib/libmklml_intel.so | Intel MKL small library for Intel(R) OpenMP
> > >>>> runtime
> > >>>>
> > >>>> > is the suggestion that libiomp be removed from mkl?
> > >>>>
> > >>>> That is certainly not my suggestion.
> > >>>>
> > >>>> > have you spoken with intel? have you consulted Intel at all?
> > >>>>
> > >>>> Yes, I have asked for comments on the issue.
> > >>>>
> > >>>> > “hard to debug random crash”. you’re seeing an assertion which is
> > >>>> probably ...
> > >>>>
> > >>>> I'm seeing the result of undefined behaviour. And I want to put
> > >>>> emphasis on the following statement:
> > >>>>
> > >>>> Regardless of whether there is a particular reason for the assert
> -
> > >>>> it is a result of behaviour that should not happen. There are valid
> ways
> > >>>> to use llvm OpenMP in MXNet and the current way is not one of
> them.
> > >>>>
> > >>>> > The lack of root-causing the problem and knee-jerk solution here
> > >>>> makes me
> > >>>> uncomfortable.
> > >>>>
> > >>>> I hope that my efforts highlighting the problems help to mitigate
> > >>>> your discomfort.
> > >>>>
> > >>>> > if you want to see performance differences there’s an environment
> > >>>> variable
> > >>>> you can set in the mxnet omp tuning code that will print overhead
> and
> > >>>> execution times for the current omp library.
> > >>>>
> > >>>> I don't want to see performance differences in the current OpenMP
> > >>>> library. I want to remove the current OpenMP library and use the one
> > >>>> provided by the compiler.
> > >>>>
> > >>>>
> > >>>>
> > >>>> Best
> > >>>> Anton
> > >>>>
> > >>>> [1]
> https://github.com/intel/mkl-dnn/blame/master/README.md#L261-L265
> > >>>>
> > >>>> Thu, 22 Nov 2018 at 16:50, Chris Olivier <cjolivier01@apache.org
> >:
> > >>>>
> > >>>>> Do you not work on CI mostly? My apologies for thinking that was
> some
> > >>>>> sort
> > >>>>> of team effort between you and a few others that were passionate
> about
> > >>>>> CI
> > >>>>> keeping the CI system running smoothly.
> > >>>>>
> > >>>>> You have source code, you have the line the assertion is on. If you
> > >>>>> can’t
> > >>>>> describe what’s going wrong that causes the assertion, then I don’t
> > >>>>> really
> > >>>>> have anything more to add to this conversation beyond what’s below:
> > >>>>>
> > >>>>> The whole “mixing omp libraries” is something that occurs in
> production
> > >>>>> every day and certainly in everything that uses mkl.  It may
> > >>>>> occasionally
> > >>>>> cause problems for some edge cases when there is super-complex
> linking
> > >>>>> strategies and dynamic loading.  But this is not one of those edge
> > >>>>> cases.
> > >>>>> Mostly blaming this is a red herring for other thread-related
> problems
> > >>>>> and
> > >>>>> people switch omp library and the timing of their code changes and
> they
> > >>>>> stop seeing the problem. I’ve spent my entire career doing heavily
> > >>>>> multithreaded c++ development and i’ve seen that a million times.
> is
> > >>>>> the
> > >>>>> suggestion that libiomp be removed from mkl? have you spoken with
> > >>>>> intel?
> > >>>>> have you consulted Intel at all?
> > >>>>>
> > >>>>> and what you are seeing isn’t some “hard to debug random crash”.
> you’re
> > >>>>> seeing an assertion which is probably related to omp trying to
> create a
> > >>>>> thread pool after a fork and something was done in the mxnet code
> to
> > >>>>> make
> > >>>>> that sketchy to do. I’d suggest filing an issue with the llvm
> openmp
> > >>>>> just
> > >>>>> like you’d file with any other not-well-understood behavior in
> mxnet.
> > >>>>>
> > >>>>> The lack of root-causing the problem and knee-jerk solution here
> makes
> > >>>>> me
> > >>>>> uncomfortable.
> > >>>>>
> > >>>>> if you want to see performance differences there’s an environment
> > >>>>> variable
> > >>>>> you can set in the mxnet omp tuning code that will print overhead
> and
> > >>>>> execution times for the current omp library.
> > >>>>>
> > >>>>>
> > >>>>>
> > >>>>>
> > >>>>>
> > >>>>>
> > >>>>>
> > >>>>> On Thu, Nov 22, 2018 at 7:12 AM Anton Chernov <mechernov@gmail.com
> >
> > >>>>> wrote:
> > >>>>>
> > >>>>> > Hi Chris,
> > >>>>> >
> > >>>>> > Thank you for your answer. If you have noticed the initial email
> > >>>>> comes from
> > >>>>> > me, Anton Chernov (@lebeg on Github) and thus the proposal is not
> > >>>>> from any
> > >>>>> > 'Ci' team that you've mentioned, but from me personally.
> > >>>>> >
> > >>>>> > You are writing:
> > >>>>> >
> > >>>>> > > someone is doing something unhealthy when they fork ...
> > >>>>> >
> > >>>>> > I'm missing any context to understand what you mean.
> > >>>>> >
> > >>>>> > > we get a lot of performance gain from OMP ...
> > >>>>> >
> > >>>>> > There is no data that would prove this statement and therefore
> it is
> > >>>>> a
> > >>>>> > random guess.
> > >>>>> >
> > >>>>> > > in many months, no investigation has occurred as to WHY the
> > >>>>> assertion is
> > >>>>> > failing.
> > >>>>> >
> > >>>>> > The investigation has concluded that this is happening due to
> > >>>>> undefined
> > >>>>> > behaviour which is, in my opinion, a sufficient answer that does
> not
> > >>>>> require
> > >>>>> > to go any deeper.
> > >>>>> >
> > >>>>> > > the pr is vetoed until such a time that the actual root cause
> of
> > >>>>> the
> > >>>>> > problem is known.
> > >>>>> >
> > >>>>> > And considering the statements above there is no valid reason to
> > >>>>> veto the
> > >>>>> > PR.
> > >>>>> >
> > >>>>> >
> > >>>>> > Best
> > >>>>> > Anton
> > >>>>> >
> > >>>>> > Thu, 22 Nov 2018 at 15:38, Chris Olivier <
> cjolivier01@gmail.com>:
> > >>>>> >
> > >>>>> > > 3x less overhead*
> > >>>>> > >
> > >>>>> > > On Thu, Nov 22, 2018 at 6:25 AM Chris Olivier <
> > >>>>> cjolivier01@gmail.com>
> > >>>>> > > wrote:
> > >>>>> > >
> > >>>>> > > > someone is doing something unhealthy when they fork, which is
> > >>>>> causing
> > >>>>> > an
> > >>>>> > > > assertion in the openmp library. the same assertion that
> would
> > >>>>> fire in
> > >>>>> > > mkl,
> > >>>>> > > > which is linked to libiomp5 (exact same omp library). this
> is new
> > >>>>> > > behavior
> > >>>>> > > > and most likely due to an error or suboptimal approach in the
> > >>>>> forking
> > >>>>> > > logic
> > >>>>> > > > in mxnet.
> > >>>>> > > >
> > >>>>> > > > in order to circumvent the assert, the Ci team is proposing
> to
> > >>>>> remove
> > >>>>> > the
> > >>>>> > > > library completely which is equivalent to cutting off your
> leg
> > >>>>> to make
> > >>>>> > > the
> > >>>>> > > > pain from stubbing your toe go away.
> > >>>>> > > >
> > >>>>> > > > we get a lot of performance gain from OMP. it has about a 1/3
> > >>>>> less
> > >>>>> > > > overhead for entering omp regions and also supports omp
> regions
> > >>>>> after a
> > >>>>> > > > fork, which libgomp does not.
> > >>>>> > > >
> > >>>>> > > > in many months, no investigation has occurred as to WHY the
> > >>>>> assertion
> > >>>>> > is
> > >>>>> > > > failing.
> > >>>>> > > >
> > >>>>> > > > the pr is vetoed until such a time that the actual root
> cause of
> > >>>>> the
> > >>>>> > > > problem is known.
> > >>>>> > > >
> > >>>>> > > >
> > >>>>> > > > thanks,
> > >>>>> > > >
> > >>>>> > > > -Chris.
> > >>>>> > > >
> > >>>>> > > >
> > >>>>> > > >
> > >>>>> > > >
> > >>>>> > > > On Thu, Nov 22, 2018 at 4:36 AM Anton Chernov <
> > >>>>> mechernov@gmail.com>
> > >>>>> > > wrote:
> > >>>>> > > >
> > >>>>> > > >> Dear MXNet community,
> > >>>>> > > >>
> > >>>>> > > >> I would like to drive attention to an important issue that
> is
> > >>>>> present
> > >>>>> > in
> > >>>>> > > >> the MXNet CMake build: usage of bundled llvm OpenMP library.
> > >>>>> > > >>
> > >>>>> > > >> I have opened a PR to remove it:
> > >>>>> > > >> https://github.com/apache/incubator-mxnet/pull/12160
> > >>>>> > > >>
> > >>>>> > > >> The issue was closed, but I am strong in my opinion that
> it's
> > >>>>> the
> > >>>>> > right
> > >>>>> > > >> thing to do.
> > >>>>> > > >>
> > >>>>> > > >> *Background*
> > >>>>> > > >> If you want to use OpenMP pragmas in your code for
> > >>>>> parallelization you
> > >>>>> > > >> would supply a special flag to the compiler:
> > >>>>> > > >>
> > >>>>> > > >> - Clang / -fopenmp
> > >>>>> > > >> https://openmp.llvm.org/
> > >>>>> > > >>
> > >>>>> > > >> - GCC / -fopenmp
> > >>>>> > > >> https://gcc.gnu.org/onlinedocs/libgomp/Enabling-OpenMP.html
> > >>>>> > > >>
> > >>>>> > > >> - Intel / [Q]openmp
> > >>>>> > > >>
> > >>>>> > > >>
> > >>>>> > >
> > >>>>> >
> > >>>>>
> https://software.intel.com/en-us/node/522689#6E24682E-F411-4AE3-A04D-ECD81C7008D1
> > >>>>> > > >>
> > >>>>> > > >> - Visual Studio: /openmp (Enable OpenMP 2.0 Support)
> > >>>>> > > >> https://msdn.microsoft.com/en-us/library/tt15eb9t.aspx
> > >>>>> > > >>
> > >>>>> > > >> Each of the compilers would enable the '#pragma omp'
> directive
> > >>>>> during
> > >>>>> > > >> C/C++
> > >>>>> > > >> compilation and arrange for automatic linking of the OpenMP
> > >>>>> runtime
> > >>>>> > > >> library
> > >>>>> > > >> supplied by each compiler separately.
> > >>>>> > > >>
> > >>>>> > > >> Thus, to use the advantages of an OpenMP implementation one
> has
> > >>>>> to
> > >>>>> > > compile
> > >>>>> > > >> the code with the corresponding compiler.
> > >>>>> > > >>
> > >>>>> > > >> Currently, in MXNet CMake build scripts a bundled version of
> > >>>>> llvm
> > >>>>> > OpenMP
> > >>>>> > > >> is
> > >>>>> > > >> used ([1] and [2]) to replace the OpenMP library supplied
> by the
> > >>>>> > > compiler.
> > >>>>> > > >>
> > >>>>> > > >> I will quote here the README from the MKL-DNN (Intel(R) Math
> > >>>>> Kernel
> > >>>>> > > >> Library
> > >>>>> > > >> for Deep Neural Networks):
> > >>>>> > > >>
> > >>>>> > > >> "Intel MKL-DNN uses OpenMP* for parallelism and requires an
> > >>>>> OpenMP
> > >>>>> > > runtime
> > >>>>> > > >> library to work. As different OpenMP runtimes may not be
> binary
> > >>>>> > > compatible
> > >>>>> > > >> it's important to ensure that only one OpenMP runtime is
> used
> > >>>>> > throughout
> > >>>>> > > >> the application. Having more than one OpenMP runtime
> > >>>>> initialized may
> > >>>>> > > lead
> > >>>>> > > >> to undefined behavior resulting in incorrect results or
> > >>>>> crashes." [3]
> > >>>>> > > >>
> > >>>>> > > >> And:
> > >>>>> > > >>
> > >>>>> > > >> "Using GNU compiler with -fopenmp and -liomp5 options will
> link
> > >>>>> the
> > >>>>> > > >> application with both Intel and GNU OpenMP runtime
> libraries.
> > >>>>> This
> > >>>>> > will
> > >>>>> > > >> lead to undefined behavior of the application." [4]
> > >>>>> > > >>
> > >>>>> > > >> As can be seen from ldd for MXNet:
> > >>>>> > > >>
> > >>>>> > > >> $ ldd build/tests/mxnet_unit_tests | grep omp
> > >>>>> > > >>     libomp.so =>
> > >>>>> > /.../mxnet/build/3rdparty/openmp/runtime/src/libomp.so
> > >>>>> > > >> (0x00007f697bc55000)
> > >>>>> > > >>     libgomp.so.1 => /usr/lib/x86_64-linux-gnu/libgomp.so.1
> > >>>>> > > >> (0x00007f69660cd000)
> > >>>>> > > >>
> > >>>>> > > >> *Performance*
> > >>>>> > > >>
> > >>>>> > > >> The only performance data related to OpenMP in MXNet I was
> able
> > >>>>> to
> > >>>>> > find
> > >>>>> > > is
> > >>>>> > > >> here:
> > >>>>> > > >>
> > >>>>> > > >>
> > >>>>> > >
> > >>>>> >
> > >>>>>
> https://github.com/apache/incubator-mxnet/issues/9744#issuecomment-367711172
> > >>>>> > > >>
> > >>>>> > > >> Which in my understanding is testing the impact of different
> > >>>>> environment
> > >>>>> > > >> variables for the same setup (using same bundled OpenMP
> > >>>>> library).
> > >>>>> > > >>
> > >>>>> > > >> The libraries may differ in implementation and the Thread
> > >>>>> Affinity
> > >>>>> > > >> Interface [5] may have significant impact on performance.
> > >>>>> > > >>
> > >>>>> > > >> All compilers support it:
> > >>>>> > > >>
> > >>>>> > > >> - Clang / KMP_AFFINITY
> > >>>>> > > >>
> > >>>>> > > >>
> > >>>>> > >
> > >>>>> >
> > >>>>>
> https://github.com/clang-ykt/openmp/blob/master/runtime/src/kmp_affinity.cpp
> > >>>>> > > >>
> > >>>>> > > >> - GCC / GOMP_CPU_AFFINITY
> > >>>>> > > >>
> > >>>>> > > >>
> > >>>>> > >
> > >>>>> >
> > >>>>>
> https://gcc.gnu.org/onlinedocs/gcc-4.7.1/libgomp/GOMP_005fCPU_005fAFFINITY.html
> > >>>>> > > >>
> > >>>>> > > >> - Intel / KMP_AFFINITY
> > >>>>> > > >>
> > >>>>> > > >>
> > >>>>> > >
> > >>>>> >
> > >>>>>
> https://software.intel.com/en-us/node/522689#6E24682E-F411-4AE3-A04D-ECD81C7008D1
> > >>>>> > > >>
> > >>>>> > > >> - Visual Studio / SetThreadAffinityMask
> > >>>>> > > >>
> > >>>>> > > >>
> > >>>>> > >
> > >>>>> >
> > >>>>>
> https://docs.microsoft.com/en-us/windows/desktop/api/winbase/nf-winbase-setthreadaffinitymask
> > >>>>> > > >>
> > >>>>> > > >> *Issues*
> > >>>>> > > >>
> > >>>>> > > >> Failed OpenMP assertion when loading MXNet compiled with
> DEBUG=1
> > >>>>> > > >> https://github.com/apache/incubator-mxnet/issues/10856
> > >>>>> > > >>
> > >>>>> > > >> libomp.so dependency (need REAL fix)
> > >>>>> > > >> https://github.com/apache/incubator-mxnet/issues/11417
> > >>>>> > > >>
> > >>>>> > > >> mxnet-mkl (v0.12.0) crash when using (conda-installed) numpy
> > >>>>> with MKL
> > >>>>> > > >> https://github.com/apache/incubator-mxnet/issues/8532
> > >>>>> > > >>
> > >>>>> > > >> Performance regression when OMP_NUM_THREADS environment
> > >>>>> variable is
> > >>>>> > not
> > >>>>> > > >> set
> > >>>>> > > >> https://github.com/apache/incubator-mxnet/issues/9744
> > >>>>> > > >>
> > >>>>> > > >> Poor concat CPU performance on CUDA builds
> > >>>>> > > >> https://github.com/apache/incubator-mxnet/issues/11905
> > >>>>> > > >>
> > >>>>> > > >> I would appreciate hearing your thoughts.
> > >>>>> > > >>
> > >>>>> > > >>
> > >>>>> > > >> Best
> > >>>>> > > >> Anton
> > >>>>> > > >>
> > >>>>> > > >> [1]
> > >>>>> > > >>
> > >>>>> > > >>
> > >>>>> > >
> > >>>>> >
> > >>>>>
> https://github.com/apache/incubator-mxnet/blob/master/CMakeLists.txt#L400-L405
> > >>>>> > > >> [2]
> > >>>>> https://github.com/apache/incubator-mxnet/tree/master/3rdparty
> > >>>>> > > >> [3]
> > >>>>> https://github.com/intel/mkl-dnn/blame/master/README.md#L261-L265
> > >>>>> > > >> [4]
> > >>>>> https://github.com/intel/mkl-dnn/blame/master/README.md#L278-L280
> > >>>>> > > >> [5] https://software.intel.com/en-us/node/522691
> > >>>>> > > >>
> > >>>>> > > >
> > >>>>> > >
> > >>>>> >
> > >>>>>
> > >>>>
>

Re: [Discussion] Remove bundled llvm OpenMP

Posted by Pedro Larroy <pe...@gmail.com>.
Hi Anton, Stas.

Can we reopen this PR and get it merged as per the data collected by Stas?

https://github.com/apache/incubator-mxnet/pull/12160

https://cwiki.apache.org/confluence/display/MXNET/Benchmarking+MXNet+with+different+OpenMP+implementations

There are multiple issues that will be fixed by solving this problem.


Pedro

Re: [Discussion] Remove bundled llvm OpenMP

Posted by Anton Chernov <me...@gmail.com>.
I would like to propose a possible alternative solution for consideration.

If keeping llvm OpenMP as a submodule is inevitable, one could make the
following adjustments:

Since compilers try to find their own OpenMP library implicitly, MXNet
needs to ensure that only the bundled version is found. Therefore, during
the build and also during deployment, this library has to provide symlinks
for each possible compiler that would link to the built artifact, i.e.

libiomp.so -> libgomp.so -> libomp.so

The MKLML iomp would need to be hidden and removed as well.
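
As an illustration only, here is a minimal sketch of the symlinking trick. It
assumes the bundled runtime is built as libomp.so under
build/3rdparty/openmp/runtime/src (the location seen in the ldd output earlier
in the thread) and that libgomp.so.1 and libiomp5.so are the names resolved at
runtime by GCC- and Intel-built objects respectively:

    $ cd build/3rdparty/openmp/runtime/src
    $ ln -sf libomp.so libgomp.so.1   # name expected by GCC-built objects
    $ ln -sf libomp.so libiomp5.so    # name expected by Intel/MKLML-built objects

During deployment the same links would have to be installed next to
libmxnet.so, so that the loader never resolves the system libgomp or the MKLML
libiomp5 in addition to the bundled runtime.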

On Windows it would be a different story, but as can be seen in [1], the
bundled OpenMP was not included in the Windows build anyway.

Alternatively: always use iomp (with the same symlinking trick) provided
by the MKLML distribution [2]. This could potentially work on Windows as well.

Best
Anton

[1]
https://github.com/apache/incubator-mxnet/blob/8a63bdecf2d9f12d34fe5874957ae4c867eb5f5b/CMakeLists.txt#L408-L410
[2] https://github.com/intel/mkl-dnn/releases

Re: [Discussion] Remove bundled llvm OpenMP

Posted by Anton Chernov <me...@gmail.com>.
Recent benchmarking results have been published here [1]. Experiments
compare different OpenMP implementations as well as binaries compiled with
different compilers including GCC, Clang and ICC.
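
For reference, a minimal sketch of how such a per-compiler comparison can be
set up; the benchmark source and binary names below are illustrative
assumptions, not the exact setup used in [1]:

    $ g++     -O2 -fopenmp omp_bench.cpp -o bench_gcc     # links GNU libgomp
    $ clang++ -O2 -fopenmp omp_bench.cpp -o bench_clang   # links llvm libomp
    $ icc     -O2 -qopenmp omp_bench.cpp -o bench_icc     # links Intel libiomp5
    $ for b in bench_gcc bench_clang bench_icc; do ldd $b | grep -i omp; done

The final ldd check confirms which OpenMP runtime each binary actually links
before any timings are compared.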

During experimentation, another issue with mixing up libraries was
identified and described here [2].

Best
Anton

[1] https://cwiki.apache.org/confluence/x/2wclBg
[2]
https://github.com/apache/incubator-mxnet/issues/14087#issuecomment-461734041


Re: [Discussion] Remove bundled llvm OpenMP

Posted by Anton Chernov <me...@gmail.com>.
Hi Chris,

Following up on the issue, are all things resolved in the discussion?

If yes, I kindly ask you to reopen this PR and remove ‘requesting changes’
status:
https://github.com/apache/incubator-mxnet/pull/12160

Thank you.


Best
Anton


On Tue, 27 Nov 2018 at 17:15, Anton Chernov <me...@gmail.com> wrote:

> Another thing to take into consideration:
>
> All python artefacts that are created (PyPi) are built with make and are
> not using the bundled OpenMP library.
>
> One step for the switch to CMake to happen is the approval and merging of
> the mentioned PR:
>
> https://github.com/apache/incubator-mxnet/pull/12160
>
> If there are no other objections I kindly ask Chris Olivier to remove his
> 'requesting changes' veto on it to unblock the CMake overhaul work.
>
> Thank you.
>
> Best
> Anton
>
> On Thu, 22 Nov 2018 at 17:11, Anton Chernov <me...@gmail.com> wrote:
>
>>
>> Thank you for your answer, Chris.
>>
>> > The whole “mixing omp libraries” is something that occurs in production
>> every day and certainly in everything that uses mkl.
>>
>> I'm afraid this statement is wrong. Intel MKL-DNN strictly ensures that
>> this mixture is not happening:
>>
>> "Intel MKL-DNN uses OpenMP* for parallelism and requires an OpenMP
>> runtime library to work. As different OpenMP runtimes may not be binary
>> compatible it's important to ensure that only one OpenMP runtime is used
>> throughout the application. Having more than one OpenMP runtime initialized
>> may lead to undefined behavior resulting in incorrect results or crashes."
>> [1]
>>
>> That is why 2 different MKLML libraries are provided:
>>
>> lib/libmklml_gnu.so  | Intel MKL small library for GNU* OpenMP runtime
>> lib/libmklml_intel.so | Intel MKL small library for Intel(R) OpenMP
>> runtime
>>
>> > is the suggestion that libiomp be removed from mkl?
>>
>> That is certainly not my suggestion.
>>
>> > have you spoken with intel? have you consulted Intel at all?
>>
>> Yes, I have asked for comments on the issue.
>>
>> > “hard to debug random crash”. you’re seeing an assertion which is
>> probably ...
>>
>> I'm seeing the result of undefined behaviour. And I want to put emphasis
>> on the following statement:
>>
>> Regardless of whether there is a particular reason for the assert - it
>> is the result of behaviour that should not happen. There are valid ways
>> to use llvm OpenMP in MXNet and the current way is not one of them.
>>
>> > The lack of root-causing the problem and knee-jerk solution here makes
>> me
>> uncomfortable.
>>
>> I hope that my efforts highlighting the problems help to mitigate
>> your discomfort.
>>
>> > if you want to see performance differences there’s an environment
>> variable
>> you can set in the mxnet omp tuning code that will print overhead and
>> execution times for the current omp library.
>>
>> I don't want to see performance differences in the current OpenMP
>> library. I want to remove the current OpenMP library and use the one
>> provided by the compiler.
>>
>>
>>
>> Best
>> Anton
>>
>> [1] https://github.com/intel/mkl-dnn/blame/master/README.md#L261-L265
>>
>> On Thu, 22 Nov 2018 at 16:50, Chris Olivier <cj...@apache.org> wrote:
>>
>>> Do you not work on CI mostly? My apologies for thinking that was some
>>> sort
>>> of team effort between you and a few others that were passionate about CI
>>> keeping the CI system running smoothly.
>>>
>>> You have source code, you have the line the assertion is on. If you can’t
>>> describe what’s going wrong that causes the assertion, then I don’t
>>> really
>>> have anything more to add to this conversation beyond what’s below:
>>>
>>> The whole “mixing omp libraries” is something that occurs in production
>>> every day and certainly in everything that uses mkl.  It may occasionally
>>> cause problems for some edge cases when there is super-complex linking
>>> strategies and dynamic loading.  But this is not one of those edge cases.
>>> Mostly blaming this is a red herring for other thread-related problems
>>> and
>>> people switch omp library and the timing of their code changes and they
>>> stop seeing the problem. I’ve spent my entire career doing heavily
>>> multithreaded c++ development and i’ve seen that a million times.  is the
>>> suggestion that libiomp be removed from mkl? have you spoken with intel?
>>> have you consulted Intel at all?
>>>
>>> and what you are seeing isn’t some “hard to debug random crash”. you’re
>>> seeing an assertion which is probably related to omp trying to create a
>>> thread pool after a fork and something was done in the mxnet code to make
>>> that sketchy to do. I’d suggest filing an issue with the llvm openmp just
>>> like you’d file with any other not-well-understood behavior in mxnet.
>>>
>>> The lack of root-causing the problem and knee-jerk solution here makes me
>>> uncomfortable.
>>>
>>> if you want to see performance differences there’s an environment
>>> variable
>>> you can set in the mxnet omp tuning code that will print overhead and
>>> execution times for the current omp library.
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>> On Thu, Nov 22, 2018 at 7:12 AM Anton Chernov <me...@gmail.com>
>>> wrote:
>>>
>>> > Hi Chris,
>>> >
>>> > Thank you for your answer. If you have noticed the initial email comes
>>> from
>>> > me, Anton Chernov (@lebeg on Github) and thus the proposal is not from
>>> any
>>> > 'Ci' team that you've mentioned, but from me personally.
>>> >
>>> > You are writing:
>>> >
>>> > > someone is doing something unhealthy when they fork ...
>>> >
>>> > I'm missing any context to understand what you mean.
>>> >
>>> > > we get a lot of performance gain from OMP ...
>>> >
>>> > There is no data that would prove this statement and therefore it is a
>>> > random guess.
>>> >
>>> > > in many months, no investigation has occurred as to WHY the
>>> assertion is
>>> > failing.
>>> >
>>> > The investigation has concluded that this is happening due to undefined
>>> > behaviour which is, in my opinion, a sufficient answer that does not
>>> require
>>> > to go any deeper.
>>> >
>>> > > the pr is vetoed until such a time that the actual root cause of the
>>> > problem is known.
>>> >
>>> > And considering the statements above there is no valid reason to veto
>>> the
>>> > PR.
>>> >
>>> >
>>> > Best
>>> > Anton
>>> >
>>> > On Thu, 22 Nov 2018 at 15:38, Chris Olivier <cj...@gmail.com> wrote:
>>> >
>>> > > 3x less overhead*
>>> > >
>>> > > On Thu, Nov 22, 2018 at 6:25 AM Chris Olivier <cjolivier01@gmail.com
>>> >
>>> > > wrote:
>>> > >
>>> > > > someone is doing something unhealthy when they fork, which is
>>> causing
>>> > an
>>> > > > assertion in the openmp library. the same assertion that would
>>> fire in
>>> > > mkl,
>>> > > > which is linked to libiomp5 (exact same omp library). this is new
>>> > > behavior
>>> > > > and most likely due to an error or suboptimal approach in the
>>> forking
>>> > > logic
>>> > > > in mxnet.
>>> > > >
>>> > > > in order to circumvent the assert, the Ci team is proposing to
>>> remove
>>> > the
>>> > > > library completely which is equivalent to cutting off your leg to
>>> make
>>> > > the
>>> > > > pain from stubbing your toe go away.
>>> > > >
>>> > > > we get a lot of performance gain from OMP. it has about 1/3 less
>>> > > > overhead for entering omp regions and also supports omp regions
>>> after a
>>> > > > fork, which libgomp does not.
>>> > > >
>>> > > > in many months, no investigation has occurred as to WHY the
>>> assertion
>>> > is
>>> > > > failing.
>>> > > >
>>> > > > the pr is vetoed until such a time that the actual root cause of
>>> the
>>> > > > problem is known.
>>> > > >
>>> > > >
>>> > > > thanks,
>>> > > >
>>> > > > -Chris.
>>> > > >
>>> > > >
>>> > > >
>>> > > >
>>> > > > On Thu, Nov 22, 2018 at 4:36 AM Anton Chernov <mechernov@gmail.com
>>> >
>>> > > wrote:
>>> > > >
>>> > > >> Dear MXNet community,
>>> > > >>
>>> > > >> I would like to drive attention to an important issue that is
>>> present
>>> > in
>>> > > >> the MXNet CMake build: usage of bundled llvm OpenMP library.
>>> > > >>
>>> > > >> I have opened a PR to remove it:
>>> > > >> https://github.com/apache/incubator-mxnet/pull/12160
>>> > > >>
>>> > > >> The issue was closed, but I am strong in my oppinion that it's the
>>> > right
>>> > > >> thing to do.
>>> > > >>
>>> > > >> *Background*
>>> > > >> If you want to use OpenMP pragmas in your code for
>>> parallelization you
>>> > > >> would supply a special flag to the compiler:
>>> > > >>
>>> > > >> - Clang / -fopenmp
>>> > > >> https://openmp.llvm.org/
>>> > > >>
>>> > > >> - GCC / -fopenmp
>>> > > >> https://gcc.gnu.org/onlinedocs/libgomp/Enabling-OpenMP.html
>>> > > >>
>>> > > >> - Intel / [Q]openmp
>>> > > >>
>>> > > >>
>>> > >
>>> >
>>> https://software.intel.com/en-us/node/522689#6E24682E-F411-4AE3-A04D-ECD81C7008D1
>>> > > >>
>>> > > >> - Visual Studio: /openmp (Enable OpenMP 2.0 Support)
>>> > > >> https://msdn.microsoft.com/en-us/library/tt15eb9t.aspx
>>> > > >>
>>> > > >> Each of the compilers would enable the '#pragma omp' directive
>>> during
>>> > > >> C/C++
>>> > > >> compilation and arrange for automatic linking of the OpenMP
>>> runtime
>>> > > >> library
>>> > > >> supplied by each complier separately.
>>> > > >>
>>> > > >> Thus, to use the advantages of an OpenMP implementation one has to
>>> > > compile
>>> > > >> the code with the corresponding compiler.
>>> > > >>
>>> > > >> Currently, in MXNet CMake build scripts a bundled version of llvm
>>> > OpenMP
>>> > > >> is
>>> > > >> used ([1] and [2]) to replace the OpenMP library supplied by the
>>> > > compiler.
>>> > > >>
>>> > > >> I will quote here the README from the MKL-DNN (Intel(R) Math
>>> Kernel
>>> > > >> Library
>>> > > >> for Deep Neural Networks):
>>> > > >>
>>> > > >> "Intel MKL-DNN uses OpenMP* for parallelism and requires an OpenMP
>>> > > runtime
>>> > > >> library to work. As different OpenMP runtimes may not be binary
>>> > > compatible
>>> > > >> it's important to ensure that only one OpenMP runtime is used
>>> > throughout
>>> > > >> the application. Having more than one OpenMP runtime initialized
>>> may
>>> > > lead
>>> > > >> to undefined behavior resulting in incorrect results or crashes."
>>> [3]
>>> > > >>
>>> > > >> And:
>>> > > >>
>>> > > >> "Using GNU compiler with -fopenmp and -liomp5 options will link
>>> the
>>> > > >> application with both Intel and GNU OpenMP runtime libraries. This
>>> > will
>>> > > >> lead to undefined behavior of the application." [4]
>>> > > >>
>>> > > >> As can be seen from ldd for MXNet:
>>> > > >>
>>> > > >> $ ldd build/tests/mxnet_unit_tests | grep omp
>>> > > >>     libomp.so =>
>>> > /.../mxnet/build/3rdparty/openmp/runtime/src/libomp.so
>>> > > >> (0x00007f697bc55000)
>>> > > >>     libgomp.so.1 => /usr/lib/x86_64-linux-gnu/libgomp.so.1
>>> > > >> (0x00007f69660cd000)
>>> > > >>
>>> > > >> *Performance*
>>> > > >>
>>> > > >> The only performance data related to OpenMP in MXNet I was able to
>>> > find
>>> > > is
>>> > > >> here:
>>> > > >>
>>> > > >>
>>> > >
>>> >
>>> https://github.com/apache/incubator-mxnet/issues/9744#issuecomment-367711172
>>> > > >>
>>> > > >> Which in my understanding is testing imact of different
>>> environment
>>> > > >> variables for the same setup (using same bundled OpenMP library).
>>> > > >>
>>> > > >> The libraries may differ in implementation and the Thread Affinity
>>> > > >> Interface [5] may have significant impact on performance.
>>> > > >>
>>> > > >> All compliers support it:
>>> > > >>
>>> > > >> - Clang / KMP_AFFINITY
>>> > > >>
>>> > > >>
>>> > >
>>> >
>>> https://github.com/clang-ykt/openmp/blob/master/runtime/src/kmp_affinity.cpp
>>> > > >>
>>> > > >> - GCC / GOMP_CPU_AFFINITY
>>> > > >>
>>> > > >>
>>> > >
>>> >
>>> https://gcc.gnu.org/onlinedocs/gcc-4.7.1/libgomp/GOMP_005fCPU_005fAFFINITY.html
>>> > > >>
>>> > > >> - Intel / KMP_AFFINITY
>>> > > >>
>>> > > >>
>>> > >
>>> >
>>> https://software.intel.com/en-us/node/522689#6E24682E-F411-4AE3-A04D-ECD81C7008D1
>>> > > >>
>>> > > >> - Visual Studio / SetThreadAffinityMask
>>> > > >>
>>> > > >>
>>> > >
>>> >
>>> https://docs.microsoft.com/en-us/windows/desktop/api/winbase/nf-winbase-setthreadaffinitymask
>>> > > >>
>>> > > >> *Issues*
>>> > > >>
>>> > > >> Failed OpenMP assertion when loading MXNet compiled with DEBUG=1
>>> > > >> https://github.com/apache/incubator-mxnet/issues/10856
>>> > > >>
>>> > > >> libomp.so dependency (need REAL fix)
>>> > > >> https://github.com/apache/incubator-mxnet/issues/11417
>>> > > >>
>>> > > >> mxnet-mkl (v0.12.0) crash when using (conda-installed) numpy with
>>> MKL
>>> > > >> https://github.com/apache/incubator-mxnet/issues/8532
>>> > > >>
>>> > > >> Performance regression when OMP_NUM_THREADS environment variable
>>> is
>>> > not
>>> > > >> set
>>> > > >> https://github.com/apache/incubator-mxnet/issues/9744
>>> > > >>
>>> > > >> Poor concat CPU performance on CUDA builds
>>> > > >> https://github.com/apache/incubator-mxnet/issues/11905
>>> > > >>
>>> > > >> I would appreciate hearing your thoughts.
>>> > > >>
>>> > > >>
>>> > > >> Best
>>> > > >> Anton
>>> > > >>
>>> > > >> [1]
>>> > > >>
>>> > > >>
>>> > >
>>> >
>>> https://github.com/apache/incubator-mxnet/blob/master/CMakeLists.txt#L400-L405
>>> > > >> [2]
>>> https://github.com/apache/incubator-mxnet/tree/master/3rdparty
>>> > > >> [3]
>>> https://github.com/intel/mkl-dnn/blame/master/README.md#L261-L265
>>> > > >> [4]
>>> https://github.com/intel/mkl-dnn/blame/master/README.md#L278-L280
>>> > > >> [5] https://software.intel.com/en-us/node/522691
>>> > > >>
>>> > > >
>>> > >
>>> >
>>>
>>

Re: [Discussion] Remove bundled llvm OpenMP

Posted by Anton Chernov <me...@gmail.com>.
Another thing to take into consideration:

All Python artefacts that are created (for PyPi) are built with make and
therefore do not use the bundled OpenMP library.

One prerequisite for the switch to CMake is the approval and merging of
the mentioned PR:

https://github.com/apache/incubator-mxnet/pull/12160

If there are no other objections, I kindly ask Chris Olivier to remove his
'requesting changes' veto on it to unblock the CMake overhaul work.

Thank you.

Best
Anton

Thu, 22 Nov 2018 at 17:11, Anton Chernov <me...@gmail.com>:

>
> Thank you for you answer, Chris.
>
> > The whole “mixing omp libraries” is something that occurs in production
> every day and certainly in everything that uses mkl.
>
> I'm afraid this statement is wrong. Intel MKL-DNN strictly ensures that
> this mixture is not happening:
>
> "Intel MKL-DNN uses OpenMP* for parallelism and requires an OpenMP runtime
> library to work. As different OpenMP runtimes may not be binary compatible
> it's important to ensure that only one OpenMP runtime is used throughout
> the application. Having more than one OpenMP runtime initialized may lead
> to undefined behavior resulting in incorrect results or crashes." [1]
>
> That is why 2 different MKLML libraries are provided:
>
> lib/libmklml_gnu.so  | Intel MKL small library for GNU* OpenMP runtime
> lib/libmklml_intel.so | Intel MKL small library for Intel(R) OpenMP runtime
>
> > is the suggestion that libiomp be removed from mkl?
>
> That is certainly not my suggestion.
>
> > have you spoken with intel? have you consulted Intel at all?
>
> Yes, I have asked for comments on the issue.
>
> > “hard to debug random crash”. you’re seeing an assertion which is
> probably ...
>
> I'm seeing the result of undefined behaviour. And I want to put emphasis
> on the following statement:
>
> I disregards of whether there is a particular reason for the assert - it
> is a result of behaviour that should not happen. There are valid ways how
> to use llvm OpenMP in MXNet and the current way is not one of them.
>
> > The lack of root-causing the problem and knee-jerk solution here makes me
> uncomfortable.
>
> I hope that my efforts highlighting the problems reach you to mitigate
> your uncomfort.
>
> > if you want to see performance differences there’s an environment
> variable
> you can set in the mxnet omp tuning code that will print overhead and
> execution times for the current omp library.
>
> I don't want to see performance differences in the current OpenMP library.
> I want to remove the current OpenMP library and use the one provided by the
> compiler.
>
>
>
> Best
> Anton
>
> [1] https://github.com/intel/mkl-dnn/blame/master/README.md#L261-L265
>
> чт, 22 нояб. 2018 г. в 16:50, Chris Olivier <cj...@apache.org>:
>
>> Do you not work on CI mostly? My apologies for thinking that was some sort
>> of team effort between you and a few others that were passionate about CI
>> keeping the CI system running smoothly.
>>
>> You have source code, you have the line the assertion is on. If you can’t
>> describe what’s going wrong that causes the assertion, then I don’t really
>> have anything more to add to this conversation beyond what’s below:
>>
>> The whole “mixing omp libraries” is something that occurs in production
>> every day and certainly in everything that uses mkl.  It may occasionally
>> cause problems for some edge cases when there is super-complex linking
>> strategies and dynamic loading.  But this is not one of those edge cases.
>> Mostly blaming this is a red herring for other thread-related problems and
>> people switch omp library and the timing of their code changes and they
>> stop seeing the problem. I’ve spent my entire career doing heavily
>> multiphreaded c++ development and i’ve seen that a million times.  is the
>> suggestion that libiomp be removed from mkl? have you spoken with intel?
>> have you consulted Intel at all?
>>
>> and what you are seeing isn’t some “hard to debug random crash”. you’re
>> seeing an assertion which is probably related to omp trying to create a
>> thread pool after a fork and something was done in the mxnet code to make
>> that sketchy to do. I’d suggest filing an issue with the llvm openmp just
>> like you’d file with any other not-well-understood behavior in mxnet.
>>
>> The lack of root-causing the problem and knee-jerk solution here makes me
>> uncomfortable.
>>
>> if you want to see performance differences there’s an environment variable
>> you can set in the mxnet omp tuning code that will print overhead and
>> execution times for the current omp library.
>>
>>
>>
>>
>>
>>
>>
>> On Thu, Nov 22, 2018 at 7:12 AM Anton Chernov <me...@gmail.com>
>> wrote:
>>
>> > Hi Chris,
>> >
>> > Thank you for your answer. If you have noticed the initial email comes
>> from
>> > me, Anton Chernov (@lebeg on Github) and thus the proposal is not from
>> any
>> > 'Ci' team that you've mentioned, but from me personally.
>> >
>> > You are writing:
>> >
>> > > someone is doing something unhealthy when they fork ...
>> >
>> > I'm missing any context to understand what you mean.
>> >
>> > > we get a lot of performance gain from OMP ...
>> >
>> > There is no data that would prove this statement and therefore it is a
>> > random guess.
>> >
>> > > in many months, no investigation has occurred as to WHY the assertion
>> is
>> > failing.
>> >
>> > The investigation has concluded that this is happening due to undefined
>> > behaviour which is, in my opinion, a suffient answer that does not
>> require
>> > to go any deeper.
>> >
>> > > the pr is vetoed until such a time that the actual root cause of the
>> > problem is known.
>> >
>> > And considering the statements above there is no valid reason to veto
>> the
>> > PR.
>> >
>> >
>> > Best
>> > Anton
>> >
>> > чт, 22 нояб. 2018 г. в 15:38, Chris Olivier <cj...@gmail.com>:
>> >
>> > > 3x less overhead*
>> > >
>> > > On Thu, Nov 22, 2018 at 6:25 AM Chris Olivier <cj...@gmail.com>
>> > > wrote:
>> > >
>> > > > someone is doing something unhealthy when they fork, which is
>> causing
>> > an
>> > > > assertion in the openmp library. the same assertion that would fire
>> in
>> > > mkl,
>> > > > which is linked to libiomp5 (exact same omp library). this is new
>> > > behavior
>> > > > and most likely due to an error or suboptimal approach in the
>> forking
>> > > logic
>> > > > in mxnet.
>> > > >
>> > > > in order to circumvent the assert, the Ci team is proposing to
>> remove
>> > the
>> > > > library completely which is equivalent to cutting off your leg to
>> make
>> > > the
>> > > > pain from stubbing your toe go away.
>> > > >
>> > > > we get a lot of performance gain from OMP. is has about a 1/3 less
>> > > > overhead for entering omp regions and also supports omp regions
>> after a
>> > > > fork, which libgomp does not.
>> > > >
>> > > > in many months, no investigation has occurred as to WHY the
>> assertion
>> > is
>> > > > failing.
>> > > >
>> > > > the pr is vetoed until such a time that the actual root cause of the
>> > > > problem is known.
>> > > >
>> > > >
>> > > > thanks,
>> > > >
>> > > > -Chris.
>> > > >
>> > > >
>> > > >
>> > > >
>> > > > On Thu, Nov 22, 2018 at 4:36 AM Anton Chernov <me...@gmail.com>
>> > > wrote:
>> > > >
>> > > >> Dear MXNet community,
>> > > >>
>> > > >> I would like to drive attention to an important issue that is
>> present
>> > in
>> > > >> the MXNet CMake build: usage of bundled llvm OpenMP library.
>> > > >>
>> > > >> I have opened a PR to remove it:
>> > > >> https://github.com/apache/incubator-mxnet/pull/12160
>> > > >>
>> > > >> The issue was closed, but I am strong in my oppinion that it's the
>> > right
>> > > >> thing to do.
>> > > >>
>> > > >> *Background*
>> > > >> If you want to use OpenMP pragmas in your code for parallelization
>> you
>> > > >> would supply a special flag to the compiler:
>> > > >>
>> > > >> - Clang / -fopenmp
>> > > >> https://openmp.llvm.org/
>> > > >>
>> > > >> - GCC / -fopenmp
>> > > >> https://gcc.gnu.org/onlinedocs/libgomp/Enabling-OpenMP.html
>> > > >>
>> > > >> - Intel / [Q]openmp
>> > > >>
>> > > >>
>> > >
>> >
>> https://software.intel.com/en-us/node/522689#6E24682E-F411-4AE3-A04D-ECD81C7008D1
>> > > >>
>> > > >> - Visual Studio: /openmp (Enable OpenMP 2.0 Support)
>> > > >> https://msdn.microsoft.com/en-us/library/tt15eb9t.aspx
>> > > >>
>> > > >> Each of the compilers would enable the '#pragma omp' directive
>> during
>> > > >> C/C++
>> > > >> compilation and arrange for automatic linking of the OpenMP runtime
>> > > >> library
>> > > >> supplied by each complier separately.
>> > > >>
>> > > >> Thus, to use the advantages of an OpenMP implementation one has to
>> > > compile
>> > > >> the code with the corresponding compiler.
>> > > >>
>> > > >> Currently, in MXNet CMake build scripts a bundled version of llvm
>> > OpenMP
>> > > >> is
>> > > >> used ([1] and [2]) to replace the OpenMP library supplied by the
>> > > compiler.
>> > > >>
>> > > >> I will quote here the README from the MKL-DNN (Intel(R) Math Kernel
>> > > >> Library
>> > > >> for Deep Neural Networks):
>> > > >>
>> > > >> "Intel MKL-DNN uses OpenMP* for parallelism and requires an OpenMP
>> > > runtime
>> > > >> library to work. As different OpenMP runtimes may not be binary
>> > > compatible
>> > > >> it's important to ensure that only one OpenMP runtime is used
>> > throughout
>> > > >> the application. Having more than one OpenMP runtime initialized
>> may
>> > > lead
>> > > >> to undefined behavior resulting in incorrect results or crashes."
>> [3]
>> > > >>
>> > > >> And:
>> > > >>
>> > > >> "Using GNU compiler with -fopenmp and -liomp5 options will link the
>> > > >> application with both Intel and GNU OpenMP runtime libraries. This
>> > will
>> > > >> lead to undefined behavior of the application." [4]
>> > > >>
>> > > >> As can be seen from ldd for MXNet:
>> > > >>
>> > > >> $ ldd build/tests/mxnet_unit_tests | grep omp
>> > > >>     libomp.so =>
>> > /.../mxnet/build/3rdparty/openmp/runtime/src/libomp.so
>> > > >> (0x00007f697bc55000)
>> > > >>     libgomp.so.1 => /usr/lib/x86_64-linux-gnu/libgomp.so.1
>> > > >> (0x00007f69660cd000)
>> > > >>
>> > > >> *Performance*
>> > > >>
>> > > >> The only performance data related to OpenMP in MXNet I was able to
>> > find
>> > > is
>> > > >> here:
>> > > >>
>> > > >>
>> > >
>> >
>> https://github.com/apache/incubator-mxnet/issues/9744#issuecomment-367711172
>> > > >>
>> > > >> Which in my understanding is testing imact of different environment
>> > > >> variables for the same setup (using same bundled OpenMP library).
>> > > >>
>> > > >> The libraries may differ in implementation and the Thread Affinity
>> > > >> Interface [5] may have significant impact on performance.
>> > > >>
>> > > >> All compliers support it:
>> > > >>
>> > > >> - Clang / KMP_AFFINITY
>> > > >>
>> > > >>
>> > >
>> >
>> https://github.com/clang-ykt/openmp/blob/master/runtime/src/kmp_affinity.cpp
>> > > >>
>> > > >> - GCC / GOMP_CPU_AFFINITY
>> > > >>
>> > > >>
>> > >
>> >
>> https://gcc.gnu.org/onlinedocs/gcc-4.7.1/libgomp/GOMP_005fCPU_005fAFFINITY.html
>> > > >>
>> > > >> - Intel / KMP_AFFINITY
>> > > >>
>> > > >>
>> > >
>> >
>> https://software.intel.com/en-us/node/522689#6E24682E-F411-4AE3-A04D-ECD81C7008D1
>> > > >>
>> > > >> - Visual Studio / SetThreadAffinityMask
>> > > >>
>> > > >>
>> > >
>> >
>> https://docs.microsoft.com/en-us/windows/desktop/api/winbase/nf-winbase-setthreadaffinitymask
>> > > >>
>> > > >> *Issues*
>> > > >>
>> > > >> Failed OpenMP assertion when loading MXNet compiled with DEBUG=1
>> > > >> https://github.com/apache/incubator-mxnet/issues/10856
>> > > >>
>> > > >> libomp.so dependency (need REAL fix)
>> > > >> https://github.com/apache/incubator-mxnet/issues/11417
>> > > >>
>> > > >> mxnet-mkl (v0.12.0) crash when using (conda-installed) numpy with
>> MKL
>> > > >> https://github.com/apache/incubator-mxnet/issues/8532
>> > > >>
>> > > >> Performance regression when OMP_NUM_THREADS environment variable is
>> > not
>> > > >> set
>> > > >> https://github.com/apache/incubator-mxnet/issues/9744
>> > > >>
>> > > >> Poor concat CPU performance on CUDA builds
>> > > >> https://github.com/apache/incubator-mxnet/issues/11905
>> > > >>
>> > > >> I would appreciate hearing your thoughts.
>> > > >>
>> > > >>
>> > > >> Best
>> > > >> Anton
>> > > >>
>> > > >> [1]
>> > > >>
>> > > >>
>> > >
>> >
>> https://github.com/apache/incubator-mxnet/blob/master/CMakeLists.txt#L400-L405
>> > > >> [2] https://github.com/apache/incubator-mxnet/tree/master/3rdparty
>> > > >> [3]
>> https://github.com/intel/mkl-dnn/blame/master/README.md#L261-L265
>> > > >> [4]
>> https://github.com/intel/mkl-dnn/blame/master/README.md#L278-L280
>> > > >> [5] https://software.intel.com/en-us/node/522691
>> > > >>
>> > > >
>> > >
>> >
>>
>

Re: [Discussion] Remove bundled llvm OpenMP

Posted by Anton Chernov <me...@gmail.com>.
Thank you for your answer, Chris.

> The whole “mixing omp libraries” is something that occurs in production
every day and certainly in everything that uses mkl.

I'm afraid this statement is wrong. Intel MKL-DNN strictly ensures that
this mixing does not happen:

"Intel MKL-DNN uses OpenMP* for parallelism and requires an OpenMP runtime
library to work. As different OpenMP runtimes may not be binary compatible
it's important to ensure that only one OpenMP runtime is used throughout
the application. Having more than one OpenMP runtime initialized may lead
to undefined behavior resulting in incorrect results or crashes." [1]

That is why two different MKLML libraries are provided:

lib/libmklml_gnu.so   | Intel MKL small library for GNU* OpenMP runtime
lib/libmklml_intel.so | Intel MKL small library for Intel(R) OpenMP runtime

> is the suggestion that libiomp be removed from mkl?

That is certainly not my suggestion.

> have you spoken with intel? have you consulted Intel at all?

Yes, I have asked for comments on the issue.

> “hard to debug random crash”. you’re seeing an assertion which is
probably ...

I'm seeing the result of undefined behaviour, and I want to put emphasis on
the following statement:

Regardless of whether there is a particular reason for the assert, it is
the result of behaviour that should not happen. There are valid ways to use
llvm OpenMP in MXNet, and the current way is not one of them.

> The lack of root-causing the problem and knee-jerk solution here makes me
uncomfortable.

I hope that my efforts in highlighting the problems help to ease your
discomfort.

> if you want to see performance differences there’s an environment variable
you can set in the mxnet omp tuning code that will print overhead and
execution times for the current omp library.

I don't want to see performance differences within the current OpenMP
library. I want to remove the bundled OpenMP library and use the one
provided by the compiler.
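
For illustration, a minimal sketch (not MXNet code) of what "use the one
provided by the compiler" means in practice: the same source built with
g++ -fopenmp, clang++ -fopenmp or icc -qopenmp is automatically linked
against that compiler's own runtime (libgomp, libomp or libiomp5
respectively), which can be checked with ldd on the resulting binary:

    // omp_hello.cpp -- a trivial OpenMP program; whichever compiler
    // builds it also supplies and links its own OpenMP runtime.
    #include <omp.h>
    #include <cstdio>

    int main() {
      #pragma omp parallel
      {
        std::printf("hello from thread %d of %d\n",
                    omp_get_thread_num(), omp_get_num_threads());
      }
      return 0;
    }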



Best
Anton

[1] https://github.com/intel/mkl-dnn/blame/master/README.md#L261-L265

Thu, 22 Nov 2018 at 16:50, Chris Olivier <cj...@apache.org>:

> Do you not work on CI mostly? My apologies for thinking that was some sort
> of team effort between you and a few others that were passionate about CI
> keeping the CI system running smoothly.
>
> You have source code, you have the line the assertion is on. If you can’t
> describe what’s going wrong that causes the assertion, then I don’t really
> have anything more to add to this conversation beyond what’s below:
>
> The whole “mixing omp libraries” is something that occurs in production
> every day and certainly in everything that uses mkl.  It may occasionally
> cause problems for some edge cases when there is super-complex linking
> strategies and dynamic loading.  But this is not one of those edge cases.
> Mostly blaming this is a red herring for other thread-related problems and
> people switch omp library and the timing of their code changes and they
> stop seeing the problem. I’ve spent my entire career doing heavily
> multiphreaded c++ development and i’ve seen that a million times.  is the
> suggestion that libiomp be removed from mkl? have you spoken with intel?
> have you consulted Intel at all?
>
> and what you are seeing isn’t some “hard to debug random crash”. you’re
> seeing an assertion which is probably related to omp trying to create a
> thread pool after a fork and something was done in the mxnet code to make
> that sketchy to do. I’d suggest filing an issue with the llvm openmp just
> like you’d file with any other not-well-understood behavior in mxnet.
>
> The lack of root-causing the problem and knee-jerk solution here makes me
> uncomfortable.
>
> if you want to see performance differences there’s an environment variable
> you can set in the mxnet omp tuning code that will print overhead and
> execution times for the current omp library.
>
>
>
>
>
>
>
> On Thu, Nov 22, 2018 at 7:12 AM Anton Chernov <me...@gmail.com> wrote:
>
> > Hi Chris,
> >
> > Thank you for your answer. If you have noticed the initial email comes
> from
> > me, Anton Chernov (@lebeg on Github) and thus the proposal is not from
> any
> > 'Ci' team that you've mentioned, but from me personally.
> >
> > You are writing:
> >
> > > someone is doing something unhealthy when they fork ...
> >
> > I'm missing any context to understand what you mean.
> >
> > > we get a lot of performance gain from OMP ...
> >
> > There is no data that would prove this statement and therefore it is a
> > random guess.
> >
> > > in many months, no investigation has occurred as to WHY the assertion
> is
> > failing.
> >
> > The investigation has concluded that this is happening due to undefined
> > behaviour which is, in my opinion, a suffient answer that does not
> require
> > to go any deeper.
> >
> > > the pr is vetoed until such a time that the actual root cause of the
> > problem is known.
> >
> > And considering the statements above there is no valid reason to veto the
> > PR.
> >
> >
> > Best
> > Anton
> >
> > чт, 22 нояб. 2018 г. в 15:38, Chris Olivier <cj...@gmail.com>:
> >
> > > 3x less overhead*
> > >
> > > On Thu, Nov 22, 2018 at 6:25 AM Chris Olivier <cj...@gmail.com>
> > > wrote:
> > >
> > > > someone is doing something unhealthy when they fork, which is causing
> > an
> > > > assertion in the openmp library. the same assertion that would fire
> in
> > > mkl,
> > > > which is linked to libiomp5 (exact same omp library). this is new
> > > behavior
> > > > and most likely due to an error or suboptimal approach in the forking
> > > logic
> > > > in mxnet.
> > > >
> > > > in order to circumvent the assert, the Ci team is proposing to remove
> > the
> > > > library completely which is equivalent to cutting off your leg to
> make
> > > the
> > > > pain from stubbing your toe go away.
> > > >
> > > > we get a lot of performance gain from OMP. is has about a 1/3 less
> > > > overhead for entering omp regions and also supports omp regions
> after a
> > > > fork, which libgomp does not.
> > > >
> > > > in many months, no investigation has occurred as to WHY the assertion
> > is
> > > > failing.
> > > >
> > > > the pr is vetoed until such a time that the actual root cause of the
> > > > problem is known.
> > > >
> > > >
> > > > thanks,
> > > >
> > > > -Chris.
> > > >
> > > >
> > > >
> > > >
> > > > On Thu, Nov 22, 2018 at 4:36 AM Anton Chernov <me...@gmail.com>
> > > wrote:
> > > >
> > > >> Dear MXNet community,
> > > >>
> > > >> I would like to drive attention to an important issue that is
> present
> > in
> > > >> the MXNet CMake build: usage of bundled llvm OpenMP library.
> > > >>
> > > >> I have opened a PR to remove it:
> > > >> https://github.com/apache/incubator-mxnet/pull/12160
> > > >>
> > > >> The issue was closed, but I am strong in my oppinion that it's the
> > right
> > > >> thing to do.
> > > >>
> > > >> *Background*
> > > >> If you want to use OpenMP pragmas in your code for parallelization
> you
> > > >> would supply a special flag to the compiler:
> > > >>
> > > >> - Clang / -fopenmp
> > > >> https://openmp.llvm.org/
> > > >>
> > > >> - GCC / -fopenmp
> > > >> https://gcc.gnu.org/onlinedocs/libgomp/Enabling-OpenMP.html
> > > >>
> > > >> - Intel / [Q]openmp
> > > >>
> > > >>
> > >
> >
> https://software.intel.com/en-us/node/522689#6E24682E-F411-4AE3-A04D-ECD81C7008D1
> > > >>
> > > >> - Visual Studio: /openmp (Enable OpenMP 2.0 Support)
> > > >> https://msdn.microsoft.com/en-us/library/tt15eb9t.aspx
> > > >>
> > > >> Each of the compilers would enable the '#pragma omp' directive
> during
> > > >> C/C++
> > > >> compilation and arrange for automatic linking of the OpenMP runtime
> > > >> library
> > > >> supplied by each complier separately.
> > > >>
> > > >> Thus, to use the advantages of an OpenMP implementation one has to
> > > compile
> > > >> the code with the corresponding compiler.
> > > >>
> > > >> Currently, in MXNet CMake build scripts a bundled version of llvm
> > OpenMP
> > > >> is
> > > >> used ([1] and [2]) to replace the OpenMP library supplied by the
> > > compiler.
> > > >>
> > > >> I will quote here the README from the MKL-DNN (Intel(R) Math Kernel
> > > >> Library
> > > >> for Deep Neural Networks):
> > > >>
> > > >> "Intel MKL-DNN uses OpenMP* for parallelism and requires an OpenMP
> > > runtime
> > > >> library to work. As different OpenMP runtimes may not be binary
> > > compatible
> > > >> it's important to ensure that only one OpenMP runtime is used
> > throughout
> > > >> the application. Having more than one OpenMP runtime initialized may
> > > lead
> > > >> to undefined behavior resulting in incorrect results or crashes."
> [3]
> > > >>
> > > >> And:
> > > >>
> > > >> "Using GNU compiler with -fopenmp and -liomp5 options will link the
> > > >> application with both Intel and GNU OpenMP runtime libraries. This
> > will
> > > >> lead to undefined behavior of the application." [4]
> > > >>
> > > >> As can be seen from ldd for MXNet:
> > > >>
> > > >> $ ldd build/tests/mxnet_unit_tests | grep omp
> > > >>     libomp.so =>
> > /.../mxnet/build/3rdparty/openmp/runtime/src/libomp.so
> > > >> (0x00007f697bc55000)
> > > >>     libgomp.so.1 => /usr/lib/x86_64-linux-gnu/libgomp.so.1
> > > >> (0x00007f69660cd000)
> > > >>
> > > >> *Performance*
> > > >>
> > > >> The only performance data related to OpenMP in MXNet I was able to
> > find
> > > is
> > > >> here:
> > > >>
> > > >>
> > >
> >
> https://github.com/apache/incubator-mxnet/issues/9744#issuecomment-367711172
> > > >>
> > > >> Which in my understanding is testing imact of different environment
> > > >> variables for the same setup (using same bundled OpenMP library).
> > > >>
> > > >> The libraries may differ in implementation and the Thread Affinity
> > > >> Interface [5] may have significant impact on performance.
> > > >>
> > > >> All compliers support it:
> > > >>
> > > >> - Clang / KMP_AFFINITY
> > > >>
> > > >>
> > >
> >
> https://github.com/clang-ykt/openmp/blob/master/runtime/src/kmp_affinity.cpp
> > > >>
> > > >> - GCC / GOMP_CPU_AFFINITY
> > > >>
> > > >>
> > >
> >
> https://gcc.gnu.org/onlinedocs/gcc-4.7.1/libgomp/GOMP_005fCPU_005fAFFINITY.html
> > > >>
> > > >> - Intel / KMP_AFFINITY
> > > >>
> > > >>
> > >
> >
> https://software.intel.com/en-us/node/522689#6E24682E-F411-4AE3-A04D-ECD81C7008D1
> > > >>
> > > >> - Visual Studio / SetThreadAffinityMask
> > > >>
> > > >>
> > >
> >
> https://docs.microsoft.com/en-us/windows/desktop/api/winbase/nf-winbase-setthreadaffinitymask
> > > >>
> > > >> *Issues*
> > > >>
> > > >> Failed OpenMP assertion when loading MXNet compiled with DEBUG=1
> > > >> https://github.com/apache/incubator-mxnet/issues/10856
> > > >>
> > > >> libomp.so dependency (need REAL fix)
> > > >> https://github.com/apache/incubator-mxnet/issues/11417
> > > >>
> > > >> mxnet-mkl (v0.12.0) crash when using (conda-installed) numpy with
> MKL
> > > >> https://github.com/apache/incubator-mxnet/issues/8532
> > > >>
> > > >> Performance regression when OMP_NUM_THREADS environment variable is
> > not
> > > >> set
> > > >> https://github.com/apache/incubator-mxnet/issues/9744
> > > >>
> > > >> Poor concat CPU performance on CUDA builds
> > > >> https://github.com/apache/incubator-mxnet/issues/11905
> > > >>
> > > >> I would appreciate hearing your thoughts.
> > > >>
> > > >>
> > > >> Best
> > > >> Anton
> > > >>
> > > >> [1]
> > > >>
> > > >>
> > >
> >
> https://github.com/apache/incubator-mxnet/blob/master/CMakeLists.txt#L400-L405
> > > >> [2] https://github.com/apache/incubator-mxnet/tree/master/3rdparty
> > > >> [3]
> https://github.com/intel/mkl-dnn/blame/master/README.md#L261-L265
> > > >> [4]
> https://github.com/intel/mkl-dnn/blame/master/README.md#L278-L280
> > > >> [5] https://software.intel.com/en-us/node/522691
> > > >>
> > > >
> > >
> >
>

Re: [Discussion] Remove bundled llvm OpenMP

Posted by Chris Olivier <cj...@apache.org>.
Do you not work on CI mostly? My apologies for thinking that this was some
sort of team effort between you and a few others who are passionate about
keeping the CI system running smoothly.

You have the source code, and you have the line the assertion is on. If you
can’t describe what’s going wrong that causes the assertion, then I don’t
really have anything more to add to this conversation beyond what’s below:

The whole “mixing omp libraries” issue is something that occurs in production
every day and certainly in everything that uses mkl.  It may occasionally
cause problems for some edge cases when there are super-complex linking
strategies and dynamic loading.  But this is not one of those edge cases.
Mostly, blaming this is a red herring for other thread-related problems:
people switch the omp library, the timing of their code changes, and they
stop seeing the problem. I’ve spent my entire career doing heavily
multithreaded c++ development and I’ve seen that a million times.  Is the
suggestion that libiomp be removed from mkl? Have you spoken with Intel?
Have you consulted Intel at all?

And what you are seeing isn’t some “hard to debug random crash”. You’re
seeing an assertion which is probably related to omp trying to create a
thread pool after a fork, and something was done in the mxnet code to make
that sketchy to do. I’d suggest filing an issue with the llvm openmp project,
just like you’d file one for any other not-well-understood behavior in mxnet.
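
To make the scenario concrete, here is a minimal sketch (POSIX-only and not
taken from MXNet) of the pattern being described: the parent initializes the
OpenMP runtime, forks, and the child enters an OpenMP region again. Depending
on the runtime (libgomp vs. the llvm/Intel one) and on whether its debug
assertions are enabled, the region in the child can work, hang, or trip an
internal assert:

    // fork_omp.cpp -- build e.g. with: g++ -fopenmp fork_omp.cpp
    #include <omp.h>
    #include <sys/wait.h>
    #include <unistd.h>
    #include <cstdio>

    int main() {
      // Parent touches OpenMP first, so the runtime's thread pool
      // already exists at the time of the fork.
      #pragma omp parallel
      {
        if (omp_get_thread_num() == 0)
          std::printf("parent sees %d threads\n", omp_get_num_threads());
      }

      pid_t pid = fork();
      if (pid == 0) {
        // Only the forking thread survives in the child; the OpenMP
        // runtime has to rebuild its pool when this region is entered.
        #pragma omp parallel
        {
          if (omp_get_thread_num() == 0)
            std::printf("child sees %d threads\n", omp_get_num_threads());
        }
        _exit(0);  // _exit avoids re-flushing stdio buffers inherited from the parent
      }
      int status = 0;
      waitpid(pid, &status, 0);
      return 0;
    }

Running something like this against each candidate runtime (including a debug
build of the runtime) would show whether the fork path is really the trigger.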

The lack of root-causing of the problem and the knee-jerk solution here make
me uncomfortable.

If you want to see performance differences, there’s an environment variable
you can set in the mxnet omp tuning code that will print overhead and
execution times for the current omp library.
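
Purely as an illustration (this is not the mxnet tuning code, and no
particular environment variable is assumed), the region-entry overhead being
referred to can also be approximated by timing an almost-empty parallel
region with whichever runtime the binary is linked against:

    // omp_overhead.cpp -- build once per runtime, e.g. with
    // g++ -fopenmp (libgomp) and clang++ -fopenmp (llvm libomp).
    #include <omp.h>
    #include <chrono>
    #include <cstdio>

    int main() {
      constexpr int iters = 100000;
      int counter = 0;

      // Warm up so thread-pool creation is not part of the measurement.
      #pragma omp parallel
      {
        #pragma omp atomic
        ++counter;
      }

      auto t0 = std::chrono::steady_clock::now();
      for (int i = 0; i < iters; ++i) {
        // Nearly empty region: the cost is dominated by entering and
        // leaving the parallel region (fork/join in the runtime).
        #pragma omp parallel
        {
          #pragma omp atomic
          ++counter;
        }
      }
      auto t1 = std::chrono::steady_clock::now();

      double ns = std::chrono::duration_cast<std::chrono::nanoseconds>(t1 - t0).count()
                  / static_cast<double>(iters);
      std::printf("threads=%d, avg parallel region enter/exit: %.1f ns (counter=%d)\n",
                  omp_get_max_threads(), ns, counter);
      return 0;
    }

Comparing the number reported by a libgomp build against a llvm libomp build
of the same file would put actual data behind the overhead claim.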







On Thu, Nov 22, 2018 at 7:12 AM Anton Chernov <me...@gmail.com> wrote:

> Hi Chris,
>
> Thank you for your answer. If you have noticed the initial email comes from
> me, Anton Chernov (@lebeg on Github) and thus the proposal is not from any
> 'Ci' team that you've mentioned, but from me personally.
>
> You are writing:
>
> > someone is doing something unhealthy when they fork ...
>
> I'm missing any context to understand what you mean.
>
> > we get a lot of performance gain from OMP ...
>
> There is no data that would prove this statement and therefore it is a
> random guess.
>
> > in many months, no investigation has occurred as to WHY the assertion is
> failing.
>
> The investigation has concluded that this is happening due to undefined
> behaviour which is, in my opinion, a suffient answer that does not require
> to go any deeper.
>
> > the pr is vetoed until such a time that the actual root cause of the
> problem is known.
>
> And considering the statements above there is no valid reason to veto the
> PR.
>
>
> Best
> Anton
>
> чт, 22 нояб. 2018 г. в 15:38, Chris Olivier <cj...@gmail.com>:
>
> > 3x less overhead*
> >
> > On Thu, Nov 22, 2018 at 6:25 AM Chris Olivier <cj...@gmail.com>
> > wrote:
> >
> > > someone is doing something unhealthy when they fork, which is causing
> an
> > > assertion in the openmp library. the same assertion that would fire in
> > mkl,
> > > which is linked to libiomp5 (exact same omp library). this is new
> > behavior
> > > and most likely due to an error or suboptimal approach in the forking
> > logic
> > > in mxnet.
> > >
> > > in order to circumvent the assert, the Ci team is proposing to remove
> the
> > > library completely which is equivalent to cutting off your leg to make
> > the
> > > pain from stubbing your toe go away.
> > >
> > > we get a lot of performance gain from OMP. is has about a 1/3 less
> > > overhead for entering omp regions and also supports omp regions after a
> > > fork, which libgomp does not.
> > >
> > > in many months, no investigation has occurred as to WHY the assertion
> is
> > > failing.
> > >
> > > the pr is vetoed until such a time that the actual root cause of the
> > > problem is known.
> > >
> > >
> > > thanks,
> > >
> > > -Chris.
> > >
> > >
> > >
> > >
> > > On Thu, Nov 22, 2018 at 4:36 AM Anton Chernov <me...@gmail.com>
> > wrote:
> > >
> > >> Dear MXNet community,
> > >>
> > >> I would like to drive attention to an important issue that is present
> in
> > >> the MXNet CMake build: usage of bundled llvm OpenMP library.
> > >>
> > >> I have opened a PR to remove it:
> > >> https://github.com/apache/incubator-mxnet/pull/12160
> > >>
> > >> The issue was closed, but I am strong in my oppinion that it's the
> right
> > >> thing to do.
> > >>
> > >> *Background*
> > >> If you want to use OpenMP pragmas in your code for parallelization you
> > >> would supply a special flag to the compiler:
> > >>
> > >> - Clang / -fopenmp
> > >> https://openmp.llvm.org/
> > >>
> > >> - GCC / -fopenmp
> > >> https://gcc.gnu.org/onlinedocs/libgomp/Enabling-OpenMP.html
> > >>
> > >> - Intel / [Q]openmp
> > >>
> > >>
> >
> https://software.intel.com/en-us/node/522689#6E24682E-F411-4AE3-A04D-ECD81C7008D1
> > >>
> > >> - Visual Studio: /openmp (Enable OpenMP 2.0 Support)
> > >> https://msdn.microsoft.com/en-us/library/tt15eb9t.aspx
> > >>
> > >> Each of the compilers would enable the '#pragma omp' directive during
> > >> C/C++
> > >> compilation and arrange for automatic linking of the OpenMP runtime
> > >> library
> > >> supplied by each complier separately.
> > >>
> > >> Thus, to use the advantages of an OpenMP implementation one has to
> > compile
> > >> the code with the corresponding compiler.
> > >>
> > >> Currently, in MXNet CMake build scripts a bundled version of llvm
> OpenMP
> > >> is
> > >> used ([1] and [2]) to replace the OpenMP library supplied by the
> > compiler.
> > >>
> > >> I will quote here the README from the MKL-DNN (Intel(R) Math Kernel
> > >> Library
> > >> for Deep Neural Networks):
> > >>
> > >> "Intel MKL-DNN uses OpenMP* for parallelism and requires an OpenMP
> > runtime
> > >> library to work. As different OpenMP runtimes may not be binary
> > compatible
> > >> it's important to ensure that only one OpenMP runtime is used
> throughout
> > >> the application. Having more than one OpenMP runtime initialized may
> > lead
> > >> to undefined behavior resulting in incorrect results or crashes." [3]
> > >>
> > >> And:
> > >>
> > >> "Using GNU compiler with -fopenmp and -liomp5 options will link the
> > >> application with both Intel and GNU OpenMP runtime libraries. This
> will
> > >> lead to undefined behavior of the application." [4]
> > >>
> > >> As can be seen from ldd for MXNet:
> > >>
> > >> $ ldd build/tests/mxnet_unit_tests | grep omp
> > >>     libomp.so =>
> /.../mxnet/build/3rdparty/openmp/runtime/src/libomp.so
> > >> (0x00007f697bc55000)
> > >>     libgomp.so.1 => /usr/lib/x86_64-linux-gnu/libgomp.so.1
> > >> (0x00007f69660cd000)
> > >>
> > >> *Performance*
> > >>
> > >> The only performance data related to OpenMP in MXNet I was able to
> find
> > is
> > >> here:
> > >>
> > >>
> >
> https://github.com/apache/incubator-mxnet/issues/9744#issuecomment-367711172
> > >>
> > >> Which in my understanding is testing imact of different environment
> > >> variables for the same setup (using same bundled OpenMP library).
> > >>
> > >> The libraries may differ in implementation and the Thread Affinity
> > >> Interface [5] may have significant impact on performance.
> > >>
> > >> All compliers support it:
> > >>
> > >> - Clang / KMP_AFFINITY
> > >>
> > >>
> >
> https://github.com/clang-ykt/openmp/blob/master/runtime/src/kmp_affinity.cpp
> > >>
> > >> - GCC / GOMP_CPU_AFFINITY
> > >>
> > >>
> >
> https://gcc.gnu.org/onlinedocs/gcc-4.7.1/libgomp/GOMP_005fCPU_005fAFFINITY.html
> > >>
> > >> - Intel / KMP_AFFINITY
> > >>
> > >>
> >
> https://software.intel.com/en-us/node/522689#6E24682E-F411-4AE3-A04D-ECD81C7008D1
> > >>
> > >> - Visual Studio / SetThreadAffinityMask
> > >>
> > >>
> >
> https://docs.microsoft.com/en-us/windows/desktop/api/winbase/nf-winbase-setthreadaffinitymask
> > >>
> > >> *Issues*
> > >>
> > >> Failed OpenMP assertion when loading MXNet compiled with DEBUG=1
> > >> https://github.com/apache/incubator-mxnet/issues/10856
> > >>
> > >> libomp.so dependency (need REAL fix)
> > >> https://github.com/apache/incubator-mxnet/issues/11417
> > >>
> > >> mxnet-mkl (v0.12.0) crash when using (conda-installed) numpy with MKL
> > >> https://github.com/apache/incubator-mxnet/issues/8532
> > >>
> > >> Performance regression when OMP_NUM_THREADS environment variable is
> not
> > >> set
> > >> https://github.com/apache/incubator-mxnet/issues/9744
> > >>
> > >> Poor concat CPU performance on CUDA builds
> > >> https://github.com/apache/incubator-mxnet/issues/11905
> > >>
> > >> I would appreciate hearing your thoughts.
> > >>
> > >>
> > >> Best
> > >> Anton
> > >>
> > >> [1]
> > >>
> > >>
> >
> https://github.com/apache/incubator-mxnet/blob/master/CMakeLists.txt#L400-L405
> > >> [2] https://github.com/apache/incubator-mxnet/tree/master/3rdparty
> > >> [3] https://github.com/intel/mkl-dnn/blame/master/README.md#L261-L265
> > >> [4] https://github.com/intel/mkl-dnn/blame/master/README.md#L278-L280
> > >> [5] https://software.intel.com/en-us/node/522691
> > >>
> > >
> >
>

Re: [Discussion] Remove bundled llvm OpenMP

Posted by Alfredo Luque <al...@airbnb.com.INVALID>.
The proposal here is not to eliminate the use of OpenMP, but rather to use
the compiler's OpenMP implementation instead of a bundled one. I've been
bitten by issues with multiple linked OpenMP implementations before in
another library, and it was extremely difficult to debug.


It seems to me that tackling the issue with the assert is an orthogonal
concern altogether.

--Alfredo Luque

Software Engineer
Airbnb
Machine Learning Infrastructure

On Thu, Nov 22, 2018 at 10:12 AM Anton Chernov <me...@gmail.com> wrote:

> Hi Chris,
>
> Thank you for your answer. If you have noticed the initial email comes from
> me, Anton Chernov (@lebeg on Github) and thus the proposal is not from any
> 'Ci' team that you've mentioned, but from me personally.
>
> You are writing:
>
> > someone is doing something unhealthy when they fork ...
>
> I'm missing any context to understand what you mean.
>
> > we get a lot of performance gain from OMP ...
>
> There is no data that would prove this statement and therefore it is a
> random guess.
>
> > in many months, no investigation has occurred as to WHY the assertion is
> failing.
>
> The investigation has concluded that this is happening due to undefined
> behaviour which is, in my opinion, a suffient answer that does not require
> to go any deeper.
>
> > the pr is vetoed until such a time that the actual root cause of the
> problem is known.
>
> And considering the statements above there is no valid reason to veto the
> PR.
>
>
> Best
> Anton
>
> чт, 22 нояб. 2018 г. в 15:38, Chris Olivier <cj...@gmail.com>:
>
> > 3x less overhead*
> >
> > On Thu, Nov 22, 2018 at 6:25 AM Chris Olivier <cj...@gmail.com>
> > wrote:
> >
> > > someone is doing something unhealthy when they fork, which is causing
> an
> > > assertion in the openmp library. the same assertion that would fire in
> > mkl,
> > > which is linked to libiomp5 (exact same omp library). this is new
> > behavior
> > > and most likely due to an error or suboptimal approach in the forking
> > logic
> > > in mxnet.
> > >
> > > in order to circumvent the assert, the Ci team is proposing to remove
> the
> > > library completely which is equivalent to cutting off your leg to make
> > the
> > > pain from stubbing your toe go away.
> > >
> > > we get a lot of performance gain from OMP. is has about a 1/3 less
> > > overhead for entering omp regions and also supports omp regions after a
> > > fork, which libgomp does not.
> > >
> > > in many months, no investigation has occurred as to WHY the assertion
> is
> > > failing.
> > >
> > > the pr is vetoed until such a time that the actual root cause of the
> > > problem is known.
> > >
> > >
> > > thanks,
> > >
> > > -Chris.
> > >
> > >
> > >
> > >
> > > On Thu, Nov 22, 2018 at 4:36 AM Anton Chernov <me...@gmail.com>
> > wrote:
> > >
> > >> Dear MXNet community,
> > >>
> > >> I would like to drive attention to an important issue that is present
> in
> > >> the MXNet CMake build: usage of bundled llvm OpenMP library.
> > >>
> > >> I have opened a PR to remove it:
> > >> https://github.com/apache/incubator-mxnet/pull/12160
> > >>
> > >> The issue was closed, but I am strong in my oppinion that it's the
> right
> > >> thing to do.
> > >>
> > >> *Background*
> > >> If you want to use OpenMP pragmas in your code for parallelization you
> > >> would supply a special flag to the compiler:
> > >>
> > >> - Clang / -fopenmp
> > >> https://openmp.llvm.org/
> > >>
> > >> - GCC / -fopenmp
> > >> https://gcc.gnu.org/onlinedocs/libgomp/Enabling-OpenMP.html
> > >>
> > >> - Intel / [Q]openmp
> > >>
> > >>
> >
> https://software.intel.com/en-us/node/522689#6E24682E-F411-4AE3-A04D-ECD81C7008D1
> > >>
> > >> - Visual Studio: /openmp (Enable OpenMP 2.0 Support)
> > >> https://msdn.microsoft.com/en-us/library/tt15eb9t.aspx
> > >>
> > >> Each of the compilers would enable the '#pragma omp' directive during
> > >> C/C++
> > >> compilation and arrange for automatic linking of the OpenMP runtime
> > >> library
> > >> supplied by each complier separately.
> > >>
> > >> Thus, to use the advantages of an OpenMP implementation one has to
> > compile
> > >> the code with the corresponding compiler.
> > >>
> > >> Currently, in MXNet CMake build scripts a bundled version of llvm
> OpenMP
> > >> is
> > >> used ([1] and [2]) to replace the OpenMP library supplied by the
> > compiler.
> > >>
> > >> I will quote here the README from the MKL-DNN (Intel(R) Math Kernel
> > >> Library
> > >> for Deep Neural Networks):
> > >>
> > >> "Intel MKL-DNN uses OpenMP* for parallelism and requires an OpenMP
> > runtime
> > >> library to work. As different OpenMP runtimes may not be binary
> > compatible
> > >> it's important to ensure that only one OpenMP runtime is used
> throughout
> > >> the application. Having more than one OpenMP runtime initialized may
> > lead
> > >> to undefined behavior resulting in incorrect results or crashes." [3]
> > >>
> > >> And:
> > >>
> > >> "Using GNU compiler with -fopenmp and -liomp5 options will link the
> > >> application with both Intel and GNU OpenMP runtime libraries. This
> will
> > >> lead to undefined behavior of the application." [4]
> > >>
> > >> As can be seen from ldd for MXNet:
> > >>
> > >> $ ldd build/tests/mxnet_unit_tests | grep omp
> > >>     libomp.so =>
> /.../mxnet/build/3rdparty/openmp/runtime/src/libomp.so
> > >> (0x00007f697bc55000)
> > >>     libgomp.so.1 => /usr/lib/x86_64-linux-gnu/libgomp.so.1
> > >> (0x00007f69660cd000)
> > >>
> > >> *Performance*
> > >>
> > >> The only performance data related to OpenMP in MXNet I was able to
> find
> > is
> > >> here:
> > >>
> > >>
> >
> https://github.com/apache/incubator-mxnet/issues/9744#issuecomment-367711172
> > >>
> > >> Which in my understanding is testing imact of different environment
> > >> variables for the same setup (using same bundled OpenMP library).
> > >>
> > >> The libraries may differ in implementation and the Thread Affinity
> > >> Interface [5] may have significant impact on performance.
> > >>
> > >> All compliers support it:
> > >>
> > >> - Clang / KMP_AFFINITY
> > >>
> > >>
> >
> https://github.com/clang-ykt/openmp/blob/master/runtime/src/kmp_affinity.cpp
> > >>
> > >> - GCC / GOMP_CPU_AFFINITY
> > >>
> > >>
> >
> https://gcc.gnu.org/onlinedocs/gcc-4.7.1/libgomp/GOMP_005fCPU_005fAFFINITY.html
> > >>
> > >> - Intel / KMP_AFFINITY
> > >>
> > >>
> >
> https://software.intel.com/en-us/node/522689#6E24682E-F411-4AE3-A04D-ECD81C7008D1
> > >>
> > >> - Visual Studio / SetThreadAffinityMask
> > >>
> > >>
> >
> https://docs.microsoft.com/en-us/windows/desktop/api/winbase/nf-winbase-setthreadaffinitymask
> > >>
> > >> *Issues*
> > >>
> > >> Failed OpenMP assertion when loading MXNet compiled with DEBUG=1
> > >> https://github.com/apache/incubator-mxnet/issues/10856
> > >>
> > >> libomp.so dependency (need REAL fix)
> > >> https://github.com/apache/incubator-mxnet/issues/11417
> > >>
> > >> mxnet-mkl (v0.12.0) crash when using (conda-installed) numpy with MKL
> > >> https://github.com/apache/incubator-mxnet/issues/8532
> > >>
> > >> Performance regression when OMP_NUM_THREADS environment variable is
> not
> > >> set
> > >> https://github.com/apache/incubator-mxnet/issues/9744
> > >>
> > >> Poor concat CPU performance on CUDA builds
> > >> https://github.com/apache/incubator-mxnet/issues/11905
> > >>
> > >> I would appreciate hearing your thoughts.
> > >>
> > >>
> > >> Best
> > >> Anton
> > >>
> > >> [1]
> > >>
> > >>
> >
> https://github.com/apache/incubator-mxnet/blob/master/CMakeLists.txt#L400-L405
> > >> [2] https://github.com/apache/incubator-mxnet/tree/master/3rdparty
> > >> [3] https://github.com/intel/mkl-dnn/blame/master/README.md#L261-L265
> > >> [4] https://github.com/intel/mkl-dnn/blame/master/README.md#L278-L280
> > >> [5] https://software.intel.com/en-us/node/522691
> > >>
> > >
> >
>

Re: [Discussion] Remove bundled llvm OpenMP

Posted by Anton Chernov <me...@gmail.com>.
Hi Chris,

Thank you for your answer. As you may have noticed, the initial email comes
from me, Anton Chernov (@lebeg on GitHub), and thus the proposal is not from
any 'CI' team that you've mentioned, but from me personally.

You are writing:

> someone is doing something unhealthy when they fork ...

I'm missing any context to understand what you mean.

> we get a lot of performance gain from OMP ...

There is no data that would prove this statement and therefore it is a
random guess.

> in many months, no investigation has occurred as to WHY the assertion is
failing.

The investigation has concluded that this is happening due to undefined
behaviour, which is, in my opinion, a sufficient answer that does not
require going any deeper.

> the pr is vetoed until such a time that the actual root cause of the
problem is known.

And considering the statements above, there is no valid reason to veto the
PR.


Best
Anton

Thu, 22 Nov 2018 at 15:38, Chris Olivier <cj...@gmail.com>:

> 3x less overhead*
>
> On Thu, Nov 22, 2018 at 6:25 AM Chris Olivier <cj...@gmail.com>
> wrote:
>
> > someone is doing something unhealthy when they fork, which is causing an
> > assertion in the openmp library. the same assertion that would fire in
> mkl,
> > which is linked to libiomp5 (exact same omp library). this is new
> behavior
> > and most likely due to an error or suboptimal approach in the forking
> logic
> > in mxnet.
> >
> > in order to circumvent the assert, the Ci team is proposing to remove the
> > library completely which is equivalent to cutting off your leg to make
> the
> > pain from stubbing your toe go away.
> >
> > we get a lot of performance gain from OMP. is has about a 1/3 less
> > overhead for entering omp regions and also supports omp regions after a
> > fork, which libgomp does not.
> >
> > in many months, no investigation has occurred as to WHY the assertion is
> > failing.
> >
> > the pr is vetoed until such a time that the actual root cause of the
> > problem is known.
> >
> >
> > thanks,
> >
> > -Chris.
> >
> >
> >
> >
> > On Thu, Nov 22, 2018 at 4:36 AM Anton Chernov <me...@gmail.com>
> wrote:
> >
> >> Dear MXNet community,
> >>
> >> I would like to drive attention to an important issue that is present in
> >> the MXNet CMake build: usage of bundled llvm OpenMP library.
> >>
> >> I have opened a PR to remove it:
> >> https://github.com/apache/incubator-mxnet/pull/12160
> >>
> >> The issue was closed, but I am strong in my oppinion that it's the right
> >> thing to do.
> >>
> >> *Background*
> >> If you want to use OpenMP pragmas in your code for parallelization you
> >> would supply a special flag to the compiler:
> >>
> >> - Clang / -fopenmp
> >> https://openmp.llvm.org/
> >>
> >> - GCC / -fopenmp
> >> https://gcc.gnu.org/onlinedocs/libgomp/Enabling-OpenMP.html
> >>
> >> - Intel / [Q]openmp
> >>
> >>
> https://software.intel.com/en-us/node/522689#6E24682E-F411-4AE3-A04D-ECD81C7008D1
> >>
> >> - Visual Studio: /openmp (Enable OpenMP 2.0 Support)
> >> https://msdn.microsoft.com/en-us/library/tt15eb9t.aspx
> >>
> >> Each of the compilers would enable the '#pragma omp' directive during
> >> C/C++
> >> compilation and arrange for automatic linking of the OpenMP runtime
> >> library
> >> supplied by each complier separately.
> >>
> >> Thus, to use the advantages of an OpenMP implementation one has to
> compile
> >> the code with the corresponding compiler.
> >>
> >> Currently, in MXNet CMake build scripts a bundled version of llvm OpenMP
> >> is
> >> used ([1] and [2]) to replace the OpenMP library supplied by the
> compiler.
> >>
> >> I will quote here the README from the MKL-DNN (Intel(R) Math Kernel
> >> Library
> >> for Deep Neural Networks):
> >>
> >> "Intel MKL-DNN uses OpenMP* for parallelism and requires an OpenMP
> runtime
> >> library to work. As different OpenMP runtimes may not be binary
> compatible
> >> it's important to ensure that only one OpenMP runtime is used throughout
> >> the application. Having more than one OpenMP runtime initialized may
> lead
> >> to undefined behavior resulting in incorrect results or crashes." [3]
> >>
> >> And:
> >>
> >> "Using GNU compiler with -fopenmp and -liomp5 options will link the
> >> application with both Intel and GNU OpenMP runtime libraries. This will
> >> lead to undefined behavior of the application." [4]
> >>
> >> As can be seen from ldd for MXNet:
> >>
> >> $ ldd build/tests/mxnet_unit_tests | grep omp
> >>     libomp.so => /.../mxnet/build/3rdparty/openmp/runtime/src/libomp.so
> >> (0x00007f697bc55000)
> >>     libgomp.so.1 => /usr/lib/x86_64-linux-gnu/libgomp.so.1
> >> (0x00007f69660cd000)
> >>
> >> *Performance*
> >>
> >> The only performance data related to OpenMP in MXNet I was able to find
> is
> >> here:
> >>
> >>
> https://github.com/apache/incubator-mxnet/issues/9744#issuecomment-367711172
> >>
> >> Which in my understanding is testing imact of different environment
> >> variables for the same setup (using same bundled OpenMP library).
> >>
> >> The libraries may differ in implementation and the Thread Affinity
> >> Interface [5] may have significant impact on performance.
> >>
> >> All compliers support it:
> >>
> >> - Clang / KMP_AFFINITY
> >>
> >>
> https://github.com/clang-ykt/openmp/blob/master/runtime/src/kmp_affinity.cpp
> >>
> >> - GCC / GOMP_CPU_AFFINITY
> >>
> >>
> https://gcc.gnu.org/onlinedocs/gcc-4.7.1/libgomp/GOMP_005fCPU_005fAFFINITY.html
> >>
> >> - Intel / KMP_AFFINITY
> >>
> >>
> https://software.intel.com/en-us/node/522689#6E24682E-F411-4AE3-A04D-ECD81C7008D1
> >>
> >> - Visual Studio / SetThreadAffinityMask
> >>
> >>
> https://docs.microsoft.com/en-us/windows/desktop/api/winbase/nf-winbase-setthreadaffinitymask
> >>
> >> *Issues*
> >>
> >> Failed OpenMP assertion when loading MXNet compiled with DEBUG=1
> >> https://github.com/apache/incubator-mxnet/issues/10856
> >>
> >> libomp.so dependency (need REAL fix)
> >> https://github.com/apache/incubator-mxnet/issues/11417
> >>
> >> mxnet-mkl (v0.12.0) crash when using (conda-installed) numpy with MKL
> >> https://github.com/apache/incubator-mxnet/issues/8532
> >>
> >> Performance regression when OMP_NUM_THREADS environment variable is not
> >> set
> >> https://github.com/apache/incubator-mxnet/issues/9744
> >>
> >> Poor concat CPU performance on CUDA builds
> >> https://github.com/apache/incubator-mxnet/issues/11905
> >>
> >> I would appreciate hearing your thoughts.
> >>
> >>
> >> Best
> >> Anton
> >>
> >> [1]
> >>
> >>
> https://github.com/apache/incubator-mxnet/blob/master/CMakeLists.txt#L400-L405
> >> [2] https://github.com/apache/incubator-mxnet/tree/master/3rdparty
> >> [3] https://github.com/intel/mkl-dnn/blame/master/README.md#L261-L265
> >> [4] https://github.com/intel/mkl-dnn/blame/master/README.md#L278-L280
> >> [5] https://software.intel.com/en-us/node/522691
> >>
> >
>

Re: [Discussion] Remove bundled llvm OpenMP

Posted by Chris Olivier <cj...@gmail.com>.
3x less overhead*

On Thu, Nov 22, 2018 at 6:25 AM Chris Olivier <cj...@gmail.com> wrote:

> someone is doing something unhealthy when they fork, which is causing an
> assertion in the openmp library. the same assertion that would fire in mkl,
> which is linked to libiomp5 (exact same omp library). this is new behavior
> and most likely due to an error or suboptimal approach in the forking logic
> in mxnet.
>
> in order to circumvent the assert, the Ci team is proposing to remove the
> library completely which is equivalent to cutting off your leg to make the
> pain from stubbing your toe go away.
>
> we get a lot of performance gain from OMP. is has about a 1/3 less
> overhead for entering omp regions and also supports omp regions after a
> fork, which libgomp does not.
>
> in many months, no investigation has occurred as to WHY the assertion is
> failing.
>
> the pr is vetoed until such a time that the actual root cause of the
> problem is known.
>
>
> thanks,
>
> -Chris.
>
>
>
>
> On Thu, Nov 22, 2018 at 4:36 AM Anton Chernov <me...@gmail.com> wrote:
>
>> Dear MXNet community,
>>
>> I would like to drive attention to an important issue that is present in
>> the MXNet CMake build: usage of bundled llvm OpenMP library.
>>
>> I have opened a PR to remove it:
>> https://github.com/apache/incubator-mxnet/pull/12160
>>
>> The issue was closed, but I am strong in my oppinion that it's the right
>> thing to do.
>>
>> *Background*
>> If you want to use OpenMP pragmas in your code for parallelization you
>> would supply a special flag to the compiler:
>>
>> - Clang / -fopenmp
>> https://openmp.llvm.org/
>>
>> - GCC / -fopenmp
>> https://gcc.gnu.org/onlinedocs/libgomp/Enabling-OpenMP.html
>>
>> - Intel / [Q]openmp
>>
>> https://software.intel.com/en-us/node/522689#6E24682E-F411-4AE3-A04D-ECD81C7008D1
>>
>> - Visual Studio: /openmp (Enable OpenMP 2.0 Support)
>> https://msdn.microsoft.com/en-us/library/tt15eb9t.aspx
>>
>> Each of these compilers enables the '#pragma omp' directive during C/C++
>> compilation and arranges for automatic linking of the OpenMP runtime
>> library supplied by each compiler separately.
>>
>> Thus, to use the advantages of an OpenMP implementation one has to compile
>> the code with the corresponding compiler.
>>
>> Currently, in MXNet CMake build scripts a bundled version of llvm OpenMP
>> is
>> used ([1] and [2]) to replace the OpenMP library supplied by the compiler.
>>
>> I will quote here the README from the MKL-DNN (Intel(R) Math Kernel
>> Library
>> for Deep Neural Networks):
>>
>> "Intel MKL-DNN uses OpenMP* for parallelism and requires an OpenMP runtime
>> library to work. As different OpenMP runtimes may not be binary compatible
>> it's important to ensure that only one OpenMP runtime is used throughout
>> the application. Having more than one OpenMP runtime initialized may lead
>> to undefined behavior resulting in incorrect results or crashes." [3]
>>
>> And:
>>
>> "Using GNU compiler with -fopenmp and -liomp5 options will link the
>> application with both Intel and GNU OpenMP runtime libraries. This will
>> lead to undefined behavior of the application." [4]
>>
>> As can be seen from ldd for MXNet:
>>
>> $ ldd build/tests/mxnet_unit_tests | grep omp
>>     libomp.so => /.../mxnet/build/3rdparty/openmp/runtime/src/libomp.so
>> (0x00007f697bc55000)
>>     libgomp.so.1 => /usr/lib/x86_64-linux-gnu/libgomp.so.1
>> (0x00007f69660cd000)
>>
>> *Performance*
>>
>> The only performance data related to OpenMP in MXNet I was able to find is
>> here:
>>
>> https://github.com/apache/incubator-mxnet/issues/9744#issuecomment-367711172
>>
>> That data, as I understand it, only tests the impact of different
>> environment variables on the same setup (using the same bundled OpenMP
>> library).
>>
>> The libraries may differ in implementation and the Thread Affinity
>> Interface [5] may have significant impact on performance.
>>
>> All compilers support it:
>>
>> - Clang / KMP_AFFINITY
>>
>> https://github.com/clang-ykt/openmp/blob/master/runtime/src/kmp_affinity.cpp
>>
>> - GCC / GOMP_CPU_AFFINITY
>>
>> https://gcc.gnu.org/onlinedocs/gcc-4.7.1/libgomp/GOMP_005fCPU_005fAFFINITY.html
>>
>> - Intel / KMP_AFFINITY
>>
>> https://software.intel.com/en-us/node/522689#6E24682E-F411-4AE3-A04D-ECD81C7008D1
>>
>> - Visual Studio / SetThreadAffinityMask
>>
>> https://docs.microsoft.com/en-us/windows/desktop/api/winbase/nf-winbase-setthreadaffinitymask
>>
>> *Issues*
>>
>> Failed OpenMP assertion when loading MXNet compiled with DEBUG=1
>> https://github.com/apache/incubator-mxnet/issues/10856
>>
>> libomp.so dependency (need REAL fix)
>> https://github.com/apache/incubator-mxnet/issues/11417
>>
>> mxnet-mkl (v0.12.0) crash when using (conda-installed) numpy with MKL
>> https://github.com/apache/incubator-mxnet/issues/8532
>>
>> Performance regression when OMP_NUM_THREADS environment variable is not
>> set
>> https://github.com/apache/incubator-mxnet/issues/9744
>>
>> Poor concat CPU performance on CUDA builds
>> https://github.com/apache/incubator-mxnet/issues/11905
>>
>> I would appreciate hearing your thoughts.
>>
>>
>> Best
>> Anton
>>
>> [1]
>>
>> https://github.com/apache/incubator-mxnet/blob/master/CMakeLists.txt#L400-L405
>> [2] https://github.com/apache/incubator-mxnet/tree/master/3rdparty
>> [3] https://github.com/intel/mkl-dnn/blame/master/README.md#L261-L265
>> [4] https://github.com/intel/mkl-dnn/blame/master/README.md#L278-L280
>> [5] https://software.intel.com/en-us/node/522691
>>
>

Re: [Discussion] Remove bundled llvm OpenMP

Posted by Chris Olivier <cj...@gmail.com>.
Someone is doing something unhealthy when they fork, which is causing an
assertion in the OpenMP library; the same assertion would fire in MKL,
which is linked against libiomp5 (effectively the same OpenMP library).
This is new behavior and most likely due to an error or suboptimal approach
in the forking logic in MXNet.
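
The usual way to make fork "healthy" in a threaded library is to register
pthread_atfork handlers that quiesce worker threads before the fork and
reinitialize state afterwards. The sketch below (editorial illustration,
not from this thread) shows only that general pattern; the handler bodies
are hypothetical placeholders and this is not a description of MXNet's
actual forking logic.

    // General pthread_atfork pattern (placeholders only, not MXNet code).
    #include <pthread.h>
    #include <sys/wait.h>
    #include <unistd.h>
    #include <cstdio>

    namespace {
    // Hypothetical handlers: a real library would park/join its worker and
    // OpenMP threads here so no thread holds a runtime lock across fork().
    void StopWorkerThreads()    { std::puts("prepare: quiesce worker threads"); }
    void RestartWorkerThreads() { std::puts("parent: restart worker threads"); }
    void ChildReset()           { std::puts("child: reinitialize runtime state"); }
    }  // namespace

    int main() {
        // Register handlers once, before any fork can happen.
        pthread_atfork(/*prepare=*/StopWorkerThreads,
                       /*parent=*/RestartWorkerThreads,
                       /*child=*/ChildReset);

        pid_t pid = fork();          // handlers fire around this call
        if (pid == 0) _exit(0);      // child does nothing further
        waitpid(pid, nullptr, 0);
        return 0;
    }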

In order to circumvent the assert, the CI team is proposing to remove the
library completely, which is equivalent to cutting off your leg to make the
pain from stubbing your toe go away.

We get a lot of performance gain from OMP: it has about 1/3 less overhead
for entering OMP regions and also supports OMP regions after a fork, which
libgomp does not.

In many months, no investigation has occurred as to WHY the assertion is
failing.

The PR is vetoed until the actual root cause of the problem is known.


thanks,

-Chris.




On Thu, Nov 22, 2018 at 4:36 AM Anton Chernov <me...@gmail.com> wrote:

> Dear MXNet community,
>
> I would like to drive attention to an important issue that is present in
> the MXNet CMake build: usage of bundled llvm OpenMP library.
>
> I have opened a PR to remove it:
> https://github.com/apache/incubator-mxnet/pull/12160
>
> The issue was closed, but I am firm in my opinion that it's the right
> thing to do.
>
> *Background*
> If you want to use OpenMP pragmas in your code for parallelization you
> would supply a special flag to the compiler:
>
> - Clang / -fopenmp
> https://openmp.llvm.org/
>
> - GCC / -fopenmp
> https://gcc.gnu.org/onlinedocs/libgomp/Enabling-OpenMP.html
>
> - Intel / [Q]openmp
>
> https://software.intel.com/en-us/node/522689#6E24682E-F411-4AE3-A04D-ECD81C7008D1
>
> - Visual Studio: /openmp (Enable OpenMP 2.0 Support)
> https://msdn.microsoft.com/en-us/library/tt15eb9t.aspx
>
> Each of these compilers enables the '#pragma omp' directive during C/C++
> compilation and arranges for automatic linking of the OpenMP runtime
> library supplied by each compiler separately.
>
> Thus, to use the advantages of an OpenMP implementation one has to compile
> the code with the corresponding compiler.
>
> Currently, in MXNet CMake build scripts a bundled version of llvm OpenMP is
> used ([1] and [2]) to replace the OpenMP library supplied by the compiler.
>
> I will quote here the README from the MKL-DNN (Intel(R) Math Kernel Library
> for Deep Neural Networks):
>
> "Intel MKL-DNN uses OpenMP* for parallelism and requires an OpenMP runtime
> library to work. As different OpenMP runtimes may not be binary compatible
> it's important to ensure that only one OpenMP runtime is used throughout
> the application. Having more than one OpenMP runtime initialized may lead
> to undefined behavior resulting in incorrect results or crashes." [3]
>
> And:
>
> "Using GNU compiler with -fopenmp and -liomp5 options will link the
> application with both Intel and GNU OpenMP runtime libraries. This will
> lead to undefined behavior of the application." [4]
>
> As can be seen from ldd for MXNet:
>
> $ ldd build/tests/mxnet_unit_tests | grep omp
>     libomp.so => /.../mxnet/build/3rdparty/openmp/runtime/src/libomp.so
> (0x00007f697bc55000)
>     libgomp.so.1 => /usr/lib/x86_64-linux-gnu/libgomp.so.1
> (0x00007f69660cd000)
>
> *Performance*
>
> The only performance data related to OpenMP in MXNet I was able to find is
> here:
>
> https://github.com/apache/incubator-mxnet/issues/9744#issuecomment-367711172
>
> That data, as I understand it, only tests the impact of different
> environment variables on the same setup (using the same bundled OpenMP
> library).
>
> The libraries may differ in implementation and the Thread Affinity
> Interface [5] may have significant impact on performance.
>
> All compilers support it:
>
> - Clang / KMP_AFFINITY
>
> https://github.com/clang-ykt/openmp/blob/master/runtime/src/kmp_affinity.cpp
>
> - GCC / GOMP_CPU_AFFINITY
>
> https://gcc.gnu.org/onlinedocs/gcc-4.7.1/libgomp/GOMP_005fCPU_005fAFFINITY.html
>
> - Intel / KMP_AFFINITY
>
> https://software.intel.com/en-us/node/522689#6E24682E-F411-4AE3-A04D-ECD81C7008D1
>
> - Visual Studio / SetThreadAffinityMask
>
> https://docs.microsoft.com/en-us/windows/desktop/api/winbase/nf-winbase-setthreadaffinitymask
>
> *Issues*
>
> Failed OpenMP assertion when loading MXNet compiled with DEBUG=1
> https://github.com/apache/incubator-mxnet/issues/10856
>
> libomp.so dependency (need REAL fix)
> https://github.com/apache/incubator-mxnet/issues/11417
>
> mxnet-mkl (v0.12.0) crash when using (conda-installed) numpy with MKL
> https://github.com/apache/incubator-mxnet/issues/8532
>
> Performance regression when OMP_NUM_THREADS environment variable is not set
> https://github.com/apache/incubator-mxnet/issues/9744
>
> Poor concat CPU performance on CUDA builds
> https://github.com/apache/incubator-mxnet/issues/11905
>
> I would appreciate hearing your thoughts.
>
>
> Best
> Anton
>
> [1]
>
> https://github.com/apache/incubator-mxnet/blob/master/CMakeLists.txt#L400-L405
> [2] https://github.com/apache/incubator-mxnet/tree/master/3rdparty
> [3] https://github.com/intel/mkl-dnn/blame/master/README.md#L261-L265
> [4] https://github.com/intel/mkl-dnn/blame/master/README.md#L278-L280
> [5] https://software.intel.com/en-us/node/522691
>