Posted to commits@mxnet.apache.org by GitBox <gi...@apache.org> on 2018/09/25 09:12:01 UTC
leezu opened a new issue #12661: MXNet MKL conflicts with other MKL optimized packages
URL: https://github.com/apache/incubator-mxnet/issues/12661
## Description
For MXNet built with MKL as the BLAS backend, [MKL is statically linked](https://github.com/apache/incubator-mxnet/blob/c93c78e/make/config.mk#L128-L133). When importing and using mxnet in Python alongside other Python packages that link against their own copy of MKL, the program crashes: loading two copies of MKL (and its OpenMP runtime) into one process is not supported by Intel. One example of a package that is incompatible with MKL-enabled MXNet is an MKL-optimized numpy.
This is a serious issue, as it makes it impossible to use an optimized MXNet build together with an optimized numpy build.
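One way to confirm that MKL is not a dynamic dependency of `libmxnet.so` (and is therefore statically linked, as described above) is to inspect the library's `ldd` output. A hedged sketch; the library path in the usage comment is an illustrative assumption:

```python
# Sketch (assumes Linux with `ldd` available). If `ldd` on libmxnet.so lists
# no libmkl_* dependency, MKL is not dynamically linked, consistent with the
# static linking described above.

def dynamic_mkl_deps(ldd_output):
    """Return the libmkl_* dependency lines from `ldd` output."""
    return [line.strip() for line in ldd_output.splitlines() if "libmkl" in line]

# Usage (hypothetical path):
# import subprocess
# out = subprocess.run(["ldd", "/path/to/libmxnet.so"],
#                      capture_output=True, text=True).stdout
# print(dynamic_mkl_deps(out))  # [] => MKL is not a dynamic dependency
```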
## Environment info (Required)
```
% python diagnose.py
----------Python Info----------
Version : 3.6.4
Compiler : GCC 7.2.0
Build : ('default', 'Jan 16 2018 18:10:19')
Arch : ('64bit', '')
------------Pip Info-----------
Version : 10.0.1
Directory : /home/ubuntu/anaconda3/envs/python3/lib/python3.6/site-packages/pip
----------MXNet Info-----------
Version : 1.3.0
Directory : /home/ubuntu/.local/lib/python3.6/site-packages/mxnet
Commit Hash : b3be92f4a48bce62a5a8424271871c2f81c8f7f1
----------System Info----------
Platform : Linux-4.4.0-1067-aws-x86_64-with-debian-stretch-sid
system : Linux
node : ip-172-31-91-127
release : 4.4.0-1067-aws
version : #77-Ubuntu SMP Mon Aug 27 13:22:03 UTC 2018
----------Hardware Info----------
machine : x86_64
processor : x86_64
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 32
On-line CPU(s) list: 0-31
Thread(s) per core: 2
Core(s) per socket: 16
Socket(s): 1
NUMA node(s): 1
Vendor ID: GenuineIntel
CPU family: 6
Model: 79
Model name: Intel(R) Xeon(R) CPU E5-2686 v4 @ 2.30GHz
Stepping: 1
CPU MHz: 2699.804
CPU max MHz: 3000.0000
CPU min MHz: 1200.0000
BogoMIPS: 4600.13
Hypervisor vendor: Xen
Virtualization type: full
L1d cache: 32K
L1i cache: 32K
L2 cache: 256K
L3 cache: 46080K
NUMA node0 CPU(s): 0-31
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl xtopology nonstop_tsc aperfmperf pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single kaiser fsgsbase bmi1 hle avx2 smep bmi2 erms invpcid rtm rdseed adx xsaveopt
----------Network Test----------
Setting timeout: 10
Timing for MXNet: https://github.com/apache/incubator-mxnet, DNS: 0.0020 sec, LOAD: 0.3378 sec.
Timing for Gluon Tutorial(en): http://gluon.mxnet.io, DNS: 0.1504 sec, LOAD: 0.3767 sec.
Timing for Gluon Tutorial(cn): https://zh.gluon.ai, DNS: 0.1830 sec, LOAD: 0.3789 sec.
Timing for FashionMNIST: https://apache-mxnet.s3-accelerate.dualstack.amazonaws.com/gluon/dataset/fashion-mnist/train-labels-idx1-ubyte.gz, DNS: 0.0292 sec, LOAD: 0.1328 sec.
Timing for PYPI: https://pypi.python.org/pypi/pip, DNS: 0.0022 sec, LOAD: 0.0899 sec.
Timing for Conda: https://repo.continuum.io/pkgs/free/, DNS: 0.0036 sec, LOAD: 0.0250 sec.
```
I'm using Python.
## Error Message:
```
OMP: Error #15: Initializing libiomp5.so, but found libiomp5.so already initialized.
OMP: Hint: This means that multiple copies of the OpenMP runtime have been linked into the program. That is dangerous, since it can degrade performance or cause incorrect results. The best thing to do is to ensure that only a single OpenMP runtime is linked into the process, e.g. by avoiding static linking of the OpenMP runtime in any library. As an unsafe, unsupported, undocumented workaround you can set the environment variable KMP_DUPLICATE_LIB_OK=TRUE to allow the program to continue to execute, but that may cause crashes or silently produce incorrect results. For more information, please see http://www.intel.com/software/products/support/.
zsh: abort (core dumped) python test.py
```
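For completeness, the "unsafe, unsupported, undocumented" workaround that the message itself mentions can be expressed as follows. The variable must be set before either library is imported, and per Intel's own warning it may crash or silently produce wrong results:

```python
# Unsafe workaround quoted in the OMP error message above: tolerate duplicate
# OpenMP runtimes. Must be set BEFORE importing mxnet or numpy; may cause
# crashes or silently incorrect results.
import os

os.environ["KMP_DUPLICATE_LIB_OK"] = "TRUE"

# import mxnet as mx   # both can now load without the OMP Error #15 abort
# import numpy as np
```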
Examining the core file shows that the crash occurs once numpy tries to execute an MKL-optimized routine (after mxnet has already loaded its own MKL):
```
#0 0x00007f902cb0a428 in __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:54
#1 0x00007f902cb0c02a in __GI_abort () at abort.c:89
#2 0x00007f8e93927c53 in __kmp_abort_process () from /home/ubuntu/anaconda3/envs/python3/lib/python3.6/site-packages/numpy/core/../../../../libiomp5.so
#3 0x00007f8e939162fb in __kmp_fatal () from /home/ubuntu/anaconda3/envs/python3/lib/python3.6/site-packages/numpy/core/../../../../libiomp5.so
#4 0x00007f8e93926068 in __kmp_register_library_startup() () from /home/ubuntu/anaconda3/envs/python3/lib/python3.6/site-packages/numpy/core/../../../../libiomp5.so
#5 0x00007f8e93926d86 in __kmp_middle_initialize () from /home/ubuntu/anaconda3/envs/python3/lib/python3.6/site-packages/numpy/core/../../../../libiomp5.so
#6 0x00007f8e93910dae in omp_get_num_procs@OMP_1.0 () from /home/ubuntu/anaconda3/envs/python3/lib/python3.6/site-packages/numpy/core/../../../../libiomp5.so
#7 0x00007f8e9176989d in mkl_serv_domain_get_max_threads () from /home/ubuntu/anaconda3/envs/python3/lib/python3.6/site-packages/numpy/core/../../../../libmkl_intel_thread.so
#8 0x00007f8e9182a81c in mkl_blas_dsyrk () from /home/ubuntu/anaconda3/envs/python3/lib/python3.6/site-packages/numpy/core/../../../../libmkl_intel_thread.so
#9 0x00007f8e90bcc35b in dsyrk_ () from /home/ubuntu/anaconda3/envs/python3/lib/python3.6/site-packages/numpy/core/../../../../libmkl_intel_lp64.so
#10 0x00007f8e90bfeb44 in cblas_dsyrk () from /home/ubuntu/anaconda3/envs/python3/lib/python3.6/site-packages/numpy/core/../../../../libmkl_intel_lp64.so
#11 0x00007f902780683c in syrk.constprop () from /home/ubuntu/anaconda3/envs/python3/lib/python3.6/site-packages/numpy/core/multiarray.cpython-36m-x86_64-linux-gnu.so
#12 0x00007f9027905a9a in cblas_matrixproduct () from /home/ubuntu/anaconda3/envs/python3/lib/python3.6/site-packages/numpy/core/multiarray.cpython-36m-x86_64-linux-gnu.so
#13 0x00007f90278d8c19 in PyArray_MatrixProduct2 () from /home/ubuntu/anaconda3/envs/python3/lib/python3.6/site-packages/numpy/core/multiarray.cpython-36m-x86_64-linux-gnu.so
#14 0x00007f90278d9a58 in array_matrixproduct () from /home/ubuntu/anaconda3/envs/python3/lib/python3.6/site-packages/numpy/core/multiarray.cpython-36m-x86_64-linux-gnu.so
```
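The duplicate runtime can also be confirmed without a debugger by listing the OpenMP libraries mapped into the live process. A Linux-only sketch (the helper name is mine); two distinct `libiomp5` paths in the result correspond to the two copies visible in the backtrace above:

```python
# Sketch (assumes Linux /proc). After importing both mxnet and numpy, two
# distinct libiomp5/libgomp entries in the result confirm that two OpenMP
# runtimes are mapped into the same process.
import re

_OMP_RE = re.compile(r"\S*lib(?:iomp5|gomp|omp)\.so[.\d]*")

def omp_runtimes(map_lines):
    """Return the set of OpenMP runtime paths found in /proc/<pid>/maps lines."""
    found = set()
    for line in map_lines:
        match = _OMP_RE.search(line)
        if match:
            found.add(match.group(0))
    return found

# Usage on the live process:
# with open("/proc/self/maps") as maps:
#     print(omp_runtimes(maps))  # more than one entry => duplicate runtimes
```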
## Minimum reproducible example
## Steps to reproduce
- Install an MKL-optimized numpy (e.g. the default numpy included in conda; see https://docs.anaconda.com/mkl-optimizations/)
- `pip install mxnet-mkl`
```
In [1]: import mxnet as mx
In [2]: mxnd = mx.nd.zeros((1000,1000))
In [3]: mx.nd.dot(mxnd, mxnd)
Out[3]:
[[ 0. 0. 0. ..., 0. 0. 0.]
[ 0. 0. 0. ..., 0. 0. 0.]
[ 0. 0. 0. ..., 0. 0. 0.]
...,
[ 0. 0. 0. ..., 0. 0. 0.]
[ 0. 0. 0. ..., 0. 0. 0.]
[ 0. 0. 0. ..., 0. 0. 0.]]
<NDArray 1000x1000 @cpu(0)>
In [4]: import numpy as np
In [5]: npnd = np.zeros((1000,1000))
In [6]: np.dot(npnd, npnd)
OMP: Error #15: Initializing libiomp5.so, but found libiomp5.so already initialized.
OMP: Hint: This means that multiple copies of the OpenMP runtime have been linked into the program. That is dangerous, since it can degrade performance or cause incorrect results. The best thing to do is to ensure that only a single OpenMP runtime is linked into the process, e.g. by avoiding static linking of the OpenMP runtime in any library. As an unsafe, unsupported, undocumented workaround you can set the environment variable KMP_DUPLICATE_LIB_OK=TRUE to allow the program to continue to execute, but that may cause crashes or silently produce incorrect results. For more information, please see http://www.intel.com/software/products/support/.
zsh: abort (core dumped) ipython
```