Posted to commits@mxnet.apache.org by GitBox <gi...@apache.org> on 2018/09/25 09:12:01 UTC
leezu opened a new issue #12661: MXNet MKL conflicts with other MKL optimized packages
URL: https://github.com/apache/incubator-mxnet/issues/12661
## Description
For MXNet built with MKL as the BLAS backend, [MKL is statically linked](https://github.com/apache/incubator-mxnet/blob/c93c78e/make/config.mk#L128-L133). When importing and using mxnet in Python alongside other Python packages that link against their own copy of MKL, the program crashes: loading two copies of MKL (and its OpenMP runtime) into one process is not supported by Intel. One example of a package that is incompatible with MKL-enabled MXNet is an MKL-optimized numpy.
This is a serious issue, as it makes it impossible to use an optimized MXNet build together with an optimized numpy build.
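One way to confirm that MKL is not a dynamic dependency of `libmxnet.so` (and is therefore statically linked, as described above) is to inspect the library's `ldd` output. A hedged sketch; the library path in the usage comment is an illustrative assumption:

```python
# Sketch (assumes Linux with `ldd` available). If `ldd` on libmxnet.so lists
# no libmkl_* dependency, MKL is not dynamically linked, consistent with the
# static linking described above.

def dynamic_mkl_deps(ldd_output):
    """Return the libmkl_* dependency lines from `ldd` output."""
    return [line.strip() for line in ldd_output.splitlines() if "libmkl" in line]

# Usage (hypothetical path):
# import subprocess
# out = subprocess.run(["ldd", "/path/to/libmxnet.so"],
#                      capture_output=True, text=True).stdout
# print(dynamic_mkl_deps(out))  # [] => MKL is not a dynamic dependency
```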
## Environment info (Required)
```
% python diagnose.py
----------Python Info----------
Version : 3.6.4
Compiler : GCC 7.2.0
Build : ('default', 'Jan 16 2018 18:10:19')
Arch : ('64bit', '')
------------Pip Info-----------
Version : 10.0.1
Directory : /home/ubuntu/anaconda3/envs/python3/lib/python3.6/site-packages/pip
----------MXNet Info-----------
Version : 1.3.0
Directory : /home/ubuntu/.local/lib/python3.6/site-packages/mxnet
Commit Hash : b3be92f4a48bce62a5a8424271871c2f81c8f7f1
----------System Info----------
Platform : Linux-4.4.0-1067-aws-x86_64-with-debian-stretch-sid
system : Linux
node : ip-172-31-91-127
release : 4.4.0-1067-aws
version : #77-Ubuntu SMP Mon Aug 27 13:22:03 UTC 2018
----------Hardware Info----------
machine : x86_64
processor : x86_64
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 32
On-line CPU(s) list: 0-31
Thread(s) per core: 2
Core(s) per socket: 16
Socket(s): 1
NUMA node(s): 1
Vendor ID: GenuineIntel
CPU family: 6
Model: 79
Model name: Intel(R) Xeon(R) CPU E5-2686 v4 @ 2.30GHz
Stepping: 1
CPU MHz: 2699.804
CPU max MHz: 3000.0000
CPU min MHz: 1200.0000
BogoMIPS: 4600.13
Hypervisor vendor: Xen
Virtualization type: full
L1d cache: 32K
L1i cache: 32K
L2 cache: 256K
L3 cache: 46080K
NUMA node0 CPU(s): 0-31
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl xtopology nonstop_tsc aperfmperf pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single kaiser fsgsbase bmi1 hle avx2 smep bmi2 erms invpcid rtm rdseed adx xsaveopt
----------Network Test----------
Setting timeout: 10
Timing for MXNet: https://github.com/apache/incubator-mxnet, DNS: 0.0020 sec, LOAD: 0.3378 sec.
Timing for Gluon Tutorial(en): http://gluon.mxnet.io, DNS: 0.1504 sec, LOAD: 0.3767 sec.
Timing for Gluon Tutorial(cn): https://zh.gluon.ai, DNS: 0.1830 sec, LOAD: 0.3789 sec.
Timing for FashionMNIST: https://apache-mxnet.s3-accelerate.dualstack.amazonaws.com/gluon/dataset/fashion-mnist/train-labels-idx1-ubyte.gz, DNS: 0.0292 sec, LOAD: 0.1328 sec.
Timing for PYPI: https://pypi.python.org/pypi/pip, DNS: 0.0022 sec, LOAD: 0.0899 sec.
Timing for Conda: https://repo.continuum.io/pkgs/free/, DNS: 0.0036 sec, LOAD: 0.0250 sec.
```
I'm using Python.
## Error Message:
```
OMP: Error #15: Initializing libiomp5.so, but found libiomp5.so already initialized.
OMP: Hint: This means that multiple copies of the OpenMP runtime have been linked into the program. That is dangerous, since it can degrade performance or cause incorrect results. The best thing to do is to ensure that only a single OpenMP runtime is linked into the process, e.g. by avoiding static linking of the OpenMP runtime in any library. As an unsafe, unsupported, undocumented workaround you can set the environment variable KMP_DUPLICATE_LIB_OK=TRUE to allow the program to continue to execute, but that may cause crashes or silently produce incorrect results. For more information, please see http://www.intel.com/software/products/support/.
zsh: abort (core dumped) python test.py
```
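For completeness, the "unsafe, unsupported, undocumented" workaround that the message itself mentions can be expressed as follows. The variable must be set before either library is imported, and per Intel's own warning it may crash or silently produce wrong results:

```python
# Unsafe workaround quoted in the OMP error message above: tolerate duplicate
# OpenMP runtimes. Must be set BEFORE importing mxnet or numpy; may cause
# crashes or silently incorrect results.
import os

os.environ["KMP_DUPLICATE_LIB_OK"] = "TRUE"

# import mxnet as mx   # both can now load without the OMP Error #15 abort
# import numpy as np
```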
Examining the core file shows that the crash occurs once numpy tries to execute an MKL-optimized routine (after mxnet has already loaded its own MKL):
```
#0 0x00007f902cb0a428 in __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:54
#1 0x00007f902cb0c02a in __GI_abort () at abort.c:89
#2 0x00007f8e93927c53 in __kmp_abort_process () from /home/ubuntu/anaconda3/envs/python3/lib/python3.6/site-packages/numpy/core/../../../../libiomp5.so
#3 0x00007f8e939162fb in __kmp_fatal () from /home/ubuntu/anaconda3/envs/python3/lib/python3.6/site-packages/numpy/core/../../../../libiomp5.so
#4 0x00007f8e93926068 in __kmp_register_library_startup() () from /home/ubuntu/anaconda3/envs/python3/lib/python3.6/site-packages/numpy/core/../../../../libiomp5.so
#5 0x00007f8e93926d86 in __kmp_middle_initialize () from /home/ubuntu/anaconda3/envs/python3/lib/python3.6/site-packages/numpy/core/../../../../libiomp5.so
#6 0x00007f8e93910dae in omp_get_num_procs@OMP_1.0 () from /home/ubuntu/anaconda3/envs/python3/lib/python3.6/site-packages/numpy/core/../../../../libiomp5.so
#7 0x00007f8e9176989d in mkl_serv_domain_get_max_threads () from /home/ubuntu/anaconda3/envs/python3/lib/python3.6/site-packages/numpy/core/../../../../libmkl_intel_thread.so
#8 0x00007f8e9182a81c in mkl_blas_dsyrk () from /home/ubuntu/anaconda3/envs/python3/lib/python3.6/site-packages/numpy/core/../../../../libmkl_intel_thread.so
#9 0x00007f8e90bcc35b in dsyrk_ () from /home/ubuntu/anaconda3/envs/python3/lib/python3.6/site-packages/numpy/core/../../../../libmkl_intel_lp64.so
#10 0x00007f8e90bfeb44 in cblas_dsyrk () from /home/ubuntu/anaconda3/envs/python3/lib/python3.6/site-packages/numpy/core/../../../../libmkl_intel_lp64.so
#11 0x00007f902780683c in syrk.constprop () from /home/ubuntu/anaconda3/envs/python3/lib/python3.6/site-packages/numpy/core/multiarray.cpython-36m-x86_64-linux-gnu.so
#12 0x00007f9027905a9a in cblas_matrixproduct () from /home/ubuntu/anaconda3/envs/python3/lib/python3.6/site-packages/numpy/core/multiarray.cpython-36m-x86_64-linux-gnu.so
#13 0x00007f90278d8c19 in PyArray_MatrixProduct2 () from /home/ubuntu/anaconda3/envs/python3/lib/python3.6/site-packages/numpy/core/multiarray.cpython-36m-x86_64-linux-gnu.so
#14 0x00007f90278d9a58 in array_matrixproduct () from /home/ubuntu/anaconda3/envs/python3/lib/python3.6/site-packages/numpy/core/multiarray.cpython-36m-x86_64-linux-gnu.so
```
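The duplicate runtime can also be confirmed without a debugger by listing the OpenMP libraries mapped into the live process. A Linux-only sketch (the helper name is mine); two distinct `libiomp5` paths in the result correspond to the two copies visible in the backtrace above:

```python
# Sketch (assumes Linux /proc). After importing both mxnet and numpy, two
# distinct libiomp5/libgomp entries in the result confirm that two OpenMP
# runtimes are mapped into the same process.
import re

_OMP_RE = re.compile(r"\S*lib(?:iomp5|gomp|omp)\.so[.\d]*")

def omp_runtimes(map_lines):
    """Return the set of OpenMP runtime paths found in /proc/<pid>/maps lines."""
    found = set()
    for line in map_lines:
        match = _OMP_RE.search(line)
        if match:
            found.add(match.group(0))
    return found

# Usage on the live process:
# with open("/proc/self/maps") as maps:
#     print(omp_runtimes(maps))  # more than one entry => duplicate runtimes
```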
## Minimum reproducible example
## Steps to reproduce
- Install an MKL-optimized numpy (e.g. the default numpy included in conda; see https://docs.anaconda.com/mkl-optimizations/)
- `pip install mxnet-mkl`
```
In [1]: import mxnet as mx
In [2]: mxnd = mx.nd.zeros((1000,1000))
In [3]: mx.nd.dot(mxnd, mxnd)
Out[3]:
[[ 0. 0. 0. ..., 0. 0. 0.]
[ 0. 0. 0. ..., 0. 0. 0.]
[ 0. 0. 0. ..., 0. 0. 0.]
...,
[ 0. 0. 0. ..., 0. 0. 0.]
[ 0. 0. 0. ..., 0. 0. 0.]
[ 0. 0. 0. ..., 0. 0. 0.]]
<NDArray 1000x1000 @cpu(0)>
In [4]: import numpy as np
In [5]: npnd = np.zeros((1000,1000))
In [6]: np.dot(npnd, npnd)
OMP: Error #15: Initializing libiomp5.so, but found libiomp5.so already initialized.
OMP: Hint: This means that multiple copies of the OpenMP runtime have been linked into the program. That is dangerous, since it can degrade performance or cause incorrect results. The best thing to do is to ensure that only a single OpenMP runtime is linked into the process, e.g. by avoiding static linking of the OpenMP runtime in any library. As an unsafe, unsupported, undocumented workaround you can set the environment variable KMP_DUPLICATE_LIB_OK=TRUE to allow the program to continue to execute, but that may cause crashes or silently produce incorrect results. For more information, please see http://www.intel.com/software/products/support/.
zsh: abort (core dumped) ipython
```