
Posted to commits@mxnet.apache.org by GitBox <gi...@apache.org> on 2020/01/07 16:49:44 UTC

[GitHub] [incubator-mxnet] mjsML opened a new issue #17238: Error building on ubuntu 18.04 on x64 with intel XEP, CUDA10.0, NCCL and TensorRT

mjsML opened a new issue #17238: Error building on ubuntu 18.04 on x64 with intel XEP, CUDA10.0, NCCL and TensorRT
URL: https://github.com/apache/incubator-mxnet/issues/17238
 
 
   ## Description
   I am trying to build MXNet with the following configuration:
   ```
   cmake -DUSE_CUDA=1 -DUSE_CUDA_PATH=/usr/local/cuda -DUSE_CUDNN=1 -DUSE_MKLDNN=1 -DCMAKE_BUILD_TYPE=Release -DUSE_TENSORRT=1 -DUSE_NCCL=1 -DUSE_NCCL_PATH=/usr/local -DMKL_INCLUDE_DIR=/home/mj/intel/mkl/include -DMKL_RT_LIBRARY=/home/mj/intel/mkl/lib -GNinja ..
   ```
   
   CMake generates the build files successfully, but the build then fails at the very end. The CMakeError.log is below:
   ### Error Message
   ```
   Determining if the pthread_create exist failed with the following output:
   Change Dir: /home/mj/mxnet/build/CMakeFiles/CMakeTmp
   
   Run Build Command(s):/home/mj/anaconda3/bin/ninja cmTC_35eea 
   [1/2] Building C object CMakeFiles/cmTC_35eea.dir/CheckSymbolExists.c.o
   [2/2] Linking C executable cmTC_35eea
   FAILED: cmTC_35eea 
   : && /usr/lib/ccache/cc -mf16c -Wall -Wno-unknown-pragmas -Wno-sign-compare -O3 -msse3   CMakeFiles/cmTC_35eea.dir/CheckSymbolExists.c.o  -o cmTC_35eea   && :
   CMakeFiles/cmTC_35eea.dir/CheckSymbolExists.c.o: In function `main':
   CheckSymbolExists.c:(.text.startup+0x3): undefined reference to `pthread_create'
   collect2: error: ld returned 1 exit status
   ninja: build stopped: subcommand failed.
   
   File /home/mj/mxnet/build/CMakeFiles/CMakeTmp/CheckSymbolExists.c:
   /* */
   #include <pthread.h>
   
   int main(int argc, char** argv)
   {
     (void)argv;
   #ifndef pthread_create
     return ((int*)(&pthread_create))[argc];
   #else
     (void)argc;
     return 0;
   #endif
   }
   
   Determining if the function pthread_create exists in the pthreads failed with the following output:
   Change Dir: /home/mj/mxnet/build/CMakeFiles/CMakeTmp
   
   Run Build Command(s):/home/mj/anaconda3/bin/ninja cmTC_e1686 
   [1/2] Building C object CMakeFiles/cmTC_e1686.dir/CheckFunctionExists.c.o
   [2/2] Linking C executable cmTC_e1686
   FAILED: cmTC_e1686 
   : && /usr/lib/ccache/cc -mf16c -Wall -Wno-unknown-pragmas -Wno-sign-compare -O3 -msse3 -DCHECK_FUNCTION_EXISTS=pthread_create   CMakeFiles/cmTC_e1686.dir/CheckFunctionExists.c.o  -o cmTC_e1686  -lpthreads && :
   /usr/bin/ld: cannot find -lpthreads
   collect2: error: ld returned 1 exit status
   ninja: build stopped: subcommand failed.
   
   
   Performing C SOURCE FILE Test LIBOMP_HAVE_WNO_UNUSED_LOCAL_TYPEDEF_FLAG failed with the following output:
   Change Dir: /home/mj/mxnet/build/CMakeFiles/CMakeTmp
   
   Run Build Command(s):/home/mj/anaconda3/bin/ninja cmTC_01d40 
   [1/2] Building C object CMakeFiles/cmTC_01d40.dir/src.c.o
   FAILED: CMakeFiles/cmTC_01d40.dir/src.c.o 
   /usr/lib/ccache/cc   -mf16c -Wall -Wno-unknown-pragmas -Wno-sign-compare -O3 -msse3 -DLIBOMP_HAVE_WNO_UNUSED_LOCAL_TYPEDEF_FLAG -fPIE   -Wunused-local-typedef -o CMakeFiles/cmTC_01d40.dir/src.c.o   -c src.c
   cc: error: unrecognized command line option '-Wunused-local-typedef'; did you mean '-Wunused-local-typedefs'?
   ninja: build stopped: subcommand failed.
   
   Source file was:
   int main(void) { return 0; }
   Performing C SOURCE FILE Test LIBOMP_HAVE_WNO_COVERED_SWITCH_DEFAULT_FLAG failed with the following output:
   Change Dir: /home/mj/mxnet/build/CMakeFiles/CMakeTmp
   
   Run Build Command(s):/home/mj/anaconda3/bin/ninja cmTC_9c906 
   [1/2] Building C object CMakeFiles/cmTC_9c906.dir/src.c.o
   FAILED: CMakeFiles/cmTC_9c906.dir/src.c.o 
   /usr/lib/ccache/cc   -mf16c -Wall -Wno-unknown-pragmas -Wno-sign-compare -O3 -msse3 -DLIBOMP_HAVE_WNO_COVERED_SWITCH_DEFAULT_FLAG -fPIE   -Wcovered-switch-default -o CMakeFiles/cmTC_9c906.dir/src.c.o   -c src.c
   cc: error: unrecognized command line option '-Wcovered-switch-default'; did you mean '-Wno-switch-default'?
   ninja: build stopped: subcommand failed.
   
   Source file was:
   int main(void) { return 0; }
   Performing C SOURCE FILE Test LIBOMP_HAVE_WNO_DEPRECATED_REGISTER_FLAG failed with the following output:
   Change Dir: /home/mj/mxnet/build/CMakeFiles/CMakeTmp
   
   Run Build Command(s):/home/mj/anaconda3/bin/ninja cmTC_d798e 
   [1/2] Building C object CMakeFiles/cmTC_d798e.dir/src.c.o
   FAILED: CMakeFiles/cmTC_d798e.dir/src.c.o 
   /usr/lib/ccache/cc   -mf16c -Wall -Wno-unknown-pragmas -Wno-sign-compare -O3 -msse3 -DLIBOMP_HAVE_WNO_DEPRECATED_REGISTER_FLAG -fPIE   -Wdeprecated-register -o CMakeFiles/cmTC_d798e.dir/src.c.o   -c src.c
   cc: error: unrecognized command line option '-Wdeprecated-register'; did you mean '-frename-registers'?
   ninja: build stopped: subcommand failed.
   
   Source file was:
   int main(void) { return 0; }
   Performing C SOURCE FILE Test LIBOMP_HAVE_WNO_GNU_ANONYMOUS_STRUCT_FLAG failed with the following output:
   Change Dir: /home/mj/mxnet/build/CMakeFiles/CMakeTmp
   
   Run Build Command(s):/home/mj/anaconda3/bin/ninja cmTC_f6031 
   [1/2] Building C object CMakeFiles/cmTC_f6031.dir/src.c.o
   FAILED: CMakeFiles/cmTC_f6031.dir/src.c.o 
   /usr/lib/ccache/cc   -mf16c -Wall -Wno-unknown-pragmas -Wno-sign-compare -O3 -msse3 -DLIBOMP_HAVE_WNO_GNU_ANONYMOUS_STRUCT_FLAG -fPIE   -Wgnu-anonymous-struct -o CMakeFiles/cmTC_f6031.dir/src.c.o   -c src.c
   cc: error: unrecognized command line option '-Wgnu-anonymous-struct'
   ninja: build stopped: subcommand failed.
   
   Source file was:
   int main(void) { return 0; }
   Performing C SOURCE FILE Test LIBOMP_HAVE_WNO_SELF_ASSIGN_FLAG failed with the following output:
   Change Dir: /home/mj/mxnet/build/CMakeFiles/CMakeTmp
   
   Run Build Command(s):/home/mj/anaconda3/bin/ninja cmTC_69435 
   [1/2] Building C object CMakeFiles/cmTC_69435.dir/src.c.o
   FAILED: CMakeFiles/cmTC_69435.dir/src.c.o 
   /usr/lib/ccache/cc   -mf16c -Wall -Wno-unknown-pragmas -Wno-sign-compare -O3 -msse3 -DLIBOMP_HAVE_WNO_SELF_ASSIGN_FLAG -fPIE   -Wself-assign -o CMakeFiles/cmTC_69435.dir/src.c.o   -c src.c
   cc: error: unrecognized command line option '-Wself-assign'; did you mean '-Wcast-align'?
   ninja: build stopped: subcommand failed.
   
   Source file was:
   int main(void) { return 0; }
   Performing C SOURCE FILE Test LIBOMP_HAVE_WNO_VLA_EXTENSION_FLAG failed with the following output:
   Change Dir: /home/mj/mxnet/build/CMakeFiles/CMakeTmp
   
   Run Build Command(s):/home/mj/anaconda3/bin/ninja cmTC_664fc 
   [1/2] Building C object CMakeFiles/cmTC_664fc.dir/src.c.o
   FAILED: CMakeFiles/cmTC_664fc.dir/src.c.o 
   /usr/lib/ccache/cc   -mf16c -Wall -Wno-unknown-pragmas -Wno-sign-compare -O3 -msse3 -DLIBOMP_HAVE_WNO_VLA_EXTENSION_FLAG -fPIE   -Wvla-extension -o CMakeFiles/cmTC_664fc.dir/src.c.o   -c src.c
   cc: error: unrecognized command line option '-Wvla-extension'; did you mean '-fms-extensions'?
   ninja: build stopped: subcommand failed.
   
   Source file was:
   int main(void) { return 0; }
   Performing C SOURCE FILE Test LIBOMP_HAVE_WNO_FORMAT_PEDANTIC_FLAG failed with the following output:
   Change Dir: /home/mj/mxnet/build/CMakeFiles/CMakeTmp
   
   Run Build Command(s):/home/mj/anaconda3/bin/ninja cmTC_be40b 
   [1/2] Building C object CMakeFiles/cmTC_be40b.dir/src.c.o
   FAILED: CMakeFiles/cmTC_be40b.dir/src.c.o 
   /usr/lib/ccache/cc   -mf16c -Wall -Wno-unknown-pragmas -Wno-sign-compare -O3 -msse3 -DLIBOMP_HAVE_WNO_FORMAT_PEDANTIC_FLAG -fPIE   -Wformat-pedantic -o CMakeFiles/cmTC_be40b.dir/src.c.o   -c src.c
   cc: error: unrecognized command line option '-Wformat-pedantic'; did you mean '-Wno-pedantic'?
   ninja: build stopped: subcommand failed.
   
   Source file was:
   int main(void) { return 0; }
   Performing C SOURCE FILE Test LIBOMP_HAVE_MMIC_FLAG failed with the following output:
   Change Dir: /home/mj/mxnet/build/CMakeFiles/CMakeTmp
   
   Run Build Command(s):/home/mj/anaconda3/bin/ninja cmTC_aab2a 
   [1/2] Building C object CMakeFiles/cmTC_aab2a.dir/src.c.o
   FAILED: CMakeFiles/cmTC_aab2a.dir/src.c.o 
   /usr/lib/ccache/cc   -mf16c -Wall -Wno-unknown-pragmas -Wno-sign-compare -O3 -msse3 -DLIBOMP_HAVE_MMIC_FLAG -mmic -fPIE   -mmic -o CMakeFiles/cmTC_aab2a.dir/src.c.o   -c src.c
   cc: error: unrecognized command line option '-mmic'; did you mean '-fpic'?
   cc: error: unrecognized command line option '-mmic'; did you mean '-fpic'?
   ninja: build stopped: subcommand failed.
   
   Source file was:
   int main(void) { return 0; }
   Performing C SOURCE FILE Test LIBOMP_HAVE_M32_FLAG failed with the following output:
   Change Dir: /home/mj/mxnet/build/CMakeFiles/CMakeTmp
   
   Run Build Command(s):/home/mj/anaconda3/bin/ninja cmTC_5861e 
   [1/2] Building C object CMakeFiles/cmTC_5861e.dir/src.c.o
   [2/2] Linking C executable cmTC_5861e
   FAILED: cmTC_5861e 
   : && /usr/lib/ccache/cc -mf16c -Wall -Wno-unknown-pragmas -Wno-sign-compare -O3 -msse3 -DLIBOMP_HAVE_M32_FLAG -m32  -rdynamic CMakeFiles/cmTC_5861e.dir/src.c.o  -o cmTC_5861e   && :
   /usr/bin/ld: cannot find Scrt1.o: No such file or directory
   /usr/bin/ld: cannot find crti.o: No such file or directory
   /usr/bin/ld: skipping incompatible /usr/lib/gcc/x86_64-linux-gnu/7/libgcc.a when searching for -lgcc
   /usr/bin/ld: cannot find -lgcc
   /usr/bin/ld: skipping incompatible /usr/lib/gcc/x86_64-linux-gnu/7/libgcc_s.so.1 when searching for libgcc_s.so.1
   /usr/bin/ld: cannot find libgcc_s.so.1
   /usr/bin/ld: skipping incompatible /usr/lib/gcc/x86_64-linux-gnu/7/libgcc.a when searching for -lgcc
   /usr/bin/ld: cannot find -lgcc
   collect2: error: ld returned 1 exit status
   ninja: build stopped: subcommand failed.
   
   Source file was:
   int main(void) { return 0; }
   Determining if files windows.h;psapi.h exist failed with the following output:
   Change Dir: /home/mj/mxnet/build/CMakeFiles/CMakeTmp
   
   Run Build Command(s):/home/mj/anaconda3/bin/ninja cmTC_df20f 
   [1/2] Building C object CMakeFiles/cmTC_df20f.dir/LIBOMP_HAVE_PSAPI_H.c.o
   FAILED: CMakeFiles/cmTC_df20f.dir/LIBOMP_HAVE_PSAPI_H.c.o 
   /usr/lib/ccache/cc   -mf16c -Wall -Wno-unknown-pragmas -Wno-sign-compare -O3 -msse3  -fPIE -o CMakeFiles/cmTC_df20f.dir/LIBOMP_HAVE_PSAPI_H.c.o   -c /home/mj/mxnet/build/CMakeFiles/CheckIncludeFiles/LIBOMP_HAVE_PSAPI_H.c
   /home/mj/mxnet/build/CMakeFiles/CheckIncludeFiles/LIBOMP_HAVE_PSAPI_H.c:2:10: fatal error: windows.h: No such file or directory
    #include <windows.h>
             ^~~~~~~~~~~
   compilation terminated.
   ninja: build stopped: subcommand failed.
   
   Source:
   /* */
   #include <windows.h>
   #include <psapi.h>
   
   
   int main(void){return 0;}
   
   Determining if the function EnumProcessModules exists in the psapi failed with the following output:
   Change Dir: /home/mj/mxnet/build/CMakeFiles/CMakeTmp
   
   Run Build Command(s):/home/mj/anaconda3/bin/ninja cmTC_935d0 
   [1/2] Building C object CMakeFiles/cmTC_935d0.dir/CheckFunctionExists.c.o
   [2/2] Linking C executable cmTC_935d0
   FAILED: cmTC_935d0 
   : && /usr/lib/ccache/cc -mf16c -Wall -Wno-unknown-pragmas -Wno-sign-compare -O3 -msse3 -DCHECK_FUNCTION_EXISTS=EnumProcessModules   CMakeFiles/cmTC_935d0.dir/CheckFunctionExists.c.o  -o cmTC_935d0  -lpsapi && :
   /usr/bin/ld: cannot find -lpsapi
   collect2: error: ld returned 1 exit status
   ninja: build stopped: subcommand failed.
   
   
   Determining if the function __atomic_load_1 exists failed with the following output:
   Change Dir: /home/mj/mxnet/build/CMakeFiles/CMakeTmp
   
   Run Build Command(s):/home/mj/anaconda3/bin/ninja cmTC_bf21b 
   [1/2] Building C object CMakeFiles/cmTC_bf21b.dir/CheckFunctionExists.c.o
   <command-line>:0:23: warning: conflicting types for built-in function ‘__atomic_load_1’ [-Wbuiltin-declaration-mismatch]
   /usr/local/share/cmake-3.14/Modules/CheckFunctionExists.c:7:3: note: in expansion of macro ‘CHECK_FUNCTION_EXISTS’
      CHECK_FUNCTION_EXISTS(void);
      ^~~~~~~~~~~~~~~~~~~~~
   [2/2] Linking C executable cmTC_bf21b
   FAILED: cmTC_bf21b 
   : && /usr/lib/ccache/cc -mf16c -Wall -Wno-unknown-pragmas -Wno-sign-compare -O3 -msse3 -DCHECK_FUNCTION_EXISTS=__atomic_load_1  -rdynamic CMakeFiles/cmTC_bf21b.dir/CheckFunctionExists.c.o  -o cmTC_bf21b   && :
   CMakeFiles/cmTC_bf21b.dir/CheckFunctionExists.c.o: In function `main':
   CheckFunctionExists.c:(.text.startup+0xc): undefined reference to `__atomic_load_1'
   collect2: error: ld returned 1 exit status
   ninja: build stopped: subcommand failed.
   
   
   Determining if the fopen64 exist failed with the following output:
   Change Dir: /home/mj/mxnet/build/CMakeFiles/CMakeTmp
   
   Run Build Command(s):/home/mj/anaconda3/bin/ninja cmTC_ee96b 
   [1/2] Building C object CMakeFiles/cmTC_ee96b.dir/CheckSymbolExists.c.o
   FAILED: CMakeFiles/cmTC_ee96b.dir/CheckSymbolExists.c.o 
   /usr/lib/ccache/cc   -mf16c -Wall -Wno-unknown-pragmas -Wno-sign-compare -O3 -msse3 -fopenmp  -fPIE -o CMakeFiles/cmTC_ee96b.dir/CheckSymbolExists.c.o   -c CheckSymbolExists.c
   CheckSymbolExists.c: In function ‘main’:
   CheckSymbolExists.c:8:19: error: ‘fopen64’ undeclared (first use in this function); did you mean ‘fopen’?
      return ((int*)(&fopen64))[argc];
                      ^~~~~~~
                      fopen
   CheckSymbolExists.c:8:19: note: each undeclared identifier is reported only once for each function it appears in
   ninja: build stopped: subcommand failed.
   
   File /home/mj/mxnet/build/CMakeFiles/CMakeTmp/CheckSymbolExists.c:
   /* */
   #include <stdio.h>
   
   int main(int argc, char** argv)
   {
     (void)argv;
   #ifndef fopen64
     return ((int*)(&fopen64))[argc];
   #else
     (void)argc;
     return 0;
   #endif
   }
   ```
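Most entries in a CMakeError.log record configure-time feature probes rather than the error that actually stopped the build. Several of the checks above are expected to fail on a 64-bit Linux host, for example the `windows.h;psapi.h` check or the `-m32` link test without multilib installed, and the first `pthread_create` probe commonly fails on glibc before CMake goes on to try a thread library. A minimal sketch, assuming the heading wording shown in this log (other CMake versions may phrase it differently), that lists the failed probes so they can be told apart from the real build error:

```python
import re

# Heading used by CMake for each failed configure check, as it appears in the
# CMakeError.log excerpt above (assumed wording; may differ between CMake versions).
_FAILED_RE = re.compile(r"^\s*(.+?) failed with the following output:\s*$")


def failed_probes(log_text):
    """Return the description of every failed configure probe in a CMakeError.log."""
    probes = []
    for line in log_text.splitlines():
        match = _FAILED_RE.match(line)
        if match:
            probes.append(match.group(1))
    return probes
```

For example, `failed_probes(open("build/CMakeFiles/CMakeError.log").read())` returns one entry per failed check; the first error printed by `ninja -v` during the actual build is usually the more useful clue.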
   
   ## To Reproduce
   Run the CMake/Ninja build on Ubuntu 18.04 with CUDA 10.0, TensorRT 7, NCCL 2.5.6, and Intel XE 2020 (latest MKL).
   
   ### Steps to reproduce
   ```
   cmake -DUSE_CUDA=1 -DUSE_CUDA_PATH=/usr/local/cuda -DUSE_CUDNN=1 -DUSE_MKLDNN=1 -DCMAKE_BUILD_TYPE=Release -DUSE_TENSORRT=1 -DUSE_NCCL=1 -DUSE_NCCL_PATH=/usr/local -DMKL_INCLUDE_DIR=/home/mj/intel/mkl/include -DMKL_RT_LIBRARY=/home/mj/intel/mkl/lib -GNinja ..
   
   ninja -v
   ```
   ## What have you tried to solve it?
   
   Reinstalled libboost; same error.
   
   ## Environment
   
   ```
   curl --retry 10 -s https://raw.githubusercontent.com/dmlc/gluon-nlp/master/tools/diagnose.py | python
   
   ----------Python Info----------
   Version      : 3.7.4
   Compiler     : GCC 7.3.0
   Build        : ('default', 'Aug 13 2019 20:35:49')
   Arch         : ('64bit', '')
   ------------Pip Info-----------
   Version      : 19.2
   Directory    : /home/mj/anaconda3/lib/python3.7/site-packages/pip
   ----------MXNet Info-----------
   Version      : 1.5.1
   Directory    : /home/mj/anaconda3/lib/python3.7/site-packages/mxnet
   Num GPUs     : 3
   Commit Hash   : c9818480680f84daa6e281a974ab263691302ba8
   ----------System Info----------
   Platform     : Linux-4.15.0-43-generic-x86_64-with-debian-buster-sid
   system       : Linux
   node         : prometheusu
   release      : 4.15.0-43-generic
   version      : #46-Ubuntu SMP Thu Dec 6 14:45:28 UTC 2018
   ----------Hardware Info----------
   machine      : x86_64
   processor    : x86_64
   Architecture:        x86_64
   CPU op-mode(s):      32-bit, 64-bit
   Byte Order:          Little Endian
   CPU(s):              8
   On-line CPU(s) list: 0-7
   Thread(s) per core:  2
   Core(s) per socket:  4
   Socket(s):           1
   NUMA node(s):        1
   Vendor ID:           GenuineIntel
   CPU family:          6
   Model:               60
   Model name:          Intel(R) Core(TM) i7-4790K CPU @ 4.00GHz
   Stepping:            3
   CPU MHz:             4181.322
   CPU max MHz:         4400.0000
   CPU min MHz:         800.0000
   BogoMIPS:            8000.51
   Virtualization:      VT-x
   L1d cache:           32K
   L1i cache:           32K
   L2 cache:            256K
   L3 cache:            8192K
   NUMA node0 CPU(s):   0-7
   Flags:               fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm cpuid_fault invpcid_single pti ssbd ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid xsaveopt dtherm ida arat pln pts flush_l1d
   ----------Network Test----------
   Setting timeout: 10
   Timing for MXNet: https://github.com/apache/incubator-mxnet, DNS: 0.0034 sec, LOAD: 0.8277 sec.
   Timing for GluonNLP GitHub: https://github.com/dmlc/gluon-nlp, DNS: 0.0006 sec, LOAD: 0.8006 sec.
   Timing for GluonNLP: http://gluon-nlp.mxnet.io, DNS: 0.1911 sec, LOAD: 0.9508 sec.
   Timing for D2L: http://d2l.ai, DNS: 0.1454 sec, LOAD: 0.2055 sec.
   Timing for D2L (zh-cn): http://zh.d2l.ai, DNS: 0.0622 sec, LOAD: 0.4024 sec.
   Timing for FashionMNIST: https://repo.mxnet.io/gluon/dataset/fashion-mnist/train-labels-idx1-ubyte.gz, DNS: 0.2145 sec, LOAD: 1.0244 sec.
   Timing for PYPI: https://pypi.python.org/pypi/pip, DNS: 0.3087 sec, LOAD: 1.7032 sec.
   Timing for Conda: https://repo.continuum.io/pkgs/free/, DNS: 0.1470 sec, LOAD: 0.5086 sec.
   
   ```
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services


[GitHub] [incubator-mxnet] mjsML commented on issue #17238: Error building on ubuntu 18.04 on x64 with intel XEP, CUDA10.0, NCCL and TensorRT

Posted by GitBox <gi...@apache.org>.

mjsML commented on issue #17238: Error building on ubuntu 18.04 on x64 with intel XEP, CUDA10.0, NCCL and TensorRT
URL: https://github.com/apache/incubator-mxnet/issues/17238#issuecomment-571690599
 
 
   Sorry, I wasn't clear. I didn't use the entire script; I only used the steps below, because I got a different error while building and these steps fixed it:
   ```
   # Build ONNX
   # PYVER is not defined in this excerpt; set it to your Python version first.
   cmake \
       -DCMAKE_CXX_COMPILER_LAUNCHER=ccache \
       -DCMAKE_C_COMPILER_LAUNCHER=ccache \
       -DCMAKE_CXX_FLAGS=-I/usr/include/python${PYVER} \
       -DBUILD_SHARED_LIBS=ON .. \
       -G Ninja
   ninja -j 1 -v onnx/onnx.proto
   ninja -j 1 -v
   export LIBRARY_PATH=`pwd`:`pwd`/onnx/:$LIBRARY_PATH
   export CPLUS_INCLUDE_PATH=`pwd`:$CPLUS_INCLUDE_PATH

   # Build ONNX-TensorRT
   cd 3rdparty/onnx-tensorrt/
   mkdir -p build
   cd build
   cmake \
       -DCMAKE_CXX_COMPILER_LAUNCHER=ccache \
       -DCMAKE_C_COMPILER_LAUNCHER=ccache \
       ..
   make -j$(nproc)
   export LIBRARY_PATH=`pwd`:$LIBRARY_PATH
   ```
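These steps make the freshly built ONNX and ONNX-TensorRT libraries visible to the linker by prepending their build directories to `LIBRARY_PATH`, which GCC searches when resolving `-l` libraries at link time. A small sketch for checking that a given library file is actually reachable through that variable before configuring MXNet; the file name in the usage note (`libnvonnxparser.so`) is an assumption about what the onnx-tensorrt build produces, so substitute whatever your build directory contains:

```python
import os


def find_on_library_path(filename, library_path=None):
    """Return every LIBRARY_PATH entry that contains `filename`, in search order.

    `library_path` defaults to the current LIBRARY_PATH environment variable.
    Empty entries (e.g. from a trailing ':') are skipped here for simplicity.
    """
    if library_path is None:
        library_path = os.environ.get("LIBRARY_PATH", "")
    hits = []
    for directory in library_path.split(os.pathsep):
        if not directory:
            continue
        candidate = os.path.join(directory, filename)
        if os.path.isfile(candidate):
            hits.append(candidate)
    return hits
```

For example, `find_on_library_path("libnvonnxparser.so")` returning an empty list means the export did not take effect in the shell that later runs the MXNet build.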
   


[GitHub] [incubator-mxnet] mjsML edited a comment on issue #17238: Error building on ubuntu 18.04 on x64 with intel XEP, CUDA10.0, NCCL and TensorRT

Posted by GitBox <gi...@apache.org>.

mjsML edited a comment on issue #17238: Error building on ubuntu 18.04 on x64 with intel XEP, CUDA10.0, NCCL and TensorRT
URL: https://github.com/apache/incubator-mxnet/issues/17238#issuecomment-571683448
 
 
   @leezu Sorry, I don't understand what you mean by submodules. I cloned master recursively and built onnx-tensorrt and onnx before building the root package.


[GitHub] [incubator-mxnet] mjsML edited a comment on issue #17238: Error building on ubuntu 18.04 on x64 with intel XEP, CUDA10.0, NCCL and TensorRT

Posted by GitBox <gi...@apache.org>.

mjsML edited a comment on issue #17238: Error building on ubuntu 18.04 on x64 with intel XEP, CUDA10.0, NCCL and TensorRT
URL: https://github.com/apache/incubator-mxnet/issues/17238#issuecomment-572020519
 
 
   @leezu I'll try to build without MKL and see what happens then.


[GitHub] [incubator-mxnet] leezu commented on issue #17238: Error building on ubuntu 18.04 on x64 with intel XEP, CUDA10.0, NCCL and TensorRT

Posted by GitBox <gi...@apache.org>.

leezu commented on issue #17238: Error building on ubuntu 18.04 on x64 with intel XEP, CUDA10.0, NCCL and TensorRT
URL: https://github.com/apache/incubator-mxnet/issues/17238#issuecomment-571689002
 
 
   That script installs packages for Ubuntu 16.04, for example:
   
   `wget -qO tensorrt.deb https://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu1604/x86_64/nvidia-machine-learning-repo-ubuntu1604_1.0.0-1_amd64.deb`
   
   That may be causing your issues.
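The release a repository package targets is encoded in its URL (the `ubuntu1604` in the URL above), so one way to catch this kind of mismatch is to compare that tag with the running system. A minimal sketch, assuming the `ubuntuXXYY` naming seen in that URL and the standard `ID`/`VERSION_ID` keys of `/etc/os-release`:

```python
import re


def parse_os_release(text):
    """Parse os-release style KEY=value lines into a dict (quotes stripped)."""
    info = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        info[key] = value.strip().strip('"').strip("'")
    return info


def repo_tag_for_system(os_release_text):
    """Build the 'ubuntuXXYY' tag (e.g. 'ubuntu1804') for the running system."""
    info = parse_os_release(os_release_text)
    return info.get("ID", "") + info.get("VERSION_ID", "").replace(".", "")


def repo_matches_system(url, os_release_text):
    """True if the first ubuntuXXYY tag in `url` matches the system's release."""
    found = re.search(r"ubuntu\d{4}", url)
    return found is not None and found.group(0) == repo_tag_for_system(os_release_text)
```

On the reporter's machine, `repo_matches_system(url, open("/etc/os-release").read())` would be expected to return `False` for the 16.04 URL quoted above.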



[GitHub] [incubator-mxnet] mseth10 edited a comment on issue #17238: Error building on ubuntu 18.04 on x64 with intel XEP, CUDA10.0, NCCL and TensorRT

Posted by GitBox <gi...@apache.org>.

mseth10 edited a comment on issue #17238: Error building on ubuntu 18.04 on x64 with intel XEP, CUDA10.0, NCCL and TensorRT
URL: https://github.com/apache/incubator-mxnet/issues/17238#issuecomment-609036760
 
 
   @mxnet-label-bot add [MKL]


[GitHub] [incubator-mxnet] leezu commented on issue #17238: Error building on ubuntu 18.04 on x64 with intel XEP, CUDA10.0, NCCL and TensorRT

Posted by GitBox <gi...@apache.org>.

leezu commented on issue #17238: Error building on ubuntu 18.04 on x64 with intel XEP, CUDA10.0, NCCL and TensorRT
URL: https://github.com/apache/incubator-mxnet/issues/17238#issuecomment-572006441
 
 
   It seems the compilation instructions and the build process for TensorRT are lacking. What is your experience with the make build?
   
   For CMake, I suggest we improve the process so that onnx and onnx-tensorrt are built automatically as part of the MXNet build, similar to OpenMP. @mjsML, are you familiar with CMake, and would you like to contribute?


[GitHub] [incubator-mxnet] leezu commented on issue #17238: Error building on ubuntu 18.04 on x64 with intel XEP, CUDA10.0, NCCL and TensorRT

Posted by GitBox <gi...@apache.org>.

leezu commented on issue #17238: Error building on ubuntu 18.04 on x64 with intel XEP, CUDA10.0, NCCL and TensorRT
URL: https://github.com/apache/incubator-mxnet/issues/17238#issuecomment-571684177
 
 
   OK, that means your submodules are up to date. Otherwise, `git submodule update --init --recursive` would update them.
   
   How did you build and install onnx? 
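To confirm that the submodules really are in sync before rebuilding, `git submodule status --recursive` prefixes each entry with `-` when the submodule is not initialized, `+` when the checked-out commit differs from the one recorded in the superproject, and `U` when it has merge conflicts. A small sketch that flags such entries; `check_repo` simply runs that command (it assumes `git` is on `PATH` and Python 3.7+ for `capture_output`):

```python
import subprocess


def stale_submodules(status_output):
    """Return (problem, path) for every submodule that is not cleanly checked out.

    Parses `git submodule status` output, where the first character of each line
    is ' ' (in sync), '-' (not initialized), '+' (commit differs from the one
    recorded in the superproject) or 'U' (merge conflicts).
    """
    labels = {"-": "not initialized", "+": "commit mismatch", "U": "merge conflict"}
    problems = []
    for line in status_output.splitlines():
        if not line:
            continue
        state = line[0]
        if state in labels:
            # After the state character come the commit SHA-1 and the path.
            fields = line[1:].split()
            problems.append((labels[state], fields[1]))
    return problems


def check_repo(repo_dir="."):
    """Run `git submodule status --recursive` in repo_dir and report problems."""
    output = subprocess.run(
        ["git", "submodule", "status", "--recursive"],
        cwd=repo_dir, capture_output=True, text=True, check=True,
    ).stdout
    return stale_submodules(output)
```

An empty result from `check_repo("path/to/mxnet")` corresponds to the "up to date" case described above.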



[GitHub] [incubator-mxnet] pengzhao-intel commented on issue #17238: Error building on ubuntu 18.04 on x64 with intel XEP, CUDA10.0, NCCL and TensorRT

Posted by GitBox <gi...@apache.org>.

pengzhao-intel commented on issue #17238: Error building on ubuntu 18.04 on x64 with intel XEP, CUDA10.0, NCCL and TensorRT
URL: https://github.com/apache/incubator-mxnet/issues/17238#issuecomment-572382597
 
 
   cc @wuxun-zhang to look into the issue


[GitHub] [incubator-mxnet] mseth10 edited a comment on issue #17238: Error building on ubuntu 18.04 on x64 with intel XEP, CUDA10.0, NCCL and TensorRT

Posted by GitBox <gi...@apache.org>.

mseth10 edited a comment on issue #17238: Error building on ubuntu 18.04 on x64 with intel XEP, CUDA10.0, NCCL and TensorRT
URL: https://github.com/apache/incubator-mxnet/issues/17238#issuecomment-609036760
 
 
   mxnet-label-bot add 'MKL'


[GitHub] [incubator-mxnet] leezu edited a comment on issue #17238: Error building on ubuntu 18.04 on x64 with intel XEP, CUDA10.0, NCCL and TensorRT

Posted by GitBox <gi...@apache.org>.

leezu edited a comment on issue #17238: Error building on ubuntu 18.04 on x64 with intel XEP, CUDA10.0, NCCL and TensorRT
URL: https://github.com/apache/incubator-mxnet/issues/17238#issuecomment-572014705

@pengzhao-intel are you aware of the header leaks mentioned by @mjsML?

@mjsML I suggest you don't link with MKL for now to fix your problem, given that your "main target is to get a fast GPU build that utilizes the Nvidia packages". There may be a bug in the build setup when combining MKL with the NVIDIA libraries you mention.
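Following that suggestion, the configure command from the issue can be reused with MKL taken out. A minimal sketch that derives such an argument list, assuming the MXNet 1.x CMake options `USE_MKLDNN` and `USE_MKL_IF_AVAILABLE` (verify both against the `CMakeLists.txt` of your checkout); it drops the `MKL_*` cache variables and passes every other argument through unchanged:

```python
import subprocess

# Configure arguments from the issue (the reporter's paths; adjust as needed).
ORIGINAL_ARGS = [
    "-DUSE_CUDA=1", "-DUSE_CUDA_PATH=/usr/local/cuda", "-DUSE_CUDNN=1",
    "-DUSE_MKLDNN=1", "-DCMAKE_BUILD_TYPE=Release", "-DUSE_TENSORRT=1",
    "-DUSE_NCCL=1", "-DUSE_NCCL_PATH=/usr/local",
    "-DMKL_INCLUDE_DIR=/home/mj/intel/mkl/include",
    "-DMKL_RT_LIBRARY=/home/mj/intel/mkl/lib", "-GNinja", "..",
]


def without_mkl(args):
    """Return a copy of `args` with MKL/MKL-DNN disabled and MKL paths removed."""
    kept = [a for a in args
            if not a.startswith(("-DMKL_", "-DUSE_MKLDNN=", "-DUSE_MKL_IF_AVAILABLE="))]
    # Assumed MXNet 1.x option names; placed before the source dir (last arg).
    return kept[:-1] + ["-DUSE_MKLDNN=0", "-DUSE_MKL_IF_AVAILABLE=0"] + kept[-1:]


def configure(build_dir, args):
    """Run cmake with `args` inside build_dir (raises if cmake fails)."""
    subprocess.run(["cmake"] + args, cwd=build_dir, check=True)
```

Calling `configure("build", without_mkl(ORIGINAL_ARGS))` from the MXNet root and then running `ninja -v` in `build` mirrors the configuration that is reported as successful later in this thread.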

> however I'm not sure how you sync the 3rd party folder with the source?

The 3rdparty folder is updated from time to time when new versions of MKL-DNN are released. Such an update may be accompanied by changes to the MXNet source code to accommodate API changes in MKL-DNN.

> Also sure I'd love to contribute to the build process or otherwise :) ... imho we need a few new build types actually ... my suggestions are Training and Inference by accelerator type

Great. The first step would be to integrate the TensorRT build into the MXNet CMake build (assuming TensorRT has sufficiently good CMake support). Providing recommended "Training and Inference by accelerator type" build configurations and making them easy to build would be great.


[GitHub] [incubator-mxnet] mjsML commented on issue #17238: Error building on ubuntu 18.04 on x64 with intel XEP, CUDA10.0, NCCL and TensorRT

Posted by GitBox <gi...@apache.org>.

mjsML commented on issue #17238: Error building on ubuntu 18.04 on x64 with intel XEP, CUDA10.0, NCCL and TensorRT
URL: https://github.com/apache/incubator-mxnet/issues/17238#issuecomment-572048234
 
 
   @leezu Success! 
   ```
   >>> from mxnet.runtime import feature_list
   >>> feature_list()
   [✔ CUDA, ✔ CUDNN, ✔ NCCL, ✔ CUDA_RTC, ✔ TENSORRT, ✔ CPU_SSE, ✔ CPU_SSE2, ✔ CPU_SSE3, ✔ CPU_SSE4_1, ✔ CPU_SSE4_2, ✖ CPU_SSE4A, ✔ CPU_AVX, ✖ CPU_AVX2, ✔ OPENMP, ✖ SSE, ✔ F16C, ✔ JEMALLOC, ✔ BLAS_OPEN, ✖ BLAS_ATLAS, ✖ BLAS_MKL, ✖ BLAS_APPLE, ✔ LAPACK, ✖ MKLDNN, ✔ OPENCV, ✖ CAFFE, ✖ PROFILER, ✖ DIST_KVSTORE, ✖ CXX14, ✖ INT64_TENSOR_SIZE, ✔ SIGNAL_HANDLER, ✖ DEBUG, ✖ TVM_OP]
   ```
   So it seems that combining MKL and MKLDNN with the NVIDIA libraries is the culprit here...
   I'll work on the build guides tomorrow then. Cheers and thanks for the help :) *falls asleep*
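To check a rebuild for the combination that matters here (GPU libraries on, MKL/MKL-DNN off) without eyeballing the list, a small sketch that parses the printed `feature_list()` output. The helper name `parse_features` is hypothetical; in a live session, `mxnet.runtime.Features().is_enabled("MKLDNN")` should answer the same question directly.

```python
from __future__ import annotations

import re

# Abbreviated copy of the printed list; each entry is "✔ NAME" or "✖ NAME".
SAMPLE = ("[✔ CUDA, ✔ CUDNN, ✔ NCCL, ✔ TENSORRT, ✔ BLAS_OPEN, "
          "✖ BLAS_MKL, ✖ MKLDNN]")


def parse_features(text: str) -> dict[str, bool]:
    """Map each feature name to True (✔) or False (✖)."""
    return {name: mark == "✔"
            for mark, name in re.findall(r"([✔✖])\s+(\w+)", text)}


features = parse_features(SAMPLE)
enabled = sorted(name for name, on in features.items() if on)
print("enabled:", enabled)
```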

[GitHub] [incubator-mxnet] mjsML commented on issue #17238: Error building on ubuntu 18.04 on x64 with intel XEP, CUDA10.0, NCCL and TensorRT

Posted by GitBox <gi...@apache.org>.

mjsML commented on issue #17238: Error building on ubuntu 18.04 on x64 with intel XEP, CUDA10.0, NCCL and TensorRT
URL: https://github.com/apache/incubator-mxnet/issues/17238#issuecomment-571687461
 
 
   I used the steps from the second script (the Docker CI script) in this post:
   [How to build mxnet with tensorrt support?](https://discuss.mxnet.io/t/how-to-build-mxnet-with-tensorrt-support/3865/2)
   
   I've now deleted everything, started from scratch with the submodule update, and am building again...

[GitHub] [incubator-mxnet] pengzhao-intel commented on issue #17238: Error building on ubuntu 18.04 on x64 with intel XEP, CUDA10.0, NCCL and TensorRT

Posted by GitBox <gi...@apache.org>.

pengzhao-intel commented on issue #17238: Error building on ubuntu 18.04 on x64 with intel XEP, CUDA10.0, NCCL and TensorRT
URL: https://github.com/apache/incubator-mxnet/issues/17238#issuecomment-572383044
 
 
   > @pengzhao-intel as a concrete example, `/mxnet/3rdparty/mkldnn/src/cpu/gemm/gemm_threading.hpp` and `/mxnet/3rdparty/mkldnn/src/cpu/gemm/gemm_pack_storage.hpp` use assert without including assert.h... I suspect the other internal header leaks (e.g., includes of utils.hpp) are masked by build/compiler arguments or include directories that neither my compiler nor I were able to locate...
   
   Thanks for the information. We will investigate and fix the problem. 

[GitHub] [incubator-mxnet] mjsML commented on issue #17238: Error building on ubuntu 18.04 on x64 with intel XEP, CUDA10.0, NCCL and TensorRT

Posted by GitBox <gi...@apache.org>.

mjsML commented on issue #17238: Error building on ubuntu 18.04 on x64 with intel XEP, CUDA10.0, NCCL and TensorRT
URL: https://github.com/apache/incubator-mxnet/issues/17238#issuecomment-572018611
 
 
   @pengzhao-intel as a concrete example, `/mxnet/3rdparty/mkldnn/src/cpu/gemm/gemm_threading.hpp` and `/mxnet/3rdparty/mkldnn/src/cpu/gemm/gemm_pack_storage.hpp` use assert without including assert.h... I suspect the other internal header leaks (e.g., includes of utils.hpp) are masked by build/compiler arguments or include directories that neither my compiler nor I were able to locate...

[GitHub] [incubator-mxnet] mjsML commented on issue #17238: Error building on ubuntu 18.04 on x64 with intel XEP, CUDA10.0, NCCL and TensorRT

Posted by GitBox <gi...@apache.org>.

mjsML commented on issue #17238: Error building on ubuntu 18.04 on x64 with intel XEP, CUDA10.0, NCCL and TensorRT
URL: https://github.com/apache/incubator-mxnet/issues/17238#issuecomment-571695569
 
 
   @leezu Even with the submodule update, the build still fails with the pthread and fopen64 errors. I'll try building with Make instead of Ninja and see what happens.

[GitHub] [incubator-mxnet] leezu commented on issue #17238: Error building on ubuntu 18.04 on x64 with intel XEP, CUDA10.0, NCCL and TensorRT

Posted by GitBox <gi...@apache.org>.

leezu commented on issue #17238: Error building on ubuntu 18.04 on x64 with intel XEP, CUDA10.0, NCCL and TensorRT
URL: https://github.com/apache/incubator-mxnet/issues/17238#issuecomment-571681303
 
 
   Did you update all submodules?

[GitHub] [incubator-mxnet] leezu edited a comment on issue #17238: Error building on ubuntu 18.04 on x64 with intel XEP, CUDA10.0, NCCL and TensorRT

Posted by GitBox <gi...@apache.org>.

leezu edited a comment on issue #17238: Error building on ubuntu 18.04 on x64 with intel XEP, CUDA10.0, NCCL and TensorRT
URL: https://github.com/apache/incubator-mxnet/issues/17238#issuecomment-571684177
 
 
   OK, that means you have up-to-date submodules. Otherwise, `git submodule update --init --recursive` will update them.
   
   How did you build and install onnx-tensorrt and ONNX?
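The states that `git submodule status` reports can be classified with a short sketch like this. The helper names are hypothetical; the prefix meanings follow the `git submodule` documentation: a leading `-` means not initialized, `+` means the checked-out commit differs from the one recorded in the superproject, and `U` means merge conflicts.

```python
from __future__ import annotations

import subprocess


def parse_submodule_status(output: str) -> dict[str, str]:
    """Classify each line of `git submodule status --recursive` output."""
    states = {" ": "ok", "-": "not initialized",
              "+": "commit differs", "U": "merge conflicts"}
    result = {}
    for line in output.splitlines():
        if not line.strip():
            continue
        prefix, rest = line[0], line[1:].split()
        # rest[0] is the SHA-1, rest[1] the submodule path.
        result[rest[1]] = states.get(prefix, "unknown")
    return result


def check_repo(repo_dir: str = ".") -> dict[str, str]:
    """Run git in repo_dir and classify its submodules."""
    out = subprocess.run(["git", "submodule", "status", "--recursive"],
                         cwd=repo_dir, capture_output=True, text=True,
                         check=True).stdout
    return parse_submodule_status(out)
```

Submodules reported as `not initialized` or `commit differs` are the ones `git submodule update --init --recursive` would initialize or reset to the recorded commit.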

[GitHub] [incubator-mxnet] mjsML commented on issue #17238: Error building on ubuntu 18.04 on x64 with intel XEP, CUDA10.0, NCCL and TensorRT

Posted by GitBox <gi...@apache.org>.

mjsML commented on issue #17238: Error building on ubuntu 18.04 on x64 with intel XEP, CUDA10.0, NCCL and TensorRT
URL: https://github.com/apache/incubator-mxnet/issues/17238#issuecomment-572012537

@leezu
The pthread errors had to do with some missing libraries. Following this [page](https://nextjournal.com/mpd/compiling-mxnet), I ran apt with these packages:

`apt-get install --no-install-recommends \
software-properties-common apt-transport-https \
build-essential cmake libjemalloc-dev \
libatlas-base-dev liblapack-dev liblapacke-dev libopenblas-dev libopencv-dev \
libcurl4-openssl-dev libzmq3-dev ninja-build libhdf5-dev libomp-dev`

That fixed those errors, but the assert.h error inside MKLDNN didn't go away.

After a few hours of banging my head against the wall, I found the root cause: MKLDNN has a few ["header leaks"](https://jira.mongodb.org/browse/CXX-1423) (headers that use `assert` and the like without including the corresponding header themselves), so I had to manually go into the MKLDNN sources and add `#include <assert.h>` and similar includes.
That fixed the assert.h error, but then a whole bunch of other leaks sprang up because MKLDNN's own internal headers reference each other improperly throughout the code base.
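A crude way to find headers with this particular leak is a textual scan for `assert(` calls in files that never include `<assert.h>` or `<cassert>` themselves. This is a sketch that assumes a regex is good enough: it ignores comments and macros, so expect false positives. The function names are illustrative, and the scanned directory is the one named earlier in this thread.

```python
from __future__ import annotations

import re
from pathlib import Path

# \b keeps static_assert( from matching, since "_" is a word character.
ASSERT_CALL = re.compile(r"\bassert\s*\(")
ASSERT_INCLUDE = re.compile(r'#\s*include\s*[<"](assert\.h|cassert)[>"]')


def uses_assert_without_include(source: str) -> bool:
    """True if the text calls assert() but never includes assert.h/cassert itself."""
    return bool(ASSERT_CALL.search(source)) and not ASSERT_INCLUDE.search(source)


def scan(root: str) -> list[Path]:
    """Return .hpp files under root that rely on assert() arriving transitively."""
    return sorted(p for p in Path(root).rglob("*.hpp")
                  if uses_assert_without_include(p.read_text(errors="ignore")))


if __name__ == "__main__":
    for path in scan("3rdparty/mkldnn/src/cpu/gemm"):
        print(path)
```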

I ended up ditching MKLDNN (by setting `-DUSE_MKLDNN=0`) because my main target is to get a fast GPU build that utilizes the Nvidia packages (mainly NCCL and TensorRT on x64 for my desktop training machine, and TensorRT on aarch64 for the Jetson Nano). When I have more time I'll work on the Intel repo to fix the header issues; however, I'm not sure how you sync the 3rd party folder with the source?
Now the build passes, but the tests fail with linking errors. I did specify that I wanted to use MKL for BLAS, but I get this when linking the examples/tests:
`/usr/bin/ld: libmxnet.a(la_op.cc.o): undefined reference to symbol 'cblas_dtrsm'
//usr/lib/x86_64-linux-gnu/libblas.so.3: error adding symbols: DSO missing from command line
collect2: error: ld returned 1 exit status`
I then built the latest OpenBLAS (0.3.8) and pointed the system BLAS/LAPACK at it with update-alternatives, following these [instructions](https://github.com/xianyi/OpenBLAS/wiki/faq#debianlts):
`sudo update-alternatives --install /usr/lib/libblas.so.3 libblas.so.3 /opt/OpenBLAS/lib/libopenblas.so.0 41 \
--slave /usr/lib/liblapack.so.3 liblapack.so.3 /opt/OpenBLAS/lib/libopenblas.so.0`
This was futile as well; I'm still stuck with the same linking error. I'm at a loss as to why OpenBLAS needs to be linked at all if I built with MKL as my BLAS of choice.
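As far as I understand GNU ld, "DSO missing from command line" means the symbol (`cblas_dtrsm` here) was only found in a shared library (`libblas.so.3`) that is a dependency of something else on the link line rather than being named on it directly, and the linker refuses to resolve it implicitly; the usual fix is to put the library that should provide the symbol on the link line explicitly. A small sketch that pulls both names out of such output, using the lines from the error above (the helper name `diagnose` is hypothetical):

```python
from __future__ import annotations

import re

# Linker output from this comment, one message per line.
LD_OUTPUT = """\
/usr/bin/ld: libmxnet.a(la_op.cc.o): undefined reference to symbol 'cblas_dtrsm'
//usr/lib/x86_64-linux-gnu/libblas.so.3: error adding symbols: DSO missing from command line
collect2: error: ld returned 1 exit status
"""

UNDEF = re.compile(r"undefined reference to symbol '([^']+)'")
DSO = re.compile(r"^(\S+): error adding symbols: DSO missing from command line", re.M)


def diagnose(output: str) -> tuple[list[str], list[str]]:
    """Return (unresolved symbols, shared libraries found only indirectly)."""
    return UNDEF.findall(output), DSO.findall(output)


symbols, libraries = diagnose(LD_OUTPUT)
print("symbols:", symbols)
print("libraries:", libraries)
```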

Also, sure, I'd love to contribute to the build process or otherwise :) IMHO we actually need a few new build types. My suggestion is Training and Inference by accelerator type: in this example, I'm building a CUDA training build (CUDA, cuDNN, NCCL, and TensorRT), while an inference build would have only CUDA, cuDNN, and TensorRT (in my case also for an edge accelerator on aarch64 :/ ). Other accelerator types would need the same. In my mind this is the ML-world equivalent of the "Debug" and "Release" configurations. Food for thought.
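One way to make the "Training and Inference by accelerator type" idea concrete is a table of presets that expands into CMake flags. This is a hypothetical sketch: only the two CUDA presets described here are filled in, the `USE_*` variables are the ones from the configure command in this issue, and the preset keys are made up.

```python
from __future__ import annotations

# Hypothetical presets: (purpose, accelerator) -> CMake cache variables.
PRESETS = {
    ("training", "cuda"): {"USE_CUDA": 1, "USE_CUDNN": 1,
                           "USE_NCCL": 1, "USE_TENSORRT": 1},
    ("inference", "cuda"): {"USE_CUDA": 1, "USE_CUDNN": 1,
                            "USE_NCCL": 0, "USE_TENSORRT": 1},
}


def cmake_args(purpose: str, accelerator: str,
               build_type: str = "Release") -> list[str]:
    """Expand a preset into -D flags, ending with CMAKE_BUILD_TYPE."""
    try:
        flags = PRESETS[(purpose, accelerator)]
    except KeyError:
        raise ValueError(f"no preset for {purpose!r} on {accelerator!r}") from None
    args = [f"-D{name}={value}" for name, value in sorted(flags.items())]
    args.append(f"-DCMAKE_BUILD_TYPE={build_type}")
    return args


print(" ".join(["cmake", *cmake_args("inference", "cuda"), ".."]))
```

Keeping the presets as data rather than separate scripts means adding another accelerator is one new table entry.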

[GitHub] [incubator-mxnet] mseth10 commented on issue #17238: Error building on ubuntu 18.04 on x64 with intel XEP, CUDA10.0, NCCL and TensorRT

Posted by GitBox <gi...@apache.org>.

mseth10 commented on issue #17238: Error building on ubuntu 18.04 on x64 with intel XEP, CUDA10.0, NCCL and TensorRT
URL: https://github.com/apache/incubator-mxnet/issues/17238#issuecomment-609036760
 
 
   mxnet-label-bot add ['MKL']

[GitHub] [incubator-mxnet] bgawrych commented on issue #17238: Error building on ubuntu 18.04 on x64 with intel XEP, CUDA10.0, NCCL and TensorRT

Posted by GitBox <gi...@apache.org>.

bgawrych commented on issue #17238:
URL: https://github.com/apache/incubator-mxnet/issues/17238#issuecomment-874561827


   Hey, I can't reproduce the issue with the assert.h headers on the 1.x branch; however, I was able to reproduce the issue with linking the tests. It can be solved by passing the full path to libmkl_rt.so in `-DMKL_RT_LIBRARY`. For example, when I passed `-DMKL_RT_LIBRARY=/opt/intel/mkl/lib/intel64/libmkl_rt.so`, everything worked fine.
   
   @szha can we close this issue? I think the issue is solved in newer oneDNN versions.
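The configure command at the top of this issue passed a directory (`-DMKL_RT_LIBRARY=/home/mj/intel/mkl/lib`) rather than the library file. A hypothetical pre-flight check that catches this before running CMake might look like this (the function names are illustrative, not part of MXNet's build):

```python
from __future__ import annotations

from pathlib import Path


def check_mkl_rt_library(value: str) -> Path:
    """Reject a directory; expect the full path to libmkl_rt.so."""
    path = Path(value)
    if path.is_dir():
        raise ValueError(f"{value} is a directory; pass the full path to libmkl_rt.so")
    if not path.name.startswith("libmkl_rt.so"):
        raise ValueError(f"{value} does not look like libmkl_rt.so")
    if not path.is_file():
        raise FileNotFoundError(value)
    return path


def mkl_flag(value: str) -> str:
    """Build the -DMKL_RT_LIBRARY flag only after the path checks pass."""
    return f"-DMKL_RT_LIBRARY={check_mkl_rt_library(value)}"
```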


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@mxnet.apache.org
For additional commands, e-mail: issues-help@mxnet.apache.org

[GitHub] [incubator-mxnet] leezu closed issue #17238: Error building on ubuntu 18.04 on x64 with intel XEP, CUDA10.0, NCCL and TensorRT

Posted by GitBox <gi...@apache.org>.

leezu closed issue #17238:
URL: https://github.com/apache/incubator-mxnet/issues/17238


   

