Posted to commits@mxnet.apache.org by GitBox <gi...@apache.org> on 2020/01/09 21:52:14 UTC

[GitHub] [incubator-mxnet] cyrusbehr opened a new issue #17262: Unable to build / link mxnet against cuda 10.2

cyrusbehr opened a new issue #17262: Unable to build / link mxnet against cuda 10.2
URL: https://github.com/apache/incubator-mxnet/issues/17262
 
 
   I am trying to build mxnet with cuda 10.2 in a docker container.
   For my build, I am using the following docker image from nvidia: `nvidia/cuda:10.2-cudnn7-devel-ubuntu18.04`
   
   Within the container (from fresh image) I am running the following commands:
   
   ```
   apt-get update && apt-get upgrade -y
   apt-get install -y libopenblas-dev git python python-pip
   apt-get install -y libjemalloc-dev
   pip install cmake 
   
   git clone https://github.com/apache/incubator-mxnet.git mxnet
   cd mxnet
   mkdir build
   cd build
   
   export CUDA_HOME=/usr/local/cuda
   export LD_LIBRARY_PATH=${CUDA_HOME}/lib64:${LD_LIBRARY_PATH}
   export LD_LIBRARY_PATH=${CUDA_HOME}/compat:${LD_LIBRARY_PATH}
   
   cmake -DUSE_CUDNN=1 -DCUDA_TOOLKIT_ROOT_DIR=/usr/local/cuda -DCMAKE_BUILD_TYPE=Release \
   	-DBLAS=open -DUSE_OPENCV=OFF -DUSE_CPP_PACKAGE=ON -DENABLE_CUDA_RTC=ON ..
   make -j4
   ```
   
   The build gets to the 95% mark, then fails with the following message:
   ```
   [21:33:50] :	 [Step 1/1] [ 95%] Building CXX object tests/CMakeFiles/mxnet_unit_tests.dir/cpp/test_main.cc.o
   [21:34:11] :	 [Step 1/1] [ 95%] Linking CUDA device code CMakeFiles/mxnet_unit_tests.dir/cmake_device_link.o
   [21:34:27] :	 [Step 1/1] [ 95%] Linking CXX executable mxnet_unit_tests
   [21:34:36]W:	 [Step 1/1] /usr/lib/gcc/x86_64-linux-gnu/7/../../../x86_64-linux-gnu/Scrt1.o: In function `_start':
   [21:34:36]W:	 [Step 1/1] (.text+0x26): relocation truncated to fit: R_X86_64_GOTPCRELX against symbol `__libc_start_main@@GLIBC_2.2.5' defined in .text section in /lib/x86_64-linux-gnu/libc.so.6
   [21:34:36] :	 [Step 1/1] tests/CMakeFiles/mxnet_unit_tests.dir/build.make:449: recipe for target 'tests/mxnet_unit_tests' failed
   [21:34:36]W:	 [Step 1/1] /usr/lib/gcc/x86_64-linux-gnu/7/../../../x86_64-linux-gnu/Scrt1.o:(.eh_frame+0x20): I
   [21:34:36]W:	 [Step 1/1] /usr/lib/gcc/x86_64-linux-gnu/7/../../../x86_64-linux-gnu/crti.o: In function `_init':
   [21:34:36]W:	 [Step 1/1] (.init+0x7): relocation truncated to fit: R_X86_64_REX_GOTPCRELX against undefined symbol `__gmon_start__'
   [21:34:36]W:	 [Step 1/1] CMakeFiles/mxnet_unit_tests.dir/cpp/engine/omp_test.cc.o: In function `OMPBehaviour_after_fork_Test::TestBody()':
   [21:34:36]W:	 [Step 1/1] omp_test.cc:(.text+0x193): relocation truncated to fit: R_X86_64_PC32 against symbol `vtable for std::basic_ios<char, std::char_traits<char> >@@GLIBCXX_3.4' defined in .data.rel.ro section in /usr/lib/gcc/x86_64-linux-gnu/7/../../../x86_64-linux-gnu/Scrt1.o
   [21:34:36]W:	 [Step 1/1] omp_test.cc:(.text+0x1a7): relocation truncated to fit: R_X86_64_PC32 against symbol `VTT for std::__cxx11::basic_ostringstream<char, std::char_traits<char>, std::allocator<char> >@@GLIBCXX_3.4.21' defined in .data.rel.ro section in /usr/lib/gcc/x86_64-linux-gnu/7/../../../x86_64-linux-gnu/Scrt1.o
   [21:34:36]W:	 [Step 1/1] omp_test.cc:(.text+0x1b2): relocation truncated to fit: R_X86_64_PC32 against symbol `VTT for std::__cxx11::basic_ostringstream<char, std::char_traits<char>, std::allocator<char> >@@GLIBCXX_3.4.21' defined in .data.rel.ro section in /usr/lib/gcc/x86_64-linux-gnu/7/../../../x86_64-linux-gnu/Scrt1.o
   [21:34:36]W:	 [Step 1/1] omp_test.cc:(.text+0x1f6): relocation truncated to fit: R_X86_64_PC32 against symbol `vtable for std::__cxx11::basic_ostringstream<char, std::char_traits<char>, std::allocator<char> >@@GLIBCXX_3.4.21' defined in .data.rel.ro section in /usr/lib/gcc/x86_64-linux-gnu/7/../../../x86_64-linux-gnu/Scrt1.o
   [21:34:36]W:	 [Step 1/1] omp_test.cc:(.text+0x217): relocation truncated to fit: R_X86_64_PC32 against symbol `vtable for std::basic_streambuf<char, std::char_traits<char> >@@GLIBCXX_3.4' defined in .data.rel.ro section in /usr/lib/gcc/x86_64-linux-gnu/7/../../../x86_64-linux-gnu/Scrt1.o
   [21:34:36]W:	 [Step 1/1] omp_test.cc:(.text+0x242): relocation truncated to fit: R_X86_64_PC32 against symbol `vtable for std::__cxx11::basic_stringbuf<char, std::char_traits<char>, std::allocator<char> >@@GLIBCXX_3.4.21' defined in .data.rel.ro section in /usr/lib/gcc/x86_64-linux-gnu/7/../../../x86_64-linux-gnu/Scrt1.o
   [21:34:36]W:	 [Step 1/1] omp_test.cc:(.text+0x432): relocation truncated to fit: R_X86_64_PC32 against symbol `vtable for std::basic_ios<char, std::char_traits<char> >@@GLIBCXX_3.4' defined in .data.rel.ro section in /usr/lib/gcc/x86_64-linux-gnu/7/../../../x86_64-linux-gnu/Scrt1.o
   [21:34:36]W:	 [Step 1/1] omp_test.cc:(.text+0x446): additional relocation overflows omitted from the output
   [21:34:36]W:	 [Step 1/1] /usr/bin/ld: failed to convert GOTPCREL relocation; relink with --no-relax
   [21:34:36]W:	 [Step 1/1] collect2: error: ld returned 1 exit status
   [21:34:36]W:	 [Step 1/1] make[2]: *** [tests/mxnet_unit_tests] Error 1
   [21:34:36] :	 [Step 1/1] CMakeFiles/Makefile2:2398: recipe for target 'tests/CMakeFiles/mxnet_unit_tests.dir/all' failed
   [21:34:36] :	 [Step 1/1] Makefile:140: recipe for target 'all' failed
   [21:34:36]W:	 [Step 1/1] make[1]: *** [tests/CMakeFiles/mxnet_unit_tests.dir/all] Error 2
   [21:34:36]W:	 [Step 1/1] make: *** [all] Error 2
   [21:34:36]i:	 [Step 1/1] Docker event: {"status":"die","id":"a42467491c7573f601b26314aee381f40e74a1c4ae21e2e94929d22b6587b2a3","from":"nvidia/cuda:10.2-cudnn7-devel-ubuntu18.04","Type":"container","Action":"die","Actor":{"ID":"a42467491c7573f601b26314aee381f40e74a1c4ae21e2e94929d22b6587b2a3","Attributes":{"com.nvidia.cudnn.version":"7.6.5.32","exitCode":"2","image":"nvidia/cuda:10.2-cudnn7-devel-ubuntu18.04","jetbrains.teamcity.buildId":"202","maintainer":"NVIDIA CORPORATION <cu...@nvidia.com>","name":"priceless_tereshkova"}},"scope":"local","time":1578605676,"timeNano":1578605676595309140}
   [21:34:37]i:	 [Step 1/1] Docker event: {"status":"destroy","id":"a42467491c7573f601b26314aee381f40e74a1c4ae21e2e94929d22b6587b2a3","from":"nvidia/cuda:10.2-cudnn7-devel-ubuntu18.04","Type":"container","Action":"destroy","Actor":{"ID":"a42467491c7573f601b26314aee381f40e74a1c4ae21e2e94929d22b6587b2a3","Attributes":{"com.nvidia.cudnn.version":"7.6.5.32","image":"nvidia/cuda:10.2-cudnn7-devel-ubuntu18.04","jetbrains.teamcity.buildId":"202","maintainer":"NVIDIA CORPORATION <cu...@nvidia.com>","name":"priceless_tereshkova"}},"scope":"local","time":1578605677,"timeNano":1578605677103449725}
   [21:34:37]W:	 [Step 1/1] Process exited with code 2
   [21:34:37]E:	 [Step 1/1] Process exited with code 2 (Step: Command Line)
   [21:34:37]E:	 [Step 1/1] Step Command Line failed
   [21:34:37] : Publishing artifacts
   [21:34:37] :	 [Publishing artifacts] Collecting files to publish: [/opt/jetbrains/TeamCity/buildAgent/temp/buildTmp/.teamcity/docker/build_3/events.json => .teamcity/docker/]
   [21:34:37] :	 [Publishing artifacts] Publishing 1 file using [WebPublisher]: /opt/jetbrains/TeamCity/buildAgent/temp/buildTmp/.teamcity/docker/build_3/events.json => .teamcity/docker
   [21:34:37] :	 [Publishing artifacts] Publishing 1 file using [ArtifactsCachePublisher]: /opt/jetbrains/TeamCity/buildAgent/temp/buildTmp/.teamcity/docker/build_3/events.json => .teamcity/docker
   [21:34:37]i: Docker wrapper: setting permissions for '/opt/jetbrains/TeamCity/buildAgent/temp/buildTmp' and '/opt/jetbrains/TeamCity/buildAgent/work/a29666811c94acf8' to 755
   [21:34:37] : Publishing internal artifacts
   [21:34:37] :	 [Publishing internal artifacts] Publishing 1 file using [WebPublisher]
   [21:34:37] :	 [Publishing internal artifacts] Publishing 1 file using [ArtifactsCachePublisher]
   [21:34:37] : Build is failed. Artifacts will not be published for this build
   [21:34:37] : Build finished
   
   
   ```
   
   If I build mxnet using the exact same build command and options but use the cuda 10.0 image `nvidia/cuda:10.0-cudnn7-devel-ubuntu18.04`, then the build completes as expected.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [incubator-mxnet] leezu commented on issue #17262: Unable to build / link mxnet against cuda 10.2

Posted by GitBox <gi...@apache.org>.
leezu commented on issue #17262: Unable to build / link mxnet against cuda 10.2
URL: https://github.com/apache/incubator-mxnet/issues/17262#issuecomment-578975798
 
 
   Let's track the issue at https://github.com/apache/incubator-mxnet/issues/16852#issuecomment-565016904


[GitHub] [incubator-mxnet] litaotju commented on issue #17262: Unable to build / link mxnet against cuda 10.2

Posted by GitBox <gi...@apache.org>.
litaotju commented on issue #17262: Unable to build / link mxnet against cuda 10.2
URL: https://github.com/apache/incubator-mxnet/issues/17262#issuecomment-573405031
 
 
   @leezu, I didn't have this issue when building locally, maybe because I have a GPU. I got the answer from Stack Overflow.


[GitHub] [incubator-mxnet] leezu commented on issue #17262: Unable to build / link mxnet against cuda 10.2

Posted by GitBox <gi...@apache.org>.
leezu commented on issue #17262: Unable to build / link mxnet against cuda 10.2
URL: https://github.com/apache/incubator-mxnet/issues/17262#issuecomment-573300257
 
 
   I suppose you are not building on a GPU machine? Currently, in this case, our cmake build setup will fall back to building for all possible CUDA architectures. Most likely you only want to build for the GPUs that you want to run MXNet on in the end.
   
   Please set `-DMXNET_CUDA_ARCH=7.0` (or your respective cuda arch) in the `cmake` command.
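
   As a rough sketch, the configure command from the original report with this flag appended might look like the following. The variable names `cuda_arch` and `cmake_cmd` are illustrative only, and `7.0` is just the example value from above; substitute the compute capability of the GPU you target.

   ```shell
   # Example value only: replace with the compute capability of the target GPU.
   cuda_arch="7.0"

   # Same options as in the original report, plus the architecture restriction.
   cmake_cmd="cmake -DUSE_CUDNN=1 -DCUDA_TOOLKIT_ROOT_DIR=/usr/local/cuda \
   -DCMAKE_BUILD_TYPE=Release -DBLAS=open -DUSE_OPENCV=OFF \
   -DUSE_CPP_PACKAGE=ON -DENABLE_CUDA_RTC=ON \
   -DMXNET_CUDA_ARCH=${cuda_arch} .."

   # Print the command for review; run it from the build directory once it looks right.
   echo "$cmake_cmd"
   ```

   Printing the command first makes it easy to confirm that only one architecture is requested before starting a long build.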
   
   @litaotju does this actually work for you? The same issue is tracked at https://github.com/apache/incubator-mxnet/issues/16852 and I think not yet solved completely.


[GitHub] [incubator-mxnet] leezu commented on issue #17262: Unable to build / link mxnet against cuda 10.2

Posted by GitBox <gi...@apache.org>.
leezu commented on issue #17262: Unable to build / link mxnet against cuda 10.2
URL: https://github.com/apache/incubator-mxnet/issues/17262#issuecomment-579009753
 
 
   @cyrusbehr you can remove a few arches from https://github.com/apache/incubator-mxnet/blob/master/Makefile#L429
   
   For example, code compiled for `30` also runs on `35`. So you may change `30 35 50 52 60 61 70 75` to `30 50 60 61 70 75` or similar? (Though this may have a slight performance impact on the users of 52, 61 arch, or whatever you remove.)


[GitHub] [incubator-mxnet] leezu edited a comment on issue #17262: Unable to build / link mxnet against cuda 10.2

Posted by GitBox <gi...@apache.org>.
leezu edited a comment on issue #17262: Unable to build / link mxnet against cuda 10.2
URL: https://github.com/apache/incubator-mxnet/issues/17262#issuecomment-579009753
 
 
   @cyrusbehr for a static build for distribution, you need to use the static build scripts.
   You can use the code in https://github.com/apache/incubator-mxnet/pull/17448 to do the staticbuild with cmake.
   
   If you don't want to rely on a development version of MXNet, you can use the Makefile-based staticbuild: https://github.com/apache/incubator-mxnet/tree/master/tools/staticbuild
   
   
   To solve your issue, you can remove some cuda arches that are covered by "minor version forward compatibility guarantee".
   For example, code compiled for `30` also runs on `35`. So you may change `30 35 50 52 60 61 70 75` to `30 50 60 61 70 75` or similar. (Though this may have a slight performance impact on the users of 35 arch, or whatever you remove.)
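
   A minimal sketch of trimming such a list in shell, assuming the architectures are kept as a space-separated string. `drop_arches` is a hypothetical helper, not part of the MXNet build; where the resulting list gets applied depends on the build files.

   ```shell
   # drop_arches "<full list>" "<arches to drop>"
   # Prints the full list with the listed architectures removed.
   drop_arches() {
     kept=""
     for arch in $1; do
       case " $2 " in
         *" $arch "*) ;;                     # listed for removal: skip it
         *) kept="${kept:+$kept }$arch" ;;   # otherwise keep it, space-separated
       esac
     done
     echo "$kept"
   }

   # Drop 35 and 52, as in the example above.
   drop_arches "30 35 50 52 60 61 70 75" "35 52"   # prints: 30 50 60 61 70 75
   ```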


[GitHub] [incubator-mxnet] litaotju commented on issue #17262: Unable to build / link mxnet against cuda 10.2

Posted by GitBox <gi...@apache.org>.
litaotju commented on issue #17262: Unable to build / link mxnet against cuda 10.2
URL: https://github.com/apache/incubator-mxnet/issues/17262#issuecomment-573272438
 
 
   It seems the data section is too large. Could you try adding `-mcmodel=large` to the link command to see if it works?
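
   For context, a small sketch of the limit behind "relocation truncated to fit": `R_X86_64_PC32` stores a signed 32-bit displacement, so the target has to lie within 2 GiB of the reference. The helper name `fits_pc32` and the byte counts below are hypothetical, chosen only to illustrate the arithmetic; they are not measurements of the MXNet binary.

   ```shell
   # Largest forward displacement a signed 32-bit field can hold: 2^31 - 1 bytes.
   PC32_LIMIT=2147483647

   # fits_pc32 <distance in bytes>: succeeds if the distance is representable.
   fits_pc32() {
     [ "$1" -le "$PC32_LIMIT" ]
   }

   # Hypothetical distances, for illustration only.
   for bytes in 1500000000 3000000000; do
     if fits_pc32 "$bytes"; then
       echo "$bytes: fits"
     else
       echo "$bytes: overflows, expect 'relocation truncated to fit'"
     fi
   done
   ```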


[GitHub] [incubator-mxnet] cyrusbehr commented on issue #17262: Unable to build / link mxnet against cuda 10.2

Posted by GitBox <gi...@apache.org>.
cyrusbehr commented on issue #17262: Unable to build / link mxnet against cuda 10.2
URL: https://github.com/apache/incubator-mxnet/issues/17262#issuecomment-578983406
 
 
   @leezu You are correct in saying that my machine does not have a GPU. I am trying to build with support for all CUDA architectures, as I would like to distribute mxnet as part of an SDK and require compatibility with different architectures.
   
   And sounds good, I will follow the issue you linked.  


[GitHub] [incubator-mxnet] cyrusbehr commented on issue #17262: Unable to build / link mxnet against cuda 10.2

Posted by GitBox <gi...@apache.org>.
cyrusbehr commented on issue #17262: Unable to build / link mxnet against cuda 10.2
URL: https://github.com/apache/incubator-mxnet/issues/17262#issuecomment-578951508
 
 
   @litaotju I tried what you suggested, and added the following to the top of the cmake file:
   ```
   set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -mcmodel=large")
   set(CMAKE_C_FLAGS "${CMAKE_C_FLAGS} -mcmodel=large")
   set(CMAKE_LINKER_FLAGS "${CMAKE_LINKER_FLAGS} -mcmodel=large")
   ```
   
   However, I still get the error:
   ```
   [ 95%] Linking CXX shared library libmxnet.so
   [13:01:14]  [Step 2/2] /usr/lib/gcc/x86_64-linux-gnu/7/../../../x86_64-linux-gnu/crti.o: In function `_init':
   [13:01:14]  [Step 2/2] (.init+0x7): relocation truncated to fit: R_X86_64_REX_GOTPCRELX against undefined symbol `__gmon_start__'
   [13:01:14]  [Step 2/2] libmxnet.a(utils.cc.o): In function `void mxnet::op::CastStorageComputeImpl<mshadow::cpu>(mxnet::OpContext const&, mxnet::NDArray const&, mxnet::NDArray const&)':
   [13:01:14]  [Step 2/2] utils.cc:(.text._ZN5mxnet2op22CastStorageComputeImplIN7mshadow3cpuEEEvRKNS_9OpContextERKNS_7NDArrayES9_[_ZN5mxnet2op22CastStorageComputeImplIN7mshadow3cpuEEEvRKNS_9OpContextERKNS_7NDArrayES9_]+0x2a40): relocation truncated to fit: R_X86_64_TLSLD against symbol `guard variable for mxnet::TmpMemMgr::Get()::mgr' defined in .tbss._ZGVZN5mxnet9TmpMemMgr3GetEvE3mgr[_ZGVZN5mxnet9TmpMemMgr3GetEvE3mgr] section in libmxnet.a(utils.cc.o)
   [13:01:14]  [Step 2/2] utils.cc:(.text._ZN5mxnet2op22CastStorageComputeImplIN7mshadow3cpuEEEvRKNS_9OpContextERKNS_7NDArrayES9_[_ZN5mxnet2op22CastStorageComputeImplIN7mshadow3cpuEEEvRKNS_9OpContextERKNS_7NDArrayES9_]+0x2a8d): relocation truncated to fit: R_X86_64_TLSLD against symbol `guard variable for mxnet::TmpMemMgr::Get()::mgr' defined in .tbss._ZGVZN5mxnet9TmpMemMgr3GetEvE3mgr[_ZGVZN5mxnet9TmpMemMgr3GetEvE3mgr] section in libmxnet.a(utils.cc.o)
   [13:01:14]  [Step 2/2] libmxnet.a(utils.cc.o):(.eh_frame+0x20): relocation truncated to fit: R_X86_64_PC32 against `.text._ZNK4dnnl5error4whatEv'
   [13:01:14]  [Step 2/2] libmxnet.a(utils.cc.o):(.eh_frame+0x80): relocation truncated to fit: R_X86_64_PC32 against `.text'
   [13:01:14]  [Step 2/2] libmxnet.a(utils.cc.o):(.eh_frame+0xcc): relocation truncated to fit: R_X86_64_PC32 against `.text._ZN5mxnet2op8mxnet_op6KernelINS_6common16csr_indptr_checkEN7mshadow3cpuEE6LaunchIJPfPlllEEEbPNS5_6StreamIS6_EEmDpT_._omp_fn.1'
   [13:01:14]  [Step 2/2] libmxnet.a(utils.cc.o):(.eh_frame+0xfc): relocation truncated to fit: R_X86_64_PC32 against `.text._ZN5mxnet2op8mxnet_op6KernelINS_6common13csr_idx_checkEN7mshadow3cpuEE6LaunchIJPfPlSA_lEEEbPNS5_6StreamIS6_EEmDpT_._omp_fn.2'
   [13:01:14]  [Step 2/2] libmxnet.a(utils.cc.o):(.eh_frame+0x12c): relocation truncated to fit: R_X86_64_PC32 against `.text'
   [13:01:14]  [Step 2/2] libmxnet.a(utils.cc.o):(.eh_frame+0x178): relocation truncated to fit: R_X86_64_PC32 against `.text._ZN5mxnet2op8mxnet_op6KernelINS_6common16csr_indptr_checkEN7mshadow3cpuEE6LaunchIJPdPlllEEEbPNS5_6StreamIS6_EEmDpT_._omp_fn.4'
   [13:01:14]  [Step 2/2] libmxnet.a(utils.cc.o):(.eh_frame+0x1a8): relocation truncated to fit: R_X86_64_PC32 against `.text._ZN5mxnet2op8mxnet_op6KernelINS_6common13csr_idx_checkEN7mshadow3cpuEE6LaunchIJPdPlSA_lEEEbPNS5_6StreamIS6_EEmDpT_._omp_fn.5'
   [13:01:14]  [Step 2/2] libmxnet.a(utils.cc.o):(.eh_frame+0x1f8): additional relocation overflows omitted from the output
   [13:01:14]  [Step 2/2] libmxnet.so: PC-relative offset overflow in PLT entry for `_ZN5mxnet2op8mxnet_op6KernelINS0_9pick_gradILi3ELb0EEEN7mshadow3gpuEE6LaunchIJPdS9_PfiiNS5_5ShapeILi3EEESC_EEEvPNS5_6StreamIS6_EEiDpT_'
   [13:01:14]  [Step 2/2] collect2: error: ld returned 1 exit status
   [13:01:14]  [Step 2/2] CMakeFiles/mxnet.dir/build.make:126: recipe for target 'libmxnet.so' failed
   [13:01:14]  [Step 2/2] make[2]: *** [libmxnet.so] Error 1
   [13:01:14]  [Step 2/2] CMakeFiles/Makefile2:144: recipe for target 'CMakeFiles/mxnet.dir/all' failed
   [13:01:14]  [Step 2/2] make[1]: *** [CMakeFiles/mxnet.dir/all] Error 2
   [13:01:14]  [Step 2/2] make[1]: *** Waiting for unfinished jobs....
   [13:01:18]  [Step 2/2] [ 95%] Linking CUDA device code CMakeFiles/mxnet_unit_tests.dir/cmake_device_link.o
   [13:01:38]  [Step 2/2] [ 95%] Linking CXX executable mxnet_unit_tests
   ```
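
   One possible reason the flag did not reach the link step: the link-flag variables CMake documents are `CMAKE_EXE_LINKER_FLAGS` and `CMAKE_SHARED_LINKER_FLAGS`, and `CMAKE_LINKER_FLAGS` does not appear to be among them. The sketch below passes the flag through those variables on the command line instead. `model_flag` and `cmake_cmd` are illustrative names, the other options from the original report are omitted for brevity, and this is not a confirmed fix: precompiled objects such as `crti.o` are not rebuilt with the new code model.

   ```shell
   model_flag="-mcmodel=large"

   # Pass the flag to the compile steps and to both kinds of link step.
   cmake_cmd="cmake -DCMAKE_C_FLAGS=${model_flag} -DCMAKE_CXX_FLAGS=${model_flag} \
   -DCMAKE_EXE_LINKER_FLAGS=${model_flag} -DCMAKE_SHARED_LINKER_FLAGS=${model_flag} .."

   # Print the command for review rather than running it directly.
   echo "$cmake_cmd"
   ```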


[GitHub] [incubator-mxnet] leezu closed issue #17262: Unable to build / link mxnet against cuda 10.2

Posted by GitBox <gi...@apache.org>.
leezu closed issue #17262: Unable to build / link mxnet against cuda 10.2
URL: https://github.com/apache/incubator-mxnet/issues/17262
 
 
   
