You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@mxnet.apache.org by GitBox <gi...@apache.org> on 2020/04/30 09:45:12 UTC

[GitHub] [incubator-mxnet] mseth10 opened a new pull request #18205: Fix nightly cd

mseth10 opened a new pull request #18205:
URL: https://github.com/apache/incubator-mxnet/pull/18205


   ## Description ##
   This PR fixes nightly CD GPU tests by updating the build toolchain to use cmake static build and updating dnnl headers stash location. It removes 7.5 arch for cu100, cu101, cu102 builds to solve oversized libmxnet.so binary issue with cmake builds.
   It also fixes the issue with dnnl headers packaging into nightly build artifacts. Fixes #18120 
   
   ## Checklist ##
   ### Essentials ###
   Please feel free to remove inapplicable items for your PR.
   - [ ] The PR title starts with [MXNET-$JIRA_ID], where $JIRA_ID refers to the relevant [JIRA issue](https://issues.apache.org/jira/projects/MXNET/issues) created (except PRs with tiny changes)
   - [ ] Changes are complete (i.e. I finished coding on this PR)
   - [ ] All changes have test coverage:
   - Unit tests are added for small changes to verify correctness (e.g. adding a new operator)
   - Nightly tests are added for complicated/long-running ones (e.g. changing distributed kvstore)
   - Build tests will be added for build configuration changes (e.g. adding a new build option with NCCL)
   - [ ] Code is well-documented: 
   - For user-facing API changes, API doc string has been updated. 
   - For new C++ functions in header files, their functionalities and arguments are documented. 
   - For new examples, README.md is added to explain the what the example does, the source of the dataset, expected performance on test set and reference to the original paper if applicable
   - Check the API doc at https://mxnet-ci-doc.s3-accelerate.dualstack.amazonaws.com/PR-$PR_ID/$BUILD_ID/index.html
   - [ ] To the best of my knowledge, examples are either not affected by this change, or have been fixed to be compatible with this change
   
   ### Changes ###
   - [ ] Feature1, tests, (and when applicable, API doc)
   - [ ] Feature2, tests, (and when applicable, API doc)
   
   ## Comments ##
   - If this change is a backward incompatible change, why must this change be made.
   - Interesting edge cases to note here
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-mxnet] ChaiBapchya commented on a change in pull request #18205: Fix nightly CD for GPU builds

Posted by GitBox <gi...@apache.org>.
ChaiBapchya commented on a change in pull request #18205:
URL: https://github.com/apache/incubator-mxnet/pull/18205#discussion_r418167961



##########
File path: tools/pip/setup.py
##########
@@ -150,9 +150,8 @@ def skip_markdown_comments(md):
 package_data = {'mxnet': [os.path.join('mxnet', os.path.basename(LIB_PATH[0]))],
                 'dmlc_tracker': []}
 if variant.endswith('MKL'):
-    if platform.system() == 'Darwin':
-        shutil.copytree(os.path.join(CURRENT_DIR, 'mxnet-build/3rdparty/mkldnn/build/install/include'),
-                        os.path.join(CURRENT_DIR, 'mxnet/include/mkldnn'))
+    shutil.copytree(os.path.join(CURRENT_DIR, 'mxnet-build/3rdparty/mkldnn/build/install/include'),

Review comment:
       Curious how
   1. previously when nightly CD was working, why was mkldnn include was done only for Darwin
   2. now, to fix nightly CD, this needs to be done for all OS




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-mxnet] leezu commented on pull request #18205: Fix nightly CD for GPU builds

Posted by GitBox <gi...@apache.org>.
leezu commented on pull request #18205:
URL: https://github.com/apache/incubator-mxnet/pull/18205#issuecomment-622164919


   Retriggered one previously broken pipeline as http://jenkins.mxnet-ci.amazon-ml.com/blue/organizations/jenkins/restricted-mxnet-cd%2Fmxnet-cd-release-job/detail/mxnet-cd-release-job/1048/pipeline


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-mxnet] mxnet-bot commented on pull request #18205: Fix nightly cd

Posted by GitBox <gi...@apache.org>.
mxnet-bot commented on pull request #18205:
URL: https://github.com/apache/incubator-mxnet/pull/18205#issuecomment-621728802


   Hey @mseth10 , Thanks for submitting the PR 
   All tests are already queued to run once. If tests fail, you can trigger one or more tests again with the following commands: 
   - To trigger all jobs: @mxnet-bot run ci [all] 
   - To trigger specific jobs: @mxnet-bot run ci [job1, job2] 
   *** 
   **CI supported jobs**: [edge, miscellaneous, unix-cpu, centos-gpu, website, sanity, windows-gpu, unix-gpu, clang, centos-cpu, windows-cpu]
   *** 
   _Note_: 
    Only following 3 categories can trigger CI :PR Author, MXNet Committer, Jenkins Admin. 
   All CI tests must pass before the PR can be merged. 
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-mxnet] leezu commented on a change in pull request #18205: Fix nightly CD for GPU builds

Posted by GitBox <gi...@apache.org>.
leezu commented on a change in pull request #18205:
URL: https://github.com/apache/incubator-mxnet/pull/18205#discussion_r418193832



##########
File path: config/distribution/linux_cu100.cmake
##########
@@ -33,4 +33,4 @@ set(USE_F16C OFF CACHE BOOL "Build with x86 F16C instruction support")
 set(USE_LIBJPEG_TURBO ON CACHE BOOL "Build with libjpeg-turbo")
 
 set(CUDACXX "/usr/local/cuda-10.0/bin/nvcc" CACHE STRING "Cuda compiler")
-set(MXNET_CUDA_ARCH "3.0;5.0;6.0;7.0;7.5" CACHE STRING "Cuda architectures")
+set(MXNET_CUDA_ARCH "3.0;5.0;6.0;7.0" CACHE STRING "Cuda architectures")

Review comment:
       Sidenote: 7.0 binaries also run on 7.5




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-mxnet] leezu commented on a change in pull request #18205: Fix nightly CD for GPU builds

Posted by GitBox <gi...@apache.org>.
leezu commented on a change in pull request #18205:
URL: https://github.com/apache/incubator-mxnet/pull/18205#discussion_r418349899



##########
File path: config/distribution/linux_cu101.cmake
##########
@@ -35,4 +35,4 @@ set(USE_F16C OFF CACHE BOOL "Build with x86 F16C instruction support")
 set(USE_LIBJPEG_TURBO ON CACHE BOOL "Build with libjpeg-turbo")
 
 set(CUDACXX "/usr/local/cuda-10.1/bin/nvcc" CACHE STRING "Cuda compiler")
-set(MXNET_CUDA_ARCH "3.0;5.0;6.0;7.0;7.5" CACHE STRING "Cuda architectures")
+set(MXNET_CUDA_ARCH "3.0;5.0;6.0;7.0" CACHE STRING "Cuda architectures")

Review comment:
       Size limitation




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-mxnet] eric-haibin-lin commented on a change in pull request #18205: Fix nightly CD for GPU builds

Posted by GitBox <gi...@apache.org>.
eric-haibin-lin commented on a change in pull request #18205:
URL: https://github.com/apache/incubator-mxnet/pull/18205#discussion_r418341987



##########
File path: config/distribution/linux_cu101.cmake
##########
@@ -35,4 +35,4 @@ set(USE_F16C OFF CACHE BOOL "Build with x86 F16C instruction support")
 set(USE_LIBJPEG_TURBO ON CACHE BOOL "Build with libjpeg-turbo")
 
 set(CUDACXX "/usr/local/cuda-10.1/bin/nvcc" CACHE STRING "Cuda compiler")
-set(MXNET_CUDA_ARCH "3.0;5.0;6.0;7.0;7.5" CACHE STRING "Cuda architectures")
+set(MXNET_CUDA_ARCH "3.0;5.0;6.0;7.0" CACHE STRING "Cuda architectures")

Review comment:
       why remove 7.5?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-mxnet] ChaiBapchya commented on a change in pull request #18205: Fix nightly CD for GPU builds

Posted by GitBox <gi...@apache.org>.
ChaiBapchya commented on a change in pull request #18205:
URL: https://github.com/apache/incubator-mxnet/pull/18205#discussion_r418167961



##########
File path: tools/pip/setup.py
##########
@@ -150,9 +150,8 @@ def skip_markdown_comments(md):
 package_data = {'mxnet': [os.path.join('mxnet', os.path.basename(LIB_PATH[0]))],
                 'dmlc_tracker': []}
 if variant.endswith('MKL'):
-    if platform.system() == 'Darwin':
-        shutil.copytree(os.path.join(CURRENT_DIR, 'mxnet-build/3rdparty/mkldnn/build/install/include'),
-                        os.path.join(CURRENT_DIR, 'mxnet/include/mkldnn'))
+    shutil.copytree(os.path.join(CURRENT_DIR, 'mxnet-build/3rdparty/mkldnn/build/install/include'),

Review comment:
       Curious how
   1. previously when nightly CD was working, why was mkldnn include done only for Darwin
   2. now, to fix nightly CD, this needs to be done for all OS




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-mxnet] leezu commented on a change in pull request #18205: Fix nightly CD for GPU builds

Posted by GitBox <gi...@apache.org>.
leezu commented on a change in pull request #18205:
URL: https://github.com/apache/incubator-mxnet/pull/18205#discussion_r418381003



##########
File path: config/distribution/linux_cu101.cmake
##########
@@ -35,4 +35,4 @@ set(USE_F16C OFF CACHE BOOL "Build with x86 F16C instruction support")
 set(USE_LIBJPEG_TURBO ON CACHE BOOL "Build with libjpeg-turbo")
 
 set(CUDACXX "/usr/local/cuda-10.1/bin/nvcc" CACHE STRING "Cuda compiler")
-set(MXNET_CUDA_ARCH "3.0;5.0;6.0;7.0;7.5" CACHE STRING "Cuda architectures")
+set(MXNET_CUDA_ARCH "3.0;5.0;6.0;7.0" CACHE STRING "Cuda architectures")

Review comment:
       No. But compiling for 7.5 can result in faster runtime. So we should definitely fix the size limitation issues and re-enable the 7.5 target once possible.
   See  https://docs.nvidia.com/cuda/turing-compatibility-guide/index.html#turing-volta-compatibility




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-mxnet] leezu commented on a change in pull request #18205: Fix nightly CD for GPU builds

Posted by GitBox <gi...@apache.org>.
leezu commented on a change in pull request #18205:
URL: https://github.com/apache/incubator-mxnet/pull/18205#discussion_r418367413



##########
File path: config/distribution/linux_cu101.cmake
##########
@@ -35,4 +35,4 @@ set(USE_F16C OFF CACHE BOOL "Build with x86 F16C instruction support")
 set(USE_LIBJPEG_TURBO ON CACHE BOOL "Build with libjpeg-turbo")
 
 set(CUDACXX "/usr/local/cuda-10.1/bin/nvcc" CACHE STRING "Cuda compiler")
-set(MXNET_CUDA_ARCH "3.0;5.0;6.0;7.0;7.5" CACHE STRING "Cuda architectures")
+set(MXNET_CUDA_ARCH "3.0;5.0;6.0;7.0" CACHE STRING "Cuda architectures")

Review comment:
       Code compiled for 7.0 runs fine on 7.5




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-mxnet] marcoabreu commented on a change in pull request #18205: Fix nightly CD for GPU builds

Posted by GitBox <gi...@apache.org>.
marcoabreu commented on a change in pull request #18205:
URL: https://github.com/apache/incubator-mxnet/pull/18205#discussion_r418372835



##########
File path: config/distribution/linux_cu101.cmake
##########
@@ -35,4 +35,4 @@ set(USE_F16C OFF CACHE BOOL "Build with x86 F16C instruction support")
 set(USE_LIBJPEG_TURBO ON CACHE BOOL "Build with libjpeg-turbo")
 
 set(CUDACXX "/usr/local/cuda-10.1/bin/nvcc" CACHE STRING "Cuda compiler")
-set(MXNET_CUDA_ARCH "3.0;5.0;6.0;7.0;7.5" CACHE STRING "Cuda architectures")
+set(MXNET_CUDA_ARCH "3.0;5.0;6.0;7.0" CACHE STRING "Cuda architectures")

Review comment:
       Right, but it causes an increased initial startup time while the kernels are generated on the fly, doesn't it?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-mxnet] mseth10 commented on a change in pull request #18205: Fix nightly CD for GPU builds

Posted by GitBox <gi...@apache.org>.
mseth10 commented on a change in pull request #18205:
URL: https://github.com/apache/incubator-mxnet/pull/18205#discussion_r418169616



##########
File path: config/distribution/linux_cu100.cmake
##########
@@ -33,4 +33,4 @@ set(USE_F16C OFF CACHE BOOL "Build with x86 F16C instruction support")
 set(USE_LIBJPEG_TURBO ON CACHE BOOL "Build with libjpeg-turbo")
 
 set(CUDACXX "/usr/local/cuda-10.0/bin/nvcc" CACHE STRING "Cuda compiler")
-set(MXNET_CUDA_ARCH "3.0;5.0;6.0;7.0;7.5" CACHE STRING "Cuda architectures")
+set(MXNET_CUDA_ARCH "3.0;5.0;6.0;7.0" CACHE STRING "Cuda architectures")

Review comment:
       It's a temporary fix to get the CD working, I checked the builds for cu100 and cu102, both fail because of binary size issues. We should work on adding back 7.5 arch after making sure the build works.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-mxnet] ChaiBapchya commented on a change in pull request #18205: Fix nightly CD for GPU builds

Posted by GitBox <gi...@apache.org>.
ChaiBapchya commented on a change in pull request #18205:
URL: https://github.com/apache/incubator-mxnet/pull/18205#discussion_r418165835



##########
File path: config/distribution/linux_cu100.cmake
##########
@@ -33,4 +33,4 @@ set(USE_F16C OFF CACHE BOOL "Build with x86 F16C instruction support")
 set(USE_LIBJPEG_TURBO ON CACHE BOOL "Build with libjpeg-turbo")
 
 set(CUDACXX "/usr/local/cuda-10.0/bin/nvcc" CACHE STRING "Cuda compiler")
-set(MXNET_CUDA_ARCH "3.0;5.0;6.0;7.0;7.5" CACHE STRING "Cuda architectures")
+set(MXNET_CUDA_ARCH "3.0;5.0;6.0;7.0" CACHE STRING "Cuda architectures")

Review comment:
       dropping support for 7.5 cuda arch is it favorable?
   For eg for Tesla T4 [G4 instances] cuda arch supported is 7.5
   @leezu What do you think?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-mxnet] mseth10 commented on a change in pull request #18205: Fix nightly CD for GPU builds

Posted by GitBox <gi...@apache.org>.
mseth10 commented on a change in pull request #18205:
URL: https://github.com/apache/incubator-mxnet/pull/18205#discussion_r418170848



##########
File path: tools/pip/setup.py
##########
@@ -150,9 +150,8 @@ def skip_markdown_comments(md):
 package_data = {'mxnet': [os.path.join('mxnet', os.path.basename(LIB_PATH[0]))],
                 'dmlc_tracker': []}
 if variant.endswith('MKL'):
-    if platform.system() == 'Darwin':
-        shutil.copytree(os.path.join(CURRENT_DIR, 'mxnet-build/3rdparty/mkldnn/build/install/include'),
-                        os.path.join(CURRENT_DIR, 'mxnet/include/mkldnn'))
+    shutil.copytree(os.path.join(CURRENT_DIR, 'mxnet-build/3rdparty/mkldnn/build/install/include'),

Review comment:
       This platform condition was added by mistake in some earlier commit. We are fixing that here. No reason to package dnnl header files for only Darwin.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-mxnet] ChaiBapchya commented on pull request #18205: Fix nightly CD for GPU builds

Posted by GitBox <gi...@apache.org>.
ChaiBapchya commented on pull request #18205:
URL: https://github.com/apache/incubator-mxnet/pull/18205#issuecomment-621992954


   Also @mseth10 do you mind updating the description [removing extraneous content] and also adding the Jenkins Pipeline build where you tested it out or a command you used to reproduce & test this fix. Thanks


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-mxnet] mseth10 commented on pull request #18205: Fix nightly CD for GPU builds

Posted by GitBox <gi...@apache.org>.
mseth10 commented on pull request #18205:
URL: https://github.com/apache/incubator-mxnet/pull/18205#issuecomment-621999732


   @ChaiBapchya updated the description. Check it out.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-mxnet] marcoabreu commented on a change in pull request #18205: Fix nightly CD for GPU builds

Posted by GitBox <gi...@apache.org>.
marcoabreu commented on a change in pull request #18205:
URL: https://github.com/apache/incubator-mxnet/pull/18205#discussion_r418365078



##########
File path: config/distribution/linux_cu101.cmake
##########
@@ -35,4 +35,4 @@ set(USE_F16C OFF CACHE BOOL "Build with x86 F16C instruction support")
 set(USE_LIBJPEG_TURBO ON CACHE BOOL "Build with libjpeg-turbo")
 
 set(CUDACXX "/usr/local/cuda-10.1/bin/nvcc" CACHE STRING "Cuda compiler")
-set(MXNET_CUDA_ARCH "3.0;5.0;6.0;7.0;7.5" CACHE STRING "Cuda architectures")
+set(MXNET_CUDA_ARCH "3.0;5.0;6.0;7.0" CACHE STRING "Cuda architectures")

Review comment:
       Well why not remove another one then which has a smaller use base? This is about precompiling kernels for convenience, so I'd rather skip a less frequently used one




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org