Posted to commits@mxnet.apache.org by GitBox <gi...@apache.org> on 2020/04/02 19:39:29 UTC

[GitHub] [incubator-mxnet] leezu opened a new pull request #17962: Fix Windows GPU CI

leezu opened a new pull request #17962: Fix Windows GPU CI
URL: https://github.com/apache/incubator-mxnet/pull/17962
 
 
   ## Description ##
   Minimal version of https://github.com/apache/incubator-mxnet/pull/17808 
   
   CC: @marcoabreu 

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [incubator-mxnet] leezu commented on a change in pull request #17962: Fix Windows GPU CI

Posted by GitBox <gi...@apache.org>.
leezu commented on a change in pull request #17962: Fix Windows GPU CI
URL: https://github.com/apache/incubator-mxnet/pull/17962#discussion_r402568577
 
 

 ##########
 File path: ci/build_windows.py
 ##########
 @@ -253,12 +269,8 @@ def main():
     system = platform.system()
     if system == 'Windows':
         logging.info("Detected Windows platform")
-        if 'OpenBLAS_HOME' not in os.environ:
-            os.environ["OpenBLAS_HOME"] = "C:\\Program Files\\OpenBLAS-v0.2.19"
-        if 'OpenCV_DIR' not in os.environ:
-            os.environ["OpenCV_DIR"] = "C:\\Program Files\\OpenCV-v3.4.1\\build"
         if 'CUDA_PATH' not in os.environ:
-            os.environ["CUDA_PATH"] = "C:\\Program Files\\NVIDIA GPU Computing Toolkit\\CUDA\\v9.2"
+            os.environ["CUDA_PATH"] = "C:\\Program Files\\NVIDIA GPU Computing Toolkit\\CUDA\\v10.2"
 
 Review comment:
   It's a requirement for running on g4.

----------------------------------------------------------------

[GitHub] [incubator-mxnet] leezu commented on issue #17962: Fix Windows GPU CI

Posted by GitBox <gi...@apache.org>.
leezu commented on issue #17962: Fix Windows GPU CI
URL: https://github.com/apache/incubator-mxnet/pull/17962#issuecomment-608562408
 
 
   @marcoabreu gpu build is still flaky due to thrust + VS2019 issues. Adding back the retries. http://jenkins.mxnet-ci.amazon-ml.com/blue/organizations/jenkins/mxnet-validation%2Fwindows-gpu/detail/PR-17962/17/pipeline
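
The retry idea mentioned above could look roughly like the following Python sketch. It is an illustration only: `run_with_retries`, its parameters, and the stand-in command are hypothetical and are not taken from ci/build_windows.py or the Jenkins pipeline.

```python
import subprocess
import sys
import time


def run_with_retries(cmd, attempts=3, delay_seconds=0.0):
    """Run cmd until it exits with status 0 or the attempts are used up.

    Hypothetical helper for illustration; not the code used by the MXNet CI.
    Returns the 1-based number of the attempt that succeeded.
    """
    for attempt in range(1, attempts + 1):
        result = subprocess.run(cmd)
        if result.returncode == 0:
            return attempt
        if attempt < attempts:
            # Give a flaky toolchain a moment before trying again.
            time.sleep(delay_seconds)
    raise RuntimeError(
        "command failed after {} attempts: {}".format(attempts, cmd))


if __name__ == "__main__":
    # A trivially succeeding command stands in for the real build step.
    run_with_retries([sys.executable, "-c", "pass"])
```

Retries of this kind only hide intermittent failures such as the Thrust + VS2019 one; they do not remove the underlying cause.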

----------------------------------------------------------------

[GitHub] [incubator-mxnet] leezu commented on a change in pull request #17962: Fix Windows GPU CI

Posted by GitBox <gi...@apache.org>.
leezu commented on a change in pull request #17962: Fix Windows GPU CI
URL: https://github.com/apache/incubator-mxnet/pull/17962#discussion_r403122044
 
 

 ##########
 File path: ci/build_windows.py
 ##########
 @@ -253,12 +269,8 @@ def main():
     system = platform.system()
     if system == 'Windows':
         logging.info("Detected Windows platform")
-        if 'OpenBLAS_HOME' not in os.environ:
-            os.environ["OpenBLAS_HOME"] = "C:\\Program Files\\OpenBLAS-v0.2.19"
-        if 'OpenCV_DIR' not in os.environ:
-            os.environ["OpenCV_DIR"] = "C:\\Program Files\\OpenCV-v3.4.1\\build"
         if 'CUDA_PATH' not in os.environ:
-            os.environ["CUDA_PATH"] = "C:\\Program Files\\NVIDIA GPU Computing Toolkit\\CUDA\\v9.2"
+            os.environ["CUDA_PATH"] = "C:\\Program Files\\NVIDIA GPU Computing Toolkit\\CUDA\\v10.2"
 
 Review comment:
   What do you mean? CUDA 9.2 does not support VS 2019.

----------------------------------------------------------------

[GitHub] [incubator-mxnet] leezu commented on a change in pull request #17962: Fix Windows GPU CI

Posted by GitBox <gi...@apache.org>.
leezu commented on a change in pull request #17962: Fix Windows GPU CI
URL: https://github.com/apache/incubator-mxnet/pull/17962#discussion_r403122765
 
 

 ##########
 File path: ci/build_windows.py
 ##########
 @@ -253,12 +269,8 @@ def main():
     system = platform.system()
     if system == 'Windows':
         logging.info("Detected Windows platform")
-        if 'OpenBLAS_HOME' not in os.environ:
-            os.environ["OpenBLAS_HOME"] = "C:\\Program Files\\OpenBLAS-v0.2.19"
-        if 'OpenCV_DIR' not in os.environ:
-            os.environ["OpenCV_DIR"] = "C:\\Program Files\\OpenCV-v3.4.1\\build"
         if 'CUDA_PATH' not in os.environ:
-            os.environ["CUDA_PATH"] = "C:\\Program Files\\NVIDIA GPU Computing Toolkit\\CUDA\\v9.2"
+            os.environ["CUDA_PATH"] = "C:\\Program Files\\NVIDIA GPU Computing Toolkit\\CUDA\\v10.2"
 
 Review comment:
   You can read for example https://superuser.com/questions/1506044/installing-cuda-9-2-with-vs-2019

----------------------------------------------------------------

[GitHub] [incubator-mxnet] leezu commented on issue #17962: Fix Windows GPU CI

Posted by GitBox <gi...@apache.org>.
leezu commented on issue #17962: Fix Windows GPU CI
URL: https://github.com/apache/incubator-mxnet/pull/17962#issuecomment-608548689
 
 
   > So are going to x version of VS as a default?
   
   Will do

----------------------------------------------------------------

[GitHub] [incubator-mxnet] mxnet-bot commented on issue #17962: Fix Windows GPU CI

Posted by GitBox <gi...@apache.org>.
mxnet-bot commented on issue #17962: Fix Windows GPU CI
URL: https://github.com/apache/incubator-mxnet/pull/17962#issuecomment-608064371
 
 
   Hey @leezu, thanks for submitting the PR.
   All tests are already queued to run once. If tests fail, you can trigger one or more tests again with the following commands:
   - To trigger all jobs: @mxnet-bot run ci [all]
   - To trigger specific jobs: @mxnet-bot run ci [job1, job2]
   ***
   **CI supported jobs**: [windows-gpu, unix-cpu, centos-cpu, sanity, unix-gpu, website, clang, windows-cpu, edge, miscellaneous, centos-gpu]
   ***
   _Note_:
   Only the following 3 categories can trigger CI: PR Author, MXNet Committer, Jenkins Admin.
   All CI tests must pass before the PR can be merged.
   

----------------------------------------------------------------

[GitHub] [incubator-mxnet] marcoabreu commented on a change in pull request #17962: Fix Windows GPU CI

Posted by GitBox <gi...@apache.org>.
marcoabreu commented on a change in pull request #17962: Fix Windows GPU CI
URL: https://github.com/apache/incubator-mxnet/pull/17962#discussion_r402565999
 
 

 ##########
 File path: ci/build_windows.py
 ##########
 @@ -106,29 +129,37 @@ class BuildFlavour(Enum):
         '-DCMAKE_BUILD_TYPE=Release')
 
     , 'WIN_GPU': (
+        '-DCMAKE_C_COMPILER=cl '
+        '-DCMAKE_CXX_COMPILER=cl '
         '-DUSE_CUDA=ON '
         '-DUSE_CUDNN=ON '
         '-DENABLE_CUDA_RTC=ON '
         '-DUSE_OPENCV=ON  '
+        '-DOpenCV_RUNTIME=vc15 '
+        '-DOpenCV_ARCH=x64 '
         '-DUSE_OPENMP=ON '
         '-DUSE_BLAS=open '
         '-DUSE_LAPACK=ON '
         '-DUSE_DIST_KVSTORE=OFF '
-        '-DMXNET_CUDA_ARCH="5.2" '
+        '-DMXNET_CUDA_ARCH="7.5" '
         '-DCMAKE_CXX_FLAGS="/FS /MD /O2 /Ob2" '
         '-DUSE_MKL_IF_AVAILABLE=OFF '
         '-DCMAKE_BUILD_TYPE=Release')
 
     , 'WIN_GPU_MKLDNN': (
+        '-DCMAKE_C_COMPILER=cl '
+        '-DCMAKE_CXX_COMPILER=cl '
         '-DUSE_CUDA=ON '
         '-DUSE_CUDNN=ON '
         '-DENABLE_CUDA_RTC=ON '
         '-DUSE_OPENCV=ON '
+        '-DOpenCV_RUNTIME=vc15 '
+        '-DOpenCV_ARCH=x64 '
         '-DUSE_OPENMP=ON '
         '-DUSE_BLAS=open '
         '-DUSE_LAPACK=ON '
         '-DUSE_DIST_KVSTORE=OFF '
-        '-DMXNET_CUDA_ARCH="5.2" '
+        '-DMXNET_CUDA_ARCH="7.5" '
 
 Review comment:
   Please revert the CUDA change. That does not represent a fix.

----------------------------------------------------------------

[GitHub] [incubator-mxnet] leezu commented on a change in pull request #17962: Fix Windows GPU CI

Posted by GitBox <gi...@apache.org>.
leezu commented on a change in pull request #17962: Fix Windows GPU CI
URL: https://github.com/apache/incubator-mxnet/pull/17962#discussion_r403134909
 
 

 ##########
 File path: ci/build_windows.py
 ##########
 @@ -253,12 +269,8 @@ def main():
     system = platform.system()
     if system == 'Windows':
         logging.info("Detected Windows platform")
-        if 'OpenBLAS_HOME' not in os.environ:
-            os.environ["OpenBLAS_HOME"] = "C:\\Program Files\\OpenBLAS-v0.2.19"
-        if 'OpenCV_DIR' not in os.environ:
-            os.environ["OpenCV_DIR"] = "C:\\Program Files\\OpenCV-v3.4.1\\build"
         if 'CUDA_PATH' not in os.environ:
-            os.environ["CUDA_PATH"] = "C:\\Program Files\\NVIDIA GPU Computing Toolkit\\CUDA\\v9.2"
+            os.environ["CUDA_PATH"] = "C:\\Program Files\\NVIDIA GPU Computing Toolkit\\CUDA\\v10.2"
 
 Review comment:
   > But is there still some kind of compatibility mode which checks that the is still compliant with older cuda standards?
   
   No. The Unix and CentOS jobs test CUDA 10.1. Windows now tests CUDA 10.2.
   But the risk is low that we'd break CUDA 9 support within the next few days, so it's not a one-way door decision. I suggest we discuss on dev whether we want to support CUDA 9. If we decide to support it, let's switch the CentOS tests to use CUDA 9.

----------------------------------------------------------------

[GitHub] [incubator-mxnet] marcoabreu commented on a change in pull request #17962: Fix Windows GPU CI

Posted by GitBox <gi...@apache.org>.
marcoabreu commented on a change in pull request #17962: Fix Windows GPU CI
URL: https://github.com/apache/incubator-mxnet/pull/17962#discussion_r402566100
 
 

 ##########
 File path: ci/build_windows.py
 ##########
 @@ -253,12 +269,8 @@ def main():
     system = platform.system()
     if system == 'Windows':
         logging.info("Detected Windows platform")
-        if 'OpenBLAS_HOME' not in os.environ:
-            os.environ["OpenBLAS_HOME"] = "C:\\Program Files\\OpenBLAS-v0.2.19"
-        if 'OpenCV_DIR' not in os.environ:
-            os.environ["OpenCV_DIR"] = "C:\\Program Files\\OpenCV-v3.4.1\\build"
         if 'CUDA_PATH' not in os.environ:
-            os.environ["CUDA_PATH"] = "C:\\Program Files\\NVIDIA GPU Computing Toolkit\\CUDA\\v9.2"
+            os.environ["CUDA_PATH"] = "C:\\Program Files\\NVIDIA GPU Computing Toolkit\\CUDA\\v10.2"
 
 Review comment:
   Please revert the CUDA change. That does not represent a fix.

----------------------------------------------------------------

[GitHub] [incubator-mxnet] leezu edited a comment on issue #17962: Fix Windows GPU CI

Posted by GitBox <gi...@apache.org>.
leezu edited a comment on issue #17962: Fix Windows GPU CI
URL: https://github.com/apache/incubator-mxnet/pull/17962#issuecomment-608548689
 
 
   > Can you please summarize the changes and the reasonings in the PR description?
   
   Will do

----------------------------------------------------------------

[GitHub] [incubator-mxnet] marcoabreu commented on a change in pull request #17962: Fix Windows GPU CI

Posted by GitBox <gi...@apache.org>.
marcoabreu commented on a change in pull request #17962: Fix Windows GPU CI
URL: https://github.com/apache/incubator-mxnet/pull/17962#discussion_r403126540
 
 

 ##########
 File path: ci/build_windows.py
 ##########
 @@ -253,12 +269,8 @@ def main():
     system = platform.system()
     if system == 'Windows':
         logging.info("Detected Windows platform")
-        if 'OpenBLAS_HOME' not in os.environ:
-            os.environ["OpenBLAS_HOME"] = "C:\\Program Files\\OpenBLAS-v0.2.19"
-        if 'OpenCV_DIR' not in os.environ:
-            os.environ["OpenCV_DIR"] = "C:\\Program Files\\OpenCV-v3.4.1\\build"
         if 'CUDA_PATH' not in os.environ:
-            os.environ["CUDA_PATH"] = "C:\\Program Files\\NVIDIA GPU Computing Toolkit\\CUDA\\v9.2"
+            os.environ["CUDA_PATH"] = "C:\\Program Files\\NVIDIA GPU Computing Toolkit\\CUDA\\v10.2"
 
 Review comment:
   Oh, they intentionally broke that backwards compatibility feature. Usually it was possible to compile with older CUDA versions in later Visual Studio versions by installing toolkits which make sure that the integration is available. Seems like that caused issues, and thus Microsoft and Nvidia decided not to go that route any further.
   
   In that case, fine to proceed.
   
   But is there still some kind of compatibility mode which checks that the code is still compliant with older CUDA standards?

----------------------------------------------------------------

[GitHub] [incubator-mxnet] leezu commented on a change in pull request #17962: Fix Windows GPU CI

Posted by GitBox <gi...@apache.org>.
leezu commented on a change in pull request #17962: Fix Windows GPU CI
URL: https://github.com/apache/incubator-mxnet/pull/17962#discussion_r403122765
 
 

 ##########
 File path: ci/build_windows.py
 ##########
 @@ -253,12 +269,8 @@ def main():
     system = platform.system()
     if system == 'Windows':
         logging.info("Detected Windows platform")
-        if 'OpenBLAS_HOME' not in os.environ:
-            os.environ["OpenBLAS_HOME"] = "C:\\Program Files\\OpenBLAS-v0.2.19"
-        if 'OpenCV_DIR' not in os.environ:
-            os.environ["OpenCV_DIR"] = "C:\\Program Files\\OpenCV-v3.4.1\\build"
         if 'CUDA_PATH' not in os.environ:
-            os.environ["CUDA_PATH"] = "C:\\Program Files\\NVIDIA GPU Computing Toolkit\\CUDA\\v9.2"
+            os.environ["CUDA_PATH"] = "C:\\Program Files\\NVIDIA GPU Computing Toolkit\\CUDA\\v10.2"
 
 Review comment:
   You can read for example https://superuser.com/questions/1506044/installing-cuda-9-2-with-vs-2019 or https://devblogs.microsoft.com/cppblog/cuda-10-1-available-now-with-support-for-latest-microsoft-visual-studio-2019-versions/

----------------------------------------------------------------

[GitHub] [incubator-mxnet] marcoabreu commented on a change in pull request #17962: Fix Windows GPU CI

Posted by GitBox <gi...@apache.org>.
marcoabreu commented on a change in pull request #17962: Fix Windows GPU CI
URL: https://github.com/apache/incubator-mxnet/pull/17962#discussion_r403119434
 
 

 ##########
 File path: ci/build_windows.py
 ##########
 @@ -253,12 +269,8 @@ def main():
     system = platform.system()
     if system == 'Windows':
         logging.info("Detected Windows platform")
-        if 'OpenBLAS_HOME' not in os.environ:
-            os.environ["OpenBLAS_HOME"] = "C:\\Program Files\\OpenBLAS-v0.2.19"
-        if 'OpenCV_DIR' not in os.environ:
-            os.environ["OpenCV_DIR"] = "C:\\Program Files\\OpenCV-v3.4.1\\build"
         if 'CUDA_PATH' not in os.environ:
-            os.environ["CUDA_PATH"] = "C:\\Program Files\\NVIDIA GPU Computing Toolkit\\CUDA\\v9.2"
+            os.environ["CUDA_PATH"] = "C:\\Program Files\\NVIDIA GPU Computing Toolkit\\CUDA\\v10.2"
 
 Review comment:
   Usually Visual Studio does allow more than one CUDA version. You just have to install the respective toolkit.
   
   

----------------------------------------------------------------

[GitHub] [incubator-mxnet] leezu merged pull request #17962: Fix Windows GPU CI

Posted by GitBox <gi...@apache.org>.
leezu merged pull request #17962: Fix Windows GPU CI
URL: https://github.com/apache/incubator-mxnet/pull/17962
 
 
   

----------------------------------------------------------------

[GitHub] [incubator-mxnet] leezu commented on a change in pull request #17962: Fix Windows GPU CI

Posted by GitBox <gi...@apache.org>.
leezu commented on a change in pull request #17962: Fix Windows GPU CI
URL: https://github.com/apache/incubator-mxnet/pull/17962#discussion_r402574713
 
 

 ##########
 File path: ci/build_windows.py
 ##########
 @@ -106,29 +129,37 @@ class BuildFlavour(Enum):
         '-DCMAKE_BUILD_TYPE=Release')
 
     , 'WIN_GPU': (
+        '-DCMAKE_C_COMPILER=cl '
+        '-DCMAKE_CXX_COMPILER=cl '
         '-DUSE_CUDA=ON '
         '-DUSE_CUDNN=ON '
         '-DENABLE_CUDA_RTC=ON '
         '-DUSE_OPENCV=ON  '
+        '-DOpenCV_RUNTIME=vc15 '
+        '-DOpenCV_ARCH=x64 '
         '-DUSE_OPENMP=ON '
         '-DUSE_BLAS=open '
         '-DUSE_LAPACK=ON '
         '-DUSE_DIST_KVSTORE=OFF '
-        '-DMXNET_CUDA_ARCH="5.2" '
+        '-DMXNET_CUDA_ARCH="7.5" '
 
 Review comment:
   This represents a fix for the issues experienced on g3 instances, which are blocking the Windows CI at this point in time.

----------------------------------------------------------------

[GitHub] [incubator-mxnet] marcoabreu commented on a change in pull request #17962: Fix Windows GPU CI

Posted by GitBox <gi...@apache.org>.
marcoabreu commented on a change in pull request #17962: Fix Windows GPU CI
URL: https://github.com/apache/incubator-mxnet/pull/17962#discussion_r402571006
 
 

 ##########
 File path: tests/python/unittest/test_gluon_data.py
 ##########
 @@ -285,6 +285,11 @@ def test_multi_worker_dataloader_release_pool():
 
 
 def test_dataloader_context():
+    if os.name == 'nt':
+        print("Skipping test_dataloader_context on Windows due to "
+              "https://github.com/apache/incubator-mxnet/issues/17961")
 
 Review comment:
   G3 represents the minimum we are verifying. G4 is compatible with G3 but not the other way around.
   
   Do not introduce things if they do not work. I want to see some quality check before changes like this are made without checking with the community. 
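
The compatibility direction described above matches the general CUDA rule that code built for a lower compute capability can run on a newer GPU, but not the reverse. The sketch below is a simplified model of that rule, not MXNet or CI code. It assumes g3 instances carry a Tesla M60 (compute capability 5.2) and g4 instances a Tesla T4 (7.5), and that the build embeds PTX; `can_run` is a hypothetical helper.

```python
def can_run(built_for, device, has_ptx=True):
    """Return True if code built for `built_for` can run on `device`.

    Both arguments are (major, minor) compute capability tuples.
    Simplified model of the general CUDA rules, not MXNet-specific logic:
    - binary code runs on the same major version with an equal or higher minor;
    - embedded PTX can be JIT-compiled for any equal or higher capability.
    """
    same_major_ok = built_for[0] == device[0] and built_for[1] <= device[1]
    ptx_ok = has_ptx and built_for <= device
    return same_major_ok or ptx_ok


# Assumed hardware: g3 = Tesla M60 (5.2), g4 = Tesla T4 (7.5).
G3 = (5, 2)
G4 = (7, 5)

if __name__ == "__main__":
    print("built for G3, run on G4:", can_run(G3, G4))
    print("built for G4, run on G3:", can_run(G4, G3))
```

Under these assumptions, a build with `MXNET_CUDA_ARCH="5.2"` could still be exercised on g4 hardware, whereas a build with `"7.5"` could not run on g3.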

----------------------------------------------------------------

[GitHub] [incubator-mxnet] leezu commented on a change in pull request #17962: Fix Windows GPU CI

Posted by GitBox <gi...@apache.org>.
leezu commented on a change in pull request #17962: Fix Windows GPU CI
URL: https://github.com/apache/incubator-mxnet/pull/17962#discussion_r402574988
 
 

 ##########
 File path: ci/build_windows.py
 ##########
 @@ -106,29 +129,37 @@ class BuildFlavour(Enum):
         '-DCMAKE_BUILD_TYPE=Release')
 
     , 'WIN_GPU': (
+        '-DCMAKE_C_COMPILER=cl '
+        '-DCMAKE_CXX_COMPILER=cl '
         '-DUSE_CUDA=ON '
         '-DUSE_CUDNN=ON '
         '-DENABLE_CUDA_RTC=ON '
         '-DUSE_OPENCV=ON  '
+        '-DOpenCV_RUNTIME=vc15 '
+        '-DOpenCV_ARCH=x64 '
         '-DUSE_OPENMP=ON '
         '-DUSE_BLAS=open '
         '-DUSE_LAPACK=ON '
         '-DUSE_DIST_KVSTORE=OFF '
-        '-DMXNET_CUDA_ARCH="5.2" '
+        '-DMXNET_CUDA_ARCH="7.5" '
 
 Review comment:
   It's a requirement for running on g4, which is a required fix.

----------------------------------------------------------------

[GitHub] [incubator-mxnet] leezu commented on a change in pull request #17962: Fix Windows GPU CI

Posted by GitBox <gi...@apache.org>.
leezu commented on a change in pull request #17962: Fix Windows GPU CI
URL: https://github.com/apache/incubator-mxnet/pull/17962#discussion_r402568577
 
 

 ##########
 File path: ci/build_windows.py
 ##########
 @@ -253,12 +269,8 @@ def main():
     system = platform.system()
     if system == 'Windows':
         logging.info("Detected Windows platform")
-        if 'OpenBLAS_HOME' not in os.environ:
-            os.environ["OpenBLAS_HOME"] = "C:\\Program Files\\OpenBLAS-v0.2.19"
-        if 'OpenCV_DIR' not in os.environ:
-            os.environ["OpenCV_DIR"] = "C:\\Program Files\\OpenCV-v3.4.1\\build"
         if 'CUDA_PATH' not in os.environ:
-            os.environ["CUDA_PATH"] = "C:\\Program Files\\NVIDIA GPU Computing Toolkit\\CUDA\\v9.2"
+            os.environ["CUDA_PATH"] = "C:\\Program Files\\NVIDIA GPU Computing Toolkit\\CUDA\\v10.2"
 
 Review comment:
   It's a requirement for VS2019
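
For readers following the hunk above: the script only supplies a default `CUDA_PATH` when the environment does not already define one, so a CI host can still point at a different toolkit. A minimal sketch of that pattern follows; `apply_cuda_default` and `DEFAULT_CUDA_PATH` are hypothetical names, and `dict.setdefault` is swapped in for the explicit `not in os.environ` check used in the diff.

```python
import os

# Default taken from the "+" line of the hunk above.
DEFAULT_CUDA_PATH = "C:\\Program Files\\NVIDIA GPU Computing Toolkit\\CUDA\\v10.2"


def apply_cuda_default(env):
    """Set CUDA_PATH only if the caller has not provided one already.

    `env` is any mutable mapping, for example os.environ or a plain dict.
    Hypothetical helper for illustration; not part of build_windows.py.
    """
    env.setdefault("CUDA_PATH", DEFAULT_CUDA_PATH)
    return env


if __name__ == "__main__":
    # Work on a copy so the real process environment is left untouched.
    print(apply_cuda_default(dict(os.environ))["CUDA_PATH"])
```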

----------------------------------------------------------------

[GitHub] [incubator-mxnet] leezu edited a comment on issue #17962: Fix Windows GPU CI

Posted by GitBox <gi...@apache.org>.
leezu edited a comment on issue #17962: Fix Windows GPU CI
URL: https://github.com/apache/incubator-mxnet/pull/17962#issuecomment-608548689
 
 
   > Can you please summarize the changes and the reasonings in the PR description?
   
   Will do
   
   > Also, setting up Windows for yourself, you might want to see how CI does it and why.
   
   https://github.com/apache/incubator-mxnet/pull/17808 will provide an updated setup. This PR is only an emergency fix.

----------------------------------------------------------------

[GitHub] [incubator-mxnet] leezu commented on issue #17962: Fix Windows GPU CI

Posted by GitBox <gi...@apache.org>.
leezu commented on issue #17962: Fix Windows GPU CI
URL: https://github.com/apache/incubator-mxnet/pull/17962#issuecomment-608069619
 
 
   Could you elaborate on how the distinction between running tests on a g3 or g4 does not align with the project's interest?

----------------------------------------------------------------

[GitHub] [incubator-mxnet] marcoabreu commented on a change in pull request #17962: Fix Windows GPU CI

Posted by GitBox <gi...@apache.org>.
marcoabreu commented on a change in pull request #17962: Fix Windows GPU CI
URL: https://github.com/apache/incubator-mxnet/pull/17962#discussion_r402571006
 
 

 ##########
 File path: tests/python/unittest/test_gluon_data.py
 ##########
 @@ -285,6 +285,11 @@ def test_multi_worker_dataloader_release_pool():
 
 
 def test_dataloader_context():
+    if os.name == 'nt':
+        print("Skipping test_dataloader_context on Windows due to "
+              "https://github.com/apache/incubator-mxnet/issues/17961")
 
 Review comment:
   G3 represents the minimum we are verifying. G4 is compatible with G3 but not the other way around.
   
   Do not change things if they do not work. I want to see some quality check before changes like this are made without checking with the community. 

----------------------------------------------------------------

[GitHub] [incubator-mxnet] marcoabreu commented on issue #17962: Fix Windows GPU CI

Posted by GitBox <gi...@apache.org>.
marcoabreu commented on issue #17962: Fix Windows GPU CI
URL: https://github.com/apache/incubator-mxnet/pull/17962#issuecomment-608067529
 
 
   Well, the new AMI should not have been moved into production then. Sorry, but changing a hundred knobs to facilitate one change isn't right.
   
   These are some standards which I do not see aligned with the project's interest.

----------------------------------------------------------------

[GitHub] [incubator-mxnet] marcoabreu commented on a change in pull request #17962: Fix Windows GPU CI

Posted by GitBox <gi...@apache.org>.
marcoabreu commented on a change in pull request #17962: Fix Windows GPU CI
URL: https://github.com/apache/incubator-mxnet/pull/17962#discussion_r402565985
 
 

 ##########
 File path: ci/build_windows.py
 ##########
 @@ -106,29 +129,37 @@ class BuildFlavour(Enum):
         '-DCMAKE_BUILD_TYPE=Release')
 
     , 'WIN_GPU': (
+        '-DCMAKE_C_COMPILER=cl '
+        '-DCMAKE_CXX_COMPILER=cl '
         '-DUSE_CUDA=ON '
         '-DUSE_CUDNN=ON '
         '-DENABLE_CUDA_RTC=ON '
         '-DUSE_OPENCV=ON  '
+        '-DOpenCV_RUNTIME=vc15 '
+        '-DOpenCV_ARCH=x64 '
         '-DUSE_OPENMP=ON '
         '-DUSE_BLAS=open '
         '-DUSE_LAPACK=ON '
         '-DUSE_DIST_KVSTORE=OFF '
-        '-DMXNET_CUDA_ARCH="5.2" '
+        '-DMXNET_CUDA_ARCH="7.5" '
 
 Review comment:
   Please revert the CUDA change. That does not represent a fix.

----------------------------------------------------------------

[GitHub] [incubator-mxnet] leezu edited a comment on issue #17962: Fix Windows GPU CI

Posted by GitBox <gi...@apache.org>.
leezu edited a comment on issue #17962: Fix Windows GPU CI
URL: https://github.com/apache/incubator-mxnet/pull/17962#issuecomment-608128467
 
 
   Based on offline discussion with Marco, let's use a patched version of the old AMI first to fix the CI. @josephevans helped to install Visual Studio 2019 on the old AMI. I have further reduced the diff of this PR to include only the minimal changes to switch to Visual Studio 2019 and the x64 toolchain.
   
   If this fixes the issue, we'll update to the new AMI with updated CUDA and g4 instances at a later point after running it in the dev environment for a while.

----------------------------------------------------------------

[GitHub] [incubator-mxnet] marcoabreu commented on a change in pull request #17962: Fix Windows GPU CI

Posted by GitBox <gi...@apache.org>.
marcoabreu commented on a change in pull request #17962: Fix Windows GPU CI
URL: https://github.com/apache/incubator-mxnet/pull/17962#discussion_r402566380
 
 

 ##########
 File path: tests/python/unittest/test_gluon_data.py
 ##########
 @@ -285,6 +285,11 @@ def test_multi_worker_dataloader_release_pool():
 
 
 def test_dataloader_context():
+    if os.name == 'nt':
+        print("Skipping test_dataloader_context on Windows due to "
+              "https://github.com/apache/incubator-mxnet/issues/17961")
 
 Review comment:
   Please change back to G3.

----------------------------------------------------------------

[GitHub] [incubator-mxnet] leezu commented on issue #17962: Fix Windows GPU CI

Posted by GitBox <gi...@apache.org>.
leezu commented on issue #17962: Fix Windows GPU CI
URL: https://github.com/apache/incubator-mxnet/pull/17962#issuecomment-608128467
 
 
   Based on offline discussion with Marco, let's use a patched version of the old AMI first to fix the CI. @josephevans helped to install Visual Studio 2019 on the old AMI. I have further reduced the diff of this PR to include only the minimal changes to switch to Visual Studio 2019 and the x64 toolchain.
   
   We'll update to the new AMI with updated CUDA etc. and g4 instances at a later point, after running it in the dev environment for a while.

----------------------------------------------------------------

[GitHub] [incubator-mxnet] leezu commented on a change in pull request #17962: Fix Windows GPU CI

Posted by GitBox <gi...@apache.org>.
leezu commented on a change in pull request #17962: Fix Windows GPU CI
URL: https://github.com/apache/incubator-mxnet/pull/17962#discussion_r402572899
 
 

 ##########
 File path: tests/python/unittest/test_gluon_data.py
 ##########
 @@ -285,6 +285,11 @@ def test_multi_worker_dataloader_release_pool():
 
 
 def test_dataloader_context():
+    if os.name == 'nt':
+        print("Skipping test_dataloader_context on Windows due to "
+              "https://github.com/apache/incubator-mxnet/issues/17961")
 
 Review comment:
   We are still verifying g3 on unix gpu tests. There is no difference in the cuda kernels between windows and unix. It's sufficient to have one pipeline run on g3 to verify the compatibility.
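
The diff above skips `test_dataloader_context` on Windows by printing a message inside the test body. As a sketch of an alternative that makes the skip visible in the test report, the standard library's `unittest.skipIf` can express the same condition. The helper name `skip_on_windows` and the placeholder test body are hypothetical and not taken from the PR.

```python
import os
import unittest

ISSUE_URL = "https://github.com/apache/incubator-mxnet/issues/17961"


def skip_on_windows(reason, os_name=None):
    """Return a decorator that skips the decorated test on Windows.

    `os_name` exists only so the behaviour can be exercised on any
    platform; by default the real `os.name` is used.
    """
    name = os.name if os_name is None else os_name
    return unittest.skipIf(name == "nt", reason)


class DataLoaderContextTest(unittest.TestCase):
    @skip_on_windows("Skipped on Windows due to " + ISSUE_URL)
    def test_dataloader_context(self):
        # Placeholder body; the real test exercises the gluon DataLoader.
        self.assertTrue(True)


if __name__ == "__main__":
    unittest.main()
```

With a decorator the runner reports the test as skipped together with the reason, whereas a `print` followed by an early return is recorded as a pass.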

----------------------------------------------------------------

[GitHub] [incubator-mxnet] leezu commented on a change in pull request #17962: Fix Windows GPU CI

Posted by GitBox <gi...@apache.org>.
leezu commented on a change in pull request #17962: Fix Windows GPU CI
URL: https://github.com/apache/incubator-mxnet/pull/17962#discussion_r402568458
 
 

 ##########
 File path: tests/python/unittest/test_gluon_data.py
 ##########
 @@ -285,6 +285,11 @@ def test_multi_worker_dataloader_release_pool():
 
 
 def test_dataloader_context():
+    if os.name == 'nt':
+        print("Skipping test_dataloader_context on Windows due to "
+              "https://github.com/apache/incubator-mxnet/issues/17961")
 
 Review comment:
   Please provide a technical reason for using g3. Currently Jenkins faces connectivity issues on g3 instances.

----------------------------------------------------------------

[GitHub] [incubator-mxnet] leezu commented on issue #17962: Fix Windows GPU CI

Posted by GitBox <gi...@apache.org>.
leezu commented on issue #17962: Fix Windows GPU CI
URL: https://github.com/apache/incubator-mxnet/pull/17962#issuecomment-608066411
 
 
   Re https://github.com/apache/incubator-mxnet/pull/17808#issuecomment-608055780
   
   > Connectivity errors were never an issue with Windows slaves
   
   Connectivity issues became a problem after switching to Windows Server 2019. The switch was done as the old AMI apparently can't be rebuilt anymore and a new AMI had to be started. I was not involved in that effort, but I think it's reasonable to get the Windows AMI instructions working again and choose the latest Windows Server for doing that.
   
   Jenkins connectivity issues typically result from system or network load problems. Moving to a g4 instance to run the Windows GPU tests resolved the connectivity issue.

----------------------------------------------------------------

[GitHub] [incubator-mxnet] leezu edited a comment on issue #17962: Fix Windows GPU CI

Posted by GitBox <gi...@apache.org>.
leezu edited a comment on issue #17962: Fix Windows GPU CI
URL: https://github.com/apache/incubator-mxnet/pull/17962#issuecomment-608066411
 
 
   Re https://github.com/apache/incubator-mxnet/pull/17808#issuecomment-608055780
   
   > Connectivity errors were never an issue with Windows slaves
   
   Connectivity issues became a problem after switching to Windows Server 2019. The switch was done as the old AMI apparently can't be rebuilt anymore and a new AMI had to be started. I was not involved in that effort, but I think it's reasonable to get the Windows AMI instructions working again and choose the latest Windows Server for doing that.
   
   Jenkins connectivity issues typically result from system or network load problems. The newer version of Windows may have some issues causing network problems on a slower machine such as g3. If you have an alternative fix, please propose it.
   
   Moving to a g4 instance to run the Windows GPU tests resolved the connectivity issue.

----------------------------------------------------------------

[GitHub] [incubator-mxnet] marcoabreu edited a comment on issue #17962: Fix Windows GPU CI

Posted by GitBox <gi...@apache.org>.
marcoabreu edited a comment on issue #17962: Fix Windows GPU CI
URL: https://github.com/apache/incubator-mxnet/pull/17962#issuecomment-608067529
 
 
   Well, the new AMI should not have been moved into production then. Sorry, but changing a hundred knobs to facilitate one change isn't right. Either get a stable replacement and deploy that, or leave things as they are. Replacing an existing system with an inferior version does not make sense to me.
   
   These are standards that I do not see as aligned with the project's interest.

----------------------------------------------------------------

[GitHub] [incubator-mxnet] leezu edited a comment on issue #17962: Fix Windows GPU CI

Posted by GitBox <gi...@apache.org>.
leezu edited a comment on issue #17962: Fix Windows GPU CI
URL: https://github.com/apache/incubator-mxnet/pull/17962#issuecomment-608548689
 
 
   > Can you please summarize the changes and the reasonings in the PR description?
   
   Done
   
   > Also, setting up Windows for yourself, you might want to see how CI does it and why.
   
   https://github.com/apache/incubator-mxnet/pull/17808 will provide an updated setup. This PR is only an emergency fix.

----------------------------------------------------------------