Posted to commits@mxnet.apache.org by GitBox <gi...@apache.org> on 2020/03/30 02:26:32 UTC

[GitHub] [incubator-mxnet] ChaiBapchya opened a new issue #17935: Windows CI CUDA Intermittent error C2993

ChaiBapchya opened a new issue #17935: Windows CI CUDA Intermittent error C2993
URL: https://github.com/apache/incubator-mxnet/issues/17935
 
 
   ## Description
   Intermittent failure seen during the windows-gpu compilation phase (WIN_GPU/WIN_GPU_MKLDNN).
   
   Discovered in this PR: https://github.com/apache/incubator-mxnet/pull/17808
   
   Related to https://github.com/pytorch/pytorch/issues/25393
   
   ### Error Message
   
   The build intermittently fails with the error:
   
   ```
   C:\PROGRA~1\NVIDIA~2\CUDA\v10.2\bin/../include\thrust/detail/allocator/allocator_traits.h(42): error C2993: 'T': illegal type for non-type template parameter '__formal'
   ```
   
   Errors:
   ```
   [2020-03-29T04:47:50.014Z] C:\PROGRA~1\NVIDIA~2\CUDA\v10.2\bin/../include\thrust/detail/allocator/allocator_traits.h(42): error C2993: 'T': illegal type for non-type template parameter '__formal'
   [2020-03-29T04:47:50.014Z] C:\PROGRA~1\NVIDIA~2\CUDA\v10.2\bin/../include\thrust/detail/allocator/allocator_traits.h(42): note: see reference to class template instantiation 'thrust::detail::allocator_traits_detail::has_value_type<T>' being compiled
   [2020-03-29T04:47:50.014Z] C:\PROGRA~1\NVIDIA~2\CUDA\v10.2\bin/../include\thrust/detail/allocator/allocator_traits.h(42): error C2065: 'U1': undeclared identifier
   [2020-03-29T04:47:50.014Z] C:\PROGRA~1\NVIDIA~2\CUDA\v10.2\bin/../include\thrust/detail/allocator/allocator_traits.h(42): error C2923: 'std::_Select<__formal>::_Apply': 'U1' is not a valid template type argument for parameter '<unnamed-symbol>'
   [2020-03-29T04:47:50.014Z] C:\PROGRA~1\NVIDIA~2\CUDA\v10.2\bin/../include\thrust/detail/allocator/allocator_traits.h(42): error C4430: missing type specifier - int assumed. Note: C++ does not support default-int
   [2020-03-29T04:47:50.014Z] C:\PROGRA~1\NVIDIA~2\CUDA\v10.2\bin/../include\thrust/detail/allocator/allocator_traits.h(42): error C2144: syntax error: 'unknown-type' should be preceded by ')'
   [2020-03-29T04:47:50.014Z] C:\PROGRA~1\NVIDIA~2\CUDA\v10.2\bin/../include\thrust/detail/allocator/allocator_traits.h(42): error C2144: syntax error: 'unknown-type' should be preceded by ';'
   [2020-03-29T04:47:50.014Z] C:\PROGRA~1\NVIDIA~2\CUDA\v10.2\bin/../include\thrust/detail/allocator/allocator_traits.h(42): error C2238: unexpected token(s) preceding ';'
   [2020-03-29T04:47:50.014Z] C:\PROGRA~1\NVIDIA~2\CUDA\v10.2\bin/../include\thrust/detail/allocator/allocator_traits.h(42): error C2059: syntax error: ')'
   [2020-03-29T04:47:50.014Z] C:\PROGRA~1\NVIDIA~2\CUDA\v10.2\bin/../include\thrust/detail/allocator/allocator_traits.h(42): error C2988: unrecognizable template declaration/definition
   [2020-03-29T04:47:50.014Z] C:\PROGRA~1\NVIDIA~2\CUDA\v10.2\bin/../include\thrust/detail/allocator/allocator_traits.h(42): error C2059: syntax error: '<end Parse>'
   ```
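
All of the errors above are reported at the same location, `allocator_traits.h(42)`, so the diagnostics after C2993 are plausibly follow-on errors from the first one. A minimal sketch, in Python, of how a CI log could be scanned for this signature; the function names are hypothetical and not part of the MXNet CI scripts:

```python
import re
from collections import Counter

# Matches MSVC diagnostics of the form "file.h(42): error C2993: message".
MSVC_ERROR = re.compile(r"\((?P<line>\d+)\): error (?P<code>C\d{4}): (?P<msg>.*)$")


def msvc_error_codes(log_text):
    """Count the MSVC error codes found in a build log ("note:" lines are skipped)."""
    codes = Counter()
    for line in log_text.splitlines():
        match = MSVC_ERROR.search(line)
        if match:
            codes[match.group("code")] += 1
    return codes


def has_flaky_signature(log_text):
    """Return True if the log reports error C2993 from thrust's allocator_traits.h."""
    return any(
        "allocator_traits.h" in line and "error C2993" in line
        for line in log_text.splitlines()
    )
```

Checking for the signature before retrying would keep genuine compile errors from being retried as well.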
   
   Full build log:
   http://jenkins.mxnet-ci.amazon-ml.com/blue/rest/organizations/jenkins/pipelines/mxnet-validation/pipelines/windows-gpu/branches/PR-17808/runs/15/nodes/39/log/?start=0
   
   ## To Reproduce
   On a Windows AMI, clone the repo and run the build:
   `py -3 ci/build_windows.py -f WIN_GPU`
   
   
   
   ## What have you tried to solve it?
   
   1. Used CUDA 10.2 instead of 9.2
   2. Updated VS2019
   3. Added the MSVC compiler flag `/Zc:__cplusplus` via CMake
   
   The only workaround found so far:
   retrying the build, with max retries = 5
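
The retry workaround can be sketched roughly as follows. This is only an illustration of the idea, not the change that was made to the CI scripts; `run_with_retries` and its parameters are hypothetical names.

```python
import subprocess
import time


def run_with_retries(cmd, max_retries=5, delay_seconds=0.0, runner=subprocess.run):
    """Run cmd until it exits with status 0, trying at most max_retries times.

    Returns the number of attempts used. `runner` is injectable so the helper
    can be exercised without launching a real build.
    """
    for attempt in range(1, max_retries + 1):
        result = runner(cmd)
        if result.returncode == 0:
            return attempt
        if attempt < max_retries:
            # Optional pause before the next attempt.
            time.sleep(delay_seconds)
    raise RuntimeError(f"command failed after {max_retries} attempts: {cmd}")
```

A call such as `run_with_retries(["py", "-3", "ci/build_windows.py", "-f", "WIN_GPU"])` would then rerun the build command from the reproduction step up to five times. Note that a blind retry only masks the failure; it does not address the cause.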
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [incubator-mxnet] ChaiBapchya commented on issue #17935: Windows CI CUDA Intermittent error C2993

Posted by GitBox <gi...@apache.org>.
ChaiBapchya commented on issue #17935: Windows CI CUDA Intermittent error C2993
URL: https://github.com/apache/incubator-mxnet/issues/17935#issuecomment-605750318
 
 
   @mxnet-label-bot add [ci, windows]


[GitHub] [incubator-mxnet] ChaiBapchya removed a comment on issue #17935: Windows CI CUDA Intermittent error C2993

Posted by GitBox <gi...@apache.org>.
ChaiBapchya removed a comment on issue #17935: Windows CI CUDA Intermittent error C2993
URL: https://github.com/apache/incubator-mxnet/issues/17935#issuecomment-605870001
 
 
   ```
   OSError: [WinError 126] The specified module could not be found
   ```
   
   According to Stack Overflow, this error means that libmxnet.dll is not on the DLL search path.
   The fix would be to add its directory to the environment:
   https://stackoverflow.com/questions/43987081/openslide-python-import-error
   
   What's surprising is that libmxnet.dll and mxnet_70.dll are packed into `windows_package.7z` and unpacked correctly during the test phase.
   The WIN_CPU tests don't hit this error, but the WIN_GPU tests fail.
   


[GitHub] [incubator-mxnet] leezu commented on issue #17935: Windows CI CUDA Intermittent error C2993

Posted by GitBox <gi...@apache.org>.
leezu commented on issue #17935: Windows CI CUDA Intermittent error C2993
URL: https://github.com/apache/incubator-mxnet/issues/17935#issuecomment-608873393
 
 
   Created an upstream issue: https://github.com/thrust/thrust/issues/1090


[GitHub] [incubator-mxnet] ChaiBapchya commented on issue #17935: Windows CI CUDA Intermittent error C2993

Posted by GitBox <gi...@apache.org>.
ChaiBapchya commented on issue #17935: Windows CI CUDA Intermittent error C2993
URL: https://github.com/apache/incubator-mxnet/issues/17935#issuecomment-605870001
 
 
   ```
   OSError: [WinError 126] The specified module could not be found
   ```
   
   According to Stack Overflow, this error means that libmxnet.dll is not on the DLL search path.
   The fix would be to add its directory to the environment:
   https://stackoverflow.com/questions/43987081/openslide-python-import-error
   
   What's surprising is that libmxnet.dll and mxnet_70.dll are packed into `windows_package.7z` and unpacked correctly during the test phase.
   The WIN_CPU tests don't hit this error, but the WIN_GPU tests fail.
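
A rough sketch of what "adding it to the environment" could look like from Python, assuming the directory that holds libmxnet.dll is known. `add_dll_search_dir` is a hypothetical helper, not an MXNet API. On Windows, error 126 can also be raised when a DLL that the requested module itself depends on cannot be found, so this may not be sufficient if a dependency of libmxnet.dll is the one that is missing.

```python
import os


def add_dll_search_dir(dll_dir, environ=os.environ):
    """Make dll_dir visible to the Windows DLL loader before a native library is loaded."""
    dll_dir = os.path.abspath(dll_dir)

    # Prepend to PATH, which older Python versions consult when resolving DLLs.
    parts = [p for p in environ.get("PATH", "").split(os.pathsep) if p]
    if dll_dir not in parts:
        environ["PATH"] = os.pathsep.join([dll_dir] + parts)

    # Python 3.8+ on Windows no longer uses PATH to resolve dependencies of
    # extension modules and ctypes-loaded DLLs; os.add_dll_directory is the
    # supported way to extend the search path. It exists only on Windows.
    if hasattr(os, "add_dll_directory") and os.path.isdir(dll_dir):
        os.add_dll_directory(dll_dir)

    return dll_dir
```

The helper would need to be called before `import mxnet`, with the directory the package was unpacked into.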
   


[GitHub] [incubator-mxnet] leezu commented on issue #17935: Windows CI CUDA Intermittent error C2993

Posted by GitBox <gi...@apache.org>.
leezu commented on issue #17935:
URL: https://github.com/apache/incubator-mxnet/issues/17935#issuecomment-622244193


   @vexilligera did you test whether the error also occurs with more recent versions of thrust? I suggest we try installing thrust 1.9.8 on Windows CI, which is the version that will ship with CUDA 11.
   
   We do that on Ubuntu CI already
   
   https://github.com/apache/incubator-mxnet/blob/76fa58373636c57fee1e4e6cd7960723b39f455f/ci/docker/Dockerfile.build.ubuntu#L144-L150





[GitHub] [incubator-mxnet] leezu commented on issue #17935: Windows CI CUDA Intermittent error C2993

Posted by GitBox <gi...@apache.org>.
leezu commented on issue #17935:
URL: https://github.com/apache/incubator-mxnet/issues/17935#issuecomment-622491496


   There is another suggested fix at https://github.com/pytorch/pytorch/issues/25393#issuecomment-619547577
   
   cc @vexilligera 





[GitHub] [incubator-mxnet] leezu commented on issue #17935: Windows CI CUDA Intermittent error C2993

Posted by GitBox <gi...@apache.org>.
leezu commented on issue #17935:
URL: https://github.com/apache/incubator-mxnet/issues/17935#issuecomment-626100625


   This seems to be an nvcc bug: https://github.com/thrust/thrust/issues/1090#issuecomment-626080333

