Posted to issues@mxnet.apache.org by GitBox <gi...@apache.org> on 2021/02/24 02:59:40 UTC

[GitHub] [incubator-mxnet] Zha0q1 opened a new issue #19948: [v1.x] CD cu102 110 test stage [Check failed: device_count_ > 0 (-1 vs. 0) : GPU usage requires at least 1 GPU]

Zha0q1 opened a new issue #19948:
URL: https://github.com/apache/incubator-mxnet/issues/19948


   This issue started happening after we switched restricted-mxnetlinux-gpu to the new AMI.
   
   https://jenkins.mxnet-ci.amazon-ml.com/blue/organizations/jenkins/restricted-mxnet-cd%2Fmxnet-cd-release-job-1.x/detail/mxnet-cd-release-job-1.x/1553/pipeline
   
   I was able to reproduce it by building the same image as in the CD pipeline, on the same g3 instance with the same AMI:
   ```
   docker build -f docker/Dockerfile.build.ubuntu_gpu_cu102 \
       --build-arg USER_ID=1001 --build-arg GROUP_ID=1001 \
       --cache-from 021742426385.dkr.ecr.us-west-2.amazonaws.com/mxnet-ci:build.ubuntu_gpu_cu102-81dcd5660530 \
       -t 021742426385.dkr.ecr.us-west-2.amazonaws.com/mxnet-ci:build.ubuntu_gpu_cu102-81dcd5660530 \
       docker
   ```
   
   After entering the Docker container, I ran:
   ```
   pip3 install mxnet-cu102
   ```
   I was then able to reproduce the exact error by running:
   ```python
   >>> import mxnet as mx
   >>> ctx = mx.gpu(0)
   >>> a = mx.nd.ones((100), ctx=ctx)
   ```
   
   ```
   [02:55:03] src/base.cc:49: GPU context requested, but no GPUs found.
   Traceback (most recent call last):
     File "<stdin>", line 1, in <module>
     File "/usr/local/lib/python3.7/dist-packages/mxnet/ndarray/ndarray.py", line 3295, in ones
       return _internal._ones(shape=shape, ctx=ctx, dtype=dtype, **kwargs)
     File "<string>", line 39, in _ones
     File "/usr/local/lib/python3.7/dist-packages/mxnet/_ctypes/ndarray.py", line 91, in _imperative_invoke
       ctypes.byref(out_stypes)))
     File "/usr/local/lib/python3.7/dist-packages/mxnet/base.py", line 246, in check_call
       raise get_last_ffi_error()
   mxnet.base.MXNetError: Traceback (most recent call last):
     File "src/engine/threaded_engine.cc", line 331
   MXNetError: Check failed: device_count_ > 0 (-1 vs. 0) : GPU usage requires at least 1 GPU
   
   ```
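
The traceback only tells us that MXNet saw no GPUs. A useful first check is whether the container can see the NVIDIA devices at all. Below is a minimal sketch that uses only the Python standard library.

It assumes the GPUs are exposed through the NVIDIA Container Toolkit, which normally creates `/dev/nvidiaN` device nodes and mounts `nvidia-smi` into the container. The function names are hypothetical and are not part of MXNet or the CD scripts.

```python
import glob
import shutil


def collect_gpu_facts():
    """Gather what this container can see of the NVIDIA driver stack."""
    return {
        # Per-GPU device nodes such as /dev/nvidia0 (nvidiactl, nvidia-uvm are excluded).
        "device_nodes": sorted(glob.glob("/dev/nvidia[0-9]*")),
        # Absolute path of nvidia-smi, or None if it is not on PATH.
        "nvidia_smi": shutil.which("nvidia-smi"),
    }


def gpu_visibility_problems(facts):
    """Return human-readable reasons why GPUs look unavailable (empty list if none)."""
    problems = []
    if not facts["device_nodes"]:
        problems.append("no /dev/nvidiaN device nodes are visible")
    if not facts["nvidia_smi"]:
        problems.append("nvidia-smi is not on PATH")
    return problems


if __name__ == "__main__":
    problems = gpu_visibility_problems(collect_gpu_facts())
    print("\n".join(problems) if problems else "NVIDIA device nodes and nvidia-smi are visible")
```

If both checks fail inside the container, the problem most likely lies in how the container was started or in the host driver setup, not in MXNet itself.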


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@mxnet.apache.org
For additional commands, e-mail: issues-help@mxnet.apache.org


[GitHub] [incubator-mxnet] Zha0q1 commented on issue #19948: [v1.x] CD cu102 110 test stage [Check failed: device_count_ > 0 (-1 vs. 0) : GPU usage requires at least 1 GPU]

Posted by GitBox <gi...@apache.org>.
Zha0q1 commented on issue #19948:
URL: https://github.com/apache/incubator-mxnet/issues/19948#issuecomment-785654604


   On the 1.x CD we use `<cuda version>-devel-ubuntu16.04` as our base images, and I entered the containers with `--gpus all`.
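
For anyone reproducing this, here is a hedged sketch of a container invocation that exposes all GPUs. It assumes Docker 19.03 or newer with the NVIDIA Container Toolkit on the host, because `--gpus` depends on both.

The helper name and the short image tag are placeholders, not what the CD scripts actually use. Running `nvidia-smi` as the container command is a cheap smoke test: if it cannot list the GPU, MXNet is unlikely to find one either.

```python
import shlex


def docker_run_command(image, gpus="all", command=("nvidia-smi",)):
    """Build a `docker run` argv that exposes GPUs with --gpus.

    --gpus needs Docker 19.03+ and the NVIDIA Container Toolkit on the host.
    The default command runs nvidia-smi as a smoke test inside the container.
    """
    return ["docker", "run", "--rm", "--gpus", gpus, image, *command]


if __name__ == "__main__":
    # Hypothetical tag: substitute the image built by the CD pipeline.
    print(shlex.join(docker_run_command("mxnet-ci:build.ubuntu_gpu_cu102")))
```

With the placeholder tag this prints `docker run --rm --gpus all mxnet-ci:build.ubuntu_gpu_cu102 nvidia-smi`. Building an argv list rather than a single string keeps the arguments safe to pass to `subprocess.run` unchanged.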




[GitHub] [incubator-mxnet] Zha0q1 closed issue #19948: [v1.x] CD cu102 110 test stage [Check failed: device_count_ > 0 (-1 vs. 0) : GPU usage requires at least 1 GPU]

Posted by GitBox <gi...@apache.org>.
Zha0q1 closed issue #19948:
URL: https://github.com/apache/incubator-mxnet/issues/19948


   






[GitHub] [incubator-mxnet] Zha0q1 commented on issue #19948: [v1.x] CD cu102 110 test stage [Check failed: device_count_ > 0 (-1 vs. 0) : GPU usage requires at least 1 GPU]

Posted by GitBox <gi...@apache.org>.
Zha0q1 commented on issue #19948:
URL: https://github.com/apache/incubator-mxnet/issues/19948#issuecomment-785653862


   I suspect this may be an NVIDIA driver issue. I tried:
   
   1. switching the instance type from g3 (M60 GPU) to g4 (T4 GPU)
   2. building the same image on my p3 (V100 GPU) instance
   3. uninstalling the 460 driver and reinstalling either the 450 or the 460 driver
   
   None of these attempts worked.
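
Both the 450 and 460 driver branches are newer than the minimum Linux driver that NVIDIA's CUDA release notes list for CUDA 10.2.89 (440.33). A host driver that is simply too old is therefore unlikely to be the whole story.

Below is a minimal sketch to confirm the host driver version. It assumes `nvidia-smi` is on the host's PATH, and the helper names are hypothetical.

```python
import subprocess

# Minimum Linux driver for CUDA 10.2.89, per NVIDIA's CUDA toolkit release notes.
CUDA_10_2_MIN_DRIVER = (440, 33)


def parse_driver_version(text):
    """Turn nvidia-smi output like '460.32.03' into a comparable tuple (460, 32, 3)."""
    return tuple(int(part) for part in text.strip().split("."))


def query_driver_version():
    """Ask nvidia-smi for the driver version reported for the first GPU."""
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=driver_version", "--format=csv,noheader"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_driver_version(out.splitlines()[0])


def supports_cuda_10_2(version):
    """Tuple comparison: (450, 80, 2) >= (440, 33) is True."""
    return version >= CUDA_10_2_MIN_DRIVER


if __name__ == "__main__":
    version = query_driver_version()
    print(version, "OK for CUDA 10.2" if supports_cuda_10_2(version) else "too old for CUDA 10.2")
```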
   
   I also tried PyTorch built for CUDA 10.2 inside the same Docker container, and it could not find the GPU either.
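
Since PyTorch failed the same way, one way to take both frameworks out of the picture is to query the CUDA driver API directly. This is a minimal sketch using `ctypes`, assuming the driver library is visible inside the container under the usual Linux soname `libcuda.so.1`.

`cuInit` and `cuDeviceGetCount` are CUDA driver API entry points, while the helper name is hypothetical. If the library cannot be loaded at all, or `cuInit` returns a non-zero `CUresult`, the fault would lie in the driver or container setup rather than in MXNet.

```python
import ctypes

CUDA_SUCCESS = 0  # CUresult value for success in the CUDA driver API


def cuda_driver_device_count(lib_name="libcuda.so.1"):
    """Count GPUs through the CUDA driver API, bypassing MXNet and PyTorch.

    Returns (status, count), where status is a short human-readable string.
    """
    try:
        libcuda = ctypes.CDLL(lib_name)
    except OSError:
        return f"{lib_name} could not be loaded", 0

    result = libcuda.cuInit(0)  # the flags argument must be 0
    if result != CUDA_SUCCESS:
        return f"cuInit failed with CUresult {result}", 0

    count = ctypes.c_int(0)
    result = libcuda.cuDeviceGetCount(ctypes.byref(count))
    if result != CUDA_SUCCESS:
        return f"cuDeviceGetCount failed with CUresult {result}", 0
    return "ok", count.value


if __name__ == "__main__":
    status, count = cuda_driver_device_count()
    print(f"{status}: {count} device(s)")
```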




[GitHub] [incubator-mxnet] Zha0q1 commented on issue #19948: [v1.x] CD cu102 110 test stage [Check failed: device_count_ > 0 (-1 vs. 0) : GPU usage requires at least 1 GPU]

Posted by GitBox <gi...@apache.org>.
Zha0q1 commented on issue #19948:
URL: https://github.com/apache/incubator-mxnet/issues/19948#issuecomment-790841137


   fixed




[GitHub] [incubator-mxnet] Zha0q1 commented on issue #19948: [v1.x] CD cu102 110 test stage [Check failed: device_count_ > 0 (-1 vs. 0) : GPU usage requires at least 1 GPU]

Posted by GitBox <gi...@apache.org>.
Zha0q1 commented on issue #19948:
URL: https://github.com/apache/incubator-mxnet/issues/19948#issuecomment-784713183


   CC @josephevans @leezu @ptrendx 

