Posted to issues@mxnet.apache.org by GitBox <gi...@apache.org> on 2022/04/21 07:13:19 UTC

[GitHub] [incubator-mxnet] bgawrych opened a new issue, #21006: [CI] broken GPU testing stage

bgawrych opened a new issue, #21006:
URL: https://github.com/apache/incubator-mxnet/issues/21006

   ## Description
   CI jobs running on GPU (centos-gpu, unix-gpu and website) fail with the following errors:
   `[2022-04-20T13:09:33.419Z] docker: Error response from daemon: failed to create shim: OCI runtime create failed: container_linux.go:380: starting container process caused: process_linux.go:545: container init caused: Running hook #0:: error running hook: exit status 1, stdout: , stderr: nvidia-container-cli: initialization error: nvml error: driver/library version mismatch: unknown.`
   ...
   `[2022-04-20T13:09:33.419Z] docker: Error response from daemon: Unknown runtime specified nvidia.`
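When triaging many failed runs, it can help to tell the two errors above apart automatically. The following is a minimal sketch, not part of the MXNet CI tooling: the function names and the labels `driver_library_mismatch` and `nvidia_runtime_missing` are hypothetical, and the matched substrings are taken only from the two log lines quoted above.

```python
import re

# Hypothetical labels; the substrings come from the two log lines quoted above.
PATTERNS = {
    "driver_library_mismatch": re.compile(r"nvml error: driver/library version mismatch"),
    "nvidia_runtime_missing": re.compile(r"Unknown runtime specified nvidia"),
}

# Jenkins prefixes each console line with a bracketed timestamp.
TIMESTAMP = re.compile(r"^\[(?P<ts>[^\]]+)\]\s*")


def classify_line(line):
    """Return (timestamp, label) for a known GPU failure line, else None."""
    match = TIMESTAMP.match(line)
    timestamp = match.group("ts") if match else None
    for label, pattern in PATTERNS.items():
        if pattern.search(line):
            return timestamp, label
    return None


def summarize(log_text):
    """Count how often each known failure appears in a console log."""
    counts = {label: 0 for label in PATTERNS}
    for line in log_text.splitlines():
        hit = classify_line(line)
        if hit is not None:
            counts[hit[1]] += 1
    return counts
```

Calling `summarize()` on the saved console text of a failed stage would show whether a run hit the NVML mismatch, the missing `nvidia` runtime, or both.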
   
   ## Occurrences
   [PR#1](https://jenkins.mxnet-ci.com/blue/organizations/jenkins/mxnet-validation%2Funix-gpu/detail/PR-21004/3/pipeline/246)
   [PR#2](https://jenkins.mxnet-ci.com/blue/organizations/jenkins/mxnet-validation%2Fcentos-gpu/detail/PR-20999/7/pipeline)
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@mxnet.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@mxnet.apache.org
For additional commands, e-mail: issues-help@mxnet.apache.org


[GitHub] [incubator-mxnet] josephevans commented on issue #21006: [CI] broken GPU testing stage

Posted by GitBox <gi...@apache.org>.
josephevans commented on issue #21006:
URL: https://github.com/apache/incubator-mxnet/issues/21006#issuecomment-1115450331

   Hi, I actually fixed the original issue by creating updated AMIs.
   
   I believe the new issue is that Nvidia has deployed new signing keys for the CUDA and ML repos, and the Docker images don't contain these keys yet. I'm expecting Nvidia to publish new Docker images soon with the updated keys, based on these threads:
   
   https://forums.developer.nvidia.com/t/notice-cuda-linux-repository-key-rotation/212771
   https://gitlab.com/nvidia/container-images/cuda/-/issues/158
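One way to confirm that a build is failing because of a missing repository key is to scan the package-manager output for key errors. The sketch below assumes the failure surfaces as apt's `NO_PUBKEY <key id>` warning inside a Debian/Ubuntu-based image, which this thread does not confirm. The function name `missing_keys` is hypothetical.

```python
import re

# apt reports an absent signing key as "NO_PUBKEY" followed by a 16-digit hex key ID.
NO_PUBKEY = re.compile(r"NO_PUBKEY\s+([0-9A-Fa-f]{16})")


def missing_keys(apt_output):
    """Return the distinct missing key IDs, in first-seen order."""
    seen = []
    for key_id in NO_PUBKEY.findall(apt_output):
        key_id = key_id.upper()
        if key_id not in seen:
            seen.append(key_id)
    return seen
```

An empty list from `missing_keys()` on the `apt-get update` output of a failing image build would suggest the failure has a different cause.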
   




[GitHub] [incubator-mxnet] DickJC123 commented on issue #21006: [CI] broken GPU testing stage

Posted by GitBox <gi...@apache.org>.
DickJC123 commented on issue #21006:
URL: https://github.com/apache/incubator-mxnet/issues/21006#issuecomment-1115428816

   Before I dive into this more, could you check if the suggestions here for rebooting are helpful in this case: https://stackoverflow.com/questions/65721900/failed-to-initialize-nvml-driver-library-version-mismatch-is-ubuntu-server
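Before rebooting a host, the mismatch can be checked by comparing the version of the loaded kernel module with the version of the installed user-space driver libraries. The sketch below is illustrative only. It assumes the text comes from `/proc/driver/nvidia/version` in the proprietary-module format ("... Kernel Module  <version> ..."), and that the caller already knows the library version (for example, from the package manager). Both function names are hypothetical.

```python
import re

# Matches the driver version in a line such as
# "NVRM version: NVIDIA UNIX x86_64 Kernel Module  470.57.02  Tue Jul 13 ..."
KERNEL_MODULE = re.compile(r"Kernel Module\s+(\d+(?:\.\d+)+)")


def loaded_driver_version(proc_text):
    """Return the loaded kernel module version, or None if it is not found."""
    match = KERNEL_MODULE.search(proc_text)
    return match.group(1) if match else None


def versions_match(proc_text, library_version):
    """True only when a loaded version was found and equals library_version."""
    loaded = loaded_driver_version(proc_text)
    return loaded is not None and loaded == library_version
```

On a host, `proc_text` could be read with `pathlib.Path("/proc/driver/nvidia/version").read_text()`. A `False` result is consistent with the driver having been upgraded on disk while the old kernel module is still loaded, which is the situation a reboot resolves.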




[GitHub] [incubator-mxnet] github-actions[bot] commented on issue #21006: [CI] broken GPU testing stage

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on issue #21006:
URL: https://github.com/apache/incubator-mxnet/issues/21006#issuecomment-1104803171

   Welcome to Apache MXNet (incubating)! We are on a mission to democratize AI, and we are glad that you are contributing to it by opening this issue.
   Please make sure to include all the relevant context, and one of the @apache/mxnet-committers will be here shortly.
   If you are interested in contributing to our project, let us know! Also, be sure to check out our guide on [contributing to MXNet](https://mxnet.apache.org/community/contribute) and our [development guides wiki](https://cwiki.apache.org/confluence/display/MXNET/Developments).

