Posted to commits@mxnet.apache.org by GitBox <gi...@apache.org> on 2020/08/19 05:15:26 UTC

[GitHub] [incubator-mxnet] cybaj opened a new issue #18960: Dockerfile build with mxnet dependency package no gpus detected

cybaj opened a new issue #18960:
URL: https://github.com/apache/incubator-mxnet/issues/18960


   ## Description
   
   ### TL;DR
   I want to build an image that contains a library which depends on `mxnet`.
   So I added the installation of that library and of `mxnet` to the Dockerfile. The `mxnet` package installed fine,
   but the build failed with `OSError: libcuda.so.1: cannot open shared object file: No such file or directory` while installing the library.
   So I also added `LD_LIBRARY_PATH`. In that case the installation succeeds, but unlike before, no GPUs are detected.
   
   
   ### cuda
   I used the `nvcr.io/nvidia/pytorch:19.10-py3` image,
   which contains the following ([ref](https://docs.nvidia.com/deeplearning/frameworks/pytorch-release-notes/rel_19-10.html#rel_19-10)):
   - NVIDIA CUDA 10.1.243 including cuBLAS 10.2.1.243
   - NVIDIA cuDNN 7.6.4

   So I installed `mxnet-cu101` from PyPI,
   and I checked that `libcuda.so.1` exists in `/usr/local/cuda/compat/lib.real`.
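The existence check described above can be scripted. The following is a minimal sketch and not part of the original report: `find_library_dirs` is a hypothetical helper that lists which directories of a search path contain a given file. It only confirms that the file is present; it does not prove that the dynamic loader will actually pick it up.

```python
import os


def find_library_dirs(lib_name, search_path):
    """Return the directories in search_path (os.pathsep-separated) that contain lib_name."""
    hits = []
    for directory in search_path.split(os.pathsep):
        # Skip empty entries produced by leading, trailing, or doubled separators.
        if directory and os.path.exists(os.path.join(directory, lib_name)):
            hits.append(directory)
    return hits


if __name__ == "__main__":
    # The compat directory is the one named in the report; the helper name is hypothetical.
    candidates = os.pathsep.join(
        [os.environ.get("LD_LIBRARY_PATH", ""), "/usr/local/cuda/compat/lib.real"]
    )
    print(find_library_dirs("libcuda.so.1", candidates))
```

An empty list would suggest that no directory on the combined path provides `libcuda.so.1`.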
   
   
   ### Error Message
   (Paste the complete error message. Please also include stack trace by setting environment variable `DMLC_LOG_STACK_TRACE_DEPTH=10` before running your script.)
   1. Installing a Python package that depends on mxnet fails with the log below.
   ```
   Step 5/9 : RUN pip install git+https://github.com/cybaj/KoGPT2.git#egg=kogpt2
    ---> Running in 2ede86c70b10
   Collecting kogpt2 from git+https://github.com/cybaj/KoGPT2.git#egg=kogpt2
     Cloning https://github.com/cybaj/KoGPT2.git to /tmp/pip-install-kawcrvv2/kogpt2
     Running command git clone -q https://github.com/cybaj/KoGPT2.git /tmp/pip-install-kawcrvv2/kogpt2
       ERROR: Command errored out with exit status 1:
        command: /opt/conda/bin/python -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'/tmp/pip-install-kawcrvv2/kogpt2/setup.py'"'"'; __file__='"'"'/tmp/pip-install-kawcrvv2/kogpt2/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' egg_info --egg-base pip-egg-info
            cwd: /tmp/pip-install-kawcrvv2/kogpt2/
       Complete output (21 lines):
       Traceback (most recent call last):
         File "<string>", line 1, in <module>
         File "/tmp/pip-install-kawcrvv2/kogpt2/setup.py", line 1, in <module>
           from kogpt2 import __version__
         File "/tmp/pip-install-kawcrvv2/kogpt2/kogpt2/__init__.py", line 15, in <module>
           from . import model
         File "/tmp/pip-install-kawcrvv2/kogpt2/kogpt2/model/__init__.py", line 17, in <module>
           from .gpt import *
         File "/tmp/pip-install-kawcrvv2/kogpt2/kogpt2/model/gpt.py", line 24, in <module>
           import mxnet as mx
         File "/opt/conda/lib/python3.6/site-packages/mxnet/__init__.py", line 24, in <module>
           from .context import Context, current_context, cpu, gpu, cpu_pinned
         File "/opt/conda/lib/python3.6/site-packages/mxnet/context.py", line 24, in <module>
           from .base import classproperty, with_metaclass, _MXClassPropertyMetaClass
         File "/opt/conda/lib/python3.6/site-packages/mxnet/base.py", line 214, in <module>
           _LIB = _load_lib()
         File "/opt/conda/lib/python3.6/site-packages/mxnet/base.py", line 205, in _load_lib
           lib = ctypes.CDLL(lib_path[0], ctypes.RTLD_LOCAL)
         File "/opt/conda/lib/python3.6/ctypes/__init__.py", line 348, in __init__
           self._handle = _dlopen(self._name, mode)
       OSError: libcuda.so.1: cannot open shared object file: No such file or directory
       ----------------------------------------
   ERROR: Command errored out with exit status 1: python setup.py egg_info Check the logs for full command output.
   The command '/bin/sh -c pip install git+https://github.com/cybaj/KoGPT2.git#egg=kogpt2' returned a non-zero code: 1
   ```
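The traceback ends in a `ctypes.CDLL` call, so the same failure can be probed without installing anything. This is a minimal sketch, assuming only the standard library; `can_load` is a hypothetical helper name and is not part of mxnet.

```python
import ctypes


def can_load(lib_name):
    """Return True if the dynamic loader can open lib_name, False otherwise."""
    try:
        # Same kind of call that fails in mxnet/base.py in the traceback above.
        ctypes.CDLL(lib_name, ctypes.RTLD_LOCAL)
    except OSError:
        return False
    return True


if __name__ == "__main__":
    print("libcuda.so.1 loadable:", can_load("libcuda.so.1"))
```

Running this in a `RUN` step before the failing `pip install` would show whether the loader can see `libcuda.so.1` at that point in the build.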
   
   2. If I add `LD_LIBRARY_PATH` as in the Dockerfile below (pointing to the directory that contains `libcuda.so.1`), the installation succeeds, but no GPU is detected.
   ```
   Step 9/10 : RUN python -c "import torch; print(torch.__version__); print(torch.cuda.device_count());"
    ---> Running in 8273db124d7f
   1.3.0a0+24ae9b5
   0
   Removing intermediate container 8273db124d7f
    ---> a8f922092018
   Step 10/10 : RUN python -c "import mxnet; print(mxnet.__version__); print(mxnet.util.get_gpu_count());"
    ---> Running in 59c987a815bc
   1.6.0
   0
   ```
   Without installing the Python library that depends on mxnet, all GPUs are detected.
   
   ## To Reproduce
   Run `docker build` with the Dockerfile below.
   ```
   FROM nvcr.io/nvidia/pytorch:19.10-py3
    
   ENV LD_LIBRARY_PATH $LD_LIBRARY_PATH:/usr/local/cuda/compat/lib.real
    
   RUN pip install --no-cache-dir mxnet_cu101
   RUN pip install --no-cache-dir gluonnlp sentencepiece
   RUN pip install git+https://github.com/cybaj/KoGPT2.git#egg=kogpt2 # this need mxnet
   RUN pip install transformers==2.11.0
    
   WORKDIR /workspace
    
   RUN python -c "import torch; print(torch.__version__); print(torch.cuda.device_count());"
   RUN python -c "import mxnet; print(mxnet.__version__); print(mxnet.util.get_gpu_count());"
   ```
   
   
   ### Steps to reproduce
   
   1. Run `docker build` with the Dockerfile above.
   2. Check GPU detection.
   ```
   RUN python -c "import torch; print(torch.__version__); print(torch.cuda.device_count());"
   RUN python -c "import mxnet; print(mxnet.__version__); print(mxnet.util.get_gpu_count());"
   ``` 
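The two one-liners can also be combined into one script that keeps going when an import fails. This is a sketch under assumptions: `gpu_counts` is a hypothetical helper, and it uses only the two calls already shown in this report (`torch.cuda.device_count()` and `mxnet.util.get_gpu_count()`).

```python
def gpu_counts():
    """Return {'torch': n or None, 'mxnet': n or None}; None means the import or query failed."""
    counts = {"torch": None, "mxnet": None}
    try:
        import torch

        counts["torch"] = torch.cuda.device_count()
    except Exception:
        # Broad catch on purpose: this is a diagnostic, and a missing libcuda.so.1
        # surfaces as OSError rather than ImportError.
        pass
    try:
        import mxnet

        counts["mxnet"] = mxnet.util.get_gpu_count()
    except Exception:
        pass
    return counts


if __name__ == "__main__":
    print(gpu_counts())
```

A value of `None` separates "the framework could not be loaded or queried" from "the framework loaded and reported 0 GPUs", which are the two failure modes described in this issue.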
   
   ## What have you tried to solve it?
   
   1. Used another Docker base image (version 10.12), but it also failed.
   2. Built on another machine (with the latest Docker version), but it also failed.
   
   ## Environment
   
   We recommend using our script for collecting the diagnostic information. Run the following command and paste the outputs below:
   ```
   curl --retry 10 -s https://raw.githubusercontent.com/dmlc/gluon-nlp/master/tools/diagnose.py | python
   
   # paste outputs here
   ```
   That `diagnose.py` URL returns 404 Not Found.
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-mxnet] szha commented on issue #18960: Dockerfile build with mxnet dependency package no gpus detected

Posted by GitBox <gi...@apache.org>.
szha commented on issue #18960:
URL: https://github.com/apache/incubator-mxnet/issues/18960#issuecomment-675859103


   @TristonC is there anyone on your team who's familiar with the setup in the nv pytorch docker releases?





[GitHub] [incubator-mxnet] szha commented on issue #18960: Dockerfile build with mxnet dependency package no gpus detected

Posted by GitBox <gi...@apache.org>.
szha commented on issue #18960:
URL: https://github.com/apache/incubator-mxnet/issues/18960#issuecomment-676650566


   @cybaj great to know. If you could share what you did as a solution in this issue, it would help others who run into the same issue. Thanks!





[GitHub] [incubator-mxnet] cybaj commented on issue #18960: Dockerfile build with mxnet dependency package no gpus detected

Posted by GitBox <gi...@apache.org>.
cybaj commented on issue #18960:
URL: https://github.com/apache/incubator-mxnet/issues/18960#issuecomment-676214131


   I was able to fix this issue by setting the `LD_LIBRARY_PATH` environment variable back to this image's original value.
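The comment does not include the resulting Dockerfile. One way to keep the image's `LD_LIBRARY_PATH` untouched is to extend it only for the single child process that needs `libcuda.so.1`. This is an assumption for illustration, not confirmed as what the author did, and `env_with_library_dir` is a hypothetical helper.

```python
import os


def env_with_library_dir(extra_dir, base_env=None):
    """Return a copy of base_env with extra_dir appended to LD_LIBRARY_PATH."""
    env = dict(os.environ if base_env is None else base_env)
    current = env.get("LD_LIBRARY_PATH", "")
    env["LD_LIBRARY_PATH"] = (current + os.pathsep + extra_dir) if current else extra_dir
    return env


if __name__ == "__main__":
    import subprocess
    import sys

    child_env = env_with_library_dir("/usr/local/cuda/compat/lib.real")
    # Only this child process sees the extended path; the parent environment is unchanged.
    subprocess.run(
        [sys.executable, "-c", "import os; print(os.environ['LD_LIBRARY_PATH'])"],
        env=child_env,
        check=True,
    )
```

Because the helper copies the environment, later steps still run with the image's original `LD_LIBRARY_PATH`.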





[GitHub] [incubator-mxnet] github-actions[bot] commented on issue #18960: Dockerfile build with mxnet dependency package no gpus detected

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on issue #18960:
URL: https://github.com/apache/incubator-mxnet/issues/18960#issuecomment-675856622


   Welcome to Apache MXNet (incubating)! We are on a mission to democratize AI, and we are glad that you are contributing to it by opening this issue.
   Please make sure to include all the relevant context, and one of the @apache/mxnet-committers will be here shortly.
   If you are interested in contributing to our project, let us know! Also, be sure to check out our guide on [contributing to MXNet](https://mxnet.apache.org/community/contribute) and our [development guides wiki](https://cwiki.apache.org/confluence/display/MXNET/Developments).

