
Posted to commits@tvm.apache.org by GitBox <gi...@apache.org> on 2020/12/29 19:27:44 UTC

[GitHub] [tvm] ResidentMario opened a new issue #7177: TVM fails CUDA version check incorrectly

ResidentMario opened a new issue #7177:
URL: https://github.com/apache/tvm/issues/7177


   ## Summary
   
   I am attempting to install TVM from source on an NVIDIA T4 GPU machine on AWS, following the instructions on the [Install From Source](https://tvm.apache.org/docs/install/from_source.html) page in the TVM docs. However, running the following demo code results in a `CUDA: CUDA driver version is insufficient for CUDA runtime version` error.
   
   I am reporting this here as a bug because, to the best of my ability, I have ruled out every possible cause of this error other than a bug in the TVM library itself.
   
   ## Traceback
   
   ```python
   import tvm
   print(tvm.gpu(0).exist)
   print(tvm.gpu(0).compute_version)
   ```
   
   ```
   ---------------------------------------------------------------------------
   TVMError                                  Traceback (most recent call last)
   <ipython-input-5-98d78ab480fc> in <module>
         1 import tvm
         2 print(tvm.gpu(0).exist)
   ----> 3 print(tvm.gpu(0).compute_version)
   
   /opt/conda/envs/spell/lib/python3.9/site-packages/tvm-0.8.dev392+gb8ac8d94d-py3.9-linux-x86_64.egg/tvm/_ffi/runtime_ctypes.py in compute_version(self)
       235             The version string in `major.minor` format.
       236         """
   --> 237         return self._GetDeviceAttr(self.device_type, self.device_id, 4)
       238 
       239     @property
   
   /opt/conda/envs/spell/lib/python3.9/site-packages/tvm-0.8.dev392+gb8ac8d94d-py3.9-linux-x86_64.egg/tvm/_ffi/runtime_ctypes.py in _GetDeviceAttr(self, device_type, device_id, attr_id)
       202         import tvm.runtime._ffi_api
       203 
   --> 204         return tvm.runtime._ffi_api.GetDeviceAttr(device_type, device_id, attr_id)
       205 
       206     @property
   
   /opt/conda/envs/spell/lib/python3.9/site-packages/tvm-0.8.dev392+gb8ac8d94d-py3.9-linux-x86_64.egg/tvm/_ffi/_ctypes/packed_func.py in __call__(self, *args)
       235             != 0
       236         ):
   --> 237             raise get_last_ffi_error()
       238         _ = temp_args
       239         _ = args
   
   TVMError: Traceback (most recent call last):
     [bt] (3) /opt/conda/envs/spell/lib/python3.9/site-packages/tvm-0.8.dev392+gb8ac8d94d-py3.9-linux-x86_64.egg/tvm/libtvm.so(TVMFuncCall+0x65) [0x7fec2395b985]
     [bt] (2) /opt/conda/envs/spell/lib/python3.9/site-packages/tvm-0.8.dev392+gb8ac8d94d-py3.9-linux-x86_64.egg/tvm/libtvm.so(+0x1211fa9) [0x7fec23959fa9]
     [bt] (1) /opt/conda/envs/spell/lib/python3.9/site-packages/tvm-0.8.dev392+gb8ac8d94d-py3.9-linux-x86_64.egg/tvm/libtvm.so(tvm::runtime::CUDADeviceAPI::GetAttr(DLContext, tvm::runtime::DeviceAttrKind, tvm::runtime::TVMRetValue*)+0x9fd) [0x7fec23a03c2d]
     [bt] (0) /opt/conda/envs/spell/lib/python3.9/site-packages/tvm-0.8.dev392+gb8ac8d94d-py3.9-linux-x86_64.egg/tvm/libtvm.so(+0x12bada2) [0x7fec23a02da2]
     File "/tmp/tvm/src/runtime/cuda/cuda_device_api.cc", line 62
   TVMError: 
   ---------------------------------------------------------------
   An internal invariant was violated during the execution of TVM.
   Please read TVM's error reporting guidelines.
   More details can be found here: https://discuss.tvm.ai/t/error-reporting/7793.
   ---------------------------------------------------------------
     Check failed: e == cudaSuccess || e == cudaErrorCudartUnloading == false: CUDA: CUDA driver version is insufficient for CUDA runtime version
   ```
   
   ## Machine details
   The machine in question is an NVIDIA T4 instance on AWS running an internal Ubuntu Linux image. Configuration details:
   
   ```bash
   $ find / -path **/libcuda.so -type f
   /usr/local/cuda-10.0/targets/x86_64-linux/lib/stubs/libcuda.so
   $ find / -path **/nvcc -type f
   /usr/local/cuda-10.0/bin/nvcc
   $ which nvcc
   /usr/local/cuda/bin/nvcc
   $ nvcc --version
   nvcc: NVIDIA (R) Cuda compiler driver
   Copyright (c) 2005-2018 NVIDIA Corporation
   Built on Sat_Aug_25_21:08:01_CDT_2018
   Cuda compilation tools, release 10.0, V10.0.130
   $ ls /usr/local/cuda/
   bin/     doc/     include@  LICENSE  nvvm/   share/  targets/
   compat/  extras/  lib64@    nvml/    README  src/    version.txt
   $ ls /usr/local/cuda-10.0/
   bin/     doc/     include@  LICENSE  nvvm/   share/  targets/
   compat/  extras/  lib64@    nvml/    README  src/    version.txt
   $ cp /usr/local/cuda-10.0/targets/x86_64-linux/lib/stubs/libcuda.so /usr/local/cuda/targets/x86_64-linux/lib/stubs/libcuda.so
   cp: '/usr/local/cuda-10.0/targets/x86_64-linux/lib/stubs/libcuda.so' and '/usr/local/cuda/targets/x86_64-linux/lib/stubs/libcuda.so' are the same file
   $ nvidia-smi
   Tue Dec 29 19:19:08 2020       
   +-----------------------------------------------------------------------------+
   | NVIDIA-SMI 450.80.02    Driver Version: 450.80.02    CUDA Version: 11.0     |
   |-------------------------------+----------------------+----------------------+
   | GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
   | Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
   |                               |                      |               MIG M. |
   |===============================+======================+======================|
   |   0  Tesla T4            Off  | 00000000:00:1E.0 Off |                    0 |
   | N/A   33C    P0    26W /  70W |   1060MiB / 15109MiB |      0%      Default |
   |                               |                      |                  N/A |
   +-------------------------------+----------------------+----------------------+
                                                                                  
   +-----------------------------------------------------------------------------+
   | Processes:                                                                  |
   |  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
   |        ID   ID                                                   Usage      |
   |=============================================================================|
   +-----------------------------------------------------------------------------+
   $ python /path/to/train_basic.py
   # Runs https://github.com/spellml/cnn-cifar10/blob/master/models/train_basic.py, an on-CUDA
   # Python training script. Succeeds.
   ```
   
   To my knowledge, this verifies that:
   * This machine _only_ has CUDA 10.0.130 installed, so TVM _should_ link against this version of CUDA when it is built.
   * The driver version is 450.80.02 and the runtime version is 10.0.130. [NVIDIA's compatibility table](https://docs.nvidia.com/deploy/cuda-compatibility/index.html) states that CUDA `10.0.130` requires driver version `>=410.48`; since this machine has `450.80.02`, that requirement should be satisfied.
   * The CUDA stack is in a working state that other libraries are able to use successfully (the PyTorch smoke test passes).
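The error text in the traceback is the message CUDA associates with `cudaErrorInsufficientDriver`, i.e. the loaded driver reports support for an older CUDA version than the runtime in use. The compatibility argument above can be checked from inside a Python process with a small sketch like the one below. It assumes `libcudart.so` (the CUDA 10.0 runtime) is on the loader path, the helper names are hypothetical, and it uses a simplified rule (driver version >= runtime version) that ignores NVIDIA's forward-compatibility packages. CUDA encodes both versions as `1000 * major + 10 * minor`, so CUDA 10.0 is `10000` and a driver that `nvidia-smi` reports as CUDA 11.0 is `11000`.

```python
import ctypes


def decode_cuda_version(version: int) -> str:
    """Turn CUDA's integer encoding (1000 * major + 10 * minor) into "major.minor"."""
    major = version // 1000
    minor = (version % 1000) // 10
    return f"{major}.{minor}"


def driver_supports_runtime(driver_version: int, runtime_version: int) -> bool:
    """Simplified check: the driver must support a CUDA version at least as new as the runtime."""
    return driver_version >= runtime_version


def query_cuda_versions(cudart_name: str = "libcudart.so"):
    """Ask the CUDA runtime for (driver_version, runtime_version); None if unavailable.

    cudart_name is an assumption: point it at the runtime TVM was built against,
    e.g. /usr/local/cuda-10.0/lib64/libcudart.so.
    """
    try:
        cudart = ctypes.CDLL(cudart_name)
    except OSError:
        return None  # no CUDA runtime found under that name
    driver = ctypes.c_int(0)
    runtime = ctypes.c_int(0)
    # Both calls return a cudaError_t; 0 is cudaSuccess.
    if cudart.cudaDriverGetVersion(ctypes.byref(driver)) != 0:
        return None
    if cudart.cudaRuntimeGetVersion(ctypes.byref(runtime)) != 0:
        return None
    return driver.value, runtime.value


if __name__ == "__main__":
    versions = query_cuda_versions()
    if versions is None:
        print("Could not load the CUDA runtime")
    else:
        driver, runtime = versions
        # A driver version of 0 means no driver was found.
        print(f"driver supports CUDA {decode_cuda_version(driver)}, "
              f"runtime is CUDA {decode_cuda_version(runtime)}, "
              f"compatible: {driver_supports_runtime(driver, runtime)}")
```

If the driver version printed here differs from the CUDA version `nvidia-smi` shows, that could point at the process loading a different `libcuda` than the installed 450.80.02 driver.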
   
   ## Install process
   
   To build TVM, I first created a `conda` environment with the following packages installed:
   
   ```yaml
   name: spell
   channels:
     - conda-forge
   dependencies:
     - numpy
     - pandas
     - xgboost
     - tornado
     - pip:
        - torch
        - cloudpickle
        - psutil
   ```
   
   I then followed the instructions in the [Install From Source](https://tvm.apache.org/docs/install/from_source.html) page in the docs. [Here is the exact script I used](https://gist.github.com/ResidentMario/f9d9a3235c4862ab71bb80279745bfcd).
   
   ## Possible explanations
   
   From [this SO comment](https://stackoverflow.com/questions/65486872/according-to-apache-tvm-cuda-driver-and-runtime-versions-are-incompatible-even#comment115791481_65486872):
   
   * The CUDA stack on this machine is broken (ruled out with the PyTorch smoke test).
   * TVM is compiling against CUDA 11. This shouldn't be possible: CUDA 11 is not installed on this machine, and I believe I have shown this above.
   * There is a bug in the TVM source code.
   * There is some other cause I am not aware of (I am not a CUDA developer!).
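One way to test the second explanation is to ask the dynamic loader which CUDA libraries TVM's `libtvm.so` actually resolves to. The sketch below assumes `ldd` is available (as on standard glibc-based Linux), keeps only the `libcuda*` and `libnvrtc*` entries, and takes the `libtvm.so` path (as shown in the traceback) as a command-line argument; the function and script names are hypothetical.

```python
import shutil
import subprocess
import sys


def cuda_deps_from_ldd(ldd_output: str) -> dict:
    """Map each libcuda*/libnvrtc* dependency to the path ldd resolved it to."""
    resolved = {}
    for line in ldd_output.splitlines():
        # Typical ldd line: "libcudart.so.10.0 => /usr/local/cuda/lib64/libcudart.so.10.0 (0x...)"
        parts = line.strip().split(" => ")
        if len(parts) != 2 or not parts[0].startswith(("libcuda", "libnvrtc")):
            continue
        # Drop the trailing load address; unresolved entries read "not found".
        resolved[parts[0]] = parts[1].split(" (")[0]
    return resolved


def cuda_deps(shared_object: str) -> dict:
    """Run ldd on a shared object (e.g. TVM's libtvm.so) and return its CUDA dependencies."""
    if shutil.which("ldd") is None:
        return {}
    result = subprocess.run(["ldd", shared_object], capture_output=True, text=True)
    return cuda_deps_from_ldd(result.stdout)


if __name__ == "__main__":
    # Usage: python check_tvm_cuda.py /path/to/tvm/libtvm.so
    if len(sys.argv) > 1:
        for name, path in cuda_deps(sys.argv[1]).items():
            print(f"{name} -> {path}")
```

If `libcudart` resolves into `/usr/local/cuda-10.0`, the build linked against CUDA 10.0 as expected. Any CUDA library resolving into a `stubs/` directory would be a red flag, since those stubs are intended only for linking.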


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [tvm] junrushao1994 closed issue #7177: TVM fails CUDA version check incorrectly

Posted by GitBox <gi...@apache.org>.

junrushao1994 closed issue #7177:
URL: https://github.com/apache/tvm/issues/7177


   


[GitHub] [tvm] junrushao1994 commented on issue #7177: TVM fails CUDA version check incorrectly

Posted by GitBox <gi...@apache.org>.

junrushao1994 commented on issue #7177:
URL: https://github.com/apache/tvm/issues/7177#issuecomment-752223092


   I believe that it is a TVM bug, but it may be related to the CUDA installation. I am happy to assist further on the forum: https://discuss.tvm.apache.org/t/cuda-driver-version-is-insufficient-for-cuda-runtime-version/8764


[GitHub] [tvm] junrushao1994 edited a comment on issue #7177: TVM fails CUDA version check incorrectly

Posted by GitBox <gi...@apache.org>.

junrushao1994 edited a comment on issue #7177:
URL: https://github.com/apache/tvm/issues/7177#issuecomment-752223092





