Posted to commits@tvm.apache.org by GitBox <gi...@apache.org> on 2020/12/29 19:27:44 UTC
[GitHub] [tvm] ResidentMario opened a new issue #7177: TVM fails CUDA version check incorrectly
ResidentMario opened a new issue #7177:
URL: https://github.com/apache/tvm/issues/7177
## Summary
I am attempting to install TVM from source on an NVIDIA T4 GPU machine on AWS, following the instructions on the [Install From Source](https://tvm.apache.org/docs/install/from_source.html) page in the TVM docs. However, running the demo code below results in a `CUDA: CUDA driver version is insufficient for CUDA runtime version` error.
I am reporting this here as a bug because, to the best of my ability, I have ruled out every possible reason this error could occur other than a bug in the TVM library itself.
## Traceback
```python
import tvm
print(tvm.gpu(0).exist)
print(tvm.gpu(0).compute_version)
```
```
---------------------------------------------------------------------------
TVMError Traceback (most recent call last)
<ipython-input-5-98d78ab480fc> in <module>
1 import tvm
2 print(tvm.gpu(0).exist)
----> 3 print(tvm.gpu(0).compute_version)
/opt/conda/envs/spell/lib/python3.9/site-packages/tvm-0.8.dev392+gb8ac8d94d-py3.9-linux-x86_64.egg/tvm/_ffi/runtime_ctypes.py in compute_version(self)
235 The version string in `major.minor` format.
236 """
--> 237 return self._GetDeviceAttr(self.device_type, self.device_id, 4)
238
239 @property
/opt/conda/envs/spell/lib/python3.9/site-packages/tvm-0.8.dev392+gb8ac8d94d-py3.9-linux-x86_64.egg/tvm/_ffi/runtime_ctypes.py in _GetDeviceAttr(self, device_type, device_id, attr_id)
202 import tvm.runtime._ffi_api
203
--> 204 return tvm.runtime._ffi_api.GetDeviceAttr(device_type, device_id, attr_id)
205
206 @property
/opt/conda/envs/spell/lib/python3.9/site-packages/tvm-0.8.dev392+gb8ac8d94d-py3.9-linux-x86_64.egg/tvm/_ffi/_ctypes/packed_func.py in __call__(self, *args)
235 != 0
236 ):
--> 237 raise get_last_ffi_error()
238 _ = temp_args
239 _ = args
TVMError: Traceback (most recent call last):
[bt] (3) /opt/conda/envs/spell/lib/python3.9/site-packages/tvm-0.8.dev392+gb8ac8d94d-py3.9-linux-x86_64.egg/tvm/libtvm.so(TVMFuncCall+0x65) [0x7fec2395b985]
[bt] (2) /opt/conda/envs/spell/lib/python3.9/site-packages/tvm-0.8.dev392+gb8ac8d94d-py3.9-linux-x86_64.egg/tvm/libtvm.so(+0x1211fa9) [0x7fec23959fa9]
[bt] (1) /opt/conda/envs/spell/lib/python3.9/site-packages/tvm-0.8.dev392+gb8ac8d94d-py3.9-linux-x86_64.egg/tvm/libtvm.so(tvm::runtime::CUDADeviceAPI::GetAttr(DLContext, tvm::runtime::DeviceAttrKind, tvm::runtime::TVMRetValue*)+0x9fd) [0x7fec23a03c2d]
[bt] (0) /opt/conda/envs/spell/lib/python3.9/site-packages/tvm-0.8.dev392+gb8ac8d94d-py3.9-linux-x86_64.egg/tvm/libtvm.so(+0x12bada2) [0x7fec23a02da2]
File "/tmp/tvm/src/runtime/cuda/cuda_device_api.cc", line 62
TVMError:
---------------------------------------------------------------
An internal invariant was violated during the execution of TVM.
Please read TVM's error reporting guidelines.
More details can be found here: https://discuss.tvm.ai/t/error-reporting/7793.
---------------------------------------------------------------
Check failed: e == cudaSuccess || e == cudaErrorCudartUnloading == false: CUDA: CUDA driver version is insufficient for CUDA runtime version
```
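The failing check tests the error code returned by a CUDA runtime call, and the text after `CUDA:` describes that code. As a diagnostic sketch (not part of TVM), the two versions the message refers to can be queried directly through the CUDA runtime's `cudaDriverGetVersion` and `cudaRuntimeGetVersion` using Python's `ctypes`. It assumes `libcudart` is either discoverable by `ctypes.util.find_library` or passed in as an explicit path. The helper names are hypothetical. The driver value is the newest CUDA version the driver supports, so on a working setup it should be at least the runtime value.

```python
import ctypes
import ctypes.util
from typing import Optional, Tuple


def decode_cuda_version(encoded: int) -> str:
    """CUDA reports versions as 1000 * major + 10 * minor, e.g. 10000 -> "10.0"."""
    major, rest = divmod(encoded, 1000)
    return f"{major}.{rest // 10}"


def query_cuda_versions(lib_path: Optional[str] = None) -> Optional[Tuple[str, str]]:
    """Return (driver, runtime) CUDA versions as seen by libcudart, or None.

    `lib_path` may point at a specific libcudart.so (for example one under
    /usr/local/cuda/lib64); by default ctypes.util.find_library is used.
    """
    path = lib_path or ctypes.util.find_library("cudart")
    if path is None:
        return None
    try:
        cudart = ctypes.CDLL(path)
    except OSError:
        return None
    driver, runtime = ctypes.c_int(0), ctypes.c_int(0)
    # Both functions return a cudaError_t, where 0 is cudaSuccess.
    if cudart.cudaDriverGetVersion(ctypes.byref(driver)) != 0:
        return None
    if cudart.cudaRuntimeGetVersion(ctypes.byref(runtime)) != 0:
        return None
    # The CUDA docs state that a driver version of 0 means no driver is installed.
    return decode_cuda_version(driver.value), decode_cuda_version(runtime.value)


if __name__ == "__main__":
    print(query_cuda_versions())
```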
## Machine details
The machine in question is an NVIDIA T4 instance on AWS running an internal Ubuntu Linux image. Configuration details:
```bash
$ find / -path **/libcuda.so -type f
/usr/local/cuda-10.0/targets/x86_64-linux/lib/stubs/libcuda.so
$ find / -path **/nvcc -type f
/usr/local/cuda-10.0/bin/nvcc
$ which nvcc
/usr/local/cuda/bin/nvcc
$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2018 NVIDIA Corporation
Built on Sat_Aug_25_21:08:01_CDT_2018
Cuda compilation tools, release 10.0, V10.0.130
$ ls /usr/local/cuda/
bin/ doc/ include@ LICENSE nvvm/ share/ targets/
compat/ extras/ lib64@ nvml/ README src/ version.txt
$ ls /usr/local/cuda-10.0/
bin/ doc/ include@ LICENSE nvvm/ share/ targets/
compat/ extras/ lib64@ nvml/ README src/ version.txt
$ cp /usr/local/cuda-10.0/targets/x86_64-linux/lib/stubs/libcuda.so /usr/local/cuda/targets/x86_64-linux/lib/stubs/libcuda.so
cp: '/usr/local/cuda-10.0/targets/x86_64-linux/lib/stubs/libcuda.so' and '/usr/local/cuda/targets/x86_64-linux/lib/stubs/libcuda.so' are the same file
$ nvidia-smi
Tue Dec 29 19:19:08 2020
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 450.80.02 Driver Version: 450.80.02 CUDA Version: 11.0 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 Tesla T4 Off | 00000000:00:1E.0 Off | 0 |
| N/A 33C P0 26W / 70W | 1060MiB / 15109MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
+-----------------------------------------------------------------------------+
$ python /path/to/train_basic.py
# Runs https://github.com/spellml/cnn-cifar10/blob/master/models/train_basic.py, a PyTorch
# training script that runs on CUDA. Succeeds.
```
To my knowledge, this verifies that:
* This machine _only_ has CUDA 10.0.130 installed, and hence when TVM builds it _should_ link to this version of CUDA.
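* Driver version is 450.80.02 and runtime version is 10.0.130. [NVIDIA's compatibility table](https://docs.nvidia.com/deploy/cuda-compatibility/index.html) states that `10.0.130` requires driver version `>=410.48`; since we have `450.80.02`, we should be good. (The `CUDA Version: 11.0` shown by `nvidia-smi` is the newest CUDA version this driver supports, not an installed toolkit.)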
* The CUDA stack is in a working state that other libraries are able to use successfully (the PyTorch smoke test passes).
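The driver check in the second bullet amounts to a component-wise comparison of dotted version numbers. A minimal sketch of that reasoning, assuming only the single table entry quoted above (toolkit `10.0.130` needs driver `>=410.48`); the names are hypothetical and any other rows would have to be copied from NVIDIA's table.

```python
from typing import Tuple


def parse_version(text: str) -> Tuple[int, ...]:
    """Split a dotted version such as "450.80.02" into integers: (450, 80, 2)."""
    return tuple(int(part) for part in text.split("."))


# Minimum driver for each CUDA toolkit release. Only the entry cited above is
# filled in; further rows would have to come from NVIDIA's compatibility table.
MIN_DRIVER = {
    "10.0.130": "410.48",
}


def driver_is_sufficient(toolkit: str, driver: str) -> bool:
    """True if `driver` is at least the minimum listed for `toolkit`."""
    # Tuples compare element by element, so (450, 80, 2) >= (410, 48).
    return parse_version(driver) >= parse_version(MIN_DRIVER[toolkit])


print(driver_is_sufficient("10.0.130", "450.80.02"))  # True
```

Comparing integer tuples rather than raw strings avoids lexical surprises such as `"9.0" > "10.0"`.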
## Install process
To build TVM, I first created a `conda` environment with the following packages installed:
```yaml
name: spell
channels:
  - conda-forge
dependencies:
  - numpy
  - pandas
  - xgboost
  - tornado
  - pip:
    - torch
    - cloudpickle
    - psutil
```
I then followed the instructions in the [Install From Source](https://tvm.apache.org/docs/install/from_source.html) page in the docs. [Here is the exact script I used](https://gist.github.com/ResidentMario/f9d9a3235c4862ab71bb80279745bfcd).
## Possible explanations
From [this SO comment](https://stackoverflow.com/questions/65486872/according-to-apache-tvm-cuda-driver-and-runtime-versions-are-incompatible-even#comment115791481_65486872):
* The CUDA stack on this machine is broken (ruled out with the PyTorch smoke test).
* TVM is compiling against CUDA 11 (this shouldn't be possible: CUDA 11 is not installed on this machine, which I believe I have shown above).
* There is a bug in the TVM source code.
* There is some other cause I am not aware of (I am not a CUDA developer!).
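Explanation 2 could be checked more directly by asking the dynamic loader which CUDA libraries `libtvm.so` resolves to. A minimal sketch, assuming a Linux machine with `ldd` on the `PATH`; the function names are hypothetical and not a TVM utility. Run it with the `libtvm.so` path from the traceback above as its only argument.

```python
import re
import subprocess
import sys

# Matches ldd lines such as
#   libcudart.so.10.0 => /usr/local/cuda/lib64/libcudart.so.10.0 (0x00007f...)
#   libcuda.so.1 => not found
LDD_LINE = re.compile(r"^\s*(\S+)\s+=>\s+(.+?)(?:\s+\(0x[0-9a-f]+\))?\s*$")


def cuda_libs_from_ldd(ldd_output: str) -> dict:
    """Map each CUDA-related library name to the path (or "not found") ldd reports."""
    libs = {}
    for line in ldd_output.splitlines():
        match = LDD_LINE.match(line)
        if match and "cuda" in match.group(1):
            libs[match.group(1)] = match.group(2)
    return libs


def cuda_libs_for(shared_object: str) -> dict:
    """Run `ldd` on a shared object such as libtvm.so and return its CUDA libraries."""
    result = subprocess.run(
        ["ldd", shared_object], capture_output=True, text=True, check=True
    )
    return cuda_libs_from_ldd(result.stdout)


if __name__ == "__main__" and len(sys.argv) > 1:
    for name, path in cuda_libs_for(sys.argv[1]).items():
        print(f"{name} -> {path}")
```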
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [tvm] junrushao1994 closed issue #7177: TVM fails CUDA version check incorrectly
Posted by GitBox <gi...@apache.org>.
junrushao1994 closed issue #7177:
URL: https://github.com/apache/tvm/issues/7177
----------------------------------------------------------------
[GitHub] [tvm] junrushao1994 commented on issue #7177: TVM fails CUDA version check incorrectly
junrushao1994 commented on issue #7177:
URL: https://github.com/apache/tvm/issues/7177#issuecomment-752223092
I believe that it is a TVM bug, but it may be related to the CUDA installation. I am happy to assist further on the forum: https://discuss.tvm.apache.org/t/cuda-driver-version-is-insufficient-for-cuda-runtime-version/8764
----------------------------------------------------------------
[GitHub] [tvm] junrushao1994 edited a comment on issue #7177: TVM fails CUDA version check incorrectly
junrushao1994 edited a comment on issue #7177:
URL: https://github.com/apache/tvm/issues/7177#issuecomment-752223092
----------------------------------------------------------------