Posted to commits@tvm.apache.org by GitBox <gi...@apache.org> on 2022/12/28 04:43:05 UTC
[GitHub] [tvm] wangzy0327 opened a new issue, #13666: [Bug] rocm platform result are not correct
wangzy0327 opened a new issue, #13666:
URL: https://github.com/apache/tvm/issues/13666
I tried to execute an MNIST model with TVM on the ROCm platform (ROCm 5.2). The result of the execution is incorrect.
### Expected behavior
The result on the ROCm platform should equal the result on the CUDA or OpenCL platform.
### Actual behavior
The result on the ROCm platform does not equal the result on the CUDA or OpenCL platform.
### Environment
Operating System: Ubuntu 20.04
TVM version: 7f1856d34f03113dc3a7733c010be43446161944
Platform: ROCm 5.2
### Steps to reproduce
Here is the test code.
<details>
<summary>onnx_rocm.py</summary>
```python
from csv import writer
from pyexpat import model
import onnx
#from tvm.driver import tvmc
import numpy as np
import tvm
import tvm.relay as relay
from tvm.contrib import graph_executor
import tvm.testing
import numpy as np
import os

class NetworkData():
    def __init__(self,name:str,net_set:list,prefix_str:str,suffix_str:str,input_name:str,input:tuple,output:tuple):
        self.name = name
        self.net_set = net_set
        self.prefix_str = prefix_str
        self.suffix_str = suffix_str
        self.input_name = input_name
        self.input = input
        self.output = output

mnist_networkData = NetworkData(name = "mnist",
                                net_set = ["mnist-7","mnist-8"],
                                prefix_str = "mnist/model/",
                                suffix_str = ".onnx",
                                input_name = 'Input3',
                                input = (1,1,28,28),
                                output = (1,10))

MODEL_NAME = {
    "mnist":mnist_networkData,
}

dtype="float32"
common_prefix_str = "onnx-model/vision/classification/"
tol_paras = [1e-7,1e-6,1e-5,1e-4,1e-3,1e-2]

import logging
logging.basicConfig(level=logging.ERROR)
import warnings
warnings.filterwarnings('ignore')

def build(target:str,mod:tvm.IRModule, params:dict, input_name:str, input_data:np.ndarray, input:tuple, output: tuple) -> np.ndarray:
    tgt = tvm.target.Target(target=target, host="llvm")
    with tvm.transform.PassContext(opt_level=3):
        lib = relay.build(mod, target=target, params=params)
    # print(lib.get_lib().imported_modules[0].get_source())
    # print("------------------------source code start----------------------------")
    # print(lib.get_lib().imported_modules[0].get_source())
    # print("------------------------source code end----------------------------")
    dev = tvm.device(str(target), 0)
    module = graph_executor.GraphModule(lib["default"](dev))
    module.set_input(input_name, input_data)
    module.run()
    output_shape = output
    tvm_output = module.get_output(0, tvm.nd.empty(output_shape)).numpy()
    return tvm_output

def main(model_network : NetworkData):
    # set the random seed
    np.random.seed(0)
    I_np = np.random.uniform(size = model_network.input).astype(dtype)
    print(I_np[0][0][0][:10])
    header = ['network_name','network_sub_name','input','output','tolerance','rocm_cost_time','opencl_cost_time']
    rows = []
    for child_model_network in model_network.net_set:
        print("--------"+child_model_network+"----start-------------")
        onnx_model = onnx.load(common_prefix_str +
                               model_network.prefix_str +
                               child_model_network +
                               model_network.suffix_str)
        shape_dict = {model_network.input_name: I_np.shape}
        mod, params = relay.frontend.from_onnx(onnx_model, shape_dict)
        import datetime
        # opencl_starttime = datetime.datetime.now()
        # opencl_output = build("opencl",mod = mod,params = params,input_name = model_network.input_name,input_data = I_np, input = I_np.shape, output = model_network.output)
        # opencl_endtime = datetime.datetime.now()
        # opencl_duringtime = opencl_endtime - opencl_starttime
        # print("%15s network opencl cost time is %s s"%(child_model_network,opencl_duringtime))
        rocm_starttime = datetime.datetime.now()
        rocm_output = build("rocm",mod = mod,params = params,input_name = model_network.input_name,input_data = I_np, input = I_np.shape, output = model_network.output)
        rocm_endtime = datetime.datetime.now()
        rocm_duringtime = rocm_endtime - rocm_starttime
        print("%15s network rocm cost time is %s s"%(child_model_network,rocm_duringtime))
        opencl_starttime = datetime.datetime.now()
        opencl_output = build("opencl",mod = mod,params = params,input_name = model_network.input_name,input_data = I_np, input = I_np.shape, output = model_network.output)
        opencl_endtime = datetime.datetime.now()
        opencl_duringtime = opencl_endtime - opencl_starttime
        print("%15s network opencl cost time is %s s"%(child_model_network,opencl_duringtime))
        if rocm_output.ndim > 2:
            rocm_output = rocm_output.reshape(rocm_output.shape[0],rocm_output.shape[1])
            opencl_output = opencl_output.reshape(opencl_output.shape[0],opencl_output.shape[1])
        print(rocm_output[0][:10])
        print(opencl_output[0][:10])
        row = {'network_name': model_network.name,'network_sub_name':child_model_network, 'input':model_network.input, 'output':model_network.output, 'rocm_cost_time':rocm_duringtime,'opencl_cost_time':opencl_duringtime}
        for para in tol_paras:
            if np.allclose(rocm_output,opencl_output,rtol=para, atol=para):
                row["tolerance"] = para
                rows.append(row)
                print("%15s opencl network tolerance is %g"%(child_model_network,para))
                break
    import csv
    file_exist = False
    access_mode = 'w+'
    model_network_file = model_network.name+'_network_test_data.csv'
    if os.path.exists(model_network_file):
        file_exist = True
        access_mode = 'a+'
    with open(model_network_file,access_mode,encoding='utf-8',newline='') as f:
        writer = csv.DictWriter(f,header)
        if not file_exist :
            writer.writeheader()
        writer.writerows(rows)

for name,each_network in MODEL_NAME.items():
    print("-----------"+name+"----start----------------")
    main(each_network)
```
</details>
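The tolerance sweep at the end of the script can be illustrated in isolation. Below is a minimal plain-Python sketch of the same idea: walk `tol_paras` from tightest to loosest and report the first tolerance at which the two outputs agree. The `allclose` helper mirrors NumPy's criterion `|a - b| <= atol + rtol * |b|`, and the sample values are made up for illustration.

```python
def allclose(a, b, rtol, atol):
    """Element-wise closeness check, mirroring numpy.allclose for flat lists."""
    return all(abs(x - y) <= atol + rtol * abs(y) for x, y in zip(a, b))

def smallest_tolerance(a, b, tolerances):
    """Return the first (smallest) tolerance at which the outputs agree, or None."""
    for para in tolerances:
        if allclose(a, b, rtol=para, atol=para):
            return para
    return None

tol_paras = [1e-7, 1e-6, 1e-5, 1e-4, 1e-3, 1e-2]
rocm_out = [0.643478, -1.437010, -1.527026]    # made-up sample outputs
opencl_out = [0.643479, -1.437012, -1.527025]  # made-up sample outputs
print(smallest_tolerance(rocm_out, opencl_out, tol_paras))  # → 1e-06
```

If no tolerance in the list matches (as in the bug report, where the rocm and opencl outputs diverge badly), the function returns `None`, which corresponds to the missing `tolerance` column in the CSV.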
The result of the program is as follows.
![image](https://user-images.githubusercontent.com/22990858/209758590-f9a88ac9-fdee-452b-9012-7b72d6c270d8.png)
### Triage
Please refer to the list of label tags [here](https://github.com/apache/tvm/wiki/Issue-Triage-Labels) to find the relevant tags and add them below in a bullet format (example below).
* needs-triage
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@tvm.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [tvm] mvermeulen commented on issue #13666: [Bug] rocm platform result are not correct
Posted by GitBox <gi...@apache.org>.
mvermeulen commented on issue #13666:
URL: https://github.com/apache/tvm/issues/13666#issuecomment-1387643084
Tried this with the following docker image I built from latest ROCm:
```
docker pull mevermeulen/rocm-tvm:5.4.2
```
I didn't have OpenCL built into that image, so I compared with CPU execution, and I don't see an issue:
```
root@chilecito:/src/rocm-tvm/qa# python3 /home/mev/onnx_rocm.py
[0.5488135 0.71518934 0.60276335 0.5448832 0.4236548 0.6458941
0.4375872 0.891773 0.96366274 0.3834415 ]
[-0.22859086 -0.25806987 -0.43340546 0.4846983 -0.6018106 0.22698797
0.85465795 -0.9607101 0.5279621 -1.1830723 ]
[-0.22859041 -0.25806972 -0.43340546 0.4846975 -0.6018108 0.2269876
0.8546581 -0.9607104 0.527962 -1.1830723 ]
```
To compare against the CPU, I modified the last part of the program as follows:
```python
def main():
    np.random.seed(0)
    I_np = np.random.uniform(size = input_size).astype(dtype)
    print(I_np[0][0][0][:10])
    onnx_model = onnx.load("/home/mev/mnist-7.onnx")
    mod,params = relay.frontend.from_onnx(onnx_model,{"Input3":I_np.shape})
    rocm_output = build("rocm",mod = mod,params = params,input_name = input_name,input_data = I_np, input = I_np.shape, output = output_size)
    cpu_output = build("llvm",mod = mod,params = params,input_name = input_name,input_data = I_np, input = I_np.shape, output = output_size)
    # opencl_output = build("opencl",mod = mod,params = params,input_name = input_name,input_data = I_np, input = I_np.shape, output = output_size)
    print(rocm_output[0][:10])
    print(cpu_output[0][:10])
    # print(opencl_output[0][:10])
```
@wangzy0327 does my docker image work for you? If so, it is a spot you can use for comparison.
Also, can you cross-check that your ROCm installation and driver are properly installed? For example you can try:
```
prompt% rocminfo
prompt% cd /opt/rocm/share/hip/samples/0_Intro/square
prompt% make
prompt% cat square.out
```
[GitHub] [tvm] wangzy0327 commented on issue #13666: [Bug] rocm platform result are not correct
Posted by GitBox <gi...@apache.org>.
wangzy0327 commented on issue #13666:
URL: https://github.com/apache/tvm/issues/13666#issuecomment-1370942424
@masahi
The minimal test case is as follows.
<details>
<summary>onnx_rocm.py</summary>
```python
from csv import writer
from pyexpat import model
import onnx
#from tvm.driver import tvmc
import numpy as np
import tvm
import tvm.relay as relay
from tvm.contrib import graph_executor
import tvm.testing
import numpy as np
import os
import logging
logging.basicConfig(level=logging.ERROR)
import warnings
warnings.filterwarnings('ignore')

# onnx-model mnist-7
def build(target:str,mod:tvm.IRModule, params:dict, input_name:str, input_data:np.ndarray, input:tuple, output: tuple) -> np.ndarray:
    tgt = tvm.target.Target(target=target, host="llvm")
    with tvm.transform.PassContext(opt_level=3):
        lib = relay.build(mod, target=target, params=params)
    dev = tvm.device(str(target), 0)
    module = graph_executor.GraphModule(lib["default"](dev))
    module.set_input(input_name, input_data)
    module.run()
    output_shape = output
    tvm_output = module.get_output(0, tvm.nd.empty(output_shape)).numpy()
    return tvm_output

def main(model_network : NetworkData):
    # set the random seed
    np.random.seed(0)
    I_np = np.random.uniform(size = model_network.input).astype(dtype)
    print(I_np[0][0][0][:10])
    onnx_model = onnx.load("onnx-model/vision/classification/mnist/model/mnist-7.onnx")
    mod,params = relay.frontend.from_onnx(onnx_model,{"Input3":I_np.shape})
    rocm_output = build("rocm",mod = mod,params = params,input_name = model_network.input_name,input_data = I_np, input = I_np.shape, output = model_network.output)
    opencl_output = build("opencl",mod = mod,params = params,input_name = model_network.input_name,input_data = I_np, input = I_np.shape, output = model_network.output)
    print(rocm_output[0][:10])
    print(opencl_output[0][:10])

for name,each_network in MODEL_NAME.items():
    print("-----------"+name+"----start----------------")
    main(each_network)
```
</details>
![image](https://user-images.githubusercontent.com/22990858/210567401-b56415ed-c605-49d4-b427-b17427dd253f.png)
[GitHub] [tvm] wangzy0327 commented on issue #13666: [Bug] rocm platform result are not correct
Posted by "wangzy0327 (via GitHub)" <gi...@apache.org>.
wangzy0327 commented on issue #13666:
URL: https://github.com/apache/tvm/issues/13666#issuecomment-1495285831
> [mvermeulen](/mvermeulen)
@mvermeulen hello, I tried to run the docker image mevermeulen/rocm-tvm:5.4.2 with Python from an Anaconda virtual environment on the host OS, but I get the following error.
```
root@a67fdbf3dda4:/src/tvm/build# /home/wzy/anaconda3/envs/tvm/bin/python /home/wzy/tvm-sycl/onnx_rocm.py
Traceback (most recent call last):
File "/home/wzy/tvm-sycl/onnx_rocm.py", line 5, in <module>
import tvm
File "/src/tvm/python/tvm/__init__.py", line 26, in <module>
from ._ffi.base import TVMError, __version__, _RUNTIME_ONLY
File "/src/tvm/python/tvm/_ffi/__init__.py", line 28, in <module>
from .base import register_error
File "/src/tvm/python/tvm/_ffi/base.py", line 71, in <module>
_LIB, _LIB_NAME = _load_lib()
File "/src/tvm/python/tvm/_ffi/base.py", line 57, in _load_lib
lib = ctypes.CDLL(lib_path[0], ctypes.RTLD_GLOBAL)
File "/home/wzy/anaconda3/envs/tvm/lib/python3.7/ctypes/__init__.py", line 364, in __init__
self._handle = _dlopen(self._name, mode)
OSError: /home/wzy/anaconda3/envs/tvm/bin/../lib/libstdc++.so.6: version `GLIBCXX_3.4.29' not found (required by /src/tvm/build/libtvm.so)
```
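This error usually means the conda environment ships an older libstdc++ than the one `libtvm.so` was compiled against. One way to check is to compare the `GLIBCXX` symbol versions each copy provides (a diagnostic sketch; the conda path is taken from the traceback above, and the system path is an assumption that may differ on other setups):

```shell
# GLIBCXX versions provided by conda's libstdc++ (path from the traceback)
strings /home/wzy/anaconda3/envs/tvm/lib/libstdc++.so.6 | grep '^GLIBCXX' | sort -V | tail -3

# Compare with the system/docker libstdc++ that libtvm.so was linked against
strings /usr/lib/x86_64-linux-gnu/libstdc++.so.6 | grep '^GLIBCXX' | sort -V | tail -3
```

If `GLIBCXX_3.4.29` is missing from the conda copy, common workarounds are to use the docker image's own `python3` instead of the host conda interpreter, or to update the environment's libstdc++ package so it is at least as new as the one the docker image was built with.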
[GitHub] [tvm] wangzy0327 commented on issue #13666: [Bug] rocm platform result are not correct
Posted by GitBox <gi...@apache.org>.
wangzy0327 commented on issue #13666:
URL: https://github.com/apache/tvm/issues/13666#issuecomment-1372001733
Removed the NetworkData stuff. The "minimal" test which compares the two outputs between rocm and opencl is as follows.
<details>
<summary>onnx_rocm.py</summary>
```python
from pyexpat import model
import onnx
#from tvm.driver import tvmc
import numpy as np
import tvm
import tvm.relay as relay
from tvm.contrib import graph_executor
import tvm.testing
import numpy as np

dtype="float32"
common_prefix_str = "onnx-model/vision/classification/"
tol_paras = [1e-7,1e-6,1e-5,1e-4,1e-3,1e-2]
input_name = "Input3"
input_size = (1,1,28,28)
output_size = (1,10)

import logging
logging.basicConfig(level=logging.ERROR)
import warnings
warnings.filterwarnings('ignore')

def build(target:str,mod:tvm.IRModule, params:dict, input_name:str, input_data:np.ndarray, input:tuple, output: tuple) -> np.ndarray:
    tgt = tvm.target.Target(target=target, host="llvm")
    with tvm.transform.PassContext(opt_level=3):
        lib = relay.build(mod, target=target, params=params)
    dev = tvm.device(str(target), 0)
    module = graph_executor.GraphModule(lib["default"](dev))
    module.set_input(input_name, input_data)
    module.run()
    output_shape = output
    tvm_output = module.get_output(0, tvm.nd.empty(output_shape)).numpy()
    return tvm_output

def main():
    np.random.seed(0)
    I_np = np.random.uniform(size = input_size).astype(dtype)
    print(I_np[0][0][0][:10])
    onnx_model = onnx.load("onnx-model/vision/classification/mnist/model/mnist-7.onnx")
    mod,params = relay.frontend.from_onnx(onnx_model,{"Input3":I_np.shape})
    rocm_output = build("rocm",mod = mod,params = params,input_name = input_name,input_data = I_np, input = I_np.shape, output = output_size)
    opencl_output = build("opencl",mod = mod,params = params,input_name = input_name,input_data = I_np, input = I_np.shape, output = output_size)
    print(rocm_output[0][:10])
    print(opencl_output[0][:10])

main()
```
</details>
The output is as follows.
![image](https://user-images.githubusercontent.com/22990858/210751928-8590959c-731d-4480-9cd9-f91820d40126.png)
[GitHub] [tvm] masahi commented on issue #13666: [Bug] rocm platform result are not correct
Posted by "masahi (via GitHub)" <gi...@apache.org>.
masahi commented on issue #13666:
URL: https://github.com/apache/tvm/issues/13666#issuecomment-1495370643
see https://github.com/apache/tvm/issues/13666#issuecomment-1397176511
[GitHub] [tvm] masahi commented on issue #13666: [Bug] rocm platform result are not correct
Posted by GitBox <gi...@apache.org>.
masahi commented on issue #13666:
URL: https://github.com/apache/tvm/issues/13666#issuecomment-1372042675
Okay reproduced.
[GitHub] [tvm] masahi commented on issue #13666: [Bug] rocm platform result are not correct
Posted by GitBox <gi...@apache.org>.
masahi commented on issue #13666:
URL: https://github.com/apache/tvm/issues/13666#issuecomment-1396406356
@mvermeulen I tried to use your image but got the following error
```
# rocminfo
ROCk module is loaded
Unable to open /dev/kfd read-write: No such file or directory
Failed to get user name to check for video group membership
```
while the command works outside of docker.
Also, to use rocm 5 with TVM we need the bitcode file change in https://github.com/masahi/tvm/commit/26d2701b7823ab4d93b8d980bc8689e9c03b2ee1 (which is not merged to `main`). Is this a correct fix, and have you already integrated this change for rocm 5.4 testing?
[GitHub] [tvm] wangzy0327 commented on issue #13666: [Bug] rocm platform result are not correct
Posted by GitBox <gi...@apache.org>.
wangzy0327 commented on issue #13666:
URL: https://github.com/apache/tvm/issues/13666#issuecomment-1396327672
OK, I will try to run it later. Can you tell me which TVM code commit the docker image was built from?
[GitHub] [tvm] wangzy0327 commented on issue #13666: [Bug] rocm platform result are not correct
Posted by "wangzy0327 (via GitHub)" <gi...@apache.org>.
wangzy0327 commented on issue #13666:
URL: https://github.com/apache/tvm/issues/13666#issuecomment-1413100409
> Tried this with the following docker image I built from latest ROCm:
>
> ```
> docker pull mevermeulen/rocm-tvm:5.4.2
> ```
>
> I didn't have OpenCL built in that so I compared with CPU execution and I don't see an issue:
>
> ```
> root@chilecito:/src/rocm-tvm/qa# python3 /home/mev/onnx_rocm.py
> [0.5488135 0.71518934 0.60276335 0.5448832 0.4236548 0.6458941
> 0.4375872 0.891773 0.96366274 0.3834415 ]
> [-0.22859086 -0.25806987 -0.43340546 0.4846983 -0.6018106 0.22698797
> 0.85465795 -0.9607101 0.5279621 -1.1830723 ]
> [-0.22859041 -0.25806972 -0.43340546 0.4846975 -0.6018108 0.2269876
> 0.8546581 -0.9607104 0.527962 -1.1830723 ]
> ```
>
> To compare against the CPU, I modified the last part of the program as follows:
>
> ```
> np.random.seed(0)
> I_np = np.random.uniform(size = input_size).astype(dtype)
> print(I_np[0][0][0][:10])
> onnx_model = onnx.load("/home/mev/mnist-7.onnx")
> mod,params = relay.frontend.from_onnx(onnx_model,{"Input3":I_np.shape})
> rocm_output = build("rocm",mod = mod,params = params,input_name = input_name,input_data = I_np, input = I_np.shape, output = output_size)
> cpu_output = build("llvm",mod = mod,params = params,input_name = input_name,input_data = I_np, input = I_np.shape, output = output_size)
> # opencl_output = build("opencl",mod = mod,params = params,input_name = input_name,input_data = I_np, input = I_np.shape, output = output_size)
> print(rocm_output[0][:10])
> print(cpu_output[0][:10])
> # print(opencl_output[0][:10])
> ```
>
> @wangzy0327 does my docker work for you? If so, a spot you can use for comparison.
>
> Also can you cross check that your ROCm installation and driver is properly installed. For example you can try:
>
> ```
> prompt% rocminfo
>
> prompt% cd /opt/rocm/share/hip/samples/0_Intro/square
> prompt% make
> prompt% cat square.out
> ```
@mvermeulen
I compared the two targets `rocm` and `rocm -libs=miopen` on TVM v0.10.0 by running the following code.
<details>
<summary>onnx_rocm.py</summary>
```python
from pyexpat import model
import onnx
#from tvm.driver import tvmc
import numpy as np
import tvm
import tvm.relay as relay
from tvm.contrib import graph_executor
import tvm.testing
import numpy as np

dtype="float32"
common_prefix_str = "onnx-model/vision/classification/"
tol_paras = [1e-7,1e-6,1e-5,1e-4,1e-3,1e-2]
input_name = "Input3"
input_size = (1,1,28,28)
output_size = (1,10)

import logging
logging.basicConfig(level=logging.ERROR)
import warnings
warnings.filterwarnings('ignore')

def build(target:str,mod:tvm.IRModule, params:dict, input_name:str, input_data:np.ndarray, input:tuple, output: tuple) -> np.ndarray:
    tgt = tvm.target.Target(target=target, host="llvm")
    with tvm.transform.PassContext(opt_level=3):
        lib = relay.build(mod, target=target, params=params)
    dev = tvm.device(str(target), 0)
    module = graph_executor.GraphModule(lib["default"](dev))
    module.set_input(input_name, input_data)
    module.run()
    output_shape = output
    tvm_output = module.get_output(0, tvm.nd.empty(output_shape)).numpy()
    return tvm_output

def main():
    np.random.seed(0)
    I_np = np.random.uniform(size = input_size).astype(dtype)
    print(I_np[0][0][0][:10])
    onnx_model = onnx.load("onnx-model/vision/classification/mnist/model/mnist-7.onnx")
    mod,params = relay.frontend.from_onnx(onnx_model,{"Input3":I_np.shape})
    rocm_lib_output = build("rocm -libs=miopen",mod = mod,params = params,input_name = input_name,input_data = I_np, input = I_np.shape, output = output_size)
    rocm_output = build("rocm",mod = mod,params = params,input_name = input_name,input_data = I_np, input = I_np.shape, output = output_size)
    opencl_output = build("opencl",mod = mod,params = params,input_name = input_name,input_data = I_np, input = I_np.shape, output = output_size)
    print(rocm_output[0][:10])
    print(rocm_lib_output[0][:10])
    print(opencl_output[0][:10])

main()
```
</details>
![image](https://user-images.githubusercontent.com/22990858/216224311-04f5fa56-f038-4015-b578-8d6ae8fc7bdd.png)
I get an error on an AMD gfx908 device: `ValueError: Cannot find global function tvm.contrib.miopen.conv2d.setup`.
How can I fix it?
[GitHub] [tvm] masahi commented on issue #13666: [Bug] rocm platform result are not correct
Posted by "masahi (via GitHub)" <gi...@apache.org>.
masahi commented on issue #13666:
URL: https://github.com/apache/tvm/issues/13666#issuecomment-1404538202
OK, I don't know what is wrong with the original schedule, but using the other one fixes this issue: https://github.com/apache/tvm/pull/13847
[GitHub] [tvm] masahi commented on issue #13666: [Bug] rocm platform result are not correct
Posted by GitBox <gi...@apache.org>.
masahi commented on issue #13666:
URL: https://github.com/apache/tvm/issues/13666#issuecomment-1371979718
I said "minimal" test. Remove `NetworkData` stuff which is not even defined.
[GitHub] [tvm] wangzy0327 commented on issue #13666: [Bug] rocm platform result are not correct
Posted by "wangzy0327 (via GitHub)" <gi...@apache.org>.
wangzy0327 commented on issue #13666:
URL: https://github.com/apache/tvm/issues/13666#issuecomment-1413131336
> Try building with `USE_ROCBLAS=ON` and use `-libs=rocblas` (not miopen)
And here is the result output, which is not correct:
![image](https://user-images.githubusercontent.com/22990858/216230610-0bba0c6b-0d9b-4e27-b811-cbd48f2ed6da.png)
[GitHub] [tvm] wangzy0327 commented on issue #13666: [Bug] rocm platform result are not correct
Posted by "wangzy0327 (via GitHub)" <gi...@apache.org>.
wangzy0327 commented on issue #13666:
URL: https://github.com/apache/tvm/issues/13666#issuecomment-1413075362
@masahi Hi, I have tried to fix the code as in #13847 and run it on gfx908 (MI100).
<details>
<summary>onnx_rocm.py</summary>
```python
from pyexpat import model
import onnx
#from tvm.driver import tvmc
import numpy as np
import tvm
import tvm.relay as relay
from tvm.contrib import graph_executor
import tvm.testing
import numpy as np

dtype="float32"
common_prefix_str = "onnx-model/vision/classification/"
tol_paras = [1e-7,1e-6,1e-5,1e-4,1e-3,1e-2]
input_name = "Input3"
input_size = (1,1,28,28)
output_size = (1,10)

import logging
logging.basicConfig(level=logging.ERROR)
import warnings
warnings.filterwarnings('ignore')

def build(target:str,mod:tvm.IRModule, params:dict, input_name:str, input_data:np.ndarray, input:tuple, output: tuple) -> np.ndarray:
    tgt = tvm.target.Target(target=target, host="llvm")
    with tvm.transform.PassContext(opt_level=3):
        lib = relay.build(mod, target=target, params=params)
    dev = tvm.device(str(target), 0)
    module = graph_executor.GraphModule(lib["default"](dev))
    module.set_input(input_name, input_data)
    module.run()
    output_shape = output
    tvm_output = module.get_output(0, tvm.nd.empty(output_shape)).numpy()
    return tvm_output

def main():
    np.random.seed(0)
    I_np = np.random.uniform(size = input_size).astype(dtype)
    print(I_np[0][0][0][:10])
    onnx_model = onnx.load("onnx-model/vision/classification/mnist/model/mnist-7.onnx")
    mod,params = relay.frontend.from_onnx(onnx_model,{"Input3":I_np.shape})
    rocm_output = build("rocm",mod = mod,params = params,input_name = input_name,input_data = I_np, input = I_np.shape, output = output_size)
    opencl_output = build("opencl",mod = mod,params = params,input_name = input_name,input_data = I_np, input = I_np.shape, output = output_size)
    print(rocm_output[0][:10])
    print(opencl_output[0][:10])

main()
```
</details>
The result is as follows:
![image](https://user-images.githubusercontent.com/22990858/216219137-f8b39e1b-b4ac-4e99-9b51-ec2a8b58e019.png)
[GitHub] [tvm] wangzy0327 commented on issue #13666: [Bug] rocm platform result are not correct
Posted by "wangzy0327 (via GitHub)" <gi...@apache.org>.
wangzy0327 commented on issue #13666:
URL: https://github.com/apache/tvm/issues/13666#issuecomment-1413906080
> > I get the error on AMD gfx908 device . The error is ValueError:Cannot find global
> > function tvm.contrib.miopen.conv2d.setup .
> > How to fix it ?
>
> What is your setting for USE_MIOPEN configuration variable?
@mvermeulen
`set(USE_MIOPEN ON)`
[GitHub] [tvm] masahi commented on issue #13666: [Bug] rocm platform result are not correct
Posted by "masahi (via GitHub)" <gi...@apache.org>.
masahi commented on issue #13666:
URL: https://github.com/apache/tvm/issues/13666#issuecomment-1404527150
Interesting! I can also confirm that `-libs=rocblas` works. So this is specific to `dense` op. It's also interesting to hear that gfx906 works fine.
I'll look into what's wrong with dense codegen for rocm.
[GitHub] [tvm] mvermeulen commented on issue #13666: [Bug] rocm platform result are not correct
Posted by GitBox <gi...@apache.org>.
mvermeulen commented on issue #13666:
URL: https://github.com/apache/tvm/issues/13666#issuecomment-1397176511
@masahi most likely you are missing arguments when starting up the docker container. Here is how I run it:
```
docker run -it --device=/dev/dri --device=/dev/kfd --network=host --group-add=render -v /home/mev:/home/mev mevermeulen/rocm-tvm:5.4.2 /bin/bash
```
The `--device` options make sure the GPU devices are also available inside the docker image. When this is done, `/dev/kfd` is created and has read/write permissions for the "render" group. On my system I happened to run as root, so it worked anyway, but if I were running as a non-root user (either inside or outside the docker), I would want to be part of that group to get permission on the device files.
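The checks described above can be sketched as a few host-side commands (a diagnostic sketch; device paths as in the comment, and the `usermod` step is a hypothetical follow-up shown commented out):

```shell
# Verify the GPU device nodes exist on the host
ls -l /dev/kfd /dev/dri

# Check whether the current user belongs to the "render" group
id -nG | tr ' ' '\n' | grep -x render || echo "not in render group"

# If missing, add the user to the group and log in again, e.g.:
# sudo usermod -aG render "$USER"
```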
[GitHub] [tvm] masahi commented on issue #13666: [Bug] rocm platform result are not correct
Posted by GitBox <gi...@apache.org>.
masahi commented on issue #13666:
URL: https://github.com/apache/tvm/issues/13666#issuecomment-1397695640
@mvermeulen Thanks, I got it working, but the result is the same as in my non-docker execution: the LLVM and rocm results differ.
```
LLVM output
[-0.22859041 -0.25806972 -0.43340546 0.4846975 -0.6018108 0.2269876
0.8546581 -0.9607104 0.527962 -1.1830723 ]
rocm output
[ 0.64347816 -1.4370097 -1.527026 1.1573262 -0.03408854 -1.1726259
0.8344087 -1.231696 0.9886506 -1.0510902 ]
```
Also, running https://github.com/apache/tvm/blob/main/gallery/how_to/compile_models/from_pytorch.py inside the container (after changing the target to rocm) I get
```
Relay top-1 id: 277, class name: red fox, Vulpes vulpes
Torch top-1 id: 281, class name: tabby, tabby cat
```
which is an incorrect result (the same as the non-docker execution).
It would be interesting to know whether the result depends on the GPU. Mine is an RX 6600 XT, which is not officially supported.
[GitHub] [tvm] junrushao closed issue #13666: [Bug] rocm platform result are not correct
Posted by "junrushao (via GitHub)" <gi...@apache.org>.
junrushao closed issue #13666: [Bug] rocm platform result are not correct
URL: https://github.com/apache/tvm/issues/13666
[GitHub] [tvm] masahi commented on issue #13666: [Bug] rocm platform result are not correct
Posted by "masahi (via GitHub)" <gi...@apache.org>.
masahi commented on issue #13666:
URL: https://github.com/apache/tvm/issues/13666#issuecomment-1413128673
Try building with `USE_ROCBLAS=ON` and use `-libs=rocblas` (not miopen)
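Concretely, that means enabling rocBLAS in TVM's build configuration and then selecting it in the target string (a sketch, assuming the standard TVM build layout with `config.cmake`):

```shell
# From the TVM source tree: enable ROCm and rocBLAS in the build config.
mkdir -p build && cp cmake/config.cmake build/
echo 'set(USE_ROCM ON)'    >> build/config.cmake
echo 'set(USE_ROCBLAS ON)' >> build/config.cmake
cmake -S . -B build && cmake --build build --parallel
```

In the reproduction script, rocBLAS is then selected via the target, e.g. `target = tvm.target.Target("rocm -libs=rocblas", host="llvm")`.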
[GitHub] [tvm] masahi commented on issue #13666: [Bug] rocm platform result are not correct
Posted by GitBox <gi...@apache.org>.
masahi commented on issue #13666:
URL: https://github.com/apache/tvm/issues/13666#issuecomment-1370777170
Does https://github.com/apache/tvm/blob/main/apps/topi_recipe/gemm/cuda_gemm_square.py run successfully for you?
Also, please make your test case minimal. It is not obvious what it is doing at a quick glance.
[GitHub] [tvm] mvermeulen commented on issue #13666: [Bug] rocm platform result are not correct
Posted by "mvermeulen (via GitHub)" <gi...@apache.org>.
mvermeulen commented on issue #13666:
URL: https://github.com/apache/tvm/issues/13666#issuecomment-1404483929
@masahi - some further characterization:
1. Using Radeon VII (gfx906): both onnx_rocm.py and from_pytorch.py work as expected. In particular, the relay and torch graphs both report id 281, and the other results match those reported above.
2. Using RX 6900 XT (gfx1030): I see a failure similar to what you report above. However, if I change the target specification to ```target = tvm.target.Target("rocm -libs=rocblas", host="llvm")```, it behaves the same as the Radeon VII.
3. Using RX 6800M (gfx1031): I additionally need to set the environment variable ```HSA_OVERRIDE_GFX_VERSION=10.3.0```; with ```-libs=rocblas``` the tests then pass as well.
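Point 3 can be sketched in Python; the override must be set before the ROCm runtime initializes, i.e. before the first TVM/ROCm call (an illustrative sketch, not part of the original script):

```python
import os

# HSA_OVERRIDE_GFX_VERSION makes the HSA runtime treat an unsupported ISA
# (here gfx1031) as a supported one (gfx1030). It must be in the environment
# before the ROCm runtime starts, so set it before importing/initializing TVM.
os.environ["HSA_OVERRIDE_GFX_VERSION"] = "10.3.0"

# Then, for example:
#   import tvm
#   target = tvm.target.Target("rocm -libs=rocblas", host="llvm")
```

Alternatively, export the variable in the shell before launching Python.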
[GitHub] [tvm] mvermeulen commented on issue #13666: [Bug] rocm platform result are not correct
Posted by "mvermeulen (via GitHub)" <gi...@apache.org>.
mvermeulen commented on issue #13666:
URL: https://github.com/apache/tvm/issues/13666#issuecomment-1413902505
> I get the error on an AMD gfx908 device. The error is `ValueError: Cannot find global function tvm.contrib.miopen.conv2d.setup`.
> How can I fix it?
What is your setting for the USE_MIOPEN configuration variable?
[GitHub] [tvm] wangzy0327 commented on issue #13666: [Bug] rocm platform result are not correct
Posted by "wangzy0327 (via GitHub)" <gi...@apache.org>.
wangzy0327 commented on issue #13666:
URL: https://github.com/apache/tvm/issues/13666#issuecomment-1495361255
@mvermeulen I ran ```rocminfo``` in the rocm-tvm docker container, but got the error below.
```
root@0e7c68384def:/opt/rocm/bin# ./rocminfo
ROCk module is loaded
Unable to open /dev/kfd read-write: No such file or directory
Failed to get user name to check for video group membership
```
[GitHub] [tvm] wangzy0327 commented on issue #13666: [Bug] rocm platform result are not correct
Posted by GitBox <gi...@apache.org>.
wangzy0327 commented on issue #13666:
URL: https://github.com/apache/tvm/issues/13666#issuecomment-1372045435
> Okay reproduced.
How to solve this problem?