Posted to commits@tvm.apache.org by "MingkangW (via GitHub)" <gi...@apache.org> on 2024/02/26 05:35:22 UTC
[I] [Bug] int8 slower than float32 on GPU [tvm]
MingkangW opened a new issue, #16643:
URL: https://github.com/apache/tvm/issues/16643
### Expected behavior
The QNN paper states that TVM can accelerate INT8 models on the GPU, so I expected the INT8 convolution to run at least as fast as the float32 one.
### Actual behavior
However, my tests did not reproduce this speedup: the INT8 convolution is much slower than float32.
### Environment
TVM Version: Unity
GPU: NVIDIA GeForce RTX 3090
OS: Ubuntu
### Steps to reproduce
I used the following script to benchmark the same convolution with INT8 and with float32 inputs.
```python
import tvm
from tvm import relay
import numpy as np
from tvm.contrib import graph_executor
# INT8 convolution: int8 data/weights with int32 accumulation
data = relay.var(shape=(1, 3, 2048, 2048), name_hint="data", dtype="int8")
weight = relay.const(value=((np.random.uniform(size=(64, 3, 3, 3)) * 127).astype("int8")), dtype="int8")
conv = relay.nn.conv2d(data, weight, channels=64, kernel_size=(3, 3), out_dtype="int32")
f = relay.function.Function([data, ], conv)
mod = tvm.IRModule.from_expr(f)
mod = relay.transform.InferType()(mod)
print(mod)
target = tvm.target.cuda(arch="sm_86")
lib = relay.build(mod, target=target)
dev = tvm.cuda()
runtime = graph_executor.GraphModule(lib['default'](dev))
res = runtime.benchmark(dev)
print(res.mean)
# float32 convolution: identical shapes and kernel for comparison
data = relay.var(shape=(1, 3, 2048, 2048), name_hint="data", dtype="float32")
weight = relay.const(value=((np.random.uniform(size=(64, 3, 3, 3)) * 127)), dtype="float32")
conv = relay.nn.conv2d(data, weight, channels=64, kernel_size=(3, 3), out_dtype="float32")
f = relay.function.Function([data, ], conv)
mod = tvm.IRModule.from_expr(f)
mod = relay.transform.InferType()(mod)
print(mod)
target = tvm.target.cuda(arch="sm_86")
lib = relay.build(mod, target=target)
dev = tvm.cuda()
runtime = graph_executor.GraphModule(lib['default'](dev))
res = runtime.benchmark(dev)
print(res.mean)
```
The output is:
```
def @main(%data: Tensor[(1, 3, 2048, 2048), int8] /* ty=Tensor[(1, 3, 2048, 2048), int8] */) -> Tensor[(1, 64, 2046, 2046), int32] {
nn.conv2d(%data, meta[relay.Constant][0] /* ty=Tensor[(64, 3, 3, 3), int8] */, padding=[0, 0, 0, 0], channels=64, kernel_size=[3, 3], out_dtype="int32") /* ty=Tensor[(1, 64, 2046, 2046), int32] */
}
One or more operators have not been tuned. Please tune your model for better performance. Use DEBUG logging level to see more details.
0.23173750480000002
def @main(%data: Tensor[(1, 3, 2048, 2048), float32] /* ty=Tensor[(1, 3, 2048, 2048), float32] */) -> Tensor[(1, 64, 2046, 2046), float32] {
nn.conv2d(%data, meta[relay.Constant][0] /* ty=Tensor[(64, 3, 3, 3), float32] */, padding=[0, 0, 0, 0], channels=64, kernel_size=[3, 3], out_dtype="float32") /* ty=Tensor[(1, 64, 2046, 2046), float32] */
}
0.00252002296
```
The INT8 convolution takes about 0.232 s, while the float32 one takes about 0.0025 s, so INT8 is roughly 90x slower than float32.
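As a quick sanity check, the slowdown factor follows directly from the two mean latencies that `runtime.benchmark()` printed above (a minimal snippet using those reported numbers):

```python
# Mean latencies in seconds, copied from the benchmark output above
int8_mean = 0.23173750480000002
fp32_mean = 0.00252002296

# Ratio of the two means: how many times slower the INT8 run was
slowdown = int8_mean / fp32_mean
print(f"INT8 is {slowdown:.1f}x slower than float32")  # roughly 92x
```

Note that the INT8 run also printed the "One or more operators have not been tuned" warning, so one plausible explanation (an assumption, not confirmed here) is that the float32 conv2d hits a well-optimized default CUDA schedule while the int8 variant falls back to a naive one.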
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@tvm.apache.org.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org