Posted to discuss-archive@tvm.apache.org by jonso via TVM Discuss <no...@discuss.tvm.ai> on 2020/04/02 19:55:03 UTC
[TVM Discuss] [Questions] CUDA FP16 example
Is converting a model to FP16 with target = "cuda" supported? If so, is there an example pass I could look at to convert my model?
cc @vinx13 @Hzfengsy
Thanks!
---
[Visit Topic](https://discuss.tvm.ai/t/cuda-fp16-example/6190/1) to respond.
Posted by jonso via TVM Discuss <no...@discuss.tvm.ai>.
Got it. Is there any plan to do this in the future?
---
[Visit Topic](https://discuss.tvm.ai/t/cuda-fp16-example/6190/4) to respond.
Posted by jonso via TVM Discuss <no...@discuss.tvm.ai>.
Thanks a lot. I've been playing around with this on a BERT model, but I'm hitting some issues when calling `relay.build` with opt level 3. The target is `cuda`. The error message looks like this:
```
unresolved intrinsic sqrt with return type float16x4
```
It comes from `codegen_c.cc`. Does this mean that `sqrt` isn't supported with float16?
---
[Visit Topic](https://discuss.tvm.ai/t/cuda-fp16-example/6190/10) to respond.
Posted by Josh Fromm via TVM Discuss <no...@discuss.tvm.ai>.
Unfortunately I haven't really tested the impact on accuracy; I've been using this pass primarily for performance measurements. I think it should be fairly easy to modify to improve the conversion logic, though.
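If you want a rough feel for the rounding error a plain FP32→FP16 round-trip introduces (independent of TVM or CUDA's `__float2half`), NumPy makes it easy to check. This is just an illustrative sketch, not a measurement of the pass itself:

```
import numpy as np

rng = np.random.default_rng(0)
# Values safely inside fp16's normal range, so no overflow/underflow effects.
x = rng.uniform(0.5, 2.0, size=10_000).astype("float32")

# Round-trip through half precision and measure the relative error.
roundtrip = x.astype("float16").astype("float32")
rel_err = np.abs(x - roundtrip) / np.abs(x)
print(rel_err.max())  # bounded by fp16's unit roundoff, 2**-11 ~= 4.9e-4
```

On real models the end-to-end accuracy hit also depends on where the casts land (e.g. around reductions like softmax), so numbers like these are only a lower bound on what to expect.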
---
[Visit Topic](https://discuss.tvm.ai/t/cuda-fp16-example/6190/7) to respond.
Posted by jonso via TVM Discuss <no...@discuss.tvm.ai>.
Awesome, thanks a lot @jwfromm! Do you have any experience in how this impacts accuracy? For example, I know that CUDA's `__float2half` function has a decent amount of logic.
---
[Visit Topic](https://discuss.tvm.ai/t/cuda-fp16-example/6190/6) to respond.
Posted by Josh Fromm via TVM Discuss <no...@discuss.tvm.ai>.
@jonso here's a little Relay pass I've been using to downcast a model from FP32 to FP16. I don't think the target really matters, as you would apply this pass before compiling to CUDA.
```
from tvm.ir import IRModule
from tvm.relay import transform as _transform
from tvm.relay import cast, Call, Constant, Function, TupleGetItem, Var
from tvm.relay.expr_functor import ExprMutator


def downcast_fp16(func):
    # pylint: disable=line-too-long
    """Downcast to fp16 mutator.

    Parameters
    ----------
    func : Function
        The original graph.

    Returns
    -------
    The graph after downcasting to half-precision floating-point.
    """
    # get_valid_counts and non_max_suppression do not support fp16,
    # so we create a filter list for them.
    filter_list = ['vision.get_valid_counts', 'vision.non_max_suppression']

    class DowncastMutator(ExprMutator):
        """Downcast to fp16 mutator"""
        def visit_call(self, call):
            dtype = 'float32' if call.op.name in filter_list else 'float16'
            new_fn = self.visit(call.op)
            # Collect the original dtypes
            type_list = []
            if call.op.name in filter_list:
                # For nms
                for arg in call.args:
                    if isinstance(arg, TupleGetItem) and isinstance(arg.tuple_value, Call):
                        tuple_types = arg.tuple_value.checked_type.fields
                        type_list.append(tuple_types[arg.index].dtype)
                if call.op.name == 'vision.get_valid_counts':
                    tuple_types = call.checked_type.fields
                    for cur_type in tuple_types:
                        type_list.append(cur_type.dtype)

            args = [self.visit(arg) for arg in call.args]
            new_args = list()
            arg_idx = 0
            for arg in args:
                if isinstance(arg, (Var, Constant)):
                    new_args.append(cast(arg, dtype=dtype))
                else:
                    if call.op.name in filter_list:
                        if isinstance(arg, TupleGetItem) and type_list[arg_idx] == 'int32':
                            new_args.append(arg)
                        else:
                            new_args.append(cast(arg, dtype=dtype))
                    else:
                        new_args.append(arg)
                arg_idx += 1
            if call.op.name in filter_list and call.op.name != 'vision.get_valid_counts':
                return cast(Call(new_fn, new_args, call.attrs), dtype='float16')
            return Call(new_fn, new_args, call.attrs)

    class UpcastMutator(ExprMutator):
        """Upcast output back to fp32 mutator"""
        def visit_call(self, call):
            return cast(call, dtype='float32')

    def infer_type(expr):
        """Infer the type of an intermediate node in the relay graph"""
        mod = IRModule.from_expr(expr)
        mod = _transform.InferType()(mod)
        entry = mod["main"]
        return entry if isinstance(expr, Function) else entry.body

    func = infer_type(func)
    downcast_pass = DowncastMutator()
    func = downcast_pass.visit(func)
    upcast_pass = UpcastMutator()
    func = upcast_pass.visit(func)
    func = infer_type(func)
    return func
```
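If you want to see the shape of the pass without a TVM install, the core idea (recursively downcast every call except a small filter list, then cast the final output back to FP32) can be sketched on a toy expression tree. The classes and helpers below are hypothetical stand-ins, not TVM APIs:

```
from dataclasses import dataclass, field

# Hypothetical stand-in for a Relay call node (not a TVM class).
@dataclass
class ToyCall:
    op: str
    dtype: str = "float32"
    args: list = field(default_factory=list)

FILTER_LIST = {"vision.get_valid_counts", "vision.non_max_suppression"}

def downcast(node):
    """Recursively switch every call to float16, except filtered ops."""
    new_args = [downcast(a) for a in node.args]
    dtype = "float32" if node.op in FILTER_LIST else "float16"
    return ToyCall(node.op, dtype, new_args)

def upcast_output(node):
    """Cast the graph output back to float32, mirroring UpcastMutator."""
    return ToyCall("cast", "float32", [node])

graph = ToyCall("sqrt", args=[ToyCall("nn.dense")])
out = upcast_output(downcast(graph))
assert out.dtype == "float32"          # the boundary stays fp32
assert out.args[0].dtype == "float16"  # the interior runs in fp16
```

The real pass does the same walk with `ExprMutator`, plus the extra bookkeeping needed because ops like `get_valid_counts` return tuples with mixed (e.g. `int32`) field types.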
---
[Visit Topic](https://discuss.tvm.ai/t/cuda-fp16-example/6190/5) to respond.
Posted by Josh Fromm via TVM Discuss <no...@discuss.tvm.ai>.
Assuming you're looking for the low-level code, you can find the CUDA cast generator in `tvm/src/target/source/codegen_cuda.cc`, in the `VisitExpr(const CastNode* op, std::ostream& os)` function. However, you probably want to do the casting ahead of time in Relay rather than on device. If you use a pass like the one I posted above, you convert the operations in the graph before they're compiled.
---
[Visit Topic](https://discuss.tvm.ai/t/cuda-fp16-example/6190/9) to respond.
Posted by Wuwei Lin via TVM Discuss <no...@discuss.tvm.ai>.
also cc @anijain2305 @xyzhou
---
[Visit Topic](https://discuss.tvm.ai/t/cuda-fp16-example/6190/2) to respond.
Posted by Siyuan Feng via TVM Discuss <no...@discuss.tvm.ai>.
Unfortunately, I'm afraid there isn't. We usually use native fp16 models directly.
---
[Visit Topic](https://discuss.tvm.ai/t/cuda-fp16-example/6190/3) to respond.
Posted by jonso via TVM Discuss <no...@discuss.tvm.ai>.
I am trying to find the actual implementation of `Cast` for each device, but am having trouble finding it. @jwfromm do you know where it is?
---
[Visit Topic](https://discuss.tvm.ai/t/cuda-fp16-example/6190/8) to respond.