[TVM Discuss] [Questions] CUDA FP16 example

Posted by jonso via TVM Discuss <no...@discuss.tvm.ai> on 2020/04/02 19:55:03 UTC.


Is converting a model to FP16 with target = "cuda" supported? If so, is there an example pass I could look at to convert my model?

cc @vinx13 @Hzfengsy  

Thanks!





---
[Visit Topic](https://discuss.tvm.ai/t/cuda-fp16-example/6190/1) to respond.


Posted by jonso via TVM Discuss <no...@discuss.tvm.ai>.

Got it. Is there any plan to do this in the future?





---
[Visit Topic](https://discuss.tvm.ai/t/cuda-fp16-example/6190/4) to respond.


Posted by jonso via TVM Discuss <no...@discuss.tvm.ai>.

Thanks a lot. I've been playing around with this on a BERT model, but I'm hitting some issues when calling `relay.build` with opt level 3. The target is `cuda`. The error message looks like this:

```
unresolved intrinsic sqrt with return type float16x4
```

It comes from `codegen_c.cc`. Does this mean that `sqrt` isn't supported with float16?
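
(For anyone trying to reproduce this, here's a minimal sketch that can trigger the same class of error; the schedule below is illustrative, not taken from the thread. Vectorizing an fp16 `sqrt` by a factor of 4 asks the codegen for a `float16x4` intrinsic:)

```
import tvm
from tvm import te

# Hypothetical repro: a vectorized fp16 sqrt lowers to a float16x4
# intrinsic call that the CUDA codegen may fail to resolve.
n = 1024
A = te.placeholder((n,), dtype="float16", name="A")
B = te.compute((n,), lambda i: te.sqrt(A[i]), name="B")
s = te.create_schedule(B.op)
xo, xi = s[B].split(B.op.axis[0], factor=4)
s[B].vectorize(xi)  # produces a float16x4 sqrt
xoo, xoi = s[B].split(xo, factor=64)
s[B].bind(xoo, te.thread_axis("blockIdx.x"))
s[B].bind(xoi, te.thread_axis("threadIdx.x"))
f = tvm.build(s, [A, B], target="cuda")  # may raise "unresolved intrinsic"
```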





---
[Visit Topic](https://discuss.tvm.ai/t/cuda-fp16-example/6190/10) to respond.


Posted by Josh Fromm via TVM Discuss <no...@discuss.tvm.ai>.

Unfortunately I haven't really tested the impact on accuracy; I've been using this pass primarily for performance measurements. I think it should be fairly easy to modify the conversion logic to improve it, though.





---
[Visit Topic](https://discuss.tvm.ai/t/cuda-fp16-example/6190/7) to respond.


Posted by jonso via TVM Discuss <no...@discuss.tvm.ai>.

Awesome, thanks a lot @jwfromm! Do you have any experience with how this impacts accuracy? For example, I know that CUDA's `__float2half` function has a decent amount of logic in it.
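
(For context, the accuracy concern comes down to fp32-to-fp16 rounding; a quick NumPy sketch of the error a single cast introduces:)

```
import numpy as np

# fp32 -> fp16 rounds to the nearest representable half-precision value;
# 0.1 is exact in neither format, and the downcast adds its own error.
x = np.float32(0.1)
h = np.float16(x)
print(float(h))             # 0.0999755859375
print(float(x) - float(h))  # ~2.4e-05 rounding error from the downcast
```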





---
[Visit Topic](https://discuss.tvm.ai/t/cuda-fp16-example/6190/6) to respond.


Posted by Josh Fromm via TVM Discuss <no...@discuss.tvm.ai>.

@jonso here's a little Relay pass I've been using to downcast a model from FP32 to FP16. I don't think the target really matters, as you would apply this pass before compiling to CUDA.

```
from tvm.ir import IRModule
from tvm.relay import transform as _transform
from tvm.relay import cast, Call, Constant, Function, TupleGetItem, Var
from tvm.relay.expr_functor import ExprMutator

def downcast_fp16(func):
    # pylint: disable=line-too-long
    """Downcast to fp16 mutator.

    Parameters
    ----------
    func : Function
        The original graph.

    Returns
    -------
    The graph after downcasting to half-precision floating-point.
    """
    # get_valid_counts and non_max_suppression do not support fp16,
    # so we keep them in fp32 via a filter list.
    filter_list = ['vision.get_valid_counts', 'vision.non_max_suppression']

    class DowncastMutator(ExprMutator):
        """Downcast to fp16 mutator"""
        def visit_call(self, call):
            dtype = 'float32' if call.op.name in filter_list else 'float16'
            new_fn = self.visit(call.op)
            # Collect the original dtypes
            type_list = []
            if call.op.name in filter_list:
                # For nms
                for arg in call.args:
                    if isinstance(arg, TupleGetItem) and isinstance(arg.tuple_value, Call):
                        tuple_types = arg.tuple_value.checked_type.fields
                        type_list.append(tuple_types[arg.index].dtype)
                if call.op.name == 'vision.get_valid_counts':
                    tuple_types = call.checked_type.fields
                    for cur_type in tuple_types:
                        type_list.append(cur_type.dtype)

            args = [self.visit(arg) for arg in call.args]
            new_args = list()
            arg_idx = 0
            for arg in args:
                if isinstance(arg, (Var, Constant)):
                    new_args.append(cast(arg, dtype=dtype))
                else:
                    if call.op.name in filter_list:
                        # Keep int32 tuple fields (e.g. valid counts) uncast
                        if isinstance(arg, TupleGetItem) and type_list[arg_idx] == 'int32':
                            new_args.append(arg)
                        else:
                            new_args.append(cast(arg, dtype=dtype))
                    else:
                        new_args.append(arg)
                arg_idx += 1
            if call.op.name in filter_list and call.op.name != 'vision.get_valid_counts':
                return cast(Call(new_fn, new_args, call.attrs), dtype='float16')
            return Call(new_fn, new_args, call.attrs)

    class UpcastMutator(ExprMutator):
        """Upcast the output back to fp32"""
        def visit_call(self, call):
            # Only wraps the outermost call; children are left untouched.
            return cast(call, dtype='float32')

    def infer_type(expr):
        """Infer the type of an intermediate node in the Relay graph"""
        mod = IRModule.from_expr(expr)
        mod = _transform.InferType()(mod)
        entry = mod["main"]
        return entry if isinstance(expr, Function) else entry.body

    func = infer_type(func)
    downcast_pass = DowncastMutator()
    func = downcast_pass.visit(func)
    upcast_pass = UpcastMutator()
    func = upcast_pass.visit(func)
    func = infer_type(func)
    return func
```
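
For reference, here's a minimal sketch of applying the pass end to end; the toy graph, shapes, and build flow below are placeholders (not from the thread), and it assumes a TVM version where `tvm.transform.PassContext` is available:

```
import tvm
from tvm import relay

# Hypothetical toy model standing in for a real imported network.
x = relay.var("x", shape=(1, 64), dtype="float32")
w = relay.var("w", shape=(64, 64), dtype="float32")
func = relay.Function([x, w], relay.nn.dense(x, w))

# Downcast to fp16 first, then compile for CUDA as usual.
func = downcast_fp16(func)
mod = tvm.IRModule.from_expr(func)
with tvm.transform.PassContext(opt_level=3):
    lib = relay.build(mod, target="cuda")
```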





---
[Visit Topic](https://discuss.tvm.ai/t/cuda-fp16-example/6190/5) to respond.


Posted by Josh Fromm via TVM Discuss <no...@discuss.tvm.ai>.

Assuming you're looking for the low-level code, you can find the CUDA cast generator in `tvm/src/target/source/codegen_cuda.cc`, in the `VisitExpr(const CastNode* op, std::ostream& os)` function. However, you probably want to do the casting ahead of time in Relay rather than on device. If you use a pass like the one I posted above, you convert the operations in the graph before they're compiled.
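
(To make the Relay-level alternative concrete, a tiny sketch with a placeholder graph; the casts are inserted before compilation:)

```
import tvm
from tvm import relay

# The casts become ordinary ops in the graph and are compiled with it,
# so no conversion decisions are left to the device codegen.
x = relay.var("x", shape=(1, 16), dtype="float32")
h = relay.cast(x, "float16")    # downcast input
h = relay.nn.relu(h)            # compute in fp16
out = relay.cast(h, "float32")  # upcast output
mod = tvm.IRModule.from_expr(relay.Function([x], out))
lib = relay.build(mod, target="cuda")
```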





---
[Visit Topic](https://discuss.tvm.ai/t/cuda-fp16-example/6190/9) to respond.


Posted by Wuwei Lin via TVM Discuss <no...@discuss.tvm.ai>.

also cc @anijain2305 @xyzhou





---
[Visit Topic](https://discuss.tvm.ai/t/cuda-fp16-example/6190/2) to respond.


Posted by Siyuan Feng via TVM Discuss <no...@discuss.tvm.ai>.

Unfortunately, I'm afraid there isn't. We usually use native fp16 models directly.
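
(A native-fp16 flow might look like this hedged sketch, assuming a model already exported in fp16 by its framework; the ONNX file name and input name are placeholders:)

```
import onnx
import tvm
from tvm import relay

# Load a model the framework already exported in fp16.
model = onnx.load("model_fp16.onnx")  # hypothetical file
mod, params = relay.frontend.from_onnx(
    model, shape={"input": (1, 3, 224, 224)}, dtype="float16")
lib = relay.build(mod, target="cuda", params=params)
```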





---
[Visit Topic](https://discuss.tvm.ai/t/cuda-fp16-example/6190/3) to respond.


Posted by jonso via TVM Discuss <no...@discuss.tvm.ai>.

I'm trying to find the actual implementation of `Cast` for each device, but I'm having trouble locating it. @jwfromm, do you know where it is?





---
[Visit Topic](https://discuss.tvm.ai/t/cuda-fp16-example/6190/8) to respond.
