Posted to discuss-archive@tvm.apache.org by adb via TVM Discuss <no...@discuss.tvm.ai> on 2020/04/02 21:47:13 UTC

[TVM Discuss] [Questions] Heterogeneous RNNs & Quantization Accuracy (an interesting case)


While post-training quantization of float32 hidden/cell states to int8 remains an open research topic, one workaround we've found is to compute the hidden states at higher precision on the CPU, rather than on the low-precision accelerator, in order to meet our accuracy requirements.

From a frontend perspective, this might mean that when an RNN op is encountered and decomposed into smaller Relay ops, the hidden- and cell-state computations are annotated to fall back to the CPU.

Alternatively, this could be a Relay pass that annotates every part of an RNN for offloading to the accelerator except the hidden-state computations, which remain on the CPU. It isn't obvious to me whether this can be achieved with the current passes; could someone with more Relay experience share their thoughts?





---
[Visit Topic](https://discuss.tvm.ai/t/heterogeneous-rnns-quantization-accuracy-an-interesting-case/6191/1) to respond.
