
Posted to discuss-archive@tvm.apache.org by Junru Shao via Apache TVM Discuss <no...@discuss.tvm.ai> on 2021/05/06 18:12:56 UTC

[Apache TVM Discuss] [Questions] Do we have any way to process codegen with more fine grade control?


We have made a similar observation: LLVM cannot produce exactly what we want when very low-level control is needed (e.g. registers, pipeline depth, etc.). One way to obtain fine-grained control is to embed TVM intrinsics that can be lowered to ASM.

BTW, if you would like to play around with TIR, you might be interested in the new round-trippable TVM script that @spectrometerHBH and @Hzfengsy developed (API: `tvm.script.asscript`, `tvm.script.tir`). We can print out the IR, manually edit it, and then parse it back. This means we are not limited to the existing schedule primitives and can control the TIR at any stage of the pass pipeline.
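To make the print, edit, parse-back workflow concrete, here is a minimal toy sketch in plain Python. It does not use the TVM API at all: `Loop`, `Nest`, `to_script` and `from_script` are hypothetical names invented for this illustration, and the text format is made up. It only shows the round-trip idea, assuming a loop nest is the whole IR.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Loop:
    var: str      # loop variable name
    extent: int   # trip count
    kind: str     # "serial", "unroll", ... (toy annotation)


@dataclass
class Nest:
    loops: List[Loop]  # outermost loop first
    body: str          # innermost statement, kept as opaque text


def to_script(nest: Nest) -> str:
    """Print the toy IR as indented text."""
    lines = []
    for depth, loop in enumerate(nest.loops):
        lines.append("  " * depth + f"for {loop.var} in {loop.kind}({loop.extent}):")
    lines.append("  " * len(nest.loops) + nest.body)
    return "\n".join(lines)


def from_script(text: str) -> Nest:
    """Parse text produced by to_script (possibly hand-edited) back into the toy IR."""
    loops, body = [], None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("for "):
            # Expected shape: "for i in serial(4):"
            var, call = line[4:].rstrip(":").split(" in ")
            kind, extent = call.rstrip(")").split("(")
            loops.append(Loop(var.strip(), int(extent), kind.strip()))
        else:
            body = line
    if body is None:
        raise ValueError("no body statement found")
    return Nest(loops, body)


nest = Nest(
    [Loop("i", 4, "serial"), Loop("j", 4, "serial"), Loop("k", 16, "serial")],
    "C[i, j] += A[i, k] * B[k, j]",
)
text = to_script(nest)

# "Manual" edit on the printed form: change how the j loop is annotated.
edited = text.replace("for j in serial(4):", "for j in unroll(4):")
new_nest = from_script(edited)
```

The point of the sketch is only that the edit happens on the printed text rather than through a fixed set of transformation functions; in TVM the same role would be played by the script printer and parser mentioned above.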





---
[Visit Topic](https://discuss.tvm.apache.org/t/do-we-have-any-way-to-process-codegen-with-more-fine-grade-control/9908/3) to respond.

You are receiving this because you enabled mailing list mode.

To unsubscribe from these emails, [click here](https://discuss.tvm.apache.org/email/unsubscribe/0c06106ab97cfd09d7602b665b5998bb170175168e61437f26f351bee4e3be8c).


Posted by Zhao Wu via Apache TVM Discuss <no...@discuss.tvm.ai>.


Yeah, it is unfriendly for Ansor. However, I don't think the two are contradictory. We cannot expect to generate the same asm as ACL, but we can expect to achieve the same optimizations. In your example, we cannot easily do the `register blocking` optimization, but we can expect to get the `FMA` optimization as ACL does, so that we also generate `fmla` correctly. For the CPU part, in my opinion, even if we cannot generate the same asm snippet, we may still reach the same level of performance if we can generate key instructions like `fmla`. If we cannot, there must be some factor we are overlooking, perhaps unfriendly memory access that leads to a high cache miss rate, or something else.

Back to Ansor: of course we should improve Ansor's performance. However, for the most performance-critical GEMM micro-kernel part, I think the most practical way at the moment is to leverage a micro GEMM kernel (4x4/8x8) and let Ansor or MetaSchedule schedule the other parts (tiling parameters, unrolling, parallelization, and so on).
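The split described above can be sketched in plain Python. This is only an illustration of the structure, not TVM or ACL code: `micro_kernel_4x4` and `gemm` are hypothetical names, the 4x4 block stands in for a hand-written kernel, and `tile_m` / `tile_n` stand in for the parameters a tuner would search over. It assumes `M` and `N` are multiples of 4.

```python
def micro_kernel_4x4(A, B, C, i0, j0, K):
    """Fixed 4x4 micro-kernel: keeps a 4x4 block of C in local accumulators
    (the role registers play in a hand-written kernel)."""
    acc = [[0.0] * 4 for _ in range(4)]
    for k in range(K):
        a = [A[i0 + i][k] for i in range(4)]   # 4 values from a column of A
        b = [B[k][j0 + j] for j in range(4)]   # 4 values from a row of B
        for i in range(4):
            for j in range(4):
                acc[i][j] += a[i] * b[j]       # one multiply-add per accumulator
    for i in range(4):
        for j in range(4):
            C[i0 + i][j0 + j] += acc[i][j]


def gemm(A, B, M, N, K, tile_m=8, tile_n=8):
    """Outer loops are the tunable part; the inner 4x4 block is fixed."""
    assert M % 4 == 0 and N % 4 == 0, "sketch assumes M and N are multiples of 4"
    assert tile_m % 4 == 0 and tile_n % 4 == 0, "tiles must hold whole 4x4 blocks"
    C = [[0.0] * N for _ in range(M)]
    for io in range(0, M, tile_m):              # outer tiling over rows
        for jo in range(0, N, tile_n):          # outer tiling over columns
            for i0 in range(io, min(io + tile_m, M), 4):
                for j0 in range(jo, min(jo + tile_n, N), 4):
                    micro_kernel_4x4(A, B, C, i0, j0, K)
    return C
```

Changing `tile_m` and `tile_n` changes the order in which 4x4 blocks are visited but not the result, which is the kind of freedom that could be left to the auto-scheduler while the innermost block stays hand-tuned.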





---
[Visit Topic](https://discuss.tvm.apache.org/t/do-we-have-any-way-to-process-codegen-with-more-fine-grade-control/9908/7) to respond.



Posted by Chenfan via Apache TVM Discuss <no...@discuss.tvm.ai>.


@junrushao1994 Yeah, I see, but it seems we're not yet able to lower & build a TIR module on the master branch? :laughing:
(Maybe I can give the tensorir private branch a try...)

@FrozenGene I agree.





---
[Visit Topic](https://discuss.tvm.apache.org/t/do-we-have-any-way-to-process-codegen-with-more-fine-grade-control/9908/5) to respond.
