Posted to commits@tvm.apache.org by GitBox <gi...@apache.org> on 2022/11/09 23:41:28 UTC

[GitHub] [tvm-rfcs] Lyken17 commented on pull request #89: [RFC] Relax Upstreaming

Lyken17 commented on PR #89:
URL: https://github.com/apache/tvm-rfcs/pull/89#issuecomment-1309546688

   I have learned a lot from reading through this thread, and I find that most people here come from a systems background: either doing related research at universities or leading an engineering team at a company. I would like to share some of my thoughts from a different perspective, as a **TVM user** and **ML algorithm developer**. 
   
   I am a graduate student at MIT studying efficient deep learning algorithms and co-design (details on [my page](http://lzhu.me/), the [lab site](https://tinyml.mit.edu/), and [our recent project that trains a NN on a 256 kB MCU](https://tinytraining.mit.edu/)). We have been faithful TVM users because of its flexibility, high performance, and open-source nature. But when we want to dive deeper and make customizations, things become complex and Relay is no longer friendly. 
   
   * **Unnecessarily long call stack between Python and C++**: Take `relay.build` as an example: a Relay graph (in Python) first goes through shape checking (in C++), then calls into a wrapper (Python), is later fed into TensorExpression (in either Python or C++), and then into the VM for compilation (packed functions); see the first sketch after this list. Any step in the middle can raise an error, and developers easily get lost in the pipeline. You can find many users reporting similar issues on the forum, and only a few of them are fortunate enough to get an answer from experienced developers.
   * **Difficult to add a new operator because of the complex pipeline**: In our research, as in many other users' development, adding new operators is a common need. But in current Relay, even to add a simple identity operator (y = x), we need to:
     1. declare an attribute node.
     2. write the type relation check in C++.
     3. register the op in C++.
     4. describe the compute.
     5. describe the schedule.
     6. write the C++ wrapper.
     7. write the Python wrapper.
     Seven steps just to define an identity function? Seriously? In PyTorch it takes no more than 20 lines (see the second sketch after this list). This significantly slows the growth of the TVM community: if you check the [PR history](https://github.com/apache/tvm/commits/main/python/tvm/relay/op), the number of new operators and new contributors this year is quite limited, while PyTorch receives new operator implementations from the community every day.  
   * **Missing capability to call third-party implementations**: Relay does not, at least not easily, let users call third-party backends such as cuDNN, OpenVINO, or TensorRT. For the cloud, cuDNN and TensorRT are still state of the art on most benchmarks, and without simple integration TVM delivers inferior performance, which makes fewer people choose it. For the edge, the situation is even more serious because of hardware diversity. Take the Qualcomm DSP as an example: even though TVM's Hexagon support is in progress, the best solution is still the manually written kernels in [SNPE](https://developer.qualcomm.com/sites/default/files/docs/snpe/overview.html). Calling other backends from current Relay is not trivial: BYOC is difficult to use (see the third sketch after this list), and registering custom operators can be quite complex, as discussed in the previous point.  
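   
   To make the first point concrete, here is a minimal sketch of the `relay.build` entry point (the module and target are just illustrative). Every stage behind this single call hops between Python and C++:
   
   ```python
   import tvm
   from tvm import relay
   
   # A tiny Relay module: one function applying ReLU to its input.
   x = relay.var("x", shape=(1, 3, 224, 224), dtype="float32")
   mod = tvm.IRModule.from_expr(relay.Function([x], relay.nn.relu(x)))
   
   # This one call internally crosses the Python/C++ boundary several
   # times (type inference, lowering, codegen); an error in any stage
   # surfaces here with a mixed-language stack trace.
   with tvm.transform.PassContext(opt_level=3):
       lib = relay.build(mod, target="llvm")
   ```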
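   
   For the second point, here is the roughly 20-line PyTorch counterpart claimed above: a minimal sketch of a custom identity operator as a `torch.autograd.Function`, all in Python:
   
   ```python
   import torch
   
   class Identity(torch.autograd.Function):
       # Forward pass: return the input unchanged.
       @staticmethod
       def forward(ctx, x):
           return x
   
       # Backward pass: the gradient of y = x is the incoming gradient.
       @staticmethod
       def backward(ctx, grad_output):
           return grad_output
   
   x = torch.randn(4, requires_grad=True)
   y = Identity.apply(x)
   y.sum().backward()  # x.grad is all ones
   ```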
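   
   For the third point, a BYOC path does exist; here is a sketch of the TensorRT flow, assuming `mod` and `params` come from a frontend import (and note the exact `partition_for_tensorrt` signature has varied across TVM releases). The problem is the ceremony required once an operator is not already covered:
   
   ```python
   import tvm
   from tvm import relay
   from tvm.relay.op.contrib.tensorrt import partition_for_tensorrt
   
   # `mod` and `params` are assumed to come from a frontend import,
   # e.g. relay.frontend.from_onnx(...).
   # Split out the TensorRT-supported subgraphs; everything else stays
   # on the normal Relay path.
   mod, config = partition_for_tensorrt(mod, params)
   
   with tvm.transform.PassContext(
       opt_level=3, config={"relay.ext.tensorrt.options": config}
   ):
       lib = relay.build(mod, target="cuda", params=params)
   ```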
   
   I understand those who want backward compatibility so that existing projects are not broken. But we cannot build a ship of Theseus in the real world, and the issues above cannot easily be "improved" within current Relay. If TVM does not embrace new designs and improve its user-friendliness, developers will eventually switch to other tools, and this is indeed happening: 
   * [OneFlow uses MLIR to rewrite its compiler passes](https://github.com/Oneflow-Inc/diffusers/wiki/How-to-Run-OneFlow-Stable-Diffusion), accelerating diffusion models by 4x compared with PyTorch and 1.6x compared with TensorRT.
   * [Megvii adopts MLIR to minimize the runtime build](https://github.com/MegEngine/MegCC), generating a YOLOX binary of just 95 kB.
   * [PyTorch proposes TorchDynamo to speed up training](https://github.com/pytorch/torchdynamo/), achieving an average 1.34x speedup over the previous NVFuser. 
   * ... 
   
   I like the TVM project and hope the community stays active. TVM has a huge user base of researchers, and Relax would let them contribute their code and ideas to the repo easily, instead of resorting to tricky hacks and creating a separate repo for each project. This matters for an open-source community -- just recall how MXNet lost its market, and why PyTorch beat TensorFlow despite being released a year later. TVM should consider upstreaming Relax given its more thoughtful and user-friendly design, its well-written documentation and tutorials, and the painless S0/S1/S2 upgrade path.
   
   I would be happy to discuss further if there are any comments or questions.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@tvm.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org