Posted to commits@tvm.apache.org by GitBox <gi...@apache.org> on 2021/09/16 22:25:41 UTC

[GitHub] [tvm-rfcs] junrushao1994 edited a comment on pull request #22: [RFC][TIR] TIR Non-scalar Constants

junrushao1994 edited a comment on pull request #22:
URL: https://github.com/apache/tvm-rfcs/pull/22#issuecomment-921297958


   Thanks @manupa-arm for the reply! These are all good points 👍 
   
   Again, I am supportive of this change, and the reason I am asking so many questions is that it is a change to the generic IR, which is supposed to be the most stable part of the system. As a generic concept, "constant" is supposed to be something people will assume to work generically in the future. Therefore, I wanted to be serious and make sure it does work generically.
   
   On A). As the primary author of TensorIR scheduling, I feel that reading/writing attributes is expected behavior by design of our unified IR infrastructure, so I would love to learn more about why it is an issue here.
   
   On B). Thanks for supporting this in TVM script! I definitely see a path to completing this feature. The reason I am asking about "parsing tensors" is that tensors can be large, and I am not sure they would fit well in Python's AST (parsing could be too slow, for example). In Relay, the workaround is to treat tensors as metadata, so that they can be parsed from a separate file with a much faster parser.
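   For reference, a rough sketch of the Relay approach mentioned above (illustrative fragment, not an exact dump): the program text only holds a `meta[relay.Constant][i]` reference, while the actual payload lives in a metadata section that can be parsed separately from the main AST:

```
def @main(%x: Tensor[(2, 2), float32]) {
  add(%x, meta[relay.Constant][0])
}
```

   The large tensor payload never appears in the program text itself, so the Python-side parser never has to touch it.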
   
   Again, I would love to hear your opinion on the best design for parsing tensors in TVM script, one that handles:
   * multi-dimensional tensors with different dtypes
   * large tensors
   * numerical precision issues
   * parsing from a separate file (do we want to support this?)
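   To make these points concrete, here is a minimal sketch in plain Python (not TVM's actual serialization format; all names are hypothetical) of one way to handle dtypes, shapes, and numerical precision at once: store the dtype, the shape, and the raw little-endian bytes of the tensor, so the round trip is bit-exact and large payloads can live in a sidecar file that a fast parser reads independently of the main text format.

```python
import array
import base64
import json

_TYPECODES = {"float64": "d", "float32": "f", "int32": "i"}

def dump_tensor(values, shape, dtype="float64"):
    """Serialize a flat list of values into a JSON-friendly record."""
    raw = array.array(_TYPECODES[dtype], values).tobytes()
    return {"dtype": dtype, "shape": shape,
            "data": base64.b64encode(raw).decode("ascii")}

def load_tensor(record):
    """Recover the exact values (bit-for-bit) from the record."""
    values = array.array(_TYPECODES[record["dtype"]])
    values.frombytes(base64.b64decode(record["data"]))
    return list(values), record["shape"]

# Round-trip through text: this string could live in a separate file,
# with the main TVM script holding only an index into it.
record = dump_tensor([0.1, 0.2, 0.3, 0.4], shape=[2, 2])
text = json.dumps(record)
values, shape = load_tensor(json.loads(text))
assert values == [0.1, 0.2, 0.3, 0.4] and shape == [2, 2]
```

   Because the payload is raw bytes rather than decimal literals, no precision is lost in printing/parsing, and the parser's cost is a single base64 decode rather than walking a huge AST.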
   
   On C). I think we are on the same page, but to clarify, `cache_read` on GPU isn't able to copy CPU data to GPU. To move data correctly across devices, we need to call TVM's C API `TVMArrayCopyFromTo` as I suggested. To summarize, there is a way to make `tir.allocate_constant` support all devices: on CPU it is straightforward, and on GPUs we need to link those parameters into the host-side PrimFunc and call TVM's C API to put them on the device accordingly.
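   Sketched as pseudocode (the step names are hypothetical; the only real API referenced is `TVMArrayCopyFromTo` from TVM's C runtime API), the host-side PrimFunc would roughly do:

```
// host-side PrimFunc (pseudocode)
host_const = lookup_linked_constant(idx)            // constant linked into the host module
device_buf = device_alloc(shape, dtype, device)     // e.g. CUDA global memory
TVMArrayCopyFromTo(host_const, device_buf, stream)  // TVM C API, host -> device
launch_kernel(device_buf, ...)                      // kernel reads the constant on device
```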
   
   On D). Thanks for the clarification! I am definitely not an expert on these SoCs. I think it's cool as long as we have control at compile time 😄 
   
   On E). Yeah I think we are on the same page.
   
   To summarize the discussion, I think we have consensus on most of the points, but the final sticky point is the unified IR infrastructure. Currently Relay has support for constant NDArrays but TIR doesn't, and I would hope to have a unified representation of such constants, because divergence is always less desirable (and TVM developers have paid the price for divergence before). It would be nice if all of us could think through this representation, as well as the text format and serialization of these constants.
   
   Here is my proposal:
   - Store constants in IRModule attributes for both Relay and TIR
   - Refer to constants in the IRModule in `tir.allocate_constant` as well as `relay.constant`
   - Serialization infra supports dumping/loading constants inline or in a separate file
   - On GPU and other devices, insert TVM's C API call to copy from host-side constant arrays to CUDA's global memory.
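   The first two bullets can be modeled with a toy example (plain Python; `ToyIRModule` and its methods are hypothetical, not TVM's actual API): constants live once in the IRModule's attributes, and both `relay.constant` and `tir.allocate_constant` would refer to them by index, so the two IRs share one pool instead of diverging.

```python
class ToyIRModule:
    """Toy stand-in for an IRModule whose attrs hold a shared constant pool."""

    def __init__(self):
        self.attrs = {"constants": []}  # shared by Relay and TIR nodes

    def add_constant(self, payload):
        """Deduplicate and return the index that IR nodes would reference."""
        pool = self.attrs["constants"]
        for i, existing in enumerate(pool):
            if existing == payload:
                return i
        pool.append(payload)
        return len(pool) - 1

mod = ToyIRModule()
i = mod.add_constant([1.0, 2.0, 3.0])  # e.g. referenced by relay.constant
j = mod.add_constant([1.0, 2.0, 3.0])  # e.g. referenced by tir.allocate_constant
assert i == j                          # one shared copy, not two divergent ones
```

   Deduplicating at the module level also means serialization (inline or in a separate file) only has to handle one constant pool for both IRs.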
   
   Note that I am not saying we have to implement everything initially; I just wanted to lay out a path toward making this feature complete and generic.
   

