Posted to discuss-archive@tvm.apache.org by Max Sponner via Apache TVM Discuss <no...@discuss.tvm.ai> on 2021/03/30 07:58:04 UTC

[Apache TVM Discuss] [Questions] Profile on Relay Level?


For a project, I want to train a number of models that can predict the execution time of a layer (from its relay description) on different hardware targets.

My current problem is that I am unable to find a good option to do this.
The Debug Runtime measures the execution time of the low-level functions, which include fused layers and therefore cannot be directly mapped back to Relay nodes.

I looked into the Auto-Scheduler, since Ansor also works on a subgraph level, but it seems that it also measures individual TIR functions.

I would like to work with the Relay representation as it enables the targeting of BYOC backends, which might be more relevant for highly heterogeneous targets.





---
[Visit Topic](https://discuss.tvm.apache.org/t/profile-on-relay-level/9568/1) to respond.

You are receiving this because you enabled mailing list mode.

To unsubscribe from these emails, [click here](https://discuss.tvm.apache.org/email/unsubscribe/689521f453c02839490298528b6bff24860f82cb4ab180c1cfbcbb038ca22f7f).

[Apache TVM Discuss] [Questions] Profile on Relay Level?

Posted by "Cody H. Yu via Apache TVM Discuss" <no...@discuss.tvm.ai>.

Since Relay is a graph-level IR, its ops do not carry compute and schedule information, just the input and output types, so latency measurement has to happen at the TIR level. If you want to profile the latency of each op, you could turn off op fusion.

However, simply turning off fusion will result in errors, because TVM requires every op to be in a primitive function during lowering. The right way to turn off fusion is to write a simple Relay pass that puts every single op into its own function. For example:

```
%1 = nn.conv2d(...)
%2 = nn.bias_add(%1, ...)
%3 = nn.relu(%2)
```

becomes

```
%1 = fn(..., Primitive=1) {
  nn.conv2d(...)
}
%2 = %1(...)
%3 = fn(..., Primitive=1) {
  nn.bias_add(...)
}
%4 = %3(%2, ...)
%5 = fn(..., Primitive=1) {
  nn.relu(...)
}
%6 = %5(%4)
```

Then each function will contain a single op.

On the other hand, I personally don't recommend this profiling approach, because in the normal compilation flow op fusion will definitely happen. If you would like to know whether offloading some ops to your device could improve end-to-end performance, you should compare the latency of the fused function against the latency of offloading that function to your device, to draw a fair conclusion.





---
[Visit Topic](https://discuss.tvm.apache.org/t/profile-on-relay-level/9568/2) to respond.


[Apache TVM Discuss] [Questions] Profile on Relay Level?

Posted by Max Sponner via Apache TVM Discuss <no...@discuss.tvm.ai>.

I am a bit confused; maybe I misunderstood your suggestion.

I am using the debug executor to measure the latency of the individual (fused) TIR functions,
but I cannot tell which function corresponds to which part of the original/optimized Relay graph
(example of a TIR function name: fused_layout_transform_nn_batch_flatten).

So I am aware of the n:m mapping between Relay nodes and TIR functions; however, I would like
to keep information about filter sizes and about which operations are fused in the TIR functions,
as the model that predicts the performance needs this additional information.
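For what it's worth, one rough workaround (not an official TVM API; the op vocabulary and the helper name below are made up for illustration) is to split a fused function name back into the Relay op names it was built from. The split is ambiguous because op names themselves contain underscores, so a greedy longest-match against the ops you expect is needed:

```python
# Illustrative subset of Relay op names as they appear in fused TIR
# function names (dots in "nn.*" become underscores).
KNOWN_OPS = [
    "nn_conv2d", "nn_bias_add", "nn_relu", "nn_batch_flatten",
    "nn_dense", "layout_transform", "add", "multiply",
]

def split_fused_name(name, ops=KNOWN_OPS):
    """Recover op names from e.g. 'fused_layout_transform_nn_batch_flatten'."""
    assert name.startswith("fused_")
    rest = name[len("fused_"):]
    found = []
    while rest:
        if rest.isdigit():  # dedup suffix like 'fused_nn_conv2d_1'
            break
        # Longest match first, so 'nn_bias_add' wins over 'add'.
        for op in sorted(ops, key=len, reverse=True):
            if rest == op or rest.startswith(op + "_"):
                found.append("nn." + op[3:] if op.startswith("nn_") else op)
                rest = rest[len(op) + 1:]
                break
        else:
            raise ValueError("unrecognized op prefix in: " + rest)
    return found
```

This only recovers the op names, not filter sizes; those would still have to come from the Relay graph itself, e.g. by recording each call's attributes before lowering.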





---
[Visit Topic](https://discuss.tvm.apache.org/t/profile-on-relay-level/9568/5) to respond.


[Apache TVM Discuss] [Questions] Profile on Relay Level?

Posted by "Cody H. Yu via Apache TVM Discuss" <no...@discuss.tvm.ai>.

The approach I suggested is the most straightforward one. Relay-to-TIR is not a one-to-one mapping: a Relay node may be lowered to different TIR functions for different targets and input shapes/dtypes.





---
[Visit Topic](https://discuss.tvm.apache.org/t/profile-on-relay-level/9568/4) to respond.
