Posted to discuss-archive@tvm.apache.org by Cijie Xia via Apache TVM Discuss <no...@discuss.tvm.ai> on 2021/05/28 09:27:26 UTC

[Apache TVM Discuss] [Questions] Combining Separate tir.Block using compute_at()


Hello,

I noticed that the plan to support `compute_at()` on TIR is tracked in this issue: https://github.com/apache/tvm/issues/7527

The `compute_at()` schedule primitive in TVM enables combining separate stages of computation when possible. This can bring performance gains or other optimization opportunities. For example, two loops with exactly the same iteration range can be fused into one loop by `compute_at()`:

```python
import tvm
from tvm import te

m = te.var("m")
A = te.placeholder((m,), name="A")
B = te.compute((m,), lambda i: A[i] + 1, name="B")
C = te.compute((m,), lambda i: B[i] * 2, name="C")

s = te.create_schedule(C.op)
# Compute B at the first axis of C, so both stages share one loop.
s[B].compute_at(s[C], C.op.axis[0])
```
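
The effect can be checked by printing the lowered IR; a minimal sketch (the exact printed form depends on the TVM version):

```python
# After compute_at, B and C are computed inside a single loop over i,
# instead of two separate loops.
print(tvm.lower(s, [A, B, C], simple_mode=True))
```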

Similarly, when defining computations using `tir.Block`, we notice that there are situations where combining blocks is beneficial. For example:

```python
for i, j in tir.grid(128, 128):
    with tir.block([128, 128], "A_Block") as [vi, vj]:
        A[vi, vj] = tir.float32(0)
for i, j in tir.grid(128, 128):
    with tir.block([128, 128], "B_Block") as [vi, vj]:
        B[vi, vj] = A[vi, vj]
```

There are two separate blocks in the example above. We would like to know whether `compute_at()` will support combining two blocks into one block in the future. If so, what conditions must the two blocks satisfy for `compute_at()` to perform the combination?





---
[Visit Topic](https://discuss.tvm.apache.org/t/combining-separate-tir-block-using-compute-at/10113/1) to respond.


[Apache TVM Discuss] [Questions] Combining Separate tir.Block using compute_at()

Posted by Junru Shao via Apache TVM Discuss <no...@discuss.tvm.ai>.

Please let me know if it answers your question. Happy to assist further if you are interested.





---
[Visit Topic](https://discuss.tvm.apache.org/t/combining-separate-tir-block-using-compute-at/10113/12) to respond.


[Apache TVM Discuss] [Questions] Combining Separate tir.Block using compute_at()

Posted by Junru Shao via Apache TVM Discuss <no...@discuss.tvm.ai>.

Thanks for asking!

To be clear, it is not necessary to break block isolation when using compute-at. For example, after compute-at, the IR may become:

```python
for i in tir.range(0, 128):
    for j in tir.range(0, 128):
        with tir.block([128, 128], "A_Block") as [vi, vj]:
            A[vi, vj] = tir.float32(0)
        with tir.block([128, 128], "B_Block") as [vi, vj]:
            B[vi, vj] = A[vi, vj]

```
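
For reference, this is what the transformation looks like with the `tir.Schedule` API that later landed in TVM. A minimal sketch, assuming the newer TVMScript syntax (which differs from the syntax used in this thread):

```python
import tvm
from tvm import tir
from tvm.script import tir as T

@T.prim_func
def func(A: T.Buffer((128, 128), "float32"), B: T.Buffer((128, 128), "float32")):
    for i, j in T.grid(128, 128):
        with T.block("A_Block"):
            vi, vj = T.axis.remap("SS", [i, j])
            A[vi, vj] = T.float32(0)
    for i, j in T.grid(128, 128):
        with T.block("B_Block"):
            vi, vj = T.axis.remap("SS", [i, j])
            B[vi, vj] = A[vi, vj]

sch = tir.Schedule(func)
i, j = sch.get_loops(sch.get_block("B_Block"))
# Move the producer block under the consumer's j loop.
sch.compute_at(sch.get_block("A_Block"), j)
print(sch.mod.script())  # both blocks now share one i, j loop nest
```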





---
[Visit Topic](https://discuss.tvm.apache.org/t/combining-separate-tir-block-using-compute-at/10113/2) to respond.


[Apache TVM Discuss] [Questions] Combining Separate tir.Block using compute_at()

Posted by Junru Shao via Apache TVM Discuss <no...@discuss.tvm.ai>.

Thanks @xiacijie for the reply!

1. Yes, I got what you mean, and I definitely agree with the definition.
2. No. We have a primitive called `blockize` that does roughly the opposite (it creates more blocks; see the sketch below), and we have thought about such a merging primitive. Development should be fairly simple (~200 lines of core implementation), and we are more than happy to assist if you want :-)
3. To be clear, merging blocks is a transformation that by itself brings no performance gain: a `Block` in TensorIR is a construct that creates conceptual isolation but lowers to nothing, so merged or not, the generated code is unaffected.

The reason it has not been developed is that we haven't yet found a real-world scenario where this primitive is useful, and I would definitely appreciate it if you could bring up an example :+1:
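
For context on point 2, here is a minimal sketch of `blockize` with the `tir.Schedule` API from later TVM releases; the function `func` is illustrative, and the TVMScript syntax below is newer than the one used in this thread:

```python
import tvm
from tvm import tir
from tvm.script import tir as T

@T.prim_func
def func(A: T.Buffer((128, 128), "float32")):
    for i, j in T.grid(128, 128):
        with T.block("init"):
            vi, vj = T.axis.remap("SS", [i, j])
            A[vi, vj] = T.float32(0)

sch = tir.Schedule(func)
i, j = sch.get_loops(sch.get_block("init"))
outer_block = sch.blockize(j)  # wrap the inner j loop into a new outer block
print(sch.mod.script())        # two nested blocks now, instead of one
```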





---
[Visit Topic](https://discuss.tvm.apache.org/t/combining-separate-tir-block-using-compute-at/10113/6) to respond.


[Apache TVM Discuss] [Questions] Combining Separate tir.Block using compute_at()

Posted by Junru Shao via Apache TVM Discuss <no...@discuss.tvm.ai>.

Thanks @xiacijie for the question!

```python
# Snippet-1:
for i, j in tir.grid(128, 128):
    with tir.block([128, 128], "init") as [vi, vj]:
        C[vi, vj] = 0.0

# Snippet-2:
with tir.block([128, 128], "init") as [vi, vj]:
    C[vi, vj] = 0.0
```

Yes, the two snippets above are strictly equivalent, and the second is syntactic sugar for the first. TVM script has built-in functionality called "auto-completion" that desugars such snippets.

To get a sense of what the fully desugared IR looks like, you can print it with the following command in Python:

```python
print(tvm.script.asscript(func))  # `func` can be a PrimFunc or an IRModule
```





---
[Visit Topic](https://discuss.tvm.apache.org/t/combining-separate-tir-block-using-compute-at/10113/10) to respond.


[Apache TVM Discuss] [Questions] Combining Separate tir.Block using compute_at()

Posted by Cijie Xia via Apache TVM Discuss <no...@discuss.tvm.ai>.

Let me clarify some more:
1. Two blocks do not have to be adjacent to be merged, though in this example they are.
2. I would like to know whether block merging is already planned as a future feature, rather than how difficult it would be to implement.
3. I believe it is a useful optimization (it can possibly reduce computation) to combine two separate blocks into one when possible.





---
[Visit Topic](https://discuss.tvm.apache.org/t/combining-separate-tir-block-using-compute-at/10113/5) to respond.


[Apache TVM Discuss] [Questions] Combining Separate tir.Block using compute_at()

Posted by Junru Shao via Apache TVM Discuss <no...@discuss.tvm.ai>.

[quote="xiacijie, post:9, topic:10113"]
I am confused why, when we already declare a 128 × 128 loop nest outside, we still have to put `[128, 128]` as a parameter when using `tir.block`
[/quote]

The syntax below describes the signature of a block:

```python
with tir.block([128, 128], "init") as [vi, vj]:
```

TensorIR is designed around the "block isolation" philosophy: a block describes a self-contained chunk of computation that does not depend on its surrounding context. When desugared, your particular example above expands to:

```python
for i in range(0, 128):
  for j in range(0, 128):
    with tir.block([128, 128], "init") as [vi, vj]:
      # vi's domain is [0, 128), and it is data-parallel
      # vj's domain is [0, 128), and it is data-parallel
      tir.bind(vi, i)  # binds `i` to `vi`
      tir.bind(vj, j)  # binds `j` to `vj`
      tir.reads([])  # reads nothing
      tir.writes(C[vi : vi + 1, vj : vj + 1])  # writes one element of C
      C[vi, vj] = 0.0
```

The properties of the block are:
* Instances of block execution are described by the pair `(vi, vj)`, where `vi, vj \in [0, 128)`.
* For a given instance `(vi, vj)`, the block reads nothing and writes the buffer region `C[vi : vi + 1, vj : vj + 1]`.
* `vi` and `vj` are both data-parallel, which means block instances `(vi, vj)` can be executed in arbitrary order or in parallel.

Block bindings (`tir.bind`) describe how the surrounding loops drive the block execution. For example, it is possible to execute the instances in another order:

```python
for i in range(0, 128):
  for j in range(0, 128):
    with tir.block([128, 128], "init") as [vi, vj]:
      tir.bind(vi, 127 - i)  # binds `127 - i` to `vi`
      tir.bind(vj, 127 - j)  # binds `127 - j` to `vj`
```

In short, TensorIR decouples "in which order the loops run" from "the computation in the block body". Therefore, seemingly redundant information can appear (as you described) when the binding is trivial, and we provide syntactic sugar for this case.





---
[Visit Topic](https://discuss.tvm.apache.org/t/combining-separate-tir-block-using-compute-at/10113/11) to respond.
