Posted to discuss-archive@tvm.apache.org by Pratik Fegade via TVM Discuss <no...@discuss.tvm.ai> on 2020/05/20 15:52:48 UTC
[TVM Discuss] [Application] Applying compute_at on scans
Hi all,
I have a simple scan computation, scheduled as follows:
```
import tvm
from tvm import te

m = 100  # number of scan steps
n = 256  # elements per step
X = te.placeholder((m, n), name="X")
s_state = te.placeholder((m, n))
s_init = te.placeholder((1, n))
# Recurrence: state[t] = state[t-1] + X[t]
s_update = te.compute((m, n), lambda t, i: s_state[t - 1, i] + X[t, i], name="update")
s_scan = te.scan(s_init, s_update, s_state, inputs=[X], name="scan")
# Scale the last scan state
c_out = te.compute((n,), lambda i: s_scan[m - 1, i] * 17)

s = te.create_schedule(c_out.op)
# Compute the scan inside c_out's only axis, keep it thread-local,
# and bind that axis to threadIdx.x
s[s_scan].compute_at(s[c_out], s[c_out].op.axis[0])
s[s_scan].set_scope("local")
s[c_out].bind(s[c_out].op.axis[0], te.thread_axis("threadIdx.x"))
print(tvm.lower(s, [X, c_out], simple_mode=True))
```
Lowering the schedule above generates the following IR:
```
primfn(X_1: handle, compute_1: handle) -> ()
  attr = {"tir.noalias": True, "global_symbol": "main"}
  buffers = {X: Buffer(X_2: handle, float32, [100, 256], []),
             compute: Buffer(compute_2: handle, float32, [256], [])}
  buffer_map = {compute_1: compute, X_1: X} {
  attr [IterVar(threadIdx.x: int32, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 256;
  attr [scan: handle] "storage_scope" = "local";
  allocate(scan, float32, [100]) {
    for (scan.idx: int32, 0, 99) {
      for (i: int32, 0, 256) {
        scan[(((scan.idx + i) + 1) - threadIdx.x)] = ((float32*)scan[((scan.idx + i) - threadIdx.x)]) + (float32*)X_2[(((scan.idx*256) + i) + 256)]))
      }
    }
    compute_2[threadIdx.x] = ((float32*)scan[99])*17f32)
  }
}
```
I don't understand why the inner i-loop is generated inside the scan. It appears to make every thread iterate over all 256 columns instead of only its own, which looks like redundant recomputation. In this case, shouldn't i be bound to threadIdx.x? If this is the intended behavior of compute_at, is there another way to generate IR similar to the following for the same computation?
```
primfn(X_1: handle, compute_1: handle) -> ()
  attr = {"tir.noalias": True, "global_symbol": "main"}
  buffers = {X: Buffer(X_2: handle, float32, [100, 256], []),
             compute: Buffer(compute_2: handle, float32, [256], [])}
  buffer_map = {compute_1: compute, X_1: X} {
  attr [IterVar(threadIdx.x: int32, (nullptr), "ThreadIndex", "threadIdx.x")] "thread_extent" = 256;
  attr [scan: handle] "storage_scope" = "local";
  allocate(scan, float32, [1]) {
    for (scan.idx: int32, 0, 99) {
      scan[(scan.idx + 1)] = ((float32*)scan[scan.idx]) + (float32*)X_2[((scan.idx*256) + threadIdx.x)]))
    }
    compute_2[threadIdx.x] = ((float32*)scan[99])*17f32)
  }
}
```
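To make the intended per-thread behavior concrete, here is a rough plain-Python emulation that does not use TVM at all. It is only a sketch under assumptions: the helper names reference and per_thread are hypothetical, the initial state init is passed in explicitly because neither IR listing shows how scan[0] gets initialized, and the loop body follows the update rule from the schedule (state[t] = state[t-1] + X[t]) rather than any particular lowered index expression.

```python
M, N = 100, 256  # same m and n as in the schedule


def reference(X, init):
    """Whole-array semantics: state[0] = init, state[t] = state[t-1] + X[t],
    and the output is the last state scaled by 17."""
    state = [list(init)]
    for t in range(1, M):
        state.append([state[t - 1][i] + X[t][i] for i in range(N)])
    return [state[M - 1][i] * 17 for i in range(N)]


def per_thread(X, init, tx):
    """Emulates one thread that scans only its own column tx.
    Returns the output element and the number of update steps executed."""
    scan = [0] * M        # thread-local buffer, one slot per time step
    scan[0] = init[tx]    # assumed initialization, not shown in the posted IR
    steps = 0
    for idx in range(M - 1):  # corresponds to: for (scan.idx, 0, 99)
        scan[idx + 1] = scan[idx] + X[idx + 1][tx]
        steps += 1
    return scan[M - 1] * 17, steps


# Loop trip counts per thread, read off the two loop nests:
GENERATED_ITERS_PER_THREAD = (M - 1) * N  # scan.idx loop times the inner i loop
DESIRED_ITERS_PER_THREAD = M - 1          # scan.idx loop only
```

Because each column depends only on itself, the per-thread version reproduces the reference output for its column with 99 updates, whereas the loop nest in the generated IR runs 99 * 256 = 25344 inner iterations in every thread.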
Thanks!
---
[Visit Topic](https://discuss.tvm.ai/t/applying-compute-at-on-scans/6745/1) to respond.