Posted to dev@tvm.apache.org by Tianqi Chen <no...@github.com.INVALID> on 2022/10/02 14:26:55 UTC

Re: [apache/tvm-rfcs] [RFC] CodeGenAArch64 backend with Scalable Vector Extension (SVE) (PR #94)

Thanks @ekalda. It is great to see us having conversations on bringing in SVE. The main question we want to resolve is likely going to be **what TIR spec goes into codegen and carries the SVE info**.

Three alternatives have been discussed so far:

### A0: Loop with annotation but body as scalar

```python
  for (i: int32, 0, 20;i, annotation={"VLA"}) {
    C_2[i] = A_2[i] + B_2[i];
  }
```
### A1: Vectorized loop with constant vector factor 

```python
  for (i: int32, 0, 20; i) {
    C_2[ramp(i, 0, 5)] = A_2[ramp(i, 0, 5)] + B_2[ramp(i, 0, 5)];
  }
```

### A2: Vectorized loop with some form of TIR repr for sve vector

```python
  for (i: int32, 0, 20; i) {
    C_2[ramp(i, 0, vscale)] = A_2[ramp(i, 0, vscale)] + B_2[ramp(i, 0, vscale)];
  }
```

This would involve updates to the ramp node in TIR. See the `kScalableVectorLaneMark` comment in the [previous discussion](https://github.com/apache/tvm-rfcs/pull/18).

## Discussion
The three alternatives above are meant to set the stage for discussion. This RFC proposes A1.

A1 is a proposed change to codegen only and does not change TIR. If A1 can be implemented correctly, then I think it is a positive step (close to the S0-type changes we had in other conversations), even if we want to do things in several stages (with follow-up S1 changes).
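As a rough illustration of why a codegen-only change can preserve semantics, here is a minimal plain-Python sketch (not TVM code). It compares an elementwise add written with a fixed lane count against the same add written as a predicated loop over a hardware vector length. The function names and the `vl` parameter are hypothetical and exist only for this sketch.

```python
def add_fixed_lanes(a, b, lanes):
    """Elementwise add in chunks of a compile-time lane count (A1-style input)."""
    n = len(a)
    assert n % lanes == 0, "fixed-lane form assumes the extent divides evenly"
    c = [0] * n
    for i in range(n // lanes):
        base = i * lanes
        for lane in range(lanes):
            c[base + lane] = a[base + lane] + b[base + lane]
    return c


def add_predicated(a, b, vl):
    """Same add with a runtime vector length `vl` and a per-lane predicate."""
    n = len(a)
    c = [0] * n
    base = 0
    while base < n:
        # Lanes past the end of the buffer are masked off by the predicate.
        pred = [base + lane < n for lane in range(vl)]
        for lane, active in enumerate(pred):
            if active:
                c[base + lane] = a[base + lane] + b[base + lane]
        base += vl
    return c


a = list(range(20))
b = [2 * x for x in range(20)]
reference = add_fixed_lanes(a, b, lanes=5)
# The predicated form should agree for any vector length, including ones
# that do not divide the extent.
results = {vl: add_predicated(a, b, vl) for vl in (1, 4, 8, 16, 32)}
```

In this sketch the predicate is what lets the vector length vary without changing the result, which is the property the codegen would have to rely on.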

The main question for discussion is how we can implement A1 robustly.

Turning specialized code into general code is a bit like raising (going from the special case to the general one), so it would be good to add a high-level description of the pattern-match and conversion rules. For some background, I initially thought there might be some traps when the code contains specializations to the lanes, but after thinking a bit more I find that my initial counterexample is actually fine under A1. So I am more convinced of this approach.


Something along the lines of the following:

We would only apply the SVE specialization if the code satisfies the following pattern:

- Pattern match all ramped loads/stores `A[ramp(iter*lanes, 0, lanes)]` to ensure they have the same lanes, then change the lanes to VL with predication.
- Change the outer loop over iter to a vector loop.
- If there is a vector load/store that does not satisfy the pattern, we abort.
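
The rules above could be sketched roughly as follows. This is a toy model in plain Python, not TVM's actual IR or API: `Ramp`, `Access`, `Loop`, `VectorLoop`, `to_sve`, and the `VL` marker are all hypothetical names invented for illustration. The ramp's middle argument is carried through unchanged, and the example uses a unit stride as an assumption.

```python
from dataclasses import dataclass, replace
from typing import Optional, Tuple, Union

VL = "VL"  # hypothetical marker standing for the hardware vector length


@dataclass(frozen=True)
class Ramp:
    iter_var: str            # loop variable used in the base
    scale: Union[int, str]   # base is iter_var * scale
    stride: int              # carried through unchanged
    lanes: Union[int, str]


@dataclass(frozen=True)
class Access:
    kind: str                  # "load" or "store"
    buffer: str
    index: Union[Ramp, str]    # a str stands for any non-ramp index
    predicated: bool = False


@dataclass(frozen=True)
class Loop:
    iter_var: str
    extent: int                # number of iterations
    body: Tuple[Access, ...]


@dataclass(frozen=True)
class VectorLoop:
    iter_var: str
    elements: int              # total elements, stepped by VL at run time
    body: Tuple[Access, ...]


def to_sve(loop: Loop) -> Optional[VectorLoop]:
    """Return the VL-predicated loop, or None if any access breaks the pattern."""
    lanes = None
    for acc in loop.body:
        idx = acc.index
        # Every access must look like ramp(iter * lanes, stride, lanes).
        if not isinstance(idx, Ramp) or not isinstance(idx.lanes, int):
            return None
        if idx.iter_var != loop.iter_var or idx.scale != idx.lanes:
            return None
        if lanes is None:
            lanes = idx.lanes
        elif idx.lanes != lanes:
            return None  # mixed lane counts: abort
    if lanes is None:
        return None  # nothing to vectorize
    body = tuple(
        replace(acc, index=replace(acc.index, scale=VL, lanes=VL), predicated=True)
        for acc in loop.body
    )
    return VectorLoop(loop.iter_var, loop.extent * lanes, body)


r5 = Ramp("i", 5, 1, 5)
good = Loop("i", 4, (Access("load", "A", r5), Access("load", "B", r5),
                     Access("store", "C", r5)))
bad = Loop("i", 4, (Access("load", "A", r5),
                    Access("store", "C", Ramp("i", 4, 1, 4))))
converted = to_sve(good)   # VectorLoop over 20 elements, all accesses predicated
rejected = to_sve(bad)     # None, because the lane counts disagree
```

Returning `None` instead of a partially rewritten loop mirrors the "abort" rule: the caller can then fall back to the existing fixed-width codegen path.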

-- 
Reply to this email directly or view it on GitHub:
https://github.com/apache/tvm-rfcs/pull/94#issuecomment-1264656688