Posted to dev@tvm.apache.org by ziheng <no...@github.com> on 2019/11/14 01:24:55 UTC

[apache/incubator-tvm] [RFC] Support for Sparse Computation (#4332)

A significant driver of progress in deep learning has been advances in computational resources. Since those resources are often limited, there is a trend to replace dense computation in DNNs with sparse computation to speed up execution and save memory, enabling larger models; see, for example, [neural network pruning](https://github.com/he-y/Awesome-Pruning) and the [sparse transformer](https://openai.com/blog/sparse-transformer/). Some new workloads, such as GNNs, also rely on sparse support. It would be great if TVM could represent sparse computation workloads.

There is already some sparse support in TVM. Overall, it describes sparse CSR/BSR tensors with dense tensors on top of the existing Tensor DSL, as in https://github.com/apache/incubator-tvm/blob/master/topi/python/topi/nn/sparse.py. However, this approach has some obvious drawbacks (see the sketch after this list):
- It is quite tedious to describe sparse computation, since you have to deal with the indexing manually.
- It does not provide proper abstractions for sparse kernel scheduling.
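
For illustration, here is a minimal sketch of how a CSR matrix has to be encoded with the current approach; the variable names are hypothetical and the snippet only declares the inputs, in the same style of API as the demo snippets later in this RFC:

```
import tvm

# Hypothetical symbolic sizes; in topi.nn.sparse they come from the inputs.
m = tvm.var("m")
nnz = tvm.var("nnz")

# A CSR matrix has to be passed around as three unrelated dense tensors:
data = tvm.placeholder((nnz,), dtype="float32", name="data")      # non-zero values
indices = tvm.placeholder((nnz,), dtype="int32", name="indices")  # column index of each value
indptr = tvm.placeholder((m + 1,), dtype="int32", name="indptr")  # row offsets into data/indices

# Every kernel then has to walk indptr/indices by hand (typically via
# ir_builder/extern ops), and the schedule has no notion that these three
# tensors together form a single sparse matrix.
```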

This RFC would like to discuss how to add native sparse support in TVM. 

## Sparse Workloads

Here are some sparse workloads that we would like to keep in mind and take into consideration during design.

- **Graph Neural Networks**: A GNN is a type of neural network that operates directly on graph structure, and GNNs have gained increasing popularity in various domains, including social networks, knowledge graphs, recommender systems, and even life science. Graph data are often sparse, so there is an urgent demand for optimized sparse kernels for GNN workloads, such as sparse matrix-matrix multiplication (SpMM), sampled dense-dense matrix multiplication (SDDMM), segment_sum, segment_min, segment_max, segment_mm, etc. (a short SpMM/SDDMM sketch follows this list).
- **Block Sparse**: Even though sparse operations need less compute and memory than their dense counterparts, the speed-up observed from using them is less than expected on many hardware platforms. The block sparse representation (BSR) is more hardware-friendly and easier to optimize. There is also existing work on inducing block sparsity in RNNs/Transformers by pruning blocks of weights.
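
To make the two GNN kernels above concrete, here is a small scipy/numpy sketch (the shapes and data are arbitrary; it only pins down the semantics, not an implementation):

```
import numpy as np
import scipy.sparse as sp

rng = np.random.default_rng(0)
S = sp.random(4, 5, density=0.3, format="csr", random_state=0)  # sparse adjacency-like matrix
H = rng.standard_normal((5, 8))                                 # dense node features
W = rng.standard_normal((4, 8))

# SpMM: sparse x dense -> dense (e.g. neighbourhood aggregation in a GNN layer).
spmm_out = S @ H                                                # shape (4, 8)

# SDDMM: a dense-dense product evaluated only at the non-zero positions of S.
scores = W @ H.T                                                # shape (4, 5), all pairs
sddmm_out = S.multiply(scores)                                  # keep entries where S is non-zero
```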

From the above workloads, we can summarize some requirements that our sparse support needs to meet:
- It should be able to represent *common sparse formats*: CSR, RSR, BSR, etc.
- Although most workloads focus on 2D sparse matrices, it would be better if the representation can handle *multi-dimensional tensors*, so that it fits the original TVM Tensor abstraction.

After some investigation, we found that the tree hierarchy representation used by TACO and ExTensor is a good candidate.

## The Tree Hierarchy Representation

The tree hierarchy representation can represent tensors of any order by constructing formats from a bounded number of primitives, e.g., specifying whether each dimension is dense or sparse. (TACO also supports other level types such as *range* and *hash*, but we can expand this in the future depending on demand.) With this approach, a CSR matrix can be represented as `SparseTensor([Dense, Sparse])`, RSR as `SparseTensor([Sparse, Dense])`, and BSR as `SparseTensor([Dense, Sparse, Dense, Dense])`.


With the tree hierarchy representation, a general sparse tensor is actually composed of several dense arrays:

- An array `A_val` is used to represent the non-zero elements of tensor A.
- For every dense axis: an integer `Ai_size` is used to represent the size of tensor A's i-th dimension.
- For every sparse axis: two index arrays, `Ai_pos` and `Ai_idx`, together form a segmented vector with one segment per entry in the previous dimension (parent node in the tree). The `Ai_idx` array stores all the non-zero indices in the dimension, while the `Ai_pos` array stores the location in the idx array where each segment begins.



### Understanding the Representation with Examples

Here we use a 2D example to show how a sparse tensor is represented under different formats:
```
example tensor:
[
 a, 0, b, c,
 0, 0, 0, 0,
 d, 0, 0, e,
]
```
```
Format:
    [Dense, Dense]

Storage:
    axis 0
    A0_size = 3
    axis 1
    A1_size = 4
    values of A
    A_val = [a, 0, b, c, 0, 0, 0, 0, d, 0, 0, e]

Access:
    produce B {
      for (i, 0, m) {
        for (j, 0, n) {
          B[((i*n) + j)] = A[((i*n) + j)]
        }
      }
    }
```
```
Format:
    [Dense, Sparse]

Storage:
    axis 0
    A0_size = 3
    axis 1
    A1_pos = [0, 3, 3, 5]
    A1_idx = [0, 2, 3, 0, 3]
    values of A
    A_val = [a, b, c, d, e]

Access:
    for (i, 0, A0_size) {
      for (j, A1_pos[i], A1_pos[i+1]) {
        idx = {i, A1_idx[j]}
        val = A_val[j];
      }
    }
```
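
As a sanity check, this `[Dense, Sparse]` storage matches what `scipy.sparse.csr_matrix` produces for the same matrix (with arbitrary numeric values standing in for a..e):

```
import numpy as np
from scipy.sparse import csr_matrix

a, b, c, d, e = 1.0, 2.0, 3.0, 4.0, 5.0
A = csr_matrix(np.array([[a, 0, b, c],
                         [0, 0, 0, 0],
                         [d, 0, 0, e]]))

print(A.indptr)   # [0 3 3 5]        -> A1_pos
print(A.indices)  # [0 2 3 0 3]      -> A1_idx
print(A.data)     # [1. 2. 3. 4. 5.] -> A_val
```
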
```
Format:
    [Sparse, Dense]

Storage:
    axis 0
    A0_pos = [0, 2]
    A0_idx = [0, 2]
    axis 1
    A1_size = 4
    values of A (only the stored rows 0 and 2)
    A_val = [a, 0, b, c, d, 0, 0, e]

Access:
    for (i, A0_pos[0], A0_pos[1]) {
      for (j, 0, A1_size) {
        idx = {A0_idx[i], j}
        val = A_val[i * A1_size + j];
      }
    }
```
```
Format:
    [Sparse, Sparse]

Storage:
    axis 0
    A0_pos = [0, 2]
    A0_idx = [0, 2]
    axis 1
    A1_pos = [0, 3, 5]
    A1_idx = [0, 2, 3, 0, 3]
    values of A
    A_val = [a, b, c, d, e]

Access:
    for (i, A0_pos[0], A0_pos[1]) {
      for (j, A1_pos[i], A1_pos[i+1]) {
        idx = {A0_idx[i], A1_idx[j]}
        val = A_val[j];
      }
    }
```
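
The four access patterns above follow one recursive rule. Below is a small pure-Python sketch (a hypothetical helper, not part of TVM or TACO) that enumerates the stored `(index, value)` pairs for any combination of Dense/Sparse axes, using the `Ai_size` / `Ai_pos` / `Ai_idx` / `A_val` arrays described earlier:

```
def iterate(fmt, sizes, pos_arrs, idx_arrs, val_arr):
    """Yield (coordinates, value) for every stored element.

    fmt:      list of "Dense"/"Sparse", one entry per axis
    sizes:    Ai_size for dense axes (None for sparse axes)
    pos_arrs: Ai_pos for sparse axes (None for dense axes)
    idx_arrs: Ai_idx for sparse axes (None for dense axes)
    val_arr:  flat array of stored values
    """
    def walk(level, coords, p):
        # p is the position produced by the previous level (0 at the root);
        # once all levels are consumed, p indexes directly into val_arr.
        if level == len(fmt):
            yield tuple(coords), val_arr[p]
        elif fmt[level] == "Dense":
            for i in range(sizes[level]):
                yield from walk(level + 1, coords + [i], p * sizes[level] + i)
        else:  # "Sparse": segment [pos[p], pos[p+1]) holds this parent's children
            pos, idx = pos_arrs[level], idx_arrs[level]
            for j in range(pos[p], pos[p + 1]):
                yield from walk(level + 1, coords + [idx[j]], j)
    yield from walk(0, [], 0)

# The [Dense, Sparse] (CSR) example from above:
print(list(iterate(["Dense", "Sparse"],
                   sizes=[3, None],
                   pos_arrs=[None, [0, 3, 3, 5]],
                   idx_arrs=[None, [0, 2, 3, 0, 3]],
                   val_arr=["a", "b", "c", "d", "e"])))
# [((0, 0), 'a'), ((0, 2), 'b'), ((0, 3), 'c'), ((2, 0), 'd'), ((2, 3), 'e')]
```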

## Implementation

### Format Declaration

A tuple-like data structure can be introduced to declare the format, i.e., the sparsity of each dimension: `SparseFormat([Dense, Sparse])`
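
A minimal Python sketch of what such a declaration could look like (the enum and class names are hypothetical, not an existing TVM API):

```
from enum import Enum

class AxisKind(Enum):   # hypothetical names for the per-axis primitives
    Dense = 0
    Sparse = 1

class SparseFormat:
    """One AxisKind per tensor dimension, e.g. CSR = (Dense, Sparse)."""
    def __init__(self, kinds):
        self.kinds = tuple(kinds)

csr_format = SparseFormat([AxisKind.Dense, AxisKind.Sparse])
rsr_format = SparseFormat([AxisKind.Sparse, AxisKind.Dense])
```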

### Sparse Tensor

As a counterpart of the original dense `Tensor`, a `SparseTensor` class is a symbolic representation of a sparse tensor, used during sparse code generation and composed of `pos_arrs`, `idx_arrs`, and `val_arr`.
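
A rough Python-level sketch of the fields such a class might carry (hypothetical, only to make the composition concrete):

```
class SparseTensor:
    """Symbolic sparse tensor: shape + per-axis format + backing dense buffers."""
    def __init__(self, shape, sformat, pos_arrs, idx_arrs, val_arr):
        self.shape = shape        # symbolic extents, as for a dense Tensor
        self.sformat = sformat    # SparseFormat describing each axis
        self.pos_arrs = pos_arrs  # one pos buffer per sparse axis
        self.idx_arrs = idx_arrs  # one idx buffer per sparse axis
        self.val_arr = val_arr    # flat buffer of stored values
```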

### DSL Enhancement

To enhance the existing DSL with the ability to declare sparse computation, here are some approaches we can try.

- Option 1: Adding New Sparse Operations

```
# demo code snippet
import tvm.sparse as tvmsp

# declare sparse format
in_sformat = tvmsp.sparse_format([Dense, Sparse])
# declare dense format
out_sformat = tvmsp.sparse_format([Dense, Dense])

# computation declaration
n = tvm.var("n")
m = tvm.var("m")
A = tvmsp.placeholder((m, n), sformat=in_sformat, name='A')
B = tvmsp.compute(A.shape, lambda i, j: A[i, j], sformat=out_sformat, name='B')
s = tvm.create_schedule(B.op)
ir = tvm.lower(s, [A, B], simple_mode=True)
```

This approach adds new operators such as `SparsePlaceholder` and `SparseComputeOp`, which carry an additional `sformat` field compared with the original operations.


- Option 2: Enhancing `decl_buffer`

```
import tvm.sparse as tvmsp

# declare sparse format for the input buffer
in_sformat = tvmsp.sparse_format([Dense, Sparse])
# declare dense format for the output buffer
out_sformat = tvmsp.sparse_format([Dense, Dense])

# computation declaration
n = tvm.var("n")
m = tvm.var("m")
A = tvm.placeholder((m, n), name='A')
B = tvm.compute(A.shape, lambda i, j: A[i, j], name='B')
s = tvm.create_schedule(B.op)

Ab = tvm.decl_buffer(A.shape, A.dtype, sformat=in_sformat, name="Ab")
Bb = tvm.decl_buffer(B.shape, B.dtype, sformat=out_sformat, name="Bb")

ir = tvm.lower(s, [A, B], binds={A: Ab, B: Bb}, simple_mode=True)
print(ir)

```

We can also enhance `decl_buffer` so that users can declare a sparse buffer and bind it to a tensor while building. Note that this approach separates the sparsity property from the computation declaration. It sounds appealing that we can represent dense and sparse computation with the same declaration, but it also means that we do not have this information until scheduling.


## Extension for Scheduling

TODO




Re: [apache/incubator-tvm] [RFC] Support for Sparse Computation (#4332)

Posted by Chien-Yu Lin <no...@github.com>.
Thanks @ZihengJiang for bringing up the RFC.
One question: it seems that in this RFC the axis that stores the data values is always the last axis.
Am I right?
And for the axes that do not store values, like axis 0 in the example, will there be different formats to represent them?


Re: [apache/incubator-tvm] [RFC] Support for Sparse Computation (#4332)

Posted by Sheng Zha <no...@github.com>.
cc @eric-haibin-lin @sxjscience 


Re: [apache/incubator-tvm] [RFC] Support for Sparse Computation (#4332)

Posted by Tianqi Chen <no...@github.com.INVALID>.
Closed #4332.


Re: [apache/incubator-tvm] [RFC] Support for Sparse Computation (#4332)

Posted by Haibin Lin <no...@github.com>.
Thanks for the proposal. In scipy.sparse, `csr_matrix` has a flag indicating whether the column indices are sorted within each row. Such a state may affect the performance of certain operations on the matrix. Will this information be stored in the proposed format?
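
For reference, a small sketch of the flag in question, using scipy's `has_sorted_indices` / `sort_indices` API:

```
import numpy as np
from scipy.sparse import csr_matrix

data = np.array([1.0, 2.0])
indices = np.array([3, 0])      # column indices of row 0, deliberately unsorted
indptr = np.array([0, 2])
m = csr_matrix((data, indices, indptr), shape=(1, 4))

print(m.has_sorted_indices)     # falsy: columns within the row are out of order
m.sort_indices()                # sorts indices in place and updates the flag
print(m.has_sorted_indices)     # truthy
```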


Re: [apache/incubator-tvm] [RFC] Support for Sparse Computation (#4332)

Posted by ziheng <no...@github.com>.
Comments and discussion are welcome! @cylinbao @yuluny2 @tmoreau89 @Huyuwei @tqchen


Re: [apache/incubator-tvm] [RFC] Support for Sparse Computation (#4332)

Posted by Zhao Wu <no...@github.com>.
cc @sf-wind  


Re: [apache/incubator-tvm] [RFC] Support for Sparse Computation (#4332)

Posted by Thierry Moreau <no...@github.com>.
Those are all great questions @liangfu. 

Question 3 is an interesting one w.r.t. what kinds of scheduling primitives we'll need for sparse operators. One easy workaround is to apply vectorization along a dense tensor dimension if there is one. For many practical examples, tensors won't be sparse along all dimensions, but the question gets trickier when that assumption does not hold.

And due to the dynamism of the varying idx/val arrays, this can create an interesting discussion around how autoTVM will be used on more dynamic operators. Right now TOPI is built around the assumption that shapes are known statically. This will be an interesting implementation challenge, which probably deserves its own RFC. Perhaps @ZihengJiang the `Extension for Scheduling` section can point to a follow-up RFC?


Re: [apache/incubator-tvm] [RFC] Support for Sparse Computation (#4332)

Posted by Liangfu Chen <no...@github.com>.
Thanks @ZihengJiang for bringing up the RFC, especially the in-depth thinking of bringing in the representation from TACO.
I think we should also address some detailed issues in dealing with sparse tensors.

1. How shall we implement `SparsePlaceholder` with varying lengths of the `idx` and `val` arrays?
2. As TVM's current `ComputeOp` doesn't support computation over variable-length vectors, how do we bring this into `SparseComputeOp`?
3. Have you considered how to perform `Vectorize` and `Tensorize` operations on sparse tensors? The lowering steps would be very different from those for dense tensors, and we might need to maintain `masks` for performing `Vectorize`, along with variable-length vectors like `indices` and `values`.
4. Do we support automatic sparsity regularization in this RFC, or just inference with existing sparse tensors? If we only support inference, how shall we import existing sparse tensors from existing frameworks to demonstrate the capability?
5. Which layout shall we start with, NHWC or NCHW?
6. I think we should keep the `SparseTensor` operators easy to quantize with the existing code base, at least with small modifications.

Another challenge we have to consider is that it's hard to introduce sparsity into depth-wise convolution operators, while depth-wise convolution is very useful in modern neural networks. It will be very challenging to support **sparse** depth-wise convolution.
