Posted to dev@tvm.apache.org by Yulun Yao <no...@github.com> on 2019/08/08 03:42:31 UTC

[dmlc/tvm] [Relay][RFC] Relay support for Sparse Tensor (#3731)

### Motivation
With new sparse operators being added to TVM on a weekly basis, we would like to propose a change that addresses the current Relay type system's lack of expressiveness for sparse tensors.

Currently, we represent a CSR matrix with `namedtuple("CSR", ["data", "indices", "indptr"])`. The problem is that this representation is not only unnatural but also loses the shape of the matrix: only the first dimension can be inferred from the tuple (from the length of the index-pointer array), while the number of columns cannot be recovered at all.
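For concreteness, here is a minimal sketch of this representation and of exactly what it loses (the example matrix is illustrative; the field names follow scipy's convention):

```python
from collections import namedtuple
import numpy as np

# The current Relay-side representation: three flat arrays, no explicit shape.
CSR = namedtuple("CSR", ["data", "indices", "indptr"])

# The 2x4 matrix [[0, 5, 0, 0], [0, 0, 7, 0]] in CSR form:
mat = CSR(
    data=np.array([5.0, 7.0]),
    indices=np.array([1, 2]),    # column index of each nonzero
    indptr=np.array([0, 1, 2]),  # row i spans data[indptr[i]:indptr[i+1]]
)

num_rows = len(mat.indptr) - 1   # recoverable: 2
# The number of columns is NOT recoverable: max(indices) + 1 == 3 only
# bounds it from below, while the true width is 4.
```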

### Proposed Design
A `TensorTypeNode` has two fields: `shape` and `dtype`. We would like to add a third field, `format`, representing the sparse format the tensor uses. It works like `dtype`, so it is effectively just a tag: we do not have to explicitly handle or define each data format at the Relay level. Instead, we pass the `format` down to TVM codegen and let codegen handle it. This would give us a unified expression of sparse models/tensors at the Relay level and would make shape inference easy.
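A sketch of how this could look from the Python frontend (the `format` keyword is the proposed addition and does not exist in Relay today; `"dense"` as the implicit default is an assumption):

```python
from tvm import relay

# Today: a tensor type carries only shape and dtype.
dense_ty = relay.TensorType((16, 32), "float32")

# Proposed: one extra dtype-like tag that is passed through to codegen.
# (The `format` argument is hypothetical; it is what this RFC adds.)
sparse_ty = relay.TensorType((16, 32), "float32", format="CSR")

# Shape inference is unaffected: sparse_ty.shape is still [16, 32],
# so type checking and shape propagation work exactly as before.
x = relay.var("x", sparse_ty)
```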

### Note
I am trying to push TVM support for graph neural networks and other sparse models. This is a huge design decision, and I don't have much background in programming languages, so feel free to set my proposal aside and propose your own design. We would very much appreciate the community's ideas and feedback.


Re: [dmlc/tvm] [Relay][RFC] Relay support for Sparse Tensor (#3731)

Posted by 雾雨魔理沙 <no...@github.com>.
It looks good to me.
Another design decision would be to make no change to the type system at all, but instead move this field into the scheduling language and allow AutoTVM to search for the best schedule.
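A rough illustration of that alternative, in which the format is just one more AutoTVM knob rather than part of the type (the template below is a hypothetical skeleton, not working code):

```python
from tvm import autotvm

@autotvm.template
def sparse_dense(m, n, k):
    # The type system stays unchanged; the storage format becomes one
    # more tunable decision inside the schedule search space.
    cfg = autotvm.get_config()
    cfg.define_knob("sparse_format", ["csr", "bsr", "coo"])
    if cfg["sparse_format"].val == "csr":
        pass  # build the CSR compute + schedule here
    elif cfg["sparse_format"].val == "bsr":
        pass  # build the BSR compute + schedule here
    else:
        pass  # build the COO compute + schedule here
    # a real template would return (schedule, arg_bufs)
```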


Re: [dmlc/tvm] [Relay][RFC] Relay support for Sparse Tensor (#3731)

Posted by Zihao Ye <no...@github.com>.
@yuluny2 Hi, glad to hear that you plan to support sparse tensors. I think this is a good starting point for the DGL team to collaborate with you; there are a lot of opportunities for TVM to search for the best schedules for sparse matrix operations. It would be great if Relay were powerful enough that user-defined message/reduce/apply_nodes(edges)/... functions in DGL could be converted to Relay.

I also agree that you should not couple sparse tensors and dense tensors. Sparse tensors usually act as adjacency matrices in graph neural networks, while dense tensors act as features in most cases. They are quite different, and there is no need to unify them.

Yes, CSR alone is not enough: you have no access to the number of columns, and at a minimum you should maintain the attributes scipy does: https://docs.scipy.org/doc/scipy/reference/generated/scipy.sparse.csr_matrix.html. Note that there might be multiple values corresponding to one position in a sparse matrix (this is the case in a multigraph, where there can be more than one edge between the same two nodes).
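A small scipy example illustrating both points: the explicit `shape` attribute carries the column count that a bare (data, indices, indptr) triple loses, and duplicate entries at one position (the multigraph case) are silently summed by the standard CSR conversion:

```python
import numpy as np
from scipy.sparse import csr_matrix

# Two values at the same position (0, 1): think of two parallel edges
# between the same pair of nodes in a multigraph.
rows = np.array([0, 0, 1])
cols = np.array([1, 1, 2])
vals = np.array([3.0, 4.0, 5.0])
a = csr_matrix((vals, (rows, cols)), shape=(2, 4))

print(a.shape)      # (2, 4): the column count is stored explicitly
print(a.toarray())  # [[0. 7. 0. 0.], [0. 0. 5. 0.]]: the parallel edges
                    # were merged (3 + 4 = 7), losing the multigraph info
```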

My concern is whether we should allow the user to define a favoured format for each sparse tensor. For different combinations of sparse matrices (different density, different degree variance, ...) and operations (SpMM, sparse attention, sparse softmax, and so on), the best parallel strategy and the sparse format it requires are quite different. Though it would be great if AutoTVM could search for the best format, a user-given format would make the work much easier.


Re: [dmlc/tvm] [Relay][RFC] Relay support for Sparse Tensor (#3731)

Posted by Thierry Moreau <no...@github.com>.
@ZihengJiang @tqchen @jroesch @MarisaKirisame @yidawang @zheng-da @ajtulloch 


Re: [dmlc/tvm] [Relay][RFC] Relay support for Sparse Tensor (#3731)

Posted by Junru Shao <no...@github.com>.
One thing I don’t quite get is: why couple sparse and dense tensors into a single type?

When we are building a system incrementally, for example adding a new feature on top of an old system, the better way is to add a new code path instead of hacking the old ones. This allows a clearer separation of functionality, clearer error messages, and easier debugging.

In our particular case: what if some code paths do not handle the extra field `format`? For example, suppose we forget to make code generation for some unusual device aware of the extra field. It would probably fail silently (or, even worse, not fail at all but produce wrong results) without a proper error message.

So my point is: if possible, it is always preferable to add a new type instead of hacking old ones, because we cannot guarantee that the rest of the system still treats the hacked type correctly.
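A toy model of the failure mode (plain Python, not TVM code; both the `rewrite` pass and the `SparseTensorType` alternative are hypothetical):

```python
from dataclasses import dataclass

# Option A, as in the RFC: one type with an extra field.
@dataclass
class TensorType:
    shape: tuple
    dtype: str
    format: str = "dense"

def rewrite(ty):
    # A pass written before `format` existed: it rebuilds the type from
    # shape and dtype only, so sparsity is dropped without any error.
    if isinstance(ty, TensorType):
        return TensorType(ty.shape, ty.dtype)
    # Option B, a separate SparseTensorType, would land here and fail loudly.
    raise NotImplementedError(type(ty))

sparse = TensorType((4, 4), "float32", format="CSR")
print(rewrite(sparse).format)  # prints "dense": the silent corruption above
```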


Re: [dmlc/tvm] [Relay][RFC] Relay support for Sparse Tensor (#3731)

Posted by Yulun Yao <no...@github.com>.
Thank you for all the feedback! @junrushao1994 @yzh119 @MarisaKirisame 

Sorry for the ambiguity in my proposal. I want to make some clarifications and answer some of the questions.

When we declare a `relay.var(TensorType(shape=(m, n), format='CSR'))`, we would lower it to a `tvm.placeholder(shape=(m, n), sformat=SparseFormat([Dense, Sparse]))`, which is WIP by @ZihengJiang. We adopt the representation used in the [TACO compiler](https://github.com/tensor-compiler/taco), in which the user can easily register `CSR = SparseFormat([Dense, Sparse])`. For each TOPI operator, we can then define computations and generate schedules for each sparse format (maybe add an interface for registering ops & schedules?).

The benefit of having an extra field in the Relay `TensorType` is that we can easily infer format and shape information on the Relay end. Regarding the trade-off between a new type and a new field: I wonder whether we want to maintain some consistency between the Relay and TVM tensor representations. If that's the case, a new type might be preferable.
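To make the lowering concrete, a sketch with TACO-style per-dimension level formats (every name here, including `SparseFormat`, `Dense`, `Sparse`, and the `format`/`sformat` keywords, belongs to the proposal or the WIP branch, not to current TVM):

```python
# TACO composes a format from per-dimension level types; CSR is a dense
# row dimension followed by a compressed (sparse) column dimension.
Dense, Sparse = "dense", "sparse"

def SparseFormat(levels):
    """A format is an ordered list of per-dimension level types."""
    return tuple(levels)

CSR = SparseFormat([Dense, Sparse])     # dense rows, compressed columns
DCSR = SparseFormat([Sparse, Sparse])   # both dimensions compressed

# Proposed Relay declaration and its TE-level lowering (hypothetical API):
#   x = relay.var("x", relay.TensorType((m, n), "float32", format="CSR"))
#   p = tvm.placeholder((m, n), name="x", sformat=CSR)
```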


Re: [dmlc/tvm] [Relay][RFC] Relay support for Sparse Tensor (#3731)

Posted by Junru Shao <no...@github.com>.
Nice to see proposals on sparse format.

There are design trade-offs between “adding extra fields” and “designing a new type”.
