Posted to dev@tvm.apache.org by Tianqi Chen <no...@github.com> on 2019/04/17 19:03:39 UTC

[dmlc/tvm] [RFC][DISCUSS] Tuple-related Fusion (#3039)

Recently, a few problems have arisen with respect to the fusor and tuples. The main problem is the incompatibility of the calling convention when dealing with tuples.

- At the lowest level, the primitive function is not aware of tuples; we simply flatten everything, including inputs and outputs, when there is a tuple input or a tuple return value.
- The function that calls into the primitive function, on the other hand, is aware of tuples (see the sketch below).
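
For illustration, here is a minimal sketch of the low-level convention. It uses the 2019-era low-level tvm API (tvm.placeholder / tvm.compute, which do not appear elsewhere in this thread), and the names `fused_log_exp`, `y0`, `y1` are invented for the example: a primitive that produces two results is built as one packed function over the flattened argument list, and nothing in its signature knows about tuples.

```
import tvm

x = tvm.placeholder((1, 32, 32, 3), name="x")
y0 = tvm.compute(x.shape, lambda *i: tvm.log(x(*i)), name="y0")
y1 = tvm.compute(x.shape, lambda *i: tvm.exp(x(*i)), name="y1")
s = tvm.create_schedule([y0.op, y1.op])
# The lowered primitive takes (x, y0, y1) positionally; the tuple structure
# only exists in the Relay-level caller that invokes it.
fused = tvm.build(s, [x, y0, y1], target="llvm", name="fused_log_exp")
```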

This creates tension around the point at which we should run the fusion algorithm. Originally, both Tuple and TupleGetItem were opaque, which means we should not fuse them. Then, over time, we started to fuse tuple-related nodes because they are useful as **intermediate values** and can be optimized away in primitive operations.
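
As a concrete example of a tuple that is only an intermediate value (a minimal sketch using the same relay API as the test further down this thread), a tuple that merely feeds concatenate never crosses a call boundary and can be fused away inside a single injective primitive:

```
import tvm
from tvm import relay

data = relay.var("data", relay.ty.TensorType((1, 32, 32, 3), "float32"))
# The tuple below only exists to feed concatenate; after fusion it is an
# intermediate value inside one primitive function, not a return value.
out = relay.concatenate(relay.Tuple(tvm.convert([relay.log(data), relay.exp(data)])), axis=3)
func = relay.ir_pass.infer_type(relay.Function([data], out))
with relay.build_config(opt_level=3):
    graph, lib, params = relay.build(func, target="llvm")
```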

However, a tuple is bad as a return value and can cause a bunch of problems for low-level codegen. In particular, we can get functions like the following (note the duplication in the return values):

```
fn (%data: Tensor[(1, 32, 32, 3), float32]) -> (Tensor[(1, 32, 32, 3), float32], Tensor[(1, 32, 32, 3), float32]) {
  %3 = fn (%p0: Tensor[(1, 32, 32, 3), float32], __dict__=meta[StrMap][0]) -> (Tensor[(1, 32, 32, 3), float32], Tensor[(1, 32, 32, 3), float32]) {
    %0 = log(%p0)
    %1 = (%0, %0)
    %1
  }
  %3 = %2(%data)
  %3
}
```

Ideally, what we want is to fuse a TupleNode into its follow-up consumer nodes when it is an intermediate node, but not when it is the return value. That way we won't suffer from this problem (as a TupleNode itself cannot be the final master node).

Let us use this thread for some consolidated discussion on the related topics.
 


-- 
You are receiving this because you are subscribed to this thread.
Reply to this email directly or view it on GitHub:
https://github.com/dmlc/tvm/issues/3039

Re: [dmlc/tvm] [RFC][DISCUSS] Tuple-related Fusion (#3039)

Posted by Tianqi Chen <no...@github.com>.
implemented in #3092 

-- 
You are receiving this because you are subscribed to this thread.
Reply to this email directly or view it on GitHub:
https://github.com/dmlc/tvm/issues/3039#issuecomment-487437651

Re: [dmlc/tvm] [RFC][DISCUSS] Tuple-related Fusion (#3039)

Posted by 雾雨魔理沙 <no...@github.com>.
@tqchen where is %2?

-- 
You are receiving this because you are subscribed to this thread.
Reply to this email directly or view it on GitHub:
https://github.com/dmlc/tvm/issues/3039#issuecomment-484229709

Re: [dmlc/tvm] [RFC][DISCUSS] Tuple-related Fusion (#3039)

Posted by masahi <no...@github.com>.
@zhiics It looks like the tuple with duplicated tensors is only problematic if it is the return value of a subfunction (i.e. a function that is lowered to topi and compiled by TVM). If we lift the tuple out of the subfunction and put it under the global function, it seems to work fine. The test below works on my local machine.

```
import tvm
from tvm import relay

# Build a function that returns the same tensor twice in a tuple.
data = relay.var("data", relay.ty.TensorType((1, 32, 32, 3), "float32"))
log = relay.log(data)
func = relay.Function([data], relay.Tuple(tvm.convert([log, log])))
func = relay.ir_pass.infer_type(func)
# opt_level=3 enables fusion (among other optimizations).
with relay.build_config(opt_level=3):
    graph, lib, params = relay.build(func, target="llvm")
```

The tuple is now lifted out of the subfunction %0:
```
fn (%data: Tensor[(1, 32, 32, 3), float32]) -> (Tensor[(1, 32, 32, 3), float32], Tensor[(1, 32, 32, 3), float32]) {
  %0 = fn (%p0: Tensor[(1, 32, 32, 3), float32], __dict__=meta[StrMap][0]) -> Tensor[(1, 32, 32, 3), float32] {
    log(%p0)
  }
  %1 = %0(%data)
  (%1, %1)
}
```

-- 
You are receiving this because you are subscribed to this thread.
Reply to this email directly or view it on GitHub:
https://github.com/dmlc/tvm/issues/3039#issuecomment-484382985

Re: [dmlc/tvm] [RFC][DISCUSS] Tuple-related Fusion (#3039)

Posted by Zhi <no...@github.com>.
@junrushao1994 

> @tqchen where is %2?

There might be some code omitted, but the idea is to show the problem when dealing with duplicate values in return tuples.

> why is the example bad for codegen

The output tensor is scheduled twice in compile_engine here:
https://github.com/dmlc/tvm/blob/552d4aa3587aa3a0443d050ceb324386489ff793/src/relay/backend/compile_engine.cc#L128

There are some problems and discussions in #2932 

-- 
You are receiving this because you are subscribed to this thread.
Reply to this email directly or view it on GitHub:
https://github.com/dmlc/tvm/issues/3039#issuecomment-484313254

Re: [dmlc/tvm] [RFC][DISCUSS] Tuple-related Fusion (#3039)

Posted by masahi <no...@github.com>.
@zhiics can you replace [this block](https://github.com/dmlc/tvm/blob/master/src/relay/pass/fuse_ops.cc#L822-L848) with this

```
  Expr VisitExpr_(const TupleNode* tuple) {
    auto* ret_group = gmap_.at(tuple)->FindRoot();
    if (ret_group == gmap_.at(tuple)) {
      // The tuple is the root of its group: keep the default behavior.
      return ExprMutator::VisitExpr_(tuple);
    }
    // This tuple is an intermediate node in the group.
    Array<Expr> new_fields = GetNewArguments(tuple->fields, ret_group);
    return TupleNode::make(new_fields);
  }
```

and try your PR?

-- 
You are receiving this because you are subscribed to this thread.
Reply to this email directly or view it on GitHub:
https://github.com/dmlc/tvm/issues/3039#issuecomment-484367143

Re: [dmlc/tvm] [RFC][DISCUSS] Tuple-related Fusion (#3039)

Posted by masahi <no...@github.com>.
We should be aware that if we disable tuple fusion when the tuple is the return value, we might lose some efficiency gains.

So this function 
```
fn (%x: Tensor[(64, 64), float32]) -> (Tensor[(32, 64), float32], Tensor[(32, 64), float32]) {
  %2 = fn (%p0: Tensor[(64, 64), float32], __dict__=meta[StrMap][0]) -> (Tensor[(32, 64), float32], Tensor[(32, 64), float32]) {
    %0 = strided_slice(%p0, begin=[0, 0], end=[32, 64], strides=[1, 1])
    %1 = strided_slice(%p0, begin=[32, 0], end=[64, 64], strides=[1, 1])
    (%0, %1)
  }
  %2(%x)
}
```

will become this
```
fn (%x: Tensor[(64, 64), float32]) -> (Tensor[(32, 64), float32], Tensor[(32, 64), float32]) {
  %0 = fn (%p0: Tensor[(64, 64), float32], __dict__=meta[StrMap][0]) -> Tensor[(32, 64), float32] {
    strided_slice(%p0, begin=[0, 0], end=[32, 64], strides=[1, 1])
  }
  %1 = %0(%x)
  %2 = fn (%p01: Tensor[(64, 64), float32], __dict__=meta[StrMap][1]) -> Tensor[(32, 64), float32] {
    strided_slice(%p01, begin=[32, 0], end=[64, 64], strides=[1, 1])
  }
  %3 = %2(%x)
  (%1, %3)
}
```

-- 
You are receiving this because you are subscribed to this thread.
Reply to this email directly or view it on GitHub:
https://github.com/dmlc/tvm/issues/3039#issuecomment-484373366

Re: [dmlc/tvm] [RFC][DISCUSS] Tuple-related Fusion (#3039)

Posted by masahi <no...@github.com>.
We can update TOPI to prevent scheduling the same op twice. This is partially done in #1556.
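
A minimal sketch of that guard (hypothetical helper, not the actual change in #1556): remember which ops have already been scheduled and skip repeats, so a tensor that appears twice in the output tuple is only scheduled once.

```
def schedule_unique(s, outs, schedule_one):
    # `schedule_one(s, out)` is a placeholder for the per-op scheduling logic.
    scheduled_ops = set()
    for out in outs:
        if out.op in scheduled_ops:
            # Duplicated tensor in the output tuple; it is already scheduled.
            continue
        scheduled_ops.add(out.op)
        schedule_one(s, out)
```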

-- 
You are receiving this because you are subscribed to this thread.
Reply to this email directly or view it on GitHub:
https://github.com/dmlc/tvm/issues/3039#issuecomment-484324430

Re: [dmlc/tvm] [RFC][DISCUSS] Tuple-related Fusion (#3039)

Posted by Tianqi Chen <no...@github.com>.
Closed #3039.

-- 
You are receiving this because you are subscribed to this thread.
Reply to this email directly or view it on GitHub:
https://github.com/dmlc/tvm/issues/3039#event-2305608583

Re: [dmlc/tvm] [RFC][DISCUSS] Tuple-related Fusion (#3039)

Posted by Tianqi Chen <no...@github.com>.
One possible way is to make the fusor smarter: forbid fusing a node into a TupleNode if that tuple is the last node in the group, but allow fusing the TupleNode into subsequent injective ops. Then, in the second iteration, the other node can fuse into the tuple node because the group no longer has the tuple as its return value.

-- 
You are receiving this because you are subscribed to this thread.
Reply to this email directly or view it on GitHub:
https://github.com/dmlc/tvm/issues/3039#issuecomment-484368962

Re: [dmlc/tvm] [RFC][DISCUSS] Tuple-related Fusion (#3039)

Posted by masahi <no...@github.com>.
It seems #2412 is also related to this issue. @srkreddy1238 [mentioned](https://github.com/dmlc/tvm/pull/2412#issuecomment-453912379) that enabling fusion caused an LLVM error.

When fusion support for tuples was added in #2187, I handled the cases where
* the tuple is the root of its group
* the tuple is fused with other ops before it

by making a new function that returns the tuple (see [my comment](https://github.com/dmlc/tvm/pull/2187#issuecomment-442699052)). Is this what needs to be changed? @tqchen @zhiics @vinx13 ?

-- 
You are receiving this because you are subscribed to this thread.
Reply to this email directly or view it on GitHub:
https://github.com/dmlc/tvm/issues/3039#issuecomment-484362238

Re: [dmlc/tvm] [RFC][DISCUSS] Tuple-related Fusion (#3039)

Posted by Tianqi Chen <no...@github.com>.
@masahi In your example of the two strided_slice ops, given that we do not yet cooperatively fuse the two parallel strided_slice ops together at the moment, the second version is mostly as efficient as the first one.

-- 
You are receiving this because you are subscribed to this thread.
Reply to this email directly or view it on GitHub:
https://github.com/dmlc/tvm/issues/3039#issuecomment-484570960

Re: [dmlc/tvm] [RFC][DISCUSS] Tuple-related Fusion (#3039)

Posted by masahi <no...@github.com>.
@zhiics I don't know of a solution for your problem that doesn't break other parts (like the calling convention, as you and @vinx13 said). I'll try fixing the fuser so that a tuple like (%0, %0) won't happen.

-- 
You are receiving this because you are subscribed to this thread.
Reply to this email directly or view it on GitHub:
https://github.com/dmlc/tvm/issues/3039#issuecomment-484360302

Re: [dmlc/tvm] [RFC][DISCUSS] Tuple-related Fusion (#3039)

Posted by Zhi <no...@github.com>.
@masahi I see, thanks. Another option is probably using a copy operator if there are duplicates.
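
A minimal sketch of that idea (assuming the relay copy operator and the same API as masahi's test above): wrap repeated fields in an explicit copy so the returned tuple no longer holds two references to the same tensor.

```
import tvm
from tvm import relay

data = relay.var("data", relay.ty.TensorType((1, 32, 32, 3), "float32"))
log = relay.log(data)
# The second field is an explicit copy, so the tuple fields are distinct tensors.
func = relay.Function([data], relay.Tuple(tvm.convert([log, relay.copy(log)])))
func = relay.ir_pass.infer_type(func)
```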

-- 
You are receiving this because you are subscribed to this thread.
Reply to this email directly or view it on GitHub:
https://github.com/dmlc/tvm/issues/3039#issuecomment-484385353

Re: [dmlc/tvm] [RFC][DISCUSS] Tuple-related Fusion (#3039)

Posted by masahi <no...@github.com>.
@tqchen I'm looking into the following approach:

* Initially, the tuple and its fields are marked as Injective, so that they can be fused as much as possible.
* Later, if we detect that a tuple is the root of its group in [this line](https://github.com/dmlc/tvm/blob/master/src/relay/pass/fuse_ops.cc#L825), we do all the necessary patchwork to make each tuple field become the root of its own group and return the tuple itself as an isolated node.

I think this approach requires minimal code change. The patchwork only involves updating some pointers (parent, root_ref, etc.).

-- 
You are receiving this because you are subscribed to this thread.
Reply to this email directly or view it on GitHub:
https://github.com/dmlc/tvm/issues/3039#issuecomment-484388597

Re: [dmlc/tvm] [RFC][DISCUSS] Tuple-related Fusion (#3039)

Posted by Zhi <no...@github.com>.
@masahi Can we prevent passing duplicated tensors instead? It looks like we would otherwise need to change all schedules for all targets in topi, right?
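
A minimal sketch of what that alternative could look like (hypothetical; not the actual compile_engine code): collapse duplicates before handing the outputs to the topi schedule, and keep an index map so the original tuple can still be reconstructed.

```
def dedup_outputs(outs):
    # Collapse duplicated tensors before scheduling; `index_of` records which
    # unique tensor each original tuple field maps back to.
    unique, index_of = [], []
    for t in outs:
        if t not in unique:
            unique.append(t)
        index_of.append(unique.index(t))
    return unique, index_of
```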

-- 
You are receiving this because you are subscribed to this thread.
Reply to this email directly or view it on GitHub:
https://github.com/dmlc/tvm/issues/3039#issuecomment-484340450

Re: [dmlc/tvm] [RFC][DISCUSS] Tuple-related Fusion (#3039)

Posted by Zhi <no...@github.com>.
BTW, I am not certain that stopping the fusion of the return tuple will fully solve the problem, because it looks to me like we will still have two identical tensors in the tuple, right? Am I missing something?

-- 
You are receiving this because you are subscribed to this thread.
Reply to this email directly or view it on GitHub:
https://github.com/dmlc/tvm/issues/3039#issuecomment-484376563