Posted to dev@tvm.apache.org by 雾雨魔理沙 <no...@github.com> on 2019/10/03 17:42:45 UTC

[dmlc/tvm] [RFC][Relay] Multiple Entries & Incremental compilation (#4054)

I am trying to train MNIST using TVM, and I hit an issue:
There are two functions, loss and infer: one calculates the loss and gradient, and the other just does inference.
However, create_executor/aot/vm all take only a single entry point. If there are multiple entry points, the passes will be called multiple times.
Furthermore, I think the notion of 'compile then execute' is not enough for all deep learning workloads. For example, using our partial evaluator to specialize on the training/validation/testing data means we can compile only after we have loaded all the data. Even if we accept that, since PE is slow, a client-side optimization would be to pipeline the PEs: run them per minibatch at epoch 0, then reuse the results. Or have a thread that constantly partially evaluates batches and falls back to non-PE mode for any batch it has not processed yet. Another example is NAS/AdaNet, where the network structure changes all the time.
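
To make the two-entry setup concrete, here is a minimal sketch of the kind of module I mean (a toy stand-in for the MNIST model; relay.transform.gradient is the real API for deriving gradients, everything else is standard Relay):

```python
import tvm
from tvm import relay

# One IRModule holding both entry points.
mod = tvm.IRModule()

x = relay.var("x", shape=(1, 784))
w = relay.var("w", shape=(10, 784))
mod["infer"] = relay.Function([x, w], relay.nn.dense(x, w))

x2 = relay.var("x", shape=(1, 784))
w2 = relay.var("w", shape=(10, 784))
y = relay.var("y", shape=(1, 10))
diff = relay.nn.dense(x2, w2) - y  # squared error as a stand-in loss
mod["loss"] = relay.Function([x2, w2, y], relay.sum(diff * diff))
# (In practice the gradient would be derived with relay.transform.gradient.)

# create_executor and friends evaluate a single entry point, so using both
# functions means driving the whole pass pipeline once per entry.
```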

I think we can make the passes incremental (each invocation is given only the set of newly added definitions), and have the module keep track of which definitions are compiled versus uncompiled.
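
A rough sketch of the bookkeeping I have in mind (hypothetical: only IRModule.get_global_vars is existing API; the wrapper and run_passes_incrementally are made up for illustration):

```python
# Hypothetical sketch: a wrapper that tracks compiled vs. uncompiled definitions.
class IncrementalModule:
    def __init__(self, mod):
        self.mod = mod          # a tvm.IRModule
        self.compiled = set()   # name_hints of already-compiled global functions

    def compile_new(self, target="llvm"):
        all_names = {gv.name_hint for gv in self.mod.get_global_vars()}
        new = all_names - self.compiled
        if new:
            # Hypothetical hook: the passes are handed only the new definitions.
            run_passes_incrementally(self.mod, new, target)
            self.compiled |= new
```
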
@jroesch @tqchen @zhiics @wweic @vinx13 @junrushao1994 what do you guys think?

https://github.com/dmlc/tvm/issues/4054

Re: [dmlc/tvm] [RFC][Relay] Multiple Entries & Incremental compilation (#4054)

Posted by Junru Shao <no...@github.com>.
Let's get back to the original topic, which is broader imo.

First of all, depending on your scenario, incremental compilation may or may not be doable; on edge devices, for example, there is only room for the TVM runtime, not the compiler.

Then, I am actually in favor of incremental compilation, or some profiling-guided approach. I do think it is inevitable for training, especially as people become increasingly interested in more flexible models.

However, it is not quite clear to me whether we need multiple entries. Can we do better than a single module? Maybe @comaniac can offer some thoughts.

BTW, in the TreeLSTM example, I am curious why we need full code replication that depends on the training/dev/test data. It seems possible to do it partially, which would be more controllable.

https://github.com/dmlc/tvm/issues/4054#issuecomment-538160033

Re: [dmlc/tvm] [RFC][Relay] Multiple Entries & Incremental compilation (#4054)

Posted by 雾雨魔理沙 <no...@github.com>.
@junrushao1994 It is only possible for LSTM. For TreeLSTM, if you do that, the code blows up exponentially (one specialized version per tree shape), and a lot of time will be spent testing the match against all the cases, unless some tricks are used (for example, a decision tree for pattern matching instead of a linear scan).
Still, this does not allow batching for TreeLSTM.

https://github.com/dmlc/tvm/issues/4054#issuecomment-538183244

Re: [dmlc/tvm] [RFC][Relay] Multiple Entries & Incremental compilation (#4054)

Posted by 雾雨魔理沙 <no...@github.com>.
@junrushao1994 We unroll the control flow to be more efficient.
Maybe multiple modules could work, but then there cannot be any code sharing between them.

https://github.com/dmlc/tvm/issues/4054#issuecomment-538178832

Re: [dmlc/tvm] [RFC][Relay] Multiple Entries & Incremental compilation (#4054)

Posted by Tianqi Chen <no...@github.com>.
Closed #4054.

https://github.com/dmlc/tvm/issues/4054#event-2687946365

Re: [dmlc/tvm] [RFC][Relay] Multiple Entries & Incremental compilation (#4054)

Posted by Junru Shao <no...@github.com>.
I see. So for MNIST there is no such issue, but for TreeLSTM it is true that we cannot do further optimization without code replication.

https://github.com/dmlc/tvm/issues/4054#issuecomment-538158092

Re: [dmlc/tvm] [RFC][Relay] Multiple Entries & Incremental compilation (#4054)

Posted by 雾雨魔理沙 <no...@github.com>.
@junrushao1994 Yes, that is what I mean: inference mode and training mode, with each mode compiled to its own function.
We use the partial evaluator primarily to partially evaluate the control flow with respect to the data. Other frameworks also do this, but they require manual loop unrolling.
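
As a tiny illustration of what this buys us once the data is known (here just a branch condition; the pass names are today's, and the exact residual program may vary):

```python
import tvm
from tvm import relay

x = relay.var("x", shape=(1, 10))
# Data-dependent control flow: with a concrete condition, the branch can go away.
body = relay.If(relay.const(True), relay.nn.relu(x), relay.negative(x))
mod = tvm.IRModule.from_expr(relay.Function([x], body))

mod = relay.transform.PartialEvaluate()(mod)
mod = relay.transform.DeadCodeElimination()(mod)
print(mod)  # the If should be folded away, leaving straight-line code fusion can handle
```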

https://github.com/dmlc/tvm/issues/4054#issuecomment-538135922

Re: [dmlc/tvm] [RFC][Relay] Multiple Entries & Incremental compilation (#4054)

Posted by 雾雨魔理沙 <no...@github.com>.
However, that requires the ability to create multiple entries (one entry per batch). If we also want to use Relay for any form of JIT, we must be able to interleave running Relay code with adding more definitions to a Relay module.
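
The interleaving I mean looks roughly like this (entirely hypothetical: specialize, template_fn, and the executor methods are made-up names, not TVM API):

```python
# Hypothetical JIT-style loop: one specialized entry per batch.
for i, batch in enumerate(batches):
    name = "entry_%d" % i
    mod[name] = specialize(template_fn, batch)     # add a new definition...
    executor.compile_incremental(mod, new=[name])  # ...compile just that one...
    result = executor.invoke(name)                 # ...and run it immediately
```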

https://github.com/dmlc/tvm/issues/4054#issuecomment-538155989

Re: [dmlc/tvm] [RFC][Relay] Multiple Entries & Incremental compilation (#4054)

Posted by Junru Shao <no...@github.com>.
@MarisaKirisame Do you mean you want the training loop itself to be part of Relay?

https://github.com/dmlc/tvm/issues/4054#issuecomment-538140726

Re: [dmlc/tvm] [RFC][Relay] Multiple Entries & Incremental compilation (#4054)

Posted by 雾雨魔理沙 <no...@github.com>.
No. Suppose I have a TreeLSTM. Normally I cannot do any operator fusion/batching, because there is control flow everywhere. Using the partial evaluator on each batch of training data individually solves this problem.

https://github.com/dmlc/tvm/issues/4054#issuecomment-538155261

Re: [dmlc/tvm] [RFC][Relay] Multiple Entries & Incremental compilation (#4054)

Posted by Tianqi Chen <no...@github.com>.
By default we have one entry point ("main"), but there is nothing preventing us from calling into other compiled functions, as long as they are exposed as PackedFuncs. So we should already support that.
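
For example (a sketch using present-day API names; details can differ across versions), the Relay VM lets you invoke any global function in a module by name:

```python
import numpy as np
import tvm
from tvm import relay
from tvm.runtime.vm import VirtualMachine

# One IRModule with two global functions; the VM exposes both by name.
mod = tvm.IRModule()
x1 = relay.var("x", shape=(1, 10))
mod["main"] = relay.Function([x1], relay.nn.relu(x1))
x2 = relay.var("x", shape=(1, 10))
mod["double"] = relay.Function([x2], x2 + x2)

exe = relay.vm.compile(mod, target="llvm")
vm = VirtualMachine(exe, tvm.cpu())
data = tvm.nd.array(np.random.rand(1, 10).astype("float32"))
print(vm.invoke("main", data))
print(vm.invoke("double", data))
```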

Incremental compilation is supported at the level of modules: we can have one module that gets compiled and, as additional needs arise, generate new modules that can then be compiled.

Given that the topic of discussion does not yet have an actionable item, I would recommend we move the discussion to https://discuss.tvm.ai/ (for example, https://discuss.tvm.ai/c/rfc).

https://github.com/dmlc/tvm/issues/4054#issuecomment-538463177

Re: [dmlc/tvm] [RFC][Relay] Multiple Entries & Incremental compilation (#4054)

Posted by Junru Shao <no...@github.com>.
Just to clarify:

> There are two functions, loss and infer: one calculates the loss and gradient, and the other just does inference.

Do you mean that there are two modes, training and inference, so that there are multiple entry points in the Relay Module?

https://github.com/dmlc/tvm/issues/4054#issuecomment-538058861

Re: [dmlc/tvm] [RFC][Relay] Multiple Entries & Incremental compilation (#4054)

Posted by Junru Shao <no...@github.com>.
And also:

> 'compile then execute' is not enough for all deep learning workloads. For example, using our partial evaluator to specialize on the training/validation/testing data means we can compile only after we have loaded all the data.

In DL, the common practice is to specify the input shape in an ad-hoc way; for MNIST in particular, we know our input has shape `(batch_size, 784)`. For more complicated workloads, like models containing complicated control flow, I don't really think loading all the data would suffice. Compilation should probably happen at the basic-block level if, say, the IR is in CFG form (so you need a JIT).

https://github.com/dmlc/tvm/issues/4054#issuecomment-538060918

Re: [dmlc/tvm] [RFC][Relay] Multiple Entries & Incremental compilation (#4054)

Posted by Yao Wang <no...@github.com>.
Can we have a more detailed example to help clarify this issue?

https://github.com/dmlc/tvm/issues/4054#issuecomment-538060274

Re: [dmlc/tvm] [RFC][Relay] Multiple Entries & Incremental compilation (#4054)

Posted by Junru Shao <no...@github.com>.
My point is that we don't have to do full data-dependent unrolling; we can unroll a deterministic 4 or 10 steps so that the generated code stays data-independent.
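
In plain Python, the idea is something like this (the cell functions are hypothetical placeholders for the compiled kernels):

```python
K = 4  # deterministic unroll factor, chosen independently of the data

def run(state, inputs):
    # The first K steps use specialized, unrolled code...
    for x in inputs[:K]:
        state = unrolled_cell(state, x)   # hypothetical specialized kernel
    # ...and any remaining steps fall back to the generic loop.
    for x in inputs[K:]:
        state = generic_cell(state, x)    # hypothetical generic kernel
    return state
```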

https://github.com/dmlc/tvm/issues/4054#issuecomment-538179490