You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@tvm.apache.org by Yao Wang <no...@github.com> on 2020/02/28 18:56:21 UTC

[apache/incubator-tvm] [RFC] Enhance TensorFlow Frontend Control Flow Support (#4969)

### Summary
In today’s TVM TensorFlow frontend, there is only limited support for control flow, which resulting in difficult in covering TensorFlow object detection models. In this RFC, we will discuss how to improve current TVM TF frontend to fully support TensorFlow control flow.

### Solution
#### Visit TensorFlow Graph Def
Currently TVM parses TensorFlow Graph Def node by node in topological order. However, topological order might not stand any more if control flow is introduced into graph def. We do the following to alter the visiting order:

1. Visit all nodes in original graph def order and fetch all control flow nodes.
2. For all control flows nodes, modify the order so that for each while_loop, all Exit nodes sit at the end.
3. Visit all control flow nodes in modified order and generate corresponding Relay Functions. For each control flow node, we backtrack all its ancestor nodes until input nodes.
4. Visit and parse other unvisited nodes.

#### Parse Control Flow nodes
In TVM, while loop is represented as a recursive function. To convert a TensorFlow while_loop block into a Relay function, the following assumption must stand:

1. All `Exit` nodes under the same while_loop must have the same node name prefix as this while loop block, which is got with `node.name.rsplit('/', 1)[0]`. In TensorFlow 1.x, user can specify the name of while_loop, but `Exit` node prefix is automated generated with ancestor nodes’ names. 
2. `Merge`, `Switch`, `NextIteration` and `Exit` nodes with the same postfix number are belong to the same iteration body. For example, `Merge_1`, `Switch_1`, `NextIteration_1` and `Exit_1` should refer to the same iteration body. Since these four nodes are generated when user creating while_loop, the postfix number can’t be arbitrarily set.

We convert a while loop block to a Relay function with the following method:

1. When we get a `Merge` node inside a while_loop block: if it is a direct child of a `while_loop` block, we create a Loop wrapper to start adding all sub-nodes under this loop block. Otherwise, we create a complete branch op with previously stored Switch condition.
2. When we get an `Enter` node: we simply convert its input node to Relay node.
3. When we get a `LoopCond` node: we first convert its input node to Relay node. Then we set the converted node as our Loop wrapper condition.
4. When we get a `Switch` node: we first convert both its input and condition to Relay nodes. If it is a direct child of a  while_loop block, we add converted input to loop variable list of the corresponding Loop wrapper. Otherwise, we create a Branch wrapper, set its condition to be converted condition node and add it to a global branch dictionary.
5. When we get an `Exit` node: we generate Relay function for the corresponding while loop body.

### Tested models
With above changes(together with pending PR of dynamic NMS), we can convert and compile TensorFlow object detection models. The following models from TensorFlow object detection model zoo are tested: 
ssd_mobilenet_v1_coco
ssd_mobilenet_v2_coco 
ssd_resnet_50_fpn_coco
mask_rcnn_inception_v2_coco
mask_rcnn_resnet50_atrous_coco
faster_rcnn_resnet50_coco

###TODO
Investigate TensorFlow 2.0 control flow and what changes are required.

@tqchen @jroesch @srkreddy1238 @zhiics @yongwww @wweic 

-- 
You are receiving this because you are subscribed to this thread.
Reply to this email directly or view it on GitHub:
https://github.com/apache/incubator-tvm/issues/4969

Re: [apache/incubator-tvm] [RFC] Enhance TensorFlow Frontend Control Flow Support (#4969)

Posted by Zhi <no...@github.com>.
Just a reminder, to support these models we need some patches for tensor array as well. mask_rcnn seems requiring some more debugging.

-- 
You are receiving this because you are subscribed to this thread.
Reply to this email directly or view it on GitHub:
https://github.com/apache/incubator-tvm/issues/4969#issuecomment-592675406

Re: [apache/incubator-tvm] [RFC] Enhance TensorFlow Frontend Control Flow Support (#4969)

Posted by heliqi <no...@github.com>.
Have you try the nlp model? I use latest code, dynamic sahpe don't working.
recursively find the input to the control flow nodes , have some problems with fixed dynamic input.

For example, I set the 'Placeholder' op(the original shape is dynamic (-1,-1)) shape as (1,30) in the from_tensorflow interface.First, for each control flow node, we backtrack all its ancestor nodes until input nodes.But all the nodes we first backtracked do not necessarily contain 'Placeholder' node, then the shape of these nodes is dynamic.until some control node search for and get the 'Placeholder' node.

-- 
You are receiving this because you are subscribed to this thread.
Reply to this email directly or view it on GitHub:
https://github.com/apache/incubator-tvm/issues/4969#issuecomment-607026858

Re: [apache/incubator-tvm] [RFC] Enhance TensorFlow Frontend Control Flow Support (#4969)

Posted by Yao Wang <no...@github.com>.
This feature is now supported in TVM.

-- 
You are receiving this because you are subscribed to this thread.
Reply to this email directly or view it on GitHub:
https://github.com/apache/incubator-tvm/issues/4969#issuecomment-643081334

Re: [apache/incubator-tvm] [RFC] Enhance TensorFlow Frontend Control Flow Support (#4969)

Posted by masahi <no...@github.com>.
I'm not familiar with dynamic shape support in Relay, so I cannot comment on how it relates to fusion. But it sounds like a interesting problem.

-- 
You are receiving this because you are subscribed to this thread.
Reply to this email directly or view it on GitHub:
https://github.com/apache/incubator-tvm/issues/4969#issuecomment-592795910

Re: [apache/incubator-tvm] [RFC] Enhance TensorFlow Frontend Control Flow Support (#4969)

Posted by Yao Wang <no...@github.com>.
One blocking issue for dynamic NMS is that it needs to use dynamic strided_slice. However, current fusion pass can't handle dynamic shape well yet. As a result, we can't fuse dynamic strided_slice and have to temporarily mark strided_slice as Opaque. However, this will result in other normal static strided_slice not fused and cause performance issue. Maybe we need to directly handle fusion for dynamic shape. @masahi do you have any idea on how to improve this?

-- 
You are receiving this because you are subscribed to this thread.
Reply to this email directly or view it on GitHub:
https://github.com/apache/incubator-tvm/issues/4969#issuecomment-592759262

Re: [apache/incubator-tvm] [RFC] Enhance TensorFlow Frontend Control Flow Support (#4969)

Posted by Yao Wang <no...@github.com>.
Closed #4969.

-- 
You are receiving this because you are subscribed to this thread.
Reply to this email directly or view it on GitHub:
https://github.com/apache/incubator-tvm/issues/4969#event-3437024852

Re: [apache/incubator-tvm] [RFC] Enhance TensorFlow Frontend Control Flow Support (#4969)

Posted by Zhi <no...@github.com>.
ahh, one more reminder, for all these models, we will have OOM problem for pretty printing after the ANF pass. It is very likely because recursively visiting the AST saves all the intermediate results. 

-- 
You are receiving this because you are subscribed to this thread.
Reply to this email directly or view it on GitHub:
https://github.com/apache/incubator-tvm/issues/4969#issuecomment-592677784

Re: [apache/incubator-tvm] [RFC] Enhance TensorFlow Frontend Control Flow Support (#4969)

Posted by masahi <no...@github.com>.
wow, supporting mask rcnn is a great achievement! I was also trying to support mask rcnn in torch, but I was stuck at dynamic nms. https://discuss.tvm.ai/t/supporting-non-maximum-suppresion-op-with-dynamic-output-shape/5818/

Is "pending dynamic NMS PR" going to be useful to pytorch use case?

-- 
You are receiving this because you are subscribed to this thread.
Reply to this email directly or view it on GitHub:
https://github.com/apache/incubator-tvm/issues/4969#issuecomment-592743884

Re: [apache/incubator-tvm] [RFC] Enhance TensorFlow Frontend Control Flow Support (#4969)

Posted by masahi <no...@github.com>.
great, will take a look. Hopefully the PR would become active soon.

-- 
You are receiving this because you are subscribed to this thread.
Reply to this email directly or view it on GitHub:
https://github.com/apache/incubator-tvm/issues/4969#issuecomment-592751982

Re: [apache/incubator-tvm] [RFC] Enhance TensorFlow Frontend Control Flow Support (#4969)

Posted by Zhi <no...@github.com>.
Yes, @yongwww had one months back

https://github.com/apache/incubator-tvm/pull/4312


-- 
You are receiving this because you are subscribed to this thread.
Reply to this email directly or view it on GitHub:
https://github.com/apache/incubator-tvm/issues/4969#issuecomment-592746247