Posted to commits@tvm.apache.org by GitBox <gi...@apache.org> on 2022/06/16 09:50:08 UTC

[GitHub] [tvm] apeskov commented on pull request #11642: Enable QNN primitives for DNNL runtime

apeskov commented on PR #11642:
URL: https://github.com/apache/tvm/pull/11642#issuecomment-1157461403

   Hi @yangulei,
   
   As I remember, zero-copy input/output of tensors should already work in TVM main. If you define Relay layouts that match DNNL expectations, the external buffer will be used as is, without any copying or processing. That was one of the goals of PR https://github.com/apache/tvm/pull/11345. Could you please be more specific about the scenarios you would like to optimise?
   
   Regarding `post-op sum`: you are absolutely right, a non-in-place op before `add` breaks correctness, so a memory copy is inevitable. In the case of post-op sum, the input data has to be placed into the DST tensor of the DNNL primitive, and execution of the primitive then overwrites that data. In contrast, `post-op binary add` reads its data from a separate input tensor. Currently `binary add` has limited support across primitives, which leads to the `ref:any` implementation. It also has slightly worse performance, because it requires one more memory access pass.
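   To make the semantic difference concrete, here is a small NumPy sketch (illustrative only, not the real DNNL API; `conv` is a placeholder for any primitive): post-op sum accumulates on top of data already sitting in the DST tensor, while binary add reads a separate input tensor.

```python
import numpy as np

# Illustrative sketch of the two post-op flavours (plain NumPy, not DNNL).
def conv(x):
    return x * 2.0  # placeholder for the convolution result

x = np.array([1.0, 2.0, 3.0])
residual = np.array([10.0, 20.0, 30.0])

# post-op sum: the residual must be copied into DST first;
# the primitive then rewrites DST, accumulating on top of its old content.
dst = residual.copy()   # copy into the DST tensor
dst = conv(x) + dst     # execution overwrites DST

# post-op binary add: the residual stays in its own tensor and is read
# as a separate input (one extra memory pass over `residual`).
dst2 = conv(x)
dst2 = dst2 + residual
```

   Both produce the same values; the difference is where the residual lives in memory and how many passes over it the primitive makes.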
   
   If the required layouts are missing, the DNNL BYOC runtime automatically injects the necessary reorder primitives, and the graph will look like this:
   
   ``` 
                        bias --
                               \
   in1 -> REORDER_1 -> tmp_1 -> CONV -> tmp_3 -> REORDER_3 -> out
                                      /
                   in2 -> REORDER_2 --
   ```
   The problem is in tensor `tmp_3`: there are two primitives which produce its data, and that breaks the data-flow-graph concept. `REORDER_2` must be executed strictly before the `CONV` primitive. If you look at the code in this patch related to the in-place simulation ([link to it](https://github.com/apeskov/tvm/blob/054901196b5c562f70208b0d9394d16e305e6269/src/runtime/contrib/dnnl/dnnl_json_runtime.cc#L771-L788)) you will see exactly what I mean. Essentially it just copies the input data into the DST tensor **immediately** before the convolution primitive.
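   Roughly, the execution order looks like the following NumPy sketch (hypothetical helper, not the actual `dnnl_json_runtime.cc` code; the real layout transformations are elided):

```python
import numpy as np

# Hypothetical sketch of the in-place simulation: right before CONV runs,
# the reordered residual is copied into CONV's DST tensor, so the
# post-op sum accumulates on top of it.
def run_graph(in1, in2, bias):
    tmp_1 = in1.copy()            # REORDER_1 (layout change elided)
    tmp_2 = in2.copy()            # REORDER_2 (must run strictly before CONV)
    tmp_3 = np.empty_like(tmp_1)  # DST tensor of CONV

    # in-place simulation: copy input data into DST *exactly* before CONV
    np.copyto(tmp_3, tmp_2)

    # CONV with post-op sum: rewrites tmp_3, accumulating its old content
    tmp_3 = (tmp_1 * 2.0 + bias) + tmp_3

    return tmp_3.copy()           # REORDER_3

out = run_graph(np.ones(3), np.full(3, 5.0), np.zeros(3))
```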
   
   `Post-op sum` is very tricky and has a lot of requirements. It works only if proper layouts were selected for `conv` and `add`. It also requires validating that the input tensor memory may be overwritten (for ResNet-50 this holds, but in the general case it has to be checked).
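   One way to think about that check (a simplified sketch, not TVM's actual analysis; all names here are made up): overwriting the `add` input in place is only safe when the fused `conv+sum` is that tensor's sole consumer in the data-flow graph.

```python
# Simplified sketch of the safety check for post-op sum (hypothetical,
# not TVM's actual analysis): the residual input may be overwritten only
# if the fused conv+sum node is its sole consumer.
graph = {
    "conv_sum": ["residual", "weights"],  # node -> list of input tensors
    "other_op": ["residual"],             # another consumer of `residual`
}

def consumers(tensor, graph):
    return [node for node, inputs in graph.items() if tensor in inputs]

def can_rewrite_in_place(tensor, node, graph):
    return consumers(tensor, graph) == [node]
```

   In this example `residual` feeds two ops, so rewriting it in place would corrupt the input of `other_op` and the fusion must be rejected.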


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@tvm.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org