Posted to commits@tvm.apache.org by GitBox <gi...@apache.org> on 2020/07/21 09:51:14 UTC

[GitHub] [incubator-tvm] FrozenGene commented on pull request #6095: Improve NHWC depthwise convolution for AArch64

FrozenGene commented on pull request #6095:
URL: https://github.com/apache/incubator-tvm/pull/6095#issuecomment-661756915

> ACL implementation

Hi @giuseros, thanks for the work. I fully understand your purpose and the smooth development path you have in mind. As this schedule will be the default NHWC depthwise convolution, my opinion is that we should try to get as much performance out of it as we reasonably can. To be clear, I don't mean we must match ACL's ultimate performance before we can merge; optimization is not a one-shot deal. But I think we could enable AutoTVM here to help us achieve better performance, and I think it is worth introducing in this PR.

- This schedule will be applied to both arm32 and arm64, so we shouldn't only consider arm64. AutoTVM could help us avoid this issue.

- A tuning knob for `compute_at` (especially for `data_pad`) could help us solve the `parallel-compute-locality` issue (we cannot assume the kernel only runs on a single core). For more detail, see Figure 2 of http://people.csail.mit.edu/jrk/halide-pldi13.pdf
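
To make the `compute_at` trade-off concrete, here is a minimal pure-Python sketch. It is not TVM code and not the schedule in this PR; the function name `pad_cost`, the strategy labels `"root"` and `"tile"`, and the 1-D stride-1 row model are all assumptions made for illustration. It counts how many padded rows get computed and how many must be live at once when the padding stage is materialized once up front versus recomputed inside each parallel tile.

```python
def pad_cost(out_rows, kernel, tile_rows, strategy):
    """Toy 1-D, stride-1 model of where a padding stage is computed.

    Returns (padded_rows_computed, peak_live_padded_rows).
    "root": the whole padded input is produced once before the conv loop.
    "tile": each tile of output rows produces only the padded rows it needs.
    Both strategy names are hypothetical labels, not TVM identifiers.
    """
    padded_rows = out_rows + kernel - 1  # rows in the fully padded input
    if strategy == "root":
        # No redundant work, but the whole buffer is live and shared by all cores.
        return padded_rows, padded_rows
    if strategy == "tile":
        computed, peak = 0, 0
        for start in range(0, out_rows, tile_rows):
            rows_in_tile = min(tile_rows, out_rows - start)
            needed = rows_in_tile + kernel - 1  # tile rows plus halo
            computed += needed                  # halo rows are recomputed per tile
            peak = max(peak, needed)
        return computed, peak
    raise ValueError(f"unknown strategy: {strategy}")


# A tuning knob would simply enumerate the candidates and let measurement decide.
candidates = ["root", "tile"]
costs = {s: pad_cost(out_rows=56, kernel=3, tile_rows=8, strategy=s) for s in candidates}
```

In this toy model the per-tile placement does more total padding work but keeps a much smaller working set per core, and which side wins depends on the target and core count. That is the reason to expose the placement as a knob rather than hard-code it.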

I agree we should reduce the number of tuning knobs and improve the tuning-time experience, but if a knob helps us improve performance, I think we should introduce it; otherwise we can leave it out.
