Posted to commits@tvm.apache.org by GitBox <gi...@apache.org> on 2020/04/30 12:08:08 UTC

[GitHub] [incubator-tvm] FrozenGene edited a comment on pull request #5485: [TOPI][Winograd] Optimization of Conv2d Winograd algorithm on Tensor …

FrozenGene edited a comment on pull request #5485:
URL: https://github.com/apache/incubator-tvm/pull/5485#issuecomment-621791710


   For performance, have you tried other layouts on GPU? I have some experience with this on CPU. A more suitable layout on CPU for NHWC input is:
   
   ```
     input_tile: alpha, alpha, P, CI
     data_pack: alpha, alpha, P, CI
     bgemm: alpha, alpha, P, CO
     inverse: m, m, P, CO
     output: N, H, W, CO
     kernel: alpha, alpha, CO, CI
   ```
   For the kernel, I chose `alpha alpha CO CI` because I want to vectorize over CI. On GPU, `alpha alpha CI CO` may be better.
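
To make the shapes concrete, here is a rough NumPy sketch of the layouts listed above. It is not the TOPI implementation. It assumes the workload tuple means (N, H, W, CI, CO), a 3x3 kernel, stride 1, padding 1, and tile size m = 4 (so alpha = m + 3 - 1 = 6); none of these are stated in this comment. The helper `winograd_layout_shapes` is a hypothetical name used only for illustration.

```python
import math
import numpy as np

def winograd_layout_shapes(N, H, W, CI, CO, m=4, r=3, pad=1):
    """Shapes of each Winograd stage for the NHWC layout proposed above.

    Assumptions (not from the original comment): r x r kernel, stride 1,
    symmetric padding `pad`, output tile size m.
    """
    alpha = m + r - 1                  # input tile edge length
    out_h = H + 2 * pad - r + 1        # output height for stride 1
    out_w = W + 2 * pad - r + 1        # output width for stride 1
    # P = total number of tiles over the batch
    P = N * math.ceil(out_h / m) * math.ceil(out_w / m)
    return {
        "input_tile": (alpha, alpha, P, CI),
        "data_pack": (alpha, alpha, P, CI),
        "kernel": (alpha, alpha, CO, CI),
        "bgemm": (alpha, alpha, P, CO),
        "inverse": (m, m, P, CO),
        "output": (N, out_h, out_w, CO),
    }

# Workload from the comment, read as (N, H, W, CI, CO): an assumption.
shapes = winograd_layout_shapes(1, 56, 56, 64, 64)

rng = np.random.default_rng(0)
data_pack = rng.standard_normal(shapes["data_pack"], dtype=np.float32)
kernel = rng.standard_normal(shapes["kernel"], dtype=np.float32)

# Batched GEMM over the alpha x alpha positions, reducing over CI.
# CI is the innermost (contiguous) axis of both operands, which is what
# makes vectorizing the reduction over CI straightforward on CPU.
bgemm = np.einsum("abpc,aboc->abpo", data_pack, kernel)

for name, shape in shapes.items():
    print(f"{name:>10}: {shape}")
```

With `alpha alpha CI CO` instead, CO would be the contiguous axis of the kernel, which is the variant suggested above as possibly better on GPU.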
   
   I tested your layout against the layout I mentioned: on skylake-512, your layout takes 0.388 ms and mine takes 0.375 ms (about 3% faster). I used 20 threads on workload (1, 56, 56, 64, 64). The result is stable and reproducible.

