Posted to commits@tvm.apache.org by GitBox <gi...@apache.org> on 2020/03/04 19:37:19 UTC

[GitHub] [incubator-tvm] wpan11nv opened a new pull request #4983: [CodeGen][CUDA] Enhance CUDA codegen for SelectNode

wpan11nv opened a new pull request #4983: [CodeGen][CUDA] Enhance CUDA codegen for SelectNode
URL: https://github.com/apache/incubator-tvm/pull/4983
 
 
   - This patch allows the CUDA backend to emit correct code for
     selects with vector conditions, which may be produced by
     floordiv op lowering, among others.
   
   - This already works for the LLVM backend, as the LLVM select
     instruction supports vector conditions.
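   Since CUDA's ternary operator takes only a scalar condition, a select with a vector condition has to be emitted per lane. A minimal host-side C++ sketch of that per-lane lowering (the struct and function names here are illustrative stand-ins, not TVM code):
   
   ```cpp
   #include <cassert>
   
   // Illustrative stand-ins for CUDA's int4/ushort4 vector types; the real
   // types come from the CUDA headers, and these names are ours.
   struct int4v { int x, y, z, w; };
   struct ushort4v { unsigned short x, y, z, w; };
   
   // A select with a vector condition has no single CUDA operator, so it is
   // lowered per lane: each component picks from the true or false vector
   // based on its own lane of the condition.
   int4v select4(ushort4v cond, int4v t, int4v f) {
     int4v r;
     r.x = bool(cond.x) ? t.x : f.x;
     r.y = bool(cond.y) ? t.y : f.y;
     r.z = bool(cond.z) ? t.z : f.z;
     r.w = bool(cond.w) ? t.w : f.w;
     return r;
   }
   
   int main() {
     int4v r = select4({1, 0, 1, 0}, {10, 20, 30, 40}, {-1, -2, -3, -4});
     assert(r.x == 10 && r.y == -2 && r.z == 30 && r.w == -4);
     return 0;
   }
   ```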
   
   Signed-off-by: Wei Pan <we...@nvidia.com>
   
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [incubator-tvm] masahi commented on issue #4983: [CodeGen][CUDA] Enhance CUDA codegen for SelectNode

Posted by GitBox <gi...@apache.org>.
masahi commented on issue #4983: [CodeGen][CUDA] Enhance CUDA codegen for SelectNode
URL: https://github.com/apache/incubator-tvm/pull/4983#issuecomment-597408983
 
 
   LGTM, but it seems indentation is broken in the CUDA source codegen. Not important, but it would be nice to clean it up.
   
   ```
   extern "C" __global__ void default_function_kernel0( float* __restrict__ B,  float* __restrict__ A) {
       float4 _1;
               int4 _2 = make_int4(37, 37, 37, 37);
               int4 _3 = make_int4(0, 0, 0, 0);
               ushort4 _4;
               _4.x = (_2.x>=_3.x);
               _4.y = (_2.y>=_3.y);
               _4.z = (_2.z>=_3.z);
               _4.w = (_2.w>=_3.w);
                 int4 _5 = (make_int4)(((((int)threadIdx.x) * 4))+(1*0), ((((int)threadIdx.x) * 4))+(1*1), ((((int)threadIdx.x) * 4))+(1*2), ((((int)threadIdx.x) * 4))+(1*3));
                 int4 _6 = make_int4(37, 37, 37, 37);
                 int4 _7;
                 _7.x = (_5.x%_6.x);
                 _7.y = (_5.y%_6.y);
                 _7.z = (_5.z%_6.z);
                 _7.w = (_5.w%_6.w);
               int4 _8 = make_int4(0, 0, 0, 0);
               ushort4 _9;
               _9.x = (_7.x>=_8.x);
               _9.y = (_7.y>=_8.y);
               _9.z = (_7.z>=_8.z);
               _9.w = (_7.w>=_8.w);
             ushort4 _10;
             _10.x = (_4.x&&_9.x);
             _10.y = (_4.y&&_9.y);
             _10.z = (_4.z&&_9.z);
             _10.w = (_4.w&&_9.w);
               int4 _11 = make_int4(37, 37, 37, 37);
               int4 _12 = make_int4(0, 0, 0, 0);
               ushort4 _13;
               _13.x = (_11.x<_12.x);
               _13.y = (_11.y<_12.y);
               _13.z = (_11.z<_12.z);
               _13.w = (_11.w<_12.w);
                 int4 _14 = (make_int4)(((((int)threadIdx.x) * 4))+(1*0), ((((int)threadIdx.x) * 4))+(1*1), ((((int)threadIdx.x) * 4))+(1*2), ((((int)threadIdx.x) * 4))+(1*3));
                 int4 _15 = make_int4(37, 37, 37, 37);
                 int4 _16;
                 _16.x = (_14.x%_15.x);
                 _16.y = (_14.y%_15.y);
                 _16.z = (_14.z%_15.z);
                 _16.w = (_14.w%_15.w);
               int4 _17 = make_int4(0, 0, 0, 0);
               ushort4 _18;
               _18.x = (_16.x<=_17.x);
               _18.y = (_16.y<=_17.y);
               _18.z = (_16.z<=_17.z);
               _18.w = (_16.w<=_17.w);
             ushort4 _19;
             _19.x = (_13.x&&_18.x);
             _19.y = (_13.y&&_18.y);
             _19.z = (_13.z&&_18.z);
             _19.w = (_13.w&&_18.w);
           ushort4 _20;
           _20.x = (_10.x||_19.x);
           _20.y = (_10.y||_19.y);
           _20.z = (_10.z||_19.z);
           _20.w = (_10.w||_19.w);
           int4 _21 = (make_int4)(((((int)threadIdx.x) * 4))+(1*0), ((((int)threadIdx.x) * 4))+(1*1), ((((int)threadIdx.x) * 4))+(1*2), ((((int)threadIdx.x) * 4))+(1*3));
           int4 _22 = make_int4(37, 37, 37, 37);
           int4 _23;
           _23.x = (_21.x/_22.x);
           _23.y = (_21.y/_22.y);
           _23.z = (_21.z/_22.z);
           _23.w = (_21.w/_22.w);
             int4 _24 = (make_int4)(((((int)threadIdx.x) * 4))+(1*0), ((((int)threadIdx.x) * 4))+(1*1), ((((int)threadIdx.x) * 4))+(1*2), ((((int)threadIdx.x) * 4))+(1*3));
             int4 _25 = make_int4(37, 37, 37, 37);
             int4 _26;
             _26.x = (_24.x/_25.x);
             _26.y = (_24.y/_25.y);
             _26.z = (_24.z/_25.z);
             _26.w = (_24.w/_25.w);
           int4 _27 = make_int4(1, 1, 1, 1);
           int4 _28;
           _28.x = (_26.x-_27.x);
           _28.y = (_26.y-_27.y);
           _28.z = (_26.z-_27.z);
           _28.w = (_26.w-_27.w);
         int4 _29;
         _29.x = (bool(_20.x)?_23.x:_28.x);
         _29.y = (bool(_20.y)?_23.y:_28.y);
         _29.z = (bool(_20.z)?_23.z:_28.z);
         _29.w = (bool(_20.w)?_23.w:_28.w);
       _1.x = A[_29.x];
       _1.y = A[_29.y];
       _1.z = A[_29.z];
       _1.w = A[_29.w];
     (( float4*)(B + ((((int)threadIdx.x) * 4))))[0] = _1;
   }
   ```
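   For reference, the vector select in the kernel above is the floordiv lowering for divisor 37: `_20` tests (divisor >= 0 && remainder >= 0) || (divisor < 0 && remainder <= 0), `_23` is the truncating quotient, and `_28` is that quotient minus one. A scalar C++ sketch of the same select (function names are ours, not TVM's), checked against reference floor division:
   
   ```cpp
   #include <cassert>
   
   // The select condition from the kernel above, scalarized for divisor d = 37:
   // take the truncating quotient x/d when the truncating remainder has the
   // same sign as d (or is zero), otherwise x/d - 1.
   int floordiv37(int x) {
     const int d = 37;
     bool cond = (d >= 0 && x % d >= 0) || (d < 0 && x % d <= 0);
     return cond ? x / d : x / d - 1;
   }
   
   // Reference floor division for comparison.
   int floordiv_ref(int x, int d) {
     int q = x / d, r = x % d;
     return (r != 0 && ((r < 0) != (d < 0))) ? q - 1 : q;
   }
   
   int main() {
     // The select-based lowering matches floor division on a range of inputs.
     for (int x = -200; x <= 200; ++x)
       assert(floordiv37(x) == floordiv_ref(x, 37));
     return 0;
   }
   ```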


[GitHub] [incubator-tvm] wpan11nv commented on issue #4983: [CodeGen][CUDA] Enhance CUDA codegen for SelectNode

Posted by GitBox <gi...@apache.org>.
wpan11nv commented on issue #4983: [CodeGen][CUDA] Enhance CUDA codegen for SelectNode
URL: https://github.com/apache/incubator-tvm/pull/4983#issuecomment-597721169
 
 
   > LGTM, but it seems indentation is broken in cuda source codegen. Not important, but would be nice to clean it up.
   Yes, I noticed that indentation issue too. I will have a look. Thanks!


[GitHub] [incubator-tvm] masahi edited a comment on issue #4983: [CodeGen][CUDA] Enhance CUDA codegen for SelectNode

Posted by GitBox <gi...@apache.org>.
masahi edited a comment on issue #4983: [CodeGen][CUDA] Enhance CUDA codegen for SelectNode
URL: https://github.com/apache/incubator-tvm/pull/4983#issuecomment-597271775
 
 
   @wpan11nv I'll take a look (@vinx13 is currently a grad student and busy). I have commit rights.


[GitHub] [incubator-tvm] jmorrill edited a comment on issue #4983: [CodeGen][CUDA] Enhance CUDA codegen for SelectNode

Posted by GitBox <gi...@apache.org>.
jmorrill edited a comment on issue #4983: [CodeGen][CUDA] Enhance CUDA codegen for SelectNode
URL: https://github.com/apache/incubator-tvm/pull/4983#issuecomment-596712676
 
 
   > Kindly ping. Can someone help review this PR?
   
   @wpan11nv, I think you have to tag some reviewers from here:
   https://github.com/apache/incubator-tvm/blob/master/CONTRIBUTORS.md#reviewers
   


[GitHub] [incubator-tvm] masahi merged pull request #4983: [CodeGen][CUDA] Enhance CUDA codegen for SelectNode

Posted by GitBox <gi...@apache.org>.
masahi merged pull request #4983: [CodeGen][CUDA] Enhance CUDA codegen for SelectNode
URL: https://github.com/apache/incubator-tvm/pull/4983
 
 
   


[GitHub] [incubator-tvm] masahi commented on issue #4983: [CodeGen][CUDA] Enhance CUDA codegen for SelectNode

Posted by GitBox <gi...@apache.org>.
masahi commented on issue #4983: [CodeGen][CUDA] Enhance CUDA codegen for SelectNode
URL: https://github.com/apache/incubator-tvm/pull/4983#issuecomment-597409089
 
 
   Thanks @wpan11nv @jmorrill @vinx13 


[GitHub] [incubator-tvm] wpan11nv commented on issue #4983: [CodeGen][CUDA] Enhance CUDA codegen for SelectNode

Posted by GitBox <gi...@apache.org>.
wpan11nv commented on issue #4983: [CodeGen][CUDA] Enhance CUDA codegen for SelectNode
URL: https://github.com/apache/incubator-tvm/pull/4983#issuecomment-596715298
 
 
   > While I am not qualified to review this, I have applied your changes from this PR and was able to compile an MXNet model to a CUDA TVM graph runtime.
   > 
   > AutoTVM also looks to be working correctly.
   
   @jmorrill  Thanks for confirming this fix!


[GitHub] [incubator-tvm] wpan11nv commented on issue #4983: [CodeGen][CUDA] Enhance CUDA codegen for SelectNode

Posted by GitBox <gi...@apache.org>.
wpan11nv commented on issue #4983: [CodeGen][CUDA] Enhance CUDA codegen for SelectNode
URL: https://github.com/apache/incubator-tvm/pull/4983#issuecomment-594786914
 
 
   The test fails without this patch. The issue is also exposed by https://github.com/apache/incubator-tvm/pull/4968, where a simple kernel fails during CUDA codegen:
   
   ```
   // attr [iter_var(blockIdx.x, , blockIdx.x)] thread_extent = 98
   // attr [iter_var(threadIdx.x, , threadIdx.x)] thread_extent = 1024
   T_relu[ramp(((blockIdx.x*2048) + (threadIdx.x*2)), 1, 2)] = max((placeholder[ramp(((blockIdx.x*2048) + (threadIdx.x*2)), 1, 2)] + placeholder[floordiv(ramp(((blockIdx.x*2048) + (threadIdx.x*2)), 1, 2), x2(3136))]), x2(0f))
   ```
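   In this IR, `ramp(base, 1, 2)` denotes the 2-lane vector [base, base+1] and `x2(3136)` broadcasts 3136 to both lanes, so the floordiv applies lane-wise; it is this lane-wise lowering that produces the vector-condition select the patch handles. A small C++ sketch of those semantics (names are illustrative, not TVM's):
   
   ```cpp
   #include <array>
   #include <cassert>
   
   // Illustrative stand-ins for TVM's ramp/broadcast/floordiv semantics on a
   // 2-lane vector; the names here are ours, not TVM's.
   using vec2 = std::array<int, 2>;
   
   vec2 ramp(int base, int stride) { return {base, base + stride}; }
   vec2 broadcast(int v) { return {v, v}; }
   
   // Floor division on scalars (C's / truncates toward zero, so adjust when
   // the remainder's sign disagrees with the divisor's).
   int floordiv(int a, int b) {
     int q = a / b, r = a % b;
     return (r != 0 && ((r < 0) != (b < 0))) ? q - 1 : q;
   }
   
   // Lane-wise floordiv over 2-lane vectors.
   vec2 floordiv(vec2 a, vec2 b) {
     return {floordiv(a[0], b[0]), floordiv(a[1], b[1])};
   }
   
   int main() {
     // floordiv(ramp(idx, 1, 2), x2(3136)) from the IR above, at idx = 3135:
     // the two lanes land in different quotients.
     vec2 q = floordiv(ramp(3135, 1), broadcast(3136));
     assert(q[0] == 0 && q[1] == 1);
     return 0;
   }
   ```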
   


[GitHub] [incubator-tvm] jmorrill commented on issue #4983: [CodeGen][CUDA] Enhance CUDA codegen for SelectNode

Posted by GitBox <gi...@apache.org>.
jmorrill commented on issue #4983: [CodeGen][CUDA] Enhance CUDA codegen for SelectNode
URL: https://github.com/apache/incubator-tvm/pull/4983#issuecomment-596710773
 
 
   While I am not qualified to review this, I have applied your changes from this PR and was able to compile an MXNet model to a CUDA TVM graph runtime.
   
   AutoTVM also looks to be working correctly.


[GitHub] [incubator-tvm] wpan11nv commented on issue #4983: [CodeGen][CUDA] Enhance CUDA codegen for SelectNode

Posted by GitBox <gi...@apache.org>.
wpan11nv commented on issue #4983: [CodeGen][CUDA] Enhance CUDA codegen for SelectNode
URL: https://github.com/apache/incubator-tvm/pull/4983#issuecomment-596695940
 
 
   Kindly ping. Can someone help review this PR?


[GitHub] [incubator-tvm] wpan11nv commented on issue #4983: [CodeGen][CUDA] Enhance CUDA codegen for SelectNode

Posted by GitBox <gi...@apache.org>.
wpan11nv commented on issue #4983: [CodeGen][CUDA] Enhance CUDA codegen for SelectNode
URL: https://github.com/apache/incubator-tvm/pull/4983#issuecomment-596714625
 
 
   @vinx13 Could you help review this PR?


[GitHub] [incubator-tvm] masahi commented on issue #4983: [CodeGen][CUDA] Enhance CUDA codegen for SelectNode

Posted by GitBox <gi...@apache.org>.
masahi commented on issue #4983: [CodeGen][CUDA] Enhance CUDA codegen for SelectNode
URL: https://github.com/apache/incubator-tvm/pull/4983#issuecomment-597271775
 
 
   @wpan11nv I'll take a look (@vinx13 is currently a grad student, busy)


[GitHub] [incubator-tvm] jmorrill commented on issue #4983: [CodeGen][CUDA] Enhance CUDA codegen for SelectNode

Posted by GitBox <gi...@apache.org>.
jmorrill commented on issue #4983: [CodeGen][CUDA] Enhance CUDA codegen for SelectNode
URL: https://github.com/apache/incubator-tvm/pull/4983#issuecomment-596712676
 
 
   > Kindly ping. Can someone help review this PR?
   
   @wpan11nv, I think you have to tag some reviewers from here:
   https://github.com/apache/incubator-tvm/blob/master/CONTRIBUTORS.md#reviewers
   
