You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@tvm.apache.org by GitBox <gi...@apache.org> on 2020/10/15 20:19:28 UTC

[GitHub] [incubator-tvm] trevor-m opened a new issue #6691: [Performance] Large performance regression with int64 indices INDEX_DEFAULT_I64=ON (PR #6143)

trevor-m opened a new issue #6691:
URL: https://github.com/apache/incubator-tvm/issues/6691


   I've started noticing a large performance regression affecting Keras MobileNetV2 caused by `INDEX_DEFAULT_I64=ON` (PR #6143). This is on an AWS m5.12xlarge instance.
   
   INDEX_DEFAULT_I64 | Frames per second
   ------------ | -------------
   ON | 66.56
   OFF | 435.49
   
   I profiled the ops and found the slowdown comes from the
   
   ## Profile with `INDEX_DEFAULT_I64=OFF` (fast)
   
   ```
   Node Name                                            Ops                                                 Time(us)  Time(%)  Shape                 Inputs  Outputs  
   ---------                                            ---                                                 --------  -------  -----                 ------  -------  
   fused_nn_contrib_depthwise_conv2d_NCHWc_add_clip_7   fused_nn_contrib_depthwise_conv2d_NCHWc_add_clip_7  64.704    3.571    (1, 9, 56, 56, 16)    3       1        
   fused_nn_contrib_depthwise_conv2d_NCHWc_add_clip_6   fused_nn_contrib_depthwise_conv2d_NCHWc_add_clip_6  53.362    2.945    (1, 2, 112, 112, 16)  3       1        
   fused_nn_pad_3                                       fused_nn_pad_3                                      50.582    2.791    (1, 6, 113, 113, 16)  1       1        
   fused_nn_contrib_depthwise_conv2d_NCHWc_add_clip_5   fused_nn_contrib_depthwise_conv2d_NCHWc_add_clip_5  47.874    2.642    (1, 6, 56, 56, 16)    3       1        
   fused_nn_contrib_conv2d_NCHWc_add_clip_6             fused_nn_contrib_conv2d_NCHWc_add_clip_6            46.828    2.584    (1, 6, 112, 112, 16)  3       1        
   fused_nn_contrib_depthwise_conv2d_NCHWc_add_clip_8   fused_nn_contrib_depthwise_conv2d_NCHWc_add_clip_8  42.364    2.338    (1, 12, 28, 28, 16)   3       1        
   fused_nn_contrib_depthwise_conv2d_NCHWc_add_clip_91  fused_nn_contrib_depthwise_conv2d_NCHWc_add_clip_9  39.554    2.183    (1, 36, 14, 14, 16)   3       1        
   fused_nn_contrib_depthwise_conv2d_NCHWc_add_clip_81  fused_nn_contrib_depthwise_conv2d_NCHWc_add_clip_8  39.418    2.175    (1, 12, 28, 28, 16)   3       1        
   fused_nn_contrib_conv2d_NCHWc_add_add_4              fused_nn_contrib_conv2d_NCHWc_add_add_4             38.871    2.145    (1, 2, 56, 56, 12)    4       1        
   fused_nn_contrib_depthwise_conv2d_NCHWc_add_clip_9   fused_nn_contrib_depthwise_conv2d_NCHWc_add_clip_9  37.926    2.093    (1, 36, 14, 14, 16)   3       1        
   fused_nn_contrib_conv2d_NCHWc_add_clip_5             fused_nn_contrib_conv2d_NCHWc_add_clip_5            37.407    2.064    (1, 9, 56, 56, 16)    3       1        
   fused_nn_contrib_conv2d_NCHWc_add_clip_51            fused_nn_contrib_conv2d_NCHWc_add_clip_5            35.349    1.951    (1, 9, 56, 56, 16)    3       1        
   fused_nn_contrib_conv2d_NCHWc_add_clip               fused_nn_contrib_conv2d_NCHWc_add_clip              34.692    1.915    (1, 80, 7, 7, 16)     3       1        
   fused_nn_contrib_conv2d_NCHWc_add_6                  fused_nn_contrib_conv2d_NCHWc_add_6                 34.052    1.879    (1, 1, 112, 112, 16)  3       1        
   fused_nn_contrib_conv2d_NCHWc_add                    fused_nn_contrib_conv2d_NCHWc_add                   33.58     1.853    (1, 20, 7, 7, 16)     3       1        
   fused_nn_contrib_depthwise_conv2d_NCHWc_add_clip_21  fused_nn_contrib_depthwise_conv2d_NCHWc_add_clip_2  33.298    1.838    (1, 24, 14, 14, 16)   3       1        
   fused_nn_pad_2                                       fused_nn_pad_2                                      33.201    1.832    (1, 9, 57, 57, 16)    1       1        
   fused_nn_contrib_depthwise_conv2d_NCHWc_add_clip_22  fused_nn_contrib_depthwise_conv2d_NCHWc_add_clip_2  33.057    1.824    (1, 24, 14, 14, 16)   3       1        
   fused_nn_contrib_depthwise_conv2d_NCHWc_add_clip_2   fused_nn_contrib_depthwise_conv2d_NCHWc_add_clip_2  33.027    1.823    (1, 24, 14, 14, 16)   3       1        
   fused_nn_contrib_depthwise_conv2d_NCHWc_add_clip_23  fused_nn_contrib_depthwise_conv2d_NCHWc_add_clip_2  32.787    1.809    (1, 24, 14, 14, 16)   3       1        
   fused_nn_contrib_conv2d_NCHWc_add_5                  fused_nn_contrib_conv2d_NCHWc_add_5                 32.332    1.784    (1, 2, 56, 56, 12)    3       1        
   fused_nn_contrib_depthwise_conv2d_NCHWc_add_clip     fused_nn_contrib_depthwise_conv2d_NCHWc_add_clip    32.156    1.775    (1, 60, 7, 7, 16)     3       1        
   fused_nn_contrib_depthwise_conv2d_NCHWc_add_clip1    fused_nn_contrib_depthwise_conv2d_NCHWc_add_clip    31.68     1.748    (1, 60, 7, 7, 16)     3       1        
   fused_nn_contrib_depthwise_conv2d_NCHWc_add_clip2    fused_nn_contrib_depthwise_conv2d_NCHWc_add_clip    30.832    1.701    (1, 60, 7, 7, 16)     3       1        
   fused_nn_contrib_conv2d_NCHWc_add_clip_7             fused_nn_contrib_conv2d_NCHWc_add_clip_7            30.521    1.684    (1, 2, 112, 112, 16)  3       1        
   fused_nn_contrib_conv2d_NCHWc_add_add_11             fused_nn_contrib_conv2d_NCHWc_add_add_1             30.012    1.656    (1, 6, 14, 14, 16)    4       1        
   fused_nn_contrib_conv2d_NCHWc_add_add_1              fused_nn_contrib_conv2d_NCHWc_add_add_1             29.914    1.651    (1, 6, 14, 14, 16)    4       1        
   fused_nn_contrib_depthwise_conv2d_NCHWc_add_clip_4   fused_nn_contrib_depthwise_conv2d_NCHWc_add_clip_4  28.642    1.581    (1, 9, 28, 28, 16)    3       1        
   fused_nn_global_avg_pool2d                           fused_nn_global_avg_pool2d                          28.552    1.576    (1, 80, 1, 1, 16)     1       1        
   fused_layout_transform_40                            fused_layout_transform_40                           26.741    1.476    (1, 8, 56, 56, 12)    1       1        
   fused_layout_transform_41                            fused_layout_transform_41                           25.793    1.423    (1, 12, 56, 56, 12)   1       1        
   fused_nn_contrib_conv2d_NCHWc_add_add1               fused_nn_contrib_conv2d_NCHWc_add_add               25.759    1.422    (1, 10, 7, 7, 16)     4       1        
   fused_nn_contrib_conv2d_NCHWc_add_add_2              fused_nn_contrib_conv2d_NCHWc_add_add_2             25.566    1.411    (1, 4, 14, 14, 16)    4       1        
   fused_nn_dense_add                                   fused_nn_dense_add                                  25.52     1.408    (1, 1000)             3       1        
   fused_nn_contrib_conv2d_NCHWc_add_add                fused_nn_contrib_conv2d_NCHWc_add_add               25.391    1.401    (1, 10, 7, 7, 16)     4       1        
   fused_nn_contrib_conv2d_NCHWc_add_clip_21            fused_nn_contrib_conv2d_NCHWc_add_clip_2            25.345    1.399    (1, 36, 14, 14, 16)   3       1        
   fused_nn_contrib_conv2d_NCHWc_add_clip_2             fused_nn_contrib_conv2d_NCHWc_add_clip_2            25.262    1.394    (1, 36, 14, 14, 16)   3       1        
   fused_nn_contrib_conv2d_NCHWc_add_clip_22            fused_nn_contrib_conv2d_NCHWc_add_clip_2            24.895    1.374    (1, 36, 14, 14, 16)   3       1        
   fused_nn_contrib_conv2d_NCHWc_add_add_3              fused_nn_contrib_conv2d_NCHWc_add_add_3             24.679    1.362    (1, 2, 28, 28, 16)    4       1        
   fused_nn_contrib_conv2d_NCHWc_add_add_31             fused_nn_contrib_conv2d_NCHWc_add_add_3             24.553    1.355    (1, 2, 28, 28, 16)    4       1        
   fused_nn_contrib_conv2d_NCHWc_add_2                  fused_nn_contrib_conv2d_NCHWc_add_2                 23.364    1.289    (1, 6, 14, 14, 16)    3       1        
   fused_nn_contrib_conv2d_NCHWc_add_add_21             fused_nn_contrib_conv2d_NCHWc_add_add_2             23.264    1.284    (1, 4, 14, 14, 16)    4       1        
   fused_nn_contrib_conv2d_NCHWc_add_add_22             fused_nn_contrib_conv2d_NCHWc_add_add_2             23.006    1.27     (1, 4, 14, 14, 16)    4       1        
   fused_nn_contrib_conv2d_NCHWc_add_clip_11            fused_nn_contrib_conv2d_NCHWc_add_clip_1            22.724    1.254    (1, 60, 7, 7, 16)     3       1        
   fused_nn_contrib_conv2d_NCHWc_add_clip_32            fused_nn_contrib_conv2d_NCHWc_add_clip_3            22.722    1.254    (1, 24, 14, 14, 16)   3       1        
   fused_nn_contrib_conv2d_NCHWc_add_clip_41            fused_nn_contrib_conv2d_NCHWc_add_clip_4            22.522    1.243    (1, 12, 28, 28, 16)   3       1        
   fused_nn_contrib_conv2d_NCHWc_add_clip_1             fused_nn_contrib_conv2d_NCHWc_add_clip_1            22.247    1.228    (1, 60, 7, 7, 16)     3       1        
   fused_nn_contrib_conv2d_NCHWc_add_clip_33            fused_nn_contrib_conv2d_NCHWc_add_clip_3            21.648    1.195    (1, 24, 14, 14, 16)   3       1        
   fused_nn_pad                                         fused_nn_pad                                        21.439    1.183    (1, 36, 15, 15, 16)   1       1        
   fused_nn_contrib_conv2d_NCHWc_add_clip_12            fused_nn_contrib_conv2d_NCHWc_add_clip_1            21.437    1.183    (1, 60, 7, 7, 16)     3       1        
   fused_nn_contrib_conv2d_NCHWc_add_4                  fused_nn_contrib_conv2d_NCHWc_add_4                 21.426    1.182    (1, 2, 28, 28, 16)    3       1        
   fused_nn_contrib_conv2d_NCHWc_add_1                  fused_nn_contrib_conv2d_NCHWc_add_1                 21.227    1.171    (1, 10, 7, 7, 16)     3       1        
   fused_nn_contrib_conv2d_NCHWc_add_clip_31            fused_nn_contrib_conv2d_NCHWc_add_clip_3            20.739    1.145    (1, 24, 14, 14, 16)   3       1        
   fused_nn_contrib_conv2d_NCHWc_add_clip_3             fused_nn_contrib_conv2d_NCHWc_add_clip_3            20.719    1.143    (1, 24, 14, 14, 16)   3       1        
   fused_nn_softmax                                     fused_nn_softmax                                    19.798    1.093    (1, 1000)             1       1        
   fused_nn_contrib_conv2d_NCHWc_add_clip_42            fused_nn_contrib_conv2d_NCHWc_add_clip_4            19.751    1.09     (1, 12, 28, 28, 16)   3       1        
   fused_nn_contrib_conv2d_NCHWc_add_clip_4             fused_nn_contrib_conv2d_NCHWc_add_clip_4            19.679    1.086    (1, 12, 28, 28, 16)   3       1        
   fused_nn_pad_1                                       fused_nn_pad_1                                      18.729    1.034    (1, 12, 29, 29, 16)   1       1        
   fused_nn_contrib_conv2d_NCHWc_add_3                  fused_nn_contrib_conv2d_NCHWc_add_3                 18.411    1.016    (1, 4, 14, 14, 16)    3       1        
   fused_nn_pad_layout_transform                        fused_nn_pad_layout_transform                       18.159    1.002    (1, 1, 225, 225, 3)   1       1        
   fused_nn_contrib_depthwise_conv2d_NCHWc_add_clip_3   fused_nn_contrib_depthwise_conv2d_NCHWc_add_clip_3  15.938    0.88     (1, 12, 14, 14, 16)   3       1        
   fused_nn_contrib_depthwise_conv2d_NCHWc_add_clip_1   fused_nn_contrib_depthwise_conv2d_NCHWc_add_clip_1  15.438    0.852    (1, 36, 7, 7, 16)     3       1        
   fused_layout_transform_transpose_nn_batch_flatten    fused_layout_transform_transpose_nn_batch_flatten   1.563     0.086    (1, 1280)             1       1        
   Total_time                                           -                                                   1812.033  -        -                     -       -        
   ```
   
   ## Profile with `INDEX_DEFAULT_I64=ON` (slow)
   ```
   Node Name                                            Ops                                                 Time(us)   Time(%)  Shape                 Inputs  Outputs  
   ---------                                            ---                                                 --------   -------  -----                 ------  -------  
   fused_nn_contrib_conv2d_NCHWc_add_add_1              fused_nn_contrib_conv2d_NCHWc_add_add_1             3105.8     21.391   (1, 6, 14, 14, 16)    4       1        
   fused_nn_contrib_conv2d_NCHWc_add_add_11             fused_nn_contrib_conv2d_NCHWc_add_add_1             3104.62    21.382   (1, 6, 14, 14, 16)    4       1        
   fused_nn_contrib_conv2d_NCHWc_add_add_2              fused_nn_contrib_conv2d_NCHWc_add_add_2             2200.03    15.152   (1, 4, 14, 14, 16)    4       1        
   fused_nn_contrib_conv2d_NCHWc_add_add_21             fused_nn_contrib_conv2d_NCHWc_add_add_2             2189.84    15.082   (1, 4, 14, 14, 16)    4       1        
   fused_nn_contrib_conv2d_NCHWc_add_add_22             fused_nn_contrib_conv2d_NCHWc_add_add_2             2185.71    15.054   (1, 4, 14, 14, 16)    4       1        
   fused_nn_contrib_depthwise_conv2d_NCHWc_add_clip_7   fused_nn_contrib_depthwise_conv2d_NCHWc_add_clip_7  60.094     0.414    (1, 9, 56, 56, 16)    3       1        
   fused_nn_contrib_depthwise_conv2d_NCHWc_add_clip_91  fused_nn_contrib_depthwise_conv2d_NCHWc_add_clip_9  52.82      0.364    (1, 36, 14, 14, 16)   3       1        
   fused_nn_contrib_depthwise_conv2d_NCHWc_add_clip_6   fused_nn_contrib_depthwise_conv2d_NCHWc_add_clip_6  51.393     0.354    (1, 2, 112, 112, 16)  3       1        
   fused_nn_pad_3                                       fused_nn_pad_3                                      51.19      0.353    (1, 6, 113, 113, 16)  1       1        
   fused_nn_contrib_depthwise_conv2d_NCHWc_add_clip_5   fused_nn_contrib_depthwise_conv2d_NCHWc_add_clip_5  49.058     0.338    (1, 6, 56, 56, 16)    3       1        
   fused_nn_contrib_conv2d_NCHWc_add_clip_6             fused_nn_contrib_conv2d_NCHWc_add_clip_6            46.637     0.321    (1, 6, 112, 112, 16)  3       1        
   fused_nn_contrib_depthwise_conv2d_NCHWc_add_clip_2   fused_nn_contrib_depthwise_conv2d_NCHWc_add_clip_2  43.381     0.299    (1, 24, 14, 14, 16)   3       1        
   fused_nn_contrib_depthwise_conv2d_NCHWc_add_clip_8   fused_nn_contrib_depthwise_conv2d_NCHWc_add_clip_8  40.165     0.277    (1, 12, 28, 28, 16)   3       1        
   fused_nn_contrib_depthwise_conv2d_NCHWc_add_clip_23  fused_nn_contrib_depthwise_conv2d_NCHWc_add_clip_2  39.355     0.271    (1, 24, 14, 14, 16)   3       1        
   fused_nn_contrib_depthwise_conv2d_NCHWc_add_clip_22  fused_nn_contrib_depthwise_conv2d_NCHWc_add_clip_2  39.205     0.27     (1, 24, 14, 14, 16)   3       1        
   fused_nn_contrib_conv2d_NCHWc_add_add_4              fused_nn_contrib_conv2d_NCHWc_add_add_4             38.595     0.266    (1, 2, 56, 56, 12)    4       1        
   fused_nn_contrib_depthwise_conv2d_NCHWc_add_clip_9   fused_nn_contrib_depthwise_conv2d_NCHWc_add_clip_9  38.019     0.262    (1, 36, 14, 14, 16)   3       1        
   fused_nn_contrib_depthwise_conv2d_NCHWc_add_clip_81  fused_nn_contrib_depthwise_conv2d_NCHWc_add_clip_8  37.559     0.259    (1, 12, 28, 28, 16)   3       1        
   fused_nn_contrib_conv2d_NCHWc_add_clip_5             fused_nn_contrib_conv2d_NCHWc_add_clip_5            36.159     0.249    (1, 9, 56, 56, 16)    3       1        
   fused_nn_contrib_conv2d_NCHWc_add_clip_51            fused_nn_contrib_conv2d_NCHWc_add_clip_5            35.269     0.243    (1, 9, 56, 56, 16)    3       1        
   fused_nn_contrib_conv2d_NCHWc_add_clip               fused_nn_contrib_conv2d_NCHWc_add_clip              34.755     0.239    (1, 80, 7, 7, 16)     3       1        
   fused_nn_contrib_conv2d_NCHWc_add_2                  fused_nn_contrib_conv2d_NCHWc_add_2                 34.248     0.236    (1, 6, 14, 14, 16)    3       1        
   fused_nn_contrib_conv2d_NCHWc_add_6                  fused_nn_contrib_conv2d_NCHWc_add_6                 33.65      0.232    (1, 1, 112, 112, 16)  3       1        
   fused_nn_contrib_conv2d_NCHWc_add_clip_7             fused_nn_contrib_conv2d_NCHWc_add_clip_7            33.163     0.228    (1, 2, 112, 112, 16)  3       1        
   fused_nn_contrib_depthwise_conv2d_NCHWc_add_clip_21  fused_nn_contrib_depthwise_conv2d_NCHWc_add_clip_2  32.593     0.224    (1, 24, 14, 14, 16)   3       1        
   fused_nn_pad_2                                       fused_nn_pad_2                                      32.542     0.224    (1, 9, 57, 57, 16)    1       1        
   fused_nn_contrib_conv2d_NCHWc_add                    fused_nn_contrib_conv2d_NCHWc_add                   32.471     0.224    (1, 20, 7, 7, 16)     3       1        
   fused_nn_contrib_conv2d_NCHWc_add_5                  fused_nn_contrib_conv2d_NCHWc_add_5                 31.587     0.218    (1, 2, 56, 56, 12)    3       1        
   fused_nn_contrib_depthwise_conv2d_NCHWc_add_clip     fused_nn_contrib_depthwise_conv2d_NCHWc_add_clip    30.659     0.211    (1, 60, 7, 7, 16)     3       1        
   fused_nn_contrib_depthwise_conv2d_NCHWc_add_clip1    fused_nn_contrib_depthwise_conv2d_NCHWc_add_clip    30.109     0.207    (1, 60, 7, 7, 16)     3       1        
   fused_nn_pad                                         fused_nn_pad                                        29.258     0.202    (1, 36, 15, 15, 16)   1       1        
   fused_nn_contrib_depthwise_conv2d_NCHWc_add_clip_4   fused_nn_contrib_depthwise_conv2d_NCHWc_add_clip_4  29.083     0.2      (1, 9, 28, 28, 16)    3       1        
   fused_nn_contrib_conv2d_NCHWc_add_clip_2             fused_nn_contrib_conv2d_NCHWc_add_clip_2            28.273     0.195    (1, 36, 14, 14, 16)   3       1        
   fused_nn_contrib_depthwise_conv2d_NCHWc_add_clip2    fused_nn_contrib_depthwise_conv2d_NCHWc_add_clip    28.052     0.193    (1, 60, 7, 7, 16)     3       1        
   fused_nn_contrib_conv2d_NCHWc_add_clip_22            fused_nn_contrib_conv2d_NCHWc_add_clip_2            27.855     0.192    (1, 36, 14, 14, 16)   3       1        
   fused_layout_transform_40                            fused_layout_transform_40                           27.811     0.192    (1, 8, 56, 56, 12)    1       1        
   fused_nn_global_avg_pool2d                           fused_nn_global_avg_pool2d                          27.724     0.191    (1, 80, 1, 1, 16)     1       1        
   fused_layout_transform_41                            fused_layout_transform_41                           27.308     0.188    (1, 12, 56, 56, 12)   1       1        
   fused_nn_dense_add                                   fused_nn_dense_add                                  26.655     0.184    (1, 1000)             3       1        
   fused_nn_contrib_conv2d_NCHWc_add_1                  fused_nn_contrib_conv2d_NCHWc_add_1                 26.406     0.182    (1, 10, 7, 7, 16)     3       1        
   fused_nn_contrib_conv2d_NCHWc_add_add                fused_nn_contrib_conv2d_NCHWc_add_add               25.447     0.175    (1, 10, 7, 7, 16)     4       1        
   fused_nn_contrib_conv2d_NCHWc_add_clip_21            fused_nn_contrib_conv2d_NCHWc_add_clip_2            25.433     0.175    (1, 36, 14, 14, 16)   3       1        
   fused_nn_contrib_conv2d_NCHWc_add_add1               fused_nn_contrib_conv2d_NCHWc_add_add               25.276     0.174    (1, 10, 7, 7, 16)     4       1        
   fused_nn_contrib_conv2d_NCHWc_add_clip_11            fused_nn_contrib_conv2d_NCHWc_add_clip_1            24.78      0.171    (1, 60, 7, 7, 16)     3       1        
   fused_nn_contrib_conv2d_NCHWc_add_add_31             fused_nn_contrib_conv2d_NCHWc_add_add_3             24.132     0.166    (1, 2, 28, 28, 16)    4       1        
   fused_nn_contrib_conv2d_NCHWc_add_clip_12            fused_nn_contrib_conv2d_NCHWc_add_clip_1            23.359     0.161    (1, 60, 7, 7, 16)     3       1        
   fused_nn_contrib_conv2d_NCHWc_add_add_3              fused_nn_contrib_conv2d_NCHWc_add_add_3             23.226     0.16     (1, 2, 28, 28, 16)    4       1        
   fused_nn_contrib_conv2d_NCHWc_add_clip_31            fused_nn_contrib_conv2d_NCHWc_add_clip_3            22.999     0.158    (1, 24, 14, 14, 16)   3       1        
   fused_nn_contrib_conv2d_NCHWc_add_clip_1             fused_nn_contrib_conv2d_NCHWc_add_clip_1            22.372     0.154    (1, 60, 7, 7, 16)     3       1        
   fused_nn_contrib_conv2d_NCHWc_add_clip_41            fused_nn_contrib_conv2d_NCHWc_add_clip_4            21.948     0.151    (1, 12, 28, 28, 16)   3       1        
   fused_nn_contrib_conv2d_NCHWc_add_4                  fused_nn_contrib_conv2d_NCHWc_add_4                 21.359     0.147    (1, 2, 28, 28, 16)    3       1        
   fused_nn_contrib_depthwise_conv2d_NCHWc_add_clip_1   fused_nn_contrib_depthwise_conv2d_NCHWc_add_clip_1  21.269     0.146    (1, 36, 7, 7, 16)     3       1        
   fused_nn_contrib_conv2d_NCHWc_add_clip_33            fused_nn_contrib_conv2d_NCHWc_add_clip_3            20.916     0.144    (1, 24, 14, 14, 16)   3       1        
   fused_nn_softmax                                     fused_nn_softmax                                    20.415     0.141    (1, 1000)             1       1        
   fused_nn_contrib_conv2d_NCHWc_add_clip_3             fused_nn_contrib_conv2d_NCHWc_add_clip_3            20.37      0.14     (1, 24, 14, 14, 16)   3       1        
   fused_nn_contrib_conv2d_NCHWc_add_clip_4             fused_nn_contrib_conv2d_NCHWc_add_clip_4            19.395     0.134    (1, 12, 28, 28, 16)   3       1        
   fused_nn_contrib_conv2d_NCHWc_add_clip_32            fused_nn_contrib_conv2d_NCHWc_add_clip_3            19.306     0.133    (1, 24, 14, 14, 16)   3       1        
   fused_nn_pad_1                                       fused_nn_pad_1                                      19.284     0.133    (1, 12, 29, 29, 16)   1       1        
   fused_nn_contrib_conv2d_NCHWc_add_clip_42            fused_nn_contrib_conv2d_NCHWc_add_clip_4            18.807     0.13     (1, 12, 28, 28, 16)   3       1        
   fused_nn_contrib_conv2d_NCHWc_add_3                  fused_nn_contrib_conv2d_NCHWc_add_3                 17.728     0.122    (1, 4, 14, 14, 16)    3       1        
   fused_nn_pad_layout_transform                        fused_nn_pad_layout_transform                       15.683     0.108    (1, 1, 225, 225, 3)   1       1        
   fused_nn_contrib_depthwise_conv2d_NCHWc_add_clip_3   fused_nn_contrib_depthwise_conv2d_NCHWc_add_clip_3  15.236     0.105    (1, 12, 14, 14, 16)   3       1        
   fused_layout_transform_transpose_nn_batch_flatten    fused_layout_transform_transpose_nn_batch_flatten   1.607      0.011    (1, 1280)             1       1        
   Total_time                                           -                                                   14519.449  -        -                     -       -        
   ```
   
   Here is a script to reproduce:
   ```
   import time
   import numpy as np
   import tvm
   from tvm import relay
   from tvm.contrib import graph_runtime
   import tensorflow as tf
   input_shape = (1, 3, 224, 224)
   model = tf.keras.applications.MobileNetV2()
   mod, params = relay.frontend.from_keras(model, shape={'input_1': input_shape})
   dtype = 'float32'
   with relay.build_config(opt_level=3):
     graph, lib, params = relay.build(mod, "llvm -mcpu=skylake-avx512", params=params)
   i_data = np.random.uniform(0, 1, input_shape).astype(dtype)
   mod = graph_runtime.create(graph, lib, ctx=tvm.cpu(0))
   mod.set_input(**params)
   # Time
   times = []
   for i in range(100):
       start_time = time.time()
       mod.run(input_1=i_data)
       res = mod.get_output(0)
       times.append(time.time() - start_time)
   print('Mean latency:', 1000.0 * np.mean(times[10:]))
   print('Mean FPS:', 1.0 / np.mean(times[10:]))
   ```
   
   Thanks!


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-tvm] hzfan commented on issue #6691: [Performance] Performance regression with int64 indices INDEX_DEFAULT_I64=ON (PR #6143)

Posted by GitBox <gi...@apache.org>.
hzfan commented on issue #6691:
URL: https://github.com/apache/incubator-tvm/issues/6691#issuecomment-719368757


   @tqchen Yeah, sure. Perhaps I can start with figuring out why cast i32 is inserted.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-tvm] trevor-m commented on issue #6691: [Performance] Performance regression with int64 indices INDEX_DEFAULT_I64=ON (PR #6143)

Posted by GitBox <gi...@apache.org>.
trevor-m commented on issue #6691:
URL: https://github.com/apache/incubator-tvm/issues/6691#issuecomment-718051523


   Thanks @hzfan and @tqchen for the help!


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-tvm] tqchen edited a comment on issue #6691: [Performance] Performance regression with int64 indices INDEX_DEFAULT_I64=ON (PR #6143)

Posted by GitBox <gi...@apache.org>.
tqchen edited a comment on issue #6691:
URL: https://github.com/apache/incubator-tvm/issues/6691#issuecomment-718038395


   This can be resolved by #6771. However, it will still be great to followup on this case of why cast i32 is inserted and we should work to simplify that case


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-tvm] tqchen edited a comment on issue #6691: [Performance] Performance regression with int64 indices INDEX_DEFAULT_I64=ON (PR #6143)

Posted by GitBox <gi...@apache.org>.
tqchen edited a comment on issue #6691:
URL: https://github.com/apache/incubator-tvm/issues/6691#issuecomment-718038395


   This can be resolved by #6771. thanks @hzfan .
   
   However, it will still be great to followup on this case of why cast i32 is inserted and we should work to simplify that case. It would be great if we can use the chance to dig into the root of the issue.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-tvm] tqchen commented on issue #6691: [Performance] Performance regression with int64 indices INDEX_DEFAULT_I64=ON (PR #6143)

Posted by GitBox <gi...@apache.org>.
tqchen commented on issue #6691:
URL: https://github.com/apache/incubator-tvm/issues/6691#issuecomment-718038395


   Fixed by #6771. However, it will still be great to followup on this case


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-tvm] tqchen commented on issue #6691: [Performance] Performance regression with int64 indices INDEX_DEFAULT_I64=ON (PR #6143)

Posted by GitBox <gi...@apache.org>.
tqchen commented on issue #6691:
URL: https://github.com/apache/incubator-tvm/issues/6691#issuecomment-709583351


   Thanks @trevor-m cc @hzfan  Given that those are constant shapes, we should expect NarrowDataTypes to narrow dtypes to i32, it would be great to look into the IR those kernels and see why the NarrowDataType does not produce the right optimization 


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-tvm] tqchen commented on issue #6691: [Performance] Performance regression with int64 indices INDEX_DEFAULT_I64=ON (PR #6143)

Posted by GitBox <gi...@apache.org>.
tqchen commented on issue #6691:
URL: https://github.com/apache/incubator-tvm/issues/6691#issuecomment-719100370


   ping @hzfan can you please follow a bit further ? It is a great chance for us to improve the simplifcation and i64 flow


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-tvm] trevor-m commented on issue #6691: [Performance] Large performance regression with int64 indices INDEX_DEFAULT_I64=ON (PR #6143)

Posted by GitBox <gi...@apache.org>.
trevor-m commented on issue #6691:
URL: https://github.com/apache/incubator-tvm/issues/6691#issuecomment-709569953


   FYI @kevinthesun @hzfan @zhiics @tqchen


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-tvm] tqchen closed issue #6691: [Performance] Performance regression with int64 indices INDEX_DEFAULT_I64=ON (PR #6143)

Posted by GitBox <gi...@apache.org>.
tqchen closed issue #6691:
URL: https://github.com/apache/incubator-tvm/issues/6691


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-tvm] hzfan commented on issue #6691: [Performance] Performance regression with int64 indices INDEX_DEFAULT_I64=ON (PR #6143)

Posted by GitBox <gi...@apache.org>.
hzfan commented on issue #6691:
URL: https://github.com/apache/incubator-tvm/issues/6691#issuecomment-709956681


   Sure. I will look into that. Thanks @trevor-m 


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org