Posted to discuss-archive@tvm.apache.org by Alan Nair via Apache TVM Discuss <no...@discuss.tvm.ai> on 2021/10/19 06:59:38 UTC

[Apache TVM Discuss] [Questions] Difference in profiler outputs


@tkonolige Thank you for responding.
I just want to find out how much time is spent on data layout transformations while running inference on ResNet-50. profiler_vm seems to report a much lower inference cost (1) than debug_executor (2). Doesn't this contradict your statement that profiler_vm may be slower than the graph executor?
I also ran a benchmark via `tvm.contrib.graph_executor`:
```
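import tvm
from tvm import autotvm, relay
from tvm.contrib import graph_executor as runtime

# mod, params, target, dev, data and opt_sch_file are defined earlier (not shown)
with autotvm.apply_graph_best(opt_sch_file):
    with tvm.transform.PassContext(opt_level=3):
        lib = relay.build_module.build(mod, target=target, params=params)
        module = runtime.GraphModule(lib["default"](dev))
        module.set_input("data", data)
        print("Evaluate inference time cost...")
        # 3 repeats of 100 runs each; end_to_end=True also counts input/output transfer time
        print(module.benchmark(dev, func_name="main", number=100, repeat=3, end_to_end=True))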
```
The inference cost I get this way (3) is always close to, but lower than, (1). Do you have any idea why?
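
For reference, this is roughly how the layout-transformation time could be pulled out of a report like the ones below. It is only a sketch that parses the printed text; `layout_transform_share`, `ROW` and `SAMPLE` are names made up for this example and are not part of TVM. The sample rows are excerpted from output (1) with most columns dropped. Fused kernels such as `fused_add_nn_relu_layout_transform` also include the add and ReLU work, so the result is an upper bound on the pure layout-transform cost.

```python
import re

# Matches "<name> <duration>" at the start of a report row. The duration may
# use any digit grouping (e.g. "2,72,418.15"), so commas are simply stripped.
ROW = re.compile(r"^(\S+)\s+([\d,]+\.\d+)")


def layout_transform_share(report_text):
    """Return (layout_us, total_us, percent) for a printed profiler report."""
    layout_us = 0.0
    op_sum_us = 0.0
    total_us = None
    for line in report_text.splitlines():
        match = ROW.match(line)
        if match is None:
            continue  # header, separator, or warning line
        name = match.group(1)
        duration_us = float(match.group(2).replace(",", ""))
        if name == "Total":
            total_us = duration_us
        elif name == "Sum":
            continue  # aggregate row, not an operator
        else:
            op_sum_us += duration_us
            if "layout_transform" in name:
                layout_us += duration_us
    if total_us is None:
        # No "Total" row in the text: fall back to the sum of operator rows.
        total_us = op_sum_us
    return layout_us, total_us, 100.0 * layout_us / total_us


# A few rows excerpted from output (1); trailing columns omitted for brevity.
SAMPLE = """\
Name                                          Duration (us)  Percent  Count
fused_nn_contrib_conv2d_NCHWc_add_nn_relu_11      38,648.93    14.19      5
fused_add_nn_relu_layout_transform                   814.00     0.30      5
fused_layout_transform                               173.49     0.06      5
----------
Sum                                             2,71,351.60    99.61     84
Total                                           2,72,418.15                1
"""

if __name__ == "__main__":
    layout_us, total_us, percent = layout_transform_share(SAMPLE)
    print(f"layout transforms: {layout_us:.2f} us of {total_us:.2f} us ({percent:.2f}%)")
```

Passing the full text of (1) or (2) instead of `SAMPLE` would give the figure for a whole run.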

The Outputs:
(1) [profiler_vm]
```
Config for target=llvm -keys=cpu -link-params=0, workload=('dense_nopack.x86', ('TENSOR', (1, 2048), 'float32'), ('TENSOR', (1000, 2048), 'float32'), None, 'float32') is missing in ApplyGraphBest context. A fallback configuration is used, which may bring great performance regression.
Config for target=llvm -keys=cpu -link-params=0, workload=('dense_pack.x86', ('TENSOR', (1, 2048), 'float32'), ('TENSOR', (1000, 2048), 'float32'), None, 'float32') is missing in ApplyGraphBest context. A fallback configuration is used, which may bring great performance regression.
One or more operators have not been tuned. Please tune your model for better performance. Use DEBUG logging level to see more details.
Name                                                    Duration (us)  Percent   layout  Count  out_layout  Device  data_layout  kernel_layout              Hash                                                                                                                                                       Argument Shapes  src_layout  dst_layout  weight_layout  
fused_nn_contrib_conv2d_NCHWc_add_nn_relu_11                38,648.93    14.19               5     NCHW16c    cpu0      NCHW64c     OIHW64i16o  5c16c122a657ba21                                                         float32[1, 4, 14, 14, 64], float32[16, 4, 3, 3, 64, 16], float32[1, 16, 1, 1, 16], float32[1, 16, 14, 14, 16]                                         
fused_nn_contrib_conv2d_NCHWc_add_nn_relu_6                 31,069.39    11.41               4      NCHW8c    cpu0      NCHW16c      OIHW16i8o  f2c6de1cbe5c0ddb                                                            float32[1, 8, 28, 28, 16], float32[16, 8, 3, 3, 16, 8], float32[1, 16, 1, 1, 8], float32[1, 16, 28, 28, 8]                                         
fused_nn_contrib_conv2d_NCHWc_add_nn_relu_13                23,726.42     8.71               3      NCHW8c    cpu0       NCHW2c       OIHW2i8o  cb108aaf00eff9e2                                                              float32[1, 256, 7, 7, 2], float32[64, 256, 3, 3, 2, 8], float32[1, 64, 1, 1, 8], float32[1, 64, 7, 7, 8]                                         
fused_nn_contrib_conv2d_NCHWc_add_nn_relu_10                18,153.16     6.66               5      NCHW8c    cpu0    NCHW1024c    OIHW1024i8o  e4cba4831bd46d2c                                                        float32[1, 1, 14, 14, 1024], float32[32, 1, 1, 1, 1024, 8], float32[1, 32, 1, 1, 8], float32[1, 32, 14, 14, 8]                                         
fused_nn_contrib_conv2d_NCHWc_add_nn_relu_4                 15,697.88     5.76               2     NCHW16c    cpu0      NCHW16c     OIHW16i16o  b2d690588ecaac96                                                            float32[1, 4, 56, 56, 16], float32[4, 4, 3, 3, 16, 16], float32[1, 4, 1, 1, 16], float32[1, 4, 56, 56, 16]                                         
fused_nn_contrib_conv2d_NCHWc_add_3                         14,098.72     5.18               4     NCHW16c    cpu0      NCHW16c     OIHW16i16o  84bec82add215ebe                                                     float32[1, 16, 14, 14, 16], float32[64, 16, 1, 1, 16, 16], float32[1, 64, 14, 14, 16], float32[1, 64, 14, 14, 16]                                         
fused_nn_contrib_conv2d_NCHWc_add_nn_relu_7                 10,840.88     3.98               3     NCHW16c    cpu0      NCHW16c     OIHW16i16o  d930aa7bf46c34e1                                                          float32[1, 32, 28, 28, 16], float32[8, 32, 1, 1, 16, 16], float32[1, 8, 1, 1, 16], float32[1, 8, 28, 28, 16]                                         
fused_nn_contrib_conv2d_NCHWc_add_1                         10,638.57     3.91               3     NCHW16c    cpu0       NCHW8c      OIHW8i16o  6beba43d92784786                                                       float32[1, 16, 28, 28, 8], float32[32, 16, 1, 1, 8, 16], float32[1, 32, 28, 28, 16], float32[1, 32, 28, 28, 16]                                         
fused_nn_contrib_conv2d_NCHWc_add_nn_relu                    8,112.57     2.98               1      NCHW8c    cpu0       NCHW3c       OIHW3i8o  2f8575d36cac57f0                                                             float32[1, 1, 224, 224, 3], float32[8, 1, 7, 7, 3, 8], float32[1, 8, 1, 1, 8], float32[1, 8, 112, 112, 8]                                         
fused_nn_contrib_conv2d_NCHWc_add_nn_relu_9                  7,847.28     2.88               1      NCHW8c    cpu0      NCHW16c      OIHW16i8o  7baee5c8a4d8e4ab                                                          float32[1, 16, 14, 14, 16], float32[32, 16, 3, 3, 16, 8], float32[1, 32, 1, 1, 8], float32[1, 32, 14, 14, 8]                                         
fused_nn_contrib_conv2d_NCHWc_add_nn_relu_2                  7,684.11     2.82               1     NCHW16c    cpu0      NCHW32c     OIHW32i16o  25fd1c3d9d4e561e                                                            float32[1, 2, 56, 56, 32], float32[4, 2, 3, 3, 32, 16], float32[1, 4, 1, 1, 16], float32[1, 4, 56, 56, 16]                                         
fused_nn_contrib_conv2d_NCHWc_add                            7,625.64     2.80               2     NCHW32c    cpu0      NCHW16c     OIHW16i32o  667036afd5deee1b                                                          float32[1, 4, 56, 56, 16], float32[8, 4, 1, 1, 16, 32], float32[1, 8, 56, 56, 32], float32[1, 8, 56, 56, 32]                                         
fused_nn_contrib_conv2d_NCHWc_add_nn_relu_3                  7,622.32     2.80               2     NCHW16c    cpu0      NCHW32c     OIHW32i16o  6e49d3c836077ac7                                                            float32[1, 8, 56, 56, 32], float32[4, 8, 1, 1, 32, 16], float32[1, 4, 1, 1, 16], float32[1, 4, 56, 56, 16]                                         
fused_nn_contrib_conv2d_NCHWc_2                              7,530.83     2.76               1     NCHW16c    cpu0      NCHW16c     OIHW16i16o  b6e66601adaeb1e3                                                                                 float32[1, 32, 28, 28, 16], float32[64, 32, 1, 1, 16, 16], float32[1, 64, 14, 14, 16]                                         
fused_nn_contrib_conv2d_NCHWc_add_4                          7,305.51     2.68               2     NCHW16c    cpu0       NCHW4c      OIHW4i16o  d0d1536228842867                                                        float32[1, 128, 7, 7, 4], float32[128, 128, 1, 1, 4, 16], float32[1, 128, 7, 7, 16], float32[1, 128, 7, 7, 16]                                         
fused_nn_contrib_conv2d_NCHWc_3                              7,303.69     2.68               1      NCHW8c    cpu0    NCHW1024c    OIHW1024i8o  493c374dd5e37c2b                                                                                 float32[1, 1, 14, 14, 1024], float32[256, 1, 1, 1, 1024, 8], float32[1, 256, 7, 7, 8]                                         
fused_nn_contrib_conv2d_NCHWc_add_nn_relu_14                 7,199.44     2.64               2      NCHW8c    cpu0    NCHW2048c    OIHW2048i8o  af5e7bf563de2757                                                            float32[1, 1, 7, 7, 2048], float32[64, 1, 1, 1, 2048, 8], float32[1, 64, 1, 1, 8], float32[1, 64, 7, 7, 8]                                         
fused_nn_contrib_conv2d_NCHWc_1                              7,185.16     2.64               1     NCHW16c    cpu0      NCHW32c     OIHW32i16o  5e7a95757d65e24e                                                                                   float32[1, 8, 56, 56, 32], float32[32, 8, 1, 1, 32, 16], float32[1, 32, 28, 28, 16]                                         
fused_nn_contrib_conv2d_NCHWc_add_add_nn_relu                3,905.42     1.43               1     NCHW32c    cpu0      NCHW16c     OIHW16i32o  18ea4e7c768c292e                                 float32[1, 4, 56, 56, 16], float32[8, 4, 1, 1, 16, 32], float32[1, 8, 56, 56, 32], float32[1, 8, 1, 1, 32], float32[1, 8, 56, 56, 32]                                         
fused_nn_contrib_conv2d_NCHWc                                3,776.76     1.39               1     NCHW32c    cpu0       NCHW8c      OIHW8i32o  7ff40af88acd710e                                                                                       float32[1, 8, 56, 56, 8], float32[8, 8, 1, 1, 8, 32], float32[1, 8, 56, 56, 32]                                         
fused_nn_contrib_conv2d_NCHWc_add_multiply_add_nn_relu       3,693.25     1.36               1     NCHW16c    cpu0       NCHW4c      OIHW4i16o  a3a86603f87a1daa  float32[1, 128, 7, 7, 4], float32[128, 128, 1, 1, 4, 16], float32[1, 128, 7, 7, 16], float32[1, 128, 1, 1, 16], float32[1, 128, 1, 1, 16], float32[1, 128, 7, 7, 16]                                         
fused_nn_contrib_conv2d_NCHWc_add_add_nn_relu_1              3,616.06     1.33               1     NCHW16c    cpu0       NCHW8c      OIHW8i16o  faa415ce8e443d42                             float32[1, 16, 28, 28, 8], float32[32, 16, 1, 1, 8, 16], float32[1, 32, 28, 28, 16], float32[1, 32, 1, 1, 16], float32[1, 32, 28, 28, 16]                                         
fused_nn_contrib_conv2d_NCHWc_add_2                          3,601.05     1.32               1     NCHW16c    cpu0       NCHW8c      OIHW8i16o  c3c48546ccd1c8e4                                                       float32[1, 32, 14, 14, 8], float32[64, 32, 1, 1, 8, 16], float32[1, 64, 14, 14, 16], float32[1, 64, 14, 14, 16]                                         
fused_nn_contrib_conv2d_NCHWc_add_add_nn_relu_2              3,509.62     1.29               1     NCHW16c    cpu0      NCHW16c     OIHW16i16o  237b36f60eadc660                           float32[1, 16, 14, 14, 16], float32[64, 16, 1, 1, 16, 16], float32[1, 64, 14, 14, 16], float32[1, 64, 1, 1, 16], float32[1, 64, 14, 14, 16]                                         
fused_nn_contrib_conv2d_NCHWc_add_nn_relu_12                 2,119.10     0.78               1     NCHW16c    cpu0      NCHW16c     OIHW16i16o  8d07031ff51d0737                                                         float32[1, 64, 14, 14, 16], float32[32, 64, 1, 1, 16, 16], float32[1, 32, 1, 1, 16], float32[1, 32, 7, 7, 16]                                         
fused_nn_contrib_conv2d_NCHWc_add_nn_relu_8                  1,969.95     0.72               1     NCHW16c    cpu0      NCHW16c     OIHW16i16o  8ec1781e87f7f62e                                                       float32[1, 32, 28, 28, 16], float32[16, 32, 1, 1, 16, 16], float32[1, 16, 1, 1, 16], float32[1, 16, 14, 14, 16]                                         
fused_nn_contrib_conv2d_NCHWc_add_nn_relu_5                  1,869.16     0.69               1     NCHW16c    cpu0      NCHW32c     OIHW32i16o  39975a03990f0ed6                                                            float32[1, 8, 56, 56, 32], float32[8, 8, 1, 1, 32, 16], float32[1, 8, 1, 1, 16], float32[1, 8, 28, 28, 16]                                         
fused_nn_contrib_conv2d_NCHWc_add_nn_relu_1                    920.43     0.34               1     NCHW32c    cpu0       NCHW8c      OIHW8i32o  ce29dd2da9289ac4                                                              float32[1, 8, 56, 56, 8], float32[2, 8, 1, 1, 8, 32], float32[1, 2, 1, 1, 32], float32[1, 2, 56, 56, 32]                                         
fused_add_nn_relu_layout_transform                             814.00     0.30               5                cpu0                              7590737f314ee1d9                                                                                     float32[1, 64, 14, 14, 16], float32[1, 64, 1, 1, 16], float32[1, 1, 14, 14, 1024]     NCHW16c   NCHW1024c                 
fused_add_nn_relu                                              751.40     0.28               2                cpu0                              f6724216088f2bf7                                                                                         float32[1, 8, 56, 56, 32], float32[1, 8, 1, 1, 32], float32[1, 8, 56, 56, 32]                                         
fused_nn_contrib_dense_pack_add                                658.90     0.24               1                cpu0                              ced18cccebfa2ada                                                                                           float32[1, 2048], float32[125, 2048, 8], float32[1, 1000], float32[1, 1000]                                   NC8n  
fused_add_nn_relu_1                                            624.30     0.23               3                cpu0                              848825acfc73218b                                                                                      float32[1, 32, 28, 28, 16], float32[1, 32, 1, 1, 16], float32[1, 32, 28, 28, 16]                                         
fused_nn_max_pool2d_add_nn_relu                                378.72     0.14   NCHW8c      1                cpu0                              4883943910905d24                                                                                          float32[1, 8, 112, 112, 8], float32[1, 8, 1, 1, 8], float32[1, 8, 56, 56, 8]                                         
fused_layout_transform                                         173.49     0.06               5                cpu0                              0693edb3d97dc77f                                                                                                                  float32[1, 32, 14, 14, 8], float32[1, 4, 14, 14, 64]      NCHW8c     NCHW64c                 
fused_add_nn_relu_layout_transform_1                           172.54     0.06               2                cpu0                              468080b095af509a                                                                                       float32[1, 128, 7, 7, 16], float32[1, 128, 1, 1, 16], float32[1, 1, 7, 7, 2048]     NCHW16c   NCHW2048c                 
fused_layout_transform_3                                       138.92     0.05               1                cpu0                              6dda5720a553f260                                                                                                               float32[1, 64, 14, 14, 16], float32[1, 1, 14, 14, 1024]     NCHW16c   NCHW1024c                 
fused_add_layout_transform                                      90.72     0.03               1                cpu0                              69355d3cc810f874                                                                                                 float32[1, 3, 224, 224], float32[3, 1, 1], float32[1, 1, 224, 224, 3]        NCHW      NCHW3c                 
fused_nn_global_avg_pool2d                                      83.09     0.03  NCHW16c      1                cpu0                              f18307e2786f4cb3                                                                                                                  float32[1, 128, 7, 7, 16], float32[1, 128, 1, 1, 16]                                         
fused_layout_transform_4                                        79.73     0.03               1                cpu0                              aad3e266e27c5054                                                                                                                   float32[1, 256, 7, 7, 8], float32[1, 128, 7, 7, 16]      NCHW8c     NCHW16c                 
fused_layout_transform_2                                        50.75     0.02               3                cpu0                              bd0b0c2ae84f7e09                                                                                                                     float32[1, 64, 7, 7, 8], float32[1, 128, 7, 7, 4]      NCHW8c      NCHW4c                 
fused_layout_transform_5                                        39.88     0.01               2                cpu0                              69f132fa7e1d6749                                                                                                                     float32[1, 64, 7, 7, 8], float32[1, 256, 7, 7, 2]      NCHW8c      NCHW2c                 
fused_layout_transform_1                                        14.62     0.01               1                cpu0                              9bd937910d443787                                                                                                                    float32[1, 32, 7, 7, 16], float32[1, 256, 7, 7, 2]     NCHW16c      NCHW2c                 
fused_nn_softmax                                                 7.80     0.00               1                cpu0                              ca61e79ea24e53f0                                                                                                                                    float32[1, 1000], float32[1, 1000]                                         
fused_layout_transform_nn_batch_flatten                          1.41     0.00               1                cpu0                              2db99463d18696a4                                                                                                                           float32[1, 128, 1, 1, 16], float32[1, 2048]     NCHW16c        NCHW                 
----------                                                                                                                                                                                                                                                                                                                                                                     
Sum                                                       2,71,351.60    99.61              84                                                                                                                                                                                                                                                                                 
Total                                                     2,72,418.15                        1                cpu0                                                                                                                                                                                                                                                             
```

(2) [debug_executor]
```
Config for target=llvm -keys=cpu -link-params=0, workload=('dense_nopack.x86', ('TENSOR', (1, 2048), 'float32'), ('TENSOR', (1000, 2048), 'float32'), None, 'float32') is missing in ApplyGraphBest context. A fallback configuration is used, which may bring great performance regression.
Config for target=llvm -keys=cpu -link-params=0, workload=('dense_pack.x86', ('TENSOR', (1, 2048), 'float32'), ('TENSOR', (1000, 2048), 'float32'), None, 'float32') is missing in ApplyGraphBest context. A fallback configuration is used, which may bring great performance regression.
One or more operators have not been tuned. Please tune your model for better performance. Use DEBUG logging level to see more details.
Name                                                                   Duration (us)  Percent   layout  Count  out_layout  Device  data_layout  kernel_layout              Hash                                                                                                                                                       Argument Shapes  src_layout  dst_layout  weight_layout  
tvmgen_default_fused_nn_contrib_conv2d_NCHWc_add_nn_relu_14              5,68,263.92    48.76               1      NCHW8c    cpu0       NCHW3c       OIHW3i8o  2f8575d36cac57f0                                                             float32[1, 1, 224, 224, 3], float32[8, 1, 7, 7, 3, 8], float32[1, 8, 1, 1, 8], float32[1, 8, 112, 112, 8]                                         
tvmgen_default_fused_nn_contrib_conv2d_NCHWc_add_add_nn_relu_2           1,75,988.78    15.10               1     NCHW32c    cpu0      NCHW16c     OIHW16i32o  18ea4e7c768c292e                                 float32[1, 4, 56, 56, 16], float32[8, 4, 1, 1, 16, 32], float32[1, 8, 56, 56, 32], float32[1, 8, 1, 1, 32], float32[1, 8, 56, 56, 32]                                         
tvmgen_default_fused_nn_contrib_conv2d_NCHWc_add_nn_relu_10                82,241.79     7.06               2     NCHW16c    cpu0      NCHW16c     OIHW16i16o  b2d690588ecaac96                                                            float32[1, 4, 56, 56, 16], float32[4, 4, 3, 3, 16, 16], float32[1, 4, 1, 1, 16], float32[1, 4, 56, 56, 16]                                         
tvmgen_default_fused_nn_contrib_conv2d_NCHWc                               67,905.70     5.83               1     NCHW32c    cpu0       NCHW8c      OIHW8i32o  7ff40af88acd710e                                                                                       float32[1, 8, 56, 56, 8], float32[8, 8, 1, 1, 8, 32], float32[1, 8, 56, 56, 32]                                         
tvmgen_default_fused_nn_contrib_conv2d_NCHWc_add_nn_relu_3                 39,639.91     3.40               5     NCHW16c    cpu0      NCHW64c     OIHW64i16o  5c16c122a657ba21                                                         float32[1, 4, 14, 14, 64], float32[16, 4, 3, 3, 64, 16], float32[1, 16, 1, 1, 16], float32[1, 16, 14, 14, 16]                                         
tvmgen_default_fused_nn_contrib_conv2d_NCHWc_add_nn_relu_7                 31,242.14     2.68               4      NCHW8c    cpu0      NCHW16c      OIHW16i8o  f2c6de1cbe5c0ddb                                                            float32[1, 8, 28, 28, 16], float32[16, 8, 3, 3, 16, 8], float32[1, 16, 1, 1, 8], float32[1, 16, 28, 28, 8]                                         
tvmgen_default_fused_nn_contrib_conv2d_NCHWc_add_4                         29,317.11     2.52               2     NCHW32c    cpu0      NCHW16c     OIHW16i32o  667036afd5deee1b                                                          float32[1, 4, 56, 56, 16], float32[8, 4, 1, 1, 16, 32], float32[1, 8, 56, 56, 32], float32[1, 8, 56, 56, 32]                                         
tvmgen_default_fused_nn_contrib_conv2d_NCHWc_add_nn_relu                   23,174.76     1.99               3      NCHW8c    cpu0       NCHW2c       OIHW2i8o  cb108aaf00eff9e2                                                              float32[1, 256, 7, 7, 2], float32[64, 256, 3, 3, 2, 8], float32[1, 64, 1, 1, 8], float32[1, 64, 7, 7, 8]                                         
tvmgen_default_fused_nn_contrib_conv2d_NCHWc_add_nn_relu_4                 18,815.10     1.61               5      NCHW8c    cpu0    NCHW1024c    OIHW1024i8o  e4cba4831bd46d2c                                                        float32[1, 1, 14, 14, 1024], float32[32, 1, 1, 1, 1024, 8], float32[1, 32, 1, 1, 8], float32[1, 32, 14, 14, 8]                                         
tvmgen_default_fused_nn_contrib_conv2d_NCHWc_add_1                         14,143.22     1.21               4     NCHW16c    cpu0      NCHW16c     OIHW16i16o  84bec82add215ebe                                                     float32[1, 16, 14, 14, 16], float32[64, 16, 1, 1, 16, 16], float32[1, 64, 14, 14, 16], float32[1, 64, 14, 14, 16]                                         
tvmgen_default_fused_nn_contrib_conv2d_NCHWc_add_nn_relu_8                 10,807.49     0.93               3     NCHW16c    cpu0      NCHW16c     OIHW16i16o  d930aa7bf46c34e1                                                          float32[1, 32, 28, 28, 16], float32[8, 32, 1, 1, 16, 16], float32[1, 8, 1, 1, 16], float32[1, 8, 28, 28, 16]                                         
tvmgen_default_fused_nn_contrib_conv2d_NCHWc_add_3                         10,635.87     0.91               3     NCHW16c    cpu0       NCHW8c      OIHW8i16o  6beba43d92784786                                                       float32[1, 16, 28, 28, 8], float32[32, 16, 1, 1, 8, 16], float32[1, 32, 28, 28, 16], float32[1, 32, 28, 28, 16]                                         
tvmgen_default_fused_nn_contrib_conv2d_NCHWc_add_nn_relu_13                 8,887.56     0.76               1     NCHW32c    cpu0       NCHW8c      OIHW8i32o  ce29dd2da9289ac4                                                              float32[1, 8, 56, 56, 8], float32[2, 8, 1, 1, 8, 32], float32[1, 2, 1, 1, 32], float32[1, 2, 56, 56, 32]                                         
tvmgen_default_fused_nn_contrib_conv2d_NCHWc_add_nn_relu_11                 8,865.53     0.76               2     NCHW16c    cpu0      NCHW32c     OIHW32i16o  6e49d3c836077ac7                                                            float32[1, 8, 56, 56, 32], float32[4, 8, 1, 1, 32, 16], float32[1, 4, 1, 1, 16], float32[1, 4, 56, 56, 16]                                         
tvmgen_default_fused_nn_contrib_conv2d_NCHWc_add_nn_relu_5                  8,017.28     0.69               1      NCHW8c    cpu0      NCHW16c      OIHW16i8o  7baee5c8a4d8e4ab                                                          float32[1, 16, 14, 14, 16], float32[32, 16, 3, 3, 16, 8], float32[1, 32, 1, 1, 8], float32[1, 32, 14, 14, 8]                                         
tvmgen_default_fused_nn_contrib_conv2d_NCHWc_add_nn_relu_12                 7,585.56     0.65               1     NCHW16c    cpu0      NCHW32c     OIHW32i16o  25fd1c3d9d4e561e                                                            float32[1, 2, 56, 56, 32], float32[4, 2, 3, 3, 32, 16], float32[1, 4, 1, 1, 16], float32[1, 4, 56, 56, 16]                                         
tvmgen_default_fused_nn_contrib_conv2d_NCHWc_2                              7,442.40     0.64               1     NCHW16c    cpu0      NCHW16c     OIHW16i16o  b6e66601adaeb1e3                                                                                 float32[1, 32, 28, 28, 16], float32[64, 32, 1, 1, 16, 16], float32[1, 64, 14, 14, 16]                                         
tvmgen_default_fused_nn_contrib_conv2d_NCHWc_add_nn_relu_1                  7,293.48     0.63               2      NCHW8c    cpu0    NCHW2048c    OIHW2048i8o  af5e7bf563de2757                                                            float32[1, 1, 7, 7, 2048], float32[64, 1, 1, 1, 2048, 8], float32[1, 64, 1, 1, 8], float32[1, 64, 7, 7, 8]                                         
tvmgen_default_fused_nn_contrib_conv2d_NCHWc_3                              7,140.03     0.61               1      NCHW8c    cpu0    NCHW1024c    OIHW1024i8o  493c374dd5e37c2b                                                                                 float32[1, 1, 14, 14, 1024], float32[256, 1, 1, 1, 1024, 8], float32[1, 256, 7, 7, 8]                                         
tvmgen_default_fused_nn_contrib_conv2d_NCHWc_1                              7,041.60     0.60               1     NCHW16c    cpu0      NCHW32c     OIHW32i16o  5e7a95757d65e24e                                                                                   float32[1, 8, 56, 56, 32], float32[32, 8, 1, 1, 32, 16], float32[1, 32, 28, 28, 16]                                         
tvmgen_default_fused_nn_contrib_conv2d_NCHWc_add                            6,836.18     0.59               2     NCHW16c    cpu0       NCHW4c      OIHW4i16o  d0d1536228842867                                                        float32[1, 128, 7, 7, 4], float32[128, 128, 1, 1, 4, 16], float32[1, 128, 7, 7, 16], float32[1, 128, 7, 7, 16]                                         
tvmgen_default_fused_nn_contrib_conv2d_NCHWc_add_2                          3,727.41     0.32               1     NCHW16c    cpu0       NCHW8c      OIHW8i16o  c3c48546ccd1c8e4                                                       float32[1, 32, 14, 14, 8], float32[64, 32, 1, 1, 8, 16], float32[1, 64, 14, 14, 16], float32[1, 64, 14, 14, 16]                                         
tvmgen_default_fused_nn_contrib_conv2d_NCHWc_add_add_nn_relu_1              3,596.31     0.31               1     NCHW16c    cpu0       NCHW8c      OIHW8i16o  faa415ce8e443d42                             float32[1, 16, 28, 28, 8], float32[32, 16, 1, 1, 8, 16], float32[1, 32, 28, 28, 16], float32[1, 32, 1, 1, 16], float32[1, 32, 28, 28, 16]                                         
tvmgen_default_fused_nn_contrib_conv2d_NCHWc_add_multiply_add_nn_relu       3,468.59     0.30               1     NCHW16c    cpu0       NCHW4c      OIHW4i16o  a3a86603f87a1daa  float32[1, 128, 7, 7, 4], float32[128, 128, 1, 1, 4, 16], float32[1, 128, 7, 7, 16], float32[1, 128, 1, 1, 16], float32[1, 128, 1, 1, 16], float32[1, 128, 7, 7, 16]                                         
tvmgen_default_fused_nn_contrib_conv2d_NCHWc_add_add_nn_relu                3,440.23     0.30               1     NCHW16c    cpu0      NCHW16c     OIHW16i16o  237b36f60eadc660                           float32[1, 16, 14, 14, 16], float32[64, 16, 1, 1, 16, 16], float32[1, 64, 14, 14, 16], float32[1, 64, 1, 1, 16], float32[1, 64, 14, 14, 16]                                         
tvmgen_default_fused_nn_contrib_conv2d_NCHWc_add_nn_relu_9                  3,144.19     0.27               1     NCHW16c    cpu0      NCHW32c     OIHW32i16o  39975a03990f0ed6                                                            float32[1, 8, 56, 56, 32], float32[8, 8, 1, 1, 32, 16], float32[1, 8, 1, 1, 16], float32[1, 8, 28, 28, 16]                                         
tvmgen_default_fused_nn_contrib_conv2d_NCHWc_add_nn_relu_2                  1,997.84     0.17               1     NCHW16c    cpu0      NCHW16c     OIHW16i16o  8d07031ff51d0737                                                         float32[1, 64, 14, 14, 16], float32[32, 64, 1, 1, 16, 16], float32[1, 32, 1, 1, 16], float32[1, 32, 7, 7, 16]                                         
tvmgen_default_fused_nn_contrib_conv2d_NCHWc_add_nn_relu_6                  1,783.56     0.15               1     NCHW16c    cpu0      NCHW16c     OIHW16i16o  8ec1781e87f7f62e                                                       float32[1, 32, 28, 28, 16], float32[16, 32, 1, 1, 16, 16], float32[1, 16, 1, 1, 16], float32[1, 16, 14, 14, 16]                                         
tvmgen_default_fused_add_nn_relu_1                                            473.00     0.04               2                cpu0                              f6724216088f2bf7                                                                                         float32[1, 8, 56, 56, 32], float32[1, 8, 1, 1, 32], float32[1, 8, 56, 56, 32]                                         
tvmgen_default_fused_add_nn_relu                                              338.92     0.03               3                cpu0                              848825acfc73218b                                                                                      float32[1, 32, 28, 28, 16], float32[1, 32, 1, 1, 16], float32[1, 32, 28, 28, 16]                                         
tvmgen_default_fused_add_nn_relu_layout_transform_1                           286.62     0.02               5                cpu0                              7590737f314ee1d9                                                                                     float32[1, 64, 14, 14, 16], float32[1, 64, 1, 1, 16], float32[1, 1, 14, 14, 1024]     NCHW16c   NCHW1024c                 
tvmgen_default_fused_nn_contrib_dense_pack_add                                265.74     0.02               1                cpu0                              ced18cccebfa2ada                                                                                           float32[1, 2048], float32[125, 2048, 8], float32[1, 1000], float32[1, 1000]                                   NC8n  
tvmgen_default_fused_nn_max_pool2d_add_nn_relu                                251.56     0.02   NCHW8c      1                cpu0                              4883943910905d24                                                                                          float32[1, 8, 112, 112, 8], float32[1, 8, 1, 1, 8], float32[1, 8, 56, 56, 8]                                         
tvmgen_default_fused_layout_transform_3                                       132.62     0.01               5                cpu0                              0693edb3d97dc77f                                                                                                                  float32[1, 32, 14, 14, 8], float32[1, 4, 14, 14, 64]      NCHW8c     NCHW64c                 
tvmgen_default_fused_nn_global_avg_pool2d                                      69.42     0.01  NCHW16c      1                cpu0                              f18307e2786f4cb3                                                                                                                  float32[1, 128, 7, 7, 16], float32[1, 128, 1, 1, 16]                                         
tvmgen_default_fused_layout_transform_4                                        60.92     0.01               1                cpu0                              aad3e266e27c5054                                                                                                                   float32[1, 256, 7, 7, 8], float32[1, 128, 7, 7, 16]      NCHW8c     NCHW16c                 
tvmgen_default_fused_layout_transform_5                                        58.94     0.01               1                cpu0                              6dda5720a553f260                                                                                                               float32[1, 64, 14, 14, 16], float32[1, 1, 14, 14, 1024]     NCHW16c   NCHW1024c                 
tvmgen_default_fused_add_layout_transform                                      56.01     0.00               1                cpu0                              69355d3cc810f874                                                                                                 float32[1, 3, 224, 224], float32[3, 1, 1], float32[1, 1, 224, 224, 3]        NCHW      NCHW3c                 
tvmgen_default_fused_add_nn_relu_layout_transform                              54.40     0.00               2                cpu0                              468080b095af509a                                                                                       float32[1, 128, 7, 7, 16], float32[1, 128, 1, 1, 16], float32[1, 1, 7, 7, 2048]     NCHW16c   NCHW2048c                 
tvmgen_default_fused_layout_transform_1                                        42.67     0.00               2                cpu0                              69f132fa7e1d6749                                                                                                                     float32[1, 64, 7, 7, 8], float32[1, 256, 7, 7, 2]      NCHW8c      NCHW2c                 
tvmgen_default_fused_layout_transform                                          33.90     0.00               3                cpu0                              bd0b0c2ae84f7e09                                                                                                                     float32[1, 64, 7, 7, 8], float32[1, 128, 7, 7, 4]      NCHW8c      NCHW4c                 
tvmgen_default_fused_layout_transform_2                                        19.34     0.00               1                cpu0                              9bd937910d443787                                                                                                                    float32[1, 32, 7, 7, 16], float32[1, 256, 7, 7, 2]     NCHW16c      NCHW2c                 
tvmgen_default_fused_nn_softmax                                                 7.03     0.00               1                cpu0                              ca61e79ea24e53f0                                                                                                                                    float32[1, 1000], float32[1, 1000]                                         
tvmgen_default_fused_layout_transform_nn_batch_flatten                          0.96     0.00               1                cpu0                              2db99463d18696a4                                                                                                                           float32[1, 128, 1, 1, 16], float32[1, 2048]     NCHW16c        NCHW                 
----------                                                                                                                                                                                                                                                                                                                                                                                    
Sum                                                                     11,64,595.59    99.94              84                                                                                                                                                                                                                                                                                 
Total                                                                   11,65,326.68                        1                cpu0                                                                                  
```

(3) [benchmark]
```
Config for target=llvm -keys=cpu -link-params=0, workload=('dense_nopack.x86', ('TENSOR', (1, 2048), 'float32'), ('TENSOR', (1000, 2048), 'float32'), None, 'float32') is missing in ApplyGraphBest context. A fallback configuration is used, which may bring great performance regression.
Config for target=llvm -keys=cpu -link-params=0, workload=('dense_pack.x86', ('TENSOR', (1, 2048), 'float32'), ('TENSOR', (1000, 2048), 'float32'), None, 'float32') is missing in ApplyGraphBest context. A fallback configuration is used, which may bring great performance regression.
One or more operators have not been tuned. Please tune your model for better performance. Use DEBUG logging level to see more details.
Evaluate inference time cost...
Execution time summary:
 mean (ms)   median (ms)    max (ms)     min (ms)     std (ms)  
  269.9458     270.0297     270.0697     269.7381      0.1478   
```
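Since the goal is to find how much time goes to data layout transformations, one rough way to get that number is to sum the duration column of every row in a per-operator table like the one above whose name contains `layout_transform`. The sketch below is only an illustration: `layout_transform_time` is a hypothetical helper (not part of TVM), it assumes the first two whitespace-separated fields of each row are the operator name and its duration (presumably in microseconds, already aggregated over the call count), and it strips thousands separators such as those in `11,64,595.59`. Fused kernels such as `fused_add_nn_relu_layout_transform` are counted in full, so the result is an upper bound on pure layout-transform time.

```python
def layout_transform_time(report: str):
    """Return (layout_us, total_us) summed over operator rows of a profiler table."""
    layout_us = 0.0
    total_us = 0.0
    for line in report.splitlines():
        parts = line.split()
        if len(parts) < 2:
            continue  # blank lines and the "----------" separator
        name, duration = parts[0], parts[1]
        if name in ("Sum", "Total"):
            continue  # summary rows, not individual operators
        try:
            value = float(duration.replace(",", ""))
        except ValueError:
            continue  # header or warning lines without a numeric second field
        total_us += value
        if "layout_transform" in name:
            layout_us += value
    return layout_us, total_us


# A few rows copied from the table above, shortened to the first columns.
sample = """\
tvmgen_default_fused_layout_transform_3      132.62     0.01   5   cpu0
tvmgen_default_fused_layout_transform_4       60.92     0.01   1   cpu0
tvmgen_default_fused_nn_softmax                7.03     0.00   1   cpu0
----------
Sum                                     11,64,595.59    99.94  84
"""

layout_us, total_us = layout_transform_time(sample)
print(f"layout transforms: {layout_us:.2f} us of {total_us:.2f} us "
      f"({100 * layout_us / total_us:.2f}%)")
```

Passing the full text of a report (for example `str(report)`, assuming the profiler returns an object that prints as this table) instead of `sample` would give the share for the whole model.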




