Posted to commits@tvm.apache.org by GitBox <gi...@apache.org> on 2022/08/05 22:16:58 UTC

[GitHub] [tvm] slyubomirsky opened a new issue, #12330: [Bug][Metaschedule] Tuning trial hanging after one task

slyubomirsky opened a new issue, #12330:
URL: https://github.com/apache/tvm/issues/12330

   I encountered this when trying to run [this script](https://github.com/tlc-pack/relax/blob/relax/apps/relax_examples/e2e_auto_tir.py) over RPC on machines with V100s. Though it was done using Relax, @zxybazh says he thinks this can probably be triggered on mainline as well.
   
   I ran ResNet-50 on a V100 with an input shape of (1, 3, 224, 224), using 5 tuning trials. Tuning started hanging on the first task, `fused_conv2d_add_relu`. It appeared that failures were encountered during that task.
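   For reference, a minimal sketch of the kind of setup behind this run, written against mainline's `tvm.meta_schedule` API of that era rather than the Relax script itself. The relay ResNet-50 workload stands in for the Relax model, the tracker host/port/key are placeholders, and the `ms.tune_relay`/`ms.TuneConfig` entry point and parameter names are assumptions for that API version; the builder and runner worker counts match the log below.
   ```python
   # Hedged sketch, not the exact e2e_auto_tir.py invocation.
   import tvm
   from tvm import meta_schedule as ms
   from tvm.relay import testing

   # ResNet-50 with input shape (1, 3, 224, 224); the real run used a Relax model.
   mod, params = testing.resnet.get_workload(
       num_layers=50, batch_size=1, image_shape=(3, 224, 224), dtype="float32"
   )
   target = tvm.target.Target("nvidia/nvidia-v100", host="llvm")

   # Matches "LocalBuilder: max_workers = 24" and "RPCRunner: max_workers = 2" below.
   builder = ms.builder.LocalBuilder(max_workers=24)
   runner = ms.runner.RPCRunner(
       rpc_config=ms.runner.RPCConfig(
           tracker_host="127.0.0.1",  # placeholder
           tracker_port=9190,         # placeholder
           tracker_key="v100",        # placeholder
           session_timeout_sec=60,
       ),
       max_workers=2,
   )

   # 5 tuning trials, as in the report; the config fields are the assumed
   # TuneConfig of that API version.
   lib = ms.tune_relay(
       mod=mod,
       params=params,
       target=target,
       config=ms.TuneConfig(
           strategy="evolutionary",
           num_trials_per_iter=5,
           max_trials_per_task=5,
           max_trials_global=5,
       ),
       work_dir="/home/ubuntu/dump",
       builder=builder,
       runner=runner,
   )
   ```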
   
   Output from the host:
   ```
     input_name: input0
     input_shape: [1, 3, 224, 224]
     input_dtype: float32
   /home/ubuntu/tvm-runtime/python/tvm/driver/build_module.py:267: UserWarning: target_host parameter is going to be deprecated. Please pass in tvm.target.Target(target, host=target_host) instead.
     warnings.warn(
   INFO:tvm.meta_schedule.runner.rpc_runner:RPCRunner: max_workers = 2
   INFO:tvm.meta_schedule.tune:Working directory: /home/ubuntu/dump/
   2022-08-05 12:13:55.897 INFO Logging directory: /home/ubuntu/dump/logs
   2022-08-05 12:13:55.897 INFO Working directory: /home/ubuntu/dump/
   2022-08-05 12:13:55.898 INFO Creating JSONDatabase. Workload at: /home/ubuntu/dump/database_workload.json. Tuning records at: /home/ubuntu/dump/database_tuning_record.json
   2022-08-05 12:13:56.063 INFO LocalBuilder: max_workers = 24
   2022-08-05 12:13:56.388 INFO Initializing Task #0: "layout_transform"
   2022-08-05 12:13:56.459 INFO Initializing Task #1: "fused_conv2d_add_relu"
   2022-08-05 12:13:56.726 INFO Initializing Task #2: "max_pool2d"
   2022-08-05 12:13:56.866 INFO Initializing Task #3: "fused_conv2d1_add1_relu1"
   2022-08-05 12:13:57.114 INFO Initializing Task #4: "fused_contrib_conv2d_winograd_without_weight_transform_add1_relu1"
   2022-08-05 12:13:58.024 INFO Initializing Task #5: "fused_conv2d2_add2"
   2022-08-05 12:13:58.231 INFO Initializing Task #6: "fused_conv2d2_add2_add3_relu2"
   2022-08-05 12:13:58.532 INFO Initializing Task #7: "fused_conv2d3_add1_relu1"
   2022-08-05 12:13:58.784 INFO Initializing Task #8: "fused_conv2d4_add4_relu3"
   2022-08-05 12:13:59.033 INFO Initializing Task #9: "fused_conv2d5_add5_relu4"
   2022-08-05 12:13:59.301 INFO Initializing Task #10: "fused_conv2d7_add6"
   2022-08-05 12:13:59.518 INFO Initializing Task #11: "fused_conv2d6_add6_add7_relu5"
   2022-08-05 12:13:59.823 INFO Initializing Task #12: "fused_conv2d8_add5_relu4"
   2022-08-05 12:14:00.077 INFO Initializing Task #13: "fused_contrib_conv2d_winograd_without_weight_transform1_add5_relu4"
   2022-08-05 12:14:00.771 INFO Initializing Task #14: "fused_conv2d9_add8_relu6"
   2022-08-05 12:14:01.022 INFO Initializing Task #15: "fused_conv2d10_add9_relu7"
   2022-08-05 12:14:01.290 INFO Initializing Task #16: "fused_conv2d12_add10"
   2022-08-05 12:14:01.504 INFO Initializing Task #17: "fused_conv2d11_add10_add11_relu8"
   2022-08-05 12:14:01.806 INFO Initializing Task #18: "fused_conv2d13_add9_relu7"
   2022-08-05 12:14:02.057 INFO Initializing Task #19: "fused_contrib_conv2d_winograd_without_weight_transform2_add9_relu7"
   2022-08-05 12:14:02.753 INFO Initializing Task #20: "fused_conv2d14_add12_relu9"
   2022-08-05 12:14:03.003 INFO Initializing Task #21: "fused_conv2d15_add13_relu10"
   2022-08-05 12:14:03.272 INFO Initializing Task #22: "fused_conv2d17_add14"
   2022-08-05 12:14:03.486 INFO Initializing Task #23: "fused_conv2d16_add14_add15_relu11"
   2022-08-05 12:14:03.788 INFO Initializing Task #24: "fused_conv2d18_add13_relu10"
   2022-08-05 12:14:04.039 INFO Initializing Task #25: "fused_contrib_conv2d_winograd_without_weight_transform3_add13_relu10"
   2022-08-05 12:14:04.739 INFO Initializing Task #26: "adaptive_avg_pool2d"
   2022-08-05 12:14:04.865 INFO Initializing Task #27: "fused_layout_transform1_reshape_squeeze"
   2022-08-05 12:14:05.006 INFO Initializing Task #28: "fused_dense_add16"
   2022-08-05 12:14:05.113 INFO 
    ID |                                                                 Name |      FLOP | Weight | Speed (GFLOPS) | Latency (us) | Weighted Latency (us) | Trials | Terminated 
   ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
     0 |                                                     layout_transform |         1 |      1 |            N/A |          N/A |                   N/A |      0 |            
     1 |                                                fused_conv2d_add_relu | 237633536 |      1 |            N/A |          N/A |                   N/A |      0 |            
     2 |                                                           max_pool2d |   1806336 |      1 |            N/A |          N/A |                   N/A |      0 |            
     3 |                                             fused_conv2d1_add1_relu1 |  26091520 |      1 |            N/A |          N/A |                   N/A |      0 |            
     4 |    fused_contrib_conv2d_winograd_without_weight_transform_add1_relu1 | 128651264 |      3 |            N/A |          N/A |                   N/A |      0 |            
     5 |                                                   fused_conv2d2_add2 | 103563264 |      1 |            N/A |          N/A |                   N/A |      0 |            
     6 |                                        fused_conv2d2_add2_add3_relu2 | 105168896 |      3 |            N/A |          N/A |                   N/A |      0 |            
     7 |                                             fused_conv2d3_add1_relu1 | 103161856 |      2 |            N/A |          N/A |                   N/A |      0 |            
     8 |                                             fused_conv2d4_add4_relu3 | 206323712 |      1 |            N/A |          N/A |                   N/A |      0 |            
     9 |                                             fused_conv2d5_add5_relu4 | 231411712 |      1 |            N/A |          N/A |                   N/A |      0 |            
    10 |                                                   fused_conv2d7_add6 | 205922304 |      1 |            N/A |          N/A |                   N/A |      0 |            
    11 |                                        fused_conv2d6_add6_add7_relu5 | 103964672 |      4 |            N/A |          N/A |                   N/A |      0 |            
    12 |                                             fused_conv2d8_add5_relu4 | 102961152 |      3 |            N/A |          N/A |                   N/A |      0 |            
    13 |   fused_contrib_conv2d_winograd_without_weight_transform1_add5_relu4 | 127045632 |      3 |            N/A |          N/A |                   N/A |      0 |            
    14 |                                             fused_conv2d9_add8_relu6 | 205922304 |      1 |            N/A |          N/A |                   N/A |      0 |            
    15 |                                            fused_conv2d10_add9_relu7 | 231311360 |      1 |            N/A |          N/A |                   N/A |      0 |            
    16 |                                                 fused_conv2d12_add10 | 205721600 |      1 |            N/A |          N/A |                   N/A |      0 |            
    17 |                                     fused_conv2d11_add10_add11_relu8 | 103362560 |      6 |            N/A |          N/A |                   N/A |      0 |            
    18 |                                            fused_conv2d13_add9_relu7 | 102860800 |      5 |            N/A |          N/A |                   N/A |      0 |            
    19 |   fused_contrib_conv2d_winograd_without_weight_transform2_add9_relu7 | 114903040 |      5 |            N/A |          N/A |                   N/A |      0 |            
    20 |                                           fused_conv2d14_add12_relu9 | 205721600 |      1 |            N/A |          N/A |                   N/A |      0 |            
    21 |                                          fused_conv2d15_add13_relu10 | 231261184 |      1 |            N/A |          N/A |                   N/A |      0 |            
    22 |                                                 fused_conv2d17_add14 | 205621248 |      1 |            N/A |          N/A |                   N/A |      0 |            
    23 |                                    fused_conv2d16_add14_add15_relu11 | 103061504 |      3 |            N/A |          N/A |                   N/A |      0 |            
    24 |                                          fused_conv2d18_add13_relu10 | 102810624 |      2 |            N/A |          N/A |                   N/A |      0 |            
    25 | fused_contrib_conv2d_winograd_without_weight_transform3_add13_relu10 | 142132224 |      2 |            N/A |          N/A |                   N/A |      0 |            
    26 |                                                  adaptive_avg_pool2d |    102400 |      1 |            N/A |          N/A |                   N/A |      0 |            
    27 |                              fused_layout_transform1_reshape_squeeze |         1 |      1 |            N/A |          N/A |                   N/A |      0 |            
    28 |                                                    fused_dense_add16 |   4097000 |      1 |            N/A |          N/A |                   N/A |      0 |            
   ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
   Total trials: 0
   Total latency (us): 0
   
   2022-08-05 12:14:05.114 INFO Scheduler picks Task #0: "layout_transform"
   2022-08-05 12:14:06.380 INFO Sending 6 sample(s) to builder
   2022-08-05 12:14:06.713 INFO Sending 6 sample(s) to runner
   2022-08-05 12:14:06.713 INFO Scheduler picks Task #1: "fused_conv2d_add_relu"
   ```
   
   The tail of the log for task 1 (excerpted, as it goes on for a long time):
   ```
   [etc]
   2022-08-05 12:36:14.188 INFO Sample-Init-Population summary:
   Postproc #0 [meta_schedule.DisallowDynamicLoop(0x423d1d8)]: 0 failure(s)
   Postproc #1 [meta_schedule.RewriteCooperativeFetch(0xf2ad228)]: 0 failure(s)
   Postproc #2 [meta_schedule.RewriteUnboundBlock(0xf2ad258)]: 0 failure(s)
   Postproc #3 [meta_schedule.RewriteParallelVectorizeUnroll(0x3d6c8e8)]: 0 failure(s)
   Postproc #4 [meta_schedule.RewriteReductionBlock(0x4d449e8)]: 0 failure(s)
   Postproc #5 [meta_schedule.VerifyGPUCode(0x45933f8)]: 1685504 failure(s)
   2022-08-05 12:36:15.803 INFO Sample-Init-Population summary:
   Postproc #0 [meta_schedule.DisallowDynamicLoop(0x423d1d8)]: 0 failure(s)
   Postproc #1 [meta_schedule.RewriteCooperativeFetch(0xf2ad228)]: 0 failure(s)
   Postproc #2 [meta_schedule.RewriteUnboundBlock(0xf2ad258)]: 0 failure(s)
   Postproc #3 [meta_schedule.RewriteParallelVectorizeUnroll(0x3d6c8e8)]: 0 failure(s)
   Postproc #4 [meta_schedule.RewriteReductionBlock(0x4d449e8)]: 0 failure(s)
   Postproc #5 [meta_schedule.VerifyGPUCode(0x45933f8)]: 1687552 failure(s)
   2022-08-05 12:36:17.411 INFO Sample-Init-Population summary:
   Postproc #0 [meta_schedule.DisallowDynamicLoop(0x423d1d8)]: 0 failure(s)
   Postproc #1 [meta_schedule.RewriteCooperativeFetch(0xf2ad228)]: 0 failure(s)
   Postproc #2 [meta_schedule.RewriteUnboundBlock(0xf2ad258)]: 0 failure(s)
   Postproc #3 [meta_schedule.RewriteParallelVectorizeUnroll(0x3d6c8e8)]: 0 failure(s)
   Postproc #4 [meta_schedule.RewriteReductionBlock(0x4d449e8)]: 0 failure(s)
   Postproc #5 [meta_schedule.VerifyGPUCode(0x45933f8)]: 1689600 failure(s)
   ```
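   The cumulative counts above show millions of candidate schedules being rejected by the `VerifyGPUCode` postprocessor, which checks generated kernels against the target's GPU resource limits. A small diagnostic sketch (not from the original report) for printing those limits, assuming the built-in `nvidia/nvidia-v100` target tag; the attribute names follow TVM's CUDA target tags and may differ by version:
   ```python
   import tvm

   # Limits that meta_schedule.VerifyGPUCode compares candidate kernels against.
   target = tvm.target.Target("nvidia/nvidia-v100")
   for key in (
       "max_threads_per_block",
       "max_shared_memory_per_block",
       "thread_warp_size",
       "registers_per_block",
   ):
       print(key, target.attrs.get(key))
   ```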




[GitHub] [tvm] junrushao commented on issue #12330: [Bug][Metaschedule] Tuning trial hanging after one task

Posted by GitBox <gi...@apache.org>.
junrushao commented on issue #12330:
URL: https://github.com/apache/tvm/issues/12330#issuecomment-1284350529

   The number of failures in the stats is quite abnormal. Is the task `fused_conv2d_add_relu`?




[GitHub] [tvm] zxybazh commented on issue #12330: [Bug][Metaschedule] Tuning trial hanging after one task

Posted by GitBox <gi...@apache.org>.
zxybazh commented on issue #12330:
URL: https://github.com/apache/tvm/issues/12330#issuecomment-1284485321

   Yes, I think the task is `fused_conv2d_add_relu`.




[GitHub] [tvm] junrushao closed issue #12330: [Bug][Metaschedule] Tuning trial hanging after one task

Posted by GitBox <gi...@apache.org>.
junrushao closed issue #12330: [Bug][Metaschedule] Tuning trial hanging after one task
URL: https://github.com/apache/tvm/issues/12330

