Posted to commits@tvm.apache.org by GitBox <gi...@apache.org> on 2020/11/16 20:32:53 UTC

[GitHub] [incubator-tvm] TaylorZowtuk edited a comment on pull request #6909: Make AutoScheduler handling of errors during measure consistent with AutoTvm

TaylorZowtuk edited a comment on pull request #6909:
URL: https://github.com/apache/incubator-tvm/pull/6909#issuecomment-728307739


   > How do you hit this part of the code? Generally, it means you have some fatal errors in the code.
   > It is very rare to recover from a case where you have so many continuous errors.
   
   I'm not entirely certain what causes us to hit this condition. In our case, the AutoTvm debug output showed that it was due to error_no=4, which is a RUNTIME_DEVICE error (as you can see from the excerpt of the AutoTvm log I included previously). We hit this condition only intermittently: we could run a particular op/shape once and hit it, then run it again without changing anything and have it succeed. In addition, one op/shape reaching this condition didn't cause the other op/shapes running in the same script to fail, so the system as a whole was able to recover. I think the main issue is that terminating the program as soon as we meet this condition doesn't give it a chance to recover. It also means we don't get useful, precise feedback about which error we are hitting while using the auto_scheduler.
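A minimal Python sketch of the behavior argued for above: when too many measurements fail back to back, log a warning that names the last error and keep going instead of terminating. This is a simplified stand-in, not TVM's actual code. The names `ErrorNo`, `describe`, `process_errors`, and `max_continuous_error` are hypothetical. Only the mapping 4 = RUNTIME_DEVICE comes from the AutoTvm log mentioned above.

```python
import logging
from enum import IntEnum
from typing import Iterable, List

logger = logging.getLogger("measure_sketch")


class ErrorNo(IntEnum):
    """Hypothetical local stand-in for a measure error enum.

    Only RUNTIME_DEVICE = 4 is taken from the AutoTvm log discussed above;
    NO_ERROR = 0 is an assumption for illustration.
    """
    NO_ERROR = 0
    RUNTIME_DEVICE = 4


def describe(error_no: int) -> str:
    """Return a readable name for an error code, falling back to the raw number."""
    try:
        return ErrorNo(error_no).name
    except ValueError:
        return f"error_no={error_no}"


def process_errors(error_codes: Iterable[int], max_continuous_error: int) -> List[str]:
    """Count consecutive failed measurements and warn instead of exiting.

    When the number of back-to-back failures reaches max_continuous_error,
    a warning naming the most recent error is logged and recorded, and the
    counter is reset so tuning can continue and possibly recover.
    """
    warnings: List[str] = []
    continuous = 0
    for code in error_codes:
        if code == ErrorNo.NO_ERROR:
            continuous = 0  # a successful measurement breaks the streak
            continue
        continuous += 1
        if continuous >= max_continuous_error:
            msg = (f"Too many errors happened ({continuous} in a row); "
                   f"last error: {describe(code)}")
            logger.warning(msg)
            warnings.append(msg)
            continuous = 0  # reset and keep tuning rather than terminating
    return warnings
```

For example, `process_errors([4, 4, 4, 0, 4], max_continuous_error=3)` records one warning naming RUNTIME_DEVICE, and the final failure starts a new streak instead of stopping the run.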
   
   I'll do the rebasing and try to fix the CI issue.

