Posted to commits@tvm.apache.org by GitBox <gi...@apache.org> on 2022/08/22 13:47:28 UTC

[GitHub] [tvm] guberti opened a new issue, #12538: [Bug] [microTVM] time_evaluator occasionally gives incorrect values

guberti opened a new issue, #12538:
URL: https://github.com/apache/tvm/issues/12538

   ### Expected behavior
   
   I was recently trying to time a model using Zephyr and TVM's host-driven AOT capabilities, with code that looked like the following:
   ```python
   result = aot_executor.module.time_evaluator(
       "run", session.device, number=runs_per_sample
   )()
   ```
   
   The model should take ~30 ms per inference, or 0.03 seconds. After averaging across ~500 runs, however, I discovered the **average** reported runtime was instead `288,230.30875` seconds.
   
   ### Actual behavior
   
   After looking into the issue, I discovered this was caused by occasional, very large reported values. For example, here is a list of runtimes of the model on ten different input samples:
   
   ```
   0.030411575
   0.030411575
   0.030411462
   230584247.26468965
   0.030411475
   0.030411575
   0.030411575
   0.030411575
   0.030411475
   0.030411575
   ```
   
   After some debugging, I've confirmed this issue happens on multiple models and multiple types of models, but seemingly only on the Zephyr platform. With the model above, the issue seems to occur roughly once every 2000 runs, but it might occur more frequently for other models with longer runtimes.
   
   The issue seems to occur randomly, and is not triggered by any particular model input. In wall-clock time, the anomalous model runs seem to take about as long as the others, and they certainly do not take anywhere near `230584247.26468965` seconds. Also, the model's predictions on these anomalous runs seem to be correct.
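
The effect of a single spike on the reported average can be illustrated with the ten samples listed above. The following is a minimal sketch, not part of TVM: the helper `split_outliers` and its `factor` threshold are hypothetical names chosen for illustration, and the cutoff of ten times the median is an arbitrary assumption.

```python
import statistics

# Runtimes in seconds, copied from the ten samples listed above.
runtimes = [
    0.030411575, 0.030411575, 0.030411462, 230584247.26468965, 0.030411475,
    0.030411575, 0.030411575, 0.030411575, 0.030411475, 0.030411575,
]


def split_outliers(samples, factor=10.0):
    """Split samples into (typical, anomalous) relative to the median.

    `factor` is an arbitrary illustrative threshold, not a TVM parameter.
    """
    median = statistics.median(samples)
    typical = [s for s in samples if s <= factor * median]
    anomalous = [s for s in samples if s > factor * median]
    return typical, anomalous


typical, anomalous = split_outliers(runtimes)
print(f"mean of all samples:     {statistics.mean(runtimes):.6f} s")
print(f"median of all samples:   {statistics.median(runtimes):.9f} s")
print(f"mean of typical samples: {statistics.mean(typical):.9f} s")
print(f"anomalous samples:       {anomalous}")
```

The median stays at roughly 30 ms because it ignores the magnitude of the spike, while the mean is dominated by it. Filtering like this only hides the symptom in a benchmark report; it does not address whatever produces the bad value.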
   
   I suspect the issue is with Zephyr's implementation of `TVMPlatformTimerStop`. 
   
   ### Environment
   
   This issue occurs with the current build of TVM on `main` (I used e9aad35). For a microcontroller, I used Zephyr with the Nucleo-L4R5ZI board.
   
   Thoughts @mehrdadh @areusch @alanmacd?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@tvm.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [tvm] areusch commented on issue #12538: [Bug] [microTVM] time_evaluator occasionally gives incorrect values

Posted by GitBox <gi...@apache.org>.
areusch commented on issue #12538:
URL: https://github.com/apache/tvm/issues/12538#issuecomment-1224897113

   iirc we made some cleanups to TVMPlatformTimerStop() a few months ago. i thought it was due to updating to Zephyr 2.7; i wonder if that introduced this regression, or if it was here all along?



[GitHub] [tvm] guberti commented on issue #12538: [Bug] [microTVM] time_evaluator occasionally gives incorrect values

Posted by GitBox <gi...@apache.org>.
guberti commented on issue #12538:
URL: https://github.com/apache/tvm/issues/12538#issuecomment-1225263643

   I would not be surprised if my changes to `TVMPlatformTimerStop()` caused this issue - definitely worth investigating. 
