You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@tvm.apache.org by GitBox <gi...@apache.org> on 2022/04/04 17:27:25 UTC

[GitHub] [tvm] cconvey commented on pull request #10765: [runtime-rpc] Detect time_evaluator infinite loop

cconvey commented on PR #10765:
URL: https://github.com/apache/tvm/pull/10765#issuecomment-1087821766

   > I'm not sure I agree with throwing an error as the correct thing to do. Seems like if the runtime is zero nanoseconds then we should report zero nanoseconds.
   ...
   > And if the runtime is not zero, then we should increase number by a fixed amount a couple of times to see if we can start getting realistic results.
   
   I think this proposal and the PR's current implementation have the same issue: They make assumptions about the distribution of the  _actual_ running times of the target function, and build logic around that assumption.  Both run afoul of the "Seems like if the runtime is zero nanoseconds then we should report zero nanoseconds" critique.
   
   I have a few different ideas for addressing these concerns.
   
   Idea 1: Add a CI test that runs on every supported platform, and confirms that putting a timer around a `sleep(1)` call returns a non-zero duration.  Would have caught this Hexagon issue, but I have no idea if this will catch the kinds of timer issues that will arise on future target platforms.
   
   Idea 2: Push the bogus-value sniffing into the `DefaultTimerNode` [implementation](https://github.com/apache/tvm/blob/6babb89cbb9fc5ab718f8b996c7ce60bf5ebbefd/src/runtime/profiling.cc#L43-L52). 
   - For example, it could throw an exception if `std::chrono::high_resolution_clock::now()` ever returned 0.
   - It's still a heuristic, but at least it's a platform-specific heuristic, which is a little more reasonable.
   
   Idea 3: Modify the `time_evaluator` contract/API so the caller can also specify a total limit on the number of loop iterations.
   - This would have prevented the infinite loops when Hexagon was broken.
   - But it seems strange to add it to the `time_evaluator` API if the only justification is that we don't trust our implementations.
   
   Thoughts?  (Personally, I like Idea 1 or Idea 2 the best.)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@tvm.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org