You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@tvm.apache.org by GitBox <gi...@apache.org> on 2020/11/17 04:31:21 UTC

[GitHub] [incubator-tvm] areusch opened a new pull request #6930: [µTVM] Fix problems with the debug flow

areusch opened a new pull request #6930:
URL: https://github.com/apache/incubator-tvm/pull/6930


   This commit makes an internal API change in the µTVM transport library and which allows for some fixes to the GdbDebugger class, improving if not unbreaking the debug flow. When timeouts were introduced to the µTVM transport, it was assumed that a timeout was always wanted, since a lack of timeout could break automated tests. However, in debugging, this is is not the case--in fact, no timeouts are wanted, save on physical hardware where a session start retry timeout can be useful.
   
   Overall effects of this change:
    - When tvm.micro.transport.Transport.read() or tvm.micro.transport.Transport.write() are called, timeout_sec can now be None. When None, implementations should block forever. Previously, 0 could have meant "block forever" (almost always) or "perform non-blocking I/O" (in edge cases on session start). Now, there is an unambiguous meaning that matches pyserial.
    - When using GdbDebugger, never terminate the debugger. Always block on the debugger until the user exits. This allows the user to finish debugging problems without stepping on them in the event that a host-side exception occurs.
    - Immediately restore Ctrl+C handler when the debugger quits. If the debugger cleanup flow hangs, allows the user to typically find the problem by Ctrl+C. Before, explicit kill was required.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-tvm] areusch commented on pull request #6930: [µTVM] Fix problems with the debug flow

Posted by GitBox <gi...@apache.org>.
areusch commented on pull request #6930:
URL: https://github.com/apache/incubator-tvm/pull/6930#issuecomment-729120351


   @tqchen @tom-gall @u99127 @manupa-arm 
   
   Note: this mostly impacts the x86 flow used with test_crt, but submitting this to make things not broken and more maintainable. One thing that may impact the zephyr flow is the "debugger not kill'd on error."


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-tvm] areusch commented on pull request #6930: [µTVM] Fix problems with the debug flow

Posted by GitBox <gi...@apache.org>.
areusch commented on pull request #6930:
URL: https://github.com/apache/incubator-tvm/pull/6930#issuecomment-729966666


   thanks @liangfu I found a few lines of bad merge (this was pulled out of the linked params PR #6917) but have re-tested locally with changes from your comments and it looks good now


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-tvm] liangfu commented on a change in pull request #6930: [µTVM] Fix problems with the debug flow

Posted by GitBox <gi...@apache.org>.
liangfu commented on a change in pull request #6930:
URL: https://github.com/apache/incubator-tvm/pull/6930#discussion_r526552825



##########
File path: python/tvm/micro/debugger.py
##########
@@ -82,23 +73,56 @@ def _stop_all(cls):
     def __init__(self):
         super(GdbDebugger, self).__init__()
         self._is_running = False
-        self._child_alive_lock = threading.RLock()
-        self._is_child_alive = False
+        self._is_running_lock = threading.RLock()
+        self._child_exited_event = threading.Event()
+        self._signals_reset_event = threading.Event()
 
     @abc.abstractmethod
     def popen_kwargs(self):
         raise NotImplementedError()
 
+    def _internal_stop(self):
+        if not self._is_running:
+            return
+
+        os.kill(os.getpid(), signal.SIGUSR1)
+        self._signals_reset_event.wait()
+        termios.tcsetattr(sys.stdin.fileno(), termios.TCSAFLUSH, self.old_termios)
+
+        try:
+            children = psutil.Process(self.popen.pid).children(recursive=True)
+            for c in children:
+                c.terminate()
+                _, alive = psutil.wait_procs(children, timeout=self._GRACEFUL_SHUTDOWN_TIMEOUT_SEC)
+                for a in alive:
+                    a.kill()
+        except psutil.NoSuchProcess:
+            pass
+        finally:
+            self.__class__._STARTED_INSTANCE = None
+            self._is_running = False
+            self._child_exited_event.set()
+
     def _wait_for_child(self):
         self.popen.wait()
-        with self._child_alive_lock:
-            self._is_child_alive = True
+        with self._is_running_lock:
+            self._internal_stop()
+
+    @classmethod
+    def _sigusr1_handler(cls, signum, stack_frame):  # pylint: disable=unused-argument
+        if cls._STARTED_INSTANCE is not None:

Review comment:
       the logic seems not inverted yet. missed something in the submit?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-tvm] liangfu commented on a change in pull request #6930: [µTVM] Fix problems with the debug flow

Posted by GitBox <gi...@apache.org>.
liangfu commented on a change in pull request #6930:
URL: https://github.com/apache/incubator-tvm/pull/6930#discussion_r526110578



##########
File path: python/tvm/micro/debugger.py
##########
@@ -82,23 +73,56 @@ def _stop_all(cls):
     def __init__(self):
         super(GdbDebugger, self).__init__()
         self._is_running = False
-        self._child_alive_lock = threading.RLock()
-        self._is_child_alive = False
+        self._is_running_lock = threading.RLock()
+        self._child_exited_event = threading.Event()
+        self._signals_reset_event = threading.Event()
 
     @abc.abstractmethod
     def popen_kwargs(self):
         raise NotImplementedError()
 
+    def _internal_stop(self):
+        if not self._is_running:
+            return
+
+        os.kill(os.getpid(), signal.SIGUSR1)
+        self._signals_reset_event.wait()
+        termios.tcsetattr(sys.stdin.fileno(), termios.TCSAFLUSH, self.old_termios)
+
+        try:
+            children = psutil.Process(self.popen.pid).children(recursive=True)
+            for c in children:
+                c.terminate()
+                _, alive = psutil.wait_procs(children, timeout=self._GRACEFUL_SHUTDOWN_TIMEOUT_SEC)
+                for a in alive:
+                    a.kill()
+        except psutil.NoSuchProcess:
+            pass
+        finally:
+            self.__class__._STARTED_INSTANCE = None
+            self._is_running = False
+            self._child_exited_event.set()
+
     def _wait_for_child(self):
         self.popen.wait()
-        with self._child_alive_lock:
-            self._is_child_alive = True
+        with self._is_running_lock:
+            self._internal_stop()
+
+    @classmethod
+    def _sigusr1_handler(cls, signum, stack_frame):  # pylint: disable=unused-argument
+        if cls._STARTED_INSTANCE is not None:

Review comment:
       Just curious, why not raise an error when STARTED_INSTANCE is None?

##########
File path: python/tvm/micro/debugger.py
##########
@@ -82,23 +73,56 @@ def _stop_all(cls):
     def __init__(self):
         super(GdbDebugger, self).__init__()
         self._is_running = False
-        self._child_alive_lock = threading.RLock()
-        self._is_child_alive = False
+        self._is_running_lock = threading.RLock()
+        self._child_exited_event = threading.Event()
+        self._signals_reset_event = threading.Event()
 
     @abc.abstractmethod
     def popen_kwargs(self):
         raise NotImplementedError()
 
+    def _internal_stop(self):
+        if not self._is_running:
+            return
+
+        os.kill(os.getpid(), signal.SIGUSR1)
+        self._signals_reset_event.wait()
+        termios.tcsetattr(sys.stdin.fileno(), termios.TCSAFLUSH, self.old_termios)
+
+        try:
+            children = psutil.Process(self.popen.pid).children(recursive=True)
+            for c in children:
+                c.terminate()
+                _, alive = psutil.wait_procs(children, timeout=self._GRACEFUL_SHUTDOWN_TIMEOUT_SEC)
+                for a in alive:
+                    a.kill()
+        except psutil.NoSuchProcess:
+            pass
+        finally:
+            self.__class__._STARTED_INSTANCE = None
+            self._is_running = False
+            self._child_exited_event.set()
+
     def _wait_for_child(self):
         self.popen.wait()
-        with self._child_alive_lock:
-            self._is_child_alive = True
+        with self._is_running_lock:
+            self._internal_stop()
+
+    @classmethod
+    def _sigusr1_handler(cls, signum, stack_frame):  # pylint: disable=unused-argument
+        if cls._STARTED_INSTANCE is not None:
+            signal.signal(signal.SIGINT, cls._STARTED_INSTANCE.old_sigint_handler)
+            signal.signal(signal.SIGUSR1, cls._STARTED_INSTANCE.old_sigusr1_handler)
+            cls._STARTED_INSTANCE._signals_reset_event.set()
+            return
+
+        raise Exception()

Review comment:
       Please leave some meaningful message if possible.

##########
File path: python/tvm/micro/debugger.py
##########
@@ -109,46 +133,32 @@ def _sigint_handler(cls, signum, stack_frame):  # pylint: disable=unused-argumen
         raise Exception()

Review comment:
       mind leave a message here as well?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-tvm] areusch commented on a change in pull request #6930: [µTVM] Fix problems with the debug flow

Posted by GitBox <gi...@apache.org>.
areusch commented on a change in pull request #6930:
URL: https://github.com/apache/incubator-tvm/pull/6930#discussion_r526431416



##########
File path: python/tvm/micro/debugger.py
##########
@@ -82,23 +73,56 @@ def _stop_all(cls):
     def __init__(self):
         super(GdbDebugger, self).__init__()
         self._is_running = False
-        self._child_alive_lock = threading.RLock()
-        self._is_child_alive = False
+        self._is_running_lock = threading.RLock()
+        self._child_exited_event = threading.Event()
+        self._signals_reset_event = threading.Event()
 
     @abc.abstractmethod
     def popen_kwargs(self):
         raise NotImplementedError()
 
+    def _internal_stop(self):
+        if not self._is_running:
+            return
+
+        os.kill(os.getpid(), signal.SIGUSR1)
+        self._signals_reset_event.wait()
+        termios.tcsetattr(sys.stdin.fileno(), termios.TCSAFLUSH, self.old_termios)
+
+        try:
+            children = psutil.Process(self.popen.pid).children(recursive=True)
+            for c in children:
+                c.terminate()
+                _, alive = psutil.wait_procs(children, timeout=self._GRACEFUL_SHUTDOWN_TIMEOUT_SEC)
+                for a in alive:
+                    a.kill()
+        except psutil.NoSuchProcess:
+            pass
+        finally:
+            self.__class__._STARTED_INSTANCE = None
+            self._is_running = False
+            self._child_exited_event.set()
+
     def _wait_for_child(self):
         self.popen.wait()
-        with self._child_alive_lock:
-            self._is_child_alive = True
+        with self._is_running_lock:
+            self._internal_stop()
+
+    @classmethod
+    def _sigusr1_handler(cls, signum, stack_frame):  # pylint: disable=unused-argument
+        if cls._STARTED_INSTANCE is not None:

Review comment:
       good call, i've inverted the logic and made the check stricter.

##########
File path: python/tvm/micro/debugger.py
##########
@@ -82,23 +73,56 @@ def _stop_all(cls):
     def __init__(self):
         super(GdbDebugger, self).__init__()
         self._is_running = False
-        self._child_alive_lock = threading.RLock()
-        self._is_child_alive = False
+        self._is_running_lock = threading.RLock()
+        self._child_exited_event = threading.Event()
+        self._signals_reset_event = threading.Event()
 
     @abc.abstractmethod
     def popen_kwargs(self):
         raise NotImplementedError()
 
+    def _internal_stop(self):
+        if not self._is_running:
+            return
+
+        os.kill(os.getpid(), signal.SIGUSR1)
+        self._signals_reset_event.wait()
+        termios.tcsetattr(sys.stdin.fileno(), termios.TCSAFLUSH, self.old_termios)
+
+        try:
+            children = psutil.Process(self.popen.pid).children(recursive=True)
+            for c in children:
+                c.terminate()
+                _, alive = psutil.wait_procs(children, timeout=self._GRACEFUL_SHUTDOWN_TIMEOUT_SEC)
+                for a in alive:
+                    a.kill()
+        except psutil.NoSuchProcess:
+            pass
+        finally:
+            self.__class__._STARTED_INSTANCE = None
+            self._is_running = False
+            self._child_exited_event.set()
+
     def _wait_for_child(self):
         self.popen.wait()
-        with self._child_alive_lock:
-            self._is_child_alive = True
+        with self._is_running_lock:
+            self._internal_stop()
+
+    @classmethod
+    def _sigusr1_handler(cls, signum, stack_frame):  # pylint: disable=unused-argument
+        if cls._STARTED_INSTANCE is not None:
+            signal.signal(signal.SIGINT, cls._STARTED_INSTANCE.old_sigint_handler)
+            signal.signal(signal.SIGUSR1, cls._STARTED_INSTANCE.old_sigusr1_handler)
+            cls._STARTED_INSTANCE._signals_reset_event.set()
+            return
+
+        raise Exception()

Review comment:
       done

##########
File path: python/tvm/micro/debugger.py
##########
@@ -109,46 +133,32 @@ def _sigint_handler(cls, signum, stack_frame):  # pylint: disable=unused-argumen
         raise Exception()

Review comment:
       done




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-tvm] tqchen merged pull request #6930: [µTVM] Fix problems with the debug flow

Posted by GitBox <gi...@apache.org>.
tqchen merged pull request #6930:
URL: https://github.com/apache/incubator-tvm/pull/6930


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [incubator-tvm] areusch commented on a change in pull request #6930: [µTVM] Fix problems with the debug flow

Posted by GitBox <gi...@apache.org>.
areusch commented on a change in pull request #6930:
URL: https://github.com/apache/incubator-tvm/pull/6930#discussion_r526572278



##########
File path: python/tvm/micro/debugger.py
##########
@@ -82,23 +73,56 @@ def _stop_all(cls):
     def __init__(self):
         super(GdbDebugger, self).__init__()
         self._is_running = False
-        self._child_alive_lock = threading.RLock()
-        self._is_child_alive = False
+        self._is_running_lock = threading.RLock()
+        self._child_exited_event = threading.Event()
+        self._signals_reset_event = threading.Event()
 
     @abc.abstractmethod
     def popen_kwargs(self):
         raise NotImplementedError()
 
+    def _internal_stop(self):
+        if not self._is_running:
+            return
+
+        os.kill(os.getpid(), signal.SIGUSR1)
+        self._signals_reset_event.wait()
+        termios.tcsetattr(sys.stdin.fileno(), termios.TCSAFLUSH, self.old_termios)
+
+        try:
+            children = psutil.Process(self.popen.pid).children(recursive=True)
+            for c in children:
+                c.terminate()
+                _, alive = psutil.wait_procs(children, timeout=self._GRACEFUL_SHUTDOWN_TIMEOUT_SEC)
+                for a in alive:
+                    a.kill()
+        except psutil.NoSuchProcess:
+            pass
+        finally:
+            self.__class__._STARTED_INSTANCE = None
+            self._is_running = False
+            self._child_exited_event.set()
+
     def _wait_for_child(self):
         self.popen.wait()
-        with self._child_alive_lock:
-            self._is_child_alive = True
+        with self._is_running_lock:
+            self._internal_stop()
+
+    @classmethod
+    def _sigusr1_handler(cls, signum, stack_frame):  # pylint: disable=unused-argument
+        if cls._STARTED_INSTANCE is not None:

Review comment:
       oooops good catch. actually fixed that now.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org