You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@tvm.apache.org by GitBox <gi...@apache.org> on 2021/04/30 16:15:34 UTC

[GitHub] [tvm] areusch commented on a change in pull request #7919: [RUNTIME] Improve signal handling in python env.

areusch commented on a change in pull request #7919:
URL: https://github.com/apache/tvm/pull/7919#discussion_r623985826



##########
File path: src/runtime/registry.cc
##########
@@ -102,6 +103,89 @@ std::vector<std::string> Registry::ListNames() {
   return keys;
 }
 
+/*!
+ * \brief Execution environment specific API registry.
+ *
+ *  This registry stores C API function pointers about
+ *  execution environment(e.g. python) specific API function that
+ *  we need for specific low-level handling(e.g. signal checking).
+ *
+ *  We only stores the C API function when absolutely necessary (e.g. when signal handler
+ *  cannot trap back into python). Always consider use the PackedFunc FFI when possible
+ *  in other cases.
+ */
+class EnvCAPIRegistry {
+ public:
+  /*!
+   * \brief Callback to check if signals have been sent to the process and
+   *        if so invoke the registered signal handler in the frontend environment.
+   *
+   *  When runnning TVM in another langugage(python), the signal handler
+   *  may not be immediately executed, but instead the signal is marked
+   *  in the interpreter state(to ensure non-blocking of the signal handler).
+   *
+   * \return 0 if no error happens, -1 if error happens.
+   */
+  typedef int (*F_PyErr_CheckSignals)();
+
+  /*!
+   * \brief Clear the error indicator.
+   */
+  typedef void (*F_PyErr_Clear)();
+
+  // NOTE: the following function are only registered
+  // in a python environment.
+  /*!
+   * \brief PyErr_CheckSignal function
+   */
+  F_PyErr_CheckSignals pyerr_check_signals = nullptr;
+  /*!
+   * \brief PyErrClear function
+   */
+  F_PyErr_Clear pyerr_clear = nullptr;
+
+  static EnvCAPIRegistry* Global() {
+    static EnvCAPIRegistry* inst = new EnvCAPIRegistry();
+    return inst;
+  }
+
+  // register environment(e.g. python) specific api functions
+  void Register(const std::string& symbol_name, void* fptr) {
+    if (symbol_name == "PyErr_CheckSignals") {
+      Update(symbol_name, &pyerr_check_signals, fptr);
+    } else if (symbol_name == "PyErr_Clear") {
+      Update(symbol_name, &pyerr_clear, fptr);
+    } else {
+      LOG(FATAL) << "Unknown env API " << symbol_name;
+    }
+  }
+
+  // implementation of tvm::runtime::EnvCheckSignals
+  void CheckSignals() {
+    // check python signal to see if there are exception raised
+    if (pyerr_check_signals != nullptr && (*pyerr_check_signals)() != 0) {
+      ICHECK_NOTNULL(pyerr_clear);
+      // clear the error so we can throw a new error
+      (*pyerr_clear)();
+      // Raise the error
+      LOG(FATAL) << "KeyboardInterrupt: Signal received";

Review comment:
       let's raise something besides KeyboardInterrupt. How about TVMSignalReceived or something

##########
File path: include/tvm/runtime/c_backend_api.h
##########
@@ -99,6 +100,19 @@ TVM_DLL void* TVMBackendAllocWorkspace(int device_type, int device_id, uint64_t
  */
 TVM_DLL int TVMBackendFreeWorkspace(int device_type, int device_id, void* ptr);
 
+/*!
+ * \brief Backend function to register execution environment(e.g. python)
+ *        specific C APIs.
+ *
+ * \note  We only register the C API function when absolutely necessary (e.g. when signal handler
+ *  cannot trap back into python). In most cases we should use the PackedFunc FFI.
+ *
+ * \param name The name of the symbol
+ * \param ptr The symbol address.
+ * \return 0 when no error is thrown, -1 when failure happens
+ */
+TVM_DLL int TVMBackendRegisterEnvCAPI(const char* name, void* ptr);

Review comment:
       what's the motivation for doing this with a string name rather than just a dedicated API to handling signals? from a documentation perspective, what values could `name` take on, where are those documented, and when should you call this function? what happens if you don't register these?
   
   all of these questions make me think that maybe since this is quite an OS-specific thing, there's no need to consolidate similar things behind a single API call until we have multiple. On the other hand, I understand how this does avoid backwards-compatibility issues, and deprecating `.so` libraries does seem hard.

##########
File path: include/tvm/runtime/registry.h
##########
@@ -52,6 +52,45 @@
 namespace tvm {
 namespace runtime {
 
+/*!
+ * \brief Check if signals have been sent to the process and if so
+ *  invoke the registered signal handler in the frontend environment.
+ *
+ *  When runnning TVM in another langugage(python), the signal handler
+ *  may not be immediately executed, but instead the signal is marked
+ *  in the interpreter state(to ensure non-blocking of the signal handler).
+ *
+ *  This function can be explicitly invoked to check the cached signal
+ *  and run the related processing if a signal is marked.
+ *
+ *  Invoke this function periodically in a long running C++ function
+ *  to check if KeyboardInterrupt happens in a python execution environment.
+ *
+ *  Not inserting this function will not cause any correctness
+ *  issue, but will delay the KeyboardInterrupt until the function returns

Review comment:
       "but will delay invoking the Python-side signal handler until the function returns to the Python side. This means that the effect of e.g. pressing Ctrl+C or sending signals the process will be delayed until function return. When a C function is blocked on a syscall such as accept(), it may become impossible to interrupt TVM outside of using SIGKILL. So this function is not needed in most API functions, which can finish quickly in a reasonable, deterministic amount of time"

##########
File path: src/runtime/registry.cc
##########
@@ -102,6 +103,89 @@ std::vector<std::string> Registry::ListNames() {
   return keys;
 }
 
+/*!
+ * \brief Execution environment specific API registry.
+ *
+ *  This registry stores C API function pointers about
+ *  execution environment(e.g. python) specific API function that
+ *  we need for specific low-level handling(e.g. signal checking).
+ *
+ *  We only stores the C API function when absolutely necessary (e.g. when signal handler
+ *  cannot trap back into python). Always consider use the PackedFunc FFI when possible
+ *  in other cases.
+ */
+class EnvCAPIRegistry {
+ public:
+  /*!
+   * \brief Callback to check if signals have been sent to the process and
+   *        if so invoke the registered signal handler in the frontend environment.
+   *
+   *  When runnning TVM in another langugage(python), the signal handler
+   *  may not be immediately executed, but instead the signal is marked
+   *  in the interpreter state(to ensure non-blocking of the signal handler).
+   *
+   * \return 0 if no error happens, -1 if error happens.
+   */
+  typedef int (*F_PyErr_CheckSignals)();
+
+  /*!
+   * \brief Clear the error indicator.
+   */
+  typedef void (*F_PyErr_Clear)();
+
+  // NOTE: the following function are only registered
+  // in a python environment.
+  /*!
+   * \brief PyErr_CheckSignal function
+   */
+  F_PyErr_CheckSignals pyerr_check_signals = nullptr;
+  /*!
+   * \brief PyErrClear function
+   */
+  F_PyErr_Clear pyerr_clear = nullptr;
+
+  static EnvCAPIRegistry* Global() {
+    static EnvCAPIRegistry* inst = new EnvCAPIRegistry();
+    return inst;
+  }
+
+  // register environment(e.g. python) specific api functions
+  void Register(const std::string& symbol_name, void* fptr) {
+    if (symbol_name == "PyErr_CheckSignals") {
+      Update(symbol_name, &pyerr_check_signals, fptr);
+    } else if (symbol_name == "PyErr_Clear") {
+      Update(symbol_name, &pyerr_clear, fptr);
+    } else {
+      LOG(FATAL) << "Unknown env API " << symbol_name;
+    }
+  }
+
+  // implementation of tvm::runtime::EnvCheckSignals
+  void CheckSignals() {
+    // check python signal to see if there are exception raised
+    if (pyerr_check_signals != nullptr && (*pyerr_check_signals)() != 0) {
+      ICHECK_NOTNULL(pyerr_clear);

Review comment:
       why not place PyErr_ stuff in the FFI layer and name the function FrontendHasPendingSignal

##########
File path: src/support/socket.h
##########
@@ -361,6 +363,41 @@ class Socket {
 #endif
   }
 
+  /*!
+   * \brief Call a function and retry if an EINTR error is encountered.
+   *
+   *  Socket operations can return EINTR when the interrupt handler
+   *  is registered by the execution environment(e.g. python).
+   *  We should retry if there is no KeyboardInterrupt recorded in
+   *  the environment.
+   *
+   * \note This function is needed to avoid rare interrupt event
+   *       in long running server code.
+   *
+   * \param func The function to retry.
+   * \return The return code returned by function f or error_value on retry failure.
+   */
+  template <typename FuncType>
+  ssize_t RetryCallOnEINTR(FuncType func) {
+    ssize_t ret = func();
+    // common path
+    if (ret != -1) return ret;
+    // less common path
+    do {
+      if (GetLastError() == EINTR) {
+        // Call into env check signals to see if there are
+        // environment specific(e.g. python) signal exceptions.
+        // This function will throw an exception if there is an KeyboardInterrupt.

Review comment:
       if the process received a signal that requires TVM to return immediately (e.g. SIGINT).

##########
File path: include/tvm/runtime/registry.h
##########
@@ -52,6 +52,45 @@
 namespace tvm {
 namespace runtime {
 
+/*!
+ * \brief Check if signals have been sent to the process and if so
+ *  invoke the registered signal handler in the frontend environment.
+ *
+ *  When runnning TVM in another langugage(python), the signal handler
+ *  may not be immediately executed, but instead the signal is marked
+ *  in the interpreter state(to ensure non-blocking of the signal handler).
+ *
+ *  This function can be explicitly invoked to check the cached signal
+ *  and run the related processing if a signal is marked.
+ *
+ *  Invoke this function periodically in a long running C++ function
+ *  to check if KeyboardInterrupt happens in a python execution environment.
+ *
+ *  Not inserting this function will not cause any correctness
+ *  issue, but will delay the KeyboardInterrupt until the function returns
+ *  to the python side. So this function is not needed in most API
+ *  functions can finish quickly in a reasonable amount of time.
+ *
+ * \code
+ *
+ * int check_signal_every_kiter = 10;
+ *
+ * for (int iter = 0; iter < very_large_number; ++iter) {
+ *   if (iter % check_signal_every_kiter == 0) {
+ *     tvm::runtime::EnvCheckSignals();
+ *   }
+ *   // do work here
+ * }
+ *
+ * \endcode
+ *
+ * \note This function is a nop when no signal checking function is registered.

Review comment:
       This function is a no-op when TVMBackendRegisterCAPI("FrontendHasPendingSIgnal", ...) has not been called. (the parameter may need to be updated)

##########
File path: include/tvm/runtime/registry.h
##########
@@ -52,6 +52,45 @@
 namespace tvm {
 namespace runtime {
 
+/*!
+ * \brief Check if signals have been sent to the process and if so
+ *  invoke the registered signal handler in the frontend environment.
+ *
+ *  When runnning TVM in another langugage(python), the signal handler
+ *  may not be immediately executed, but instead the signal is marked
+ *  in the interpreter state(to ensure non-blocking of the signal handler).
+ *
+ *  This function can be explicitly invoked to check the cached signal
+ *  and run the related processing if a signal is marked.
+ *
+ *  Invoke this function periodically in a long running C++ function
+ *  to check if KeyboardInterrupt happens in a python execution environment.
+ *
+ *  Not inserting this function will not cause any correctness
+ *  issue, but will delay the KeyboardInterrupt until the function returns
+ *  to the python side. So this function is not needed in most API
+ *  functions can finish quickly in a reasonable amount of time.
+ *
+ * \code
+ *
+ * int check_signal_every_kiter = 10;
+ *
+ * for (int iter = 0; iter < very_large_number; ++iter) {
+ *   if (iter % check_signal_every_kiter == 0) {
+ *     tvm::runtime::EnvCheckSignals();
+ *   }
+ *   // do work here
+ * }
+ *
+ * \endcode
+ *
+ * \note This function is a nop when no signal checking function is registered.
+ *
+ * \throws This function throws approperiate exception if an error happens,

Review comment:
       \throws This function throws an exception when the frontend FrontendHasPendingSignal indicates that a signal handler is pending. When this occurs, the intent of this function is to interrupt execution in TVM and return quickly to the calling frontend code.

##########
File path: include/tvm/runtime/registry.h
##########
@@ -52,6 +52,45 @@
 namespace tvm {
 namespace runtime {
 
+/*!
+ * \brief Check if signals have been sent to the process and if so
+ *  invoke the registered signal handler in the frontend environment.
+ *
+ *  When runnning TVM in another langugage(python), the signal handler
+ *  may not be immediately executed, but instead the signal is marked
+ *  in the interpreter state(to ensure non-blocking of the signal handler).
+ *
+ *  This function can be explicitly invoked to check the cached signal
+ *  and run the related processing if a signal is marked.
+ *
+ *  Invoke this function periodically in a long running C++ function

Review comment:
       i think you could further refine this to:
   - on Linux, when siginterrupt() is set, invoke this function whenever a syscall returns EINTR. when it is not set, invoke it between long-running syscalls when you will not immediately return to the frontend.
   - on Windows, the same rules apply, but due to differences in signal processing, these are likely to only make a difference when used with Ctrl+C and socket calls
   
   the last bit is what I gather from reading docs. would be great if @rkimball could confirm

##########
File path: include/tvm/runtime/registry.h
##########
@@ -52,6 +52,45 @@
 namespace tvm {
 namespace runtime {
 
+/*!
+ * \brief Check if signals have been sent to the process and if so
+ *  invoke the registered signal handler in the frontend environment.
+ *
+ *  When runnning TVM in another langugage(python), the signal handler
+ *  may not be immediately executed, but instead the signal is marked
+ *  in the interpreter state(to ensure non-blocking of the signal handler).
+ *
+ *  This function can be explicitly invoked to check the cached signal
+ *  and run the related processing if a signal is marked.
+ *
+ *  Invoke this function periodically in a long running C++ function
+ *  to check if KeyboardInterrupt happens in a python execution environment.
+ *
+ *  Not inserting this function will not cause any correctness
+ *  issue, but will delay the KeyboardInterrupt until the function returns
+ *  to the python side. So this function is not needed in most API
+ *  functions can finish quickly in a reasonable amount of time.
+ *
+ * \code
+ *
+ * int check_signal_every_kiter = 10;

Review comment:
       check_signal_every_k_iter




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org