You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@pegasus.apache.org by GitBox <gi...@apache.org> on 2022/07/24 18:29:28 UTC

[GitHub] [incubator-pegasus] empiredan opened a new pull request, #1074: feat: support to close percentile to prevent from heap-use-after-free error

empiredan opened a new pull request, #1074:
URL: https://github.com/apache/incubator-pegasus/pull/1074

   This PR is to resolve the issue https://github.com/apache/incubator-pegasus/issues/1066.
   
   The root cause of issue https://github.com/apache/incubator-pegasus/issues/1066 is that the percentile is freed during the destruction of the metric registry, after which sometimes the handler for asynchronous wait operation of timer will be still executed, where the freed percentile is certainly referenced. This will lead to heap-use-after-free error.
   
   The solution for this problem will be extend the lifetime of percentile: during the construction of the percentile, it can increment its ref count (its base class is `ref_counter`) since it is referenced by timer. This effectively extends the lifetime of percentile: even if both of its metric entity and registry are destructed, it can still survive in the memory for the reason that its ref count is 1 at least.
   
   Once the percentile should be released when it will retire from metric entity or the metric registry (singleton) will be destructed, the `close()` method for percentile will be invoked. `close()` will change the state of timer from *running* to *closing*, and also call `cancel()` for the timer. Then, whether the timer is cancelled successfully or not, the timer will finally be stopped: if successful, the callback for cancel will be executed immediately; otherwise, the *closing* state will tell the timer to stop finally, albeit sometimes after a while.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@pegasus.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@pegasus.apache.org
For additional commands, e-mail: dev-help@pegasus.apache.org


[GitHub] [incubator-pegasus] empiredan merged pull request #1074: feat: support to close percentile to prevent from heap-use-after-free error

Posted by GitBox <gi...@apache.org>.
empiredan merged PR #1074:
URL: https://github.com/apache/incubator-pegasus/pull/1074


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@pegasus.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@pegasus.apache.org
For additional commands, e-mail: dev-help@pegasus.apache.org


[GitHub] [incubator-pegasus] empiredan commented on a diff in pull request #1074: feat: support to close percentile to prevent from heap-use-after-free error

Posted by GitBox <gi...@apache.org>.
empiredan commented on code in PR #1074:
URL: https://github.com/apache/incubator-pegasus/pull/1074#discussion_r931105112


##########
src/rdsn/include/dsn/utility/metrics.h:
##########
@@ -168,6 +171,16 @@ class metric_entity : public ref_counter
 
     ~metric_entity();
 
+    // Close all "closeable" metrics owned by this entity.
+    //
+    // `async` is used to control if the close operations are asynchronous or not. It is set to
+    // true by default, which means all metrics owned by this entity will be closed asynchronously
+    // without waiting for the close operations to be finished.
+    //
+    // Otherwise, once `async` is set to false, close() will be blocked until the close operations
+    // are finished.
+    void close(bool async = true);

Review Comment:
   OK, we can use enum class instead.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@pegasus.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@pegasus.apache.org
For additional commands, e-mail: dev-help@pegasus.apache.org


[GitHub] [incubator-pegasus] empiredan commented on a diff in pull request #1074: feat: support to close percentile to prevent from heap-use-after-free error

Posted by GitBox <gi...@apache.org>.
empiredan commented on code in PR #1074:
URL: https://github.com/apache/incubator-pegasus/pull/1074#discussion_r931105450


##########
src/rdsn/src/utils/metrics.cpp:
##########
@@ -132,27 +185,78 @@ uint64_t percentile_timer::generate_initial_delay_ms(uint64_t interval_ms)
     return (rand::next_u64() % interval_seconds + 1) * 1000 + rand::next_u64() % 1000;
 }
 
-percentile_timer::percentile_timer(uint64_t interval_ms, exec_fn exec)
+percentile_timer::percentile_timer(uint64_t interval_ms, on_exec_fn on_exec, on_close_fn on_close)
     : _initial_delay_ms(generate_initial_delay_ms(interval_ms)),
       _interval_ms(interval_ms),
-      _exec(exec),
+      _on_exec(on_exec),
+      _on_close(on_close),
+      _state(state::kRunning),
+      _completed(),
       _timer(new boost::asio::deadline_timer(tools::shared_io_service::instance().ios))
 {
     _timer->expires_from_now(boost::posix_time::milliseconds(_initial_delay_ms));
     _timer->async_wait(std::bind(&percentile_timer::on_timer, this, std::placeholders::_1));
 }
 
+void percentile_timer::close()
+{
+    // If the timer has already expired when cancel() is called, then the handlers for asynchronous
+    // wait operations will:
+    // * have already been invoked; or
+    // * have been queued for invocation in the near future.
+    //
+    // These handlers can no longer be cancelled, and therefore are passed an error code that
+    // indicates the successful completion of the wait operation. Thus set the state of timer to
+    // kClosing to tell on_timer() that the timer should be closed even if it is not called with
+    // operation_canceled.
+    auto expected_state = state::kRunning;
+    if (_state.compare_exchange_strong(expected_state, state::kClosing)) {
+        _timer->cancel();
+    }
+}
+
+void percentile_timer::wait() { _completed.wait(); }
+
+void percentile_timer::on_close()
+{
+    _on_close();
+    _completed.notify();
+}
+
 void percentile_timer::on_timer(const boost::system::error_code &ec)
 {
+// This macro is defined for the case that handlers for asynchronous wait operations are no
+// longer cancelled. It just checks the internal state atomically (since close() can also be
+// called simultaneously) for kClosing; once it's matched, it will not execute future callbacks
+// to stop the timer.
+#define TRY_PROCESS_TIMER_CLOSING()                                                                \
+    do {                                                                                           \
+        auto expected_state = state::kClosing;                                                     \
+        if (_state.compare_exchange_strong(expected_state, state::kClosed)) {                      \
+            on_close();                                                                            \
+            return;                                                                                \
+        }                                                                                          \
+    } while (0)
+
     if (dsn_unlikely(!!ec)) {
         dassert_f(ec == boost::system::errc::operation_canceled,
                   "failed to exec on_timer with an error that cannot be handled: {}",
                   ec.message());
+
+        // Cancel can only be launched by close().
+        auto expected_state = state::kClosing;
+        dassert_f(_state.compare_exchange_strong(expected_state, state::kClosed),
+                  "wrong state for percentile_timer: {}, while expecting closing state",
+                  static_cast<int>(expected_state));
+        on_close();
+
         return;
     }
 
-    _exec();
+    TRY_PROCESS_TIMER_CLOSING();
+    _on_exec();
 
+    TRY_PROCESS_TIMER_CLOSING();

Review Comment:
   Sure.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@pegasus.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@pegasus.apache.org
For additional commands, e-mail: dev-help@pegasus.apache.org


[GitHub] [incubator-pegasus] acelyc111 commented on a diff in pull request #1074: feat: support to close percentile to prevent from heap-use-after-free error

Posted by GitBox <gi...@apache.org>.
acelyc111 commented on code in PR #1074:
URL: https://github.com/apache/incubator-pegasus/pull/1074#discussion_r930910435


##########
src/rdsn/src/utils/metrics.cpp:
##########
@@ -132,27 +185,78 @@ uint64_t percentile_timer::generate_initial_delay_ms(uint64_t interval_ms)
     return (rand::next_u64() % interval_seconds + 1) * 1000 + rand::next_u64() % 1000;
 }
 
-percentile_timer::percentile_timer(uint64_t interval_ms, exec_fn exec)
+percentile_timer::percentile_timer(uint64_t interval_ms, on_exec_fn on_exec, on_close_fn on_close)
     : _initial_delay_ms(generate_initial_delay_ms(interval_ms)),
       _interval_ms(interval_ms),
-      _exec(exec),
+      _on_exec(on_exec),
+      _on_close(on_close),
+      _state(state::kRunning),
+      _completed(),
       _timer(new boost::asio::deadline_timer(tools::shared_io_service::instance().ios))
 {
     _timer->expires_from_now(boost::posix_time::milliseconds(_initial_delay_ms));
     _timer->async_wait(std::bind(&percentile_timer::on_timer, this, std::placeholders::_1));
 }
 
+void percentile_timer::close()
+{
+    // If the timer has already expired when cancel() is called, then the handlers for asynchronous
+    // wait operations will:
+    // * have already been invoked; or
+    // * have been queued for invocation in the near future.
+    //
+    // These handlers can no longer be cancelled, and therefore are passed an error code that
+    // indicates the successful completion of the wait operation. Thus set the state of timer to
+    // kClosing to tell on_timer() that the timer should be closed even if it is not called with
+    // operation_canceled.
+    auto expected_state = state::kRunning;
+    if (_state.compare_exchange_strong(expected_state, state::kClosing)) {
+        _timer->cancel();
+    }
+}
+
+void percentile_timer::wait() { _completed.wait(); }
+
+void percentile_timer::on_close()
+{
+    _on_close();
+    _completed.notify();
+}
+
 void percentile_timer::on_timer(const boost::system::error_code &ec)
 {
+// This macro is defined for the case that handlers for asynchronous wait operations are no
+// longer cancelled. It just checks the internal state atomically (since close() can also be
+// called simultaneously) for kClosing; once it's matched, it will not execute future callbacks
+// to stop the timer.
+#define TRY_PROCESS_TIMER_CLOSING()                                                                \
+    do {                                                                                           \
+        auto expected_state = state::kClosing;                                                     \
+        if (_state.compare_exchange_strong(expected_state, state::kClosed)) {                      \
+            on_close();                                                                            \
+            return;                                                                                \
+        }                                                                                          \
+    } while (0)
+
     if (dsn_unlikely(!!ec)) {
         dassert_f(ec == boost::system::errc::operation_canceled,
                   "failed to exec on_timer with an error that cannot be handled: {}",
                   ec.message());
+
+        // Cancel can only be launched by close().
+        auto expected_state = state::kClosing;
+        dassert_f(_state.compare_exchange_strong(expected_state, state::kClosed),
+                  "wrong state for percentile_timer: {}, while expecting closing state",
+                  static_cast<int>(expected_state));
+        on_close();
+
         return;
     }
 
-    _exec();
+    TRY_PROCESS_TIMER_CLOSING();
+    _on_exec();
 
+    TRY_PROCESS_TIMER_CLOSING();

Review Comment:
   better to add `#undef TRY_PROCESS_TIMER_CLOSING` when you'll never use it.



##########
src/rdsn/include/dsn/utility/metrics.h:
##########
@@ -168,6 +171,16 @@ class metric_entity : public ref_counter
 
     ~metric_entity();
 
+    // Close all "closeable" metrics owned by this entity.
+    //
+    // `async` is used to control if the close operations are asynchronous or not. It is set to
+    // true by default, which means all metrics owned by this entity will be closed asynchronously
+    // without waiting for the close operations to be finished.
+    //
+    // Otherwise, once `async` is set to false, close() will be blocked until the close operations
+    // are finished.
+    void close(bool async = true);

Review Comment:
   use 'wait' maybe more accurate, and it's suggest to use enum class instead of boolean, true/false is meanless when we see the invoker.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@pegasus.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@pegasus.apache.org
For additional commands, e-mail: dev-help@pegasus.apache.org