You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@pulsar.apache.org by xy...@apache.org on 2022/03/24 09:54:11 UTC

[pulsar] branch branch-2.8 updated (cc95eeb -> 99b9dad)

This is an automated email from the ASF dual-hosted git repository.

xyz pushed a change to branch branch-2.8
in repository https://gitbox.apache.org/repos/asf/pulsar.git.


    from cc95eeb  [C++] Fix producer is never destructed until client is closed (#14797)
     new f326603  [C++] Handle exception in creating socket when fd limit is reached (#14587)
     new 2c7a19a  [C++] Fix the race condition of connect timeout task (#14823)
     new 99b9dad  [C++] Fix segmentation fault when creating socket failed (#14834)

The 3 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails.  The revisions
listed as "add" were already present in the repository and have only
been added to this reference.


Summary of changes:
 pulsar-client-cpp/lib/ClientConnection.cc | 27 +++++++++++++++++++--------
 1 file changed, 19 insertions(+), 8 deletions(-)

[pulsar] 02/03: [C++] Fix the race condition of connect timeout task (#14823)

Posted by xy...@apache.org.
This is an automated email from the ASF dual-hosted git repository.

xyz pushed a commit to branch branch-2.8
in repository https://gitbox.apache.org/repos/asf/pulsar.git

commit 2c7a19a7ec73d98b57731ca5ca905b907cd12cc7
Author: Yunze Xu <xy...@163.com>
AuthorDate: Thu Mar 24 03:59:41 2022 +0800

    [C++] Fix the race condition of connect timeout task (#14823)
    
    Fixes #14665
    
    ### Motivation
    
    In C++ client, a connect timeout task is created each time before an
    asynchronous connect operation is performed, if the connection cannot be
    established in the configured timeout, the callback of the task will be
    called to close the connection and then the `createProducer` or
    `subscribe` methods will return `ResultConnectError`.
    
    `ClientConnection::connectTimeoutTask_`, which is a shared pointer,
    represents the timeout task. However, after `ClientConnection::close` is
    called, the shared pointer will be reset, and the underlying `PeriodicTask`
    object will be released. After that, when `stop` method is called on the
    released `PeriodicTask` object in the callback (`handleTcpConnected`), a
    segmentation fault will happen.
    
    The root cause is that `connectTimeoutTask_` can be accessed in two
    threads while one of them could release the memory. See #14665 for more
    explanations. This race condition leads to flaky Python tests as well,
    because we also have the similar test in Python tests. See
    https://github.com/apache/pulsar/blob/f7cbc1eb83ffd27b784d90d5d2dea8660c590ad2/pulsar-client-cpp/python/pulsar_test.py#L1207-L1221
    
    So this PR might also fix #14714.
    
    ### Modifications
    
    Remove `connectTimeoutTask_.reset()` in `ClientConnection::close`. After
    that, the `connectTimeoutTask_` will always points to the same
    `PeriodicTask` object, whose methods are thread safe.
    
    ### Verifying this change
    
    Execute the following command
    
    ```bash
    ./tests/main --gtest_filter='ClientTest.testConnectTimeout' --gtest_repeat=10
    ```
    
    to runs the `testConnectTimeout` for 10 times. In my local env, it never
    failed, while before applying this patch, it's very easy to fail.
    
    (cherry picked from commit 0c3aad1e0ba0ee53784b963a1238d3d76b6dd8b2)
---
 pulsar-client-cpp/lib/ClientConnection.cc | 5 +----
 1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/pulsar-client-cpp/lib/ClientConnection.cc b/pulsar-client-cpp/lib/ClientConnection.cc
index 3a52191..2747de0 100644
--- a/pulsar-client-cpp/lib/ClientConnection.cc
+++ b/pulsar-client-cpp/lib/ClientConnection.cc
@@ -1526,10 +1526,7 @@ void ClientConnection::close(Result result) {
         consumerStatsRequestTimer_.reset();
     }
 
-    if (connectTimeoutTask_) {
-        connectTimeoutTask_->stop();
-        connectTimeoutTask_.reset();
-    }
+    connectTimeoutTask_->stop();
 
     lock.unlock();
     LOG_INFO(cnxString_ << "Connection closed");

[pulsar] 01/03: [C++] Handle exception in creating socket when fd limit is reached (#14587)

Posted by xy...@apache.org.
This is an automated email from the ASF dual-hosted git repository.

xyz pushed a commit to branch branch-2.8
in repository https://gitbox.apache.org/repos/asf/pulsar.git

commit f3266034816dc5f102fc1f4becf21bc7706aa899
Author: Matteo Merli <mm...@apache.org>
AuthorDate: Tue Mar 8 21:33:27 2022 -0800

    [C++] Handle exception in creating socket when fd limit is reached (#14587)
    
    (cherry picked from commit babae8e98a172302aee0bb3790b0f4e4128a7c35)
---
 pulsar-client-cpp/lib/ClientConnection.cc | 28 ++++++++++++++++++++--------
 1 file changed, 20 insertions(+), 8 deletions(-)

diff --git a/pulsar-client-cpp/lib/ClientConnection.cc b/pulsar-client-cpp/lib/ClientConnection.cc
index 9f4ae6d..3a52191 100644
--- a/pulsar-client-cpp/lib/ClientConnection.cc
+++ b/pulsar-client-cpp/lib/ClientConnection.cc
@@ -161,7 +161,6 @@ ClientConnection::ClientConnection(const std::string& logicalAddress, const std:
       serverProtocolVersion_(ProtocolVersion_MIN),
       executor_(executor),
       resolver_(executor_->createTcpResolver()),
-      socket_(executor_->createSocket()),
 #if BOOST_VERSION >= 107000
       strand_(boost::asio::make_strand(executor_->getIOService().get_executor())),
 #elif BOOST_VERSION >= 106600
@@ -173,12 +172,20 @@ ClientConnection::ClientConnection(const std::string& logicalAddress, const std:
       physicalAddress_(physicalAddress),
       cnxString_("[<none> -> " + physicalAddress + "] "),
       incomingBuffer_(SharedBuffer::allocate(DefaultBufferSize)),
-      connectTimeoutTask_(std::make_shared<PeriodicTask>(executor_->getIOService(),
-                                                         clientConfiguration.getConnectionTimeout())),
       outgoingBuffer_(SharedBuffer::allocate(DefaultBufferSize)),
-      consumerStatsRequestTimer_(executor_->createDeadlineTimer()),
       maxPendingLookupRequest_(clientConfiguration.getConcurrentLookupRequest()) {
 
+    try {
+        socket_ = executor_->createSocket();
+        connectTimeoutTask_ = std::make_shared<PeriodicTask>(executor_->getIOService(),
+                                                             clientConfiguration.getConnectionTimeout());
+        consumerStatsRequestTimer_ = executor_->createDeadlineTimer();
+    } catch (const boost::system::system_error& e) {
+        LOG_ERROR("Failed to initialize connection: " << e.what());
+        close();
+        return;
+    }
+
     LOG_INFO(cnxString_ << "Create ClientConnection, timeout=" << clientConfiguration.getConnectionTimeout());
     if (clientConfiguration.isUseTls()) {
 #if BOOST_VERSION >= 105400
@@ -1480,9 +1487,11 @@ void ClientConnection::close(Result result) {
     }
     state_ = Disconnected;
     boost::system::error_code err;
-    socket_->close(err);
-    if (err) {
-        LOG_WARN(cnxString_ << "Failed to close socket: " << err.message());
+    if (socket_) {
+        socket_->close(err);
+        if (err) {
+            LOG_WARN(cnxString_ << "Failed to close socket: " << err.message());
+        }
     }
 
     if (tlsSocket_) {
@@ -1517,7 +1526,10 @@ void ClientConnection::close(Result result) {
         consumerStatsRequestTimer_.reset();
     }
 
-    connectTimeoutTask_->stop();
+    if (connectTimeoutTask_) {
+        connectTimeoutTask_->stop();
+        connectTimeoutTask_.reset();
+    }
 
     lock.unlock();
     LOG_INFO(cnxString_ << "Connection closed");

[pulsar] 03/03: [C++] Fix segmentation fault when creating socket failed (#14834)

Posted by xy...@apache.org.
This is an automated email from the ASF dual-hosted git repository.

xyz pushed a commit to branch branch-2.8
in repository https://gitbox.apache.org/repos/asf/pulsar.git

commit 99b9dad6dcc833e986de29a8314adb4c8b705035
Author: Yunze Xu <xy...@163.com>
AuthorDate: Thu Mar 24 15:29:16 2022 +0800

    [C++] Fix segmentation fault when creating socket failed (#14834)
    
    ### Motivation
    
    https://github.com/apache/pulsar/pull/14823 fixes the flaky
    `testConnectTimeout` but it's also a regression of
    https://github.com/apache/pulsar/pull/14587. Because when the fd limit
    is reached, the `connectionTimeoutTask_` won't be initialized with a
    non-null value. Calling `stop` method on it directly will cause
    segmentation fault.
    
    See
    https://github.com/apache/pulsar/blob/0fe921f32cefe7648ca428cd9861f9163c69767d/pulsar-client-cpp/lib/ClientConnection.cc#L178-L185
    
    ### Modifications
    
    Add the null check for `connectionTimeoutTask_` in `ClientConnection::close`.
    
    (cherry picked from commit 54c368ed3744a40240205c17bdcac5cef48130e4)
---
 pulsar-client-cpp/lib/ClientConnection.cc | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/pulsar-client-cpp/lib/ClientConnection.cc b/pulsar-client-cpp/lib/ClientConnection.cc
index 2747de0..8802187 100644
--- a/pulsar-client-cpp/lib/ClientConnection.cc
+++ b/pulsar-client-cpp/lib/ClientConnection.cc
@@ -1526,7 +1526,9 @@ void ClientConnection::close(Result result) {
         consumerStatsRequestTimer_.reset();
     }
 
-    connectTimeoutTask_->stop();
+    if (connectTimeoutTask_) {
+        connectTimeoutTask_->stop();
+    }
 
     lock.unlock();
     LOG_INFO(cnxString_ << "Connection closed");