You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@pulsar.apache.org by GitBox <gi...@apache.org> on 2021/08/05 12:34:47 UTC

[GitHub] [pulsar] Vanlightly opened a new pull request #11571: [C++] Fix testConnectTimeout test by shutting down clients

Vanlightly opened a new pull request #11571:
URL: https://github.com/apache/pulsar/pull/11571


   ### Motivation
   
   This test does not call shutdown() on the clients which causes a connect timer to not be cancelled and a mutex operation is performed by the timer after the mutex has already been destructed on the test completing. Only occurs when running many tests at a time.
   
   ### Modifications
   
   Call shutdown on both clients in the test.
   
   ### Verifying this change
   
   Run the existing tests and see no error. When it happens, the test passes but 10 seconds later (that is the connect timeout set in the test), the process terminates with the error "mutex lock failed: Invalid argument". This is caused by the compare_exchange_strong in PeriodicTask, after the underlying mutex has been destructed.
   
   ### Does this pull request potentially affect one of the following parts:
   
   *If `yes` was chosen, please highlight the changes*
   
     - Dependencies (does it add or upgrade a dependency): ( no)
     - The public API: (no)
     - The schema: (no)
     - The default values of configurations: (no)
     - The wire protocol: (no)
     - The rest endpoints: (no)
     - The admin cli options: (no)
     - Anything that affects deployment: (no)
   
   ### Documentation
   
   #### For contributor
   
   For this PR, do we need to update docs?
   
   No. This is a minor fix to a test.
   
   #### For committer
   
   For this PR, do we need to update docs?
   
   - If yes,
     
     - if you update docs in this PR, label this PR with the `doc` label.
     
     - if you plan to update docs later, label this PR with the `doc-required` label.
   
     - if you need help on updating docs, create a follow-up issue with the `doc-required` label.
     
   - If no, label this PR with the `no-need-doc` label and explain why.
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [pulsar] BewareMyPower commented on pull request #11571: [C++] Fix testConnectTimeout test by shutting down clients

Posted by GitBox <gi...@apache.org>.
BewareMyPower commented on pull request #11571:
URL: https://github.com/apache/pulsar/pull/11571#issuecomment-894002153


   It's right that `shutdown()` won't be called in `close()` when the client doesn't own any producer or consumer. But it's a bug, and I also fixed it #11557, see https://github.com/apache/pulsar/pull/11557/files#diff-07b42f6891f2bfe80d8bc2c8877f34a10b714dc7ffd12789b16eb18e5a51dd48


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [pulsar] BewareMyPower edited a comment on pull request #11571: [C++] Fix testConnectTimeout test by shutting down clients

Posted by GitBox <gi...@apache.org>.
BewareMyPower edited a comment on pull request #11571:
URL: https://github.com/apache/pulsar/pull/11571#issuecomment-893563738


   The tests still failed:
   
   ```
       1522 ms: ./main ClientTest.testConnectTimeout (try #1)
       1522 ms: ./main ClientTest.testConnectTimeout (try #2)
   ```
   
   See https://github.com/apache/pulsar/pull/11571/checks?check_run_id=3252289848
   
   The `testConnectTimeout` failed not because `shutdown()` is not called. It's because `connectTimeoutTask_ ` started again even if the the next endpoint iterator is the end iterator, see https://github.com/apache/pulsar/blob/27dcb0a6febd0d88438e18fae72aad8ee38c0738/pulsar-client-cpp/lib/ClientConnection.cc#L435-L443
   
   We should increase `endpointIterator` first and then check `if (endpointIterator != tcp::resolver::iterator())`.
   
   To verify my point, you can change
   
   ```c++
   ASSERT_EQ(futureLow.wait_for(std::chrono::milliseconds(1500)), std::future_status::ready);
   ```
   
   to
   
   ```c++
   ASSERT_EQ(futureLow.wait_for(std::chrono::milliseconds(2500)), std::future_status::ready);
   ```
   
   The tests will pass.
   
   BTW, current CI for C++ client is broken (see https://github.com/apache/pulsar/issues/11549). And I've fixed all failed C++ tests in https://github.com/apache/pulsar/pull/11557.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [pulsar] Vanlightly closed pull request #11571: [C++] Fix testConnectTimeout test by shutting down clients

Posted by GitBox <gi...@apache.org>.
Vanlightly closed pull request #11571:
URL: https://github.com/apache/pulsar/pull/11571


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [pulsar] Vanlightly commented on pull request #11571: [C++] Fix testConnectTimeout test by shutting down clients

Posted by GitBox <gi...@apache.org>.
Vanlightly commented on pull request #11571:
URL: https://github.com/apache/pulsar/pull/11571#issuecomment-893674605


   The issue I am addressing is perhaps due to how I am running the tests, via IntelliJ and the main.cc.
   
   I misdiagnosed which mutex was being incorrectly operated on, it is in fact the `promiseDefault`of the test:
   
   https://github.com/apache/pulsar/blob/3bfbee1a8eeeed40c8615351474e69c79906b31e/pulsar-client-cpp/tests/ClientTest.cc#L107
   
   The issue is not about the iterating of the endpoints but that the connection timeout triggers after the test has completed and the promise out of scope. Calling close() on the client doesn't stop the connect timer, so when it triggers it executes `promiseDefault.set_value(result);` which was already destructed.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [pulsar] Vanlightly commented on pull request #11571: [C++] Fix testConnectTimeout test by shutting down clients

Posted by GitBox <gi...@apache.org>.
Vanlightly commented on pull request #11571:
URL: https://github.com/apache/pulsar/pull/11571#issuecomment-894056951


   Excellent, I'll close this PR.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [pulsar] BewareMyPower commented on pull request #11571: [C++] Fix testConnectTimeout test by shutting down clients

Posted by GitBox <gi...@apache.org>.
BewareMyPower commented on pull request #11571:
URL: https://github.com/apache/pulsar/pull/11571#issuecomment-893563738


   The tests still failed:
   
   ```
       1522 ms: ./main ClientTest.testConnectTimeout (try #1)
       1522 ms: ./main ClientTest.testConnectTimeout (try #2)
   ```
   
   See https://github.com/apache/pulsar/pull/11571/checks?check_run_id=3252289848
   
   The `testConnectTimeout` failed not because `shutdown()` is not called. It's because `connectTimeoutTask_ ` started again even if the the next endpoint iterator is the end iterator, see https://github.com/apache/pulsar/blob/27dcb0a6febd0d88438e18fae72aad8ee38c0738/pulsar-client-cpp/lib/ClientConnection.cc#L435-L443
   
   We should increase `endpointIterator` first and then check `if (endpointIterator != tcp::resolver::iterator())`.
   
   BTW, current CI for C++ client is broken (see https://github.com/apache/pulsar/issues/11549). And I've fixed all failed C++ tests in https://github.com/apache/pulsar/pull/11557.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org