You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@pulsar.apache.org by GitBox <gi...@apache.org> on 2022/09/14 08:14:04 UTC

[GitHub] [pulsar] BewareMyPower opened a new pull request, #17645: [fix][cpp] Fix flaky testReferenceCount

BewareMyPower opened a new pull request, #17645:
URL: https://github.com/apache/pulsar/pull/17645

   Fixes #14848
   
   ### Motivation
   
   There were several fixes on `ClientTest.testReferenceCount` but it's still very flaky.
   
   The root cause is even after a `Reader` went out of the scope and destructed, there was still a `Reader` object existed in the thread of the event loop. See
   https://github.com/apache/pulsar/blob/845daf5cac23a4dda4a209d91c85804a0bcaf28a/pulsar-client-cpp/lib/ReaderImpl.cc#L88
   
   To verify this point, I added some logs and saw:
   
   ```
   2022-09-14 03:52:28.427 INFO  [140046042864960] Reader:39 | Reader ctor 0x7fffd2a7c110
   # ...
   2022-09-14 03:52:28.444 INFO  [140046039774976] Reader:42 | Reader ctor 0x7f5f0273d720 ReaderImpl(0x7f5efc00a9d0, 3)
   # ...
   2022-09-14 03:52:28.445 INFO  [140046042864960] ClientTest:217 | Reference count of the reader: 4
   # ...
   2022-09-14 03:52:28.445 INFO  [140046042864960] ClientImpl:490 | Closing Pulsar client with 1 producers and 2 consumers
   2022-09-14 03:52:28.445 INFO  [140046039774976] Reader:55 | Reader dtor 0x7f5f0273d720 ReaderImpl(0x7f5efc00a9d0, 3)
   ```
   
   The first `Reader` object 0x7fffd2a7c110 was constructed in main thread
   140046042864960. However, it destructed before another `Reader` object 0x0x7f5f0273d720 that was constructed in event loop thread
   140046039774976. When the callback passed to `createReaderAsync` completed the promise, the `createReader` immediately returns, at the same time the `Reader` object in the callback was still in the scope and not destructed.
   
   Since `Reader` holds a `shared_ptr<ReaderImpl>` and `ReaderImpl` holds a `shared_ptr<ConsumerImpl>`, if we check the reference count too quickly, the reference count of the underlying consumer is still positive because the `Reader` was not destructed at the moment.
   
   ### Modifications
   
   Since we cannot determine the precise destructed time point because that `Reader` object is in the event loop thread, we have to wait for a while. This PR adds a `waitUntil` utility function to wait for at most some time until the condition is met. Then wait until the reference count becomes 0 after the `Reader` object goes out of scope.
   
   Replace `ASSERT_EQ` with `EXPECT_EQ` to let the test continue if it failed.
   
   ### Verifying this change
   
   Following the steps here to reproduce:
   https://github.com/apache/pulsar/issues/14848#issuecomment-1246375370
   
   The test never failed even with `--gtest_repeat=100`.
   
   ### Documentation
   
   <!-- DO NOT REMOVE THIS SECTION. CHECK THE PROPER BOX ONLY. -->
   
   - [ ] `doc-required` 
   (Your PR needs to update docs and you will update later)
   
   - [x] `doc-not-needed` 
   (Please explain why)
   
   - [ ] `doc` 
   (Your PR contains doc changes)
   
   - [ ] `doc-complete`
   (Docs have been already added)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [pulsar] merlimat merged pull request #17645: [fix][cpp] Fix flaky testReferenceCount

Posted by GitBox <gi...@apache.org>.
merlimat merged PR #17645:
URL: https://github.com/apache/pulsar/pull/17645


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org