You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@pulsar.apache.org by GitBox <gi...@apache.org> on 2021/02/11 14:21:39 UTC

[GitHub] [pulsar] lhotari opened a new pull request #9565: [Broker] Fix race condition in BrokerService topic cache

lhotari opened a new pull request #9565:
URL: https://github.com/apache/pulsar/pull/9565


   Fixes #9564
   
   ### Motivation
   
   There's a concurrency by in `org.apache.pulsar.broker.service.BrokerService.getTopic` method.
   When the topic has been retried with `createIfMissing=false`, for example by getting the statistics, for the topic, the pending future for this call will be in the cache and it's result will be returned for the call which currently calls this method with `createIfMissing=true`.
   
   ### Modifications
   
   Address the race condition and if `createIfMissing=true`, retry calling `getTopic` when the current result is `Optional.empty()`.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [pulsar] lhotari commented on a change in pull request #9565: [Broker] Fix race condition in BrokerService topic cache

Posted by GitBox <gi...@apache.org>.
lhotari commented on a change in pull request #9565:
URL: https://github.com/apache/pulsar/pull/9565#discussion_r574987574



##########
File path: pulsar-broker/src/test/java/org/apache/pulsar/broker/service/BrokerServiceTest.java
##########
@@ -1005,4 +1003,33 @@ public void generate(SimpleTextOutputStream stream) {
         }
         Assert.assertTrue(sb.toString().contains("test_metrics"));
     }
+
+    @Test(timeOut = 20000L)
+    public void shouldNotPreventCreatingTopicWhenNonexistingTopicIsCached() throws Exception {
+        // run multiple iterations to increase the chance of reproducing a race condition in the topic cache
+        for (int i = 0; i < 100; i++) {
+            final String topicName = "persistent://prop/ns-abc/topic-caching-test-topic" + i;
+            CountDownLatch latch = new CountDownLatch(1);
+            Thread getStatsThread = new Thread(() -> {
+                try {
+                    latch.countDown();
+                    // create race condition with a short delay
+                    // the bug might not reproduce in all environments, this works at least on i7-10750H CPU
+                    Thread.sleep(1);
+                    admin.topics().getStats(topicName);
+                    fail("The topic should not exist yet.");
+                } catch (PulsarAdminException.NotFoundException e) {
+                    // expected exception
+                } catch (PulsarAdminException | InterruptedException e) {
+                    log.error("Exception in getStatsThread", e);

Review comment:
       resolved. I put the thread index in the thread's name and logged the name of the thread here.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [pulsar] merlimat merged pull request #9565: [Broker] Fix race condition in BrokerService topic cache

Posted by GitBox <gi...@apache.org>.
merlimat merged pull request #9565:
URL: https://github.com/apache/pulsar/pull/9565


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [pulsar] lhotari commented on pull request #9565: [Broker] Fix race condition in BrokerService topic cache

Posted by GitBox <gi...@apache.org>.
lhotari commented on pull request #9565:
URL: https://github.com/apache/pulsar/pull/9565#issuecomment-777775731


   /pulsarbot run-failure-checks


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [pulsar] lhotari commented on pull request #9565: [Broker] Fix race condition in BrokerService topic cache

Posted by GitBox <gi...@apache.org>.
lhotari commented on pull request #9565:
URL: https://github.com/apache/pulsar/pull/9565#issuecomment-777941595


   /pulsarbot run-failure-checks


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [pulsar] lhotari commented on pull request #9565: [Broker] Fix race condition in BrokerService topic cache

Posted by GitBox <gi...@apache.org>.
lhotari commented on pull request #9565:
URL: https://github.com/apache/pulsar/pull/9565#issuecomment-777721664


   /pulsarbot run-failure-checks


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [pulsar] lhotari commented on pull request #9565: [Broker] Fix race condition in BrokerService topic cache

Posted by GitBox <gi...@apache.org>.
lhotari commented on pull request #9565:
URL: https://github.com/apache/pulsar/pull/9565#issuecomment-777763672


   /pulsarbot run-failure-checks


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [pulsar] eolivelli commented on a change in pull request #9565: [Broker] Fix race condition in BrokerService topic cache

Posted by GitBox <gi...@apache.org>.
eolivelli commented on a change in pull request #9565:
URL: https://github.com/apache/pulsar/pull/9565#discussion_r574562215



##########
File path: pulsar-broker/src/test/java/org/apache/pulsar/broker/service/BrokerServiceTest.java
##########
@@ -1005,4 +1005,33 @@ public void generate(SimpleTextOutputStream stream) {
         }
         Assert.assertTrue(sb.toString().contains("test_metrics"));
     }
+
+    @Test(timeOut = 20000L)
+    public void shouldNotPreventCreatingTopicWhenNonexistingTopicIsCached() throws Exception {
+        // run multiple iterations to increase the chance of reproducing a race condition in the topic cache
+        for (int i = 0; i < 100; i++) {
+            final String topicName = "persistent://prop/ns-abc/topic-caching-test-topic" + i;
+            CountDownLatch latch = new CountDownLatch(1);
+            Thread getStatsThread = new Thread(() -> {
+                try {
+                    latch.countDown();
+                    // create race condition with a short delay
+                    // the bug might not reproduce in all environments, this works at least on i7-10750H CPU
+                    Thread.sleep(1);
+                    admin.topics().getStats(topicName);
+                    fail("The topic should not exist yet.");
+                } catch (PulsarAdminException.NotFoundException e) {
+                    // expected exception
+                } catch (PulsarAdminException | InterruptedException e) {
+                    e.printStackTrace();

Review comment:
       Nit: use Logger?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [pulsar] lhotari commented on a change in pull request #9565: [Broker] Fix race condition in BrokerService topic cache

Posted by GitBox <gi...@apache.org>.
lhotari commented on a change in pull request #9565:
URL: https://github.com/apache/pulsar/pull/9565#discussion_r574684276



##########
File path: pulsar-broker/src/test/java/org/apache/pulsar/broker/service/BrokerServiceTest.java
##########
@@ -1005,4 +1005,33 @@ public void generate(SimpleTextOutputStream stream) {
         }
         Assert.assertTrue(sb.toString().contains("test_metrics"));
     }
+
+    @Test(timeOut = 20000L)
+    public void shouldNotPreventCreatingTopicWhenNonexistingTopicIsCached() throws Exception {
+        // run multiple iterations to increase the chance of reproducing a race condition in the topic cache
+        for (int i = 0; i < 100; i++) {
+            final String topicName = "persistent://prop/ns-abc/topic-caching-test-topic" + i;
+            CountDownLatch latch = new CountDownLatch(1);
+            Thread getStatsThread = new Thread(() -> {
+                try {
+                    latch.countDown();
+                    // create race condition with a short delay
+                    // the bug might not reproduce in all environments, this works at least on i7-10750H CPU
+                    Thread.sleep(1);
+                    admin.topics().getStats(topicName);
+                    fail("The topic should not exist yet.");
+                } catch (PulsarAdminException.NotFoundException e) {
+                    // expected exception
+                } catch (PulsarAdminException | InterruptedException e) {
+                    e.printStackTrace();

Review comment:
       Done




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [pulsar] lhotari commented on a change in pull request #9565: [Broker] Fix race condition in BrokerService topic cache

Posted by GitBox <gi...@apache.org>.
lhotari commented on a change in pull request #9565:
URL: https://github.com/apache/pulsar/pull/9565#discussion_r574987415



##########
File path: pulsar-broker/src/test/java/org/apache/pulsar/broker/service/BrokerServiceTest.java
##########
@@ -1005,4 +1003,33 @@ public void generate(SimpleTextOutputStream stream) {
         }
         Assert.assertTrue(sb.toString().contains("test_metrics"));
     }
+
+    @Test(timeOut = 20000L)

Review comment:
       timeout removed




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [pulsar] sijie commented on a change in pull request #9565: [Broker] Fix race condition in BrokerService topic cache

Posted by GitBox <gi...@apache.org>.
sijie commented on a change in pull request #9565:
URL: https://github.com/apache/pulsar/pull/9565#discussion_r574943048



##########
File path: pulsar-broker/src/test/java/org/apache/pulsar/broker/service/BrokerServiceTest.java
##########
@@ -1005,4 +1003,33 @@ public void generate(SimpleTextOutputStream stream) {
         }
         Assert.assertTrue(sb.toString().contains("test_metrics"));
     }
+
+    @Test(timeOut = 20000L)
+    public void shouldNotPreventCreatingTopicWhenNonexistingTopicIsCached() throws Exception {
+        // run multiple iterations to increase the chance of reproducing a race condition in the topic cache
+        for (int i = 0; i < 100; i++) {
+            final String topicName = "persistent://prop/ns-abc/topic-caching-test-topic" + i;
+            CountDownLatch latch = new CountDownLatch(1);
+            Thread getStatsThread = new Thread(() -> {
+                try {
+                    latch.countDown();
+                    // create race condition with a short delay
+                    // the bug might not reproduce in all environments, this works at least on i7-10750H CPU
+                    Thread.sleep(1);
+                    admin.topics().getStats(topicName);
+                    fail("The topic should not exist yet.");
+                } catch (PulsarAdminException.NotFoundException e) {
+                    // expected exception
+                } catch (PulsarAdminException | InterruptedException e) {
+                    log.error("Exception in getStatsThread", e);

Review comment:
       Can you add the thread index in the error log message?

##########
File path: pulsar-broker/src/test/java/org/apache/pulsar/broker/service/BrokerServiceTest.java
##########
@@ -1005,4 +1003,33 @@ public void generate(SimpleTextOutputStream stream) {
         }
         Assert.assertTrue(sb.toString().contains("test_metrics"));
     }
+
+    @Test(timeOut = 20000L)

Review comment:
       why do we need to add `timeOut` here?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org