You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@pulsar.apache.org by GitBox <gi...@apache.org> on 2022/09/07 04:44:05 UTC
[GitHub] [pulsar] michaeljmarshall commented on a diff in pull request #17401: [fix][broker] Fix issue where leader broker information isn't available after 10 minutes

michaeljmarshall commented on code in PR #17401:
URL: https://github.com/apache/pulsar/pull/17401#discussion_r964188907


##########
pulsar-metadata/src/main/java/org/apache/pulsar/metadata/coordination/impl/LeaderElectionImpl.java:
##########
@@ -73,14 +76,20 @@ private enum InternalState {
         this.path = path;
         this.serde = new JSONMetadataSerdeSimpleType<>(TypeFactory.defaultInstance().constructSimpleType(clazz, null));
         this.store = store;
-        this.cache = store.getMetadataCache(clazz);
+        MetadataCacheConfig metadataCacheConfig = MetadataCacheConfig.builder()
+                .expireAfterWriteMillis(-1L)
+                .build();
+        this.cache = store.getMetadataCache(clazz, metadataCacheConfig);
         this.leaderElectionState = LeaderElectionState.NoLeader;
         this.internalState = InternalState.Init;
         this.stateChangesListener = stateChangesListener;
         this.executor = executor;
 
         store.registerListener(this::handlePathNotification);
         store.registerSessionListener(this::handleSessionNotification);
+        updateCachedValueFuture = executor.scheduleWithFixedDelay(SafeRunnable.safeRun(this::getLeaderValue),
+                metadataCacheConfig.getRefreshAfterWriteMillis() / 2,
+                metadataCacheConfig.getRefreshAfterWriteMillis(), TimeUnit.MILLISECONDS);

Review Comment:
   A call to `getLeaderValueIfPresent` will asynchronously trigger the `refreshAfter` logic in the loading cache. Is there a reason it is insufficient to rely on those calls?



##########
pulsar-metadata/src/main/java/org/apache/pulsar/metadata/coordination/impl/LeaderElectionImpl.java:
##########
@@ -207,11 +219,6 @@ private synchronized CompletableFuture<LeaderElectionState> tryToBecomeLeader()
                         // There was a conflict between 2 participants trying to become leaders at same time. Retry
                         // to fetch info on new leader.
 
-                        // We force the invalidation of the cache entry. Since we received a BadVersion error, we
-                        // already know that the entry is out of date. If we don't invalidate, we'd be retrying the
-                        // leader election many times until we finally receive the notification that invalidates the
-                        // cache.
-                        cache.invalidate(path);

Review Comment:
   If we don't invalidate the value here, I think we are at risk of recursively retrying to acquire the lock just as the comment indicates because when the cached value is `None` and the actual value is another broker, this broker will keep getting `BadVersionException` without completing any of the `result` objects.
   
   This might be a place where the implementation in #17254 is better because an empty cache results in a read to the metadata store. I am wondering if its worth using both PRs in order to decrease certain edge cases during leader election where the current broker doesn't know the real leader.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org