You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@pulsar.apache.org by GitBox <gi...@apache.org> on 2022/09/17 03:53:29 UTC

[GitHub] [pulsar] mattisonchao opened a new pull request, #17700: [fix][metadata] Complete expire future when revalidate got `LockBusyException`

mattisonchao opened a new pull request, #17700:
URL: https://github.com/apache/pulsar/pull/17700

   ### Motivation
   
   In the production environment,  we found two brokers holding the same valid locks. and one has an exceptional revalidate future with `lockBusyException`. after reading the code, there may forget the reset the cache and complete expire exception when getting lockBusyException.
   
   Snapshot: broker A
   
   <img width="991" alt="image" src="https://user-images.githubusercontent.com/74767115/190839387-aeb5ae62-728b-4505-b2c7-449f007f6da2.png">
   
   Snapshot: broker B
   
   <img width="952" alt="image" src="https://user-images.githubusercontent.com/74767115/190839434-c0017bbe-a11a-40d8-a772-6947b5b9fc38.png">
   
   
   ### Modifications
   
   <!-- Describe the modifications you've done. -->
   
   ### Verifying this change
   
   - [x] Make sure that the change passes the CI checks.
   
   *(Please pick either of the following options)*
   
   This change is a trivial rework / code cleanup without any test coverage.
   
   *(or)*
   
   This change is already covered by existing tests, such as *(please describe tests)*.
   
   *(or)*
   
   This change added tests and can be verified as follows:
   
   *(example:)*
     - *Added integration tests for end-to-end deployment with large payloads (10MB)*
     - *Extended integration test for recovery after broker failure*
   
   ### Does this pull request potentially affect one of the following parts:
   
   *If the box was checked, please highlight the changes*
   
   - [ ] Dependencies (add or upgrade a dependency)
   - [ ] The public API
   - [ ] The schema
   - [ ] The default values of configurations
   - [ ] The binary protocol
   - [ ] The REST endpoints
   - [ ] The admin CLI options
   - [ ] Anything that affects deployment
   
   ### Documentation
   
   <!-- DO NOT REMOVE THIS SECTION. CHECK THE PROPER BOX ONLY. -->
   
   - [ ] `doc-required` 
   (Your PR needs to update docs and you will update later)
   
   - [x] `doc-not-needed` 
   (Please explain why)
   
   - [ ] `doc` 
   (Your PR contains doc changes)
   
   - [ ] `doc-complete`
   (Docs have been already added)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [pulsar] mattisonchao commented on pull request #17700: [fix][metadata] Cleanup state when lock revalidation gets `LockBusyException`

Posted by GitBox <gi...@apache.org>.
mattisonchao commented on PR #17700:
URL: https://github.com/apache/pulsar/pull/17700#issuecomment-1250441561

   Add testing, convert to draft.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [pulsar] mattisonchao commented on pull request #17700: [fix][metadata] Cleanup state when lock revalidation gets `LockBusyException`

Posted by GitBox <gi...@apache.org>.
mattisonchao commented on PR #17700:
URL: https://github.com/apache/pulsar/pull/17700#issuecomment-1250517841

   @eolivelli  @Jason918  Already add the test. Please retake a look.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [pulsar] mattisonchao commented on a diff in pull request #17700: [fix][metadata] Cleanup state when lock revalidation gets `LockBusyException`

Posted by GitBox <gi...@apache.org>.
mattisonchao commented on code in PR #17700:
URL: https://github.com/apache/pulsar/pull/17700#discussion_r973835719


##########
pulsar-metadata/src/test/java/org/apache/pulsar/metadata/LockManagerTest.java:
##########
@@ -293,4 +296,55 @@ public void revalidateLockOnDifferentSession(String provider, Supplier<String> u
             assertEquals(new String(store1.get(path2).join().get().getValue()), "\"value-1\"");
         });
     }
+
+    @Test(dataProvider = "impl")
+    public void testCleanUpStateWhenRevalidationGotLockBusy(String provider, Supplier<String> urlSupplier)
+            throws Exception {
+
+        if (provider.equals("Memory") || provider.equals("RocksDB")) {
+            // Local memory provider doesn't really have the concept of multiple sessions
+            return;
+        }
+
+        @Cleanup
+        MetadataStoreExtended store1 = MetadataStoreExtended.create(urlSupplier.get(),
+                MetadataStoreConfig.builder().build());
+        @Cleanup
+        MetadataStoreExtended store2 = MetadataStoreExtended.create(urlSupplier.get(),
+                MetadataStoreConfig.builder().build());
+
+        @Cleanup
+        CoordinationService cs1 = new CoordinationServiceImpl(store1);
+        @Cleanup
+        LockManager<String> lm1 = cs1.getLockManager(String.class);
+
+        @Cleanup
+        CoordinationService cs2 = new CoordinationServiceImpl(store2);
+        @Cleanup
+        LockManager<String> lm2 = cs2.getLockManager(String.class);
+
+        String path1 = newKey();
+
+        ResourceLock<String> lock1 = lm1.acquireLock(path1, "value-1").join();
+        AtomicReference<ResourceLock<String>> lock2 = new AtomicReference<>();
+        // lock 2 will steal the distributed lock first.

Review Comment:
   When I am writing this test, I found an interesting thing. We can allow the new lock to steal the existing lock that may hold by others(same value). 
   I'm not sure if it's a big problem. You can use this test to verify this behaviour.
   
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [pulsar] mattisonchao commented on a diff in pull request #17700: [fix][metadata] Cleanup state when lock revalidation gets `LockBusyException`

Posted by GitBox <gi...@apache.org>.
mattisonchao commented on code in PR #17700:
URL: https://github.com/apache/pulsar/pull/17700#discussion_r974038820


##########
pulsar-metadata/src/test/java/org/apache/pulsar/metadata/LockManagerTest.java:
##########
@@ -293,4 +296,55 @@ public void revalidateLockOnDifferentSession(String provider, Supplier<String> u
             assertEquals(new String(store1.get(path2).join().get().getValue()), "\"value-1\"");
         });
     }
+
+    @Test(dataProvider = "impl")
+    public void testCleanUpStateWhenRevalidationGotLockBusy(String provider, Supplier<String> urlSupplier)
+            throws Exception {
+
+        if (provider.equals("Memory") || provider.equals("RocksDB")) {
+            // Local memory provider doesn't really have the concept of multiple sessions
+            return;
+        }
+
+        @Cleanup
+        MetadataStoreExtended store1 = MetadataStoreExtended.create(urlSupplier.get(),
+                MetadataStoreConfig.builder().build());
+        @Cleanup
+        MetadataStoreExtended store2 = MetadataStoreExtended.create(urlSupplier.get(),
+                MetadataStoreConfig.builder().build());
+
+        @Cleanup
+        CoordinationService cs1 = new CoordinationServiceImpl(store1);
+        @Cleanup
+        LockManager<String> lm1 = cs1.getLockManager(String.class);
+
+        @Cleanup
+        CoordinationService cs2 = new CoordinationServiceImpl(store2);
+        @Cleanup
+        LockManager<String> lm2 = cs2.getLockManager(String.class);
+
+        String path1 = newKey();
+
+        ResourceLock<String> lock1 = lm1.acquireLock(path1, "value-1").join();
+        AtomicReference<ResourceLock<String>> lock2 = new AtomicReference<>();
+        // lock 2 will steal the distributed lock first.
+        Awaitility.await().untilAsserted(()-> {
+            // Ensure steal the lock success.
+           lock2.set(lm2.acquireLock(path1, "value-1").join());

Review Comment:
   Because steal lock may fail and throw an exception. but we need to ensure lock2 steal lock success and then trigger lock1 to revalidate. (To avoid test flaky)



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [pulsar] codelipenghui commented on a diff in pull request #17700: [fix][metadata] Cleanup state when lock revalidation gets `LockBusyException`

Posted by GitBox <gi...@apache.org>.
codelipenghui commented on code in PR #17700:
URL: https://github.com/apache/pulsar/pull/17700#discussion_r974031384


##########
pulsar-metadata/src/test/java/org/apache/pulsar/metadata/LockManagerTest.java:
##########
@@ -293,4 +296,55 @@ public void revalidateLockOnDifferentSession(String provider, Supplier<String> u
             assertEquals(new String(store1.get(path2).join().get().getValue()), "\"value-1\"");
         });
     }
+
+    @Test(dataProvider = "impl")
+    public void testCleanUpStateWhenRevalidationGotLockBusy(String provider, Supplier<String> urlSupplier)
+            throws Exception {
+
+        if (provider.equals("Memory") || provider.equals("RocksDB")) {
+            // Local memory provider doesn't really have the concept of multiple sessions
+            return;
+        }
+
+        @Cleanup
+        MetadataStoreExtended store1 = MetadataStoreExtended.create(urlSupplier.get(),
+                MetadataStoreConfig.builder().build());
+        @Cleanup
+        MetadataStoreExtended store2 = MetadataStoreExtended.create(urlSupplier.get(),
+                MetadataStoreConfig.builder().build());
+
+        @Cleanup
+        CoordinationService cs1 = new CoordinationServiceImpl(store1);
+        @Cleanup
+        LockManager<String> lm1 = cs1.getLockManager(String.class);
+
+        @Cleanup
+        CoordinationService cs2 = new CoordinationServiceImpl(store2);
+        @Cleanup
+        LockManager<String> lm2 = cs2.getLockManager(String.class);
+
+        String path1 = newKey();
+
+        ResourceLock<String> lock1 = lm1.acquireLock(path1, "value-1").join();
+        AtomicReference<ResourceLock<String>> lock2 = new AtomicReference<>();
+        // lock 2 will steal the distributed lock first.
+        Awaitility.await().untilAsserted(()-> {
+            // Ensure steal the lock success.
+           lock2.set(lm2.acquireLock(path1, "value-1").join());

Review Comment:
   We don't have any assertions here. Why do we need `untilAsserted`



##########
pulsar-metadata/src/main/java/org/apache/pulsar/metadata/coordination/impl/ResourceLockImpl.java:
##########
@@ -233,6 +217,25 @@ synchronized CompletableFuture<Void> revalidate(T newValue) {
             });
             revalidateFuture = newFuture;
         }
+        revalidateFuture.exceptionally(ex -> {
+            synchronized (ResourceLockImpl.this) {
+                Throwable realCause = FutureUtil.unwrapCompletionException(ex);
+                if (!revalidateAfterReconnection || realCause instanceof BadVersionException
+                        || realCause instanceof LockBusyException) {
+                    log.warn("Failed to revalidate the lock at {}. Marked as expired", path);

Review Comment:
   Since we have 3 cases will reach here, it's better to add more information to the log to let users to understand the real cause.



##########
pulsar-metadata/src/main/java/org/apache/pulsar/metadata/coordination/impl/ResourceLockImpl.java:
##########
@@ -233,6 +217,25 @@ synchronized CompletableFuture<Void> revalidate(T newValue) {
             });
             revalidateFuture = newFuture;
         }
+        revalidateFuture.exceptionally(ex -> {
+            synchronized (ResourceLockImpl.this) {
+                Throwable realCause = FutureUtil.unwrapCompletionException(ex);
+                if (!revalidateAfterReconnection || realCause instanceof BadVersionException
+                        || realCause instanceof LockBusyException) {
+                    log.warn("Failed to revalidate the lock at {}. Marked as expired", path);
+                    state = State.Released;
+                    expiredFuture.complete(null);
+                } else {
+                    // We failed to revalidate the lock due to connectivity issue
+                    // Continue assuming we hold the lock, until we can revalidate it, either
+                    // on Reconnected or SessionReestablished events.
+                    this.revalidateAfterReconnection = true;

Review Comment:
   ```suggestion
                       ResourceLockImpl.this.revalidateAfterReconnection = true;
   ```



##########
pulsar-metadata/src/test/java/org/apache/pulsar/metadata/LockManagerTest.java:
##########
@@ -293,4 +296,55 @@ public void revalidateLockOnDifferentSession(String provider, Supplier<String> u
             assertEquals(new String(store1.get(path2).join().get().getValue()), "\"value-1\"");
         });
     }
+
+    @Test(dataProvider = "impl")
+    public void testCleanUpStateWhenRevalidationGotLockBusy(String provider, Supplier<String> urlSupplier)
+            throws Exception {
+
+        if (provider.equals("Memory") || provider.equals("RocksDB")) {
+            // Local memory provider doesn't really have the concept of multiple sessions
+            return;
+        }
+
+        @Cleanup
+        MetadataStoreExtended store1 = MetadataStoreExtended.create(urlSupplier.get(),
+                MetadataStoreConfig.builder().build());
+        @Cleanup
+        MetadataStoreExtended store2 = MetadataStoreExtended.create(urlSupplier.get(),
+                MetadataStoreConfig.builder().build());
+
+        @Cleanup
+        CoordinationService cs1 = new CoordinationServiceImpl(store1);
+        @Cleanup
+        LockManager<String> lm1 = cs1.getLockManager(String.class);
+
+        @Cleanup
+        CoordinationService cs2 = new CoordinationServiceImpl(store2);
+        @Cleanup
+        LockManager<String> lm2 = cs2.getLockManager(String.class);
+
+        String path1 = newKey();
+
+        ResourceLock<String> lock1 = lm1.acquireLock(path1, "value-1").join();
+        AtomicReference<ResourceLock<String>> lock2 = new AtomicReference<>();
+        // lock 2 will steal the distributed lock first.

Review Comment:
   I'm not sure why we need the `steal lock` operation.
   But this is a potential risk. It will introduce too many small ledgers if we encounter this problem



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [pulsar] mattisonchao commented on a diff in pull request #17700: [fix][metadata] Cleanup state when lock revalidation gets `LockBusyException`

Posted by GitBox <gi...@apache.org>.
mattisonchao commented on code in PR #17700:
URL: https://github.com/apache/pulsar/pull/17700#discussion_r973535614


##########
pulsar-metadata/src/main/java/org/apache/pulsar/metadata/coordination/impl/ResourceLockImpl.java:
##########
@@ -232,6 +217,23 @@ synchronized CompletableFuture<Void> revalidate(T newValue) {
             });
             revalidateFuture = newFuture;
         }
+        revalidateFuture.exceptionally(ex -> {
+            Throwable realCause = FutureUtil.unwrapCompletionException(ex);
+            synchronized (ResourceLockImpl.this) {
+                if (realCause instanceof BadVersionException || realCause instanceof LockBusyException) {
+                    log.warn("Failed to revalidate the lock at {}. Marked as expired", path);
+                    state = State.Released;
+                    expiredFuture.complete(null);
+                } else {
+                    // We failed to revalidate the lock due to connectivity issue
+                    // Continue assuming we hold the lock, until we can revalidate it, either
+                    // on Reconnected or SessionReestablished events.
+                    log.warn("Failed to revalidate the lock at {}. Retrying later on reconnection {}", path,

Review Comment:
   there may lost update `revalidateAfterReconnection`, the fix PR #17664



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [pulsar] mattisonchao commented on a diff in pull request #17700: [fix][metadata] Cleanup state when lock revalidation gets `LockBusyException`

Posted by GitBox <gi...@apache.org>.
mattisonchao commented on code in PR #17700:
URL: https://github.com/apache/pulsar/pull/17700#discussion_r973804080


##########
pulsar-metadata/src/main/java/org/apache/pulsar/metadata/coordination/impl/ResourceLockImpl.java:
##########
@@ -184,38 +185,21 @@ synchronized void lockWasInvalidated() {
         }
 
         log.info("Lock on resource {} was invalidated", path);
-        revalidate(value)
-                .thenRun(() -> log.info("Successfully revalidated the lock on {}", path))
-                .exceptionally(ex -> {
-                    synchronized (ResourceLockImpl.this) {
-                        if (ex.getCause() instanceof BadVersionException) {
-                            log.warn("Failed to revalidate the lock at {}. Marked as expired", path);
-                            state = State.Released;
-                            expiredFuture.complete(null);
-                        } else {
-                            // We failed to revalidate the lock due to connectivity issue
-                            // Continue assuming we hold the lock, until we can revalidate it, either
-                            // on Reconnected or SessionReestablished events.
-                            revalidateAfterReconnection = true;
-                            log.warn("Failed to revalidate the lock at {}. Retrying later on reconnection {}", path,
-                                    ex.getCause().getMessage());
-                        }
-                    }
-                    return null;
-                });
+        revalidate(value, true)
+                .thenRun(() -> log.info("Successfully revalidated the lock on {}", path));
     }
 
     synchronized CompletableFuture<Void> revalidateIfNeededAfterReconnection() {
         if (revalidateAfterReconnection) {
             revalidateAfterReconnection = false;
             log.warn("Revalidate lock at {} after reconnection", path);
-            return revalidate(value);
+            return revalidate(value, true);
         } else {
             return CompletableFuture.completedFuture(null);
         }
     }
 
-    synchronized CompletableFuture<Void> revalidate(T newValue) {
+    synchronized CompletableFuture<Void> revalidate(T newValue, boolean retryWhenConnectionLost) {

Review Comment:
   Add `retryWhenConnectionLost` to avoid useless resource lock acquire distributed lock.
   
   e.g. when we invoke the `acquire` method, if we got a connection problem, we will return the failed future immediately and discard the resource lock. if we continue re-try when reconnecting, we probably get the distributed lock for useless resource lock.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [pulsar] Jason918 commented on a diff in pull request #17700: [fix][metadata] Cleanup state when lock revalidation gets `LockBusyException`

Posted by GitBox <gi...@apache.org>.
Jason918 commented on code in PR #17700:
URL: https://github.com/apache/pulsar/pull/17700#discussion_r973827309


##########
pulsar-metadata/src/main/java/org/apache/pulsar/metadata/coordination/impl/ResourceLockImpl.java:
##########
@@ -184,38 +185,21 @@ synchronized void lockWasInvalidated() {
         }
 
         log.info("Lock on resource {} was invalidated", path);
-        revalidate(value)
-                .thenRun(() -> log.info("Successfully revalidated the lock on {}", path))
-                .exceptionally(ex -> {
-                    synchronized (ResourceLockImpl.this) {
-                        if (ex.getCause() instanceof BadVersionException) {
-                            log.warn("Failed to revalidate the lock at {}. Marked as expired", path);
-                            state = State.Released;
-                            expiredFuture.complete(null);
-                        } else {
-                            // We failed to revalidate the lock due to connectivity issue
-                            // Continue assuming we hold the lock, until we can revalidate it, either
-                            // on Reconnected or SessionReestablished events.
-                            revalidateAfterReconnection = true;
-                            log.warn("Failed to revalidate the lock at {}. Retrying later on reconnection {}", path,
-                                    ex.getCause().getMessage());
-                        }
-                    }
-                    return null;
-                });
+        revalidate(value, true)
+                .thenRun(() -> log.info("Successfully revalidated the lock on {}", path));
     }
 
     synchronized CompletableFuture<Void> revalidateIfNeededAfterReconnection() {
         if (revalidateAfterReconnection) {
             revalidateAfterReconnection = false;
             log.warn("Revalidate lock at {} after reconnection", path);
-            return revalidate(value);
+            return revalidate(value, true);
         } else {
             return CompletableFuture.completedFuture(null);
         }
     }
 
-    synchronized CompletableFuture<Void> revalidate(T newValue) {
+    synchronized CompletableFuture<Void> revalidate(T newValue, boolean retryWhenConnectionLost) {

Review Comment:
   About the variable name, maybe just reuse `revalidateAfterReconnection`?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [pulsar] mattisonchao commented on a diff in pull request #17700: [fix][metadata] Cleanup state when lock revalidation gets `LockBusyException`

Posted by GitBox <gi...@apache.org>.
mattisonchao commented on code in PR #17700:
URL: https://github.com/apache/pulsar/pull/17700#discussion_r973831241


##########
pulsar-metadata/src/main/java/org/apache/pulsar/metadata/coordination/impl/ResourceLockImpl.java:
##########
@@ -184,38 +185,21 @@ synchronized void lockWasInvalidated() {
         }
 
         log.info("Lock on resource {} was invalidated", path);
-        revalidate(value)
-                .thenRun(() -> log.info("Successfully revalidated the lock on {}", path))
-                .exceptionally(ex -> {
-                    synchronized (ResourceLockImpl.this) {
-                        if (ex.getCause() instanceof BadVersionException) {
-                            log.warn("Failed to revalidate the lock at {}. Marked as expired", path);
-                            state = State.Released;
-                            expiredFuture.complete(null);
-                        } else {
-                            // We failed to revalidate the lock due to connectivity issue
-                            // Continue assuming we hold the lock, until we can revalidate it, either
-                            // on Reconnected or SessionReestablished events.
-                            revalidateAfterReconnection = true;
-                            log.warn("Failed to revalidate the lock at {}. Retrying later on reconnection {}", path,
-                                    ex.getCause().getMessage());
-                        }
-                    }
-                    return null;
-                });
+        revalidate(value, true)
+                .thenRun(() -> log.info("Successfully revalidated the lock on {}", path));
     }
 
     synchronized CompletableFuture<Void> revalidateIfNeededAfterReconnection() {
         if (revalidateAfterReconnection) {
             revalidateAfterReconnection = false;
             log.warn("Revalidate lock at {} after reconnection", path);
-            return revalidate(value);
+            return revalidate(value, true);
         } else {
             return CompletableFuture.completedFuture(null);
         }
     }
 
-    synchronized CompletableFuture<Void> revalidate(T newValue) {
+    synchronized CompletableFuture<Void> revalidate(T newValue, boolean retryWhenConnectionLost) {

Review Comment:
   Sure, and fixed.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [pulsar] mattisonchao commented on a diff in pull request #17700: [fix][metadata] Cleanup state when lock revalidation gets `LockBusyException`

Posted by GitBox <gi...@apache.org>.
mattisonchao commented on code in PR #17700:
URL: https://github.com/apache/pulsar/pull/17700#discussion_r973835719


##########
pulsar-metadata/src/test/java/org/apache/pulsar/metadata/LockManagerTest.java:
##########
@@ -293,4 +296,55 @@ public void revalidateLockOnDifferentSession(String provider, Supplier<String> u
             assertEquals(new String(store1.get(path2).join().get().getValue()), "\"value-1\"");
         });
     }
+
+    @Test(dataProvider = "impl")
+    public void testCleanUpStateWhenRevalidationGotLockBusy(String provider, Supplier<String> urlSupplier)
+            throws Exception {
+
+        if (provider.equals("Memory") || provider.equals("RocksDB")) {
+            // Local memory provider doesn't really have the concept of multiple sessions
+            return;
+        }
+
+        @Cleanup
+        MetadataStoreExtended store1 = MetadataStoreExtended.create(urlSupplier.get(),
+                MetadataStoreConfig.builder().build());
+        @Cleanup
+        MetadataStoreExtended store2 = MetadataStoreExtended.create(urlSupplier.get(),
+                MetadataStoreConfig.builder().build());
+
+        @Cleanup
+        CoordinationService cs1 = new CoordinationServiceImpl(store1);
+        @Cleanup
+        LockManager<String> lm1 = cs1.getLockManager(String.class);
+
+        @Cleanup
+        CoordinationService cs2 = new CoordinationServiceImpl(store2);
+        @Cleanup
+        LockManager<String> lm2 = cs2.getLockManager(String.class);
+
+        String path1 = newKey();
+
+        ResourceLock<String> lock1 = lm1.acquireLock(path1, "value-1").join();
+        AtomicReference<ResourceLock<String>> lock2 = new AtomicReference<>();
+        // lock 2 will steal the distributed lock first.

Review Comment:
   When I am writing this test, I found an interesting thing. We can allow the new lock to steal the existing lock that may hold by others(same value). 
   I'm not sure if it's a big problem. You can use this test to verify this behaviour.
   
   Plus, steal lock behaviour may cause an infinity loop when they use the same value or a different value in the same session. the details please see `ResourceLockImpl#doRevalidate`
   
   https://github.com/apache/pulsar/blob/69f3f7471fa6faf24d4776d65e0509538c105d37/pulsar-metadata/src/main/java/org/apache/pulsar/metadata/coordination/impl/ResourceLockImpl.java#L239-L310
   
   Case:
   - lock 1 holds the lock with value `value-1`.
   - lock 2 tries to acquire the lock with value `value-1` got the `LockBusyException` then invoked `revalidate` to steal the lock. (delete and re-create it)
   - lock 1 receives the delete notification **after lock 2 already acquires the lock**. then lock1 tries to invoke `revalidate` to steal the lock again.
   - Under this assumption, we're going to fall into an infinite loop.
   
   This is a theoretical assumption because the situation is more difficult to simulate. Please take a look, I'm not sure if I missing something.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [pulsar] mattisonchao commented on a diff in pull request #17700: [fix][metadata] Cleanup state when lock revalidation gets `LockBusyException`

Posted by GitBox <gi...@apache.org>.
mattisonchao commented on code in PR #17700:
URL: https://github.com/apache/pulsar/pull/17700#discussion_r973835719


##########
pulsar-metadata/src/test/java/org/apache/pulsar/metadata/LockManagerTest.java:
##########
@@ -293,4 +296,55 @@ public void revalidateLockOnDifferentSession(String provider, Supplier<String> u
             assertEquals(new String(store1.get(path2).join().get().getValue()), "\"value-1\"");
         });
     }
+
+    @Test(dataProvider = "impl")
+    public void testCleanUpStateWhenRevalidationGotLockBusy(String provider, Supplier<String> urlSupplier)
+            throws Exception {
+
+        if (provider.equals("Memory") || provider.equals("RocksDB")) {
+            // Local memory provider doesn't really have the concept of multiple sessions
+            return;
+        }
+
+        @Cleanup
+        MetadataStoreExtended store1 = MetadataStoreExtended.create(urlSupplier.get(),
+                MetadataStoreConfig.builder().build());
+        @Cleanup
+        MetadataStoreExtended store2 = MetadataStoreExtended.create(urlSupplier.get(),
+                MetadataStoreConfig.builder().build());
+
+        @Cleanup
+        CoordinationService cs1 = new CoordinationServiceImpl(store1);
+        @Cleanup
+        LockManager<String> lm1 = cs1.getLockManager(String.class);
+
+        @Cleanup
+        CoordinationService cs2 = new CoordinationServiceImpl(store2);
+        @Cleanup
+        LockManager<String> lm2 = cs2.getLockManager(String.class);
+
+        String path1 = newKey();
+
+        ResourceLock<String> lock1 = lm1.acquireLock(path1, "value-1").join();
+        AtomicReference<ResourceLock<String>> lock2 = new AtomicReference<>();
+        // lock 2 will steal the distributed lock first.

Review Comment:
   When I am writing this test, I found an interesting thing. We can allow the new lock to steal the existing lock that may hold by others(same value). 
   I'm not sure if it's a big problem. You can use this test to verify this behaviour.
   
   Plus, steal lock behaviour may cause an infinity loop when they use the same value or a different value in the same session. the details please see `ResourceLockImpl#doRevalidate`
   
   https://github.com/apache/pulsar/blob/69f3f7471fa6faf24d4776d65e0509538c105d37/pulsar-metadata/src/main/java/org/apache/pulsar/metadata/coordination/impl/ResourceLockImpl.java#L239-L310
   
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [pulsar] mattisonchao commented on a diff in pull request #17700: [fix][metadata] Complete expire future when revalidate got `LockBusyException`

Posted by GitBox <gi...@apache.org>.
mattisonchao commented on code in PR #17700:
URL: https://github.com/apache/pulsar/pull/17700#discussion_r973535614


##########
pulsar-metadata/src/main/java/org/apache/pulsar/metadata/coordination/impl/ResourceLockImpl.java:
##########
@@ -232,6 +217,23 @@ synchronized CompletableFuture<Void> revalidate(T newValue) {
             });
             revalidateFuture = newFuture;
         }
+        revalidateFuture.exceptionally(ex -> {
+            Throwable realCause = FutureUtil.unwrapCompletionException(ex);
+            synchronized (ResourceLockImpl.this) {
+                if (realCause instanceof BadVersionException || realCause instanceof LockBusyException) {
+                    log.warn("Failed to revalidate the lock at {}. Marked as expired", path);
+                    state = State.Released;
+                    expiredFuture.complete(null);
+                } else {
+                    // We failed to revalidate the lock due to connectivity issue
+                    // Continue assuming we hold the lock, until we can revalidate it, either
+                    // on Reconnected or SessionReestablished events.
+                    log.warn("Failed to revalidate the lock at {}. Retrying later on reconnection {}", path,

Review Comment:
   there may lost update `revalidateAfterReconnection`, the fix PR #17664



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [pulsar] Technoboy- commented on a diff in pull request #17700: [fix][metadata] Cleanup state when lock revalidation gets `LockBusyException`

Posted by GitBox <gi...@apache.org>.
Technoboy- commented on code in PR #17700:
URL: https://github.com/apache/pulsar/pull/17700#discussion_r973978140


##########
pulsar-metadata/src/main/java/org/apache/pulsar/metadata/coordination/impl/ResourceLockImpl.java:
##########
@@ -184,38 +185,21 @@ synchronized void lockWasInvalidated() {
         }
 
         log.info("Lock on resource {} was invalidated", path);
-        revalidate(value)
-                .thenRun(() -> log.info("Successfully revalidated the lock on {}", path))
-                .exceptionally(ex -> {
-                    synchronized (ResourceLockImpl.this) {
-                        if (ex.getCause() instanceof BadVersionException) {
-                            log.warn("Failed to revalidate the lock at {}. Marked as expired", path);
-                            state = State.Released;
-                            expiredFuture.complete(null);
-                        } else {
-                            // We failed to revalidate the lock due to connectivity issue
-                            // Continue assuming we hold the lock, until we can revalidate it, either
-                            // on Reconnected or SessionReestablished events.
-                            revalidateAfterReconnection = true;
-                            log.warn("Failed to revalidate the lock at {}. Retrying later on reconnection {}", path,
-                                    ex.getCause().getMessage());
-                        }
-                    }
-                    return null;
-                });
+        revalidate(value, true)
+                .thenRun(() -> log.info("Successfully revalidated the lock on {}", path));
     }
 
     synchronized CompletableFuture<Void> revalidateIfNeededAfterReconnection() {
         if (revalidateAfterReconnection) {
             revalidateAfterReconnection = false;
             log.warn("Revalidate lock at {} after reconnection", path);
-            return revalidate(value);
+            return revalidate(value, true);
         } else {
             return CompletableFuture.completedFuture(null);
         }
     }
 
-    synchronized CompletableFuture<Void> revalidate(T newValue) {
+    synchronized CompletableFuture<Void> revalidate(T newValue, boolean retryWhenConnectionLost) {

Review Comment:
   > About the variable name, maybe just reuse `revalidateAfterReconnection`?
   
   More confused.
   If we call `revalidateAfterReconnection`, maybe we need to set `revalidateAfterReconnection`= true at line 223~227



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [pulsar] mattisonchao commented on a diff in pull request #17700: [fix][metadata] Cleanup state when lock revalidation gets `LockBusyException`

Posted by GitBox <gi...@apache.org>.
mattisonchao commented on code in PR #17700:
URL: https://github.com/apache/pulsar/pull/17700#discussion_r973835719


##########
pulsar-metadata/src/test/java/org/apache/pulsar/metadata/LockManagerTest.java:
##########
@@ -293,4 +296,55 @@ public void revalidateLockOnDifferentSession(String provider, Supplier<String> u
             assertEquals(new String(store1.get(path2).join().get().getValue()), "\"value-1\"");
         });
     }
+
+    @Test(dataProvider = "impl")
+    public void testCleanUpStateWhenRevalidationGotLockBusy(String provider, Supplier<String> urlSupplier)
+            throws Exception {
+
+        if (provider.equals("Memory") || provider.equals("RocksDB")) {
+            // Local memory provider doesn't really have the concept of multiple sessions
+            return;
+        }
+
+        @Cleanup
+        MetadataStoreExtended store1 = MetadataStoreExtended.create(urlSupplier.get(),
+                MetadataStoreConfig.builder().build());
+        @Cleanup
+        MetadataStoreExtended store2 = MetadataStoreExtended.create(urlSupplier.get(),
+                MetadataStoreConfig.builder().build());
+
+        @Cleanup
+        CoordinationService cs1 = new CoordinationServiceImpl(store1);
+        @Cleanup
+        LockManager<String> lm1 = cs1.getLockManager(String.class);
+
+        @Cleanup
+        CoordinationService cs2 = new CoordinationServiceImpl(store2);
+        @Cleanup
+        LockManager<String> lm2 = cs2.getLockManager(String.class);
+
+        String path1 = newKey();
+
+        ResourceLock<String> lock1 = lm1.acquireLock(path1, "value-1").join();
+        AtomicReference<ResourceLock<String>> lock2 = new AtomicReference<>();
+        // lock 2 will steal the distributed lock first.

Review Comment:
   When I am writing this test, I found an interesting thing. We can allow the new lock to steal the existing lock that may hold by others(same value or same session). 
   
   I'm not sure if it's a big problem. You can use this test to verify this behaviour.
   
   Plus, steal lock behaviour may cause an infinity loop when they use the same value or a different value in the same session. the details please see `ResourceLockImpl#doRevalidate`
   
   https://github.com/apache/pulsar/blob/69f3f7471fa6faf24d4776d65e0509538c105d37/pulsar-metadata/src/main/java/org/apache/pulsar/metadata/coordination/impl/ResourceLockImpl.java#L239-L310
   
   Case:
   - lock 1 holds the lock with value `value-1`.
   - lock 2 tries to acquire the lock with value `value-1` got the `LockBusyException` then invoked `revalidate` to steal the lock. (delete and re-create it)
   - lock 1 receives the delete notification **after lock 2 already acquires the lock**. then lock1 tries to invoke `revalidate` to steal the lock again.
   - Under this assumption, we're going to fall into an infinite loop.
   
   This is a theoretical assumption because the situation is more difficult to simulate. Please take a look, I'm not sure if I missing something.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [pulsar] Jason918 commented on a diff in pull request #17700: [fix][metadata] Cleanup state when lock revalidation gets `LockBusyException`

Posted by GitBox <gi...@apache.org>.
Jason918 commented on code in PR #17700:
URL: https://github.com/apache/pulsar/pull/17700#discussion_r973826990


##########
pulsar-metadata/src/main/java/org/apache/pulsar/metadata/coordination/impl/ResourceLockImpl.java:
##########
@@ -233,6 +217,30 @@ synchronized CompletableFuture<Void> revalidate(T newValue) {
             });
             revalidateFuture = newFuture;
         }
+        revalidateFuture.exceptionally(ex -> {
+            synchronized (ResourceLockImpl.this) {
+                if (!retryWhenConnectionLost) {

Review Comment:
   This can just merge into the `if` at L229.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [pulsar] mattisonchao commented on a diff in pull request #17700: [fix][metadata] Cleanup state when lock revalidation gets `LockBusyException`

Posted by GitBox <gi...@apache.org>.
mattisonchao commented on code in PR #17700:
URL: https://github.com/apache/pulsar/pull/17700#discussion_r974059590


##########
pulsar-metadata/src/test/java/org/apache/pulsar/metadata/LockManagerTest.java:
##########
@@ -293,4 +296,55 @@ public void revalidateLockOnDifferentSession(String provider, Supplier<String> u
             assertEquals(new String(store1.get(path2).join().get().getValue()), "\"value-1\"");
         });
     }
+
+    @Test(dataProvider = "impl")
+    public void testCleanUpStateWhenRevalidationGotLockBusy(String provider, Supplier<String> urlSupplier)
+            throws Exception {
+
+        if (provider.equals("Memory") || provider.equals("RocksDB")) {
+            // Local memory provider doesn't really have the concept of multiple sessions
+            return;
+        }
+
+        @Cleanup
+        MetadataStoreExtended store1 = MetadataStoreExtended.create(urlSupplier.get(),
+                MetadataStoreConfig.builder().build());
+        @Cleanup
+        MetadataStoreExtended store2 = MetadataStoreExtended.create(urlSupplier.get(),
+                MetadataStoreConfig.builder().build());
+
+        @Cleanup
+        CoordinationService cs1 = new CoordinationServiceImpl(store1);
+        @Cleanup
+        LockManager<String> lm1 = cs1.getLockManager(String.class);
+
+        @Cleanup
+        CoordinationService cs2 = new CoordinationServiceImpl(store2);
+        @Cleanup
+        LockManager<String> lm2 = cs2.getLockManager(String.class);
+
+        String path1 = newKey();
+
+        ResourceLock<String> lock1 = lm1.acquireLock(path1, "value-1").join();
+        AtomicReference<ResourceLock<String>> lock2 = new AtomicReference<>();
+        // lock 2 will steal the distributed lock first.

Review Comment:
   @merlimat  Could you please help to answer @codelipenghui  question?
   I'm not sure about the original design.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [pulsar] mattisonchao commented on a diff in pull request #17700: [fix][metadata] Cleanup state when lock revalidation gets `LockBusyException`

Posted by GitBox <gi...@apache.org>.
mattisonchao commented on code in PR #17700:
URL: https://github.com/apache/pulsar/pull/17700#discussion_r974059590


##########
pulsar-metadata/src/test/java/org/apache/pulsar/metadata/LockManagerTest.java:
##########
@@ -293,4 +296,55 @@ public void revalidateLockOnDifferentSession(String provider, Supplier<String> u
             assertEquals(new String(store1.get(path2).join().get().getValue()), "\"value-1\"");
         });
     }
+
+    @Test(dataProvider = "impl")
+    public void testCleanUpStateWhenRevalidationGotLockBusy(String provider, Supplier<String> urlSupplier)
+            throws Exception {
+
+        if (provider.equals("Memory") || provider.equals("RocksDB")) {
+            // Local memory provider doesn't really have the concept of multiple sessions
+            return;
+        }
+
+        @Cleanup
+        MetadataStoreExtended store1 = MetadataStoreExtended.create(urlSupplier.get(),
+                MetadataStoreConfig.builder().build());
+        @Cleanup
+        MetadataStoreExtended store2 = MetadataStoreExtended.create(urlSupplier.get(),
+                MetadataStoreConfig.builder().build());
+
+        @Cleanup
+        CoordinationService cs1 = new CoordinationServiceImpl(store1);
+        @Cleanup
+        LockManager<String> lm1 = cs1.getLockManager(String.class);
+
+        @Cleanup
+        CoordinationService cs2 = new CoordinationServiceImpl(store2);
+        @Cleanup
+        LockManager<String> lm2 = cs2.getLockManager(String.class);
+
+        String path1 = newKey();
+
+        ResourceLock<String> lock1 = lm1.acquireLock(path1, "value-1").join();
+        AtomicReference<ResourceLock<String>> lock2 = new AtomicReference<>();
+        // lock 2 will steal the distributed lock first.

Review Comment:
   @merlimat  Could you please help to answer @codelipenghui  question?
   I'm not sure the original design.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [pulsar] mattisonchao commented on a diff in pull request #17700: [fix][metadata] Cleanup state when lock revalidation gets `LockBusyException`

Posted by GitBox <gi...@apache.org>.
mattisonchao commented on code in PR #17700:
URL: https://github.com/apache/pulsar/pull/17700#discussion_r974059590


##########
pulsar-metadata/src/test/java/org/apache/pulsar/metadata/LockManagerTest.java:
##########
@@ -293,4 +296,55 @@ public void revalidateLockOnDifferentSession(String provider, Supplier<String> u
             assertEquals(new String(store1.get(path2).join().get().getValue()), "\"value-1\"");
         });
     }
+
+    @Test(dataProvider = "impl")
+    public void testCleanUpStateWhenRevalidationGotLockBusy(String provider, Supplier<String> urlSupplier)
+            throws Exception {
+
+        if (provider.equals("Memory") || provider.equals("RocksDB")) {
+            // Local memory provider doesn't really have the concept of multiple sessions
+            return;
+        }
+
+        @Cleanup
+        MetadataStoreExtended store1 = MetadataStoreExtended.create(urlSupplier.get(),
+                MetadataStoreConfig.builder().build());
+        @Cleanup
+        MetadataStoreExtended store2 = MetadataStoreExtended.create(urlSupplier.get(),
+                MetadataStoreConfig.builder().build());
+
+        @Cleanup
+        CoordinationService cs1 = new CoordinationServiceImpl(store1);
+        @Cleanup
+        LockManager<String> lm1 = cs1.getLockManager(String.class);
+
+        @Cleanup
+        CoordinationService cs2 = new CoordinationServiceImpl(store2);
+        @Cleanup
+        LockManager<String> lm2 = cs2.getLockManager(String.class);
+
+        String path1 = newKey();
+
+        ResourceLock<String> lock1 = lm1.acquireLock(path1, "value-1").join();
+        AtomicReference<ResourceLock<String>> lock2 = new AtomicReference<>();
+        // lock 2 will steal the distributed lock first.

Review Comment:
   @merlimat  Could you please help to answer @codelipenghui  question?
   I'm not sure about the original design and it looks like we don't have the logic to acquire a resource lock with the same value in pulsar



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [pulsar] mattisonchao commented on a diff in pull request #17700: [fix][metadata] Cleanup state when lock revalidation gets `LockBusyException`

Posted by GitBox <gi...@apache.org>.
mattisonchao commented on code in PR #17700:
URL: https://github.com/apache/pulsar/pull/17700#discussion_r974041772


##########
pulsar-metadata/src/test/java/org/apache/pulsar/metadata/LockManagerTest.java:
##########
@@ -293,4 +296,55 @@ public void revalidateLockOnDifferentSession(String provider, Supplier<String> u
             assertEquals(new String(store1.get(path2).join().get().getValue()), "\"value-1\"");
         });
     }
+
+    @Test(dataProvider = "impl")
+    public void testCleanUpStateWhenRevalidationGotLockBusy(String provider, Supplier<String> urlSupplier)
+            throws Exception {
+
+        if (provider.equals("Memory") || provider.equals("RocksDB")) {
+            // Local memory provider doesn't really have the concept of multiple sessions
+            return;
+        }
+
+        @Cleanup
+        MetadataStoreExtended store1 = MetadataStoreExtended.create(urlSupplier.get(),
+                MetadataStoreConfig.builder().build());
+        @Cleanup
+        MetadataStoreExtended store2 = MetadataStoreExtended.create(urlSupplier.get(),
+                MetadataStoreConfig.builder().build());
+
+        @Cleanup
+        CoordinationService cs1 = new CoordinationServiceImpl(store1);
+        @Cleanup
+        LockManager<String> lm1 = cs1.getLockManager(String.class);
+
+        @Cleanup
+        CoordinationService cs2 = new CoordinationServiceImpl(store2);
+        @Cleanup
+        LockManager<String> lm2 = cs2.getLockManager(String.class);
+
+        String path1 = newKey();
+
+        ResourceLock<String> lock1 = lm1.acquireLock(path1, "value-1").join();
+        AtomicReference<ResourceLock<String>> lock2 = new AtomicReference<>();
+        // lock 2 will steal the distributed lock first.

Review Comment:
   Since the current logic already supports stealing locks, I just use it to make the tests easier to write. otherwise, I have to use complex operations to do it.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [pulsar] mattisonchao commented on a diff in pull request #17700: [fix][metadata] Cleanup state when lock revalidation gets `LockBusyException`

Posted by GitBox <gi...@apache.org>.
mattisonchao commented on code in PR #17700:
URL: https://github.com/apache/pulsar/pull/17700#discussion_r974041772


##########
pulsar-metadata/src/test/java/org/apache/pulsar/metadata/LockManagerTest.java:
##########
@@ -293,4 +296,55 @@ public void revalidateLockOnDifferentSession(String provider, Supplier<String> u
             assertEquals(new String(store1.get(path2).join().get().getValue()), "\"value-1\"");
         });
     }
+
+    @Test(dataProvider = "impl")
+    public void testCleanUpStateWhenRevalidationGotLockBusy(String provider, Supplier<String> urlSupplier)
+            throws Exception {
+
+        if (provider.equals("Memory") || provider.equals("RocksDB")) {
+            // Local memory provider doesn't really have the concept of multiple sessions
+            return;
+        }
+
+        @Cleanup
+        MetadataStoreExtended store1 = MetadataStoreExtended.create(urlSupplier.get(),
+                MetadataStoreConfig.builder().build());
+        @Cleanup
+        MetadataStoreExtended store2 = MetadataStoreExtended.create(urlSupplier.get(),
+                MetadataStoreConfig.builder().build());
+
+        @Cleanup
+        CoordinationService cs1 = new CoordinationServiceImpl(store1);
+        @Cleanup
+        LockManager<String> lm1 = cs1.getLockManager(String.class);
+
+        @Cleanup
+        CoordinationService cs2 = new CoordinationServiceImpl(store2);
+        @Cleanup
+        LockManager<String> lm2 = cs2.getLockManager(String.class);
+
+        String path1 = newKey();
+
+        ResourceLock<String> lock1 = lm1.acquireLock(path1, "value-1").join();
+        AtomicReference<ResourceLock<String>> lock2 = new AtomicReference<>();
+        // lock 2 will steal the distributed lock first.

Review Comment:
   Since the current logic already supports stealing locks, I just use it to make the tests easier to write. otherwise, I have to use complex operations to do it.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [pulsar] mattisonchao commented on a diff in pull request #17700: [fix][metadata] Cleanup state when lock revalidation gets `LockBusyException`

Posted by GitBox <gi...@apache.org>.
mattisonchao commented on code in PR #17700:
URL: https://github.com/apache/pulsar/pull/17700#discussion_r974780117


##########
pulsar-metadata/src/test/java/org/apache/pulsar/metadata/LockManagerTest.java:
##########
@@ -293,4 +296,55 @@ public void revalidateLockOnDifferentSession(String provider, Supplier<String> u
             assertEquals(new String(store1.get(path2).join().get().getValue()), "\"value-1\"");
         });
     }
+
+    @Test(dataProvider = "impl")
+    public void testCleanUpStateWhenRevalidationGotLockBusy(String provider, Supplier<String> urlSupplier)
+            throws Exception {
+
+        if (provider.equals("Memory") || provider.equals("RocksDB")) {
+            // Local memory provider doesn't really have the concept of multiple sessions
+            return;
+        }
+
+        @Cleanup
+        MetadataStoreExtended store1 = MetadataStoreExtended.create(urlSupplier.get(),
+                MetadataStoreConfig.builder().build());
+        @Cleanup
+        MetadataStoreExtended store2 = MetadataStoreExtended.create(urlSupplier.get(),
+                MetadataStoreConfig.builder().build());
+
+        @Cleanup
+        CoordinationService cs1 = new CoordinationServiceImpl(store1);
+        @Cleanup
+        LockManager<String> lm1 = cs1.getLockManager(String.class);
+
+        @Cleanup
+        CoordinationService cs2 = new CoordinationServiceImpl(store2);
+        @Cleanup
+        LockManager<String> lm2 = cs2.getLockManager(String.class);
+
+        String path1 = newKey();
+
+        ResourceLock<String> lock1 = lm1.acquireLock(path1, "value-1").join();
+        AtomicReference<ResourceLock<String>> lock2 = new AtomicReference<>();
+        // lock 2 will steal the distributed lock first.
+        Awaitility.await().untilAsserted(()-> {
+            // Ensure steal the lock success.
+           lock2.set(lm2.acquireLock(path1, "value-1").join());

Review Comment:
   Thanks, and fixed.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [pulsar] codelipenghui merged pull request #17700: [fix][metadata] Cleanup state when lock revalidation gets `LockBusyException`

Posted by GitBox <gi...@apache.org>.
codelipenghui merged PR #17700:
URL: https://github.com/apache/pulsar/pull/17700


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [pulsar] mattisonchao commented on a diff in pull request #17700: [fix][metadata] Cleanup state when lock revalidation gets `LockBusyException`

Posted by GitBox <gi...@apache.org>.
mattisonchao commented on code in PR #17700:
URL: https://github.com/apache/pulsar/pull/17700#discussion_r973835719


##########
pulsar-metadata/src/test/java/org/apache/pulsar/metadata/LockManagerTest.java:
##########
@@ -293,4 +296,55 @@ public void revalidateLockOnDifferentSession(String provider, Supplier<String> u
             assertEquals(new String(store1.get(path2).join().get().getValue()), "\"value-1\"");
         });
     }
+
+    @Test(dataProvider = "impl")
+    public void testCleanUpStateWhenRevalidationGotLockBusy(String provider, Supplier<String> urlSupplier)
+            throws Exception {
+
+        if (provider.equals("Memory") || provider.equals("RocksDB")) {
+            // Local memory provider doesn't really have the concept of multiple sessions
+            return;
+        }
+
+        @Cleanup
+        MetadataStoreExtended store1 = MetadataStoreExtended.create(urlSupplier.get(),
+                MetadataStoreConfig.builder().build());
+        @Cleanup
+        MetadataStoreExtended store2 = MetadataStoreExtended.create(urlSupplier.get(),
+                MetadataStoreConfig.builder().build());
+
+        @Cleanup
+        CoordinationService cs1 = new CoordinationServiceImpl(store1);
+        @Cleanup
+        LockManager<String> lm1 = cs1.getLockManager(String.class);
+
+        @Cleanup
+        CoordinationService cs2 = new CoordinationServiceImpl(store2);
+        @Cleanup
+        LockManager<String> lm2 = cs2.getLockManager(String.class);
+
+        String path1 = newKey();
+
+        ResourceLock<String> lock1 = lm1.acquireLock(path1, "value-1").join();
+        AtomicReference<ResourceLock<String>> lock2 = new AtomicReference<>();
+        // lock 2 will steal the distributed lock first.

Review Comment:
   When I am writing this test, I found an interesting thing. We can allow the new lock to steal the existing lock that may hold by others(same value). 
   I'm not sure if it's a big problem. You can use this test to verify this behaviour.
   
   Plus, steal lock behaviour may cause an infinity loop when they use the same value or a different value in the same session. the details please see `ResourceLockImpl#doRevalidate`
   
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [pulsar] mattisonchao commented on a diff in pull request #17700: [fix][metadata] Cleanup state when lock revalidation gets `LockBusyException`

Posted by GitBox <gi...@apache.org>.
mattisonchao commented on code in PR #17700:
URL: https://github.com/apache/pulsar/pull/17700#discussion_r974038820


##########
pulsar-metadata/src/test/java/org/apache/pulsar/metadata/LockManagerTest.java:
##########
@@ -293,4 +296,55 @@ public void revalidateLockOnDifferentSession(String provider, Supplier<String> u
             assertEquals(new String(store1.get(path2).join().get().getValue()), "\"value-1\"");
         });
     }
+
+    @Test(dataProvider = "impl")
+    public void testCleanUpStateWhenRevalidationGotLockBusy(String provider, Supplier<String> urlSupplier)
+            throws Exception {
+
+        if (provider.equals("Memory") || provider.equals("RocksDB")) {
+            // Local memory provider doesn't really have the concept of multiple sessions
+            return;
+        }
+
+        @Cleanup
+        MetadataStoreExtended store1 = MetadataStoreExtended.create(urlSupplier.get(),
+                MetadataStoreConfig.builder().build());
+        @Cleanup
+        MetadataStoreExtended store2 = MetadataStoreExtended.create(urlSupplier.get(),
+                MetadataStoreConfig.builder().build());
+
+        @Cleanup
+        CoordinationService cs1 = new CoordinationServiceImpl(store1);
+        @Cleanup
+        LockManager<String> lm1 = cs1.getLockManager(String.class);
+
+        @Cleanup
+        CoordinationService cs2 = new CoordinationServiceImpl(store2);
+        @Cleanup
+        LockManager<String> lm2 = cs2.getLockManager(String.class);
+
+        String path1 = newKey();
+
+        ResourceLock<String> lock1 = lm1.acquireLock(path1, "value-1").join();
+        AtomicReference<ResourceLock<String>> lock2 = new AtomicReference<>();
+        // lock 2 will steal the distributed lock first.
+        Awaitility.await().untilAsserted(()-> {
+            // Ensure steal the lock success.
+           lock2.set(lm2.acquireLock(path1, "value-1").join());

Review Comment:
   Because steal lock may fail and throw an exception. but we need to ensure lock2 steal lock success and then trigger lock1 to revalidate.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [pulsar] mattisonchao commented on a diff in pull request #17700: [fix][metadata] Cleanup state when lock revalidation gets `LockBusyException`

Posted by GitBox <gi...@apache.org>.
mattisonchao commented on code in PR #17700:
URL: https://github.com/apache/pulsar/pull/17700#discussion_r974041872


##########
pulsar-metadata/src/main/java/org/apache/pulsar/metadata/coordination/impl/ResourceLockImpl.java:
##########
@@ -233,6 +217,25 @@ synchronized CompletableFuture<Void> revalidate(T newValue) {
             });
             revalidateFuture = newFuture;
         }
+        revalidateFuture.exceptionally(ex -> {
+            synchronized (ResourceLockImpl.this) {
+                Throwable realCause = FutureUtil.unwrapCompletionException(ex);
+                if (!revalidateAfterReconnection || realCause instanceof BadVersionException
+                        || realCause instanceof LockBusyException) {
+                    log.warn("Failed to revalidate the lock at {}. Marked as expired", path);

Review Comment:
   Sure. and fixed



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [pulsar] mattisonchao commented on a diff in pull request #17700: [fix][metadata] Cleanup state when lock revalidation gets `LockBusyException`

Posted by GitBox <gi...@apache.org>.
mattisonchao commented on code in PR #17700:
URL: https://github.com/apache/pulsar/pull/17700#discussion_r974059590


##########
pulsar-metadata/src/test/java/org/apache/pulsar/metadata/LockManagerTest.java:
##########
@@ -293,4 +296,55 @@ public void revalidateLockOnDifferentSession(String provider, Supplier<String> u
             assertEquals(new String(store1.get(path2).join().get().getValue()), "\"value-1\"");
         });
     }
+
+    @Test(dataProvider = "impl")
+    public void testCleanUpStateWhenRevalidationGotLockBusy(String provider, Supplier<String> urlSupplier)
+            throws Exception {
+
+        if (provider.equals("Memory") || provider.equals("RocksDB")) {
+            // Local memory provider doesn't really have the concept of multiple sessions
+            return;
+        }
+
+        @Cleanup
+        MetadataStoreExtended store1 = MetadataStoreExtended.create(urlSupplier.get(),
+                MetadataStoreConfig.builder().build());
+        @Cleanup
+        MetadataStoreExtended store2 = MetadataStoreExtended.create(urlSupplier.get(),
+                MetadataStoreConfig.builder().build());
+
+        @Cleanup
+        CoordinationService cs1 = new CoordinationServiceImpl(store1);
+        @Cleanup
+        LockManager<String> lm1 = cs1.getLockManager(String.class);
+
+        @Cleanup
+        CoordinationService cs2 = new CoordinationServiceImpl(store2);
+        @Cleanup
+        LockManager<String> lm2 = cs2.getLockManager(String.class);
+
+        String path1 = newKey();
+
+        ResourceLock<String> lock1 = lm1.acquireLock(path1, "value-1").join();
+        AtomicReference<ResourceLock<String>> lock2 = new AtomicReference<>();
+        // lock 2 will steal the distributed lock first.

Review Comment:
   @merlimat  Could you please help to answer @codelipenghui  question?
   I'm not sure about the original design and it looks like we don't have the logic to acquire a resource lock with the same value.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [pulsar] mattisonchao commented on a diff in pull request #17700: [fix][metadata] Cleanup state when lock revalidation gets `LockBusyException`

Posted by GitBox <gi...@apache.org>.
mattisonchao commented on code in PR #17700:
URL: https://github.com/apache/pulsar/pull/17700#discussion_r974780117


##########
pulsar-metadata/src/test/java/org/apache/pulsar/metadata/LockManagerTest.java:
##########
@@ -293,4 +296,55 @@ public void revalidateLockOnDifferentSession(String provider, Supplier<String> u
             assertEquals(new String(store1.get(path2).join().get().getValue()), "\"value-1\"");
         });
     }
+
+    @Test(dataProvider = "impl")
+    public void testCleanUpStateWhenRevalidationGotLockBusy(String provider, Supplier<String> urlSupplier)
+            throws Exception {
+
+        if (provider.equals("Memory") || provider.equals("RocksDB")) {
+            // Local memory provider doesn't really have the concept of multiple sessions
+            return;
+        }
+
+        @Cleanup
+        MetadataStoreExtended store1 = MetadataStoreExtended.create(urlSupplier.get(),
+                MetadataStoreConfig.builder().build());
+        @Cleanup
+        MetadataStoreExtended store2 = MetadataStoreExtended.create(urlSupplier.get(),
+                MetadataStoreConfig.builder().build());
+
+        @Cleanup
+        CoordinationService cs1 = new CoordinationServiceImpl(store1);
+        @Cleanup
+        LockManager<String> lm1 = cs1.getLockManager(String.class);
+
+        @Cleanup
+        CoordinationService cs2 = new CoordinationServiceImpl(store2);
+        @Cleanup
+        LockManager<String> lm2 = cs2.getLockManager(String.class);
+
+        String path1 = newKey();
+
+        ResourceLock<String> lock1 = lm1.acquireLock(path1, "value-1").join();
+        AtomicReference<ResourceLock<String>> lock2 = new AtomicReference<>();
+        // lock 2 will steal the distributed lock first.
+        Awaitility.await().untilAsserted(()-> {
+            // Ensure steal the lock success.
+           lock2.set(lm2.acquireLock(path1, "value-1").join());

Review Comment:
   fixed, PTAL.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [pulsar] mattisonchao commented on pull request #17700: [fix][metadata] Cleanup state when lock revalidation gets `LockBusyException`

Posted by GitBox <gi...@apache.org>.
mattisonchao commented on PR #17700:
URL: https://github.com/apache/pulsar/pull/17700#issuecomment-1250419110

   > If I understand this right, the root cause this case is that we missed handling revalidate failure in `revalidateIfNeededAfterReconnection`. Can we just add it there, like `lockWasInvalidated` did?
   
   I don't want to invoke exception handling everywhere.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [pulsar] mattisonchao commented on a diff in pull request #17700: [fix][metadata] Cleanup state when lock revalidation gets `LockBusyException`

Posted by GitBox <gi...@apache.org>.
mattisonchao commented on code in PR #17700:
URL: https://github.com/apache/pulsar/pull/17700#discussion_r973835719


##########
pulsar-metadata/src/test/java/org/apache/pulsar/metadata/LockManagerTest.java:
##########
@@ -293,4 +296,55 @@ public void revalidateLockOnDifferentSession(String provider, Supplier<String> u
             assertEquals(new String(store1.get(path2).join().get().getValue()), "\"value-1\"");
         });
     }
+
+    @Test(dataProvider = "impl")
+    public void testCleanUpStateWhenRevalidationGotLockBusy(String provider, Supplier<String> urlSupplier)
+            throws Exception {
+
+        if (provider.equals("Memory") || provider.equals("RocksDB")) {
+            // Local memory provider doesn't really have the concept of multiple sessions
+            return;
+        }
+
+        @Cleanup
+        MetadataStoreExtended store1 = MetadataStoreExtended.create(urlSupplier.get(),
+                MetadataStoreConfig.builder().build());
+        @Cleanup
+        MetadataStoreExtended store2 = MetadataStoreExtended.create(urlSupplier.get(),
+                MetadataStoreConfig.builder().build());
+
+        @Cleanup
+        CoordinationService cs1 = new CoordinationServiceImpl(store1);
+        @Cleanup
+        LockManager<String> lm1 = cs1.getLockManager(String.class);
+
+        @Cleanup
+        CoordinationService cs2 = new CoordinationServiceImpl(store2);
+        @Cleanup
+        LockManager<String> lm2 = cs2.getLockManager(String.class);
+
+        String path1 = newKey();
+
+        ResourceLock<String> lock1 = lm1.acquireLock(path1, "value-1").join();
+        AtomicReference<ResourceLock<String>> lock2 = new AtomicReference<>();
+        // lock 2 will steal the distributed lock first.

Review Comment:
   When I writing this test, I found an interesting thing.  we can allow the new lock to steal the existing lock that may hold by others(same value). 
   I'm not sure if it's a big problem, I can paste my analysis later.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [pulsar] mattisonchao commented on a diff in pull request #17700: [fix][metadata] Cleanup state when lock revalidation gets `LockBusyException`

Posted by GitBox <gi...@apache.org>.
mattisonchao commented on code in PR #17700:
URL: https://github.com/apache/pulsar/pull/17700#discussion_r973831241


##########
pulsar-metadata/src/main/java/org/apache/pulsar/metadata/coordination/impl/ResourceLockImpl.java:
##########
@@ -184,38 +185,21 @@ synchronized void lockWasInvalidated() {
         }
 
         log.info("Lock on resource {} was invalidated", path);
-        revalidate(value)
-                .thenRun(() -> log.info("Successfully revalidated the lock on {}", path))
-                .exceptionally(ex -> {
-                    synchronized (ResourceLockImpl.this) {
-                        if (ex.getCause() instanceof BadVersionException) {
-                            log.warn("Failed to revalidate the lock at {}. Marked as expired", path);
-                            state = State.Released;
-                            expiredFuture.complete(null);
-                        } else {
-                            // We failed to revalidate the lock due to connectivity issue
-                            // Continue assuming we hold the lock, until we can revalidate it, either
-                            // on Reconnected or SessionReestablished events.
-                            revalidateAfterReconnection = true;
-                            log.warn("Failed to revalidate the lock at {}. Retrying later on reconnection {}", path,
-                                    ex.getCause().getMessage());
-                        }
-                    }
-                    return null;
-                });
+        revalidate(value, true)
+                .thenRun(() -> log.info("Successfully revalidated the lock on {}", path));
     }
 
     synchronized CompletableFuture<Void> revalidateIfNeededAfterReconnection() {
         if (revalidateAfterReconnection) {
             revalidateAfterReconnection = false;
             log.warn("Revalidate lock at {} after reconnection", path);
-            return revalidate(value);
+            return revalidate(value, true);
         } else {
             return CompletableFuture.completedFuture(null);
         }
     }
 
-    synchronized CompletableFuture<Void> revalidate(T newValue) {
+    synchronized CompletableFuture<Void> revalidate(T newValue, boolean retryWhenConnectionLost) {

Review Comment:
   Ok @Jason918 



##########
pulsar-metadata/src/main/java/org/apache/pulsar/metadata/coordination/impl/ResourceLockImpl.java:
##########
@@ -184,38 +185,21 @@ synchronized void lockWasInvalidated() {
         }
 
         log.info("Lock on resource {} was invalidated", path);
-        revalidate(value)
-                .thenRun(() -> log.info("Successfully revalidated the lock on {}", path))
-                .exceptionally(ex -> {
-                    synchronized (ResourceLockImpl.this) {
-                        if (ex.getCause() instanceof BadVersionException) {
-                            log.warn("Failed to revalidate the lock at {}. Marked as expired", path);
-                            state = State.Released;
-                            expiredFuture.complete(null);
-                        } else {
-                            // We failed to revalidate the lock due to connectivity issue
-                            // Continue assuming we hold the lock, until we can revalidate it, either
-                            // on Reconnected or SessionReestablished events.
-                            revalidateAfterReconnection = true;
-                            log.warn("Failed to revalidate the lock at {}. Retrying later on reconnection {}", path,
-                                    ex.getCause().getMessage());
-                        }
-                    }
-                    return null;
-                });
+        revalidate(value, true)
+                .thenRun(() -> log.info("Successfully revalidated the lock on {}", path));
     }
 
     synchronized CompletableFuture<Void> revalidateIfNeededAfterReconnection() {
         if (revalidateAfterReconnection) {
             revalidateAfterReconnection = false;
             log.warn("Revalidate lock at {} after reconnection", path);
-            return revalidate(value);
+            return revalidate(value, true);
         } else {
             return CompletableFuture.completedFuture(null);
         }
     }
 
-    synchronized CompletableFuture<Void> revalidate(T newValue) {
+    synchronized CompletableFuture<Void> revalidate(T newValue, boolean retryWhenConnectionLost) {

Review Comment:
   Ok



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [pulsar] mattisonchao commented on a diff in pull request #17700: [fix][metadata] Cleanup state when lock revalidation gets `LockBusyException`

Posted by GitBox <gi...@apache.org>.
mattisonchao commented on code in PR #17700:
URL: https://github.com/apache/pulsar/pull/17700#discussion_r973831103


##########
pulsar-metadata/src/main/java/org/apache/pulsar/metadata/coordination/impl/ResourceLockImpl.java:
##########
@@ -233,6 +217,30 @@ synchronized CompletableFuture<Void> revalidate(T newValue) {
             });
             revalidateFuture = newFuture;
         }
+        revalidateFuture.exceptionally(ex -> {
+            synchronized (ResourceLockImpl.this) {
+                if (!retryWhenConnectionLost) {

Review Comment:
   Sure, and fixed.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [pulsar] mattisonchao commented on a diff in pull request #17700: [fix][metadata] Complete expire future when revalidate got `LockBusyException`

Posted by GitBox <gi...@apache.org>.
mattisonchao commented on code in PR #17700:
URL: https://github.com/apache/pulsar/pull/17700#discussion_r973535652


##########
pulsar-metadata/src/main/java/org/apache/pulsar/metadata/coordination/impl/ResourceLockImpl.java:
##########
@@ -232,6 +217,23 @@ synchronized CompletableFuture<Void> revalidate(T newValue) {
             });
             revalidateFuture = newFuture;
         }
+        revalidateFuture.exceptionally(ex -> {

Review Comment:
   Catch exceptions to clean up the internal state.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [pulsar] mattisonchao commented on a diff in pull request #17700: [fix][metadata] Cleanup state when lock revalidation gets `LockBusyException`

Posted by GitBox <gi...@apache.org>.
mattisonchao commented on code in PR #17700:
URL: https://github.com/apache/pulsar/pull/17700#discussion_r974059590


##########
pulsar-metadata/src/test/java/org/apache/pulsar/metadata/LockManagerTest.java:
##########
@@ -293,4 +296,55 @@ public void revalidateLockOnDifferentSession(String provider, Supplier<String> u
             assertEquals(new String(store1.get(path2).join().get().getValue()), "\"value-1\"");
         });
     }
+
+    @Test(dataProvider = "impl")
+    public void testCleanUpStateWhenRevalidationGotLockBusy(String provider, Supplier<String> urlSupplier)
+            throws Exception {
+
+        if (provider.equals("Memory") || provider.equals("RocksDB")) {
+            // Local memory provider doesn't really have the concept of multiple sessions
+            return;
+        }
+
+        @Cleanup
+        MetadataStoreExtended store1 = MetadataStoreExtended.create(urlSupplier.get(),
+                MetadataStoreConfig.builder().build());
+        @Cleanup
+        MetadataStoreExtended store2 = MetadataStoreExtended.create(urlSupplier.get(),
+                MetadataStoreConfig.builder().build());
+
+        @Cleanup
+        CoordinationService cs1 = new CoordinationServiceImpl(store1);
+        @Cleanup
+        LockManager<String> lm1 = cs1.getLockManager(String.class);
+
+        @Cleanup
+        CoordinationService cs2 = new CoordinationServiceImpl(store2);
+        @Cleanup
+        LockManager<String> lm2 = cs2.getLockManager(String.class);
+
+        String path1 = newKey();
+
+        ResourceLock<String> lock1 = lm1.acquireLock(path1, "value-1").join();
+        AtomicReference<ResourceLock<String>> lock2 = new AtomicReference<>();
+        // lock 2 will steal the distributed lock first.

Review Comment:
   @merlimat  Can you help to confirm?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [pulsar] mattisonchao commented on a diff in pull request #17700: [fix][metadata] Cleanup state when lock revalidation gets `LockBusyException`

Posted by GitBox <gi...@apache.org>.
mattisonchao commented on code in PR #17700:
URL: https://github.com/apache/pulsar/pull/17700#discussion_r974041772


##########
pulsar-metadata/src/test/java/org/apache/pulsar/metadata/LockManagerTest.java:
##########
@@ -293,4 +296,55 @@ public void revalidateLockOnDifferentSession(String provider, Supplier<String> u
             assertEquals(new String(store1.get(path2).join().get().getValue()), "\"value-1\"");
         });
     }
+
+    @Test(dataProvider = "impl")
+    public void testCleanUpStateWhenRevalidationGotLockBusy(String provider, Supplier<String> urlSupplier)
+            throws Exception {
+
+        if (provider.equals("Memory") || provider.equals("RocksDB")) {
+            // Local memory provider doesn't really have the concept of multiple sessions
+            return;
+        }
+
+        @Cleanup
+        MetadataStoreExtended store1 = MetadataStoreExtended.create(urlSupplier.get(),
+                MetadataStoreConfig.builder().build());
+        @Cleanup
+        MetadataStoreExtended store2 = MetadataStoreExtended.create(urlSupplier.get(),
+                MetadataStoreConfig.builder().build());
+
+        @Cleanup
+        CoordinationService cs1 = new CoordinationServiceImpl(store1);
+        @Cleanup
+        LockManager<String> lm1 = cs1.getLockManager(String.class);
+
+        @Cleanup
+        CoordinationService cs2 = new CoordinationServiceImpl(store2);
+        @Cleanup
+        LockManager<String> lm2 = cs2.getLockManager(String.class);
+
+        String path1 = newKey();
+
+        ResourceLock<String> lock1 = lm1.acquireLock(path1, "value-1").join();
+        AtomicReference<ResourceLock<String>> lock2 = new AtomicReference<>();
+        // lock 2 will steal the distributed lock first.

Review Comment:
   Since the current logic can support stealing locks, I just use it to make the tests easier to write. otherwise, I have to use complex operations to do it.



##########
pulsar-metadata/src/main/java/org/apache/pulsar/metadata/coordination/impl/ResourceLockImpl.java:
##########
@@ -233,6 +217,25 @@ synchronized CompletableFuture<Void> revalidate(T newValue) {
             });
             revalidateFuture = newFuture;
         }
+        revalidateFuture.exceptionally(ex -> {
+            synchronized (ResourceLockImpl.this) {
+                Throwable realCause = FutureUtil.unwrapCompletionException(ex);
+                if (!revalidateAfterReconnection || realCause instanceof BadVersionException
+                        || realCause instanceof LockBusyException) {
+                    log.warn("Failed to revalidate the lock at {}. Marked as expired", path);

Review Comment:
   Sure.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [pulsar] codelipenghui commented on a diff in pull request #17700: [fix][metadata] Cleanup state when lock revalidation gets `LockBusyException`

Posted by GitBox <gi...@apache.org>.
codelipenghui commented on code in PR #17700:
URL: https://github.com/apache/pulsar/pull/17700#discussion_r974094454


##########
pulsar-metadata/src/test/java/org/apache/pulsar/metadata/LockManagerTest.java:
##########
@@ -293,4 +296,55 @@ public void revalidateLockOnDifferentSession(String provider, Supplier<String> u
             assertEquals(new String(store1.get(path2).join().get().getValue()), "\"value-1\"");
         });
     }
+
+    @Test(dataProvider = "impl")
+    public void testCleanUpStateWhenRevalidationGotLockBusy(String provider, Supplier<String> urlSupplier)
+            throws Exception {
+
+        if (provider.equals("Memory") || provider.equals("RocksDB")) {
+            // Local memory provider doesn't really have the concept of multiple sessions
+            return;
+        }
+
+        @Cleanup
+        MetadataStoreExtended store1 = MetadataStoreExtended.create(urlSupplier.get(),
+                MetadataStoreConfig.builder().build());
+        @Cleanup
+        MetadataStoreExtended store2 = MetadataStoreExtended.create(urlSupplier.get(),
+                MetadataStoreConfig.builder().build());
+
+        @Cleanup
+        CoordinationService cs1 = new CoordinationServiceImpl(store1);
+        @Cleanup
+        LockManager<String> lm1 = cs1.getLockManager(String.class);
+
+        @Cleanup
+        CoordinationService cs2 = new CoordinationServiceImpl(store2);
+        @Cleanup
+        LockManager<String> lm2 = cs2.getLockManager(String.class);
+
+        String path1 = newKey();
+
+        ResourceLock<String> lock1 = lm1.acquireLock(path1, "value-1").join();
+        AtomicReference<ResourceLock<String>> lock2 = new AtomicReference<>();
+        // lock 2 will steal the distributed lock first.
+        Awaitility.await().untilAsserted(()-> {
+            // Ensure steal the lock success.
+           lock2.set(lm2.acquireLock(path1, "value-1").join());

Review Comment:
   But `untilAsserted` will not handle all exceptions.
   
   For example:
   
   ```java
       @Test
       public void testA() {
           Awaitility.await().untilAsserted(() -> {
               throw new LockBusyException();
           });
       }
   ```
   
   ```
   java.lang.RuntimeException
   	at org.apache.pulsar.metadata.LockManagerTest.lambda$testA$4(LockManagerTest.java:354)
   	at org.awaitility.core.AssertionCondition.lambda$new$0(AssertionCondition.java:53)
   	at org.awaitility.core.ConditionAwaiter$ConditionPoller.call(ConditionAwaiter.java:248)
   	at org.awaitility.core.ConditionAwaiter$ConditionPoller.call(ConditionAwaiter.java:235)
   	at java.base/java.util.concurrent.FutureTask.run$$$capture(FutureTask.java:264)
   	at java.base/java.util.concurrent.FutureTask.run(FutureTask.java)
   	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
   	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
   	at java.base/java.lang.Thread.run(Thread.java:833)
   ```
   
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@pulsar.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org