You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@jackrabbit.apache.org by GitBox <gi...@apache.org> on 2021/04/22 19:44:54 UTC

[GitHub] [jackrabbit-oak] Joscorbe opened a new pull request #289: OAK-9401: Avoid calling getLeaseTime on inactive cluster.

Joscorbe opened a new pull request #289:
URL: https://github.com/apache/jackrabbit-oak/pull/289


   Implemented a fix to avoid calling getLeaseTime in the method tryBreakRecoveryLock when the cluster is not active, as it could be null and this throw a NullPointerException.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [jackrabbit-oak] mreutegg closed pull request #289: OAK-9401: Avoid calling getLeaseTime on inactive cluster.

Posted by GitBox <gi...@apache.org>.
mreutegg closed pull request #289:
URL: https://github.com/apache/jackrabbit-oak/pull/289


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [jackrabbit-oak] Joscorbe commented on pull request #289: OAK-9401: Avoid calling getLeaseTime on inactive cluster.

Posted by GitBox <gi...@apache.org>.
Joscorbe commented on pull request #289:
URL: https://github.com/apache/jackrabbit-oak/pull/289#issuecomment-834376813


   Thank you for reviewing @mreutegg, I have applied all the proposed changes.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [jackrabbit-oak] mreutegg commented on pull request #289: OAK-9401: Avoid calling getLeaseTime on inactive cluster.

Posted by GitBox <gi...@apache.org>.
mreutegg commented on pull request #289:
URL: https://github.com/apache/jackrabbit-oak/pull/289#issuecomment-833652073


   There also another suggestion I have, though it is unrelated to the issue.
   
   The executor service should be stopped, otherwise its pooled threads will linger around and use memory.
   
   ```
       public void after() {
           ClusterNodeInfo.resetClockToDefault();
           new ExecutorCloser(executor).close();
       }
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [jackrabbit-oak] mreutegg commented on a change in pull request #289: OAK-9401: Avoid calling getLeaseTime on inactive cluster.

Posted by GitBox <gi...@apache.org>.
mreutegg commented on a change in pull request #289:
URL: https://github.com/apache/jackrabbit-oak/pull/289#discussion_r627567459



##########
File path: oak-store-document/src/test/java/org/apache/jackrabbit/oak/plugins/document/RecoveryLockTest.java
##########
@@ -182,6 +185,49 @@ public void selfRecoveryWithinDeadline() throws Exception {
         assertFalse(lock1.acquireRecoveryLock(2));
     }
 
+    @Test
+    // OAK-9401: Reproduce a bug that happens when the cluster is not active having a null leaseEndTime
+    public void breakRecoveryLockOfNotActiveCluster() throws Exception {
+        // Create a mocked version of the DocumentStore, to replace a method later during the test
+        DocumentStore store = spy(new MemoryDocumentStore());
+
+        info1 = ClusterNodeInfo.getInstance(store, RecoveryHandler.NOOP,
+                null, "node1", 1);
+        RecoveryLock recLock = new RecoveryLock(store, clock, 1);
+
+        // expire clusterId 1
+        clock.waitUntil(info1.getLeaseEndTime() + DEFAULT_LEASE_UPDATE_INTERVAL_MILLIS);
+        MissingLastRevSeeker seeker = new MissingLastRevSeeker(store, clock);
+
+        Semaphore recovering = new Semaphore(0);
+        Semaphore recovered = new Semaphore(0);
+        // simulate new startup and get info again
+        Future<ClusterNodeInfo> infoFuture = executor.submit(() ->
+                ClusterNodeInfo.getInstance(store, clusterId -> {
+                    assertTrue(recLock.acquireRecoveryLock(1));
+                    recovering.release();
+                    recovered.acquireUninterruptibly();
+                    recLock.releaseRecoveryLock(true);
+                    return true;
+        }, null, "node1", 1));
+        // wait until submitted task is in recovery
+        recovering.acquireUninterruptibly();
+
+        // OAK-9401: Reproduce a bug that happens when the cluster is not active having a null leaseEndTime
+        // create a mocked copy of the ClusterNodeInfoDocument to be able to edit it, as the original is immutable
+        ClusterNodeInfoDocument realClusterInfo = spy(store.find(CLUSTER_NODES, String.valueOf(1)));
+        ClusterNodeInfoDocument mockClusterInfo = spy(CLUSTER_NODES.newDocument(store));
+        realClusterInfo.deepCopy(mockClusterInfo);
+
+        mockClusterInfo.put(ClusterNodeInfo.LEASE_END_KEY, null);
+        doReturn(false).when(mockClusterInfo).isActive();
+        when(store.find(CLUSTER_NODES, String.valueOf(1))).thenCallRealMethod().thenReturn(mockClusterInfo);
+
+        // clusterId 2 should be able to acquire (break) the recovery lock, instead of 
+        // throwing "java.lang.NullPointerException: Lease End Time not set"
+        assertTrue(recLock.acquireRecoveryLock(2));
+    }

Review comment:
       IIUC, the submitted task is still blocked when the test finishes.
   ```suggestion
   
           // let submitted task complete
           recovered.release();
       }
   ```

##########
File path: oak-store-document/src/test/java/org/apache/jackrabbit/oak/plugins/document/RecoveryLockTest.java
##########
@@ -182,6 +185,49 @@ public void selfRecoveryWithinDeadline() throws Exception {
         assertFalse(lock1.acquireRecoveryLock(2));
     }
 
+    @Test
+    // OAK-9401: Reproduce a bug that happens when the cluster is not active having a null leaseEndTime
+    public void breakRecoveryLockOfNotActiveCluster() throws Exception {
+        // Create a mocked version of the DocumentStore, to replace a method later during the test
+        DocumentStore store = spy(new MemoryDocumentStore());
+
+        info1 = ClusterNodeInfo.getInstance(store, RecoveryHandler.NOOP,
+                null, "node1", 1);
+        RecoveryLock recLock = new RecoveryLock(store, clock, 1);
+
+        // expire clusterId 1
+        clock.waitUntil(info1.getLeaseEndTime() + DEFAULT_LEASE_UPDATE_INTERVAL_MILLIS);
+        MissingLastRevSeeker seeker = new MissingLastRevSeeker(store, clock);

Review comment:
       This is never used. Remove?
   ```suggestion
   ```

##########
File path: oak-store-document/src/test/java/org/apache/jackrabbit/oak/plugins/document/RecoveryLockTest.java
##########
@@ -182,6 +185,49 @@ public void selfRecoveryWithinDeadline() throws Exception {
         assertFalse(lock1.acquireRecoveryLock(2));
     }
 
+    @Test
+    // OAK-9401: Reproduce a bug that happens when the cluster is not active having a null leaseEndTime
+    public void breakRecoveryLockOfNotActiveCluster() throws Exception {
+        // Create a mocked version of the DocumentStore, to replace a method later during the test
+        DocumentStore store = spy(new MemoryDocumentStore());
+
+        info1 = ClusterNodeInfo.getInstance(store, RecoveryHandler.NOOP,
+                null, "node1", 1);
+        RecoveryLock recLock = new RecoveryLock(store, clock, 1);
+
+        // expire clusterId 1
+        clock.waitUntil(info1.getLeaseEndTime() + DEFAULT_LEASE_UPDATE_INTERVAL_MILLIS);
+        MissingLastRevSeeker seeker = new MissingLastRevSeeker(store, clock);
+
+        Semaphore recovering = new Semaphore(0);
+        Semaphore recovered = new Semaphore(0);
+        // simulate new startup and get info again
+        Future<ClusterNodeInfo> infoFuture = executor.submit(() ->

Review comment:
       Similar here. `infoFuture` is not used later on.
   ```suggestion
           executor.submit(() ->
   ```




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org