You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@jackrabbit.apache.org by GitBox <gi...@apache.org> on 2021/09/03 13:11:06 UTC

[GitHub] [jackrabbit-oak] Joscorbe opened a new pull request #359: Make the renewLease mechanism a bit more tolerant against unexpected updates

Joscorbe opened a new pull request #359:
URL: https://github.com/apache/jackrabbit-oak/pull/359


   - Added a new RuntimeID field to ensure the update is coming from the same ClusterNode.
   - Added a new condition to ensure not older updates are executed.
   
   If an update comes from the same RuntimeID and it's newer, we will still accept it even if it was not expected at that moment.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@jackrabbit.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [jackrabbit-oak] mreutegg merged pull request #359: OAK-9564: Make the renewLease mechanism a bit more tolerant against unexpected updates

Posted by GitBox <gi...@apache.org>.
mreutegg merged pull request #359:
URL: https://github.com/apache/jackrabbit-oak/pull/359


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@jackrabbit.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [jackrabbit-oak] mreutegg commented on pull request #359: Make the renewLease mechanism a bit more tolerant against unexpected updates

Posted by GitBox <gi...@apache.org>.
mreutegg commented on pull request #359:
URL: https://github.com/apache/jackrabbit-oak/pull/359#issuecomment-914286471


   Introducing the `runtimeId` makes `previousLeaseEndTime` obsolete. The latter can now be removed from `ClusterNodeInfo`.
   
   https://github.com/apache/jackrabbit-oak/pull/359/files#diff-f9b067ac268deaf9345ecba3c52f7028bb596a533dd47d902dc9c0d8e29c2646R311


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@jackrabbit.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [jackrabbit-oak] Joscorbe commented on a change in pull request #359: OAK-9564: Make the renewLease mechanism a bit more tolerant against unexpected updates

Posted by GitBox <gi...@apache.org>.
Joscorbe commented on a change in pull request #359:
URL: https://github.com/apache/jackrabbit-oak/pull/359#discussion_r711947759



##########
File path: oak-store-document/src/main/java/org/apache/jackrabbit/oak/plugins/document/UpdateOp.java
##########
@@ -557,6 +590,16 @@ public static Condition newNotEqualsCondition(@Nullable Object value) {
             return new Condition(Type.NOTEQUALS, value);
         }
 
+        /**
+         * Creates a new lessThan condition with the given value.
+         *
+         * @param value the value to compare t
+         * @return the lessThan condition.
+         */
+        public static Condition newLessThanCondition(@Nullable Object value) {

Review comment:
       I have removed all the logic related with "lessThan", as now I'm using UpdateOp.max() as you proposed.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@jackrabbit.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [jackrabbit-oak] mreutegg commented on pull request #359: OAK-9564: Make the renewLease mechanism a bit more tolerant against unexpected updates

Posted by GitBox <gi...@apache.org>.
mreutegg commented on pull request #359:
URL: https://github.com/apache/jackrabbit-oak/pull/359#issuecomment-931961913


   There are test failures with the current changes in the PR: https://ci-builds.apache.org/job/Jackrabbit/job/oak-trunk-pr/job/PR-359/3/testReport/junit/org.apache.jackrabbit.oak.plugins.document/?cloudbees-analytics-link=scm-reporting%2Ftests%2Ffailed
   
   I can also reproduce them locally. Can you please analyze why they fail and whether they need to be updated because some assumptions no longer hold?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@jackrabbit.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [jackrabbit-oak] mreutegg commented on a change in pull request #359: OAK-9564: Make the renewLease mechanism a bit more tolerant against unexpected updates

Posted by GitBox <gi...@apache.org>.
mreutegg commented on a change in pull request #359:
URL: https://github.com/apache/jackrabbit-oak/pull/359#discussion_r719984259



##########
File path: oak-store-document/src/test/java/org/apache/jackrabbit/oak/plugins/document/ClusterNodeInfoTest.java
##########
@@ -269,6 +271,110 @@ public void renewLeaseTimedOutWithCheck() throws Exception {
         }
     }
 
+    // OAK-9564
+    @Test
+    public void renewLeaseSameRuntimeId() throws Exception {
+        ClusterNodeInfo info = newClusterNodeInfo(1);
+        String runtimeId = info.getRuntimeId();
+        long leaseEnd = info.getLeaseEndTime();
+        waitLeaseUpdateInterval();
+        assertTrue(info.renewLease());
+        assertTrue(info.getLeaseEndTime() > leaseEnd);
+        // The Runtime UUID should remain the same
+        assertEquals(info.getRuntimeId(), runtimeId);
+        assertFalse(handler.isLeaseFailure());
+    }
+
+    // OAK-9564
+    @Test
+    public void renewLeaseDifferentRuntimeId() throws Exception {
+        ClusterNodeInfo info = newClusterNodeInfo(1);
+        waitLeaseUpdateInterval();
+        long leaseEndTimeBeforeRenew = info.getLeaseEndTime();
+
+        // Modify the UUID to mock it belongs to a different node
+        UpdateOp update = new UpdateOp("1", false);
+        update.set(ClusterNodeInfo.RUNTIME_ID_KEY, "different-uuid");
+        store.findAndUpdate(Collection.CLUSTER_NODES, update);
+
+        try {
+            info.renewLease();
+            fail("Could not update lease anymore");
+        } catch(DocumentStoreException e) {
+            // expected
+        }
+
+        // Lease end time shouldn't be different
+        assertEquals(leaseEndTimeBeforeRenew, info.getLeaseEndTime());
+    }
+
+    // OAK-9564
+    @Test
+    public void renewLeaseTakingLongerThanTimeout() throws Exception {
+        ClusterNodeInfo info = newClusterNodeInfo(1);
+        waitLeaseUpdateInterval();
+        final long leaseEndTimeBeforeRenew = info.getLeaseEndTime();
+        final String runtimeId = info.getRuntimeId();
+
+        Map<String, Long> unexpectedLeaseEnd = new HashMap<>();
+        long unexpectedLeaseEndTime = info.getLeaseEndTime() + 133333;
+        unexpectedLeaseEnd.put(ClusterNodeInfo.LEASE_END_KEY, unexpectedLeaseEndTime);
+
+        // The update will fail after 30 seconds. Simulating a Mongo timeout.
+        store.setFailAfterUpdate(1);
+        store.setDelayMillisOnce(30000);
+        store.setDelayMillis(10000);
+        store.setFindShouldAlterReturnDocument(true);
+        // However, the following find after the update will return an
+        // unexpected lease time (but still within a valid time).
+        // This unexpected update could come from a previous but very slow update
+        // executed in Mongo. So it's still a valid one, but not the new one
+        // that is expected.
+        store.setMapAlterReturnDocument(unexpectedLeaseEnd);
+
+        // However, the current behaviour is that as the lease end time doesn't
+        // match the expected one, the lease will fail and the nodeStore becomes
+        // unusable.
+        try {
+            info.renewLease();
+        } catch(DocumentStoreException e) {
+            // expected
+        }
+
+        // The new leaseEndTime coming from Mongo is not reflected in the
+        // ClusterNodeInfo. Meaning it will eventually be treated as 'expired'
+        // by the DocumentNodeStore, even when in Mongo it was set.
+        assertThat(leaseEndTimeBeforeRenew, lessThan(info.getLeaseEndTime()));
+        assertEquals(unexpectedLeaseEndTime, info.getLeaseEndTime());
+        // Runtime ID is the same
+        assertEquals(runtimeId, info.getRuntimeId());
+    }
+
+    // OAK-9564: This is a someway artificial test. The idea behind is to try to reproduce
+    // a case where a renewLease fails because of a timeout. Then the following renewLease
+    // occurs faster, but during that time the previous update is executed in Mongo.
+    // That 'older' update shouldn't go through now, reducing the effective lease end time.
+    @Test
+    public void renewLeaseShouldNotGoBackInTime() throws Exception {
+        ClusterNodeInfo info = newClusterNodeInfo(1);
+        long leaseEnd = info.getLeaseEndTime();
+        waitLeaseUpdateInterval();
+
+        long newerLeaseEndTime = clock.getTime() + ClusterNodeInfo.DEFAULT_LEASE_DURATION_MILLIS +
+                ClusterNodeInfo.DEFAULT_LEASE_UPDATE_INTERVAL_MILLIS;
+        // simulate a newer renew lease took place
+        UpdateOp update = new UpdateOp("1", false);
+        update.set(ClusterNodeInfo.LEASE_END_KEY, newerLeaseEndTime);
+        store.findAndUpdate(Collection.CLUSTER_NODES, update);
+
+        // now another renew happens, which will try to set a lesser lease end
+        info.renewLease();
+
+        Document info2 = store.find(Collection.CLUSTER_NODES, "1");
+        // the lease end time should remain the same
+        assertEquals(newerLeaseEndTime, info2.get(ClusterNodeInfo.LEASE_END_KEY));

Review comment:
       Instead you can also do:
   ```suggestion
           ClusterNodeInfoDocument info2 = store.find(Collection.CLUSTER_NODES, "1");
           assertNotNull(info2);
           // the lease end time should remain the same
           assertEquals(newerLeaseEndTime, info2.getLeaseEndTime());
   ```

##########
File path: oak-store-document/src/main/java/org/apache/jackrabbit/oak/plugins/document/ClusterNodeInfo.java
##########
@@ -1067,10 +1067,9 @@ public boolean renewLease() throws DocumentStoreException {
                             "anymore. {}", id, doc);
                     // break here and let the next lease update attempt fail
                     break;
-                } else if (doc.getLeaseEndTime() == previousLeaseEndTime
-                        || doc.getLeaseEndTime() == updatedLeaseEndTime) {
-                    // set lease end times to current values
-                    previousLeaseEndTime = doc.getLeaseEndTime();
+                } else if (doc.getRuntimeId().equals(runtimeId)) {

Review comment:
       This may produce a NullPointerException. `doc.getRuntimeId()` may return `null`. Better flip it around.
   ```suggestion
                   } else if (runtimeId.equals(doc.getRuntimeId())) {
   ```

##########
File path: oak-store-document/src/test/java/org/apache/jackrabbit/oak/plugins/document/ClusterNodeInfoTest.java
##########
@@ -269,6 +271,110 @@ public void renewLeaseTimedOutWithCheck() throws Exception {
         }
     }
 
+    // OAK-9564
+    @Test
+    public void renewLeaseSameRuntimeId() throws Exception {
+        ClusterNodeInfo info = newClusterNodeInfo(1);
+        String runtimeId = info.getRuntimeId();
+        long leaseEnd = info.getLeaseEndTime();
+        waitLeaseUpdateInterval();
+        assertTrue(info.renewLease());
+        assertTrue(info.getLeaseEndTime() > leaseEnd);
+        // The Runtime UUID should remain the same
+        assertEquals(info.getRuntimeId(), runtimeId);
+        assertFalse(handler.isLeaseFailure());
+    }
+
+    // OAK-9564
+    @Test
+    public void renewLeaseDifferentRuntimeId() throws Exception {
+        ClusterNodeInfo info = newClusterNodeInfo(1);
+        waitLeaseUpdateInterval();
+        long leaseEndTimeBeforeRenew = info.getLeaseEndTime();
+
+        // Modify the UUID to mock it belongs to a different node
+        UpdateOp update = new UpdateOp("1", false);
+        update.set(ClusterNodeInfo.RUNTIME_ID_KEY, "different-uuid");
+        store.findAndUpdate(Collection.CLUSTER_NODES, update);
+
+        try {
+            info.renewLease();
+            fail("Could not update lease anymore");

Review comment:
       ```suggestion
               fail("Should not update lease anymore");
   ```

##########
File path: oak-store-document/src/test/java/org/apache/jackrabbit/oak/plugins/document/ClusterNodeInfoTest.java
##########
@@ -269,6 +271,110 @@ public void renewLeaseTimedOutWithCheck() throws Exception {
         }
     }
 
+    // OAK-9564
+    @Test
+    public void renewLeaseSameRuntimeId() throws Exception {
+        ClusterNodeInfo info = newClusterNodeInfo(1);
+        String runtimeId = info.getRuntimeId();
+        long leaseEnd = info.getLeaseEndTime();
+        waitLeaseUpdateInterval();
+        assertTrue(info.renewLease());
+        assertTrue(info.getLeaseEndTime() > leaseEnd);
+        // The Runtime UUID should remain the same
+        assertEquals(info.getRuntimeId(), runtimeId);
+        assertFalse(handler.isLeaseFailure());
+    }
+
+    // OAK-9564
+    @Test
+    public void renewLeaseDifferentRuntimeId() throws Exception {
+        ClusterNodeInfo info = newClusterNodeInfo(1);
+        waitLeaseUpdateInterval();
+        long leaseEndTimeBeforeRenew = info.getLeaseEndTime();
+
+        // Modify the UUID to mock it belongs to a different node
+        UpdateOp update = new UpdateOp("1", false);
+        update.set(ClusterNodeInfo.RUNTIME_ID_KEY, "different-uuid");
+        store.findAndUpdate(Collection.CLUSTER_NODES, update);
+
+        try {
+            info.renewLease();
+            fail("Could not update lease anymore");
+        } catch(DocumentStoreException e) {
+            // expected
+        }
+
+        // Lease end time shouldn't be different
+        assertEquals(leaseEndTimeBeforeRenew, info.getLeaseEndTime());
+    }
+
+    // OAK-9564
+    @Test
+    public void renewLeaseTakingLongerThanTimeout() throws Exception {
+        ClusterNodeInfo info = newClusterNodeInfo(1);
+        waitLeaseUpdateInterval();
+        final long leaseEndTimeBeforeRenew = info.getLeaseEndTime();
+        final String runtimeId = info.getRuntimeId();
+
+        Map<String, Long> unexpectedLeaseEnd = new HashMap<>();
+        long unexpectedLeaseEndTime = info.getLeaseEndTime() + 133333;
+        unexpectedLeaseEnd.put(ClusterNodeInfo.LEASE_END_KEY, unexpectedLeaseEndTime);
+
+        // The update will fail after 30 seconds. Simulating a Mongo timeout.
+        store.setFailAfterUpdate(1);
+        store.setDelayMillisOnce(30000);
+        store.setDelayMillis(10000);
+        store.setFindShouldAlterReturnDocument(true);
+        // However, the following find after the update will return an
+        // unexpected lease time (but still within a valid time).
+        // This unexpected update could come from a previous but very slow update
+        // executed in Mongo. So it's still a valid one, but not the new one
+        // that is expected.
+        store.setMapAlterReturnDocument(unexpectedLeaseEnd);
+
+        // However, the current behaviour is that as the lease end time doesn't
+        // match the expected one, the lease will fail and the nodeStore becomes
+        // unusable.
+        try {
+            info.renewLease();
+        } catch(DocumentStoreException e) {
+            // expected
+        }
+
+        // The new leaseEndTime coming from Mongo is not reflected in the
+        // ClusterNodeInfo. Meaning it will eventually be treated as 'expired'
+        // by the DocumentNodeStore, even when in Mongo it was set.
+        assertThat(leaseEndTimeBeforeRenew, lessThan(info.getLeaseEndTime()));
+        assertEquals(unexpectedLeaseEndTime, info.getLeaseEndTime());
+        // Runtime ID is the same
+        assertEquals(runtimeId, info.getRuntimeId());
+    }
+
+    // OAK-9564: This is a someway artificial test. The idea behind is to try to reproduce
+    // a case where a renewLease fails because of a timeout. Then the following renewLease
+    // occurs faster, but during that time the previous update is executed in Mongo.
+    // That 'older' update shouldn't go through now, reducing the effective lease end time.
+    @Test
+    public void renewLeaseShouldNotGoBackInTime() throws Exception {
+        ClusterNodeInfo info = newClusterNodeInfo(1);
+        long leaseEnd = info.getLeaseEndTime();

Review comment:
       `leaseEnd` is unused -> remove
   ```suggestion
   ```




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@jackrabbit.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [jackrabbit-oak] Joscorbe commented on a change in pull request #359: OAK-9564: Make the renewLease mechanism a bit more tolerant against unexpected updates

Posted by GitBox <gi...@apache.org>.
Joscorbe commented on a change in pull request #359:
URL: https://github.com/apache/jackrabbit-oak/pull/359#discussion_r711947384



##########
File path: oak-store-document/src/main/java/org/apache/jackrabbit/oak/plugins/document/UpdateUtils.java
##########
@@ -150,6 +150,18 @@ public static boolean checkConditions(@NotNull Document doc,
                 } else if (c.type == Condition.Type.NOTEQUALS && equal) {
                     return false;
                 }
+            } else if (c.type == Condition.Type.LESSTHAN) {
+                if (r != null) {
+                    if (value instanceof Map) {
+                        value = ((Map) value).get(r);
+                    } else {
+                        value = null;
+                    }
+                }
+                if (value instanceof Comparable && c.value instanceof Comparable) {
+                    int lessThan = ((Comparable) value).compareTo(c.value);
+                    return lessThan == -1;
+                }

Review comment:
       I have removed all the logic related with "lessThan", as now I'm using UpdateOp.max() as you proposed.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@jackrabbit.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [jackrabbit-oak] Joscorbe commented on pull request #359: OAK-9564: Make the renewLease mechanism a bit more tolerant against unexpected updates

Posted by GitBox <gi...@apache.org>.
Joscorbe commented on pull request #359:
URL: https://github.com/apache/jackrabbit-oak/pull/359#issuecomment-1006772447


   > There are test failures with the current changes in the PR: https://ci-builds.apache.org/job/Jackrabbit/job/oak-trunk-pr/job/PR-359/3/testReport/junit/org.apache.jackrabbit.oak.plugins.document/?cloudbees-analytics-link=scm-reporting%2Ftests%2Ffailed
   > 
   > I can also reproduce them locally. Can you please analyze why they fail and whether they need to be updated because some assumptions no longer hold?
   
   Finally! Could you please review again? @mreutegg 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@jackrabbit.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [jackrabbit-oak] mreutegg commented on a change in pull request #359: Make the renewLease mechanism a bit more tolerant against unexpected updates

Posted by GitBox <gi...@apache.org>.
mreutegg commented on a change in pull request #359:
URL: https://github.com/apache/jackrabbit-oak/pull/359#discussion_r703558877



##########
File path: oak-store-document/src/test/java/org/apache/jackrabbit/oak/plugins/document/ClusterNodeInfoTest.java
##########
@@ -417,6 +526,29 @@ public void skipExpiredClusterIdWithDifferentInstanceId() throws Exception {
         info2.dispose();
     }
 
+    // OAK-9564
+    @Test
+    public void cannotGetClusterWithRuntimeId() {
+        ClusterNodeInfo info = newClusterNodeInfo(0);
+        int id = info.getId();
+        assertEquals(1, id);
+        // shut it down
+        info.dispose();
+
+        // edit the runtime ID
+        UpdateOp op = new UpdateOp(String.valueOf(id), false);
+        op.set(ClusterNodeInfo.RUNTIME_ID_KEY, "some-different-uuid");
+        assertNotNull(store.findAndUpdate(Collection.CLUSTER_NODES, op));
+
+        // shouldn't be able to acquire it again
+        try {
+            info = newClusterNodeInfo(0);

Review comment:
       Should rather try to get the specific entry with the id.
   ```suggestion
               info = newClusterNodeInfo(id);
   ```

##########
File path: oak-store-document/src/test/java/org/apache/jackrabbit/oak/plugins/document/ClusterNodeInfoTest.java
##########
@@ -268,6 +270,113 @@ public void renewLeaseTimedOutWithCheck() throws Exception {
         }
     }
 
+    // OAK-9564
+    @Test
+    public void renewLeaseSameRuntimeId() throws Exception {
+        ClusterNodeInfo info = newClusterNodeInfo(1);
+        String runtimeId = info.getRuntimeId();
+        long leaseEnd = info.getLeaseEndTime();
+        waitLeaseUpdateInterval();
+        assertTrue(info.renewLease());
+        assertTrue(info.getLeaseEndTime() > leaseEnd);
+        // The Runtime UUID should remain the same
+        assertEquals(info.getRuntimeId(), runtimeId);
+        assertFalse(handler.isLeaseFailure());
+    }
+
+    // OAK-9564
+    @Test
+    public void renewLeaseDifferentRuntimeId() throws Exception {
+        ClusterNodeInfo info = newClusterNodeInfo(1);
+        waitLeaseUpdateInterval();
+        long leaseEndTimeBeforeRenew = info.getLeaseEndTime();
+
+        // Modify the UUID to mock it belongs to a different node
+        UpdateOp update = new UpdateOp("1", false);
+        update.set(ClusterNodeInfo.RUNTIME_ID_KEY, "different-uuid");
+        store.findAndUpdate(Collection.CLUSTER_NODES, update);
+
+        try {
+            info.renewLease();
+            fail("Could not update lease anymore");
+        } catch(DocumentStoreException e) {
+            // expected
+        }
+
+        // Lease end time shouldn't be different
+        assertEquals(leaseEndTimeBeforeRenew, info.getLeaseEndTime());
+    }
+
+    // OAK-9564
+    @Test
+    public void renewLeaseTakingLongerThanTimeout() throws Exception {
+        ClusterNodeInfo info = newClusterNodeInfo(1);
+        waitLeaseUpdateInterval();
+        final long leaseEndTimeBeforeRenew = info.getLeaseEndTime();
+        final String runtimeId = info.getRuntimeId();
+
+        Map<String, Long> unexpectedLeaseEnd = new HashMap<>();
+        unexpectedLeaseEnd.put(ClusterNodeInfo.LEASE_END_KEY, info.getLeaseEndTime() + 133333);
+
+        // The update will fail after 30 seconds. Simulating a Mongo timeout.
+        store.setFailAfterUpdate(1);
+        store.setDelayMillisOnce(30000);
+        store.setDelayMillis(10000);
+        store.setFindShouldAlterReturnDocument(true);
+        // However, the following find after the update will return an
+        // unexpected lease time (but still within a valid time).
+        // This unexpected update could come from a previous but very slow update
+        // executed in Mongo. So it's still a valid one, but not the new one
+        // that is expected.
+        store.setMapAlterReturnDocument(unexpectedLeaseEnd);
+
+        // However, the current behaviour is that as the lease end time doesn't
+        // match the expected one, the lease will fail and the nodeStore becomes
+        // unusable.
+        try {
+            info.renewLease();
+        } catch(DocumentStoreException e) {
+            // expected
+        }
+
+        // The new leaseEndTime coming from Mongo is not reflected in the
+        // ClusterNodeInfo. Meaning it will eventually be treated as 'expired'
+        // by the DocumentNodeStore, even when in Mongo it was set.
+        assertThat(leaseEndTimeBeforeRenew, lessThan(info.getLeaseEndTime()));
+        // Runtime ID is the same
+        assertEquals(runtimeId, info.getRuntimeId());
+    }
+
+    // OAK-9564: This is a someway artificial test. The idea behind is to try to reproduce

Review comment:
       Is this the only reason why this PR also introduces a new `UpdateOp.lessThan()` condition on the `leaseEnd` time when it updates the lease? Instead of introducing a new condition, do you think it's possible to use `UpdateOp.max()`? This would also ensure `leaseEnd` does not go back in time.

##########
File path: oak-store-document/src/main/java/org/apache/jackrabbit/oak/plugins/document/ClusterNodeInfoDocument.java
##########
@@ -69,6 +69,13 @@ public long getStartTime() {
         return startTime;
     }
 
+    /**
+     * @return the Runtime ID for this cluster node.
+     */
+    public String getRuntimeId() {

Review comment:
       Please annotate with Nullable.

##########
File path: oak-store-document/src/main/java/org/apache/jackrabbit/oak/plugins/document/ClusterNodeInfo.java
##########
@@ -1066,9 +1084,9 @@ public boolean renewLease() throws DocumentStoreException {
                             "anymore. {}", id, doc);
                     // break here and let the next lease update attempt fail
                     break;
-                } else if (doc.getLeaseEndTime() == previousLeaseEndTime
-                        || doc.getLeaseEndTime() == updatedLeaseEndTime) {
-                    // set lease end times to current values
+                } else if (doc.getRuntimeId().equals(runtimeId)) {
+                    // set lease end times to current values, as they belong
+                    // to this same cluster node
                     previousLeaseEndTime = doc.getLeaseEndTime();

Review comment:
       `previousLeaseEndTime` does not need to be maintained anymore.

##########
File path: oak-store-document/src/test/java/org/apache/jackrabbit/oak/plugins/document/ClusterNodeInfoTest.java
##########
@@ -268,6 +270,113 @@ public void renewLeaseTimedOutWithCheck() throws Exception {
         }
     }
 
+    // OAK-9564
+    @Test
+    public void renewLeaseSameRuntimeId() throws Exception {
+        ClusterNodeInfo info = newClusterNodeInfo(1);
+        String runtimeId = info.getRuntimeId();
+        long leaseEnd = info.getLeaseEndTime();
+        waitLeaseUpdateInterval();
+        assertTrue(info.renewLease());
+        assertTrue(info.getLeaseEndTime() > leaseEnd);
+        // The Runtime UUID should remain the same
+        assertEquals(info.getRuntimeId(), runtimeId);
+        assertFalse(handler.isLeaseFailure());
+    }
+
+    // OAK-9564
+    @Test
+    public void renewLeaseDifferentRuntimeId() throws Exception {
+        ClusterNodeInfo info = newClusterNodeInfo(1);
+        waitLeaseUpdateInterval();
+        long leaseEndTimeBeforeRenew = info.getLeaseEndTime();
+
+        // Modify the UUID to mock it belongs to a different node
+        UpdateOp update = new UpdateOp("1", false);
+        update.set(ClusterNodeInfo.RUNTIME_ID_KEY, "different-uuid");
+        store.findAndUpdate(Collection.CLUSTER_NODES, update);
+
+        try {
+            info.renewLease();
+            fail("Could not update lease anymore");
+        } catch(DocumentStoreException e) {
+            // expected
+        }
+
+        // Lease end time shouldn't be different
+        assertEquals(leaseEndTimeBeforeRenew, info.getLeaseEndTime());
+    }
+
+    // OAK-9564
+    @Test
+    public void renewLeaseTakingLongerThanTimeout() throws Exception {
+        ClusterNodeInfo info = newClusterNodeInfo(1);
+        waitLeaseUpdateInterval();
+        final long leaseEndTimeBeforeRenew = info.getLeaseEndTime();
+        final String runtimeId = info.getRuntimeId();
+
+        Map<String, Long> unexpectedLeaseEnd = new HashMap<>();
+        unexpectedLeaseEnd.put(ClusterNodeInfo.LEASE_END_KEY, info.getLeaseEndTime() + 133333);
+
+        // The update will fail after 30 seconds. Simulating a Mongo timeout.
+        store.setFailAfterUpdate(1);
+        store.setDelayMillisOnce(30000);
+        store.setDelayMillis(10000);
+        store.setFindShouldAlterReturnDocument(true);
+        // However, the following find after the update will return an
+        // unexpected lease time (but still within a valid time).
+        // This unexpected update could come from a previous but very slow update
+        // executed in Mongo. So it's still a valid one, but not the new one
+        // that is expected.
+        store.setMapAlterReturnDocument(unexpectedLeaseEnd);

Review comment:
       These do not seem to have an effect. Values are set, but never used.

##########
File path: oak-store-document/src/main/java/org/apache/jackrabbit/oak/plugins/document/UpdateOp.java
##########
@@ -557,6 +590,16 @@ public static Condition newNotEqualsCondition(@Nullable Object value) {
             return new Condition(Type.NOTEQUALS, value);
         }
 
+        /**
+         * Creates a new lessThan condition with the given value.
+         *
+         * @param value the value to compare t
+         * @return the lessThan condition.
+         */
+        public static Condition newLessThanCondition(@Nullable Object value) {

Review comment:
       What is the semantic when `value` is `null`? I think it would be better if `value` is of type `Comparable<T>`, similar to `UpdateOp.max()`.
   
   I'd also like to better understand if this addition is really necessary. Adding this functionality should also have more test coverage. E.g. additional tests in `BasicDocumentStoreTest`.

##########
File path: oak-store-document/src/main/java/org/apache/jackrabbit/oak/plugins/document/UpdateUtils.java
##########
@@ -150,6 +150,18 @@ public static boolean checkConditions(@NotNull Document doc,
                 } else if (c.type == Condition.Type.NOTEQUALS && equal) {
                     return false;
                 }
+            } else if (c.type == Condition.Type.LESSTHAN) {
+                if (r != null) {
+                    if (value instanceof Map) {
+                        value = ((Map) value).get(r);
+                    } else {
+                        value = null;
+                    }
+                }
+                if (value instanceof Comparable && c.value instanceof Comparable) {
+                    int lessThan = ((Comparable) value).compareTo(c.value);
+                    return lessThan == -1;
+                }

Review comment:
       I think this should not silently ignore cases when value is not Comparable, but this could be enforced earlier.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@jackrabbit.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [jackrabbit-oak] Joscorbe commented on a change in pull request #359: OAK-9564: Make the renewLease mechanism a bit more tolerant against unexpected updates

Posted by GitBox <gi...@apache.org>.
Joscorbe commented on a change in pull request #359:
URL: https://github.com/apache/jackrabbit-oak/pull/359#discussion_r711948344



##########
File path: oak-store-document/src/test/java/org/apache/jackrabbit/oak/plugins/document/ClusterNodeInfoTest.java
##########
@@ -268,6 +270,113 @@ public void renewLeaseTimedOutWithCheck() throws Exception {
         }
     }
 
+    // OAK-9564
+    @Test
+    public void renewLeaseSameRuntimeId() throws Exception {
+        ClusterNodeInfo info = newClusterNodeInfo(1);
+        String runtimeId = info.getRuntimeId();
+        long leaseEnd = info.getLeaseEndTime();
+        waitLeaseUpdateInterval();
+        assertTrue(info.renewLease());
+        assertTrue(info.getLeaseEndTime() > leaseEnd);
+        // The Runtime UUID should remain the same
+        assertEquals(info.getRuntimeId(), runtimeId);
+        assertFalse(handler.isLeaseFailure());
+    }
+
+    // OAK-9564
+    @Test
+    public void renewLeaseDifferentRuntimeId() throws Exception {
+        ClusterNodeInfo info = newClusterNodeInfo(1);
+        waitLeaseUpdateInterval();
+        long leaseEndTimeBeforeRenew = info.getLeaseEndTime();
+
+        // Modify the UUID to mock it belongs to a different node
+        UpdateOp update = new UpdateOp("1", false);
+        update.set(ClusterNodeInfo.RUNTIME_ID_KEY, "different-uuid");
+        store.findAndUpdate(Collection.CLUSTER_NODES, update);
+
+        try {
+            info.renewLease();
+            fail("Could not update lease anymore");
+        } catch(DocumentStoreException e) {
+            // expected
+        }
+
+        // Lease end time shouldn't be different
+        assertEquals(leaseEndTimeBeforeRenew, info.getLeaseEndTime());
+    }
+
+    // OAK-9564
+    @Test
+    public void renewLeaseTakingLongerThanTimeout() throws Exception {
+        ClusterNodeInfo info = newClusterNodeInfo(1);
+        waitLeaseUpdateInterval();
+        final long leaseEndTimeBeforeRenew = info.getLeaseEndTime();
+        final String runtimeId = info.getRuntimeId();
+
+        Map<String, Long> unexpectedLeaseEnd = new HashMap<>();
+        unexpectedLeaseEnd.put(ClusterNodeInfo.LEASE_END_KEY, info.getLeaseEndTime() + 133333);
+
+        // The update will fail after 30 seconds. Simulating a Mongo timeout.
+        store.setFailAfterUpdate(1);
+        store.setDelayMillisOnce(30000);
+        store.setDelayMillis(10000);
+        store.setFindShouldAlterReturnDocument(true);
+        // However, the following find after the update will return an
+        // unexpected lease time (but still within a valid time).
+        // This unexpected update could come from a previous but very slow update
+        // executed in Mongo. So it's still a valid one, but not the new one
+        // that is expected.
+        store.setMapAlterReturnDocument(unexpectedLeaseEnd);

Review comment:
       It seems like some changes were not committed in this file. I have added them and improved a bit this test.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@jackrabbit.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [jackrabbit-oak] mreutegg merged pull request #359: OAK-9564: Make the renewLease mechanism a bit more tolerant against unexpected updates

Posted by GitBox <gi...@apache.org>.
mreutegg merged pull request #359:
URL: https://github.com/apache/jackrabbit-oak/pull/359


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@jackrabbit.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org