You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@jackrabbit.apache.org by GitBox <gi...@apache.org> on 2021/08/30 11:33:25 UTC

[GitHub] [jackrabbit-oak] mreutegg commented on a change in pull request #346: OAK-9535 : support/fix recovery of large branch merge that could breach the 16MB document size restriction on MongoDB

mreutegg commented on a change in pull request #346:
URL: https://github.com/apache/jackrabbit-oak/pull/346#discussion_r698350304



##########
File path: oak-store-document/src/main/java/org/apache/jackrabbit/oak/plugins/document/LastRevRecoveryAgent.java
##########
@@ -309,6 +317,8 @@ public void sweepUpdate(Map<Path, UpdateOp> updates)
         long startOfScan = clock.getTime();
         long lastLog = startOfScan;
 
+        final List<Revision> pseudoBcRevs = new ArrayList<Revision>();

Review comment:
       Can use diamond operator.
   ```suggestion
           final List<Revision> pseudoBcRevs = new ArrayList<>();
   ```

##########
File path: oak-store-document/src/main/java/org/apache/jackrabbit/oak/plugins/document/LastRevRecoveryAgent.java
##########
@@ -364,7 +374,41 @@ public void sweepUpdate(Map<Path, UpdateOp> updates)
                     unsavedParents.put(path, lastRevForParents);
                 }
             }
+            // avoid recalculating the size of the updateOp upon every single path
+            // but also avoid doing it only after we hit the 16MB limit
+            if (changes.getNumChangedNodes() >= nextFlushCheckCount) {
+                final Revision pseudoBcRev = Revision.newRevision(clusterId).asBranchRevision();
+                final UpdateOp pseudoBcUpdateOp = changes.asUpdateOp(pseudoBcRev);
+                final int approxPseudoBcUpdateOpSize = pseudoBcUpdateOp.toString().length();
+                if (approxPseudoBcUpdateOpSize >= PSEUDO_BRANCH_COMMIT_UPDATE_OP_THRESHOLD_BYTES) {
+                    // flush the (pseudo) journal entry
+                    // regarding 'pseudo' : this journal entry, while being a branch commit,
+                    // does not correspond to an actual branch commit that happened before the crash.
+                    // we might be able to in theory reconstruct the very original branch commits,
+                    // but that's a tedious job, and we were not doing that prior to OAK-9535 neither.
+                    // hence the optimization built-in here is that we create a journal entry
+                    // of type 'branch commit', but with a revision that is different from
+                    // what originally happened. Thx to the fact that the JournalEntry just
+                    // contains a list of branch commit journal ids, that should work fine.
+                    if (store.create(JOURNAL, singletonList(pseudoBcUpdateOp))) {
+                        log.info("recover : created intermediate pseudo-bc journal entry with rev {} and approx size {} bytes.",
+                                pseudoBcRev, approxPseudoBcUpdateOpSize);
+                        pseudoBcRevs.add(pseudoBcRev);
+                        changes = JOURNAL.newDocument(store);
+                        nextFlushCheckCount = PSEUDO_BRANCH_COMMIT_FLUSH_CHECK_COUNT;
+                    } else {
+                        log.warn("recover : could not create intermediate pseudo-bc journal entry with rev {}",
+                                pseudoBcRev);
+                        // retry a little later then, hence reduce the next counter by half an interval
+                        nextFlushCheckCount += changes.getNumChangedNodes() + (PSEUDO_BRANCH_COMMIT_FLUSH_CHECK_COUNT / 2);

Review comment:
       I would rather fail recovery in this case.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@jackrabbit.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org