You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@jackrabbit.apache.org by GitBox <gi...@apache.org> on 2021/08/05 08:17:19 UTC

[GitHub] [jackrabbit-oak] averma21 opened a new pull request #338: OAK-9524: Retry failed data download (resuming from the point it stop…

averma21 opened a new pull request #338:
URL: https://github.com/apache/jackrabbit-oak/pull/338


   …ped) during indexing


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@jackrabbit.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [jackrabbit-oak] thomasmueller commented on a change in pull request #338: OAK-9524: Retry failed data download (resuming from the point it stop…

Posted by GitBox <gi...@apache.org>.
thomasmueller commented on a change in pull request #338:
URL: https://github.com/apache/jackrabbit-oak/pull/338#discussion_r683408441



##########
File path: oak-run-commons/src/main/java/org/apache/jackrabbit/oak/index/indexer/document/CompositeException.java
##########
@@ -0,0 +1,41 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.jackrabbit.oak.index.indexer.document;
+
+import org.slf4j.Logger;
+
+public class CompositeException extends Exception {

Review comment:
       What would happen if we also implement "Throwable[] getSuppressed​()" to return the (array of) collectedThrowables? That way it would be more generally useful (I think). I'm not saying this class should be moved to oak-commons right now... also I would still implement the logAllExceptions method, as it looks useful.

##########
File path: oak-run-commons/src/main/java/org/apache/jackrabbit/oak/index/indexer/document/DocumentStoreIndexerBase.java
##########
@@ -133,6 +135,52 @@ public NodeStateEntryTraverser create(LastModifiedRange lastModifiedRange) {
         }
     }
 
+    private FlatFileStore buildFlatFileStore(NodeState checkpointedState, CompositeIndexer indexer) throws IOException {
+
+        Stopwatch flatFileStoreWatch = Stopwatch.createStarted();
+        int executionCount = 1;
+        CompositeException lastException = null;
+        List<File> previousDownloadDirs = new ArrayList<>();
+        FlatFileStore flatFileStore = null;
+        //TODO How to ensure we can safely read from secondary
+        DocumentNodeState rootDocumentState = (DocumentNodeState) checkpointedState;
+        DocumentNodeStore nodeStore = (DocumentNodeStore) indexHelper.getNodeStore();
+
+        DocumentStoreSplitter splitter = new DocumentStoreSplitter(getMongoDocumentStore());
+        List<Long> lastModifiedBreakPoints = splitter.split(Collection.NODES, 0L ,10);
+        FlatFileNodeStoreBuilder builder = null;
+
+        while (flatFileStore == null && executionCount <= MAX_DOWNLOAD_RETRIES) {
+            try {
+                //TODO Use flatFileStore only if we have relative nodes to be indexed

Review comment:
       Not quite sure why this TODO was+is there... I would remove it

##########
File path: oak-run-commons/src/main/java/org/apache/jackrabbit/oak/index/indexer/document/flatfile/MultithreadedTraverseWithSortStrategy.java
##########
@@ -254,7 +258,9 @@ private void resumeFromPreviousState(List<LastModifiedRange> previousState,
             LastModifiedRange currentRange = previousState.get(i);
             LastModifiedRange nextRange = i < previousState.size() - 1 ? previousState.get(i+1) : null;
             if (nextRange != null && currentRange.checkOverlap(nextRange)) {
-                throw new IllegalStateException("Range overlap between " + currentRange + " and " + nextRange);
+                LastModifiedRange merged = currentRange.mergeWith(nextRange);
+                log.info("Range overlap between " + currentRange + " and " + nextRange + ". Using merged range " + merged);
+                currentRange = merged;

Review comment:
       Not sure if I understand the logic... If you merge the ranges, wouldn't you want to remove nextRange as well? Is that anyway done and so not necessary?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@jackrabbit.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [jackrabbit-oak] averma21 commented on a change in pull request #338: OAK-9524: Retry failed data download (resuming from the point it stop…

Posted by GitBox <gi...@apache.org>.
averma21 commented on a change in pull request #338:
URL: https://github.com/apache/jackrabbit-oak/pull/338#discussion_r684014120



##########
File path: oak-run-commons/src/main/java/org/apache/jackrabbit/oak/index/indexer/document/DocumentStoreIndexerBase.java
##########
@@ -133,6 +135,52 @@ public NodeStateEntryTraverser create(LastModifiedRange lastModifiedRange) {
         }
     }
 
+    private FlatFileStore buildFlatFileStore(NodeState checkpointedState, CompositeIndexer indexer) throws IOException {
+
+        Stopwatch flatFileStoreWatch = Stopwatch.createStarted();
+        int executionCount = 1;
+        CompositeException lastException = null;
+        List<File> previousDownloadDirs = new ArrayList<>();
+        FlatFileStore flatFileStore = null;
+        //TODO How to ensure we can safely read from secondary
+        DocumentNodeState rootDocumentState = (DocumentNodeState) checkpointedState;
+        DocumentNodeStore nodeStore = (DocumentNodeStore) indexHelper.getNodeStore();
+
+        DocumentStoreSplitter splitter = new DocumentStoreSplitter(getMongoDocumentStore());
+        List<Long> lastModifiedBreakPoints = splitter.split(Collection.NODES, 0L ,10);
+        FlatFileNodeStoreBuilder builder = null;
+
+        while (flatFileStore == null && executionCount <= MAX_DOWNLOAD_RETRIES) {
+            try {
+                //TODO Use flatFileStore only if we have relative nodes to be indexed

Review comment:
       done.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@jackrabbit.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [jackrabbit-oak] averma21 merged pull request #338: OAK-9524: Retry failed data download (resuming from the point it stop…

Posted by GitBox <gi...@apache.org>.
averma21 merged pull request #338:
URL: https://github.com/apache/jackrabbit-oak/pull/338


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@jackrabbit.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [jackrabbit-oak] averma21 commented on a change in pull request #338: OAK-9524: Retry failed data download (resuming from the point it stop…

Posted by GitBox <gi...@apache.org>.
averma21 commented on a change in pull request #338:
URL: https://github.com/apache/jackrabbit-oak/pull/338#discussion_r684000894



##########
File path: oak-run-commons/src/main/java/org/apache/jackrabbit/oak/index/indexer/document/CompositeException.java
##########
@@ -0,0 +1,41 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.jackrabbit.oak.index.indexer.document;
+
+import org.slf4j.Logger;
+
+public class CompositeException extends Exception {

Review comment:
       That method is final, so we can't override it. Adding/getting suppressed exceptions is provided by Throwables class itself. So, keeping this class just for exception type identification and logging method.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@jackrabbit.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [jackrabbit-oak] thomasmueller commented on a change in pull request #338: OAK-9524: Retry failed data download (resuming from the point it stop…

Posted by GitBox <gi...@apache.org>.
thomasmueller commented on a change in pull request #338:
URL: https://github.com/apache/jackrabbit-oak/pull/338#discussion_r684044859



##########
File path: oak-run-commons/src/main/java/org/apache/jackrabbit/oak/index/indexer/document/CompositeException.java
##########
@@ -0,0 +1,41 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.jackrabbit.oak.index.indexer.document;
+
+import org.slf4j.Logger;
+
+public class CompositeException extends Exception {

Review comment:
       I see, so instead of overriding getSuppressed() we would have to call addSuppressed(Throwable exception) for each entry in the list. I think it would still make sense do to that, but I don't have a strong opinion.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@jackrabbit.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [jackrabbit-oak] averma21 commented on a change in pull request #338: OAK-9524: Retry failed data download (resuming from the point it stop…

Posted by GitBox <gi...@apache.org>.
averma21 commented on a change in pull request #338:
URL: https://github.com/apache/jackrabbit-oak/pull/338#discussion_r684007897



##########
File path: oak-run-commons/src/main/java/org/apache/jackrabbit/oak/index/indexer/document/flatfile/MultithreadedTraverseWithSortStrategy.java
##########
@@ -254,7 +258,9 @@ private void resumeFromPreviousState(List<LastModifiedRange> previousState,
             LastModifiedRange currentRange = previousState.get(i);
             LastModifiedRange nextRange = i < previousState.size() - 1 ? previousState.get(i+1) : null;
             if (nextRange != null && currentRange.checkOverlap(nextRange)) {
-                throw new IllegalStateException("Range overlap between " + currentRange + " and " + nextRange);
+                LastModifiedRange merged = currentRange.mergeWith(nextRange);
+                log.info("Range overlap between " + currentRange + " and " + nextRange + ". Using merged range " + merged);
+                currentRange = merged;

Review comment:
       yes, you're right :) This didn't surface up during tests because having the second range as well was just duplicating the downloads but duplicates are finally removed while merging. 
   Fixed the code to skip the next range if merge is done.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@jackrabbit.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org