You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@jackrabbit.apache.org by GitBox <gi...@apache.org> on 2021/10/19 13:10:42 UTC

[GitHub] [jackrabbit-oak] fabriziofortino opened a new pull request #393: OAK-9598: reduce server load on reindex operations

fabriziofortino opened a new pull request #393:
URL: https://github.com/apache/jackrabbit-oak/pull/393


   This PR changes the ES indexes lifecycle. At reindexing time, a new index gets created and populated while the old one is still used for queries. The new index is optimized for bulk loads (to reduce server load). Once the reindex process ends, the new index gets enabled and the old one is deleted.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@jackrabbit.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [jackrabbit-oak] fabriziofortino merged pull request #393: OAK-9598: reduce server load on reindex operations

Posted by GitBox <gi...@apache.org>.
fabriziofortino merged pull request #393:
URL: https://github.com/apache/jackrabbit-oak/pull/393


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@jackrabbit.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [jackrabbit-oak] fabriziofortino commented on pull request #393: OAK-9598: reduce server load on reindex operations

Posted by GitBox <gi...@apache.org>.
fabriziofortino commented on pull request #393:
URL: https://github.com/apache/jackrabbit-oak/pull/393#issuecomment-949580109


   @nit0906 I wanted to add a test but unfortunately reindex in tests seems to work only with synchronous indexes. A test like that would require async indexing. Even with that, there could be race conditions where reindexing finishes faster than expected. To work this around we can introduce a [toxyproxy](https://www.testcontainers.org/modules/toxiproxy/) between the test and the ES container to add latency while reindexing.
   
   I have created a separate issue to track this task. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@jackrabbit.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [jackrabbit-oak] thomasmueller commented on a change in pull request #393: OAK-9598: reduce server load on reindex operations

Posted by GitBox <gi...@apache.org>.
thomasmueller commented on a change in pull request #393:
URL: https://github.com/apache/jackrabbit-oak/pull/393#discussion_r734300698



##########
File path: oak-core/src/main/java/org/apache/jackrabbit/oak/plugins/index/IndexUpdate.java
##########
@@ -223,19 +223,18 @@ private boolean shouldReindex(NodeBuilder definition, NodeState before,
             String name) {
         PropertyState type = definition.getProperty(TYPE_PROPERTY_NAME);
 
+        //Do not attempt reindex of disabled indexes
+        if (type != null && TYPE_DISABLED.equals(type.getValue(Type.STRING))) {
+            return false;
+        }
+
         //Async indexes are not considered for reindexing for sync indexing
         // Skip this check for elastic index
         // TODO : See if the check to skip elastic can be handled in a better way - maybe move isMatchingIndexNode to IndexDefinition ?
         if (!TYPE_ELASTICSEARCH.equals(type.getValue(Type.STRING)) && !isMatchingIndexMode(definition)){

Review comment:
       Could we get a NPE here if type is null? (I understand this code was there before... but now it's more obvious... Also, missing space before "{"




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@jackrabbit.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [jackrabbit-oak] thomasmueller commented on a change in pull request #393: OAK-9598: reduce server load on reindex operations

Posted by GitBox <gi...@apache.org>.
thomasmueller commented on a change in pull request #393:
URL: https://github.com/apache/jackrabbit-oak/pull/393#discussion_r736233752



##########
File path: oak-core/src/main/java/org/apache/jackrabbit/oak/plugins/index/IndexUpdate.java
##########
@@ -242,29 +240,27 @@ private boolean shouldReindex(NodeBuilder definition, NodeState before,
         }
         // reindex in the case this is a new node, even though the reindex flag
         // might be set to 'false' (possible via content import).
-        // However if its already indexed i.e. has some hidden nodes (containing hidden data)
+        // However, if its already indexed i.e. has some hidden nodes (containing hidden data)
         // then no need to reindex
 
         // WARNING: If there is _any_ hidden node, then it is assumed that
         // no reindex is needed. Even if the hidden node is completely unrelated
         // and doesn't contain index data (for example the node ":status").
         // See also OAK-7991.
-        boolean result = !before.getChildNode(INDEX_DEFINITIONS_NAME).hasChildNode(name)
-                && !hasAnyHiddenNodes(definition);
+        boolean result = !before.getChildNode(INDEX_DEFINITIONS_NAME).hasChildNode(name) && !hasAnyHiddenNodes(definition);
         // See OAK-9449
-        // In case of elasticsearch, indexed data is stored remotely and not under hidden nodes, so
-        // in case of OutOfBand indexing during content import, there is no hidden node created for elastic (not even :status)
-        // So, we log a warn and return false to avoid unnecessary reindexing. The warning is only if the someone added the new index node and forgot to add
+        // In case of elasticsearch, indexed data is stored remotely and not under hidden nodes, so in case of OutOfBand
+        // indexing during content import, there is no hidden node created for elastic (not even :status)
+        // So, we log a warning and return false to avoid unnecessary reindexing. The warning is  displayed only if
+        // someone added the new index node and forgot to add
         // the reindex flag, in case OutOfBand Indexing has been performed, warning can be ignored.
         // Also, in case the new elastic node has been added with reindex = true , this method would have already returned true
         if (result && TYPE_ELASTICSEARCH.equals((type.getValue(Type.STRING)))) {
             log.warn("Found a new elastic index node [{}]. Please set the reindex flag = true to initiate reindexing." +

Review comment:
       Nit: missing space after the dot. This was there before.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@jackrabbit.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org