You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@lucene.apache.org by GitBox <gi...@apache.org> on 2021/01/28 22:51:15 UTC

[GitHub] [lucene-solr] megancarey opened a new pull request #2265: SOLR-15119 Add logs and make default splitMethod to be LINK

megancarey opened a new pull request #2265:
URL: https://github.com/apache/lucene-solr/pull/2265


   <!--
   _(If you are a project committer then you may remove some/all of the following template.)_
   
   Before creating a pull request, please file an issue in the ASF Jira system for Lucene or Solr:
   
   * https://issues.apache.org/jira/projects/LUCENE
   * https://issues.apache.org/jira/projects/SOLR
   
   You will need to create an account in Jira in order to create an issue.
   
   The title of the PR should reference the Jira issue number in the form:
   
   * LUCENE-####: <short description of problem or changes>
   * SOLR-####: <short description of problem or changes>
   
   LUCENE and SOLR must be fully capitalized. A short description helps people scanning pull requests for items they can work on.
   
   Properly referencing the issue in the title ensures that Jira is correctly updated with code review comments and commits. -->
   
   
   # Description
   
   REWRITE splitMethod is considerably slower than LINK. Deprecated Autoscaling triggers used LINK method as default, contradicting SplitShardCmd.
   
   # Solution
   
   Unify branch_8x autoscaling code with SplitShardCmd and make the more performant splitMethod (LINK) the default.
   
   # Tests
   
   Added a test to SplitShardTest to run against all available split methods, and verify that an invalid splitMethod causes splits to fail.
   
   # Checklist
   
   Please review the following and check all that apply:
   
   - [x] I have reviewed the guidelines for [How to Contribute](https://wiki.apache.org/solr/HowToContribute) and my code conforms to the standards described there to the best of my ability.
   - [x] I have created a Jira issue and added the issue ID to my pull request title.
   - [x] I have given Solr maintainers [access](https://help.github.com/en/articles/allowing-changes-to-a-pull-request-branch-created-from-a-fork) to contribute to my PR branch. (optional but recommended)
   - [x] I have developed this patch against the `master` branch.
   - [x] I have run `./gradlew check`.
   - [x] I have added tests for my changes.
   - [ ] I have added documentation for the [Ref Guide](https://github.com/apache/lucene-solr/tree/master/solr/solr-ref-guide) (for Solr changes only).
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene-solr] madrob commented on a change in pull request #2265: SOLR-15119 Add logs and make default splitMethod to be LINK

Posted by GitBox <gi...@apache.org>.
madrob commented on a change in pull request #2265:
URL: https://github.com/apache/lucene-solr/pull/2265#discussion_r566467915



##########
File path: solr/core/src/java/org/apache/solr/cloud/api/collections/SplitShardCmd.java
##########
@@ -80,16 +83,33 @@ public void call(ClusterState state, ZkNodeProps message, @SuppressWarnings({"ra
     split(state, message,(NamedList<Object>) results);
   }
 
+  /**

Review comment:
       Can we use HTML tags for the list? Otherwise the formatting will get messed up in editor tooltips and on the javadoc page.

##########
File path: solr/core/src/java/org/apache/solr/cloud/api/collections/SplitShardCmd.java
##########
@@ -382,6 +407,7 @@ public boolean split(ClusterState clusterState, ZkNodeProps message, NamedList<O
         log.debug("Index on shard: {} split into {} successfully", nodeName, subShardNames.size());
       }
 
+      // 8. apply buffered updates on sub-shards
       t = timings.sub("applyBufferedUpdates");
       // apply buffered updates on sub-shards

Review comment:
       duplicated comment

##########
File path: solr/core/src/java/org/apache/solr/cloud/api/collections/SplitShardCmd.java
##########
@@ -80,16 +83,33 @@ public void call(ClusterState state, ZkNodeProps message, @SuppressWarnings({"ra
     split(state, message,(NamedList<Object>) results);
   }
 
+  /**
+   * Shard splits start here and make additional requests to the host of the parent shard.
+   *
+   * The sequence of requests is as follows:
+   *  1. Verify that there is enough disk space to create sub-shards.
+   *  2. If splitByPrefix is true, make request to get prefix ranges.
+   *  3. If this split was attempted previously and there are lingering sub-shards, delete them.
+   *  4. Create sub-shards in CONSTRUCTION state.
+   *  5. Add an initial replica to each sub-shard.
+   *  6. Request that parent shard wait for children to become ACTIVE.
+   *  7. Execute split: either LINK or REWRITE.
+   *  8. Apply buffered updates to the sub-shards so they are up-to-date with parent.
+   *  9. Determine node placement for additional replicas (but do not create yet).
+   *  10. If replicationFactor is more than 1, set shard state for sub-shards to RECOVERY; else mark ACTIVE.
+   *  11. Create additional replicas of sub-shards.
+   */
   @SuppressWarnings({"rawtypes"})
   public boolean split(ClusterState clusterState, ZkNodeProps message, NamedList<Object> results) throws Exception {
     final String asyncId = message.getStr(ASYNC);
 
+    // get split method, or default to LINK if unspecified
     boolean waitForFinalState = message.getBool(CommonAdminParams.WAIT_FOR_FINAL_STATE, false);
-    String methodStr = message.getStr(CommonAdminParams.SPLIT_METHOD, SolrIndexSplitter.SplitMethod.REWRITE.toLower());
+    String methodStr = message.getStr(CommonAdminParams.SPLIT_METHOD, SolrIndexSplitter.SplitMethod.LINK.toLower());

Review comment:
       This should be called out in upgrade-notes, I think.

##########
File path: solr/core/src/test/org/apache/solr/cloud/SplitShardTest.java
##########
@@ -106,6 +108,50 @@ public void doTest() throws IOException, SolrServerException {
     }
   }
 
+  /**
+   * Create a collection with 3 shards and split them each with a different splitMethod value.
+   * 1. No override specified. Verify that LINK method is used.
+   * 2. REWRITE method specified. Verify that LINK steps are skipped.
+   * 3. Invalid override specified. Verify that split fails.
+   */
+  @Test
+  public void testSplitMethods() throws Exception {
+    CollectionAdminRequest
+            .createCollection(COLLECTION_NAME, "conf", 3, 1)
+            .process(cluster.getSolrClient());
+
+    cluster.waitForActiveCollection(COLLECTION_NAME, 3, 3);
+
+    CollectionAdminRequest.SplitShard splitShard = CollectionAdminRequest.splitShard(COLLECTION_NAME)
+            .setShardName("shard1");
+    SolrResponse response = splitShard.process(cluster.getSolrClient());
+    assertTrue(response.getResponse().toString().contains("hardLinkCopy"));

Review comment:
       nit: assertThat will get you a better failure message




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene-solr] madrob commented on a change in pull request #2265: SOLR-15119 Add logs and make default splitMethod to be LINK

Posted by GitBox <gi...@apache.org>.
madrob commented on a change in pull request #2265:
URL: https://github.com/apache/lucene-solr/pull/2265#discussion_r567947008



##########
File path: solr/core/src/java/org/apache/solr/cloud/api/collections/SplitShardCmd.java
##########
@@ -580,6 +597,10 @@ public boolean split(ClusterState clusterState, ZkNodeProps message, NamedList<O
       if (withTiming) {
         results.add(CommonParams.TIMING, timings.asNamedList());
       }
+
+      if (log.isDebugEnabled()) {
+        log.debug("Timings for split sub-ops: ", timings.toString());

Review comment:
       Needs a `{}` logging placeholder.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene-solr] megancarey commented on a change in pull request #2265: SOLR-15119 Add logs and make default splitMethod to be LINK

Posted by GitBox <gi...@apache.org>.
megancarey commented on a change in pull request #2265:
URL: https://github.com/apache/lucene-solr/pull/2265#discussion_r566495456



##########
File path: solr/core/src/java/org/apache/solr/cloud/api/collections/SplitShardCmd.java
##########
@@ -407,15 +432,6 @@ public boolean split(ClusterState clusterState, ZkNodeProps message, NamedList<O
 
       log.debug("Successfully applied buffered updates on : {}", subShardNames);
 
-      // Replica creation for the new Slices
-
-      Set<String> nodes = clusterState.getLiveNodes();
-      List<String> nodeList = new ArrayList<>(nodes.size());
-      nodeList.addAll(nodes);
-
-      // Remove the node that hosts the parent shard for replica creation.
-      nodeList.remove(nodeName);

Review comment:
       Also cleaned this up because it was bugging me :) nodeList is never used.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene-solr] madrob commented on a change in pull request #2265: SOLR-15119 Add logs and make default splitMethod to be LINK

Posted by GitBox <gi...@apache.org>.
madrob commented on a change in pull request #2265:
URL: https://github.com/apache/lucene-solr/pull/2265#discussion_r566505171



##########
File path: solr/core/src/test/org/apache/solr/cloud/SplitShardTest.java
##########
@@ -106,6 +108,50 @@ public void doTest() throws IOException, SolrServerException {
     }
   }
 
+  /**
+   * Create a collection with 3 shards and split them each with a different splitMethod value.
+   * 1. No override specified. Verify that LINK method is used.
+   * 2. REWRITE method specified. Verify that LINK steps are skipped.
+   * 3. Invalid override specified. Verify that split fails.
+   */
+  @Test
+  public void testSplitMethods() throws Exception {
+    CollectionAdminRequest
+            .createCollection(COLLECTION_NAME, "conf", 3, 1)
+            .process(cluster.getSolrClient());
+
+    cluster.waitForActiveCollection(COLLECTION_NAME, 3, 3);
+
+    CollectionAdminRequest.SplitShard splitShard = CollectionAdminRequest.splitShard(COLLECTION_NAME)
+            .setShardName("shard1");
+    SolrResponse response = splitShard.process(cluster.getSolrClient());
+    assertTrue(response.getResponse().toString().contains("hardLinkCopy"));

Review comment:
       It’s in hamcrest 




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene-solr] megancarey commented on a change in pull request #2265: SOLR-15119 Add logs and make default splitMethod to be LINK

Posted by GitBox <gi...@apache.org>.
megancarey commented on a change in pull request #2265:
URL: https://github.com/apache/lucene-solr/pull/2265#discussion_r566488882



##########
File path: solr/core/src/test/org/apache/solr/cloud/SplitShardTest.java
##########
@@ -106,6 +108,50 @@ public void doTest() throws IOException, SolrServerException {
     }
   }
 
+  /**
+   * Create a collection with 3 shards and split them each with a different splitMethod value.
+   * 1. No override specified. Verify that LINK method is used.
+   * 2. REWRITE method specified. Verify that LINK steps are skipped.
+   * 3. Invalid override specified. Verify that split fails.
+   */
+  @Test
+  public void testSplitMethods() throws Exception {
+    CollectionAdminRequest
+            .createCollection(COLLECTION_NAME, "conf", 3, 1)
+            .process(cluster.getSolrClient());
+
+    cluster.waitForActiveCollection(COLLECTION_NAME, 3, 3);
+
+    CollectionAdminRequest.SplitShard splitShard = CollectionAdminRequest.splitShard(COLLECTION_NAME)
+            .setShardName("shard1");
+    SolrResponse response = splitShard.process(cluster.getSolrClient());
+    assertTrue(response.getResponse().toString().contains("hardLinkCopy"));

Review comment:
       I tried to update but found that assertThat was deprecated. What lib do you use for that?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene-solr] megancarey commented on a change in pull request #2265: SOLR-15119 Add logs and make default splitMethod to be LINK

Posted by GitBox <gi...@apache.org>.
megancarey commented on a change in pull request #2265:
URL: https://github.com/apache/lucene-solr/pull/2265#discussion_r566495266



##########
File path: solr/core/src/java/org/apache/solr/cloud/api/collections/SplitShardCmd.java
##########
@@ -60,7 +60,10 @@
 import static org.apache.solr.common.params.CommonAdminParams.ASYNC;
 import static org.apache.solr.common.params.CommonAdminParams.NUM_SUB_SHARDS;
 
-
+/**
+ * Index split request processed by Overseer. Requests from here go to the host of the parent shard,
+ * and are processed by @link.

Review comment:
       Oops.. meant to link to SplitOp here. I'll clean up in my next commit




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org