Posted to commits@solr.apache.org by ds...@apache.org on 2022/10/03 20:14:48 UTC

[solr] branch branch_9x updated (d057f54c398 -> b3c90185dd2)

This is an automated email from the ASF dual-hosted git repository.

dsmiley pushed a change to branch branch_9x
in repository https://gitbox.apache.org/repos/asf/solr.git


    from d057f54c398 Upgrade forbiddenapis to 3.4 (#1052)
     new 102dd8835c4 dev-docs: Shard Splits (#977)
     new b3c90185dd2 Add logs and comments to split workflow (#1027)

The 2 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails.  The revisions
listed as "add" were already present in the repository and have only
been added to this reference.


Summary of changes:
 .../images/replica-state-transition-diagram.png    | Bin 0 -> 16611 bytes
 .../shard-split/images/shard-split-diagram.png     | Bin 0 -> 689646 bytes
 .../images/shard-state-transition-diagram.png      | Bin 0 -> 27534 bytes
 dev-docs/shard-split/shard-split.adoc              | 165 +++++++++++++++++++++
 .../solr/cloud/api/collections/SplitShardCmd.java  |  74 ++++++---
 .../org/apache/solr/handler/admin/SplitOp.java     |  20 ++-
 6 files changed, 240 insertions(+), 19 deletions(-)
 create mode 100644 dev-docs/shard-split/images/replica-state-transition-diagram.png
 create mode 100644 dev-docs/shard-split/images/shard-split-diagram.png
 create mode 100644 dev-docs/shard-split/images/shard-state-transition-diagram.png
 create mode 100644 dev-docs/shard-split/shard-split.adoc


[solr] 01/02: dev-docs: Shard Splits (#977)

Posted by ds...@apache.org.
This is an automated email from the ASF dual-hosted git repository.

dsmiley pushed a commit to branch branch_9x
in repository https://gitbox.apache.org/repos/asf/solr.git

commit 102dd8835c44a6336bbce89434c20a14c12baef2
Author: Nazerke Seidan <se...@gmail.com>
AuthorDate: Thu Sep 22 17:02:45 2022 -0400

    dev-docs: Shard Splits (#977)
    
    
    Co-authored-by: Nazerke Seidan <ns...@salesforce.com>
---
 .../images/replica-state-transition-diagram.png    | Bin 0 -> 16611 bytes
 .../shard-split/images/shard-split-diagram.png     | Bin 0 -> 689646 bytes
 .../images/shard-state-transition-diagram.png      | Bin 0 -> 27534 bytes
 dev-docs/shard-split/shard-split.adoc              | 165 +++++++++++++++++++++
 4 files changed, 165 insertions(+)

diff --git a/dev-docs/shard-split/images/replica-state-transition-diagram.png b/dev-docs/shard-split/images/replica-state-transition-diagram.png
new file mode 100644
index 00000000000..f66f867d3ec
Binary files /dev/null and b/dev-docs/shard-split/images/replica-state-transition-diagram.png differ
diff --git a/dev-docs/shard-split/images/shard-split-diagram.png b/dev-docs/shard-split/images/shard-split-diagram.png
new file mode 100644
index 00000000000..d6d0add424a
Binary files /dev/null and b/dev-docs/shard-split/images/shard-split-diagram.png differ
diff --git a/dev-docs/shard-split/images/shard-state-transition-diagram.png b/dev-docs/shard-split/images/shard-state-transition-diagram.png
new file mode 100644
index 00000000000..b4e9d4e6bf6
Binary files /dev/null and b/dev-docs/shard-split/images/shard-state-transition-diagram.png differ
diff --git a/dev-docs/shard-split/shard-split.adoc b/dev-docs/shard-split/shard-split.adoc
new file mode 100644
index 00000000000..ab6ac1d6efb
--- /dev/null
+++ b/dev-docs/shard-split/shard-split.adoc
@@ -0,0 +1,165 @@
+= Shard Split
+:toc: macro
+:toclevels: 3
+
+This document explains how shard splitting works in SolrCloud at a high level. The explanation assumes that a shard is split into two parts using the default settings.
+
+toc::[]
+
+== Background
+Constantly adding new documents to Solr slows down query performance as the index size increases. Shard splitting was introduced to handle this. The shard split feature works in both standalone and SolrCloud modes.
+
+A shard is a logical partition of a collection, containing a subset of the collection's documents. Which shard contains which document depends on the sharding strategy; it is the "router" that determines this -- e.g. "implicit" vs. "compositeId". When a document is sent to Solr for indexing, the system first determines which shard the document belongs to and finds the leader of that shard. The leader then forwards the update to the other replicas.
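+
+For intuition, here is a minimal, hypothetical sketch of hash-based routing (not Solr's actual `CompositeIdRouter`, which uses MurmurHash3 and supports shard keys; the class and ranges below are purely illustrative): the document id is hashed and the hash is matched against each shard's hash range.
+
+[source,java]
+----
+import java.util.Map;
+
+public class RoutingSketch {
+  // Illustration only: pick a shard by hashing the doc id and matching it
+  // against each shard's [min, max] hash range.
+  static String shardFor(String docId, Map<String, int[]> shardRanges) {
+    int hash = docId.hashCode(); // Solr uses MurmurHash3, not hashCode()
+    for (Map.Entry<String, int[]> e : shardRanges.entrySet()) {
+      int[] range = e.getValue();
+      if (hash >= range[0] && hash <= range[1]) {
+        return e.getKey();
+      }
+    }
+    throw new IllegalStateException("no shard covers hash " + hash);
+  }
+}
+----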
+
+== Shard States
+A shard can have one of the following states:
+
+* ACTIVE
+** the shard receives updates and participates in distributed search.
+* CONSTRUCTION
+** the shard receives updates only from the parent shard leader, but doesn’t participate in distributed search.
+** a shard is put in this state while a shard split operation is in progress or while the shard is undergoing data restoration.
+* RECOVERY
+** the shard receives updates only from the parent shard leader, but doesn’t participate in distributed search.
+** a shard is put in this state while replicas are being created to meet the collection’s `replicationFactor`.
+* RECOVERY_FAILED
+** the shard doesn’t receive any updates and doesn’t participate in distributed search.
+** a shard is put in this state when the parent shard leader is not live.
+* INACTIVE
+** a shard is put in this state after it has been successfully split.
+
+Detail: a shard is referred to as a `Slice` in the codebase.
+
+== Shard State Transition Diagram
+
+image::images/shard-state-transition-diagram.png[]
+
+== Replica States
+
+A replica is a core, a physical partition of the index, placed on a node. Replica data lives under `/var/solr/data` by default.
+
+A replica can have one of the following states (see the state-inspection sketch after this list):
+
+* ACTIVE
+** the replica is ready to receive updates and queries.
+* DOWN
+** the replica is actively trying to move to the RECOVERING or ACTIVE state.
+* RECOVERING
+** the replica is recovering from the leader; this includes peer sync and full replication.
+* RECOVERY_FAILED
+** recovery didn't succeed.
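+
+During a split it is useful to watch these states change. Below is a minimal sketch (the `StateDump` class and method name are illustrative; it assumes you already have a `ClusterState`, e.g. from `ZkStateReader`) that prints the state of every shard and replica in a collection:
+
+[source,java]
+----
+import org.apache.solr.common.cloud.ClusterState;
+import org.apache.solr.common.cloud.DocCollection;
+import org.apache.solr.common.cloud.Replica;
+import org.apache.solr.common.cloud.Slice;
+
+public class StateDump {
+  // Prints the state of each shard (Slice) and each of its replicas.
+  static void dumpStates(ClusterState clusterState, String collectionName) {
+    DocCollection collection = clusterState.getCollection(collectionName);
+    for (Slice slice : collection.getSlices()) {
+      System.out.println("shard " + slice.getName() + " -> " + slice.getState());
+      for (Replica replica : slice.getReplicas()) {
+        System.out.println("  replica " + replica.getName() + " -> " + replica.getState());
+      }
+    }
+  }
+}
+----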
+
+== Replica State Transition Diagram
+
+image::images/replica-state-transition-diagram.png[]
+
+== Simplified Explanation
+
+Before digging into the explanation, let us define a few terms:
+
+* *parent shard* is the shard that will be split.
+* *sub shard* is a new shard created by splitting a parent shard.
+* *initial replica* is the first replica/core added to a sub shard.
+* *additional replica* is a replica created to meet the collection's `replicationFactor`.
+
+With SPLITSHARD, a shard can be split into multiple sub shards when one of the following params is used: `ranges`, `numSubShards`. In this explanation, a shard is split into two pieces, which are written to disk as two new shards (sub shards). Behind the scenes, the original shard's hash range is computed and divided in order to break the shard into two pieces. Furthermore, we can specify the split method, which can be either `rewrite` (default) or `link`.
+
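+For example, a split can be requested programmatically with SolrJ. A sketch (the class and method names here are illustrative), assuming a collection named `test` and an existing `SolrClient` pointed at the cluster:
+
+[source,java]
+----
+import org.apache.solr.client.solrj.SolrClient;
+import org.apache.solr.client.solrj.request.CollectionAdminRequest;
+
+public class SplitRequestSketch {
+  // Asks the Overseer to split "shard1" of collection "test" into two sub shards.
+  static void splitShard1(SolrClient client) throws Exception {
+    CollectionAdminRequest.SplitShard split =
+        CollectionAdminRequest.splitShard("test").setShardName("shard1");
+    split.process(client);
+  }
+}
+----
+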
+Simple Shard Split Steps:
+
+* Sub shards are created in the `CONSTRUCTION` state.
+* An initial replica is created for each sub shard.
+* The parent shard leader's index is “split” (since a parent shard can be split into n sub shards, n new sub shard indices are created).
+* Buffered updates are applied on the sub shards.
+* Additional replicas of the sub shards are created (to satisfy the collection's `replicationFactor`).
+* The sub shards become `ACTIVE` and the parent shard becomes `INACTIVE`.
+
+Notes:
+
+* There is no downtime during the split process -- it happens on the fly; clients continue to query and index, and the replication factor is maintained.
+* The `SPLITSHARD` operation is executed by the Overseer.
+* `splitMethod=rewrite` (the default) is I/O intensive and requires enough free disk space, i.e., 2x the core size.
+* The split operation is asynchronous.
+* `INACTIVE` shards have to be cleaned up manually (see the cleanup sketch below).
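+
+For instance, once a split has completed and the parent shard is `INACTIVE`, it can be removed with a DELETESHARD request. A SolrJ sketch, with the collection and shard names as placeholders and an illustrative helper class:
+
+[source,java]
+----
+import org.apache.solr.client.solrj.SolrClient;
+import org.apache.solr.client.solrj.request.CollectionAdminRequest;
+
+public class CleanupSketch {
+  // Deletes the now-INACTIVE parent shard after a successful split.
+  static void deleteInactiveParent(SolrClient client) throws Exception {
+    CollectionAdminRequest.deleteShard("test", "shard1").process(client);
+  }
+}
+----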
+
+
+== Updates During a Shard Split
+
+`UpdateLog` starts to buffer updates on each initial replica.
+When an update request reaches the parent shard, the parent shard forwards the update to the sub shards. A new transaction log file, `../replicaName/data/tlog/buffer.tlog.timestamp`, is created for each initial replica of the sub shards. `DirectUpdateHandler2` writes the updates to this buffer tlog file, and later updates are appended to the end of it.
+
+Apply buffered updates on sub shards:
+
+`UpdateLog` starts log replay. It reads updates from the buffered tlog file (`../replicaName/data/tlog/buffer.tlog.timestamp`) and creates a new transaction log file, `../replicaName/data/tlog/tlog.timestamp`. `DirectUpdateHandler2` writes the buffered updates into this tlog file.
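+
+Conceptually, the buffering and replay on an initial replica look roughly like the following sketch built around `UpdateLog`. The surrounding helper is purely illustrative (in reality the flow is driven by `SplitShardCmd` and `RequestApplyUpdatesOp`); `bufferUpdates()` and `applyBufferedUpdates()` are the `UpdateLog` operations referred to above.
+
+[source,java]
+----
+import java.util.concurrent.Future;
+import org.apache.solr.update.UpdateLog;
+
+public class BufferReplaySketch {
+  // Simplified: buffer incoming updates, then replay them once the index split is done.
+  static void bufferThenReplay(UpdateLog ulog) throws Exception {
+    ulog.bufferUpdates(); // updates now go to buffer.tlog.<timestamp> instead of being applied
+
+    // ... the parent leader's index is split into the sub shard here ...
+
+    Future<?> replay = ulog.applyBufferedUpdates(); // null if there was nothing to replay
+    if (replay != null) {
+      replay.get(); // wait for buffered updates to be written to tlog.<timestamp> and applied
+    }
+  }
+}
+----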
+
+
+== Shard Split Process Diagram (High Level)
+
+The following diagram illustrates the shard splitting process at a high level.
+
+image::images/shard-split-diagram.png[]
+
+== Shard Split Details
+
+The shard split code is mostly in `SplitShardCmd`. The actual index split is processed in `SplitOp`.
+
+1. The `SPLITSHARD` operation is triggered via the Collections API and executed by the Overseer. The Overseer Collections Handler receives the request and sends it to the Collection Processor.
+
+2. Verify that there is enough disk space on the parent shard's node to create the sub shards.
+
+3. The Collection Processor creates each sub shard in the `CONSTRUCTION` state and records it in ZK.
+
+4. Create the initial replica/core, `ADDREPLICA → AddReplicaCmd → CoreAdminOperation.CREATE`
+    ** 4.a Only the `CoreDescriptor` is created; the initial replica state is set to `DOWN` by `SliceMutator`.
+    ** 4.b Create the `SolrCore` from the `CoreDescriptor`; the initial replica state is updated to `ACTIVE` by `ReplicaMutator`.
+
+5. The initial replica waits for the parent shard leader to acknowledge it, `CoreAdminRequest.WaitForState() → CoreAdminAction.PREPRECOVERY → PrepRecoveryOp`
+
+6. A `SPLIT` request is made to `SplitOp`, providing the parent shard core, `targetCore`, and `splitMethod`; `targetCore` is the initial replica of each sub shard, and `splitMethod=rewrite` by default (a parameter sketch follows this list).
+    ** `SplitOp` determines which router is associated with the parent shard core.
+    ** `SplitIndexCommand` is called to partition the index.
+    ** `SolrIndexSplitter` splits the index using either the REWRITE or the LINK method.
+
+7. Apply buffered updates on the sub shard replicas, `CoreAdminAction.REQUESTAPPLYUPDATES → RequestApplyUpdatesOp`. The `UpdateLog` state has to be `BUFFERING`. `UpdateLog` starts log replay; it reads updates from the buffered tlog file and creates a new transaction log file, `/var/solr/data/replicaName/data/tlog/tlog.timestamp`. `DirectUpdateHandler2` writes the buffered updates into the tlog file.
+
+8. Identify locations/nodes for the additional replicas to be created.
+
+9. Create additional replicas as part of each sub shard.
+    ** 9.a Skip creating the replica core for now; instead, register it with the `Overseer` by setting the replica state to `DOWN`.
+    ** 9.b Since `replicationFactor` is not 1, `SplitShardCmd` requests that the sub shard state be set to `RECOVERY`, which is executed by `SliceMutator`. The additional replica/core is then actually created, but its state remains `DOWN` because the sub shard is in the `RECOVERY` state.
+
+10. Wait for the replicas to be in the RECOVERING state and run replication.
+    ** 10.a Set the additional replicas' state to `RECOVERING`.
+    ** 10.b As the additional replicas are in the `RECOVERING` state, run replication -- replicate from the sub shard leader using `ReplicationHandler`.
+
+11. Switch shard states:
+    ** update the sub shards' state from `RECOVERY` to `ACTIVE`.
+    ** update the parent shard's state from `ACTIVE` to `INACTIVE`.
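+
+As a rough sketch of what step 6 amounts to, the core-level `SPLIT` request is a core admin call carrying the parent core, the target cores, and the split method. The parameter names below mirror what `SplitShardCmd` builds; the helper class itself is illustrative, not Solr code:
+
+[source,java]
+----
+import org.apache.solr.common.params.CommonAdminParams;
+import org.apache.solr.common.params.CoreAdminParams;
+import org.apache.solr.common.params.ModifiableSolrParams;
+
+public class SplitParamsSketch {
+  // Builds the parameters for the core-level SPLIT request, roughly as SplitShardCmd does.
+  static ModifiableSolrParams splitParams(String parentCore, String[] targetCores) {
+    ModifiableSolrParams params = new ModifiableSolrParams();
+    params.set(CoreAdminParams.ACTION, CoreAdminParams.CoreAdminAction.SPLIT.toString());
+    params.set(CoreAdminParams.CORE, parentCore); // the parent shard leader's core
+    params.set(CommonAdminParams.SPLIT_METHOD, "rewrite"); // or "link"
+    for (String targetCore : targetCores) {
+      params.add("targetCore", targetCore); // initial replica core of each sub shard
+    }
+    return params;
+  }
+}
+----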
+
+== Testing/Debugging
+
+We can manually test and debug the shard split process.
+
+* Configure log levels to `DEBUG` in the `log4j2.xml` file, for example:
+
+    <Logger name="org.apache.solr.handler" level="DEBUG"/>
+    <Logger name="org.apache.solr.cloud" level="DEBUG"/>
+    <Logger name="org.apache.solr.core" level="DEBUG"/>
+
+* Build and run Solr in SolrCloud mode.
+* Create a collection named `test` with `replicationFactor=2`.
+* Send the following curl command to Solr:
+
+    curl -i -v "http://localhost:8983/solr/admin/collections?action=SPLITSHARD&collection=test&shard=shard1"
+
+* Add some sleeps (`Thread.sleep()`) in `SplitShardCmd`, add some documents, and observe how new documents are buffered during the shard split.
+
+
+
+
+
+
+
+
+
+
+


[solr] 02/02: Add logs and comments to split workflow (#1027)

Posted by ds...@apache.org.
This is an automated email from the ASF dual-hosted git repository.

dsmiley pushed a commit to branch branch_9x
in repository https://gitbox.apache.org/repos/asf/solr.git

commit b3c90185dd2bf6bf5c67d109f20bb012c5d0cf98
Author: Nazerke Seidan <se...@gmail.com>
AuthorDate: Mon Oct 3 13:50:35 2022 -0400

    Add logs and comments to split workflow (#1027)
    
    Co-authored-by: Megan Carey <mc...@berkeley.edu>
    Co-authored-by: Nazerke Seidan <ns...@salesforce.com>
---
 .../solr/cloud/api/collections/SplitShardCmd.java  | 74 +++++++++++++++++-----
 .../org/apache/solr/handler/admin/SplitOp.java     | 20 +++++-
 2 files changed, 75 insertions(+), 19 deletions(-)

diff --git a/solr/core/src/java/org/apache/solr/cloud/api/collections/SplitShardCmd.java b/solr/core/src/java/org/apache/solr/cloud/api/collections/SplitShardCmd.java
index 7f78286cdab..bdd2946da40 100644
--- a/solr/core/src/java/org/apache/solr/cloud/api/collections/SplitShardCmd.java
+++ b/solr/core/src/java/org/apache/solr/cloud/api/collections/SplitShardCmd.java
@@ -80,6 +80,7 @@ import org.apache.zookeeper.data.Stat;
 import org.slf4j.Logger;
 import org.slf4j.LoggerFactory;
 
+/** SolrCloud logic for splitting a shard. It's complicated! See {@code split()} below. */
 public class SplitShardCmd implements CollApiCmds.CollectionApiCommand {
   private static final Logger log = LoggerFactory.getLogger(MethodHandles.lookup().lookupClass());
   private static final int MIN_NUM_SUB_SHARDS = 2;
@@ -102,6 +103,31 @@ public class SplitShardCmd implements CollApiCmds.CollectionApiCommand {
     split(state, message, results);
   }
 
+  /**
+   * Shard splits start here and make additional requests to the host of the parent shard. The
+   * sequence of requests is as follows:
+   *
+   * <ul>
+   *   <li>1. Verify that there is enough disk space to create sub-shards.
+   *   <li>2. If splitByPrefix is true, make request to get prefix ranges.
+   *   <li>3. If this split was attempted previously and there are lingering sub-shards, delete
+   *       them.
+   *   <li>4. Create sub-shards in CONSTRUCTION state.
+   *   <li>5. Add an initial replica to each sub-shard.
+   *   <li>6. Request that parent shard wait for children to become ACTIVE.
+   *   <li>7. Execute split: either LINK or REWRITE.
+   *   <li>8. Apply buffered updates to the sub-shards so they are up-to-date with parent.
+   *   <li>9. Determine node placement for additional replicas (but do not create yet).
+   *   <li>10. If replicationFactor is more than 1, set shard state for sub-shards to RECOVERY; else
+   *       mark ACTIVE.
+   *   <li>11. Create additional replicas of sub-shards.
+   * </ul>
+   *
+   * <br>
+   *
+   * <p>There is a shard split doc (dev-docs/shard-split/shard-split.adoc) on how shard split works;
+   * illustrated with diagrams.
+   */
   public boolean split(ClusterState clusterState, ZkNodeProps message, NamedList<Object> results)
       throws Exception {
     final String asyncId = message.getStr(ASYNC);
@@ -140,6 +166,7 @@ public class SplitShardCmd implements CollApiCmds.CollectionApiCommand {
     String splitKey = message.getStr("split.key");
     DocCollection collection = clusterState.getCollection(collectionName);
 
+    // verify that parent shard is active; if not, throw exception
     Slice parentSlice = getParentSlice(clusterState, collectionName, slice, splitKey);
     if (parentSlice.getState() != Slice.State.ACTIVE) {
       throw new SolrException(
@@ -152,7 +179,7 @@ public class SplitShardCmd implements CollApiCmds.CollectionApiCommand {
               + parentSlice.getState());
     }
 
-    // find the leader for the shard
+    // find the leader for the parent shard
     Replica parentShardLeader;
     try {
       parentShardLeader = zkStateReader.getLeaderRetry(collectionName, slice.get(), 10000);
@@ -163,10 +190,17 @@ public class SplitShardCmd implements CollApiCmds.CollectionApiCommand {
 
     RTimerTree t;
     if (ccc.getCoreContainer().getNodeConfig().getMetricsConfig().isEnabled()) {
-      t = timings.sub("checkDiskSpace");
-      checkDiskSpace(
-          collectionName, slice.get(), parentShardLeader, splitMethod, ccc.getSolrCloudManager());
-      t.stop();
+      // check disk space for shard split
+      if (Boolean.parseBoolean(System.getProperty(SHARDSPLIT_CHECKDISKSPACE_ENABLED, "true"))) {
+        // 1. verify that there is enough space on disk to create sub-shards
+        log.debug(
+            "SplitShardCmd: verify that there is enough space on disk to create sub-shards for slice: {}",
+            parentShardLeader);
+        t = timings.sub("checkDiskSpace");
+        checkDiskSpace(
+            collectionName, slice.get(), parentShardLeader, splitMethod, ccc.getSolrCloudManager());
+        t.stop();
+      }
     }
 
     // let's record the ephemeralOwner of the parent leader node
@@ -245,6 +279,8 @@ public class SplitShardCmd implements CollApiCmds.CollectionApiCommand {
 
       ShardHandler shardHandler = ccc.newShardHandler();
 
+      // 2. if split request has splitByPrefix set to true, make request to SplitOp to get prefix
+      // ranges of sub-shards
       if (message.getBool(CommonAdminParams.SPLIT_BY_PREFIX, false)) {
         t = timings.sub("getRanges");
 
@@ -253,7 +289,7 @@ public class SplitShardCmd implements CollApiCmds.CollectionApiCommand {
         params.set(CoreAdminParams.GET_RANGES, "true");
         params.set(CommonAdminParams.SPLIT_METHOD, splitMethod.toLower());
         params.set(CoreAdminParams.CORE, parentShardLeader.getStr("core"));
-        // Only 2 is currently supported
+        // only 2 sub-shards are currently supported
         // int numSubShards = message.getInt(NUM_SUB_SHARDS, DEFAULT_NUM_SUB_SHARDS);
         // params.set(NUM_SUB_SHARDS, Integer.toString(numSubShards));
 
@@ -302,6 +338,8 @@ public class SplitShardCmd implements CollApiCmds.CollectionApiCommand {
               firstNrtReplica);
       t.stop();
 
+      // 3. if this shard has attempted a split before and failed, there will be lingering INACTIVE
+      // sub-shards.  Clean these up before proceeding
       boolean oldShardsDeleted = false;
       for (String subSlice : subSlices) {
         Slice oSlice = collection.getSlice(subSlice);
@@ -340,6 +378,7 @@ public class SplitShardCmd implements CollApiCmds.CollectionApiCommand {
         collection = clusterState.getCollection(collectionName);
       }
 
+      // 4. create the child sub-shards in CONSTRUCTION state
       String nodeName = parentShardLeader.getNodeName();
 
       t = timings.sub("createSubSlicesAndLeadersInState");
@@ -377,6 +416,7 @@ public class SplitShardCmd implements CollApiCmds.CollectionApiCommand {
             CollectionHandlingUtils.waitForNewShard(
                 collectionName, subSlice, ccc.getZkStateReader());
 
+        // 5. and add the initial replica for each sub-shard
         log.debug(
             "Adding first replica {} as part of slice {} of collection {} on {}",
             subShardName,
@@ -414,6 +454,8 @@ public class SplitShardCmd implements CollApiCmds.CollectionApiCommand {
         handleFailureOnAsyncRequest(results, msgOnError);
       }
       t.stop();
+
+      // 6. request that parent shard wait for children to become active
       t = timings.sub("waitForSubSliceLeadersAlive");
       {
         final ShardRequestTracker shardRequestTracker =
@@ -458,6 +500,7 @@ public class SplitShardCmd implements CollApiCmds.CollectionApiCommand {
             parentShardLeader);
       }
 
+      // 7. execute actual split
       ModifiableSolrParams params = new ModifiableSolrParams();
       params.set(CoreAdminParams.ACTION, CoreAdminParams.CoreAdminAction.SPLIT.toString());
       params.set(CommonAdminParams.SPLIT_METHOD, splitMethod.toLower());
@@ -484,8 +527,8 @@ public class SplitShardCmd implements CollApiCmds.CollectionApiCommand {
         log.debug("Index on shard: {} split into {} successfully", nodeName, subShardNames.size());
       }
 
+      // 8. apply buffered updates on sub-shards
       t = timings.sub("applyBufferedUpdates");
-      // apply buffered updates on sub-shards
       {
         final ShardRequestTracker shardRequestTracker =
             CollectionHandlingUtils.asyncRequestTracker(asyncId, ccc);
@@ -513,8 +556,7 @@ public class SplitShardCmd implements CollApiCmds.CollectionApiCommand {
 
       log.debug("Successfully applied buffered updates on : {}", subShardNames);
 
-      // Replica creation for the new Slices
-
+      // 9. determine node placement for additional replicas
       Set<String> nodes = clusterState.getLiveNodes();
       List<String> nodeList = new ArrayList<>(nodes.size());
       nodeList.addAll(nodes);
@@ -694,6 +736,8 @@ public class SplitShardCmd implements CollApiCmds.CollectionApiCommand {
       // this ensures that the logic inside ReplicaMutator to update sub-shard state to 'active'
       // always gets a chance to execute. See SOLR-7673
 
+      // 10. if replicationFactor > 1, set shard state for sub-shards to RECOVERY; otherwise mark
+      // ACTIVE
       if (repFactor == 1) {
         // A commit is needed so that documents are visible when the sub-shard replicas come up
         // (Note: This commit used to be after the state switch, but was brought here before the
@@ -723,7 +767,7 @@ public class SplitShardCmd implements CollApiCmds.CollectionApiCommand {
           ccc.offerStateUpdate(Utils.toJSON(m));
         }
       } else {
-        log.info("Requesting shard state be set to 'recovery'");
+        log.debug("Requesting shard state be set to 'recovery' for sub-shards: {}", subSlices);
         Map<String, Object> propMap = new HashMap<>();
         propMap.put(Overseer.QUEUE_OPERATION, OverseerAction.UPDATESHARDSTATE.toLower());
         for (String subSlice : subSlices) {
@@ -744,7 +788,7 @@ public class SplitShardCmd implements CollApiCmds.CollectionApiCommand {
       }
 
       t = timings.sub("createCoresForReplicas");
-      // now actually create replica cores on sub shard nodes
+      // 11. now actually create replica cores on sub shard nodes
       for (Map<String, Object> replica : replicas) {
         new AddReplicaCmd(ccc).addReplica(clusterState, new ZkNodeProps(replica), results, null);
       }
@@ -774,8 +818,8 @@ public class SplitShardCmd implements CollApiCmds.CollectionApiCommand {
         results.add(CommonParams.TIMING, timings.asNamedList());
       }
       success = true;
-      // don't unlock the shard yet - only do this if the final switch-over in
-      // ReplicaMutator succeeds (or fails)
+      // don't unlock the shard yet - only do this if the final switch-over in ReplicaMutator
+      // succeeds (or fails)
       return true;
     } catch (SolrException e) {
       throw e;
@@ -814,10 +858,6 @@ public class SplitShardCmd implements CollApiCmds.CollectionApiCommand {
       SolrIndexSplitter.SplitMethod method,
       SolrCloudManager cloudManager)
       throws SolrException {
-    // check that the system property is enabled. It should not be disabled by default.
-    if (!Boolean.parseBoolean(System.getProperty(SHARDSPLIT_CHECKDISKSPACE_ENABLED, "true"))) {
-      return;
-    }
     // check that enough disk space is available on the parent leader node
     // otherwise the actual index splitting will always fail
     NodeStateProvider nodeStateProvider = cloudManager.getNodeStateProvider();
diff --git a/solr/core/src/java/org/apache/solr/handler/admin/SplitOp.java b/solr/core/src/java/org/apache/solr/handler/admin/SplitOp.java
index 2e3481bc966..c743b27824b 100644
--- a/solr/core/src/java/org/apache/solr/handler/admin/SplitOp.java
+++ b/solr/core/src/java/org/apache/solr/handler/admin/SplitOp.java
@@ -35,6 +35,7 @@ import org.apache.lucene.util.BytesRef;
 import org.apache.lucene.util.StringHelper;
 import org.apache.solr.cloud.CloudDescriptor;
 import org.apache.solr.cloud.ZkShardTerms;
+import org.apache.solr.cloud.api.collections.SplitShardCmd;
 import org.apache.solr.common.SolrException;
 import org.apache.solr.common.cloud.ClusterState;
 import org.apache.solr.common.cloud.CompositeIdRouter;
@@ -51,11 +52,19 @@ import org.apache.solr.request.SolrQueryRequest;
 import org.apache.solr.search.SolrIndexSearcher;
 import org.apache.solr.update.SolrIndexSplitter;
 import org.apache.solr.update.SplitIndexCommand;
+import org.apache.solr.update.UpdateHandler;
 import org.apache.solr.util.RTimer;
 import org.apache.solr.util.RefCounted;
 import org.slf4j.Logger;
 import org.slf4j.LoggerFactory;
 
+/**
+ * CoreAdminOp implementation for shard splits. This request is enqueued when {@link SplitShardCmd}
+ * is processed. This operation handles two types of requests: 1. If {@link
+ * CommonAdminParams#SPLIT_BY_PREFIX} is true, the request to calculate document ranges for the
+ * sub-shards is processed here. 2. For any split request, the actual index split is processed here.
+ * This calls into {@link UpdateHandler#split(SplitIndexCommand)} to execute split.
+ */
 class SplitOp implements CoreAdminHandler.CoreAdminOp {
 
   private static final Logger log = LoggerFactory.getLogger(MethodHandles.lookup().lookupClass());
@@ -63,16 +72,18 @@ class SplitOp implements CoreAdminHandler.CoreAdminOp {
   @Override
   public void execute(CoreAdminHandler.CallInfo it) throws Exception {
     SolrParams params = it.req.getParams();
-
     String splitKey = params.get("split.key");
     String[] newCoreNames = params.getParams("targetCore");
     String cname = params.get(CoreAdminParams.CORE, "");
 
+    // if split request has splitByPrefix set to true, we will first make a request to SplitOp
+    // to calculate the prefix ranges, and do the actual split in a separate request
     if (params.getBool(GET_RANGES, false)) {
       handleGetRanges(it, cname);
       return;
     }
 
+    // if not using splitByPrefix, determine split partitions
     List<DocRouter.Range> ranges = null;
 
     String[] pathsArr = params.getParams(PATH);
@@ -98,6 +109,7 @@ class SplitOp implements CoreAdminHandler.CoreAdminOp {
       }
     }
 
+    // if not using splitByPrefix, ensure either path or targetCore specified
     if ((pathsArr == null || pathsArr.length == 0)
         && (newCoreNames == null || newCoreNames.length == 0)) {
       throw new SolrException(
@@ -124,7 +136,9 @@ class SplitOp implements CoreAdminHandler.CoreAdminOp {
 
       DocRouter router = null;
       String routeFieldName = null;
+      // if in SolrCloud mode, get collection and shard names
       if (it.handler.coreContainer.isZooKeeperAware()) {
+        log.trace("SplitOp: Determine which router is associated with the shard for core");
         ClusterState clusterState = it.handler.coreContainer.getZkController().getClusterState();
         String collectionName =
             parentCore.getCoreDescriptor().getCloudDescriptor().getCollectionName();
@@ -145,6 +159,7 @@ class SplitOp implements CoreAdminHandler.CoreAdminOp {
       }
 
       if (pathsArr == null) {
+        log.trace("SplitOp: Create array of paths for sub-shards of core");
         newCores = new ArrayList<>(partitions);
         for (String newCoreName : newCoreNames) {
           SolrCore newcore = it.handler.coreContainer.getCore(newCoreName);
@@ -189,6 +204,7 @@ class SplitOp implements CoreAdminHandler.CoreAdminOp {
       parentCore.getUpdateHandler().split(cmd);
 
       if (it.handler.coreContainer.isZooKeeperAware()) {
+        log.trace("SplitOp: Create cloud descriptors for sub-shards of core");
         for (SolrCore newcore : newCores) {
           // the index of the core changed from empty to have some data, its term must be not zero
           CloudDescriptor cd = newcore.getCoreDescriptor().getCloudDescriptor();
@@ -204,7 +220,7 @@ class SplitOp implements CoreAdminHandler.CoreAdminOp {
       // After the split has completed, someone (here?) should start the process of replaying the
       // buffered updates.
     } catch (Exception e) {
-      log.error("ERROR executing split:", e);
+      log.error("ERROR executing split: ", e);
       throw e;
     } finally {
       if (req != null) req.close();