You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@solr.apache.org by GitBox <gi...@apache.org> on 2022/02/25 21:21:26 UTC
[GitHub] [solr] risdenk commented on a change in pull request #705: SOLR-14920: Spotless formatting for core - non-test only

risdenk commented on a change in pull request #705:
URL: https://github.com/apache/solr/pull/705#discussion_r815150824



##########
File path: solr/core/src/java/org/apache/solr/api/ContainerPluginsRegistry.java
##########
@@ -88,16 +88,16 @@ public boolean onChange(Map<String, Object> properties) {
     Phaser localPhaser = phaser; // volatile read
     if (localPhaser != null) {
       assert localPhaser.getRegisteredParties() == 1;
-      localPhaser.arrive(); // we should be the only ones registered, so this will advance phase each time
+      localPhaser
+          .arrive(); // we should be the only ones registered, so this will advance phase each time

Review comment:
       Fix this

##########
File path: solr/core/src/java/org/apache/solr/cloud/DistributedApiAsyncTracker.java
##########
@@ -93,48 +95,57 @@ public DistributedApiAsyncTracker(SolrZkClient zkClient, String rootPath) {
     persistentIdsPath = rootPath + ZK_ASYNC_PERSISTENT;
     inFlightIdsPath = rootPath + ZK_ASYNC_INFLIGHT;
 
-    trackedAsyncTasks = new SizeLimitedDistributedMap(zkClient, persistentIdsPath, maxTrackedTasks, null);
+    trackedAsyncTasks =
+        new SizeLimitedDistributedMap(zkClient, persistentIdsPath, maxTrackedTasks, null);
     inFlightAsyncTasks = new InFlightJobs(zkClient, inFlightIdsPath);
   }
 
   /**
-   * After a successful call to this method, caller MUST eventually call {@link #setTaskCompleted} or {@link #cancelAsyncId}
-   * otherwise the task will forever be considered as in progress.
+   * After a successful call to this method, caller MUST eventually call {@link #setTaskCompleted}
+   * or {@link #cancelAsyncId} otherwise the task will forever be considered as in progress.
+   *
    * @param asyncId if {@code null} this method will do nothing.
-   * @return {@code true} if the asyncId was not already in use (or is {@code null}) and {@code false} if it is already
-   * in use and can't be allocated again.
+   * @return {@code true} if the asyncId was not already in use (or is {@code null}) and {@code
+   *     false} if it is already in use and can't be allocated again.
    */
   public boolean createNewAsyncJobTracker(String asyncId) {
     if (asyncId == null) {
       return true;
     }
     try {
-      // First create the persistent node, with no content. If that fails, it means the asyncId has been previously used
+      // First create the persistent node, with no content. If that fails, it means the asyncId has
+      // been previously used
       // and not yet cleared...
       if (!trackedAsyncTasks.putIfAbsent(asyncId, null)) {
         return false;
       }
 
-      // ...then create the transient node. If the corresponding ephemeral node already exists, it means the persistent node
-      // was removed (maybe trackedAsyncTasks grew too large? It has a max size then evicts). We cannot then track the new
+      // ...then create the transient node. If the corresponding ephemeral node already exists, it
+      // means the persistent node
+      // was removed (maybe trackedAsyncTasks grew too large? It has a max size then evicts). We
+      // cannot then track the new
       // provided asyncId, and have simply "revived" its persistent node...

Review comment:
       Fix this

##########
File path: solr/core/src/java/org/apache/solr/cloud/DistributedApiAsyncTracker.java
##########
@@ -226,15 +244,20 @@ public void cancelAsyncId(String asyncId) {
       return new Pair<>(RequestStatusState.RUNNING, null);
     }
 
-    // The task has failed, but there are two options: if response is null, it has failed because the node on which it was
-    // running has crashed. If it is not null, it has failed because the execution has failed. Because caller expects a non
+    // The task has failed, but there are two options: if response is null, it has failed because
+    // the node on which it was
+    // running has crashed. If it is not null, it has failed because the execution has failed.
+    // Because caller expects a non
     // null response in any case, let's make up one if needed...
     if (response == null) {
-      // Node crash has removed the ephemeral node, but the command did not complete execution (or didn't even start it, who
+      // Node crash has removed the ephemeral node, but the command did not complete execution (or
+      // didn't even start it, who
       // knows). We have a failure to report though so let's create a reasonable return response.

Review comment:
       Fix this

##########
File path: solr/core/src/java/org/apache/solr/cloud/ExclusiveSliceProperty.java
##########
@@ -129,17 +148,21 @@ private boolean collectCurrentPropStats() {
       for (Replica replica : slice.getReplicas()) {
         if (onlyActiveNodes && isActive(replica) == false) {
           if (StringUtils.isNotBlank(replica.getStr(property))) {
-            removeProp(slice, replica.getName()); // Note, we won't be committing this to ZK until later.
+            removeProp(
+                slice, replica.getName()); // Note, we won't be committing this to ZK until later.
           }

Review comment:
       Fix this

##########
File path: solr/core/src/java/org/apache/solr/cloud/ExclusiveSliceProperty.java
##########
@@ -342,12 +387,15 @@ boolean balanceProperty() {
       }
     }
 
-    // At this point, nodesHostingProp should contain _only_ lists of replicas that belong to slices that do _not_
+    // At this point, nodesHostingProp should contain _only_ lists of replicas that belong to slices
+    // that do _not_
     // have any replica hosting the property. So let's assign them.

Review comment:
       Fix this

##########
File path: solr/core/src/java/org/apache/solr/cloud/OverseerTaskProcessor.java
##########
@@ -283,18 +310,22 @@ public void run() {
             continue;
           }
 
-          // clear the blocked tasks, may get refilled below. Given blockedTasks can only get entries from heads and heads
-          // has at most MAX_BLOCKED_TASKS tasks, blockedTasks will never exceed MAX_BLOCKED_TASKS entries.
-          // Note blockedTasks can't be cleared too early as it is used in the excludedTasks Predicate above.
+          // clear the blocked tasks, may get refilled below. Given blockedTasks can only get
+          // entries from heads and heads
+          // has at most MAX_BLOCKED_TASKS tasks, blockedTasks will never exceed MAX_BLOCKED_TASKS
+          // entries.
+          // Note blockedTasks can't be cleared too early as it is used in the excludedTasks
+          // Predicate above.

Review comment:
       Fix this

##########
File path: solr/core/src/java/org/apache/solr/cloud/ShardLeaderElectionContext.java
##########
@@ -133,8 +157,10 @@ void runLeaderProcess(boolean weAreReplacement, int pauseBeforeStart) throws Kee
       }
 
       if (isClosed) {
-        // Solr is shutting down or the ZooKeeper session expired while waiting for replicas. If the later,
-        // we cannot be sure we are still the leader, so we should bail out. The OnReconnect handler will
+        // Solr is shutting down or the ZooKeeper session expired while waiting for replicas. If the
+        // later,
+        // we cannot be sure we are still the leader, so we should bail out. The OnReconnect handler
+        // will
         // re-register the cores and handle a new leadership election.

Review comment:
       Fix this

##########
File path: solr/core/src/java/org/apache/solr/cloud/RecoveryStrategy.java
##########
@@ -564,9 +588,11 @@ public final void doSyncOrReplicateRecovery(SolrCore core) throws Exception {
 
     final String ourUrl = ZkCoreNodeProps.getCoreUrl(baseUrl, coreName);
     Future<RecoveryInfo> replayFuture = null;
-    while (!successfulRecovery && !Thread.currentThread().isInterrupted() && !isClosed()) { // don't use interruption or
-                                                                                            // it will close channels
-                                                                                            // though
+    while (!successfulRecovery
+        && !Thread.currentThread().isInterrupted()
+        && !isClosed()) { // don't use interruption or
+      // it will close channels
+      // though

Review comment:
       Fix this

##########
File path: solr/core/src/java/org/apache/solr/cloud/DistributedClusterStateUpdater.java
##########
@@ -365,27 +411,37 @@ private void applyUpdate() throws KeeperException, InterruptedException {
 
       // Note we DO NOT track nor use the live nodes in the cluster state.
       // That may means the two abstractions (collection metadata vs. nodes) should be separated.
-      // For now trying to diverge as little as possible from existing data structures and code given the need to
-      // support both the old way (Overseer) and new way (distributed) of handling cluster state update.
+      // For now trying to diverge as little as possible from existing data structures and code
+      // given the need to
+      // support both the old way (Overseer) and new way (distributed) of handling cluster state
+      // update.
       final Set<String> liveNodes = Collections.emptySet();
 
-      // Per Replica States updates are done before all other updates and not subject to the number of attempts of CAS
-      // made here, given they have their own CAS strategy and implementation (see PerReplicaStatesOps.persist()).
+      // Per Replica States updates are done before all other updates and not subject to the number
+      // of attempts of CAS
+      // made here, given they have their own CAS strategy and implementation (see
+      // PerReplicaStatesOps.persist()).

Review comment:
       Fix this

##########
File path: solr/core/src/java/org/apache/solr/cloud/DistributedClusterStateUpdater.java
##########
@@ -421,17 +480,26 @@ private void applyUpdate() throws KeeperException, InterruptedException {
         }
 
         // Get the latest version of the collection from the cluster state first.
-        // There is no notion of "cached" here (the boolean passed below) as we the updatedState is based on CollectionRef
-        DocCollection docCollection = updatedState.getCollectionOrNull(updater.getCollectionName(), true);
-
-        // If we did update per replica states and we're also updating state.json, update the content of state.json to reflect
-        // the changes made to replica states. Not strictly necessary (the state source of truth is in per replica states), but nice to have...
+        // There is no notion of "cached" here (the boolean passed below) as we the updatedState is
+        // based on CollectionRef
+        DocCollection docCollection =
+            updatedState.getCollectionOrNull(updater.getCollectionName(), true);
+
+        // If we did update per replica states and we're also updating state.json, update the
+        // content of state.json to reflect
+        // the changes made to replica states. Not strictly necessary (the state source of truth is
+        // in per replica states), but nice to have...

Review comment:
       Fix this

##########
File path: solr/core/src/java/org/apache/solr/cloud/DistributedClusterStateUpdater.java
##########
@@ -705,36 +850,54 @@ public ClusterState getUpdatedClusterState() {
     }
 
     /**
-     * Using optimistic locking (and retries when needed) updates Zookeeper with the changes previously recorded by calls
-     * to {@link #record(MutatingCommand, ZkNodeProps)}.
+     * Using optimistic locking (and retries when needed) updates Zookeeper with the changes
+     * previously recorded by calls to {@link #record(MutatingCommand, ZkNodeProps)}.
      */
-    public void executeStateUpdates(SolrCloudManager scm, ZkStateReader zkStateReader) throws KeeperException, InterruptedException {
+    public void executeStateUpdates(SolrCloudManager scm, ZkStateReader zkStateReader)
+        throws KeeperException, InterruptedException {
       if (log.isDebugEnabled()) {
-        log.debug("Executing updates for collection " + collectionName + ", is creation=" + isCollectionCreation + ", " + mutations.size() + " recorded mutations.", new Exception("StackTraceOnly")); // nowarn
+        log.debug(
+            "Executing updates for collection "
+                + collectionName
+                + ", is creation="
+                + isCollectionCreation
+                + ", "
+                + mutations.size()
+                + " recorded mutations.",
+            new Exception("StackTraceOnly")); // nowarn
       }
       if (mutations.isEmpty()) {
-        final String err = "Internal bug. Unexpected empty set of mutations to apply for collection " + collectionName;
+        final String err =
+            "Internal bug. Unexpected empty set of mutations to apply for collection "
+                + collectionName;
         log.error(err);
         throw new SolrException(SolrException.ErrorCode.SERVER_ERROR, err);
       }
 
-      RecordedMutationsPlayer mutationPlayer = new RecordedMutationsPlayer(scm, collectionName, isCollectionCreation, mutations);
+      RecordedMutationsPlayer mutationPlayer =
+          new RecordedMutationsPlayer(scm, collectionName, isCollectionCreation, mutations);
       ZkUpdateApplicator.applyUpdate(zkStateReader, mutationPlayer);
 
       // TODO update stats here for the various commands executed successfully or not?
-      // This would replace the stats about cluster state updates that the Collection API currently makes available using
-      // the OVERSEERSTATUS command, but obviously would be per node and will not have stats about queues (since there
-      // will be no queues). Would be useful in some tests though, for example TestSkipOverseerOperations.
-      // Probably better to rethink what types of stats are expected from a distributed system rather than trying to present
+      // This would replace the stats about cluster state updates that the Collection API currently
+      // makes available using
+      // the OVERSEERSTATUS command, but obviously would be per node and will not have stats about
+      // queues (since there
+      // will be no queues). Would be useful in some tests though, for example
+      // TestSkipOverseerOperations.
+      // Probably better to rethink what types of stats are expected from a distributed system
+      // rather than trying to present
       // those previously provided by a central server in the system (the Overseer).

Review comment:
       Fix this

##########
File path: solr/core/src/java/org/apache/solr/cloud/DistributedApiAsyncTracker.java
##########
@@ -146,19 +157,21 @@ public void setTaskRunning(String asyncId) {
       throw new SolrException(SERVER_ERROR, "Error setting async task as running " + asyncId, ke);
     } catch (InterruptedException ie) {
       Thread.currentThread().interrupt();
-      throw new SolrException(SERVER_ERROR, "Interrupted setting async task as running " + asyncId, ie);
+      throw new SolrException(
+          SERVER_ERROR, "Interrupted setting async task as running " + asyncId, ie);
     }
   }
 
   /**
-   * Mark the completion (success or error) of an async task. The success or error is judged by the contents
-   * of the {@link OverseerSolrResponse}.
+   * Mark the completion (success or error) of an async task. The success or error is judged by the
+   * contents of the {@link OverseerSolrResponse}.
    */
   public void setTaskCompleted(String asyncId, OverseerSolrResponse solrResponse) {
     if (asyncId == null) {
       return;
     }
-    // First update the persistent node with the execution result, only then remove the transient node
+    // First update the persistent node with the execution result, only then remove the transient
+    // node
     // (otherwise a status check might report the task in error)

Review comment:
       Fix this

##########
File path: solr/core/src/java/org/apache/solr/cloud/DistributedApiAsyncTracker.java
##########
@@ -226,15 +244,20 @@ public void cancelAsyncId(String asyncId) {
       return new Pair<>(RequestStatusState.RUNNING, null);
     }
 
-    // The task has failed, but there are two options: if response is null, it has failed because the node on which it was
-    // running has crashed. If it is not null, it has failed because the execution has failed. Because caller expects a non
+    // The task has failed, but there are two options: if response is null, it has failed because
+    // the node on which it was
+    // running has crashed. If it is not null, it has failed because the execution has failed.
+    // Because caller expects a non
     // null response in any case, let's make up one if needed...

Review comment:
       Fix this

##########
File path: solr/core/src/java/org/apache/solr/cloud/ExclusiveSliceProperty.java
##########
@@ -56,12 +56,14 @@
   private final DocCollection collection;
   private final String collectionName;
 
-  // Key structure. For each node, list all replicas on it regardless of whether they have the property or not.
+  // Key structure. For each node, list all replicas on it regardless of whether they have the
+  // property or not.
   private final Map<String, List<SliceReplica>> nodesHostingReplicas = new HashMap<>();
   // Key structure. For each node, a list of the replicas _currently_ hosting the property.
   private final Map<String, List<SliceReplica>> nodesHostingProp = new HashMap<>();
   Set<String> shardsNeedingHosts = new HashSet<>();
-  Map<String, Slice> changedSlices = new HashMap<>(); // Work on copies rather than the underlying cluster state.
+  Map<String, Slice> changedSlices =
+      new HashMap<>(); // Work on copies rather than the underlying cluster state.

Review comment:
       Fix this

##########
File path: solr/core/src/java/org/apache/solr/cloud/ExclusiveSliceProperty.java
##########
@@ -325,7 +369,8 @@ boolean balanceProperty() {
     // So, remove a replica from the nodes that have too many
     removeOverallocatedReplicas();
 
-    // prune replicas belonging to a slice that have the property currently assigned from the list of replicas
+    // prune replicas belonging to a slice that have the property currently assigned from the list
+    // of replicas
     // that could host the property.

Review comment:
       Fix this

##########
File path: solr/core/src/java/org/apache/solr/cloud/DistributedApiAsyncTracker.java
##########
@@ -93,48 +95,57 @@ public DistributedApiAsyncTracker(SolrZkClient zkClient, String rootPath) {
     persistentIdsPath = rootPath + ZK_ASYNC_PERSISTENT;
     inFlightIdsPath = rootPath + ZK_ASYNC_INFLIGHT;
 
-    trackedAsyncTasks = new SizeLimitedDistributedMap(zkClient, persistentIdsPath, maxTrackedTasks, null);
+    trackedAsyncTasks =
+        new SizeLimitedDistributedMap(zkClient, persistentIdsPath, maxTrackedTasks, null);
     inFlightAsyncTasks = new InFlightJobs(zkClient, inFlightIdsPath);
   }
 
   /**
-   * After a successful call to this method, caller MUST eventually call {@link #setTaskCompleted} or {@link #cancelAsyncId}
-   * otherwise the task will forever be considered as in progress.
+   * After a successful call to this method, caller MUST eventually call {@link #setTaskCompleted}
+   * or {@link #cancelAsyncId} otherwise the task will forever be considered as in progress.
+   *
    * @param asyncId if {@code null} this method will do nothing.
-   * @return {@code true} if the asyncId was not already in use (or is {@code null}) and {@code false} if it is already
-   * in use and can't be allocated again.
+   * @return {@code true} if the asyncId was not already in use (or is {@code null}) and {@code
+   *     false} if it is already in use and can't be allocated again.
    */
   public boolean createNewAsyncJobTracker(String asyncId) {
     if (asyncId == null) {
       return true;
     }
     try {
-      // First create the persistent node, with no content. If that fails, it means the asyncId has been previously used
+      // First create the persistent node, with no content. If that fails, it means the asyncId has
+      // been previously used
       // and not yet cleared...

Review comment:
       Fix this

##########
File path: solr/core/src/java/org/apache/solr/cloud/ExclusiveSliceProperty.java
##########
@@ -208,7 +234,8 @@ private void balanceUnassignedReplicas() {
       SliceReplica srToChange = null;
       for (String slice : shardsNeedingHosts) {
         for (Map.Entry<String, List<SliceReplica>> ent : nodesHostingReplicas.entrySet()) {
-          // A little tricky. If we don't set this to something below, then it means all possible places to
+          // A little tricky. If we don't set this to something below, then it means all possible
+          // places to
           // put this property are full up, so just put it somewhere.

Review comment:
       Fix this

##########
File path: solr/core/src/java/org/apache/solr/cloud/ExclusiveSliceProperty.java
##########
@@ -157,24 +180,28 @@ private boolean collectCurrentPropStats() {
     }
 
     // If the total number of already-hosted properties assigned to nodes
-    // that have potential to host leaders is equal to the slice count _AND_ none of the current nodes has more than
+    // that have potential to host leaders is equal to the slice count _AND_ none of the current
+    // nodes has more than
     // the max number of properties, there's nothing to do.

Review comment:
       Fix this

##########
File path: solr/core/src/java/org/apache/solr/cloud/ExclusiveSliceProperty.java
##########
@@ -112,11 +128,14 @@ private boolean isActive(Replica replica) {
     return replica.getState() == Replica.State.ACTIVE;
   }
 
-  // Collect a list of all the nodes that _can_ host the indicated property. Along the way, also collect any of
-  // the replicas on that node that _already_ host the property as well as any slices that do _not_ have the
+  // Collect a list of all the nodes that _can_ host the indicated property. Along the way, also
+  // collect any of
+  // the replicas on that node that _already_ host the property as well as any slices that do _not_
+  // have the
   // property hosted.
   //
-  // Return true if anything node needs it's property reassigned. False if the property is already balanced for
+  // Return true if anything node needs it's property reassigned. False if the property is already
+  // balanced for
   // the collection.

Review comment:
       Fix this

##########
File path: solr/core/src/java/org/apache/solr/cloud/OverseerTaskProcessor.java
##########
@@ -208,11 +217,17 @@ public void run() {
 
     // TODO: Make maxThreads configurable.
 
-    this.tpe = new ExecutorUtil.MDCAwareThreadPoolExecutor(5, MAX_PARALLEL_TASKS, 0L, TimeUnit.MILLISECONDS,
-        new SynchronousQueue<>(),
-        new SolrNamedThreadFactory("OverseerThreadFactory"));
-
-    // In OverseerCollectionMessageHandler, a new Session needs to be created for each new iteration over the tasks in the
+    this.tpe =
+        new ExecutorUtil.MDCAwareThreadPoolExecutor(
+            5,
+            MAX_PARALLEL_TASKS,
+            0L,
+            TimeUnit.MILLISECONDS,
+            new SynchronousQueue<>(),
+            new SolrNamedThreadFactory("OverseerThreadFactory"));
+
+    // In OverseerCollectionMessageHandler, a new Session needs to be created for each new iteration
+    // over the tasks in the
     // queue. Incrementing this id causes a new session to be created there.

Review comment:
       Fix this

##########
File path: solr/core/src/java/org/apache/solr/cloud/OverseerTaskProcessor.java
##########
@@ -169,20 +173,24 @@ public void run() {
     LeaderStatus isLeader = amILeader();
     while (isLeader == LeaderStatus.DONT_KNOW) {
       log.debug("am_i_leader unclear {}", isLeader);
-      isLeader = amILeader();  // not a no, not a yes, try ask again
+      isLeader = amILeader(); // not a no, not a yes, try ask again
     }
 
     String oldestItemInWorkQueue = null;
-    // hasLeftOverItems - used for avoiding re-execution of async tasks that were processed by a previous Overseer.
-    // This variable is set in case there's any task found on the workQueue when the OCP starts up and
-    // the id for the queue tail is used as a marker to check for the task in completed/failed map in zk.
+    // hasLeftOverItems - used for avoiding re-execution of async tasks that were processed by a
+    // previous Overseer.
+    // This variable is set in case there's any task found on the workQueue when the OCP starts up
+    // and
+    // the id for the queue tail is used as a marker to check for the task in completed/failed map
+    // in zk.
     // Beyond the marker, all tasks can safely be assumed to have never been executed.

Review comment:
       Fix this

##########
File path: solr/core/src/java/org/apache/solr/cloud/OverseerTaskProcessor.java
##########
@@ -169,20 +173,24 @@ public void run() {
     LeaderStatus isLeader = amILeader();
     while (isLeader == LeaderStatus.DONT_KNOW) {
       log.debug("am_i_leader unclear {}", isLeader);
-      isLeader = amILeader();  // not a no, not a yes, try ask again
+      isLeader = amILeader(); // not a no, not a yes, try ask again
     }
 
     String oldestItemInWorkQueue = null;
-    // hasLeftOverItems - used for avoiding re-execution of async tasks that were processed by a previous Overseer.
-    // This variable is set in case there's any task found on the workQueue when the OCP starts up and
-    // the id for the queue tail is used as a marker to check for the task in completed/failed map in zk.
+    // hasLeftOverItems - used for avoiding re-execution of async tasks that were processed by a
+    // previous Overseer.
+    // This variable is set in case there's any task found on the workQueue when the OCP starts up
+    // and
+    // the id for the queue tail is used as a marker to check for the task in completed/failed map
+    // in zk.
     // Beyond the marker, all tasks can safely be assumed to have never been executed.
     boolean hasLeftOverItems = true;
 
     try {
       oldestItemInWorkQueue = workQueue.getTailId();
     } catch (KeeperException e) {
-      // We don't need to handle this. This is just a fail-safe which comes in handy in skipping already processed
+      // We don't need to handle this. This is just a fail-safe which comes in handy in skipping
+      // already processed
       // async calls.

Review comment:
       Fix this

##########
File path: solr/core/src/java/org/apache/solr/cloud/RecoveryStrategy.java
##########
@@ -332,27 +336,33 @@ final public void doRecovery(SolrCore core) throws Exception {
     }
   }
 
-  final private void doReplicateOnlyRecovery(SolrCore core) throws InterruptedException {
+  private final void doReplicateOnlyRecovery(SolrCore core) throws InterruptedException {
     final RTimer timer = new RTimer();
     boolean successfulRecovery = false;
 
     // if (core.getUpdateHandler().getUpdateLog() != null) {
-    // SolrException.log(log, "'replicate-only' recovery strategy should only be used if no update logs are present, but
+    // SolrException.log(log, "'replicate-only' recovery strategy should only be used if no update
+    // logs are present, but
     // this core has one: "
     // + core.getUpdateHandler().getUpdateLog());
     // return;
     // }
-    while (!successfulRecovery && !Thread.currentThread().isInterrupted() && !isClosed()) { // don't use interruption or
-                                                                                            // it will close channels
-                                                                                            // though
+    while (!successfulRecovery
+        && !Thread.currentThread().isInterrupted()
+        && !isClosed()) { // don't use interruption or
+      // it will close channels
+      // though

Review comment:
       Fix this

##########
File path: solr/core/src/java/org/apache/solr/cloud/OverseerTaskProcessor.java
##########
@@ -238,41 +256,50 @@ public void run() {
 
           while (runningTasks.size() > MAX_PARALLEL_TASKS) {
             synchronized (waitLock) {
-              waitLock.wait(100);//wait for 100 ms or till a task is complete
+              waitLock.wait(100); // wait for 100 ms or till a task is complete
             }
             waited = true;
           }
 
-          if (waited)
-            cleanUpWorkQueue();
+          if (waited) cleanUpWorkQueue();
 
           ArrayList<QueueEvent> heads = new ArrayList<>(blockedTasks.size() + MAX_PARALLEL_TASKS);
           heads.addAll(blockedTasks.values());
 
-          //If we have enough items in the blocked tasks already, it makes
+          // If we have enough items in the blocked tasks already, it makes
           // no sense to read more items from the work queue. it makes sense
           // to clear out at least a few items in the queue before we read more items
           if (heads.size() < MAX_BLOCKED_TASKS) {
-            //instead of reading MAX_PARALLEL_TASKS items always, we should only fetch as much as we can execute
-            int toFetch = Math.min(MAX_BLOCKED_TASKS - heads.size(), MAX_PARALLEL_TASKS - runningTasks.size());
+            // instead of reading MAX_PARALLEL_TASKS items always, we should only fetch as much as
+            // we can execute
+            int toFetch =
+                Math.min(
+                    MAX_BLOCKED_TASKS - heads.size(), MAX_PARALLEL_TASKS - runningTasks.size());
             List<QueueEvent> newTasks = workQueue.peekTopN(toFetch, excludedTasks, 2000L);
             if (log.isDebugEnabled()) {
               log.debug("Got {} tasks from work-queue : [{}]", newTasks.size(), newTasks);
             }
             // heads has at most MAX_BLOCKED_TASKS tasks.
             heads.addAll(newTasks);
           } else {
-            // The sleep below slows down spinning when heads is full from previous work dispatch attempt below and no new
-            // tasks got executed (all executors are busy or all waiting tasks require locks currently held by executors).
+            // The sleep below slows down spinning when heads is full from previous work dispatch
+            // attempt below and no new
+            // tasks got executed (all executors are busy or all waiting tasks require locks
+            // currently held by executors).
             //
-            // When heads is not full but no progress was made (no new work got dispatched in the for loop below), slowing down
+            // When heads is not full but no progress was made (no new work got dispatched in the
+            // for loop below), slowing down
             // of the spinning is done by the wait time in the call to workQueue.peekTopN() above.
-            // (at least in theory because the method eventually called from there is ZkDistributedQueue.peekElements()
-            // and because it filters out entries that have just completed on a Runner thread in a different way than the
-            // predicate based filtering, it can return quickly without waiting the configured delay time. Therefore spinning
+            // (at least in theory because the method eventually called from there is
+            // ZkDistributedQueue.peekElements()
+            // and because it filters out entries that have just completed on a Runner thread in a
+            // different way than the
+            // predicate based filtering, it can return quickly without waiting the configured delay
+            // time. Therefore spinning
             // can be observed, likely something to clean up at some point).
             //
-            // If heads is not empty and new tasks appeared in the queue there's no delay, workQueue.peekTopN() above will
+            // If heads is not empty and new tasks appeared in the queue there's no delay,
+            // workQueue.peekTopN() above will
             // return immediately.

Review comment:
       Fix this

##########
File path: solr/core/src/java/org/apache/solr/cloud/ExclusiveSliceProperty.java
##########
@@ -157,24 +180,28 @@ private boolean collectCurrentPropStats() {
     }
 
     // If the total number of already-hosted properties assigned to nodes
-    // that have potential to host leaders is equal to the slice count _AND_ none of the current nodes has more than
+    // that have potential to host leaders is equal to the slice count _AND_ none of the current
+    // nodes has more than
     // the max number of properties, there's nothing to do.
     origMaxPropPerNode = collection.getSlices().size() / allHosts.size();
 
     // Some nodes can have one more of the proeprty if the numbers aren't exactly even.
     origModulo = collection.getSlices().size() % allHosts.size();
     if (origModulo > 0) {
-      origMaxPropPerNode++;  // have to have some nodes with 1 more property.
+      origMaxPropPerNode++; // have to have some nodes with 1 more property.
     }
 
-    // We can say for sure that we need to rebalance if we don't have as many assigned properties as slices.
+    // We can say for sure that we need to rebalance if we don't have as many assigned properties as
+    // slices.
     if (assigned != collection.getSlices().size()) {
       return true;
     }
 
     // Make sure there are no more slices at the limit than the "leftovers"
-    // Let's say there's 7 slices and 3 nodes. We need to distribute the property as 3 on node1, 2 on node2 and 2 on node3
-    // (3, 2, 2) We need to be careful to not distribute them as 3, 3, 1. that's what this check is all about.
+    // Let's say there's 7 slices and 3 nodes. We need to distribute the property as 3 on node1, 2
+    // on node2 and 2 on node3
+    // (3, 2, 2) We need to be careful to not distribute them as 3, 3, 1. that's what this check is
+    // all about.

Review comment:
       Fix this

##########
File path: solr/core/src/java/org/apache/solr/cloud/ReplicateFromLeader.java
##########
@@ -80,29 +81,36 @@ public void startReplication(boolean switchTransactionLog) {
         pollIntervalStr = "00:00:01";
       }
       if (uinfo.autoCommmitMaxTime != -1) {
-        pollIntervalStr = toPollIntervalStr(uinfo.autoCommmitMaxTime/2);
+        pollIntervalStr = toPollIntervalStr(uinfo.autoCommmitMaxTime / 2);
       } else if (uinfo.autoSoftCommmitMaxTime != -1) {
-        pollIntervalStr = toPollIntervalStr(uinfo.autoSoftCommmitMaxTime/2);
+        pollIntervalStr = toPollIntervalStr(uinfo.autoSoftCommmitMaxTime / 2);
       }
-      log.info("Will start replication from leader with poll interval: {}", pollIntervalStr );
+      log.info("Will start replication from leader with poll interval: {}", pollIntervalStr);
 
       NamedList<Object> followerConfig = new NamedList<>();
       followerConfig.add("fetchFromLeader", Boolean.TRUE);
 
-      // don't commit on leader version zero for PULL replicas as PULL should only get its index state from leader
+      // don't commit on leader version zero for PULL replicas as PULL should only get its index
+      // state from leader
       boolean skipCommitOnLeaderVersionZero = switchTransactionLog;
       if (!skipCommitOnLeaderVersionZero) {
         CloudDescriptor cloudDescriptor = core.getCoreDescriptor().getCloudDescriptor();
         if (cloudDescriptor != null) {
           Replica replica =
-              cc.getZkController().getZkStateReader().getCollection(cloudDescriptor.getCollectionName())
-                  .getSlice(cloudDescriptor.getShardId()).getReplica(cloudDescriptor.getCoreNodeName());
+              cc.getZkController()
+                  .getZkStateReader()
+                  .getCollection(cloudDescriptor.getCollectionName())
+                  .getSlice(cloudDescriptor.getShardId())
+                  .getReplica(cloudDescriptor.getCoreNodeName());
           if (replica != null && replica.getType() == Replica.Type.PULL) {
-            skipCommitOnLeaderVersionZero = true; // only set this to true if we're a PULL replica, otherwise use value of switchTransactionLog
+            skipCommitOnLeaderVersionZero =
+                true; // only set this to true if we're a PULL replica, otherwise use value of
+            // switchTransactionLog

Review comment:
       Fix this

##########
File path: solr/core/src/java/org/apache/solr/api/ApiBag.java
##########
@@ -115,7 +122,8 @@ protected void attachValueToNode(PathTrie<Api>.Node node, Api o) {
         return;
       }
 
-      // If 'o' and 'node.obj' aren't both AnnotatedApi's then we can't aggregate the commands, so fallback to the
+      // If 'o' and 'node.obj' aren't both AnnotatedApi's then we can't aggregate the commands, so
+      // fallback to the
       // default behavior

Review comment:
       Fix this

##########
File path: solr/core/src/java/org/apache/solr/cloud/RecoveryStrategy.java
##########
@@ -273,25 +276,23 @@ final private void replicate(String nodeName, SolrCore core, ZkNodeProps leaderp
         log.debug("Error in solrcloud_debug block", e);
       }
     }
-
   }
 
-  final private void commitOnLeader(String leaderUrl) throws SolrServerException,
-      IOException {
+  private final void commitOnLeader(String leaderUrl) throws SolrServerException, IOException {
     try (HttpSolrClient client = recoverySolrClientBuilder(leaderUrl).build()) {
       UpdateRequest ureq = new UpdateRequest();
       ureq.setParams(new ModifiableSolrParams());
       // ureq.getParams().set(DistributedUpdateProcessor.COMMIT_END_POINT, true);
-      // ureq.getParams().set(UpdateParams.OPEN_SEARCHER, onlyLeaderIndexes);// Why do we need to open searcher if
+      // ureq.getParams().set(UpdateParams.OPEN_SEARCHER, onlyLeaderIndexes);// Why do we need to
+      // open searcher if
       // "onlyLeaderIndexes"?

Review comment:
       Fix this

##########
File path: solr/core/src/java/org/apache/solr/cloud/ShardLeaderElectionContextBase.java
##########
@@ -89,10 +95,15 @@ public void cancelElection() throws InterruptedException, KeeperException {
         // no problem
         try {
           // We need to be careful and make sure we *only* delete our own leader registration node.
-          // We do this by using a multi and ensuring the parent znode of the leader registration node
-          // matches the version we expect - there is a setData call that increments the parent's znode
+          // We do this by using a multi and ensuring the parent znode of the leader registration
+          // node
+          // matches the version we expect - there is a setData call that increments the parent's
+          // znode
           // version whenever a leader registers.

Review comment:
       Fix this

##########
File path: solr/core/src/java/org/apache/solr/cloud/RecoveryStrategy.java
##########
@@ -541,15 +563,17 @@ public final void doSyncOrReplicateRecovery(SolrCore core) throws Exception {
     }
 
     if (recoveringAfterStartup) {
-      // if we're recovering after startup (i.e. we have been down), then we need to know what the last versions were
+      // if we're recovering after startup (i.e. we have been down), then we need to know what the
+      // last versions were
       // when we went down. We may have received updates since then.

Review comment:
       Fix this

##########
File path: solr/core/src/java/org/apache/solr/cloud/RecoveryStrategy.java
##########
@@ -332,27 +336,33 @@ final public void doRecovery(SolrCore core) throws Exception {
     }
   }
 
-  final private void doReplicateOnlyRecovery(SolrCore core) throws InterruptedException {
+  private final void doReplicateOnlyRecovery(SolrCore core) throws InterruptedException {
     final RTimer timer = new RTimer();
     boolean successfulRecovery = false;
 
     // if (core.getUpdateHandler().getUpdateLog() != null) {
-    // SolrException.log(log, "'replicate-only' recovery strategy should only be used if no update logs are present, but
+    // SolrException.log(log, "'replicate-only' recovery strategy should only be used if no update
+    // logs are present, but
     // this core has one: "
     // + core.getUpdateHandler().getUpdateLog());
     // return;
     // }
-    while (!successfulRecovery && !Thread.currentThread().isInterrupted() && !isClosed()) { // don't use interruption or
-                                                                                            // it will close channels
-                                                                                            // though
+    while (!successfulRecovery
+        && !Thread.currentThread().isInterrupted()
+        && !isClosed()) { // don't use interruption or
+      // it will close channels
+      // though
       try {
         CloudDescriptor cloudDesc = this.coreDescriptor.getCloudDescriptor();
-        ZkNodeProps leaderprops = zkStateReader.getLeaderRetry(cloudDesc.getCollectionName(), cloudDesc.getShardId());
+        ZkNodeProps leaderprops =
+            zkStateReader.getLeaderRetry(cloudDesc.getCollectionName(), cloudDesc.getShardId());
         final String leaderUrl = ZkCoreNodeProps.getCoreUrl(leaderprops);
         final String ourUrl = ZkCoreNodeProps.getCoreUrl(baseUrl, coreName);
 
-        boolean isLeader = ourUrl.equals(leaderUrl); // TODO: We can probably delete most of this code if we say this
-                                                     // strategy can only be used for pull replicas
+        boolean isLeader =
+            ourUrl.equals(
+                leaderUrl); // TODO: We can probably delete most of this code if we say this
+        // strategy can only be used for pull replicas

Review comment:
       Fix this

##########
File path: solr/core/src/java/org/apache/solr/cloud/Overseer.java
##########
@@ -310,30 +336,44 @@ else if (LeaderStatus.YES != isLeader) {
                 byte[] data = head.second();
                 final ZkNodeProps message = ZkNodeProps.load(data);
                 if (log.isDebugEnabled()) {
-                  log.debug("processMessage: queueSize: {}, message = {}", stateUpdateQueue.getZkStats().getQueueLength(), message);
+                  log.debug(
+                      "processMessage: queueSize: {}, message = {}",
+                      stateUpdateQueue.getZkStats().getQueueLength(),
+                      message);
                 }
 
                 processedNodes.add(head.first());
                 fallbackQueueSize = processedNodes.size();
-                // force flush to ZK after each message because there is no fallback if workQueue items
+                // force flush to ZK after each message because there is no fallback if workQueue
+                // items
                 // are removed from workQueue but fail to be written to ZK
                 while (unprocessedMessages.size() > 0) {
                   clusterState = zkStateWriter.writePendingUpdates();
                   Message m = unprocessedMessages.remove(0);
                   clusterState = m.run(clusterState, Overseer.this);
                 }
                 // The callback always be called on this thread
-                clusterState = processQueueItem(message, clusterState, zkStateWriter, true, () -> {
-                  stateUpdateQueue.remove(processedNodes);
-                  processedNodes.clear();
-                });
+                clusterState =
+                    processQueueItem(
+                        message,
+                        clusterState,
+                        zkStateWriter,
+                        true,
+                        () -> {
+                          stateUpdateQueue.remove(processedNodes);
+                          processedNodes.clear();
+                        });
               }
               if (isClosed) break;
               // if an event comes in the next 100ms batch it together
-              queue = new LinkedList<>(stateUpdateQueue.peekElements(1000, 100, node -> !processedNodes.contains(node)));
+              queue =
+                  new LinkedList<>(
+                      stateUpdateQueue.peekElements(
+                          1000, 100, node -> !processedNodes.contains(node)));
             }
             fallbackQueueSize = processedNodes.size();
-            // we should force write all pending updates because the next iteration might sleep until there
+            // we should force write all pending updates because the next iteration might sleep
+            // until there
             // are more items in the main queue

Review comment:
       Fix this

##########
File path: solr/core/src/java/org/apache/solr/cloud/Overseer.java
##########
@@ -627,49 +707,73 @@ public Overseer(HttpShardHandler shardHandler,
     this.zkController = zkController;
     this.stats = new Stats();
     this.config = config;
-    this.distributedClusterStateUpdater = new DistributedClusterStateUpdater(config.getDistributedClusterStateUpdates());
-
-    this.solrMetricsContext = new SolrMetricsContext(zkController.getCoreContainer().getMetricManager(), SolrInfoBean.Group.overseer.toString(), metricTag);
+    this.distributedClusterStateUpdater =
+        new DistributedClusterStateUpdater(config.getDistributedClusterStateUpdates());
+
+    this.solrMetricsContext =
+        new SolrMetricsContext(
+            zkController.getCoreContainer().getMetricManager(),
+            SolrInfoBean.Group.overseer.toString(),
+            metricTag);
   }
 
   public synchronized void start(String id) {
-    MDCLoggingContext.setNode(zkController == null ?
-        null :
-        zkController.getNodeName());
+    MDCLoggingContext.setNode(zkController == null ? null : zkController.getNodeName());
     this.id = id;
     closed = false;
     doClose();
     stats = new Stats();
     log.info("Overseer (id={}) starting", id);
     createOverseerNode(reader.getZkClient());
-    //launch cluster state updater thread
+    // launch cluster state updater thread
     ThreadGroup tg = new ThreadGroup("Overseer state updater.");
-    updaterThread = new OverseerThread(tg, new ClusterStateUpdater(reader, id, stats), "OverseerStateUpdate-" + id);
+    updaterThread =
+        new OverseerThread(
+            tg, new ClusterStateUpdater(reader, id, stats), "OverseerStateUpdate-" + id);
     updaterThread.setDaemon(true);
 
     ThreadGroup ccTg = new ThreadGroup("Overseer collection creation process.");
 
-    // Below is the only non test usage of the "cluster state update" queue even when distributed cluster state updates are enabled.
-    // That queue is used to tell the Overseer to quit. As long as we have an Overseer, we need to support this.
-    OverseerNodePrioritizer overseerPrioritizer = new OverseerNodePrioritizer(reader, this, adminPath, shardHandler.getShardHandlerFactory());
-    overseerCollectionConfigSetProcessor = new OverseerCollectionConfigSetProcessor(reader, id, shardHandler, adminPath, stats, Overseer.this, overseerPrioritizer, solrMetricsContext);
-    ccThread = new OverseerThread(ccTg, overseerCollectionConfigSetProcessor, "OverseerCollectionConfigSetProcessor-" + id);
+    // Below is the only non test usage of the "cluster state update" queue even when distributed
+    // cluster state updates are enabled.
+    // That queue is used to tell the Overseer to quit. As long as we have an Overseer, we need to
+    // support this.
+    OverseerNodePrioritizer overseerPrioritizer =

Review comment:
       Fix this

##########
File path: solr/core/src/java/org/apache/solr/cloud/DistributedClusterStateUpdater.java
##########
@@ -365,27 +411,37 @@ private void applyUpdate() throws KeeperException, InterruptedException {
 
       // Note we DO NOT track nor use the live nodes in the cluster state.
       // That may means the two abstractions (collection metadata vs. nodes) should be separated.
-      // For now trying to diverge as little as possible from existing data structures and code given the need to
-      // support both the old way (Overseer) and new way (distributed) of handling cluster state update.
+      // For now trying to diverge as little as possible from existing data structures and code
+      // given the need to
+      // support both the old way (Overseer) and new way (distributed) of handling cluster state
+      // update.

Review comment:
       Fix this

##########
File path: solr/core/src/java/org/apache/solr/cloud/Overseer.java
##########
@@ -1047,19 +1172,26 @@ private void createOverseerNode(final SolrZkClient zkClient) {
       throw new RuntimeException(e);
     }
   }
-  
+
   public ZkStateReader getZkStateReader() {
     return reader;
   }
 
   public void offerStateUpdate(byte[] data) throws KeeperException, InterruptedException {
-    // When cluster state update is distributed, the Overseer cluster state update queue should only ever receive QUIT messages.
+    // When cluster state update is distributed, the Overseer cluster state update queue should only
+    // ever receive QUIT messages.
     // These go to sendQuitToOverseer for execution path clarity.

Review comment:
       Fix this

##########
File path: solr/core/src/java/org/apache/solr/cloud/DistributedClusterStateUpdater.java
##########
@@ -365,27 +411,37 @@ private void applyUpdate() throws KeeperException, InterruptedException {
 
       // Note we DO NOT track nor use the live nodes in the cluster state.
       // That may means the two abstractions (collection metadata vs. nodes) should be separated.
-      // For now trying to diverge as little as possible from existing data structures and code given the need to
-      // support both the old way (Overseer) and new way (distributed) of handling cluster state update.
+      // For now trying to diverge as little as possible from existing data structures and code
+      // given the need to
+      // support both the old way (Overseer) and new way (distributed) of handling cluster state
+      // update.
       final Set<String> liveNodes = Collections.emptySet();
 
-      // Per Replica States updates are done before all other updates and not subject to the number of attempts of CAS
-      // made here, given they have their own CAS strategy and implementation (see PerReplicaStatesOps.persist()).
+      // Per Replica States updates are done before all other updates and not subject to the number
+      // of attempts of CAS
+      // made here, given they have their own CAS strategy and implementation (see
+      // PerReplicaStatesOps.persist()).
       boolean firstAttempt = true;
 
-      // When there are multiple retries of state.json write and the cluster state gets updated over and over again with
+      // When there are multiple retries of state.json write and the cluster state gets updated over
+      // and over again with
       // the changes done in the per replica states, we avoid refetching those multiple times.
       PerReplicaStates fetchedPerReplicaStates = null;
 
-      // Later on (when Collection API commands are distributed) we will have to rely on the version of state.json
-      // to implement the replacement of Collection API locking. Then we should not blindly retry cluster state updates
-      // as we do here but instead intelligently fail (or retry completely) the Collection API call when seeing that
+      // Later on (when Collection API commands are distributed) we will have to rely on the version
+      // of state.json
+      // to implement the replacement of Collection API locking. Then we should not blindly retry
+      // cluster state updates
+      // as we do here but instead intelligently fail (or retry completely) the Collection API call
+      // when seeing that
       // state.json was changed by a concurrent command execution.
-      // The loop below is ok for distributing cluster state updates from Overseer to all nodes while Collection API
+      // The loop below is ok for distributing cluster state updates from Overseer to all nodes
+      // while Collection API
       // commands are still executed on the Overseer and manage their locking the old fashioned way.

Review comment:
       Fix this

##########
File path: solr/core/src/java/org/apache/solr/cloud/DistributedClusterStateUpdater.java
##########
@@ -394,9 +450,12 @@ private void applyUpdate() throws KeeperException, InterruptedException {
           initialClusterState = fetchStateForCollection();
         }
 
-        // Apply the desired changes. Note that the cluster state passed to the chain of mutators is totally up to date
-        // (it's read from ZK just above). So assumptions made in the mutators (like SliceMutator.removeReplica() deleting
-        // the whole collection if it's not found) are ok. Actually in the removeReplica case, the collection will always
+        // Apply the desired changes. Note that the cluster state passed to the chain of mutators is
+        // totally up to date
+        // (it's read from ZK just above). So assumptions made in the mutators (like
+        // SliceMutator.removeReplica() deleting
+        // the whole collection if it's not found) are ok. Actually in the removeReplica case, the
+        // collection will always
         // exist otherwise the call to fetchStateForCollection() above would have failed.

Review comment:
       Fix this

##########
File path: solr/core/src/java/org/apache/solr/cloud/DistributedClusterStateUpdater.java
##########
@@ -365,27 +411,37 @@ private void applyUpdate() throws KeeperException, InterruptedException {
 
       // Note we DO NOT track nor use the live nodes in the cluster state.
       // That may means the two abstractions (collection metadata vs. nodes) should be separated.
-      // For now trying to diverge as little as possible from existing data structures and code given the need to
-      // support both the old way (Overseer) and new way (distributed) of handling cluster state update.
+      // For now trying to diverge as little as possible from existing data structures and code
+      // given the need to
+      // support both the old way (Overseer) and new way (distributed) of handling cluster state
+      // update.
       final Set<String> liveNodes = Collections.emptySet();
 
-      // Per Replica States updates are done before all other updates and not subject to the number of attempts of CAS
-      // made here, given they have their own CAS strategy and implementation (see PerReplicaStatesOps.persist()).
+      // Per Replica States updates are done before all other updates and not subject to the number
+      // of attempts of CAS
+      // made here, given they have their own CAS strategy and implementation (see
+      // PerReplicaStatesOps.persist()).
       boolean firstAttempt = true;
 
-      // When there are multiple retries of state.json write and the cluster state gets updated over and over again with
+      // When there are multiple retries of state.json write and the cluster state gets updated over
+      // and over again with
       // the changes done in the per replica states, we avoid refetching those multiple times.

Review comment:
       Fix this

##########
File path: solr/core/src/java/org/apache/solr/cloud/DistributedClusterStateUpdater.java
##########
@@ -441,58 +509,83 @@ private void applyUpdate() throws KeeperException, InterruptedException {
           return; // state.json updated successfully.
         } catch (KeeperException.BadVersionException bve) {
           if (updater.isCollectionCreation()) {
-            // Not expecting to see this exception when creating new state.json fails, so throwing it up the food chain.
+            // Not expecting to see this exception when creating new state.json fails, so throwing
+            // it up the food chain.
             throw bve;
           }
         }
-        // We've tried to update an existing state.json and got a BadVersionException. We'll try again a few times.
-        // When only two threads compete, no point in waiting: if we lost this time we'll get it next time right away.
-        // But if more threads compete, then waiting a bit (random delay) can improve our chances. The delay should in
-        // theory grow as the number of concurrent threads attempting updates increase, but we don't know that number, so
+        // We've tried to update an existing state.json and got a BadVersionException. We'll try
+        // again a few times.
+        // When only two threads compete, no point in waiting: if we lost this time we'll get it
+        // next time right away.
+        // But if more threads compete, then waiting a bit (random delay) can improve our chances.
+        // The delay should in
+        // theory grow as the number of concurrent threads attempting updates increase, but we don't
+        // know that number, so
         // doing exponential backoff instead.
-        // With "per replica states" collections, concurrent attempts of even just two threads are expected to be extremely rare.

Review comment:
       Fix this

##########
File path: solr/core/src/java/org/apache/solr/cloud/DistributedClusterStateUpdater.java
##########
@@ -441,58 +509,83 @@ private void applyUpdate() throws KeeperException, InterruptedException {
           return; // state.json updated successfully.
         } catch (KeeperException.BadVersionException bve) {
           if (updater.isCollectionCreation()) {
-            // Not expecting to see this exception when creating new state.json fails, so throwing it up the food chain.
+            // Not expecting to see this exception when creating new state.json fails, so throwing
+            // it up the food chain.
             throw bve;
           }
         }
-        // We've tried to update an existing state.json and got a BadVersionException. We'll try again a few times.
-        // When only two threads compete, no point in waiting: if we lost this time we'll get it next time right away.
-        // But if more threads compete, then waiting a bit (random delay) can improve our chances. The delay should in
-        // theory grow as the number of concurrent threads attempting updates increase, but we don't know that number, so
+        // We've tried to update an existing state.json and got a BadVersionException. We'll try
+        // again a few times.
+        // When only two threads compete, no point in waiting: if we lost this time we'll get it
+        // next time right away.
+        // But if more threads compete, then waiting a bit (random delay) can improve our chances.
+        // The delay should in
+        // theory grow as the number of concurrent threads attempting updates increase, but we don't
+        // know that number, so
         // doing exponential backoff instead.
-        // With "per replica states" collections, concurrent attempts of even just two threads are expected to be extremely rare.
-        Thread.sleep(CollectionHandlingUtils.RANDOM.nextInt(attempt < 13 ? 1 << attempt : 1 << 13)); // max wait 2^13ms=8.192 sec
+        // With "per replica states" collections, concurrent attempts of even just two threads are
+        // expected to be extremely rare.
+        Thread.sleep(
+            CollectionHandlingUtils.RANDOM.nextInt(
+                attempt < 13 ? 1 << attempt : 1 << 13)); // max wait 2^13ms=8.192 sec
       }
 
-      // We made quite a few attempts but failed repeatedly. This is pretty bad but we can't loop trying forever.
-      // Offering a job to the Overseer wouldn't usually fail if the ZK queue can be written to (but the Overseer can then
+      // We made quite a few attempts but failed repeatedly. This is pretty bad but we can't loop
+      // trying forever.
+      // Offering a job to the Overseer wouldn't usually fail if the ZK queue can be written to (but
+      // the Overseer can then
       // loop forever attempting the update).
-      // We do want whoever called us to fail right away rather than to wait for a cluster change and timeout because it
-      // didn't happen. Likely need to review call by call what is the appropriate behaviour, especially once Collection
-      // API is distributed (because then the Collection API call will fail if the underlying cluster state update cannot
+      // We do want whoever called us to fail right away rather than to wait for a cluster change
+      // and timeout because it
+      // didn't happen. Likely need to review call by call what is the appropriate behaviour,
+      // especially once Collection
+      // API is distributed (because then the Collection API call will fail if the underlying
+      // cluster state update cannot
       // be done, and that's a desirable thing).

Review comment:
       Fix this

##########
File path: solr/core/src/java/org/apache/solr/cloud/DistributedClusterStateUpdater.java
##########
@@ -681,11 +820,17 @@ public void computeUpdates(ClusterState clusterState, SolrZkClient client) {
               perReplicaStateOps.add(zkcmd.ops);
             }
           } catch (Exception e) {
-            // Seems weird to skip rather than fail, but that's what Overseer is doing (see ClusterStateUpdater.processQueueItem()).
-            // Maybe in the new distributed update world we should make the caller fail? (something Overseer cluster state updater can't do)
-            // To be reconsidered once Collection API commands are distributed because then cluster updates are done synchronously and
+            // Seems weird to skip rather than fail, but that's what Overseer is doing (see
+            // ClusterStateUpdater.processQueueItem()).
+            // Maybe in the new distributed update world we should make the caller fail? (something
+            // Overseer cluster state updater can't do)
+            // To be reconsidered once Collection API commands are distributed because then cluster
+            // updates are done synchronously and
             // have the opportunity to make the Collection API call fail directly.

Review comment:
       Fix this

##########
File path: solr/core/src/java/org/apache/solr/cloud/DistributedClusterStateUpdater.java
##########
@@ -747,36 +910,47 @@ public void executeStateUpdates(SolrCloudManager scm, ZkStateReader zkStateReade
     private List<PerReplicaStatesOps> replicaOpsList = null;
 
     /**
-     * Entry point to mark all replicas of all collections present on a single node as being DOWN (because the node is down)
+     * Entry point to mark all replicas of all collections present on a single node as being DOWN
+     * (because the node is down)
      */
     public static void executeNodeDownStateUpdate(String nodeName, ZkStateReader zkStateReader) {
-      // This code does a version of what NodeMutator.downNode() is doing. We can't assume we have a cache of the collections,
-      // so we're going to read all of them from ZK, fetch the state.json for each and if it has any replicas on the
+      // This code does a version of what NodeMutator.downNode() is doing. We can't assume we have a
+      // cache of the collections,
+      // so we're going to read all of them from ZK, fetch the state.json for each and if it has any
+      // replicas on the
       // failed node, do an update (conditional of course) of the state.json
 
-      // For Per Replica States collections there is still a need to read state.json, but the update of state.json is replaced
-      // by a few znode deletions and creations. Might be faster or slower overall, depending on the number of impacted
+      // For Per Replica States collections there is still a need to read state.json, but the update
+      // of state.json is replaced
+      // by a few znode deletions and creations. Might be faster or slower overall, depending on the
+      // number of impacted
       // replicas of such a collection and the total size of that collection's state.json.
 
-      // Note code here also has to duplicate some of the work done in ZkStateReader because ZkStateReader couples reading of
-      // the cluster state and maintaining a cached copy of the cluster state. Something likely to be refactored later (once
+      // Note code here also has to duplicate some of the work done in ZkStateReader because
+      // ZkStateReader couples reading of
+      // the cluster state and maintaining a cached copy of the cluster state. Something likely to
+      // be refactored later (once
       // Overseer is totally removed and Zookeeper access patterns become clearer).

Review comment:
       Fix this

##########
File path: solr/core/src/java/org/apache/solr/cloud/DistributedClusterStateUpdater.java
##########
@@ -441,58 +509,83 @@ private void applyUpdate() throws KeeperException, InterruptedException {
           return; // state.json updated successfully.
         } catch (KeeperException.BadVersionException bve) {
           if (updater.isCollectionCreation()) {
-            // Not expecting to see this exception when creating new state.json fails, so throwing it up the food chain.
+            // Not expecting to see this exception when creating new state.json fails, so throwing
+            // it up the food chain.
             throw bve;
           }
         }
-        // We've tried to update an existing state.json and got a BadVersionException. We'll try again a few times.
-        // When only two threads compete, no point in waiting: if we lost this time we'll get it next time right away.
-        // But if more threads compete, then waiting a bit (random delay) can improve our chances. The delay should in
-        // theory grow as the number of concurrent threads attempting updates increase, but we don't know that number, so
+        // We've tried to update an existing state.json and got a BadVersionException. We'll try
+        // again a few times.
+        // When only two threads compete, no point in waiting: if we lost this time we'll get it
+        // next time right away.
+        // But if more threads compete, then waiting a bit (random delay) can improve our chances.
+        // The delay should in
+        // theory grow as the number of concurrent threads attempting updates increase, but we don't
+        // know that number, so
         // doing exponential backoff instead.
-        // With "per replica states" collections, concurrent attempts of even just two threads are expected to be extremely rare.
-        Thread.sleep(CollectionHandlingUtils.RANDOM.nextInt(attempt < 13 ? 1 << attempt : 1 << 13)); // max wait 2^13ms=8.192 sec
+        // With "per replica states" collections, concurrent attempts of even just two threads are
+        // expected to be extremely rare.
+        Thread.sleep(
+            CollectionHandlingUtils.RANDOM.nextInt(
+                attempt < 13 ? 1 << attempt : 1 << 13)); // max wait 2^13ms=8.192 sec
       }
 
-      // We made quite a few attempts but failed repeatedly. This is pretty bad but we can't loop trying forever.
-      // Offering a job to the Overseer wouldn't usually fail if the ZK queue can be written to (but the Overseer can then
+      // We made quite a few attempts but failed repeatedly. This is pretty bad but we can't loop
+      // trying forever.
+      // Offering a job to the Overseer wouldn't usually fail if the ZK queue can be written to (but
+      // the Overseer can then
       // loop forever attempting the update).
-      // We do want whoever called us to fail right away rather than to wait for a cluster change and timeout because it
-      // didn't happen. Likely need to review call by call what is the appropriate behaviour, especially once Collection
-      // API is distributed (because then the Collection API call will fail if the underlying cluster state update cannot
+      // We do want whoever called us to fail right away rather than to wait for a cluster change
+      // and timeout because it
+      // didn't happen. Likely need to review call by call what is the appropriate behaviour,
+      // especially once Collection
+      // API is distributed (because then the Collection API call will fail if the underlying
+      // cluster state update cannot
       // be done, and that's a desirable thing).
-      throw new KeeperException.BadVersionException(ZkStateReader.getCollectionPath(updater.getCollectionName()));
+      throw new KeeperException.BadVersionException(
+          ZkStateReader.getCollectionPath(updater.getCollectionName()));
     }
 
     /**
-     * After the computing of the new {@link ClusterState} containing all needed updates to the collection based on what the
-     * {@link StateChangeCalculator} computed, this method does an update in ZK to the collection's {@code state.json}. It is the
-     * equivalent of Overseer's {@link ZkStateWriter#writePendingUpdates} (in its actions related to {@code state.json}
-     * as opposed to the per replica states).
-     * <p>
-     * Note that in a similar way to what happens in {@link ZkStateWriter#writePendingUpdates}, collection delete is handled
-     * as a special case. (see comment on {@link DistributedClusterStateUpdater.StateChangeRecorder.RecordedMutationsPlayer}
-     * on why the code has to be duplicated)<p>
+     * After the computing of the new {@link ClusterState} containing all needed updates to the
+     * collection based on what the {@link StateChangeCalculator} computed, this method does an
+     * update in ZK to the collection's {@code state.json}. It is the equivalent of Overseer's
+     * {@link ZkStateWriter#writePendingUpdates} (in its actions related to {@code state.json} as
+     * opposed to the per replica states).
+     *
+     * <p>Note that in a similar way to what happens in {@link ZkStateWriter#writePendingUpdates},
+     * collection delete is handled as a special case. (see comment on {@link
+     * DistributedClusterStateUpdater.StateChangeRecorder.RecordedMutationsPlayer} on why the code
+     * has to be duplicated)
      *
-     * <b>Note for the future:</b> Given this method is where the actually write to ZK is done, that's the place where we
-     * can rebuild a DocCollection with updated zk version. Eventually if we maintain a cache of recently used collections,
-     * we want to capture the updated collection and put it in the cache to avoid reading it again (unless it changed,
-     * the CAS will fail and we will refresh).<p>
+     * <p><b>Note for the future:</b> Given this method is where the actually write to ZK is done,
+     * that's the place where we can rebuild a DocCollection with updated zk version. Eventually if
+     * we maintain a cache of recently used collections, we want to capture the updated collection
+     * and put it in the cache to avoid reading it again (unless it changed, the CAS will fail and
+     * we will refresh).
      *
-     * This could serve as the basis for a strategy where each node does not need any view of all collections in the cluster
-     * but only a cache of recently used collections (possibly not even needing watches on them, but we'll discuss this later).
+     * <p>This could serve as the basis for a strategy where each node does not need any view of all
+     * collections in the cluster but only a cache of recently used collections (possibly not even
+     * needing watches on them, but we'll discuss this later).
      */
-    private void doStateDotJsonCasUpdate(ClusterState updatedState) throws KeeperException, InterruptedException {
+    private void doStateDotJsonCasUpdate(ClusterState updatedState)
+        throws KeeperException, InterruptedException {
       String jsonPath = ZkStateReader.getCollectionPath(updater.getCollectionName());
 
       // Collection delete
       if (!updatedState.hasCollection(updater.getCollectionName())) {
-        // We do not have a collection znode version to test we delete the right version of state.json. But this doesn't really matter:
-        // if we had one, and the delete failed (because state.json got updated in the meantime), we would re-read the collection
-        // state, update our version, run the CAS delete again and it will pass. Which means that one way or another, deletes are final.
-        // I hope nobody deletes a collection then creates a new one with the same name immediately (although the creation should fail
-        // if the znode still exists, so the creation would only succeed after the delete made it, and we're ok).
-        // With Overseer based updates the same behavior can be observed: a collection update is enqueued followed by the
+        // We do not have a collection znode version to test we delete the right version of
+        // state.json. But this doesn't really matter:
+        // if we had one, and the delete failed (because state.json got updated in the meantime), we
+        // would re-read the collection
+        // state, update our version, run the CAS delete again and it will pass. Which means that
+        // one way or another, deletes are final.
+        // I hope nobody deletes a collection then creates a new one with the same name immediately
+        // (although the creation should fail
+        // if the znode still exists, so the creation would only succeed after the delete made it,
+        // and we're ok).
+        // With Overseer based updates the same behavior can be observed: a collection update is
+        // enqueued followed by the
         // collection delete before the update was executed.

Review comment:
       Fix this

##########
File path: solr/core/src/java/org/apache/solr/cloud/ZkDistributedLockFactory.java
##########
@@ -35,11 +35,15 @@
 
   protected DistributedLock doCreateLock(boolean isWriteLock, String lockPath) {
     try {
-      // TODO optimize by first attempting to create the ZkDistributedLock without calling makeLockPath() and only call it
-      //  if the lock creation fails. This will be less costly on high contention (and slightly more on low contention)
+      // TODO optimize by first attempting to create the ZkDistributedLock without calling
+      // makeLockPath() and only call it
+      //  if the lock creation fails. This will be less costly on high contention (and slightly more
+      // on low contention)

Review comment:
       Fix this




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@solr.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@solr.apache.org
For additional commands, e-mail: issues-help@solr.apache.org