Posted to issues@lucene.apache.org by GitBox <gi...@apache.org> on 2020/06/11 17:39:18 UTC

[GitHub] [lucene-solr] madrob commented on a change in pull request #1561: SOLR-14546: OverseerTaskProcessor can process messages out of order

madrob commented on a change in pull request #1561:
URL: https://github.com/apache/lucene-solr/pull/1561#discussion_r438907365



##########
File path: solr/core/src/java/org/apache/solr/cloud/OverseerMessageHandler.java
##########
@@ -50,7 +50,7 @@
   /**Try to provide an exclusive lock for this particular task
    * return null if locking is not possible. If locking is not necessary

Review comment:
       This javadoc includes a sentence fragment; can we complete the thought while we're improving the documentation in this area?

##########
File path: solr/core/src/java/org/apache/solr/cloud/OverseerTaskProcessor.java
##########
@@ -95,16 +95,25 @@
 
   private volatile Stats stats;
 
-  // Set of tasks that have been picked up for processing but not cleaned up from zk work-queue.
-  // It may contain tasks that have completed execution, have been entered into the completed/failed map in zk but not
-  // deleted from the work-queue as that is a batched operation.
+  /**
+   * Set of tasks that have been picked up for processing but not cleaned up from zk work-queue.
+   * It may contain tasks that have completed execution, have been entered into the completed/failed map in zk but not
+   * deleted from the work-queue as that is a batched operation.
+   */
   final private Set<String> runningZKTasks;

Review comment:
       Since there is so much synchronized access to this, should it be a `ConcurrentHashMap.newKeySet()`?
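
       For reference, a minimal sketch of that declaration (the field name is taken from the diff; the initializer is only an illustration of the suggestion, not the PR's code):

import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

class RunningZkTasksSketch {
  // Thread-safe Set view backed by a ConcurrentHashMap: individual add/remove/contains
  // calls need no external synchronization (compound check-then-act sequences still do).
  final private Set<String> runningZKTasks = ConcurrentHashMap.newKeySet();
}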

##########
File path: solr/core/src/java/org/apache/solr/cloud/OverseerTaskProcessor.java
##########
@@ -95,16 +95,25 @@
 
   private volatile Stats stats;
 
-  // Set of tasks that have been picked up for processing but not cleaned up from zk work-queue.
-  // It may contain tasks that have completed execution, have been entered into the completed/failed map in zk but not
-  // deleted from the work-queue as that is a batched operation.
+  /**
+   * Set of tasks that have been picked up for processing but not cleaned up from zk work-queue.
+   * It may contain tasks that have completed execution, have been entered into the completed/failed map in zk but not
+   * deleted from the work-queue as that is a batched operation.
+   */
   final private Set<String> runningZKTasks;
-  // This map may contain tasks which are read from work queue but could not
-  // be executed because they are blocked or the execution queue is full
-  // This is an optimization to ensure that we do not read the same tasks
-  // again and again from ZK.
+
+  /**
+   * This map may contain tasks which are read from work queue but could not
+   * be executed because they are blocked or the execution queue is full
+   * This is an optimization to ensure that we do not read the same tasks
+   * again and again from ZK.
+   */
   final private Map<String, QueueEvent> blockedTasks = Collections.synchronizedMap(new LinkedHashMap<>());
-  final private Predicate<String> excludedTasks = new Predicate<String>() {
+
+  /**
+   * Predicate used to filter out tasks from the Zookeeper queue that should not be returned for processing.
+   */
+  final private Predicate<String> excludedTasks = new Predicate<>() {
     @Override
     public boolean test(String s) {
       return runningTasks.contains(s) || blockedTasks.containsKey(s);

Review comment:
       This reference to runningTasks isn't synchronized. Is that an issue?
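
       A minimal sketch of how the predicate could read both collections without holding a lock, assuming the fields are switched to concurrent collections (the QueueEvent stand-in and the field initializers here are hypothetical, not the PR's code):

import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Predicate;

class ExcludedTasksSketch {
  // Stand-in for OverseerTaskQueue.QueueEvent, only to keep the example self-contained.
  static class QueueEvent {}

  // Both collections are thread-safe, so the predicate can read them lock-free;
  // the result is still only a snapshot and may change right after the call.
  final Set<String> runningTasks = ConcurrentHashMap.newKeySet();
  final Map<String, QueueEvent> blockedTasks = new ConcurrentHashMap<>();

  final Predicate<String> excludedTasks =
      s -> runningTasks.contains(s) || blockedTasks.containsKey(s);
}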

##########
File path: solr/core/src/java/org/apache/solr/cloud/OverseerTaskProcessor.java
##########
@@ -95,16 +95,25 @@
 
   private volatile Stats stats;
 
-  // Set of tasks that have been picked up for processing but not cleaned up from zk work-queue.
-  // It may contain tasks that have completed execution, have been entered into the completed/failed map in zk but not
-  // deleted from the work-queue as that is a batched operation.
+  /**
+   * Set of tasks that have been picked up for processing but not cleaned up from zk work-queue.
+   * It may contain tasks that have completed execution, have been entered into the completed/failed map in zk but not
+   * deleted from the work-queue as that is a batched operation.
+   */
   final private Set<String> runningZKTasks;
-  // This map may contain tasks which are read from work queue but could not
-  // be executed because they are blocked or the execution queue is full
-  // This is an optimization to ensure that we do not read the same tasks
-  // again and again from ZK.
+
+  /**
+   * This map may contain tasks which are read from work queue but could not
+   * be executed because they are blocked or the execution queue is full
+   * This is an optimization to ensure that we do not read the same tasks
+   * again and again from ZK.
+   */
   final private Map<String, QueueEvent> blockedTasks = Collections.synchronizedMap(new LinkedHashMap<>());

Review comment:
       Similar to the above: can this be a `ConcurrentHashMap` instead of a synchronized map?
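
       One thing to double-check before switching (my assumption, not something stated in the PR): the synchronized LinkedHashMap preserves insertion order, which a plain ConcurrentHashMap does not. A sketch of the trade-off:

import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class BlockedTasksSketch {
  // Current approach: keeps insertion order, but every operation contends on one monitor.
  final Map<String, String> synchronizedOrdered =
      Collections.synchronizedMap(new LinkedHashMap<>());

  // ConcurrentHashMap removes the shared monitor, but its iteration order is unspecified,
  // so it is only a drop-in replacement if the order of blocked tasks does not matter.
  final Map<String, String> concurrentUnordered = new ConcurrentHashMap<>();
}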

##########
File path: solr/core/src/java/org/apache/solr/cloud/OverseerTaskProcessor.java
##########
@@ -253,20 +277,22 @@ public void run() {
             continue;
           }
 
-          blockedTasks.clear(); // clear it now; may get refilled below.
+          // clear the blocked tasks, may get refilled below. Given blockedTasks can only get entries from heads and heads
+          // has at most MAX_BLOCKED_TASKS tasks, blockedTasks will never exceed MAX_BLOCKED_TASKS entries.
+          // Note blockedTasks can't be cleared too early as it is used in the excludedTasks Predicate above.
+          blockedTasks.clear();
+
+          // Trigger the creation of a new Session used for locking when/if a lock is later acquired on the OverseerCollectionMessageHandler
+          batchSessionId++;
 
-          taskBatch.batchId++;
           boolean tooManyTasks = false;
           for (QueueEvent head : heads) {
             if (!tooManyTasks) {
-              synchronized (runningTasks) {
                 tooManyTasks = runningTasksSize() >= MAX_PARALLEL_TASKS;
-              }
             }
             if (tooManyTasks) {
               // Too many tasks are running, just shove the rest into the "blocked" queue.
-              if(blockedTasks.size() < MAX_BLOCKED_TASKS)
-                blockedTasks.put(head.getId(), head);
+              blockedTasks.put(head.getId(), head);

Review comment:
       Why do we no longer need to check against the upper limit of `MAX_BLOCKED_TASKS`?
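
       If that invariant (heads never holds more than MAX_BLOCKED_TASKS entries) is considered worth guarding explicitly, a defensive version of the put could look roughly like this; a sketch only, with illustrative names and a placeholder constant, not the PR's code:

import java.util.LinkedHashMap;
import java.util.Map;

class BoundedBlockedTasksSketch {
  static final int MAX_BLOCKED_TASKS = 1000; // illustrative value, not Solr's actual constant

  final Map<String, Object> blockedTasks = new LinkedHashMap<>();

  // Keep the explicit size check even if callers are expected to supply
  // at most MAX_BLOCKED_TASKS entries per batch.
  void maybeBlock(String id, Object head) {
    if (blockedTasks.size() < MAX_BLOCKED_TASKS) {
      blockedTasks.put(id, head);
    }
  }
}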

##########
File path: solr/core/src/java/org/apache/solr/cloud/api/collections/OverseerCollectionMessageHandler.java
##########
@@ -867,26 +866,25 @@ public String getTaskKey(ZkNodeProps message) {
   }
 
 
+  // -1 is not a possible batchSessionId so -1 will force initialization of lockSession
   private long sessionId = -1;
   private LockTree.Session lockSession;
 
   @Override
-  public Lock lockTask(ZkNodeProps message, OverseerTaskProcessor.TaskBatch taskBatch) {
-    if (lockSession == null || sessionId != taskBatch.getId()) {
+  public Lock lockTask(ZkNodeProps message, long batchSessionId) {
+    if (sessionId != batchSessionId) {
       //this is always called in the same thread.
       //Each batch is supposed to have a new taskBatch
       //So if taskBatch changes we must create a new Session
-      // also check if the running tasks are empty. If yes, clear lockTree
-      // this will ensure that locks are not 'leaked'
-      if(taskBatch.getRunningTasks() == 0) lockTree.clear();

Review comment:
       This is the only usage of `clear` in the code; can we remove that method completely? And is it safe to no longer call `clear` here?

##########
File path: solr/core/src/java/org/apache/solr/cloud/LockTree.java
##########
@@ -89,22 +98,29 @@ public Lock lock(CollectionParams.CollectionAction action, List<String> path) {
       this.level = level;
     }
 
-    void markBusy(List<String> path, int depth) {
-      if (path.size() == depth) {
+    /**
+     * Marks busy the SessionNode corresponding to lockLevel (node names coming from <code>path</code>).
+     * @param path size is at least <code>lockLevel.getHeight()</code>, to capture which node should be marked busy

Review comment:
       I don't understand what this comment means.

##########
File path: solr/solrj/src/java/org/apache/solr/common/params/CollectionParams.java
##########
@@ -42,31 +42,30 @@
 
 
   enum LockLevel {
-    CLUSTER(0),
-    COLLECTION(1),
-    SHARD(2),
-    REPLICA(3),
-    NONE(10);
-
-    public final int level;
-
-    LockLevel(int i) {
-      this.level = i;
+    NONE(10, null),

Review comment:
       I don't think it makes sense to reorder them here.



