You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@druid.apache.org by GitBox <gi...@apache.org> on 2021/01/07 20:02:22 UTC

[GitHub] [druid] jihoonson opened a new pull request #10736: Fix potential deadlock in batch ingestion

jihoonson opened a new pull request #10736:
URL: https://github.com/apache/druid/pull/10736


   ### Description
   
   `AbstractBatchIndexTask.tryTimeChunkLock()` is called `isReady()` which is called in the Overlord to check if the task is ready for execution. Since the Overlord only checks the result of `isReady()` but does nothing with locks by itself, locking individual interval can lead to deadlock. Imagine that there are two tasks that want to work on overlapping intervals. One task could lock some of intervals but failed for others for some reason. Later, another task could lock those intervals where the first task failed to lock. Now they would wait for each other to release locks they possess. 
   
   This PR changes to lock an umbrella interval of the input intervals instead of locking them individually. This is an easy fix, but I'm not sure if it's the best. Maybe the Overlord should take care of those tasks who are not ready yet and release any locks they acquired during `isReady()`. This approach will work better when the input intervals are not consecutive (the approach locking an umbrella interval will unnecessarily lock some intervals even though the task doesn't need them). We can get back to this approach when this turns out to be an issue.
   
   <hr>
   
   This PR has:
   - [x] been self-reviewed.
      - [ ] using the [concurrency checklist](https://github.com/apache/druid/blob/master/dev/code-review/concurrency.md) (Remove this item if the PR doesn't have any relation to concurrency.)
   - [x] added documentation for new or modified features or behaviors.
   - [ ] added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
   - [ ] added or updated version, license, or notice information in [licenses.yaml](https://github.com/apache/druid/blob/master/licenses.yaml)
   - [ ] added comments explaining the "why" and the intent of the code wherever would not be obvious for an unfamiliar reader.
   - [x] added unit tests or modified existing tests to cover new code paths, ensuring the threshold for [code coverage](https://github.com/apache/druid/blob/master/dev/code-review/code-coverage.md) is met.
   - [ ] added integration tests.
   - [ ] been tested in a test Druid cluster.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] jihoonson commented on pull request #10736: Fix potential deadlock in batch ingestion

Posted by GitBox <gi...@apache.org>.
jihoonson commented on pull request #10736:
URL: https://github.com/apache/druid/pull/10736#issuecomment-758968538


   @loquisgon @maytasm thanks for the review!


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] jihoonson commented on a change in pull request #10736: Fix potential deadlock in batch ingestion

Posted by GitBox <gi...@apache.org>.
jihoonson commented on a change in pull request #10736:
URL: https://github.com/apache/druid/pull/10736#discussion_r553712831



##########
File path: indexing-service/src/main/java/org/apache/druid/indexing/common/task/AbstractBatchIndexTask.java
##########
@@ -409,13 +413,19 @@ protected boolean tryTimeChunkLock(TaskActionClient client, List<Interval> inter
       }
     }
 
-    for (Interval interval : uniqueIntervals) {
-      final TaskLock lock = client.submit(new TimeChunkLockTryAcquireAction(TaskLockType.EXCLUSIVE, interval));
-      if (lock == null) {
-        return false;
-      }
-    }
-    return true;
+    // This method is called isReady() which is called in the Overlord to check if the task is ready for execution.

Review comment:
       Thanks. Improved the comment a bit.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] jihoonson merged pull request #10736: Fix potential deadlock in batch ingestion

Posted by GitBox <gi...@apache.org>.
jihoonson merged pull request #10736:
URL: https://github.com/apache/druid/pull/10736


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] loquisgon commented on a change in pull request #10736: Fix potential deadlock in batch ingestion

Posted by GitBox <gi...@apache.org>.
loquisgon commented on a change in pull request #10736:
URL: https://github.com/apache/druid/pull/10736#discussion_r553578192



##########
File path: indexing-service/src/main/java/org/apache/druid/indexing/common/task/AbstractBatchIndexTask.java
##########
@@ -409,13 +413,19 @@ protected boolean tryTimeChunkLock(TaskActionClient client, List<Interval> inter
       }
     }
 
-    for (Interval interval : uniqueIntervals) {
-      final TaskLock lock = client.submit(new TimeChunkLockTryAcquireAction(TaskLockType.EXCLUSIVE, interval));
-      if (lock == null) {
-        return false;
-      }
-    }
-    return true;
+    // This method is called isReady() which is called in the Overlord to check if the task is ready for execution.

Review comment:
       This method is called by the `isReady` method elsewhere (comment: it'd be nice to explain where the `isReady` method lives...)




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org