You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@druid.apache.org by GitBox <gi...@apache.org> on 2021/05/03 15:57:36 UTC

[GitHub] [druid] kfaraz opened a new pull request #11190: Temporarily skip compaction for locked intervals

kfaraz opened a new pull request #11190:
URL: https://github.com/apache/druid/pull/11190


   Compaction Tasks, both auto and manual, try to acquire locks on the Datasource Intervals which they need to compact. If a higher priority ingestion task is in progress for an overlapping interval, the compaction task waits until it can acquire a lock. This can lead to the following potential issues:
   - Due to poor configuration of `skipOffsetFromLatest`, compaction can get stuck for long periods of time.
   - Once the lock is released by the ingestion task, it is possible that some new segments are published (and/or removed). This invalidates the segment spec in the compaction task, causing it to fail. Such failures are unnecessary and often difficult to debug.
   
   ### Resolution
   Rather than waiting on already locked intervals, we can proceed to the compaction of other intervals (these would be older intervals, if the compaction policy is Newest First). When the locked intervals are freed up, subsequent compaction runs can submit compaction tasks for them.
   
   In every compaction run (invocation of`CompactSegments.run()`)
   - the Coordinator makes a call to the Overlord API `/lockedIntervals`
   - the Overlord returns a list of intervals locked by each currently running task, using the 
     in-memory state of the `TaskLockbox`
    - the Coordinator then skips the locked intervals while submitting compaction tasks
   
   ### Code Changes:
    * Add class `DatasourceIntervals` to encapsulate a datasource name and a List of Intervals
    * Add response class `LockedIntervalsResponse`
    * Add Overlord REST endpoint `/druid/indexer/v1/lockedIntervals`
    * Use the above API in `CompactSegments.run()` to skip locked intervals
   
   <hr>
   
   <!-- Check the items by putting "x" in the brackets for the done things. Not all of these items apply to every PR. Remove the items which are not done or not relevant to the PR. None of the items from the checklist below are strictly necessary, but it would be very helpful if you at least self-review the PR. -->
   
   This PR has:
   - [x] been self-reviewed.
      - [ ] using the [concurrency checklist](https://github.com/apache/druid/blob/master/dev/code-review/concurrency.md) (Remove this item if the PR doesn't have any relation to concurrency.)
   - [x] added documentation for new or modified features or behaviors.
   - [x] added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
   - [ ] added or updated version, license, or notice information in [licenses.yaml](https://github.com/apache/druid/blob/master/dev/license.md)
   - [ ] added comments explaining the "why" and the intent of the code wherever would not be obvious for an unfamiliar reader.
   - [x] added unit tests or modified existing tests to cover new code paths, ensuring the threshold for [code coverage](https://github.com/apache/druid/blob/master/dev/code-review/code-coverage.md) is met.
   - [ ] added integration tests.
   - [ ] been tested in a test Druid cluster.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] rohangarg commented on pull request #11190: Temporarily skip compaction for locked intervals

Posted by GitBox <gi...@apache.org>.
rohangarg commented on pull request #11190:
URL: https://github.com/apache/druid/pull/11190#issuecomment-862232381


   >Normally, the user would not have to specify any value for this config. The default value of true would suffice. The config has been provided to be able to disable the feature of skipping locked intervals in case of a bug.
   Ok but given that the feature/change-set is not too big, I'm not sure whether adding a config would help. Is it possible to know some scenarios or unknowns where this could fail and might be hard to cover them in tests? 
   
   Apologies but one more question I had was that if this flag is turned on, will it always ensure that there is no locking conflict between compaction task and other tasks? or is it still possible that compaction task might get stuck due to unavailable lock? my initial guess is that there is still a chance that compaction might get stuck, so confirming? :) 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] kfaraz commented on a change in pull request #11190: Temporarily skip compaction for locked intervals

Posted by GitBox <gi...@apache.org>.
kfaraz commented on a change in pull request #11190:
URL: https://github.com/apache/druid/pull/11190#discussion_r626321654



##########
File path: indexing-service/src/main/java/org/apache/druid/indexing/overlord/TaskLockbox.java
##########
@@ -674,6 +675,64 @@ public TaskLock apply(TaskLockPosse taskLockPosse)
     }
   }
 
+  /**
+   * Gets a Map containing intervals locked by active tasks. Intervals locked
+   * by revoked TaskLocks are not included in the returned Map.
+   *
+   * @return Map from Task Id to locked intervals.
+   */
+  public Map<String, DatasourceIntervals> getLockedIntervals()

Review comment:
       The response would contain a List of `Interval` objects (either locked by TimeChunkLocks or SegmentLocks). So I guess `getLockedIntervals` should be fine for now?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] maytasm commented on pull request #11190: Temporarily skip compaction for locked intervals

Posted by GitBox <gi...@apache.org>.
maytasm commented on pull request #11190:
URL: https://github.com/apache/druid/pull/11190#issuecomment-862073584


   - Can you add a test in ITAutoCompactionLockContentionTest that verify that setting the new flag `druid.coordinator.compaction.skipLockedIntervals` to false will actually disable this new feature.
   - Actually if getLockedIntervalsToSkip does return lock of Compaction task, then you will need https://github.com/apache/druid/pull/11190#discussion_r625757970
   (as you mentioned, in CompactSegments.run(), some currently running compaction tasks can get cancelled and hence we have to return the lock)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] maytasm commented on a change in pull request #11190: Temporarily skip compaction for locked intervals

Posted by GitBox <gi...@apache.org>.
maytasm commented on a change in pull request #11190:
URL: https://github.com/apache/druid/pull/11190#discussion_r654860055



##########
File path: indexing-service/src/main/java/org/apache/druid/indexing/overlord/TaskLockbox.java
##########
@@ -710,8 +710,8 @@ public TaskLock apply(TaskLockPosse taskLockPosse)
                             // Do not proceed if the lock is revoked
                             return;
                           } else if (taskLockPosse.getTaskLock().getPriority() == null
-                                     || taskLockPosse.getTaskLock().getPriority() <= minTaskPriority.get(datasource)) {
-                            // Do not proceed if the lock has a priority less than or equal to the minimum
+                                     || taskLockPosse.getTaskLock().getPriority() < minTaskPriority.get(datasource)) {

Review comment:
       Why is this changed back?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] kfaraz commented on a change in pull request #11190: Temporarily skip compaction for locked intervals

Posted by GitBox <gi...@apache.org>.
kfaraz commented on a change in pull request #11190:
URL: https://github.com/apache/druid/pull/11190#discussion_r625757970



##########
File path: docs/operations/api-reference.md
##########
@@ -614,6 +614,11 @@ Retrieve information about the segments of a task.
 
 Retrieve a [task completion report](../ingestion/tasks.md#task-reports) for a task. Only works for completed tasks.
 
+* `/druid/indexer/v1/lockedIntervals`
+
+Retrieve the list of Intervals locked by currently running ingestion/compaction tasks. The response contains a Map from
+Task IDs to the list of Intervals locked by the respective Tasks.

Review comment:
       See here:
   https://github.com/apache/druid/blob/5eb91aba52e22d9c72fd36f3994dc9b103ff5a69/server/src/main/java/org/apache/druid/server/coordinator/duty/CompactSegments.java#L147




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] maytasm commented on a change in pull request #11190: Temporarily skip compaction for locked intervals

Posted by GitBox <gi...@apache.org>.
maytasm commented on a change in pull request #11190:
URL: https://github.com/apache/druid/pull/11190#discussion_r626308179



##########
File path: docs/operations/api-reference.md
##########
@@ -614,6 +614,11 @@ Retrieve information about the segments of a task.
 
 Retrieve a [task completion report](../ingestion/tasks.md#task-reports) for a task. Only works for completed tasks.
 
+* `/druid/indexer/v1/lockedIntervals`

Review comment:
       Actually, instead of exposing this API, maybe a better thing we need is how to communicate to the user that compaction skips an intervals because of some task holding lock. Like what intervals was skipped for which datasource and what was the task that holds the lock.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] kfaraz commented on a change in pull request #11190: Temporarily skip compaction for locked intervals

Posted by GitBox <gi...@apache.org>.
kfaraz commented on a change in pull request #11190:
URL: https://github.com/apache/druid/pull/11190#discussion_r625753895



##########
File path: server/src/main/java/org/apache/druid/server/coordinator/duty/CompactSegments.java
##########
@@ -154,6 +160,17 @@ public DruidCoordinatorRuntimeParams run(DruidCoordinatorRuntimeParams params)
           }
         }
 
+        // Skip all the locked intervals
+        LOG.debug(
+            "Skipping the following intervals for Compaction as they are currently locked: %s",
+            taskToLockedIntervals
+        );
+        taskToLockedIntervals.forEach(
+            (taskId, datasourceIntervals) -> compactionTaskIntervals
+                .computeIfAbsent(datasourceIntervals.getDatasource(), ds -> new ArrayList<>())
+                .addAll(datasourceIntervals.getIntervals())
+        );
+
         final CompactionSegmentIterator iterator =
             policy.reset(compactionConfigs, dataSources, compactionTaskIntervals);

Review comment:
       Yeah, should we just call it `skipIntervals`?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] kfaraz commented on pull request #11190: Temporarily skip compaction for locked intervals

Posted by GitBox <gi...@apache.org>.
kfaraz commented on pull request #11190:
URL: https://github.com/apache/druid/pull/11190#issuecomment-843320044


   Fixed lockedIntervals API to use taskPriority
   Added integration tests.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] rohangarg commented on pull request #11190: Temporarily skip compaction for locked intervals

Posted by GitBox <gi...@apache.org>.
rohangarg commented on pull request #11190:
URL: https://github.com/apache/druid/pull/11190#issuecomment-860438761


   @kfaraz : Thank a lot for the PR - I have a couple of doubts regarding the new config added : 
   1. does the config `druid.coordinator.compaction.skipLockedIntervals` indicate that the segments which are locked during compaction are missed forever? or is it just for that run?
   2. what would be the user impact of this config? for example, if I'm a user when should I enable/disable this config and what will be its benefits/problems?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] abhishekagarwal87 commented on a change in pull request #11190: Temporarily skip compaction for locked intervals

Posted by GitBox <gi...@apache.org>.
abhishekagarwal87 commented on a change in pull request #11190:
URL: https://github.com/apache/druid/pull/11190#discussion_r625731629



##########
File path: server/src/main/java/org/apache/druid/server/coordinator/duty/CompactSegments.java
##########
@@ -154,6 +160,17 @@ public DruidCoordinatorRuntimeParams run(DruidCoordinatorRuntimeParams params)
           }
         }
 
+        // Skip all the locked intervals
+        LOG.debug(
+            "Skipping the following intervals for Compaction as they are currently locked: %s",
+            taskToLockedIntervals
+        );
+        taskToLockedIntervals.forEach(
+            (taskId, datasourceIntervals) -> compactionTaskIntervals
+                .computeIfAbsent(datasourceIntervals.getDatasource(), ds -> new ArrayList<>())
+                .addAll(datasourceIntervals.getIntervals())
+        );
+
         final CompactionSegmentIterator iterator =
             policy.reset(compactionConfigs, dataSources, compactionTaskIntervals);

Review comment:
       maybe `compactionTaskIntervals` needs to be called something else now since it can also include intervals for which there is no lock. 

##########
File path: indexing-service/src/main/java/org/apache/druid/indexing/overlord/TaskLockbox.java
##########
@@ -674,6 +675,64 @@ public TaskLock apply(TaskLockPosse taskLockPosse)
     }
   }
 
+  /**
+   * Gets a Map containing intervals locked by active tasks. Intervals locked
+   * by revoked TaskLocks are not included in the returned Map.
+   *
+   * @return Map from Task Id to locked intervals.
+   */
+  public Map<String, DatasourceIntervals> getLockedIntervals()
+  {
+    final Map<String, List<Interval>> taskToIntervals = new HashMap<>();
+    final Map<String, String> taskToDatasource = new HashMap<>();
+
+    // Take a lock and populate the maps
+    giant.lock();
+    try {
+      running.forEach(
+          (datasource, datasourceLocks) -> datasourceLocks.forEach(
+              (startTime, startTimeLocks) -> startTimeLocks.forEach(
+                  (interval, taskLockPosses) -> taskLockPosses.forEach(
+                      taskLockPosse -> taskLockPosse.taskIds.forEach(taskId -> {
+                        // Do not proceed if the lock is revoked
+                        if (taskLockPosse.getTaskLock().isRevoked()) {

Review comment:
       what is the rationale behind this? 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] kfaraz commented on a change in pull request #11190: Temporarily skip compaction for locked intervals

Posted by GitBox <gi...@apache.org>.
kfaraz commented on a change in pull request #11190:
URL: https://github.com/apache/druid/pull/11190#discussion_r626321654



##########
File path: indexing-service/src/main/java/org/apache/druid/indexing/overlord/TaskLockbox.java
##########
@@ -674,6 +675,64 @@ public TaskLock apply(TaskLockPosse taskLockPosse)
     }
   }
 
+  /**
+   * Gets a Map containing intervals locked by active tasks. Intervals locked
+   * by revoked TaskLocks are not included in the returned Map.
+   *
+   * @return Map from Task Id to locked intervals.
+   */
+  public Map<String, DatasourceIntervals> getLockedIntervals()

Review comment:
       The response would contain a List of `Interval` objects (either locked by TimeChunkLocks or SegmentLocks). So I guess `getLockedIntervals` should be fine for now?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] abhishekagarwal87 commented on a change in pull request #11190: Temporarily skip compaction for locked intervals

Posted by GitBox <gi...@apache.org>.
abhishekagarwal87 commented on a change in pull request #11190:
URL: https://github.com/apache/druid/pull/11190#discussion_r625735729



##########
File path: docs/operations/api-reference.md
##########
@@ -614,6 +614,11 @@ Retrieve information about the segments of a task.
 
 Retrieve a [task completion report](../ingestion/tasks.md#task-reports) for a task. Only works for completed tasks.
 
+* `/druid/indexer/v1/lockedIntervals`
+
+Retrieve the list of Intervals locked by currently running ingestion/compaction tasks. The response contains a Map from
+Task IDs to the list of Intervals locked by the respective Tasks.

Review comment:
       why do we need task ids? can we just return a map of `datasource --> intervals`? 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] kfaraz commented on a change in pull request #11190: Temporarily skip compaction for locked intervals

Posted by GitBox <gi...@apache.org>.
kfaraz commented on a change in pull request #11190:
URL: https://github.com/apache/druid/pull/11190#discussion_r625753895



##########
File path: server/src/main/java/org/apache/druid/server/coordinator/duty/CompactSegments.java
##########
@@ -154,6 +160,17 @@ public DruidCoordinatorRuntimeParams run(DruidCoordinatorRuntimeParams params)
           }
         }
 
+        // Skip all the locked intervals
+        LOG.debug(
+            "Skipping the following intervals for Compaction as they are currently locked: %s",
+            taskToLockedIntervals
+        );
+        taskToLockedIntervals.forEach(
+            (taskId, datasourceIntervals) -> compactionTaskIntervals
+                .computeIfAbsent(datasourceIntervals.getDatasource(), ds -> new ArrayList<>())
+                .addAll(datasourceIntervals.getIntervals())
+        );
+
         final CompactionSegmentIterator iterator =
             policy.reset(compactionConfigs, dataSources, compactionTaskIntervals);

Review comment:
       Yeah, should we just call it `compactionSkipIntervals`?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] kfaraz commented on a change in pull request #11190: Temporarily skip compaction for locked intervals

Posted by GitBox <gi...@apache.org>.
kfaraz commented on a change in pull request #11190:
URL: https://github.com/apache/druid/pull/11190#discussion_r626253490



##########
File path: docs/operations/api-reference.md
##########
@@ -614,6 +614,11 @@ Retrieve information about the segments of a task.
 
 Retrieve a [task completion report](../ingestion/tasks.md#task-reports) for a task. Only works for completed tasks.
 
+* `/druid/indexer/v1/lockedIntervals`
+
+Retrieve the list of Intervals locked by currently running ingestion/compaction tasks. The response contains a Map from
+Task IDs to the list of Intervals locked by the respective Tasks.

Review comment:
       Good point, @maytasm . I will make the changes.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] jihoonson commented on a change in pull request #11190: Temporarily skip compaction for locked intervals

Posted by GitBox <gi...@apache.org>.
jihoonson commented on a change in pull request #11190:
URL: https://github.com/apache/druid/pull/11190#discussion_r626290806



##########
File path: indexing-service/src/main/java/org/apache/druid/indexing/overlord/TaskLockbox.java
##########
@@ -674,6 +675,64 @@ public TaskLock apply(TaskLockPosse taskLockPosse)
     }
   }
 
+  /**
+   * Gets a Map containing intervals locked by active tasks. Intervals locked
+   * by revoked TaskLocks are not included in the returned Map.
+   *
+   * @return Map from Task Id to locked intervals.
+   */
+  public Map<String, DatasourceIntervals> getLockedIntervals()

Review comment:
       `getLockedInterval()` seems a misnomer because this method returns the intervals of segment locks as well, but they don't lock intervals. I don't have a better suggestion though..

##########
File path: docs/operations/api-reference.md
##########
@@ -614,6 +614,11 @@ Retrieve information about the segments of a task.
 
 Retrieve a [task completion report](../ingestion/tasks.md#task-reports) for a task. Only works for completed tasks.
 
+* `/druid/indexer/v1/lockedIntervals`

Review comment:
       Is this supposed to be called by users? What is the use case?

##########
File path: server/src/main/java/org/apache/druid/server/coordinator/duty/CompactSegments.java
##########
@@ -154,6 +160,17 @@ public DruidCoordinatorRuntimeParams run(DruidCoordinatorRuntimeParams params)
           }
         }
 
+        // Skip all the locked intervals
+        LOG.debug(
+            "Skipping the following intervals for Compaction as they are currently locked: %s",
+            taskToLockedIntervals
+        );
+        taskToLockedIntervals.forEach(

Review comment:
       I think it should behave differently depending on what `lockGranularity` is used. If both the compaction task to run and the task that is already running use the segment lock, the compaction task can safely run. Otherwise, the entire locked interval should be skipped as what this code does. 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] kfaraz commented on a change in pull request #11190: Temporarily skip compaction for locked intervals

Posted by GitBox <gi...@apache.org>.
kfaraz commented on a change in pull request #11190:
URL: https://github.com/apache/druid/pull/11190#discussion_r625753535



##########
File path: docs/operations/api-reference.md
##########
@@ -614,6 +614,11 @@ Retrieve information about the segments of a task.
 
 Retrieve a [task completion report](../ingestion/tasks.md#task-reports) for a task. Only works for completed tasks.
 
+* `/druid/indexer/v1/lockedIntervals`
+
+Retrieve the list of Intervals locked by currently running ingestion/compaction tasks. The response contains a Map from
+Task IDs to the list of Intervals locked by the respective Tasks.

Review comment:
       In `CompactSegments.run()`, some currently running compaction tasks can get cancelled if their spec is out of date. We use the taskId to identify the intervals for these tasks getting cancelled so that we don't consider those intervals as locked.
   
   This is the only case where taskId is useful. Otherwise, datasource would be sufficient.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] kfaraz commented on a change in pull request #11190: Temporarily skip compaction for locked intervals

Posted by GitBox <gi...@apache.org>.
kfaraz commented on a change in pull request #11190:
URL: https://github.com/apache/druid/pull/11190#discussion_r634527922



##########
File path: server/src/main/java/org/apache/druid/server/coordinator/duty/CompactSegments.java
##########
@@ -154,6 +160,17 @@ public DruidCoordinatorRuntimeParams run(DruidCoordinatorRuntimeParams params)
           }
         }
 
+        // Skip all the locked intervals
+        LOG.debug(
+            "Skipping the following intervals for Compaction as they are currently locked: %s",
+            taskToLockedIntervals
+        );
+        taskToLockedIntervals.forEach(

Review comment:
       For simplicity, we are treating Segment Locks the same as Time Chunk Locks i.e. the whole interval would be skipped while submitting compaction tasks even if there is just one Segment in that interval that is locked by a higher priority task.
   
   Added this as a javadoc comment here:
   https://github.com/apache/druid/blob/a8e30c3c8c9c2b58b0615f3f50d863c334cadd75/server/src/main/java/org/apache/druid/server/coordinator/duty/CompactSegments.java#L233
   
   https://github.com/apache/druid/blob/a8e30c3c8c9c2b58b0615f3f50d863c334cadd75/indexing-service/src/main/java/org/apache/druid/indexing/overlord/TaskLockbox.java#L690




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] maytasm commented on a change in pull request #11190: Temporarily skip compaction for locked intervals

Posted by GitBox <gi...@apache.org>.
maytasm commented on a change in pull request #11190:
URL: https://github.com/apache/druid/pull/11190#discussion_r645129523



##########
File path: indexing-service/src/main/java/org/apache/druid/indexing/overlord/TaskStorageQueryAdapter.java
##########
@@ -62,9 +62,9 @@ public TaskStorageQueryAdapter(TaskStorage storage, TaskLockbox taskLockbox)
    *
    * @return Map from Task Id to locked intervals.

Review comment:
       Is this Map from datasource to locked intervals?

##########
File path: server/src/main/java/org/apache/druid/server/coordinator/duty/CompactSegments.java
##########
@@ -115,11 +114,17 @@ public DruidCoordinatorRuntimeParams run(DruidCoordinatorRuntimeParams params)
             .stream()
             .collect(Collectors.toMap(DataSourceCompactionConfig::getDataSource, Function.identity()));
         final List<TaskStatusPlus> compactionTasks = filterNonCompactionTasks(indexingServiceClient.getActiveTasks());
-        // dataSource -> list of intervals of compaction tasks
-        final Map<String, List<Interval>> compactionTaskIntervals = Maps.newHashMapWithExpectedSize(
+        // dataSource -> list of intervals for which compaction will be skipped in this run
+        final Map<String, List<Interval>> intervalsToSkipCompaction = Maps.newHashMapWithExpectedSize(
             compactionConfigList.size());
-        final Map<String, DatasourceIntervals> taskToLockedIntervals = new HashMap<>(
-            indexingServiceClient.getLockedIntervals());
+
+        // Skip all the intervals locked by higher priority tasks for each datasource
+        getLockedIntervalsToSkip(compactionConfigList).forEach(

Review comment:
       Is this needed? Doesn't getLockedIntervalsToSkip already return dataSource -> list of intervals

##########
File path: indexing-service/src/test/java/org/apache/druid/indexing/overlord/TaskLockboxTest.java
##########
@@ -1149,29 +1149,24 @@ public void testGetLockedIntervals()
     );
 
     // Verify the locked intervals

Review comment:
       Can you add test where the existing taskLock priority is lower than the minTaskPriority argument (and hence the lock should not be returned)? 

##########
File path: indexing-service/src/main/java/org/apache/druid/indexing/overlord/TaskLockbox.java
##########
@@ -676,61 +675,63 @@ public TaskLock apply(TaskLockPosse taskLockPosse)
   }
 
   /**
-   * Gets a Map containing intervals locked by active tasks. Intervals locked
-   * by revoked TaskLocks are not included in the returned Map.
+   * Gets a List of Intervals locked by higher priority tasks for each datasource.
+   * Here, Segment Locks are being treated the same as Time Chunk Locks i.e.
+   * a Task with a Segment Lock is assumed to lock a whole Interval and not just
+   * the corresponding Segment.
    *
-   * @return Map from Task Id to locked intervals.
+   * @param minTaskPriority Minimum task priority for each datasource. Only the
+   *                        Intervals that are locked by Tasks higher than this
+   *                        priority are returned. Tasks for datasources that
+   *                        are not present in this Map are not returned.
+   * @return Map from Datasource to List of Intervals locked by Tasks that have
+   * priority strictly greater than the {@code minTaskPriority} for that datasource.

Review comment:
       should this be > or >= the {@code minTaskPriority}?
   Do we want to submit auto compaction task if there is a task with equal task priority already running?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] abhishekagarwal87 commented on pull request #11190: Temporarily skip compaction for locked intervals

Posted by GitBox <gi...@apache.org>.
abhishekagarwal87 commented on pull request #11190:
URL: https://github.com/apache/druid/pull/11190#issuecomment-854446880


   Thank you for the PR, @kfaraz. will there be a way to disable this code-path in a running druid cluster (either dynamically or by restarting services)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] maytasm commented on a change in pull request #11190: Temporarily skip compaction for locked intervals

Posted by GitBox <gi...@apache.org>.
maytasm commented on a change in pull request #11190:
URL: https://github.com/apache/druid/pull/11190#discussion_r645129523



##########
File path: indexing-service/src/main/java/org/apache/druid/indexing/overlord/TaskStorageQueryAdapter.java
##########
@@ -62,9 +62,9 @@ public TaskStorageQueryAdapter(TaskStorage storage, TaskLockbox taskLockbox)
    *
    * @return Map from Task Id to locked intervals.

Review comment:
       Is this Map from datasource to locked intervals?

##########
File path: server/src/main/java/org/apache/druid/server/coordinator/duty/CompactSegments.java
##########
@@ -115,11 +114,17 @@ public DruidCoordinatorRuntimeParams run(DruidCoordinatorRuntimeParams params)
             .stream()
             .collect(Collectors.toMap(DataSourceCompactionConfig::getDataSource, Function.identity()));
         final List<TaskStatusPlus> compactionTasks = filterNonCompactionTasks(indexingServiceClient.getActiveTasks());
-        // dataSource -> list of intervals of compaction tasks
-        final Map<String, List<Interval>> compactionTaskIntervals = Maps.newHashMapWithExpectedSize(
+        // dataSource -> list of intervals for which compaction will be skipped in this run
+        final Map<String, List<Interval>> intervalsToSkipCompaction = Maps.newHashMapWithExpectedSize(
             compactionConfigList.size());
-        final Map<String, DatasourceIntervals> taskToLockedIntervals = new HashMap<>(
-            indexingServiceClient.getLockedIntervals());
+
+        // Skip all the intervals locked by higher priority tasks for each datasource
+        getLockedIntervalsToSkip(compactionConfigList).forEach(

Review comment:
       Is this needed? Doesn't getLockedIntervalsToSkip already return dataSource -> list of intervals

##########
File path: indexing-service/src/test/java/org/apache/druid/indexing/overlord/TaskLockboxTest.java
##########
@@ -1149,29 +1149,24 @@ public void testGetLockedIntervals()
     );
 
     // Verify the locked intervals

Review comment:
       Can you add test where the existing taskLock priority is lower than the minTaskPriority argument (and hence the lock should not be returned)? 

##########
File path: indexing-service/src/main/java/org/apache/druid/indexing/overlord/TaskLockbox.java
##########
@@ -676,61 +675,63 @@ public TaskLock apply(TaskLockPosse taskLockPosse)
   }
 
   /**
-   * Gets a Map containing intervals locked by active tasks. Intervals locked
-   * by revoked TaskLocks are not included in the returned Map.
+   * Gets a List of Intervals locked by higher priority tasks for each datasource.
+   * Here, Segment Locks are being treated the same as Time Chunk Locks i.e.
+   * a Task with a Segment Lock is assumed to lock a whole Interval and not just
+   * the corresponding Segment.
    *
-   * @return Map from Task Id to locked intervals.
+   * @param minTaskPriority Minimum task priority for each datasource. Only the
+   *                        Intervals that are locked by Tasks higher than this
+   *                        priority are returned. Tasks for datasources that
+   *                        are not present in this Map are not returned.
+   * @return Map from Datasource to List of Intervals locked by Tasks that have
+   * priority strictly greater than the {@code minTaskPriority} for that datasource.

Review comment:
       should this be > or >= the {@code minTaskPriority}?
   Do we want to submit auto compaction task if there is a task with equal task priority already running?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] kfaraz commented on a change in pull request #11190: Temporarily skip compaction for locked intervals

Posted by GitBox <gi...@apache.org>.
kfaraz commented on a change in pull request #11190:
URL: https://github.com/apache/druid/pull/11190#discussion_r626342828



##########
File path: indexing-service/src/main/java/org/apache/druid/indexing/overlord/http/OverlordResource.java
##########
@@ -227,6 +227,34 @@ public Response isLeader()
     }
   }
 
+  @GET
+  @Path("/lockedIntervals")
+  @Produces(MediaType.APPLICATION_JSON)
+  public Response getTaskLockedIntervals(@Context HttpServletRequest request)
+  {
+    // Perform authorization check
+    final ResourceAction resourceAction = new ResourceAction(
+        new Resource("lockedIntervals", ResourceType.STATE),
+        Action.READ
+    );
+    final Access authResult = AuthorizationUtils
+        .authorizeResourceAction(request, resourceAction, authorizerMapper);
+    if (!authResult.isAllowed()) {
+      throw new WebApplicationException(
+          Response.status(Response.Status.FORBIDDEN)
+                  .entity(StringUtils.format("Access-Check-Result: %s", authResult.toString()))
+                  .build()
+      );
+    }
+
+    // Build the response
+    final LockedIntervalsResponse response = new LockedIntervalsResponse(
+        taskStorageQueryAdapter.getLockedIntervals()
+    );
+    log.warn("Found Intervals: %s", response.getLockedIntervals());

Review comment:
       Removed.

##########
File path: indexing-service/src/main/java/org/apache/druid/indexing/overlord/http/OverlordResource.java
##########
@@ -227,6 +227,34 @@ public Response isLeader()
     }
   }
 
+  @GET
+  @Path("/lockedIntervals")
+  @Produces(MediaType.APPLICATION_JSON)
+  public Response getTaskLockedIntervals(@Context HttpServletRequest request)
+  {
+    // Perform authorization check
+    final ResourceAction resourceAction = new ResourceAction(
+        new Resource("lockedIntervals", ResourceType.STATE),
+        Action.READ
+    );
+    final Access authResult = AuthorizationUtils
+        .authorizeResourceAction(request, resourceAction, authorizerMapper);
+    if (!authResult.isAllowed()) {
+      throw new WebApplicationException(
+          Response.status(Response.Status.FORBIDDEN)
+                  .entity(StringUtils.format("Access-Check-Result: %s", authResult.toString()))
+                  .build()
+      );
+    }
+
+    // Build the response
+    final LockedIntervalsResponse response = new LockedIntervalsResponse(

Review comment:
       Fixed.

##########
File path: server/src/main/java/org/apache/druid/server/coordinator/duty/CompactSegments.java
##########
@@ -139,6 +142,9 @@ public DruidCoordinatorRuntimeParams run(DruidCoordinatorRuntimeParams params)
                          compactionTaskQuery.getGranularitySpec().getSegmentGranularity(),
                          configuredSegmentGranularity);
                 indexingServiceClient.cancelTask(status.getId());
+
+                // Remove this from the locked intervals
+                taskToLockedIntervals.remove(status.getId());

Review comment:
       Removed.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] kfaraz commented on a change in pull request #11190: Temporarily skip compaction for locked intervals

Posted by GitBox <gi...@apache.org>.
kfaraz commented on a change in pull request #11190:
URL: https://github.com/apache/druid/pull/11190#discussion_r634529620



##########
File path: docs/operations/api-reference.md
##########
@@ -614,6 +614,11 @@ Retrieve information about the segments of a task.
 
 Retrieve a [task completion report](../ingestion/tasks.md#task-reports) for a task. Only works for completed tasks.
 
+* `/druid/indexer/v1/lockedIntervals`

Review comment:
       For now, we are just logging the intervals that were skipped due to locks.
   The part about notifying the user about skipped intervals will most likely be done through Task Reports in a follow up PR.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] abhishekagarwal87 commented on pull request #11190: Temporarily skip compaction for locked intervals

Posted by GitBox <gi...@apache.org>.
abhishekagarwal87 commented on pull request #11190:
URL: https://github.com/apache/druid/pull/11190#issuecomment-854446880


   Thank you for the PR, @kfaraz. will there be a way to disable this code-path in a running druid cluster (either dynamically or by restarting services)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] maytasm merged pull request #11190: Temporarily skip compaction for locked intervals

Posted by GitBox <gi...@apache.org>.
maytasm merged pull request #11190:
URL: https://github.com/apache/druid/pull/11190


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] kfaraz commented on pull request #11190: Temporarily skip compaction for locked intervals

Posted by GitBox <gi...@apache.org>.
kfaraz commented on pull request #11190:
URL: https://github.com/apache/druid/pull/11190#issuecomment-860494765


   >     1. does the config `druid.coordinator.compaction.skipLockedIntervals` indicate that the segments which are locked during compaction are missed forever? or is it just for that run?
   
   @rohangarg , the locked intervals would be skipped only for that run, they would be retried in the next run
    
   >     2. what would be the user impact of this config? for example, if I'm a user when should I enable/disable this config and what will be its benefits/problems?
   
   Normally, the user would not have to specify any value for this config. The default value of true would suffice. The config has been provided to be able to disable the feature of skipping locked intervals in case of a bug.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] kfaraz commented on a change in pull request #11190: Temporarily skip compaction for locked intervals

Posted by GitBox <gi...@apache.org>.
kfaraz commented on a change in pull request #11190:
URL: https://github.com/apache/druid/pull/11190#discussion_r634530243



##########
File path: docs/operations/api-reference.md
##########
@@ -614,6 +614,11 @@ Retrieve information about the segments of a task.
 
 Retrieve a [task completion report](../ingestion/tasks.md#task-reports) for a task. Only works for completed tasks.
 
+* `/druid/indexer/v1/lockedIntervals`

Review comment:
       > Is this supposed to be called by users? What is the use case?
   Removed changes to `api-reference.md` as this API is for internal use (between Coordinator and Overlord) only.

##########
File path: docs/operations/api-reference.md
##########
@@ -614,6 +614,11 @@ Retrieve information about the segments of a task.
 
 Retrieve a [task completion report](../ingestion/tasks.md#task-reports) for a task. Only works for completed tasks.
 
+* `/druid/indexer/v1/lockedIntervals`

Review comment:
       > Is this supposed to be called by users? What is the use case?
   
   Removed changes to `api-reference.md` as this API is for internal use (between Coordinator and Overlord) only.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] maytasm commented on a change in pull request #11190: Temporarily skip compaction for locked intervals

Posted by GitBox <gi...@apache.org>.
maytasm commented on a change in pull request #11190:
URL: https://github.com/apache/druid/pull/11190#discussion_r654987383



##########
File path: indexing-service/src/main/java/org/apache/druid/indexing/overlord/TaskLockbox.java
##########
@@ -710,8 +710,8 @@ public TaskLock apply(TaskLockPosse taskLockPosse)
                             // Do not proceed if the lock is revoked
                             return;
                           } else if (taskLockPosse.getTaskLock().getPriority() == null
-                                     || taskLockPosse.getTaskLock().getPriority() <= minTaskPriority.get(datasource)) {
-                            // Do not proceed if the lock has a priority less than or equal to the minimum
+                                     || taskLockPosse.getTaskLock().getPriority() < minTaskPriority.get(datasource)) {

Review comment:
       LGTM




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] kfaraz commented on a change in pull request #11190: Temporarily skip compaction for locked intervals

Posted by GitBox <gi...@apache.org>.
kfaraz commented on a change in pull request #11190:
URL: https://github.com/apache/druid/pull/11190#discussion_r625755792



##########
File path: indexing-service/src/main/java/org/apache/druid/indexing/overlord/TaskLockbox.java
##########
@@ -674,6 +675,64 @@ public TaskLock apply(TaskLockPosse taskLockPosse)
     }
   }
 
+  /**
+   * Gets a Map containing intervals locked by active tasks. Intervals locked
+   * by revoked TaskLocks are not included in the returned Map.
+   *
+   * @return Map from Task Id to locked intervals.
+   */
+  public Map<String, DatasourceIntervals> getLockedIntervals()
+  {
+    final Map<String, List<Interval>> taskToIntervals = new HashMap<>();
+    final Map<String, String> taskToDatasource = new HashMap<>();
+
+    // Take a lock and populate the maps
+    giant.lock();
+    try {
+      running.forEach(
+          (datasource, datasourceLocks) -> datasourceLocks.forEach(
+              (startTime, startTimeLocks) -> startTimeLocks.forEach(
+                  (interval, taskLockPosses) -> taskLockPosses.forEach(
+                      taskLockPosse -> taskLockPosse.taskIds.forEach(taskId -> {
+                        // Do not proceed if the lock is revoked
+                        if (taskLockPosse.getTaskLock().isRevoked()) {

Review comment:
       A lock that is revoked is effectively not locking any interval. So we don't need to consider revoked locks while looking for locked intervals.
   
   Revoked locks are kept in the TaskLockbox to notify that those locks are revoked to the callers when they acquire the same locks again.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] kfaraz commented on a change in pull request #11190: Temporarily skip compaction for locked intervals

Posted by GitBox <gi...@apache.org>.
kfaraz commented on a change in pull request #11190:
URL: https://github.com/apache/druid/pull/11190#discussion_r654920862



##########
File path: indexing-service/src/main/java/org/apache/druid/indexing/overlord/TaskLockbox.java
##########
@@ -710,8 +710,8 @@ public TaskLock apply(TaskLockPosse taskLockPosse)
                             // Do not proceed if the lock is revoked
                             return;
                           } else if (taskLockPosse.getTaskLock().getPriority() == null
-                                     || taskLockPosse.getTaskLock().getPriority() <= minTaskPriority.get(datasource)) {
-                            // Do not proceed if the lock has a priority less than or equal to the minimum
+                                     || taskLockPosse.getTaskLock().getPriority() < minTaskPriority.get(datasource)) {

Review comment:
       I had missed changing it the first time. 
   
   The behaviour after this change is that intervals of tasks with equal priority will be considered locked. That is the required behaviour, right?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] rohangarg edited a comment on pull request #11190: Temporarily skip compaction for locked intervals

Posted by GitBox <gi...@apache.org>.
rohangarg edited a comment on pull request #11190:
URL: https://github.com/apache/druid/pull/11190#issuecomment-862232381


   >Normally, the user would not have to specify any value for this config. The default value of true would suffice. The config has been provided to be able to disable the feature of skipping locked intervals in case of a bug.
   
   Ok but given that the feature/change-set is not too big, I'm not sure whether adding a config would help. Is it possible to know some scenarios or unknowns where this could fail and might be hard to cover them in tests? 
   
   Apologies but one more question I had was that if this flag is turned on, will it always ensure that there is no locking conflict between compaction task and other tasks? or is it still possible that compaction task might get stuck due to unavailable lock? my initial guess is that there is still a chance that compaction might get stuck, so confirming? :) 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] kfaraz commented on a change in pull request #11190: Temporarily skip compaction for locked intervals

Posted by GitBox <gi...@apache.org>.
kfaraz commented on a change in pull request #11190:
URL: https://github.com/apache/druid/pull/11190#discussion_r626342667



##########
File path: indexing-service/src/main/java/org/apache/druid/indexing/overlord/http/OverlordResource.java
##########
@@ -227,6 +227,34 @@ public Response isLeader()
     }
   }
 
+  @GET
+  @Path("/lockedIntervals")
+  @Produces(MediaType.APPLICATION_JSON)
+  public Response getTaskLockedIntervals(@Context HttpServletRequest request)
+  {
+    // Perform authorization check
+    final ResourceAction resourceAction = new ResourceAction(

Review comment:
       Fixed.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] maytasm commented on a change in pull request #11190: Temporarily skip compaction for locked intervals

Posted by GitBox <gi...@apache.org>.
maytasm commented on a change in pull request #11190:
URL: https://github.com/apache/druid/pull/11190#discussion_r626181627



##########
File path: indexing-service/src/main/java/org/apache/druid/indexing/overlord/TaskLockbox.java
##########
@@ -674,6 +675,64 @@ public TaskLock apply(TaskLockPosse taskLockPosse)
     }
   }
 
+  /**
+   * Gets a Map containing intervals locked by active tasks. Intervals locked
+   * by revoked TaskLocks are not included in the returned Map.
+   *
+   * @return Map from Task Id to locked intervals.
+   */
+  public Map<String, DatasourceIntervals> getLockedIntervals()
+  {
+    final Map<String, List<Interval>> taskToIntervals = new HashMap<>();
+    final Map<String, String> taskToDatasource = new HashMap<>();
+
+    // Take a lock and populate the maps
+    giant.lock();

Review comment:
       why do we need the lock here?

##########
File path: indexing-service/src/main/java/org/apache/druid/indexing/overlord/http/OverlordResource.java
##########
@@ -227,6 +227,34 @@ public Response isLeader()
     }
   }
 
+  @GET
+  @Path("/lockedIntervals")
+  @Produces(MediaType.APPLICATION_JSON)
+  public Response getTaskLockedIntervals(@Context HttpServletRequest request)
+  {
+    // Perform authorization check
+    final ResourceAction resourceAction = new ResourceAction(

Review comment:
       Any particular this API needs a finer grained access control rather than using the StateResourceFilter ?
   The StateResourceFilter is already used for API like /taskStatus which I think is similar in access control to this API. 

##########
File path: indexing-service/src/main/java/org/apache/druid/indexing/overlord/http/OverlordResource.java
##########
@@ -227,6 +227,34 @@ public Response isLeader()
     }
   }
 
+  @GET
+  @Path("/lockedIntervals")
+  @Produces(MediaType.APPLICATION_JSON)
+  public Response getTaskLockedIntervals(@Context HttpServletRequest request)
+  {
+    // Perform authorization check
+    final ResourceAction resourceAction = new ResourceAction(
+        new Resource("lockedIntervals", ResourceType.STATE),
+        Action.READ
+    );
+    final Access authResult = AuthorizationUtils
+        .authorizeResourceAction(request, resourceAction, authorizerMapper);
+    if (!authResult.isAllowed()) {
+      throw new WebApplicationException(
+          Response.status(Response.Status.FORBIDDEN)
+                  .entity(StringUtils.format("Access-Check-Result: %s", authResult.toString()))
+                  .build()
+      );
+    }
+
+    // Build the response
+    final LockedIntervalsResponse response = new LockedIntervalsResponse(
+        taskStorageQueryAdapter.getLockedIntervals()
+    );
+    log.warn("Found Intervals: %s", response.getLockedIntervals());

Review comment:
       Why is this a WARN log?

##########
File path: indexing-service/src/main/java/org/apache/druid/indexing/overlord/http/OverlordResource.java
##########
@@ -227,6 +227,34 @@ public Response isLeader()
     }
   }
 
+  @GET
+  @Path("/lockedIntervals")
+  @Produces(MediaType.APPLICATION_JSON)
+  public Response getTaskLockedIntervals(@Context HttpServletRequest request)
+  {
+    // Perform authorization check
+    final ResourceAction resourceAction = new ResourceAction(
+        new Resource("lockedIntervals", ResourceType.STATE),
+        Action.READ
+    );
+    final Access authResult = AuthorizationUtils
+        .authorizeResourceAction(request, resourceAction, authorizerMapper);
+    if (!authResult.isAllowed()) {
+      throw new WebApplicationException(
+          Response.status(Response.Status.FORBIDDEN)
+                  .entity(StringUtils.format("Access-Check-Result: %s", authResult.toString()))
+                  .build()
+      );
+    }
+
+    // Build the response
+    final LockedIntervalsResponse response = new LockedIntervalsResponse(

Review comment:
       Can the API just return Map<String, DatasourceIntervals> (removing the need for another class)?

##########
File path: indexing-service/src/main/java/org/apache/druid/indexing/overlord/http/OverlordResource.java
##########
@@ -227,6 +227,34 @@ public Response isLeader()
     }
   }
 
+  @GET
+  @Path("/lockedIntervals")
+  @Produces(MediaType.APPLICATION_JSON)
+  public Response getTaskLockedIntervals(@Context HttpServletRequest request)
+  {
+    // Perform authorization check
+    final ResourceAction resourceAction = new ResourceAction(
+        new Resource("lockedIntervals", ResourceType.STATE),
+        Action.READ
+    );
+    final Access authResult = AuthorizationUtils
+        .authorizeResourceAction(request, resourceAction, authorizerMapper);
+    if (!authResult.isAllowed()) {
+      throw new WebApplicationException(
+          Response.status(Response.Status.FORBIDDEN)
+                  .entity(StringUtils.format("Access-Check-Result: %s", authResult.toString()))
+                  .build()
+      );
+    }
+
+    // Build the response
+    final LockedIntervalsResponse response = new LockedIntervalsResponse(

Review comment:
       Response.ok(taskStorageQueryAdapter.getLockedIntervals()).build()

##########
File path: docs/operations/api-reference.md
##########
@@ -614,6 +614,11 @@ Retrieve information about the segments of a task.
 
 Retrieve a [task completion report](../ingestion/tasks.md#task-reports) for a task. Only works for completed tasks.
 
+* `/druid/indexer/v1/lockedIntervals`
+
+Retrieve the list of Intervals locked by currently running ingestion/compaction tasks. The response contains a Map from
+Task IDs to the list of Intervals locked by the respective Tasks.

Review comment:
       Do we need to return interval locked by compaction tasks?
   If the existing compaction task are out of date, we already have code to cancelled and submit for that interval (hence the lock those to-be-cancel compaction task does not matter). If they are not canceled, then there is already code to skip the interval of running compaction task. 
   
   In the above case, we can just return a datasource --> intervals of locked by non compaction tasks

##########
File path: server/src/main/java/org/apache/druid/server/coordinator/duty/CompactSegments.java
##########
@@ -139,6 +142,9 @@ public DruidCoordinatorRuntimeParams run(DruidCoordinatorRuntimeParams params)
                          compactionTaskQuery.getGranularitySpec().getSegmentGranularity(),
                          configuredSegmentGranularity);
                 indexingServiceClient.cancelTask(status.getId());
+
+                // Remove this from the locked intervals
+                taskToLockedIntervals.remove(status.getId());

Review comment:
       As mentioned earlier, this is not really needed. This would be the same as taskToLockedIntervals not containing lock for compaction task in the first place. 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org