You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@druid.apache.org by GitBox <gi...@apache.org> on 2021/07/27 21:00:41 UTC

[GitHub] [druid] jihoonson commented on a change in pull request #11501: Allow kill task to mark segments as unused

jihoonson commented on a change in pull request #11501:
URL: https://github.com/apache/druid/pull/11501#discussion_r677779492



##########
File path: docs/ingestion/data-management.md
##########
@@ -95,18 +95,23 @@ A data deletion tutorial is available at [Tutorial: Deleting data](../tutorials/
 
 ## Kill Task
 
-Kill tasks delete all information about a segment and removes it from deep storage. Segments to kill must be unused (used==0) in the Druid segment table. The available grammar is:
+Kill tasks delete all information about a segment and removes it from deep storage. Segments to kill must be unused (used==0) in the Druid segment table.

Review comment:
       Hmm maybe
    
   ```suggestion
   The kill task deletes all information about segments and removes them from deep storage. Segments to kill must be unused (used==0) in the Druid segment table.
   ```

##########
File path: docs/ingestion/data-management.md
##########
@@ -95,18 +95,23 @@ A data deletion tutorial is available at [Tutorial: Deleting data](../tutorials/
 
 ## Kill Task
 
-Kill tasks delete all information about a segment and removes it from deep storage. Segments to kill must be unused (used==0) in the Druid segment table. The available grammar is:
+Kill tasks delete all information about a segment and removes it from deep storage. Segments to kill must be unused (used==0) in the Druid segment table.
+
+The available grammar is:
 
 ```json
 {
     "type": "kill",
     "id": <task_id>,
     "dataSource": <task_datasource>,
     "interval" : <all_segments_in_this_interval_will_die!>,
+    "markAsUnused": <true|false>,

Review comment:
       It seems worth to mention what the default is.

##########
File path: server/src/main/java/org/apache/druid/metadata/IndexerSQLMetadataStorageCoordinator.java
##########
@@ -226,6 +226,33 @@ public void start()
     return matchingSegments;
   }
 
+  @Override
+  public int markSegmentsAsUnusedWithinInterval(String dataSource, Interval interval)
+  {
+    int numSegmentsMarkedUnused = connector.retryTransaction(
+        (handle, status) -> {
+          return handle
+              .createStatement(
+                  StringUtils.format(
+                      "UPDATE %s SET used=false WHERE dataSource = :dataSource "
+                      + "AND start >= :start AND %2$send%2$s <= :end",

Review comment:
       Hmm should the end be exclusive? I see `IndexerSQLMetadataStorageCoordinator.retrieveUnusedSegmentsForInterval()` uses the same filter of `"end" <= :end`, which seems to break the contract of `IndexerMetadataStorageCoordinator.retrieveUnusedSegmentsForInterval()` because the end time is exclusive for segment intervals. Maybe this hasn't caused much troubles so far because `retrieveUnusedSegmentsForInterval` returns only unused segments, even though it seems like a bug. But this method will unset the used flag and thus probably should respect the exclusivity of the end time?

##########
File path: docs/ingestion/data-management.md
##########
@@ -95,18 +95,23 @@ A data deletion tutorial is available at [Tutorial: Deleting data](../tutorials/
 
 ## Kill Task
 
-Kill tasks delete all information about a segment and removes it from deep storage. Segments to kill must be unused (used==0) in the Druid segment table. The available grammar is:
+Kill tasks delete all information about a segment and removes it from deep storage. Segments to kill must be unused (used==0) in the Druid segment table.
+
+The available grammar is:
 
 ```json
 {
     "type": "kill",
     "id": <task_id>,
     "dataSource": <task_datasource>,
     "interval" : <all_segments_in_this_interval_will_die!>,
+    "markAsUnused": <true|false>,
     "context": <task context>
 }
 ```
 
+If `markAsUnused` is true, the kill task will first mark any segments within the specified interval as unused, before deleting the unused segments within the interval.

Review comment:
       Perhaps we could make this more scary because it cannot be undone once they delete segments with `markAsUnused` set?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org