You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@druid.apache.org by GitBox <gi...@apache.org> on 2022/01/21 18:19:05 UTC

[GitHub] [druid] suneet-s opened a new pull request #12187: Enable auto kill segments by default

suneet-s opened a new pull request #12187:
URL: https://github.com/apache/druid/pull/12187


   ### Description
   
   To help prevent the metadata store from growing unbounded, enable kill unused segments by default on all datasources.
   
   This is helpful when users have auto-compaction running or they age out data. The old segments are marked as unused but live on in the metadata store and in deep storage.
   
   The new defaults should kill all unused segments older than 90 days. If users do not want this behavior on an upgrade, they should explicitly disable the behavior, or change the specific datasources they would want to run on.
   <hr>
   
   This PR has:
   - [x] been self-reviewed.
   - [ ] added documentation for new or modified features or behaviors.
   - [ ] added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
   - [ ] added or updated version, license, or notice information in [licenses.yaml](https://github.com/apache/druid/blob/master/dev/license.md)
   - [ ] added comments explaining the "why" and the intent of the code wherever would not be obvious for an unfamiliar reader.
   - [x] added unit tests or modified existing tests to cover new code paths, ensuring the threshold for [code coverage](https://github.com/apache/druid/blob/master/dev/code-review/code-coverage.md) is met.
   - [ ] added integration tests.
   - [ ] been tested in a test Druid cluster.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] suneet-s commented on a change in pull request #12187: Enable auto kill segments by default

Posted by GitBox <gi...@apache.org>.
suneet-s commented on a change in pull request #12187:
URL: https://github.com/apache/druid/pull/12187#discussion_r791841648



##########
File path: server/src/main/java/org/apache/druid/server/coordinator/CoordinatorDynamicConfig.java
##########
@@ -315,7 +318,9 @@ public int getBalancerComputeThreads()
   @JsonProperty("killAllDataSources")
   public boolean isKillUnusedSegmentsInAllDataSources()
   {
-    return killUnusedSegmentsInAllDataSources;
+    return killUnusedSegmentsInAllDataSources != null
+           ? killUnusedSegmentsInAllDataSources
+           : specificDataSourcesToKillUnusedSegmentsIn.isEmpty();

Review comment:
       Great suggestion. This should simplify the user experience. I will do this




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] maytasm commented on pull request #12187: Enable auto kill segments by default

Posted by GitBox <gi...@apache.org>.
maytasm commented on pull request #12187:
URL: https://github.com/apache/druid/pull/12187#issuecomment-1023630511


   Can you set druid_coordinator_kill_period=PT360000S in integration-tests/docker/environment-configs/coordinator too? Thanks


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] suneet-s commented on a change in pull request #12187: Enable auto kill segments by default

Posted by GitBox <gi...@apache.org>.
suneet-s commented on a change in pull request #12187:
URL: https://github.com/apache/druid/pull/12187#discussion_r791841648



##########
File path: server/src/main/java/org/apache/druid/server/coordinator/CoordinatorDynamicConfig.java
##########
@@ -315,7 +318,9 @@ public int getBalancerComputeThreads()
   @JsonProperty("killAllDataSources")
   public boolean isKillUnusedSegmentsInAllDataSources()
   {
-    return killUnusedSegmentsInAllDataSources;
+    return killUnusedSegmentsInAllDataSources != null
+           ? killUnusedSegmentsInAllDataSources
+           : specificDataSourcesToKillUnusedSegmentsIn.isEmpty();

Review comment:
       Great suggestion. This should simplify the user experience. I will do this




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] jihoonson commented on a change in pull request #12187: Enable auto kill segments by default

Posted by GitBox <gi...@apache.org>.
jihoonson commented on a change in pull request #12187:
URL: https://github.com/apache/druid/pull/12187#discussion_r791208409



##########
File path: server/src/main/java/org/apache/druid/server/coordinator/CoordinatorDynamicConfig.java
##########
@@ -65,7 +65,8 @@
   /**
    * If true {@link KillUnusedSegments} sends kill tasks for unused segments in all data sources.
    */
-  private final boolean killUnusedSegmentsInAllDataSources;
+  @Nullable
+  private final Boolean killUnusedSegmentsInAllDataSources;

Review comment:
       nit: can we compute its default per the given `killDataSourceWhitelist` instead of making it nullable? It could simplify the code a bit.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] kfaraz commented on a change in pull request #12187: Enable auto kill segments by default

Posted by GitBox <gi...@apache.org>.
kfaraz commented on a change in pull request #12187:
URL: https://github.com/apache/druid/pull/12187#discussion_r790162086



##########
File path: server/src/main/java/org/apache/druid/server/coordinator/CoordinatorDynamicConfig.java
##########
@@ -315,7 +318,9 @@ public int getBalancerComputeThreads()
   @JsonProperty("killAllDataSources")
   public boolean isKillUnusedSegmentsInAllDataSources()
   {
-    return killUnusedSegmentsInAllDataSources;
+    return killUnusedSegmentsInAllDataSources != null
+           ? killUnusedSegmentsInAllDataSources
+           : specificDataSourcesToKillUnusedSegmentsIn.isEmpty();

Review comment:
       Can this be simplified as?
   
   ```suggestion
       return specificDataSourcesToKillUnusedSegmentsIn.isEmpty();
   ```
   
   The method returns true when
   - either `killUnusedSegmentsInAllDataSources = true` which implies that `specifiedDataSourcesToKillUnusedSegmentsIn` is empty, otherwise we would have thrown an exception
   - or `killUnusedSegmentsInAllDataSources = null` and `specifiedDataSourcesToKillUnusedSegmentsIn` is empty
   
   With these new defaults, maybe we don't even need `killAll` anymore?
   If `killList` is empty, we kill all
   Otherwise, we kill only the ones specified.
   Unless I am missing a case.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] suneet-s commented on a change in pull request #12187: Enable auto kill segments by default

Posted by GitBox <gi...@apache.org>.
suneet-s commented on a change in pull request #12187:
URL: https://github.com/apache/druid/pull/12187#discussion_r797942121



##########
File path: docs/configuration/index.md
##########
@@ -902,7 +902,7 @@ Issuing a GET request at the same URL will return the spec that is currently in
 |`balancerComputeThreads`|Thread pool size for computing moving cost of segments in segment balancing. Consider increasing this if you have a lot of segments and moving segments starts to get stuck.|1|
 |`emitBalancingStats`|Boolean flag for whether or not we should emit balancing stats. This is an expensive operation.|false|
 |`killDataSourceWhitelist`|List of specific data sources for which kill tasks are sent if property `druid.coordinator.kill.on` is true. This can be a list of comma-separated data source names or a JSON array.|none|
-|`killAllDataSources`|Send kill tasks for ALL dataSources if property `druid.coordinator.kill.on` is true. If this is set to true then `killDataSourceWhitelist` must not be specified or be empty list.|false|
+|`killAllDataSources`|Send kill tasks for ALL dataSources if property `druid.coordinator.kill.on` is true. If this is set to true then `killDataSourceWhitelist` must not be specified or be empty list.|true if `killDataSourceWhitelist` is not set or empty|

Review comment:
       ```suggestion
   ```




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] jihoonson commented on a change in pull request #12187: Enable auto kill segments by default

Posted by GitBox <gi...@apache.org>.
jihoonson commented on a change in pull request #12187:
URL: https://github.com/apache/druid/pull/12187#discussion_r791208409



##########
File path: server/src/main/java/org/apache/druid/server/coordinator/CoordinatorDynamicConfig.java
##########
@@ -65,7 +65,8 @@
   /**
    * If true {@link KillUnusedSegments} sends kill tasks for unused segments in all data sources.
    */
-  private final boolean killUnusedSegmentsInAllDataSources;
+  @Nullable
+  private final Boolean killUnusedSegmentsInAllDataSources;

Review comment:
       nit: can we compute its default per the given `killDataSourceWhitelist` instead of making it nullable? It could simplify the code a bit.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] suneet-s merged pull request #12187: Enable auto kill segments by default

Posted by GitBox <gi...@apache.org>.
suneet-s merged pull request #12187:
URL: https://github.com/apache/druid/pull/12187


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org


[GitHub] [druid] suneet-s commented on pull request #12187: Enable auto kill segments by default

Posted by GitBox <gi...@apache.org>.
suneet-s commented on pull request #12187:
URL: https://github.com/apache/druid/pull/12187#issuecomment-1030460563


   @kfaraz I should have removed all references to `killUnusedSegmentsInAllDataSources` in the docs and the code. It is now implicit. Can you take another look when you get a chance please?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org