You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@druid.apache.org by GitBox <gi...@apache.org> on 2021/04/29 01:33:48 UTC

[GitHub] [druid] jihoonson commented on a change in pull request #11164: Add feature to automatically remove rules based on retention period

jihoonson commented on a change in pull request #11164:
URL: https://github.com/apache/druid/pull/11164#discussion_r622661391



##########
File path: server/src/main/java/org/apache/druid/metadata/MetadataRuleManager.java
##########
@@ -42,4 +42,12 @@
   List<Rule> getRulesWithDefault(String dataSource);
 
   boolean overrideRule(String dataSource, List<Rule> rulesConfig, AuditInfo auditInfo);
+
+  /**
+   * Remove rules for non-existence datasource (datasource with no segment) created older than the given timestamp.
+   *
+   * @param timestamp timestamp in milliseconds
+   * @return number of rules removed
+   */
+  int removeRulesOlderThan(long timestamp);

Review comment:
       Maybe `removeRulesForEmptyDatasourcesOlderThan` is better

##########
File path: server/src/main/java/org/apache/druid/metadata/SQLMetadataRuleManager.java
##########
@@ -421,8 +422,35 @@ public Void inTransaction(Handle handle, TransactionStatus transactionStatus) th
     return true;
   }
 
+  @Override
+  public int removeRulesOlderThan(long timestamp)
+  {
+    DateTime dateTime = DateTimes.utc(timestamp);
+    synchronized (lock) {
+      return dbi.withHandle(
+          handle -> {
+            Update sql = handle.createStatement(
+                StringUtils.format(
+                    "DELETE FROM %1$s WHERE datasource NOT IN (SELECT DISTINCT datasource from %2$s) and datasource!=:default_rule and version < :date_time",

Review comment:
       Also, this query could be expensive when the segments table is large. I think maybe it's better to implement the join in this class. For example, this method can iterate all datasources in the rules table and check whether there is any entry in the segments table for each datasource. Then, you can use a much cheaper query for the existence check such as `SELECT datasource FROM segments where datasource = ${rule_datasource} LIMIT 1`. Maybe this approach needs to add a new index on `datasource` for the segments table.

##########
File path: server/src/main/java/org/apache/druid/metadata/SQLMetadataRuleManager.java
##########
@@ -421,8 +422,35 @@ public Void inTransaction(Handle handle, TransactionStatus transactionStatus) th
     return true;
   }
 
+  @Override
+  public int removeRulesOlderThan(long timestamp)
+  {
+    DateTime dateTime = DateTimes.utc(timestamp);
+    synchronized (lock) {
+      return dbi.withHandle(
+          handle -> {
+            Update sql = handle.createStatement(
+                StringUtils.format(
+                    "DELETE FROM %1$s WHERE datasource NOT IN (SELECT DISTINCT datasource from %2$s) and datasource!=:default_rule and version < :date_time",

Review comment:
       After this change, the `version` must be the timestamp of when the rule entry was added. It means [this code](https://github.com/apache/druid/blob/master/server/src/main/java/org/apache/druid/metadata/SQLMetadataRuleManager.java#L381) must remain the same as it is unless we change this query together. Please add comments on both codes so that we won't forget it.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@druid.apache.org
For additional commands, e-mail: commits-help@druid.apache.org