Posted to commits@hudi.apache.org by "danny0405 (via GitHub)" <gi...@apache.org> on 2023/03/31 03:15:33 UTC

[GitHub] [hudi] danny0405 commented on a diff in pull request #8238: [HUDI-5954] Infer cleaning policy based on clean configs

danny0405 commented on code in PR #8238:
URL: https://github.com/apache/hudi/pull/8238#discussion_r1153970726


##########
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/config/HoodieCleanConfig.java:
##########
@@ -59,25 +63,67 @@ public class HoodieCleanConfig extends HoodieConfig {
       .withDocumentation("Only applies when " + AUTO_CLEAN.key() + " is turned on. "
           + "When turned on runs cleaner async with writing, which can speed up overall write performance.");
 
+  // The cleaner policy config definition has to be before the following configs for inference:
+  // CLEANER_COMMITS_RETAINED, CLEANER_HOURS_RETAINED, CLEANER_FILE_VERSIONS_RETAINED
+  public static final ConfigProperty<String> CLEANER_POLICY = ConfigProperty
+      .key("hoodie.cleaner.policy")
+      .defaultValue(HoodieCleaningPolicy.KEEP_LATEST_COMMITS.name())
+      .withInferFunction(cfg -> {
+        boolean isCommitsRetainedConfigured = cfg.contains(CLEANER_COMMITS_RETAINED_KEY);
+        boolean isHoursRetainedConfigured = cfg.contains(CLEANER_HOURS_RETAINED_KEY);

Review Comment:
   I'm confused by these options: does `hoodie.cleaner.policy` make sense here at all? If each of the specific cleaning params (`hoodie.cleaner.commits.retained`, `hoodie.cleaner.hours.retained`, `hoodie.cleaner.fileversions.retained`) already implies a deterministic policy, then this option should be eliminated.
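To make the point concrete, here is a minimal standalone sketch, not the PR's or Hudi's actual code, of inferring the policy purely from which retention key is set. The `PolicyInferenceSketch` class, its local `CleaningPolicy` enum (constant names assumed to mirror `HoodieCleaningPolicy`), and the plain `Map<String, String>` standing in for Hudi's config object are assumptions for illustration. The fallback to `KEEP_LATEST_COMMITS` when zero or several keys are set matches the default shown in the diff but is otherwise an assumption.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical standalone sketch; not Hudi code.
class PolicyInferenceSketch {

  // Local stand-in for HoodieCleaningPolicy (names assumed).
  enum CleaningPolicy { KEEP_LATEST_COMMITS, KEEP_LATEST_BY_HOURS, KEEP_LATEST_FILE_VERSIONS }

  static final String COMMITS_RETAINED = "hoodie.cleaner.commits.retained";
  static final String HOURS_RETAINED = "hoodie.cleaner.hours.retained";
  static final String FILE_VERSIONS_RETAINED = "hoodie.cleaner.fileversions.retained";

  // Returns the policy implied by the single retention key that is set;
  // falls back to KEEP_LATEST_COMMITS when none or more than one is set (assumed rule).
  static CleaningPolicy infer(Map<String, String> cfg) {
    List<CleaningPolicy> candidates = new ArrayList<>();
    if (cfg.containsKey(COMMITS_RETAINED)) {
      candidates.add(CleaningPolicy.KEEP_LATEST_COMMITS);
    }
    if (cfg.containsKey(HOURS_RETAINED)) {
      candidates.add(CleaningPolicy.KEEP_LATEST_BY_HOURS);
    }
    if (cfg.containsKey(FILE_VERSIONS_RETAINED)) {
      candidates.add(CleaningPolicy.KEEP_LATEST_FILE_VERSIONS);
    }
    return candidates.size() == 1 ? candidates.get(0) : CleaningPolicy.KEEP_LATEST_COMMITS;
  }

  public static void main(String[] args) {
    System.out.println(infer(Map.of(HOURS_RETAINED, "24"))); // KEEP_LATEST_BY_HOURS
    System.out.println(infer(Map.of()));                     // KEEP_LATEST_COMMITS
  }
}
```

Returning the default for ambiguous input keeps the sketch total; a stricter variant could reject configs that set more than one retention key.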
   
   For example, can we combine the `HoodieCleaningPolicy.KEEP_LATEST_COMMITS` policy with `hoodie.cleaner.fileversions.retained`? If not, introducing the redundant option key `hoodie.cleaner.policy` is totally unnecessary.
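If such combinations are not meaningful, a config check could reject them instead of silently ignoring a retention key. Below is a minimal standalone sketch of that check; `PolicyConflictSketch`, `validate`, `isValid`, the local `CleaningPolicy` enum, and the policy-to-key mapping (inferred from the key names) are hypothetical and not part of Hudi.

```java
import java.util.Map;

// Hypothetical standalone sketch, not Hudi code: rejects a retention key
// that the explicitly chosen cleaning policy would ignore.
class PolicyConflictSketch {

  // Local stand-in for HoodieCleaningPolicy (names assumed).
  enum CleaningPolicy { KEEP_LATEST_COMMITS, KEEP_LATEST_BY_HOURS, KEEP_LATEST_FILE_VERSIONS }

  // Retention key each policy is assumed to read.
  static final Map<CleaningPolicy, String> RETENTION_KEY = Map.of(
      CleaningPolicy.KEEP_LATEST_COMMITS, "hoodie.cleaner.commits.retained",
      CleaningPolicy.KEEP_LATEST_BY_HOURS, "hoodie.cleaner.hours.retained",
      CleaningPolicy.KEEP_LATEST_FILE_VERSIONS, "hoodie.cleaner.fileversions.retained");

  // Throws if cfg sets a retention key that belongs to a policy other than the chosen one.
  static void validate(CleaningPolicy chosen, Map<String, String> cfg) {
    for (Map.Entry<CleaningPolicy, String> e : RETENTION_KEY.entrySet()) {
      if (e.getKey() != chosen && cfg.containsKey(e.getValue())) {
        throw new IllegalArgumentException(
            "Policy " + chosen + " ignores " + e.getValue() + ", which implies " + e.getKey());
      }
    }
  }

  // Convenience wrapper: true when validate() accepts the combination.
  static boolean isValid(CleaningPolicy chosen, Map<String, String> cfg) {
    try {
      validate(chosen, cfg);
      return true;
    } catch (IllegalArgumentException ex) {
      return false;
    }
  }

  public static void main(String[] args) {
    Map<String, String> cfg = Map.of("hoodie.cleaner.fileversions.retained", "3");
    System.out.println(isValid(CleaningPolicy.KEEP_LATEST_COMMITS, cfg));       // false
    System.out.println(isValid(CleaningPolicy.KEEP_LATEST_FILE_VERSIONS, cfg)); // true
  }
}
```

Failing fast on the mismatch makes the redundancy visible: whenever the check passes, the explicit policy repeats what the retention key already implies.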


