Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2022/09/14 09:42:21 UTC

[GitHub] [hudi] scxwhite opened a new pull request, #6670: [HUDI-4842] Support compression strategy based on delte file length

scxwhite opened a new pull request, #6670:
URL: https://github.com/apache/hudi/pull/6670

   ### Change Logs
   
   When reading Hudi data, the number of small files greatly affects read performance. When compacting logs, we can provide a strategy that compacts the file groups with the most delta log files first.
   
   
   
   ### Impact
   
   _Describe any public API or user-facing feature change or any performance impact._
   
   **Risk level: none | low | medium | high**
   
   _Choose one. If medium or high, explain what verification was done to mitigate the risks._
   
   ### Contributor's checklist
   
   - [ ] Read through [contributor's guide](https://hudi.apache.org/contribute/how-to-contribute)
   - [ ] Change Logs and Impact were stated clearly
   - [ ] Adequate tests were added if applicable
   - [ ] CI passed
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] scxwhite commented on pull request #6670: [HUDI-4842] Support compaction strategy based on delta file num

Posted by GitBox <gi...@apache.org>.
scxwhite commented on PR #6670:
URL: https://github.com/apache/hudi/pull/6670#issuecomment-1250039272

   > @scxwhite Could you also correct the PR title by using compaction instead of compression?
   
   It's all done. Thank you for your code review.




[GitHub] [hudi] scxwhite commented on a diff in pull request #6670: [HUDI-4842] Support compression strategy based on delte file length

Posted by GitBox <gi...@apache.org>.
scxwhite commented on code in PR #6670:
URL: https://github.com/apache/hudi/pull/6670#discussion_r972559146


##########
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/config/HoodieCompactionConfig.java:
##########
@@ -106,6 +106,12 @@ public class HoodieCompactionConfig extends HoodieConfig {
       .withDocumentation("Only if the log file size is greater than the threshold in bytes,"
           + " the file group will be compacted.");
 
+  public static final ConfigProperty<Long> COMPACTION_LOG_FILE_LENGTH_THRESHOLD = ConfigProperty
+      .key("hoodie.compaction.logfile.length.threshold")

Review Comment:
   > `hoodie.compaction.logfile.length.threshold` -> `hoodie.compaction.logfile.num.threshold` `length` can be confused with the size of the log file.
   
   Great.





[GitHub] [hudi] hudi-bot commented on pull request #6670: [HUDI-4842] Support compression strategy based on delte file length

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #6670:
URL: https://github.com/apache/hudi/pull/6670#issuecomment-1247233965

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "462f77736f855dc277cc62e0778fb4c1fa04f09a",
       "status" : "SUCCESS",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=11357",
       "triggerID" : "462f77736f855dc277cc62e0778fb4c1fa04f09a",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 462f77736f855dc277cc62e0778fb4c1fa04f09a Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=11357) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>




[GitHub] [hudi] hudi-bot commented on pull request #6670: [HUDI-4842] Support compression strategy based on delte file length

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #6670:
URL: https://github.com/apache/hudi/pull/6670#issuecomment-1248889062

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "462f77736f855dc277cc62e0778fb4c1fa04f09a",
       "status" : "SUCCESS",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=11357",
       "triggerID" : "462f77736f855dc277cc62e0778fb4c1fa04f09a",
       "triggerType" : "PUSH"
     }, {
       "hash" : "9d0a67996db91ad90808ddb83aa281ad04e91a76",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "9d0a67996db91ad90808ddb83aa281ad04e91a76",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 462f77736f855dc277cc62e0778fb4c1fa04f09a Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=11357) 
   * 9d0a67996db91ad90808ddb83aa281ad04e91a76 UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>




[GitHub] [hudi] hudi-bot commented on pull request #6670: [HUDI-4842] Support compaction strategy based on delta file num

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #6670:
URL: https://github.com/apache/hudi/pull/6670#issuecomment-1250045138

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "462f77736f855dc277cc62e0778fb4c1fa04f09a",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=11357",
       "triggerID" : "462f77736f855dc277cc62e0778fb4c1fa04f09a",
       "triggerType" : "PUSH"
     }, {
       "hash" : "9d0a67996db91ad90808ddb83aa281ad04e91a76",
       "status" : "SUCCESS",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=11409",
       "triggerID" : "9d0a67996db91ad90808ddb83aa281ad04e91a76",
       "triggerType" : "PUSH"
     }, {
       "hash" : "13d66a602715f9ecaca1f23d9474efcfc79512d8",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=11456",
       "triggerID" : "13d66a602715f9ecaca1f23d9474efcfc79512d8",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 9d0a67996db91ad90808ddb83aa281ad04e91a76 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=11409) 
   * 13d66a602715f9ecaca1f23d9474efcfc79512d8 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=11456) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>




[GitHub] [hudi] scxwhite commented on a diff in pull request #6670: [HUDI-4842] Support compression strategy based on delte file length

Posted by GitBox <gi...@apache.org>.
scxwhite commented on code in PR #6670:
URL: https://github.com/apache/hudi/pull/6670#discussion_r973539801


##########
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/config/HoodieCompactionConfig.java:
##########
@@ -106,6 +106,12 @@ public class HoodieCompactionConfig extends HoodieConfig {
       .withDocumentation("Only if the log file size is greater than the threshold in bytes,"
           + " the file group will be compacted.");
 
+  public static final ConfigProperty<Long> COMPACTION_LOG_FILE_LENGTH_THRESHOLD = ConfigProperty
+      .key("hoodie.compaction.logfile.length.threshold")
+      .defaultValue(0L)

Review Comment:
   > Got it. That makes sense. Then do we even need this config? The new compaction strategy prioritizes the compaction of file groups with more log files, and should include all file groups for compaction nevertheless.
   
   The default value (0) covers most users' scenarios. However, some advanced users may find that a small number of delta log files (for example, three) is within the acceptable range for read performance, in which case it is unnecessary to trigger a compaction that significantly impacts throughput. Therefore, I think it is necessary to retain this configuration.
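
The behavior discussed in this comment can be illustrated with a minimal Java sketch. It is not the code from the PR: `LogFileNumOrderingSketch`, its nested `FileGroup` class, and `select` are hypothetical names, and the `>=` comparison against the threshold is an assumption. It shows only the idea that a threshold of 0 keeps every file group while still compacting the groups with the most log files first, and that a threshold such as 3 skips groups with fewer log files.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical sketch; none of these types are Hudi classes.
final class LogFileNumOrderingSketch {

  // Stand-in for a file group awaiting compaction.
  static final class FileGroup {
    final String fileId;
    final int numLogFiles;

    FileGroup(String fileId, int numLogFiles) {
      this.fileId = fileId;
      this.numLogFiles = numLogFiles;
    }
  }

  // Keeps file groups with at least `threshold` log files (assumed ">="),
  // ordered so that groups with more log files come first.
  static List<String> select(List<FileGroup> groups, long threshold) {
    return groups.stream()
        .filter(g -> g.numLogFiles >= threshold)
        .sorted(Comparator.comparingInt((FileGroup g) -> g.numLogFiles).reversed())
        .map(g -> g.fileId)
        .collect(Collectors.toList());
  }

  public static void main(String[] args) {
    List<FileGroup> groups = Arrays.asList(
        new FileGroup("fg-a", 2),
        new FileGroup("fg-b", 7),
        new FileGroup("fg-c", 4));
    System.out.println(select(groups, 0L)); // prints [fg-b, fg-c, fg-a]
    System.out.println(select(groups, 3L)); // prints [fg-b, fg-c]
  }
}
```

With the threshold at 0 nothing is filtered out, which matches the argument that a user switching strategies should still see compaction happen.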





[GitHub] [hudi] yihua commented on a diff in pull request #6670: [HUDI-4842] Support compression strategy based on delte file length

Posted by GitBox <gi...@apache.org>.
yihua commented on code in PR #6670:
URL: https://github.com/apache/hudi/pull/6670#discussion_r973534451


##########
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/config/HoodieCompactionConfig.java:
##########
@@ -106,6 +106,12 @@ public class HoodieCompactionConfig extends HoodieConfig {
       .withDocumentation("Only if the log file size is greater than the threshold in bytes,"
           + " the file group will be compacted.");
 
+  public static final ConfigProperty<Long> COMPACTION_LOG_FILE_LENGTH_THRESHOLD = ConfigProperty
+      .key("hoodie.compaction.logfile.length.threshold")
+      .defaultValue(0L)

Review Comment:
   Got it.  That makes sense.  Then do we even need this config?  The new compaction strategy prioritizes the compaction of file groups with more log files, and should include all file groups for compaction nevertheless.





[GitHub] [hudi] yihua commented on a diff in pull request #6670: [HUDI-4842] Support compression strategy based on delte file length

Posted by GitBox <gi...@apache.org>.
yihua commented on code in PR #6670:
URL: https://github.com/apache/hudi/pull/6670#discussion_r972540738


##########
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/config/HoodieCompactionConfig.java:
##########
@@ -106,6 +106,12 @@ public class HoodieCompactionConfig extends HoodieConfig {
       .withDocumentation("Only if the log file size is greater than the threshold in bytes,"
           + " the file group will be compacted.");
 
+  public static final ConfigProperty<Long> COMPACTION_LOG_FILE_LENGTH_THRESHOLD = ConfigProperty
+      .key("hoodie.compaction.logfile.length.threshold")
+      .defaultValue(0L)

Review Comment:
   Should this be set to a reasonable value like `5` for example?  Otherwise, it falls back to the behavior where all file groups are compacted.





[GitHub] [hudi] scxwhite commented on a diff in pull request #6670: [HUDI-4842] Support compression strategy based on delte file length

Posted by GitBox <gi...@apache.org>.
scxwhite commented on code in PR #6670:
URL: https://github.com/apache/hudi/pull/6670#discussion_r972557765


##########
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/config/HoodieCompactionConfig.java:
##########
@@ -106,6 +106,12 @@ public class HoodieCompactionConfig extends HoodieConfig {
       .withDocumentation("Only if the log file size is greater than the threshold in bytes,"
           + " the file group will be compacted.");
 
+  public static final ConfigProperty<Long> COMPACTION_LOG_FILE_LENGTH_THRESHOLD = ConfigProperty
+      .key("hoodie.compaction.logfile.length.threshold")
+      .defaultValue(0L)

Review Comment:
   > Should this be set to a reasonable value like `5` for example? Otherwise, it falls back to the behavior where all file groups are compacted.
   
   Thank you for your reply. I'm sorry, but I don't think we should change the default value to 5 or any other value. The current default compaction strategy is based on file size (LogFileSizeBasedCompactionStrategy). If users switch to the strategy based on the number of files and the default value is non-zero, they will find that no compaction is triggered for a period of time, which will confuse them. So I think this value should be configured by the user, don't you think?





[GitHub] [hudi] yihua commented on pull request #6670: [HUDI-4842] Support compression strategy based on delta file num

Posted by GitBox <gi...@apache.org>.
yihua commented on PR #6670:
URL: https://github.com/apache/hudi/pull/6670#issuecomment-1250027858

   @scxwhite Could you also correct the PR title by using compaction instead of compression?




[GitHub] [hudi] hudi-bot commented on pull request #6670: [HUDI-4842] Support compaction strategy based on delta file num

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #6670:
URL: https://github.com/apache/hudi/pull/6670#issuecomment-1250044386

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "462f77736f855dc277cc62e0778fb4c1fa04f09a",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=11357",
       "triggerID" : "462f77736f855dc277cc62e0778fb4c1fa04f09a",
       "triggerType" : "PUSH"
     }, {
       "hash" : "9d0a67996db91ad90808ddb83aa281ad04e91a76",
       "status" : "SUCCESS",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=11409",
       "triggerID" : "9d0a67996db91ad90808ddb83aa281ad04e91a76",
       "triggerType" : "PUSH"
     }, {
       "hash" : "13d66a602715f9ecaca1f23d9474efcfc79512d8",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "13d66a602715f9ecaca1f23d9474efcfc79512d8",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 9d0a67996db91ad90808ddb83aa281ad04e91a76 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=11409) 
   * 13d66a602715f9ecaca1f23d9474efcfc79512d8 UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>




[GitHub] [hudi] hudi-bot commented on pull request #6670: [HUDI-4842] Support compression strategy based on delte file length

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #6670:
URL: https://github.com/apache/hudi/pull/6670#issuecomment-1246542811

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "462f77736f855dc277cc62e0778fb4c1fa04f09a",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "462f77736f855dc277cc62e0778fb4c1fa04f09a",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 462f77736f855dc277cc62e0778fb4c1fa04f09a UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>




[GitHub] [hudi] yihua commented on a diff in pull request #6670: [HUDI-4842] Support compression strategy based on delte file length

Posted by GitBox <gi...@apache.org>.
yihua commented on code in PR #6670:
URL: https://github.com/apache/hudi/pull/6670#discussion_r972532363


##########
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/config/HoodieCompactionConfig.java:
##########
@@ -241,41 +255,61 @@ public class HoodieCompactionConfig extends HoodieConfig {
    */
   @Deprecated
   public static final String COMPACTION_STRATEGY_PROP = COMPACTION_STRATEGY.key();
-  /** @deprecated Use {@link #COMPACTION_STRATEGY} and its methods instead */
+  /**
+   * @deprecated Use {@link #COMPACTION_STRATEGY} and its methods instead
+   */
   @Deprecated
   public static final String DEFAULT_COMPACTION_STRATEGY = COMPACTION_STRATEGY.defaultValue();
-  /** @deprecated Use {@link #COMPACTION_LAZY_BLOCK_READ_ENABLE} and its methods instead */
+  /**
+   * @deprecated Use {@link #COMPACTION_LAZY_BLOCK_READ_ENABLE} and its methods instead
+   */
   @Deprecated
   public static final String COMPACTION_LAZY_BLOCK_READ_ENABLED_PROP = COMPACTION_LAZY_BLOCK_READ_ENABLE.key();
-  /** @deprecated Use {@link #COMPACTION_LAZY_BLOCK_READ_ENABLE} and its methods instead */
+  /**
+   * @deprecated Use {@link #COMPACTION_LAZY_BLOCK_READ_ENABLE} and its methods instead
+   */
   @Deprecated
   public static final String DEFAULT_COMPACTION_LAZY_BLOCK_READ_ENABLED = COMPACTION_REVERSE_LOG_READ_ENABLE.defaultValue();
-  /** @deprecated Use {@link #COMPACTION_REVERSE_LOG_READ_ENABLE} and its methods instead */
+  /**
+   * @deprecated Use {@link #COMPACTION_REVERSE_LOG_READ_ENABLE} and its methods instead
+   */
   @Deprecated
   public static final String COMPACTION_REVERSE_LOG_READ_ENABLED_PROP = COMPACTION_REVERSE_LOG_READ_ENABLE.key();
-  /** @deprecated Use {@link #COMPACTION_REVERSE_LOG_READ_ENABLE} and its methods instead */
+  /**
+   * @deprecated Use {@link #COMPACTION_REVERSE_LOG_READ_ENABLE} and its methods instead
+   */
   @Deprecated
   public static final String DEFAULT_COMPACTION_REVERSE_LOG_READ_ENABLED = COMPACTION_REVERSE_LOG_READ_ENABLE.defaultValue();
+  /**
+   * @deprecated Use {@link #TARGET_PARTITIONS_PER_DAYBASED_COMPACTION} and its methods instead
+   */
+  @Deprecated
+  public static final String TARGET_PARTITIONS_PER_DAYBASED_COMPACTION_PROP = TARGET_PARTITIONS_PER_DAYBASED_COMPACTION.key();
+  /**
+   * @deprecated Use {@link #TARGET_PARTITIONS_PER_DAYBASED_COMPACTION} and its methods instead
+   */
+  @Deprecated
+  public static final String DEFAULT_TARGET_PARTITIONS_PER_DAYBASED_COMPACTION = TARGET_PARTITIONS_PER_DAYBASED_COMPACTION.defaultValue();
   /**
    * @deprecated Use {@link #INLINE_COMPACT} and its methods instead
    */
   @Deprecated
   private static final String DEFAULT_INLINE_COMPACT = INLINE_COMPACT.defaultValue();
-  /** @deprecated Use {@link #INLINE_COMPACT_NUM_DELTA_COMMITS} and its methods instead */
+  /**
+   * @deprecated Use {@link #INLINE_COMPACT_NUM_DELTA_COMMITS} and its methods instead
+   */
   @Deprecated
   private static final String DEFAULT_INLINE_COMPACT_NUM_DELTA_COMMITS = INLINE_COMPACT_NUM_DELTA_COMMITS.defaultValue();
-  /** @deprecated Use {@link #INLINE_COMPACT_TIME_DELTA_SECONDS} and its methods instead */
+  /**
+   * @deprecated Use {@link #INLINE_COMPACT_TIME_DELTA_SECONDS} and its methods instead
+   */
   @Deprecated
   private static final String DEFAULT_INLINE_COMPACT_TIME_DELTA_SECONDS = INLINE_COMPACT_TIME_DELTA_SECONDS.defaultValue();
-  /** @deprecated Use {@link #INLINE_COMPACT_TRIGGER_STRATEGY} and its methods instead */
+  /**

Review Comment:
   Could you avoid code style changes to reduce review overhead?  It's OK to have a separate PR to reformat the code.



##########
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/config/HoodieCompactionConfig.java:
##########
@@ -381,6 +415,11 @@ public Builder withLogFileSizeThresholdBasedCompaction(long logFileSizeThreshold
       return this;
     }
 
+    public Builder withLogFileLengthThresholdBasedCompaction(int logFileLengthThreshold) {
+      compactionConfig.setValue(COMPACTION_LOG_FILE_LENGTH_THRESHOLD, String.valueOf(logFileLengthThreshold));
+      return this;

Review Comment:
   Same here for method name and variable renaming.



##########
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/config/HoodieCompactionConfig.java:
##########
@@ -106,6 +106,12 @@ public class HoodieCompactionConfig extends HoodieConfig {
       .withDocumentation("Only if the log file size is greater than the threshold in bytes,"
           + " the file group will be compacted.");
 
+  public static final ConfigProperty<Long> COMPACTION_LOG_FILE_LENGTH_THRESHOLD = ConfigProperty
+      .key("hoodie.compaction.logfile.length.threshold")

Review Comment:
   `hoodie.compaction.logfile.length.threshold`
   ->
   `hoodie.compaction.logfile.num.threshold`
   `length` can be confused with the size of the log file.
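
To make the proposed key concrete, here is a small sketch using plain `java.util.Properties` rather than Hudi's `ConfigProperty` API. The class name `LogFileNumThresholdConfigSketch` and the `threshold` helper are hypothetical. The key string is the one suggested in this review comment and the default of 0 is the one shown in the diff; this thread does not itself confirm the name that was finally merged.

```java
import java.util.Properties;

// Hypothetical sketch; it does not use Hudi's ConfigProperty or HoodieConfig.
final class LogFileNumThresholdConfigSketch {

  // Key as proposed in the review comment (an assumption about the final name).
  static final String KEY = "hoodie.compaction.logfile.num.threshold";

  // Default value shown in the diff under review.
  static final long DEFAULT_THRESHOLD = 0L;

  // Reads the threshold, falling back to the default when the key is absent.
  static long threshold(Properties props) {
    return Long.parseLong(props.getProperty(KEY, String.valueOf(DEFAULT_THRESHOLD)));
  }

  public static void main(String[] args) {
    Properties props = new Properties();
    System.out.println(threshold(props)); // prints 0
    props.setProperty(KEY, "3");
    System.out.println(threshold(props)); // prints 3
  }
}
```

A name ending in `num.threshold` reads as a count of log files, whereas `length` could be mistaken for a size in bytes, which is the confusion the rename is meant to avoid.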





[GitHub] [hudi] hudi-bot commented on pull request #6670: [HUDI-4842] Support compression strategy based on delte file length

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #6670:
URL: https://github.com/apache/hudi/pull/6670#issuecomment-1248891059

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "462f77736f855dc277cc62e0778fb4c1fa04f09a",
       "status" : "SUCCESS",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=11357",
       "triggerID" : "462f77736f855dc277cc62e0778fb4c1fa04f09a",
       "triggerType" : "PUSH"
     }, {
       "hash" : "9d0a67996db91ad90808ddb83aa281ad04e91a76",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=11409",
       "triggerID" : "9d0a67996db91ad90808ddb83aa281ad04e91a76",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 462f77736f855dc277cc62e0778fb4c1fa04f09a Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=11357) 
   * 9d0a67996db91ad90808ddb83aa281ad04e91a76 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=11409) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>




[GitHub] [hudi] yihua merged pull request #6670: [HUDI-4842] Support compaction strategy based on delta file num

Posted by GitBox <gi...@apache.org>.
yihua merged PR #6670:
URL: https://github.com/apache/hudi/pull/6670




[GitHub] [hudi] hudi-bot commented on pull request #6670: [HUDI-4842] Support compression strategy based on delte file length

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #6670:
URL: https://github.com/apache/hudi/pull/6670#issuecomment-1249287949

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "462f77736f855dc277cc62e0778fb4c1fa04f09a",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=11357",
       "triggerID" : "462f77736f855dc277cc62e0778fb4c1fa04f09a",
       "triggerType" : "PUSH"
     }, {
       "hash" : "9d0a67996db91ad90808ddb83aa281ad04e91a76",
       "status" : "SUCCESS",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=11409",
       "triggerID" : "9d0a67996db91ad90808ddb83aa281ad04e91a76",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 9d0a67996db91ad90808ddb83aa281ad04e91a76 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=11409) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>




[GitHub] [hudi] yihua commented on a diff in pull request #6670: [HUDI-4842] Support compression strategy based on delta file num

Posted by GitBox <gi...@apache.org>.
yihua commented on code in PR #6670:
URL: https://github.com/apache/hudi/pull/6670#discussion_r973557052


##########
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/config/HoodieCompactionConfig.java:
##########
@@ -381,6 +387,11 @@ public Builder withLogFileSizeThresholdBasedCompaction(long logFileSizeThreshold
       return this;
     }
 
+    public Builder withLogFileNumThresholdBasedCompaction(int logFileNumThreshold) {

Review Comment:
   nit: -> `withCompactionLogFileNumThreshold`



##########
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/config/HoodieCompactionConfig.java:
##########
@@ -106,6 +106,12 @@ public class HoodieCompactionConfig extends HoodieConfig {
       .withDocumentation("Only if the log file size is greater than the threshold in bytes,"
           + " the file group will be compacted.");
 
+  public static final ConfigProperty<Long> COMPACTION_LOG_FILE_LENGTH_THRESHOLD = ConfigProperty
+      .key("hoodie.compaction.logfile.length.threshold")
+      .defaultValue(0L)

Review Comment:
   Got it.  Let’s keep it then.





[GitHub] [hudi] hudi-bot commented on pull request #6670: [HUDI-4842] Support compression strategy based on delte file length

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #6670:
URL: https://github.com/apache/hudi/pull/6670#issuecomment-1246553833

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "462f77736f855dc277cc62e0778fb4c1fa04f09a",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=11357",
       "triggerID" : "462f77736f855dc277cc62e0778fb4c1fa04f09a",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 462f77736f855dc277cc62e0778fb4c1fa04f09a Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=11357) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>

