You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2022/06/28 06:23:08 UTC

[GitHub] [hudi] chenshzh opened a new pull request, #5991: [HUDI-4329] Add separate control for Flink compaction operation sync/async mode

chenshzh opened a new pull request, #5991:
URL: https://github.com/apache/hudi/pull/5991

   Add separate control for Flink compaction operation sync/async mode
   
   ## *Tips*
   - *Thank you very much for contributing to Apache Hudi.*
   - *Please review https://hudi.apache.org/contribute/how-to-contribute before opening a pull request.*
   
   ## What is the purpose of the pull request
   
   *(For example: This pull request adds quick-start document.)*
   
   ## Brief change log
   
   *(for example:)*
     - *Modify AnnotationLocation checkstyle rule in checkstyle.xml*
   
   ## Verify this pull request
   
   *(Please pick either of the following options)*
   
   This pull request is a trivial rework / code cleanup without any test coverage.
   
   *(or)*
   
   This pull request is already covered by existing tests, such as *(please describe tests)*.
   
   (or)
   
   This change added tests and can be verified as follows:
   
   *(example:)*
   
     - *Added integration tests for end-to-end.*
     - *Added HoodieClientWriteTest to verify the change.*
     - *Manually verified the change by running a job locally.*
   
   ## Committer checklist
   
    - [ ] Has a corresponding JIRA in PR title & commit
    
    - [ ] Commit message is descriptive of the change
    
    - [ ] CI is green
   
    - [ ] Necessary doc changes done or have another open PR
          
    - [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] hudi-bot commented on pull request #5991: [HUDI-4329] Add separate control for Flink compaction operation sync/async mode

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #5991:
URL: https://github.com/apache/hudi/pull/5991#issuecomment-1168308178

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "48e9a4672ed395aa375d4fc1b26061159eca2d64",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "48e9a4672ed395aa375d4fc1b26061159eca2d64",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 48e9a4672ed395aa375d4fc1b26061159eca2d64 UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] chenshzh commented on a diff in pull request #5991: [HUDI-4329] Add separate control for Flink compaction operation sync/async mode

Posted by GitBox <gi...@apache.org>.
chenshzh commented on code in PR #5991:
URL: https://github.com/apache/hudi/pull/5991#discussion_r1004108039


##########
hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/sink/compact/CompactFunction.java:
##########
@@ -74,14 +74,14 @@ public class CompactFunction extends ProcessFunction<CompactionPlanEvent, Compac
 
   public CompactFunction(Configuration conf) {
     this.conf = conf;
-    this.asyncCompaction = OptionsResolver.needsAsyncCompaction(conf);
+    this.asyncCompactionOperation = OptionsResolver.needsAsyncCompactionOperation(conf);
   }

Review Comment:
   So, Why not jus add an option to indicate whether compaction is sync or not?
   It can not only solve the problem that we cannot open sync compaction in ingestion job for unbounded source, but also help refactor the weird bounded source compaction logics.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] hudi-bot commented on pull request #5991: [HUDI-4329] Add separate control for Flink compaction operation sync/async mode

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #5991:
URL: https://github.com/apache/hudi/pull/5991#issuecomment-1286613987

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "48e9a4672ed395aa375d4fc1b26061159eca2d64",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9602",
       "triggerID" : "48e9a4672ed395aa375d4fc1b26061159eca2d64",
       "triggerType" : "PUSH"
     }, {
       "hash" : "1d800b34d3abed3a432dfaef1f34ab3ee5c146db",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9612",
       "triggerID" : "1d800b34d3abed3a432dfaef1f34ab3ee5c146db",
       "triggerType" : "PUSH"
     }, {
       "hash" : "0ca6be853989eb27491728f7cfdc9cb9f727e08e",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "0ca6be853989eb27491728f7cfdc9cb9f727e08e",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 1d800b34d3abed3a432dfaef1f34ab3ee5c146db Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9612) 
   * 0ca6be853989eb27491728f7cfdc9cb9f727e08e UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] [HUDI-4329] Add separate control for Flink compaction operation sync/async mode [hudi]

Posted by "yihua (via GitHub)" <gi...@apache.org>.
yihua commented on PR #5991:
URL: https://github.com/apache/hudi/pull/5991#issuecomment-2019214979

   @danny0405 is this still needed for Hudi Flink?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] hudi-bot commented on pull request #5991: [HUDI-4329] Add separate control for Flink compaction operation sync/async mode

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #5991:
URL: https://github.com/apache/hudi/pull/5991#issuecomment-1168876349

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "48e9a4672ed395aa375d4fc1b26061159eca2d64",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9602",
       "triggerID" : "48e9a4672ed395aa375d4fc1b26061159eca2d64",
       "triggerType" : "PUSH"
     }, {
       "hash" : "1d800b34d3abed3a432dfaef1f34ab3ee5c146db",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9612",
       "triggerID" : "1d800b34d3abed3a432dfaef1f34ab3ee5c146db",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 1d800b34d3abed3a432dfaef1f34ab3ee5c146db Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9612) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] chenshzh commented on a diff in pull request #5991: [HUDI-4329] Add separate control for Flink compaction operation sync/async mode

Posted by GitBox <gi...@apache.org>.
chenshzh commented on code in PR #5991:
URL: https://github.com/apache/hudi/pull/5991#discussion_r1004101830


##########
hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/sink/compact/CompactFunction.java:
##########
@@ -74,14 +74,14 @@ public class CompactFunction extends ProcessFunction<CompactionPlanEvent, Compac
 
   public CompactFunction(Configuration conf) {
     this.conf = conf;
-    this.asyncCompaction = OptionsResolver.needsAsyncCompaction(conf);
+    this.asyncCompactionOperation = OptionsResolver.needsAsyncCompactionOperation(conf);
   }

Review Comment:
   Yes, agreed. 
   Sync compaction is meaningful for both bounded and and unbounded source, because we sometimes will still use compaction in the ingestion job but not a separate compaction job for **unbounded** source: 
   But we met cases that async compaction with watermark will cause compaction msgs missed. Thus we will keep sync compaction as an option.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] chenshzh commented on a diff in pull request #5991: [HUDI-4329] Add separate control for Flink compaction operation sync/async mode

Posted by GitBox <gi...@apache.org>.
chenshzh commented on code in PR #5991:
URL: https://github.com/apache/hudi/pull/5991#discussion_r1004118789


##########
hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/sink/compact/CompactFunction.java:
##########
@@ -74,14 +74,14 @@ public class CompactFunction extends ProcessFunction<CompactionPlanEvent, Compac
 
   public CompactFunction(Configuration conf) {
     this.conf = conf;
-    this.asyncCompaction = OptionsResolver.needsAsyncCompaction(conf);
+    this.asyncCompactionOperation = OptionsResolver.needsAsyncCompactionOperation(conf);
   }

Review Comment:
   Watermark is also propagated by output.collector. 
   In an ingestion job with async compaction and watermark, when upstream operators forward watermark,  and at the same time we use async thread to execute compaction async and collect the result compaction msgs,  the output.collector will become thread unsafe.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] chenshzh commented on a diff in pull request #5991: [HUDI-4329] Add separate control for Flink compaction operation sync/async mode

Posted by GitBox <gi...@apache.org>.
chenshzh commented on code in PR #5991:
URL: https://github.com/apache/hudi/pull/5991#discussion_r1001686558


##########
hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/sink/compact/CompactFunction.java:
##########
@@ -74,14 +74,14 @@ public class CompactFunction extends ProcessFunction<CompactionPlanEvent, Compac
 
   public CompactFunction(Configuration conf) {
     this.conf = conf;
-    this.asyncCompaction = OptionsResolver.needsAsyncCompaction(conf);
+    this.asyncCompactionOperation = OptionsResolver.needsAsyncCompactionOperation(conf);
   }

Review Comment:
   Normally we use compaction.async.enabled to turn on compaction. But we could not make it sync because it's already been true.
   ```java
       if (asyncCompaction) {
         // executes the compaction task asynchronously to not block the checkpoint barrier propagate.
         executor.execute(
             () -> doCompaction(instantTime, compactionOperation, collector, reloadWriteConfig()),
             (errMsg, t) -> collector.collect(new CompactionCommitEvent(instantTime, compactionOperation.getFileId(), taskID)),
             "Execute compaction for instant %s from task %d", instantTime, taskID);
       } else {
         // executes the compaction task synchronously for batch mode.
         LOG.info("Execute compaction for instant {} from task {}", instantTime, taskID);
         doCompaction(instantTime, compactionOperation, collector, writeClient.getConfig());
       }
   ```
   
   
   > support sync compaction for bounded source
   
   We will use sync compaction mode for unbounded source in some scenarios. And actually the bounded source sync compaction seems weird. It use compaction.async.enabled  true to turn on compaction, and then switch it to fasle for sync mode.
   
   ```java  
        // compaction
         if (OptionsResolver.needsAsyncCompaction(conf)) {   // here FlinkOptions.COMPACTION_ASYNC_ENABLED decides that we need compaction
           // use synchronous compaction for bounded source.
           if (context.isBounded()) {
             conf.setBoolean(FlinkOptions.COMPACTION_ASYNC_ENABLED, false); // we come here because it is true, and it's so weird to turn  it false, actually we just want the operation to be executed sync.
           }
           return Pipelines.compact(conf, pipeline);
         } else {
           return Pipelines.clean(conf, pipeline);
         }
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] hudi-bot commented on pull request #5991: [HUDI-4329] Add separate control for Flink compaction operation sync/async mode

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #5991:
URL: https://github.com/apache/hudi/pull/5991#issuecomment-1286620415

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "48e9a4672ed395aa375d4fc1b26061159eca2d64",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9602",
       "triggerID" : "48e9a4672ed395aa375d4fc1b26061159eca2d64",
       "triggerType" : "PUSH"
     }, {
       "hash" : "1d800b34d3abed3a432dfaef1f34ab3ee5c146db",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9612",
       "triggerID" : "1d800b34d3abed3a432dfaef1f34ab3ee5c146db",
       "triggerType" : "PUSH"
     }, {
       "hash" : "0ca6be853989eb27491728f7cfdc9cb9f727e08e",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "0ca6be853989eb27491728f7cfdc9cb9f727e08e",
       "triggerType" : "PUSH"
     }, {
       "hash" : "d520045f9b339365203ef0f6c2379693ed9496fd",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "d520045f9b339365203ef0f6c2379693ed9496fd",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 1d800b34d3abed3a432dfaef1f34ab3ee5c146db Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9612) 
   * 0ca6be853989eb27491728f7cfdc9cb9f727e08e UNKNOWN
   * d520045f9b339365203ef0f6c2379693ed9496fd UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] hudi-bot commented on pull request #5991: [HUDI-4329] Add separate control for Flink compaction operation sync/async mode

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #5991:
URL: https://github.com/apache/hudi/pull/5991#issuecomment-1286626822

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "48e9a4672ed395aa375d4fc1b26061159eca2d64",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9602",
       "triggerID" : "48e9a4672ed395aa375d4fc1b26061159eca2d64",
       "triggerType" : "PUSH"
     }, {
       "hash" : "1d800b34d3abed3a432dfaef1f34ab3ee5c146db",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9612",
       "triggerID" : "1d800b34d3abed3a432dfaef1f34ab3ee5c146db",
       "triggerType" : "PUSH"
     }, {
       "hash" : "0ca6be853989eb27491728f7cfdc9cb9f727e08e",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "0ca6be853989eb27491728f7cfdc9cb9f727e08e",
       "triggerType" : "PUSH"
     }, {
       "hash" : "d520045f9b339365203ef0f6c2379693ed9496fd",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12415",
       "triggerID" : "d520045f9b339365203ef0f6c2379693ed9496fd",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 1d800b34d3abed3a432dfaef1f34ab3ee5c146db Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9612) 
   * 0ca6be853989eb27491728f7cfdc9cb9f727e08e UNKNOWN
   * d520045f9b339365203ef0f6c2379693ed9496fd Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12415) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] danny0405 commented on a diff in pull request #5991: [HUDI-4329] Add separate control for Flink compaction operation sync/async mode

Posted by GitBox <gi...@apache.org>.
danny0405 commented on code in PR #5991:
URL: https://github.com/apache/hudi/pull/5991#discussion_r1004111744


##########
hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/sink/compact/CompactFunction.java:
##########
@@ -74,14 +74,14 @@ public class CompactFunction extends ProcessFunction<CompactionPlanEvent, Compac
 
   public CompactFunction(Configuration conf) {
     this.conf = conf;
-    this.asyncCompaction = OptionsResolver.needsAsyncCompaction(conf);
+    this.asyncCompactionOperation = OptionsResolver.needsAsyncCompactionOperation(conf);
   }

Review Comment:
   > But we met cases that async compaction with watermark will cause compaction msgs missed. Thus we will keep sync compaction as an option for unbounded source.
   
   Can you elaborate more for the details here ?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] danny0405 commented on a diff in pull request #5991: [HUDI-4329] Add separate control for Flink compaction operation sync/async mode

Posted by GitBox <gi...@apache.org>.
danny0405 commented on code in PR #5991:
URL: https://github.com/apache/hudi/pull/5991#discussion_r1004082925


##########
hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/sink/compact/CompactFunction.java:
##########
@@ -74,14 +74,14 @@ public class CompactFunction extends ProcessFunction<CompactionPlanEvent, Compac
 
   public CompactFunction(Configuration conf) {
     this.conf = conf;
-    this.asyncCompaction = OptionsResolver.needsAsyncCompaction(conf);
+    this.asyncCompactionOperation = OptionsResolver.needsAsyncCompactionOperation(conf);
   }

Review Comment:
   > And actually the bounded source sync compaction seems weird. It use compaction.async.enabled true to turn on compaction, and then switch it to fasle for sync mode.
   
   I agree the logic here is a little weird, we can refactor it though,
   the bounded source sync compaction is meaningful, especially when people do batch ingestion for mor table, they do not what a separate compaction job running again.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] danny0405 commented on a diff in pull request #5991: [HUDI-4329] Add separate control for Flink compaction operation sync/async mode

Posted by GitBox <gi...@apache.org>.
danny0405 commented on code in PR #5991:
URL: https://github.com/apache/hudi/pull/5991#discussion_r1001603063


##########
hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/sink/compact/CompactFunction.java:
##########
@@ -74,14 +74,14 @@ public class CompactFunction extends ProcessFunction<CompactionPlanEvent, Compac
 
   public CompactFunction(Configuration conf) {
     this.conf = conf;
-    this.asyncCompaction = OptionsResolver.needsAsyncCompaction(conf);
+    this.asyncCompactionOperation = OptionsResolver.needsAsyncCompactionOperation(conf);
   }

Review Comment:
   Why switching to a new option key, what's the purpose here ?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] chenshzh commented on a diff in pull request #5991: [HUDI-4329] Add separate control for Flink compaction operation sync/async mode

Posted by GitBox <gi...@apache.org>.
chenshzh commented on code in PR #5991:
URL: https://github.com/apache/hudi/pull/5991#discussion_r1005220313


##########
hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/sink/compact/CompactFunction.java:
##########
@@ -74,14 +74,14 @@ public class CompactFunction extends ProcessFunction<CompactionPlanEvent, Compac
 
   public CompactFunction(Configuration conf) {
     this.conf = conf;
-    this.asyncCompaction = OptionsResolver.needsAsyncCompaction(conf);
+    this.asyncCompactionOperation = OptionsResolver.needsAsyncCompactionOperation(conf);
   }

Review Comment:
   @danny0405 do we have any other questions here?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] danny0405 commented on a diff in pull request #5991: [HUDI-4329] Add separate control for Flink compaction operation sync/async mode

Posted by GitBox <gi...@apache.org>.
danny0405 commented on code in PR #5991:
URL: https://github.com/apache/hudi/pull/5991#discussion_r1001610137


##########
hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/sink/compact/CompactFunction.java:
##########
@@ -74,14 +74,14 @@ public class CompactFunction extends ProcessFunction<CompactionPlanEvent, Compac
 
   public CompactFunction(Configuration conf) {
     this.conf = conf;
-    this.asyncCompaction = OptionsResolver.needsAsyncCompaction(conf);
+    this.asyncCompactionOperation = OptionsResolver.needsAsyncCompactionOperation(conf);
   }

Review Comment:
   We already support sync compaction for bounded source, it should be fine for async compaction for streaming source ?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] hudi-bot commented on pull request #5991: [HUDI-4329] Add separate control for Flink compaction operation sync/async mode

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #5991:
URL: https://github.com/apache/hudi/pull/5991#issuecomment-1286868239

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "48e9a4672ed395aa375d4fc1b26061159eca2d64",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9602",
       "triggerID" : "48e9a4672ed395aa375d4fc1b26061159eca2d64",
       "triggerType" : "PUSH"
     }, {
       "hash" : "1d800b34d3abed3a432dfaef1f34ab3ee5c146db",
       "status" : "DELETED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9612",
       "triggerID" : "1d800b34d3abed3a432dfaef1f34ab3ee5c146db",
       "triggerType" : "PUSH"
     }, {
       "hash" : "0ca6be853989eb27491728f7cfdc9cb9f727e08e",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "0ca6be853989eb27491728f7cfdc9cb9f727e08e",
       "triggerType" : "PUSH"
     }, {
       "hash" : "d520045f9b339365203ef0f6c2379693ed9496fd",
       "status" : "CANCELED",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12415",
       "triggerID" : "d520045f9b339365203ef0f6c2379693ed9496fd",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 0ca6be853989eb27491728f7cfdc9cb9f727e08e UNKNOWN
   * d520045f9b339365203ef0f6c2379693ed9496fd Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=12415) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] danny0405 commented on a diff in pull request #5991: [HUDI-4329] Add separate control for Flink compaction operation sync/async mode

Posted by GitBox <gi...@apache.org>.
danny0405 commented on code in PR #5991:
URL: https://github.com/apache/hudi/pull/5991#discussion_r1006343669


##########
hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/sink/compact/CompactFunction.java:
##########
@@ -74,14 +74,14 @@ public class CompactFunction extends ProcessFunction<CompactionPlanEvent, Compac
 
   public CompactFunction(Configuration conf) {
     this.conf = conf;
-    this.asyncCompaction = OptionsResolver.needsAsyncCompaction(conf);
+    this.asyncCompactionOperation = OptionsResolver.needsAsyncCompactionOperation(conf);
   }

Review Comment:
   > and at the same time we use async thread to execute compaction async and collect the result compaction msgs, the output.collector will become thread unsafe
   
   The output collector is thread unsafe here, let's fix it to be thread safe then ~



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] hudi-bot commented on pull request #5991: [HUDI-4329] Add separate control for Flink compaction operation sync/async mode

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #5991:
URL: https://github.com/apache/hudi/pull/5991#issuecomment-1168584404

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "48e9a4672ed395aa375d4fc1b26061159eca2d64",
       "status" : "SUCCESS",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9602",
       "triggerID" : "48e9a4672ed395aa375d4fc1b26061159eca2d64",
       "triggerType" : "PUSH"
     }, {
       "hash" : "1d800b34d3abed3a432dfaef1f34ab3ee5c146db",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "1d800b34d3abed3a432dfaef1f34ab3ee5c146db",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 48e9a4672ed395aa375d4fc1b26061159eca2d64 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9602) 
   * 1d800b34d3abed3a432dfaef1f34ab3ee5c146db UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] chenshzh commented on pull request #5991: [HUDI-4329] Add separate control for Flink compaction operation sync/async mode

Posted by GitBox <gi...@apache.org>.
chenshzh commented on PR #5991:
URL: https://github.com/apache/hudi/pull/5991#issuecomment-1220709345

   @danny0405 And if it's convenient, pls also check out this. It seems that it has been for some time. Thanks!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] [HUDI-4329] Add separate control for Flink compaction operation sync/async mode [hudi]

Posted by "danny0405 (via GitHub)" <gi...@apache.org>.
danny0405 commented on PR #5991:
URL: https://github.com/apache/hudi/pull/5991#issuecomment-2019221078

   Yeah, let's move it to 1.0 release.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] chenshzh commented on a diff in pull request #5991: [HUDI-4329] Add separate control for Flink compaction operation sync/async mode

Posted by GitBox <gi...@apache.org>.
chenshzh commented on code in PR #5991:
URL: https://github.com/apache/hudi/pull/5991#discussion_r1003980221


##########
hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/sink/compact/CompactFunction.java:
##########
@@ -74,14 +74,14 @@ public class CompactFunction extends ProcessFunction<CompactionPlanEvent, Compac
 
   public CompactFunction(Configuration conf) {
     this.conf = conf;
-    this.asyncCompaction = OptionsResolver.needsAsyncCompaction(conf);
+    this.asyncCompactionOperation = OptionsResolver.needsAsyncCompactionOperation(conf);
   }

Review Comment:
   @danny0405 pls help see whether there are any more problems here?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] chenshzh commented on a diff in pull request #5991: [HUDI-4329] Add separate control for Flink compaction operation sync/async mode

Posted by GitBox <gi...@apache.org>.
chenshzh commented on code in PR #5991:
URL: https://github.com/apache/hudi/pull/5991#discussion_r1004101830


##########
hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/sink/compact/CompactFunction.java:
##########
@@ -74,14 +74,14 @@ public class CompactFunction extends ProcessFunction<CompactionPlanEvent, Compac
 
   public CompactFunction(Configuration conf) {
     this.conf = conf;
-    this.asyncCompaction = OptionsResolver.needsAsyncCompaction(conf);
+    this.asyncCompactionOperation = OptionsResolver.needsAsyncCompactionOperation(conf);
   }

Review Comment:
   Yes, agreed. 
   Sync compaction is meaningful for both bounded and and unbounded source, because we sometimes will still use compaction in the ingestion job but not a separate compaction job for **unbounded** source: 
   But we met cases that async compaction with watermark will cause compaction msgs missed. Thus we will keep sync compaction as an option for unbounded source.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] chenshzh commented on a diff in pull request #5991: [HUDI-4329] Add separate control for Flink compaction operation sync/async mode

Posted by GitBox <gi...@apache.org>.
chenshzh commented on code in PR #5991:
URL: https://github.com/apache/hudi/pull/5991#discussion_r1004108039


##########
hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/sink/compact/CompactFunction.java:
##########
@@ -74,14 +74,14 @@ public class CompactFunction extends ProcessFunction<CompactionPlanEvent, Compac
 
   public CompactFunction(Configuration conf) {
     this.conf = conf;
-    this.asyncCompaction = OptionsResolver.needsAsyncCompaction(conf);
+    this.asyncCompactionOperation = OptionsResolver.needsAsyncCompactionOperation(conf);
   }

Review Comment:
   So, Why not jus add an option to indicate whether compaction operation is sync or not?
   It can not only solve the problem that we cannot open sync compaction in ingestion job for unbounded source, but also help refactor the weird bounded source compaction logics.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] chenshzh commented on a diff in pull request #5991: [HUDI-4329] Add separate control for Flink compaction operation sync/async mode

Posted by GitBox <gi...@apache.org>.
chenshzh commented on code in PR #5991:
URL: https://github.com/apache/hudi/pull/5991#discussion_r1004129380


##########
hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/sink/compact/CompactFunction.java:
##########
@@ -74,14 +74,14 @@ public class CompactFunction extends ProcessFunction<CompactionPlanEvent, Compac
 
   public CompactFunction(Configuration conf) {
     this.conf = conf;
-    this.asyncCompaction = OptionsResolver.needsAsyncCompaction(conf);
+    this.asyncCompactionOperation = OptionsResolver.needsAsyncCompactionOperation(conf);
   }

Review Comment:
   > keep sync compaction as an option for unbounded source.
   
   And in a whole for hudi feature, you might also agree that users should be provided sync compaction option for unbounded source, no matter the above mentioned scenarios?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] hudi-bot commented on pull request #5991: [HUDI-4329] Add separate control for Flink compaction operation sync/async mode

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #5991:
URL: https://github.com/apache/hudi/pull/5991#issuecomment-1168312835

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "48e9a4672ed395aa375d4fc1b26061159eca2d64",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9602",
       "triggerID" : "48e9a4672ed395aa375d4fc1b26061159eca2d64",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 48e9a4672ed395aa375d4fc1b26061159eca2d64 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9602) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] hudi-bot commented on pull request #5991: [HUDI-4329] Add separate control for Flink compaction operation sync/async mode

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #5991:
URL: https://github.com/apache/hudi/pull/5991#issuecomment-1168437825

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "48e9a4672ed395aa375d4fc1b26061159eca2d64",
       "status" : "SUCCESS",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9602",
       "triggerID" : "48e9a4672ed395aa375d4fc1b26061159eca2d64",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 48e9a4672ed395aa375d4fc1b26061159eca2d64 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9602) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] hudi-bot commented on pull request #5991: [HUDI-4329] Add separate control for Flink compaction operation sync/async mode

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #5991:
URL: https://github.com/apache/hudi/pull/5991#issuecomment-1168588597

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "48e9a4672ed395aa375d4fc1b26061159eca2d64",
       "status" : "SUCCESS",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9602",
       "triggerID" : "48e9a4672ed395aa375d4fc1b26061159eca2d64",
       "triggerType" : "PUSH"
     }, {
       "hash" : "1d800b34d3abed3a432dfaef1f34ab3ee5c146db",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9612",
       "triggerID" : "1d800b34d3abed3a432dfaef1f34ab3ee5c146db",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 48e9a4672ed395aa375d4fc1b26061159eca2d64 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9602) 
   * 1d800b34d3abed3a432dfaef1f34ab3ee5c146db Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=9612) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] chenshzh commented on a diff in pull request #5991: [HUDI-4329] Add separate control for Flink compaction operation sync/async mode

Posted by GitBox <gi...@apache.org>.
chenshzh commented on code in PR #5991:
URL: https://github.com/apache/hudi/pull/5991#discussion_r1006371610


##########
hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/sink/compact/CompactFunction.java:
##########
@@ -74,14 +74,14 @@ public class CompactFunction extends ProcessFunction<CompactionPlanEvent, Compac
 
   public CompactFunction(Configuration conf) {
     this.conf = conf;
-    this.asyncCompaction = OptionsResolver.needsAsyncCompaction(conf);
+    this.asyncCompactionOperation = OptionsResolver.needsAsyncCompactionOperation(conf);
   }

Review Comment:
   OK, but that will be another issue, i think. 
   What about this pr ? We have discussed much about its necessity to keep sync compaction option for unbounded source.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org