You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@hudi.apache.org by "Zouxxyy (via GitHub)" <gi...@apache.org> on 2023/04/14 02:59:03 UTC

[GitHub] [hudi] Zouxxyy opened a new pull request, #8456: [HUDI-6078] Make clean controlled by parameter in flink

Zouxxyy opened a new pull request, #8456:
URL: https://github.com/apache/hudi/pull/8456

   ### Change Logs
   
   Make clean controlled by parameters in flink
   
   ### Impact
   
   Make clean controlled by parameters in flink
   
   ### Risk level (write none, low medium or high below)
   
   low
   
   ### Documentation Update
   
   none
   
   ### Contributor's checklist
   
   - [ ] Read through [contributor's guide](https://hudi.apache.org/contribute/how-to-contribute)
   - [ ] Change Logs and Impact were stated clearly
   - [ ] Adequate tests were added if applicable
   - [ ] CI passed
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] Zouxxyy closed pull request #8456: [HUDI-6078] Make clean controlled by parameter in flink

Posted by "Zouxxyy (via GitHub)" <gi...@apache.org>.
Zouxxyy closed pull request #8456: [HUDI-6078] Make clean controlled by parameter in flink
URL: https://github.com/apache/hudi/pull/8456


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] Zouxxyy commented on pull request #8456: [HUDI-6078] Make clean controlled by parameter in flink

Posted by "Zouxxyy (via GitHub)" <gi...@apache.org>.
Zouxxyy commented on PR #8456:
URL: https://github.com/apache/hudi/pull/8456#issuecomment-1510864793

   > If we can make consesus that cleanign would finally take place in each async or sync way, then disabling the `clustering.async.enabled` somehow means we need to do the sync cleaning for the batch job. Does that make sense to you then?
   
   Yes, then can we have a way to turn off clean, whether it is sync or async


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] hudi-bot commented on pull request #8456: [HUDI-6078] Make clean controlled by parameter in flink

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #8456:
URL: https://github.com/apache/hudi/pull/8456#issuecomment-1507881515

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "5fc5cf9e31b0209f280e06eb10bf2a75e201b807",
       "status" : "UNKNOWN",
       "url" : "TBD",
       "triggerID" : "5fc5cf9e31b0209f280e06eb10bf2a75e201b807",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 5fc5cf9e31b0209f280e06eb10bf2a75e201b807 UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] Zouxxyy commented on pull request #8456: [HUDI-6078] Make clean controlled by parameter in flink

Posted by "Zouxxyy (via GitHub)" <gi...@apache.org>.
Zouxxyy commented on PR #8456:
URL: https://github.com/apache/hudi/pull/8456#issuecomment-1510835499

   Currently the meaning of `clean.async.enabled` is ambiguous:
   
   1. If it is used as whether enable asynchronous clean, then when it is false, synchronous clean should not be executed in compact or cluster. Instead, a parameter like `clean.async.enabled` should be added
   
   ```java
   if (!conf.getBoolean(FlinkOptions.CLEAN_ASYNC_ENABLED)) {
      LOG.info("Running inline clean");
      this.writeClient.clean();
    }
   ```
   
   2. If it is used as the type of clean, then a parameter  like `clean.automatic` should be added to control whether to enable auto clean
   
   Hudi spark is using the second logic, but for other table services of flink, such as `clustering.async.enabled` and `compaction.async.enabled`, use the first logic (as far as I understand flink does not support synchronous cluster and compact, right?)
   
   What do you think? @danny0405 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] danny0405 commented on pull request #8456: [HUDI-6078] Make clean controlled by parameter in flink

Posted by "danny0405 (via GitHub)" <gi...@apache.org>.
danny0405 commented on PR #8456:
URL: https://github.com/apache/hudi/pull/8456#issuecomment-1512615877

   > > If we can make consesus that cleanign would finally take place in each async or sync way, then disabling the `clustering.async.enabled` somehow means we need to do the sync cleaning for the batch job. Does that make sense to you then?
   > 
   > Yes, then can we have a way to turn off clean, whether it is sync or async
   
   Set `clean.async.enabled` as false would disable the async cleaning.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] hudi-bot commented on pull request #8456: [HUDI-6078] Make clean controlled by parameter in flink

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #8456:
URL: https://github.com/apache/hudi/pull/8456#issuecomment-1508299165

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "5fc5cf9e31b0209f280e06eb10bf2a75e201b807",
       "status" : "FAILURE",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16345",
       "triggerID" : "5fc5cf9e31b0209f280e06eb10bf2a75e201b807",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 5fc5cf9e31b0209f280e06eb10bf2a75e201b807 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16345) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] Zouxxyy commented on pull request #8456: [HUDI-6078] Make clean controlled by parameter in flink

Posted by "Zouxxyy (via GitHub)" <gi...@apache.org>.
Zouxxyy commented on PR #8456:
URL: https://github.com/apache/hudi/pull/8456#issuecomment-1516387782

   Will trigger it when I know more about hudi flink sink ^_^


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] hudi-bot commented on pull request #8456: [HUDI-6078] Make clean controlled by parameter in flink

Posted by "hudi-bot (via GitHub)" <gi...@apache.org>.
hudi-bot commented on PR #8456:
URL: https://github.com/apache/hudi/pull/8456#issuecomment-1507885169

   <!--
   Meta data
   {
     "version" : 1,
     "metaDataEntries" : [ {
       "hash" : "5fc5cf9e31b0209f280e06eb10bf2a75e201b807",
       "status" : "PENDING",
       "url" : "https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16345",
       "triggerID" : "5fc5cf9e31b0209f280e06eb10bf2a75e201b807",
       "triggerType" : "PUSH"
     } ]
   }-->
   ## CI report:
   
   * 5fc5cf9e31b0209f280e06eb10bf2a75e201b807 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=16345) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] Zouxxyy commented on pull request #8456: [HUDI-6078] Make clean controlled by parameter in flink

Posted by "Zouxxyy (via GitHub)" <gi...@apache.org>.
Zouxxyy commented on PR #8456:
URL: https://github.com/apache/hudi/pull/8456#issuecomment-1507861640

   @danny0405 One problem is that there is only `clean.async.enabled` in flink, but there is `hoodie.clean.automatic` in spark to control whether to clean automatically. 
   Should we add parameter, or use `clean.async.enabled` uniformly to control the clean behavior in flink?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] Zouxxyy commented on pull request #8456: [HUDI-6078] Make clean controlled by parameter in flink

Posted by "Zouxxyy (via GitHub)" <gi...@apache.org>.
Zouxxyy commented on PR #8456:
URL: https://github.com/apache/hudi/pull/8456#issuecomment-1512698474

   > Set `clean.async.enabled` as false would disable the async cleaning.
   @danny0405 I mean completely turn off clean, set `clean.async.enabled` as false still will exec sync clean


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] danny0405 commented on pull request #8456: [HUDI-6078] Make clean controlled by parameter in flink

Posted by "danny0405 (via GitHub)" <gi...@apache.org>.
danny0405 commented on PR #8456:
URL: https://github.com/apache/hudi/pull/8456#issuecomment-1510859103

   > But for other table services of flink, such as `clustering.async.enabled` and `compaction.async.enabled`, use the first logic (as far as I understand flink does not support sync cluster and compact, right?
   
   If we can make consesus that cleanign would finally take place in each async or sync way, then disabling the `clustering.async.enabled` somehow means we need to do the sync cleaning for the batch job. Does that make sense to you then?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] stream2000 commented on a diff in pull request #8456: [HUDI-6078] Make clean controlled by parameter in flink

Posted by "stream2000 (via GitHub)" <gi...@apache.org>.
stream2000 commented on code in PR #8456:
URL: https://github.com/apache/hudi/pull/8456#discussion_r1166489345


##########
hudi-flink-datasource/hudi-flink/src/main/java/org/apache/hudi/sink/CleanFunction.java:
##########
@@ -60,11 +60,13 @@ public CleanFunction(Configuration conf) {
   @Override
   public void open(Configuration parameters) throws Exception {
     super.open(parameters);
-    this.writeClient = FlinkWriteClients.createWriteClient(conf, getRuntimeContext());
-    this.executor = NonThrownExecutor.builder(LOG).waitForTasksFinish(true).build();
-    String instantTime = HoodieActiveTimeline.createNewInstantTime();
-    LOG.info(String.format("exec clean with instant time %s...", instantTime));
-    executor.execute(() -> writeClient.clean(instantTime), "wait for cleaning finish");
+    if (conf.getBoolean(FlinkOptions.CLEAN_ASYNC_ENABLED)) {
+      this.writeClient = FlinkWriteClients.createWriteClient(conf, getRuntimeContext());
+      this.executor = NonThrownExecutor.builder(LOG).waitForTasksFinish(true).build();
+      String instantTime = HoodieActiveTimeline.createNewInstantTime();
+      LOG.info(String.format("exec clean with instant time %s...", instantTime));
+      executor.execute(() -> writeClient.clean(instantTime), "wait for cleaning finish");
+    }

Review Comment:
   nit: we can add a line of log here to indicate to the user that async clean is disabled although the clean function is opened. 
   
   LOG.info("Async clean is disabled");
   
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] danny0405 commented on pull request #8456: [HUDI-6078] Make clean controlled by parameter in flink

Posted by "danny0405 (via GitHub)" <gi...@apache.org>.
danny0405 commented on PR #8456:
URL: https://github.com/apache/hudi/pull/8456#issuecomment-1512771160

   > > Set `clean.async.enabled` as false would disable the async cleaning.
   > 
   > @danny0405 I mean completely turn off clean, set `clean.async.enabled` as false still will exec sync clean
   
   That's true, the pipeline can be optimized.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] Zouxxyy closed pull request #8456: [HUDI-6078] Make clean controlled by parameter in flink

Posted by "Zouxxyy (via GitHub)" <gi...@apache.org>.
Zouxxyy closed pull request #8456: [HUDI-6078] Make clean controlled by parameter in flink
URL: https://github.com/apache/hudi/pull/8456


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org