Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2022/02/18 10:46:41 UTC

[GitHub] [hudi] xushiyan opened a new pull request #4847: [HUDI-3042] Refactoring clustering executors

xushiyan opened a new pull request #4847:
URL: https://github.com/apache/hudi/pull/4847


   Extract common code from
   - hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/table/action/cluster/SparkExecuteClusteringCommitActionExecutor.java
   - hudi-client/hudi-java-client/src/main/java/org/apache/hudi/table/action/cluster/JavaExecuteClusteringCommitActionExecutor.java
   
   to
   
   hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/commit/BaseCommitActionExecutor.java
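A minimal sketch of the kind of extraction described above. It assumes a template-method split. The shared clustering steps live once in an engine-agnostic base class and return an engine-agnostic wrapper. Each engine subclass then only converts that wrapper to its native collection type. In this PR's Spark executor, that conversion is `HoodieJavaRDD.getJavaRDD(...)` plus `writeMetadata.clone(...)`. All names in the block are hypothetical stand-ins, not Hudi's `HoodieData`, `HoodieWriteMetadata`, or `BaseCommitActionExecutor` APIs: `EngineData`, `ListData`, `WriteResult`, `BaseClusteringExecutor`, and `ListClusteringExecutor`.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.function.Function;

// Hypothetical stand-ins; NOT Hudi's HoodieData, HoodieWriteMetadata or BaseCommitActionExecutor.
interface EngineData<T> {
  <R> EngineData<R> map(Function<T, R> fn);

  List<T> collectAsList();
}

// A trivial single-JVM implementation, standing in for an engine-specific wrapper.
final class ListData<T> implements EngineData<T> {
  private final List<T> items;

  ListData(List<T> items) { this.items = items; }

  @Override
  public <R> EngineData<R> map(Function<T, R> fn) {
    List<R> out = new ArrayList<>();
    for (T item : items) {
      out.add(fn.apply(item));
    }
    return new ListData<>(out);
  }

  @Override
  public List<T> collectAsList() { return Collections.unmodifiableList(items); }
}

// Write result carrying its instant time; withStatuses copies it, similar in spirit to clone(...).
final class WriteResult<O> {
  final String instantTime;
  final O statuses;

  WriteResult(String instantTime, O statuses) {
    this.instantTime = instantTime;
    this.statuses = statuses;
  }

  <R> WriteResult<R> withStatuses(R newStatuses) {
    return new WriteResult<>(instantTime, newStatuses);
  }
}

abstract class BaseClusteringExecutor<O> {
  protected final String instantTime;

  BaseClusteringExecutor(String instantTime) { this.instantTime = instantTime; }

  // Shared, engine-agnostic flow that each engine executor previously duplicated.
  protected WriteResult<EngineData<String>> executeClustering(List<String> fileGroupIds) {
    EngineData<String> statuses =
        performClustering(fileGroupIds).map(id -> "rewrote " + id + " @ " + instantTime);
    return new WriteResult<>(instantTime, statuses);
  }

  // Engine-specific hook: how the rewrite work is distributed.
  protected abstract EngineData<String> performClustering(List<String> fileGroupIds);

  // Each engine exposes the shared result in its native collection type.
  public abstract WriteResult<O> execute(List<String> fileGroupIds);
}

final class ListClusteringExecutor extends BaseClusteringExecutor<List<String>> {
  ListClusteringExecutor(String instantTime) { super(instantTime); }

  @Override
  protected EngineData<String> performClustering(List<String> fileGroupIds) {
    return new ListData<>(new ArrayList<>(fileGroupIds));
  }

  @Override
  public WriteResult<List<String>> execute(List<String> fileGroupIds) {
    WriteResult<EngineData<String>> shared = executeClustering(fileGroupIds);
    // The only engine-specific step left: unwrap to the native type.
    return shared.withStatuses(shared.statuses.collectAsList());
  }
}
```

With this split, supporting another engine means implementing `performClustering` and the one-line unwrap in `execute`. The whole clustering flow no longer has to be copied.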
   
   ## Committer checklist
   
    - [ ] Has a corresponding JIRA in PR title & commit
    
    - [ ] Commit message is descriptive of the change
    
    - [ ] CI is green
   
    - [ ] Necessary doc changes done or have another open PR
          
    - [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [hudi] hudi-bot commented on pull request #4847: [HUDI-3042] Refactoring clustering executors

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on pull request #4847:
URL: https://github.com/apache/hudi/pull/4847#issuecomment-1046427086


   ## CI report:
   
   * 7306cb45f7c353d982827c08dbadb4ddcbda31dc Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6155) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>



[GitHub] [hudi] hudi-bot commented on pull request #4847: [HUDI-3042] Refactoring clustering executors

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on pull request #4847:
URL: https://github.com/apache/hudi/pull/4847#issuecomment-1044324340


   ## CI report:
   
   * c89db11dcd1e33a31405fef7fba2788fb3723338 UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>



[GitHub] [hudi] hudi-bot commented on pull request #4847: [HUDI-3042] Refactoring clustering executors

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on pull request #4847:
URL: https://github.com/apache/hudi/pull/4847#issuecomment-1044353443


   ## CI report:
   
   * c89db11dcd1e33a31405fef7fba2788fb3723338 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6122) 
   * 94c384ea344ec934fe236e2dad8a7f58fa3ac489 UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>



[GitHub] [hudi] hudi-bot removed a comment on pull request #4847: [HUDI-3042] Refactoring clustering executors

Posted by GitBox <gi...@apache.org>.
hudi-bot removed a comment on pull request #4847:
URL: https://github.com/apache/hudi/pull/4847#issuecomment-1044356609


   ## CI report:
   
   * c89db11dcd1e33a31405fef7fba2788fb3723338 Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6122) 
   * 94c384ea344ec934fe236e2dad8a7f58fa3ac489 UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>



[GitHub] [hudi] hudi-bot removed a comment on pull request #4847: [HUDI-3042] Refactoring clustering executors

Posted by GitBox <gi...@apache.org>.
hudi-bot removed a comment on pull request #4847:
URL: https://github.com/apache/hudi/pull/4847#issuecomment-1044328778


   ## CI report:
   
   * c89db11dcd1e33a31405fef7fba2788fb3723338 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6122) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>



[GitHub] [hudi] hudi-bot removed a comment on pull request #4847: [HUDI-3042] Refactoring clustering executors

Posted by GitBox <gi...@apache.org>.
hudi-bot removed a comment on pull request #4847:
URL: https://github.com/apache/hudi/pull/4847#issuecomment-1044353443





[GitHub] [hudi] hudi-bot removed a comment on pull request #4847: [HUDI-3042] Refactoring clustering executors

Posted by GitBox <gi...@apache.org>.
hudi-bot removed a comment on pull request #4847:
URL: https://github.com/apache/hudi/pull/4847#issuecomment-1044324340





[GitHub] [hudi] hudi-bot commented on pull request #4847: [HUDI-3042] Refactoring clustering executors

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on pull request #4847:
URL: https://github.com/apache/hudi/pull/4847#issuecomment-1049302427


   ## CI report:
   
   * 7306cb45f7c353d982827c08dbadb4ddcbda31dc UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>



[GitHub] [hudi] yihua commented on a change in pull request #4847: [HUDI-3042] Refactoring clustering executors

Posted by GitBox <gi...@apache.org>.
yihua commented on a change in pull request #4847:
URL: https://github.com/apache/hudi/pull/4847#discussion_r814479652



##########
File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/commit/BaseCommitActionExecutor.java
##########
@@ -200,4 +215,65 @@ protected boolean isWorkloadProfileNeeded() {
 
   protected abstract Iterator<List<WriteStatus>> handleUpdate(String partitionPath, String fileId,
       Iterator<HoodieRecord<T>> recordItr) throws IOException;
+
+  protected HoodieWriteMetadata<HoodieData<WriteStatus>> executeClustering(HoodieClusteringPlan clusteringPlan) {
+    HoodieInstant instant = HoodieTimeline.getReplaceCommitRequestedInstant(instantTime);

Review comment:
       I think the goal is to revamp the commit executors and the write pipeline altogether later on, so the refactoring here is limited to code reuse. @xushiyan, is that the case?





[GitHub] [hudi] hudi-bot removed a comment on pull request #4847: [HUDI-3042] Refactoring clustering executors

Posted by GitBox <gi...@apache.org>.
hudi-bot removed a comment on pull request #4847:
URL: https://github.com/apache/hudi/pull/4847#issuecomment-1044403831


   ## CI report:
   
   * c89db11dcd1e33a31405fef7fba2788fb3723338 Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6122) 
   * 94c384ea344ec934fe236e2dad8a7f58fa3ac489 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6123) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>



[GitHub] [hudi] hudi-bot removed a comment on pull request #4847: [HUDI-3042] Refactoring clustering executors

Posted by GitBox <gi...@apache.org>.
hudi-bot removed a comment on pull request #4847:
URL: https://github.com/apache/hudi/pull/4847#issuecomment-1046398925


   ## CI report:
   
   * 94c384ea344ec934fe236e2dad8a7f58fa3ac489 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6123) 
   * 7306cb45f7c353d982827c08dbadb4ddcbda31dc Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6155) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>



[GitHub] [hudi] hudi-bot removed a comment on pull request #4847: [HUDI-3042] Refactoring clustering executors

Posted by GitBox <gi...@apache.org>.
hudi-bot removed a comment on pull request #4847:
URL: https://github.com/apache/hudi/pull/4847#issuecomment-1046427086





[GitHub] [hudi] hudi-bot commented on pull request #4847: [HUDI-3042] Refactoring clustering executors

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on pull request #4847:
URL: https://github.com/apache/hudi/pull/4847#issuecomment-1044506838


   ## CI report:
   
   * 94c384ea344ec934fe236e2dad8a7f58fa3ac489 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6123) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>



[GitHub] [hudi] xushiyan commented on a change in pull request #4847: [HUDI-3042] Refactoring clustering executors

Posted by GitBox <gi...@apache.org>.
xushiyan commented on a change in pull request #4847:
URL: https://github.com/apache/hudi/pull/4847#discussion_r809894233



##########
File path: hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/table/action/cluster/SparkExecuteClusteringCommitActionExecutor.java
##########
@@ -18,111 +18,48 @@
 
 package org.apache.hudi.table.action.cluster;
 
-import org.apache.hudi.avro.HoodieAvroUtils;
-import org.apache.hudi.avro.model.HoodieClusteringGroup;
 import org.apache.hudi.avro.model.HoodieClusteringPlan;
 import org.apache.hudi.client.WriteStatus;
-import org.apache.hudi.client.clustering.run.strategy.SparkSingleFileSortExecutionStrategy;
+import org.apache.hudi.common.data.HoodieData;
 import org.apache.hudi.common.engine.HoodieEngineContext;
-import org.apache.hudi.common.model.HoodieCommitMetadata;
-import org.apache.hudi.common.model.HoodieFileGroupId;
-import org.apache.hudi.common.model.HoodieKey;
-import org.apache.hudi.common.model.HoodieRecord;
 import org.apache.hudi.common.model.HoodieRecordPayload;
 import org.apache.hudi.common.model.WriteOperationType;
-import org.apache.hudi.common.table.timeline.HoodieInstant;
 import org.apache.hudi.common.table.timeline.HoodieTimeline;
 import org.apache.hudi.common.util.ClusteringUtils;
-import org.apache.hudi.common.util.CommitUtils;
-import org.apache.hudi.common.util.Option;
-import org.apache.hudi.common.util.ReflectionUtils;
 import org.apache.hudi.common.util.collection.Pair;
 import org.apache.hudi.config.HoodieWriteConfig;
+import org.apache.hudi.data.HoodieJavaRDD;
 import org.apache.hudi.exception.HoodieClusteringException;
 import org.apache.hudi.table.HoodieTable;
 import org.apache.hudi.table.action.HoodieWriteMetadata;
-import org.apache.hudi.table.action.cluster.strategy.ClusteringExecutionStrategy;
 import org.apache.hudi.table.action.commit.BaseSparkCommitActionExecutor;
 
-import org.apache.avro.Schema;
-import org.apache.log4j.LogManager;
-import org.apache.log4j.Logger;
 import org.apache.spark.api.java.JavaRDD;
 
-import java.util.List;
-import java.util.Map;
-import java.util.Set;
-import java.util.stream.Collectors;
-
 public class SparkExecuteClusteringCommitActionExecutor<T extends HoodieRecordPayload<T>>
     extends BaseSparkCommitActionExecutor<T> {
 
-  private static final Logger LOG = LogManager.getLogger(SparkExecuteClusteringCommitActionExecutor.class);
   private final HoodieClusteringPlan clusteringPlan;
 
   public SparkExecuteClusteringCommitActionExecutor(HoodieEngineContext context,
                                                     HoodieWriteConfig config, HoodieTable table,
                                                     String instantTime) {
     super(context, config, table, instantTime, WriteOperationType.CLUSTER);
-    this.clusteringPlan = ClusteringUtils.getClusteringPlan(table.getMetaClient(), HoodieTimeline.getReplaceCommitRequestedInstant(instantTime))
-      .map(Pair::getRight).orElseThrow(() -> new HoodieClusteringException("Unable to read clustering plan for instant: " + instantTime));
+    this.clusteringPlan = ClusteringUtils.getClusteringPlan(
+        table.getMetaClient(), HoodieTimeline.getReplaceCommitRequestedInstant(instantTime))
+        .map(Pair::getRight).orElseThrow(() -> new HoodieClusteringException(
+            "Unable to read clustering plan for instant: " + instantTime));
   }
 
   @Override
   public HoodieWriteMetadata<JavaRDD<WriteStatus>> execute() {
-    HoodieInstant instant = HoodieTimeline.getReplaceCommitRequestedInstant(instantTime);
-    // Mark instant as clustering inflight
-    table.getActiveTimeline().transitionReplaceRequestedToInflight(instant, Option.empty());
-    table.getMetaClient().reloadActiveTimeline();
-
-    final Schema schema = HoodieAvroUtils.addMetadataFields(new Schema.Parser().parse(config.getSchema()));
-    HoodieWriteMetadata<JavaRDD<WriteStatus>> writeMetadata = ((ClusteringExecutionStrategy<T, JavaRDD<HoodieRecord<? extends HoodieRecordPayload>>, JavaRDD<HoodieKey>, JavaRDD<WriteStatus>>)
-        ReflectionUtils.loadClass(config.getClusteringExecutionStrategyClass(),
-            new Class<?>[] {HoodieTable.class, HoodieEngineContext.class, HoodieWriteConfig.class}, table, context, config))
-        .performClustering(clusteringPlan, schema, instantTime);
-    JavaRDD<WriteStatus> writeStatusRDD = writeMetadata.getWriteStatuses();
-    JavaRDD<WriteStatus> statuses = updateIndex(writeStatusRDD, writeMetadata);
-    writeMetadata.setWriteStats(statuses.map(WriteStatus::getStat).collect());
-    writeMetadata.setPartitionToReplaceFileIds(getPartitionToReplacedFileIds(writeMetadata));
-    commitOnAutoCommit(writeMetadata);
-    if (!writeMetadata.getCommitMetadata().isPresent()) {
-      HoodieCommitMetadata commitMetadata = CommitUtils.buildMetadata(writeMetadata.getWriteStats().get(), writeMetadata.getPartitionToReplaceFileIds(),
-          extraMetadata, operationType, getSchemaToStoreInCommit(), getCommitActionType());
-      writeMetadata.setCommitMetadata(Option.of(commitMetadata));
-    }
-    return writeMetadata;

Review comment:
       Extracted to `executeClustering()` in BaseCommitActionExecutor.java.
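The removed `execute()` body follows a fixed sequence:

1. Mark the requested replace instant as inflight.
2. Run the configured clustering strategy.
3. Update the index.
4. Record the write stats and the replaced file IDs.
5. Either auto-commit, or build the commit metadata for a later commit.

Below is a simplified, hypothetical model of that control flow. `InstantState`, `ReplaceInstant`, and `ClusteringFlowSketch` are illustrative names, not Hudi's timeline or executor API. The model also assumes that an auto-commit already populates the commit metadata, as the `isPresent()` check in the diff suggests.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

// Hypothetical, simplified model; NOT Hudi's HoodieActiveTimeline or executor API.
enum InstantState { REQUESTED, INFLIGHT, COMPLETED }

final class ReplaceInstant {
  final String instantTime;
  private InstantState state = InstantState.REQUESTED;

  ReplaceInstant(String instantTime) { this.instantTime = instantTime; }

  InstantState state() { return state; }

  // Analogue of transitionReplaceRequestedToInflight: only legal from REQUESTED.
  void toInflight() {
    if (state != InstantState.REQUESTED) {
      throw new IllegalStateException("Expected REQUESTED but was " + state);
    }
    state = InstantState.INFLIGHT;
  }

  // Completing the commit is only legal from INFLIGHT.
  void complete() {
    if (state != InstantState.INFLIGHT) {
      throw new IllegalStateException("Expected INFLIGHT but was " + state);
    }
    state = InstantState.COMPLETED;
  }
}

final class ClusteringFlowSketch {
  static final class Result {
    final List<String> steps;
    final String commitMetadata;

    Result(List<String> steps, String commitMetadata) {
      this.steps = steps;
      this.commitMetadata = commitMetadata;
    }
  }

  // Records each step in order so the control flow is easy to inspect.
  static Result run(ReplaceInstant instant, boolean autoCommit) {
    List<String> steps = new ArrayList<>();
    instant.toInflight();
    steps.add("inflight");
    steps.add("performClustering");    // real code loads the strategy class reflectively
    steps.add("updateIndex");
    steps.add("setWriteStats");
    steps.add("setPartitionToReplaceFileIds");

    Optional<String> commitMetadata = Optional.empty();
    if (autoCommit) {                  // analogue of commitOnAutoCommit(...)
      instant.complete();
      commitMetadata = Optional.of("committed@" + instant.instantTime);
      steps.add("commit");
    }
    if (!commitMetadata.isPresent()) { // not committed yet: build metadata for the caller
      commitMetadata = Optional.of("pending@" + instant.instantTime);
      steps.add("buildCommitMetadata");
    }
    return new Result(steps, commitMetadata.get());
  }
}
```

The state guards make misuse fail fast. For example, trying to move an instant to inflight twice throws instead of silently corrupting the modeled timeline.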

##########
File path: hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/table/action/cluster/SparkExecuteClusteringCommitActionExecutor.java
##########
@@ -18,111 +18,48 @@
 
 package org.apache.hudi.table.action.cluster;
 
-import org.apache.hudi.avro.HoodieAvroUtils;
-import org.apache.hudi.avro.model.HoodieClusteringGroup;
 import org.apache.hudi.avro.model.HoodieClusteringPlan;
 import org.apache.hudi.client.WriteStatus;
-import org.apache.hudi.client.clustering.run.strategy.SparkSingleFileSortExecutionStrategy;
+import org.apache.hudi.common.data.HoodieData;
 import org.apache.hudi.common.engine.HoodieEngineContext;
-import org.apache.hudi.common.model.HoodieCommitMetadata;
-import org.apache.hudi.common.model.HoodieFileGroupId;
-import org.apache.hudi.common.model.HoodieKey;
-import org.apache.hudi.common.model.HoodieRecord;
 import org.apache.hudi.common.model.HoodieRecordPayload;
 import org.apache.hudi.common.model.WriteOperationType;
-import org.apache.hudi.common.table.timeline.HoodieInstant;
 import org.apache.hudi.common.table.timeline.HoodieTimeline;
 import org.apache.hudi.common.util.ClusteringUtils;
-import org.apache.hudi.common.util.CommitUtils;
-import org.apache.hudi.common.util.Option;
-import org.apache.hudi.common.util.ReflectionUtils;
 import org.apache.hudi.common.util.collection.Pair;
 import org.apache.hudi.config.HoodieWriteConfig;
+import org.apache.hudi.data.HoodieJavaRDD;
 import org.apache.hudi.exception.HoodieClusteringException;
 import org.apache.hudi.table.HoodieTable;
 import org.apache.hudi.table.action.HoodieWriteMetadata;
-import org.apache.hudi.table.action.cluster.strategy.ClusteringExecutionStrategy;
 import org.apache.hudi.table.action.commit.BaseSparkCommitActionExecutor;
 
-import org.apache.avro.Schema;
-import org.apache.log4j.LogManager;
-import org.apache.log4j.Logger;
 import org.apache.spark.api.java.JavaRDD;
 
-import java.util.List;
-import java.util.Map;
-import java.util.Set;
-import java.util.stream.Collectors;
-
 public class SparkExecuteClusteringCommitActionExecutor<T extends HoodieRecordPayload<T>>
     extends BaseSparkCommitActionExecutor<T> {
 
-  private static final Logger LOG = LogManager.getLogger(SparkExecuteClusteringCommitActionExecutor.class);
   private final HoodieClusteringPlan clusteringPlan;
 
   public SparkExecuteClusteringCommitActionExecutor(HoodieEngineContext context,
                                                     HoodieWriteConfig config, HoodieTable table,
                                                     String instantTime) {
     super(context, config, table, instantTime, WriteOperationType.CLUSTER);
-    this.clusteringPlan = ClusteringUtils.getClusteringPlan(table.getMetaClient(), HoodieTimeline.getReplaceCommitRequestedInstant(instantTime))
-      .map(Pair::getRight).orElseThrow(() -> new HoodieClusteringException("Unable to read clustering plan for instant: " + instantTime));
+    this.clusteringPlan = ClusteringUtils.getClusteringPlan(
+        table.getMetaClient(), HoodieTimeline.getReplaceCommitRequestedInstant(instantTime))
+        .map(Pair::getRight).orElseThrow(() -> new HoodieClusteringException(
+            "Unable to read clustering plan for instant: " + instantTime));
   }
 
   @Override
   public HoodieWriteMetadata<JavaRDD<WriteStatus>> execute() {
-    HoodieInstant instant = HoodieTimeline.getReplaceCommitRequestedInstant(instantTime);
-    // Mark instant as clustering inflight
-    table.getActiveTimeline().transitionReplaceRequestedToInflight(instant, Option.empty());
-    table.getMetaClient().reloadActiveTimeline();
-
-    final Schema schema = HoodieAvroUtils.addMetadataFields(new Schema.Parser().parse(config.getSchema()));
-    HoodieWriteMetadata<JavaRDD<WriteStatus>> writeMetadata = ((ClusteringExecutionStrategy<T, JavaRDD<HoodieRecord<? extends HoodieRecordPayload>>, JavaRDD<HoodieKey>, JavaRDD<WriteStatus>>)
-        ReflectionUtils.loadClass(config.getClusteringExecutionStrategyClass(),
-            new Class<?>[] {HoodieTable.class, HoodieEngineContext.class, HoodieWriteConfig.class}, table, context, config))
-        .performClustering(clusteringPlan, schema, instantTime);
-    JavaRDD<WriteStatus> writeStatusRDD = writeMetadata.getWriteStatuses();
-    JavaRDD<WriteStatus> statuses = updateIndex(writeStatusRDD, writeMetadata);
-    writeMetadata.setWriteStats(statuses.map(WriteStatus::getStat).collect());
-    writeMetadata.setPartitionToReplaceFileIds(getPartitionToReplacedFileIds(writeMetadata));
-    commitOnAutoCommit(writeMetadata);
-    if (!writeMetadata.getCommitMetadata().isPresent()) {
-      HoodieCommitMetadata commitMetadata = CommitUtils.buildMetadata(writeMetadata.getWriteStats().get(), writeMetadata.getPartitionToReplaceFileIds(),
-          extraMetadata, operationType, getSchemaToStoreInCommit(), getCommitActionType());
-      writeMetadata.setCommitMetadata(Option.of(commitMetadata));
-    }
-    return writeMetadata;
-  }
-
-  /**
-   * Validate actions taken by clustering. In the first implementation, we validate at least one new file is written.
-   * But we can extend this to add more validation. E.g. number of records read = number of records written etc.
-   * We can also make these validations in BaseCommitActionExecutor to reuse pre-commit hooks for multiple actions.
-   */
-  private void validateWriteResult(HoodieWriteMetadata<JavaRDD<WriteStatus>> writeMetadata) {
-    if (writeMetadata.getWriteStatuses().isEmpty()) {
-      throw new HoodieClusteringException("Clustering plan produced 0 WriteStatus for " + instantTime
-          + " #groups: " + clusteringPlan.getInputGroups().size() + " expected at least "
-          + clusteringPlan.getInputGroups().stream().mapToInt(HoodieClusteringGroup::getNumOutputFileGroups).sum()
-          + " write statuses");
-    }
+    HoodieWriteMetadata<HoodieData<WriteStatus>> writeMetadata = executeClustering(clusteringPlan);
+    JavaRDD<WriteStatus> transformedWriteStatuses = HoodieJavaRDD.getJavaRDD(writeMetadata.getWriteStatuses());
+    return writeMetadata.clone(transformedWriteStatuses);
   }
 
   @Override
   protected String getCommitActionType() {
     return HoodieTimeline.REPLACE_COMMIT_ACTION;
   }
-
-  @Override
-  protected Map<String, List<String>> getPartitionToReplacedFileIds(HoodieWriteMetadata<JavaRDD<WriteStatus>> writeMetadata) {
-    Set<HoodieFileGroupId> newFilesWritten = writeMetadata.getWriteStats().get().stream()
-        .map(s -> new HoodieFileGroupId(s.getPartitionPath(), s.getFileId())).collect(Collectors.toSet());
-    // for the below execution strategy, new file group id would be same as old file group id
-    if (SparkSingleFileSortExecutionStrategy.class.getName().equals(config.getClusteringExecutionStrategyClass())) {
-      return ClusteringUtils.getFileGroupsFromClusteringPlan(clusteringPlan)
-          .collect(Collectors.groupingBy(fg -> fg.getPartitionPath(), Collectors.mapping(fg -> fg.getFileId(), Collectors.toList())));
-    }
-    return ClusteringUtils.getFileGroupsFromClusteringPlan(clusteringPlan)
-        .filter(fg -> !newFilesWritten.contains(fg))
-        .collect(Collectors.groupingBy(fg -> fg.getPartitionPath(), Collectors.mapping(fg -> fg.getFileId(), Collectors.toList())));
-  }

Review comment:
       extracted to BaseCommitActionExecutor.java
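The removed `getPartitionToReplacedFileIds` above (now in `BaseCommitActionExecutor`) decides which file groups a clustering commit replaces. Below is a minimal, self-contained sketch of that rule. It is not Hudi's code: a hypothetical `FileGroupId` record stands in for `HoodieFileGroupId`, and a boolean flag replaces the class-name check against `SparkSingleFileSortExecutionStrategy`. Planned input groups are reported as replaced unless this run also wrote them. The exception is a strategy that keeps the same file group IDs, where every planned group counts as replaced.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

public class ReplacedFileIdsSketch {

  /** Hypothetical stand-in for HoodieFileGroupId: a (partition path, file ID) pair. */
  record FileGroupId(String partitionPath, String fileId) {}

  /**
   * Groups replaced file IDs by partition path.
   *
   * @param plannedGroups input file groups listed in the clustering plan
   * @param newlyWritten  file groups that appear in this run's write stats
   * @param keepsSameIds  true if the strategy rewrites each group under its old ID
   */
  static Map<String, List<String>> partitionToReplacedFileIds(
      List<FileGroupId> plannedGroups, Set<FileGroupId> newlyWritten, boolean keepsSameIds) {
    return plannedGroups.stream()
        // Same-ID strategies replace every planned group; otherwise skip
        // any planned group that this run also wrote.
        .filter(fg -> keepsSameIds || !newlyWritten.contains(fg))
        .collect(Collectors.groupingBy(FileGroupId::partitionPath,
            Collectors.mapping(FileGroupId::fileId, Collectors.toList())));
  }

  public static void main(String[] args) {
    List<FileGroupId> planned = List.of(
        new FileGroupId("2022/02/18", "fg-1"),
        new FileGroupId("2022/02/18", "fg-2"),
        new FileGroupId("2022/02/19", "fg-3"));
    Set<FileGroupId> written = Set.of(new FileGroupId("2022/02/18", "fg-1"));
    // fg-1 is excluded because this run also wrote it
    System.out.println(partitionToReplacedFileIds(planned, written, false));
  }
}
```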

##########
File path: hudi-client/hudi-spark-client/src/main/java/org/apache/hudi/table/action/cluster/SparkExecuteClusteringCommitActionExecutor.java
##########
@@ -18,111 +18,48 @@
 
 package org.apache.hudi.table.action.cluster;
 
-import org.apache.hudi.avro.HoodieAvroUtils;
-import org.apache.hudi.avro.model.HoodieClusteringGroup;
 import org.apache.hudi.avro.model.HoodieClusteringPlan;
 import org.apache.hudi.client.WriteStatus;
-import org.apache.hudi.client.clustering.run.strategy.SparkSingleFileSortExecutionStrategy;
+import org.apache.hudi.common.data.HoodieData;
 import org.apache.hudi.common.engine.HoodieEngineContext;
-import org.apache.hudi.common.model.HoodieCommitMetadata;
-import org.apache.hudi.common.model.HoodieFileGroupId;
-import org.apache.hudi.common.model.HoodieKey;
-import org.apache.hudi.common.model.HoodieRecord;
 import org.apache.hudi.common.model.HoodieRecordPayload;
 import org.apache.hudi.common.model.WriteOperationType;
-import org.apache.hudi.common.table.timeline.HoodieInstant;
 import org.apache.hudi.common.table.timeline.HoodieTimeline;
 import org.apache.hudi.common.util.ClusteringUtils;
-import org.apache.hudi.common.util.CommitUtils;
-import org.apache.hudi.common.util.Option;
-import org.apache.hudi.common.util.ReflectionUtils;
 import org.apache.hudi.common.util.collection.Pair;
 import org.apache.hudi.config.HoodieWriteConfig;
+import org.apache.hudi.data.HoodieJavaRDD;
 import org.apache.hudi.exception.HoodieClusteringException;
 import org.apache.hudi.table.HoodieTable;
 import org.apache.hudi.table.action.HoodieWriteMetadata;
-import org.apache.hudi.table.action.cluster.strategy.ClusteringExecutionStrategy;
 import org.apache.hudi.table.action.commit.BaseSparkCommitActionExecutor;
 
-import org.apache.avro.Schema;
-import org.apache.log4j.LogManager;
-import org.apache.log4j.Logger;
 import org.apache.spark.api.java.JavaRDD;
 
-import java.util.List;
-import java.util.Map;
-import java.util.Set;
-import java.util.stream.Collectors;
-
 public class SparkExecuteClusteringCommitActionExecutor<T extends HoodieRecordPayload<T>>
     extends BaseSparkCommitActionExecutor<T> {
 
-  private static final Logger LOG = LogManager.getLogger(SparkExecuteClusteringCommitActionExecutor.class);
   private final HoodieClusteringPlan clusteringPlan;
 
   public SparkExecuteClusteringCommitActionExecutor(HoodieEngineContext context,
                                                     HoodieWriteConfig config, HoodieTable table,
                                                     String instantTime) {
     super(context, config, table, instantTime, WriteOperationType.CLUSTER);
-    this.clusteringPlan = ClusteringUtils.getClusteringPlan(table.getMetaClient(), HoodieTimeline.getReplaceCommitRequestedInstant(instantTime))
-      .map(Pair::getRight).orElseThrow(() -> new HoodieClusteringException("Unable to read clustering plan for instant: " + instantTime));
+    this.clusteringPlan = ClusteringUtils.getClusteringPlan(
+        table.getMetaClient(), HoodieTimeline.getReplaceCommitRequestedInstant(instantTime))
+        .map(Pair::getRight).orElseThrow(() -> new HoodieClusteringException(
+            "Unable to read clustering plan for instant: " + instantTime));
   }
 
   @Override
   public HoodieWriteMetadata<JavaRDD<WriteStatus>> execute() {
-    HoodieInstant instant = HoodieTimeline.getReplaceCommitRequestedInstant(instantTime);
-    // Mark instant as clustering inflight
-    table.getActiveTimeline().transitionReplaceRequestedToInflight(instant, Option.empty());
-    table.getMetaClient().reloadActiveTimeline();
-
-    final Schema schema = HoodieAvroUtils.addMetadataFields(new Schema.Parser().parse(config.getSchema()));
-    HoodieWriteMetadata<JavaRDD<WriteStatus>> writeMetadata = ((ClusteringExecutionStrategy<T, JavaRDD<HoodieRecord<? extends HoodieRecordPayload>>, JavaRDD<HoodieKey>, JavaRDD<WriteStatus>>)
-        ReflectionUtils.loadClass(config.getClusteringExecutionStrategyClass(),
-            new Class<?>[] {HoodieTable.class, HoodieEngineContext.class, HoodieWriteConfig.class}, table, context, config))
-        .performClustering(clusteringPlan, schema, instantTime);
-    JavaRDD<WriteStatus> writeStatusRDD = writeMetadata.getWriteStatuses();
-    JavaRDD<WriteStatus> statuses = updateIndex(writeStatusRDD, writeMetadata);
-    writeMetadata.setWriteStats(statuses.map(WriteStatus::getStat).collect());
-    writeMetadata.setPartitionToReplaceFileIds(getPartitionToReplacedFileIds(writeMetadata));
-    commitOnAutoCommit(writeMetadata);
-    if (!writeMetadata.getCommitMetadata().isPresent()) {
-      HoodieCommitMetadata commitMetadata = CommitUtils.buildMetadata(writeMetadata.getWriteStats().get(), writeMetadata.getPartitionToReplaceFileIds(),
-          extraMetadata, operationType, getSchemaToStoreInCommit(), getCommitActionType());
-      writeMetadata.setCommitMetadata(Option.of(commitMetadata));
-    }
-    return writeMetadata;
-  }
-
-  /**
-   * Validate actions taken by clustering. In the first implementation, we validate at least one new file is written.
-   * But we can extend this to add more validation. E.g. number of records read = number of records written etc.
-   * We can also make these validations in BaseCommitActionExecutor to reuse pre-commit hooks for multiple actions.
-   */
-  private void validateWriteResult(HoodieWriteMetadata<JavaRDD<WriteStatus>> writeMetadata) {
-    if (writeMetadata.getWriteStatuses().isEmpty()) {
-      throw new HoodieClusteringException("Clustering plan produced 0 WriteStatus for " + instantTime
-          + " #groups: " + clusteringPlan.getInputGroups().size() + " expected at least "
-          + clusteringPlan.getInputGroups().stream().mapToInt(HoodieClusteringGroup::getNumOutputFileGroups).sum()
-          + " write statuses");
-    }

Review comment:
       extracted to BaseCommitActionExecutor.java
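The `validateWriteResult` check removed above can be sketched in isolation. This is an illustration under stated assumptions, not Hudi's code. `ClusteringGroup` is a hypothetical stand-in for `HoodieClusteringGroup`, write statuses are a plain `List`, and `IllegalStateException` stands in for `HoodieClusteringException`. As in the diff, the check fails only when no write status was produced at all. The sum of `numOutputFileGroups` is used only to build the error message.

```java
import java.util.List;

public class ClusteringValidationSketch {

  /** Hypothetical stand-in for HoodieClusteringGroup. */
  record ClusteringGroup(int numOutputFileGroups) {}

  /** Sum of planned output file groups; only reported in the error message. */
  static int expectedOutputFileGroups(List<ClusteringGroup> inputGroups) {
    return inputGroups.stream().mapToInt(ClusteringGroup::numOutputFileGroups).sum();
  }

  /** Fails if clustering produced no write statuses at all. */
  static void validateWriteResult(String instantTime, List<ClusteringGroup> inputGroups,
                                  List<?> writeStatuses) {
    if (writeStatuses.isEmpty()) {
      throw new IllegalStateException("Clustering plan produced 0 WriteStatus for " + instantTime
          + " #groups: " + inputGroups.size() + " expected at least "
          + expectedOutputFileGroups(inputGroups) + " write statuses");
    }
  }

  public static void main(String[] args) {
    List<ClusteringGroup> groups = List.of(new ClusteringGroup(2), new ClusteringGroup(3));
    validateWriteResult("20220218104641", groups, List.of("status-1")); // passes
    try {
      validateWriteResult("20220218104641", groups, List.of());
    } catch (IllegalStateException e) {
      System.out.println(e.getMessage());
    }
  }
}
```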







[GitHub] [hudi] danny0405 commented on a change in pull request #4847: [HUDI-3042] Refactoring clustering executors

Posted by GitBox <gi...@apache.org>.
danny0405 commented on a change in pull request #4847:
URL: https://github.com/apache/hudi/pull/4847#discussion_r810755690



##########
File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/commit/BaseCommitActionExecutor.java
##########
@@ -200,4 +215,65 @@ protected boolean isWorkloadProfileNeeded() {
 
   protected abstract Iterator<List<WriteStatus>> handleUpdate(String partitionPath, String fileId,
       Iterator<HoodieRecord<T>> recordItr) throws IOException;
+
+  protected HoodieWriteMetadata<HoodieData<WriteStatus>> executeClustering(HoodieClusteringPlan clusteringPlan) {
+    HoodieInstant instant = HoodieTimeline.getReplaceCommitRequestedInstant(instantTime);

Review comment:
       The responsibilities of `BaseCommitActionExecutor` are a bit confusing: it handles the regular write paths such as `insert` and `upsert`, and, with this patch, `clustering` as well. What about `compaction`, then?
   
   Should we create a new base class for table services?
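One possible shape for the separate base class suggested here is sketched below. It uses hypothetical names (`BaseTableServiceActionExecutor`, `ClusteringExecutor`, `CompactionExecutor`) that do not exist in Hudi. A template method holds the lifecycle shared by table services (mark inflight, run, validate, commit), while subclasses supply only the service-specific pieces. Steps are recorded as strings so the ordering can be observed, and the action names are illustrative.

```java
import java.util.ArrayList;
import java.util.List;

public class TableServiceExecutorSketch {

  /** Hypothetical base class for table services, separate from insert/upsert executors. */
  abstract static class BaseTableServiceActionExecutor<R> {
    protected final String instantTime;
    protected final List<String> steps = new ArrayList<>();

    BaseTableServiceActionExecutor(String instantTime) {
      this.instantTime = instantTime;
    }

    /** Template method: the lifecycle shared by all table services. */
    final R execute() {
      steps.add("inflight " + instantTime); // stands in for the requested -> inflight transition
      R result = runService();              // service-specific work
      validate(result);                     // service-specific checks
      steps.add("commit " + getCommitActionType() + " " + instantTime);
      return result;
    }

    protected abstract R runService();

    protected abstract void validate(R result);

    protected abstract String getCommitActionType();
  }

  /** Clustering: rewrites file groups and reports the new file IDs. */
  static final class ClusteringExecutor extends BaseTableServiceActionExecutor<List<String>> {
    ClusteringExecutor(String instantTime) {
      super(instantTime);
    }

    @Override
    protected List<String> runService() {
      steps.add("cluster");
      return List.of("fg-new");
    }

    @Override
    protected void validate(List<String> newFileIds) {
      if (newFileIds.isEmpty()) {
        throw new IllegalStateException("clustering wrote no file groups");
      }
    }

    @Override
    protected String getCommitActionType() {
      return "replacecommit";
    }
  }

  /** Compaction: reports how many file slices were compacted. */
  static final class CompactionExecutor extends BaseTableServiceActionExecutor<Integer> {
    CompactionExecutor(String instantTime) {
      super(instantTime);
    }

    @Override
    protected Integer runService() {
      steps.add("compact");
      return 1;
    }

    @Override
    protected void validate(Integer compactedSlices) {
      // no extra checks in this sketch
    }

    @Override
    protected String getCommitActionType() {
      return "commit";
    }
  }

  public static void main(String[] args) {
    ClusteringExecutor clustering = new ClusteringExecutor("001");
    clustering.execute();
    System.out.println(clustering.steps);
  }
}
```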







[GitHub] [hudi] hudi-bot commented on pull request #4847: [HUDI-3042] Refactoring clustering executors

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on pull request #4847:
URL: https://github.com/apache/hudi/pull/4847#issuecomment-1046397635


   ## CI report:
   
   * 94c384ea344ec934fe236e2dad8a7f58fa3ac489 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6123) 
   * 7306cb45f7c353d982827c08dbadb4ddcbda31dc UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>





[GitHub] [hudi] yihua commented on a change in pull request #4847: [HUDI-3042] Refactoring clustering executors

Posted by GitBox <gi...@apache.org>.
yihua commented on a change in pull request #4847:
URL: https://github.com/apache/hudi/pull/4847#discussion_r813406339



##########
File path: hudi-client/hudi-java-client/src/main/java/org/apache/hudi/client/clustering/run/strategy/JavaExecutionStrategy.java
##########
@@ -71,7 +73,7 @@
  * Clustering strategy for Java engine.
  */
 public abstract class JavaExecutionStrategy<T extends HoodieRecordPayload<T>>
-    extends ClusteringExecutionStrategy<T, List<HoodieRecord<T>>, List<HoodieKey>, List<WriteStatus>> {
+    extends ClusteringExecutionStrategy<T, HoodieData<HoodieRecord<T>>, HoodieData<HoodieKey>, HoodieData<WriteStatus>> {

Review comment:
       Is this Java-specific class going to be removed as a follow-up?
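The diff replaces the `List<...>` type arguments with `HoodieData<...>`, which allows a single strategy signature to serve every engine. The toy sketch below illustrates that idea using a hypothetical `EngineData` interface with a list-backed implementation. It is not Hudi's `HoodieData` API, and it omits a Spark-backed implementation wrapping `JavaRDD`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

public class EngineDataSketch {

  /** Hypothetical engine-agnostic collection (not Hudi's HoodieData API). */
  interface EngineData<T> {
    <O> EngineData<O> map(Function<T, O> fn);

    List<T> collectAsList();

    boolean isEmpty();
  }

  /** In-memory implementation, as a single-JVM (Java client) engine might use. */
  static final class ListData<T> implements EngineData<T> {
    private final List<T> items;

    ListData(List<T> items) {
      this.items = new ArrayList<>(items);
    }

    @Override
    public <O> EngineData<O> map(Function<T, O> fn) {
      List<O> out = new ArrayList<>(items.size());
      for (T item : items) {
        out.add(fn.apply(item));
      }
      return new ListData<>(out);
    }

    @Override
    public List<T> collectAsList() {
      return new ArrayList<>(items);
    }

    @Override
    public boolean isEmpty() {
      return items.isEmpty();
    }
  }

  /** Strategy typed only against EngineData, so its signature is the same for every engine. */
  abstract static class ExecutionStrategy<I, O> {
    abstract EngineData<O> perform(EngineData<I> records);
  }

  /** Toy strategy: turns record keys into write-status-like strings. */
  static final class ToyWriteStrategy extends ExecutionStrategy<String, String> {
    @Override
    EngineData<String> perform(EngineData<String> recordKeys) {
      return recordKeys.map(key -> "written:" + key);
    }
  }

  public static void main(String[] args) {
    EngineData<String> keys = new ListData<>(List.of("k1", "k2"));
    System.out.println(new ToyWriteStrategy().perform(keys).collectAsList());
  }
}
```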







[GitHub] [hudi] xushiyan commented on a change in pull request #4847: [HUDI-3042] Refactoring clustering executors

Posted by GitBox <gi...@apache.org>.
xushiyan commented on a change in pull request #4847:
URL: https://github.com/apache/hudi/pull/4847#discussion_r814774807



##########
File path: hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/commit/BaseCommitActionExecutor.java
##########
@@ -200,4 +215,65 @@ protected boolean isWorkloadProfileNeeded() {
 
   protected abstract Iterator<List<WriteStatus>> handleUpdate(String partitionPath, String fileId,
       Iterator<HoodieRecord<T>> recordItr) throws IOException;
+
+  protected HoodieWriteMetadata<HoodieData<WriteStatus>> executeClustering(HoodieClusteringPlan clusteringPlan) {
+    HoodieInstant instant = HoodieTimeline.getReplaceCommitRequestedInstant(instantTime);

Review comment:
       @danny0405 Agreed, the responsibilities do look mixed there. I'll make a clearer separation in https://issues.apache.org/jira/browse/HUDI-2439







[GitHub] [hudi] hudi-bot commented on pull request #4847: [HUDI-3042] Refactoring clustering executors

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on pull request #4847:
URL: https://github.com/apache/hudi/pull/4847#issuecomment-1044328778


   ## CI report:
   
   * c89db11dcd1e33a31405fef7fba2788fb3723338 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6122) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>





[GitHub] [hudi] hudi-bot removed a comment on pull request #4847: [HUDI-3042] Refactoring clustering executors

Posted by GitBox <gi...@apache.org>.
hudi-bot removed a comment on pull request #4847:
URL: https://github.com/apache/hudi/pull/4847#issuecomment-1046397635







[GitHub] [hudi] hudi-bot commented on pull request #4847: [HUDI-3042] Refactoring clustering executors

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on pull request #4847:
URL: https://github.com/apache/hudi/pull/4847#issuecomment-1044403831


   ## CI report:
   
   * c89db11dcd1e33a31405fef7fba2788fb3723338 Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6122) 
   * 94c384ea344ec934fe236e2dad8a7f58fa3ac489 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6123) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>





[GitHub] [hudi] hudi-bot removed a comment on pull request #4847: [HUDI-3042] Refactoring clustering executors

Posted by GitBox <gi...@apache.org>.
hudi-bot removed a comment on pull request #4847:
URL: https://github.com/apache/hudi/pull/4847#issuecomment-1044506838


   ## CI report:
   
   * 94c384ea344ec934fe236e2dad8a7f58fa3ac489 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6123) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>





[GitHub] [hudi] hudi-bot commented on pull request #4847: [HUDI-3042] Refactoring clustering executors

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on pull request #4847:
URL: https://github.com/apache/hudi/pull/4847#issuecomment-1049304779


   ## CI report:
   
   * 7306cb45f7c353d982827c08dbadb4ddcbda31dc Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6155) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>





[GitHub] [hudi] hudi-bot removed a comment on pull request #4847: [HUDI-3042] Refactoring clustering executors

Posted by GitBox <gi...@apache.org>.
hudi-bot removed a comment on pull request #4847:
URL: https://github.com/apache/hudi/pull/4847#issuecomment-1049302427


   ## CI report:
   
   * 7306cb45f7c353d982827c08dbadb4ddcbda31dc UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>





[GitHub] [hudi] hudi-bot commented on pull request #4847: [HUDI-3042] Refactoring clustering executors

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on pull request #4847:
URL: https://github.com/apache/hudi/pull/4847#issuecomment-1044356609


   ## CI report:
   
   * c89db11dcd1e33a31405fef7fba2788fb3723338 Azure: [CANCELED](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6122) 
   * 94c384ea344ec934fe236e2dad8a7f58fa3ac489 UNKNOWN
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>





[GitHub] [hudi] hudi-bot commented on pull request #4847: [HUDI-3042] Refactoring clustering executors

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on pull request #4847:
URL: https://github.com/apache/hudi/pull/4847#issuecomment-1046398925


   ## CI report:
   
   * 94c384ea344ec934fe236e2dad8a7f58fa3ac489 Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6123) 
   * 7306cb45f7c353d982827c08dbadb4ddcbda31dc Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=6155) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>





[GitHub] [hudi] xushiyan merged pull request #4847: [HUDI-3042] Refactoring clustering executors

Posted by GitBox <gi...@apache.org>.
xushiyan merged pull request #4847:
URL: https://github.com/apache/hudi/pull/4847


   





[GitHub] [hudi] xushiyan commented on a change in pull request #4847: [HUDI-3042] Refactoring clustering executors

Posted by GitBox <gi...@apache.org>.
xushiyan commented on a change in pull request #4847:
URL: https://github.com/apache/hudi/pull/4847#discussion_r814773194



##########
File path: hudi-client/hudi-java-client/src/main/java/org/apache/hudi/client/clustering/run/strategy/JavaExecutionStrategy.java
##########
@@ -71,7 +73,7 @@
  * Clustering strategy for Java engine.
  */
 public abstract class JavaExecutionStrategy<T extends HoodieRecordPayload<T>>
-    extends ClusteringExecutionStrategy<T, List<HoodieRecord<T>>, List<HoodieKey>, List<WriteStatus>> {
+    extends ClusteringExecutionStrategy<T, HoodieData<HoodieRecord<T>>, HoodieData<HoodieKey>, HoodieData<WriteStatus>> {

Review comment:
       @yihua Yes, I should open another PR to deal with `ClusteringExecutionStrategy` and its subclasses; that would make a good separation.



