Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2022/09/03 22:40:26 UTC

[GitHub] [hudi] parisni opened a new pull request, #6580: Batch clean files to delete

parisni opened a new pull request, #6580:
URL: https://github.com/apache/hudi/pull/6580

   ### Change Logs
   
   This uses a batch call to get the file groups to delete during cleaning.
   This limits the number of calls to the view and should fix the trouble with the metadata table when a table has a lot of partitions.
   Fixes #6373
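
To illustrate why batching reduces the number of calls to the view, here is a minimal sketch in plain Java. It does not use Hudi classes: `CountingView`, its two `getFileGroups` overloads, and the partition names are all hypothetical stand-ins that only count how often the view is queried.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class BatchLookupSketch {

  /** Hypothetical stand-in for a file system view; it counts how often it is queried. */
  static class CountingView {
    private final Map<String, List<String>> fileGroupsByPartition;
    int calls = 0;

    CountingView(Map<String, List<String>> fileGroupsByPartition) {
      this.fileGroupsByPartition = fileGroupsByPartition;
    }

    /** Per-partition lookup: one call to the view for each partition. */
    List<String> getFileGroups(String partition) {
      calls++;
      return fileGroupsByPartition.getOrDefault(partition, new ArrayList<>());
    }

    /** Batched lookup: one call to the view for the whole list of partitions. */
    Map<String, List<String>> getFileGroups(List<String> partitions) {
      calls++;
      Map<String, List<String>> result = new HashMap<>();
      for (String partition : partitions) {
        result.put(partition, fileGroupsByPartition.getOrDefault(partition, new ArrayList<>()));
      }
      return result;
    }
  }

  /** Returns how many view calls are made when each partition is looked up separately. */
  static int callsWhenLookingUpOneByOne(CountingView view, List<String> partitions) {
    int before = view.calls;
    for (String partition : partitions) {
      view.getFileGroups(partition);
    }
    return view.calls - before;
  }

  /** Returns how many view calls are made when all partitions are looked up together. */
  static int callsWhenBatching(CountingView view, List<String> partitions) {
    int before = view.calls;
    view.getFileGroups(partitions);
    return view.calls - before;
  }

  public static void main(String[] args) {
    Map<String, List<String>> data = new HashMap<>();
    data.put("2022/09/01", Arrays.asList("fg-1", "fg-2"));
    data.put("2022/09/02", Arrays.asList("fg-3"));
    data.put("2022/09/03", Arrays.asList("fg-4"));
    List<String> partitions = Arrays.asList("2022/09/01", "2022/09/02", "2022/09/03");

    CountingView view = new CountingView(data);
    System.out.println("one by one: " + callsWhenLookingUpOneByOne(view, partitions));
    System.out.println("batched:    " + callsWhenBatching(view, partitions));
  }
}
```

With three partitions, the one-by-one path queries the view three times and the batched path queries it once. Whether this translates into a real speedup depends on how expensive each view call is, which this sketch does not model.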
   
   ### Impact
   
   _Describe any public API or user-facing feature change or any performance impact._
   
   **Risk level: none | low | medium | high**
   
   _Choose one. If medium or high, explain what verification was done to mitigate the risks._
   
   ### Contributor's checklist
   
   - [ ] Read through [contributor's guide](https://hudi.apache.org/contribute/how-to-contribute)
   - [ ] Change Logs and Impact were stated clearly
   - [ ] Adequate tests were added if applicable
   - [ ] CI passed
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [hudi] hudi-bot commented on pull request #6580: [HUDI-4792] Batch clean files to delete

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #6580:
URL: https://github.com/apache/hudi/pull/6580#issuecomment-1250004648

   ## CI report:
   
   * ff98ae0dda69ee611e4814fbae9c8ddc0a93a4f1 UNKNOWN
   * 99451dc89547f803eb6823b2baa620096e76459e UNKNOWN
   * 11ba7cd991ca83773aae03b1fd7271364079be21 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=11435) 
   
   <details>
   <summary>Bot commands</summary>
     @hudi-bot supports the following commands:
   
    - `@hudi-bot run azure` re-run the last Azure build
   </details>




[GitHub] [hudi] hudi-bot commented on pull request #6580: [HUDI-4792] Batch clean files to delete

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #6580:
URL: https://github.com/apache/hudi/pull/6580#issuecomment-1253940345

   ## CI report:
   
   * ff98ae0dda69ee611e4814fbae9c8ddc0a93a4f1 UNKNOWN
   * 99451dc89547f803eb6823b2baa620096e76459e UNKNOWN
   * 2cd893d322d281e59e40a062108931d60646d9d3 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=11543) 
   * 683f35351eae9705ae3863152a368b2b8e91abde Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=11558) 
   




[GitHub] [hudi] parisni commented on a diff in pull request #6580: [HUDI-4792] Batch clean files to delete

Posted by GitBox <gi...@apache.org>.
parisni commented on code in PR #6580:
URL: https://github.com/apache/hudi/pull/6580#discussion_r973465697


##########
hudi-common/src/main/java/org/apache/hudi/common/table/view/AbstractTableFileSystemView.java:
##########
@@ -735,6 +735,34 @@ public final Stream<HoodieFileGroup> getAllFileGroups(String partitionStr) {
     return getAllFileGroupsIncludingReplaced(partitionStr).filter(fg -> !isFileGroupReplaced(fg));
   }
 
+  @Override
+  public final Stream<Pair<String, List<HoodieFileGroup>>> getAllFileGroups(List<String> partitionStr) {
+    return getAllFileGroupsIncludingReplaced(partitionStr)
+        .map(pair -> Pair.of(pair.getLeft(), pair.getRight().stream().filter(fg -> !isFileGroupReplaced(fg)).collect(Collectors.toList())));
+  }
+
+  private Stream<Pair<String, List<HoodieFileGroup>>> getAllFileGroupsIncludingReplaced(final List<String> partitionStrList) {
+    try {

Review Comment:
   That makes sense





[GitHub] [hudi] nsivabalan commented on pull request #6580: [HUDI-4792] Batch clean files to delete

Posted by GitBox <gi...@apache.org>.
nsivabalan commented on PR #6580:
URL: https://github.com/apache/hudi/pull/6580#issuecomment-1253144484

   There were some minor bugs in the source code. I have fixed them, and the tests that were failing in the last CI run are succeeding locally with the latest commit.




[GitHub] [hudi] hudi-bot commented on pull request #6580: [HUDI-4792] Batch clean files to delete

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #6580:
URL: https://github.com/apache/hudi/pull/6580#issuecomment-1253934565

   ## CI report:
   
   * ff98ae0dda69ee611e4814fbae9c8ddc0a93a4f1 UNKNOWN
   * 99451dc89547f803eb6823b2baa620096e76459e UNKNOWN
   * 2cd893d322d281e59e40a062108931d60646d9d3 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=11543) 
   * 683f35351eae9705ae3863152a368b2b8e91abde UNKNOWN
   




[GitHub] [hudi] hudi-bot commented on pull request #6580: Batch clean files to delete

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #6580:
URL: https://github.com/apache/hudi/pull/6580#issuecomment-1236214298

   ## CI report:
   
   * ff98ae0dda69ee611e4814fbae9c8ddc0a93a4f1 UNKNOWN
   * c292838205bb8eb57c529808c6b6da98635ac17d Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=11137) 
   




[GitHub] [hudi] nsivabalan commented on a diff in pull request #6580: [HUDI-4792] Batch clean files to delete

Posted by GitBox <gi...@apache.org>.
nsivabalan commented on code in PR #6580:
URL: https://github.com/apache/hudi/pull/6580#discussion_r972428841


##########
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/clean/CleanPlanActionExecutor.java:
##########
@@ -110,9 +112,15 @@ HoodieCleanerPlan requestClean(HoodieEngineContext context) {
       context.setJobStatus(this.getClass().getSimpleName(), "Generating list of file slices to be cleaned: " + config.getTableName());
 
       Map<String, Pair<Boolean, List<CleanFileInfo>>> cleanOpsWithPartitionMeta = context
-          .map(partitionsToClean, partitionPathToClean -> Pair.of(partitionPathToClean, planner.getDeletePaths(partitionPathToClean)), cleanerParallelism)
+          .parallelize(partitionsToClean, cleanerParallelism)
+          .mapPartitions((Iterator<String> it) -> {
+            List<String> list = new ArrayList<>();

Review Comment:
   minor: `list` -> `partitionList`



##########
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/clean/CleanPlanActionExecutor.java:
##########
@@ -110,9 +112,15 @@ HoodieCleanerPlan requestClean(HoodieEngineContext context) {
       context.setJobStatus(this.getClass().getSimpleName(), "Generating list of file slices to be cleaned: " + config.getTableName());
 
       Map<String, Pair<Boolean, List<CleanFileInfo>>> cleanOpsWithPartitionMeta = context
-          .map(partitionsToClean, partitionPathToClean -> Pair.of(partitionPathToClean, planner.getDeletePaths(partitionPathToClean)), cleanerParallelism)
+          .parallelize(partitionsToClean, cleanerParallelism)
+          .mapPartitions((Iterator<String> it) -> {
+            List<String> list = new ArrayList<>();
+            it.forEachRemaining(list::add);
+            Map<String, Pair<Boolean, List<CleanFileInfo>>> res = planner.getDeletePaths(list);

Review Comment:
   minor: `res` -> `cleanResult`



##########
hudi-common/src/main/java/org/apache/hudi/common/table/view/AbstractTableFileSystemView.java:
##########
@@ -735,6 +735,34 @@ public final Stream<HoodieFileGroup> getAllFileGroups(String partitionStr) {
     return getAllFileGroupsIncludingReplaced(partitionStr).filter(fg -> !isFileGroupReplaced(fg));
   }
 
+  @Override
+  public final Stream<Pair<String, List<HoodieFileGroup>>> getAllFileGroups(List<String> partitionStr) {
+    return getAllFileGroupsIncludingReplaced(partitionStr)
+        .map(pair -> Pair.of(pair.getLeft(), pair.getRight().stream().filter(fg -> !isFileGroupReplaced(fg)).collect(Collectors.toList())));
+  }
+
+  private Stream<Pair<String, List<HoodieFileGroup>>> getAllFileGroupsIncludingReplaced(final List<String> partitionStrList) {
+    try {

Review Comment:
   Shouldn't we call the existing method here,
   ```
   getAllFileGroupsIncludingReplaced(final String partitionStr)
   ```
   and then union the outputs for multiple partition paths?
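
The suggestion above can be sketched roughly as follows. This is plain Java, not the Hudi implementation: both `fileGroupsFor` overloads and the file group ids they return are hypothetical, and file groups are represented as strings for brevity.

```java
import java.util.AbstractMap.SimpleImmutableEntry;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class ReuseSinglePartitionSketch {

  /** Hypothetical single-partition lookup; returns two made-up file group ids. */
  static Stream<String> fileGroupsFor(String partition) {
    return Stream.of(partition + "/fg-1", partition + "/fg-2");
  }

  /** Multi-partition variant that delegates to the single-partition method. */
  static Stream<Map.Entry<String, List<String>>> fileGroupsFor(List<String> partitions) {
    return partitions.stream().map(partition -> {
      // Reuse the existing per-partition logic instead of duplicating it.
      List<String> groups = fileGroupsFor(partition).collect(Collectors.toList());
      Map.Entry<String, List<String>> entry = new SimpleImmutableEntry<>(partition, groups);
      return entry;
    });
  }

  public static void main(String[] args) {
    fileGroupsFor(Arrays.asList("p1", "p2"))
        .forEach(e -> System.out.println(e.getKey() + " -> " + e.getValue()));
  }
}
```

The list overload keeps one entry per partition path, so callers can still tell which partition each group of results came from.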



##########
hudi-common/src/main/java/org/apache/hudi/common/table/view/AbstractTableFileSystemView.java:
##########
@@ -735,6 +735,34 @@ public final Stream<HoodieFileGroup> getAllFileGroups(String partitionStr) {
     return getAllFileGroupsIncludingReplaced(partitionStr).filter(fg -> !isFileGroupReplaced(fg));
   }
 
+  @Override
+  public final Stream<Pair<String, List<HoodieFileGroup>>> getAllFileGroups(List<String> partitionStr) {
+    return getAllFileGroupsIncludingReplaced(partitionStr)

Review Comment:
   Same here. Let's see if we can re-use methods and avoid code duplication.



##########
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/clean/CleanPlanActionExecutor.java:
##########
@@ -110,9 +112,15 @@ HoodieCleanerPlan requestClean(HoodieEngineContext context) {
       context.setJobStatus(this.getClass().getSimpleName(), "Generating list of file slices to be cleaned: " + config.getTableName());
 
       Map<String, Pair<Boolean, List<CleanFileInfo>>> cleanOpsWithPartitionMeta = context
-          .map(partitionsToClean, partitionPathToClean -> Pair.of(partitionPathToClean, planner.getDeletePaths(partitionPathToClean)), cleanerParallelism)
+          .parallelize(partitionsToClean, cleanerParallelism)
+          .mapPartitions((Iterator<String> it) -> {
+            List<String> list = new ArrayList<>();
+            it.forEachRemaining(list::add);
+            Map<String, Pair<Boolean, List<CleanFileInfo>>> res = planner.getDeletePaths(list);
+            return res.entrySet().iterator();
+          }, false).collectAsList()
           .stream()
-          .collect(Collectors.toMap(Pair::getKey, Pair::getValue));
+          .collect(Collectors.toMap(it -> it.getKey(), it -> it.getValue()));

Review Comment:
   Why this change? We can leave it as `Pair::getKey` and `Pair::getValue`.



##########
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/clean/CleanPlanner.java:
##########
@@ -233,43 +235,47 @@ private Pair<Boolean, List<CleanFileInfo>> getFilesToCleanKeepingLatestVersions(
 
     // In this scenario, we will assume that once replaced a file group automatically becomes eligible for cleaning completely
     // In other words, the file versions only apply to the active file groups.
-    deletePaths.addAll(getReplacedFilesEligibleToClean(savepointedFiles, partitionPath, Option.empty()));
-    boolean toDeletePartition = false;
-    List<HoodieFileGroup> fileGroups = fileSystemView.getAllFileGroups(partitionPath).collect(Collectors.toList());
-    for (HoodieFileGroup fileGroup : fileGroups) {
-      int keepVersions = config.getCleanerFileVersionsRetained();
-      // do not cleanup slice required for pending compaction
-      Iterator<FileSlice> fileSliceIterator =
-          fileGroup.getAllFileSlices().filter(fs -> !isFileSliceNeededForPendingCompaction(fs)).iterator();
-      if (isFileGroupInPendingCompaction(fileGroup)) {
-        // We have already saved the last version of file-groups for pending compaction Id
-        keepVersions--;
-      }
+    List<Pair<String, List<HoodieFileGroup>>> fileGroups = fileSystemView.getAllFileGroups(partitionPaths).collect(Collectors.toList());
+    for (Pair<String, List<HoodieFileGroup>> pairFileGroup : fileGroups) {
+
+      deletePaths.addAll(getReplacedFilesEligibleToClean(savepointedFiles, pairFileGroup.getLeft(), Option.empty()));

Review Comment:
   I guess this is the actual change in this class, right? I.e., moving `getReplacedFilesEligibleToClean()` from outside the for loop to inside it.



##########
hudi-common/src/main/java/org/apache/hudi/common/table/view/AbstractTableFileSystemView.java:
##########
@@ -916,6 +944,8 @@ protected abstract Option<Pair<String, CompactionOperation>> getPendingCompactio
    */
   abstract Stream<HoodieFileGroup> fetchAllStoredFileGroups(String partitionPath);
 
+  abstract Stream<Pair<String, List<HoodieFileGroup>>> fetchAllStoredFileGroups(List<String> partitionPath);

Review Comment:
   We can probably avoid some of these additional methods if the above suggestion is followed.





[GitHub] [hudi] nsivabalan commented on pull request #6580: [HUDI-4792] Batch clean files to delete

Posted by GitBox <gi...@apache.org>.
nsivabalan commented on PR #6580:
URL: https://github.com/apache/hudi/pull/6580#issuecomment-1251769268

   I have pushed a commit addressing the feedback. We should be good to land once CI succeeds.




[GitHub] [hudi] hudi-bot commented on pull request #6580: [HUDI-4792] Batch clean files to delete

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #6580:
URL: https://github.com/apache/hudi/pull/6580#issuecomment-1249930760

   ## CI report:
   
   * ff98ae0dda69ee611e4814fbae9c8ddc0a93a4f1 UNKNOWN
   * c292838205bb8eb57c529808c6b6da98635ac17d Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=11137) 
   * 99451dc89547f803eb6823b2baa620096e76459e UNKNOWN
   * 11ba7cd991ca83773aae03b1fd7271364079be21 UNKNOWN
   




[GitHub] [hudi] hudi-bot commented on pull request #6580: [HUDI-4792] Batch clean files to delete

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #6580:
URL: https://github.com/apache/hudi/pull/6580#issuecomment-1253146271

   ## CI report:
   
   * ff98ae0dda69ee611e4814fbae9c8ddc0a93a4f1 UNKNOWN
   * 99451dc89547f803eb6823b2baa620096e76459e UNKNOWN
   * 675221955f01a2a4fdc138af346fc78a2d11a41b Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=11513) 
   * 2cd893d322d281e59e40a062108931d60646d9d3 UNKNOWN
   




[GitHub] [hudi] hudi-bot commented on pull request #6580: [HUDI-4792] Batch clean files to delete

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #6580:
URL: https://github.com/apache/hudi/pull/6580#issuecomment-1253258030

   ## CI report:
   
   * ff98ae0dda69ee611e4814fbae9c8ddc0a93a4f1 UNKNOWN
   * 99451dc89547f803eb6823b2baa620096e76459e UNKNOWN
   * 2cd893d322d281e59e40a062108931d60646d9d3 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=11543) 
   




[GitHub] [hudi] nsivabalan merged pull request #6580: [HUDI-4792] Batch clean files to delete

Posted by GitBox <gi...@apache.org>.
nsivabalan merged PR #6580:
URL: https://github.com/apache/hudi/pull/6580




[GitHub] [hudi] parisni commented on pull request #6580: [HUDI-4792] Batch clean files to delete

Posted by GitBox <gi...@apache.org>.
parisni commented on PR #6580:
URL: https://github.com/apache/hudi/pull/6580#issuecomment-1238713329

   > @parisni could you follow the process of filing and claiming a JIRA ticket for this PR?
   
   done




[GitHub] [hudi] parisni commented on a diff in pull request #6580: [HUDI-4792] Batch clean files to delete

Posted by GitBox <gi...@apache.org>.
parisni commented on code in PR #6580:
URL: https://github.com/apache/hudi/pull/6580#discussion_r973441831


##########
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/clean/CleanPlanActionExecutor.java:
##########
@@ -110,9 +112,15 @@ HoodieCleanerPlan requestClean(HoodieEngineContext context) {
       context.setJobStatus(this.getClass().getSimpleName(), "Generating list of file slices to be cleaned: " + config.getTableName());
 
       Map<String, Pair<Boolean, List<CleanFileInfo>>> cleanOpsWithPartitionMeta = context
-          .map(partitionsToClean, partitionPathToClean -> Pair.of(partitionPathToClean, planner.getDeletePaths(partitionPathToClean)), cleanerParallelism)
+          .parallelize(partitionsToClean, cleanerParallelism)
+          .mapPartitions((Iterator<String> it) -> {
+            List<String> list = new ArrayList<>();
+            it.forEachRemaining(list::add);
+            Map<String, Pair<Boolean, List<CleanFileInfo>>> res = planner.getDeletePaths(list);
+            return res.entrySet().iterator();
+          }, false).collectAsList()
           .stream()
-          .collect(Collectors.toMap(Pair::getKey, Pair::getValue));
+          .collect(Collectors.toMap(it -> it.getKey(), it -> it.getValue()));

Review Comment:
   Then I'll go with `Entry::getX`.





[GitHub] [hudi] hudi-bot commented on pull request #6580: [HUDI-4792] Batch clean files to delete

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #6580:
URL: https://github.com/apache/hudi/pull/6580#issuecomment-1249932711

   ## CI report:
   
   * ff98ae0dda69ee611e4814fbae9c8ddc0a93a4f1 UNKNOWN
   * c292838205bb8eb57c529808c6b6da98635ac17d Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=11137) 
   * 99451dc89547f803eb6823b2baa620096e76459e UNKNOWN
   * 11ba7cd991ca83773aae03b1fd7271364079be21 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=11435) 
   




[GitHub] [hudi] hudi-bot commented on pull request #6580: Batch clean files to delete

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #6580:
URL: https://github.com/apache/hudi/pull/6580#issuecomment-1236224929

   ## CI report:
   
   * ff98ae0dda69ee611e4814fbae9c8ddc0a93a4f1 UNKNOWN
   * c292838205bb8eb57c529808c6b6da98635ac17d Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=11137) 
   




[GitHub] [hudi] hudi-bot commented on pull request #6580: Batch clean files to delete

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #6580:
URL: https://github.com/apache/hudi/pull/6580#issuecomment-1236213459

   ## CI report:
   
   * ff98ae0dda69ee611e4814fbae9c8ddc0a93a4f1 UNKNOWN
   




[GitHub] [hudi] parisni commented on pull request #6580: [HUDI-4792] Batch clean files to delete

Posted by GitBox <gi...@apache.org>.
parisni commented on PR #6580:
URL: https://github.com/apache/hudi/pull/6580#issuecomment-1249917468

   @nsivabalan I've applied your review comments.




[GitHub] [hudi] nsivabalan commented on a diff in pull request #6580: [HUDI-4792] Batch clean files to delete

Posted by GitBox <gi...@apache.org>.
nsivabalan commented on code in PR #6580:
URL: https://github.com/apache/hudi/pull/6580#discussion_r974834548


##########
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/clean/CleanPlanner.java:
##########
@@ -290,9 +296,10 @@ private Pair<Boolean, List<CleanFileInfo>> getFilesToCleanKeepingLatestCommits(S
    * @return A {@link Pair} whose left is boolean indicating whether partition itself needs to be deleted,
    *         and right is a list of {@link CleanFileInfo} about the files in the partition that needs to be deleted.
    */
-  private Pair<Boolean, List<CleanFileInfo>> getFilesToCleanKeepingLatestCommits(String partitionPath, int commitsRetained, HoodieCleaningPolicy policy) {
+  private Map<String, Pair<Boolean, List<CleanFileInfo>>> getFilesToCleanKeepingLatestCommits(List<String> partitionPath, int commitsRetained, HoodieCleaningPolicy policy) {
     LOG.info("Cleaning " + partitionPath + ", retaining latest " + commitsRetained + " commits. ");
     List<CleanFileInfo> deletePaths = new ArrayList<>();
+    Map<String, Pair<Boolean, List<CleanFileInfo>>> map = new HashMap<>();

Review Comment:
   Minor: rename `map` to `cleanFileInfoPerPartitionMap`.



##########
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/clean/CleanPlanner.java:
##########
@@ -290,9 +296,10 @@ private Pair<Boolean, List<CleanFileInfo>> getFilesToCleanKeepingLatestCommits(S
    * @return A {@link Pair} whose left is boolean indicating whether partition itself needs to be deleted,
    *         and right is a list of {@link CleanFileInfo} about the files in the partition that needs to be deleted.
    */
-  private Pair<Boolean, List<CleanFileInfo>> getFilesToCleanKeepingLatestCommits(String partitionPath, int commitsRetained, HoodieCleaningPolicy policy) {
+  private Map<String, Pair<Boolean, List<CleanFileInfo>>> getFilesToCleanKeepingLatestCommits(List<String> partitionPath, int commitsRetained, HoodieCleaningPolicy policy) {

Review Comment:
   Minor: let's make the argument name plural: `partitionPath` -> `partitionPaths`.
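
For illustration only (this is not code from the PR): the signature change above turns a per-partition call into one call per list of partitions that returns a map keyed by partition path. A minimal sketch of that pattern in plain Java follows; `BatchPlanSketch`, `planInBatches`, `chunkSize`, and `batchFn` are hypothetical names, and the chunking loop is only a stand-in for whatever grouping the engine applies.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical stand-alone sketch; none of these names come from the Hudi codebase.
class BatchPlanSketch {

  // Splits partitionPaths into chunks of at most chunkSize elements and calls
  // batchFn once per chunk, merging the per-chunk maps into a single result.
  static <V> Map<String, V> planInBatches(
      List<String> partitionPaths,
      int chunkSize,
      Function<List<String>, Map<String, V>> batchFn) {
    if (chunkSize <= 0) {
      throw new IllegalArgumentException("chunkSize must be positive");
    }
    Map<String, V> result = new HashMap<>();
    for (int start = 0; start < partitionPaths.size(); start += chunkSize) {
      int end = Math.min(start + chunkSize, partitionPaths.size());
      // One batched call covers every partition path in [start, end).
      result.putAll(batchFn.apply(partitionPaths.subList(start, end)));
    }
    return result;
  }
}
```

With five partition paths and a chunk size of 2, `batchFn` is invoked three times rather than five, and the plural parameter name makes it clear at the call site that a list is expected.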





[GitHub] [hudi] parisni commented on pull request #6580: [HUDI-4792] Batch clean files to delete

Posted by GitBox <gi...@apache.org>.
parisni commented on PR #6580:
URL: https://github.com/apache/hudi/pull/6580#issuecomment-1254745325

   @nsivabalan thanks, and sorry, I was away these past few days.




[GitHub] [hudi] hudi-bot commented on pull request #6580: Batch clean files to delete

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #6580:
URL: https://github.com/apache/hudi/pull/6580#issuecomment-1236213909

   ## CI report:
   
   * ff98ae0dda69ee611e4814fbae9c8ddc0a93a4f1 UNKNOWN
   * c292838205bb8eb57c529808c6b6da98635ac17d UNKNOWN
   




[GitHub] [hudi] hudi-bot commented on pull request #6580: [HUDI-4792] Batch clean files to delete

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #6580:
URL: https://github.com/apache/hudi/pull/6580#issuecomment-1249906693

   ## CI report:
   
   * ff98ae0dda69ee611e4814fbae9c8ddc0a93a4f1 UNKNOWN
   * c292838205bb8eb57c529808c6b6da98635ac17d Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=11137) 
   * 99451dc89547f803eb6823b2baa620096e76459e UNKNOWN
   




[GitHub] [hudi] hudi-bot commented on pull request #6580: [HUDI-4792] Batch clean files to delete

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #6580:
URL: https://github.com/apache/hudi/pull/6580#issuecomment-1251788754

   ## CI report:
   
   * ff98ae0dda69ee611e4814fbae9c8ddc0a93a4f1 UNKNOWN
   * 99451dc89547f803eb6823b2baa620096e76459e UNKNOWN
   * 11ba7cd991ca83773aae03b1fd7271364079be21 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=11435) 
   * 675221955f01a2a4fdc138af346fc78a2d11a41b UNKNOWN
   




[GitHub] [hudi] hudi-bot commented on pull request #6580: [HUDI-4792] Batch clean files to delete

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #6580:
URL: https://github.com/apache/hudi/pull/6580#issuecomment-1254090674

   ## CI report:
   
   * ff98ae0dda69ee611e4814fbae9c8ddc0a93a4f1 UNKNOWN
   * 99451dc89547f803eb6823b2baa620096e76459e UNKNOWN
   * 683f35351eae9705ae3863152a368b2b8e91abde Azure: [SUCCESS](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=11558) 
   




[GitHub] [hudi] hudi-bot commented on pull request #6580: [HUDI-4792] Batch clean files to delete

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #6580:
URL: https://github.com/apache/hudi/pull/6580#issuecomment-1251791338

   ## CI report:
   
   * ff98ae0dda69ee611e4814fbae9c8ddc0a93a4f1 UNKNOWN
   * 99451dc89547f803eb6823b2baa620096e76459e UNKNOWN
   * 11ba7cd991ca83773aae03b1fd7271364079be21 Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=11435) 
   * 675221955f01a2a4fdc138af346fc78a2d11a41b Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=11513) 
   




[GitHub] [hudi] hudi-bot commented on pull request #6580: [HUDI-4792] Batch clean files to delete

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #6580:
URL: https://github.com/apache/hudi/pull/6580#issuecomment-1251867741

   ## CI report:
   
   * ff98ae0dda69ee611e4814fbae9c8ddc0a93a4f1 UNKNOWN
   * 99451dc89547f803eb6823b2baa620096e76459e UNKNOWN
   * 675221955f01a2a4fdc138af346fc78a2d11a41b Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=11513) 
   




[GitHub] [hudi] hudi-bot commented on pull request #6580: [HUDI-4792] Batch clean files to delete

Posted by GitBox <gi...@apache.org>.
hudi-bot commented on PR #6580:
URL: https://github.com/apache/hudi/pull/6580#issuecomment-1253174469

   ## CI report:
   
   * ff98ae0dda69ee611e4814fbae9c8ddc0a93a4f1 UNKNOWN
   * 99451dc89547f803eb6823b2baa620096e76459e UNKNOWN
   * 675221955f01a2a4fdc138af346fc78a2d11a41b Azure: [FAILURE](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=11513) 
   * 2cd893d322d281e59e40a062108931d60646d9d3 Azure: [PENDING](https://dev.azure.com/apache-hudi-ci-org/785b6ef4-2f42-4a89-8f0e-5f0d7039a0cc/_build/results?buildId=11543) 
   




[GitHub] [hudi] nsivabalan commented on pull request #6580: [HUDI-4792] Batch clean files to delete

Posted by GitBox <gi...@apache.org>.
nsivabalan commented on PR #6580:
URL: https://github.com/apache/hudi/pull/6580#issuecomment-1251769488

   @parisni: can you check out the CI failures from the last run? https://dev.azure.com/apache-hudi-ci-org/apache-hudi-ci/_build/results?buildId=11435&view=results
   




[GitHub] [hudi] parisni commented on a diff in pull request #6580: [HUDI-4792] Batch clean files to delete

Posted by GitBox <gi...@apache.org>.
parisni commented on code in PR #6580:
URL: https://github.com/apache/hudi/pull/6580#discussion_r973439309


##########
hudi-client/hudi-client-common/src/main/java/org/apache/hudi/table/action/clean/CleanPlanActionExecutor.java:
##########
@@ -110,9 +112,15 @@ HoodieCleanerPlan requestClean(HoodieEngineContext context) {
       context.setJobStatus(this.getClass().getSimpleName(), "Generating list of file slices to be cleaned: " + config.getTableName());
 
       Map<String, Pair<Boolean, List<CleanFileInfo>>> cleanOpsWithPartitionMeta = context
-          .map(partitionsToClean, partitionPathToClean -> Pair.of(partitionPathToClean, planner.getDeletePaths(partitionPathToClean)), cleanerParallelism)
+          .parallelize(partitionsToClean, cleanerParallelism)
+          .mapPartitions((Iterator<String> it) -> {
+            List<String> list = new ArrayList<>();
+            it.forEachRemaining(list::add);
+            Map<String, Pair<Boolean, List<CleanFileInfo>>> res = planner.getDeletePaths(list);
+            return res.entrySet().iterator();
+          }, false).collectAsList()
           .stream()
-          .collect(Collectors.toMap(Pair::getKey, Pair::getValue));
+          .collect(Collectors.toMap(it -> it.getKey(), it -> it.getValue()));

Review Comment:
   `Pair::getX` is no longer applicable, since the stream type has changed from `Stream<Pair...>` to `Stream<Entry...>`.
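
To illustrate the point (a stand-alone sketch, not the PR code): once the stream elements are `Map.Entry` instances, the method references passed to `Collectors.toMap` have to come from `Map.Entry` rather than from a pair type. The class and method names below are hypothetical.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical stand-alone sketch; the names here are not from the Hudi codebase.
class EntryCollectSketch {

  // Collects a list of Map.Entry elements back into a map. Because the stream
  // element type is Map.Entry, the key and value extractors are
  // Map.Entry::getKey and Map.Entry::getValue.
  static Map<String, List<String>> collectEntries(
      List<Map.Entry<String, List<String>>> entries) {
    return entries.stream()
        .collect(Collectors.toMap(Map.Entry::getKey, Map.Entry::getValue));
  }
}
```

The method references behave the same as the lambdas `it -> it.getKey()` and `it -> it.getValue()`; they are only a more compact spelling.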


