You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@iceberg.apache.org by "dramaticlly (via GitHub)" <gi...@apache.org> on 2023/02/02 01:05:01 UTC

[GitHub] [iceberg] dramaticlly commented on a diff in pull request #6682: Bulk delete

dramaticlly commented on code in PR #6682:
URL: https://github.com/apache/iceberg/pull/6682#discussion_r1093902105


##########
spark/v3.3/spark/src/main/java/org/apache/iceberg/spark/actions/BaseSparkAction.java:
##########
@@ -253,6 +257,39 @@ protected DeleteSummary deleteFiles(
     return summary;
   }
 
+  protected DeleteSummary deleteFiles(

Review Comment:
   I am wondering if we shall have some sort of tests to cover this, where the `bulkDeleteFunc` can add files to a separate set instead of actual delete, to see we are properly group by its type and verify the DeleteSummary results?



##########
spark/v3.3/spark/src/main/java/org/apache/iceberg/spark/actions/DeleteOrphanFilesSparkAction.java:
##########
@@ -246,12 +268,22 @@ private DeleteOrphanFiles.Result doExecute() {
     List<String> orphanFiles =
         findOrphanFiles(spark(), actualFileIdentDS, validFileIdentDS, prefixMismatchMode);
 
-    Tasks.foreach(orphanFiles)
-        .noRetry()
-        .executeWith(deleteExecutorService)
-        .suppressFailureWhenFinished()
-        .onFailure((file, exc) -> LOG.warn("Failed to delete file: {}", file, exc))
-        .run(deleteFunc::accept);
+    if (table.io() instanceof SupportsBulkOperations) {
+      try {
+        bulkDeleteFunc.accept(orphanFiles);
+      } catch (BulkDeletionFailureException bulkDeletionFailureException) {
+        LOG.warn(
+            "Bulk delete failed to remove {} files",
+            bulkDeletionFailureException.numberFailedObjects());
+      }
+    } else {

Review Comment:
   I was think would it help to leave a message stating that bulk deletion is not supported on current IO to encourage for further check? Similar to how we add `Falling back to non-bulk deletes because the given io {} does not support bulk deletes.` for expire snapshots, I feel orphan deletion might benefit from it as well



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org