You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2022/08/12 05:39:00 UTC

[GitHub] [iceberg] aokolnychyi commented on a diff in pull request #5459: Core: Use Bulk Delete when dropping table data and metadata

aokolnychyi commented on code in PR #5459:
URL: https://github.com/apache/iceberg/pull/5459#discussion_r944126591


##########
core/src/main/java/org/apache/iceberg/CatalogUtil.java:
##########
@@ -151,20 +175,34 @@ private static void deleteFiles(FileIO io, Set<ManifestFile> allManifests) {
         .run(
             manifest -> {
               try (ManifestReader<?> reader = ManifestFiles.open(manifest, io)) {
+                List<String> pathsToDelete = Lists.newArrayList();
                 for (ManifestEntry<?> entry : reader.entries()) {
                   // intern the file path because the weak key map uses identity (==) instead of
                   // equals
                   String path = entry.file().path().toString().intern();
                   Boolean alreadyDeleted = deletedFiles.putIfAbsent(path, true);
                   if (alreadyDeleted == null || !alreadyDeleted) {
                     try {
-                      io.deleteFile(path);
+                      if (io instanceof SupportsBulkOperations) {

Review Comment:
   What about collecting files into a list in all cases and then checking what type of `FileIO` we got? Hopefully, it should be a little bit more readable and will avoid the instanceof call for each manifest entry, which is expensive.
   
   ```
   if (io instanceof SupportsBulkOperations) {
     SupportsBulkOperations bulkIO = (SupportsBulkOperations) io;
     bulkDelete(bulkIO, pathsToDelete, "files");
   } else {
     for (String path : pathsToDelete) {
       delete(io, path);
     }
   }
   ```



##########
core/src/main/java/org/apache/iceberg/CatalogUtil.java:
##########
@@ -102,6 +106,26 @@ public static void dropTableData(FileIO io, TableMetadata metadata) {
       deleteFiles(io, manifestsToDelete);
     }
 
+    if (io instanceof SupportsBulkOperations) {

Review Comment:
   I feel like there are three types of deletes. What about defining helper methods for them that would abstract some common logic like try catch? The type annotation is optional, it can probably be deduced from the file name.
   
   ```
   private static void delete(FileIO io, String file, String type) {
     try {
       io.deleteFile(file);
     } catch (RuntimeException e) {
       LOG.warn("Failed to delete {}: {}", type, file, e);
     }
   }
   
   private static void delete(FileIO io, Iterable<String> files, String type) {
     Tasks.foreach(files)
         .executeWith(ThreadPools.getWorkerPool())
         .noRetry()
         .suppressFailureWhenFinished()
         .onFailure((file, exc) -> LOG.warn("Delete failed for {}: {}", type, file, exc))
         .run(io::deleteFile);
   }
   
   private static void bulkDelete(SupportsBulkOperations io, Iterable<String> files, String type) {
     try {
       io.deleteFiles(files);
     } catch (RuntimeException e) {
       LOG.warn("Failed to bulk delete {}", type, e);
     }
   }
   ```



##########
core/src/main/java/org/apache/iceberg/CatalogUtil.java:
##########
@@ -102,6 +106,26 @@ public static void dropTableData(FileIO io, TableMetadata metadata) {
       deleteFiles(io, manifestsToDelete);
     }
 
+    if (io instanceof SupportsBulkOperations) {
+      SupportsBulkOperations bulkFileIO = (SupportsBulkOperations) io;
+      Iterable<String> manifestPaths = Iterables.transform(manifestsToDelete, ManifestFile::path);
+      Iterable<String> prevMetadataFiles =
+          Iterables.transform(metadata.previousFiles(), TableMetadata.MetadataLogEntry::file);
+      try {
+        Iterable<String> filesToDelete =
+            Iterables.concat(
+                manifestPaths,
+                manifestListsToDelete,
+                prevMetadataFiles,
+                ImmutableList.of(metadata.metadataFileLocation()));
+        bulkFileIO.deleteFiles(filesToDelete);

Review Comment:
   I feel like we should either follow what we do for non-bulk deletes and log an error message per each type or adapt the code below for non-bulk deletes to delete all files in one go.
   
   ```
   if (io instanceof SupportsBulkOperations) {
     SupportsBulkOperations bulkIO = (SupportsBulkOperations) io;
     bulkDelete(bulkIO, manifests, "manifests");
     bulkDelete(bulkIO, manifestLists, "manifest lists");
     bulkDelete(bulkIO, metadataFiles, "metadata files");
   
   } else {
     delete(io, manifests, "manifest");
     delete(io, manifestLists, "manifest list");
     delete(io, metadataFiles, "metadata file");
   }
   ```
   
   Or 
   
   ```
   if (io instanceof SupportsBulkOperations) {
     SupportsBulkOperations bulkIO = (SupportsBulkOperations) io;
     bulkDelete(bulkIO, files);
   } else {
     delete(io, files);
   }
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org