You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2022/09/07 12:23:41 UTC

[GitHub] [iceberg] ajantha-bhat opened a new pull request, #5718: Nessie: Prevent accidental deletion of files which are still referenced by other branches/tags

ajantha-bhat opened a new pull request, #5718:
URL: https://github.com/apache/iceberg/pull/5718

   Iceberg GC functionalities like `remove_orphan_files`, `expire_snapshots`, `drop_table` (with `purge=true`) are not aware of the Nessie's branches and tags. 
   So, when a user accidentally calls these on Nessie-managed tables, these GC functionalities will clean up the files which are still referenced in other branches/tags. So, blocking all the Iceberg GC operations on Nessie-managed tables by default. 
   
   Nessie will provide engine agnostic, reference-aware GC functionalities (which doesn't call these Iceberg GC functionalities internally) with CLI support for handling the GC of expired/unreferenced files of a Nessie table.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] Fokko commented on a diff in pull request #5718: Nessie: Prevent accidental deletion of files which are still referenced by other branches/tags

Posted by GitBox <gi...@apache.org>.
Fokko commented on code in PR #5718:
URL: https://github.com/apache/iceberg/pull/5718#discussion_r965842955


##########
nessie/src/main/java/org/apache/iceberg/nessie/NessieTableOperations.java:
##########
@@ -84,6 +85,12 @@ private TableMetadata loadTableMetadata(String metadataLocation, Reference refer
         NessieUtil.tableMetadataFromIcebergTable(io(), table, metadataLocation);
     Map<String, String> newProperties = Maps.newHashMap(deserialized.properties());
     newProperties.put(NESSIE_COMMIT_ID_PROPERTY, reference.getHash());
+    // To prevent accidental deletion of files that are still referenced by other branches/tags,
+    // setting GC_ENABLED to false. So that all Iceberg's gc operations like expire_snapshots,
+    // remove_orphan_files, drop_table with purge will fail with an error.
+    // Nessie CLI will provide a reference aware GC functionality for the expired/unreferenced
+    // files.
+    newProperties.put(TableProperties.GC_ENABLED, "false");

Review Comment:
   This way we disable the GC'ing of the tables completely. Are there cases where we still want Iceberg to do the garbage collection?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] ajantha-bhat commented on a diff in pull request #5718: Nessie: Prevent accidental deletion of files which are still referenced by other branches/tags

Posted by GitBox <gi...@apache.org>.
ajantha-bhat commented on code in PR #5718:
URL: https://github.com/apache/iceberg/pull/5718#discussion_r965846493


##########
nessie/src/main/java/org/apache/iceberg/nessie/NessieTableOperations.java:
##########
@@ -84,6 +85,12 @@ private TableMetadata loadTableMetadata(String metadataLocation, Reference refer
         NessieUtil.tableMetadataFromIcebergTable(io(), table, metadataLocation);
     Map<String, String> newProperties = Maps.newHashMap(deserialized.properties());
     newProperties.put(NESSIE_COMMIT_ID_PROPERTY, reference.getHash());
+    // To prevent accidental deletion of files that are still referenced by other branches/tags,
+    // setting GC_ENABLED to false. So that all Iceberg's gc operations like expire_snapshots,
+    // remove_orphan_files, drop_table with purge will fail with an error.
+    // Nessie CLI will provide a reference aware GC functionality for the expired/unreferenced
+    // files.
+    newProperties.put(TableProperties.GC_ENABLED, "false");

Review Comment:
   As mentioned in the description, we can't use the Iceberg GC code on Nessie-managed tables (as they are not reference aware). Hence, completely blocking it. 
   
   Nessie will provide its own implementation for GC.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] nastra commented on pull request #5718: Nessie: Prevent accidental deletion of files which are still referenced by other branches/tags

Posted by GitBox <gi...@apache.org>.
nastra commented on PR #5718:
URL: https://github.com/apache/iceberg/pull/5718#issuecomment-1260681424

   just fyi that I removed this from the [Iceberg 1.0.0 Release](https://github.com/apache/iceberg/milestone/22) because 1.0.0 is supposed to be 0.14.1 + spotless + Deprecation cleanups only


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] ajantha-bhat commented on pull request #5718: Nessie: Prevent accidental deletion of files which are still referenced by other branches/tags

Posted by GitBox <gi...@apache.org>.
ajantha-bhat commented on PR #5718:
URL: https://github.com/apache/iceberg/pull/5718#issuecomment-1260705365

   @nastra: Let us discuss the scope of 1.0.0 in today's sync. I believe it can accommodate few issues as well like this one and https://github.com/apache/iceberg/pull/5754. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] Fokko commented on a diff in pull request #5718: Nessie: Prevent accidental deletion of files which are still referenced by other branches/tags

Posted by GitBox <gi...@apache.org>.
Fokko commented on code in PR #5718:
URL: https://github.com/apache/iceberg/pull/5718#discussion_r965847211


##########
nessie/src/main/java/org/apache/iceberg/nessie/NessieTableOperations.java:
##########
@@ -84,6 +85,12 @@ private TableMetadata loadTableMetadata(String metadataLocation, Reference refer
         NessieUtil.tableMetadataFromIcebergTable(io(), table, metadataLocation);
     Map<String, String> newProperties = Maps.newHashMap(deserialized.properties());
     newProperties.put(NESSIE_COMMIT_ID_PROPERTY, reference.getHash());
+    // To prevent accidental deletion of files that are still referenced by other branches/tags,
+    // setting GC_ENABLED to false. So that all Iceberg's gc operations like expire_snapshots,
+    // remove_orphan_files, drop_table with purge will fail with an error.
+    // Nessie CLI will provide a reference aware GC functionality for the expired/unreferenced
+    // files.
+    newProperties.put(TableProperties.GC_ENABLED, "false");

Review Comment:
   That makes sense, thanks!



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] Fokko merged pull request #5718: Nessie: Prevent accidental deletion of files which are still referenced by other branches/tags

Posted by GitBox <gi...@apache.org>.
Fokko merged PR #5718:
URL: https://github.com/apache/iceberg/pull/5718


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] ajantha-bhat commented on pull request #5718: Nessie: Prevent accidental deletion of files which are still referenced by other branches/tags

Posted by GitBox <gi...@apache.org>.
ajantha-bhat commented on PR #5718:
URL: https://github.com/apache/iceberg/pull/5718#issuecomment-1239450566

   Can one of the committers please help merge this PR? 
   Also please include this in `1.0 milestone` or immediate next release milestone (would like to have this fix from the next version itself)
   
   cc: @rdblue, @pvary, @Fokko, @RussellSpitzer, @snazy    
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] Fokko commented on pull request #5718: Nessie: Prevent accidental deletion of files which are still referenced by other branches/tags

Posted by GitBox <gi...@apache.org>.
Fokko commented on PR #5718:
URL: https://github.com/apache/iceberg/pull/5718#issuecomment-1240602073

   Thanks @ajantha-bhat 🙌🏻 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org