Posted to issues@iceberg.apache.org by "ajantha-bhat (via GitHub)" <gi...@apache.org> on 2023/05/10 11:38:54 UTC

[GitHub] [iceberg] ajantha-bhat opened a new pull request, #7576: Core: Table metadata file deletion should check `gc.enabled` property

ajantha-bhat opened a new pull request, #7576:
URL: https://github.com/apache/iceberg/pull/7576

   Table metadata files are deleted even when gc is disabled (`gc.enabled` set to false) if `TableProperties.METADATA_DELETE_AFTER_COMMIT_ENABLED` is set to true.
   
   This behaviour should be unified with the other Iceberg metadata file deletion (that is honouring the `gc.enabled` property). 
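
To make the proposal concrete, here is a minimal sketch that contrasts the behaviour reported above with the behaviour this PR asks for. It is a plain-Java model over a property map, not Iceberg's implementation. The class and method names are hypothetical, and the defaults (`gc.enabled` = true, `write.metadata.delete-after-commit.enabled` = false) are assumed from the documented table properties.

```java
import java.util.Map;

// Hypothetical model of the two behaviours; not Iceberg code.
class MetadataCleanupSketch {
  static final String GC_ENABLED = "gc.enabled";
  static final String DELETE_AFTER_COMMIT = "write.metadata.delete-after-commit.enabled";

  // Reported behaviour: only the delete-after-commit flag is consulted.
  static boolean deletesOldMetadataToday(Map<String, String> props) {
    return Boolean.parseBoolean(props.getOrDefault(DELETE_AFTER_COMMIT, "false"));
  }

  // Proposed behaviour: the same flag, additionally gated by gc.enabled.
  static boolean deletesOldMetadataProposed(Map<String, String> props) {
    boolean gcEnabled = Boolean.parseBoolean(props.getOrDefault(GC_ENABLED, "true"));
    return gcEnabled && deletesOldMetadataToday(props);
  }
}
```

The two methods differ only when `gc.enabled` is false and delete-after-commit is true, which is the case the PR description is about.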


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] RussellSpitzer commented on pull request #7576: Core: Table metadata file deletion should check `gc.enabled` property

Posted by "RussellSpitzer (via GitHub)" <gi...@apache.org>.
RussellSpitzer commented on PR #7576:
URL: https://github.com/apache/iceberg/pull/7576#issuecomment-1552484576

   Well, I thought `gc.enabled` existed explicitly because we didn't want snapshot tables to be able to delete data files of their host table. I never considered sharing metadata files. I guess we could do that, but I think that may just be asking for trouble. Unlike data files, I think most metadata files are relatively short-lived anyway by default, what with optimize metadata and merge manifests.




[GitHub] [iceberg] ajantha-bhat closed pull request #7576: Core: Table metadata file deletion should check `gc.enabled` property

Posted by "ajantha-bhat (via GitHub)" <gi...@apache.org>.
ajantha-bhat closed pull request #7576: Core: Table metadata file deletion should check `gc.enabled` property
URL: https://github.com/apache/iceberg/pull/7576




[GitHub] [iceberg] ajantha-bhat commented on pull request #7576: Core: Table metadata file deletion should check `gc.enabled` property

Posted by "ajantha-bhat (via GitHub)" <gi...@apache.org>.
ajantha-bhat commented on PR #7576:
URL: https://github.com/apache/iceberg/pull/7576#issuecomment-1544982476

   @RussellSpitzer, @danielcweeks, @szehon-ho, @Fokko, @nastra     




[GitHub] [iceberg] ajantha-bhat commented on pull request #7576: Core: Table metadata file deletion should check `gc.enabled` property

Posted by "ajantha-bhat (via GitHub)" <gi...@apache.org>.
ajantha-bhat commented on PR #7576:
URL: https://github.com/apache/iceberg/pull/7576#issuecomment-1556501251

   Thanks for the clarification. 
   I will close this PR.




[GitHub] [iceberg] ajantha-bhat commented on pull request #7576: Core: Table metadata file deletion should check `gc.enabled` property

Posted by "ajantha-bhat (via GitHub)" <gi...@apache.org>.
ajantha-bhat commented on PR #7576:
URL: https://github.com/apache/iceberg/pull/7576#issuecomment-1542294135

   cc: @aokolnychyi, @rdblue, @jackye1995   




[GitHub] [iceberg] amogh-jahagirdar commented on pull request #7576: Core: Table metadata file deletion should check `gc.enabled` property

Posted by "amogh-jahagirdar (via GitHub)" <gi...@apache.org>.
amogh-jahagirdar commented on PR #7576:
URL: https://github.com/apache/iceberg/pull/7576#issuecomment-1551861178

   Thanks @ajantha-bhat , this looks related to https://github.com/apache/iceberg/issues/4159
   
   So my take is that tables own their metadata files and that `gc.enabled` should not actually be part of the determination to delete the files or not. But my definition is based on `gc.enabled` really being tied to being used as a mechanism to prevent dangerous actions such as cleaning up files which are not a part of the table. 
   
   Right now `CatalogUtil` will only clean up data files if gc is enabled. `ExpireSnapshots` can only be run if gc is enabled, and it doesn't even delete metadata files.
   
   You mentioned 
   
   ```
   This behaviour should be unified with the other Iceberg metadata file deletion (that is honouring the gc.enabled property)
   ```
   
   Curious, where are you seeing this?
   
   Interested in knowing what others think though!




[GitHub] [iceberg] ajantha-bhat commented on pull request #7576: Core: Table metadata file deletion should check `gc.enabled` property

Posted by "ajantha-bhat (via GitHub)" <gi...@apache.org>.
ajantha-bhat commented on PR #7576:
URL: https://github.com/apache/iceberg/pull/7576#issuecomment-1552469489

   > But my definition is based on gc.enabled really being tied to being used as a mechanism to prevent dangerous actions such as cleaning up files which are not a part of the table.
   
   First of all, the docs do not clearly describe this table property.
   If we consider the expire snapshots action, expired metadata cleanup is also controlled by the `gc.enabled` property. So I think table metadata cleanup should also obey this property.
   
   Deleting table metadata files can also be dangerous for some catalogs like Nessie (as the files can be referenced by other branches). So I expect that if gc is disabled, no metadata or data file should get cleaned up.
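
The branch concern can be illustrated with a toy model. This is only a sketch under stated assumptions: the class, the method, the branch names, and the file names are hypothetical, and nothing here is Nessie or Iceberg API. It shows that an old metadata file that looks unused from the committing branch may still be referenced from another branch.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical toy model of branch-aware cleanup; not Nessie or Iceberg code.
class BranchAwareCleanupSketch {
  // Returns the candidate files that no branch other than the committing one references.
  static Set<String> safeToDelete(
      String committingBranch,
      Set<String> candidates,
      Map<String, Set<String>> referencedByBranch) {
    Set<String> safe = new TreeSet<>(candidates);
    for (Map.Entry<String, Set<String>> entry : referencedByBranch.entrySet()) {
      if (!entry.getKey().equals(committingBranch)) {
        // Keep anything that another branch still points at.
        safe.removeAll(entry.getValue());
      }
    }
    return safe;
  }
}
```

A cleanup that looks only at the committing branch's metadata history cannot perform this check, which is why deleting after commit is risky when branches share files.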




[GitHub] [iceberg] ajantha-bhat commented on pull request #7576: Core: Table metadata file deletion should check `gc.enabled` property

Posted by "ajantha-bhat (via GitHub)" <gi...@apache.org>.
ajantha-bhat commented on PR #7576:
URL: https://github.com/apache/iceberg/pull/7576#issuecomment-1552795086

   Hmm, I think this may take a while to conclude. I have raised a Nessie catalog-specific PR to always disable `write.metadata.delete-after-commit.enabled` for the 1.3.0 release.
   
   https://github.com/apache/iceberg/pull/7641




[GitHub] [iceberg] rdblue commented on pull request #7576: Core: Table metadata file deletion should check `gc.enabled` property

Posted by "rdblue (via GitHub)" <gi...@apache.org>.
rdblue commented on PR #7576:
URL: https://github.com/apache/iceberg/pull/7576#issuecomment-1556305688

   I don't think `gc.enabled` has anything to do with old metadata files. Those are known to be owned by the table, so it is always okay to delete them. `gc.enabled` was a work-around for situations where we did not know whether data files were owned (e.g. snapshots).
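
The ownership argument above can be written down as a small decision rule. This is a sketch of the reasoning, not Iceberg's code. The enum, class, and method names are hypothetical, and the `gc.enabled` default of true is an assumption taken from the documented table properties.

```java
import java.util.Map;

// Hypothetical model of the ownership rule described above; not Iceberg code.
class OwnershipSketch {
  enum FileKind { OLD_TABLE_METADATA, DATA_FILE }

  static boolean mayDelete(FileKind kind, Map<String, String> props) {
    if (kind == FileKind.OLD_TABLE_METADATA) {
      // Old metadata files are known to be owned by the table.
      return true;
    }
    // Data files may belong to another table (e.g. a snapshot's host table),
    // so deletion is only allowed when gc.enabled is true.
    return Boolean.parseBoolean(props.getOrDefault("gc.enabled", "true"));
  }
}
```

Under this rule `gc.enabled` only ever restricts files whose ownership is uncertain, which is why it does not gate old metadata files.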

