Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2022/02/23 08:38:14 UTC

[GitHub] [iceberg] ConeyLiu commented on issue #4159: Define behavior of gc.enabled and location ownership

ConeyLiu commented on issue #4159:
URL: https://github.com/apache/iceberg/issues/4159#issuecomment-1048551526


   As @aokolnychyi suggested in [3056](https://github.com/apache/iceberg/pull/3056), we use `DeleteReachableFiles` to purge table data, which could provide much better scalability and performance. However, there are still some drawbacks that need to be considered:
   
   1. Different catalogs implement drop table differently. For example, `HadoopCatalog`/`HadoopTables` delete the whole warehouse directly and ignore the purge argument. In this case, we cannot use `DeleteReachableFiles`.
   2. A user's own catalog may have customized features, such as sending events/metrics when purging data. With `DeleteReachableFiles`, those operations would be skipped.
   
   > I think it should match the removal of reachable files and be consistent in all APIs. Once we know locations owned by the table, we may drop them too.
   
   I think this is necessary. We should unify the behavior of drop table [purge] across the built-in catalogs. We may also need to define an interface that supports parallel operations (by leveraging a distributed engine such as Spark, Flink, or others).
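
A rough sketch of what such an interface might look like, so that the catalog keeps control of the drop (and of its own events/metrics) while the deletion itself is pluggable. All names here (`PurgeStrategy`, `SerialPurge`, `ParallelPurge`, `SketchCatalog`) are hypothetical and are not existing Iceberg APIs; the parallel stream only stands in for a distributed engine, and file deletion is simulated by recording paths.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.function.Consumer;

class PurgeSketch {

  // Hypothetical interface: how the files reachable from a dropped table get deleted.
  interface PurgeStrategy {
    void purge(List<String> reachableFiles, Consumer<String> deleteFunc);
  }

  // Deletes files one by one on the calling thread.
  static class SerialPurge implements PurgeStrategy {
    @Override
    public void purge(List<String> reachableFiles, Consumer<String> deleteFunc) {
      reachableFiles.forEach(deleteFunc);
    }
  }

  // Deletes files concurrently; a stand-in for a Spark/Flink based implementation.
  static class ParallelPurge implements PurgeStrategy {
    @Override
    public void purge(List<String> reachableFiles, Consumer<String> deleteFunc) {
      reachableFiles.parallelStream().forEach(deleteFunc);
    }
  }

  // Hypothetical catalog: it honors the purge flag and still emits its own event,
  // regardless of which strategy performs the deletion.
  static class SketchCatalog {
    private final PurgeStrategy strategy;
    final List<String> events = new ArrayList<>();
    // Thread-safe because ParallelPurge may call the delete function concurrently.
    final Queue<String> deleted = new ConcurrentLinkedQueue<>();

    SketchCatalog(PurgeStrategy strategy) {
      this.strategy = strategy;
    }

    boolean dropTable(String name, List<String> reachableFiles, boolean purge) {
      if (purge) {
        // Simulated deletion: record the path instead of touching storage.
        strategy.purge(reachableFiles, deleted::add);
        events.add("purged " + name + " (" + reachableFiles.size() + " files)");
      }
      events.add("dropped " + name);
      return true;
    }
  }

  public static void main(String[] args) {
    SketchCatalog catalog = new SketchCatalog(new ParallelPurge());
    catalog.dropTable("db.tbl", List.of("data-1.parquet", "data-2.parquet", "m0.avro"), true);
    System.out.println(catalog.events);
    System.out.println(catalog.deleted.size() + " files deleted");
  }
}
```

With something like this, the same drop table [purge] semantics could apply to every built-in catalog, and only the strategy would differ between a local run and an engine-backed one.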
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


