Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2020/01/26 16:35:17 UTC

[GitHub] [incubator-iceberg] waterlx opened a new issue #751: Add an option to decide whether to delete data files in Catalog.dropTable()

waterlx opened a new issue #751: Add an option to decide whether to delete data files in Catalog.dropTable()
URL: https://github.com/apache/incubator-iceberg/issues/751
 
 
   I would like to propose modifying Catalog.dropTable() to add an option that controls whether data files are deleted when a table is dropped, for example:
   ```
     /**
      * Drop a table; optionally delete data and metadata files.
      * <p>
      * If purge is set to true the implementation should delete all data and metadata files.
      *
      * @param identifier a table identifier
      * @param purge if true, delete all data and metadata files in the table
      * @param deleteData ignored if purge is false; if purge is true and deleteData is true, delete all data and metadata files in the table; if purge is true and deleteData is false, delete all metadata files but keep all data files
      * @return true if the table was dropped, false if the table did not exist
      */
     boolean dropTable(TableIdentifier identifier, boolean purge, boolean deleteData);
   ```
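
To make the proposed flag combinations concrete, here is a minimal sketch of the decision table described in the Javadoc above. `DropTableSemantics`, `FileKind`, and `filesDeleted` are hypothetical names used only for illustration; they are not part of Iceberg, and the sketch models the proposal rather than any existing implementation.

```java
import java.util.EnumSet;
import java.util.Set;

// Hypothetical helper, not part of Iceberg: it only models the proposed flag semantics.
final class DropTableSemantics {
    enum FileKind { DATA, METADATA }

    // Returns the kinds of files that the proposed
    // dropTable(identifier, purge, deleteData) would delete.
    static Set<FileKind> filesDeleted(boolean purge, boolean deleteData) {
        if (!purge) {
            // deleteData is ignored when purge is false: nothing is deleted.
            return EnumSet.noneOf(FileKind.class);
        }
        // purge is true: metadata always goes, data only if deleteData is set.
        return deleteData
            ? EnumSet.allOf(FileKind.class)
            : EnumSet.of(FileKind.METADATA);
    }

    public static void main(String[] args) {
        boolean[] flags = {false, true};
        for (boolean purge : flags) {
            for (boolean deleteData : flags) {
                System.out.printf("purge=%b deleteData=%b -> deletes %s%n",
                    purge, deleteData, filesDeleted(purge, deleteData));
            }
        }
    }
}
```

Running `main` prints one line per flag combination, which shows that only three of the four combinations are distinct because `deleteData` has no effect when `purge` is false.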

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [incubator-iceberg] waterlx commented on issue #751: Add an option to decide whether to delete data files in Catalog.dropTable()

Posted by GitBox <gi...@apache.org>.
waterlx commented on issue #751: Add an option to decide whether to delete data files in Catalog.dropTable()
URL: https://github.com/apache/incubator-iceberg/issues/751#issuecomment-585286984
 
 
   @rdblue got your idea. Thanks!
   I would like to check with you about the expected behavior of dropTable, especially when purge = false.
   Catalog#dropTable() explicitly defines how metadata and data are handled when purge = true:
   ```
   * @param purge if true, delete all metadata and data files in the table
   ```
   But what about purge = false? 
   (1) keep both data and metadata
   or
   (2) keep data and delete metadata
   
   (1) is the current behavior of Iceberg. But is it the expected behavior? I think (2) may be the correct behavior: when I call dropTable, metadata is always deleted, and purge only determines whether data files are deleted.
   
   Could you please share your thoughts on that? Thanks!
   I am also checking how Spark and Hive behave for dropTable with purge = false.



[GitHub] [incubator-iceberg] rdblue commented on issue #751: Add an option to decide whether to delete data files in Catalog.dropTable()

Posted by GitBox <gi...@apache.org>.
rdblue commented on issue #751: Add an option to decide whether to delete data files in Catalog.dropTable()
URL: https://github.com/apache/incubator-iceberg/issues/751#issuecomment-581184344
 
 
   I think you're saying you want a purge option that deletes metadata, but not data?
   
   I'm very reluctant to add cases like this to the API. We want to keep the API small, and I think this may be specific to your use case. What about doing this in the application that creates the staging tables instead? You'd set purge=false when dropping the table and then clean up the metadata using your own code. Would that work?
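
The application-side cleanup suggested above could look roughly like the sketch below. It assumes the table lives on a local filesystem and keeps its metadata under a `metadata` directory next to a `data` directory in the table location; `MetadataCleanup` and `deleteMetadataDir` are hypothetical names, not Iceberg APIs. For tables on HDFS or object storage, the deletion would have to go through the matching filesystem client instead of `java.nio.file`.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical application-side helper, not part of Iceberg.
final class MetadataCleanup {
    // Recursively deletes <tableLocation>/metadata and leaves everything else in place.
    // Assumption: metadata files live in a "metadata" directory under the table location.
    static void deleteMetadataDir(Path tableLocation) throws IOException {
        Path metadataDir = tableLocation.resolve("metadata");
        if (!Files.exists(metadataDir)) {
            return;
        }
        List<Path> ordered;
        try (Stream<Path> paths = Files.walk(metadataDir)) {
            // Reverse order so that children are deleted before their parent directory.
            ordered = paths.sorted(Comparator.reverseOrder()).collect(Collectors.toList());
        }
        for (Path path : ordered) {
            Files.delete(path);
        }
    }

    public static void main(String[] args) throws IOException {
        // Build a throwaway directory that imitates a staging table's layout.
        Path table = Files.createTempDirectory("staging_table");
        Files.createDirectories(table.resolve("metadata"));
        Files.createDirectories(table.resolve("data"));
        Files.createFile(table.resolve("metadata").resolve("v1.metadata.json"));
        Files.createFile(table.resolve("data").resolve("part-00000.parquet"));

        // In a real application, catalog.dropTable(identifier, false) would run first,
        // removing the catalog reference while leaving all files in place.
        deleteMetadataDir(table);

        System.out.println("metadata dir exists: " + Files.exists(table.resolve("metadata")));
        System.out.println("data file exists: "
            + Files.exists(table.resolve("data").resolve("part-00000.parquet")));
    }
}
```

Keeping this logic in the application means the catalog API stays at two arguments, at the cost of the application having to know where the table's metadata files are stored.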



[GitHub] [incubator-iceberg] waterlx commented on issue #751: Add an option to decide whether to delete data files in Catalog.dropTable()

Posted by GitBox <gi...@apache.org>.
waterlx commented on issue #751: Add an option to decide whether to delete data files in Catalog.dropTable()
URL: https://github.com/apache/incubator-iceberg/issues/751#issuecomment-585618846
 
 
   @rdblue got your idea. Thanks!



[GitHub] [incubator-iceberg] rdblue commented on issue #751: Add an option to decide whether to delete data files in Catalog.dropTable()

Posted by GitBox <gi...@apache.org>.
rdblue commented on issue #751: Add an option to decide whether to delete data files in Catalog.dropTable()
URL: https://github.com/apache/incubator-iceberg/issues/751#issuecomment-585359978
 
 
   I think that keeping both data and metadata is the correct behavior. That way, drop removes a reference to the table, but the table itself is still readable should you wish to recover it. If Iceberg were to drop metadata, then it would be extremely difficult to actually use the data files that are left if you wanted to recover the table.



[GitHub] [incubator-iceberg] waterlx commented on issue #751: Add an option to decide whether to delete data files in Catalog.dropTable()

Posted by GitBox <gi...@apache.org>.
waterlx commented on issue #751: Add an option to decide whether to delete data files in Catalog.dropTable()
URL: https://github.com/apache/incubator-iceberg/issues/751#issuecomment-578521150
 
 
   @rdblue does this make sense to you?
