You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@iceberg.apache.org by "ajantha-bhat (via GitHub)" <gi...@apache.org> on 2023/02/09 08:06:51 UTC

[GitHub] [iceberg] ajantha-bhat opened a new pull request, #6786: Spark-3.3: Support unregister table procedure

ajantha-bhat opened a new pull request, #6786:
URL: https://github.com/apache/iceberg/pull/6786

   Fixes #6785 
   
   Note: 
   Not supporting this feature for hadoop catalog as there is no concept of just removing the table entry from catalog in hadoop as hadoop catalog will always clean the files even with purge=false.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] ajantha-bhat commented on pull request #6786: Spark-3.3: Support unregister table procedure

Posted by "ajantha-bhat (via GitHub)" <gi...@apache.org>.
ajantha-bhat commented on PR #6786:
URL: https://github.com/apache/iceberg/pull/6786#issuecomment-1423828785

   cc: @RussellSpitzer, @flyrain, @aokolnychyi   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] aokolnychyi commented on pull request #6786: Spark-3.3: Support unregister table procedure

Posted by "aokolnychyi (via GitHub)" <gi...@apache.org>.
aokolnychyi commented on PR #6786:
URL: https://github.com/apache/iceberg/pull/6786#issuecomment-1496685705

   Base on the discussion in #6785, I feel the actual use case we are trying to solve is the ability to drop a table pointer when there is no metadata file. If so, I think we should adapt our `dropTable` logic to handle such cases. We should still verify the table looks like an Iceberg table in the catalog (to prevent deleting non-Iceberg tables in the same metastore).
   
   PR #7228 is trying to handle that on the Spark side. We will need to change our Spark logic to call dropTable no matter whether the load was successful.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] jackye1995 commented on pull request #6786: Spark-3.3: Support unregister table procedure

Posted by "jackye1995 (via GitHub)" <gi...@apache.org>.
jackye1995 commented on PR #6786:
URL: https://github.com/apache/iceberg/pull/6786#issuecomment-1424727896

   I would +1 on having this at catalog API level, and +1 on making this a catalog-specific implementation for drop table so user can just run `DROP TABLE` as usual, instead of having an `unregisterTable` API, I don't know how we can define `unregister` for it to make sense from catalog perspective...
   
   Why can't this just be:
   
   ```
   public boolean dropTable(TableIdentifier identifier, boolean purge) {
     try {
       // original logic
     } catch (RuntimeException e) {
       // do catalog specific removal
     }
   }
   ```
   
   Especially I think for Nessie this is just a client call, so it makes it even easier because you can just do this on server side.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] RussellSpitzer commented on pull request #6786: Spark-3.3: Support unregister table procedure

Posted by "RussellSpitzer (via GitHub)" <gi...@apache.org>.
RussellSpitzer commented on PR #6786:
URL: https://github.com/apache/iceberg/pull/6786#issuecomment-1424278602

   > > So I think either this is a Catalog API or we just have this as part of the "drop table" behavior.
   > 
   > We already have a catalog-specific API. Which is `catalog.dropTable(identifier, purge=false)` this procedure is just calling that.
   > 
   > "Drop table" SQL also already has purge and without purge syntax. But without purge will still look up the table due to spark integration (callstack mentioned in the issue #6785). Hence, user cannot use it and need the same functionality from SQL.
   
   This is what i'm saying, if we want an api which does not treat it as an Iceberg table first I think we need that as a Catalog API. If Drop Table needs the table to be an iceberg table to drop then I think we either need a new api, or a flag like "force" which attempts to drop the table even if there is an exception.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] aokolnychyi commented on pull request #6786: Spark-3.3: Support unregister table procedure

Posted by "aokolnychyi (via GitHub)" <gi...@apache.org>.
aokolnychyi commented on PR #6786:
URL: https://github.com/apache/iceberg/pull/6786#issuecomment-1424577897

   I'll take a look this week.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] zinking commented on pull request #6786: Spark-3.3: Support unregister table procedure

Posted by "zinking (via GitHub)" <gi...@apache.org>.
zinking commented on PR #6786:
URL: https://github.com/apache/iceberg/pull/6786#issuecomment-1427803940

   what happens if we un-register the table and create a table with same name ? does the path gets renamed ? or just exception thrown?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] ajantha-bhat commented on pull request #6786: Spark-3.3: Support unregister table procedure

Posted by "ajantha-bhat (via GitHub)" <gi...@apache.org>.
ajantha-bhat commented on PR #6786:
URL: https://github.com/apache/iceberg/pull/6786#issuecomment-1424220728

   > So I think either this is a Catalog API or we just have this as part of the "drop table" behavior.
   
   We already have a catalog-specific API. Which is `catalog.dropTable(identifier, purge=false)` this procedure is just calling that.  
   
   "Drop table" SQL also already has purge and without purge syntax. 
   But without purge will still look up the table due to spark integration (callstack mentioned in the issue). Hence, user cannot use it and need the same functionality from SQL.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] RussellSpitzer commented on pull request #6786: Spark-3.3: Support unregister table procedure

Posted by "RussellSpitzer (via GitHub)" <gi...@apache.org>.
RussellSpitzer commented on PR #6786:
URL: https://github.com/apache/iceberg/pull/6786#issuecomment-1512485517

   I agree that it should just be the drop table behavior 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] ajantha-bhat commented on a diff in pull request #6786: Spark-3.3: Support unregister table procedure

Posted by "ajantha-bhat (via GitHub)" <gi...@apache.org>.
ajantha-bhat commented on code in PR #6786:
URL: https://github.com/apache/iceberg/pull/6786#discussion_r1101101294


##########
core/src/main/java/org/apache/iceberg/CachingCatalog.java:
##########
@@ -125,6 +125,10 @@ private TableIdentifier canonicalizeIdentifier(TableIdentifier tableIdentifier)
     }
   }
 
+  public Catalog catalog() {

Review Comment:
   Catalog available for a call procedure can be a Caching catalog. So, need to expose the underlying catalog to check whether it is Hadoop catalog or not to block the feature for hadoop catalog.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] ajantha-bhat commented on pull request #6786: Spark-3.3: Support unregister table procedure

Posted by "ajantha-bhat (via GitHub)" <gi...@apache.org>.
ajantha-bhat commented on PR #6786:
URL: https://github.com/apache/iceberg/pull/6786#issuecomment-1424471540

   > This is what i'm saying, if we want an api which does not treat it as an Iceberg table first I think we need that as a Catalog API. If Drop Table needs the table to be an iceberg table to drop then I think we either need a new api, or a flag like "force" which attempts to drop the table even if there is an exception.
   
   When the `purge=false` for drop table API, it doesn't have to be an Iceberg table. It works just based on table identifier name. 
   For example, https://github.com/apache/iceberg/blob/master/hive-metastore/src/main/java/org/apache/iceberg/hive/HiveCatalog.java#L189-L198
   
   Also, Trino also doing the same and it is using the same API that I am using here. 
   https://github.com/trinodb/trino/pull/15874/files


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] aokolnychyi commented on pull request #6786: Spark-3.3: Support unregister table procedure

Posted by "aokolnychyi (via GitHub)" <gi...@apache.org>.
aokolnychyi commented on PR #6786:
URL: https://github.com/apache/iceberg/pull/6786#issuecomment-1512478527

   @ajantha-bhat @jackye1995 @RussellSpitzer, what are your thoughts regarding my proposal above?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] RussellSpitzer commented on pull request #6786: Spark-3.3: Support unregister table procedure

Posted by "RussellSpitzer (via GitHub)" <gi...@apache.org>.
RussellSpitzer commented on PR #6786:
URL: https://github.com/apache/iceberg/pull/6786#issuecomment-1424188600

   Seems very specific. I think we talked about this before as just a part of the "drop table" command.
   
   If we are doing this method I feel like there needs to be a catalog specific api, otherwise we end up like in this procedure checking for Iceberg internal classes at runtime to determine whether the procedure can do anything.
   
   So I think either this is a Catalog API or we just have this as part of the "drop table" behavior. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org