You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2022/01/05 19:23:39 UTC

[GitHub] [iceberg] anuragmantri opened a new pull request #3851: API: Register existing tables in Iceberg HiveCatalog

anuragmantri opened a new pull request #3851:
URL: https://github.com/apache/iceberg/pull/3851


   This PR allows us to register existing tables in Iceberg HiveCatalog.
   
   Right now, we can keep metadata and data files while dropping tables from Iceberg HiveCatalog. However, there is no way to register those tables back even though the metadata and data files can be still there. So, we need to extend HiveCatalog with another method that will accept a location to the valid metadata file and would register a table in HMS.
   
   This will allow us to properly support external tables in Spark.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] anuragmantri commented on pull request #3851: API: Register existing tables in Iceberg HiveCatalog

Posted by GitBox <gi...@apache.org>.
anuragmantri commented on pull request #3851:
URL: https://github.com/apache/iceberg/pull/3851#issuecomment-1011354257


   Looks like the build failed. Is there a way to re-trigger the build without pushing the change? 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] anuragmantri commented on pull request #3851: API: Register existing tables in Iceberg HiveCatalog

Posted by GitBox <gi...@apache.org>.
anuragmantri commented on pull request #3851:
URL: https://github.com/apache/iceberg/pull/3851#issuecomment-1012264351


   Retest this please


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] pvary commented on pull request #3851: API: Register existing tables in Iceberg HiveCatalog

Posted by GitBox <gi...@apache.org>.
pvary commented on pull request #3851:
URL: https://github.com/apache/iceberg/pull/3851#issuecomment-1007897477


   > @pvary - Yes, all this needs is the latest metadata file.
   
   This was my conclusion too (after I have left the comment I continued chewing on this one 😄) Then I went further, and this is where I am now:
   - What happens, if a table is modified in one of the catalogs (I suspect that this is catalog dependent as HiveCatalog will not follow the changes, but HadoopTable/HadoopCatalog might follow each others changes, as they rely on the file names) 
   - Also issues will arise if we start expiring snapshots. If one catalog decides that one of the files are not needed anymore, then it could remove the file, even if the file is still needed for the other table. 
   
   Is there a way to prevent this situation? 
   
   Thanks, Peter
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] bryanck commented on a change in pull request #3851: API: Register existing tables in Iceberg HiveCatalog

Posted by GitBox <gi...@apache.org>.
bryanck commented on a change in pull request #3851:
URL: https://github.com/apache/iceberg/pull/3851#discussion_r783323964



##########
File path: hive-metastore/src/main/java/org/apache/iceberg/hive/HiveCatalog.java
##########
@@ -211,6 +214,23 @@ public void renameTable(TableIdentifier from, TableIdentifier originalTo) {
     }
   }
 
+  @Override
+  public org.apache.iceberg.Table registerTable(TableIdentifier identifier, String metadataFileLocation) {

Review comment:
       Sure, thanks




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] anuragmantri commented on pull request #3851: API: Register existing tables in Iceberg HiveCatalog

Posted by GitBox <gi...@apache.org>.
anuragmantri commented on pull request #3851:
URL: https://github.com/apache/iceberg/pull/3851#issuecomment-1007827860


   > Can I use this to register an arbitrary table from any catalog (HadoopTable or HadoopCatalog or GlueCatalog) to the HiveCatalog?
   
   @pvary - Yes, all this needs is the latest metadata file.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] bryanck commented on a change in pull request #3851: API: Register existing tables in Iceberg HiveCatalog

Posted by GitBox <gi...@apache.org>.
bryanck commented on a change in pull request #3851:
URL: https://github.com/apache/iceberg/pull/3851#discussion_r781287405



##########
File path: api/src/main/java/org/apache/iceberg/catalog/Catalog.java
##########
@@ -340,6 +340,18 @@ default boolean dropTable(TableIdentifier identifier) {
    */
   Table loadTable(TableIdentifier identifier);
 
+  /**
+   * Register a table with the catalog if it does not exist.

Review comment:
       LGTM, it is very straightforward, I just had one question below...




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] anuragmantri commented on a change in pull request #3851: API: Register existing tables in Iceberg HiveCatalog

Posted by GitBox <gi...@apache.org>.
anuragmantri commented on a change in pull request #3851:
URL: https://github.com/apache/iceberg/pull/3851#discussion_r783309928



##########
File path: hive-metastore/src/main/java/org/apache/iceberg/hive/HiveCatalog.java
##########
@@ -211,6 +214,23 @@ public void renameTable(TableIdentifier from, TableIdentifier originalTo) {
     }
   }
 
+  @Override
+  public org.apache.iceberg.Table registerTable(TableIdentifier identifier, String metadataFileLocation) {

Review comment:
       This is a good point. Potentially, other catalogs can benefit from this. However, `FileIO` is initialized with the catalog and there maybe custom implementations passed as catalog properties. I'm not sure yet on how to move this logic to `BaseMetastoreCatalog`. Do you mind if I do that as a seperate enhancement PR to this one? 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] flyrain commented on pull request #3851: API: Register existing tables in Iceberg HiveCatalog

Posted by GitBox <gi...@apache.org>.
flyrain commented on pull request #3851:
URL: https://github.com/apache/iceberg/pull/3851#issuecomment-1011355542


   Looks like this build errors happened in multiple PRs. I saw it #3745 as well. It could be an issue in build infra or some changes in Gradle. @rdblue 
   ```
   A problem occurred configuring root project 'iceberg'.
   31
   > Could not resolve all files for configuration ':classpath'.
   32
      > Could not resolve me.champeau.jmh:jmh-gradle-plugin:0.6.6.
   33
        Required by:
   34
            project :
   35
         > Could not resolve me.champeau.jmh:jmh-gradle-plugin:0.6.6.
   36
            > Could not get resource 'https://plugins.gradle.org/m2/me/champeau/jmh/jmh-gradle-plugin/0.6.6/jmh-gradle-plugin-0.6.6.module'.
   37
               > Could not GET 'https://jcenter.bintray.com/me/champeau/jmh/jmh-gradle-plugin/0.6.6/jmh-gradle-plugin-0.6.6.module'. Received status code 502 from server: Bad Gateway
   38
      > Could not resolve org.apache.logging.log4j:log4j-core:2.17.0.
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] RussellSpitzer commented on pull request #3851: API: Register existing tables in Iceberg HiveCatalog

Posted by GitBox <gi...@apache.org>.
RussellSpitzer commented on pull request #3851:
URL: https://github.com/apache/iceberg/pull/3851#issuecomment-1012366160


   Tests Passed and Merged! Thanks @anuragmantri and all reviewers!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] pvary commented on pull request #3851: API: Register existing tables in Iceberg HiveCatalog

Posted by GitBox <gi...@apache.org>.
pvary commented on pull request #3851:
URL: https://github.com/apache/iceberg/pull/3851#issuecomment-1006360787


   Can I use this to register an arbitrary table from any catalog (HadoopTable or HadoopCatalog or GlueCatalog) to the HiveCatalog?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a change in pull request #3851: API: Register existing tables in Iceberg HiveCatalog

Posted by GitBox <gi...@apache.org>.
rdblue commented on a change in pull request #3851:
URL: https://github.com/apache/iceberg/pull/3851#discussion_r780841167



##########
File path: api/src/main/java/org/apache/iceberg/catalog/Catalog.java
##########
@@ -340,6 +340,18 @@ default boolean dropTable(TableIdentifier identifier) {
    */
   Table loadTable(TableIdentifier identifier);
 
+  /**
+   * Register a table with the catalog if it does not exist.

Review comment:
       FYI @bryanck. What do you think about this?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] anuragmantri removed a comment on pull request #3851: API: Register existing tables in Iceberg HiveCatalog

Posted by GitBox <gi...@apache.org>.
anuragmantri removed a comment on pull request #3851:
URL: https://github.com/apache/iceberg/pull/3851#issuecomment-1012264351


   Test this please


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] anuragmantri commented on pull request #3851: API: Register existing tables in Iceberg HiveCatalog

Posted by GitBox <gi...@apache.org>.
anuragmantri commented on pull request #3851:
URL: https://github.com/apache/iceberg/pull/3851#issuecomment-1012367863


   Thanks to @aokolnychyi, the original author of the PR and everyone for the review!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] flyrain commented on a change in pull request #3851: API: Register existing tables in Iceberg HiveCatalog

Posted by GitBox <gi...@apache.org>.
flyrain commented on a change in pull request #3851:
URL: https://github.com/apache/iceberg/pull/3851#discussion_r779761676



##########
File path: hive-metastore/src/main/java/org/apache/iceberg/hive/HiveTableOperations.java
##########
@@ -210,7 +210,8 @@ protected void doRefresh() {
   @SuppressWarnings("checkstyle:CyclomaticComplexity")
   @Override
   protected void doCommit(TableMetadata base, TableMetadata metadata) {
-    String newMetadataLocation = writeNewMetadata(metadata, currentVersion() + 1);
+    String newMetadataLocation = base == null && metadata.metadataFileLocation() != null ?
+        metadata.metadataFileLocation() : writeNewMetadata(metadata, currentVersion() + 1);

Review comment:
       Looks like this change is not related. Can we remove it?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] anuragmantri commented on a change in pull request #3851: API: Register existing tables in Iceberg HiveCatalog

Posted by GitBox <gi...@apache.org>.
anuragmantri commented on a change in pull request #3851:
URL: https://github.com/apache/iceberg/pull/3851#discussion_r780590087



##########
File path: hive-metastore/src/main/java/org/apache/iceberg/hive/HiveTableOperations.java
##########
@@ -210,7 +210,8 @@ protected void doRefresh() {
   @SuppressWarnings("checkstyle:CyclomaticComplexity")
   @Override
   protected void doCommit(TableMetadata base, TableMetadata metadata) {
-    String newMetadataLocation = writeNewMetadata(metadata, currentVersion() + 1);
+    String newMetadataLocation = base == null && metadata.metadataFileLocation() != null ?
+        metadata.metadataFileLocation() : writeNewMetadata(metadata, currentVersion() + 1);

Review comment:
       Without this change, registering a table creates a new metadata file with a new version instead of using the version provided by the metadata file. Yes, the tests also rely on this. 

##########
File path: hive-metastore/src/main/java/org/apache/iceberg/hive/HiveCatalog.java
##########
@@ -211,6 +214,18 @@ public void renameTable(TableIdentifier from, TableIdentifier originalTo) {
     }
   }
 
+  @Override
+  public org.apache.iceberg.Table registerTable(TableIdentifier identifier, String metadataFileLocation) {
+    Preconditions.checkArgument(isValidIdentifier(identifier), "Invalid identifier: %s", identifier);
+
+    TableOperations ops = newTableOps(identifier);
+    HadoopInputFile metadataFile = HadoopInputFile.fromLocation(metadataFileLocation, conf);

Review comment:
       Thanks for catching this. Updated to use `FileIO`.

##########
File path: api/src/main/java/org/apache/iceberg/catalog/Catalog.java
##########
@@ -340,6 +340,17 @@ default boolean dropTable(TableIdentifier identifier) {
    */
   Table loadTable(TableIdentifier identifier);
 
+  /**
+   * Register a table.

Review comment:
       Updated the java doc to mention this API will register a table with the catalog if it does not exist. It throws an exception if it exists. I have also added a unit test case.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] flyrain commented on pull request #3851: API: Register existing tables in Iceberg HiveCatalog

Posted by GitBox <gi...@apache.org>.
flyrain commented on pull request #3851:
URL: https://github.com/apache/iceberg/pull/3851#issuecomment-1011869964


   > Seems like jcenter is having issues... https://status.gradle.com
   
   The Jcenter is back online now. The build should pass once it is retriggered.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] RussellSpitzer merged pull request #3851: API: Register existing tables in Iceberg HiveCatalog

Posted by GitBox <gi...@apache.org>.
RussellSpitzer merged pull request #3851:
URL: https://github.com/apache/iceberg/pull/3851


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] anuragmantri edited a comment on pull request #3851: API: Register existing tables in Iceberg HiveCatalog

Posted by GitBox <gi...@apache.org>.
anuragmantri edited a comment on pull request #3851:
URL: https://github.com/apache/iceberg/pull/3851#issuecomment-1012264351


   Test this please


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] flyrain commented on a change in pull request #3851: API: Register existing tables in Iceberg HiveCatalog

Posted by GitBox <gi...@apache.org>.
flyrain commented on a change in pull request #3851:
URL: https://github.com/apache/iceberg/pull/3851#discussion_r779763411



##########
File path: api/src/main/java/org/apache/iceberg/catalog/Catalog.java
##########
@@ -340,6 +340,17 @@ default boolean dropTable(TableIdentifier identifier) {
    */
   Table loadTable(TableIdentifier identifier);
 
+  /**
+   * Register a table.

Review comment:
       What happens if the table is already there? Do we throw an exception? 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] anuragmantri commented on a change in pull request #3851: API: Register existing tables in Iceberg HiveCatalog

Posted by GitBox <gi...@apache.org>.
anuragmantri commented on a change in pull request #3851:
URL: https://github.com/apache/iceberg/pull/3851#discussion_r780590087



##########
File path: hive-metastore/src/main/java/org/apache/iceberg/hive/HiveTableOperations.java
##########
@@ -210,7 +210,8 @@ protected void doRefresh() {
   @SuppressWarnings("checkstyle:CyclomaticComplexity")
   @Override
   protected void doCommit(TableMetadata base, TableMetadata metadata) {
-    String newMetadataLocation = writeNewMetadata(metadata, currentVersion() + 1);
+    String newMetadataLocation = base == null && metadata.metadataFileLocation() != null ?
+        metadata.metadataFileLocation() : writeNewMetadata(metadata, currentVersion() + 1);

Review comment:
       Without this change, registering a table creates a new metadata file with a new version instead of using the version provided by the metadata file. Yes, the tests also rely on this. 

##########
File path: hive-metastore/src/main/java/org/apache/iceberg/hive/HiveCatalog.java
##########
@@ -211,6 +214,18 @@ public void renameTable(TableIdentifier from, TableIdentifier originalTo) {
     }
   }
 
+  @Override
+  public org.apache.iceberg.Table registerTable(TableIdentifier identifier, String metadataFileLocation) {
+    Preconditions.checkArgument(isValidIdentifier(identifier), "Invalid identifier: %s", identifier);
+
+    TableOperations ops = newTableOps(identifier);
+    HadoopInputFile metadataFile = HadoopInputFile.fromLocation(metadataFileLocation, conf);

Review comment:
       Thanks for catching this. Updated to use `FileIO`.

##########
File path: api/src/main/java/org/apache/iceberg/catalog/Catalog.java
##########
@@ -340,6 +340,17 @@ default boolean dropTable(TableIdentifier identifier) {
    */
   Table loadTable(TableIdentifier identifier);
 
+  /**
+   * Register a table.

Review comment:
       Updated the java doc to mention this API will register a table with the catalog if it does not exist. It throws an exception if it exists. I have also added a unit test case.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] flyrain commented on a change in pull request #3851: API: Register existing tables in Iceberg HiveCatalog

Posted by GitBox <gi...@apache.org>.
flyrain commented on a change in pull request #3851:
URL: https://github.com/apache/iceberg/pull/3851#discussion_r779759010



##########
File path: hive-metastore/src/main/java/org/apache/iceberg/hive/HiveCatalog.java
##########
@@ -211,6 +214,18 @@ public void renameTable(TableIdentifier from, TableIdentifier originalTo) {
     }
   }
 
+  @Override
+  public org.apache.iceberg.Table registerTable(TableIdentifier identifier, String metadataFileLocation) {
+    Preconditions.checkArgument(isValidIdentifier(identifier), "Invalid identifier: %s", identifier);
+
+    TableOperations ops = newTableOps(identifier);
+    HadoopInputFile metadataFile = HadoopInputFile.fromLocation(metadataFileLocation, conf);

Review comment:
       Agree with Szehon. `fileIO` should be used here.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] anuragmantri commented on a change in pull request #3851: API: Register existing tables in Iceberg HiveCatalog

Posted by GitBox <gi...@apache.org>.
anuragmantri commented on a change in pull request #3851:
URL: https://github.com/apache/iceberg/pull/3851#discussion_r779319373



##########
File path: hive-metastore/src/test/java/org/apache/iceberg/hive/HiveTableTest.java
##########
@@ -370,6 +371,85 @@ public void testNonDefaultDatabaseLocation() throws IOException, TException {
     metastoreClient.dropDatabase(NON_DEFAULT_DATABASE, true, true, true);
   }
 
+  @Test
+  public void testRegisterTable() throws TException {
+    org.apache.hadoop.hive.metastore.api.Table originalTable = metastoreClient.getTable(DB_NAME, TABLE_NAME);
+
+    Map<String, String> originalParams = originalTable.getParameters();
+    Assert.assertNotNull(originalParams);
+    Assert.assertTrue(ICEBERG_TABLE_TYPE_VALUE.equalsIgnoreCase(originalParams.get(TABLE_TYPE_PROP)));
+    Assert.assertTrue("EXTERNAL_TABLE".equalsIgnoreCase(originalTable.getTableType()));
+
+    catalog.dropTable(TABLE_IDENTIFIER, false);
+    Assert.assertFalse(catalog.tableExists(TABLE_IDENTIFIER));
+
+    List<String> metadataVersionFiles = metadataVersionFiles(TABLE_NAME);
+    Assert.assertEquals(1, metadataVersionFiles.size());
+
+    catalog.registerTable(TABLE_IDENTIFIER, "file:" + metadataVersionFiles.get(0));
+
+    org.apache.hadoop.hive.metastore.api.Table newTable = metastoreClient.getTable(DB_NAME, TABLE_NAME);
+
+    Map<String, String> newTableParameters = newTable.getParameters();
+    Assert.assertNull(newTableParameters.get(PREVIOUS_METADATA_LOCATION_PROP));
+    Assert.assertEquals(originalParams.get(TABLE_TYPE_PROP), newTableParameters.get(TABLE_TYPE_PROP));
+    Assert.assertEquals(originalParams.get(METADATA_LOCATION_PROP), newTableParameters.get(METADATA_LOCATION_PROP));
+    Assert.assertEquals(originalTable.getSd(), newTable.getSd());
+  }
+
+  @Test
+  public void testCloneTable() throws IOException {

Review comment:
       Agreed :) Removed this test.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] anuragmantri commented on pull request #3851: API: Register existing tables in Iceberg HiveCatalog

Posted by GitBox <gi...@apache.org>.
anuragmantri commented on pull request #3851:
URL: https://github.com/apache/iceberg/pull/3851#issuecomment-1007827860


   > Can I use this to register an arbitrary table from any catalog (HadoopTable or HadoopCatalog or GlueCatalog) to the HiveCatalog?
   
   @pvary - Yes, all this needs is the latest metadata file.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on pull request #3851: API: Register existing tables in Iceberg HiveCatalog

Posted by GitBox <gi...@apache.org>.
rdblue commented on pull request #3851:
URL: https://github.com/apache/iceberg/pull/3851#issuecomment-1011341875


   This looks good to me when tests are passing.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] bryanck commented on pull request #3851: API: Register existing tables in Iceberg HiveCatalog

Posted by GitBox <gi...@apache.org>.
bryanck commented on pull request #3851:
URL: https://github.com/apache/iceberg/pull/3851#issuecomment-1011418913


   Seems like jcenter is having issues... https://status.gradle.com


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] flyrain commented on a change in pull request #3851: API: Register existing tables in Iceberg HiveCatalog

Posted by GitBox <gi...@apache.org>.
flyrain commented on a change in pull request #3851:
URL: https://github.com/apache/iceberg/pull/3851#discussion_r779762268



##########
File path: hive-metastore/src/main/java/org/apache/iceberg/hive/HiveTableOperations.java
##########
@@ -210,7 +210,8 @@ protected void doRefresh() {
   @SuppressWarnings("checkstyle:CyclomaticComplexity")
   @Override
   protected void doCommit(TableMetadata base, TableMetadata metadata) {
-    String newMetadataLocation = writeNewMetadata(metadata, currentVersion() + 1);
+    String newMetadataLocation = base == null && metadata.metadataFileLocation() != null ?
+        metadata.metadataFileLocation() : writeNewMetadata(metadata, currentVersion() + 1);

Review comment:
       Is it used by the new test?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] szehon-ho commented on a change in pull request #3851: API: Register existing tables in Iceberg HiveCatalog

Posted by GitBox <gi...@apache.org>.
szehon-ho commented on a change in pull request #3851:
URL: https://github.com/apache/iceberg/pull/3851#discussion_r779731029



##########
File path: hive-metastore/src/main/java/org/apache/iceberg/hive/HiveCatalog.java
##########
@@ -211,6 +214,18 @@ public void renameTable(TableIdentifier from, TableIdentifier originalTo) {
     }
   }
 
+  @Override
+  public org.apache.iceberg.Table registerTable(TableIdentifier identifier, String metadataFileLocation) {
+    Preconditions.checkArgument(isValidIdentifier(identifier), "Invalid identifier: %s", identifier);
+
+    TableOperations ops = newTableOps(identifier);
+    HadoopInputFile metadataFile = HadoopInputFile.fromLocation(metadataFileLocation, conf);

Review comment:
       Shouldn't we use FileIO to get the InputFile instead of hardcoding HadoopFileIO (in case we are using S3FileIO, for example?)

##########
File path: api/src/main/java/org/apache/iceberg/catalog/Catalog.java
##########
@@ -340,6 +340,17 @@ default boolean dropTable(TableIdentifier identifier) {
    */
   Table loadTable(TableIdentifier identifier);
 
+  /**
+   * Register a table.

Review comment:
       Seems from the test, the table is not there and it will create one from the file.  It wasn't too obvious from the name of the API, can we enhance the javadoc a bit to detail that?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] RussellSpitzer commented on a change in pull request #3851: API: Register existing tables in Iceberg HiveCatalog

Posted by GitBox <gi...@apache.org>.
RussellSpitzer commented on a change in pull request #3851:
URL: https://github.com/apache/iceberg/pull/3851#discussion_r779218451



##########
File path: hive-metastore/src/test/java/org/apache/iceberg/hive/HiveTableTest.java
##########
@@ -370,6 +371,85 @@ public void testNonDefaultDatabaseLocation() throws IOException, TException {
     metastoreClient.dropDatabase(NON_DEFAULT_DATABASE, true, true, true);
   }
 
+  @Test
+  public void testRegisterTable() throws TException {
+    org.apache.hadoop.hive.metastore.api.Table originalTable = metastoreClient.getTable(DB_NAME, TABLE_NAME);
+
+    Map<String, String> originalParams = originalTable.getParameters();
+    Assert.assertNotNull(originalParams);
+    Assert.assertTrue(ICEBERG_TABLE_TYPE_VALUE.equalsIgnoreCase(originalParams.get(TABLE_TYPE_PROP)));
+    Assert.assertTrue("EXTERNAL_TABLE".equalsIgnoreCase(originalTable.getTableType()));
+
+    catalog.dropTable(TABLE_IDENTIFIER, false);
+    Assert.assertFalse(catalog.tableExists(TABLE_IDENTIFIER));
+
+    List<String> metadataVersionFiles = metadataVersionFiles(TABLE_NAME);
+    Assert.assertEquals(1, metadataVersionFiles.size());
+
+    catalog.registerTable(TABLE_IDENTIFIER, "file:" + metadataVersionFiles.get(0));
+
+    org.apache.hadoop.hive.metastore.api.Table newTable = metastoreClient.getTable(DB_NAME, TABLE_NAME);
+
+    Map<String, String> newTableParameters = newTable.getParameters();
+    Assert.assertNull(newTableParameters.get(PREVIOUS_METADATA_LOCATION_PROP));
+    Assert.assertEquals(originalParams.get(TABLE_TYPE_PROP), newTableParameters.get(TABLE_TYPE_PROP));
+    Assert.assertEquals(originalParams.get(METADATA_LOCATION_PROP), newTableParameters.get(METADATA_LOCATION_PROP));
+    Assert.assertEquals(originalTable.getSd(), newTable.getSd());
+  }
+
+  @Test
+  public void testCloneTable() throws IOException {

Review comment:
       I would remove this test just because I don't want to give anyone ideas :)




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] anuragmantri edited a comment on pull request #3851: API: Register existing tables in Iceberg HiveCatalog

Posted by GitBox <gi...@apache.org>.
anuragmantri edited a comment on pull request #3851:
URL: https://github.com/apache/iceberg/pull/3851#issuecomment-1011354257


   Looks like the build failed. Is there a way to re-trigger the build without pushing a new change? 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] bryanck commented on a change in pull request #3851: API: Register existing tables in Iceberg HiveCatalog

Posted by GitBox <gi...@apache.org>.
bryanck commented on a change in pull request #3851:
URL: https://github.com/apache/iceberg/pull/3851#discussion_r781286974



##########
File path: hive-metastore/src/main/java/org/apache/iceberg/hive/HiveCatalog.java
##########
@@ -211,6 +214,23 @@ public void renameTable(TableIdentifier from, TableIdentifier originalTo) {
     }
   }
 
+  @Override
+  public org.apache.iceberg.Table registerTable(TableIdentifier identifier, String metadataFileLocation) {

Review comment:
       Could we put this in BaseMetastoreCatalog? It doesn't look like there is anything Hive-specific here, so other catalog implementations could potentially benefit.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] anuragmantri commented on pull request #3851: API: Register existing tables in Iceberg HiveCatalog

Posted by GitBox <gi...@apache.org>.
anuragmantri commented on pull request #3851:
URL: https://github.com/apache/iceberg/pull/3851#issuecomment-1011295032


   > * What happens, if a table is modified in one of the catalogs (I suspect that this is catalog dependent as HiveCatalog will not follow the changes, but HadoopTable/HadoopCatalog might follow each others changes, as they rely on the file names)
   > * Also issues will arise if we start expiring snapshots. If one catalog decides that one of the files are not needed anymore, then it could remove the file, even if the file is still needed for the other table.
   > 
   
   Thanks for the questions @pvary. I haven't really thought about concurrent access to the tables from multiple catalogs. It did come up a little bit in our [DR discussion](https://lists.apache.org/thread/8cg14cvs0hxybpvz20olr78ysm7xrtw2) in the dev list. I'm not sure this should be allowed. 
   
   @aokolnychyi - What are your thoughts on this? 
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] pvary commented on pull request #3851: API: Register existing tables in Iceberg HiveCatalog

Posted by GitBox <gi...@apache.org>.
pvary commented on pull request #3851:
URL: https://github.com/apache/iceberg/pull/3851#issuecomment-1007897477


   > @pvary - Yes, all this needs is the latest metadata file.
   
   This was my conclusion too (after I have left the comment I continued chewing on this one 😄) Then I went further, and this is where I am now:
   - What happens, if a table is modified in one of the catalogs (I suspect that this is catalog dependent as HiveCatalog will not follow the changes, but HadoopTable/HadoopCatalog might follow each others changes, as they rely on the file names) 
   - Also issues will arise if we start expiring snapshots. If one catalog decides that one of the files are not needed anymore, then it could remove the file, even if the file is still needed for the other table. 
   
   Is there a way to prevent this situation? 
   
   Thanks, Peter
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org