You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2022/12/27 17:34:36 UTC

[GitHub] [iceberg] krvikash opened a new pull request, #6499: Fix BaseMetastoreTableOperations when create new table fails

krvikash opened a new pull request, #6499:
URL: https://github.com/apache/iceberg/pull/6499

   **Issue:**
   `checkCommitStatus` is not checking the NULL value for metadata. This will cause NPE  when the commit fails for table operations while creating a new table with RuntimeException and the call goes to check the commit status via `checkCommitStatus`. Since the table might not be created then `refresh()` will return NULL.
   
   This PR fixes the issue when creating a new table fails to commit with RuntimeException.
   
   I am adding missing test cases as well for commit failure during table creation.
   
   Though I have a couple of questions.
   1. Should we clean up the metadata file in case of createTable commit failure even if `commitStatus` is `UNKNOWN`?
   2. Should we add test cases for every table operation? For now, test cases are added for Hive and Glue table operations.
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] krvikash commented on a diff in pull request #6499: AWS, Core, Hive: Fix `checkCommitStatus` when create table commit fails

Posted by "krvikash (via GitHub)" <gi...@apache.org>.
krvikash commented on code in PR #6499:
URL: https://github.com/apache/iceberg/pull/6499#discussion_r1114945225


##########
hive-metastore/src/test/java/org/apache/iceberg/hive/TestHiveCommits.java:
##########
@@ -320,6 +326,57 @@ public void testAlreadyExistsException() {
         () -> catalog.createTable(TABLE_IDENTIFIER, schema, PartitionSpec.unpartitioned()));
   }
 
+  @Test
+  public void testCreateTableCommitThrowsUnknownException()
+      throws TException, InterruptedException {
+    String namespace = DB_NAME;
+    String tableName = getRandomName();
+    TableIdentifier tableIdentifier = TableIdentifier.of(namespace, tableName);
+
+    HiveTableOperations hiveTableOperations =
+        (HiveTableOperations) catalog.newTableOps(tableIdentifier);
+
+    String tableLocation = catalog.defaultWarehouseLocation(tableIdentifier);
+    TableMetadata metadataV1 = createTableMetadata(tableLocation);
+
+    HiveTableOperations spyOps = spy(hiveTableOperations);
+    failCommitAndThrowException(spyOps);
+
+    Assertions.assertThatThrownBy(() -> spyOps.commit(null, metadataV1))
+        .isInstanceOf(RuntimeException.class)
+        .hasMessageContaining("Datacenter on fire");
+
+    Assert.assertEquals(
+        "One metadata files should exist",
+        1,
+        metadataFileCount(tableLocation + "/writeMetaDataLoc"));

Review Comment:
   Also, note that `TableMetadata#metadataFileLocation` will not be set with the new metadata file location until the table is created. In this PR case, the table will not be created as it fails during `commit`.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] pvary commented on a diff in pull request #6499: AWS, Core, Hive: Fix `checkCommitStatus` when create table commit fails

Posted by GitBox <gi...@apache.org>.
pvary commented on code in PR #6499:
URL: https://github.com/apache/iceberg/pull/6499#discussion_r1058404920


##########
hive-metastore/src/test/java/org/apache/iceberg/hive/TestHiveCommits.java:
##########
@@ -307,6 +312,61 @@ public void testAlreadyExistsException() {
         () -> catalog.createTable(TABLE_IDENTIFIER, schema, PartitionSpec.unpartitioned()));
   }
 
+  @Test
+  public void testCreateTableCommitFailure() throws TException, InterruptedException {
+    String namespace = DB_NAME;
+    String tableName = getRandomName();
+    TableIdentifier tableIdentifier = TableIdentifier.of(namespace, tableName);
+
+    HiveTableOperations glueTableOperations =

Review Comment:
   Copy-paste? 😄 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] krvikash commented on a diff in pull request #6499: AWS, Core, Hive: Fix `checkCommitStatus` when create table commit fails

Posted by GitBox <gi...@apache.org>.
krvikash commented on code in PR #6499:
URL: https://github.com/apache/iceberg/pull/6499#discussion_r1069695013


##########
hive-metastore/src/test/java/org/apache/iceberg/hive/TestHiveCommits.java:
##########
@@ -307,6 +312,61 @@ public void testAlreadyExistsException() {
         () -> catalog.createTable(TABLE_IDENTIFIER, schema, PartitionSpec.unpartitioned()));
   }
 
+  @Test
+  public void testCreateTableCommitFailure() throws TException, InterruptedException {

Review Comment:
   Thanks, @nastra for the review.
   
   Yes, you are right `testCreateTableCommitFailure` will not reach `checkCommitStatus `. I have not seen any test case which covers create table failure which is why I have added this test case to verify the create table failure where we are not required to check the commit status. 
   
   Now I have separated this change into a new commit.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] pvary commented on a diff in pull request #6499: AWS, Core, Hive: Fix `checkCommitStatus` when create table commit fails

Posted by "pvary (via GitHub)" <gi...@apache.org>.
pvary commented on code in PR #6499:
URL: https://github.com/apache/iceberg/pull/6499#discussion_r1114547602


##########
hive-metastore/src/test/java/org/apache/iceberg/hive/TestHiveCommits.java:
##########
@@ -320,6 +326,57 @@ public void testAlreadyExistsException() {
         () -> catalog.createTable(TABLE_IDENTIFIER, schema, PartitionSpec.unpartitioned()));
   }
 
+  @Test
+  public void testCreateTableCommitThrowsUnknownException()
+      throws TException, InterruptedException {
+    String namespace = DB_NAME;
+    String tableName = getRandomName();
+    TableIdentifier tableIdentifier = TableIdentifier.of(namespace, tableName);
+
+    HiveTableOperations hiveTableOperations =
+        (HiveTableOperations) catalog.newTableOps(tableIdentifier);
+
+    String tableLocation = catalog.defaultWarehouseLocation(tableIdentifier);
+    TableMetadata metadataV1 = createTableMetadata(tableLocation);
+
+    HiveTableOperations spyOps = spy(hiveTableOperations);
+    failCommitAndThrowException(spyOps);
+
+    Assertions.assertThatThrownBy(() -> spyOps.commit(null, metadataV1))
+        .isInstanceOf(RuntimeException.class)
+        .hasMessageContaining("Datacenter on fire");
+
+    Assert.assertEquals(
+        "One metadata files should exist",
+        1,
+        metadataFileCount(tableLocation + "/writeMetaDataLoc"));

Review Comment:
   Could we just test the default Hive locations instead of changing the default behaviour and  a specifying the locations by table properties?
   
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] pvary commented on a diff in pull request #6499: AWS, Core, Hive: Fix `checkCommitStatus` when create table commit fails

Posted by "pvary (via GitHub)" <gi...@apache.org>.
pvary commented on code in PR #6499:
URL: https://github.com/apache/iceberg/pull/6499#discussion_r1168205368


##########
hive-metastore/src/test/java/org/apache/iceberg/hive/TestHiveCommits.java:
##########
@@ -320,6 +326,57 @@ public void testAlreadyExistsException() {
         () -> catalog.createTable(TABLE_IDENTIFIER, schema, PartitionSpec.unpartitioned()));
   }
 
+  @Test
+  public void testCreateTableCommitThrowsUnknownException()
+      throws TException, InterruptedException {
+    String namespace = DB_NAME;
+    String tableName = getRandomName();
+    TableIdentifier tableIdentifier = TableIdentifier.of(namespace, tableName);
+
+    HiveTableOperations hiveTableOperations =
+        (HiveTableOperations) catalog.newTableOps(tableIdentifier);
+
+    String tableLocation = catalog.defaultWarehouseLocation(tableIdentifier);
+    TableMetadata metadataV1 = createTableMetadata(tableLocation);
+
+    HiveTableOperations spyOps = spy(hiveTableOperations);
+    failCommitAndThrowException(spyOps);
+
+    Assertions.assertThatThrownBy(() -> spyOps.commit(null, metadataV1))
+        .isInstanceOf(RuntimeException.class)
+        .hasMessageContaining("Datacenter on fire");
+
+    Assert.assertEquals(
+        "One metadata files should exist",
+        1,
+        metadataFileCount(tableLocation + "/writeMetaDataLoc"));

Review Comment:
   Sorry for the late answer!
   Could we use assumptions about the table location instead in the tests? I am afraid that the default behavior is different from the behavior when the location is set, and with this tests we test only the second scenario. OTOH creating a Table without specifically setting the location is the way we use Iceberg in Hive. So it would be good to test that case as well.
   
   Thanks again for taking care of this!



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] pvary commented on pull request #6499: AWS, Core, Hive: Fix `checkCommitStatus` when create table commit fails

Posted by GitBox <gi...@apache.org>.
pvary commented on PR #6499:
URL: https://github.com/apache/iceberg/pull/6499#issuecomment-1366719073

   Did some cursory review and left a small comment.
   
   When the commitStatus is UNKNOWN then we sign the user that the code was not able to decide how could it fix the state, and expects the user to fix the table. In this case we keep everything and expect the user to cleanup the state.
   
   If we can identify that the issue is with the access rights, and we can identify and fix the table, then we might throw a different exception and fix the table instead.
   
   I am on PTO till January. If the PR is still open, please ping me to do a full review.
   
   Thanks, Peter 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] krvikash commented on pull request #6499: AWS, Core, Hive: Fix `checkCommitStatus` when create table commit fails

Posted by "krvikash (via GitHub)" <gi...@apache.org>.
krvikash commented on PR #6499:
URL: https://github.com/apache/iceberg/pull/6499#issuecomment-1508131953

   Hi @pvary | @nastra, When you get time, could you please review this?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] krvikash commented on pull request #6499: AWS, Core, Hive: Fix `checkCommitStatus` when create table commit fails

Posted by "krvikash (via GitHub)" <gi...@apache.org>.
krvikash commented on PR #6499:
URL: https://github.com/apache/iceberg/pull/6499#issuecomment-1422081585

   Rebased and resolved conflicts.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] krvikash commented on a diff in pull request #6499: AWS, Core, Hive: Fix `checkCommitStatus` when create table commit fails

Posted by GitBox <gi...@apache.org>.
krvikash commented on code in PR #6499:
URL: https://github.com/apache/iceberg/pull/6499#discussion_r1058444447


##########
hive-metastore/src/test/java/org/apache/iceberg/hive/TestHiveCommits.java:
##########
@@ -307,6 +312,61 @@ public void testAlreadyExistsException() {
         () -> catalog.createTable(TABLE_IDENTIFIER, schema, PartitionSpec.unpartitioned()));
   }
 
+  @Test
+  public void testCreateTableCommitFailure() throws TException, InterruptedException {
+    String namespace = DB_NAME;
+    String tableName = getRandomName();
+    TableIdentifier tableIdentifier = TableIdentifier.of(namespace, tableName);
+
+    HiveTableOperations glueTableOperations =

Review Comment:
   Thanks for catching this :) Modified the variable name.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] pvary commented on a diff in pull request #6499: AWS, Core, Hive: Fix `checkCommitStatus` when create table commit fails

Posted by "pvary (via GitHub)" <gi...@apache.org>.
pvary commented on code in PR #6499:
URL: https://github.com/apache/iceberg/pull/6499#discussion_r1114552436


##########
hive-metastore/src/test/java/org/apache/iceberg/hive/TestHiveCommits.java:
##########
@@ -320,6 +326,57 @@ public void testAlreadyExistsException() {
         () -> catalog.createTable(TABLE_IDENTIFIER, schema, PartitionSpec.unpartitioned()));
   }
 
+  @Test
+  public void testCreateTableCommitThrowsUnknownException()
+      throws TException, InterruptedException {
+    String namespace = DB_NAME;
+    String tableName = getRandomName();
+    TableIdentifier tableIdentifier = TableIdentifier.of(namespace, tableName);
+
+    HiveTableOperations hiveTableOperations =
+        (HiveTableOperations) catalog.newTableOps(tableIdentifier);
+
+    String tableLocation = catalog.defaultWarehouseLocation(tableIdentifier);
+    TableMetadata metadataV1 = createTableMetadata(tableLocation);
+
+    HiveTableOperations spyOps = spy(hiveTableOperations);
+    failCommitAndThrowException(spyOps);
+
+    Assertions.assertThatThrownBy(() -> spyOps.commit(null, metadataV1))
+        .isInstanceOf(RuntimeException.class)
+        .hasMessageContaining("Datacenter on fire");
+
+    Assert.assertEquals(
+        "One metadata files should exist",
+        1,
+        metadataFileCount(tableLocation + "/writeMetaDataLoc"));

Review Comment:
   Why can not we use the `metadataV1` and `metadataFileCount(metadataV1)` to get the file number?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] nastra commented on a diff in pull request #6499: AWS, Core, Hive: Fix `checkCommitStatus` when create table commit fails

Posted by GitBox <gi...@apache.org>.
nastra commented on code in PR #6499:
URL: https://github.com/apache/iceberg/pull/6499#discussion_r1069502841


##########
hive-metastore/src/test/java/org/apache/iceberg/hive/TestHiveCommits.java:
##########
@@ -307,6 +312,61 @@ public void testAlreadyExistsException() {
         () -> catalog.createTable(TABLE_IDENTIFIER, schema, PartitionSpec.unpartitioned()));
   }
 
+  @Test
+  public void testCreateTableCommitFailure() throws TException, InterruptedException {
+    String namespace = DB_NAME;
+    String tableName = getRandomName();
+    TableIdentifier tableIdentifier = TableIdentifier.of(namespace, tableName);
+
+    HiveTableOperations hiveTableOperations =
+        (HiveTableOperations) catalog.newTableOps(tableIdentifier);
+
+    String tableLocation = catalog.defaultWarehouseLocation(tableIdentifier);
+    TableMetadata metadataV1 = createTableMetadata(tableLocation);
+
+    HiveTableOperations spyOps = spy(hiveTableOperations);
+    failCommitAndThrowException(spyOps, new InvalidObjectException());
+
+    AssertHelpers.assertThrows(
+        "Should throw invalid object exception",
+        ValidationException.class,

Review Comment:
   it would be better to use `Assertions.assertThatThrownBy(...)` here, similar to https://github.com/apache/iceberg/blob/223177faf955bd8f11864477da16013cf5d7cc75/api/src/test/java/org/apache/iceberg/transforms/TestResiduals.java#L147-L149



##########
hive-metastore/src/test/java/org/apache/iceberg/hive/TestHiveCommits.java:
##########
@@ -307,6 +312,61 @@ public void testAlreadyExistsException() {
         () -> catalog.createTable(TABLE_IDENTIFIER, schema, PartitionSpec.unpartitioned()));
   }
 
+  @Test
+  public void testCreateTableCommitFailure() throws TException, InterruptedException {

Review Comment:
   both of these tests actually never reach the point where `checkCommitStatus` is being called, meaning that they will pass with/without the null check that has been added



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] krvikash commented on pull request #6499: AWS, Core, Hive: Fix `checkCommitStatus` when create table commit fails

Posted by "krvikash (via GitHub)" <gi...@apache.org>.
krvikash commented on PR #6499:
URL: https://github.com/apache/iceberg/pull/6499#issuecomment-1439085830

   Hi @pvary | @nastra, When you get time could you please review it?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] krvikash commented on a diff in pull request #6499: AWS, Core, Hive: Fix `checkCommitStatus` when create table commit fails

Posted by "krvikash (via GitHub)" <gi...@apache.org>.
krvikash commented on code in PR #6499:
URL: https://github.com/apache/iceberg/pull/6499#discussion_r1114899052


##########
hive-metastore/src/test/java/org/apache/iceberg/hive/TestHiveCommits.java:
##########
@@ -320,6 +326,57 @@ public void testAlreadyExistsException() {
         () -> catalog.createTable(TABLE_IDENTIFIER, schema, PartitionSpec.unpartitioned()));
   }
 
+  @Test
+  public void testCreateTableCommitThrowsUnknownException()
+      throws TException, InterruptedException {
+    String namespace = DB_NAME;
+    String tableName = getRandomName();
+    TableIdentifier tableIdentifier = TableIdentifier.of(namespace, tableName);
+
+    HiveTableOperations hiveTableOperations =
+        (HiveTableOperations) catalog.newTableOps(tableIdentifier);
+
+    String tableLocation = catalog.defaultWarehouseLocation(tableIdentifier);
+    TableMetadata metadataV1 = createTableMetadata(tableLocation);
+
+    HiveTableOperations spyOps = spy(hiveTableOperations);
+    failCommitAndThrowException(spyOps);
+
+    Assertions.assertThatThrownBy(() -> spyOps.commit(null, metadataV1))
+        .isInstanceOf(RuntimeException.class)
+        .hasMessageContaining("Datacenter on fire");
+
+    Assert.assertEquals(
+        "One metadata files should exist",
+        1,
+        metadataFileCount(tableLocation + "/writeMetaDataLoc"));

Review Comment:
   > Why can not we use the metadataV1 and metadataFileCount(metadataV1)
   
   Since the purpose of the test is to create a table, I am creating a new `TableMetadata` object using the `createTableMetadata` method. However, this method does not set the `metadataFileLocation` member variable of the `TableMetadata` object. As a result, I cannot use the `metadataFileCount(metadataV1)` method because it calls `metadata.metadataFileLocation()`, which returns null.
   
   > Could we just test the default Hive locations
   
   I have to pass the metadata location to get the file count. metadata location is specified in `createTableMetadata`.
   
   I will try to see if we can use the existing `metadataFileCount(metadataV1)` method. 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org