You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2021/03/12 19:26:46 UTC

[GitHub] [iceberg] aokolnychyi commented on a change in pull request #2328: (#2317) Stop removal of files when catalog state is uncertain - HiveCatalog

aokolnychyi commented on a change in pull request #2328:
URL: https://github.com/apache/iceberg/pull/2328#discussion_r593358075



##########
File path: api/src/main/java/org/apache/iceberg/exceptions/CommitStateUnknownException.java
##########
@@ -0,0 +1,38 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.iceberg.exceptions;
+
+/**
+ * Exception for a failure to confirm either affirmatively or negatively that a commit was applied. The client
+ * cannot take any further action without possibly corrupting the table.
+ */
+public class CommitStateUnknownException extends RuntimeException {
+
+  private static final String commonInfo =

Review comment:
       nit: `commonInfo` -> `COMMON_INFO`

##########
File path: api/src/main/java/org/apache/iceberg/exceptions/CommitStateUnknownException.java
##########
@@ -0,0 +1,38 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.iceberg.exceptions;
+
+/**
+ * Exception for a failure to confirm either affirmatively or negatively that a commit was applied. The client
+ * cannot take any further action without possibly corrupting the table.
+ */
+public class CommitStateUnknownException extends RuntimeException {
+
+  private static final String commonInfo =
+      "\nCannot determine whether the commit was successful or not, the underlying data files may or " +

Review comment:
       nit: shall we add "\n" directly below to keep the constant simple?
   
   ```
   cause.getMessage() + "\n" + COMMON_INFO
   ```
   
   

##########
File path: hive-metastore/src/main/java/org/apache/iceberg/hive/HiveTableOperations.java
##########
@@ -199,29 +203,68 @@ protected void doCommit(TableMetadata base, TableMetadata metadata) {
       setHmsTableParameters(newMetadataLocation, tbl, metadata.properties(), removedProps, hiveEngineEnabled);
 
       persistTable(tbl, updateHiveTable);
-      threw = false;
     } catch (org.apache.hadoop.hive.metastore.api.AlreadyExistsException e) {
+      canCleanupMetadata = true;
       throw new AlreadyExistsException("Table already exists: %s.%s", database, tableName);
 
     } catch (TException | UnknownHostException e) {
       if (e.getMessage() != null && e.getMessage().contains("Table/View 'HIVE_LOCKS' does not exist")) {
+        canCleanupMetadata = true;
         throw new RuntimeException("Failed to acquire locks from metastore because 'HIVE_LOCKS' doesn't " +
             "exist, this probably happened when using embedded metastore or doesn't create a " +
-            "transactional meta table. To fix this, use an alternative metastore", e);
+            "transactional meta table. To fix this, use an alternative metastore.\n%s", e);
       }
 
-      throw new RuntimeException(String.format("Metastore operation failed for %s.%s", database, tableName), e);
+      RuntimeException metastoreException =
+              new RuntimeException(String.format("Metastore operation failed for %s.%s", database, tableName), e);
+      if (checkCommitSuccessful(newMetadataLocation, metastoreException)) {
+        return; // We are able to verify the commit succeed
+      } else {
+        // We were able to check and the commit did not succeed
+        canCleanupMetadata = true;
+        throw metastoreException;
+      }
 
     } catch (InterruptedException e) {
-      Thread.currentThread().interrupt();
-      throw new RuntimeException("Interrupted during commit", e);
 
+      Thread.currentThread().interrupt();
+      RuntimeException interruptException = new RuntimeException("Interrupted during commit", e);
+      if (checkCommitSuccessful(newMetadataLocation, interruptException)) {
+        return; // We are able to verify the commit succeed
+      } else {
+        // We were able to check and the commit did not succeed
+        canCleanupMetadata = true;
+        throw interruptException;
+      }
     } finally {
-      cleanupMetadataAndUnlock(threw, newMetadataLocation, lockId);
+      cleanupMetadataAndUnlock(canCleanupMetadata, newMetadataLocation, lockId);

Review comment:
       @pvary @marton-bod, we only handle `TException` and `UnknownHostException` here. Is there any chance we get another exception while we do `alter_table`?

##########
File path: hive-metastore/src/main/java/org/apache/iceberg/hive/HiveTableOperations.java
##########
@@ -199,29 +203,68 @@ protected void doCommit(TableMetadata base, TableMetadata metadata) {
       setHmsTableParameters(newMetadataLocation, tbl, metadata.properties(), removedProps, hiveEngineEnabled);
 
       persistTable(tbl, updateHiveTable);
-      threw = false;
     } catch (org.apache.hadoop.hive.metastore.api.AlreadyExistsException e) {
+      canCleanupMetadata = true;
       throw new AlreadyExistsException("Table already exists: %s.%s", database, tableName);
 
     } catch (TException | UnknownHostException e) {

Review comment:
       I wonder whether this place is too generic. Apart from handling exceptions in alter_table (i.e. the actual commit), we also catch any exceptions during loading and locking. We don't necessarily have to do these checks in cases where we did not attempt to commit a new version.
   
   Is it a crazy idea to only add this check in `persistTable`, the one that does the actual commit?

##########
File path: hive-metastore/src/main/java/org/apache/iceberg/hive/HiveTableOperations.java
##########
@@ -199,29 +203,68 @@ protected void doCommit(TableMetadata base, TableMetadata metadata) {
       setHmsTableParameters(newMetadataLocation, tbl, metadata.properties(), removedProps, hiveEngineEnabled);
 
       persistTable(tbl, updateHiveTable);
-      threw = false;
     } catch (org.apache.hadoop.hive.metastore.api.AlreadyExistsException e) {
+      canCleanupMetadata = true;
       throw new AlreadyExistsException("Table already exists: %s.%s", database, tableName);
 
     } catch (TException | UnknownHostException e) {
       if (e.getMessage() != null && e.getMessage().contains("Table/View 'HIVE_LOCKS' does not exist")) {
+        canCleanupMetadata = true;
         throw new RuntimeException("Failed to acquire locks from metastore because 'HIVE_LOCKS' doesn't " +
             "exist, this probably happened when using embedded metastore or doesn't create a " +
-            "transactional meta table. To fix this, use an alternative metastore", e);
+            "transactional meta table. To fix this, use an alternative metastore.\n%s", e);
       }
 
-      throw new RuntimeException(String.format("Metastore operation failed for %s.%s", database, tableName), e);
+      RuntimeException metastoreException =
+              new RuntimeException(String.format("Metastore operation failed for %s.%s", database, tableName), e);
+      if (checkCommitSuccessful(newMetadataLocation, metastoreException)) {
+        return; // We are able to verify the commit succeed
+      } else {
+        // We were able to check and the commit did not succeed
+        canCleanupMetadata = true;
+        throw metastoreException;
+      }
 
     } catch (InterruptedException e) {
-      Thread.currentThread().interrupt();
-      throw new RuntimeException("Interrupted during commit", e);
 
+      Thread.currentThread().interrupt();
+      RuntimeException interruptException = new RuntimeException("Interrupted during commit", e);
+      if (checkCommitSuccessful(newMetadataLocation, interruptException)) {
+        return; // We are able to verify the commit succeed
+      } else {
+        // We were able to check and the commit did not succeed
+        canCleanupMetadata = true;
+        throw interruptException;
+      }
     } finally {
-      cleanupMetadataAndUnlock(threw, newMetadataLocation, lockId);
+      cleanupMetadataAndUnlock(canCleanupMetadata, newMetadataLocation, lockId);
+    }
+  }
+
+  /**
+   * Attempt to load the table and see if the current metadata location matches our new commit path. This as used as
+   * a last resort when we are dealing with exceptions which may indicate that our commit has failed but we are not
+   * certain if that is the case.
+   * @param newMetadataLocation the path of the new commit file
+   * @param originalFailure the exception which leads us to believe the commit has failed
+   * @return true if the commit was successful, false if not, and rethrows the original exception if we cannot
+   * determine
+   */
+  private boolean checkCommitSuccessful(String newMetadataLocation, RuntimeException originalFailure) {
+    try {
+      refresh();

Review comment:
       I think `refresh` already returns `TableMetadata` to us.

##########
File path: api/src/main/java/org/apache/iceberg/exceptions/CommitStateUnknownException.java
##########
@@ -0,0 +1,38 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.iceberg.exceptions;
+
+/**
+ * Exception for a failure to confirm either affirmatively or negatively that a commit was applied. The client
+ * cannot take any further action without possibly corrupting the table.
+ */
+public class CommitStateUnknownException extends RuntimeException {
+
+  private static final String commonInfo =
+      "\nCannot determine whether the commit was successful or not, the underlying data files may or " +
+      "may not be needed. Manual intervention via the Remove Orphan Files Action can remove these " +
+      "files when a connection to the Catalog can be re-established if the commit was actually unsuccessful." +
+      "Please check to see whether or not your commit was successful when the catalog is again reachable." +

Review comment:
       nit: I'd add something `... was successful before retrying the operation when the catalog is again reachable. Retrying an operation without checking may lead to duplicated data. At this time ...`




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org