Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2021/04/01 00:03:39 UTC

[GitHub] [iceberg] yyanyy opened a new pull request #2402: AWS: handle uncertain catalog state for glue

yyanyy opened a new pull request #2402:
URL: https://github.com/apache/iceberg/pull/2402


   - same fix as in #2328, for the Glue catalog. In theory this case should happen less often with Glue than with HMS, unless the current host has an internet connectivity issue while the commit is in progress, but I still want to handle it regardless. 
   - in theory, certain exception types from Glue may not be worth a catalog recheck, e.g. a client-side exception caused by invalid input. Personally I think an extra Glue check is cheap enough compared to the risk of missing a valid case and the code complexity of listing every kind of exception type, but I don't feel strongly either way and comments are welcome. 
   - moved `checkCommitStatus()` from `HiveTableOperations` to `BaseMetastoreTableOperations` to allow code reuse.
   - tests in `GlueCatalogCommitFailureTest` are heavily based on #2328. I wanted to abstract them, but since both test classes already extend a base class and there is not much shared code, I decided to duplicate the logic instead. 
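The recheck flow described in these bullets can be sketched roughly as follows. This is a minimal, self-contained illustration, not the PR's code: `SketchCommitStatus` and `SketchCommitStateUnknownException` are hypothetical stand-ins for Iceberg's `CommitStatus` and `CommitStateUnknownException`, and the `persist` / `checkCommitStatus` callbacks stand in for `persistGlueTable()` and the shared `checkCommitStatus()` helper.

```java
import java.util.function.Supplier;

// Hypothetical stand-in for Iceberg's CommitStatus enum.
enum SketchCommitStatus { SUCCESS, FAILURE, UNKNOWN }

// Hypothetical stand-in for Iceberg's CommitStateUnknownException.
class SketchCommitStateUnknownException extends RuntimeException {
  SketchCommitStateUnknownException(Throwable cause) {
    super(cause);
  }
}

class UncertainCommitSketch {
  /**
   * Runs the catalog write. If it throws, asks the catalog whether the new
   * metadata location was persisted anyway before deciding how to report it.
   */
  static SketchCommitStatus commit(Runnable persist, Supplier<SketchCommitStatus> checkCommitStatus) {
    try {
      persist.run();
      return SketchCommitStatus.SUCCESS;
    } catch (RuntimeException persistFailure) {
      switch (checkCommitStatus.get()) {
        case SUCCESS:
          // The write landed even though the client saw an error.
          return SketchCommitStatus.SUCCESS;
        case FAILURE:
          // Definitely not committed: the caller may clean up and retry.
          throw persistFailure;
        default:
          // Cannot tell: the caller must not delete the new metadata file.
          throw new SketchCommitStateUnknownException(persistFailure);
      }
    }
  }
}
```

The key design point is the third outcome: when the catalog cannot be reached to confirm, surfacing "unknown" is safer than guessing, because cleaning up a metadata file that was in fact committed would corrupt the table.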


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] RussellSpitzer commented on pull request #2402: AWS: handle uncertain catalog state for glue

Posted by GitBox <gi...@apache.org>.
RussellSpitzer commented on pull request #2402:
URL: https://github.com/apache/iceberg/pull/2402#issuecomment-811992680


   Looks good to me as well! Sorry the tests weren't easier to abstract; I figure that with catalogs having such different operations, it is probably better that we just duplicate :/




[GitHub] [iceberg] jackye1995 commented on a change in pull request #2402: AWS: handle uncertain catalog state for glue

Posted by GitBox <gi...@apache.org>.
jackye1995 commented on a change in pull request #2402:
URL: https://github.com/apache/iceberg/pull/2402#discussion_r605773683



##########
File path: aws/src/main/java/org/apache/iceberg/aws/glue/GlueTableOperations.java
##########
@@ -103,14 +105,30 @@ protected void doRefresh() {
   @Override
   protected void doCommit(TableMetadata base, TableMetadata metadata) {
     String newMetadataLocation = writeNewMetadata(metadata, currentVersion() + 1);
-    boolean exceptionThrown = true;
+    CommitStatus commitStatus = CommitStatus.FAILURE;
+
     try {
       lock(newMetadataLocation);
       Table glueTable = getGlueTable();
       checkMetadataLocation(glueTable, base);
       Map<String, String> properties = prepareProperties(glueTable, newMetadataLocation);
-      persistGlueTable(glueTable, properties);
-      exceptionThrown = false;
+
+      try {
+        persistGlueTable(glueTable, properties);
+        commitStatus = CommitStatus.SUCCESS;
+      } catch (Throwable persistFailure) {

Review comment:
       1. I think this can catch `SdkException` instead of all throwables, because `persistGlueTable` is only a call to Glue. If the exception is not of that type, it is guaranteed to be a failure.
   2. Can we avoid the nested try and determine this directly in the catch block below? We are already catching that exception at L137 anyway.
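Suggestion 2 (a single `try` whose `catch` decides the outcome, with cleanup in `finally`) might look roughly like the sketch below. It is hypothetical and self-contained: `FlattenedCommitSketch`, its `Status` enum, and the recorded `events` list are illustration-only stand-ins for the real lock, Glue call, and metadata cleanup in `GlueTableOperations`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

class FlattenedCommitSketch {
  enum Status { SUCCESS, FAILURE, UNKNOWN }

  // Records side effects in order; stands in for locking, deleting the new
  // metadata file, and unlocking.
  final List<String> events = new ArrayList<>();

  void doCommit(Runnable persistTable, Supplier<Status> checkCommitStatus) {
    Status commitStatus = Status.FAILURE;
    try {
      events.add("lock");
      persistTable.run();
      commitStatus = Status.SUCCESS;
    } catch (RuntimeException persistFailure) {
      // One catch block decides the outcome; no nested try is needed.
      commitStatus = checkCommitStatus.get();
      if (commitStatus == Status.FAILURE) {
        throw persistFailure;
      } else if (commitStatus == Status.UNKNOWN) {
        throw new IllegalStateException("commit state unknown", persistFailure);
      }
    } finally {
      // Only delete the new metadata file when the commit definitely failed.
      if (commitStatus == Status.FAILURE) {
        events.add("cleanup");
      }
      events.add("unlock");
    }
  }
}
```

Because `commitStatus` starts as `FAILURE` and is only upgraded on a confirmed outcome, the `finally` block cleans up exactly when the commit is known to have failed, and leaves the metadata file alone for both `SUCCESS` and `UNKNOWN`.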






[GitHub] [iceberg] RussellSpitzer commented on a change in pull request #2402: AWS: handle uncertain catalog state for glue

Posted by GitBox <gi...@apache.org>.
RussellSpitzer commented on a change in pull request #2402:
URL: https://github.com/apache/iceberg/pull/2402#discussion_r605860585



##########
File path: aws/src/main/java/org/apache/iceberg/aws/glue/GlueTableOperations.java
##########
@@ -103,14 +105,30 @@ protected void doRefresh() {
   @Override
   protected void doCommit(TableMetadata base, TableMetadata metadata) {
     String newMetadataLocation = writeNewMetadata(metadata, currentVersion() + 1);
-    boolean exceptionThrown = true;
+    CommitStatus commitStatus = CommitStatus.FAILURE;
+
     try {
       lock(newMetadataLocation);
       Table glueTable = getGlueTable();
       checkMetadataLocation(glueTable, base);
       Map<String, String> properties = prepareProperties(glueTable, newMetadataLocation);
-      persistGlueTable(glueTable, properties);
-      exceptionThrown = false;
+
+      try {
+        persistGlueTable(glueTable, properties);
+        commitStatus = CommitStatus.SUCCESS;
+      } catch (Throwable persistFailure) {

Review comment:
       I think it is less important now that we start out with the commit state set to unknown, but in the original design we basically started out assuming that the commit had succeeded and switched a flag to indicate that it had failed. We can probably just catch `RuntimeException` here now, since the logic is basically about deciding when to retry.






[GitHub] [iceberg] yyanyy merged pull request #2402: AWS: handle uncertain catalog state for glue

Posted by GitBox <gi...@apache.org>.
yyanyy merged pull request #2402:
URL: https://github.com/apache/iceberg/pull/2402


   




[GitHub] [iceberg] jackye1995 commented on a change in pull request #2402: AWS: handle uncertain catalog state for glue

Posted by GitBox <gi...@apache.org>.
jackye1995 commented on a change in pull request #2402:
URL: https://github.com/apache/iceberg/pull/2402#discussion_r605917390



##########
File path: aws/src/main/java/org/apache/iceberg/aws/glue/GlueTableOperations.java
##########
@@ -113,29 +112,28 @@ protected void doCommit(TableMetadata base, TableMetadata metadata) {
       checkMetadataLocation(glueTable, base);
       Map<String, String> properties = prepareProperties(glueTable, newMetadataLocation);
 
-      try {
-        persistGlueTable(glueTable, properties);
-        commitStatus = CommitStatus.SUCCESS;
-      } catch (Throwable persistFailure) {
-        LOG.error("Confirming if commit to {} indeed failed to persist, attempting to reconnect and check.",
-            fullTableName, persistFailure);
-        commitStatus = checkCommitStatus(newMetadataLocation, metadata);
-        switch (commitStatus) {
-          case SUCCESS:
-            break;
-          case FAILURE:
-            throw persistFailure;
-          case UNKNOWN:
-            throw new CommitStateUnknownException(persistFailure);
-        }
-      }
+      persistGlueTable(glueTable, properties);
+      commitStatus = CommitStatus.SUCCESS;
+

Review comment:
       nit: extra line to remove

##########
File path: aws/src/main/java/org/apache/iceberg/aws/glue/GlueTableOperations.java
##########
@@ -113,29 +112,28 @@ protected void doCommit(TableMetadata base, TableMetadata metadata) {
       checkMetadataLocation(glueTable, base);
       Map<String, String> properties = prepareProperties(glueTable, newMetadataLocation);
 

Review comment:
       nit: extra line to remove






[GitHub] [iceberg] yyanyy commented on a change in pull request #2402: AWS: handle uncertain catalog state for glue

Posted by GitBox <gi...@apache.org>.
yyanyy commented on a change in pull request #2402:
URL: https://github.com/apache/iceberg/pull/2402#discussion_r605851603



##########
File path: aws/src/main/java/org/apache/iceberg/aws/glue/GlueTableOperations.java
##########
@@ -103,14 +105,30 @@ protected void doRefresh() {
   @Override
   protected void doCommit(TableMetadata base, TableMetadata metadata) {
     String newMetadataLocation = writeNewMetadata(metadata, currentVersion() + 1);
-    boolean exceptionThrown = true;
+    CommitStatus commitStatus = CommitStatus.FAILURE;
+
     try {
       lock(newMetadataLocation);
       Table glueTable = getGlueTable();
       checkMetadataLocation(glueTable, base);
       Map<String, String> properties = prepareProperties(glueTable, newMetadataLocation);
-      persistGlueTable(glueTable, properties);
-      exceptionThrown = false;
+
+      try {
+        persistGlueTable(glueTable, properties);
+        commitStatus = CommitStatus.SUCCESS;
+      } catch (Throwable persistFailure) {

Review comment:
       Thanks for the feedback! 
   
   For 1, I'm not confident that every type of network issue would end up as an `SdkException`, so to be on the safe side I'd prefer to maintain a list of exceptions that skip `checkCommitStatus` rather than a list of those that go through it. 
   For 2, sounds good. I'll move the catch logic to L137, but I'll change it to catch `Throwable` to address 1. This essentially means not checking the commit status for `ConcurrentModificationException` and `AlreadyExistsException`, which I think carries minimal risk. 
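The resulting catch ordering can be sketched as below: definite rejections are handled without consulting the catalog, and everything else goes through the status check. The thread later settles on a `RuntimeException` catch-all rather than `Throwable`, which is what this sketch uses. The two nested exception classes are hypothetical stand-ins for Glue's `ConcurrentModificationException` and `AlreadyExistsException`, and `Outcome` exists only to make the chosen branch observable.

```java
class CatchOrderingSketch {
  // Hypothetical stand-ins for Glue's ConcurrentModificationException and AlreadyExistsException.
  static class ConcurrentModificationException extends RuntimeException {}

  static class AlreadyExistsException extends RuntimeException {}

  enum Outcome { SUCCESS, FAILED_WITHOUT_CHECK, STATUS_CHECKED }

  static Outcome commit(Runnable persistGlueTable) {
    try {
      persistGlueTable.run();
      return Outcome.SUCCESS;
    } catch (ConcurrentModificationException | AlreadyExistsException definiteFailure) {
      // Glue explicitly rejected the write, so the table was not updated:
      // skip the extra catalog round-trip. A real implementation would
      // rethrow (possibly wrapped) here instead of returning.
      return Outcome.FAILED_WITHOUT_CHECK;
    } catch (RuntimeException uncertainFailure) {
      // Anything else (e.g. a timeout) may have landed: re-check the catalog.
      return Outcome.STATUS_CHECKED;
    }
  }
}
```

Java requires the more specific catch clauses to come before `catch (RuntimeException ...)`; placing them after it is a compile error ("exception has already been caught"), so the compiler enforces this ordering.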






[GitHub] [iceberg] yyanyy commented on pull request #2402: AWS: handle uncertain catalog state for glue

Posted by GitBox <gi...@apache.org>.
yyanyy commented on pull request #2402:
URL: https://github.com/apache/iceberg/pull/2402#issuecomment-813750593


   Merged since there were no further comments after last Thursday and multiple approvals were in place. Thank you again, everyone, for reviewing so quickly! 




[GitHub] [iceberg] jackye1995 commented on a change in pull request #2402: AWS: handle uncertain catalog state for glue

Posted by GitBox <gi...@apache.org>.
jackye1995 commented on a change in pull request #2402:
URL: https://github.com/apache/iceberg/pull/2402#discussion_r605854936



##########
File path: aws/src/main/java/org/apache/iceberg/aws/glue/GlueTableOperations.java
##########
@@ -103,14 +105,30 @@ protected void doRefresh() {
   @Override
   protected void doCommit(TableMetadata base, TableMetadata metadata) {
     String newMetadataLocation = writeNewMetadata(metadata, currentVersion() + 1);
-    boolean exceptionThrown = true;
+    CommitStatus commitStatus = CommitStatus.FAILURE;
+
     try {
       lock(newMetadataLocation);
       Table glueTable = getGlueTable();
       checkMetadataLocation(glueTable, base);
       Map<String, String> properties = prepareProperties(glueTable, newMetadataLocation);
-      persistGlueTable(glueTable, properties);
-      exceptionThrown = false;
+
+      try {
+        persistGlueTable(glueTable, properties);
+        commitStatus = CommitStatus.SUCCESS;
+      } catch (Throwable persistFailure) {

Review comment:
       Catching `Throwable` always sounds very dangerous to me, because an `Error` indicates serious problems that a reasonable application should not try to catch, and I would much prefer catching just `RuntimeException`, or at least `Exception`. Was there any particular reason for `HiveTableOperations` to catch `Throwable`? @RussellSpitzer 
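To illustrate the concern: `Error` and `RuntimeException` are separate branches under `Throwable`, so narrowing the catch to `RuntimeException` lets errors such as `OutOfMemoryError` escape immediately instead of triggering a catalog re-check. A minimal sketch; `ErrorPropagationSketch` and its `statusChecked` flag are hypothetical, and the catch body is simplified to just record that the re-check ran.

```java
class ErrorPropagationSketch {
  // Set when the (simulated) catalog re-check runs; illustration only.
  static boolean statusChecked = false;

  static void commit(Runnable persist) {
    try {
      persist.run();
    } catch (RuntimeException persistFailure) {
      // Recoverable, possibly ambiguous failure: re-check the catalog.
      // (A real flow would then return or rethrow based on the result.)
      statusChecked = true;
    }
    // An Error is not a RuntimeException, so it bypasses the catch above and
    // propagates straight to the caller with no further work attempted.
  }
}
```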






[GitHub] [iceberg] aokolnychyi commented on pull request #2402: AWS: handle uncertain catalog state for glue

Posted by GitBox <gi...@apache.org>.
aokolnychyi commented on pull request #2402:
URL: https://github.com/apache/iceberg/pull/2402#issuecomment-811574515


   @RussellSpitzer, could you check?




[GitHub] [iceberg] yyanyy commented on a change in pull request #2402: AWS: handle uncertain catalog state for glue

Posted by GitBox <gi...@apache.org>.
yyanyy commented on a change in pull request #2402:
URL: https://github.com/apache/iceberg/pull/2402#discussion_r605868688



##########
File path: aws/src/main/java/org/apache/iceberg/aws/glue/GlueTableOperations.java
##########
@@ -103,14 +105,30 @@ protected void doRefresh() {
   @Override
   protected void doCommit(TableMetadata base, TableMetadata metadata) {
     String newMetadataLocation = writeNewMetadata(metadata, currentVersion() + 1);
-    boolean exceptionThrown = true;
+    CommitStatus commitStatus = CommitStatus.FAILURE;
+
     try {
       lock(newMetadataLocation);
       Table glueTable = getGlueTable();
       checkMetadataLocation(glueTable, base);
       Map<String, String> properties = prepareProperties(glueTable, newMetadataLocation);
-      persistGlueTable(glueTable, properties);
-      exceptionThrown = false;
+
+      try {
+        persistGlueTable(glueTable, properties);
+        commitStatus = CommitStatus.SUCCESS;
+      } catch (Throwable persistFailure) {

Review comment:
       Sounds good. I was thinking that if it's an `Error`, we are very unlikely to have had a successful commit before it, so it would be rethrown anyway; on the other hand, we probably want to make sure the process dies immediately. I'll update L137 to catch `RuntimeException`. 






[GitHub] [iceberg] yyanyy commented on pull request #2402: AWS: handle uncertain catalog state for glue

Posted by GitBox <gi...@apache.org>.
yyanyy commented on pull request #2402:
URL: https://github.com/apache/iceberg/pull/2402#issuecomment-812199957


   Thanks, everyone, for the quick review! The test failed, but I think the failure may be transient, since I built the same code locally before updating and all tests succeeded. I've triggered another round of tests. 
   
   If there are no more comments or concerns before tomorrow, I'll probably merge this change. Thanks again! 




[GitHub] [iceberg] yyanyy commented on pull request #2402: AWS: handle uncertain catalog state for glue

Posted by GitBox <gi...@apache.org>.
yyanyy commented on pull request #2402:
URL: https://github.com/apache/iceberg/pull/2402#issuecomment-811558481


   > The changes look good to me. @jackye1995 or @yyanyy, can you confirm that the integration tests pass?
   
   Thank you for the quick review! Yes, I have run the integration tests locally for both the new and the existing Glue tests and confirmed that they all succeed. 

