You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2022/02/14 20:41:17 UTC

[GitHub] [iceberg] szehon-ho opened a new pull request #4123: Fix random test failure of testDeleteWithSerializableIsolation

szehon-ho opened a new pull request #4123:
URL: https://github.com/apache/iceberg/pull/4123


   Tries to fix #4090 
   
   (Copied from my comment there)
   
   From the log, the expected exception seems still a validationException, just not from the distributed Spark delta-writer job as is expected but rather from the non-distributed metadata-only version.  Ref:
   
   Expected: an instance of org.apache.spark.SparkException
        but: <java.lang.IllegalArgumentException: Failed to cleanly delete data files matching: ref(name="id") == 1> is a java.lang.IllegalArgumentException
   	at org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:20)
   
   That's from here (the metadata-only version):
   https://github.com/apache/iceberg/blob/master/spark/v3.2/spark/src/main/java/org/apache/iceberg/spark/source/SparkTable.java#L271
   
   Although most of the time it will hit the distributed delta delete, there is a chance here it will fall to this metadata version. This is if the optimizer (OptimizeMetadataOnlyDeleteFromTable in particular) believes that the delete can be handled with just metadata. The appendFuture tries to avert this by constructing a file of two elements (1,2) and each time the deleteFuture hits only (1) so it decides it cannot use metadata, but if the appendTable has not run yet then the table is empty the deleteFuture decides it proceed with metadata-only delete.  In most of those times, that delete goes through fine as there is nothing to do and the test gets another try to get the right exception, but a very smaller percentage of time it may find the appendFuture has landed right before the commit and then fails in the above code path.
   
   One fix is to try to add expected checks this potential failure. Another fix is to make sure that there is some pre-existing data to force it always to use the distributed delta delete, which is done here.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] szehon-ho commented on a change in pull request #4123: Fix random test failure of testDeleteWithSerializableIsolation

Posted by GitBox <gi...@apache.org>.
szehon-ho commented on a change in pull request #4123:
URL: https://github.com/apache/iceberg/pull/4123#discussion_r809628471



##########
File path: spark/v3.0/spark-extensions/src/test/java/org/apache/iceberg/spark/extensions/TestDelete.java
##########
@@ -593,6 +593,17 @@ public synchronized void testDeleteWithSerializableIsolation() throws Interrupte
 
     sql("ALTER TABLE %s SET TBLPROPERTIES('%s' '%s')", tableName, DELETE_ISOLATION_LEVEL, "serializable");
 
+    // Pre-populate the table to force it to use the DeltaWriter instead of Metadata-Only Delete

Review comment:
       Done.  I am thinking for a little bit, but I am guessing did you mean change the comment?  (comment changed)




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] KarlManong edited a comment on pull request #4123: Fix random test failure of testDeleteWithSerializableIsolation

Posted by GitBox <gi...@apache.org>.
KarlManong edited a comment on pull request #4123:
URL: https://github.com/apache/iceberg/pull/4123#issuecomment-1042685511


   > I agree with the analysis. I had one nit but LGTM otherwise. Thanks, @szehon-ho for fixing and @KarlManong for reporting.
   
   My pleasure. Somehow I can't reproduce now.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] szehon-ho commented on pull request #4123: Fix random test failure of testDeleteWithSerializableIsolation

Posted by GitBox <gi...@apache.org>.
szehon-ho commented on pull request #4123:
URL: https://github.com/apache/iceberg/pull/4123#issuecomment-1043761316


   @aokolnychyi thanks for review!  whenever you have time for another look


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] aokolnychyi commented on a change in pull request #4123: Fix random test failure of testDeleteWithSerializableIsolation

Posted by GitBox <gi...@apache.org>.
aokolnychyi commented on a change in pull request #4123:
URL: https://github.com/apache/iceberg/pull/4123#discussion_r808609535



##########
File path: spark/v3.0/spark-extensions/src/test/java/org/apache/iceberg/spark/extensions/TestDelete.java
##########
@@ -593,6 +593,17 @@ public synchronized void testDeleteWithSerializableIsolation() throws Interrupte
 
     sql("ALTER TABLE %s SET TBLPROPERTIES('%s' '%s')", tableName, DELETE_ISOLATION_LEVEL, "serializable");
 
+    // Pre-populate the table to force it to use the DeltaWriter instead of Metadata-Only Delete

Review comment:
       nit: what about pulling the existing `inputDF` out of the closure and reusing it here?
   
   ```
       List<Integer> ids = ImmutableList.of(1, 2);
       Dataset<Row> inputDF = spark.createDataset(ids, Encoders.INT())
           .withColumnRenamed("value", "id")
           .withColumn("dep", lit("hr"));
   
       // pre-populate the table to avoid metadata deletes
       inputDF.coalesce(1).writeTo(tableName).append();
   
       // existing futures
   ```
   
   We also have to slightly change the error message as `TestDelete` is shared between copy-on-write and merge-on-read implementations. Hence, we may not use `DeltaWriter`.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] aokolnychyi merged pull request #4123: Fix random test failure of testDeleteWithSerializableIsolation

Posted by GitBox <gi...@apache.org>.
aokolnychyi merged pull request #4123:
URL: https://github.com/apache/iceberg/pull/4123


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] KarlManong commented on pull request #4123: Fix random test failure of testDeleteWithSerializableIsolation

Posted by GitBox <gi...@apache.org>.
KarlManong commented on pull request #4123:
URL: https://github.com/apache/iceberg/pull/4123#issuecomment-1042685511


   > I agree with the analysis. I had one nit but LGTM otherwise. Thanks, @szehon-ho for fixing and @KarlManong for reporting.
   
   My pleasure


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] aokolnychyi commented on pull request #4123: Fix random test failure of testDeleteWithSerializableIsolation

Posted by GitBox <gi...@apache.org>.
aokolnychyi commented on pull request #4123:
URL: https://github.com/apache/iceberg/pull/4123#issuecomment-1043987222


   Thanks, @szehon-ho!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] aokolnychyi commented on pull request #4123: Fix random test failure of testDeleteWithSerializableIsolation

Posted by GitBox <gi...@apache.org>.
aokolnychyi commented on pull request #4123:
URL: https://github.com/apache/iceberg/pull/4123#issuecomment-1042498233


   I agree with the analysis. I had one nit but LGTM otherwise. Thanks, @szehon-ho for fixing and @KarlManong for reporting.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org