Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2022/07/17 04:26:25 UTC

[GitHub] [iceberg] amogh-jahagirdar opened a new pull request, #5289: AWS: Fix TestS3FileIO prefixDelete test. Fix setup of S3 batch deletion tests

amogh-jahagirdar opened a new pull request, #5289:
URL: https://github.com/apache/iceberg/pull/5289

   Fix the TestS3FileIO prefixDelete test and fix the setup of the S3 batch deletion tests. prefixDelete sporadically fails with a 500 (see below). Debugging suggests that, with S3Mock, the file state gets corrupted when deletes and lists run concurrently. I'm still not entirely sure why, but running each of the "scale" prefix tests serially makes the test pass consistently, even though we really should be able to run the different scale tests in parallel without issue since the prefixes are isolated. The failure happens during the validation step of the test, where we list the prefix to verify the objects were deleted.
   
   
   If this is truly an S3Mock issue (which debugging seems to point to), another option for the prefix tests is to remove the unit tests and instead write integration tests that validate against real S3. @danielcweeks @rdblue @jackye1995 let me know your thoughts.
   
   
   Also fixed the setup step for the batch deletion tests: I noticed we aren't actually putting the objects. The tests currently pass only because the final check is that the objects no longer exist, and they were never put in the first place (a sketch of the missing setup step follows the stack trace below).
   
   ```
   org.apache.iceberg.aws.s3.TestS3FileIO > testPrefixDelete FAILED
       software.amazon.awssdk.services.s3.model.S3Exception: null (Service: S3, Status Code: 500, Request ID: null)
           at software.amazon.awssdk.protocols.xml.internal.unmarshall.AwsXmlPredicatedResponseHandler.handleErrorResponse(AwsXmlPredicatedResponseHandler.java:156)
           at software.amazon.awssdk.protocols.xml.internal.unmarshall.AwsXmlPredicatedResponseHandler.handleResponse(AwsXmlPredicatedResponseHandler.java:108)
           at software.amazon.awssdk.protocols.xml.internal.unmarshall.AwsXmlPredicatedResponseHandler.handle(AwsXmlPredicatedResponseHandler.java:85)
           at software.amazon.awssdk.protocols.xml.internal.unmarshall.AwsXmlPredicatedResponseHandler.handle(AwsXmlPredicatedResponseHandler.java:43)
           at software.amazon.awssdk.awscore.client.handler.AwsSyncClientHandler$Crc32ValidationResponseHandler.handle(AwsSyncClientHandler.java:95)
           at software.amazon.awssdk.core.internal.handler.BaseClientHandler.lambda$successTransformationResponseHandler$6(BaseClientHandler.java:234)
           at software.amazon.awssdk.core.internal.http.pipeline.stages.HandleResponseStage.execute(HandleResponseStage.java:40)
           at software.amazon.awssdk.core.internal.http.pipeline.stages.HandleResponseStage.execute(HandleResponseStage.java:30)
           at software.amazon.awssdk.core.internal.http.pipeline.RequestPipelineBuilder$ComposingRequestPipelineStage.execute(RequestPipelineBuilder.java:206)
           at software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallAttemptTimeoutTrackingStage.execute(ApiCallAttemptTimeoutTrackingStage.java:73)
           at software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallAttemptTimeoutTrackingStage.execute(ApiCallAttemptTimeoutTrackingStage.java:42)
           at software.amazon.awssdk.core.internal.http.pipeline.stages.TimeoutExceptionHandlingStage.execute(TimeoutExceptionHandlingStage.java:78)
           at software.amazon.awssdk.core.internal.http.pipeline.stages.TimeoutExceptionHandlingStage.execute(TimeoutExceptionHandlingStage.java:40)
           at software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallAttemptMetricCollectionStage.execute(ApiCallAttemptMetricCollectionStage.java:50)
           at software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallAttemptMetricCollectionStage.execute(ApiCallAttemptMetricCollectionStage.java:36)
           at software.amazon.awssdk.core.internal.http.pipeline.stages.RetryableStage.execute(RetryableStage.java:81)
           at software.amazon.awssdk.core.internal.http.pipeline.stages.RetryableStage.execute(RetryableStage.java:36)
           at software.amazon.awssdk.core.internal.http.pipeline.RequestPipelineBuilder$ComposingRequestPipelineStage.execute(RequestPipelineBuilder.java:206)
           at software.amazon.awssdk.core.internal.http.StreamManagingStage.execute(StreamManagingStage.java:56)
           at software.amazon.awssdk.core.internal.http.StreamManagingStage.execute(StreamManagingStage.java:36)
           at software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallTimeoutTrackingStage.executeWithTimer(ApiCallTimeoutTrackingStage.java:80)
           at software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallTimeoutTrackingStage.execute(ApiCallTimeoutTrackingStage.java:60)
           at software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallTimeoutTrackingStage.execute(ApiCallTimeoutTrackingStage.java:42)
           at software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallMetricCollectionStage.execute(ApiCallMetricCollectionStage.java:48)
           at software.amazon.awssdk.core.internal.http.pipeline.stages.ApiCallMetricCollectionStage.execute(ApiCallMetricCollectionStage.java:31)
           at software.amazon.awssdk.core.internal.http.pipeline.RequestPipelineBuilder$ComposingRequestPipelineStage.execute(RequestPipelineBuilder.java:206)
           at software.amazon.awssdk.core.internal.http.pipeline.RequestPipelineBuilder$ComposingRequestPipelineStage.execute(RequestPipelineBuilder.java:206)
           at software.amazon.awssdk.core.internal.http.pipeline.stages.ExecutionFailureExceptionReportingStage.execute(ExecutionFailureExceptionReportingStage.java:37)
           at software.amazon.awssdk.core.internal.http.pipeline.stages.ExecutionFailureExceptionReportingStage.execute(ExecutionFailureExceptionReportingStage.java:26)
           at software.amazon.awssdk.core.internal.http.AmazonSyncHttpClient$RequestExecutionBuilderImpl.execute(AmazonSyncHttpClient.java:193)
           at software.amazon.awssdk.core.internal.handler.BaseSyncClientHandler.invoke(BaseSyncClientHandler.java:103)
           at software.amazon.awssdk.core.internal.handler.BaseSyncClientHandler.doExecute(BaseSyncClientHandler.java:167)
           at software.amazon.awssdk.core.internal.handler.BaseSyncClientHandler.lambda$execute$1(BaseSyncClientHandler.java:82)
           at software.amazon.awssdk.core.internal.handler.BaseSyncClientHandler.measureApiCallSuccess(BaseSyncClientHandler.java:175)
           at software.amazon.awssdk.core.internal.handler.BaseSyncClientHandler.execute(BaseSyncClientHandler.java:76)
           at software.amazon.awssdk.core.client.handler.SdkSyncClientHandler.execute(SdkSyncClientHandler.java:45)
           at software.amazon.awssdk.awscore.client.handler.AwsSyncClientHandler.execute(AwsSyncClientHandler.java:56)
           at software.amazon.awssdk.services.s3.DefaultS3Client.listObjectsV2(DefaultS3Client.java:6148)
           at software.amazon.awssdk.services.s3.paginators.ListObjectsV2Iterable$ListObjectsV2ResponseFetcher.nextPage(ListObjectsV2Iterable.java:153)
           at software.amazon.awssdk.services.s3.paginators.ListObjectsV2Iterable$ListObjectsV2ResponseFetcher.nextPage(ListObjectsV2Iterable.java:144)
           at software.amazon.awssdk.core.pagination.sync.PaginatedResponsesIterator.next(PaginatedResponsesIterator.java:58)
           at java.util.Spliterators$IteratorSpliterator.tryAdvance(Spliterators.java:1812)
           at java.util.stream.StreamSpliterators$WrappingSpliterator.lambda$initPartialTraversalState$0(StreamSpliterators.java:295)
           at java.util.stream.StreamSpliterators$AbstractWrappingSpliterator.fillBuffer(StreamSpliterators.java:207)
           at java.util.stream.StreamSpliterators$AbstractWrappingSpliterator.doAdvance(StreamSpliterators.java:162)
           at java.util.stream.StreamSpliterators$WrappingSpliterator.tryAdvance(StreamSpliterators.java:301)
   ```
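
   For illustration, a minimal sketch of the setup step the batch deletion tests were missing, assuming a plain AWS SDK v2 S3Client; the class, bucket, and keys below are placeholders rather than the actual TestS3FileIO code:

   ```java
   import java.util.List;

   import software.amazon.awssdk.core.sync.RequestBody;
   import software.amazon.awssdk.services.s3.S3Client;

   // Hypothetical helper, not the real test code: put the objects up front so that the
   // end-of-test "objects no longer exist" assertion is checking a real deletion.
   class BatchDeletionSetupSketch {
     static void putTestObjects(S3Client s3, String bucket, List<String> keys) {
       for (String key : keys) {
         // Without this put, the final existence check passes vacuously.
         s3.putObject(req -> req.bucket(bucket).key(key), RequestBody.fromBytes(new byte[0]));
       }
     }
   }
   ```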




[GitHub] [iceberg] danielcweeks commented on pull request #5289: AWS: Reduce TestS3FileIO prefix test scale factors. Replace with S3FileIO integration tests.

Posted by GitBox <gi...@apache.org>.
danielcweeks commented on PR #5289:
URL: https://github.com/apache/iceberg/pull/5289#issuecomment-1189317963

   Thanks @amogh-jahagirdar (just waiting on checks)




[GitHub] [iceberg] jackye1995 commented on a diff in pull request #5289: AWS: Fix TestS3FileIO prefixDelete test. Fix setup of S3 batch deletion tests

Posted by GitBox <gi...@apache.org>.
jackye1995 commented on code in PR #5289:
URL: https://github.com/apache/iceberg/pull/5289#discussion_r923962203


##########
aws/src/test/java/org/apache/iceberg/aws/s3/TestS3FileIO.java:
##########
@@ -218,8 +219,7 @@ public void testPrefixList() {
   public void testPrefixDelete() {
     String prefix = "s3://bucket/path/to/delete";
     List<Integer> scaleSizes = Lists.newArrayList(0, 5, 1000, 2500);
-
-    scaleSizes.parallelStream().forEach(scale -> {
+    scaleSizes.forEach(scale -> {

Review Comment:
   In that case, can we reduce the number of files created for the local test and create another integration test for the same use case with parallel delete?





[GitHub] [iceberg] amogh-jahagirdar commented on a diff in pull request #5289: AWS: Reduce TestS3FileIO prefix test scale factors. Replace with S3FileIO integration tests.

Posted by GitBox <gi...@apache.org>.
amogh-jahagirdar commented on code in PR #5289:
URL: https://github.com/apache/iceberg/pull/5289#discussion_r924725943


##########
aws/src/test/java/org/apache/iceberg/aws/s3/TestS3FileIO.java:
##########
@@ -217,9 +218,9 @@ public void testPrefixList() {
   @Test
   public void testPrefixDelete() {
     String prefix = "s3://bucket/path/to/delete";
-    List<Integer> scaleSizes = Lists.newArrayList(0, 5, 1000, 2500);
+    List<Integer> scaleSizes = Lists.newArrayList(0, 5, 1000);

Review Comment:
   Good point, I've updated 





[GitHub] [iceberg] amogh-jahagirdar commented on a diff in pull request #5289: AWS: Fix TestS3FileIO prefixDelete test. Fix setup of S3 batch deletion tests

Posted by GitBox <gi...@apache.org>.
amogh-jahagirdar commented on code in PR #5289:
URL: https://github.com/apache/iceberg/pull/5289#discussion_r923958262


##########
aws/src/test/java/org/apache/iceberg/aws/s3/TestS3FileIO.java:
##########
@@ -218,8 +219,7 @@ public void testPrefixList() {
   public void testPrefixDelete() {
     String prefix = "s3://bucket/path/to/delete";
     List<Integer> scaleSizes = Lists.newArrayList(0, 5, 1000, 2500);
-
-    scaleSizes.parallelStream().forEach(scale -> {
+    scaleSizes.forEach(scale -> {

Review Comment:
   Yeah, parallelizing made sense because S3Mock still takes a lot of time in the high-scale case. The unit tests are run concurrently, so I think we're good there.
   
   Since the issue seems to lie with S3Mock, truly validating this behavior calls for integration tests that run against real S3, but for now I think it makes sense to just address the unit tests. Thanks!






[GitHub] [iceberg] danielcweeks commented on a diff in pull request #5289: AWS: Reduce TestS3FileIO prefix test scale factors. Replace with S3FileIO integration tests.

Posted by GitBox <gi...@apache.org>.
danielcweeks commented on code in PR #5289:
URL: https://github.com/apache/iceberg/pull/5289#discussion_r924717267


##########
aws/src/test/java/org/apache/iceberg/aws/s3/TestS3FileIO.java:
##########
@@ -217,9 +218,9 @@ public void testPrefixList() {
   @Test
   public void testPrefixDelete() {
     String prefix = "s3://bucket/path/to/delete";
-    List<Integer> scaleSizes = Lists.newArrayList(0, 5, 1000, 2500);
+    List<Integer> scaleSizes = Lists.newArrayList(0, 5, 1000);

Review Comment:
   Can we make the third scale `1001`? That will actually trigger a single pagination, which would be good to include with minor additional overhead.
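
   For anyone reading along, a rough sketch of why `1001` exercises pagination, assuming default ListObjectsV2 behavior (at most 1,000 keys per page); the class, bucket, and prefix names are placeholders:

   ```java
   import software.amazon.awssdk.services.s3.S3Client;
   import software.amazon.awssdk.services.s3.paginators.ListObjectsV2Iterable;

   // Sketch only: with 1,001 objects under the prefix, the paginator has to fetch a
   // second page, since each ListObjectsV2 response carries at most 1,000 keys.
   class PaginationSketch {
     static long countKeys(S3Client s3, String bucket, String prefix) {
       ListObjectsV2Iterable pages =
           s3.listObjectsV2Paginator(req -> req.bucket(bucket).prefix(prefix));
       return pages.contents().stream().count(); // flattens the keys across all pages
     }
   }
   ```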






[GitHub] [iceberg] danielcweeks commented on a diff in pull request #5289: AWS: Fix TestS3FileIO prefixDelete test. Fix setup of S3 batch deletion tests

Posted by GitBox <gi...@apache.org>.
danielcweeks commented on code in PR #5289:
URL: https://github.com/apache/iceberg/pull/5289#discussion_r923938835


##########
aws/src/test/java/org/apache/iceberg/aws/s3/TestS3FileIO.java:
##########
@@ -218,8 +219,7 @@ public void testPrefixList() {
   public void testPrefixDelete() {
     String prefix = "s3://bucket/path/to/delete";
     List<Integer> scaleSizes = Lists.newArrayList(0, 5, 1000, 2500);
-
-    scaleSizes.parallelStream().forEach(scale -> {
+    scaleSizes.forEach(scale -> {

Review Comment:
   The original rationale for using the parallel stream was to help speed up the tests, because it takes the S3 mock library a considerable amount of time to create 2.5K files (even all locally). The intent of the scale factors was to ensure that pagination in the S3 list calls is working correctly. If these tests are run in parallel with the other project tests, I think it's probably fine to just call sequentially.
   





[GitHub] [iceberg] amogh-jahagirdar commented on a diff in pull request #5289: AWS: Reduce TestS3FileIO prefix test scale factors. Replace with S3FileIO integration tests.

Posted by GitBox <gi...@apache.org>.
amogh-jahagirdar commented on code in PR #5289:
URL: https://github.com/apache/iceberg/pull/5289#discussion_r924087183


##########
aws/src/test/java/org/apache/iceberg/aws/s3/TestS3FileIO.java:
##########
@@ -218,8 +219,7 @@ public void testPrefixList() {
   public void testPrefixDelete() {
     String prefix = "s3://bucket/path/to/delete";
     List<Integer> scaleSizes = Lists.newArrayList(0, 5, 1000, 2500);
-
-    scaleSizes.parallelStream().forEach(scale -> {
+    scaleSizes.forEach(scale -> {

Review Comment:
   Makes sense, pagination is handled by the AWS SDK listObjectsV2Paginator, so that's effectively already unit tested through the AWS SDK. Integration tests can validate the actual behavior, and we can do a parallel delete there as you mentioned. Updated and ran the integration tests, which are passing.
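
   To make that concrete, a hedged sketch of the list-then-batch-delete flow being discussed; this illustrates the SDK paginator plus DeleteObjects batching, not the actual S3FileIO implementation, and the class, bucket, and prefix names are placeholders:

   ```java
   import java.util.List;
   import java.util.stream.Collectors;

   import software.amazon.awssdk.services.s3.S3Client;
   import software.amazon.awssdk.services.s3.model.Delete;
   import software.amazon.awssdk.services.s3.model.ObjectIdentifier;
   import software.amazon.awssdk.services.s3.paginators.ListObjectsV2Iterable;

   // Illustrative only: the paginator hides the continuation-token handling, and each
   // page of at most 1,000 keys also fits the DeleteObjects per-request limit.
   class PrefixDeleteSketch {
     static void deletePrefix(S3Client s3, String bucket, String prefix) {
       ListObjectsV2Iterable pages =
           s3.listObjectsV2Paginator(req -> req.bucket(bucket).prefix(prefix));
       pages.stream().forEach(page -> {
         List<ObjectIdentifier> toDelete = page.contents().stream()
             .map(obj -> ObjectIdentifier.builder().key(obj.key()).build())
             .collect(Collectors.toList());
         if (!toDelete.isEmpty()) {
           s3.deleteObjects(req -> req.bucket(bucket)
               .delete(Delete.builder().objects(toDelete).build()));
         }
       });
     }
   }
   ```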





[GitHub] [iceberg] jackye1995 merged pull request #5289: AWS: Reduce TestS3FileIO prefix test scale factors. Replace with S3FileIO integration tests.

Posted by GitBox <gi...@apache.org>.
jackye1995 merged PR #5289:
URL: https://github.com/apache/iceberg/pull/5289


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org