You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2020/12/02 17:54:51 UTC

[GitHub] [iceberg] jackye1995 opened a new pull request #1863: AWS: support S3 strong consistency

jackye1995 opened a new pull request #1863:
URL: https://github.com/apache/iceberg/pull/1863


   AWS released S3 strong consistency, and now all LIST and GET are strongly consistent. So we can use a LIST to force strong consistency before performing a HEAD for `exists()` check.
   
   More details: https://aws.amazon.com/blogs/aws/amazon-s3-update-strong-read-after-write-consistency/


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] jackye1995 commented on a change in pull request #1863: AWS: support S3 strong consistency

Posted by GitBox <gi...@apache.org>.
jackye1995 commented on a change in pull request #1863:
URL: https://github.com/apache/iceberg/pull/1863#discussion_r534559328



##########
File path: aws/src/main/java/org/apache/iceberg/aws/s3/BaseS3File.java
##########
@@ -75,13 +76,14 @@ public boolean exists() {
     }
   }
 
-  protected HeadObjectResponse getObjectMetadata() throws S3Exception {
+  protected S3Object getObjectMetadata() throws S3Exception {
     if (metadata == null) {
-      HeadObjectRequest.Builder requestBuilder = HeadObjectRequest.builder()
+      ListObjectsV2Response response = client().listObjectsV2(ListObjectsV2Request.builder()
           .bucket(uri().bucket())
-          .key(uri().key());
-      S3RequestUtil.configureEncryption(awsProperties, requestBuilder);
-      metadata = client().headObject(requestBuilder.build());
+          .prefix(uri().key())
+          .maxKeys(1)
+          .build());
+      metadata = response.hasContents() ? response.contents().get(0) : null;

Review comment:
       Yeah I was going to ask, I saw in `HadoopInputFile` it simply throws a `RuntimeIOException` when asking for length of a file not exist. Probably better to check existence and throw `NotFoundException` in this case.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on a change in pull request #1863: AWS: support S3 strong consistency

Posted by GitBox <gi...@apache.org>.
rdblue commented on a change in pull request #1863:
URL: https://github.com/apache/iceberg/pull/1863#discussion_r535415969



##########
File path: aws/src/main/java/org/apache/iceberg/aws/s3/BaseS3File.java
##########
@@ -75,13 +76,14 @@ public boolean exists() {
     }
   }
 
-  protected HeadObjectResponse getObjectMetadata() throws S3Exception {
+  protected S3Object getObjectMetadata() throws S3Exception {
     if (metadata == null) {
-      HeadObjectRequest.Builder requestBuilder = HeadObjectRequest.builder()
+      ListObjectsV2Response response = client().listObjectsV2(ListObjectsV2Request.builder()
           .bucket(uri().bucket())
-          .key(uri().key());
-      S3RequestUtil.configureEncryption(awsProperties, requestBuilder);
-      metadata = client().headObject(requestBuilder.build());
+          .prefix(uri().key())
+          .maxKeys(1)
+          .build());
+      metadata = response.hasContents() ? response.contents().get(0) : null;

Review comment:
       I agree, `NotFoundException` would be much better.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] jackye1995 commented on a change in pull request #1863: AWS: support S3 strong consistency

Posted by GitBox <gi...@apache.org>.
jackye1995 commented on a change in pull request #1863:
URL: https://github.com/apache/iceberg/pull/1863#discussion_r534549302



##########
File path: aws/src/main/java/org/apache/iceberg/aws/s3/BaseS3File.java
##########
@@ -77,6 +78,12 @@ public boolean exists() {
 
   protected HeadObjectResponse getObjectMetadata() throws S3Exception {
     if (metadata == null) {
+      // list object to force strong consistency
+      client().listObjectsV2(ListObjectsV2Request.builder()
+          .bucket(uri().bucket())
+          .prefix(uri().key())

Review comment:
       Do we want to validate, or just set metadata to null so that `exists()` return false? I understand that it is not likely that a file name is a prefix of another file, but we cannot eliminate the possibility of some location provider with such behavior.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] danielcweeks commented on a change in pull request #1863: AWS: support S3 strong consistency

Posted by GitBox <gi...@apache.org>.
danielcweeks commented on a change in pull request #1863:
URL: https://github.com/apache/iceberg/pull/1863#discussion_r534498614



##########
File path: aws/src/main/java/org/apache/iceberg/aws/s3/BaseS3File.java
##########
@@ -77,6 +78,12 @@ public boolean exists() {
 
   protected HeadObjectResponse getObjectMetadata() throws S3Exception {
     if (metadata == null) {
+      // list object to force strong consistency
+      client().listObjectsV2(ListObjectsV2Request.builder()
+          .bucket(uri().bucket())
+          .prefix(uri().key())

Review comment:
       you might want to include `maxKeys(1)` as well since we only care about a single entry.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] danielcweeks commented on pull request #1863: AWS: support S3 strong consistency

Posted by GitBox <gi...@apache.org>.
danielcweeks commented on pull request #1863:
URL: https://github.com/apache/iceberg/pull/1863#issuecomment-738132358


   Thanks @jackye1995 


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] jackye1995 commented on pull request #1863: AWS: support S3 strong consistency

Posted by GitBox <gi...@apache.org>.
jackye1995 commented on pull request #1863:
URL: https://github.com/apache/iceberg/pull/1863#issuecomment-737625728


   > Would it make sense to test for the contents of a file change after put (and not just the length being the same like is currently tested)?
   
   What situation does it try to cover in the context of this PR?
   (I do have more tests for S3FileIO that I plan to port here in another PR, which checks the contents of a file as you describe)


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] danielcweeks merged pull request #1863: AWS: support S3 strong consistency

Posted by GitBox <gi...@apache.org>.
danielcweeks merged pull request #1863:
URL: https://github.com/apache/iceberg/pull/1863


   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] danielcweeks commented on pull request #1863: AWS: support S3 strong consistency

Posted by GitBox <gi...@apache.org>.
danielcweeks commented on pull request #1863:
URL: https://github.com/apache/iceberg/pull/1863#issuecomment-737507316


   I agree with @rdblue's comment that we probably don't want to do two requests and LIST alone would be sufficient.
   
   Also, I'm not clear on how forcing a LIST would ensure correctness for a following head call.  Based on the docs I didn't see anything about that sequencing resulting in HEAD consistency.  


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] jackye1995 commented on pull request #1863: AWS: support S3 strong consistency

Posted by GitBox <gi...@apache.org>.
jackye1995 commented on pull request #1863:
URL: https://github.com/apache/iceberg/pull/1863#issuecomment-737552943


   > It looks like it should be possible to use just a LIST call here because we don't need the full metadata from the HEAD request. Metadata is only used for the existence check and for the file size, which should be available from the list response:
   > 
   > ```java
   >       // list object to force strong consistency
   >       ListObjectsV2Response response = client().listObjectsV2(ListObjectsV2Request.builder()
   >           .bucket(uri().bucket())
   >           .prefix(uri().key())
   >           .build());
   > 
   >       Preconditions.checkState(response.contents().size() <= 1, "Invalid file: %s contains more than one object", uri);
   >       this.exists = response.contents().size() > 0;
   >       this.length = response.contents().get(0).size();
   > ```
   
   Thanks for the suggestion, did not know LIST also gets the length metadata.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] danielcweeks commented on a change in pull request #1863: AWS: support S3 strong consistency

Posted by GitBox <gi...@apache.org>.
danielcweeks commented on a change in pull request #1863:
URL: https://github.com/apache/iceberg/pull/1863#discussion_r534550155



##########
File path: aws/src/main/java/org/apache/iceberg/aws/s3/BaseS3File.java
##########
@@ -75,13 +76,14 @@ public boolean exists() {
     }
   }
 
-  protected HeadObjectResponse getObjectMetadata() throws S3Exception {
+  protected S3Object getObjectMetadata() throws S3Exception {
     if (metadata == null) {
-      HeadObjectRequest.Builder requestBuilder = HeadObjectRequest.builder()
+      ListObjectsV2Response response = client().listObjectsV2(ListObjectsV2Request.builder()
           .bucket(uri().bucket())
-          .key(uri().key());
-      S3RequestUtil.configureEncryption(awsProperties, requestBuilder);
-      metadata = client().headObject(requestBuilder.build());
+          .prefix(uri().key())
+          .maxKeys(1)
+          .build());
+      metadata = response.hasContents() ? response.contents().get(0) : null;

Review comment:
       I think this changes the exception behavior a little bit because now `S3InputFile.getLength()` will throw NPE as opposed to S3 NOT_FOUND in the event it does not exist and `getLength()` is called.  Not a huge issue, but we might consider improving that a little.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] jackye1995 commented on a change in pull request #1863: AWS: support S3 strong consistency

Posted by GitBox <gi...@apache.org>.
jackye1995 commented on a change in pull request #1863:
URL: https://github.com/apache/iceberg/pull/1863#discussion_r534559385



##########
File path: aws/src/main/java/org/apache/iceberg/aws/s3/BaseS3File.java
##########
@@ -77,6 +78,12 @@ public boolean exists() {
 
   protected HeadObjectResponse getObjectMetadata() throws S3Exception {
     if (metadata == null) {
+      // list object to force strong consistency
+      client().listObjectsV2(ListObjectsV2Request.builder()
+          .bucket(uri().bucket())
+          .prefix(uri().key())

Review comment:
       See the updated logic for more details




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] danielcweeks commented on a change in pull request #1863: AWS: support S3 strong consistency

Posted by GitBox <gi...@apache.org>.
danielcweeks commented on a change in pull request #1863:
URL: https://github.com/apache/iceberg/pull/1863#discussion_r534498614



##########
File path: aws/src/main/java/org/apache/iceberg/aws/s3/BaseS3File.java
##########
@@ -77,6 +78,12 @@ public boolean exists() {
 
   protected HeadObjectResponse getObjectMetadata() throws S3Exception {
     if (metadata == null) {
+      // list object to force strong consistency
+      client().listObjectsV2(ListObjectsV2Request.builder()
+          .bucket(uri().bucket())
+          .prefix(uri().key())

Review comment:
       you might want to include `maxKeys(1)` as well since we only care about a single entry (and validate that it matches exactly)




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org


[GitHub] [iceberg] rdblue commented on pull request #1863: AWS: support S3 strong consistency

Posted by GitBox <gi...@apache.org>.
rdblue commented on pull request #1863:
URL: https://github.com/apache/iceberg/pull/1863#issuecomment-737484633


   It looks like it should be possible to use just a LIST call here because we don't need the full metadata from the HEAD request. Metadata is only used for the existence check and for the file size, which should be available from the list response:
   
   ```java
         // list object to force strong consistency
         ListObjectsV2Response response = client().listObjectsV2(ListObjectsV2Request.builder()
             .bucket(uri().bucket())
             .prefix(uri().key())
             .build());
   
         Preconditions.checkState(response.contents().size() <= 1, "Invalid file: %s contains more than one object", uri);
         this.exists = response.contents().size() > 0;
         this.length = response.contents().get(0).size();
   ```


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@iceberg.apache.org
For additional commands, e-mail: issues-help@iceberg.apache.org