Posted to common-issues@hadoop.apache.org by "Chris Nauroth (JIRA)" <ji...@apache.org> on 2016/05/01 00:14:12 UTC

[jira] [Commented] (HADOOP-13076) S3A does not perform MD5 verification on stored objects.

    [ https://issues.apache.org/jira/browse/HADOOP-13076?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15265527#comment-15265527 ] 

Chris Nauroth commented on HADOOP-13076:
----------------------------------------

bq. Chris: I believe the MD5 checks are in the SDK download automatically, at least from my cursory review of the AWS code on github;

[~stevel@apache.org], I just checked out that code, and I think you're right.

bq. there's nothing equivalent in upload

I think I found MD5 verification for uploads too, and the SDK is fairly clever about it.  If the caller supplied an MD5, then it sends the Content-MD5 header.  If not, then it calculates the MD5 internally on behalf of the caller, but it does not send the Content-MD5 header, because doing so would require buffering the entire content and finishing the MD5 calculation before sending the HTTP request headers.  Instead, it compares the MD5 it computed against the ETag returned by the S3 service in the HTTP response, so it doesn't need an inefficient buffering strategy.  I found this logic for both plain put and multi-part upload.

https://github.com/aws/aws-sdk-java/blob/1.10.6/aws-java-sdk-s3/src/main/java/com/amazonaws/services/s3/AmazonS3Client.java#L1385-L1505

https://github.com/aws/aws-sdk-java/blob/1.10.6/aws-java-sdk-s3/src/main/java/com/amazonaws/services/s3/AmazonS3Client.java#L2893-L2944

Those links are for the 1.10.6 code, which is our dependency as of Apache Hadoop 2.8.0.  Here are the equivalent links for the 1.7.4 code, which is our dependency in Apache Hadoop versions prior to 2.8.0.

https://github.com/aws/aws-sdk-java/blob/1.7.4.x/src/main/java/com/amazonaws/services/s3/AmazonS3Client.java#L1363-L1419

https://github.com/aws/aws-sdk-java/blob/1.7.4.x/src/main/java/com/amazonaws/services/s3/AmazonS3Client.java#L2678-L2723
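
To illustrate the upload-side check described above, here is a minimal sketch in plain Java.  It is not the SDK's code and it makes no AWS calls: the class name StreamingMd5Check, its methods, and the in-memory "wire" that stands in for the HTTP request body are all hypothetical.  It also assumes the ETag is the hex MD5 of the object, possibly wrapped in double quotes.  The point is only to show how the digest can be computed while the bytes are streamed, then compared against the ETag afterwards, with no need to buffer the content up front.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.security.DigestInputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class StreamingMd5Check {

    /** Hex-encodes a digest (assumed here to be the form the ETag takes). */
    public static String toHex(byte[] digest) {
        StringBuilder sb = new StringBuilder(digest.length * 2);
        for (byte b : digest) {
            sb.append(String.format("%02x", b));
        }
        return sb.toString();
    }

    /**
     * Copies content to the sink in fixed-size chunks while updating an MD5
     * digest, so the whole payload is never held in memory at once.
     * Returns the client-side MD5 as a hex string.
     */
    public static String copyAndDigest(InputStream content, OutputStream sink)
            throws IOException, NoSuchAlgorithmException {
        MessageDigest md5 = MessageDigest.getInstance("MD5");
        try (DigestInputStream in = new DigestInputStream(content, md5)) {
            byte[] buf = new byte[8192];
            int n;
            while ((n = in.read(buf)) != -1) {
                sink.write(buf, 0, n);
            }
        }
        return toHex(md5.digest());
    }

    /** Compares the client-side MD5 with an ETag, ignoring surrounding quotes. */
    public static boolean matchesETag(String clientMd5Hex, String eTag) {
        return clientMd5Hex.equalsIgnoreCase(eTag.replace("\"", ""));
    }

    public static void main(String[] args) throws Exception {
        byte[] data = "hello s3a".getBytes(StandardCharsets.UTF_8);

        // Hypothetical stand-in for the HTTP request body.
        ByteArrayOutputStream wire = new ByteArrayOutputStream();
        String clientMd5 = copyAndDigest(new ByteArrayInputStream(data), wire);

        // Hypothetical stand-in for the service: digest whatever arrived.
        byte[] received = wire.toByteArray();
        String eTag = "\"" + toHex(MessageDigest.getInstance("MD5").digest(received)) + "\"";

        System.out.println(matchesETag(clientMd5, eTag) ? "verified" : "mismatch");
    }
}
```

If the bytes were corrupted between the client and the service, the two digests would differ and the comparison would fail after the upload completes, which is the trade-off of verifying against the response rather than sending Content-MD5 in the request.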

Based on this, I think we can close this as "Not a Problem".  I'll give it a few more days before closing in case anyone else wants to comment.

> S3A does not perform MD5 verification on stored objects.
> --------------------------------------------------------
>
>                 Key: HADOOP-13076
>                 URL: https://issues.apache.org/jira/browse/HADOOP-13076
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: fs/s3
>            Reporter: Chris Nauroth
>
> S3N supports end-to-end checksum verification for stored objects by passing the MD5.  This feature is missing from S3A.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
