Posted to common-issues@hadoop.apache.org by "Anis Elleuch (JIRA)" <ji...@apache.org> on 2018/02/27 01:20:00 UTC
[jira] [Updated] (HADOOP-15267) S3A fails to store my data when
multipart size is set to 5 MB and SSE-C encryption is enabled
[ https://issues.apache.org/jira/browse/HADOOP-15267?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Anis Elleuch updated HADOOP-15267:
----------------------------------
Description:
When I enable SSE-C encryption in Hadoop 3.1 and set fs.s3a.multipart.size to 5 MB, storing data in AWS no longer works. For example, running the following code:
{code}
>>> df1 = spark.read.json('/home/user/people.json')
>>> df1.write.mode("overwrite").json("s3a://testbucket/people.json")
{code}
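For reference, the configuration that triggers the failure looks roughly like this (core-site.xml; property names as documented for the Hadoop 3.1 S3A connector, and the key value is a placeholder):
{code:xml}
<!-- S3A with SSE-C and the minimum 5 MB multipart size -->
<property>
  <name>fs.s3a.server-side-encryption-algorithm</name>
  <value>SSE-C</value>
</property>
<property>
  <name>fs.s3a.server-side-encryption.key</name>
  <value>BASE64-ENCODED-256-BIT-KEY</value>  <!-- placeholder -->
</property>
<property>
  <name>fs.s3a.multipart.size</name>
  <value>5242880</value>  <!-- 5 MB, the S3 minimum part size -->
</property>
{code}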
shows the following exception:
{code:java}
com.amazonaws.services.s3.model.AmazonS3Exception: The multipart upload initiate requested encryption. Subsequent part requests must include the appropriate encryption parameters.
{code}
After some investigation, I discovered that hadoop-aws doesn't send the SSE-C headers in the Upload Part requests, as required by the AWS specification: [https://docs.aws.amazon.com/AmazonS3/latest/API/mpUploadUploadPart.html]
{quote}
If you requested server-side encryption using a customer-provided encryption key in your initiate multipart upload request, you must provide identical encryption information in each part upload using the following headers.
{quote}
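For context, here is a minimal sketch of the required behavior at the SDK level, using the AWS SDK for Java v1 that hadoop-aws builds on. Class and variable names are illustrative; this is not the attached patch itself:
{code:java}
import java.io.File;

import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.model.InitiateMultipartUploadRequest;
import com.amazonaws.services.s3.model.SSECustomerKey;
import com.amazonaws.services.s3.model.UploadPartRequest;

// Illustrative sketch only: the same SSE-C key must accompany the
// initiate request *and* every subsequent part upload.
class SseCMultipartSketch {
    static void upload(AmazonS3 s3, String bucket, String key,
                       File part, String base64Key) {
        SSECustomerKey sseKey = new SSECustomerKey(base64Key);

        // Initiate: hadoop-aws already attaches the key here.
        String uploadId = s3.initiateMultipartUpload(
                new InitiateMultipartUploadRequest(bucket, key)
                        .withSSECustomerKey(sseKey))
                .getUploadId();

        // Part upload: omitting withSSECustomerKey(...) here is what
        // produces the AmazonS3Exception shown above.
        s3.uploadPart(new UploadPartRequest()
                .withBucketName(bucket)
                .withKey(key)
                .withUploadId(uploadId)
                .withPartNumber(1)
                .withPartSize(part.length())
                .withFile(part)
                .withSSECustomerKey(sseKey));
        // ... remaining parts and CompleteMultipartUpload elided ...
    }
}
{code}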
A patch is attached to this issue to clarify and address the problem.
was:
With Spark on Hadoop 3.1.0, when I enable SSE-C encryption and set fs.s3a.multipart.size to 5 MB, storing data in AWS no longer works. For example, running the following code:
{code}
>>> df1 = spark.read.json('/home/user/people.json')
>>> df1.write.mode("overwrite").json("s3a://testbucket/people.json")
{code}
shows the following exception:
{code:java}
com.amazonaws.services.s3.model.AmazonS3Exception: The multipart upload initiate requested encryption. Subsequent part requests must include the appropriate encryption parameters.
{code}
After some investigation, I discovered that hadoop-aws doesn't send the SSE-C headers in the Upload Part requests, as required by the AWS specification: [https://docs.aws.amazon.com/AmazonS3/latest/API/mpUploadUploadPart.html]
{quote}
If you requested server-side encryption using a customer-provided encryption key in your initiate multipart upload request, you must provide identical encryption information in each part upload using the following headers.
{quote}
A patch is attached to this issue to clarify and address the problem.
> S3A fails to store my data when multipart size is set to 5 MB and SSE-C encryption is enabled
> ---------------------------------------------------------------------------------------------
>
> Key: HADOOP-15267
> URL: https://issues.apache.org/jira/browse/HADOOP-15267
> Project: Hadoop Common
> Issue Type: Bug
> Components: fs/s3
> Affects Versions: 3.1.0
> Environment: Hadoop 3.1 Snapshot
> Reporter: Anis Elleuch
> Priority: Critical
> Attachments: hadoop-fix.patch
>
>
> When I enable SSE-C encryption in Hadoop 3.1 and set fs.s3a.multipart.size to 5 MB, storing data in AWS no longer works. For example, running the following code:
> {code}
> >>> df1 = spark.read.json('/home/user/people.json')
> >>> df1.write.mode("overwrite").json("s3a://testbucket/people.json")
> {code}
> shows the following exception:
> {code:java}
> com.amazonaws.services.s3.model.AmazonS3Exception: The multipart upload initiate requested encryption. Subsequent part requests must include the appropriate encryption parameters.
> {code}
> After some investigation, I discovered that hadoop-aws doesn't send the SSE-C headers in the Upload Part requests, as required by the AWS specification: [https://docs.aws.amazon.com/AmazonS3/latest/API/mpUploadUploadPart.html]
> {quote}
> If you requested server-side encryption using a customer-provided encryption key in your initiate multipart upload request, you must provide identical encryption information in each part upload using the following headers.
> {quote}
>
> A patch is attached to this issue to clarify and address the problem.
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
---------------------------------------------------------------------
To unsubscribe, e-mail: common-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: common-issues-help@hadoop.apache.org