Posted to user@spark.apache.org by Ralf Heyde <rh...@hubrick.com> on 2015/03/20 14:53:18 UTC

Accessing AWS S3 in Frankfurt (v4 only - AWS4-HMAC-SHA256)

Hey,

We want to run a job, accessing S3, from EC2 instances. The job runs on a
self-provided Spark cluster (1.3.0) on EC2 instances. In Ireland everything
works as expected.

I just tried to move data from Ireland to Frankfurt. AWS S3 enforces v4 of
its API there, which means access is only possible via AWS4-HMAC-SHA256.

That in itself is fine, but I don't get access there. What I have already tried:

I tried all of the approaches below with each of these URLs:
A) "s3n://<key>:<secret>@<bucket>/<path>/"
B) "s3://<key>:<secret>@<bucket>/<path>/"
C) "s3n://<bucket>/<path>/"
D) "s3://<bucket>/<path>/"

1a. setting environment variables in the operating system
1b. found a suggestion to set the access key/secret in SparkConf like this (I
suspect this has no effect):
   sc.set("AWS_ACCESS_KEY_ID", id)
   sc.set("AWS_SECRET_ACCESS_KEY", secret)

2. tried to use a more up-to-date jets3t client (somehow I was not able to
get the "new" version running; see the jets3t.properties sketch after this list)
3. tried in-URL basic authentication (A+B)
4a. setting the Hadoop configuration:
hadoopConfiguration.set("fs.s3n.impl",
"org.apache.hadoop.fs.s3.S3FileSystem");
hadoopConfiguration.set("fs.s3n.awsAccessKeyId", key);
hadoopConfiguration.set("fs.s3n.awsSecretAccessKey", secret);

hadoopConfiguration.set("fs.s3.impl",
"org.apache.hadoop.fs.s3.S3FileSystem");
hadoopConfiguration.set("fs.s3.awsAccessKeyId", "myAccessKey");
hadoopConfiguration.set("fs.s3.awsSecretAccessKey", "myAccessSecret");

-->
Caused by: org.jets3t.service.S3ServiceException: S3 GET failed for
'/%2FEAN%2F2015-03-09-72640385%2Finput%2FHotelImageList.gz' XML Error
Message: <?xml version="1.0"
encoding="UTF-8"?><Error><Code>InvalidRequest</Code><Message>The
authorization mechanism you have provided is not supported. Please use
AWS4-HMAC-SHA256.</Message><RequestId>43F8F02E767DC4A2</RequestId><HostId>wgMeAEYcZZa/2BazQ9TA+PAkUxt5l+ExnT4Emb+1Uk5KhWfJu5C8Xcesm1AXCfJ9nZJMyh4wPX8=</HostId></Error>

4b. setting the Hadoop configuration with the NativeS3FileSystem instead:
hadoopConfiguration.set("fs.s3n.impl",
"org.apache.hadoop.fs.s3native.NativeS3FileSystem");
hadoopConfiguration.set("fs.s3n.awsAccessKeyId", key);
hadoopConfiguration.set("fs.s3n.awsSecretAccessKey", secret);

hadoopConfiguration.set("fs.s3.impl",
"org.apache.hadoop.fs.s3native.NativeS3FileSystem");
hadoopConfiguration.set("fs.s3.awsAccessKeyId", "myAccessKey");
hadoopConfiguration.set("fs.s3.awsSecretAccessKey", "myAccessSecret");

-->
Caused by: org.jets3t.service.S3ServiceException: S3 HEAD request failed
for '/EAN%2F2015-03-09-72640385%2Finput%2FHotelImageList.gz' -
ResponseCode=400, ResponseMessage=Bad Request

5. without any Hadoop config:
Exception in thread "main" java.lang.IllegalArgumentException: AWS Access
Key ID and Secret Access Key must be specified as the username or password
(respectively) of a s3 URL, or by setting the fs.s3.awsAccessKeyId or
fs.s3.awsSecretAccessKey properties (respectively).

6. without Hadoop config, but with credentials passed in the S3 URL:
with A) Exception in thread "main" org.apache.hadoop.fs.s3.S3Exception:
org.jets3t.service.S3ServiceException: S3 HEAD request failed for
'/EAN%2F2015-03-09-72640385%2Finput%2FHotelImageList.gz' -
ResponseCode=400, ResponseMessage=Bad Request
with B) Exception in thread "main" java.lang.IllegalArgumentException: AWS
Access Key ID and Secret Access Key must be specified as the username or
password (respectively) of a s3 URL, or by setting the fs.s3.awsAccessKeyId
or fs.s3.awsSecretAccessKey properties (respectively).
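
(The jets3t.properties sketch mentioned in attempt 2: JetS3t reads an
optional jets3t.properties file from the classpath. Assuming a JetS3t
version recent enough to understand these keys — the names below are from
the JetS3t configuration documentation, and I'd expect 0.9.4 to be the
first release that knows the signature-version one — Frankfurt would need
something like:

s3service.s3-endpoint=s3.eu-central-1.amazonaws.com
storage-service.request-signature-version=AWS4-HMAC-SHA256

Untested against Frankfurt; treat it as a starting point, not a recipe.)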


Drilling down into the job, I can see that the RestStorageService recognizes
AWS4-HMAC-SHA256, but somehow it still gets a ResponseCode 400 (log below).
I replaced the key / encoded secret with XXX_*_XXX:

15/03/20 11:25:31 WARN RestStorageService: Retrying request with
"AWS4-HMAC-SHA256" signing mechanism: GET
https://frankfurt.ingestion.batch.s3.amazonaws.com:443/?max-keys=1&prefix=EAN%2F2015-03-09-72640385%2Finput%2FHotelImageList.gz%2F&delimiter=%2F
HTTP/1.1
15/03/20 11:25:31 WARN RestStorageService: Retrying request following error
response: GET
'/?max-keys=1&prefix=EAN/2015-03-09-72640385/input/HotelImageList.gz/&delimiter=/'
-- ResponseCode: 400, ResponseStatus: Bad Request, Request Headers: [Date:
Fri, 20 Mar 2015 11:25:31 GMT, Authorization: AWS
XXX_MY_KEY_XXX:XXX_I_GUESS_SECRET_XXX], Response Headers:
[x-amz-request-id: 7E6F85873D69D14E, x-amz-id-2:
rGFW+kRfURzz3DlY/m/M8h054MmHu8bxJAtKVHUmov/VY7pBXvtMvbQTXxA7bffpu4xxf4rGmL4=,
x-amz-region: eu-central-1, Content-Type: application/xml,
Transfer-Encoding: chunked, Date: Fri, 20 Mar 2015 11:25:31 GMT,
Connection: close, Server: AmazonS3]
15/03/20 11:25:32 WARN RestStorageService: Retrying request after automatic
adjustment of Host endpoint from "frankfurt.ingestion.batch.s3.amazonaws.com"
to "frankfurt.ingestion.batch.s3-eu-central-1.amazonaws.com" following
request signing error using AWS request signing version 4: GET
https://frankfurt.ingestion.batch.s3-eu-central-1.amazonaws.com:443/?max-keys=1&prefix=EAN/2015-03-09-72640385/input/HotelImageList.gz/&delimiter=/
HTTP/1.1
15/03/20 11:25:32 WARN RestStorageService: Retrying request following error
response: GET
'/?max-keys=1&prefix=EAN/2015-03-09-72640385/input/HotelImageList.gz/&delimiter=/'
-- ResponseCode: 400, ResponseStatus: Bad Request, Request Headers: [Date:
Fri, 20 Mar 2015 11:25:31 GMT, x-amz-content-sha256:
e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855, Host:
frankfurt.ingestion.batch.s3.amazonaws.com, x-amz-date: 20150320T112531Z,
Authorization: AWS4-HMAC-SHA256
Credential=XXX_MY_KEY_XXX/20150320/us-east-1/s3/aws4_request,SignedHeaders=date;host;x-amz-content-sha256;x-amz-date,Signature=2098d3175c4304e44be912b770add7594d1d1b44f545c3025be1748672ec60e4],
Response Headers: [x-amz-request-id: 5CABCD0D3046B267, x-amz-id-2:
V65tW1lbSybbN3R3RMKBjJFz7xUgJDubSUm/XKXTypg7qfDtkSFRt2I9CMo2Qo2OAA+E44hiazg=,
Content-Type: application/xml, Transfer-Encoding: chunked, Date: Fri, 20
Mar 2015 11:25:32 GMT, Connection: close, Server: AmazonS3]
Exception in thread "main" org.apache.hadoop.mapred.InvalidInputException:
Input path does not exist:
s3n://frankfurt.ingestion.batch/EAN/2015-03-09-72640385/input/HotelImageList.gz


Do you have any ideas? Has anybody here already been able to access S3 in
Frankfurt, and if so, how?

Cheers Ralf

Re: Accessing AWS S3 in Frankfurt (v4 only - AWS4-HMAC-SHA256)

Posted by Steve Loughran <st...@hortonworks.com>.
1. make sure your secret key doesn't have a "/" in it. If it does, generate a new key.
2. jets3t and Hadoop JAR versions need to be in sync; jets3t 0.9.0 was picked up in Hadoop 2.4 and has not been updated since, AFAIK.
3. Hadoop 2.6 has a new S3 client, "s3a", which is compatible with s3n data. It uses the AWS SDK instead of JetS3t, and that is where all future development is going. Assuming it is up to date with the AWS toolkit, it will do the auth. Not knowingly tested against Frankfurt though; just Ireland, US East, US West & Japan. S3a still has some quirks being worked through; HADOOP-11571 lists the ones fixed so far.
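
A minimal, untested sketch of what point 3 could look like from the Spark
driver, assuming the Hadoop 2.6 hadoop-aws module and its AWS SDK dependency
are on the classpath; the fs.s3a.* property names are the Hadoop 2.6 ones,
and the endpoint is the documented one for eu-central-1. Note that in the
log above the request is signed with the scope
".../20150320/us-east-1/s3/aws4_request" although the bucket lives in
eu-central-1, so telling the client its regional endpoint explicitly is
likely to matter:

// sketch only, not a verified recipe
val hc = sc.hadoopConfiguration
hc.set("fs.s3a.impl", "org.apache.hadoop.fs.s3a.S3AFileSystem")
hc.set("fs.s3a.access.key", key)
hc.set("fs.s3a.secret.key", secret)
// sign requests for eu-central-1 rather than the us-east-1 default
hc.set("fs.s3a.endpoint", "s3.eu-central-1.amazonaws.com")
val lines = sc.textFile("s3a://<bucket>/<path>/")

Whether the bundled AWS SDK actually signs with AWS4-HMAC-SHA256 depends on
its version, as point 3 notes.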

On 20 Mar 2015, at 15:15, Ralf Heyde <rh...@hubrick.com> wrote:
> [earlier thread quoted in full; trimmed, see the messages above and below]

Re: Accessing AWS S3 in Frankfurt (v4 only - AWS4-HMAC-SHA256)

Posted by Ralf Heyde <rh...@hubrick.com>.
Good idea, will try that.
But assuming "only" the data is located there, the problem will still occur.

On Fri, Mar 20, 2015 at 3:08 PM, Gourav Sengupta <go...@gmail.com> wrote:
> [Gourav's message quoted in full; trimmed, see his message below]

Re: Accessing AWS S3 in Frankfurt (v4 only - AWS4-HMAC-SHA256)

Posted by Gourav Sengupta <go...@gmail.com>.
Hi Ralf,

Using secret keys and authorization details in code is a strict NO for AWS;
they are major security lapses and should be avoided at all costs.

Have you tried starting the clusters using IAM ROLES? They are a wonderful
way to start clusters or EC2 nodes, and you do not have to copy and paste
any permissions either.

Try going through this article in AWS:
http://docs.aws.amazon.com/datapipeline/latest/DeveloperGuide/dp-iam-roles.html
(it is written for Data Pipeline, but it shows the correct set of permissions
to enable).

I start EC2 nodes using roles (as described in the link above) and run the
aws cli commands without copying any keys or files; a sketch of what that
can look like follows.
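
A sketch of the idea (the role name "my-spark-role" and the AMI id are
placeholders; the flags are standard aws-cli ones):

aws ec2 run-instances \
    --image-id ami-12345678 \
    --instance-type m3.xlarge \
    --iam-instance-profile Name=my-spark-role
# on the instance, the CLI and SDKs fetch temporary credentials from the
# instance metadata service, so no keys need to be copied:
aws s3 ls s3://<bucket>/<path>/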

Please let me know if the issue was resolved.

Regards,
Gourav

On Fri, Mar 20, 2015 at 1:53 PM, Ralf Heyde <rh...@hubrick.com> wrote:
> [original message quoted in full; trimmed, see the top of this thread]