Posted to user@pig.apache.org by Dan Brickley <da...@danbri.org> on 2011/11/08 23:05:19 UTC

Pig access to Amazon S3 'requester pays' buckets

I understand Pig supports Amazon S3 storage; however, when trying to
access an S3 bucket configured as 'requester pays', I get a 403 access
denied error: "AWS Request ID: ..., AWS Error Code: AccessDenied, AWS
Error Message: Access Denied"

More on 'requester pays' here,
http://docs.amazonwebservices.com/AmazonS3/latest/dev/index.html?ObjectsinRequesterPaysBuckets.html
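
As far as I can tell from those docs, the only difference on the wire
is that the requester must send one extra header, x-amz-request-payer,
with each request, on top of the usual AWS authentication. A rough
sketch with curl (the access key and signature are placeholders, so
this is illustration only, not something that will authenticate as-is):

# Requester-pays GET: note the extra header; signing details elided.
curl -H 'x-amz-request-payer: requester' \
     -H 'Authorization: AWS <access-key>:<signature>' \
     https://<bucket>.s3.amazonaws.com/<key>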

The grunt> shell let me cd to the s3://... path, but 'ls' there
triggers the error. Searching around, this doesn't seem to be a common
problem. Maybe I should be asking on the Amazon forums, but afaik the
S3 support is built in. Is there any prospect this can be made to
work?

thanks for any tips,

Dan

Re: Pig access to Amazon S3 'requester pays' buckets

Posted by Dan Brickley <da...@danbri.org>.
On 12 November 2011 14:09, Dan Brickley <da...@danbri.org> wrote:

> OK, adding
> httpclient.requester-pays-buckets-enabled=true
>
> ...to /home/hadoop/conf/jets3t.properties (which already existed, and
> didn't have that setting)

... I forgot the critical words, '...without success or any change in
behaviour'. Apologies for the extra mail.

(At least this thread is showing up in Google searches now, so maybe
it'll give others some pointers...)

Re: Pig access to Amazon S3 'requester pays' buckets

Posted by Dan Brickley <da...@danbri.org>.
On 9 November 2011 22:01, Dan Brickley <da...@danbri.org> wrote:
> On 9 November 2011 20:59, Daniel Dai <da...@hortonworks.com> wrote:
>> How about "hadoop fs -ls"? Pig relies on Hadoop for S3 access.

hadoop@ip-10-100-246-61:~$  hadoop fs -ls  s3n://commoncrawl-crawl-002/
ls: Access Denied
hadoop@ip-10-100-246-61:~$  hadoop fs -ls  s3://commoncrawl-crawl-002/
ls: Access Denied

I realise I've seen both s3: and s3n: prefixes used; Pig seems to
behave the same with both, and so does 'hadoop fs -ls'.
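
(My understanding, for anyone finding this via search: s3n:// is the
'native' filesystem that reads ordinary S3 objects, while s3:// is
Hadoop's block-format filesystem; both sit on the JetS3t library
underneath, which is presumably why they fail identically here.
Credentials can also be supplied per command rather than via
core-site.xml; a sketch with placeholder keys:

hadoop fs -D fs.s3n.awsAccessKeyId=MY_KEY \
          -D fs.s3n.awsSecretAccessKey=MY_SECRET \
          -ls s3n://commoncrawl-crawl-002/
)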


>> If Hadoop does not support this, the fix should be on the Hadoop side.
>
> Ah, thanks. I've shut down the instance for now; I'll check next time
> it's reactivated. However, from a quick look elsewhere I find
> https://issues.apache.org/jira/browse/HADOOP-6146
> "The JetS3t library is used for the S3 filesystems. We should upgrade
> to the latest version (0.7.1) which has support for Requester Pays
> buckets." ...and that was integrated into Hadoop back in July 2009.
>
> http://www.jets3t.org/toolkit/configuration.html has "Requester Pays Settings":
>
> httpclient.requester-pays-buckets-enabled: When set to true, JetS3t
> will be able to access Requester Pays buckets and the library user
> will be liable for any subsequent S3 request and bandwidth fees.
>
> "The main configuration properties for the JetS3t toolkit and
> applications are stored in a file called jets3t.properties. This file
> specifies advanced communication properties. A properties file with
> default settings is included in the configs directory of the JetS3t release."
>
> It looks from searching around as if jets3t.properties should probably
> live in Hadoop's conf directory. Investigating...

OK, adding

# danbri added
httpclient.requester-pays-buckets-enabled=true

...to /home/hadoop/conf/jets3t.properties (which already existed, and
didn't have that setting)
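
In other words, the whole change amounts to something like this
(sketch; the bucket is the one from earlier in the thread):

echo 'httpclient.requester-pays-buckets-enabled=true' \
  >> /home/hadoop/conf/jets3t.properties
hadoop fs -ls s3n://commoncrawl-crawl-002/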

There are some other possibly relevant fields in there (devpay etc.);
I'll have a read around. But yes, this doesn't seem to be a Pig-specific
issue. Thanks for the help...

cheers,

Dan

Re: Pig access to Amazon S3 'requester pays' buckets

Posted by Dan Brickley <da...@danbri.org>.
On 9 November 2011 20:59, Daniel Dai <da...@hortonworks.com> wrote:
> How about "hadoop fs -ls"? Pig relies on Hadoop for S3 access. If Hadoop
> does not support this, the fix should be on the Hadoop side.

Ah, thanks. I've shut down the instance for now; I'll check next time
it's reactivated. However, from a quick look elsewhere I find
https://issues.apache.org/jira/browse/HADOOP-6146
"The JetS3t library is used for the S3 filesystems. We should upgrade
to the latest version (0.7.1) which has support for Requester Pays
buckets." ...and that was integrated into Hadoop back in July 2009.

http://www.jets3t.org/toolkit/configuration.html has "Requester Pays Settings":

httpclient.requester-pays-buckets-enabled: When set to true, JetS3t
will be able to access Requester Pays buckets and the library user
will be liable for any subsequent S3 request and bandwidth fees.

"The main configuration properties for the JetS3t toolkit and
applications are stored in a file called jets3t.properties. This file
specifies advanced communication properties. A properties file with
default settings is included in the configs directory of the JetS3t release."

It looks from searching around as if jets3t.properties should probably
live in Hadoop's conf directory. Investigating...
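
If my assumption is right that JetS3t reads jets3t.properties from the
Java classpath, and that Hadoop puts its conf directory on that
classpath, then this should be enough to sanity-check the placement:

# is the conf directory actually on Hadoop's classpath?
hadoop classpath | tr ':' '\n' | grep conf
ls -l /home/hadoop/conf/jets3t.properties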

cheers,

Dan

Re: Pig access to Amazon S3 'requester pays' buckets

Posted by Daniel Dai <da...@hortonworks.com>.
How about "hadoop fs -ls"? Pig relies on Hadoop for S3 access. If Hadoop
does not support this, the fix should be on the Hadoop side.
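
For example, straight from the shell (bucket and path are placeholders):

hadoop fs -ls s3n://your-bucket/some/path

If that also gives Access Denied, the problem is below Pig.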

Daniel

On Tue, Nov 8, 2011 at 2:05 PM, Dan Brickley <da...@danbri.org> wrote:

> I understand Pig supports Amazon S3 storage; however, when trying to
> access an S3 bucket configured as 'requester pays', I get a 403 access
> denied error: "AWS Request ID: ..., AWS Error Code: AccessDenied, AWS
> Error Message: Access Denied"
>
> More on 'requester pays' here,
>
> http://docs.amazonwebservices.com/AmazonS3/latest/dev/index.html?ObjectsinRequesterPaysBuckets.html
>
> The grunt> shell let me cd to the s3://... path, but 'ls' there
> triggers the error. Searching around, this doesn't seem to be a common
> problem. Maybe I should be asking on the Amazon forums, but afaik the
> S3 support is built in. Is there any prospect this can be made to
> work?
>
> thanks for any tips,
>
> Dan
>