Posted to user@spark.apache.org by Jonhy Stack <so...@gmail.com> on 2017/03/07 15:21:37 UTC

(python) Spark .textFile(s3://…) access denied 403 with valid credentials

In order to access my S3 bucket I have exported my credentials:

    export AWS_SECRET_ACCESS_KEY=
    export AWS_ACCESS_KEY_ID=

I can verify that everything works by doing:

    aws s3 ls mybucket
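
As a quick sanity check that the same variables are visible to the Python driver process (a minimal sketch, not from the original post; it only assumes the standard AWS environment variable names):

    import os

    # Confirm the exported credentials are visible to this process,
    # without printing the secret itself.
    print("access key:", os.environ.get("AWS_ACCESS_KEY_ID"))
    print("secret set:", "AWS_SECRET_ACCESS_KEY" in os.environ)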

I can also verify with boto3 that it works in Python:

    import boto3

    resource = boto3.resource("s3", region_name="us-east-1")
    resource.Object("mybucket", "text/text.py") \
            .put(Body=open("text.py", "rb"), ContentType="text/x-py")

This works and I can see the file in the bucket.
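
To also rule out a read-permission problem with the same credentials (textFile only needs to read), a boto3 read-back works as well; this sketch assumes the same bucket and key as above:

    # Fetch the object just uploaded and read the first bytes back.
    obj = resource.Object("mybucket", "text/text.py").get()
    print(obj["Body"].read()[:80])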

However, when I do this with Spark:

    from pyspark import SparkContext
    from pyspark.sql import SQLContext

    spark_context = SparkContext()
    sql_context = SQLContext(spark_context)
    spark_context.textFile("s3://mybucket/my/path/*")

I get a nice

    > Caused by: org.jets3t.service.S3ServiceException: Service Error
    > Message. -- ResponseCode: 403, ResponseStatus: Forbidden, XML Error
    > Message: <?xml version="1.0"
    > encoding="UTF-8"?><Error><Code>InvalidAccessKeyId</Code><Message>The
    > AWS Access Key Id you provided does not exist in our
    > records.</Message><AWSAccessKeyId>[MY_ACCESS_KEY]</AWSAccessKeyId>
    > <RequestId>XXXXX</RequestId><HostId>xxxxxxx</HostId></Error>

This is how I submit the job locally:

    spark-submit --packages com.amazonaws:aws-java-sdk-pom:1.11.98,org.apache.hadoop:hadoop-aws:2.7.3 test.py

Why does it work with the command line and boto3, but Spark is choking?

Re: (python) Spark .textFile(s3://…) access denied 403 with valid credentials

Posted by Amjad ALSHABANI <as...@gmail.com>.
Hi Jonhy,

What is the master you are using with spark-submit?

I've had this problem before because Spark (unlike the CLI and boto3) was
running in YARN distributed mode (--master yarn), so the keys were not
copied to all the executors' nodes. I had to submit my Spark job as
follows:

    $ spark-submit --master yarn-client \
        --conf "spark.executor.extraJavaOptions=-Daws.accessKeyId=ACCESSKEY -Daws.secretKey=SECRETKEY" \
        ....
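
The same thing can also be set from the PySpark code instead of the command line; here is a rough equivalent sketch (ACCESSKEY/SECRETKEY are placeholders, as above):

    from pyspark import SparkConf, SparkContext

    # Pass the AWS keys to every executor JVM as system properties,
    # mirroring the --conf flag in the spark-submit command above.
    conf = (SparkConf()
            .set("spark.executor.extraJavaOptions",
                 "-Daws.accessKeyId=ACCESSKEY -Daws.secretKey=SECRETKEY"))
    spark_context = SparkContext(conf=conf)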

I hope this will help


Amjad
