Posted to user@spark.apache.org by Jonhy Stack <so...@gmail.com> on 2017/03/07 15:21:37 UTC
(python) Spark .textFile(s3://…) access denied 403 with valid credentials
To access my S3 bucket, I have exported my credentials:
export AWS_SECRET_ACCESS_KEY=
export AWS_ACCESSS_ACCESS_KEY=
I can verify that everything works by running:
aws s3 ls mybucket
I can also verify with boto3 that it works in Python:
import boto3
resource = boto3.resource("s3", region_name="us-east-1")
resource.Object("mybucket", "text/text.py") \
.put(Body=open("text.py", "rb"), ContentType="text/x-py")
This works and I can see the file in the bucket.
However, when I do this with Spark:
from pyspark import SparkContext
from pyspark.sql import SQLContext
spark_context = SparkContext()
sql_context = SQLContext(spark_context)
spark_context.textFile("s3://mybucket/my/path/*")
I get a nice
> Caused by: org.jets3t.service.S3ServiceException: Service Error
> Message. -- ResponseCode: 403, ResponseStatus: Forbidden, XML Error
> Message: <?xml version="1.0"
> encoding="UTF-8"?><Error><Code>InvalidAccessKeyId</Code><Message>The
> AWS Access Key Id you provided does not exist in our
> records.</Message><AWSAccessKeyId>[MY_ACCESS_KEY]</AWSAccess
KeyId><RequestId>XXXXX</RequestId><HostId>xxxxxxx</HostId></Error>
This is how I submit the job locally:
spark-submit --packages com.amazonaws:aws-java-sdk-pom:1.11.98,org.apache.hadoop:hadoop-aws:2.7.3 test.py
Why does it work with the command line and boto3 while Spark is choking?
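Since the error code is InvalidAccessKeyId, one thing worth ruling out is a mismatch between the variables exported in the shell and the names the tools look for. The following is only a minimal sketch, assuming the standard names AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY; check_aws_env is a hypothetical helper, not part of boto3 or Spark.

```python
import os

# Standard variable names read by the AWS CLI and boto3 (assumed here).
EXPECTED_VARS = ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY")


def check_aws_env(env=None):
    """Return (missing, suspicious) credential variable names.

    missing: expected variables that are unset or empty.
    suspicious: other AWS_* variables containing "KEY", which may be
    misspellings of the expected names.
    """
    env = os.environ if env is None else env
    missing = [name for name in EXPECTED_VARS if not env.get(name)]
    suspicious = sorted(
        name for name in env
        if name.startswith("AWS_") and "KEY" in name and name not in EXPECTED_VARS
    )
    return missing, suspicious


if __name__ == "__main__":
    missing, suspicious = check_aws_env()
    print("missing:", missing)
    print("possibly misspelled:", suspicious)
```

If the expected variables are missing but the CLI and boto3 still work, they are probably reading the keys from somewhere else, such as a shared credentials file, which the Spark job may not consult.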
Re: (python) Spark .textFile(s3://…) access denied 403 with valid credentials
Posted by Amjad ALSHABANI <as...@gmail.com>.
Hi Jonhy,
What is the master you are using with spark-submit?
I have had this problem before. Unlike the CLI and boto3, Spark was
running in YARN distributed mode (--master yarn), so the keys were not
copied to all the executor nodes. I had to submit my Spark job as
follows:
$ spark-submit --master yarn-client \
  --conf "spark.executor.extraJavaOptions=-Daws.accessKeyId=ACCESSKEY -Daws.secretKey=SECRETKEY" \
  ....
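The same submission can be assembled from Python so that the keys come from the environment instead of being typed into the command. This is a rough sketch only: it assumes AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY are exported on the submitting machine, build_spark_submit is a hypothetical helper, and the -Daws.* options are simply the ones from the command above.

```python
import os


def build_spark_submit(script, master="yarn-client", env=None):
    """Build a spark-submit argument list that forwards AWS keys to executors."""
    env = os.environ if env is None else env
    # Assumed variable names; a KeyError here means the variable is not exported.
    access_key = env["AWS_ACCESS_KEY_ID"]
    secret_key = env["AWS_SECRET_ACCESS_KEY"]
    java_opts = "-Daws.accessKeyId={} -Daws.secretKey={}".format(
        access_key, secret_key
    )
    return [
        "spark-submit",
        "--master", master,
        "--conf", "spark.executor.extraJavaOptions=" + java_opts,
        script,
    ]
```

Passing the list to subprocess.run(build_spark_submit("test.py"), check=True) avoids shell quoting problems with the space inside the --conf value. The keys are still visible in the process list, so this is best kept to test setups.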
I hope this helps.
Amjad