Posted to dev@spark.apache.org by Christian <en...@gmail.com> on 2015/11/05 17:25:17 UTC

Recommended change to core-site.xml template

Our Spark jobs read from and write to S3 heavily.
To make this work, we had to add s3a and s3 key/secret pairs, and we also had
to add fs.hdfs.impl to get things working.

I thought I'd share what we did, since it might be worth adding these to the
Spark conf for out-of-the-box S3 functionality.

We created:
ec2/deploy.generic/root/spark-ec2/templates/root/spark/conf/core-site.xml

We changed the contents from the original, adding in the following:

  <property>
    <name>fs.file.impl</name>
    <value>org.apache.hadoop.fs.LocalFileSystem</value>
  </property>

  <property>
    <name>fs.hdfs.impl</name>
    <value>org.apache.hadoop.hdfs.DistributedFileSystem</value>
  </property>

  <property>
    <name>fs.s3.impl</name>
    <value>org.apache.hadoop.fs.s3native.NativeS3FileSystem</value>
  </property>

  <property>
    <name>fs.s3.awsAccessKeyId</name>
    <value>{{aws_access_key_id}}</value>
  </property>

  <property>
    <name>fs.s3.awsSecretAccessKey</name>
    <value>{{aws_secret_access_key}}</value>
  </property>

  <property>
    <name>fs.s3n.awsAccessKeyId</name>
    <value>{{aws_access_key_id}}</value>
  </property>

  <property>
    <name>fs.s3n.awsSecretAccessKey</name>
    <value>{{aws_secret_access_key}}</value>
  </property>

  <property>
    <name>fs.s3a.awsAccessKeyId</name>
    <value>{{aws_access_key_id}}</value>
  </property>

  <property>
    <name>fs.s3a.awsSecretAccessKey</name>
    <value>{{aws_secret_access_key}}</value>
  </property>

This change makes Spark on EC2 work out of the box for us. It took us
several days to figure this out. It works for 1.4.1 and 1.5.1 on Hadoop
version 2.
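
With these properties in place, our jobs can address S3 directly by URL. As a
minimal sketch of what that looks like from a Spark job (the bucket and paths
below are hypothetical placeholders, not ours):

  import org.apache.spark.{SparkConf, SparkContext}

  object S3Example {
    def main(args: Array[String]): Unit = {
      val sc = new SparkContext(new SparkConf().setAppName("s3-example"))

      // The s3n credentials come from the core-site.xml entries above;
      // "my-bucket" and the paths are hypothetical placeholders.
      val logs = sc.textFile("s3n://my-bucket/input/*.log")
      val errors = logs.filter(_.contains("ERROR"))
      errors.saveAsTextFile("s3n://my-bucket/output/errors")

      sc.stop()
    }
  }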

Best Regards,
Christian

Re: Recommended change to core-site.xml template

Posted by Shivaram Venkataraman <sh...@eecs.berkeley.edu>.
Thanks for investigating this. The right place to add these is the
core-site.xml template we have at
https://github.com/amplab/spark-ec2/blob/branch-1.5/templates/root/spark/conf/core-site.xml
and/or https://github.com/amplab/spark-ec2/blob/branch-1.5/templates/root/ephemeral-hdfs/conf/core-site.xml

Feel free to open a PR against the amplab/spark-ec2 repository for this.

Thanks
Shivaram

On Thu, Nov 5, 2015 at 8:25 AM, Christian <en...@gmail.com> wrote:
> We ended up reading and writing to S3 a ton in our Spark jobs.
> For this to work, we ended up having to add s3a, and s3 key/secret pairs. We
> also had to add fs.hdfs.impl to get these things to work.
>
> I thought maybe I'd share what we did and it might be worth adding these to
> the spark conf for out of the box functionality with S3.
>
> We created:
> ec2/deploy.generic/root/spark-ec2/templates/root/spark/conf/core-site.xml
>
> We changed the contents from the original, adding in the following:
>
>   <property>
>     <name>fs.file.impl</name>
>     <value>org.apache.hadoop.fs.LocalFileSystem</value>
>   </property>
>
>   <property>
>     <name>fs.hdfs.impl</name>
>     <value>org.apache.hadoop.hdfs.DistributedFileSystem</value>
>   </property>
>
>   <property>
>     <name>fs.s3.impl</name>
>     <value>org.apache.hadoop.fs.s3native.NativeS3FileSystem</value>
>   </property>
>
>   <property>
>     <name>fs.s3.awsAccessKeyId</name>
>     <value>{{aws_access_key_id}}</value>
>   </property>
>
>   <property>
>     <name>fs.s3.awsSecretAccessKey</name>
>     <value>{{aws_secret_access_key}}</value>
>   </property>
>
>   <property>
>     <name>fs.s3n.awsAccessKeyId</name>
>     <value>{{aws_access_key_id}}</value>
>   </property>
>
>   <property>
>     <name>fs.s3n.awsSecretAccessKey</name>
>     <value>{{aws_secret_access_key}}</value>
>   </property>
>
>   <property>
>     <name>fs.s3a.awsAccessKeyId</name>
>     <value>{{aws_access_key_id}}</value>
>   </property>
>
>   <property>
>     <name>fs.s3a.awsSecretAccessKey</name>
>     <value>{{aws_secret_access_key}}</value>
>   </property>
>
> This change makes spark on ec2 work out of the box for us. It took us
> several days to figure this out. It works for 1.4.1 and 1.5.1 on Hadoop
> version 2.
>
> Best Regards,
> Christian



Re: Recommended change to core-site.xml template

Posted by Christian <en...@gmail.com>.
Oh right. I forgot about the libraries being removed.
On Thu, Nov 5, 2015 at 10:35 PM Nicholas Chammas <ni...@gmail.com>
wrote:

> I might be mistaken, but yes, even with the changes you mentioned you will
> not be able to access S3 if Spark is built against Hadoop 2.6+ unless you
> install additional libraries. The issue is explained in SPARK-7481
> <https://issues.apache.org/jira/browse/SPARK-7481> and SPARK-7442
> <https://issues.apache.org/jira/browse/SPARK-7442>.
>
> On Fri, Nov 6, 2015 at 12:22 AM Christian <en...@gmail.com> wrote:
>
>> Even with the changes I mentioned above?
>> On Thu, Nov 5, 2015 at 8:10 PM Nicholas Chammas <
>> nicholas.chammas@gmail.com> wrote:
>>
>>> Yep, I think if you try spark-1.5.1-hadoop-2.6 you will find that you
>>> cannot access S3, unfortunately.
>>>
>>> On Thu, Nov 5, 2015 at 3:53 PM Christian <en...@gmail.com> wrote:
>>>
>>>> I created the cluster with the following:
>>>>
>>>> --hadoop-major-version=2
>>>> --spark-version=1.4.1
>>>>
>>>> from: spark-1.5.1-bin-hadoop1
>>>>
>>>> Are you saying there might be different behavior if I download
>>>> spark-1.5.1-hadoop-2.6 and create my cluster?
>>>>
>>>> On Thu, Nov 5, 2015 at 1:28 PM, Christian <en...@gmail.com> wrote:
>>>>
>>>>> Spark 1.5.1-hadoop1
>>>>>
>>>>> On Thu, Nov 5, 2015 at 10:28 AM, Nicholas Chammas <
>>>>> nicholas.chammas@gmail.com> wrote:
>>>>>
>>>>>> > I am using both 1.4.1 and 1.5.1.
>>>>>>
>>>>>> That's the Spark version. I'm wondering what version of Hadoop your
>>>>>> Spark is built against.
>>>>>>
>>>>>> For example, when you download Spark
>>>>>> <http://spark.apache.org/downloads.html> you have to select from a
>>>>>> number of packages (under "Choose a package type"), and each is built
>>>>>> against a different version of Hadoop. When Spark is built against Hadoop
>>>>>> 2.6+, from my understanding, you need to install additional libraries
>>>>>> <https://issues.apache.org/jira/browse/SPARK-7481> to access S3.
>>>>>> When Spark is built against Hadoop 2.4 or earlier, you don't need to do
>>>>>> this.
>>>>>>
>>>>>> I'm confirming that this is what is happening in your case.
>>>>>>
>>>>>> Nick
>>>>>>
>>>>>> On Thu, Nov 5, 2015 at 12:17 PM Christian <en...@gmail.com> wrote:
>>>>>>
>>>>>>> I am using both 1.4.1 and 1.5.1. In the end, we used 1.5.1 because
>>>>>>> of the new feature for instance-profile which greatly helps with this as
>>>>>>> well.
>>>>>>> Without the instance-profile, we got it working by copying a
>>>>>>> .aws/credentials file up to each node. We could easily automate that
>>>>>>> through the templates.
>>>>>>>
>>>>>>> I don't need any additional libraries. We just need to change the
>>>>>>> core-site.xml
>>>>>>>
>>>>>>> -Christian
>>>>>>>
>>>>>>> On Thu, Nov 5, 2015 at 9:35 AM, Nicholas Chammas <
>>>>>>> nicholas.chammas@gmail.com> wrote:
>>>>>>>
>>>>>>>> Thanks for sharing this, Christian.
>>>>>>>>
>>>>>>>> What build of Spark are you using? If I understand correctly, if
>>>>>>>> you are using Spark built against Hadoop 2.6+ then additional configs alone
>>>>>>>> won't help because additional libraries also need to be installed
>>>>>>>> <https://issues.apache.org/jira/browse/SPARK-7481>.
>>>>>>>>
>>>>>>>> Nick
>>>>>>>>
>>>>>>>> On Thu, Nov 5, 2015 at 11:25 AM Christian <en...@gmail.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> We ended up reading and writing to S3 a ton in our Spark jobs.
>>>>>>>>> For this to work, we ended up having to add s3a, and s3 key/secret
>>>>>>>>> pairs. We also had to add fs.hdfs.impl to get these things to work.
>>>>>>>>>
>>>>>>>>> I thought maybe I'd share what we did and it might be worth adding
>>>>>>>>> these to the spark conf for out of the box functionality with S3.
>>>>>>>>>
>>>>>>>>> We created:
>>>>>>>>>
>>>>>>>>> ec2/deploy.generic/root/spark-ec2/templates/root/spark/conf/core-site.xml
>>>>>>>>>
>>>>>>>>> We changed the contents from the original, adding in the following:
>>>>>>>>>
>>>>>>>>>   <property>
>>>>>>>>>     <name>fs.file.impl</name>
>>>>>>>>>     <value>org.apache.hadoop.fs.LocalFileSystem</value>
>>>>>>>>>   </property>
>>>>>>>>>
>>>>>>>>>   <property>
>>>>>>>>>     <name>fs.hdfs.impl</name>
>>>>>>>>>     <value>org.apache.hadoop.hdfs.DistributedFileSystem</value>
>>>>>>>>>   </property>
>>>>>>>>>
>>>>>>>>>   <property>
>>>>>>>>>     <name>fs.s3.impl</name>
>>>>>>>>>     <value>org.apache.hadoop.fs.s3native.NativeS3FileSystem</value>
>>>>>>>>>   </property>
>>>>>>>>>
>>>>>>>>>   <property>
>>>>>>>>>     <name>fs.s3.awsAccessKeyId</name>
>>>>>>>>>     <value>{{aws_access_key_id}}</value>
>>>>>>>>>   </property>
>>>>>>>>>
>>>>>>>>>   <property>
>>>>>>>>>     <name>fs.s3.awsSecretAccessKey</name>
>>>>>>>>>     <value>{{aws_secret_access_key}}</value>
>>>>>>>>>   </property>
>>>>>>>>>
>>>>>>>>>   <property>
>>>>>>>>>     <name>fs.s3n.awsAccessKeyId</name>
>>>>>>>>>     <value>{{aws_access_key_id}}</value>
>>>>>>>>>   </property>
>>>>>>>>>
>>>>>>>>>   <property>
>>>>>>>>>     <name>fs.s3n.awsSecretAccessKey</name>
>>>>>>>>>     <value>{{aws_secret_access_key}}</value>
>>>>>>>>>   </property>
>>>>>>>>>
>>>>>>>>>   <property>
>>>>>>>>>     <name>fs.s3a.awsAccessKeyId</name>
>>>>>>>>>     <value>{{aws_access_key_id}}</value>
>>>>>>>>>   </property>
>>>>>>>>>
>>>>>>>>>   <property>
>>>>>>>>>     <name>fs.s3a.awsSecretAccessKey</name>
>>>>>>>>>     <value>{{aws_secret_access_key}}</value>
>>>>>>>>>   </property>
>>>>>>>>>
>>>>>>>>> This change makes spark on ec2 work out of the box for us. It took
>>>>>>>>> us several days to figure this out. It works for 1.4.1 and 1.5.1 on Hadoop
>>>>>>>>> version 2.
>>>>>>>>>
>>>>>>>>> Best Regards,
>>>>>>>>> Christian
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>
>>>>

Re: Recommended change to core-site.xml template

Posted by Nicholas Chammas <ni...@gmail.com>.
I might be mistaken, but yes, even with the changes you mentioned you will
not be able to access S3 if Spark is built against Hadoop 2.6+ unless you
install additional libraries. The issue is explained in SPARK-7481
<https://issues.apache.org/jira/browse/SPARK-7481> and SPARK-7442
<https://issues.apache.org/jira/browse/SPARK-7442>.
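
As a rough sketch of the workaround those tickets point at, the hadoop-aws
module (and its AWS SDK dependency) has to be put on the classpath at launch
time; the Maven coordinates and bucket name below are illustrative only:

  // Sketch only: assumes the shell was started with something like
  //   spark-shell --packages org.apache.hadoop:hadoop-aws:2.6.0
  // where the hadoop-aws version matches the Hadoop build. Note that s3a
  // reads fs.s3a.access.key / fs.s3a.secret.key for credentials.
  sc.hadoopConfiguration.set("fs.s3a.access.key", sys.env("AWS_ACCESS_KEY_ID"))
  sc.hadoopConfiguration.set("fs.s3a.secret.key", sys.env("AWS_SECRET_ACCESS_KEY"))
  val count = sc.textFile("s3a://my-bucket/some/path").count()  // hypothetical bucket
  println(count)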

On Fri, Nov 6, 2015 at 12:22 AM Christian <en...@gmail.com> wrote:

> Even with the changes I mentioned above?
> On Thu, Nov 5, 2015 at 8:10 PM Nicholas Chammas <
> nicholas.chammas@gmail.com> wrote:
>
>> Yep, I think if you try spark-1.5.1-hadoop-2.6 you will find that you
>> cannot access S3, unfortunately.
>>
>> On Thu, Nov 5, 2015 at 3:53 PM Christian <en...@gmail.com> wrote:
>>
>>> I created the cluster with the following:
>>>
>>> --hadoop-major-version=2
>>> --spark-version=1.4.1
>>>
>>> from: spark-1.5.1-bin-hadoop1
>>>
>>> Are you saying there might be different behavior if I download
>>> spark-1.5.1-hadoop-2.6 and create my cluster?
>>>
>>> On Thu, Nov 5, 2015 at 1:28 PM, Christian <en...@gmail.com> wrote:
>>>
>>>> Spark 1.5.1-hadoop1
>>>>
>>>> On Thu, Nov 5, 2015 at 10:28 AM, Nicholas Chammas <
>>>> nicholas.chammas@gmail.com> wrote:
>>>>
>>>>> > I am using both 1.4.1 and 1.5.1.
>>>>>
>>>>> That's the Spark version. I'm wondering what version of Hadoop your
>>>>> Spark is built against.
>>>>>
>>>>> For example, when you download Spark
>>>>> <http://spark.apache.org/downloads.html> you have to select from a
>>>>> number of packages (under "Choose a package type"), and each is built
>>>>> against a different version of Hadoop. When Spark is built against Hadoop
>>>>> 2.6+, from my understanding, you need to install additional libraries
>>>>> <https://issues.apache.org/jira/browse/SPARK-7481> to access S3. When
>>>>> Spark is built against Hadoop 2.4 or earlier, you don't need to do this.
>>>>>
>>>>> I'm confirming that this is what is happening in your case.
>>>>>
>>>>> Nick
>>>>>
>>>>> On Thu, Nov 5, 2015 at 12:17 PM Christian <en...@gmail.com> wrote:
>>>>>
>>>>>> I am using both 1.4.1 and 1.5.1. In the end, we used 1.5.1 because of
>>>>>> the new feature for instance-profile which greatly helps with this as well.
>>>>>> Without the instance-profile, we got it working by copying a
>>>>>> .aws/credentials file up to each node. We could easily automate that
>>>>>> through the templates.
>>>>>>
>>>>>> I don't need any additional libraries. We just need to change the
>>>>>> core-site.xml
>>>>>>
>>>>>> -Christian
>>>>>>
>>>>>> On Thu, Nov 5, 2015 at 9:35 AM, Nicholas Chammas <
>>>>>> nicholas.chammas@gmail.com> wrote:
>>>>>>
>>>>>>> Thanks for sharing this, Christian.
>>>>>>>
>>>>>>> What build of Spark are you using? If I understand correctly, if you
>>>>>>> are using Spark built against Hadoop 2.6+ then additional configs alone
>>>>>>> won't help because additional libraries also need to be installed
>>>>>>> <https://issues.apache.org/jira/browse/SPARK-7481>.
>>>>>>>
>>>>>>> Nick
>>>>>>>
>>>>>>> On Thu, Nov 5, 2015 at 11:25 AM Christian <en...@gmail.com> wrote:
>>>>>>>
>>>>>>>> We ended up reading and writing to S3 a ton in our Spark jobs.
>>>>>>>> For this to work, we ended up having to add s3a, and s3 key/secret
>>>>>>>> pairs. We also had to add fs.hdfs.impl to get these things to work.
>>>>>>>>
>>>>>>>> I thought maybe I'd share what we did and it might be worth adding
>>>>>>>> these to the spark conf for out of the box functionality with S3.
>>>>>>>>
>>>>>>>> We created:
>>>>>>>>
>>>>>>>> ec2/deploy.generic/root/spark-ec2/templates/root/spark/conf/core-site.xml
>>>>>>>>
>>>>>>>> We changed the contents from the original, adding in the following:
>>>>>>>>
>>>>>>>>   <property>
>>>>>>>>     <name>fs.file.impl</name>
>>>>>>>>     <value>org.apache.hadoop.fs.LocalFileSystem</value>
>>>>>>>>   </property>
>>>>>>>>
>>>>>>>>   <property>
>>>>>>>>     <name>fs.hdfs.impl</name>
>>>>>>>>     <value>org.apache.hadoop.hdfs.DistributedFileSystem</value>
>>>>>>>>   </property>
>>>>>>>>
>>>>>>>>   <property>
>>>>>>>>     <name>fs.s3.impl</name>
>>>>>>>>     <value>org.apache.hadoop.fs.s3native.NativeS3FileSystem</value>
>>>>>>>>   </property>
>>>>>>>>
>>>>>>>>   <property>
>>>>>>>>     <name>fs.s3.awsAccessKeyId</name>
>>>>>>>>     <value>{{aws_access_key_id}}</value>
>>>>>>>>   </property>
>>>>>>>>
>>>>>>>>   <property>
>>>>>>>>     <name>fs.s3.awsSecretAccessKey</name>
>>>>>>>>     <value>{{aws_secret_access_key}}</value>
>>>>>>>>   </property>
>>>>>>>>
>>>>>>>>   <property>
>>>>>>>>     <name>fs.s3n.awsAccessKeyId</name>
>>>>>>>>     <value>{{aws_access_key_id}}</value>
>>>>>>>>   </property>
>>>>>>>>
>>>>>>>>   <property>
>>>>>>>>     <name>fs.s3n.awsSecretAccessKey</name>
>>>>>>>>     <value>{{aws_secret_access_key}}</value>
>>>>>>>>   </property>
>>>>>>>>
>>>>>>>>   <property>
>>>>>>>>     <name>fs.s3a.awsAccessKeyId</name>
>>>>>>>>     <value>{{aws_access_key_id}}</value>
>>>>>>>>   </property>
>>>>>>>>
>>>>>>>>   <property>
>>>>>>>>     <name>fs.s3a.awsSecretAccessKey</name>
>>>>>>>>     <value>{{aws_secret_access_key}}</value>
>>>>>>>>   </property>
>>>>>>>>
>>>>>>>> This change makes spark on ec2 work out of the box for us. It took
>>>>>>>> us several days to figure this out. It works for 1.4.1 and 1.5.1 on Hadoop
>>>>>>>> version 2.
>>>>>>>>
>>>>>>>> Best Regards,
>>>>>>>> Christian
>>>>>>>>
>>>>>>>
>>>>>>
>>>>
>>>

Re: Recommended change to core-site.xml template

Posted by Christian <en...@gmail.com>.
Even with the changes I mentioned above?
On Thu, Nov 5, 2015 at 8:10 PM Nicholas Chammas <ni...@gmail.com>
wrote:

> Yep, I think if you try spark-1.5.1-hadoop-2.6 you will find that you
> cannot access S3, unfortunately.
>
> On Thu, Nov 5, 2015 at 3:53 PM Christian <en...@gmail.com> wrote:
>
>> I created the cluster with the following:
>>
>> --hadoop-major-version=2
>> --spark-version=1.4.1
>>
>> from: spark-1.5.1-bin-hadoop1
>>
>> Are you saying there might be different behavior if I download
>> spark-1.5.1-hadoop-2.6 and create my cluster?
>>
>> On Thu, Nov 5, 2015 at 1:28 PM, Christian <en...@gmail.com> wrote:
>>
>>> Spark 1.5.1-hadoop1
>>>
>>> On Thu, Nov 5, 2015 at 10:28 AM, Nicholas Chammas <
>>> nicholas.chammas@gmail.com> wrote:
>>>
>>>> > I am using both 1.4.1 and 1.5.1.
>>>>
>>>> That's the Spark version. I'm wondering what version of Hadoop your
>>>> Spark is built against.
>>>>
>>>> For example, when you download Spark
>>>> <http://spark.apache.org/downloads.html> you have to select from a
>>>> number of packages (under "Choose a package type"), and each is built
>>>> against a different version of Hadoop. When Spark is built against Hadoop
>>>> 2.6+, from my understanding, you need to install additional libraries
>>>> <https://issues.apache.org/jira/browse/SPARK-7481> to access S3. When
>>>> Spark is built against Hadoop 2.4 or earlier, you don't need to do this.
>>>>
>>>> I'm confirming that this is what is happening in your case.
>>>>
>>>> Nick
>>>>
>>>> On Thu, Nov 5, 2015 at 12:17 PM Christian <en...@gmail.com> wrote:
>>>>
>>>>> I am using both 1.4.1 and 1.5.1. In the end, we used 1.5.1 because of
>>>>> the new feature for instance-profile which greatly helps with this as well.
>>>>> Without the instance-profile, we got it working by copying a
>>>>> .aws/credentials file up to each node. We could easily automate that
>>>>> through the templates.
>>>>>
>>>>> I don't need any additional libraries. We just need to change the
>>>>> core-site.xml
>>>>>
>>>>> -Christian
>>>>>
>>>>> On Thu, Nov 5, 2015 at 9:35 AM, Nicholas Chammas <
>>>>> nicholas.chammas@gmail.com> wrote:
>>>>>
>>>>>> Thanks for sharing this, Christian.
>>>>>>
>>>>>> What build of Spark are you using? If I understand correctly, if you
>>>>>> are using Spark built against Hadoop 2.6+ then additional configs alone
>>>>>> won't help because additional libraries also need to be installed
>>>>>> <https://issues.apache.org/jira/browse/SPARK-7481>.
>>>>>>
>>>>>> Nick
>>>>>>
>>>>>> On Thu, Nov 5, 2015 at 11:25 AM Christian <en...@gmail.com> wrote:
>>>>>>
>>>>>>> We ended up reading and writing to S3 a ton in our Spark jobs.
>>>>>>> For this to work, we ended up having to add s3a, and s3 key/secret
>>>>>>> pairs. We also had to add fs.hdfs.impl to get these things to work.
>>>>>>>
>>>>>>> I thought maybe I'd share what we did and it might be worth adding
>>>>>>> these to the spark conf for out of the box functionality with S3.
>>>>>>>
>>>>>>> We created:
>>>>>>>
>>>>>>> ec2/deploy.generic/root/spark-ec2/templates/root/spark/conf/core-site.xml
>>>>>>>
>>>>>>> We changed the contents from the original, adding in the following:
>>>>>>>
>>>>>>>   <property>
>>>>>>>     <name>fs.file.impl</name>
>>>>>>>     <value>org.apache.hadoop.fs.LocalFileSystem</value>
>>>>>>>   </property>
>>>>>>>
>>>>>>>   <property>
>>>>>>>     <name>fs.hdfs.impl</name>
>>>>>>>     <value>org.apache.hadoop.hdfs.DistributedFileSystem</value>
>>>>>>>   </property>
>>>>>>>
>>>>>>>   <property>
>>>>>>>     <name>fs.s3.impl</name>
>>>>>>>     <value>org.apache.hadoop.fs.s3native.NativeS3FileSystem</value>
>>>>>>>   </property>
>>>>>>>
>>>>>>>   <property>
>>>>>>>     <name>fs.s3.awsAccessKeyId</name>
>>>>>>>     <value>{{aws_access_key_id}}</value>
>>>>>>>   </property>
>>>>>>>
>>>>>>>   <property>
>>>>>>>     <name>fs.s3.awsSecretAccessKey</name>
>>>>>>>     <value>{{aws_secret_access_key}}</value>
>>>>>>>   </property>
>>>>>>>
>>>>>>>   <property>
>>>>>>>     <name>fs.s3n.awsAccessKeyId</name>
>>>>>>>     <value>{{aws_access_key_id}}</value>
>>>>>>>   </property>
>>>>>>>
>>>>>>>   <property>
>>>>>>>     <name>fs.s3n.awsSecretAccessKey</name>
>>>>>>>     <value>{{aws_secret_access_key}}</value>
>>>>>>>   </property>
>>>>>>>
>>>>>>>   <property>
>>>>>>>     <name>fs.s3a.awsAccessKeyId</name>
>>>>>>>     <value>{{aws_access_key_id}}</value>
>>>>>>>   </property>
>>>>>>>
>>>>>>>   <property>
>>>>>>>     <name>fs.s3a.awsSecretAccessKey</name>
>>>>>>>     <value>{{aws_secret_access_key}}</value>
>>>>>>>   </property>
>>>>>>>
>>>>>>> This change makes spark on ec2 work out of the box for us. It took
>>>>>>> us several days to figure this out. It works for 1.4.1 and 1.5.1 on Hadoop
>>>>>>> version 2.
>>>>>>>
>>>>>>> Best Regards,
>>>>>>> Christian
>>>>>>>
>>>>>>
>>>>>
>>>
>>

Re: Recommended change to core-site.xml template

Posted by Nicholas Chammas <ni...@gmail.com>.
Yep, I think if you try spark-1.5.1-hadoop-2.6 you will find that you
cannot access S3, unfortunately.

On Thu, Nov 5, 2015 at 3:53 PM Christian <en...@gmail.com> wrote:

> I created the cluster with the following:
>
> --hadoop-major-version=2
> --spark-version=1.4.1
>
> from: spark-1.5.1-bin-hadoop1
>
> Are you saying there might be different behavior if I download
> spark-1.5.1-hadoop-2.6 and create my cluster?
>
> On Thu, Nov 5, 2015 at 1:28 PM, Christian <en...@gmail.com> wrote:
>
>> Spark 1.5.1-hadoop1
>>
>> On Thu, Nov 5, 2015 at 10:28 AM, Nicholas Chammas <
>> nicholas.chammas@gmail.com> wrote:
>>
>>> > I am using both 1.4.1 and 1.5.1.
>>>
>>> That's the Spark version. I'm wondering what version of Hadoop your
>>> Spark is built against.
>>>
>>> For example, when you download Spark
>>> <http://spark.apache.org/downloads.html> you have to select from a
>>> number of packages (under "Choose a package type"), and each is built
>>> against a different version of Hadoop. When Spark is built against Hadoop
>>> 2.6+, from my understanding, you need to install additional libraries
>>> <https://issues.apache.org/jira/browse/SPARK-7481> to access S3. When
>>> Spark is built against Hadoop 2.4 or earlier, you don't need to do this.
>>>
>>> I'm confirming that this is what is happening in your case.
>>>
>>> Nick
>>>
>>> On Thu, Nov 5, 2015 at 12:17 PM Christian <en...@gmail.com> wrote:
>>>
>>>> I am using both 1.4.1 and 1.5.1. In the end, we used 1.5.1 because of
>>>> the new feature for instance-profile which greatly helps with this as well.
>>>> Without the instance-profile, we got it working by copying a
>>>> .aws/credentials file up to each node. We could easily automate that
>>>> through the templates.
>>>>
>>>> I don't need any additional libraries. We just need to change the
>>>> core-site.xml
>>>>
>>>> -Christian
>>>>
>>>> On Thu, Nov 5, 2015 at 9:35 AM, Nicholas Chammas <
>>>> nicholas.chammas@gmail.com> wrote:
>>>>
>>>>> Thanks for sharing this, Christian.
>>>>>
>>>>> What build of Spark are you using? If I understand correctly, if you
>>>>> are using Spark built against Hadoop 2.6+ then additional configs alone
>>>>> won't help because additional libraries also need to be installed
>>>>> <https://issues.apache.org/jira/browse/SPARK-7481>.
>>>>>
>>>>> Nick
>>>>>
>>>>> On Thu, Nov 5, 2015 at 11:25 AM Christian <en...@gmail.com> wrote:
>>>>>
>>>>>> We ended up reading and writing to S3 a ton in our Spark jobs.
>>>>>> For this to work, we ended up having to add s3a, and s3 key/secret
>>>>>> pairs. We also had to add fs.hdfs.impl to get these things to work.
>>>>>>
>>>>>> I thought maybe I'd share what we did and it might be worth adding
>>>>>> these to the spark conf for out of the box functionality with S3.
>>>>>>
>>>>>> We created:
>>>>>>
>>>>>> ec2/deploy.generic/root/spark-ec2/templates/root/spark/conf/core-site.xml
>>>>>>
>>>>>> We changed the contents from the original, adding in the following:
>>>>>>
>>>>>>   <property>
>>>>>>     <name>fs.file.impl</name>
>>>>>>     <value>org.apache.hadoop.fs.LocalFileSystem</value>
>>>>>>   </property>
>>>>>>
>>>>>>   <property>
>>>>>>     <name>fs.hdfs.impl</name>
>>>>>>     <value>org.apache.hadoop.hdfs.DistributedFileSystem</value>
>>>>>>   </property>
>>>>>>
>>>>>>   <property>
>>>>>>     <name>fs.s3.impl</name>
>>>>>>     <value>org.apache.hadoop.fs.s3native.NativeS3FileSystem</value>
>>>>>>   </property>
>>>>>>
>>>>>>   <property>
>>>>>>     <name>fs.s3.awsAccessKeyId</name>
>>>>>>     <value>{{aws_access_key_id}}</value>
>>>>>>   </property>
>>>>>>
>>>>>>   <property>
>>>>>>     <name>fs.s3.awsSecretAccessKey</name>
>>>>>>     <value>{{aws_secret_access_key}}</value>
>>>>>>   </property>
>>>>>>
>>>>>>   <property>
>>>>>>     <name>fs.s3n.awsAccessKeyId</name>
>>>>>>     <value>{{aws_access_key_id}}</value>
>>>>>>   </property>
>>>>>>
>>>>>>   <property>
>>>>>>     <name>fs.s3n.awsSecretAccessKey</name>
>>>>>>     <value>{{aws_secret_access_key}}</value>
>>>>>>   </property>
>>>>>>
>>>>>>   <property>
>>>>>>     <name>fs.s3a.awsAccessKeyId</name>
>>>>>>     <value>{{aws_access_key_id}}</value>
>>>>>>   </property>
>>>>>>
>>>>>>   <property>
>>>>>>     <name>fs.s3a.awsSecretAccessKey</name>
>>>>>>     <value>{{aws_secret_access_key}}</value>
>>>>>>   </property>
>>>>>>
>>>>>> This change makes spark on ec2 work out of the box for us. It took us
>>>>>> several days to figure this out. It works for 1.4.1 and 1.5.1 on Hadoop
>>>>>> version 2.
>>>>>>
>>>>>> Best Regards,
>>>>>> Christian
>>>>>>
>>>>>
>>>>
>>
>

Re: Recommended change to core-site.xml template

Posted by Christian <en...@gmail.com>.
I created the cluster with the following:

--hadoop-major-version=2
--spark-version=1.4.1

from: spark-1.5.1-bin-hadoop1

Are you saying there might be different behavior if I download
spark-1.5.1-hadoop-2.6 and create my cluster?

On Thu, Nov 5, 2015 at 1:28 PM, Christian <en...@gmail.com> wrote:

> Spark 1.5.1-hadoop1
>
> On Thu, Nov 5, 2015 at 10:28 AM, Nicholas Chammas <
> nicholas.chammas@gmail.com> wrote:
>
>> > I am using both 1.4.1 and 1.5.1.
>>
>> That's the Spark version. I'm wondering what version of Hadoop your Spark
>> is built against.
>>
>> For example, when you download Spark
>> <http://spark.apache.org/downloads.html> you have to select from a
>> number of packages (under "Choose a package type"), and each is built
>> against a different version of Hadoop. When Spark is built against Hadoop
>> 2.6+, from my understanding, you need to install additional libraries
>> <https://issues.apache.org/jira/browse/SPARK-7481> to access S3. When
>> Spark is built against Hadoop 2.4 or earlier, you don't need to do this.
>>
>> I'm confirming that this is what is happening in your case.
>>
>> Nick
>>
>> On Thu, Nov 5, 2015 at 12:17 PM Christian <en...@gmail.com> wrote:
>>
>>> I am using both 1.4.1 and 1.5.1. In the end, we used 1.5.1 because of
>>> the new feature for instance-profile which greatly helps with this as well.
>>> Without the instance-profile, we got it working by copying a
>>> .aws/credentials file up to each node. We could easily automate that
>>> through the templates.
>>>
>>> I don't need any additional libraries. We just need to change the
>>> core-site.xml
>>>
>>> -Christian
>>>
>>> On Thu, Nov 5, 2015 at 9:35 AM, Nicholas Chammas <
>>> nicholas.chammas@gmail.com> wrote:
>>>
>>>> Thanks for sharing this, Christian.
>>>>
>>>> What build of Spark are you using? If I understand correctly, if you
>>>> are using Spark built against Hadoop 2.6+ then additional configs alone
>>>> won't help because additional libraries also need to be installed
>>>> <https://issues.apache.org/jira/browse/SPARK-7481>.
>>>>
>>>> Nick
>>>>
>>>> On Thu, Nov 5, 2015 at 11:25 AM Christian <en...@gmail.com> wrote:
>>>>
>>>>> We ended up reading and writing to S3 a ton in our Spark jobs.
>>>>> For this to work, we ended up having to add s3a, and s3 key/secret
>>>>> pairs. We also had to add fs.hdfs.impl to get these things to work.
>>>>>
>>>>> I thought maybe I'd share what we did and it might be worth adding
>>>>> these to the spark conf for out of the box functionality with S3.
>>>>>
>>>>> We created:
>>>>>
>>>>> ec2/deploy.generic/root/spark-ec2/templates/root/spark/conf/core-site.xml
>>>>>
>>>>> We changed the contents from the original, adding in the following:
>>>>>
>>>>>   <property>
>>>>>     <name>fs.file.impl</name>
>>>>>     <value>org.apache.hadoop.fs.LocalFileSystem</value>
>>>>>   </property>
>>>>>
>>>>>   <property>
>>>>>     <name>fs.hdfs.impl</name>
>>>>>     <value>org.apache.hadoop.hdfs.DistributedFileSystem</value>
>>>>>   </property>
>>>>>
>>>>>   <property>
>>>>>     <name>fs.s3.impl</name>
>>>>>     <value>org.apache.hadoop.fs.s3native.NativeS3FileSystem</value>
>>>>>   </property>
>>>>>
>>>>>   <property>
>>>>>     <name>fs.s3.awsAccessKeyId</name>
>>>>>     <value>{{aws_access_key_id}}</value>
>>>>>   </property>
>>>>>
>>>>>   <property>
>>>>>     <name>fs.s3.awsSecretAccessKey</name>
>>>>>     <value>{{aws_secret_access_key}}</value>
>>>>>   </property>
>>>>>
>>>>>   <property>
>>>>>     <name>fs.s3n.awsAccessKeyId</name>
>>>>>     <value>{{aws_access_key_id}}</value>
>>>>>   </property>
>>>>>
>>>>>   <property>
>>>>>     <name>fs.s3n.awsSecretAccessKey</name>
>>>>>     <value>{{aws_secret_access_key}}</value>
>>>>>   </property>
>>>>>
>>>>>   <property>
>>>>>     <name>fs.s3a.awsAccessKeyId</name>
>>>>>     <value>{{aws_access_key_id}}</value>
>>>>>   </property>
>>>>>
>>>>>   <property>
>>>>>     <name>fs.s3a.awsSecretAccessKey</name>
>>>>>     <value>{{aws_secret_access_key}}</value>
>>>>>   </property>
>>>>>
>>>>> This change makes spark on ec2 work out of the box for us. It took us
>>>>> several days to figure this out. It works for 1.4.1 and 1.5.1 on Hadoop
>>>>> version 2.
>>>>>
>>>>> Best Regards,
>>>>> Christian
>>>>>
>>>>
>>>
>

Re: Recommended change to core-site.xml template

Posted by Christian <en...@gmail.com>.
Spark 1.5.1-hadoop1

On Thu, Nov 5, 2015 at 10:28 AM, Nicholas Chammas <
nicholas.chammas@gmail.com> wrote:

> > I am using both 1.4.1 and 1.5.1.
>
> That's the Spark version. I'm wondering what version of Hadoop your Spark
> is built against.
>
> For example, when you download Spark
> <http://spark.apache.org/downloads.html> you have to select from a number
> of packages (under "Choose a package type"), and each is built against a
> different version of Hadoop. When Spark is built against Hadoop 2.6+, from
> my understanding, you need to install additional libraries
> <https://issues.apache.org/jira/browse/SPARK-7481> to access S3. When
> Spark is built against Hadoop 2.4 or earlier, you don't need to do this.
>
> I'm confirming that this is what is happening in your case.
>
> Nick
>
> On Thu, Nov 5, 2015 at 12:17 PM Christian <en...@gmail.com> wrote:
>
>> I am using both 1.4.1 and 1.5.1. In the end, we used 1.5.1 because of the
>> new feature for instance-profile which greatly helps with this as well.
>> Without the instance-profile, we got it working by copying a
>> .aws/credentials file up to each node. We could easily automate that
>> through the templates.
>>
>> I don't need any additional libraries. We just need to change the
>> core-site.xml
>>
>> -Christian
>>
>> On Thu, Nov 5, 2015 at 9:35 AM, Nicholas Chammas <
>> nicholas.chammas@gmail.com> wrote:
>>
>>> Thanks for sharing this, Christian.
>>>
>>> What build of Spark are you using? If I understand correctly, if you are
>>> using Spark built against Hadoop 2.6+ then additional configs alone won't
>>> help because additional libraries also need to be installed
>>> <https://issues.apache.org/jira/browse/SPARK-7481>.
>>>
>>> Nick
>>>
>>> On Thu, Nov 5, 2015 at 11:25 AM Christian <en...@gmail.com> wrote:
>>>
>>>> We ended up reading and writing to S3 a ton in our Spark jobs.
>>>> For this to work, we ended up having to add s3a, and s3 key/secret
>>>> pairs. We also had to add fs.hdfs.impl to get these things to work.
>>>>
>>>> I thought maybe I'd share what we did and it might be worth adding
>>>> these to the spark conf for out of the box functionality with S3.
>>>>
>>>> We created:
>>>>
>>>> ec2/deploy.generic/root/spark-ec2/templates/root/spark/conf/core-site.xml
>>>>
>>>> We changed the contents from the original, adding in the following:
>>>>
>>>>   <property>
>>>>     <name>fs.file.impl</name>
>>>>     <value>org.apache.hadoop.fs.LocalFileSystem</value>
>>>>   </property>
>>>>
>>>>   <property>
>>>>     <name>fs.hdfs.impl</name>
>>>>     <value>org.apache.hadoop.hdfs.DistributedFileSystem</value>
>>>>   </property>
>>>>
>>>>   <property>
>>>>     <name>fs.s3.impl</name>
>>>>     <value>org.apache.hadoop.fs.s3native.NativeS3FileSystem</value>
>>>>   </property>
>>>>
>>>>   <property>
>>>>     <name>fs.s3.awsAccessKeyId</name>
>>>>     <value>{{aws_access_key_id}}</value>
>>>>   </property>
>>>>
>>>>   <property>
>>>>     <name>fs.s3.awsSecretAccessKey</name>
>>>>     <value>{{aws_secret_access_key}}</value>
>>>>   </property>
>>>>
>>>>   <property>
>>>>     <name>fs.s3n.awsAccessKeyId</name>
>>>>     <value>{{aws_access_key_id}}</value>
>>>>   </property>
>>>>
>>>>   <property>
>>>>     <name>fs.s3n.awsSecretAccessKey</name>
>>>>     <value>{{aws_secret_access_key}}</value>
>>>>   </property>
>>>>
>>>>   <property>
>>>>     <name>fs.s3a.awsAccessKeyId</name>
>>>>     <value>{{aws_access_key_id}}</value>
>>>>   </property>
>>>>
>>>>   <property>
>>>>     <name>fs.s3a.awsSecretAccessKey</name>
>>>>     <value>{{aws_secret_access_key}}</value>
>>>>   </property>
>>>>
>>>> This change makes spark on ec2 work out of the box for us. It took us
>>>> several days to figure this out. It works for 1.4.1 and 1.5.1 on Hadoop
>>>> version 2.
>>>>
>>>> Best Regards,
>>>> Christian
>>>>
>>>
>>

Re: Recommended change to core-site.xml template

Posted by Nicholas Chammas <ni...@gmail.com>.
> I am using both 1.4.1 and 1.5.1.

That's the Spark version. I'm wondering what version of Hadoop your Spark
is built against.

For example, when you download Spark
<http://spark.apache.org/downloads.html> you have to select from a number
of packages (under "Choose a package type"), and each is built against a
different version of Hadoop. When Spark is built against Hadoop 2.6+, from
my understanding, you need to install additional libraries
<https://issues.apache.org/jira/browse/SPARK-7481> to access S3. When Spark
is built against Hadoop 2.4 or earlier, you don't need to do this.
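
One quick way to check which Hadoop version a given Spark build bundles is to
ask Hadoop itself from the spark-shell; a minimal sketch:

  // Prints the Hadoop version this Spark build was compiled against,
  // e.g. "2.6.0" for a spark-1.5.1-bin-hadoop2.6 download.
  println(org.apache.hadoop.util.VersionInfo.getVersion)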

I'm confirming that this is what is happening in your case.

Nick

On Thu, Nov 5, 2015 at 12:17 PM Christian <en...@gmail.com> wrote:

> I am using both 1.4.1 and 1.5.1. In the end, we used 1.5.1 because of the
> new feature for instance-profile which greatly helps with this as well.
> Without the instance-profile, we got it working by copying a
> .aws/credentials file up to each node. We could easily automate that
> through the templates.
>
> I don't need any additional libraries. We just need to change the
> core-site.xml
>
> -Christian
>
> On Thu, Nov 5, 2015 at 9:35 AM, Nicholas Chammas <
> nicholas.chammas@gmail.com> wrote:
>
>> Thanks for sharing this, Christian.
>>
>> What build of Spark are you using? If I understand correctly, if you are
>> using Spark built against Hadoop 2.6+ then additional configs alone won't
>> help because additional libraries also need to be installed
>> <https://issues.apache.org/jira/browse/SPARK-7481>.
>>
>> Nick
>>
>> On Thu, Nov 5, 2015 at 11:25 AM Christian <en...@gmail.com> wrote:
>>
>>> We ended up reading and writing to S3 a ton in our Spark jobs.
>>> For this to work, we ended up having to add s3a, and s3 key/secret
>>> pairs. We also had to add fs.hdfs.impl to get these things to work.
>>>
>>> I thought maybe I'd share what we did and it might be worth adding these
>>> to the spark conf for out of the box functionality with S3.
>>>
>>> We created:
>>> ec2/deploy.generic/root/spark-ec2/templates/root/spark/conf/core-site.xml
>>>
>>> We changed the contents from the original, adding in the following:
>>>
>>>   <property>
>>>     <name>fs.file.impl</name>
>>>     <value>org.apache.hadoop.fs.LocalFileSystem</value>
>>>   </property>
>>>
>>>   <property>
>>>     <name>fs.hdfs.impl</name>
>>>     <value>org.apache.hadoop.hdfs.DistributedFileSystem</value>
>>>   </property>
>>>
>>>   <property>
>>>     <name>fs.s3.impl</name>
>>>     <value>org.apache.hadoop.fs.s3native.NativeS3FileSystem</value>
>>>   </property>
>>>
>>>   <property>
>>>     <name>fs.s3.awsAccessKeyId</name>
>>>     <value>{{aws_access_key_id}}</value>
>>>   </property>
>>>
>>>   <property>
>>>     <name>fs.s3.awsSecretAccessKey</name>
>>>     <value>{{aws_secret_access_key}}</value>
>>>   </property>
>>>
>>>   <property>
>>>     <name>fs.s3n.awsAccessKeyId</name>
>>>     <value>{{aws_access_key_id}}</value>
>>>   </property>
>>>
>>>   <property>
>>>     <name>fs.s3n.awsSecretAccessKey</name>
>>>     <value>{{aws_secret_access_key}}</value>
>>>   </property>
>>>
>>>   <property>
>>>     <name>fs.s3a.awsAccessKeyId</name>
>>>     <value>{{aws_access_key_id}}</value>
>>>   </property>
>>>
>>>   <property>
>>>     <name>fs.s3a.awsSecretAccessKey</name>
>>>     <value>{{aws_secret_access_key}}</value>
>>>   </property>
>>>
>>> This change makes spark on ec2 work out of the box for us. It took us
>>> several days to figure this out. It works for 1.4.1 and 1.5.1 on Hadoop
>>> version 2.
>>>
>>> Best Regards,
>>> Christian
>>>
>>
>

Re: Recommended change to core-site.xml template

Posted by Christian <en...@gmail.com>.
I am using both 1.4.1 and 1.5.1. In the end, we used 1.5.1 because its new
instance-profile support also helps greatly with this.
Without an instance profile, we got it working by copying a
.aws/credentials file up to each node. We could easily automate that
through the templates.
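
For reference, the file we copy up is just the standard AWS shared credentials
format, filled in here with the same template placeholders:

  [default]
  aws_access_key_id = {{aws_access_key_id}}
  aws_secret_access_key = {{aws_secret_access_key}}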

I don't need any additional libraries. We just need to change the
core-site.xml.

-Christian

On Thu, Nov 5, 2015 at 9:35 AM, Nicholas Chammas <nicholas.chammas@gmail.com
> wrote:

> Thanks for sharing this, Christian.
>
> What build of Spark are you using? If I understand correctly, if you are
> using Spark built against Hadoop 2.6+ then additional configs alone won't
> help because additional libraries also need to be installed
> <https://issues.apache.org/jira/browse/SPARK-7481>.
>
> Nick
>
> On Thu, Nov 5, 2015 at 11:25 AM Christian <en...@gmail.com> wrote:
>
>> We ended up reading and writing to S3 a ton in our Spark jobs.
>> For this to work, we ended up having to add s3a, and s3 key/secret pairs.
>> We also had to add fs.hdfs.impl to get these things to work.
>>
>> I thought maybe I'd share what we did and it might be worth adding these
>> to the spark conf for out of the box functionality with S3.
>>
>> We created:
>> ec2/deploy.generic/root/spark-ec2/templates/root/spark/conf/core-site.xml
>>
>> We changed the contents from the original, adding in the following:
>>
>>   <property>
>>     <name>fs.file.impl</name>
>>     <value>org.apache.hadoop.fs.LocalFileSystem</value>
>>   </property>
>>
>>   <property>
>>     <name>fs.hdfs.impl</name>
>>     <value>org.apache.hadoop.hdfs.DistributedFileSystem</value>
>>   </property>
>>
>>   <property>
>>     <name>fs.s3.impl</name>
>>     <value>org.apache.hadoop.fs.s3native.NativeS3FileSystem</value>
>>   </property>
>>
>>   <property>
>>     <name>fs.s3.awsAccessKeyId</name>
>>     <value>{{aws_access_key_id}}</value>
>>   </property>
>>
>>   <property>
>>     <name>fs.s3.awsSecretAccessKey</name>
>>     <value>{{aws_secret_access_key}}</value>
>>   </property>
>>
>>   <property>
>>     <name>fs.s3n.awsAccessKeyId</name>
>>     <value>{{aws_access_key_id}}</value>
>>   </property>
>>
>>   <property>
>>     <name>fs.s3n.awsSecretAccessKey</name>
>>     <value>{{aws_secret_access_key}}</value>
>>   </property>
>>
>>   <property>
>>     <name>fs.s3a.awsAccessKeyId</name>
>>     <value>{{aws_access_key_id}}</value>
>>   </property>
>>
>>   <property>
>>     <name>fs.s3a.awsSecretAccessKey</name>
>>     <value>{{aws_secret_access_key}}</value>
>>   </property>
>>
>> This change makes spark on ec2 work out of the box for us. It took us
>> several days to figure this out. It works for 1.4.1 and 1.5.1 on Hadoop
>> version 2.
>>
>> Best Regards,
>> Christian
>>
>

Re: Recommended change to core-site.xml template

Posted by Nicholas Chammas <ni...@gmail.com>.
Thanks for sharing this, Christian.

What build of Spark are you using? If I understand correctly, if you are
using Spark built against Hadoop 2.6+ then additional configs alone won't
help because additional libraries also need to be installed
<https://issues.apache.org/jira/browse/SPARK-7481>.

Nick

On Thu, Nov 5, 2015 at 11:25 AM Christian <en...@gmail.com> wrote:

> We ended up reading and writing to S3 a ton in our Spark jobs.
> For this to work, we ended up having to add s3a, and s3 key/secret pairs.
> We also had to add fs.hdfs.impl to get these things to work.
>
> I thought maybe I'd share what we did and it might be worth adding these
> to the spark conf for out of the box functionality with S3.
>
> We created:
> ec2/deploy.generic/root/spark-ec2/templates/root/spark/conf/core-site.xml
>
> We changed the contents from the original, adding in the following:
>
>   <property>
>     <name>fs.file.impl</name>
>     <value>org.apache.hadoop.fs.LocalFileSystem</value>
>   </property>
>
>   <property>
>     <name>fs.hdfs.impl</name>
>     <value>org.apache.hadoop.hdfs.DistributedFileSystem</value>
>   </property>
>
>   <property>
>     <name>fs.s3.impl</name>
>     <value>org.apache.hadoop.fs.s3native.NativeS3FileSystem</value>
>   </property>
>
>   <property>
>     <name>fs.s3.awsAccessKeyId</name>
>     <value>{{aws_access_key_id}}</value>
>   </property>
>
>   <property>
>     <name>fs.s3.awsSecretAccessKey</name>
>     <value>{{aws_secret_access_key}}</value>
>   </property>
>
>   <property>
>     <name>fs.s3n.awsAccessKeyId</name>
>     <value>{{aws_access_key_id}}</value>
>   </property>
>
>   <property>
>     <name>fs.s3n.awsSecretAccessKey</name>
>     <value>{{aws_secret_access_key}}</value>
>   </property>
>
>   <property>
>     <name>fs.s3a.awsAccessKeyId</name>
>     <value>{{aws_access_key_id}}</value>
>   </property>
>
>   <property>
>     <name>fs.s3a.awsSecretAccessKey</name>
>     <value>{{aws_secret_access_key}}</value>
>   </property>
>
> This change makes spark on ec2 work out of the box for us. It took us
> several days to figure this out. It works for 1.4.1 and 1.5.1 on Hadoop
> version 2.
>
> Best Regards,
> Christian
>