Posted to user@spark.apache.org by JG Perrin <jp...@lumeris.com> on 2017/10/03 01:28:50 UTC

Quick one... AWS SDK version?

Hey Sparkians,

What version of AWS Java SDK do you use with Spark 2.2? Do you stick with the Hadoop 2.7.3 libs?

Thanks!

jg

Re: Quick one... AWS SDK version?

Posted by Jonathan Kelly <jo...@gmail.com>.
Tushar,

Yes, the hadoop-aws jar installed on an emr-5.8.0 cluster was built with
AWS Java SDK 1.11.160, if that’s what you mean.

~ Jonathan
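
A quick way to confirm this on a running cluster is a sketch like the
following, run from spark-shell: it prints the SDK version string the driver
actually loaded and the jar it came from. It assumes the 1.x SDK classes
com.amazonaws.util.VersionInfoUtils and com.amazonaws.AmazonWebServiceClient
are on the classpath; adjust the class names if your build differs.

    // Print the loaded AWS SDK version and the jar providing it (sketch only).
    println(com.amazonaws.util.VersionInfoUtils.getVersion())
    println(classOf[com.amazonaws.AmazonWebServiceClient]
      .getProtectionDomain.getCodeSource.getLocation)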
On Sun, Oct 8, 2017 at 8:42 AM Tushar Sudake <et...@gmail.com> wrote:

> Hi Jonathan,
>
> Does that mean Hadoop-AWS 2.7.3 too is built against AWS SDK 1.11.160 and
> not 1.7.4?
>
> Thanks.
>
>
> On Oct 7, 2017 3:50 PM, "Jean Georges Perrin" <jg...@jgp.net> wrote:
>
>
> Hey Marco,
>
> I am actually reading from S3 and I use 2.7.3, but I inherited the project
> and they use some AWS APIs from the Amazon SDK, whose version is like from
> yesterday :) so it’s confusing, and AWS is changing its version like crazy,
> so it’s a little difficult to follow. Right now I went back to 2.7.3 and
> SDK 1.7.4...
>
> jg
>
>
> On Oct 7, 2017, at 15:34, Marco Mistroni <mm...@gmail.com> wrote:
>
> Hi JG
>  out of curiosity, what's your use case? Are you writing to S3? You could
> use Spark to do that, e.g. using the Hadoop package
> org.apache.hadoop:hadoop-aws:2.7.1; that will download the AWS client
> which is in line with Hadoop 2.7.1.
>
> hth
>  marco
>
> On Fri, Oct 6, 2017 at 10:58 PM, Jonathan Kelly <jo...@gmail.com>
> wrote:
>
>> Note: EMR builds Hadoop, Spark, et al, from source against specific
>> versions of certain packages like the AWS Java SDK, httpclient/core,
>> Jackson, etc., sometimes requiring some patches in these applications in
>> order to work with versions of these dependencies that differ from what the
>> applications may support upstream.
>>
>> For emr-5.8.0, we have built Hadoop and Spark (the Spark Kinesis
>> connector, that is, since that's the only part of Spark that actually
>> depends upon the AWS Java SDK directly) against AWS Java SDK 1.11.160
>> instead of the much older version that vanilla Hadoop 2.7.3 would otherwise
>> depend upon.
>>
>> ~ Jonathan
>>
>> On Wed, Oct 4, 2017 at 7:17 AM Steve Loughran <st...@hortonworks.com>
>> wrote:
>>
>>> On 3 Oct 2017, at 21:37, JG Perrin <jp...@lumeris.com> wrote:
>>>
>>> Sorry Steve – I may not have been very clear: thinking about
>>> aws-java-sdk-z.yy.xxx.jar. To the best of my knowledge, none is bundled
>>> with Spark.
>>>
>>>
>>>
>>> I know, but if you are talking to S3 via the s3a client, you will need
>>> the SDK version to match the one that the hadoop-aws JAR for your Hadoop
>>> version was built against. Similarly, if you were using spark-kinesis, it
>>> needs to be in sync there.
>>>
>>>
>>> *From:* Steve Loughran [mailto:stevel@hortonworks.com]
>>> *Sent:* Tuesday, October 03, 2017 2:20 PM
>>> *To:* JG Perrin <jp...@lumeris.com>
>>> *Cc:* user@spark.apache.org
>>> *Subject:* Re: Quick one... AWS SDK version?
>>>
>>>
>>>
>>> On 3 Oct 2017, at 02:28, JG Perrin <jp...@lumeris.com> wrote:
>>>
>>> Hey Sparkians,
>>>
>>> What version of AWS Java SDK do you use with Spark 2.2? Do you stick
>>> with the Hadoop 2.7.3 libs?
>>>
>>>
>>> You generally have to stick with the version which Hadoop was built
>>> with, I'm afraid... very brittle dependency.
>>>
>>>
>
>

Re: Quick one... AWS SDK version?

Posted by Tushar Sudake <et...@gmail.com>.
Hi Jonathan,

Does that mean Hadoop-AWS 2.7.3 too is built against AWS SDK 1.11.160 and
not 1.7.4?

Thanks.


On Oct 7, 2017 3:50 PM, "Jean Georges Perrin" <jg...@jgp.net> wrote:


Hey Marco,

I am actually reading from S3 and I use 2.7.3, but I inherited the project
and they use some AWS APIs from the Amazon SDK, whose version is like from
yesterday :) so it’s confusing, and AWS is changing its version like crazy,
so it’s a little difficult to follow. Right now I went back to 2.7.3 and
SDK 1.7.4...

jg


On Oct 7, 2017, at 15:34, Marco Mistroni <mm...@gmail.com> wrote:

Hi JG
 out of curiosity, what's your use case? Are you writing to S3? You could use
Spark to do that, e.g. using the Hadoop package org.apache.hadoop:hadoop-aws:2.7.1;
that will download the AWS client which is in line with Hadoop 2.7.1.

hth
 marco

On Fri, Oct 6, 2017 at 10:58 PM, Jonathan Kelly <jo...@gmail.com>
wrote:

> Note: EMR builds Hadoop, Spark, et al, from source against specific
> versions of certain packages like the AWS Java SDK, httpclient/core,
> Jackson, etc., sometimes requiring some patches in these applications in
> order to work with versions of these dependencies that differ from what the
> applications may support upstream.
>
> For emr-5.8.0, we have built Hadoop and Spark (the Spark Kinesis
> connector, that is, since that's the only part of Spark that actually
> depends upon the AWS Java SDK directly) against AWS Java SDK 1.11.160
> instead of the much older version that vanilla Hadoop 2.7.3 would otherwise
> depend upon.
>
> ~ Jonathan
>
> On Wed, Oct 4, 2017 at 7:17 AM Steve Loughran <st...@hortonworks.com>
> wrote:
>
>> On 3 Oct 2017, at 21:37, JG Perrin <jp...@lumeris.com> wrote:
>>
>> Sorry Steve – I may not have been very clear: thinking about
>> aws-java-sdk-z.yy.xxx.jar. To the best of my knowledge, none is bundled
>> with Spark.
>>
>>
>>
>> I know, but if you are talking to S3 via the s3a client, you will need
>> the SDK version to match the one that the hadoop-aws JAR for your Hadoop
>> version was built against. Similarly, if you were using spark-kinesis, it
>> needs to be in sync there.
>>
>>
>> *From:* Steve Loughran [mailto:stevel@hortonworks.com]
>> *Sent:* Tuesday, October 03, 2017 2:20 PM
>> *To:* JG Perrin <jp...@lumeris.com>
>> *Cc:* user@spark.apache.org
>> *Subject:* Re: Quick one... AWS SDK version?
>>
>>
>>
>> On 3 Oct 2017, at 02:28, JG Perrin <jp...@lumeris.com> wrote:
>>
>> Hey Sparkians,
>>
>> What version of AWS Java SDK do you use with Spark 2.2? Do you stick with
>> the Hadoop 2.7.3 libs?
>>
>>
>> You generally have to stick with the version which Hadoop was built
>> with, I'm afraid... very brittle dependency.
>>
>>

Re: Quick one... AWS SDK version?

Posted by Jean Georges Perrin <jg...@jgp.net>.
Hey Marco,

I am actually reading from S3 and I use 2.7.3, but I inherited the project and they use some AWS APIs from the Amazon SDK, whose version is like from yesterday :) so it’s confusing, and AWS is changing its version like crazy, so it’s a little difficult to follow. Right now I went back to 2.7.3 and SDK 1.7.4...

jg


> On Oct 7, 2017, at 15:34, Marco Mistroni <mm...@gmail.com> wrote:
> 
> Hi JG
>  out of curiosity, what's your use case? Are you writing to S3? You could use Spark to do that, e.g. using the Hadoop package org.apache.hadoop:hadoop-aws:2.7.1; that will download the AWS client which is in line with Hadoop 2.7.1.
> 
> hth
>  marco
> 
>> On Fri, Oct 6, 2017 at 10:58 PM, Jonathan Kelly <jo...@gmail.com> wrote:
>> Note: EMR builds Hadoop, Spark, et al, from source against specific versions of certain packages like the AWS Java SDK, httpclient/core, Jackson, etc., sometimes requiring some patches in these applications in order to work with versions of these dependencies that differ from what the applications may support upstream.
>> 
>> For emr-5.8.0, we have built Hadoop and Spark (the Spark Kinesis connector, that is, since that's the only part of Spark that actually depends upon the AWS Java SDK directly) against AWS Java SDK 1.11.160 instead of the much older version that vanilla Hadoop 2.7.3 would otherwise depend upon.
>> 
>> ~ Jonathan
>> 
>>> On Wed, Oct 4, 2017 at 7:17 AM Steve Loughran <st...@hortonworks.com> wrote:
>>>> On 3 Oct 2017, at 21:37, JG Perrin <jp...@lumeris.com> wrote:
>>>> 
>>>> Sorry Steve – I may not have been very clear: thinking about aws-java-sdk-z.yy.xxx.jar. To the best of my knowledge, none is bundled with Spark.
>>> 
>>> 
>>> I know, but if you are talking to S3 via the s3a client, you will need the SDK version to match the one that the hadoop-aws JAR for your Hadoop version was built against. Similarly, if you were using spark-kinesis, it needs to be in sync there.
>>>>  
>>>> From: Steve Loughran [mailto:stevel@hortonworks.com] 
>>>> Sent: Tuesday, October 03, 2017 2:20 PM
>>>> To: JG Perrin <jp...@lumeris.com>
>>>> Cc: user@spark.apache.org
>>>> Subject: Re: Quick one... AWS SDK version?
>>>>  
>>>>  
>>>> On 3 Oct 2017, at 02:28, JG Perrin <jp...@lumeris.com> wrote:
>>>>  
>>>> Hey Sparkians,
>>>>  
>>>> What version of AWS Java SDK do you use with Spark 2.2? Do you stick with the Hadoop 2.7.3 libs?
>>>>  
>>>> You generally have to stick with the version which Hadoop was built with, I'm afraid... very brittle dependency.
> 

Re: Quick one... AWS SDK version?

Posted by Marco Mistroni <mm...@gmail.com>.
Hi JG
 out of curiosity, what's your use case? Are you writing to S3? You could use
Spark to do that, e.g. using the Hadoop package
org.apache.hadoop:hadoop-aws:2.7.1; that will download the AWS client
which is in line with Hadoop 2.7.1.

hth
 marco
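
For reference, a minimal sketch of that approach, assuming the shell or job
was launched with --packages org.apache.hadoop:hadoop-aws:2.7.3 (which pulls
in the aws-java-sdk 1.7.4 it was built against). The bucket name and the
credential values below are placeholders.

    import org.apache.spark.sql.SparkSession

    // Sketch only: credentials can also come from instance profiles or
    // environment variables instead of being set here.
    val spark = SparkSession.builder().appName("s3a-example").getOrCreate()
    val hadoopConf = spark.sparkContext.hadoopConfiguration
    hadoopConf.set("fs.s3a.access.key", "YOUR_ACCESS_KEY")
    hadoopConf.set("fs.s3a.secret.key", "YOUR_SECRET_KEY")

    val df = spark.read.json("s3a://my-bucket/input/")   // read via s3a
    df.write.parquet("s3a://my-bucket/output/")          // write via s3a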

On Fri, Oct 6, 2017 at 10:58 PM, Jonathan Kelly <jo...@gmail.com>
wrote:

> Note: EMR builds Hadoop, Spark, et al, from source against specific
> versions of certain packages like the AWS Java SDK, httpclient/core,
> Jackson, etc., sometimes requiring some patches in these applications in
> order to work with versions of these dependencies that differ from what the
> applications may support upstream.
>
> For emr-5.8.0, we have built Hadoop and Spark (the Spark Kinesis
> connector, that is, since that's the only part of Spark that actually
> depends upon the AWS Java SDK directly) against AWS Java SDK 1.11.160
> instead of the much older version that vanilla Hadoop 2.7.3 would otherwise
> depend upon.
>
> ~ Jonathan
>
> On Wed, Oct 4, 2017 at 7:17 AM Steve Loughran <st...@hortonworks.com>
> wrote:
>
>> On 3 Oct 2017, at 21:37, JG Perrin <jp...@lumeris.com> wrote:
>>
>> Sorry Steve – I may not have been very clear: thinking about
>> aws-java-sdk-z.yy.xxx.jar. To the best of my knowledge, none is bundled
>> with Spark.
>>
>>
>>
>> I know, but if you are talking to S3 via the s3a client, you will need
>> the SDK version to match the one that the hadoop-aws JAR for your Hadoop
>> version was built against. Similarly, if you were using spark-kinesis, it
>> needs to be in sync there.
>>
>>
>> *From:* Steve Loughran [mailto:stevel@hortonworks.com]
>> *Sent:* Tuesday, October 03, 2017 2:20 PM
>> *To:* JG Perrin <jp...@lumeris.com>
>> *Cc:* user@spark.apache.org
>> *Subject:* Re: Quick one... AWS SDK version?
>>
>>
>>
>> On 3 Oct 2017, at 02:28, JG Perrin <jp...@lumeris.com> wrote:
>>
>> Hey Sparkians,
>>
>> What version of AWS Java SDK do you use with Spark 2.2? Do you stick with
>> the Hadoop 2.7.3 libs?
>>
>>
>> You generally have to stick with the version which Hadoop was built
>> with, I'm afraid... very brittle dependency.
>>
>>

Re: Quick one... AWS SDK version?

Posted by Jonathan Kelly <jo...@gmail.com>.
Note: EMR builds Hadoop, Spark, et al, from source against specific
versions of certain packages like the AWS Java SDK, httpclient/core,
Jackson, etc., sometimes requiring some patches in these applications in
order to work with versions of these dependencies that differ from what the
applications may support upstream.

For emr-5.8.0, we have built Hadoop and Spark (the Spark Kinesis connector,
that is, since that's the only part of Spark that actually depends upon the
AWS Java SDK directly) against AWS Java SDK 1.11.160 instead of the much
older version that vanilla Hadoop 2.7.3 would otherwise depend upon.

~ Jonathan
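
If you build against that connector yourself, a hedged build.sbt sketch along
these lines keeps the Kinesis connector and the SDK aligned with the cluster.
The 1.11.160 pin mirrors emr-5.8.0 as described above; verify it against your
own cluster before relying on it.

    // build.sbt sketch: align spark-streaming-kinesis-asl with the cluster's
    // Spark version and, if needed, force the SDK core to the cluster's version.
    libraryDependencies ++= Seq(
      "org.apache.spark" %% "spark-streaming-kinesis-asl" % "2.2.0",
      "com.amazonaws"     % "aws-java-sdk-core"           % "1.11.160"
    )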

On Wed, Oct 4, 2017 at 7:17 AM Steve Loughran <st...@hortonworks.com>
wrote:

> On 3 Oct 2017, at 21:37, JG Perrin <jp...@lumeris.com> wrote:
>
> Sorry Steve – I may not have been very clear: thinking about
> aws-java-sdk-z.yy.xxx.jar. To the best of my knowledge, none is bundled
> with Spark.
>
>
>
> I know, but if you are talking to S3 via the s3a client, you will need the
> SDK version to match the one that the hadoop-aws JAR for your Hadoop version
> was built against. Similarly, if you were using spark-kinesis, it needs to
> be in sync there.
>
>
> *From:* Steve Loughran [mailto:stevel@hortonworks.com]
> *Sent:* Tuesday, October 03, 2017 2:20 PM
> *To:* JG Perrin <jp...@lumeris.com>
> *Cc:* user@spark.apache.org
> *Subject:* Re: Quick one... AWS SDK version?
>
>
>
> On 3 Oct 2017, at 02:28, JG Perrin <jp...@lumeris.com> wrote:
>
> Hey Sparkians,
>
> What version of AWS Java SDK do you use with Spark 2.2? Do you stick with
> the Hadoop 2.7.3 libs?
>
>
> You generally have to stick with the version which Hadoop was built
> with, I'm afraid... very brittle dependency.
>
>

Re: Quick one... AWS SDK version?

Posted by Steve Loughran <st...@hortonworks.com>.
On 3 Oct 2017, at 21:37, JG Perrin <jp...@lumeris.com> wrote:

Sorry Steve – I may not have been very clear: thinking about aws-java-sdk-z.yy.xxx.jar. To the best of my knowledge, none is bundled with Spark.


I know, but if you are talking to S3 via the s3a client, you will need the SDK version to match the one that the hadoop-aws JAR for your Hadoop version was built against. Similarly, if you were using spark-kinesis, it needs to be in sync there.

From: Steve Loughran [mailto:stevel@hortonworks.com]
Sent: Tuesday, October 03, 2017 2:20 PM
To: JG Perrin <jp...@lumeris.com>
Cc: user@spark.apache.org
Subject: Re: Quick one... AWS SDK version?


On 3 Oct 2017, at 02:28, JG Perrin <jp...@lumeris.com> wrote:

Hey Sparkians,

What version of AWS Java SDK do you use with Spark 2.2? Do you stick with the Hadoop 2.7.3 libs?

You generally have to stick with the version which Hadoop was built with, I'm afraid... very brittle dependency.
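
In dependency terms, that pinning might look like the sketch below for an sbt
build against vanilla Hadoop 2.7.3, which was built against SDK 1.7.4. Treat
the exact coordinates as an illustration and check them against your own
Hadoop build.

    // build.sbt sketch: keep hadoop-aws and the AWS SDK at the versions that
    // Hadoop 2.7.3 was built with; Spark is marked "provided" as an example.
    libraryDependencies ++= Seq(
      "org.apache.spark"  %% "spark-sql"     % "2.2.0" % "provided",
      "org.apache.hadoop"  % "hadoop-aws"    % "2.7.3",
      "com.amazonaws"      % "aws-java-sdk"  % "1.7.4"
    )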


RE: Quick one... AWS SDK version?

Posted by JG Perrin <jp...@lumeris.com>.
Sorry Steve - I may not have been very clear: thinking about aws-java-sdk-z.yy.xxx.jar. To the best of my knowledge, none is bundled with Spark.

From: Steve Loughran [mailto:stevel@hortonworks.com]
Sent: Tuesday, October 03, 2017 2:20 PM
To: JG Perrin <jp...@lumeris.com>
Cc: user@spark.apache.org
Subject: Re: Quick one... AWS SDK version?


On 3 Oct 2017, at 02:28, JG Perrin <jp...@lumeris.com> wrote:

Hey Sparkians,

What version of AWS Java SDK do you use with Spark 2.2? Do you stick with the Hadoop 2.7.3 libs?

You generally have to stick with the version which Hadoop was built with, I'm afraid... very brittle dependency.

Re: Quick one... AWS SDK version?

Posted by Steve Loughran <st...@hortonworks.com>.
On 3 Oct 2017, at 02:28, JG Perrin <jp...@lumeris.com> wrote:

Hey Sparkians,

What version of AWS Java SDK do you use with Spark 2.2? Do you stick with the Hadoop 2.7.3 libs?

You generally have to stick with the version which Hadoop was built with, I'm afraid... very brittle dependency.

RE: Quick one... AWS SDK version?

Posted by JG Perrin <jp...@lumeris.com>.
Thanks Yash… this is helpful!

From: Yash Sharma [mailto:yash360@gmail.com]
Sent: Tuesday, October 03, 2017 1:02 AM
To: JG Perrin <jp...@lumeris.com>; user@spark.apache.org
Subject: Re: Quick one... AWS SDK version?


Hi JG,
Here are my cluster configs if it helps.

Cheers.

EMR: emr-5.8.0
Hadoop distribution: Amazon 2.7.3
AWS sdk: /usr/share/aws/aws-java-sdk/aws-java-sdk-1.11.160.jar
Applications:
Hive 2.3.0
Spark 2.2.0
Tez 0.8.4

On Tue, 3 Oct 2017 at 12:29 JG Perrin <jp...@lumeris.com> wrote:
Hey Sparkians,

What version of AWS Java SDK do you use with Spark 2.2? Do you stick with the Hadoop 2.7.3 libs?

Thanks!

jg

Re: Quick one... AWS SDK version?

Posted by Yash Sharma <ya...@gmail.com>.
Hi JG,
Here are my cluster configs if it helps.

Cheers.

EMR: emr-5.8.0
Hadoop distribution: Amazon 2.7.3
AWS sdk: /usr/share/aws/aws-java-sdk/aws-java-sdk-1.11.160.jar

Applications:
Hive 2.3.0
Spark 2.2.0
Tez 0.8.4


On Tue, 3 Oct 2017 at 12:29 JG Perrin <jp...@lumeris.com> wrote:

> Hey Sparkians,
>
>
>
> What version of AWS Java SDK do you use with Spark 2.2? Do you stick with
> the Hadoop 2.7.3 libs?
>
>
>
> Thanks!
>
>
>
> jg
>