Posted to user@spark.apache.org by adam kramer <ad...@gmail.com> on 2016/10/27 22:04:12 UTC

Spark 2.0 with Hadoop 3.0?

Is the version of Spark built for Hadoop 2.7 and later only for 2.x releases?

Is there any reason why Hadoop 3.0 is a non-starter for use with Spark
2.0? The version of aws-sdk in 3.0 actually works for DynamoDB which
would resolve our driver dependency issues.

Thanks,
Adam



Re: Spark 2.0 with Hadoop 3.0?

Posted by adam kramer <ad...@gmail.com>.
The version problems come from using hadoop-aws-2.7.3 alongside the
aws-sdk-1.7.4 that ships in hadoop-2.7.3, where DynamoDB functionality is
limited (it may not even work against the deployed versions of the
service). I've stripped all DynamoDB usage out of the driver program in
the meantime; a calling program reads the driver's standard output and
does the DynamoDB writes instead.
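
Roughly the shape of that workaround, as a sketch (the class and program
names here are placeholders, not our real ones):

  # the driver itself no longer touches DynamoDB or the aws-sdk;
  # a separate process consumes its stdout and does the DynamoDB writes
  spark-submit --class com.example.OurDriver our-assembly.jar | ./dynamo-writer.py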

I believe anything in the 1.10.x SDK line should be fine, including the
1.10.6 that ships in the 3.0.0-alpha1 release (we were using 1.10.31
elsewhere), so I don't think the 1.10.10+ patch is necessary if we try
Hadoop 3. I'll let you know if we end up patching and testing anything
from trunk to get it working.


On Sat, Oct 29, 2016 at 6:08 AM, Steve Loughran <st...@hortonworks.com> wrote:
>
> On 27 Oct 2016, at 23:04, adam kramer <ad...@gmail.com> wrote:
>
> Is the version of Spark built for Hadoop 2.7 and later only for 2.x
> releases?
>
> Is there any reason why Hadoop 3.0 is a non-starter for use with Spark
> 2.0? The version of aws-sdk in 3.0 actually works for DynamoDB which
> would resolve our driver dependency issues.
>
>
> What version problems are you having there?
>
>
> There's a patch to move to AWS SDK 1.10.10, but that has a jackson 2.6.6+
> dependency; that's something I'd like to do in Hadoop branch-2 as well,
> as it is Time to Move On (HADOOP-12705). FWIW, all jackson 1.9
> dependencies have been ripped out, leaving only that 2.x version problem.
>
> https://issues.apache.org/jira/browse/HADOOP-13050
>
> The HADOOP-13345 s3guard work will pull in a (provided) dependency on
> DynamoDB; looks like the HADOOP-13449 patch moves to SDK 1.11.0.
>
> I think we are likely to backport that to branch-2 as well, though it'd help
> the dev & test there if you built and tested your code against trunk early,
> not least to find any changes in that transitive dependency set.
>
>
> Thanks,
> Adam



Re: Spark 2.0 with Hadoop 3.0?

Posted by Steve Loughran <st...@hortonworks.com>.
On 27 Oct 2016, at 23:04, adam kramer <ad...@gmail.com> wrote:

Is the version of Spark built for Hadoop 2.7 and later only for 2.x releases?

Is there any reason why Hadoop 3.0 is a non-starter for use with Spark
2.0? The version of aws-sdk in 3.0 actually works for DynamoDB which
would resolve our driver dependency issues.

What version problems are you having there?


There's a patch to move to AWS SDK 1.10.10, but that has a jackson 2.6.6+ dependency; that's something I'd like to do in Hadoop branch-2 as well, as it is Time to Move On (HADOOP-12705). FWIW, all jackson 1.9 dependencies have been ripped out, leaving only that 2.x version problem.

https://issues.apache.org/jira/browse/HADOOP-13050

The HADOOP-13345 s3guard work will pull in a (provided) dependency on DynamoDB; looks like the HADOOP-13449 patch moves to SDK 1.11.0.

I think we are likely to backport that to branch-2 as well, though it'd help the dev & test there if you built and tested your code against trunk early, not least to find any changes in that transitive dependency set.
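
If you do try that, one quick way to spot shifts in the transitive set is to
dump the dependency tree for the jackson and aws-sdk groups; something like
this (the hadoop.version value is just an example, and assumes your build
parameterises it):

  # show which jackson and aws-sdk artifacts your build actually resolves
  mvn dependency:tree -Dhadoop.version=3.0.0-alpha1 \
      -Dincludes=com.fasterxml.jackson.core,com.amazonaws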


Thanks,
Adam





Re: Spark 2.0 with Hadoop 3.0?

Posted by Zoltán Zvara <zo...@gmail.com>.
Worked for me 2 weeks ago with a 3.0.0-alpha2 snapshot. Just changed
hadoop.version while building.
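
For reference, the invocation was essentially this (my profile flags; yours
may differ):

  ./build/mvn -Pyarn -Phadoop-2.7 \
      -Dhadoop.version=3.0.0-alpha2-SNAPSHOT -DskipTests clean package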

On Fri, Oct 28, 2016, 11:50 Sean Owen <so...@cloudera.com> wrote:

> I don't think it works, but there is no Hadoop 3.0 release right now either.
> As the version implies, it's going to be somewhat different API-wise.
>
> On Thu, Oct 27, 2016 at 11:04 PM adam kramer <ad...@gmail.com> wrote:
>
> Is the version of Spark built for Hadoop 2.7 and later only for 2.x
> releases?
>
> Is there any reason why Hadoop 3.0 is a non-starter for use with Spark
> 2.0? The version of aws-sdk in 3.0 actually works for DynamoDB which
> would resolve our driver dependency issues.
>
> Thanks,
> Adam
>

Re: Spark 2.0 with Hadoop 3.0?

Posted by Sean Owen <so...@cloudera.com>.
I don't think it works, but there is no Hadoop 3.0 release right now either.
As the version implies, it's going to be somewhat different API-wise.

On Thu, Oct 27, 2016 at 11:04 PM adam kramer <ad...@gmail.com> wrote:

> Is the version of Spark built for Hadoop 2.7 and later only for 2.x
> releases?
>
> Is there any reason why Hadoop 3.0 is a non-starter for use with Spark
> 2.0? The version of aws-sdk in 3.0 actually works for DynamoDB which
> would resolve our driver dependency issues.
>
> Thanks,
> Adam
>