Posted to dev@spark.apache.org by Sean Owen <so...@cloudera.com> on 2017/02/03 11:52:30 UTC

Remove support for Hadoop 2.5 and earlier?

Last year we discussed removing support for things like Hadoop 2.5 and
earlier. It was deprecated in Spark 2.1.0. I'd like to go ahead with this,
so am checking whether anyone has strong feelings about it.

The original rationale for separate Hadoop profiles was bridging the
significant difference between Hadoop 1 and 2, and the moderate differences
between 2.0 alpha, 2.1 beta, and 2.2 final. 2.2 is really the "stable"
Hadoop 2, and releases from there to current are comparatively very similar
from Spark's perspective. We nevertheless continued to make a separate
build profile for every minor release, which isn't serving much purpose.

The argument here is mostly that it will simplify code a little bit (less
reflection, fewer profiles), simplify the build -- we now have 6 profiles x
2 build systems x 4 major branches in Jenkins, whereas master could go down
to 2 profiles.
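
To make that concrete, here is a hypothetical sketch (not actual Spark code) of the kind of version-bridging reflection a 2.6+ floor would let us delete. Configuration.getPassword only appeared in Hadoop 2.6, so supporting older releases means going through something like this:

  import org.apache.hadoop.conf.Configuration

  object HadoopCompat {
    // Configuration.getPassword(String) exists only on Hadoop 2.6+, so call it
    // reflectively and fall back to the raw config value on older releases.
    // With a 2.6 floor this collapses to a direct call.
    def getPassword(conf: Configuration, key: String): Option[String] =
      try {
        val m = classOf[Configuration].getMethod("getPassword", classOf[String])
        Option(m.invoke(conf, key).asInstanceOf[Array[Char]]).map(new String(_))
      } catch {
        case _: NoSuchMethodException => Option(conf.get(key))
      }
  }

With 2.6 as the minimum, the whole shim becomes a plain conf.getPassword(key) call.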

Realistically, I don't know how much we'd do to support Hadoop before 2.6
anyway. Any distro user is long since on 2.6+.

Would this cause anyone significant pain? If so, let's talk about when it
would be realistic to remove this, and when that changes.

Re: Remove support for Hadoop 2.5 and earlier?

Posted by Steve Loughran <st...@hortonworks.com>.
On 3 Feb 2017, at 21:28, Jacek Laskowski <ja...@japila.pl> wrote:

Hi Sean,

Given that 3.0.0 is coming, removing the unused versions would be a
huge benefit from a maintenance point of view. I'd support removing
support for 2.5 and earlier.

Speaking of Hadoop support, is anyone considering 3.0.0 support? Can't
find any JIRA for this.



As it stands, Hive 1.2.x rejects Hadoop 3 as a supported version, so DataFrames won't work

https://issues.apache.org/jira/browse/SPARK-18673

There's a quick fix to get Hadoop to lie about what version it is and keep Hive quiet: building Hadoop with -Ddeclared.hadoop.version=2.11 to force it to report 2.11, but that's not production. It does at least verify that nobody has broken any of the APIs (excluding those called via reflection on code paths not exercised in unit testing)
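
For context, the check that trips is Hive's shim loader: it switches on the Hadoop version string from VersionInfo and throws for any major version it doesn't recognise. Roughly, as a Scala paraphrase (the real code is Java, in Hive's ShimLoader):

  import org.apache.hadoop.util.VersionInfo

  // Paraphrase of the Hive 1.2.x shim selection, not the actual source:
  // pick a shim class from the Hadoop major version, reject anything else.
  def hadoopShimClass(): String = VersionInfo.getVersion.split("\\.")(0) match {
    case "1"   => "org.apache.hadoop.hive.shims.Hadoop20SShims"
    case "2"   => "org.apache.hadoop.hive.shims.Hadoop23Shims"
    case major =>
      // Hadoop 3 ends up here -- hence SPARK-18673.
      throw new IllegalArgumentException(
        s"Unrecognized Hadoop major version number: $major")
  }

That's also why the declared-version trick above gets past it: the shim loader only ever sees what VersionInfo reports.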

The full Hive patch is very much a WIP and it's aimed at Hive 2:
https://issues.apache.org/jira/browse/HIVE-15016

...which means either backporting to the org.spark-project Hive 1.2 fork or moving up to Hive 2, which is inevitably going to be a major change

Re: Remove support for Hadoop 2.5 and earlier?

Posted by Jacek Laskowski <ja...@japila.pl>.
Hi Sean,

Given that 3.0.0 is coming, removing the unused versions would be a
huge benefit from a maintenance point of view. I'd support removing
support for 2.5 and earlier.

Speaking of Hadoop support, is anyone considering 3.0.0 support? Can't
find any JIRA for this.

Regards,
Jacek Laskowski
----
https://medium.com/@jaceklaskowski/
Mastering Apache Spark 2.0 https://bit.ly/mastering-apache-spark
Follow me at https://twitter.com/jaceklaskowski


On Fri, Feb 3, 2017 at 12:52 PM, Sean Owen <so...@cloudera.com> wrote:
> Last year we discussed removing support for things like Hadoop 2.5 and
> earlier. It was deprecated in Spark 2.1.0. I'd like to go ahead with this,
> so am checking whether anyone has strong feelings about it.
>
> The original rationale for separate Hadoop profiles was bridging the
> significant difference between Hadoop 1 and 2, and the moderate differences
> between 2.0 alpha, 2.1 beta, and 2.2 final. 2.2 is really the "stable"
> Hadoop 2, and releases from there to current are comparatively very similar
> from Spark's perspective. We nevertheless continued to make a separate build
> profile for every minor release, which isn't serving much purpose.
>
> The argument here is mostly that it will simplify code a little bit (less
> reflection, fewer profiles), simplify the build -- we now have 6 profiles x
> 2 build systems x 4 major branches in Jenkins, whereas master could go down
> to 2 profiles.
>
> Realistically, I don't know how much we'd do to support Hadoop before 2.6
> anyway. Any distro user is long since on 2.6+.
>
> Would this cause anyone significant pain? If so, let's talk about when it
> would be realistic to remove this, and when that changes.



Re: Remove support for Hadoop 2.5 and earlier?

Posted by Steve Loughran <st...@hortonworks.com>.
> On 3 Feb 2017, at 11:52, Sean Owen <so...@cloudera.com> wrote:
> 
> Last year we discussed removing support for things like Hadoop 2.5 and earlier. It was deprecated in Spark 2.1.0. I'd like to go ahead with this, so am checking whether anyone has strong feelings about it.
> 
> The original rationale for separate Hadoop profiles was bridging the significant difference between Hadoop 1 and 2, and the moderate differences between 2.0 alpha, 2.1 beta, and 2.2 final. 2.2 is really the "stable" Hadoop 2, and releases from there to current are comparatively very similar from Spark's perspective. We nevertheless continued to make a separate build profile for every minor release, which isn't serving much purpose.
> 
> The argument here is mostly that it will simplify code a little bit (less reflection, fewer profiles), simplify the build -- we now have 6 profiles x 2 build systems x 4 major branches in Jenkins, whereas master could go down to 2 profiles. 
> 
> Realistically, I don't know how much we'd do to support Hadoop before 2.6 anyway. Any distro user is long since on 2.6+.

Hadoop 2.5 doesn't work properly on Java 7, so support for it is kind of implicitly false. Indeed, Hadoop 2.6 only works on Java 7 if you disable Kerberos, which isn't something I'd recommend in a shared physical cluster, though you may be able to get away with it in an ephemeral one where you lock down all the ports.

