Posted to dev@spark.apache.org by Reynold Xin <rx...@databricks.com> on 2016/04/04 07:28:41 UTC

Re: [discuss] ending support for Java 7 in Spark 2.0

Since my original email, I've talked to a lot more users and looked at what
various environments support. It is true that a lot of enterprises, and
even some technology companies, are still using Java 7. One data point: to
this day, users still can't install OpenJDK 8 on Ubuntu by default. I see
that as an indication that it is too early to drop Java 7.

Looking at the timeline, the JDK releases a major new version roughly every
3 years. We dropped Java 6 support one year ago, so from a timeline point of
view we would be very aggressive here if we were to drop Java 7 support in
Spark 2.0.

Note that not dropping Java 7 support now doesn't mean we have to support
Java 7 throughout Spark 2.x. We dropped Java 6 support in Spark 1.5, even
though Spark 1.0 started with Java 6.

In terms of testing, Josh has actually improved our test infra so that we
now also run the Java 8 tests: https://github.com/apache/spark/pull/12073




On Thu, Mar 24, 2016 at 8:51 PM, Liwei Lin <lw...@gmail.com> wrote:

> Arguments are really convincing; the new Dataset API as well as the
> performance improvements are exciting, so I'm personally +1 on moving to
> Java 8.
>
> However, I'm afraid Tencent is one of "the organizations stuck with Java 7"
> -- our IT Infra division wouldn't upgrade to Java 7 until Java 8 is out, and
> wouldn't upgrade to Java 8 until Java 9 is out.
>
> So:
>
> (non-binding) +1 on dropping Scala 2.10 support
> (non-binding) -1 on dropping Java 7 support
>               * as long as we figure out a practical way to run Spark with
>                 JDK8 on JDK7 clusters, this -1 would then definitely be +1
>
> Thanks!
>
> On Fri, Mar 25, 2016 at 10:28 AM, Koert Kuipers <ko...@tresata.com> wrote:
>
>> I think that logic is reasonable, but then the same should also apply to
>> Scala 2.10, which is also unmaintained/unsupported at this point (it
>> basically has been since March 2015, except for one hotfix due to a
>> license incompatibility).
>>
>> Who wants to support Scala 2.10 three years after they did the last
>> maintenance release?
>>
>>
>> On Thu, Mar 24, 2016 at 9:59 PM, Mridul Muralidharan <mr...@gmail.com>
>> wrote:
>>
>>> Removing compatibility (with the JDK, etc.) can be done with a major
>>> release; given that 7 was EOLed a while back and is now unsupported, we
>>> have to decide if we drop support for it in 2.0 or 3.0 (2+ years from now).
>>>
>>> Given the functionality & performance benefits of going to JDK 8, future
>>> enhancements relevant in the 2.x timeframe (Scala, dependencies) which
>>> require it, and simplicity w.r.t. code, test & support, it looks like a
>>> good checkpoint to drop JDK 7 support.
>>>
>>> As already mentioned in the thread, existing YARN clusters are unaffected
>>> if they want to continue running JDK 7 and yet use Spark 2: install JDK 8
>>> on all nodes and point to it via JAVA_HOME, or worst case distribute
>>> JDK 8 as an archive (suboptimal); see the sketch after this message.
>>> I am unsure about Mesos (standalone is probably an easier upgrade, I
>>> guess?).
>>>
>>> The proposal is for the 1.6.x line to continue to be supported with
>>> critical fixes; newer features will require 2.x and thus JDK 8.
>>>
>>> Regards
>>> Mridul
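
As a rough sketch of the JAVA_HOME approach Mridul describes: Spark's
documented per-container environment settings let a single job point its
YARN containers at a JDK 8 install without touching the cluster-wide
default. The install path and class name below are assumptions for
illustration only.

    import org.apache.spark.SparkConf;

    public class Jdk8OnJdk7Cluster {
      public static void main(String[] args) {
        SparkConf conf = new SparkConf()
            .setAppName("jdk8-on-a-jdk7-cluster")
            // Hypothetical path: a JDK 8 install present on every node.
            // The YARN application master picks it up via JAVA_HOME...
            .set("spark.yarn.appMasterEnv.JAVA_HOME",
                 "/usr/lib/jvm/java-8-openjdk")
            // ...and so do the executors; the cluster default stays on JDK 7.
            .set("spark.executorEnv.JAVA_HOME",
                 "/usr/lib/jvm/java-8-openjdk");
        // Create the context from this conf and run the job as usual.
      }
    }

The same two settings can be passed to spark-submit via --conf; the
"distribute JDK 8 as an archive" variant would pair them with
spark.yarn.dist.archives.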
>>>
>>>
>>> On Thursday, March 24, 2016, Marcelo Vanzin <va...@cloudera.com> wrote:
>>>
>>>> On Thu, Mar 24, 2016 at 4:50 PM, Reynold Xin <rx...@databricks.com>
>>>> wrote:
>>>> > If you want to go down that route, you should also ask somebody who
>>>> > has had experience managing a large organization's applications and
>>>> > try to update the Scala version.
>>>>
>>>> I understand both sides. But if you look at what I've been asking
>>>> since the beginning, it's all about the costs and benefits of dropping
>>>> support for Java 1.7.
>>>>
>>>> The biggest argument in your original e-mail is about testing. And the
>>>> testing cost is much bigger for supporting Scala 2.10 than it is for
>>>> supporting Java 1.7. If you read one of my earlier replies, it should
>>>> even be possible to do everything in a single job: compile for Java 7
>>>> and still be able to test things on 1.8, including lambdas, which seems
>>>> to be the main thing you were worried about.
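
A rough sketch of the single-job arrangement Marcelo describes: the main
artifact is built with -source/-target 1.7 under a JDK 8 toolchain, and a
test compiled at the 1.8 source level (on the same JDK 8) exercises lambdas
against the Java-7-targeted API. The class below is illustrative, not
Spark's actual suite.

    import java.util.Arrays;
    import org.apache.spark.SparkConf;
    import org.apache.spark.api.java.JavaSparkContext;

    public class Java8LambdaSmokeTest {
      public static void main(String[] args) {
        SparkConf conf = new SparkConf()
            .setMaster("local[2]")
            .setAppName("java8-lambda-smoke");
        JavaSparkContext sc = new JavaSparkContext(conf);
        // A lambda against the Java-7-compatible API: Spark's Function
        // types are single-method interfaces, so bytecode targeted at 1.7
        // is perfectly usable from Java 8 source.
        long evens = sc.parallelize(Arrays.asList(1, 2, 3, 4))
                       .filter(x -> x % 2 == 0)
                       .count();
        System.out.println("even count = " + evens);
        sc.stop();
      }
    }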
>>>>
>>>>
>>>> > On Thu, Mar 24, 2016 at 4:48 PM, Marcelo Vanzin <va...@cloudera.com>
>>>> > wrote:
>>>> >>
>>>> >> On Thu, Mar 24, 2016 at 4:46 PM, Reynold Xin <rx...@databricks.com>
>>>> >> wrote:
>>>> >> > Actually it's *way* harder to upgrade Scala from 2.10 to 2.11 than
>>>> >> > to upgrade the JVM runtime from 7 to 8, because Scala 2.10 and 2.11
>>>> >> > are not binary compatible, whereas JVM 7 and 8 are binary compatible
>>>> >> > except in certain esoteric cases.
>>>> >>
>>>> >> True, but ask anyone who manages a large cluster how long it would
>>>> >> take them to upgrade the JDK across their cluster and validate all
>>>> >> their applications and everything... binary compatibility is a tiny
>>>> >> drop in that bucket.
>>>> >>
>>>> >> --
>>>> >> Marcelo
>>>> >
>>>> >
>>>>
>>>>
>>>>
>>>> --
>>>> Marcelo
>>>>
>>>>
>>>>
>>
>

Re: [discuss] ending support for Java 7 in Spark 2.0

Posted by Steve Loughran <st...@hortonworks.com>.
On 4 Apr 2016, at 16:41, Xuefeng Wu <be...@gmail.com> wrote:

> Many open source projects are aggressive, such as Oracle JDK and Ubuntu,
> but they provide stable commercial support.

Supporting old versions of the JDK is one of the key revenue streams for
Oracle's Sun group: there are a lot of webapps out there which need a
secure/stable JDK version, and whose owners don't want to spend the time &
money doing the upgrade.

> In other words, the enterprises that don't drop JDK 7 might also not drop
> Spark 1.x to adopt an early Spark 2.x version.

Probably true, except for the complication that in a large multi-tenant
cluster you need to get everyone who runs code in the cluster, and the ops
team, happy with the plan.

Re: [discuss] ending support for Java 7 in Spark 2.0

Posted by Xuefeng Wu <be...@gmail.com>.

Many open source projects are aggressive, such as Oracle JDK and Ubuntu,
but they provide stable commercial support.

In other words, the enterprises that don't drop JDK 7 might also not drop
Spark 1.x to adopt an early Spark 2.x version.

Re: [discuss] ending support for Java 7 in Spark 2.0

Posted by Reynold Xin <rx...@databricks.com>.
Hi Sean,

See http://www.oracle.com/technetwork/java/eol-135779.html

Java 7 hasn't been EOLed yet. If you look at the support you can get from
Oracle, it actually goes to 2019. And you can even get more support after
that.

Spark has always maintained great backward compatibility with other
systems, way beyond what vendors typically support. For example, we
supported Hadoop 1.x all the way until Spark 1.6 (basically the last
release), while all the vendors have dropped support for them already.

Putting my Databricks hat on, we actually only support Java 8, but I think
it would be great to still support Java 7 in the upstream release for some
larger deployments. I like the idea of deprecating it, or at least strongly
encouraging people to upgrade.


Re: [discuss] ending support for Java 7 in Spark 2.0

Posted by Sean Owen <so...@cloudera.com>.
Following https://github.com/apache/spark/pull/12165#issuecomment-205791222
I'd like to make a point about process and then answer points below.

We have this funny system where anyone can propose a change, and any
of a few people can veto a change unilaterally. The latter rarely
comes up. 9 changes out of 10 nobody disagrees on; sometimes a
committer will say 'no' to a change and nobody else with that bit
disagrees.

Sometimes it matters and here I see, what, 4 out of 5 people including
committers supporting a particular change. A veto to oppose that is
pretty drastic. It's not something to use because you or customers
prefer a certain outcome. This reads like you're informing people
you've changed your mind and that's the decision, when it can't work
that way. I saw this happen to a lesser extent in the thread about
Scala 2.10.

It doesn't mean majority rules here either, but can I suggest you
instead counter-propose an outcome that the people here voting in
favor of what you're vetoing would probably also buy into? I bet
everyone's willing to give wide accommodation to your concerns. It's
probably not hard, like: let's plan to not support Java 7 in Spark
2.1.0. (Then we can debate the logic of that.)

On Mon, Apr 4, 2016 at 6:28 AM, Reynold Xin <rx...@databricks.com> wrote:
> some technology companies, are still using Java 7. One data point: to
> this day, users still can't install OpenJDK 8 on Ubuntu by default. I
> see that as an indication that it is too early to drop Java 7.

I have Java 8 on my Ubuntu instance, and installed it directly via apt-get.
http://openjdk.java.net/install/


> Looking at the timeline, the JDK releases a major new version roughly
> every 3 years. We dropped Java 6 support one year ago, so from a timeline
> point of view we would be very aggressive here if we were to drop Java 7
> support in Spark 2.0.

The metric is really (IMHO) when the JDK goes EOL. Java 6 was EOLed in
Feb 2013, so supporting it into Spark 1.x was probably too long. Java 7
was EOLed in April 2015. It isn't really a neat every-3-years cadence.


> Note that not dropping Java 7 support now doesn't mean we have to support
> Java 7 throughout Spark 2.x. We dropped Java 6 support in Spark 1.5, even
> though Spark 1.0 started with Java 6.

Whatever arguments one has about preventing people from updating to
the latest and greatest apply even more strongly to a *minor* release,
which is worse. Java 6 support was probably overdue for removal at 1.0;
better late than never, but not necessarily the right time to do it.


> In terms of testing, Josh has actually improved our test infra so that we
> now also run the Java 8 tests: https://github.com/apache/spark/pull/12073

Excellent, but, orthogonal.

Even if I personally don't see the merit in these arguments compared
to the counter-arguments, retaining Java 7 support now wouldn't be a
terrible outcome. I'd like to see better process and a more reasonable
compromise result though.



Re: [discuss] ending support for Java 7 in Spark 2.0

Posted by Karlis Zigurs <ho...@gmail.com>.
Curveball: Is there a need to use lambdas quite yet?
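
For anyone weighing that question, this is the difference in a nutshell: the
same transformation over Spark's Java API written both ways (a sketch; the
class and method names are illustrative).

    import org.apache.spark.api.java.JavaRDD;
    import org.apache.spark.api.java.function.Function;

    public class LambdaOrNot {
      // Java 7: an anonymous inner class against Spark's Function interface.
      static JavaRDD<Integer> lengthsJava7(JavaRDD<String> lines) {
        return lines.map(new Function<String, Integer>() {
          @Override
          public Integer call(String s) {
            return s.length();
          }
        });
      }

      // Java 8: Function is a single-abstract-method interface, so a lambda
      // expresses the same thing in one line.
      static JavaRDD<Integer> lengthsJava8(JavaRDD<String> lines) {
        return lines.map(s -> s.length());
      }
    }

Nothing in the API requires lambdas; the argument in the thread is about
ergonomics, and about what Spark's own code and tests are allowed to use.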



Re: [discuss] ending support for Java 7 in Spark 2.0

Posted by Steve Loughran <st...@hortonworks.com>.
> On 4 Apr 2016, at 20:58, Ofir Manor <of...@equalum.io> wrote:
> 
> I think that a backup plan could be to announce that JDK 7 is deprecated in Spark 2.0 and support for it will be fully removed in Spark 2.1. This gives admins enough warning to install JDK 8 alongside their "main" JDK (or fully migrate to it), while allowing the project to merge JDK 8-specific changes to trunk right after the 2.0 release.
> 

Announcing a plan is good; anything which can be done to help mixed-JVM deployment (documentation, testing) would be useful too.

> However, I personally think it is better to drop JDK7 now. I'm sure that both the community and the distributors (Databricks, Cloudera, Hortonworks, MapR, IBM etc) will all rush to help their customers migrate their environment to support Spark 2.0, so I think any backlash won't be dramatic or lasting. 
> 

People using Spark tend to be pretty aggressive about wanting the latest version, at least on the 1.x line; so far there have been no major problems allowing mixed Spark version deployments, provided shared bits of infrastructure (the Spark history server) were recent. Hive metadata repository access is the other big issue: moving Spark up to Hive 1.2.1 addresses that for the moment.

I don't know about organisations' adoption of JDK 8 vs 7, or how anyone would react to having to move to Java 8 for Spark 2. Maybe it'll be a barrier to adoption; maybe it'll be an incentive to upgrade.

Oh, I do know that Java 9 is going to be trouble. Different topic.

Re: [discuss] ending support for Java 7 in Spark 2.0

Posted by Ofir Manor <of...@equalum.io>.
I think that a backup plan could be to announce that JDK 7 is deprecated in
Spark 2.0 and support for it will be fully removed in Spark 2.1. This gives
admins enough warning to install JDK 8 alongside their "main" JDK (or fully
migrate to it), while allowing the project to merge JDK 8-specific changes
to trunk right after the 2.0 release.

However, I personally think it is better to drop JDK7 now. I'm sure that
both the community and the distributors (Databricks, Cloudera, Hortonworks,
MapR, IBM etc) will all rush to help their customers migrate their
environment to support Spark 2.0, so I think any backlash won't be dramatic
or lasting.

Just my two cents,

Ofir Manor

Co-Founder & CTO | Equalum

Mobile: +972-54-7801286 | Email: ofir.manor@equalum.io
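
A minimal sketch of how the deprecation proposed above could be surfaced at
runtime; the property check is standard Java, while the class name and
message wording are illustrative only.

    public class Jdk7DeprecationWarning {
      // "java.specification.version" is "1.7" on Java 7, "1.8" on Java 8.
      static void warnIfRunningOnJava7() {
        String spec = System.getProperty("java.specification.version");
        if ("1.7".equals(spec)) {
          System.err.println("WARNING: Java 7 support is deprecated as of "
              + "Spark 2.0 and will be removed in a future release; "
              + "please upgrade to Java 8.");
        }
      }
    }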


Re: [discuss] ending support for Java 7 in Spark 2.0

Posted by Luciano Resende <lu...@gmail.com>.
Reynold,

Considering the performance improvements you mentioned in your original
e-mail, and also considering that a few other big data projects have
already abandoned JDK 7 or are in the process of doing so, I think it
would benefit Spark if we go with JDK 8 only.

Are there users that will be less aggressive? Yes, but those would most
likely stay on more stable releases like 1.6.x.



-- 
Luciano Resende
http://twitter.com/lresende1975
http://lresende.blogspot.com/