Posted to dev@spark.apache.org by Reynold Xin <rx...@databricks.com> on 2016/03/24 08:27:45 UTC

[discuss] ending support for Java 7 in Spark 2.0

About a year ago we decided to drop Java 6 support in Spark 1.5. I am
wondering if we should also just drop Java 7 support in Spark 2.0 (i.e.
Spark 2.0 would require Java 8 to run).

Oracle ended public updates for JDK 7 a year ago (Apr 2015), and
removed public downloads for JDK 7 in July 2015. In the past I've actually
been against dropping Java 7, but today I ran into an issue with the new
Dataset API not working well with Java 8 lambdas, and that changed my
opinion on this.

I've been thinking more about this issue today and also talked with a lot
of people offline to gather feedback, and I actually think the pros outweigh
the cons, for the following reasons (in some rough order of importance):

1. It is complicated to test how well Spark APIs work for Java lambdas if
we support Java 7. Jenkins machines need to have both Java 7 and Java 8
installed, and we must run through a set of test suites in 7 and then the
lambda tests in Java 8 (a sketch of the two code paths appears after this
message). This complicates build environments/scripts and makes them less
robust. Without good testing infrastructure, I have no confidence in
building good APIs for Java 8.

2. Dataset/DataFrame performance will be between 1x and 10x slower in Java
7. The primary APIs we want users to use in Spark 2.x are
Dataset/DataFrame, and this impacts pretty much everything from machine
learning to structured streaming. We have made great progress in their
performance through extensive use of code generation. (In many dimensions
Spark 2.0 with DataFrames/Datasets looks more like a compiler than a
MapReduce or query engine.) These optimizations don't work well in Java 7
due to broken code cache flushing. This problem has been fixed by Oracle in
Java 8. In addition, Java 8 comes with better support for Unsafe and SIMD.

3. Scala 2.12 will come out soon, and we will want to add support for that.
Scala 2.12 only works on Java 8. If we do support Java 7, we'd have a
fairly complicated compatibility matrix and testing infrastructure.

4. There are libraries that I've looked into in the past that support only
Java 8. This is more common in high performance libraries such as Aeron (a
messaging library). Having to support Java 7 means we are not able to use
these. It is not that big of a deal right now, but will become increasingly
more difficult as we optimize performance.


The downside of not supporting Java 7 is also obvious. Some organizations
are stuck with Java 7, and they wouldn't be able to use Spark 2.0 without
upgrading Java.
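
To make point 1 concrete, below is the kind of duplication the Java 7/Java 8
test matrix has to cover: the same transformation written against Spark's
Java API as an anonymous inner class (the only option on Java 7) and as a
lambda (Java 8). This is a minimal illustrative sketch, not actual Spark
test code; the class and variable names are invented.

import java.util.Arrays;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.api.java.function.Function;

public class LambdaVsInnerClass {
  public static void main(String[] args) {
    SparkConf conf = new SparkConf().setAppName("lambda-demo").setMaster("local[2]");
    JavaSparkContext sc = new JavaSparkContext(conf);
    JavaRDD<Integer> numbers = sc.parallelize(Arrays.asList(1, 2, 3, 4));

    // Java 7 style: an anonymous inner class implementing the SAM interface.
    JavaRDD<Integer> doubled7 = numbers.map(
        new Function<Integer, Integer>() {
          @Override
          public Integer call(Integer x) {
            return x * 2;
          }
        });

    // Java 8 style: a lambda targeting the same interface. Exercising this
    // path is what requires a JDK 8 build alongside the JDK 7 suites.
    JavaRDD<Integer> doubled8 = numbers.map(x -> x * 2);

    System.out.println(doubled7.collect());  // [2, 4, 6, 8]
    System.out.println(doubled8.collect());  // [2, 4, 6, 8]
    sc.stop();
  }
}

Both calls compile against the same map(Function) signature; the lambda
differs only in how it is compiled, which is exactly why it needs its own
JDK 8 test run.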

Re: [discuss] ending support for Java 7 in Spark 2.0

Posted by Mridul Muralidharan <mr...@gmail.com>.
+1
Agree, dropping support for java 7 is long overdue - and 2.0 would be
a logical release to do this on.

Regards,
Mridul




Re: [discuss] ending support for Java 7 in Spark 2.0

Posted by Ram Sriharsha <sr...@gmail.com>.
+1. Yes, Java 7 has been end of life for a year now; 2.0 is a good time to
upgrade to Java 8.




-- 
Ram Sriharsha
Architect, Spark and Data Science
Hortonworks, 2550 Great America Way, 2nd Floor
Santa Clara, CA 95054
Ph: 408-510-8635
email: harsha@apache.org


RE: [discuss] ending support for Java 7 in Spark 2.0

Posted by Raymond Honderdors <Ra...@sizmek.com>.
Very good points.

Going to support java 8 looks like a good direction;
2.0 would be a good release to start with that.

Raymond Honderdors
Team Lead Analytics BI
Business Intelligence Developer
raymond.honderdors@sizmek.com
T +972.7325.3569
Herzliya

From: Reynold Xin [mailto:rxin@databricks.com]
Sent: Thursday, March 24, 2016 9:37 AM
To: dev@spark.apache.org
Subject: Re: [discuss] ending support for Java 7 in Spark 2.0

One other benefit that I didn't mention is that we'd be able to use Java 8's Optional class to replace our built-in Optional.



Re: [discuss] ending support for Java 7 in Spark 2.0

Posted by Reynold Xin <rx...@databricks.com>.
One other benefit that I didn't mention is that we'd be able to use Java
8's Optional class to replace our built-in Optional.
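
For reference, a minimal sketch of the java.util.Optional idioms that would
replace the pre-Java-8 Optional exposed in Spark's Java API today (the
variable names here are invented for illustration):

import java.util.Optional;

public class OptionalDemo {
  public static void main(String[] args) {
    // Wrap a possibly-null value instead of passing null around.
    Optional<String> home = Optional.ofNullable(System.getenv("SPARK_HOME"));

    // Transform and supply a default in one chain, with no null checks.
    String display = home.map(String::trim).orElse("<not set>");
    System.out.println("SPARK_HOME = " + display);

    // isPresent()/get() mirror the older Optional API, so migration of
    // existing call sites is largely mechanical.
    if (home.isPresent()) {
      System.out.println("length = " + home.get().length());
    }
  }
}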



Re: [discuss] ending support for Java 7 in Spark 2.0

Posted by Steve Loughran <st...@hortonworks.com>.
> On 24 Mar 2016, at 07:27, Reynold Xin <rx...@databricks.com> wrote:
> 
> About a year ago we decided to drop Java 6 support in Spark 1.5. I am wondering if we should also just drop Java 7 support in Spark 2.0 (i.e. Spark 2.0 would require Java 8 to run).
> 
> Oracle ended public updates for JDK 7 a year ago (Apr 2015), and removed public downloads for JDK 7 in July 2015.

Still there, Jan 2016 was the last public one.

> In the past I've actually been against dropping Java 7, but today I ran into an issue with the new Dataset API not working well with Java 8 lambdas, and that changed my opinion on this.
> 
> I've been thinking more about this issue today and also talked with a lot of people offline to gather feedback, and I actually think the pros outweigh the cons, for the following reasons (in some rough order of importance):
> 
> 1. It is complicated to test how well Spark APIs work for Java lambdas if we support Java 7. Jenkins machines need to have both Java 7 and Java 8 installed and we must run through a set of test suites in 7, and then the lambda tests in Java 8. This complicates build environments/scripts, and makes them less robust. Without good testing infrastructure, I have no confidence in building good APIs for Java 8.

+complicates the test matrix for problems: if something works on java 8 and fails on java 7, is that a java 8 problem or a java 7 one?
+most developers would want to be on java 8 on their desktop if they could; the risk is that people accidentally code for java 8 even if they don't realise it just by using java 8 libraries, etc

> 
> 2. Dataset/DataFrame performance will be between 1x and 10x slower in Java 7. The primary APIs we want users to use in Spark 2.x are Dataset/DataFrame, and this impacts pretty much everything from machine learning to structured streaming. We have made great progress in their performance through extensive use of code generation. (In many dimensions Spark 2.0 with DataFrames/Datasets looks more like a compiler than a MapReduce or query engine.) These optimizations don't work well in Java 7 due to broken code cache flushing. This problem has been fixed by Oracle in Java 8. In addition, Java 8 comes with better support for Unsafe and SIMD.
> 
> 3. Scala 2.12 will come out soon, and we will want to add support for that. Scala 2.12 only works on Java 8. If we do support Java 7, we'd have a fairly complicated compatibility matrix and testing infrastructure.
> 
> 4. There are libraries that I've looked into in the past that support only Java 8. This is more common in high performance libraries such as Aeron (a messaging library). Having to support Java 7 means we are not able to use these. It is not that big of a deal right now, but will become increasingly more difficult as we optimize performance.
> 
> 
> The downside of not supporting Java 7 is also obvious. Some organizations are stuck with Java 7, and they wouldn't be able to use Spark 2.0 without upgrading Java.
> 


One thing you have to consider here is: will the organisations that don't want to upgrade to java 8 want to be upgrading to spark 2.0 anyway?
> 

If there is a price, it means all apps that use any remote Spark APIs will also have to be java 8. Something like a REST API is less of an issue, but anything loading a JAR in the group org.apache.spark will have to be Java 8+. That's what held hadoop back on Java 7 in 2015: twitter made the case that it shouldn't be the hadoop cluster forcing them to upgrade all their client apps just to use the IPC and filesystem code. I don't believe that's so much of a constraint on Spark.

Finally, Java 8 lines you up better for worrying about Java 9, which is on the horizon.



Re: [discuss] ending support for Java 7 in Spark 2.0

Posted by Andrew Ash <an...@andrewash.com>.
Spark 2.x has to be the time for Java 8.

I'd rather increase JVM major version on a Spark major version than on a
Spark minor version, and I'd rather Spark do that upgrade for the 2.x
series than the 3.x series (~2yr from now based on the lifetime of Spark
1.x).  If we wait until the next opportunity for a breaking change to Spark
(3.x) we might be upgrading to Java 9 at that point rather than Java 8.

If Spark users need Java 7 they are free to continue using the 1.x series,
the same way that folks who need Java 6 are free to continue using 1.4.

On Thu, Mar 24, 2016 at 11:46 AM, Stephen Boesch <ja...@gmail.com> wrote:

> +1 for java8 only, +1 for 2.11+ only. At this point scala libraries
> supporting only 2.10 are typically less active and/or poorly maintained.
> That trend will only continue when considering the lifespan of spark 2.X.
>
> 2016-03-24 11:32 GMT-07:00 Steve Loughran <st...@hortonworks.com>:
>
>>
>> On 24 Mar 2016, at 15:27, Koert Kuipers <ko...@tresata.com> wrote:
>>
>> i think the arguments are convincing, but it also makes me wonder if i
>> live in some kind of alternate universe... we deploy on customers' clusters,
>> where the OS, python version, java version and hadoop distro are not chosen
>> by us. so think centos 6, cdh5 or hdp 2.3, java 7 and python 2.6. we simply
>> have access to a single proxy machine and launch through yarn. asking them
>> to upgrade java is pretty much out of the question or a 6+ month ordeal. of
>> the 10 client clusters i can think of off the top of my head all of them are
>> on java 7, none are on java 8. so by doing this you would make spark 2
>> basically unusable for us (unless most of them have plans of upgrading in
>> the near term to java 8, i will ask around and report back...).
>>
>>
>>
>> It's not actually mandatory for the process executing in the Yarn cluster
>> to run with the same JVM as the rest of the Hadoop stack; all that is
>> needed is for the environment variables to set up the JAVA_HOME and PATH.
>> Switching JVMs is not something which YARN makes easy, but it may be
>> possible, especially if Spark itself provides some hooks, so you don't have
>> to manually play with setting things up. That may be something which could
>> significantly ease adoption of Spark 2 in YARN clusters. Same for Python.
>>
>> This is something I could probably help others to address
>>
>>
>
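
Part of the hook Steve describes already exists as plain configuration:
Spark on YARN supports per-application environment variables for the
application master and the executors, which can point at a different JDK.
A minimal sketch, assuming a JDK 8 install is already present at the same
path on every node (the path and class name below are hypothetical):

import org.apache.spark.SparkConf;

public class Jdk8OnJdk7Cluster {
  public static void main(String[] args) {
    // Hypothetical path; the JDK must already be installed on each node.
    String jdk8Home = "/opt/jdk1.8.0";

    SparkConf conf = new SparkConf()
        .setAppName("jdk8-on-yarn")
        // JVM used by the YARN application master.
        .set("spark.yarn.appMasterEnv.JAVA_HOME", jdk8Home)
        // JVM used by the executors.
        .set("spark.executorEnv.JAVA_HOME", jdk8Home);

    // ... build the SparkContext from conf and submit with --master yarn
    // as usual; the rest of the cluster can stay on its stock JVM.
  }
}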

Re: [discuss] ending support for Java 7 in Spark 2.0

Posted by Mark Hamstra <ma...@clearstorydata.com>.
There aren't many such libraries, but there are a few. When faced with one
of those dependencies that still doesn't go beyond 2.10, you essentially
have the choice of taking on the maintenance burden to bring the library up
to date, or doing what is potentially a fairly large refactoring to use an
alternative but well-maintained library.


Re: [discuss] ending support for Java 7 in Spark 2.0

Posted by Kostas Sakellis <ko...@cloudera.com>.
In addition, with Spark 2.0 we are throwing away binary compatibility
anyway, so user applications will have to be recompiled.

The only argument I can see is for libraries that have already been built
on Scala 2.10 that are no longer being maintained. How big of an issue do
we think that is?

Kostas

On Thu, Mar 24, 2016 at 4:48 PM, Marcelo Vanzin <va...@cloudera.com> wrote:

> On Thu, Mar 24, 2016 at 4:46 PM, Reynold Xin <rx...@databricks.com> wrote:
> > Actually it's *way* harder to upgrade Scala from 2.10 to 2.11, than
> > upgrading the JVM runtime from 7 to 8, because Scala 2.10 and 2.11 are
> not
> > binary compatible, whereas JVM 7 and 8 are binary compatible except
> certain
> > esoteric cases.
>
> True, but ask anyone who manages a large cluster how long it would
> take them to upgrade the jdk across their cluster and validate all
> their applications and everything... binary compatibility is a tiny
> drop in that bucket.
>
> --
> Marcelo
>

RE: [discuss] ending support for Java 7 in Spark 2.0

Posted by Raymond Honderdors <Ra...@sizmek.com>.
Maybe the question should be: how far back should spark be compatible?


There is nothing stopping people from running spark 1.6.x with jdk 7, scala 2.10, or Hadoop < 2.6.
But if they want spark 2.x they should consider a migration to jdk8 and scala 2.11

Or am I getting it all wrong?


Raymond Honderdors
Team Lead Analytics BI
Business Intelligence Developer
raymond.honderdors@sizmek.com
T +972.7325.3569
Herzliya

From: Tom Graves [mailto:tgraves_cs@yahoo.com.INVALID]
Sent: Wednesday, March 30, 2016 4:46 PM
To: Steve Loughran <st...@hortonworks.com>
Cc: Reynold Xin <rx...@databricks.com>; Koert Kuipers <ko...@tresata.com>; Kostas Sakellis <ko...@cloudera.com>; Marcelo Vanzin <va...@cloudera.com>; dev@spark.apache.org
Subject: Re: [discuss] ending support for Java 7 in Spark 2.0


Re: [discuss] ending support for Java 7 in Spark 2.0

Posted by Tom Graves <tg...@yahoo.com.INVALID>.
Steve, those are good points, I had forgotten Hadoop had those issues. We run with jdk 8, hadoop is built for jdk7 compatibility, and we are running hadoop 2.7 on our clusters; by the time Spark 2.0 is out I would expect a mix of Hadoop 2.7 and 2.8. We also don't use spnego.

I didn't quite follow what you were saying with the hadoop services being on jdk7. Are you saying building spark with say hadoop 2.8 libraries while your hadoop cluster is running hadoop 2.6 or less? If so I would agree that isn't a good idea.

Personally, and from the Yahoo point of view, I'm still fine with going to jdk8, but I could see where other people on older versions of Hadoop might have a problem.

Tom


Re: [discuss] ending support for Java 7 in Spark 2.0

Posted by Steve Loughran <st...@hortonworks.com>.
Can I note that if Spark 2.0 is going to be Java 8+ only, then Hadoop 2.6.x should be the minimum Hadoop version.

https://issues.apache.org/jira/browse/HADOOP-11090

Where things get complicated is the situation of Hadoop services on Java 7 and Spark on Java 8 in its own JVM.

I'm not sure that you could get away with having the newer version of the Hadoop classes in the spark assembly/lib dir without coming up against incompatibilities with the Hadoop JNI libraries. These are currently backwards compatible, but trying to link Hadoop 2.7 against a Hadoop 2.6 native lib will generate an UnsatisfiedLinkError. Meaning: the whole cluster's hadoop libs have to be in sync, or at least the main cluster release has to be on a version of hadoop 2.x >= the spark bundled edition.

Ignoring that detail,

Hadoop 2.6.1+
Guava >= 15? 17?

I think the outcome of Hadoop < 2.6 and JDK >= 8 is "undefined"; all bug reports will be met with a "please upgrade, re-open if the problem is still there".

Kerberos is a particular trouble spot here: you need Hadoop 2.6.1+ for Kerberos to work on Java 8 and on recent versions of Java 7 (HADOOP-10786).

Note also that HADOOP-11628 (SPNEGO + CNAMEs) is in 2.8 only. I'll see about pulling that into 2.7.x, though I'm reluctant to go near 2.6, just to keep that branch extra stable.


Thomas: you've got the big clusters, what versions of Hadoop will they be on by the time you look at Spark 2.0?

-Steve
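
A cheap guard against the mismatch described above is to log the versions
the driver actually sees at startup, so a bad JVM/Hadoop combination shows
up immediately rather than as an obscure link error later. A sketch,
assuming the standard Hadoop client jars are on the classpath (the class
name is invented):

import org.apache.hadoop.util.VersionInfo;

public class RuntimeVersionReport {
  public static void main(String[] args) {
    // The JVM this process is actually running on.
    System.out.println("java.version   = " + System.getProperty("java.version"));

    // VersionInfo reports the Hadoop artifacts on the classpath, which can
    // differ from what the cluster nodes (and their native libs) run.
    System.out.println("hadoop (jars)  = " + VersionInfo.getVersion());
    System.out.println("hadoop (build) = " + VersionInfo.getBuildVersion());
  }
}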





Re: [discuss] ending support for Java 7 in Spark 2.0

Posted by Tom Graves <tg...@yahoo.com.INVALID>.
+1.
Tom 


Re: [discuss] ending support for Java 7 in Spark 2.0

Posted by Reynold Xin <rx...@databricks.com>.
They work.


On Tue, Mar 29, 2016 at 10:01 AM, Koert Kuipers <ko...@tresata.com> wrote:

> if scala prior to 2.10.4 didn't support java 8, does that mean that
> 3rd party scala libraries compiled with a scala version < 2.10.4 might not
> work on java 8?

Re: [discuss] ending support for Java 7 in Spark 2.0

Posted by Koert Kuipers <ko...@tresata.com>.
if scala prior to 2.10.4 didn't support java 8, does that mean that 3rd
party scala libraries compiled with a scala version < 2.10.4 might not work
on java 8?



Re: [discuss] ending support for Java 7 in Spark 2.0

Posted by Kostas Sakellis <ko...@cloudera.com>.
Also, +1 on dropping jdk7 in Spark 2.0.

Kostas


Re: [discuss] ending support for Java 7 in Spark 2.0

Posted by Marcelo Vanzin <va...@cloudera.com>.
Finally got some internal feedback on this, and we're ok with
requiring people to deploy jdk8 for 2.0, so +1 too.




-- 
Marcelo



Re: [discuss] ending support for Java 7 in Spark 2.0

Posted by Luciano Resende <lu...@gmail.com>.
+1, I also checked with a few projects inside IBM that consume Spark and
they seem to be ok with the direction of dropping JDK 7.




-- 
Luciano Resende
http://twitter.com/lresende1975
http://lresende.blogspot.com/

Re: [discuss] ending support for Java 7 in Spark 2.0

Posted by Michael Gummelt <mg...@mesosphere.io>.
+1 from Mesosphere



-- 
Michael Gummelt
Software Engineer
Mesosphere

Re: [discuss] ending support for Java 7 in Spark 2.0

Posted by Steve Loughran <st...@hortonworks.com>.
> On 25 Mar 2016, at 01:59, Mridul Muralidharan <mr...@gmail.com> wrote:
> 
> Removing compatibility (with jdk, etc) can be done with a major release. Given that 7 has been EOLed a while back and is now unsupported, we have to decide if we drop support for it in 2.0 or 3.0 (2+ years from now).
> 
> Given the functionality & performance benefits of going to jdk8, future enhancements relevant in the 2.x timeframe (scala, dependencies) which require it, and simplicity wrt code, test & support, it looks like a good checkpoint to drop jdk7 support.
> 
> As already mentioned in the thread, existing yarn clusters are unaffected if they want to continue running jdk7 and yet use spark2 (install jdk8 on all nodes and use it via JAVA_HOME, or worst case distribute jdk8 as an archive - suboptimal).

you wouldn't want to dist it as an archive; it's not just the binaries, it's the install phase. And you'd better remember to put the JCE jar in on top of the JDK for kerberos to work.

setting up environment vars to point to JDK8 in the launched app/container avoids that. Yes, the ops team do need to install java, but if you offer them the choice of "installing a centrally managed Java" and "having my code try and install it", they should go for the managed option.

One thing to consider for 2.0 is to make it easier to set up those env vars for both python and java. And, as the techniques for mixing JDK versions are clearly not that well known, documenting them.
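
Something like the following would do it for YARN - a minimal sketch; the
spark.yarn.appMasterEnv.* and spark.executorEnv.* settings are the standard
way to push env vars into the launched containers, while the JDK path and
jar name are illustrative (it assumes JDK 8 is already installed at that
path on every node):

    spark-submit \
      --master yarn \
      --conf spark.yarn.appMasterEnv.JAVA_HOME=/usr/lib/jvm/jdk8 \
      --conf spark.executorEnv.JAVA_HOME=/usr/lib/jvm/jdk8 \
      myapp.jar

The analogous trick on the python side is pointing PYSPARK_PYTHON at the
interpreter you want the workers to run.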

(FWIW I've done code which even uploads its own hadoop-* JAR, but what gets you is changes in the hadoop-native libs; you do need to get the PATH var spot on)


> I am unsure about mesos (standalone might be easier upgrade I guess ?).
> 
> 
> Proposal is for 1.6x line to continue to be supported with critical fixes; newer features will require 2.x and so jdk8
> 
> Regards 
> Mridul 
> 
> 



Re: [discuss] ending support for Java 7 in Spark 2.0

Posted by Steve Loughran <st...@hortonworks.com>.
On 4 Apr 2016, at 16:41, Xuefeng Wu <be...@gmail.com> wrote:


Many open source projects move aggressively, such as Oracle JDK and Ubuntu, but they provide stable commercial support.


supporting old versions of jdk is one of the key revenue streams for oracle's sun group: there are a lot of webapps out there that need a secure/stable JDK version, and whose owners don't want to spend time & money doing the upgrade.


In other words, the enterprises that don't drop JDK7 might also not drop Spark 1.x to adopt an early Spark 2.x version.



probably true, except for the complication that in a large multitenant cluster, you need to get everyone who runs code in the cluster and the ops team happy with the plan.

Re: [discuss] ending support for Java 7 in Spark 2.0

Posted by Xuefeng Wu <be...@gmail.com>.

Many open source projects move aggressively, such as Oracle JDK and Ubuntu, but they provide stable commercial support.


In other words, the enterprises that don't drop JDK7 might also not drop Spark 1.x to adopt an early Spark 2.x version.





On Sun, Apr 3, 2016 at 10:29 PM -0700, "Reynold Xin" <rx...@databricks.com> wrote:

Since my original email, I've talked to a lot more users and looked at what various environments support. It is true that a lot of enterprises, and even some technology companies, are still using Java 7. One thing is that up until this date, users still can't install openjdk 8 on Ubuntu by default. I see that as an indication that it is too early to drop Java 7.

Looking at the timeline, the JDK releases a major new version roughly every 3 years. We dropped Java 6 support one year ago, so from a timeline point of view we would be very aggressive here if we were to drop Java 7 support in Spark 2.0.
Note that not dropping Java 7 support now doesn't mean we have to support Java 7 throughout Spark 2.x. We dropped Java 6 support in Spark 1.5, even though Spark 1.0 started with Java 6.
In terms of testing, Josh has actually improved our test infra so now we would run the Java 8 tests: https://github.com/apache/spark/pull/12073



On Thu, Mar 24, 2016 at 8:51 PM, Liwei Lin <lw...@gmail.com> wrote:


Arguments are really convincing; the new Dataset API as well as the
performance improvements are exciting, so I'm personally +1 on moving onto
Java8.

However, I'm afraid Tencent is one of "the organizations stuck with Java7"
-- our IT Infra division wouldn't upgrade to Java7 until Java8 is out, and
wouldn't upgrade to Java8 until Java9 is out.

So:
(non-binding) +1 on dropping scala 2.10 support
(non-binding) -1 on dropping Java 7 support
              * as long as we figure out a practical way to run Spark with
                JDK8 on JDK7 clusters, this -1 would then definitely be +1

Thanks!
On Fri, Mar 25, 2016 at 10:28 AM, Koert Kuipers <ko...@tresata.com> wrote:
i think that logic is reasonable, but then the same should also apply to scala 2.10, which is also unmaintained/unsupported at this point (basically has been since march 2015 except for one hotfix due to a license incompatibility)

who wants to support scala 2.10 three years after they did the last maintenance release?


On Thu, Mar 24, 2016 at 9:59 PM, Mridul Muralidharan <mr...@gmail.com> wrote:
Removing compatibility (with jdk, etc) can be done with a major release- given that 7 has been EOLed a while back and is now unsupported, we have to decide if we drop support for it in 2.0 or 3.0 (2+ years from now).
Given the functionality & performance benefits of going to jdk8, future enhancements relevant in 2.x timeframe ( scala, dependencies) which requires it, and simplicity wrt code, test & support it looks like a good checkpoint to drop jdk7 support.
As already mentioned in the thread, existing yarn clusters are unaffected if they want to continue running jdk7 and yet use spark2 (install jdk8 on all nodes and use it via JAVA_HOME, or worst case distribute jdk8 as archive - suboptimal). I am unsure about mesos (standalone might be easier upgrade I guess?).

Proposal is for 1.6x line to continue to be supported with critical fixes; newer features will require 2.x and so jdk8
Regards Mridul 

On Thursday, March 24, 2016, Marcelo Vanzin <va...@cloudera.com> wrote:
On Thu, Mar 24, 2016 at 4:50 PM, Reynold Xin <rx...@databricks.com> wrote:

> If you want to go down that route, you should also ask somebody who has had
> experience managing a large organization's applications and try to update
> Scala version.



I understand both sides. But if you look at what I've been asking
since the beginning, it's all about the cost and benefits of dropping
support for java 1.7.

The biggest argument in your original e-mail is about testing. And the
testing cost is much bigger for supporting scala 2.10 than it is for
supporting java 1.7. If you read one of my earlier replies, it should
be even possible to just do everything in a single job - compile for
java 7 and still be able to test things in 1.8, including lambdas,
which seems to be the main thing you were worried about.
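
To make the single-job idea concrete, a sketch only, with illustrative file
names (the javac switches are the standard cross-compilation flags):

    # main tree: emit Java 7 bytecode, checked against the JDK 7 class library
    javac -source 1.7 -target 1.7 \
          -bootclasspath "$JDK7_HOME/jre/lib/rt.jar" \
          src/Main.java

    # lambda suite: compiled as Java 8 source by the same JDK 8 toolchain
    javac -source 1.8 -target 1.8 test/LambdaSuite.java

A single JDK 8 install can thus produce Java 7 compatible artifacts and
still exercise the Java 8 only tests.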





> On Thu, Mar 24, 2016 at 4:48 PM, Marcelo Vanzin <va...@cloudera.com> wrote:

>>

>> On Thu, Mar 24, 2016 at 4:46 PM, Reynold Xin <rx...@databricks.com> wrote:

>> > Actually it's *way* harder to upgrade Scala from 2.10 to 2.11, than

>> > upgrading the JVM runtime from 7 to 8, because Scala 2.10 and 2.11 are

>> > not

>> > binary compatible, whereas JVM 7 and 8 are binary compatible except

>> > certain

>> > esoteric cases.

>>

>> True, but ask anyone who manages a large cluster how long it would

>> take them to upgrade the jdk across their cluster and validate all

>> their applications and everything... binary compatibility is a tiny

>> drop in that bucket.

>>

>> --

>> Marcelo

>

>







--
Marcelo

Re: [discuss] ending support for Java 7 in Spark 2.0

Posted by Karlis Zigurs <ho...@gmail.com>.
Curveball: Is there a need to use lambdas quite yet?
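
For context, this is the difference at stake in the Java API - a minimal
sketch, assuming "lines" is an existing JavaRDD<String>; variable names are
illustrative:

    import org.apache.spark.api.java.JavaRDD;
    import org.apache.spark.api.java.function.Function;

    // Java 7 style: anonymous inner class implementing Spark's Function
    JavaRDD<Integer> lengths7 = lines.map(new Function<String, Integer>() {
      @Override
      public Integer call(String s) {
        return s.length();
      }
    });

    // Java 8 style: the same transformation written as a lambda
    JavaRDD<Integer> lengths8 = lines.map(s -> s.length());

The lambda form is what the testing and API-design concerns in this thread
are about.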

On Mon, Apr 4, 2016 at 10:58 PM, Ofir Manor <of...@equalum.io> wrote:
> I think that a backup plan could be to announce that JDK7 is deprecated in
> Spark 2.0 and support for it will be fully removed in Spark 2.1. This gives
> admins enough warning to install JDK8 along side their "main" JDK (or fully
> migrate to it), while allowing the project to merge JDK8-specific changes to
> trunk right after the 2.0 release.
>
> However, I personally think it is better to drop JDK7 now. I'm sure that
> both the community and the distributors (Databricks, Cloudera, Hortonworks,
> MapR, IBM etc) will all rush to help their customers migrate their
> environment to support Spark 2.0, so I think any backlash won't be dramatic
> or lasting.
>
> Just my two cents,
>
> Ofir Manor
>
> Co-Founder & CTO | Equalum
>
> Mobile: +972-54-7801286 | Email: ofir.manor@equalum.io
>
>
> On Mon, Apr 4, 2016 at 6:48 PM, Luciano Resende <lu...@gmail.com>
> wrote:
>>
>> Reynold,
>>
>> Considering the performance improvements you mentioned in your original
>> e-mail and also considering that few other big data projects have already or
>> are in progress of abandoning JDK 7, I think it would benefit Spark if we go
>> with JDK 8.0 only.
>>
>> Are there users that will be less aggressive ? Yes, but those would most
>> likely be in more stable releases like 1.6.x.
>>
>> On Sun, Apr 3, 2016 at 10:28 PM, Reynold Xin <rx...@databricks.com> wrote:
>>>
>>> Since my original email, I've talked to a lot more users and looked at
>>> what various environments support. It is true that a lot of enterprises, and
>>> even some technology companies, are still using Java 7. One thing is that up
>>> until this date, users still can't install openjdk 8 on Ubuntu by default. I
>>> see that as an indication that it is too early to drop Java 7.
>>>
>>> Looking at the timeline, JDK release a major new version roughly every 3
>>> years. We dropped Java 6 support one year ago, so from a timeline point of
>>> view we would be very aggressive here if we were to drop Java 7 support in
>>> Spark 2.0.
>>>
>>> Note that not dropping Java 7 support now doesn't mean we have to support
>>> Java 7 throughout Spark 2.x. We dropped Java 6 support in Spark 1.5, even
>>> though Spark 1.0 started with Java 6.
>>>
>>> In terms of testing, Josh has actually improved our test infra so now we
>>> would run the Java 8 tests: https://github.com/apache/spark/pull/12073
>>>
>>>
>>>
>>>
>>> On Thu, Mar 24, 2016 at 8:51 PM, Liwei Lin <lw...@gmail.com> wrote:
>>>>
>>>> Arguments are really convincing; new Dataset API as well as performance
>>>>
>>>> improvements is exiting, so I'm personally +1 on moving onto Java8.
>>>>
>>>>
>>>>
>>>> However, I'm afraid Tencent is one of "the organizations stuck with
>>>> Java7"
>>>>
>>>> -- our IT Infra division wouldn't upgrade to Java7 until Java8 is out,
>>>> and
>>>>
>>>> wouldn't upgrade to Java8 until Java9 is out.
>>>>
>>>>
>>>> So:
>>>>
>>>> (non-binding) +1 on dropping scala 2.10 support
>>>>
>>>> (non-binding)  -1 on dropping Java 7 support
>>>>
>>>>                       * as long as we figure out a practical way to run
>>>> Spark with
>>>>
>>>>                         JDK8 on JDK7 clusters, this -1 would then
>>>> definitely be +1
>>>>
>>>>
>>>> Thanks !
>>>>
>>>>
>>>> On Fri, Mar 25, 2016 at 10:28 AM, Koert Kuipers <ko...@tresata.com>
>>>> wrote:
>>>>>
>>>>> i think that logic is reasonable, but then the same should also apply
>>>>> to scala 2.10, which is also unmaintained/unsupported at this point
>>>>> (basically has been since march 2015 except for one hotfix due to a license
>>>>> incompatibility)
>>>>>
>>>>> who wants to support scala 2.10 three years after they did the last
>>>>> maintenance release?
>>>>>
>>>>>
>>>>> On Thu, Mar 24, 2016 at 9:59 PM, Mridul Muralidharan <mr...@gmail.com>
>>>>> wrote:
>>>>>>
>>>>>> Removing compatibility (with jdk, etc) can be done with a major
>>>>>> release- given that 7 has been EOLed a while back and is now unsupported, we
>>>>>> have to decide if we drop support for it in 2.0 or 3.0 (2+ years from now).
>>>>>>
>>>>>> Given the functionality & performance benefits of going to jdk8,
>>>>>> future enhancements relevant in 2.x timeframe ( scala, dependencies) which
>>>>>> requires it, and simplicity wrt code, test & support it looks like a good
>>>>>> checkpoint to drop jdk7 support.
>>>>>>
>>>>>> As already mentioned in the thread, existing yarn clusters are
>>>>>> unaffected if they want to continue running jdk7 and yet use spark2 (install
>>>>>> jdk8 on all nodes and use it via JAVA_HOME, or worst case distribute jdk8 as
>>>>>> archive - suboptimal).
>>>>>> I am unsure about mesos (standalone might be easier upgrade I guess
>>>>>> ?).
>>>>>>
>>>>>>
>>>>>> Proposal is for 1.6x line to continue to be supported with critical
>>>>>> fixes; newer features will require 2.x and so jdk8
>>>>>>
>>>>>> Regards
>>>>>> Mridul
>>>>>>
>>>>>>
>>>>>> On Thursday, March 24, 2016, Marcelo Vanzin <va...@cloudera.com>
>>>>>> wrote:
>>>>>>>
>>>>>>> On Thu, Mar 24, 2016 at 4:50 PM, Reynold Xin <rx...@databricks.com>
>>>>>>> wrote:
>>>>>>> > If you want to go down that route, you should also ask somebody who
>>>>>>> > has had
>>>>>>> > experience managing a large organization's applications and try to
>>>>>>> > update
>>>>>>> > Scala version.
>>>>>>>
>>>>>>> I understand both sides. But if you look at what I've been asking
>>>>>>> since the beginning, it's all about the cost and benefits of dropping
>>>>>>> support for java 1.7.
>>>>>>>
>>>>>>> The biggest argument in your original e-mail is about testing. And
>>>>>>> the
>>>>>>> testing cost is much bigger for supporting scala 2.10 than it is for
>>>>>>> supporting java 1.7. If you read one of my earlier replies, it should
>>>>>>> be even possible to just do everything in a single job - compile for
>>>>>>> java 7 and still be able to test things in 1.8, including lambdas,
>>>>>>> which seems to be the main thing you were worried about.
>>>>>>>
>>>>>>>
>>>>>>> > On Thu, Mar 24, 2016 at 4:48 PM, Marcelo Vanzin
>>>>>>> > <va...@cloudera.com> wrote:
>>>>>>> >>
>>>>>>> >> On Thu, Mar 24, 2016 at 4:46 PM, Reynold Xin <rx...@databricks.com>
>>>>>>> >> wrote:
>>>>>>> >> > Actually it's *way* harder to upgrade Scala from 2.10 to 2.11,
>>>>>>> >> > than
>>>>>>> >> > upgrading the JVM runtime from 7 to 8, because Scala 2.10 and
>>>>>>> >> > 2.11 are
>>>>>>> >> > not
>>>>>>> >> > binary compatible, whereas JVM 7 and 8 are binary compatible
>>>>>>> >> > except
>>>>>>> >> > certain
>>>>>>> >> > esoteric cases.
>>>>>>> >>
>>>>>>> >> True, but ask anyone who manages a large cluster how long it would
>>>>>>> >> take them to upgrade the jdk across their cluster and validate all
>>>>>>> >> their applications and everything... binary compatibility is a
>>>>>>> >> tiny
>>>>>>> >> drop in that bucket.
>>>>>>> >>
>>>>>>> >> --
>>>>>>> >> Marcelo
>>>>>>> >
>>>>>>> >
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> Marcelo
>>>>>>>
>>>>>>> ---------------------------------------------------------------------
>>>>>>> To unsubscribe, e-mail: dev-unsubscribe@spark.apache.org
>>>>>>> For additional commands, e-mail: dev-help@spark.apache.org
>>>>>>>
>>>>>
>>>>
>>>
>>
>>
>>
>> --
>> Luciano Resende
>> http://twitter.com/lresende1975
>> http://lresende.blogspot.com/
>
>


Re: [discuss] ending support for Java 7 in Spark 2.0

Posted by Steve Loughran <st...@hortonworks.com>.
> On 4 Apr 2016, at 20:58, Ofir Manor <of...@equalum.io> wrote:
> 
> I think that a backup plan could be to announce that JDK7 is deprecated in Spark 2.0 and support for it will be fully removed in Spark 2.1. This gives admins enough warning to install JDK8 alongside their "main" JDK (or fully migrate to it), while allowing the project to merge JDK8-specific changes to trunk right after the 2.0 release.
> 

Announcing a plan is good; anything which can be done to help mixed JVM deployment (documentation, testing) would be useful too

> However, I personally think it is better to drop JDK7 now. I'm sure that both the community and the distributors (Databricks, Cloudera, Hortonworks, MapR, IBM etc) will all rush to help their customers migrate their environment to support Spark 2.0, so I think any backlash won't be dramatic or lasting. 
> 

People using Spark tend to be pretty aggressive about wanting the latest version, at least on the 1.x line; so far there've been no major problems allowing mixed spark version deployments, provided shared bits of infrastructure (spark history server) were recent. Hive metadata repository access is the other big issue: moving spark up to hive 1.2.1 addresses that for the moment.

I don't know about organisations' adoption of JDK8 vs 7, or how anyone would react to having to move to java 8 for spark 2. Maybe it'll be a barrier to adoption - maybe it'll be an incentive to upgrade.

Oh, I do know that Java 9 is going to be trouble. Different topic.

Re: [discuss] ending support for Java 7 in Spark 2.0

Posted by Ofir Manor <of...@equalum.io>.
I think that a backup plan could be to announce that JDK7 is deprecated in
Spark 2.0 and support for it will be fully removed in Spark 2.1. This gives
admins enough warning to install JDK8 alongside their "main" JDK (or fully
migrate to it), while allowing the project to merge JDK8-specific changes
to trunk right after the 2.0 release.

However, I personally think it is better to drop JDK7 now. I'm sure that
both the community and the distributors (Databricks, Cloudera, Hortonworks,
MapR, IBM etc) will all rush to help their customers migrate their
environment to support Spark 2.0, so I think any backlash won't be dramatic
or lasting.

Just my two cents,

Ofir Manor

Co-Founder & CTO | Equalum

Mobile: +972-54-7801286 | Email: ofir.manor@equalum.io

On Mon, Apr 4, 2016 at 6:48 PM, Luciano Resende <lu...@gmail.com>
wrote:

> Reynold,
>
> Considering the performance improvements you mentioned in your original
> e-mail and also considering that few other big data projects have already
> or are in progress of abandoning JDK 7, I think it would benefit Spark if
> we go with JDK 8.0 only.
>
> Are there users that will be less aggressive ? Yes, but those would most
> likely be in more stable releases like 1.6.x.
>
> On Sun, Apr 3, 2016 at 10:28 PM, Reynold Xin <rx...@databricks.com> wrote:
>
>> Since my original email, I've talked to a lot more users and looked at
>> what various environments support. It is true that a lot of enterprises,
>> and even some technology companies, are still using Java 7. One thing is
>> that up until this date, users still can't install openjdk 8 on Ubuntu by
>> default. I see that as an indication that it is too early to drop Java 7.
>>
>> Looking at the timeline, JDK release a major new version roughly every 3
>> years. We dropped Java 6 support one year ago, so from a timeline point of
>> view we would be very aggressive here if we were to drop Java 7 support in
>> Spark 2.0.
>>
>> Note that not dropping Java 7 support now doesn't mean we have to support
>> Java 7 throughout Spark 2.x. We dropped Java 6 support in Spark 1.5, even
>> though Spark 1.0 started with Java 6.
>>
>> In terms of testing, Josh has actually improved our test infra so now we
>> would run the Java 8 tests: https://github.com/apache/spark/pull/12073
>>
>>
>>
>>
>> On Thu, Mar 24, 2016 at 8:51 PM, Liwei Lin <lw...@gmail.com> wrote:
>>
>>> Arguments are really convincing; new Dataset API as well as performance
>>>
>>> improvements is exiting, so I'm personally +1 on moving onto Java8.
>>>
>>>
>>>
>>> However, I'm afraid Tencent is one of "the organizations stuck with
>>> Java7"
>>>
>>> -- our IT Infra division wouldn't upgrade to Java7 until Java8 is out,
>>> and
>>>
>>> wouldn't upgrade to Java8 until Java9 is out.
>>>
>>>
>>> So:
>>>
>>> (non-binding) +1 on dropping scala 2.10 support
>>>
>>> (non-binding)  -1 on dropping Java 7 support
>>>
>>>                       * as long as we figure out a practical way to run
>>> Spark with
>>>
>>>                         JDK8 on JDK7 clusters, this -1 would then
>>> definitely be +1
>>>
>>>
>>> Thanks !
>>>
>>> On Fri, Mar 25, 2016 at 10:28 AM, Koert Kuipers <ko...@tresata.com>
>>> wrote:
>>>
>>>> i think that logic is reasonable, but then the same should also apply
>>>> to scala 2.10, which is also unmaintained/unsupported at this point
>>>> (basically has been since march 2015 except for one hotfix due to a license
>>>> incompatibility)
>>>>
>>>> who wants to support scala 2.10 three years after they did the last
>>>> maintenance release?
>>>>
>>>>
>>>> On Thu, Mar 24, 2016 at 9:59 PM, Mridul Muralidharan <mr...@gmail.com>
>>>> wrote:
>>>>
>>>>> Removing compatibility (with jdk, etc) can be done with a major
>>>>> release- given that 7 has been EOLed a while back and is now unsupported,
>>>>> we have to decide if we drop support for it in 2.0 or 3.0 (2+ years from
>>>>> now).
>>>>>
>>>>> Given the functionality & performance benefits of going to jdk8,
>>>>> future enhancements relevant in 2.x timeframe ( scala, dependencies) which
>>>>> requires it, and simplicity wrt code, test & support it looks like a good
>>>>> checkpoint to drop jdk7 support.
>>>>>
>>>>> As already mentioned in the thread, existing yarn clusters are
>>>>> unaffected if they want to continue running jdk7 and yet use
>>>>> spark2 (install jdk8 on all nodes and use it via JAVA_HOME, or worst case
>>>>> distribute jdk8 as archive - suboptimal).
>>>>> I am unsure about mesos (standalone might be easier upgrade I guess ?).
>>>>>
>>>>>
>>>>> Proposal is for 1.6x line to continue to be supported with critical
>>>>> fixes; newer features will require 2.x and so jdk8
>>>>>
>>>>> Regards
>>>>> Mridul
>>>>>
>>>>>
>>>>> On Thursday, March 24, 2016, Marcelo Vanzin <va...@cloudera.com>
>>>>> wrote:
>>>>>
>>>>>> On Thu, Mar 24, 2016 at 4:50 PM, Reynold Xin <rx...@databricks.com>
>>>>>> wrote:
>>>>>> > If you want to go down that route, you should also ask somebody who
>>>>>> has had
>>>>>> > experience managing a large organization's applications and try to
>>>>>> update
>>>>>> > Scala version.
>>>>>>
>>>>>> I understand both sides. But if you look at what I've been asking
>>>>>> since the beginning, it's all about the cost and benefits of dropping
>>>>>> support for java 1.7.
>>>>>>
>>>>>> The biggest argument in your original e-mail is about testing. And the
>>>>>> testing cost is much bigger for supporting scala 2.10 than it is for
>>>>>> supporting java 1.7. If you read one of my earlier replies, it should
>>>>>> be even possible to just do everything in a single job - compile for
>>>>>> java 7 and still be able to test things in 1.8, including lambdas,
>>>>>> which seems to be the main thing you were worried about.
>>>>>>
>>>>>>
>>>>>> > On Thu, Mar 24, 2016 at 4:48 PM, Marcelo Vanzin <
>>>>>> vanzin@cloudera.com> wrote:
>>>>>> >>
>>>>>> >> On Thu, Mar 24, 2016 at 4:46 PM, Reynold Xin <rx...@databricks.com>
>>>>>> wrote:
>>>>>> >> > Actually it's *way* harder to upgrade Scala from 2.10 to 2.11,
>>>>>> than
>>>>>> >> > upgrading the JVM runtime from 7 to 8, because Scala 2.10 and
>>>>>> 2.11 are
>>>>>> >> > not
>>>>>> >> > binary compatible, whereas JVM 7 and 8 are binary compatible
>>>>>> except
>>>>>> >> > certain
>>>>>> >> > esoteric cases.
>>>>>> >>
>>>>>> >> True, but ask anyone who manages a large cluster how long it would
>>>>>> >> take them to upgrade the jdk across their cluster and validate all
>>>>>> >> their applications and everything... binary compatibility is a tiny
>>>>>> >> drop in that bucket.
>>>>>> >>
>>>>>> >> --
>>>>>> >> Marcelo
>>>>>> >
>>>>>> >
>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> Marcelo
>>>>>>
>>>>>> ---------------------------------------------------------------------
>>>>>> To unsubscribe, e-mail: dev-unsubscribe@spark.apache.org
>>>>>> For additional commands, e-mail: dev-help@spark.apache.org
>>>>>>
>>>>>>
>>>>
>>>
>>
>
>
> --
> Luciano Resende
> http://twitter.com/lresende1975
> http://lresende.blogspot.com/
>

Re: [discuss] ending support for Java 7 in Spark 2.0

Posted by Luciano Resende <lu...@gmail.com>.
Reynold,

Considering the performance improvements you mentioned in your original
e-mail, and also considering that a few other big data projects have already
abandoned JDK 7 or are in the process of doing so, I think it would benefit
Spark if we go with JDK 8 only.

Are there users that will be less aggressive? Yes, but those would most
likely be on more stable releases like 1.6.x.

On Sun, Apr 3, 2016 at 10:28 PM, Reynold Xin <rx...@databricks.com> wrote:

> Since my original email, I've talked to a lot more users and looked at
> what various environments support. It is true that a lot of enterprises,
> and even some technology companies, are still using Java 7. One thing is
> that up until this date, users still can't install openjdk 8 on Ubuntu by
> default. I see that as an indication that it is too early to drop Java 7.
>
> Looking at the timeline, JDK release a major new version roughly every 3
> years. We dropped Java 6 support one year ago, so from a timeline point of
> view we would be very aggressive here if we were to drop Java 7 support in
> Spark 2.0.
>
> Note that not dropping Java 7 support now doesn't mean we have to support
> Java 7 throughout Spark 2.x. We dropped Java 6 support in Spark 1.5, even
> though Spark 1.0 started with Java 6.
>
> In terms of testing, Josh has actually improved our test infra so now we
> would run the Java 8 tests: https://github.com/apache/spark/pull/12073
>
>
>
>
> On Thu, Mar 24, 2016 at 8:51 PM, Liwei Lin <lw...@gmail.com> wrote:
>
>> Arguments are really convincing; new Dataset API as well as performance
>>
>> improvements is exiting, so I'm personally +1 on moving onto Java8.
>>
>>
>>
>> However, I'm afraid Tencent is one of "the organizations stuck with
>> Java7"
>>
>> -- our IT Infra division wouldn't upgrade to Java7 until Java8 is out, and
>>
>> wouldn't upgrade to Java8 until Java9 is out.
>>
>>
>> So:
>>
>> (non-binding) +1 on dropping scala 2.10 support
>>
>> (non-binding)  -1 on dropping Java 7 support
>>
>>                       * as long as we figure out a practical way to run
>> Spark with
>>
>>                         JDK8 on JDK7 clusters, this -1 would then
>> definitely be +1
>>
>>
>> Thanks !
>>
>> On Fri, Mar 25, 2016 at 10:28 AM, Koert Kuipers <ko...@tresata.com>
>> wrote:
>>
>>> i think that logic is reasonable, but then the same should also apply to
>>> scala 2.10, which is also unmaintained/unsupported at this point (basically
>>> has been since march 2015 except for one hotfix due to a license
>>> incompatibility)
>>>
>>> who wants to support scala 2.10 three years after they did the last
>>> maintenance release?
>>>
>>>
>>> On Thu, Mar 24, 2016 at 9:59 PM, Mridul Muralidharan <mr...@gmail.com>
>>> wrote:
>>>
>>>> Removing compatibility (with jdk, etc) can be done with a major
>>>> release- given that 7 has been EOLed a while back and is now unsupported,
>>>> we have to decide if we drop support for it in 2.0 or 3.0 (2+ years from
>>>> now).
>>>>
>>>> Given the functionality & performance benefits of going to jdk8, future
>>>> enhancements relevant in 2.x timeframe ( scala, dependencies) which
>>>> requires it, and simplicity wrt code, test & support it looks like a good
>>>> checkpoint to drop jdk7 support.
>>>>
>>>> As already mentioned in the thread, existing yarn clusters are
>>>> unaffected if they want to continue running jdk7 and yet use
>>>> spark2 (install jdk8 on all nodes and use it via JAVA_HOME, or worst case
>>>> distribute jdk8 as archive - suboptimal).
>>>> I am unsure about mesos (standalone might be easier upgrade I guess ?).
>>>>
>>>>
>>>> Proposal is for 1.6x line to continue to be supported with critical
>>>> fixes; newer features will require 2.x and so jdk8
>>>>
>>>> Regards
>>>> Mridul
>>>>
>>>>
>>>> On Thursday, March 24, 2016, Marcelo Vanzin <va...@cloudera.com>
>>>> wrote:
>>>>
>>>>> On Thu, Mar 24, 2016 at 4:50 PM, Reynold Xin <rx...@databricks.com>
>>>>> wrote:
>>>>> > If you want to go down that route, you should also ask somebody who
>>>>> has had
>>>>> > experience managing a large organization's applications and try to
>>>>> update
>>>>> > Scala version.
>>>>>
>>>>> I understand both sides. But if you look at what I've been asking
>>>>> since the beginning, it's all about the cost and benefits of dropping
>>>>> support for java 1.7.
>>>>>
>>>>> The biggest argument in your original e-mail is about testing. And the
>>>>> testing cost is much bigger for supporting scala 2.10 than it is for
>>>>> supporting java 1.7. If you read one of my earlier replies, it should
>>>>> be even possible to just do everything in a single job - compile for
>>>>> java 7 and still be able to test things in 1.8, including lambdas,
>>>>> which seems to be the main thing you were worried about.
>>>>>
>>>>>
>>>>> > On Thu, Mar 24, 2016 at 4:48 PM, Marcelo Vanzin <va...@cloudera.com>
>>>>> wrote:
>>>>> >>
>>>>> >> On Thu, Mar 24, 2016 at 4:46 PM, Reynold Xin <rx...@databricks.com>
>>>>> wrote:
>>>>> >> > Actually it's *way* harder to upgrade Scala from 2.10 to 2.11,
>>>>> than
>>>>> >> > upgrading the JVM runtime from 7 to 8, because Scala 2.10 and
>>>>> 2.11 are
>>>>> >> > not
>>>>> >> > binary compatible, whereas JVM 7 and 8 are binary compatible
>>>>> except
>>>>> >> > certain
>>>>> >> > esoteric cases.
>>>>> >>
>>>>> >> True, but ask anyone who manages a large cluster how long it would
>>>>> >> take them to upgrade the jdk across their cluster and validate all
>>>>> >> their applications and everything... binary compatibility is a tiny
>>>>> >> drop in that bucket.
>>>>> >>
>>>>> >> --
>>>>> >> Marcelo
>>>>> >
>>>>> >
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Marcelo
>>>>>
>>>>> ---------------------------------------------------------------------
>>>>> To unsubscribe, e-mail: dev-unsubscribe@spark.apache.org
>>>>> For additional commands, e-mail: dev-help@spark.apache.org
>>>>>
>>>>>
>>>
>>
>


-- 
Luciano Resende
http://twitter.com/lresende1975
http://lresende.blogspot.com/

Re: [discuss] ending support for Java 7 in Spark 2.0

Posted by Reynold Xin <rx...@databricks.com>.
Hi Sean,

See http://www.oracle.com/technetwork/java/eol-135779.html

Java 7 hasn't EOLed yet. If you look at support you can get from Oracle,
it actually goes to 2019. And you can even get more support after that.

Spark has always maintained great backward compatibility with other
systems, way beyond what vendors typically support. For example, we
supported Hadoop 1.x all the way until Spark 1.6 (basically the last
release), while all the vendors have dropped support for it already.

Putting my Databricks hat on, we actually only support Java 8, but I think
it would be great to still support Java 7 in the upstream release for some
larger deployments. I like the idea of deprecating or at least strongly
encouraging people to update.

On Tuesday, April 5, 2016, Sean Owen <so...@cloudera.com> wrote:

> Following
> https://github.com/apache/spark/pull/12165#issuecomment-205791222
> I'd like to make a point about process and then answer points below.
>
> We have this funny system where anyone can propose a change, and any
> of a few people can veto a change unilaterally. The latter rarely
> comes up. 9 changes out of 10 nobody disagrees on; sometimes a
> committer will say 'no' to a change and nobody else with that bit
> disagrees.
>
> Sometimes it matters and here I see, what, 4 out of 5 people including
> committers supporting a particular change. A veto to oppose that is
> pretty drastic. It's not something to use because you or customers
> prefer a certain outcome. This reads like you're informing people
> you've changed your mind and that's the decision, when it can't work
> that way. I saw this happen to a lesser extent in the thread about
> Scala 2.10.
>
> It doesn't mean majority rules here either, but can I suggest you
> instead counter-propose an outcome that the people here voting in
> favor of what you're vetoing would probably also buy into? I bet
> everyone's willing to give wide accommodation to your concerns. It's
> probably not hard, like: let's plan to not support Java 7 in Spark
> 2.1.0. (Then we can debate the logic of that.)
>
> On Mon, Apr 4, 2016 at 6:28 AM, Reynold Xin <rxin@databricks.com> wrote:
> > some technology companies, are still using Java 7. One thing is that up
> > until this date, users still can't install openjdk 8 on Ubuntu by
> default. I
> > see that as an indication that it is too early to drop Java 7.
>
> I have Java 8 on my Ubuntu instance, and installed it directly via apt-get.
> http://openjdk.java.net/install/
>
>
> > Looking at the timeline, JDK release a major new version roughly every 3
> > years. We dropped Java 6 support one year ago, so from a timeline point
> of
> > view we would be very aggressive here if we were to drop Java 7 support
> in
> > Spark 2.0.
>
> The metric is really (IMHO) when the JDK goes EOL. Java 6 was EOL in
> Feb 2013, so supporting it into Spark 1.x was probably too long. Java
> 7 was EOL in April 2015. It's not really somehow every ~3 years.
>
>
> > Note that not dropping Java 7 support now doesn't mean we have to support
> > Java 7 throughout Spark 2.x. We dropped Java 6 support in Spark 1.5, even
> > though Spark 1.0 started with Java 6.
>
> Whatever arguments one has about preventing people from updating to
> the latest and greatest then apply to a *minor* release, which is
> worse. Java 6 support was probably overdue for removal at 1.0;
> better-late-than-never, not necessarily the right time to do it.
>
>
> > In terms of testing, Josh has actually improved our test infra so now we
> > would run the Java 8 tests: https://github.com/apache/spark/pull/12073
>
> Excellent, but, orthogonal.
>
> Even if I personally don't see the merit in these arguments compared
> to the counter-arguments, retaining Java 7 support now wouldn't be a
> terrible outcome. I'd like to see better process and a more reasonable
> compromise result though.
>

Re: [discuss] ending support for Java 7 in Spark 2.0

Posted by Sean Owen <so...@cloudera.com>.
Following https://github.com/apache/spark/pull/12165#issuecomment-205791222
I'd like to make a point about process and then answer points below.

We have this funny system where anyone can propose a change, and any
of a few people can veto a change unilaterally. The latter rarely
comes up. 9 changes out of 10 nobody disagrees on; sometimes a
committer will say 'no' to a change and nobody else with that bit
disagrees.

Sometimes it matters and here I see, what, 4 out of 5 people including
committers supporting a particular change. A veto to oppose that is
pretty drastic. It's not something to use because you or customers
prefer a certain outcome. This reads like you're informing people
you've changed your mind and that's the decision, when it can't work
that way. I saw this happen to a lesser extent in the thread about
Scala 2.10.

It doesn't mean majority rules here either, but can I suggest you
instead counter-propose an outcome that the people here voting in
favor of what you're vetoing would probably also buy into? I bet
everyone's willing to give wide accommodation to your concerns. It's
probably not hard, like: let's plan to not support Java 7 in Spark
2.1.0. (Then we can debate the logic of that.)

On Mon, Apr 4, 2016 at 6:28 AM, Reynold Xin <rx...@databricks.com> wrote:
> some technology companies, are still using Java 7. One thing is that up
> until this date, users still can't install openjdk 8 on Ubuntu by default. I
> see that as an indication that it is too early to drop Java 7.

I have Java 8 on my Ubuntu instance, and installed it directly via apt-get.
http://openjdk.java.net/install/
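
(On a sufficiently recent Ubuntu release that's just:

    sudo apt-get install openjdk-8-jdk

while older LTS releases need a PPA or a tarball install.)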


> Looking at the timeline, JDK release a major new version roughly every 3
> years. We dropped Java 6 support one year ago, so from a timeline point of
> view we would be very aggressive here if we were to drop Java 7 support in
> Spark 2.0.

The metric is really (IMHO) when the JDK goes EOL. Java 6 was EOL in
Feb 2013, so supporting it into Spark 1.x was probably too long. Java
7 was EOL in April 2015. It's not really somehow every ~3 years.


> Note that not dropping Java 7 support now doesn't mean we have to support
> Java 7 throughout Spark 2.x. We dropped Java 6 support in Spark 1.5, even
> though Spark 1.0 started with Java 6.

Whatever arguments one has about preventing people from updating to
the latest and greatest then apply to a *minor* release, which is
worse. Java 6 support was probably overdue for removal at 1.0;
better-late-than-never, not necessarily the right time to do it.


> In terms of testing, Josh has actually improved our test infra so now we
> would run the Java 8 tests: https://github.com/apache/spark/pull/12073

Excellent, but, orthogonal.

Even if I personally don't see the merit in these arguments compared
to the counter-arguments, retaining Java 7 support now wouldn't be a
terrible outcome. I'd like to see better process and a more reasonable
compromise result though.


Re: [discuss] ending support for Java 7 in Spark 2.0

Posted by Reynold Xin <rx...@databricks.com>.
Since my original email, I've talked to a lot more users and looked at what
various environments support. It is true that a lot of enterprises, and
even some technology companies, are still using Java 7. One thing is that
up until this date, users still can't install openjdk 8 on Ubuntu by
default. I see that as an indication that it is too early to drop Java 7.

Looking at the timeline, the JDK releases a major new version roughly every 3
years. We dropped Java 6 support one year ago, so from a timeline point of
view we would be very aggressive here if we were to drop Java 7 support in
Spark 2.0.

Note that not dropping Java 7 support now doesn't mean we have to support
Java 7 throughout Spark 2.x. We dropped Java 6 support in Spark 1.5, even
though Spark 1.0 started with Java 6.

In terms of testing, Josh has actually improved our test infra so now we
would run the Java 8 tests: https://github.com/apache/spark/pull/12073




On Thu, Mar 24, 2016 at 8:51 PM, Liwei Lin <lw...@gmail.com> wrote:

> Arguments are really convincing; new Dataset API as well as performance
>
> improvements is exiting, so I'm personally +1 on moving onto Java8.
>
>
>
> However, I'm afraid Tencent is one of "the organizations stuck with Java7"
>
> -- our IT Infra division wouldn't upgrade to Java7 until Java8 is out, and
>
> wouldn't upgrade to Java8 until Java9 is out.
>
>
> So:
>
> (non-binding) +1 on dropping scala 2.10 support
>
> (non-binding)  -1 on dropping Java 7 support
>
>                       * as long as we figure out a practical way to run
> Spark with
>
>                         JDK8 on JDK7 clusters, this -1 would then
> definitely be +1
>
>
> Thanks !
>
> On Fri, Mar 25, 2016 at 10:28 AM, Koert Kuipers <ko...@tresata.com> wrote:
>
>> i think that logic is reasonable, but then the same should also apply to
>> scala 2.10, which is also unmaintained/unsupported at this point (basically
>> has been since march 2015 except for one hotfix due to a license
>> incompatibility)
>>
>> who wants to support scala 2.10 three years after they did the last
>> maintenance release?
>>
>>
>> On Thu, Mar 24, 2016 at 9:59 PM, Mridul Muralidharan <mr...@gmail.com>
>> wrote:
>>
>>> Removing compatibility (with jdk, etc) can be done with a major release-
>>> given that 7 has been EOLed a while back and is now unsupported, we have to
>>> decide if we drop support for it in 2.0 or 3.0 (2+ years from now).
>>>
>>> Given the functionality & performance benefits of going to jdk8, future
>>> enhancements relevant in 2.x timeframe ( scala, dependencies) which
>>> requires it, and simplicity wrt code, test & support it looks like a good
>>> checkpoint to drop jdk7 support.
>>>
>>> As already mentioned in the thread, existing yarn clusters are
>>> unaffected if they want to continue running jdk7 and yet use
>>> spark2 (install jdk8 on all nodes and use it via JAVA_HOME, or worst case
>>> distribute jdk8 as archive - suboptimal).
>>> I am unsure about mesos (standalone might be easier upgrade I guess ?).
>>>
>>>
>>> Proposal is for 1.6x line to continue to be supported with critical
>>> fixes; newer features will require 2.x and so jdk8
>>>
>>> Regards
>>> Mridul
>>>
>>>
>>> On Thursday, March 24, 2016, Marcelo Vanzin <va...@cloudera.com> wrote:
>>>
>>>> On Thu, Mar 24, 2016 at 4:50 PM, Reynold Xin <rx...@databricks.com>
>>>> wrote:
>>>> > If you want to go down that route, you should also ask somebody who
>>>> has had
>>>> > experience managing a large organization's applications and try to
>>>> update
>>>> > Scala version.
>>>>
>>>> I understand both sides. But if you look at what I've been asking
>>>> since the beginning, it's all about the cost and benefits of dropping
>>>> support for java 1.7.
>>>>
>>>> The biggest argument in your original e-mail is about testing. And the
>>>> testing cost is much bigger for supporting scala 2.10 than it is for
>>>> supporting java 1.7. If you read one of my earlier replies, it should
>>>> be even possible to just do everything in a single job - compile for
>>>> java 7 and still be able to test things in 1.8, including lambdas,
>>>> which seems to be the main thing you were worried about.
>>>>
>>>>
>>>> > On Thu, Mar 24, 2016 at 4:48 PM, Marcelo Vanzin <va...@cloudera.com>
>>>> wrote:
>>>> >>
>>>> >> On Thu, Mar 24, 2016 at 4:46 PM, Reynold Xin <rx...@databricks.com>
>>>> wrote:
>>>> >> > Actually it's *way* harder to upgrade Scala from 2.10 to 2.11, than
>>>> >> > upgrading the JVM runtime from 7 to 8, because Scala 2.10 and 2.11
>>>> are
>>>> >> > not
>>>> >> > binary compatible, whereas JVM 7 and 8 are binary compatible except
>>>> >> > certain
>>>> >> > esoteric cases.
>>>> >>
>>>> >> True, but ask anyone who manages a large cluster how long it would
>>>> >> take them to upgrade the jdk across their cluster and validate all
>>>> >> their applications and everything... binary compatibility is a tiny
>>>> >> drop in that bucket.
>>>> >>
>>>> >> --
>>>> >> Marcelo
>>>> >
>>>> >
>>>>
>>>>
>>>>
>>>> --
>>>> Marcelo
>>>>
>>>> ---------------------------------------------------------------------
>>>> To unsubscribe, e-mail: dev-unsubscribe@spark.apache.org
>>>> For additional commands, e-mail: dev-help@spark.apache.org
>>>>
>>>>
>>
>

Re: [discuss] ending support for Java 7 in Spark 2.0

Posted by Liwei Lin <lw...@gmail.com>.
Arguments are really convincing; the new Dataset API as well as the
performance improvements are exciting, so I'm personally +1 on moving onto
Java8.

However, I'm afraid Tencent is one of "the organizations stuck with Java7"
-- our IT Infra division wouldn't upgrade to Java7 until Java8 is out, and
wouldn't upgrade to Java8 until Java9 is out.

So:
(non-binding) +1 on dropping scala 2.10 support
(non-binding) -1 on dropping Java 7 support
              * as long as we figure out a practical way to run Spark with
                JDK8 on JDK7 clusters, this -1 would then definitely be +1

Thanks!

On Fri, Mar 25, 2016 at 10:28 AM, Koert Kuipers <ko...@tresata.com> wrote:

> i think that logic is reasonable, but then the same should also apply to
> scala 2.10, which is also unmaintained/unsupported at this point (basically
> has been since march 2015 except for one hotfix due to a license
> incompatibility)
>
> who wants to support scala 2.10 three years after they did the last
> maintenance release?
>
>
> On Thu, Mar 24, 2016 at 9:59 PM, Mridul Muralidharan <mr...@gmail.com>
> wrote:
>
>> Removing compatibility (with jdk, etc) can be done with a major release-
>> given that 7 has been EOLed a while back and is now unsupported, we have to
>> decide if we drop support for it in 2.0 or 3.0 (2+ years from now).
>>
>> Given the functionality & performance benefits of going to jdk8, future
>> enhancements relevant in 2.x timeframe ( scala, dependencies) which
>> requires it, and simplicity wrt code, test & support it looks like a good
>> checkpoint to drop jdk7 support.
>>
>> As already mentioned in the thread, existing yarn clusters are unaffected
>> if they want to continue running jdk7 and yet use spark2 (install jdk8 on
>> all nodes and use it via JAVA_HOME, or worst case distribute jdk8 as
>> archive - suboptimal).
>> I am unsure about mesos (standalone might be easier upgrade I guess ?).
>>
>>
>> Proposal is for 1.6x line to continue to be supported with critical
>> fixes; newer features will require 2.x and so jdk8
>>
>> Regards
>> Mridul
>>
>>
>> On Thursday, March 24, 2016, Marcelo Vanzin <va...@cloudera.com> wrote:
>>
>>> On Thu, Mar 24, 2016 at 4:50 PM, Reynold Xin <rx...@databricks.com>
>>> wrote:
>>> > If you want to go down that route, you should also ask somebody who
>>> has had
>>> > experience managing a large organization's applications and try to
>>> update
>>> > Scala version.
>>>
>>> I understand both sides. But if you look at what I've been asking
>>> since the beginning, it's all about the cost and benefits of dropping
>>> support for java 1.7.
>>>
>>> The biggest argument in your original e-mail is about testing. And the
>>> testing cost is much bigger for supporting scala 2.10 than it is for
>>> supporting java 1.7. If you read one of my earlier replies, it should
>>> be even possible to just do everything in a single job - compile for
>>> java 7 and still be able to test things in 1.8, including lambdas,
>>> which seems to be the main thing you were worried about.
>>>
>>>
>>> > On Thu, Mar 24, 2016 at 4:48 PM, Marcelo Vanzin <va...@cloudera.com>
>>> wrote:
>>> >>
>>> >> On Thu, Mar 24, 2016 at 4:46 PM, Reynold Xin <rx...@databricks.com>
>>> wrote:
>>> >> > Actually it's *way* harder to upgrade Scala from 2.10 to 2.11, than
>>> >> > upgrading the JVM runtime from 7 to 8, because Scala 2.10 and 2.11
>>> are
>>> >> > not
>>> >> > binary compatible, whereas JVM 7 and 8 are binary compatible except
>>> >> > certain
>>> >> > esoteric cases.
>>> >>
>>> >> True, but ask anyone who manages a large cluster how long it would
>>> >> take them to upgrade the jdk across their cluster and validate all
>>> >> their applications and everything... binary compatibility is a tiny
>>> >> drop in that bucket.
>>> >>
>>> >> --
>>> >> Marcelo
>>> >
>>> >
>>>
>>>
>>>
>>> --
>>> Marcelo
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: dev-unsubscribe@spark.apache.org
>>> For additional commands, e-mail: dev-help@spark.apache.org
>>>
>>>
>

Re: [discuss] ending support for Java 7 in Spark 2.0

Posted by Mridul Muralidharan <mr...@gmail.com>.
I do agree w.r.t scala 2.10 as well; similar arguments apply (though there
is a nuanced diff - source compatibility for scala vs binary compatibility
wrt Java).
Was there a proposal which did not go through? Not sure if I missed it.

Regards
Mridul

On Thursday, March 24, 2016, Koert Kuipers <ko...@tresata.com> wrote:

> i think that logic is reasonable, but then the same should also apply to
> scala 2.10, which is also unmaintained/unsupported at this point (basically
> has been since march 2015 except for one hotfix due to a license
> incompatibility)
>
> who wants to support scala 2.10 three years after they did the last
> maintenance release?
>
>
> On Thu, Mar 24, 2016 at 9:59 PM, Mridul Muralidharan <mridul@gmail.com> wrote:
>
>> Removing compatibility (with jdk, etc) can be done with a major release-
>> given that 7 has been EOLed a while back and is now unsupported, we have to
>> decide if we drop support for it in 2.0 or 3.0 (2+ years from now).
>>
>> Given the functionality & performance benefits of going to jdk8, future
>> enhancements relevant in 2.x timeframe ( scala, dependencies) which
>> requires it, and simplicity wrt code, test & support it looks like a good
>> checkpoint to drop jdk7 support.
>>
>> As already mentioned in the thread, existing yarn clusters are unaffected
>> if they want to continue running jdk7 and yet use spark2 (install jdk8 on
>> all nodes and use it via JAVA_HOME, or worst case distribute jdk8 as
>> archive - suboptimal).
>> I am unsure about mesos (standalone might be easier upgrade I guess ?).
>>
>>
>> Proposal is for 1.6x line to continue to be supported with critical
>> fixes; newer features will require 2.x and so jdk8
>>
>> Regards
>> Mridul
>>
>>
>> On Thursday, March 24, 2016, Marcelo Vanzin <vanzin@cloudera.com> wrote:
>>
>>> On Thu, Mar 24, 2016 at 4:50 PM, Reynold Xin <rx...@databricks.com>
>>> wrote:
>>> > If you want to go down that route, you should also ask somebody who
>>> has had
>>> > experience managing a large organization's applications and try to
>>> update
>>> > Scala version.
>>>
>>> I understand both sides. But if you look at what I've been asking
>>> since the beginning, it's all about the cost and benefits of dropping
>>> support for java 1.7.
>>>
>>> The biggest argument in your original e-mail is about testing. And the
>>> testing cost is much bigger for supporting scala 2.10 than it is for
>>> supporting java 1.7. If you read one of my earlier replies, it should
>>> be even possible to just do everything in a single job - compile for
>>> java 7 and still be able to test things in 1.8, including lambdas,
>>> which seems to be the main thing you were worried about.
>>>
>>>
>>> > On Thu, Mar 24, 2016 at 4:48 PM, Marcelo Vanzin <va...@cloudera.com>
>>> wrote:
>>> >>
>>> >> On Thu, Mar 24, 2016 at 4:46 PM, Reynold Xin <rx...@databricks.com>
>>> wrote:
>>> >> > Actually it's *way* harder to upgrade Scala from 2.10 to 2.11, than
>>> >> > upgrading the JVM runtime from 7 to 8, because Scala 2.10 and 2.11
>>> are
>>> >> > not
>>> >> > binary compatible, whereas JVM 7 and 8 are binary compatible except
>>> >> > certain
>>> >> > esoteric cases.
>>> >>
>>> >> True, but ask anyone who manages a large cluster how long it would
>>> >> take them to upgrade the jdk across their cluster and validate all
>>> >> their applications and everything... binary compatibility is a tiny
>>> >> drop in that bucket.
>>> >>
>>> >> --
>>> >> Marcelo
>>> >
>>> >
>>>
>>>
>>>
>>> --
>>> Marcelo
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: dev-unsubscribe@spark.apache.org
>>> For additional commands, e-mail: dev-help@spark.apache.org
>>>
>>>
>

Re: [discuss] ending support for Java 7 in Spark 2.0

Posted by Koert Kuipers <ko...@tresata.com>.
i asked around a little, and the general trend at our clients seems to be
that they plan to upgrade the clusters to java 8 within the year.

so with that in mind i wish this was a little later (i would have preferred
a java-8-only spark at the end of the year). but since a major spark version
only comes around every so often, i guess it makes sense to make the jump
now. so:
+1 on dropping java 7
+1 on dropping scala 2.10

i would especially like to point out (as others have before me) that nobody
has come in and said they actually need scala 2.10 support, so that seems
like the easiest choice to me of all.

On Fri, Mar 25, 2016 at 10:03 AM, Andrew Ray <ra...@gmail.com> wrote:

> +1 on removing Java 7 and Scala 2.10 support.
>
> It looks to be entirely possible to support Java 8 containers in a YARN
> cluster otherwise running Java 7 (example code for alt JAVA_HOME
> https://issues.apache.org/jira/secure/attachment/12671739/YARN-1964.patch)
> so really there should be no big problem. Even if that somehow doesn't work
> I'm still +1 as the benefits are so large.
>
> I'd also like to point out that it is completely trivial to have multiple
> versions of Spark running concurrently on a YARN cluster. At my previous
> (extremely large) employer we had almost every release since 1.0 installed,
> with the latest being default and production apps pinned to a specific
> version. So if you want to keep using some Scala 2.10 only library or just
> don't want to migrate to Java 8, feel free to continue using Spark 1.x for
> those applications.
>
> IMHO we need to move on from EOL stuff to make room for the future (Java
> 9, Scala 2.12) and Spark 2.0 is the only chance we are going to have to do
> so for a long time.
>
> --Andrew
>
> On Thu, Mar 24, 2016 at 10:55 PM, Mridul Muralidharan <mr...@gmail.com>
> wrote:
>
>>
>> I do agree w.r.t scala 2.10 as well; similar arguments apply (though
>> there is a nuanced diff - source compatibility for scala vs binary
>> compatibility wrt Java)
>> Was there a proposal which did not go through ? Not sure if I missed it.
>>
>> Regards
>> Mridul
>>
>>
>> On Thursday, March 24, 2016, Koert Kuipers <ko...@tresata.com> wrote:
>>
>>> i think that logic is reasonable, but then the same should also apply to
>>> scala 2.10, which is also unmaintained/unsupported at this point (basically
>>> has been since march 2015 except for one hotfix due to a license
>>> incompatibility)
>>>
>>> who wants to support scala 2.10 three years after they did the last
>>> maintenance release?
>>>
>>>
>>> On Thu, Mar 24, 2016 at 9:59 PM, Mridul Muralidharan <mr...@gmail.com>
>>> wrote:
>>>
>>>> Removing compatibility (with jdk, etc) can be done with a major
>>>> release- given that 7 has been EOLed a while back and is now unsupported,
>>>> we have to decide if we drop support for it in 2.0 or 3.0 (2+ years from
>>>> now).
>>>>
>>>> Given the functionality & performance benefits of going to jdk8, future
>>>> enhancements relevant in 2.x timeframe ( scala, dependencies) which
>>>> requires it, and simplicity wrt code, test & support it looks like a good
>>>> checkpoint to drop jdk7 support.
>>>>
>>>> As already mentioned in the thread, existing yarn clusters are
>>>> unaffected if they want to continue running jdk7 and yet use
>>>> spark2 (install jdk8 on all nodes and use it via JAVA_HOME, or worst case
>>>> distribute jdk8 as archive - suboptimal).
>>>> I am unsure about mesos (standalone might be easier upgrade I guess ?).
>>>>
>>>>
>>>> Proposal is for 1.6x line to continue to be supported with critical
>>>> fixes; newer features will require 2.x and so jdk8
>>>>
>>>> Regards
>>>> Mridul
>>>>
>>>>
>>>> On Thursday, March 24, 2016, Marcelo Vanzin <va...@cloudera.com>
>>>> wrote:
>>>>
>>>>> On Thu, Mar 24, 2016 at 4:50 PM, Reynold Xin <rx...@databricks.com>
>>>>> wrote:
>>>>> > If you want to go down that route, you should also ask somebody who
>>>>> has had
>>>>> > experience managing a large organization's applications and try to
>>>>> update
>>>>> > Scala version.
>>>>>
>>>>> I understand both sides. But if you look at what I've been asking
>>>>> since the beginning, it's all about the cost and benefits of dropping
>>>>> support for java 1.7.
>>>>>
>>>>> The biggest argument in your original e-mail is about testing. And the
>>>>> testing cost is much bigger for supporting scala 2.10 than it is for
>>>>> supporting java 1.7. If you read one of my earlier replies, it should
>>>>> be even possible to just do everything in a single job - compile for
>>>>> java 7 and still be able to test things in 1.8, including lambdas,
>>>>> which seems to be the main thing you were worried about.
>>>>>
>>>>>
>>>>> > On Thu, Mar 24, 2016 at 4:48 PM, Marcelo Vanzin <va...@cloudera.com>
>>>>> wrote:
>>>>> >>
>>>>> >> On Thu, Mar 24, 2016 at 4:46 PM, Reynold Xin <rx...@databricks.com>
>>>>> wrote:
>>>>> >> > Actually it's *way* harder to upgrade Scala from 2.10 to 2.11,
>>>>> than
>>>>> >> > upgrading the JVM runtime from 7 to 8, because Scala 2.10 and
>>>>> 2.11 are
>>>>> >> > not
>>>>> >> > binary compatible, whereas JVM 7 and 8 are binary compatible
>>>>> except
>>>>> >> > certain
>>>>> >> > esoteric cases.
>>>>> >>
>>>>> >> True, but ask anyone who manages a large cluster how long it would
>>>>> >> take them to upgrade the jdk across their cluster and validate all
>>>>> >> their applications and everything... binary compatibility is a tiny
>>>>> >> drop in that bucket.
>>>>> >>
>>>>> >> --
>>>>> >> Marcelo
>>>>> >
>>>>> >
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Marcelo
>>>>>
>>>>> ---------------------------------------------------------------------
>>>>> To unsubscribe, e-mail: dev-unsubscribe@spark.apache.org
>>>>> For additional commands, e-mail: dev-help@spark.apache.org
>>>>>
>>>>>
>>>
>

Re: [discuss] ending support for Java 7 in Spark 2.0

Posted by Andrew Ray <ra...@gmail.com>.
+1 on removing Java 7 and Scala 2.10 support.

It looks to be entirely possible to support Java 8 containers in a YARN
cluster otherwise running Java 7 (example code for alt JAVA_HOME
https://issues.apache.org/jira/secure/attachment/12671739/YARN-1964.patch)
so really there should be no big problem. Even if that somehow doesn't work
I'm still +1 as the benefits are so large.

I'd also like to point out that it is completely trivial to have multiple
versions of Spark running concurrently on a YARN cluster. At my previous
(extremely large) employer we had almost every release since 1.0 installed,
with the latest being the default and production apps pinned to a specific
version. So if you want to keep using some Scala 2.10-only library, or just
don't want to migrate to Java 8, feel free to continue using Spark 1.x for
those applications.
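Concretely (install paths hypothetical), pinning is just a matter of which
install you launch through, since each submission ships its own Spark bits:

    # most apps use the default install
    /opt/spark-latest/bin/spark-submit --master yarn-cluster --class com.example.App app.jar
    # a production app pinned to an old release, side by side on the same cluster
    /opt/spark-1.2.1/bin/spark-submit --master yarn-cluster --class com.example.App app.jar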

IMHO we need to move on from EOL stuff to make room for the future (Java 9,
Scala 2.12) and Spark 2.0 is the only chance we are going to have to do so
for a long time.

--Andrew

On Thu, Mar 24, 2016 at 10:55 PM, Mridul Muralidharan <mr...@gmail.com>
wrote:

>
> I do agree w.r.t scala 2.10 as well; similar arguments apply (though there
> is a nuanced diff - source compatibility for scala vs binary compatibility
> wrt Java).
> Was there a proposal which did not go through? Not sure if I missed it.
>
> Regards
> Mridul
>
>
> On Thursday, March 24, 2016, Koert Kuipers <ko...@tresata.com> wrote:
>
>> i think that logic is reasonable, but then the same should also apply to
>> scala 2.10, which is also unmaintained/unsupported at this point (basically
>> has been since march 2015 except for one hotfix due to a license
>> incompatibility)
>>
>> who wants to support scala 2.10 three years after they did the last
>> maintenance release?
>>
>>
>> On Thu, Mar 24, 2016 at 9:59 PM, Mridul Muralidharan <mr...@gmail.com>
>> wrote:
>>
>>> Removing compatibility (with jdk, etc) can be done with a major release;
>>> given that 7 was EOLed a while back and is now unsupported, we have to
>>> decide if we drop support for it in 2.0 or in 3.0 (2+ years from now).
>>>
>>> Given the functionality & performance benefits of going to jdk8, the
>>> future enhancements relevant in the 2.x timeframe (scala, dependencies)
>>> which require it, and the simplicity wrt code, test & support, it looks
>>> like a good checkpoint to drop jdk7 support.
>>>
>>> As already mentioned in the thread, existing yarn clusters are
>>> unaffected if they want to continue running jdk7 and yet use
>>> spark2 (install jdk8 on all nodes and use it via JAVA_HOME, or worst case
>>> distribute jdk8 as an archive - suboptimal).
>>> I am unsure about mesos (standalone might be an easier upgrade, I guess?).
>>>
>>>
>>> Proposal is for the 1.6.x line to continue to be supported with critical
>>> fixes; newer features will require 2.x and hence jdk8.
>>>
>>> Regards
>>> Mridul
>>>
>>>
>>> On Thursday, March 24, 2016, Marcelo Vanzin <va...@cloudera.com> wrote:
>>>
>>>> On Thu, Mar 24, 2016 at 4:50 PM, Reynold Xin <rx...@databricks.com>
>>>> wrote:
>>>> > If you want to go down that route, you should also ask somebody who
>>>> has had
>>>> > experience managing a large organization's applications and try to
>>>> update
>>>> > the Scala version.
>>>>
>>>> I understand both sides. But if you look at what I've been asking
>>>> since the beginning, it's all about the cost and benefits of dropping
>>>> support for java 1.7.
>>>>
>>>> The biggest argument in your original e-mail is about testing. And the
>>>> testing cost is much bigger for supporting scala 2.10 than it is for
>>>> supporting java 1.7. If you read one of my earlier replies, it should
>>>> even be possible to just do everything in a single job - compile for
>>>> java 7 and still be able to test things in 1.8, including lambdas,
>>>> which seems to be the main thing you were worried about.
>>>>
>>>>
>>>> > On Thu, Mar 24, 2016 at 4:48 PM, Marcelo Vanzin <va...@cloudera.com>
>>>> wrote:
>>>> >>
>>>> >> On Thu, Mar 24, 2016 at 4:46 PM, Reynold Xin <rx...@databricks.com>
>>>> wrote:
>>>> >> > Actually it's *way* harder to upgrade Scala from 2.10 to 2.11, than
>>>> >> > upgrading the JVM runtime from 7 to 8, because Scala 2.10 and 2.11
>>>> are
>>>> >> > not
>>>> >> > binary compatible, whereas JVM 7 and 8 are binary compatible except
>>>> >> > certain
>>>> >> > esoteric cases.
>>>> >>
>>>> >> True, but ask anyone who manages a large cluster how long it would
>>>> >> take them to upgrade the jdk across their cluster and validate all
>>>> >> their applications and everything... binary compatibility is a tiny
>>>> >> drop in that bucket.
>>>> >>
>>>> >> --
>>>> >> Marcelo
>>>> >
>>>> >
>>>>
>>>>
>>>>
>>>> --
>>>> Marcelo
>>>>
>>>> ---------------------------------------------------------------------
>>>> To unsubscribe, e-mail: dev-unsubscribe@spark.apache.org
>>>> For additional commands, e-mail: dev-help@spark.apache.org
>>>>
>>>>
>>

Re: [discuss] ending support for Java 7 in Spark 2.0

Posted by Mridul Muralidharan <mr...@gmail.com>.
I do agree w.r.t scala 2.10 as well; similar arguments apply (though there
is a nuanced diff - source compatibility for scala vs binary compatibility
wrt Java).
Was there a proposal which did not go through? Not sure if I missed it.

Regards
Mridul

On Thursday, March 24, 2016, Koert Kuipers <ko...@tresata.com> wrote:

> i think that logic is reasonable, but then the same should also apply to
> scala 2.10, which is also unmaintained/unsupported at this point (basically
> has been since march 2015 except for one hotfix due to a license
> incompatibility)
>
> who wants to support scala 2.10 three years after they did the last
> maintenance release?
>
>
> On Thu, Mar 24, 2016 at 9:59 PM, Mridul Muralidharan <mridul@gmail.com>
> wrote:
>
>> Removing compatibility (with jdk, etc) can be done with a major release;
>> given that 7 was EOLed a while back and is now unsupported, we have to
>> decide if we drop support for it in 2.0 or in 3.0 (2+ years from now).
>>
>> Given the functionality & performance benefits of going to jdk8, the future
>> enhancements relevant in the 2.x timeframe (scala, dependencies) which
>> require it, and the simplicity wrt code, test & support, it looks like a
>> good checkpoint to drop jdk7 support.
>>
>> As already mentioned in the thread, existing yarn clusters are unaffected
>> if they want to continue running jdk7 and yet use spark2 (install jdk8 on
>> all nodes and use it via JAVA_HOME, or worst case distribute jdk8 as an
>> archive - suboptimal).
>> I am unsure about mesos (standalone might be an easier upgrade, I guess?).
>>
>>
>> Proposal is for the 1.6.x line to continue to be supported with critical
>> fixes; newer features will require 2.x and hence jdk8.
>>
>> Regards
>> Mridul
>>
>>
>> On Thursday, March 24, 2016, Marcelo Vanzin <vanzin@cloudera.com> wrote:
>>
>>> On Thu, Mar 24, 2016 at 4:50 PM, Reynold Xin <rx...@databricks.com>
>>> wrote:
>>> > If you want to go down that route, you should also ask somebody who
>>> has had
>>> > experience managing a large organization's applications and try to
>>> update
>>> > the Scala version.
>>>
>>> I understand both sides. But if you look at what I've been asking
>>> since the beginning, it's all about the cost and benefits of dropping
>>> support for java 1.7.
>>>
>>> The biggest argument in your original e-mail is about testing. And the
>>> testing cost is much bigger for supporting scala 2.10 than it is for
>>> supporting java 1.7. If you read one of my earlier replies, it should
>>> even be possible to just do everything in a single job - compile for
>>> java 7 and still be able to test things in 1.8, including lambdas,
>>> which seems to be the main thing you were worried about.
>>>
>>>
>>> > On Thu, Mar 24, 2016 at 4:48 PM, Marcelo Vanzin <va...@cloudera.com>
>>> wrote:
>>> >>
>>> >> On Thu, Mar 24, 2016 at 4:46 PM, Reynold Xin <rx...@databricks.com>
>>> wrote:
>>> >> > Actually it's *way* harder to upgrade Scala from 2.10 to 2.11, than
>>> >> > upgrading the JVM runtime from 7 to 8, because Scala 2.10 and 2.11
>>> are
>>> >> > not
>>> >> > binary compatible, whereas JVM 7 and 8 are binary compatible except
>>> >> > certain
>>> >> > esoteric cases.
>>> >>
>>> >> True, but ask anyone who manages a large cluster how long it would
>>> >> take them to upgrade the jdk across their cluster and validate all
>>> >> their applications and everything... binary compatibility is a tiny
>>> >> drop in that bucket.
>>> >>
>>> >> --
>>> >> Marcelo
>>> >
>>> >
>>>
>>>
>>>
>>> --
>>> Marcelo
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: dev-unsubscribe@spark.apache.org
>>> For additional commands, e-mail: dev-help@spark.apache.org
>>>
>>>
>

Re: [discuss] ending support for Java 7 in Spark 2.0

Posted by Koert Kuipers <ko...@tresata.com>.
i think that logic is reasonable, but then the same should also apply to
scala 2.10, which is also unmaintained/unsupported at this point (basically
has been since march 2015 except for one hotfix due to a license
incompatibility)

who wants to support scala 2.10 three years after they did the last
maintenance release?


On Thu, Mar 24, 2016 at 9:59 PM, Mridul Muralidharan <mr...@gmail.com>
wrote:

> Removing compatibility (with jdk, etc) can be done with a major release;
> given that 7 was EOLed a while back and is now unsupported, we have to
> decide if we drop support for it in 2.0 or in 3.0 (2+ years from now).
>
> Given the functionality & performance benefits of going to jdk8, the future
> enhancements relevant in the 2.x timeframe (scala, dependencies) which
> require it, and the simplicity wrt code, test & support, it looks like a
> good checkpoint to drop jdk7 support.
>
> As already mentioned in the thread, existing yarn clusters are unaffected
> if they want to continue running jdk7 and yet use spark2 (install jdk8 on
> all nodes and use it via JAVA_HOME, or worst case distribute jdk8 as an
> archive - suboptimal).
> I am unsure about mesos (standalone might be an easier upgrade, I guess?).
>
>
> Proposal is for the 1.6.x line to continue to be supported with critical
> fixes; newer features will require 2.x and hence jdk8.
>
> Regards
> Mridul
>
>
> On Thursday, March 24, 2016, Marcelo Vanzin <va...@cloudera.com> wrote:
>
>> On Thu, Mar 24, 2016 at 4:50 PM, Reynold Xin <rx...@databricks.com> wrote:
>> > If you want to go down that route, you should also ask somebody who has
>> had
>> > experience managing a large organization's applications and try to
>> update
>> > the Scala version.
>>
>> I understand both sides. But if you look at what I've been asking
>> since the beginning, it's all about the cost and benefits of dropping
>> support for java 1.7.
>>
>> The biggest argument in your original e-mail is about testing. And the
>> testing cost is much bigger for supporting scala 2.10 than it is for
>> supporting java 1.7. If you read one of my earlier replies, it should
>> even be possible to just do everything in a single job - compile for
>> java 7 and still be able to test things in 1.8, including lambdas,
>> which seems to be the main thing you were worried about.
>>
>>
>> > On Thu, Mar 24, 2016 at 4:48 PM, Marcelo Vanzin <va...@cloudera.com>
>> wrote:
>> >>
>> >> On Thu, Mar 24, 2016 at 4:46 PM, Reynold Xin <rx...@databricks.com>
>> wrote:
>> >> > Actually it's *way* harder to upgrade Scala from 2.10 to 2.11, than
>> >> > upgrading the JVM runtime from 7 to 8, because Scala 2.10 and 2.11
>> are
>> >> > not
>> >> > binary compatible, whereas JVM 7 and 8 are binary compatible except
>> >> > certain
>> >> > esoteric cases.
>> >>
>> >> True, but ask anyone who manages a large cluster how long it would
>> >> take them to upgrade the jdk across their cluster and validate all
>> >> their applications and everything... binary compatibility is a tiny
>> >> drop in that bucket.
>> >>
>> >> --
>> >> Marcelo
>> >
>> >
>>
>>
>>
>> --
>> Marcelo
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: dev-unsubscribe@spark.apache.org
>> For additional commands, e-mail: dev-help@spark.apache.org
>>
>>

Re: [discuss] ending support for Java 7 in Spark 2.0

Posted by Mridul Muralidharan <mr...@gmail.com>.
Removing compatibility (with jdk, etc) can be done with a major release;
given that 7 was EOLed a while back and is now unsupported, we have to
decide if we drop support for it in 2.0 or in 3.0 (2+ years from now).

Given the functionality & performance benefits of going to jdk8, the future
enhancements relevant in the 2.x timeframe (scala, dependencies) which
require it, and the simplicity wrt code, test & support, it looks like a
good checkpoint to drop jdk7 support.

As already mentioned in the thread, existing yarn clusters are unaffected
if they want to continue running jdk7 and yet use spark2 (install jdk8 on
all nodes and use it via JAVA_HOME, or worst case distribute jdk8 as an
archive - suboptimal).
I am unsure about mesos (standalone might be an easier upgrade, I guess?).
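
(For the archive option, something along these lines should work on yarn -
untested, and the jdk tarball name/path is just illustrative:

    spark-submit --master yarn \
      --archives hdfs:///libs/jdk-8u77-linux-x64.tgz#jdk8 \
      --conf spark.yarn.appMasterEnv.JAVA_HOME=./jdk8/jdk1.8.0_77 \
      --conf spark.executorEnv.JAVA_HOME=./jdk8/jdk1.8.0_77 \
      --class com.example.MyApp my-app.jar

yarn localizes the archive into the container working directory under the
#jdk8 alias, so the relative JAVA_HOME should resolve inside each container.)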


Proposal is for the 1.6.x line to continue to be supported with critical
fixes; newer features will require 2.x and hence jdk8.

Regards
Mridul


On Thursday, March 24, 2016, Marcelo Vanzin <va...@cloudera.com> wrote:

> On Thu, Mar 24, 2016 at 4:50 PM, Reynold Xin <rxin@databricks.com> wrote:
> > If you want to go down that route, you should also ask somebody who has
> had
> > experience managing a large organization's applications and try to update
> > the Scala version.
>
> I understand both sides. But if you look at what I've been asking
> since the beginning, it's all about the cost and benefits of dropping
> support for java 1.7.
>
> The biggest argument in your original e-mail is about testing. And the
> testing cost is much bigger for supporting scala 2.10 than it is for
> supporting java 1.7. If you read one of my earlier replies, it should
> even be possible to just do everything in a single job - compile for
> java 7 and still be able to test things in 1.8, including lambdas,
> which seems to be the main thing you were worried about.
>
>
> > On Thu, Mar 24, 2016 at 4:48 PM, Marcelo Vanzin <vanzin@cloudera.com> wrote:
> >>
> >> On Thu, Mar 24, 2016 at 4:46 PM, Reynold Xin <rxin@databricks.com> wrote:
> >> > Actually it's *way* harder to upgrade Scala from 2.10 to 2.11, than
> >> > upgrading the JVM runtime from 7 to 8, because Scala 2.10 and 2.11 are
> >> > not
> >> > binary compatible, whereas JVM 7 and 8 are binary compatible except
> >> > certain
> >> > esoteric cases.
> >>
> >> True, but ask anyone who manages a large cluster how long it would
> >> take them to upgrade the jdk across their cluster and validate all
> >> their applications and everything... binary compatibility is a tiny
> >> drop in that bucket.
> >>
> >> --
> >> Marcelo
> >
> >
>
>
>
> --
> Marcelo
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@spark.apache.org
> For additional commands, e-mail: dev-help@spark.apache.org
>
>

Re: [discuss] ending support for Java 7 in Spark 2.0

Posted by Marcelo Vanzin <va...@cloudera.com>.
On Thu, Mar 24, 2016 at 4:50 PM, Reynold Xin <rx...@databricks.com> wrote:
> If you want to go down that route, you should also ask somebody who has had
> experience managing a large organization's applications and try to update
> the Scala version.

I understand both sides. But if you look at what I've been asking
since the beginning, it's all about the cost and benefits of dropping
support for java 1.7.

The biggest argument in your original e-mail is about testing. And the
testing cost is much bigger for supporting scala 2.10 than it is for
supporting java 1.7. If you read one of my earlier replies, it should
even be possible to just do everything in a single job - compile for
java 7 and still be able to test things in 1.8, including lambdas,
which seems to be the main thing you were worried about.
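
To be concrete, what I have in mind (a sketch, not a tested patch) is: run
the whole build on a JDK 8 javac but emit 1.7 bytecode, and keep the lambda
tests in a separate module compiled with source/target 1.8. For the main
modules that would mean roughly:

    <!-- emit Java 7 bytecode even though the build JVM is a JDK 8;    -->
    <!-- strictly you'd also point -bootclasspath at a JDK 7 rt.jar so -->
    <!-- javac flags any accidental use of 8-only library APIs.        -->
    <plugin>
      <groupId>org.apache.maven.plugins</groupId>
      <artifactId>maven-compiler-plugin</artifactId>
      <configuration>
        <source>1.7</source>
        <target>1.7</target>
      </configuration>
    </plugin>

A single jenkins job can then run every suite, lambda tests included, on the
8 JVM, while the artifacts remain runnable on a 1.7 JRE.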


> On Thu, Mar 24, 2016 at 4:48 PM, Marcelo Vanzin <va...@cloudera.com> wrote:
>>
>> On Thu, Mar 24, 2016 at 4:46 PM, Reynold Xin <rx...@databricks.com> wrote:
>> > Actually it's *way* harder to upgrade Scala from 2.10 to 2.11, than
>> > upgrading the JVM runtime from 7 to 8, because Scala 2.10 and 2.11 are
>> > not
>> > binary compatible, whereas JVM 7 and 8 are binary compatible except
>> > certain
>> > esoteric cases.
>>
>> True, but ask anyone who manages a large cluster how long it would
>> take them to upgrade the jdk across their cluster and validate all
>> their applications and everything... binary compatibility is a tiny
>> drop in that bucket.
>>
>> --
>> Marcelo
>
>



-- 
Marcelo

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@spark.apache.org
For additional commands, e-mail: dev-help@spark.apache.org


Re: [discuss] ending support for Java 7 in Spark 2.0

Posted by Michael Armbrust <mi...@databricks.com>.
On Thu, Mar 24, 2016 at 4:54 PM, Mark Hamstra <ma...@clearstorydata.com>
 wrote:

> It's a pain in the ass, especially if some of your transitive
> dependencies never upgraded from 2.10 to 2.11.
>

Yeah, I'm going to have to agree here. It is not as bad as it was in the
2.9 days, but it's still non-trivial due to the ecosystem part of it. For
this reason I think that it is premature to drop support for 2.10.x.

Re: [discuss] ending support for Java 7 in Spark 2.0

Posted by Mark Hamstra <ma...@clearstorydata.com>.
It's a pain in the ass, especially if some of your transitive dependencies
never upgraded from 2.10 to 2.11.

On Thu, Mar 24, 2016 at 4:50 PM, Reynold Xin <rx...@databricks.com> wrote:

> If you want to go down that route, you should also ask somebody who has
> had experience managing a large organization's applications and try to
> update the Scala version.
>
>
> On Thu, Mar 24, 2016 at 4:48 PM, Marcelo Vanzin <va...@cloudera.com>
> wrote:
>
>> On Thu, Mar 24, 2016 at 4:46 PM, Reynold Xin <rx...@databricks.com> wrote:
>> > Actually it's *way* harder to upgrade Scala from 2.10 to 2.11, than
>> > upgrading the JVM runtime from 7 to 8, because Scala 2.10 and 2.11 are
>> not
>> > binary compatible, whereas JVM 7 and 8 are binary compatible except
>> certain
>> > esoteric cases.
>>
>> True, but ask anyone who manages a large cluster how long it would
>> take them to upgrade the jdk across their cluster and validate all
>> their applications and everything... binary compatibility is a tiny
>> drop in that bucket.
>>
>> --
>> Marcelo
>>
>
>

Re: [discuss] ending support for Java 7 in Spark 2.0

Posted by Koert Kuipers <ko...@tresata.com>.
the good news is that, from a shared infrastructure perspective, most
places have zero scala, so the upgrade is actually very easy. i can see how
it would be different for, say, twitter....

On Thu, Mar 24, 2016 at 7:50 PM, Reynold Xin <rx...@databricks.com> wrote:

> If you want to go down that route, you should also ask somebody who has
> had experience managing a large organization's applications and try to
> update the Scala version.
>
>
> On Thu, Mar 24, 2016 at 4:48 PM, Marcelo Vanzin <va...@cloudera.com>
> wrote:
>
>> On Thu, Mar 24, 2016 at 4:46 PM, Reynold Xin <rx...@databricks.com> wrote:
>> > Actually it's *way* harder to upgrade Scala from 2.10 to 2.11, than
>> > upgrading the JVM runtime from 7 to 8, because Scala 2.10 and 2.11 are
>> not
>> > binary compatible, whereas JVM 7 and 8 are binary compatible except
>> certain
>> > esoteric cases.
>>
>> True, but ask anyone who manages a large cluster how long it would
>> take them to upgrade the jdk across their cluster and validate all
>> their applications and everything... binary compatibility is a tiny
>> drop in that bucket.
>>
>> --
>> Marcelo
>>
>
>

Re: [discuss] ending support for Java 7 in Spark 2.0

Posted by Reynold Xin <rx...@databricks.com>.
If you want to go down that route, you should also ask somebody who has had
experience managing a large organization's applications and try to update
the Scala version.


On Thu, Mar 24, 2016 at 4:48 PM, Marcelo Vanzin <va...@cloudera.com> wrote:

> On Thu, Mar 24, 2016 at 4:46 PM, Reynold Xin <rx...@databricks.com> wrote:
> > Actually it's *way* harder to upgrade Scala from 2.10 to 2.11, than
> > upgrading the JVM runtime from 7 to 8, because Scala 2.10 and 2.11 are
> not
> > binary compatible, whereas JVM 7 and 8 are binary compatible except
> certain
> > esoteric cases.
>
> True, but ask anyone who manages a large cluster how long it would
> take them to upgrade the jdk across their cluster and validate all
> their applications and everything... binary compatibility is a tiny
> drop in that bucket.
>
> --
> Marcelo
>

Re: [discuss] ending support for Java 7 in Spark 2.0

Posted by Marcelo Vanzin <va...@cloudera.com>.
On Thu, Mar 24, 2016 at 4:46 PM, Reynold Xin <rx...@databricks.com> wrote:
> Actually it's *way* harder to upgrade Scala from 2.10 to 2.11, than
> upgrading the JVM runtime from 7 to 8, because Scala 2.10 and 2.11 are not
> binary compatible, whereas JVM 7 and 8 are binary compatible except certain
> esoteric cases.

True, but ask anyone who manages a large cluster how long it would
take them to upgrade the jdk across their cluster and validate all
their applications and everything... binary compatibility is a tiny
drop in that bucket.

-- 
Marcelo

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@spark.apache.org
For additional commands, e-mail: dev-help@spark.apache.org


Re: [discuss] ending support for Java 7 in Spark 2.0

Posted by Reynold Xin <rx...@databricks.com>.
Actually it's *way* harder to upgrade Scala from 2.10 to 2.11, than
upgrading the JVM runtime from 7 to 8, because Scala 2.10 and 2.11 are not
binary compatible, whereas JVM 7 and 8 are binary compatible except certain
esoteric cases.


On Thu, Mar 24, 2016 at 4:44 PM, Kostas Sakellis <ko...@cloudera.com>
wrote:

> If an argument here is the ongoing build/maintenance burden, I think we
> should seriously consider dropping scala 2.10 in Spark 2.0. Supporting
> scala 2.10 is a bigger build/infrastructure burden than supporting jdk7,
> since you actually have to build and test different artifacts, whereas you
> can compile Spark targeting 1.7 and just test it on JDK8.
>
> In addition, as others pointed out, it seems like a bigger pain to drop
> support for a JDK than for a scala version. So if we are considering
> dropping java 7, which is a breaking change on the infra side, now is also
> a good time to drop Scala 2.10 support.
>
> Kostas
>
> P.S. I haven't heard anyone on this thread fight for Scala 2.10 support.
>
> On Thu, Mar 24, 2016 at 2:46 PM, Marcelo Vanzin <va...@cloudera.com>
> wrote:
>
>> On Thu, Mar 24, 2016 at 2:41 PM, Jakob Odersky <ja...@odersky.com> wrote:
>> > You can, but since it's going to be a maintainability issue I would
>> > argue it is in fact a problem.
>>
>> Everything you choose to support generates a maintenance burden.
>> Supporting 3 versions of Scala would be a huge maintenance burden, for
>> example, as is supporting 2 versions of the JDK. Just note that,
>> technically, we do support 2 versions of the jdk today; we just don't
>> do a lot of automated testing on jdk 8 (PRs are all built with jdk 7
>> AFAIK).
>>
>> So in the end it's a compromise. How many users will be affected by
>> your choices? That's the question that I think is the most important.
>> If switching to java 8-only means a bunch of users won't be able to
>> upgrade, it means that Spark 2.0 will get less use than 1.x and will
>> take longer to gain traction. That has other ramifications - such as
>> less use meaning fewer issues get found, and the overall quality may
>> suffer in the beginning of this transition.
>>
>> --
>> Marcelo
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: dev-unsubscribe@spark.apache.org
>> For additional commands, e-mail: dev-help@spark.apache.org
>>
>>
>

Re: [discuss] ending support for Java 7 in Spark 2.0

Posted by Kostas Sakellis <ko...@cloudera.com>.
If an argument here is the ongoing build/maintenance burden, I think we
should seriously consider dropping scala 2.10 in Spark 2.0. Supporting
scala 2.10 is a bigger build/infrastructure burden than supporting jdk7,
since you actually have to build and test different artifacts, whereas you
can compile Spark targeting 1.7 and just test it on JDK8.
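
That is, for scala every release needs two complete builds (commands roughly
per the 1.6 build docs; a sketch, not the exact jenkins setup):

    # build the scala 2.10 (default) artifacts
    build/mvn -DskipTests clean package
    # rewrite the poms for 2.11 and build that whole set of artifacts too
    ./dev/change-scala-version.sh 2.11
    build/mvn -Dscala-2.11 -DskipTests clean package

whereas for the jdk it is one set of artifacts that you just run on two JVMs.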

In addition, as others pointed out, it seems like a bigger pain to drop
support for a JDK than for a scala version. So if we are considering
dropping java 7, which is a breaking change on the infra side, now is also
a good time to drop Scala 2.10 support.

Kostas

P.S. I haven't heard anyone on this thread fight for Scala 2.10 support.

On Thu, Mar 24, 2016 at 2:46 PM, Marcelo Vanzin <va...@cloudera.com> wrote:

> On Thu, Mar 24, 2016 at 2:41 PM, Jakob Odersky <ja...@odersky.com> wrote:
> > You can, but since it's going to be a maintainability issue I would
> > argue it is in fact a problem.
>
> Everything you choose to support generates a maintenance burden.
> Supporting 3 versions of Scala would be a huge maintenance burden, for
> example, as is supporting 2 versions of the JDK. Just note that,
> technically, we do support 2 versions of the jdk today; we just don't
> do a lot of automated testing on jdk 8 (PRs are all built with jdk 7
> AFAIK).
>
> So in the end it's a compromise. How many users will be affected by
> your choices? That's the question that I think is the most important.
> If switching to java 8-only means a bunch of users won't be able to
> upgrade, it means that Spark 2.0 will get less use than 1.x and will
> take longer to gain traction. That has other ramifications - such as
> less use meaning fewer issues get found, and the overall quality may
> suffer in the beginning of this transition.
>
> --
> Marcelo
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@spark.apache.org
> For additional commands, e-mail: dev-help@spark.apache.org
>
>

Re: [discuss] ending support for Java 7 in Spark 2.0

Posted by Marcelo Vanzin <va...@cloudera.com>.
On Thu, Mar 24, 2016 at 2:41 PM, Jakob Odersky <ja...@odersky.com> wrote:
> You can, but since it's going to be a maintainability issue I would
> argue it is in fact a problem.

Everything you choose to support generates a maintenance burden.
Supporting 3 versions of Scala would be a huge maintenance burden, for
example, as is supporting 2 versions of the JDK. Just note that,
technically, we do support 2 versions of the jdk today; we just don't
do a lot of automated testing on jdk 8 (PRs are all built with jdk 7
AFAIK).

So in the end it's a compromise. How many users will be affected by
your choices? That's the question that I think is the most important.
If switching to java 8-only means a bunch of users won't be able to
upgrade, it means that Spark 2.0 will get less use than 1.x and will
take longer to gain traction. That has other ramifications - such as
less use meaning fewer issues get found, and the overall quality may
suffer in the beginning of this transition.

-- 
Marcelo

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@spark.apache.org
For additional commands, e-mail: dev-help@spark.apache.org


Re: [discuss] ending support for Java 7 in Spark 2.0

Posted by Jakob Odersky <ja...@odersky.com>.
I mean from the perspective of someone developing Spark, it makes
things more complicated. It's just my point of view; people who
actually support Spark deployments may have a different opinion ;)

On Thu, Mar 24, 2016 at 2:41 PM, Jakob Odersky <ja...@odersky.com> wrote:
> You can, but since it's going to be a maintainability issue I would
> argue it is in fact a problem.
>
> On Thu, Mar 24, 2016 at 2:34 PM, Marcelo Vanzin <va...@cloudera.com> wrote:
>> Hi Jakob,
>>
>> On Thu, Mar 24, 2016 at 2:29 PM, Jakob Odersky <ja...@odersky.com> wrote:
>>> Reynold's 3rd point is particularly strong in my opinion. Supporting
>>> Consider what would happen if Spark 2.0 doesn't require Java 8 and
>>> hence can't support Scala 2.12. Will it be stuck on an older version
>>> until 3.0 is out?
>>
>> That's a false choice. You can support 2.10 (or 2.11) on Java 7 and
>> 2.12 on Java 8.
>>
>> I'm not saying it's a great idea, just that what you're suggesting
>> isn't really a problem.
>>
>> --
>> Marcelo

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@spark.apache.org
For additional commands, e-mail: dev-help@spark.apache.org


Re: [discuss] ending support for Java 7 in Spark 2.0

Posted by Jakob Odersky <ja...@odersky.com>.
You can, but since it's going to be a maintainability issue I would
argue it is in fact a problem.

On Thu, Mar 24, 2016 at 2:34 PM, Marcelo Vanzin <va...@cloudera.com> wrote:
> Hi Jakob,
>
> On Thu, Mar 24, 2016 at 2:29 PM, Jakob Odersky <ja...@odersky.com> wrote:
>> Reynold's 3rd point is particularly strong in my opinion. Supporting
>> Consider what would happen if Spark 2.0 doesn't require Java 8 and
>> hence can't support Scala 2.12. Will it be stuck on an older version
>> until 3.0 is out?
>
> That's a false choice. You can support 2.10 (or 2.11) on Java 7 and
> 2.12 on Java 8.
>
> I'm not saying it's a great idea, just that what you're suggesting
> isn't really a problem.
>
> --
> Marcelo

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@spark.apache.org
For additional commands, e-mail: dev-help@spark.apache.org


Re: [discuss] ending support for Java 7 in Spark 2.0

Posted by Marcelo Vanzin <va...@cloudera.com>.
Hi Jakob,

On Thu, Mar 24, 2016 at 2:29 PM, Jakob Odersky <ja...@odersky.com> wrote:
> Reynold's 3rd point is particularly strong in my opinion. Supporting
> Consider what would happen if Spark 2.0 doesn't require Java 8 and
> hence can't support Scala 2.12. Will it be stuck on an older version
> until 3.0 is out?

That's a false choice. You can support 2.10 (or 2.11) on Java 7 and
2.12 on Java 8.

I'm not saying it's a great idea, just that what you're suggesting
isn't really a problem.

-- 
Marcelo

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@spark.apache.org
For additional commands, e-mail: dev-help@spark.apache.org


Re: [discuss] ending support for Java 7 in Spark 2.0

Posted by Jakob Odersky <ja...@odersky.com>.
Reynold's 3rd point is particularly strong in my opinion. Supporting
Scala 2.12 will require Java 8 anyway, and introducing such a change
is probably best done in a major release.
Consider what would happen if Spark 2.0 doesn't require Java 8 and
hence can't support Scala 2.12. Will it be stuck on an older version
until 3.0 is out? Will it be introduced in a minor release?
I think 2.0 is the best time for such a change.

On Thu, Mar 24, 2016 at 11:46 AM, Stephen Boesch <ja...@gmail.com> wrote:
> +1 for java8 only, +1 for 2.11+ only. At this point scala libraries
> supporting only 2.10 are typically less active and/or poorly maintained.
> That trend will only continue when considering the lifespan of spark 2.X.
>
> 2016-03-24 11:32 GMT-07:00 Steve Loughran <st...@hortonworks.com>:
>>
>>
>> On 24 Mar 2016, at 15:27, Koert Kuipers <ko...@tresata.com> wrote:
>>
>> i think the arguments are convincing, but it also makes me wonder if i
>> live in some kind of alternate universe... we deploy on customers clusters,
>> where the OS, python version, java version and hadoop distro are not chosen
>> by us. so think centos 6, cdh5 or hdp 2.3, java 7 and python 2.6. we simply
>> have access to a single proxy machine and launch through yarn. asking them
>> to upgrade java is pretty much out of the question or a 6+ month ordeal. of
>> the 10 client clusters i can think of on the top of my head all of them are
>> on java 7, none are on java 8. so by doing this you would make spark 2
>> basically unusable for us (unless most of them have plans of upgrading in
>> near term to java 8, i will ask around and report back...).
>>
>>
>>
>> It's not actually mandatory for the process executing in the Yarn cluster
>> to run with the same JVM as the rest of the Hadoop stack; all that is needed
>> is for the environment variables to set up the JAVA_HOME and PATH. Switching
>> JVMs is not something which YARN makes easy, but it may be possible,
>> especially if Spark itself provides some hooks, so you don't have to
>> manually play with setting things up. That may be something which could
>> significantly ease adoption of Spark 2 in YARN clusters. Same for Python.
>>
>> This is something I could probably help others to address
>>
>

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@spark.apache.org
For additional commands, e-mail: dev-help@spark.apache.org


Re: [discuss] ending support for Java 7 in Spark 2.0

Posted by Romi Kuntsman <ro...@totango.com>.
+1 for Java 8 only

I think it will make it easier to build a unified API for Java and Scala,
instead of the current Java wrappers over the Scala API.
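
For example, with the 1.6-era Dataset API (modulo whatever the signatures
end up being in 2.0), Java 8 already closes most of the gap:

    import org.apache.spark.api.java.function.MapFunction;
    import org.apache.spark.sql.Dataset;
    import org.apache.spark.sql.Encoders;

    // given some Dataset<String> ds:
    // Java 7 style - an anonymous inner class
    Dataset<Integer> lengths7 = ds.map(
        new MapFunction<String, Integer>() {
          @Override
          public Integer call(String s) {
            return s.length();
          }
        }, Encoders.INT());

    // Java 8 style - a lambda, about as concise as the Scala equivalent
    Dataset<Integer> lengths8 = ds.map(s -> s.length(), Encoders.INT());
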
On Mar 24, 2016 11:46 AM, "Stephen Boesch" <ja...@gmail.com> wrote:

> +1 for java8 only, +1 for 2.11+ only. At this point scala libraries
> supporting only 2.10 are typically less active and/or poorly maintained.
> That trend will only continue when considering the lifespan of spark 2.X.
>
> 2016-03-24 11:32 GMT-07:00 Steve Loughran <st...@hortonworks.com>:
>
>>
>> On 24 Mar 2016, at 15:27, Koert Kuipers <ko...@tresata.com> wrote:
>>
>> i think the arguments are convincing, but it also makes me wonder if i
>> live in some kind of alternate universe... we deploy on customers clusters,
>> where the OS, python version, java version and hadoop distro are not chosen
>> by us. so think centos 6, cdh5 or hdp 2.3, java 7 and python 2.6. we simply
>> have access to a single proxy machine and launch through yarn. asking them
>> to upgrade java is pretty much out of the question or a 6+ month ordeal. of
>> the 10 client clusters i can think of on the top of my head all of them are
>> on java 7, none are on java 8. so by doing this you would make spark 2
>> basically unusable for us (unless most of them have plans of upgrading in
>> near term to java 8, i will ask around and report back...).
>>
>>
>>
>> It's not actually mandatory for the process executing in the Yarn cluster
>> to run with the same JVM as the rest of the Hadoop stack; all that is
>> needed is for the environment variables to set up the JAVA_HOME and PATH.
>> Switching JVMs is not something which YARN makes easy, but it may be
>> possible, especially if Spark itself provides some hooks, so you don't have
>> to manually play with setting things up. That may be something which could
>> significantly ease adoption of Spark 2 in YARN clusters. Same for Python.
>>
>> This is something I could probably help others to address
>>
>>
>

Re: [discuss] ending support for Java 7 in Spark 2.0

Posted by Stephen Boesch <ja...@gmail.com>.
+1 for java8 only, +1 for 2.11+ only. At this point scala libraries
supporting only 2.10 are typically less active and/or poorly maintained.
That trend will only continue when considering the lifespan of spark 2.X.

2016-03-24 11:32 GMT-07:00 Steve Loughran <st...@hortonworks.com>:

>
> On 24 Mar 2016, at 15:27, Koert Kuipers <ko...@tresata.com> wrote:
>
> i think the arguments are convincing, but it also makes me wonder if i
> live in some kind of alternate universe... we deploy on customers clusters,
> where the OS, python version, java version and hadoop distro are not chosen
> by us. so think centos 6, cdh5 or hdp 2.3, java 7 and python 2.6. we simply
> have access to a single proxy machine and launch through yarn. asking them
> to upgrade java is pretty much out of the question or a 6+ month ordeal. of
> the 10 client clusters i can think of on the top of my head all of them are
> on java 7, none are on java 8. so by doing this you would make spark 2
> basically unusable for us (unless most of them have plans of upgrading in
> near term to java 8, i will ask around and report back...).
>
>
>
> It's not actually mandatory for the process executing in the Yarn cluster
> to run with the same JVM as the rest of the Hadoop stack; all that is
> needed is for the environment variables to set up the JAVA_HOME and PATH.
> Switching JVMs is not something which YARN makes easy, but it may be
> possible, especially if Spark itself provides some hooks, so you don't have
> to manually play with setting things up. That may be something which could
> significantly ease adoption of Spark 2 in YARN clusters. Same for Python.
>
> This is something I could probably help others to address
>
>

Re: [discuss] ending support for Java 7 in Spark 2.0

Posted by Steve Loughran <st...@hortonworks.com>.
On 24 Mar 2016, at 15:27, Koert Kuipers <ko...@tresata.com>> wrote:

i think the arguments are convincing, but it also makes me wonder if i live in some kind of alternate universe... we deploy on customers clusters, where the OS, python version, java version and hadoop distro are not chosen by us. so think centos 6, cdh5 or hdp 2.3, java 7 and python 2.6. we simply have access to a single proxy machine and launch through yarn. asking them to upgrade java is pretty much out of the question or a 6+ month ordeal. of the 10 client clusters i can think of on the top of my head all of them are on java 7, none are on java 8. so by doing this you would make spark 2 basically unusable for us (unless most of them have plans of upgrading in near term to java 8, i will ask around and report back...).


It's not actually mandatory for the process executing in the Yarn cluster to run with the same JVM as the rest of the Hadoop stack; all that is needed is for the environment variables to set up the JAVA_HOME and PATH. Switching JVMs is not something which YARN makes easy, but it may be possible, especially if Spark itself provides some hooks, so you don't have to manually play with setting things up. That may be something which could significantly ease adoption of Spark 2 in YARN clusters. Same for Python.

This is something I could probably help others to address


Re: [discuss] ending support for Java 7 in Spark 2.0

Posted by Al Pivonka <al...@gmail.com>.
Thank you for the context, Jean...
I appreciate it...


On Thu, Mar 24, 2016 at 12:40 PM, Jean-Baptiste Onofré <jb...@nanthrax.net>
wrote:

> Hi Al,
>
> Spark 2.0 doesn't mean Spark 1.x will stop. Clearly, new features will go
> into Spark 2.0, but maintenance releases can be performed on the 1.x branch.
>
> Regards
> JB
>
> On 03/24/2016 05:38 PM, Al Pivonka wrote:
>
>> As an end user (developer) and Cluster Admin,
>> I would have to agree with Koert.
>>
>> To me the real question is timing. The current version is 1.6.1; the
>> question I have is how many more releases till 2.0, and what is the time
>> frame?
>>
>> If you give people six to twelve months to plan and make sure they know
>> (paste it all over the web site), most can plan ahead.
>>
>>
>> Just my two pennies
>>
>>
>>
>>
>>
>> On Thu, Mar 24, 2016 at 12:25 PM, Sean Owen <sowen@cloudera.com> wrote:
>>
>>     (PS CDH5 runs fine with Java 8, but I understand your more general
>>     point.)
>>
>>     This is a familiar context indeed, but in that context, would a group
>>     not wanting to update to Java 8 want to manually put Spark 2.0 into
>>     the mix? That is, if this is a context where the cluster is
>>     purposefully some stable mix of components, would you be updating just
>>     one?
>>
>>     You make a good point about Scala being more of a library than an
>>     infrastructure component. So it can be updated on a per-app basis. On
>>     the one hand it's harder to handle different Scala versions from the
>>     framework side; on the other, it's less hard on the deployment side.
>>
>>     On Thu, Mar 24, 2016 at 4:27 PM, Koert Kuipers <koert@tresata.com> wrote:
>>      > i think the arguments are convincing, but it also makes me wonder
>>     if i live
>>      > in some kind of alternate universe... we deploy on customers
>>     clusters, where
>>      > the OS, python version, java version and hadoop distro are not
>>     chosen by us.
>>      > so think centos 6, cdh5 or hdp 2.3, java 7 and python 2.6. we
>>     simply have
>>      > access to a single proxy machine and launch through yarn. asking
>>     them to
>>      > upgrade java is pretty much out of the question or a 6+ month
>>     ordeal. of the
>>      > 10 client clusters i can think of on the top of my head all of
>>     them are on
>>      > java 7, none are on java 8. so by doing this you would make spark 2
>>      > basically unusable for us (unless most of them have plans of
>>     upgrading in
>>      > near term to java 8, i will ask around and report back...).
>>      >
>>      > on a side note, its particularly interesting to me that spark 2
>>     chose to
>>      > continue support for scala 2.10, because even for us in our very
>>     constricted
>>      > client environments the scala version is something we can easily
>>     upgrade (we
>>      > just deploy a custom build of spark for the relevant scala
>>     version and
>>      > hadoop distro). and because scala is not a dependency of any
>>     hadoop distro
>>      > (so not on classpath, which i am very happy about) we can use
>>     whatever scala
>>      > version we like. also i found the upgrade path from scala 2.10 to
>>     2.11 to be
>>      > very easy, so i have a hard time understanding why anyone would
>>     stay on
>>      > scala 2.10. and finally with scala 2.12 around the corner you
>>     really dont
>>      > want to be supporting 3 versions. so clearly i am missing
>>     something here.
>>      >
>>      >
>>      >
>>      > On Thu, Mar 24, 2016 at 8:52 AM, Jean-Baptiste Onofré <jb@nanthrax.net>
>>      > wrote:
>>      >>
>>      >> +1 to support Java 8 (and future) *only* in Spark 2.0, and end
>>     support of
>>      >> Java 7. It makes sense.
>>      >>
>>      >> Regards
>>      >> JB
>>      >>
>>      >>
>>      >> On 03/24/2016 08:27 AM, Reynold Xin wrote:
>>      >>>
>>      >>> About a year ago we decided to drop Java 6 support in Spark
>>     1.5. I am
>>      >>> wondering if we should also just drop Java 7 support in Spark
>>     2.0 (i.e.
>>      >>> Spark 2.0 would require Java 8 to run).
>>      >>>
>>      >>> Oracle ended public updates for JDK 7 in one year ago (Apr
>>     2015), and
>>      >>> removed public downloads for JDK 7 in July 2015. In the past I've
>>      >>> actually been against dropping Java 8, but today I ran into an
>>     issue
>>      >>> with the new Dataset API not working well with Java 8 lambdas,
>>     and that
>>      >>> changed my opinion on this.
>>      >>>
>>      >>> I've been thinking more about this issue today and also talked
>>     with a
>>      >>> lot people offline to gather feedback, and I actually think the
>>     pros
>>      >>> outweighs the cons, for the following reasons (in some rough
>>     order of
>>      >>> importance):
>>      >>>
>>      >>> 1. It is complicated to test how well Spark APIs work for Java
>>     lambdas
>>      >>> if we support Java 7. Jenkins machines need to have both Java 7
>>     and Java
>>      >>> 8 installed and we must run through a set of test suites in 7,
>>     and then
>>      >>> the lambda tests in Java 8. This complicates build
>>     environments/scripts,
>>      >>> and makes them less robust. Without good testing
>>     infrastructure, I have
>>      >>> no confidence in building good APIs for Java 8.
>>      >>>
>>      >>> 2. Dataset/DataFrame performance will be between 1x to 10x
>>     slower in
>>      >>> Java 7. The primary APIs we want users to use in Spark 2.x are
>>      >>> Dataset/DataFrame, and this impacts pretty much everything from
>>     machine
>>      >>> learning to structured streaming. We have made great progress
>>     in their
>>      >>> performance through extensive use of code generation. (In many
>>      >>> dimensions Spark 2.0 with DataFrames/Datasets looks more like a
>>     compiler
>>      >>> than a MapReduce or query engine.) These optimizations don't
>>     work well
>>      >>> in Java 7 due to broken code cache flushing. This problem has
>>     been fixed
>>      >>> by Oracle in Java 8. In addition, Java 8 comes with better
>>     support for
>>      >>> Unsafe and SIMD.
>>      >>>
>>      >>> 3. Scala 2.12 will come out soon, and we will want to add
>>     support for
>>      >>> that. Scala 2.12 only works on Java 8. If we do support Java 7,
>>     we'd
>>      >>> have a fairly complicated compatibility matrix and testing
>>      >>> infrastructure.
>>      >>>
>>      >>> 4. There are libraries that I've looked into in the past that
>>     support
>>      >>> only Java 8. This is more common in high performance libraries
>>     such as
>>      >>> Aeron (a messaging library). Having to support Java 7 means we
>>     are not
>>      >>> able to use these. It is not that big of a deal right now, but
>> will
>>      >>> become increasingly more difficult as we optimize performance.
>>      >>>
>>      >>>
>>      >>> The downside of not supporting Java 7 is also obvious. Some
>>      >>> organizations are stuck with Java 7, and they wouldn't be able
>>     to use
>>      >>> Spark 2.0 without upgrading Java.
>>      >>>
>>      >>>
>>      >>
>>      >> --
>>      >> Jean-Baptiste Onofré
>>      >> jbonofre@apache.org
>>      >> http://blog.nanthrax.net
>>      >> Talend - http://www.talend.com
>>      >>
>>      >>
>>      >>
>>     ---------------------------------------------------------------------
>>      >> To unsubscribe, e-mail: dev-unsubscribe@spark.apache.org
>>      >> For additional commands, e-mail: dev-help@spark.apache.org
>>      >>
>>      >
>>
>>     ---------------------------------------------------------------------
>>     To unsubscribe, e-mail: dev-unsubscribe@spark.apache.org
>>     For additional commands, e-mail: dev-help@spark.apache.org
>>
>>
>>
>>
>> --
>> Those who say it can't be done, are usually interrupted by those doing it.
>>
>
> --
> Jean-Baptiste Onofré
> jbonofre@apache.org
> http://blog.nanthrax.net
> Talend - http://www.talend.com
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@spark.apache.org
> For additional commands, e-mail: dev-help@spark.apache.org
>
>


-- 
Those who say it can't be done, are usually interrupted by those doing it.

Re: [discuss] ending support for Java 7 in Spark 2.0

Posted by Jean-Baptiste Onofré <jb...@nanthrax.net>.
Hi Al,

Spark 2.0 doesn't mean Spark 1.x will stop. Clearly, new features will
go into Spark 2.0, but maintenance releases can be performed on the 1.x branch.

Regards
JB

On 03/24/2016 05:38 PM, Al Pivonka wrote:
> As an end user (developer) and Cluster Admin,
> I would have to agree with Koert.
>
> To me the real question is timing. The current version is 1.6.1; the
> question I have is how many more releases till 2.0, and what is the time
> frame?
>
> If you give people six to twelve months to plan and make sure they know
> (paste it all over the web site), most can plan ahead.
>
>
> Just my two pennies
>
>
>
>
>
> On Thu, Mar 24, 2016 at 12:25 PM, Sean Owen <sowen@cloudera.com> wrote:
>
>     (PS CDH5 runs fine with Java 8, but I understand your more general
>     point.)
>
>     This is a familiar context indeed, but in that context, would a group
>     not wanting to update to Java 8 want to manually put Spark 2.0 into
>     the mix? That is, if this is a context where the cluster is
>     purposefully some stable mix of components, would you be updating just
>     one?
>
>     You make a good point about Scala being more of a library than an
>     infrastructure component. So it can be updated on a per-app basis. On
>     the one hand it's harder to handle different Scala versions from the
>     framework side; on the other, it's less hard on the deployment side.
>
>     On Thu, Mar 24, 2016 at 4:27 PM, Koert Kuipers <koert@tresata.com> wrote:
>      > i think the arguments are convincing, but it also makes me wonder
>     if i live
>      > in some kind of alternate universe... we deploy on customers
>     clusters, where
>      > the OS, python version, java version and hadoop distro are not
>     chosen by us.
>      > so think centos 6, cdh5 or hdp 2.3, java 7 and python 2.6. we
>     simply have
>      > access to a single proxy machine and launch through yarn. asking
>     them to
>      > upgrade java is pretty much out of the question or a 6+ month
>     ordeal. of the
>      > 10 client clusters i can think of on the top of my head all of
>     them are on
>      > java 7, none are on java 8. so by doing this you would make spark 2
>      > basically unusable for us (unless most of them have plans of
>     upgrading in
>      > near term to java 8, i will ask around and report back...).
>      >
>      > on a side note, its particularly interesting to me that spark 2
>     chose to
>      > continue support for scala 2.10, because even for us in our very
>     constricted
>      > client environments the scala version is something we can easily
>     upgrade (we
>      > just deploy a custom build of spark for the relevant scala
>     version and
>      > hadoop distro). and because scala is not a dependency of any
>     hadoop distro
>      > (so not on classpath, which i am very happy about) we can use
>     whatever scala
>      > version we like. also i found the upgrade path from scala 2.10 to
>     2.11 to be
>      > very easy, so i have a hard time understanding why anyone would
>     stay on
>      > scala 2.10. and finally with scala 2.12 around the corner you
>     really dont
>      > want to be supporting 3 versions. so clearly i am missing
>     something here.
>      >
>      >
>      >
>      > On Thu, Mar 24, 2016 at 8:52 AM, Jean-Baptiste Onofré <jb@nanthrax.net>
>      > wrote:
>      >>
>      >> +1 to support Java 8 (and future) *only* in Spark 2.0, and end
>     support of
>      >> Java 7. It makes sense.
>      >>
>      >> Regards
>      >> JB
>      >>
>      >>
>      >> On 03/24/2016 08:27 AM, Reynold Xin wrote:
>      >>>
>      >>> About a year ago we decided to drop Java 6 support in Spark
>     1.5. I am
>      >>> wondering if we should also just drop Java 7 support in Spark
>     2.0 (i.e.
>      >>> Spark 2.0 would require Java 8 to run).
>      >>>
>      >>> Oracle ended public updates for JDK 7 in one year ago (Apr
>     2015), and
>      >>> removed public downloads for JDK 7 in July 2015. In the past I've
>      >>> actually been against dropping Java 8, but today I ran into an
>     issue
>      >>> with the new Dataset API not working well with Java 8 lambdas,
>     and that
>      >>> changed my opinion on this.
>      >>>
>      >>> I've been thinking more about this issue today and also talked
>     with a
>      >>> lot people offline to gather feedback, and I actually think the
>     pros
>      >>> outweighs the cons, for the following reasons (in some rough
>     order of
>      >>> importance):
>      >>>
>      >>> 1. It is complicated to test how well Spark APIs work for Java
>     lambdas
>      >>> if we support Java 7. Jenkins machines need to have both Java 7
>     and Java
>      >>> 8 installed and we must run through a set of test suites in 7,
>     and then
>      >>> the lambda tests in Java 8. This complicates build
>     environments/scripts,
>      >>> and makes them less robust. Without good testing
>     infrastructure, I have
>      >>> no confidence in building good APIs for Java 8.
>      >>>
>      >>> 2. Dataset/DataFrame performance will be between 1x to 10x
>     slower in
>      >>> Java 7. The primary APIs we want users to use in Spark 2.x are
>      >>> Dataset/DataFrame, and this impacts pretty much everything from
>     machine
>      >>> learning to structured streaming. We have made great progress
>     in their
>      >>> performance through extensive use of code generation. (In many
>      >>> dimensions Spark 2.0 with DataFrames/Datasets looks more like a
>     compiler
>      >>> than a MapReduce or query engine.) These optimizations don't
>     work well
>      >>> in Java 7 due to broken code cache flushing. This problem has
>     been fixed
>      >>> by Oracle in Java 8. In addition, Java 8 comes with better
>     support for
>      >>> Unsafe and SIMD.
>      >>>
>      >>> 3. Scala 2.12 will come out soon, and we will want to add
>     support for
>      >>> that. Scala 2.12 only works on Java 8. If we do support Java 7,
>     we'd
>      >>> have a fairly complicated compatibility matrix and testing
>      >>> infrastructure.
>      >>>
>      >>> 4. There are libraries that I've looked into in the past that
>     support
>      >>> only Java 8. This is more common in high performance libraries
>     such as
>      >>> Aeron (a messaging library). Having to support Java 7 means we
>     are not
>      >>> able to use these. It is not that big of a deal right now, but will
>      >>> become increasingly more difficult as we optimize performance.
>      >>>
>      >>>
>      >>> The downside of not supporting Java 7 is also obvious. Some
>      >>> organizations are stuck with Java 7, and they wouldn't be able
>     to use
>      >>> Spark 2.0 without upgrading Java.
>      >>>
>      >>>
>      >>
>      >> --
>      >> Jean-Baptiste Onofré
>      >> jbonofre@apache.org
>      >> http://blog.nanthrax.net
>      >> Talend - http://www.talend.com
>      >>
>      >>
>      >>
>     ---------------------------------------------------------------------
>      >> To unsubscribe, e-mail: dev-unsubscribe@spark.apache.org
>      >> For additional commands, e-mail: dev-help@spark.apache.org
>      >>
>      >
>
>     ---------------------------------------------------------------------
>     To unsubscribe, e-mail: dev-unsubscribe@spark.apache.org
>     For additional commands, e-mail: dev-help@spark.apache.org
>
>
>
>
> --
> Those who say it can't be done, are usually interrupted by those doing it.

-- 
Jean-Baptiste Onofré
jbonofre@apache.org
http://blog.nanthrax.net
Talend - http://www.talend.com

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@spark.apache.org
For additional commands, e-mail: dev-help@spark.apache.org


Re: [discuss] ending support for Java 7 in Spark 2.0

Posted by Al Pivonka <al...@gmail.com>.
As an end user (developer) and Cluster Admin,
I would have to agree with Koert.

To me the real question is timing. The current version is 1.6.1; the question
I have is how many more releases till 2.0, and what is the time frame?

If you give people six to twelve months to plan and make sure they know
(paste it all over the web site), most can plan ahead.


Just my two pennies





On Thu, Mar 24, 2016 at 12:25 PM, Sean Owen <so...@cloudera.com> wrote:



-- 
Those who say it can't be done, are usually interrupted by those doing it.

Re: [discuss] ending support for Java 7 in Spark 2.0

Posted by Marcelo Vanzin <va...@cloudera.com>.
On Thu, Mar 24, 2016 at 9:54 AM, Koert Kuipers <ko...@tresata.com> wrote:
> i guess what i am saying is that in a yarn world the only hard restrictions
> left are the containers you run in, which means the hadoop version, java
> version and python version (if you use python).

It is theoretically possible to run containers with a different JDK
than the NM (I've done it for testing), although I'm not sure whether
that's recommended from YARN's perspective.

But I understand your concern is that you're not allowed to modify the
machines where the NMs are hosted. You could hack things and
distribute the JVM with your Spark application, but that would be
incredibly ugly.
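
For what it's worth, a sketch of the cleaner variant of that idea, assuming
a JDK 8 install already exists at the same path on every node (the path,
class name, and jar name below are placeholders, not from this thread):

# only the AM and executor containers pick up the node-local JDK 8;
# the NodeManagers themselves keep running on the cluster's JDK 7
spark-submit \
  --master yarn --deploy-mode cluster \
  --conf spark.yarn.appMasterEnv.JAVA_HOME=/usr/java/jdk1.8.0 \
  --conf spark.executorEnv.JAVA_HOME=/usr/java/jdk1.8.0 \
  --class com.example.App \
  app.jar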

-- 
Marcelo



Re: [discuss] ending support for Java 7 in Spark 2.0

Posted by Koert Kuipers <ko...@tresata.com>.
i think marcelo also pointed this out before. it's very interesting to hear;
i was not aware of that until today. it would mean we would only have to
convince a group/client with a cluster to install jdk8 on the nodes,
without actually transitioning to it, if i understand it correctly. that
would definitely lower the hurdle by a lot.

On Thu, Mar 24, 2016 at 9:36 PM, Mridul Muralidharan <mr...@gmail.com>
wrote:

>
> Container Java version can be different from the yarn Java version: we run
> jobs with jdk8 on a jdk7 cluster without issues.
>
> Regards
> Mridul

Re: [discuss] ending support for Java 7 in Spark 2.0

Posted by Mridul Muralidharan <mr...@gmail.com>.
Container Java version can be different from the yarn Java version: we run
jobs with jdk8 on a jdk7 cluster without issues.

Regards
Mridul

On Thursday, March 24, 2016, Koert Kuipers <ko...@tresata.com> wrote:

> i guess what i am saying is that in a yarn world the only hard
> restrictions left are the containers you run in, which means the hadoop
> version, java version and python version (if you use python).

Re: [discuss] ending support for Java 7 in Spark 2.0

Posted by Koert Kuipers <ko...@tresata.com>.
i guess what i am saying is that in a yarn world the only hard restrictions
left are the containers you run in, which means the hadoop version,
java version and python version (if you use python).


On Thu, Mar 24, 2016 at 12:39 PM, Koert Kuipers <ko...@tresata.com> wrote:


Re: [discuss] ending support for Java 7 in Spark 2.0

Posted by Koert Kuipers <ko...@tresata.com>.
The group will not upgrade to spark 2.0 themselves, but they are mostly
fine with vendors like us deploying our application via yarn with whatever
spark version we choose (and bundle, so they do not install it separately,
they might not even be aware of what spark version we use). This all works
because spark does not need to be on the cluster nodes, just on the one
machine where our application gets launched. Having yarn is pretty awesome
in this respect.
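
For illustration, a sketch of such a launch from the proxy machine (paths,
class name, and jar name are made up for this example): nothing
Spark-specific is installed on the cluster nodes, and YARN distributes what
the containers need.

# the bundled distro on the edge/proxy machine is all that is required
/opt/vendor-app/spark/bin/spark-submit \
  --master yarn --deploy-mode cluster \
  --class com.example.vendor.Main \
  /opt/vendor-app/lib/app-assembly.jar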

On Thu, Mar 24, 2016 at 12:25 PM, Sean Owen <so...@cloudera.com> wrote:


Re: [discuss] ending support for Java 7 in Spark 2.0

Posted by Sean Owen <so...@cloudera.com>.
(PS CDH5 runs fine with Java 8, but I understand your more general point.)

This is a familiar context indeed, but in that context, would a group
not wanting to update to Java 8 want to manually put Spark 2.0 into
the mix? That is, if this is a context where the cluster is
purposefully some stable mix of components, would you be updating just
one?

You make a good point about Scala being more library than
infrastructure component, so it can be updated on a per-app basis.
It's harder to handle different Scala versions from the framework
side, but less hard on the deployment side.

On Thu, Mar 24, 2016 at 4:27 PM, Koert Kuipers <ko...@tresata.com> wrote:


Re: [discuss] ending support for Java 7 in Spark 2.0

Posted by Koert Kuipers <ko...@tresata.com>.
i think the arguments are convincing, but it also makes me wonder if i live
in some kind of alternate universe... we deploy on customers' clusters,
where the OS, python version, java version and hadoop distro are not chosen
by us. so think centos 6, cdh5 or hdp 2.3, java 7 and python 2.6. we simply
have access to a single proxy machine and launch through yarn. asking them
to upgrade java is pretty much out of the question, or a 6+ month ordeal.
of the 10 client clusters i can think of off the top of my head, all of
them are on java 7, none are on java 8. so by doing this you would make
spark 2 basically unusable for us (unless most of them have plans of
upgrading in the near term to java 8; i will ask around and report back...).

on a side note, it's particularly interesting to me that spark 2 chose to
continue support for scala 2.10, because even for us in our very
constricted client environments the scala version is something we can
easily upgrade (we just deploy a custom build of spark for the relevant
scala version and hadoop distro). and because scala is not a dependency of
any hadoop distro (so not on the classpath, which i am very happy about) we
can use whatever scala version we like. also i found the upgrade path from
scala 2.10 to 2.11 to be very easy, so i have a hard time understanding why
anyone would stay on scala 2.10. and finally, with scala 2.12 around the
corner, you really don't want to be supporting 3 versions. so clearly i am
missing something here.
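
to give a rough sketch, the kind of custom build i mean looks like this
(profiles and versions are illustrative and depend on the distro being
matched):

./dev/change-scala-version.sh 2.11
./build/mvn -Pyarn -Phadoop-2.6 -Dhadoop.version=2.6.0 \
  -Dscala-2.11 -DskipTests clean package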



On Thu, Mar 24, 2016 at 8:52 AM, Jean-Baptiste Onofré <jb...@nanthrax.net>
wrote:


Re: [discuss] ending support for Java 7 in Spark 2.0

Posted by Jean-Baptiste Onofré <jb...@nanthrax.net>.
+1 to support Java 8 (and future) *only* in Spark 2.0, and end support 
of Java 7. It makes sense.

Regards
JB

On 03/24/2016 08:27 AM, Reynold Xin wrote:

-- 
Jean-Baptiste Onofré
jbonofre@apache.org
http://blog.nanthrax.net
Talend - http://www.talend.com



Re: [discuss] ending support for Java 7 in Spark 2.0

Posted by Sean Owen <so...@cloudera.com>.
Maybe so; I think we have a ticket open to update to 2.10.6, which
maybe fixes it.

It brings up a different point: supporting multiple Scala versions is
much more painful than supporting multiple Java versions, because of
mutual incompatibility. Right now I get the sense there's an intent to
keep supporting 2.10, and 2.11, and 2.12 later in Spark 2. That seems
like far more trouble. In the same breath: why not remove 2.10 support
anyway? It's also EOL, and 2.11 brought big improvements, etc.

On Thu, Mar 24, 2016 at 9:04 AM, Reynold Xin <rx...@databricks.com> wrote:
> I actually talked quite a bit tonight with an engineer on the scala
> compiler team, and the scala 2.10 + java 8 combo should be ok. The latest
> Scala 2.10 release should have all the important fixes that are needed for
> Java 8.



Re: [discuss] ending support for Java 7 in Spark 2.0

Posted by Marcelo Vanzin <va...@cloudera.com>.
On Thu, Mar 24, 2016 at 10:13 AM, Reynold Xin <rx...@databricks.com> wrote:
> Yes

So is it safe to say the only hard requirement for Java 8 in your list is (4)?

(1) and (3) are infrastructure issues. Yes, it sucks to maintain more
testing infrastructure and potentially more complicated build scripts,
but does that really outweigh maintaining support for Java 7?

A cheap hack would also be to require jdk 1.8 for the build, but still
target java 7. You could then isolate java 8 tests in a separate
module that will get run in all builds because of that requirement.
There are downsides, of course: it's basically the same situation we
were in when we still supported Java 6 but were using jdk 1.7 to build
things. Setting the proper bootclasspath to use jdk 7's rt.jar during
compilation could solve a lot of those. (We already have both JDKs in
jenkins machines as far as I can tell.)
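
As a sketch of that hack (the JDK install paths are assumptions): compile
with JDK 8's javac but emit Java 7 bytecode, and point the bootclasspath at
JDK 7's rt.jar so accidental use of Java 8-only APIs fails at compile time
rather than at runtime on a Java 7 cluster.

# build with the JDK 8 compiler, but target Java 7 and compile against
# JDK 7's core classes
/usr/java/jdk1.8.0/bin/javac \
  -source 1.7 -target 1.7 \
  -bootclasspath /usr/java/jdk1.7.0/jre/lib/rt.jar \
  MyClass.java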

For Scala 2.12, an option might be dropping Java 7 when we decide to
add support for that (unless you're also suggesting Scala 2.12 as part
of 2.0?).

For (2) it seems the jvm used to compile things doesn't really make a
difference. It could be as simple as "we strongly recommend running
Spark 2.0 on Java 8".

Note I'm not for or against the change per se; I'd like to see more
data about what users are really using out there before making that
decision. But there was an explicit desire to maintain java 7
compatibility when we talked about going for Spark 2.0. And with those
kinds of decisions there's always a cost, including spending more
resources on infrastructure and testing.

-- 
Marcelo



Re: [discuss] ending support for Java 7 in Spark 2.0

Posted by Reynold Xin <rx...@databricks.com>.
Yes

On Thursday, March 24, 2016, Marcelo Vanzin <va...@cloudera.com> wrote:

> So, do you actually get the benefits you're looking for without
> compiling explicitly to the 1.8 jvm?

Re: [discuss] ending support for Java 7 in Spark 2.0

Posted by Marcelo Vanzin <va...@cloudera.com>.
On Thu, Mar 24, 2016 at 1:04 AM, Reynold Xin <rx...@databricks.com> wrote:
> I actually talked quite a bit tonight with an engineer on the scala
> compiler team, and the scala 2.10 + java 8 combo should be ok. The latest
> Scala 2.10 release should have all the important fixes that are needed for
> Java 8.

So, do you actually get the benefits you're looking for without
compiling explicitly to the 1.8 jvm? Because:

$ scala -version
Scala code runner version 2.10.6 -- Copyright 2002-2013, LAMP/EPFL
$ scalac -target jvm-1.8
scalac error: Usage: -target:<target>
 where <target> choices are jvm-1.5, jvm-1.5-fjbg, jvm-1.5-asm,
jvm-1.6, jvm-1.7, msil (default: jvm-1.6)

So even if you use jdk 8 to compile with scala 2.10, you can't target
jvm 1.8 as far as I can tell.

-- 
Marcelo



Re: [discuss] ending support for Java 7 in Spark 2.0

Posted by Reynold Xin <rx...@databricks.com>.
I actually talked quite a bit tonight with an engineer on the scala
compiler team, and the scala 2.10 + java 8 combo should be ok. The latest
Scala 2.10 release should have all the important fixes that are needed for
Java 8.

On Thu, Mar 24, 2016 at 1:01 AM, Sean Owen <so...@cloudera.com> wrote:


Re: [discuss] ending support for Java 7 in Spark 2.0

Posted by Sean Owen <so...@cloudera.com>.
I generally favor this for the simplification. I didn't realize there
were actually some performance wins and important bug fixes.

I've had lots of trouble with scalac 2.10 + Java 8. I don't know if
it's still a problem since 2.11 + 8 seems OK, but for a long time the
sql/ modules would never compile in this config. If it's actually
required for 2.12, makes sense.

As ever my general stance is that nobody has to make a major-version
upgrade; Spark 1.6 does not stop working for those that need Java 7. I
also think it's reasonable for anyone to expect that major-version
upgrades require major-version dependency updates. Also remember that
not removing Java 7 support means committing to it here for a couple
more years. It's not just about the situation on release day.

On Thu, Mar 24, 2016 at 8:27 AM, Reynold Xin <rx...@databricks.com> wrote:
