Posted to dev@spark.apache.org by Sean Owen <so...@cloudera.com> on 2016/10/25 15:36:45 UTC

Straw poll: dropping support for things like Scala 2.10

I'd like to gauge where people stand on the issue of dropping support for a
few things that were considered for 2.0.

First: Scala 2.10. We've seen a number of build breakages this week because
the PR builder only tests 2.11. No big deal at this stage, but it did
cause me to wonder whether it's time to plan to drop 2.10 support,
especially with 2.12 coming soon.

Next, Java 7. It's reasonably old and out of public updates at this stage.
It's not that painful to keep supporting, to be honest, but dropping it
would simplify some bits of code, some scripts, some testing.

Hadoop versions: I think the general argument is that most anyone would
be using at least 2.6, and it would simplify some code that has to use
reflection to call not-even-that-new APIs. It would remove some moderate
complexity in the build.
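
To make the reflection point concrete, here is a minimal sketch of the pattern, under stated assumptions: the helper getPasswordCompat is illustrative, not Spark's actual code, while Configuration.getPassword is a real Hadoop API added in 2.6.

```scala
import org.apache.hadoop.conf.Configuration

// Illustrative helper: call Configuration.getPassword reflectively so the same
// code compiles and runs against Hadoop versions that predate the method.
def getPasswordCompat(conf: Configuration, key: String): Option[String] =
  try {
    val m = classOf[Configuration].getMethod("getPassword", classOf[String])
    Option(m.invoke(conf, key).asInstanceOf[Array[Char]]).map(new String(_))
  } catch {
    case _: NoSuchMethodException =>
      Option(conf.get(key)) // pre-2.6 fallback: plain config lookup
  }
```

Once the minimum is Hadoop 2.6, the whole helper collapses to a direct conf.getPassword(key) call.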


"When" is a tricky question. Although it's a little aggressive for minor
releases, I think these will all happen before 3.x regardless. 2.1.0 is not
out of the question, though coming soon. What about ... 2.2.0?


Although I tend to favor dropping support, I'm mostly asking for current
opinions.

Re: Straw poll: dropping support for things like Scala 2.10

Posted by Koert Kuipers <ko...@tresata.com>.
it will take time before all libraries that spark depends on are available
for scala 2.12, so we are not talking spark 2.1.x and probably also not
2.2.x for scala 2.12

it technically makes sense to drop java 7 and scala 2.10 around the same
time as scala 2.12 is introduced

we are still heavily dependent on java 7 (and python 2.6, if we used python,
but we don't). i am surprised to see new clusters installed in the last few
months (latest CDH and HDP versions) still running on java 7. even
getting java 8 installed on these clusters so we can use it in yarn is
often not an option. it beats me why this is still happening.

we do not use scala 2.10 at all anymore.


Re: Straw poll: dropping support for things like Scala 2.10

Posted by Daniel Siegmann <ds...@securityscorecard.io>.
After support is dropped for Java 7, can we have encoders for java.time
classes (e.g. LocalDate)? If so, then please drop support for Java 7 ASAP.
:-)
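
For reference, a sketch of the workaround available in the meantime, assuming Spark 2.x and a local session: wrap LocalDate in a Kryo-backed encoder. The value is stored as an opaque binary column, so Catalyst can't prune, push down, or display it as a date.

```scala
import java.time.LocalDate
import org.apache.spark.sql.{Encoder, Encoders, SparkSession}

val spark = SparkSession.builder.master("local[*]").appName("localdate-demo").getOrCreate()

// Kryo-backed encoder: serializes each LocalDate to a binary blob.
implicit val localDateEncoder: Encoder[LocalDate] = Encoders.kryo[LocalDate]

val ds = spark.createDataset(Seq(LocalDate.of(2016, 10, 25)))
ds.show() // a single binary "value" column, not a proper date column
```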

Re: Straw poll: dropping support for things like Scala 2.10

Posted by Ofir Manor <of...@equalum.io>.
I think that 2.1 should include a visible deprecation message about Java 7,
Scala 2.10 and older Hadoop versions (plus python if there is a consensus
on that), to give users / admins early warning, followed by dropping them
from trunk for 2.2 once 2.1 is released.
Personally, we use only Scala 2.11 on JDK8.
Cody - Scala 2.12 will likely be released before Spark 2.1, maybe even
later this week: http://scala-lang.org/news/2.12.0-RC2

Ofir Manor

Co-Founder & CTO | Equalum

Mobile: +972-54-7801286 | Email: ofir.manor@equalum.io


Re: Straw poll: dropping support for things like Scala 2.10

Posted by Cody Koeninger <co...@koeninger.org>.
I think supporting only 1 version of scala at any given time is not
sufficient, 2 probably is ok.

I.e. don't drop 2.10 before 2.12 is out + supported



Re: Straw poll: dropping support for things like Scala 2.10

Posted by Sean Owen <so...@cloudera.com>.
The general forces are that new versions of things to support emerge, and
are valuable to support, but have some cost to support in addition to old
versions. And the old versions become less used and therefore less valuable
to support, and at some point it tips to being more cost than value. It's
hard to judge these costs and benefits.

Scala is perhaps the trickiest one because of the general mutual
incompatibilities across minor versions. The cost of supporting multiple
versions is high, and a third version is about to arrive. That's probably
the most pressing question. It's actually biting with some regularity now,
with compile errors on 2.10.

(Python I confess I don't have an informed opinion about.)

Java, Hadoop are not as urgent because they're more backwards-compatible.
Anecdotally, I'd be surprised if anyone today would "upgrade" to Java 7 or
an old Hadoop version. And I think that's really the question. Even if one
decided to drop support for all this in 2.1.0, it would not mean people
can't use Spark with these things. It merely means they can't necessarily
use Spark 2.1.x. This is why we have maintenance branches for 1.6.x, 2.0.x.

Tying Scala 2.11/12 support to Java 8 might make sense.

In fact, I think that's part of the reason that an update in master,
perhaps 2.1.x, could be overdue, because it actually is just the beginning
of the end of the support burden. If you want to stop dealing with these in
~6 months they need to stop being supported in minor branches by right
about now.





Re: Straw poll: dropping support for things like Scala 2.10

Posted by Mark Hamstra <ma...@clearstorydata.com>.
What's changed since the last time we discussed these issues, about 7
months ago?  Or, another way to formulate the question: What are the
threshold criteria that we should use to decide when to end Scala 2.10
and/or Java 7 support?


Re: Straw poll: dropping support for things like Scala 2.10

Posted by Sean Owen <so...@cloudera.com>.
Let's track further discussion at
https://issues.apache.org/jira/browse/SPARK-19810

I am also in favor of removing Scala 2.10 support, and will open a WIP to
discuss the change, but am not yet sure whether there are objections or
deeper support for this.


Re: Straw poll: dropping support for things like Scala 2.10

Posted by Russell Spitzer <ru...@gmail.com>.
+1 on removing 2.10



Re: Straw poll: dropping support for things like Scala 2.10

Posted by Koert Kuipers <ko...@tresata.com>.
given the issues with scala 2.10 and java 8 i am in favor of dropping scala
2.10 in next release


Re: Straw poll: dropping support for things like Scala 2.10

Posted by Sean Owen <so...@cloudera.com>.
I want to bring up the issue of Scala 2.10 support again, to see how people
feel about it. Key opinions from the previous responses, I think:

Cody: only drop 2.10 support when 2.12 support is added
Koert: we need all dependencies to support 2.12; Scala updates are pretty
transparent to IT/ops
Ofir: make sure to deprecate 2.10 in Spark 2.1
Reynold: let’s maybe remove support for Scala 2.10 and Java 7 in Spark 2.2
Matei: let’s not remove things unless they’re burdensome for the project;
some people are still on old environments that their IT can’t easily update

Scala 2.10 support was deprecated in 2.1, and we did remove Java 7 support
for 2.2. https://issues.apache.org/jira/browse/SPARK-14220 tracks the work
to support 2.12, and there is progress, especially in dependencies
supporting 2.12.

It looks like 2.12 support may even entail a breaking change as documented
in https://issues.apache.org/jira/browse/SPARK-14643 and will mean dropping
Kafka 0.8, for example. In any event it’s going to take some surgery and a
few hacks to make one code base work across 2.11 and 2.12. I don’t see this
happening for Spark 2.2.0 because there are just a few weeks left.
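
As a hedged illustration of the kind of ambiguity at play (the names below are simplified stand-ins, not Spark's actual signatures): Scala 2.12 applies SAM conversion during overload resolution, so a lambda that resolved cleanly on 2.11 can suddenly match two overloads.

```scala
// Simplified stand-in for a Java-friendly functional interface.
trait MapFunction[T, U] { def call(t: T): U }

class DatasetLike[T] {
  def map[U](f: T => U): DatasetLike[U] = ???            // Scala-facing overload
  def map[U](f: MapFunction[T, U]): DatasetLike[U] = ??? // Java-facing overload
}

// On 2.11 the lambda matches only the Function1 overload; on 2.12, SAM
// conversion makes both overloads applicable and compilation fails as ambiguous:
//   new DatasetLike[Int].map(x => x + 1)
```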

Supporting three versions at once is probably infeasible, so dropping 2.10
should precede 2.12 support. Right now, I would like to make progress
towards changes that 2.12 will require but that 2.11/2.10 can support. For
example, we have to update scalatest, breeze, chill, etc. and can do that
before 2.12 is enabled. However, I'm finding those changes tricky, and in
one case maybe impossible, while 2.10 is still supported.

For 2.2.0, I’m wondering if it makes sense to go ahead and drop 2.10
support, and even get in additional prep work for 2.12, into the 2.2.0
release. The move to support 2.12 in 2.3.0 would then be a smaller change.
It isn’t strictly necessary. We could delay all of that until after 2.2.0
and get it all done between 2.2.0 and 2.3.0. But I wonder if 2.10 is legacy
enough at this stage to drop for Spark 2.2.0?

I don’t feel strongly about it but there are some reasonable arguments for
dropping it:

- 2.10 doesn’t technically support Java 8, though we do have it working
still even after requiring Java 8
- Safe to say virtually all common _2.10 libraries have a _2.11 counterpart
at this point?
- 2.10.x was “EOL” in September 2015 with the final 2.10.6 release
- For a vendor viewpoint: CDH only supports Scala 2.11 with Spark 2.x

Before I open a JIRA, just soliciting opinions.



Re: Straw poll: dropping support for things like Scala 2.10

Posted by Matei Zaharia <ma...@gmail.com>.
BTW maybe one key point that isn't obvious is that with YARN and Mesos, the version of Spark used can be solely up to the developer who writes an app, not the cluster administrator. So even in very conservative orgs, developers can download a new version of Spark, run it, and demonstrate value, which is good both for them and for the project. On the other hand, if they were stuck with, say, Spark 1.3, they'd have a much worse experience and perhaps get a worse impression of the project.

Matei

> On Oct 28, 2016, at 9:58 AM, Matei Zaharia <ma...@gmail.com> wrote:
> 
> Deprecating them is fine (and I know they're already deprecated); the question is just whether to remove them. For example, what exactly is the downside of having Python 2.6 or Java 7 right now? If it's high, then we can remove them, but I just haven't seen a ton of details. It also sounded like fairly recent versions of CDH, HDP, RHEL, etc. still have old versions of these.
> 
> Just talking with users, I've seen many people who say "we have a Hadoop cluster from $VENDOR, but we just download Spark from Apache and run newer versions of that". That's great for Spark IMO, and we need to stay compatible even with somewhat older Hadoop installs because they are time-consuming to update. Having the whole community on a small set of versions leads to a better experience for everyone and also to more of a "network effect": more people can battle-test new versions, answer questions about them online, write libraries that easily reach the majority of Spark users, etc.
> 
> Matei
> 
>> On Oct 27, 2016, at 11:51 PM, Ofir Manor <ofir.manor@equalum.io> wrote:
>> 
>> I totally agree with Sean, just a small correction:
>> Java 7 and Python 2.6 have already been deprecated since Spark 2.0 (after a lengthy discussion), so there is no need to discuss whether they should become deprecated in 2.1:
>>   http://spark.apache.org/releases/spark-release-2-0-0.html#deprecations
>> The discussion is whether Scala 2.10 should also be marked as deprecated (no one is objecting to that), and more importantly, when to actually move from deprecation to dropping support for any combination of JDK / Scala / Hadoop / Python.
>> 
>> Ofir Manor
>> 
>> Co-Founder & CTO | Equalum
>> 
>> 
>> Mobile: +972-54-7801286 | Email: ofir.manor@equalum.io
>> On Fri, Oct 28, 2016 at 12:13 AM, Sean Owen <sowen@cloudera.com> wrote:
>> The burden may be a little more apparent when dealing with the day to day merging and fixing of breaks. The upside is maybe the more compelling argument though. For example, lambda-fying all the Java code, supporting java.time, and taking advantage of some newer Hadoop/YARN APIs is a moderate win for users too, and there's also a cost to not doing that.
>> 
>> I must say I don't see a risk of fragmentation as nearly the problem it's made out to be here. We are, after all, here discussing _beginning_ to remove support _in 6 months_, for long since non-current versions of things. An org's decision to not, say, use Java 8 is a decision to not use the new version of lots of things. It's not clear this is a constituency that is either large or one to reasonably serve indefinitely.
>> 
>> In the end, the Scala issue may be decisive. Supporting 2.10 - 2.12 simultaneously is a bridge too far, and if 2.12 requires Java 8, that's a good reason for Spark to require Java 8. And Steve suggests that means a minimum of Hadoop 2.6 too. (I still profess ignorance of the Python part of the issue.)
>> 
>> Put another way, I am not sure what the criterion is, if not the above?
>> 
>> I support deprecating all of these things, at the least, in 2.1.0. Although it's a separate question, I believe it's going to be necessary to remove support in ~6 months, in 2.2.0.
>> 
>> 
>> On Thu, Oct 27, 2016 at 4:36 PM Matei Zaharia <matei.zaharia@gmail.com> wrote:
>> Just to comment on this, I'm generally against removing these types of things unless they create a substantial burden on project contributors. It doesn't sound like Python 2.6 and Java 7 do that yet -- Scala 2.10 might, but then of course we need to wait for 2.12 to be out and stable.
>> 
>> In general, this type of stuff only hurts users, and doesn't have a huge impact on Spark contributors' productivity (sure, it's a bit unpleasant, but that's life). If we break compatibility this way too quickly, we fragment the user community, and then either people have a crappy experience with Spark because their corporate IT doesn't yet have an environment that can run the latest version, or worse, they create more maintenance burden for us because they ask for more patches to be backported to old Spark versions (1.6.x, 2.0.x, etc). Python in particular is pretty fundamental to many Linux distros.
>> 
>> In the future, rather than just looking at when some software came out, it may be good to have some criteria for when to drop support for something. For example, if there are really nice libraries in Python 2.7 or Java 8 that we're missing out on, that may be a good reason. The maintenance burden for multiple Scala versions is definitely painful but I also think we should always support the latest two Scala releases.
>> 
>> Matei
>> 
>>> On Oct 27, 2016, at 12:15 PM, Reynold Xin <rxin@databricks.com> wrote:
>>> 
>>> I created a JIRA ticket to track this: https://issues.apache.org/jira/browse/SPARK-18138
>>> 
>>> 
>>> 
>>> On Thu, Oct 27, 2016 at 10:19 AM, Steve Loughran <stevel@hortonworks.com> wrote:
>>> 
>>>> On 27 Oct 2016, at 10:03, Sean Owen <sowen@cloudera.com> wrote:
>>>> 
>>>> Seems OK by me.
>>>> How about Hadoop < 2.6, Python 2.6? Those seem more removable. I'd like to add that to a list of things that will begin to be unsupported 6 months from now.
>>>> 
>>> 
>>> If you go to java 8 only, then hadoop 2.6+ is mandatory. 
>>> 
>>> 
>>>> On Wed, Oct 26, 2016 at 8:49 PM Koert Kuipers <koert@tresata.com> wrote:
>>>> that sounds good to me
>>>> 
>>>> On Wed, Oct 26, 2016 at 2:26 PM, Reynold Xin <rxin@databricks.com> wrote:
>>>> We can do the following concrete proposal:
>>>> 
>>>> 1. Plan to remove support for Java 7 / Scala 2.10 in Spark 2.2.0 (Mar/Apr 2017).
>>>> 
>>>> 2. In Spark 2.1.0 release, aggressively and explicitly announce the deprecation of Java 7 / Scala 2.10 support.
>>>> 
>>>> (a) It should appear in release notes and in documentation that mentions how to build Spark
>>>> 
>>>> (b) and a warning should be shown every time SparkContext is started using Scala 2.10 or Java 7.
>>>> 
>>> 
>>> 
>> 
>> 
> 
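
A rough sketch of the startup check in Reynold's point (b) above, under the assumption that the warning is logged when the driver starts; the object and message text here are illustrative, not Spark's actual implementation.

```scala
import org.slf4j.LoggerFactory

// Hypothetical helper: warn at startup when running on Java 7 or Scala 2.10.
object DeprecationCheck {
  private val log = LoggerFactory.getLogger(getClass)

  def warnOnDeprecatedVersions(): Unit = {
    if (System.getProperty("java.version").startsWith("1.7")) {
      log.warn("Java 7 support is deprecated as of Spark 2.1.0 and may be removed in Spark 2.2.0.")
    }
    if (scala.util.Properties.versionNumberString.startsWith("2.10")) {
      log.warn("Scala 2.10 support is deprecated as of Spark 2.1.0 and may be removed in Spark 2.2.0.")
    }
  }
}
```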


Re: Straw poll: dropping support for things like Scala 2.10

Posted by Koert Kuipers <ko...@tresata.com>.
that's correct in my experience: we have found a scala update to be
straightforward and basically invisible to ops, but a java upgrade is a
pain because it is managed and "certified" by ops.


Re: Straw poll: dropping support for things like Scala 2.10

Posted by Steve Loughran <st...@hortonworks.com>.
Twitter just led the release of Hadoop 2.6.5 precisely because they wanted to keep a Java 6 cluster up: the bigger your cluster, the less of a rush to upgrade.

HDP? I believe we install & prefer (openjdk) Java 8, but the Hadoop branch-2 line is intended to build/run on Java 7 too. There's always a conflict between developers' "shiny new features" and ops' "keep the cluster alive". That's actually where Scala has an edge: no need to upgrade the cluster-wide JVM just for an update, or play games configuring your deployed application to use a different JVM from the Hadoop services (which you can do, after all: it's just path setup). Thinking about it, knowing what can be done there, including documenting it in the spark docs, could be a good migration strategy.

me? I look forward to when we can use Java 9 to isolate transitive dependencies; the bane of everyone's life. Someone needs to start on preparing everything for that to work though.
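
For reference, a sketch of the "path setup" Steve describes, assuming a Java 8 JDK unpacked at the same path on every node. spark.yarn.appMasterEnv.* and spark.executorEnv.* are real Spark settings; the JDK path itself is an assumption.

```scala
import org.apache.spark.SparkConf

// Point a YARN-deployed app at a JVM other than the cluster default.
// The master and deploy mode would typically come from spark-submit.
val conf = new SparkConf()
  .setAppName("java8-app-on-java7-cluster")
  .set("spark.yarn.appMasterEnv.JAVA_HOME", "/opt/jdk1.8.0") // application master JVM
  .set("spark.executorEnv.JAVA_HOME", "/opt/jdk1.8.0")       // executor JVM
```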



Re: Straw poll: dropping support for things like Scala 2.10

Posted by Chris Fregly <ch...@fregly.com>.
i seem to remember a large spark user (tencent, i believe) chiming in late during these discussions 6-12 months ago and squashing any sort of deprecation given the massive effort that would be required to upgrade their environment.

i just want to make sure these convos take into consideration large spark users - and reflect the real world versus ideal world.

otherwise, this is all for naught like last time.


Re: Straw poll: dropping support for things like Scala 2.10

Posted by Sean Owen <so...@cloudera.com>.
If the subtext is vendors, then I'd have a look at what recent distros look
like. I'll write about CDH as a representative example, but I think other
distros are naturally similar.

CDH has been on Java 8, Hadoop 2.6, Python 2.7 for almost two years (CDH
5.3 / Dec 2014). Granted, this depends on installing on an OS with that
Java / Python version. But Java 8 / Python 2.7 is available for all of the
supported OSes. The population that isn't on CDH 4, support for which was
dropped in Spark a long time ago, but who is on a version released
2-2.5 years ago and won't update, is a couple percent of the installed
base. They do not in general want anything to change at all.

I assure everyone that vendors too are aligned in wanting to cater to the
crowd that wants the most recent version of everything. For example, CDH
offers both Spark 2.0.1 and 1.6 at the same time.

I wouldn't dismiss support for these supporting components as a relevant
proxy for whether they are worth supporting in Spark. Java 7 is long since
EOL (no, I don't count paying Oracle for support). No vendor is supporting
Hadoop < 2.6. Scala 2.10 was EOL at the end of 2014. Is there a criterion
here that reaches a different conclusion about these things just for Spark?
This was roughly the same conversation that happened 6 months ago.

I imagine we're going to find that in about 6 months it'll make more sense
all around to remove these. If we can just give a heads up with deprecation
and then kick the can down the road a bit more, that sounds like enough for
now.

On Fri, Oct 28, 2016 at 8:58 AM Matei Zaharia <ma...@gmail.com>
wrote:

> Deprecating them is fine (and I know they're already deprecated), the
> question is just whether to remove them. For example, what exactly is the
> downside of having Python 2.6 or Java 7 right now? If it's high, then we
> can remove them, but I just haven't seen a ton of details. It also sounded
> like fairly recent versions of CDH, HDP, RHEL, etc still have old versions
> of these.
>
> Just talking with users, I've seen many people who say "we have a
> Hadoop cluster from $VENDOR, but we just download Spark from Apache and run
> newer versions of that". That's great for Spark IMO, and we need to stay
> compatible even with somewhat older Hadoop installs because they are
> time-consuming to update. Having the whole community on a small set of
> versions leads to a better experience for everyone and also to more of a
> "network effect": more people can battle-test new versions, answer
> questions about them online, write libraries that easily reach the majority
> of Spark users, etc.
>

Re: Straw poll: dropping support for things like Scala 2.10

Posted by Matei Zaharia <ma...@gmail.com>.
Deprecating them is fine (and I know they're already deprecated), the question is just whether to remove them. For example, what exactly is the downside of having Python 2.6 or Java 7 right now? If it's high, then we can remove them, but I just haven't seen a ton of details. It also sounded like fairly recent versions of CDH, HDP, RHEL, etc still have old versions of these.

Just talking with users, I've seen many people who say "we have a Hadoop cluster from $VENDOR, but we just download Spark from Apache and run newer versions of that". That's great for Spark IMO, and we need to stay compatible even with somewhat older Hadoop installs because they are time-consuming to update. Having the whole community on a small set of versions leads to a better experience for everyone and also to more of a "network effect": more people can battle-test new versions, answer questions about them online, write libraries that easily reach the majority of Spark users, etc.

Matei

> On Oct 27, 2016, at 11:51 PM, Ofir Manor <of...@equalum.io> wrote:
> 
> I totally agree with Sean, just a small correction:
> Java 7 and Python 2.6 are already deprecated since Spark 2.0 (after a lengthy discussion), so there is no need to discuss whether they should become deprecated in 2.1
>   http://spark.apache.org/releases/spark-release-2-0-0.html#deprecations
> The discussion is whether Scala 2.10 should also be marked as deprecated (no one is objecting to that), and more importantly, when to move from deprecation to actually dropping support for any combination of JDK / Scala / Hadoop / Python.
> 
> Ofir Manor
> 
> Co-Founder & CTO | Equalum
> 
> 
> Mobile: +972-54-7801286 | Email: ofir.manor@equalum.io
> On Fri, Oct 28, 2016 at 12:13 AM, Sean Owen <sowen@cloudera.com> wrote:
> The burden may be a little more apparent when dealing with the day to day merging and fixing of breaks. The upside is maybe the more compelling argument though. For example, lambda-fying all the Java code, supporting java.time, and taking advantage of some newer Hadoop/YARN APIs is a moderate win for users too, and there's also a cost to not doing that.
> 
> I must say I don't see a risk of fragmentation as nearly the problem it's made out to be here. We are, after all, here discussing _beginning_ to remove support _in 6 months_, for long since non-current versions of things. An org's decision to not, say, use Java 8 is a decision to not use the new version of lots of things. It's not clear this is a constituency that is either large or one to reasonably serve indefinitely.
> 
> In the end, the Scala issue may be decisive. Supporting 2.10 - 2.12 simultaneously is a bridge too far, and if 2.12 requires Java 8, it's a good reason for Spark to require Java 8. And Steve suggests that means a minimum of Hadoop 2.6 too. (I still profess ignorance of the Python part of the issue.)
> 
> Put another way, I am not sure what the criterion is, if not the above?
> 
> I support deprecating all of these things, at the least, in 2.1.0. Although it's a separate question, I believe it's going to be necessary to remove support in ~6 months in 2.2.0.
> 
> 
> On Thu, Oct 27, 2016 at 4:36 PM Matei Zaharia <matei.zaharia@gmail.com> wrote:
> Just to comment on this, I'm generally against removing these types of things unless they create a substantial burden on project contributors. It doesn't sound like Python 2.6 and Java 7 do that yet -- Scala 2.10 might, but then of course we need to wait for 2.12 to be out and stable.
> 
> In general, this type of stuff only hurts users, and doesn't have a huge impact on Spark contributors' productivity (sure, it's a bit unpleasant, but that's life). If we break compatibility this way too quickly, we fragment the user community, and then either people have a crappy experience with Spark because their corporate IT doesn't yet have an environment that can run the latest version, or worse, they create more maintenance burden for us because they ask for more patches to be backported to old Spark versions (1.6.x, 2.0.x, etc). Python in particular is pretty fundamental to many Linux distros.
> 
> In the future, rather than just looking at when some software came out, it may be good to have some criteria for when to drop support for something. For example, if there are really nice libraries in Python 2.7 or Java 8 that we're missing out on, that may be a good reason. The maintenance burden for multiple Scala versions is definitely painful but I also think we should always support the latest two Scala releases.
> 
> Matei
> 
>> On Oct 27, 2016, at 12:15 PM, Reynold Xin <rxin@databricks.com> wrote:
>> 
>> I created a JIRA ticket to track this: https://issues.apache.org/jira/browse/SPARK-18138
>> 
>> 
>> 
>> On Thu, Oct 27, 2016 at 10:19 AM, Steve Loughran <stevel@hortonworks.com> wrote:
>> 
>>> On 27 Oct 2016, at 10:03, Sean Owen <sowen@cloudera.com> wrote:
>>> 
>>> Seems OK by me.
>>> How about Hadoop < 2.6, Python 2.6? Those seem more removable. I'd like to add that to a list of things that will begin to be unsupported 6 months from now.
>>> 
>> 
>> If you go to java 8 only, then hadoop 2.6+ is mandatory. 
>> 
>> 
>>> On Wed, Oct 26, 2016 at 8:49 PM Koert Kuipers <koert@tresata.com> wrote:
>>> that sounds good to me
>>> 
>>> On Wed, Oct 26, 2016 at 2:26 PM, Reynold Xin <rxin@databricks.com> wrote:
>>> We can do the following concrete proposal:
>>> 
>>> 1. Plan to remove support for Java 7 / Scala 2.10 in Spark 2.2.0 (Mar/Apr 2017).
>>> 
>>> 2. In Spark 2.1.0 release, aggressively and explicitly announce the deprecation of Java 7 / Scala 2.10 support.
>>> 
>>> (a) It should appear in release notes and in documentation that mentions how to build Spark
>>> 
>>> (b) and a warning should be shown every time SparkContext is started using Scala 2.10 or Java 7.
>>> 
>> 
>> 
> 
> 


Re: Straw poll: dropping support for things like Scala 2.10

Posted by Ofir Manor <of...@equalum.io>.
I totally agree with Sean, just a small correction:
Java 7 and Python 2.6 are already deprecated since Spark 2.0 (after a
lengthy discussion), so there is no need to discuss whether they should
become deprecated in 2.1
  http://spark.apache.org/releases/spark-release-2-0-0.html#deprecations
The discussion is whether Scala 2.10 should also be marked as deprecated
(no one is objecting to that), and more importantly, when to move from
deprecation to actually dropping support for any combination of JDK /
Scala / Hadoop / Python.

Ofir Manor

Co-Founder & CTO | Equalum

Mobile: +972-54-7801286 | Email: ofir.manor@equalum.io

On Fri, Oct 28, 2016 at 12:13 AM, Sean Owen <so...@cloudera.com> wrote:

> The burden may be a little more apparent when dealing with the day to day
> merging and fixing of breaks. The upside is maybe the more compelling
> argument though. For example, lambda-fying all the Java code, supporting
> java.time, and taking advantage of some newer Hadoop/YARN APIs is a
> moderate win for users too, and there's also a cost to not doing that.
>
> I must say I don't see a risk of fragmentation as nearly the problem it's
> made out to be here. We are, after all, here discussing _beginning_ to
> remove support _in 6 months_, for long since non-current versions of
> things. An org's decision to not, say, use Java 8 is a decision to not use
> the new version of lots of things. It's not clear this is a constituency
> that is either large or one to reasonably serve indefinitely.
>
> In the end, the Scala issue may be decisive. Supporting 2.10 - 2.12
> simultaneously is a bridge too far, and if 2.12 requires Java 8, it's a
> good reason for Spark to require Java 8. And Steve suggests that means a
> minimum of Hadoop 2.6 too. (I still profess ignorance of the Python part of
> the issue.)
>
> Put another way, I am not sure what the criterion is, if not the above?
>
> I support deprecating all of these things, at the least, in 2.1.0.
> Although it's a separate question, I believe it's going to be necessary to
> remove support in ~6 months in 2.2.0.
>
>
> On Thu, Oct 27, 2016 at 4:36 PM Matei Zaharia <ma...@gmail.com>
> wrote:
>
>> Just to comment on this, I'm generally against removing these types of
>> things unless they create a substantial burden on project contributors. It
>> doesn't sound like Python 2.6 and Java 7 do that yet -- Scala 2.10 might,
>> but then of course we need to wait for 2.12 to be out and stable.
>>
>> In general, this type of stuff only hurts users, and doesn't have a huge
>> impact on Spark contributors' productivity (sure, it's a bit unpleasant,
>> but that's life). If we break compatibility this way too quickly, we
>> fragment the user community, and then either people have a crappy
>> experience with Spark because their corporate IT doesn't yet have an
>> environment that can run the latest version, or worse, they create more
>> maintenance burden for us because they ask for more patches to be
>> backported to old Spark versions (1.6.x, 2.0.x, etc). Python in particular
>> is pretty fundamental to many Linux distros.
>>
>> In the future, rather than just looking at when some software came out,
>> it may be good to have some criteria for when to drop support for
>> something. For example, if there are really nice libraries in Python 2.7 or
>> Java 8 that we're missing out on, that may be a good reason. The
>> maintenance burden for multiple Scala versions is definitely painful but I
>> also think we should always support the latest two Scala releases.
>>
>> Matei
>>
>> On Oct 27, 2016, at 12:15 PM, Reynold Xin <rx...@databricks.com> wrote:
>>
>> I created a JIRA ticket to track this: https://issues.apache.org/jira/browse/SPARK-18138
>>
>>
>>
>> On Thu, Oct 27, 2016 at 10:19 AM, Steve Loughran <st...@hortonworks.com>
>> wrote:
>>
>>
>> On 27 Oct 2016, at 10:03, Sean Owen <so...@cloudera.com> wrote:
>>
>> Seems OK by me.
>> How about Hadoop < 2.6, Python 2.6? Those seem more removable. I'd like
>> to add that to a list of things that will begin to be unsupported 6 months
>> from now.
>>
>>
>> If you go to java 8 only, then hadoop 2.6+ is mandatory.
>>
>>
>> On Wed, Oct 26, 2016 at 8:49 PM Koert Kuipers <ko...@tresata.com> wrote:
>>
>> that sounds good to me
>>
>> On Wed, Oct 26, 2016 at 2:26 PM, Reynold Xin <rx...@databricks.com> wrote:
>>
>> We can do the following concrete proposal:
>>
>> 1. Plan to remove support for Java 7 / Scala 2.10 in Spark 2.2.0 (Mar/Apr
>> 2017).
>>
>> 2. In Spark 2.1.0 release, aggressively and explicitly announce the
>> deprecation of Java 7 / Scala 2.10 support.
>>
>> (a) It should appear in release notes and in documentation that mentions
>> how to build Spark
>>
>> (b) and a warning should be shown every time SparkContext is started
>> using Scala 2.10 or Java 7.
>>
>>
>>
>>
>>

Re: Straw poll: dropping support for things like Scala 2.10

Posted by Sean Owen <so...@cloudera.com>.
The burden may be a little more apparent when dealing with the day to day
merging and fixing of breaks. The upside is maybe the more compelling
argument though. For example, lambda-fying all the Java code, supporting
java.time, and taking advantage of some newer Hadoop/YARN APIs is a
moderate win for users too, and there's also a cost to not doing that.
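
To make the reflection point concrete: a minimal Scala sketch, not Spark's
actual code, of the pattern that supporting old Hadoop versions forces. It
assumes Configuration.getPassword (added in Hadoop 2.6) is the kind of
newer API being reached for; with a Hadoop 2.6+ minimum, the whole helper
collapses to a direct method call.

    import org.apache.hadoop.conf.Configuration

    object HadoopCompat {
      // Look up the newer API reflectively so this compiles and runs even
      // against a pre-2.6 Hadoop on the classpath.
      def getPassword(conf: Configuration, key: String): Option[Array[Char]] = {
        try {
          val m = conf.getClass.getMethod("getPassword", classOf[String])
          Option(m.invoke(conf, key)).map(_.asInstanceOf[Array[Char]])
        } catch {
          case _: NoSuchMethodException => None  // older Hadoop: fall back
        }
      }
    }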

I must say I don't see a risk of fragmentation as nearly the problem it's
made out to be here. We are, after all, here discussing _beginning_ to
remove support _in 6 months_, for long since non-current versions of
things. An org's decision to not, say, use Java 8 is a decision to not use
the new version of lots of things. It's not clear this is a constituency
that is either large or one to reasonably serve indefinitely.

In the end, the Scala issue may be decisive. Supporting 2.10 - 2.12
simultaneously is a bridge too far, and if 2.12 requires Java 8, it's a
good reason for Spark to require Java 8. And Steve suggests that means a
minimum of Hadoop 2.6 too. (I still profess ignorance of the Python part of
the issue.)

Put another way, I am not sure what the criterion is, if not the above?

I support deprecating all of these things, at the least, in 2.1.0. Although
it's a separate question, I believe it's going to be necessary to remove
support in ~6 months in 2.2.0.


On Thu, Oct 27, 2016 at 4:36 PM Matei Zaharia <ma...@gmail.com>
wrote:

> Just to comment on this, I'm generally against removing these types of
> things unless they create a substantial burden on project contributors. It
> doesn't sound like Python 2.6 and Java 7 do that yet -- Scala 2.10 might,
> but then of course we need to wait for 2.12 to be out and stable.
>
> In general, this type of stuff only hurts users, and doesn't have a huge
> impact on Spark contributors' productivity (sure, it's a bit unpleasant,
> but that's life). If we break compatibility this way too quickly, we
> fragment the user community, and then either people have a crappy
> experience with Spark because their corporate IT doesn't yet have an
> environment that can run the latest version, or worse, they create more
> maintenance burden for us because they ask for more patches to be
> backported to old Spark versions (1.6.x, 2.0.x, etc). Python in particular
> is pretty fundamental to many Linux distros.
>
> In the future, rather than just looking at when some software came out, it
> may be good to have some criteria for when to drop support for something.
> For example, if there are really nice libraries in Python 2.7 or Java 8
> that we're missing out on, that may be a good reason. The maintenance
> burden for multiple Scala versions is definitely painful but I also think
> we should always support the latest two Scala releases.
>
> Matei
>
> On Oct 27, 2016, at 12:15 PM, Reynold Xin <rx...@databricks.com> wrote:
>
> I created a JIRA ticket to track this:
> https://issues.apache.org/jira/browse/SPARK-18138
>
>
>
> On Thu, Oct 27, 2016 at 10:19 AM, Steve Loughran <st...@hortonworks.com>
> wrote:
>
>
> On 27 Oct 2016, at 10:03, Sean Owen <so...@cloudera.com> wrote:
>
> Seems OK by me.
> How about Hadoop < 2.6, Python 2.6? Those seem more removable. I'd like
> to add that to a list of things that will begin to be unsupported 6 months
> from now.
>
>
> If you go to java 8 only, then hadoop 2.6+ is mandatory.
>
>
> On Wed, Oct 26, 2016 at 8:49 PM Koert Kuipers <ko...@tresata.com> wrote:
>
> that sounds good to me
>
> On Wed, Oct 26, 2016 at 2:26 PM, Reynold Xin <rx...@databricks.com> wrote:
>
> We can do the following concrete proposal:
>
> 1. Plan to remove support for Java 7 / Scala 2.10 in Spark 2.2.0 (Mar/Apr
> 2017).
>
> 2. In Spark 2.1.0 release, aggressively and explicitly announce the
> deprecation of Java 7 / Scala 2.10 support.
>
> (a) It should appear in release notes and in documentation that mentions
> how to build Spark
>
> (b) and a warning should be shown every time SparkContext is started using
> Scala 2.10 or Java 7.
>
>
>
>
>

Re: Straw poll: dropping support for things like Scala 2.10

Posted by Amit Tank <am...@gmail.com>.
+1 for Matei's point.

On Thursday, October 27, 2016, Matei Zaharia <ma...@gmail.com>
wrote:

> Just to comment on this, I'm generally against removing these types of
> things unless they create a substantial burden on project contributors. It
> doesn't sound like Python 2.6 and Java 7 do that yet -- Scala 2.10 might,
> but then of course we need to wait for 2.12 to be out and stable.
>
> In general, this type of stuff only hurts users, and doesn't have a huge
> impact on Spark contributors' productivity (sure, it's a bit unpleasant,
> but that's life). If we break compatibility this way too quickly, we
> fragment the user community, and then either people have a crappy
> experience with Spark because their corporate IT doesn't yet have an
> environment that can run the latest version, or worse, they create more
> maintenance burden for us because they ask for more patches to be
> backported to old Spark versions (1.6.x, 2.0.x, etc). Python in particular
> is pretty fundamental to many Linux distros.
>
> In the future, rather than just looking at when some software came out, it
> may be good to have some criteria for when to drop support for something.
> For example, if there are really nice libraries in Python 2.7 or Java 8
> that we're missing out on, that may be a good reason. The maintenance
> burden for multiple Scala versions is definitely painful but I also think
> we should always support the latest two Scala releases.
>
> Matei
>
> On Oct 27, 2016, at 12:15 PM, Reynold Xin <rxin@databricks.com> wrote:
>
> I created a JIRA ticket to track this: https://issues.apache.org/jira/browse/SPARK-18138
>
>
>
> On Thu, Oct 27, 2016 at 10:19 AM, Steve Loughran <stevel@hortonworks.com> wrote:
>
>>
>> On 27 Oct 2016, at 10:03, Sean Owen <sowen@cloudera.com> wrote:
>>
>> Seems OK by me.
>> How about Hadoop < 2.6, Python 2.6? Those seem more removable. I'd like
>> to add that to a list of things that will begin to be unsupported 6 months
>> from now.
>>
>>
>> If you go to java 8 only, then hadoop 2.6+ is mandatory.
>>
>>
>> On Wed, Oct 26, 2016 at 8:49 PM Koert Kuipers <koert@tresata.com> wrote:
>>
>>> that sounds good to me
>>>
>>> On Wed, Oct 26, 2016 at 2:26 PM, Reynold Xin <rxin@databricks.com> wrote:
>>>
>>>> We can do the following concrete proposal:
>>>>
>>>> 1. Plan to remove support for Java 7 / Scala 2.10 in Spark 2.2.0
>>>> (Mar/Apr 2017).
>>>>
>>>> 2. In Spark 2.1.0 release, aggressively and explicitly announce the
>>>> deprecation of Java 7 / Scala 2.10 support.
>>>>
>>>> (a) It should appear in release notes and in documentation that mentions
>>>> how to build Spark
>>>>
>>>> (b) and a warning should be shown every time SparkContext is started
>>>> using Scala 2.10 or Java 7.
>>>>
>>>>
>>
>
>

Re: Straw poll: dropping support for things like Scala 2.10

Posted by Felix Cheung <fe...@hotmail.com>.
+1 on Matei's.


_____________________________
From: Davies Liu <da...@databricks.com>
Sent: Thursday, October 27, 2016 9:58 AM
Subject: Re: Straw poll: dropping support for things like Scala 2.10
To: Matei Zaharia <ma...@gmail.com>
Cc: Reynold Xin <rx...@databricks.com>, Steve Loughran <st...@hortonworks.com>, Sean Owen <so...@cloudera.com>, Koert Kuipers <ko...@tresata.com>, Dongjoon Hyun <do...@apache.org>, Daniel Siegmann <ds...@securityscorecard.io>, Apache Spark Dev <de...@spark.apache.org>


+1 for Matei's point.

On Thu, Oct 27, 2016 at 8:36 AM, Matei Zaharia <ma...@gmail.com> wrote:
> Just to comment on this, I'm generally against removing these types of
> things unless they create a substantial burden on project contributors. It
> doesn't sound like Python 2.6 and Java 7 do that yet -- Scala 2.10 might,
> but then of course we need to wait for 2.12 to be out and stable.
>
> In general, this type of stuff only hurts users, and doesn't have a huge
> impact on Spark contributors' productivity (sure, it's a bit unpleasant, but
> that's life). If we break compatibility this way too quickly, we fragment
> the user community, and then either people have a crappy experience with
> Spark because their corporate IT doesn't yet have an environment that can
> run the latest version, or worse, they create more maintenance burden for us
> because they ask for more patches to be backported to old Spark versions
> (1.6.x, 2.0.x, etc). Python in particular is pretty fundamental to many
> Linux distros.
>
> In the future, rather than just looking at when some software came out, it
> may be good to have some criteria for when to drop support for something.
> For example, if there are really nice libraries in Python 2.7 or Java 8 that
> we're missing out on, that may be a good reason. The maintenance burden for
> multiple Scala versions is definitely painful but I also think we should
> always support the latest two Scala releases.
>
> Matei
>
> On Oct 27, 2016, at 12:15 PM, Reynold Xin <rx...@databricks.com> wrote:
>
> I created a JIRA ticket to track this:
> https://issues.apache.org/jira/browse/SPARK-18138
>
>
>
> On Thu, Oct 27, 2016 at 10:19 AM, Steve Loughran <st...@hortonworks.com>
> wrote:
>>
>>
>> On 27 Oct 2016, at 10:03, Sean Owen <so...@cloudera.com> wrote:
>>
>> Seems OK by me.
>> How about Hadoop < 2.6, Python 2.6? Those seem more removable. I'd like
>> to add that to a list of things that will begin to be unsupported 6 months
>> from now.
>>
>>
>> If you go to java 8 only, then hadoop 2.6+ is mandatory.
>>
>>
>> On Wed, Oct 26, 2016 at 8:49 PM Koert Kuipers <ko...@tresata.com> wrote:
>>>
>>> that sounds good to me
>>>
>>> On Wed, Oct 26, 2016 at 2:26 PM, Reynold Xin <rx...@databricks.com> wrote:
>>>>
>>>> We can do the following concrete proposal:
>>>>
>>>> 1. Plan to remove support for Java 7 / Scala 2.10 in Spark 2.2.0
>>>> (Mar/Apr 2017).
>>>>
>>>> 2. In Spark 2.1.0 release, aggressively and explicitly announce the
>>>> deprecation of Java 7 / Scala 2.10 support.
>>>>
>>>> (a) It should appear in release notes and in documentation that mentions
>>>> how to build Spark
>>>>
>>>> (b) and a warning should be shown every time SparkContext is started
>>>> using Scala 2.10 or Java 7.
>>>>
>>
>
>

---------------------------------------------------------------------
To unsubscribe e-mail: dev-unsubscribe@spark.apache.org




Re: Straw poll: dropping support for things like Scala 2.10

Posted by Davies Liu <da...@databricks.com>.
+1 for Matei's point.

On Thu, Oct 27, 2016 at 8:36 AM, Matei Zaharia <ma...@gmail.com> wrote:
> Just to comment on this, I'm generally against removing these types of
> things unless they create a substantial burden on project contributors. It
> doesn't sound like Python 2.6 and Java 7 do that yet -- Scala 2.10 might,
> but then of course we need to wait for 2.12 to be out and stable.
>
> In general, this type of stuff only hurts users, and doesn't have a huge
> impact on Spark contributors' productivity (sure, it's a bit unpleasant, but
> that's life). If we break compatibility this way too quickly, we fragment
> the user community, and then either people have a crappy experience with
> Spark because their corporate IT doesn't yet have an environment that can
> run the latest version, or worse, they create more maintenance burden for us
> because they ask for more patches to be backported to old Spark versions
> (1.6.x, 2.0.x, etc). Python in particular is pretty fundamental to many
> Linux distros.
>
> In the future, rather than just looking at when some software came out, it
> may be good to have some criteria for when to drop support for something.
> For example, if there are really nice libraries in Python 2.7 or Java 8 that
> we're missing out on, that may be a good reason. The maintenance burden for
> multiple Scala versions is definitely painful but I also think we should
> always support the latest two Scala releases.
>
> Matei
>
> On Oct 27, 2016, at 12:15 PM, Reynold Xin <rx...@databricks.com> wrote:
>
> I created a JIRA ticket to track this:
> https://issues.apache.org/jira/browse/SPARK-18138
>
>
>
> On Thu, Oct 27, 2016 at 10:19 AM, Steve Loughran <st...@hortonworks.com>
> wrote:
>>
>>
>> On 27 Oct 2016, at 10:03, Sean Owen <so...@cloudera.com> wrote:
>>
>> Seems OK by me.
>> How about Hadoop < 2.6, Python 2.6? Those seem more removable. I'd like
>> to add that to a list of things that will begin to be unsupported 6 months
>> from now.
>>
>>
>> If you go to java 8 only, then hadoop 2.6+ is mandatory.
>>
>>
>> On Wed, Oct 26, 2016 at 8:49 PM Koert Kuipers <ko...@tresata.com> wrote:
>>>
>>> that sounds good to me
>>>
>>> On Wed, Oct 26, 2016 at 2:26 PM, Reynold Xin <rx...@databricks.com> wrote:
>>>>
>>>> We can do the following concrete proposal:
>>>>
>>>> 1. Plan to remove support for Java 7 / Scala 2.10 in Spark 2.2.0
>>>> (Mar/Apr 2017).
>>>>
>>>> 2. In Spark 2.1.0 release, aggressively and explicitly announce the
>>>> deprecation of Java 7 / Scala 2.10 support.
>>>>
>>>> (a) It should appear in release notes and in documentation that mentions
>>>> how to build Spark
>>>>
>>>> (b) and a warning should be shown every time SparkContext is started
>>>> using Scala 2.10 or Java 7.
>>>>
>>
>
>

---------------------------------------------------------------------
To unsubscribe e-mail: dev-unsubscribe@spark.apache.org


Re: Straw poll: dropping support for things like Scala 2.10

Posted by Matei Zaharia <ma...@gmail.com>.
Just to comment on this, I'm generally against removing these types of things unless they create a substantial burden on project contributors. It doesn't sound like Python 2.6 and Java 7 do that yet -- Scala 2.10 might, but then of course we need to wait for 2.12 to be out and stable.

In general, this type of stuff only hurts users, and doesn't have a huge impact on Spark contributors' productivity (sure, it's a bit unpleasant, but that's life). If we break compatibility this way too quickly, we fragment the user community, and then either people have a crappy experience with Spark because their corporate IT doesn't yet have an environment that can run the latest version, or worse, they create more maintenance burden for us because they ask for more patches to be backported to old Spark versions (1.6.x, 2.0.x, etc). Python in particular is pretty fundamental to many Linux distros.

In the future, rather than just looking at when some software came out, it may be good to have some criteria for when to drop support for something. For example, if there are really nice libraries in Python 2.7 or Java 8 that we're missing out on, that may be a good reason. The maintenance burden for multiple Scala versions is definitely painful but I also think we should always support the latest two Scala releases.

Matei

> On Oct 27, 2016, at 12:15 PM, Reynold Xin <rx...@databricks.com> wrote:
> 
> I created a JIRA ticket to track this: https://issues.apache.org/jira/browse/SPARK-18138
> 
> 
> 
> On Thu, Oct 27, 2016 at 10:19 AM, Steve Loughran <stevel@hortonworks.com> wrote:
> 
>> On 27 Oct 2016, at 10:03, Sean Owen <sowen@cloudera.com> wrote:
>> 
>> Seems OK by me.
>> How about Hadoop < 2.6, Python 2.6? Those seem more removable. I'd like to add that to a list of things that will begin to be unsupported 6 months from now.
>> 
> 
> If you go to java 8 only, then hadoop 2.6+ is mandatory. 
> 
> 
>> On Wed, Oct 26, 2016 at 8:49 PM Koert Kuipers <koert@tresata.com> wrote:
>> that sounds good to me
>> 
>> On Wed, Oct 26, 2016 at 2:26 PM, Reynold Xin <rxin@databricks.com> wrote:
>> We can do the following concrete proposal:
>> 
>> 1. Plan to remove support for Java 7 / Scala 2.10 in Spark 2.2.0 (Mar/Apr 2017).
>> 
>> 2. In Spark 2.1.0 release, aggressively and explicitly announce the deprecation of Java 7 / Scala 2.10 support.
>> 
>> (a) It should appear in release notes and in documentation that mentions how to build Spark
>> 
>> (b) and a warning should be shown every time SparkContext is started using Scala 2.10 or Java 7.
>> 
> 
> 


Re: Straw poll: dropping support for things like Scala 2.10

Posted by Yanbo Liang <yb...@gmail.com>.
+1

On Thu, Oct 27, 2016 at 3:15 AM, Reynold Xin <rx...@databricks.com> wrote:

> I created a JIRA ticket to track this: https://issues.apache.org/jira/browse/SPARK-18138
>
>
>
> On Thu, Oct 27, 2016 at 10:19 AM, Steve Loughran <st...@hortonworks.com>
> wrote:
>
>>
>> On 27 Oct 2016, at 10:03, Sean Owen <so...@cloudera.com> wrote:
>>
>> Seems OK by me.
>> How about Hadoop < 2.6, Python 2.6? Those seem more removable. I'd like
>> to add that to a list of things that will begin to be unsupported 6 months
>> from now.
>>
>>
>> If you go to java 8 only, then hadoop 2.6+ is mandatory.
>>
>>
>> On Wed, Oct 26, 2016 at 8:49 PM Koert Kuipers <ko...@tresata.com> wrote:
>>
>>> that sounds good to me
>>>
>>> On Wed, Oct 26, 2016 at 2:26 PM, Reynold Xin <rx...@databricks.com>
>>> wrote:
>>>
>>> We can do the following concrete proposal:
>>>
>>> 1. Plan to remove support for Java 7 / Scala 2.10 in Spark 2.2.0
>>> (Mar/Apr 2017).
>>>
>>> 2. In Spark 2.1.0 release, aggressively and explicitly announce the
>>> deprecation of Java 7 / Scala 2.10 support.
>>>
>>> (a) It should appear in release notes and in documentation that mentions
>>> how to build Spark
>>>
>>> (b) and a warning should be shown every time SparkContext is started
>>> using Scala 2.10 or Java 7.
>>>
>>>
>>
>

Re: Straw poll: dropping support for things like Scala 2.10

Posted by Reynold Xin <rx...@databricks.com>.
I created a JIRA ticket to track this:
https://issues.apache.org/jira/browse/SPARK-18138



On Thu, Oct 27, 2016 at 10:19 AM, Steve Loughran <st...@hortonworks.com>
wrote:

>
> On 27 Oct 2016, at 10:03, Sean Owen <so...@cloudera.com> wrote:
>
> Seems OK by me.
> How about Hadoop < 2.6, Python 2.6? Those seem more removable. I'd like
> to add that to a list of things that will begin to be unsupported 6 months
> from now.
>
>
> If you go to java 8 only, then hadoop 2.6+ is mandatory.
>
>
> On Wed, Oct 26, 2016 at 8:49 PM Koert Kuipers <ko...@tresata.com> wrote:
>
>> that sounds good to me
>>
>> On Wed, Oct 26, 2016 at 2:26 PM, Reynold Xin <rx...@databricks.com> wrote:
>>
>> We can do the following concrete proposal:
>>
>> 1. Plan to remove support for Java 7 / Scala 2.10 in Spark 2.2.0 (Mar/Apr
>> 2017).
>>
>> 2. In Spark 2.1.0 release, aggressively and explicitly announce the
>> deprecation of Java 7 / Scala 2.10 support.
>>
>> (a) It should appear in release notes and in documentation that mentions
>> how to build Spark
>>
>> (b) and a warning should be shown every time SparkContext is started
>> using Scala 2.10 or Java 7.
>>
>>
>

Re: Straw poll: dropping support for things like Scala 2.10

Posted by Steve Loughran <st...@hortonworks.com>.
On 27 Oct 2016, at 10:03, Sean Owen <so...@cloudera.com> wrote:

Seems OK by me.
How about Hadoop < 2.6, Python 2.6? Those seem more removable. I'd like to add that to a list of things that will begin to be unsupported 6 months from now.


If you go to java 8 only, then hadoop 2.6+ is mandatory.


On Wed, Oct 26, 2016 at 8:49 PM Koert Kuipers <ko...@tresata.com> wrote:
that sounds good to me

On Wed, Oct 26, 2016 at 2:26 PM, Reynold Xin <rx...@databricks.com> wrote:
We can do the following concrete proposal:

1. Plan to remove support for Java 7 / Scala 2.10 in Spark 2.2.0 (Mar/Apr 2017).

2. In Spark 2.1.0 release, aggressively and explicitly announce the deprecation of Java 7 / Scala 2.10 support.

(a) It should appear in release notes and in documentation that mentions how to build Spark

(b) and a warning should be shown every time SparkContext is started using Scala 2.10 or Java 7.



Re: Straw poll: dropping support for things like Scala 2.10

Posted by Sean Owen <so...@cloudera.com>.
Seems OK by me.
How about Hadoop < 2.6, Python 2.6? Those seem more removable. I'd like to
add that to a list of things that will begin to be unsupported 6 months
from now.

On Wed, Oct 26, 2016 at 8:49 PM Koert Kuipers <ko...@tresata.com> wrote:

> that sounds good to me
>
> On Wed, Oct 26, 2016 at 2:26 PM, Reynold Xin <rx...@databricks.com> wrote:
>
> We can do the following concrete proposal:
>
> 1. Plan to remove support for Java 7 / Scala 2.10 in Spark 2.2.0 (Mar/Apr
> 2017).
>
> 2. In Spark 2.1.0 release, aggressively and explicitly announce the
> deprecation of Java 7 / Scala 2.10 support.
>
> (a) It should appear in release notes and in documentation that mentions
> how to build Spark
>
> (b) and a warning should be shown every time SparkContext is started using
> Scala 2.10 or Java 7.
>
>

Re: Straw poll: dropping support for things like Scala 2.10

Posted by Koert Kuipers <ko...@tresata.com>.
that sounds good to me

On Wed, Oct 26, 2016 at 2:26 PM, Reynold Xin <rx...@databricks.com> wrote:

> We can do the following concrete proposal:
>
> 1. Plan to remove support for Java 7 / Scala 2.10 in Spark 2.2.0 (Mar/Apr
> 2017).
>
> 2. In Spark 2.1.0 release, aggressively and explicitly announce the
> deprecation of Java 7 / Scala 2.10 support.
>
> (a) It should appear in release notes and in documentation that mentions
> how to build Spark
>
> (b) and a warning should be shown every time SparkContext is started using
> Scala 2.10 or Java 7.
>
>
>
> On Wed, Oct 26, 2016 at 7:50 PM, Dongjoon Hyun <do...@apache.org>
> wrote:
>
>> Hi, Daniel.
>>
>> I guess that kind of work will start in earnest in 2.1.0 after the PMC's
>> announcement/reminder on the mailing list.
>>
>> Bests,
>> Dongjoon.
>>
>>
>> On Wednesday, October 26, 2016, Daniel Siegmann <
>> dsiegmann@securityscorecard.io> wrote:
>>
>>> Is the deprecation of JDK 7 and Scala 2.10 documented anywhere outside
>>> the release notes for Spark 2.0.0? I do not consider release notes to be
>>> sufficient public notice for deprecation of supported platforms - this
>>> should be noted in the documentation somewhere. Here are the only
>>> mentions I could find:
>>>
>>> At http://spark.apache.org/downloads.html it says:
>>>
>>> "*Note: Starting version 2.0, Spark is built with Scala 2.11 by
>>> default. Scala 2.10 users should download the Spark source package and
>>> build with Scala 2.10 support
>>> <http://spark.apache.org/docs/latest/building-spark.html#building-for-scala-210>."*
>>>
>>> At http://spark.apache.org/docs/latest/#downloading it says:
>>>
>>> "Spark runs on Java 7+, Python 2.6+/3.4+ and R 3.1+. For the Scala API,
>>> Spark 2.0.1 uses Scala 2.11. You will need to use a compatible Scala
>>> version (2.11.x)."
>>>
>>> At http://spark.apache.org/docs/latest/programming-guide.html#linking-with-spark it says:
>>>
>>>    - "Spark 2.0.1 is built and distributed to work with Scala 2.11 by
>>>    default. (Spark can be built to work with other versions of Scala, too.) To
>>>    write applications in Scala, you will need to use a compatible Scala
>>>    version (e.g. 2.11.X)."
>>>    - "Spark 2.0.1 works with Java 7 and higher. If you are using Java
>>>    8, Spark supports lambda expressions
>>>    <http://docs.oracle.com/javase/tutorial/java/javaOO/lambdaexpressions.html>
>>>    for concisely writing functions, otherwise you can use the classes in the
>>>    org.apache.spark.api.java.function
>>>    <http://spark.apache.org/docs/latest/api/java/index.html?org/apache/spark/api/java/function/package-summary.html>
>>>    package."
>>>    - "Spark 2.0.1 works with Python 2.6+ or Python 3.4+. It can use the
>>>    standard CPython interpreter, so C libraries like NumPy can be used. It
>>>    also works with PyPy 2.3+."
>>>
>>>
>>>
>

Re: Straw poll: dropping support for things like Scala 2.10

Posted by Michael Armbrust <mi...@databricks.com>.
+1

On Wed, Oct 26, 2016 at 11:26 AM, Reynold Xin <rx...@databricks.com> wrote:

> We can do the following concrete proposal:
>
> 1. Plan to remove support for Java 7 / Scala 2.10 in Spark 2.2.0 (Mar/Apr
> 2017).
>
> 2. In Spark 2.1.0 release, aggressively and explicitly announce the
> deprecation of Java 7 / Scala 2.10 support.
>
> (a) It should appear in release notes and in documentation that mentions
> how to build Spark
>
> (b) and a warning should be shown every time SparkContext is started using
> Scala 2.10 or Java 7.
>
>
>
> On Wed, Oct 26, 2016 at 7:50 PM, Dongjoon Hyun <do...@apache.org>
> wrote:
>
>> Hi, Daniel.
>>
>> I guess that kind of work will start in earnest in 2.1.0 after the PMC's
>> announcement/reminder on the mailing list.
>>
>> Bests,
>> Dongjoon.
>>
>>
>> On Wednesday, October 26, 2016, Daniel Siegmann <
>> dsiegmann@securityscorecard.io> wrote:
>>
>>> Is the deprecation of JDK 7 and Scala 2.10 documented anywhere outside
>>> the release notes for Spark 2.0.0? I do not consider release notes to be
>>> sufficient public notice for deprecation of supported platforms - this
>>> should be noted in the documentation somewhere. Here are the only
>>> mentions I could find:
>>>
>>> At http://spark.apache.org/downloads.html it says:
>>>
>>> "*Note: Starting version 2.0, Spark is built with Scala 2.11 by
>>> default. Scala 2.10 users should download the Spark source package and
>>> build with Scala 2.10 support
>>> <http://spark.apache.org/docs/latest/building-spark.html#building-for-scala-210>."*
>>>
>>> At http://spark.apache.org/docs/latest/#downloading it says:
>>>
>>> "Spark runs on Java 7+, Python 2.6+/3.4+ and R 3.1+. For the Scala API,
>>> Spark 2.0.1 uses Scala 2.11. You will need to use a compatible Scala
>>> version (2.11.x)."
>>>
>>> At http://spark.apache.org/docs/latest/programming-guide.html#linking-with-spark it says:
>>>
>>>    - "Spark 2.0.1 is built and distributed to work with Scala 2.11 by
>>>    default. (Spark can be built to work with other versions of Scala, too.) To
>>>    write applications in Scala, you will need to use a compatible Scala
>>>    version (e.g. 2.11.X)."
>>>    - "Spark 2.0.1 works with Java 7 and higher. If you are using Java
>>>    8, Spark supports lambda expressions
>>>    <http://docs.oracle.com/javase/tutorial/java/javaOO/lambdaexpressions.html>
>>>    for concisely writing functions, otherwise you can use the classes in the
>>>    org.apache.spark.api.java.function
>>>    <http://spark.apache.org/docs/latest/api/java/index.html?org/apache/spark/api/java/function/package-summary.html>
>>>    package."
>>>    - "Spark 2.0.1 works with Python 2.6+ or Python 3.4+. It can use the
>>>    standard CPython interpreter, so C libraries like NumPy can be used. It
>>>    also works with PyPy 2.3+."
>>>
>>>
>>>
>

Re: Straw poll: dropping support for things like Scala 2.10

Posted by Reynold Xin <rx...@databricks.com>.
We can do the following concrete proposal:

1. Plan to remove support for Java 7 / Scala 2.10 in Spark 2.2.0 (Mar/Apr
2017).

2. In Spark 2.1.0 release, aggressively and explicitly announce the
deprecation of Java 7 / Scala 2.10 support.

(a) It should appear in release notes and in documentation that mentions
how to build Spark

(b) and a warning should be shown every time SparkContext is started using
Scala 2.10 or Java 7.
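
A minimal sketch of what (b) could look like, assuming a hypothetical
helper invoked once at SparkContext startup; the object name and warning
wording are illustrative, not an actual patch:

    object VersionDeprecationWarning {
      // Warn if the running JVM or Scala version is deprecated. The
      // prefixes match the standard version strings ("1.7.0_80", "2.10.6").
      def check(warn: String => Unit = Console.err.println): Unit = {
        if (System.getProperty("java.version").startsWith("1.7")) {
          warn("Support for Java 7 is deprecated and may be removed in " +
            "Spark 2.2.0. Please upgrade to Java 8.")
        }
        if (scala.util.Properties.versionNumberString.startsWith("2.10")) {
          warn("Support for Scala 2.10 is deprecated and may be removed in " +
            "Spark 2.2.0. Please move to Scala 2.11.")
        }
      }
    }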



On Wed, Oct 26, 2016 at 7:50 PM, Dongjoon Hyun <do...@apache.org> wrote:

> Hi, Daniel.
>
> I guess that kind of work will start in earnest in 2.1.0 after the PMC's
> announcement/reminder on the mailing list.
>
> Bests,
> Dongjoon.
>
>
> On Wednesday, October 26, 2016, Daniel Siegmann <
> dsiegmann@securityscorecard.io> wrote:
>
>> Is the deprecation of JDK 7 and Scala 2.10 documented anywhere outside
>> the release notes for Spark 2.0.0? I do not consider release notes to be
>> sufficient public notice for deprecation of supported platforms - this
>> should be noted in the documentation somewhere. Here are the only
>> mentions I could find:
>>
>> At http://spark.apache.org/downloads.html it says:
>>
>> "*Note: Starting version 2.0, Spark is built with Scala 2.11 by default.
>> Scala 2.10 users should download the Spark source package and build with
>> Scala 2.10 support
>> <http://spark.apache.org/docs/latest/building-spark.html#building-for-scala-210>."*
>>
>> At http://spark.apache.org/docs/latest/#downloading it says:
>>
>> "Spark runs on Java 7+, Python 2.6+/3.4+ and R 3.1+. For the Scala API,
>> Spark 2.0.1 uses Scala 2.11. You will need to use a compatible Scala
>> version (2.11.x)."
>>
>> At http://spark.apache.org/docs/latest/programming-guide.html#linking-with-spark it says:
>>
>>    - "Spark 2.0.1 is built and distributed to work with Scala 2.11 by
>>    default. (Spark can be built to work with other versions of Scala, too.) To
>>    write applications in Scala, you will need to use a compatible Scala
>>    version (e.g. 2.11.X)."
>>    - "Spark 2.0.1 works with Java 7 and higher. If you are using Java 8,
>>    Spark supports lambda expressions
>>    <http://docs.oracle.com/javase/tutorial/java/javaOO/lambdaexpressions.html>
>>    for concisely writing functions, otherwise you can use the classes in the
>>    org.apache.spark.api.java.function
>>    <http://spark.apache.org/docs/latest/api/java/index.html?org/apache/spark/api/java/function/package-summary.html>
>>    package."
>>    - "Spark 2.0.1 works with Python 2.6+ or Python 3.4+. It can use the
>>    standard CPython interpreter, so C libraries like NumPy can be used. It
>>    also works with PyPy 2.3+."
>>
>>
>>

Re: Straw poll: dropping support for things like Scala 2.10

Posted by Dongjoon Hyun <do...@apache.org>.
Hi, Daniel.

I guess that kind of work will start in earnest in 2.1.0 after the PMC's
announcement/reminder on the mailing list.

Bests,
Dongjoon.


On Wednesday, October 26, 2016, Daniel Siegmann <
dsiegmann@securityscorecard.io> wrote:

> Is the deprecation of JDK 7 and Scala 2.10 documented anywhere outside the
> release notes for Spark 2.0.0? I do not consider release notes to be
> sufficient public notice for deprecation of supported platforms - this
> should be noted in the documentation somewhere. Here are the only
> mentions I could find:
>
> At http://spark.apache.org/downloads.html it says:
>
> "*Note: Starting version 2.0, Spark is built with Scala 2.11 by default.
> Scala 2.10 users should download the Spark source package and build with
> Scala 2.10 support
> <http://spark.apache.org/docs/latest/building-spark.html#building-for-scala-210>."*
>
> At http://spark.apache.org/docs/latest/#downloading it says:
>
> "Spark runs on Java 7+, Python 2.6+/3.4+ and R 3.1+. For the Scala API,
> Spark 2.0.1 uses Scala 2.11. You will need to use a compatible Scala
> version (2.11.x)."
>
> At http://spark.apache.org/docs/latest/programming-guide.html#linking-with-spark it says:
>
>    - "Spark 2.0.1 is built and distributed to work with Scala 2.11 by
>    default. (Spark can be built to work with other versions of Scala, too.) To
>    write applications in Scala, you will need to use a compatible Scala
>    version (e.g. 2.11.X)."
>    - "Spark 2.0.1 works with Java 7 and higher. If you are using Java 8,
>    Spark supports lambda expressions
>    <http://docs.oracle.com/javase/tutorial/java/javaOO/lambdaexpressions.html>
>    for concisely writing functions, otherwise you can use the classes in the
>    org.apache.spark.api.java.function
>    <http://spark.apache.org/docs/latest/api/java/index.html?org/apache/spark/api/java/function/package-summary.html>
>    package."
>    - "Spark 2.0.1 works with Python 2.6+ or Python 3.4+. It can use the
>    standard CPython interpreter, so C libraries like NumPy can be used. It
>    also works with PyPy 2.3+."
>
>
>

Re: Straw poll: dropping support for things like Scala 2.10

Posted by Daniel Siegmann <ds...@securityscorecard.io>.
Is the deprecation of JDK 7 and Scala 2.10 documented anywhere outside the
release notes for Spark 2.0.0? I do not consider release notes to be
sufficient public notice for deprecation of supported platforms - this
should be noted in the documentation somewhere. Here are the only
mentions I could find:

At http://spark.apache.org/downloads.html it says:

"*Note: Starting version 2.0, Spark is built with Scala 2.11 by default.
Scala 2.10 users should download the Spark source package and build with
Scala 2.10 support
<http://spark.apache.org/docs/latest/building-spark.html#building-for-scala-210>."*

At http://spark.apache.org/docs/latest/#downloading it says:

"Spark runs on Java 7+, Python 2.6+/3.4+ and R 3.1+. For the Scala API,
Spark 2.0.1 uses Scala 2.11. You will need to use a compatible Scala
version (2.11.x)."

At
http://spark.apache.org/docs/latest/programming-guide.html#linking-with-spark
it says:

   - "Spark 2.0.1 is built and distributed to work with Scala 2.11 by
   default. (Spark can be built to work with other versions of Scala, too.) To
   write applications in Scala, you will need to use a compatible Scala
   version (e.g. 2.11.X)."
   - "Spark 2.0.1 works with Java 7 and higher. If you are using Java 8,
   Spark supports lambda expressions
   <http://docs.oracle.com/javase/tutorial/java/javaOO/lambdaexpressions.html>
   for concisely writing functions, otherwise you can use the classes in the
   org.apache.spark.api.java.function
   <http://spark.apache.org/docs/latest/api/java/index.html?org/apache/spark/api/java/function/package-summary.html>
   package."
   - "Spark 2.0.1 works with Python 2.6+ or Python 3.4+. It can use the
   standard CPython interpreter, so C libraries like NumPy can be used. It
   also works with PyPy 2.3+."
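
For reference, the "linking with Spark" guidance quoted above amounts to a
build definition along these lines; a minimal build.sbt sketch, with the
version numbers taken from the quoted docs:

    // build.sbt: link an application against Spark 2.0.1, which is built
    // with Scala 2.11 by default.
    scalaVersion := "2.11.8"

    libraryDependencies +=
      "org.apache.spark" %% "spark-core" % "2.0.1" % "provided"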

Re: Straw poll: dropping support for things like Scala 2.10

Posted by Dongjoon Hyun <do...@apache.org>.
Hi, All.

It's great since it's progress.

Then, at least, in 2017, Spark 2.2.0 will be out with JDK8 and Scala 2.11/2.12, right?

Bests,
Dongjoon.

---------------------------------------------------------------------
To unsubscribe e-mail: dev-unsubscribe@spark.apache.org


Re: Straw poll: dropping support for things like Scala 2.10

Posted by Nicholas Chammas <ni...@gmail.com>.
Agreed. Would an announcement/reminder on the dev and user lists suffice in
this case? Basically, just point out what's already been mentioned in the
2.0 release notes, and include a link there so people know what we're
referencing.
On Tue, Oct 25, 2016 at 5:32 PM, Mark Hamstra <ma...@clearstorydata.com> wrote:

> You're right; so we could remove Java 7 support in 2.1.0.
>
> Both Holden and I not having the facts immediately to mind does suggest,
> however, that we should be doing a better job of making sure that
> information about deprecated language versions is inescapably public.
> That's harder to do with a language version deprecation since using such a
> version doesn't really give you the same kind of repeated warnings that
> using a deprecated API does.
>
> On Tue, Oct 25, 2016 at 12:59 PM, Nicholas Chammas <
> nicholas.chammas@gmail.com> wrote:
>
> No, I think our intent is that using a deprecated language version can
> generate warnings, but that it should still work; whereas once we remove
> support for a language version, then it really is ok for Spark developers
> to do things not compatible with that version and for users attempting to
> use that version to encounter errors.
>
> OK, understood.
>
> With that understanding, the first steps toward removing support for Scala
> 2.10 and/or Java 7 would be to deprecate them in 2.1.0. Actual removal of
> support could then occur at the earliest in 2.2.0.
>
> Java 7 is already deprecated per the 2.0 release notes which I linked to. Here
> they are
> <http://spark.apache.org/releases/spark-release-2-0-0.html#deprecations>
> again.
>
>
> On Tue, Oct 25, 2016 at 3:19 PM Mark Hamstra <ma...@clearstorydata.com>
> wrote:
>
> No, I think our intent is that using a deprecated language version can
> generate warnings, but that it should still work; whereas once we remove
> support for a language version, then it really is ok for Spark developers to
> do things not compatible with that version and for users attempting to use
> that version to encounter errors.
>
> With that understanding, the first steps toward removing support for Scala
> 2.10 and/or Java 7 would be to deprecate them in 2.1.0.  Actual removal of
> support could then occur at the earliest in 2.2.0.
>
> On Tue, Oct 25, 2016 at 12:13 PM, Nicholas Chammas <
> nicholas.chammas@gmail.com> wrote:
>
> FYI: Support for both Python 2.6 and Java 7 was deprecated in 2.0 (see release
> notes <http://spark.apache.org/releases/spark-release-2-0-0.html> under
> Deprecations). The deprecation notice didn't offer a specific timeline for
> completely dropping support other than to say they "might be removed in
> future versions of Spark 2.x".
>
> Not sure what the distinction between deprecating and dropping support is
> for language versions, since in both cases it seems like it's OK to do
> things not compatible with the deprecated versions.
>
> Nick
>
>
> On Tue, Oct 25, 2016 at 11:50 AM Holden Karau <ho...@pigscanfly.ca>
> wrote:
>
> I'd also like to add Python 2.6 to the list of things. We've considered
> dropping it before but never followed through to the best of my knowledge
> (although on mobile right now so can't double check).
>
> On Tuesday, October 25, 2016, Sean Owen <so...@cloudera.com> wrote:
>
> I'd like to gauge where people stand on the issue of dropping support for
> a few things that were considered for 2.0.
>
> First: Scala 2.10. We've seen a number of build breakages this week
> because the PR builder only tests 2.11. No big deal at this stage, but, it
> did cause me to wonder whether it's time to plan to drop 2.10 support,
> especially with 2.12 coming soon.
>
> Next, Java 7. It's reasonably old and out of public updates at this stage.
> It's not that painful to keep supporting, to be honest. It would simplify
> some bits of code, some scripts, some testing.
>
> Hadoop versions: I think the general argument is that most anyone
> would be using, at the least, 2.6, and it would simplify some code that has
> to reflect to use not-even-that-new APIs. It would remove some moderate
> complexity in the build.
>
>
> "When" is a tricky question. Although it's a little aggressive for minor
> releases, I think these will all happen before 3.x regardless. 2.1.0 is not
> out of the question, though coming soon. What about ... 2.2.0?
>
>
> Although I tend to favor dropping support, I'm mostly asking for current
> opinions.
>
>
>
> --
> Cell : 425-233-8271
> Twitter: https://twitter.com/holdenkarau
>
>
>
>

Re: Straw poll: dropping support for things like Scala 2.10

Posted by Mark Hamstra <ma...@clearstorydata.com>.
You're right; so we could remove Java 7 support in 2.1.0.

Both Holden and I not having the facts immediately to mind does suggest,
however, that we should be doing a better job of making sure that
information about deprecated language versions is inescapably public.
That's harder to do with a language version deprecation since using such a
version doesn't really give you the same kind of repeated warnings that
using a deprecated API does.
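
For contrast, a small Scala sketch (illustrative names only) of what an API
deprecation gives you that a language-version deprecation does not: the
compiler flags every call site.

    object LegacyApi {
      @deprecated("Use newMethod instead", "2.1.0")
      def oldMethod(): Unit = ()

      def newMethod(): Unit = ()
    }

    object Caller extends App {
      // Compiling this call emits a deprecation warning pointing here.
      LegacyApi.oldMethod()
    }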

On Tue, Oct 25, 2016 at 12:59 PM, Nicholas Chammas <
nicholas.chammas@gmail.com> wrote:

> No, I think our intent is that using a deprecated language version can
> generate warnings, but that it should still work; whereas once we remove
> support for a language version, then it really is ok for Spark developers
> to do things not compatible with that version and for users attempting to
> use that version to encounter errors.
>
> OK, understood.
>
> With that understanding, the first steps toward removing support for Scala
> 2.10 and/or Java 7 would be to deprecate them in 2.1.0. Actual removal of
> support could then occur at the earliest in 2.2.0.
>
> Java 7 is already deprecated per the 2.0 release notes which I linked to. Here
> they are
> <http://spark.apache.org/releases/spark-release-2-0-0.html#deprecations>
> again.
> ​
>
> On Tue, Oct 25, 2016 at 3:19 PM Mark Hamstra <ma...@clearstorydata.com>
> wrote:
>
>> No, I think our intent is that using a deprecated language version can
>> generate warnings, but that it should still work; whereas once we remove
>> support for a language version, then it really is ok for Spark developers to
>> do things not compatible with that version and for users attempting to use
>> that version to encounter errors.
>>
>> With that understanding, the first steps toward removing support for
>> Scala 2.10 and/or Java 7 would be to deprecate them in 2.1.0.  Actual
>> removal of support could then occur at the earliest in 2.2.0.
>>
>> On Tue, Oct 25, 2016 at 12:13 PM, Nicholas Chammas <
>> nicholas.chammas@gmail.com> wrote:
>>
>> FYI: Support for both Python 2.6 and Java 7 was deprecated in 2.0 (see release
>> notes <http://spark.apache.org/releases/spark-release-2-0-0.html> under
>> Deprecations). The deprecation notice didn't offer a specific timeline for
>> completely dropping support other than to say they "might be removed in
>> future versions of Spark 2.x".
>>
>> Not sure what the distinction between deprecating and dropping support is
>> for language versions, since in both cases it seems like it's OK to do
>> things not compatible with the deprecated versions.
>>
>> Nick
>>
>>
>> On Tue, Oct 25, 2016 at 11:50 AM Holden Karau <ho...@pigscanfly.ca>
>> wrote:
>>
>> I'd also like to add Python 2.6 to the list of things. We've considered
>> dropping it before but never followed through to the best of my knowledge
>> (although on mobile right now so can't double check).
>>
>> On Tuesday, October 25, 2016, Sean Owen <so...@cloudera.com> wrote:
>>
>> I'd like to gauge where people stand on the issue of dropping support for
>> a few things that were considered for 2.0.
>>
>> First: Scala 2.10. We've seen a number of build breakages this week
>> because the PR builder only tests 2.11. No big deal at this stage, but, it
>> did cause me to wonder whether it's time to plan to drop 2.10 support,
>> especially with 2.12 coming soon.
>>
>> Next, Java 7. It's reasonably old and out of public updates at this
>> stage. It's not that painful to keep supporting, to be honest. It would
>> simplify some bits of code, some scripts, some testing.
>>
>> Hadoop versions: I think the general argument is that most anyone
>> would be using, at the least, 2.6, and it would simplify some code that has
>> to reflect to use not-even-that-new APIs. It would remove some moderate
>> complexity in the build.
>>
>>
>> "When" is a tricky question. Although it's a little aggressive for minor
>> releases, I think these will all happen before 3.x regardless. 2.1.0 is not
>> out of the question, though coming soon. What about ... 2.2.0?
>>
>>
>> Although I tend to favor dropping support, I'm mostly asking for current
>> opinions.
>>
>>
>>
>> --
>> Cell : 425-233-8271
>> Twitter: https://twitter.com/holdenkarau
>>
>>
>>

Re: Straw poll: dropping support for things like Scala 2.10

Posted by Nicholas Chammas <ni...@gmail.com>.
No, I think our intent is that using a deprecated language version can
generate warnings, but that it should still work; whereas once we remove
support for a language version, then it really is ok for Spark developers
to do things not compatible with that version and for users attempting to
use that version to encounter errors.

OK, understood.

With that understanding, the first steps toward removing support for Scala
2.10 and/or Java 7 would be to deprecate them in 2.1.0. Actual removal of
support could then occur at the earliest in 2.2.0.

Java 7 is already deprecated per the 2.0 release notes which I linked to. Here
they are
<http://spark.apache.org/releases/spark-release-2-0-0.html#deprecations>
again.


Re: Straw poll: dropping support for things like Scala 2.10

Posted by Mark Hamstra <ma...@clearstorydata.com>.
No, I think our intent is that using a deprecated language version can
generate warnings, but that it should still work; whereas once we remove
support for a language version, then it really is ok for Spark developers to
do things not compatible with that version and for users attempting to use
that version to encounter errors.

With that understanding, the first steps toward removing support for Scala
2.10 and/or Java 7 would be to deprecate them in 2.1.0.  Actual removal of
support could then occur at the earliest in 2.2.0.
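
To make that distinction concrete: a deprecation in this sense could be as
small as a startup check that warns on an old JVM but carries on, along these
lines (an illustrative sketch only; DeprecationCheck is a made-up name, not
actual Spark code):

    // Sketch only: "deprecated" = warn, but keep working.
    // DeprecationCheck is a hypothetical name, not a real Spark class.
    object DeprecationCheck {
      def warnIfDeprecatedJava(): Unit = {
        val version = System.getProperty("java.version") // e.g. "1.7.0_80"
        if (version.startsWith("1.7")) {
          // Deprecated: the job still runs, the user just gets told.
          System.err.println(
            "WARNING: Java 7 support is deprecated as of Spark 2.0 and may " +
            "be removed in a future release; please upgrade to Java 8.")
        }
      }
    }

Removal, by contrast, means the check disappears along with the compatibility
it guarded: once the build targets Java 8 bytecode, a Java 7 JVM fails to
load the classes at all.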


Re: Straw poll: dropping support for things like Scala 2.10

Posted by Nicholas Chammas <ni...@gmail.com>.
FYI: Support for both Python 2.6 and Java 7 was deprecated in 2.0 (see release
notes <http://spark.apache.org/releases/spark-release-2-0-0.html> under
Deprecations). The deprecation notice didn't offer a specific timeline for
completely dropping support other than to say they "might be removed in
future versions of Spark 2.x".

Not sure what the distinction between deprecating and dropping support is
for language versions, since in both cases it seems like it's OK to do
things not compatible with the deprecated versions.

Nick
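
As an aside, for anyone unsure which of these deprecations applies to their
deployment, the versions in play are easy to check from spark-shell (a
convenience snippet, not something from the release notes):

    // Run in spark-shell: prints the versions the current session runs on.
    println("Spark: " + sc.version)                                // e.g. 2.0.1
    println("Scala: " + scala.util.Properties.versionNumberString) // e.g. 2.11.8
    println("Java:  " + System.getProperty("java.version"))        // e.g. 1.7.0_80

For PySpark, sys.version on the driver answers the same question for Python.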



Re: Straw poll: dropping support for things like Scala 2.10

Posted by Holden Karau <ho...@pigscanfly.ca>.
I'd also like to add Python 2.6 to the list of things. We've considered
dropping it before but never followed through to the best of my knowledge
(although on mobile right now so can't double check).


-- 
Cell : 425-233-8271
Twitter: https://twitter.com/holdenkarau