Posted to dev@spark.apache.org by Sean Owen <sr...@apache.org> on 2019/12/01 18:50:07 UTC

Status of Scala 2.13 support

As you can see, I've been working on Scala 2.13 support. The umbrella
issue is https://issues.apache.org/jira/browse/SPARK-25075 . I wanted to
lay out status and strategy.

This will not be done for 3.0. At the least, there are a few key
dependencies (Chill, Kafka) that aren't yet published for 2.13, and at
least one change that requires removing an API deprecated as of 3.0.
Realistically: maybe Spark 3.1. I don't yet think it's pressing.


Making the change is difficult as it's hard to understand the extent
of the necessary changes until the whole thing minimally compiles for
2.13. I have gotten essentially that far in a local clone. The good
news is I don't see any obvious hard blockers, but the changes add up
to thousands of lines in 200+ files.


What do we need to do for 3.0? Any changes that entail breaking a
public API, ideally. The biggest issue there is that extensive
changes to the Scala collection hierarchy mean that the types of many
public APIs that return a Seq, Map, TraversableOnce, etc. _will_
actually change in 2.13 (become immutable). See:
https://issues.apache.org/jira/browse/SPARK-27683 and
https://issues.apache.org/jira/browse/SPARK-29292 as the main
examples.
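
To make the type change concrete, here is a minimal sketch. The object
and method names are hypothetical and are not Spark APIs. It relies on
the fact that the unqualified `Seq` is scala.collection.Seq in 2.12 and
scala.collection.immutable.Seq in 2.13, so the same source declares a
different public type depending on the Scala version:

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical example object, not part of Spark.
object SeqAliasDemo {
  // The declared return type is the unqualified `Seq`. Its meaning differs:
  // scala.collection.Seq on 2.12, scala.collection.immutable.Seq on 2.13.
  def names(): Seq[String] = {
    val buf = ArrayBuffer("a", "b")
    // On 2.12, returning `buf` directly would type-check, because an
    // ArrayBuffer is a scala.collection.Seq. On 2.13 it would not,
    // so the conversion is written out explicitly.
    buf.toSeq
  }

  // Spelling out the package pins the type to the same thing on both
  // versions, at the cost of a more verbose signature.
  def namesPinned(): scala.collection.Seq[String] = ArrayBuffer("a", "b")
}
```

Callers that only read the result see the same elements either way;
what differs is the static type in the signature, which is why the
change is a concern for binary compatibility rather than for behavior.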

In both cases, keeping the exact same public type would require much
bigger changes. These are the kinds of changes that all applications
face when migrating to 2.13, though. 2.12 and 2.13 apps were never
meant to be binary-compatible. So, in both cases we're not changing
these, to avoid a lot of change and parallel source trees.

I _think_ we're done with any other must-do changes for 3.0, therefore.


What _can_ we do for 3.0? Small changes that don't affect the 2.12
build are OK, and that's what you see in pull requests going in at the
moment. The big question is whether we want to do the large change for
https://issues.apache.org/jira/browse/SPARK-29292 before 3.0. It will
mean adding a ton of ".toSeq" and ".toMap" calls to make mutable
collections immutable when passed to methods. In theory, it won't
affect behavior. We'll have to see if it does in practice.
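
As a rough illustration of what those calls look like, here is a
sketch with hypothetical names (`ToSeqDemo`, `describe`, `run`), not
taken from the Spark code base. It assumes a method whose parameters
use the default `Seq` and `Map` types and a caller that builds its
arguments in mutable collections:

```scala
import scala.collection.mutable

// Hypothetical example object, not part of Spark.
object ToSeqDemo {
  // Parameters use the default `Seq` and `Map`. On 2.13 the default `Seq`
  // is immutable; the default `Map` is immutable on both 2.12 and 2.13.
  def describe(cols: Seq[String], opts: Map[String, String]): String = {
    val optText = opts.map { case (k, v) => s"$k=$v" }.mkString(";")
    cols.mkString(",") + " [" + optText + "]"
  }

  def run(): String = {
    val cols = mutable.ArrayBuffer("id", "name")
    val opts = mutable.HashMap("mode" -> "overwrite")
    // On 2.12, `cols` could be passed as-is. On 2.13 it needs `.toSeq`.
    // A mutable map needs `.toMap` on either version.
    describe(cols.toSeq, opts.toMap)
  }
}
```

One place where behavior could differ, as far as I understand it: on
2.13, `.toSeq` on a mutable buffer makes an immutable copy, while on
2.12 it may return the buffer itself. Code that mutates the buffer
after passing it along could therefore see different results.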

The rest will have to wait until after 3.0, I believe, including even
testing the 2.13 build, which will probably turn up some more issues.


Thoughts on approach?

---------------------------------------------------------------------
To unsubscribe e-mail: dev-unsubscribe@spark.apache.org


Re: Status of Scala 2.13 support

Posted by Dongjoon Hyun <do...@gmail.com>.
Thank you for sharing the status, Sean.

Given the current circumstances, our status and approach sound realistic to
me.

+1 for continuing after cutting `branch-3.0`.

Bests,
Dongjoon.

