Posted to issues@spark.apache.org by "Sean Owen (Jira)" <ji...@apache.org> on 2019/08/22 01:33:00 UTC

[jira] [Commented] (SPARK-14643) Remove overloaded methods which become ambiguous in Scala 2.12

    [ https://issues.apache.org/jira/browse/SPARK-14643?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16912842#comment-16912842 ] 

Sean Owen commented on SPARK-14643:
-----------------------------------

I took another look at this. I tried implementing Josh's proposal, and while it begins to work for map(), I quickly ran into some complications. First is the visibility of the implicit conversion from Function1 to MapFunction; it now requires Scala users to import from org.apache.spark.sql._, no? Second, things like mapPartitions and MapPartitionsFunction will require a round-trip conversion between Java Iterators and Scala Iterators, which adds a little overhead, in order to pipe Scala users through the Java-specific overload. I also had trouble getting that to work in cases where mapPartitions returns an Iterator over a primitive type, which Java iterators won't support. These may be solvable, but it was getting messy, and not just in Dataset, and that makes me uneasy.

Right now, we have already kind of had Java users eat this problem if they compile against 2.12, and I haven't heard much about it. They have to cast their lambdas to MapFunction et al. to disambiguate. This isn't great, but it's not the end of the world.
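
For anyone who hasn't hit this, here is a minimal, self-contained Java sketch of the ambiguity and the cast workaround. It does not use Spark at all: Function1, MapFunction and ToyDataset below are hypothetical stand-ins that only mimic the shape of the two overloads (a Scala function type that Java sees as a single-method interface under 2.12, and the Java-specific functional interface), so treat it as an illustration under those assumptions rather than the actual Dataset signatures.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class OverloadAmbiguitySketch {

    // Stand-in (assumption) for how Java sees scala.Function1 under Scala 2.12:
    // an interface with a single abstract method, so a lambda can target it.
    @FunctionalInterface
    public interface Function1<T, R> {
        R apply(T value);
    }

    // Stand-in (assumption) for the Java-specific MapFunction interface.
    @FunctionalInterface
    public interface MapFunction<T, R> {
        R call(T value) throws Exception;
    }

    // Toy class with two map() overloads that differ only in the functional interface.
    public static final class ToyDataset<T> {
        private final List<T> rows;

        public ToyDataset(List<T> rows) {
            this.rows = new ArrayList<>(rows);
        }

        // "Scala-facing" overload.
        public <U> ToyDataset<U> map(Function1<T, U> f) {
            List<U> out = new ArrayList<>();
            for (T row : rows) {
                out.add(f.apply(row));
            }
            return new ToyDataset<>(out);
        }

        // "Java-facing" overload.
        public <U> ToyDataset<U> map(MapFunction<T, U> f) {
            List<U> out = new ArrayList<>();
            for (T row : rows) {
                try {
                    out.add(f.call(row));
                } catch (Exception e) {
                    throw new RuntimeException(e);
                }
            }
            return new ToyDataset<>(out);
        }

        public List<T> collect() {
            return new ArrayList<>(rows);
        }
    }

    public static List<Integer> lengths(List<String> words) {
        ToyDataset<String> ds = new ToyDataset<>(words);
        // ds.map(s -> s.length());  // javac rejects this: "reference to map is ambiguous"
        // Casting the lambda picks the overload explicitly.
        return ds.map((MapFunction<String, Integer>) s -> s.length()).collect();
    }

    public static void main(String[] args) {
        System.out.println(lengths(Arrays.asList("spark", "sql")));
    }
}
```

The cast works because it gives the lambda a concrete target type, so only one of the two overloads is applicable. Without it, the implicitly typed lambda fits both interfaces and the compiler has no basis for preferring one.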

Why not just have Java callers call the Function1 overload that exists, and delete the Java-specific overload? I get that it means they depend on a Scala class and that's a complication, but lambdas will hide that. Now that we require Java 8 and can accept a breaking change in Spark 3, if I'm reading [~joshrosen]'s doc correctly, that's viable? Well, they'd have to, for example, convert to Scala Iterators in the case of mapPartitions, which is the flip side of the problem above. That's quite hard for Java users. That is, map() works out just fine, but mapPartitions() in Java not so much.

I'm inclined to say... leave it? It's a minor inconvenience for Java users right now, and there are already minor inconveniences for Java users calling this Scala-based system ({{$MODULE$}}, anyone?).

> Remove overloaded methods which become ambiguous in Scala 2.12
> --------------------------------------------------------------
>
>                 Key: SPARK-14643
>                 URL: https://issues.apache.org/jira/browse/SPARK-14643
>             Project: Spark
>          Issue Type: Task
>          Components: Build, Project Infra
>    Affects Versions: 2.4.0
>            Reporter: Josh Rosen
>            Assignee: Josh Rosen
>            Priority: Major
>
> Spark 1.x's Dataset API runs into subtle source incompatibility problems for Java 8 and Scala 2.12 users when Spark is built against Scala 2.12. In a nutshell, the current API has overloaded methods whose signatures are ambiguous when resolving calls that use the Java 8 lambda syntax (only if Spark is built against Scala 2.12).
> This issue is somewhat subtle, so there's a full writeup at https://docs.google.com/document/d/1P_wmH3U356f079AYgSsN53HKixuNdxSEvo8nw_tgLgM/edit?usp=sharing which describes the exact circumstances under which the current APIs are problematic. The writeup also proposes a solution which involves the removal of certain overloads only in Scala 2.12 builds of Spark and the introduction of implicit conversions for retaining source compatibility.
> We don't need to implement any of these changes until we add Scala 2.12 support since the changes must only be applied when building against Scala 2.12 and will be done via traits + shims which are mixed in via per-Scala-version source directories (like how we handle the Scala-version-specific parts of the REPL). For now, this JIRA acts as a placeholder so that the parent JIRA reflects the complete set of tasks which need to be finished for 2.12 support.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)
