Posted to issues@spark.apache.org by "Sean Owen (JIRA)" <ji...@apache.org> on 2017/01/19 11:11:26 UTC

[jira] [Updated] (SPARK-19287) JavaPairRDD flatMapValues requires function returning Iterable, not Iterator

     [ https://issues.apache.org/jira/browse/SPARK-19287?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sean Owen updated SPARK-19287:
------------------------------
    Target Version/s: 3.0.0

For the moment, I'm targeting this at a hypothetical future Spark 3.x, because it can at least be fixed in a major release. We may do something earlier for 2.x.

> JavaPairRDD flatMapValues requires function returning Iterable, not Iterator
> ----------------------------------------------------------------------------
>
>                 Key: SPARK-19287
>                 URL: https://issues.apache.org/jira/browse/SPARK-19287
>             Project: Spark
>          Issue Type: Bug
>          Components: Java API
>    Affects Versions: 2.1.1
>            Reporter: Sean Owen
>            Priority: Minor
>
> SPARK-3369 corrected an old oversight in the Java API, wherein {{FlatMapFunction}} required an {{Iterable}} rather than an {{Iterator}}. As reported by [~akrim], it seems the same problem was also overlooked in {{JavaPairRDD}} (https://github.com/apache/spark/blob/6c00c069e3c3f5904abd122cea1d56683031cca0/core/src/main/scala/org/apache/spark/api/java/JavaPairRDD.scala#L677 ):
> {code}
> def flatMapValues[U](f: JFunction[V, java.lang.Iterable[U]]): JavaPairRDD[K, U] =
> {code}
> As in {{PairRDDFunctions.scala}}, whose {{flatMapValues}} operates on {{TraversableOnce}}, this should take a function that returns an {{Iterator}}, that is, a {{FlatMapFunction}}.
> We can easily add an overload and deprecate the existing method.
> {code}
> def flatMapValues[U](f: FlatMapFunction[V, U]): JavaPairRDD[K, U]
> {code}
> This is source- and binary-backwards-compatible in Java 7. In Java 8 it is binary-backwards-compatible but not source-compatible. The following natural usage with Java 8 lambdas becomes ambiguous and won't compile. Unfortunately, the compiler cannot work out which functional interface the lambda should implement, even from its return type:
> {code}
> JavaPairRDD<Integer, String> pairRDD = ...
> JavaPairRDD<Integer, Integer> mappedRDD = 
>   pairRDD.flatMapValues(s -> Arrays.asList(s.length()).iterator());
> {code}
> The ambiguity can be resolved by explicitly casting the lambda to the intended function type.
> We can at least document this. One day in Spark 3.x this can just be changed outright.
> Alternatively, the new method could be given a different name, such as "flatMapValues2", although that is ugly.
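
To illustrate the cast workaround mentioned in the description, here is a minimal sketch that does not depend on Spark. The interfaces {{IterableFn}} and {{IteratorFn}} and the List-based {{flatMapValues}} are hypothetical stand-ins (assumptions for illustration only) for Spark's {{Function<V, Iterable<U>>}}, {{FlatMapFunction<V, U>}} and the two {{JavaPairRDD}} overloads; they are not Spark's actual API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

class FlatMapValuesOverloadSketch {

    // Hypothetical stand-in for the existing parameter type (function returning Iterable).
    interface IterableFn<V, U> {
        Iterable<U> call(V value);
    }

    // Hypothetical stand-in for FlatMapFunction (function returning Iterator).
    interface IteratorFn<V, U> {
        Iterator<U> call(V value);
    }

    // Existing-style overload: consumes the Iterable returned for each value.
    static <V, U> List<U> flatMapValues(List<V> values, IterableFn<V, U> f) {
        List<U> out = new ArrayList<>();
        for (V v : values) {
            for (U u : f.call(v)) {
                out.add(u);
            }
        }
        return out;
    }

    // Proposed-style overload: consumes the Iterator returned for each value.
    static <V, U> List<U> flatMapValues(List<V> values, IteratorFn<V, U> f) {
        List<U> out = new ArrayList<>();
        for (V v : values) {
            Iterator<U> it = f.call(v);
            while (it.hasNext()) {
                out.add(it.next());
            }
        }
        return out;
    }

    static List<Integer> lengths(List<String> words) {
        // With both overloads present, the uncast call below is the ambiguous
        // form described in the issue, so it is left commented out:
        // return flatMapValues(words, s -> Arrays.asList(s.length()).iterator());

        // The explicit cast names the functional interface, so only one overload applies.
        return flatMapValues(words,
                (IteratorFn<String, Integer>) s -> Arrays.asList(s.length()).iterator());
    }

    static List<Integer> lengthsViaIterable(List<String> words) {
        // The older Iterable-returning overload stays reachable in the same way.
        return flatMapValues(words,
                (IterableFn<String, Integer>) s -> Arrays.asList(s.length()));
    }
}
```

Against the real API, the equivalent cast would presumably be to {{FlatMapFunction<String, Integer>}}, assuming the overload is added with the signature proposed above.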
