Posted to issues@spark.apache.org by "Sean Owen (JIRA)" <ji...@apache.org> on 2017/01/19 11:09:27 UTC

[jira] [Created] (SPARK-19287) JavaPairRDD flatMapValues requires function returning Iterable, not Iterator

Sean Owen created SPARK-19287:
---------------------------------

             Summary: JavaPairRDD flatMapValues requires function returning Iterable, not Iterator
                 Key: SPARK-19287
                 URL: https://issues.apache.org/jira/browse/SPARK-19287
             Project: Spark
          Issue Type: Bug
          Components: Java API
    Affects Versions: 2.1.1
            Reporter: Sean Owen
            Priority: Minor


SPARK-3369 corrected an old oversight in the Java API, wherein {{FlatMapFunction}} required an {{Iterable}} rather than an {{Iterator}}. As reported by [~akrim], the same problem was also overlooked in {{JavaPairRDD}} (https://github.com/apache/spark/blob/6c00c069e3c3f5904abd122cea1d56683031cca0/core/src/main/scala/org/apache/spark/api/java/JavaPairRDD.scala#L677):

{code}
def flatMapValues[U](f: JFunction[V, java.lang.Iterable[U]]): JavaPairRDD[K, U] =
{code}

As in {{PairRDDFunctions.scala}}, whose {{flatMapValues}} operates on a {{TraversableOnce}}, this should take a function returning an {{Iterator}} -- that is, a {{FlatMapFunction}}.

We can easily add an overload and deprecate the existing method.

{code}
def flatMapValues[U](f: FlatMapFunction[V, U]): JavaPairRDD[K, U]
{code}

This is source- and binary-backwards-compatible in Java 7, and binary-backwards-compatible but not source-compatible in Java 8. The following natural usage with a Java 8 lambda becomes ambiguous and won't compile; unfortunately, Java won't determine which interface to implement even based on the return type:

{code}
JavaPairRDD<Integer, String> pairRDD = ...
JavaPairRDD<Integer, Integer> mappedRDD = 
  pairRDD.flatMapValues(s -> Arrays.asList(s.length()).iterator());
{code}

It can be resolved by explicitly casting the lambda.
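To make the ambiguity and the cast workaround concrete, here is a standalone sketch that does not depend on Spark. The interfaces {{IterableFn}} and {{IteratorFn}} are hypothetical stand-ins mirroring {{JFunction<V, java.lang.Iterable<U>>}} and {{FlatMapFunction<V, U>}}, and the overloaded static method plays the role of {{flatMapValues}}:

```java
import java.util.Arrays;
import java.util.Iterator;

public class FlatMapValuesOverload {

    // Hypothetical stand-ins for the two Spark interfaces involved;
    // these are NOT the real Spark classes, just minimal mirrors.
    interface IterableFn<V, U> { Iterable<U> call(V v); }  // like JFunction<V, Iterable<U>>
    interface IteratorFn<V, U> { Iterator<U> call(V v); }  // like FlatMapFunction<V, U>

    // Overloads analogous to the existing and proposed flatMapValues.
    static <U> String flatMapValues(IterableFn<String, U> f) { return "iterable overload"; }
    static <U> String flatMapValues(IteratorFn<String, U> f) { return "iterator overload"; }

    public static void main(String[] args) {
        // Ambiguous -- does not compile, because overload resolution for an
        // implicitly typed lambda does not consider the body's return type:
        // flatMapValues(s -> Arrays.asList(s.length()).iterator());

        // Casting the lambda to the intended functional interface resolves it:
        String which = flatMapValues(
            (IteratorFn<String, Integer>) s -> Arrays.asList(s.length()).iterator());
        System.out.println(which);  // iterator overload
    }
}
```

The same cast works against the real API: {{pairRDD.flatMapValues((FlatMapFunction<String, Integer>) s -> ...)}}, though it negates much of the lambda's brevity.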

We can at least document this. One day in Spark 3.x this can just be changed outright.

It's conceivable to resolve this by naming the new method "flatMapValues2" or something similarly ugly.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
