Posted to dev@spark.apache.org by Olivier Girardot <o....@lateral-thoughts.com> on 2015/04/17 15:07:06 UTC

[Spark SQL] Java map/flatMap api broken with DataFrame in 1.3.{0,1}

Hi everyone,
I ran into an issue trying to use Spark SQL from Java (8 or 7). I tried to
reproduce it in a small test case close to the actual documentation
<https://spark.apache.org/docs/latest/sql-programming-guide.html#inferring-the-schema-using-reflection>,
so sorry for the long mail, but this is "Java":

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.DataFrame;
import org.apache.spark.sql.SQLContext;

import java.io.Serializable;
import java.util.ArrayList;

class Movie implements Serializable {
    private int id;
    private String name;

    public Movie(int id, String name) {
        this.id = id;
        this.name = name;
    }

    public int getId() {
        return id;
    }

    public void setId(int id) {
        this.id = id;
    }

    public String getName() {
        return name;
    }

    public void setName(String name) {
        this.name = name;
    }
}

public class SparkSQLTest {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf();
        conf.setAppName("My Application");
        conf.setMaster("local");
        JavaSparkContext sc = new JavaSparkContext(conf);

        ArrayList<Movie> movieArrayList = new ArrayList<Movie>();
        movieArrayList.add(new Movie(1, "Indiana Jones"));

        JavaRDD<Movie> movies = sc.parallelize(movieArrayList);

        SQLContext sqlContext = new SQLContext(sc);
        DataFrame frame = sqlContext.applySchema(movies, Movie.class);
        frame.registerTempTable("movies");

        sqlContext.sql("select name from movies")

*                .map(row -> row.getString(0)) // this is what i would
expect to work *                .collect();
    }
}


But this does not compile; here's the compilation error:

[ERROR] /Users/ogirardot/Documents/spark/java-project/src/main/java/org/apache/spark/MainSQL.java:[37,47]
method map in class org.apache.spark.sql.DataFrame cannot be applied to given types;
[ERROR] required: scala.Function1<org.apache.spark.sql.Row,R>,scala.reflect.ClassTag<R>
[ERROR] found: (row)->"Na[...]ng(0)
[ERROR] reason: cannot infer type-variable(s) R
[ERROR] (actual and formal argument lists differ in length)
[ERROR] /Users/ogirardot/Documents/spark/java-project/src/main/java/org/apache/spark/SampleSHit.java:[56,17]
method map in class org.apache.spark.sql.DataFrame cannot be applied to given types;
[ERROR] required: scala.Function1<org.apache.spark.sql.Row,R>,scala.reflect.ClassTag<R>
[ERROR] found: (row)->row[...]ng(0)
[ERROR] reason: cannot infer type-variable(s) R
[ERROR] (actual and formal argument lists differ in length)
[ERROR] -> [Help 1]

Because in DataFrame the *map* method is defined as:

  override def map[R: ClassTag](f: Row => R): RDD[R] = rdd.map(f)

Once this is compiled to bytecode, the actual Java signature takes a
Function1 plus an extra ClassTag parameter.
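
Seen from Java, the erased signature looks roughly like this (a sketch, not
verbatim javap output; the interface name is made up for illustration):

import org.apache.spark.rdd.RDD;
import org.apache.spark.sql.Row;
import scala.Function1;
import scala.reflect.ClassTag;

// Sketch: the [R: ClassTag] context bound compiles down to a trailing
// explicit ClassTag parameter, which a plain Java lambda cannot supply
// implicitly.
interface DataFrameMapAsSeenFromJava {
    <R> RDD<R> map(Function1<Row, R> f, ClassTag<R> evidence);
}
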
I can try to work around this and use scala.reflect.ClassTag$ like this:

ClassTag$.MODULE$.apply(String.class)

to get the second ClassTag parameter right, but then instantiating a
java.util.function.Function or using a Java 8 lambda fails to work, and if
I try to instantiate a proper scala Function1... well, this is a world of
pain.
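
To make the pain concrete, here is roughly what the workaround forces you
to write (a sketch against the test case above; note that anonymous Java
subclasses of scala.runtime.AbstractFunction1 are not Serializable, so even
this tends to die later in Spark's closure serialization):

import org.apache.spark.rdd.RDD;
import org.apache.spark.sql.Row;
import scala.reflect.ClassTag$;
import scala.runtime.AbstractFunction1;

// Sketch only: satisfy both the Function1 and the ClassTag parameter by hand.
RDD<String> names = sqlContext.sql("select name from movies").map(
        new AbstractFunction1<Row, String>() {
            @Override
            public String apply(Row row) {
                return row.getString(0);
            }
        },
        ClassTag$.MODULE$.apply(String.class));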

This is a regression introduced by the 1.3.x DataFrame: JavaSchemaRDD used
to be a JavaRDDLike, but DataFrames are not (and are not callable with
JFunctions). Shall I open a Jira?

Regards,

-- 
Olivier Girardot | Partner
o.girardot@lateral-thoughts.com
+33 6 24 09 17 94

Re: [Spark SQL] Java map/flatMap api broken with DataFrame in 1.3.{0,1}

Posted by Olivier Girardot <o....@lateral-thoughts.com>.
Another PR, I guess :) Here's the associated Jira:
https://issues.apache.org/jira/browse/SPARK-6988

On Fri, Apr 17, 2015 at 11:00 PM, Reynold Xin <rx...@databricks.com> wrote:

> No, there isn't a convention. Although if you want to show Java 8, you
> should also show Java 6/7 syntax, since there are still more Java 7 users
> than Java 8.

Re: [Spark SQL] Java map/flatMap api broken with DataFrame in 1.3.{0,1}

Posted by Olivier Girardot <o....@lateral-thoughts.com>.
And the PR: https://github.com/apache/spark/pull/5564

Thank you!

Olivier.

On Fri, Apr 17, 2015 at 11:00 PM, Reynold Xin <rx...@databricks.com> wrote:

> No, there isn't a convention. Although if you want to show Java 8, you
> should also show Java 6/7 syntax, since there are still more Java 7 users
> than Java 8.

Re: [Spark SQL] Java map/flatMap api broken with DataFrame in 1.3.{0,1}

Posted by Reynold Xin <rx...@databricks.com>.
No, there isn't a convention. Although if you want to show Java 8, you
should also show Java 6/7 syntax, since there are still more Java 7 users
than Java 8.
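
For the docs, the Java 6/7 flavour of the .javaRDD() workaround suggested
earlier in this thread would look roughly like this (a sketch against the
Movie test case from the first mail):

import java.util.List;
import org.apache.spark.api.java.function.Function;
import org.apache.spark.sql.Row;

// Java 6/7 syntax: an anonymous class implementing Spark's Java Function
// interface instead of a Java 8 lambda.
List<String> names = sqlContext.sql("select name from movies")
        .javaRDD()
        .map(new Function<Row, String>() {
            @Override
            public String call(Row row) {
                return row.getString(0);
            }
        })
        .collect();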


On Fri, Apr 17, 2015 at 3:36 PM, Olivier Girardot <
o.girardot@lateral-thoughts.com> wrote:

> Is there any convention *not* to show Java 8 versions in the
> documentation?

Re: [Spark SQL] Java map/flatMap api broken with DataFrame in 1.3.{0,1}

Posted by Olivier Girardot <o....@lateral-thoughts.com>.
Is there any convention *not* to show Java 8 versions in the documentation?

On Fri, Apr 17, 2015 at 9:39 PM, Reynold Xin <rx...@databricks.com> wrote:

> Please do! Thanks.

Re: [Spark SQL] Java map/flatMap api broken with DataFrame in 1.3.{0,1}

Posted by Reynold Xin <rx...@databricks.com>.
Please do! Thanks.


On Fri, Apr 17, 2015 at 2:36 PM, Olivier Girardot <
o.girardot@lateral-thoughts.com> wrote:

> OK, do you want me to open a pull request to fix the dedicated
> documentation?

Re: [Spark SQL] Java map/flatMap api broken with DataFrame in 1.3.{0,1}

Posted by Olivier Girardot <o....@lateral-thoughts.com>.
OK, do you want me to open a pull request to fix the dedicated
documentation?

On Fri, Apr 17, 2015 at 6:14 PM, Reynold Xin <rx...@databricks.com> wrote:

> I think in 1.3 and above, you'd need to do
>
> .sql(...).javaRDD().map(..)

Re: [Spark SQL] Java map/flatMap api broken with DataFrame in 1.3.{0,1}

Posted by Reynold Xin <rx...@databricks.com>.
I think in 1.3 and above, you'd need to do

.sql(...).javaRDD().map(..)
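
Concretely, against the test case from the first mail (a sketch; sqlContext
and the "movies" temp table are the ones set up there):

import java.util.List;

// Converting the DataFrame to a JavaRDD<Row> first exposes the Java-friendly
// map(Function<Row, R>) overload, so the Java 8 lambda compiles.
List<String> names = sqlContext.sql("select name from movies")
        .javaRDD()
        .map(row -> row.getString(0))
        .collect();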

On Fri, Apr 17, 2015 at 9:22 AM, Olivier Girardot <
o.girardot@lateral-thoughts.com> wrote:

> Yes, thanks!

Re: [Spark SQL] Java map/flatMap api broken with DataFrame in 1.3.{0,1}

Posted by Olivier Girardot <o....@lateral-thoughts.com>.
Yes, thanks!

On Fri, Apr 17, 2015 at 4:20 PM, Ted Yu <yu...@gmail.com> wrote:

> The image didn't go through.
>
> I think you were referring to:
>   override def map[R: ClassTag](f: Row => R): RDD[R] = rdd.map(f)
>
> Cheers

Re: [Spark SQL] Java map/flatMap api broken with DataFrame in 1.3.{0,1}

Posted by Ted Yu <yu...@gmail.com>.
The image didn't go through.

I think you were referring to:
  override def map[R: ClassTag](f: Row => R): RDD[R] = rdd.map(f)

Cheers
