Posted to issues@spark.apache.org by "Jakob Odersky (JIRA)" <ji...@apache.org> on 2015/12/08 00:30:11 UTC
[jira] [Commented] (SPARK-9502) ArrayTypes incorrect for DataFrames Java API
[ https://issues.apache.org/jira/browse/SPARK-9502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15045984#comment-15045984 ]
Jakob Odersky commented on SPARK-9502:
--------------------------------------
Not sure this is actually an error. As I see it, an ArrayType is implementation-specific: no assumptions should be made about the concrete collection type returned, other than that it is traversable.
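Concretely, the assertion failure in the quoted test below follows from java.util.List equality semantics: AbstractList.equals returns true only when the other object is itself a java.util.List, so a Scala scala.collection.immutable.List (printed as scala.collection.immutable.$colon$colon) can never equal an ArrayList, even with identical elements. A minimal Spark-free sketch, using a hypothetical SeqLike wrapper to stand in for the Scala collection:

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class ListEqualityDemo {
    // Stand-in for a non-java.util.List collection (like Scala's immutable
    // List): it holds the same elements but does not implement java.util.List.
    static class SeqLike implements Iterable<String> {
        private final List<String> elems;

        SeqLike(List<String> elems) {
            this.elems = elems;
        }

        public Iterator<String> iterator() {
            return elems.iterator();
        }

        @Override
        public boolean equals(Object o) {
            // Even an element-wise equals here cannot make the reverse
            // comparison succeed: AbstractList.equals first checks
            // (o instanceof List), which a SeqLike fails.
            return o instanceof SeqLike && elems.equals(((SeqLike) o).elems);
        }

        @Override
        public int hashCode() {
            return elems.hashCode();
        }
    }

    public static void main(String[] args) {
        List<String> javaList = Arrays.asList("eating", "sleeping");
        SeqLike scalaLike = new SeqLike(javaList);
        // Same elements, but neither direction compares equal:
        System.out.println(javaList.equals(scalaLike)); // false
        System.out.println(scalaLike.equals(javaList)); // false
    }
}
```

Later Spark releases expose Row.getList(int) as a Java-friendly accessor that returns a java.util.List regardless of the underlying representation; whether that is available in 1.4.1 would need checking.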
> ArrayTypes incorrect for DataFrames Java API
> --------------------------------------------
>
> Key: SPARK-9502
> URL: https://issues.apache.org/jira/browse/SPARK-9502
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 1.4.1
> Reporter: Kuldeep
> Priority: Critical
>
> With the upgrade to 1.4.1, array types for DataFrames were different in our Java applications. I have modified JavaApplySchemaSuite to show the problem: mainly, I have added a list field to the Person class.
> {code:java}
> public static class Person implements Serializable {
>     private String name;
>     private int age;
>     private List<String> skills;
>
>     public String getName() {
>         return name;
>     }
>
>     public void setName(String name) {
>         this.name = name;
>     }
>
>     public int getAge() {
>         return age;
>     }
>
>     public void setAge(int age) {
>         this.age = age;
>     }
>
>     public void setSkills(List<String> skills) {
>         this.skills = skills;
>     }
>
>     public List<String> getSkills() {
>         return skills;
>     }
> }
>
> @Test
> public void applySchema() {
>     List<Person> personList = new ArrayList<Person>(2);
>     List<String> skills = new ArrayList<String>();
>     skills.add("eating");
>     skills.add("sleeping");
>
>     Person person1 = new Person();
>     person1.setName("Michael");
>     person1.setAge(29);
>     person1.setSkills(skills);
>     personList.add(person1);
>
>     Person person2 = new Person();
>     person2.setName("Yin");
>     person2.setAge(28);
>     person2.setSkills(skills);
>     personList.add(person2);
>
>     JavaRDD<Row> rowRDD = javaCtx.parallelize(personList).map(
>         new Function<Person, Row>() {
>             public Row call(Person person) throws Exception {
>                 return RowFactory.create(person.getName(), person.getAge(), person.getSkills());
>             }
>         });
>
>     List<StructField> fields = new ArrayList<StructField>(2);
>     fields.add(DataTypes.createStructField("name", DataTypes.StringType, false));
>     fields.add(DataTypes.createStructField("age", DataTypes.IntegerType, false));
>     fields.add(DataTypes.createStructField("skills", DataTypes.createArrayType(DataTypes.StringType), false));
>     StructType schema = DataTypes.createStructType(fields);
>
>     DataFrame df = sqlContext.applySchema(rowRDD, schema);
>     df.registerTempTable("people");
>     Row[] actual = sqlContext.sql("SELECT * FROM people").collect();
>
>     System.out.println(actual[1].get(2).getClass().getName());
>     System.out.println(actual[1].get(2) instanceof List);
>
>     List<Row> expected = new ArrayList<Row>(2);
>     expected.add(RowFactory.create("Michael", 29, skills));
>     expected.add(RowFactory.create("Yin", 28, skills));
>     Assert.assertEquals(expected, Arrays.asList(actual));
> }
> {code}
> This prints
> {noformat}
> scala.collection.immutable.$colon$colon
> false
> java.lang.AssertionError:
> Expected :[[Michael,29,[eating, sleeping]], [Yin,28,[eating, sleeping]]]
> Actual   :[[Michael,29,List(eating, sleeping)], [Yin,28,List(eating, sleeping)]]
> {noformat}
> Not sure if this would be usable even in Scala.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org