Posted to issues@spark.apache.org by "Dongjoon Hyun (JIRA)" <ji...@apache.org> on 2017/07/05 17:43:00 UTC
[jira] [Commented] (SPARK-21316) Dataset Union output is not consistent with the column sequence

    [ https://issues.apache.org/jira/browse/SPARK-21316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16075129#comment-16075129 ] 

Dongjoon Hyun commented on SPARK-21316:
---------------------------------------

Union resolves columns by position, not by name, so it assumes that both datasets have the same column order in their schemas.
If you are interested in `unionByName`, please see SPARK-21043.
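
To illustrate the difference, here is a minimal plain-Java sketch of positional versus name-aligned union. It is only a model of the behavior described above, not Spark code: the class `UnionModel` and its methods `unionByPosition` and `reorder` are hypothetical names made up for this example, and the rows are the ones from the report below.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical model of positional vs. name-aligned union; not Spark code.
class UnionModel {

    // Positional union: rows are concatenated as-is, so the result keeps the
    // left side's column names no matter what the right side's were called.
    static List<List<String>> unionByPosition(List<List<String>> left,
                                              List<List<String>> right) {
        List<List<String>> out = new ArrayList<>(left);
        out.addAll(right);
        return out;
    }

    // Rearranges each row from the column layout `from` to the layout `to`,
    // matching columns by name. This models selecting the columns in a fixed
    // order before taking the union.
    static List<List<String>> reorder(List<String> from, List<String> to,
                                      List<List<String>> rows) {
        List<List<String>> out = new ArrayList<>();
        for (List<String> row : rows) {
            List<String> aligned = new ArrayList<>();
            for (String col : to) {
                int idx = from.indexOf(col);
                if (idx < 0) {
                    throw new IllegalArgumentException("missing column: " + col);
                }
                aligned.add(row.get(idx));
            }
            out.add(aligned);
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> leftCols = Arrays.asList("name", "age");
        List<String> rightCols = Arrays.asList("age", "name");
        List<List<String>> left = Arrays.asList(
            Arrays.asList("kaushal", "25"), Arrays.asList("aman", "26"));
        List<List<String>> right = Arrays.asList(
            Arrays.asList("25", "sapan"), Arrays.asList("26", "yati"));

        // Positional: the right-hand rows land under the wrong headers.
        System.out.println(leftCols + " " + unionByPosition(left, right));
        // Aligned by name first: each value ends up under its own header.
        System.out.println(leftCols + " "
            + unionByPosition(left, reorder(rightCols, leftCols, right)));
    }
}
```

In Spark itself, the corresponding workaround would be to put both sides into the same column order before the union, for example `ds1.select("name", "age").union(ds2.select("name", "age"))`.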

> Dataset Union output is not consistent with the column sequence
> ---------------------------------------------------------------
>
>                 Key: SPARK-21316
>                 URL: https://issues.apache.org/jira/browse/SPARK-21316
>             Project: Spark
>          Issue Type: Bug
>          Components: Optimizer, SQL
>    Affects Versions: 2.1.0
>            Reporter: Kaushal Prajapati
>            Priority: Critical
>              Labels: patch
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>
> If I take the union of two datasets with the same schema, the output should remain the same even if I change the order of the columns while creating the dataset.
> I am attaching a code snippet for details.
> {code:java}
> public class Person{
>   public String name;
>   public String age;
>   public Person(String name, String age) {
>     this.name = name;
>     this.age = age;
>   }
>   public String getName() {return name;}
>   public void setName(String name) {this.name = name;}
>   public String getAge() {return age;}
>   public void setAge(String age) {this.age = age;}
> }
> {code}
> {code:java}
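> import java.util.ArrayList;
> import java.util.List;
> import org.apache.spark.sql.Dataset;
> import org.apache.spark.sql.Encoders;
> import org.apache.spark.sql.SparkSession;
>
> public class Test {
>   public static void main(String[] args) throws Exception {
>     // SparkConnection is the reporter's own helper class (not shown here).
>     SparkSession spark = SparkConnection.getSpark();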
>     List<Person> list1 = new ArrayList<>();
>     list1.add(new Person("kaushal", "25"));
>     list1.add(new Person("aman", "26"));
>     List<Person> list2 = new ArrayList<>();
>     list2.add(new Person("sapan", "25"));
>     list2.add(new Person("yati", "26"));
>     Dataset<Person> ds1 = spark.createDataset(list1, Encoders.bean(Person.class));
>     Dataset<Person> ds2 = spark.createDataset(list2, Encoders.bean(Person.class));
>     ds1.show();
>     ds2.show();
>     ds1.select("name","age").as(Encoders.bean(Person.class)).union(ds2).show();
>   }
> }
> {code}
> Output:
> {code:java}
> +---+-------+
> |age|   name|
> +---+-------+
> | 25|kaushal|
> | 26|   aman|
> +---+-------+
> +---+-----+
> |age| name|
> +---+-----+
> | 25|sapan|
> | 26| yati|
> +---+-----+
> +-------+-----+
> |   name|  age|
> +-------+-----+
> |kaushal|   25|
> |   aman|   26|
> |     25|sapan|
> |     26| yati|
> +-------+-----+
> {code}


