Posted to issues@spark.apache.org by "Dongjoon Hyun (JIRA)" <ji...@apache.org> on 2017/07/05 17:43:00 UTC
[jira] [Commented] (SPARK-21316) Dataset Union output is not consistent with the column sequence
[ https://issues.apache.org/jira/browse/SPARK-21316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16075129#comment-16075129 ]
Dongjoon Hyun commented on SPARK-21316:
---------------------------------------
Union assumes that the column ordering of the schema is the same for both datasets; it matches columns by position, not by name.
If you are interested in `unionByName`, please see SPARK-21043.
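
To make the difference concrete, here is a minimal plain-Java sketch of the two semantics. It is not Spark code: `UnionSketch`, `Table`, `unionByPosition` and this `unionByName` are hypothetical names used only for illustration, and the data mirrors the rows in the report. In Spark itself, the usual way to get the name-based result is to select the columns in the same order on both sides (for example `ds2.select("name", "age")`) before calling `union`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical model of a tiny table: ordered column names plus rows of values.
// This only illustrates the two union semantics; it does not use Spark.
public class UnionSketch {
    static final class Table {
        final List<String> columns;
        final List<List<String>> rows;

        Table(List<String> columns, List<List<String>> rows) {
            this.columns = columns;
            this.rows = rows;
        }
    }

    // Positional union: keeps the left table's column names and appends the
    // right table's rows unchanged, whatever its columns are called.
    static Table unionByPosition(Table left, Table right) {
        List<List<String>> rows = new ArrayList<>(left.rows);
        rows.addAll(right.rows);
        return new Table(left.columns, rows);
    }

    // Name-based union: reorders every right-hand row to follow the left
    // table's column order before appending it.
    static Table unionByName(Table left, Table right) {
        List<List<String>> rows = new ArrayList<>(left.rows);
        for (List<String> row : right.rows) {
            List<String> reordered = new ArrayList<>();
            for (String column : left.columns) {
                int index = right.columns.indexOf(column);
                if (index < 0) {
                    throw new IllegalArgumentException("Missing column: " + column);
                }
                reordered.add(row.get(index));
            }
            rows.add(reordered);
        }
        return new Table(left.columns, rows);
    }

    public static void main(String[] args) {
        // Left side has columns (name, age); right side has (age, name).
        Table left = new Table(Arrays.asList("name", "age"),
                Arrays.asList(Arrays.asList("kaushal", "25"), Arrays.asList("aman", "26")));
        Table right = new Table(Arrays.asList("age", "name"),
                Arrays.asList(Arrays.asList("25", "sapan"), Arrays.asList("26", "yati")));

        // Positional: the right-hand values land under the wrong headers.
        System.out.println(unionByPosition(left, right).rows);
        // By name: the right-hand values are realigned to (name, age).
        System.out.println(unionByName(left, right).rows);
    }
}
```

The positional variant reproduces the mixed-up rows shown in the report, while the name-based variant puts `sapan` and `yati` back under `name`.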
> Dataset Union output is not consistent with the column sequence
> ---------------------------------------------------------------
>
> Key: SPARK-21316
> URL: https://issues.apache.org/jira/browse/SPARK-21316
> Project: Spark
> Issue Type: Bug
> Components: Optimizer, SQL
> Affects Versions: 2.1.0
> Reporter: Kaushal Prajapati
> Priority: Critical
> Labels: patch
> Original Estimate: 168h
> Remaining Estimate: 168h
>
> If I take the union of two datasets with a similar schema, the output should remain the same even if I change the order of the columns when creating the dataset.
> I am attaching a code snippet for details.
> {code:java}
> public class Person {
>     public String name;
>     public String age;
>     public Person(String name, String age) {
>         this.name = name;
>         this.age = age;
>     }
>     public String getName() { return name; }
>     public void setName(String name) { this.name = name; }
>     public String getAge() { return age; }
>     public void setAge(String age) { this.age = age; }
> }
> {code}
> {code:java}
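> import java.util.ArrayList;
> import java.util.List;
> import org.apache.spark.sql.Dataset;
> import org.apache.spark.sql.Encoders;
> import org.apache.spark.sql.SparkSession;
>
> public class Test {
>     public static void main(String[] args) throws Exception {
>         // SparkConnection is the reporter's own helper (not shown) that returns a SparkSession.
>         SparkSession spark = SparkConnection.getSpark();
>         List<Person> list1 = new ArrayList<>();
>         list1.add(new Person("kaushal", "25"));
>         list1.add(new Person("aman", "26"));
>         List<Person> list2 = new ArrayList<>();
>         list2.add(new Person("sapan", "25"));
>         list2.add(new Person("yati", "26"));
>         Dataset<Person> ds1 = spark.createDataset(list1, Encoders.bean(Person.class));
>         Dataset<Person> ds2 = spark.createDataset(list2, Encoders.bean(Person.class));
>         ds1.show();
>         ds2.show();
>         // ds1 is reordered to (name, age) while ds2 keeps (age, name); union matches by position.
>         ds1.select("name","age").as(Encoders.bean(Person.class)).union(ds2).show();
>     }
> }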
> {code}
> Output:
> {code:java}
> +---+-------+
> |age|   name|
> +---+-------+
> | 25|kaushal|
> | 26|   aman|
> +---+-------+
> +---+-----+
> |age| name|
> +---+-----+
> | 25|sapan|
> | 26| yati|
> +---+-----+
> +-------+-----+
> |   name|  age|
> +-------+-----+
> |kaushal|   25|
> |   aman|   26|
> |     25|sapan|
> |     26| yati|
> +-------+-----+
> {code}