
Posted to issues@spark.apache.org by "Wenchen Fan (JIRA)" <ji...@apache.org> on 2016/10/13 05:30:21 UTC

[jira] [Resolved] (SPARK-17867) Dataset.dropDuplicates (i.e. distinct) should consider the columns with same column name

     [ https://issues.apache.org/jira/browse/SPARK-17867?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wenchen Fan resolved SPARK-17867.
---------------------------------
       Resolution: Fixed
    Fix Version/s: 2.1.0

Issue resolved by pull request 15427
[https://github.com/apache/spark/pull/15427]

> Dataset.dropDuplicates (i.e. distinct) should consider the columns with same column name
> ----------------------------------------------------------------------------------------
>
>                 Key: SPARK-17867
>                 URL: https://issues.apache.org/jira/browse/SPARK-17867
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>            Reporter: Liang-Chi Hsieh
>            Assignee: Liang-Chi Hsieh
>             Fix For: 2.1.0
>
>
> In Dataset.dropDuplicates, we look up each given column name in the output and take only the first resolved attribute. When more than one column has the same name, the other columns with that name are put into the aggregation columns instead of the grouping columns. We should fix this.
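
The effect described above can be illustrated with a small plain-Python model. This is only a sketch and not Spark code: the helper names (resolve_first, resolve_all, drop_duplicates) are hypothetical, and "aggregation" is modelled simply as keeping the values from the first row seen in each group. It shows how resolving only the first matching column can collapse rows that differ in a second column of the same name.

```python
# Toy model of the behaviour described in the issue; NOT Spark's implementation.
# A "table" is a list of column names (duplicates allowed) plus a list of row tuples.
# All function names here are hypothetical, chosen for illustration only.

def resolve_first(names, wanted):
    """Described problem: keep only the first column matching each name."""
    return [names.index(w) for w in wanted]

def resolve_all(names, wanted):
    """Intended behaviour: keep every column whose name matches."""
    return [i for i, n in enumerate(names) if n in wanted]

def drop_duplicates(rows, key_indices):
    """Group rows by the key columns; all other columns take the
    values of the first row seen in each group."""
    seen = {}
    for row in rows:
        key = tuple(row[i] for i in key_indices)
        seen.setdefault(key, row)  # first row per key wins
    return list(seen.values())

names = ["a", "a"]          # two columns sharing one name
rows = [(1, 1), (1, 2)]     # the rows differ only in the second "a"

buggy = drop_duplicates(rows, resolve_first(names, ["a"]))
fixed = drop_duplicates(rows, resolve_all(names, ["a"]))

print(buggy)  # [(1, 1)]
print(fixed)  # [(1, 1), (1, 2)]
```

In the first call only column 0 is used as a grouping key, so the two distinct rows fall into one group and the second is lost. With every same-named column used as a grouping key, both rows are kept.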



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org