Posted to issues@spark.apache.org by "Reynold Xin (JIRA)" <ji...@apache.org> on 2015/07/09 00:19:04 UTC

[jira] [Resolved] (SPARK-8908) Calling distinct() with parentheses throws error in Scala DataFrame

     [ https://issues.apache.org/jira/browse/SPARK-8908?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Reynold Xin resolved SPARK-8908.
--------------------------------
       Resolution: Fixed
         Assignee: Cheolsoo Park
    Fix Version/s: 1.5.0

> Calling distinct() with parentheses throws error in Scala DataFrame
> -------------------------------------------------------------------
>
>                 Key: SPARK-8908
>                 URL: https://issues.apache.org/jira/browse/SPARK-8908
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 1.4.0, 1.5.0
>            Reporter: Cheolsoo Park
>            Assignee: Cheolsoo Park
>            Priority: Minor
>             Fix For: 1.5.0
>
>
> To reproduce, call {{distinct()}} on a DataFrame in spark-shell. For example:
> {code}
> scala> sqlContext.table("my_table").distinct()
> <console>:19: error: not enough arguments for method apply: (colName: String)org.apache.spark.sql.Column in class DataFrame.
> Unspecified value parameter colName.
> {code}
> This is confusing because {{distinct}} in DataFrame is an alias of {{dropDuplicates}}, and both {{dropDuplicates}} and {{dropDuplicates()}} work.
> Here is a summary:
> ||Scala code||Works||
> |DF.distinct|Y|
> |DF.distinct()|N|
> |DF.dropDuplicates|Y|
> |DF.dropDuplicates()|Y|
> Looking at the definition of {{distinct}}, it is declared without {{()}}:
> {code}
> override def distinct: DataFrame = dropDuplicates()
> {code}
> As a result, what seems to be happening is as follows:
> {code}
> distinct()
> => dropDuplicates()()
> => DataFrame() // because dropDuplicates() returns DF
> => DataFrame.apply() // fails because apply() takes a column parameter
> {code}
> I can verify that adding {{()}} to the definition makes both {{distinct}} and {{distinct()}} work.
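
The parsing behavior described above can be reproduced without Spark. The following is a minimal sketch, not the actual DataFrame code: {{Frame}}, {{Column}}, {{distinctFixed}} and {{ParensDemo}} are hypothetical stand-ins chosen for illustration, and {{Frame}} only mimics the two relevant traits of DataFrame, namely an {{apply(colName: String)}} method and a {{distinct}} declared without a parameter list.

```scala
// Hypothetical stand-in for org.apache.spark.sql.Column.
final case class Column(name: String)

// Hypothetical stand-in for DataFrame; only the members relevant to the bug.
class Frame(val rows: Seq[Int]) {
  // Like DataFrame.apply, this takes a column name.
  def apply(colName: String): Column = Column(colName)

  def dropDuplicates(): Frame = new Frame(rows.distinct)

  // Declared without a parameter list, as in the reported definition.
  def distinct: Frame = dropDuplicates()

  // Declared with an empty parameter list, as in the proposed fix.
  def distinctFixed(): Frame = dropDuplicates()
}

object ParensDemo {
  def main(args: Array[String]): Unit = {
    val f = new Frame(Seq(1, 1, 2))

    println(f.distinct.rows)        // prints List(1, 2)
    println(f.distinctFixed().rows) // prints List(1, 2)

    // The next line does not compile if uncommented: since `distinct` has no
    // parameter list, `f.distinct()` is read as `f.distinct.apply()`, and
    // `apply` requires a `colName` argument.
    // f.distinct()
  }
}
```

In the Scala 2 versions used by Spark at the time, a method declared with an empty parameter list can also be called without parentheses, which would explain why both {{distinct}} and {{distinct()}} work once {{()}} is added to the definition.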


