Posted to issues@spark.apache.org by "Takeshi Yamamuro (JIRA)" <ji...@apache.org> on 2017/05/18 13:54:04 UTC

[jira] [Commented] (SPARK-20747) Distinct in Aggregate Functions

    [ https://issues.apache.org/jira/browse/SPARK-20747?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16015776#comment-16015776 ] 

Takeshi Yamamuro commented on SPARK-20747:
------------------------------------------

Do you mean the query below?
{code}
scala> Seq((1, 1), (1, 1), (1, 1)).toDF("a", "b").createOrReplaceTempView("t")

scala> sql("""select a, avg(distinct b) from t group by a""").show
+---+---------------+
|  a|avg(DISTINCT b)|
+---+---------------+
|  1|            1.0|
+---+---------------+
{code}
It seems this syntax is already supported, though. Am I missing something?
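
The input above has only one distinct value of b, so it cannot tell avg(b) apart from avg(distinct b). A rough sketch of a more discriminating check follows; it assumes a spark-shell session where `spark` is already defined, and the view name `t2` and the column aliases are made up for illustration. It also covers the SUM, MIN and MAX forms listed in the description.

```scala
// Assumes spark-shell, where `spark` (a SparkSession) is already in scope.
import spark.implicits._

// One group (a = 1) whose b values are 1, 1, 2, i.e. two distinct values.
Seq((1, 1), (1, 1), (1, 2)).toDF("a", "b").createOrReplaceTempView("t2")

// Plain and DISTINCT aggregates side by side; aliases are illustrative only.
val result = spark.sql("""
  select a,
         sum(b)          as sum_all,
         sum(distinct b) as sum_distinct,
         avg(distinct b) as avg_distinct,
         min(distinct b) as min_distinct,
         max(distinct b) as max_distinct
  from t2
  group by a
""")

result.show()
```

If DISTINCT is honored, sum_distinct should come out as 3 (1 + 2) against sum_all = 4, and avg_distinct as 1.5. MIN and MAX are unaffected by DISTINCT by definition, so those two columns only confirm that the syntax parses.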

> Distinct in Aggregate Functions
> -------------------------------
>
>                 Key: SPARK-20747
>                 URL: https://issues.apache.org/jira/browse/SPARK-20747
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL
>    Affects Versions: 2.2.0
>            Reporter: Xiao Li
>
> {noformat}
> AVG ([DISTINCT]|[ALL] <numeric_expression>)
> MAX ([DISTINCT]|[ALL] <expression>)
> MIN ([DISTINCT]|[ALL] <expression>)
> SUM ([DISTINCT]|[ALL] <numeric_expression>)
> {noformat}
> Except for COUNT, the DISTINCT clause is not supported by Spark SQL.


