You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@spark.apache.org by "Will Zimmerman (Jira)" <ji...@apache.org> on 2020/06/30 20:53:00 UTC

[jira] [Commented] (SPARK-28664) ORDER BY in aggregate function

    [ https://issues.apache.org/jira/browse/SPARK-28664?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17148952#comment-17148952 ] 

Will Zimmerman commented on SPARK-28664:
----------------------------------------

[~yumwang] - Would this allow for the changing of Null ordering (e.g. NULL FIRST or NULL LAST)? Currently when evaluating the MIN() of a structure, NULL appears to be taken as the minimum possible value, where as it would be nice if the NULL ordering could be changed or evaluation of structs could be similar to MIN() of a value (where NULL are ignored unless that is the only option). Also, I can't find in Spark documentation what the expected behavior of NULL values is within a struct. The documention I'm referring to can be found [https://spark.apache.org/docs/3.0.0-preview/sql-ref-null-semantics.html].


{code:java}
SELECT ID 
   ,COLLECT_SET(STRUCT(x,y)) AS collection
   ,MIN(x) AS min_of_x
   ,MIN(y) AS min_of_y
   ,MIN(STRUCT(x,y)) AS min_of_collection
   ,MAX(STRUCT(x,y)) AS max_of_collection
FROM (values(1234390, 12.0, 'string_1'), (1234390, 37.4, 'string_2'), (1234390, 6.9, NULL), (1234390, 3.1, 'string_3'), (1234390, NULL, 'string_4'), (1234390, NULL, NULL)) AS d(ID,x,y)
GROUP BY 1
{code}
Result of Spark SQL query in Spark 2.4.4 can be seen below, where the desired outcome would be (3.1, string_3).

!image-2020-06-30-15-49-46-796.png!

> ORDER BY in aggregate function
> ------------------------------
>
>                 Key: SPARK-28664
>                 URL: https://issues.apache.org/jira/browse/SPARK-28664
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL
>    Affects Versions: 3.1.0
>            Reporter: Yuming Wang
>            Priority: Major
>         Attachments: image-2020-06-30-15-49-46-796.png
>
>
> {code:sql}
> SELECT min(x ORDER BY y) FROM (VALUES(1, NULL)) AS d(x,y);
> SELECT min(x ORDER BY y) FROM (VALUES(1, 2)) AS d(x,y);
> {code}
> https://github.com/postgres/postgres/blob/44e95b5728a4569c494fa4ea4317f8a2f50a206b/src/test/regress/sql/aggregates.sql#L978-L982



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org