Posted to issues@spark.apache.org by "Liang-Chi Hsieh (JIRA)" <ji...@apache.org> on 2017/06/05 09:01:04 UTC

[jira] [Commented] (SPARK-20969) last() aggregate function fails returning the right answer with ordered windows

    [ https://issues.apache.org/jira/browse/SPARK-20969?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16036690#comment-16036690 ] 

Liang-Chi Hsieh commented on SPARK-20969:
-----------------------------------------

It seems to me that the second query is not equivalent to the first one, so their results should not be expected to match.

When no frame is specified and the window has an ordering specification, the default frame is {{RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW}}.
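For illustration only, here is a minimal plain-Scala model (not Spark code) of how that default frame is formed for the example partition, assuming ascending `ts` and assuming tied rows keep their input order. The case class `Rec` and the function `defaultRangeFrame` are hypothetical names. Because it is a RANGE frame, it is defined on the `ts` value rather than on row positions, so the two "ts = 1" rows are peers and share the same frame.

```scala
// Hypothetical plain-Scala model of the default frame
// RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW; this is not Spark's implementation.
case class Rec(id: String, ts: Int, description: String)

val partition = List(Rec("i1", 1, "desc1"), Rec("i1", 1, "desc2"), Rec("i1", 2, "desc3"))

// For an ascending order on ts, the frame of `current` is every row whose ts is
// <= current.ts, i.e. all preceding rows plus all peers sharing the same ts.
// sortBy is stable, so tied rows keep their input order (an assumption here;
// Spark does not guarantee any order among peers).
def defaultRangeFrame(rows: List[Rec], current: Rec): List[Rec] =
  rows.sortBy(_.ts).filter(_.ts <= current.ts)

val frames: List[(String, List[String])] =
  partition.map(r => r.description -> defaultRangeFrame(partition, r).map(_.description))

frames.foreach { case (d, f) => println(s"$d -> ${f.mkString(", ")}") }
// desc1 -> desc1, desc2
// desc2 -> desc1, desc2
// desc3 -> desc1, desc2, desc3
```

Taking `last` over these frames gives desc2, desc2, desc3, which matches the first table in the report.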

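So for the first query, the frame of each of the two rows with "ts = 1" only contains the rows with ts <= 1 (desc1 and desc2), and the `last` column in those rows shouldn't be "desc3" as you expected. In the second query the order is descending, so the frame of the "ts = 1" rows covers every row of the partition and `first` returns "desc3".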
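A rough way to check this is to simulate both queries in plain Scala. This is a sketch, not Spark code: `Rec`, `rangeFrame` and the other names are hypothetical, and tied rows are assumed to keep their input order, which Spark does not guarantee. The reporter's expected answer corresponds to a frame spanning the whole partition; with the Spark 2.1 API that frame can be requested explicitly via `Window.partitionBy("id").orderBy(col("ts").asc).rowsBetween(Window.unboundedPreceding, Window.unboundedFollowing)`.

```scala
// Hypothetical plain-Scala simulation of the two queries from the report under the
// default frame RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW (not Spark code).
case class Rec(id: String, ts: Int, description: String)

val rows = List(Rec("i1", 1, "desc1"), Rec("i1", 1, "desc2"), Rec("i1", 2, "desc3"))

// Frame of `cur`: rows that sort strictly before it, plus its peers (same ts).
// `before(a, b)` encodes the window's sort direction.
def rangeFrame(sorted: List[Rec], cur: Rec, before: (Rec, Rec) => Boolean): List[Rec] =
  sorted.filter(r => before(r, cur) || r.ts == cur.ts)

// Query 1: orderBy(col("ts").asc) with last(description).
val asc = rows.sortBy(_.ts)
val lastAsc = asc.map(r => r.description -> rangeFrame(asc, r, _.ts < _.ts).last.description)

// Query 2: orderBy(col("ts").desc) with first(description).
val desc = rows.sortBy(r => -r.ts)
val firstDesc = desc.map(r => r.description -> rangeFrame(desc, r, _.ts > _.ts).head.description)

// Whole-partition frame (UNBOUNDED PRECEDING to UNBOUNDED FOLLOWING): last is always desc3.
val lastWhole = asc.map(r => r.description -> asc.last.description)

println(lastAsc)   // List((desc1,desc2), (desc2,desc2), (desc3,desc3))
println(firstDesc) // List((desc3,desc3), (desc1,desc3), (desc2,desc3))
println(lastWhole) // List((desc1,desc3), (desc2,desc3), (desc3,desc3))
```

Query 1 reproduces the first table and Query 2 the second, which suggests the difference comes from the default frame rather than from `ts` acting as an extra partitioning column.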



> last() aggregate function fails returning the right answer with ordered windows
> -------------------------------------------------------------------------------
>
>                 Key: SPARK-20969
>                 URL: https://issues.apache.org/jira/browse/SPARK-20969
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.1.1
>            Reporter: Perrine Letellier
>
> The column on which `orderBy` is performed is treated as an additional column to partition on.
> {code}
> scala> import org.apache.spark.sql.expressions.Window
> scala> val df = sc.parallelize(List(("i1", 1, "desc1"), ("i1", 1, "desc2"), ("i1", 2, "desc3"))).toDF("id", "ts", "description")
> scala> val window = Window.partitionBy("id").orderBy(col("ts").asc)
> scala> df.withColumn("last", last(col("description")).over(window)).show
> +---+---+-----------+-----+
> | id| ts|description| last|
> +---+---+-----------+-----+
> | i1|  1|      desc1|desc2|
> | i1|  1|      desc2|desc2|
> | i1|  2|      desc3|desc3|
> +---+---+-----------+-----+
> {code}
> However, the expected answer is the same as the one obtained with `first()` over a window with descending order.
> {code}
> scala> val window = Window.partitionBy("id").orderBy(col("ts").desc)
> scala> df.withColumn("last", first(col("description")).over(window)).show
> +---+---+-----------+-----+
> | id| ts|description| last|
> +---+---+-----------+-----+
> | i1|  2|      desc3|desc3|
> | i1|  1|      desc1|desc3|
> | i1|  1|      desc2|desc3|
> +---+---+-----------+-----+
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
