Posted to issues@spark.apache.org by "Dongjoon Hyun (Jira)" <ji...@apache.org> on 2020/10/27 20:21:00 UTC

[jira] [Resolved] (SPARK-33260) SortExec produces incorrect results if sortOrder is a Stream

     [ https://issues.apache.org/jira/browse/SPARK-33260?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dongjoon Hyun resolved SPARK-33260.
-----------------------------------
    Fix Version/s: 3.0.2
                   3.1.0
       Resolution: Fixed

Issue resolved by pull request 30160
[https://github.com/apache/spark/pull/30160]

> SortExec produces incorrect results if sortOrder is a Stream
> ------------------------------------------------------------
>
>                 Key: SPARK-33260
>                 URL: https://issues.apache.org/jira/browse/SPARK-33260
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 3.0.0, 3.0.1
>            Reporter: Ankur Dave
>            Assignee: Ankur Dave
>            Priority: Major
>             Fix For: 3.1.0, 3.0.2
>
>
> The following query produces incorrect results. The query has two essential features: (1) it contains a string aggregate, resulting in a {{SortExec}} node, and (2) it contains a duplicate grouping key, causing {{RemoveRepetitionFromGroupExpressions}} to produce a sort order stored as a {{Stream}}.
>
>     SELECT bigint_col_1, bigint_col_9, MAX(CAST(bigint_col_1 AS string))
>     FROM table_4
>     GROUP BY bigint_col_1, bigint_col_9, bigint_col_9
>
> When the sort order is stored as a {{Stream}}, the line {{ordering.map(_.child.genCode(ctx))}} in {{GenerateOrdering#createOrderKeys()}} has unpredictable side effects on {{ctx}}. This is because {{genCode(ctx)}} modifies {{ctx}}. When {{ordering}} is a {{Stream}}, the modifications do not happen immediately as intended, but instead occur lazily, whenever the returned {{Stream}} is consumed later.
> Similar bugs have occurred at least three times in the past: https://issues.apache.org/jira/browse/SPARK-24500, https://issues.apache.org/jira/browse/SPARK-25767, https://issues.apache.org/jira/browse/SPARK-26680.
> The fix is to check if {{ordering}} is a {{Stream}} and force the modifications to happen immediately if so.
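
The lazy-evaluation problem described above can be reproduced outside Spark. The following is a minimal sketch, not the code from the pull request: Ctx, genCode, buggy and fixed are hypothetical stand-ins for CodegenContext, Expression#genCode and the affected code path, and forcing the Stream with a pattern match is just one way to make the evaluation eager. It assumes Scala 2.12, which Spark 3.0 uses; on Scala 2.13 Stream still works but is deprecated in favor of LazyList.

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical stand-in for Spark's CodegenContext: it only records,
// in order, every mutation that "code generation" performs on it.
final class Ctx {
  val log = ArrayBuffer.empty[String]
}

object StreamSideEffects {
  // Hypothetical stand-in for genCode(ctx): returns a value AND mutates ctx.
  def genCode(name: String, ctx: Ctx): String = {
    ctx.log += s"gen($name)"
    s"code_$name"
  }

  // Same shape as the problematic line: a map with a side effect, followed
  // by further work on ctx that assumes the map has already finished.
  def buggy(ordering: Seq[String], ctx: Ctx): Seq[String] = {
    val codes = ordering.map(genCode(_, ctx))
    ctx.log += "after-map"
    codes
  }

  // One possible fix: if the mapped result is a Stream, force it so that
  // all side effects on ctx happen before any later step touches ctx.
  def fixed(ordering: Seq[String], ctx: Ctx): Seq[String] = {
    val codes = ordering.map(genCode(_, ctx)) match {
      case s: Stream[String] => s.force
      case other             => other
    }
    ctx.log += "after-map"
    codes
  }

  def main(args: Array[String]): Unit = {
    val lazyCtx = new Ctx
    // Consuming the returned Stream is what triggers the deferred genCode calls.
    val lazyCodes = buggy(Stream("a", "b", "c"), lazyCtx).toList
    println(s"buggy: codes=$lazyCodes log=${lazyCtx.log.mkString(", ")}")

    val eagerCtx = new Ctx
    val eagerCodes = fixed(Stream("a", "b", "c"), eagerCtx).toList
    println(s"fixed: codes=$eagerCodes log=${eagerCtx.log.mkString(", ")}")
  }
}
```

Both variants return the same generated values; only the order of the entries in the log differs. With a Stream input, buggy records "after-map" before some of the gen(...) entries, while fixed records every gen(...) entry first. With a strict Seq such as a List, the two behave identically.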


