Posted to issues@spark.apache.org by "Josh Rosen (JIRA)" <ji...@apache.org> on 2019/04/27 05:09:00 UTC
[jira] [Commented] (SPARK-27290) remove unneed sort under Aggregate

    [ https://issues.apache.org/jira/browse/SPARK-27290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16827448#comment-16827448 ] 

Josh Rosen commented on SPARK-27290:
------------------------------------

Regarding that test case: my best guess is that SPARK-23375 was primarily concerned with removing _duplicate_ back-to-back sorts (e.g. keeping only the final sort). The test case appears to ensure that we don't go too far and remove every sort in cases where ordering is still needed, such as a top-level ORDER BY (the test case you linked above) or a SortMergeJoin that requires sorted input.

In this issue, it sounds like you want to remove sorts because they're completely unnecessary (rather than just being redundant with other sorts).
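
To make the distinction concrete, here is a minimal sketch over a toy plan type. It is not Catalyst code: Plan, Scan, Sort, Aggregate, SortPruning and both rule functions are hypothetical names invented for illustration. It also assumes that the aggregate's result does not depend on input order (true for count(*), not necessarily for every aggregate function) and that nothing relies on sort stability.

```scala
// Toy logical plan for illustration only; these are NOT Spark's Catalyst classes.
sealed trait Plan
case class Scan(table: String) extends Plan
case class Sort(key: String, child: Plan) extends Plan
case class Aggregate(groupBy: Seq[String], child: Plan) extends Plan

object SortPruning {
  // Peel off any run of Sort nodes sitting at the top of `plan`.
  @annotation.tailrec
  private def stripLeadingSorts(plan: Plan): Plan = plan match {
    case Sort(_, child) => stripLeadingSorts(child)
    case other          => other
  }

  // Case 1: duplicate back-to-back sorts. Only the outermost sort of a run
  // decides the output order, so the sorts directly beneath it are dropped.
  // A sort separated from the outer one by an Aggregate is left alone.
  def dropShadowedSorts(plan: Plan): Plan = plan match {
    case Sort(key, child)     => Sort(key, dropShadowedSorts(stripLeadingSorts(child)))
    case Aggregate(g, child)  => Aggregate(g, dropShadowedSorts(child))
    case s: Scan              => s
  }

  // Case 2: sorts directly under an Aggregate. Assuming the aggregate is
  // insensitive to input order and does not pass ordering up, the sort is
  // unnecessary even though no other sort makes it redundant.
  def dropSortsUnderAggregate(plan: Plan): Plan = plan match {
    case Aggregate(g, child)  => Aggregate(g, dropSortsUnderAggregate(stripLeadingSorts(child)))
    case Sort(key, child)     => Sort(key, dropSortsUnderAggregate(child))
    case s: Scan              => s
  }
}
```

In this toy model, the plan from the issue description, Aggregate(Nil, Sort("a2", Scan("A"))), is untouched by dropShadowedSorts but becomes Aggregate(Nil, Scan("A")) under dropSortsUnderAggregate. A plan shaped like the linked test case, Sort("a", Aggregate(Seq("a"), Sort("a", Scan("t")))), is also left unchanged by dropShadowedSorts, while dropSortsUnderAggregate removes only the inner sort and keeps the top-level one.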


> remove unneed sort under Aggregate
> ----------------------------------
>
>                 Key: SPARK-27290
>                 URL: https://issues.apache.org/jira/browse/SPARK-27290
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 2.4.0
>            Reporter: Xiaoju Wu
>            Priority: Minor
>
> I saw some tickets about removing unneeded sorts from the plan, but I think there is another case in which a sort is redundant:
> A Sort directly under a non-order-preserving node is redundant, for example:
> {code}
> select count(*) from (select a1 from A order by a2);
> +- Aggregate
>   +- Sort
>      +- FileScan parquet
> {code}
> But one of the existing test cases conflicts with this example:
> {code}
> test("sort should not be removed when there is a node which doesn't guarantee any order") {
>    val orderedPlan = testRelation.select('a, 'b).orderBy('a.asc)   
>    val groupedAndResorted = orderedPlan.groupBy('a)(sum('a)).orderBy('a.asc)
>    val optimized = Optimize.execute(groupedAndResorted.analyze)
>    val correctAnswer = groupedAndResorted.analyze
>    comparePlans(optimized, correctAnswer) 
> }
> {code}
> Why is it designed like this? In my opinion, since Aggregate won't pass up the ordering, the Sort below it is useless.


