Posted to reviews@spark.apache.org by henryr <gi...@git.apache.org> on 2018/05/04 18:00:18 UTC

[GitHub] spark issue #21049: [SPARK-23957][SQL] Remove redundant sort operators from ...

Github user henryr commented on the issue:

    https://github.com/apache/spark/pull/21049
  
    I might be a bit of a hardliner on this, but I think it's correct to eliminate the `ORDER BY` from common table expressions. MSSQL, for example, agrees with me; see [its guidelines for common table expressions](https://docs.microsoft.com/en-us/sql/t-sql/queries/with-common-table-expression-transact-sql?view=sql-server-2017#guidelines-for-creating-and-using-common-table-expressions).
    
    However, given the principle of least surprise, I agree it might be a good idea to at least start with scalar and nested subqueries, and leave inline views for another day. That might be a bit harder to do (I think the rule will need a whitelist of operators below which it's safe to eliminate sorts), and in general I think there'll be some missed opportunities, but it's a start :)
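    
    To make the whitelist idea concrete, here is a rough sketch on a toy plan model. It is not Catalyst code: `Node`, `ORDER_INSENSITIVE` and `remove_sorts` are names made up for illustration, and the whitelist contents are an assumption (aggregates are deliberately left out, since some of them are order-sensitive).

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class Node:
    """Toy single-child logical plan node (hypothetical, not a Catalyst class)."""
    op: str                          # e.g. "Scan", "Sort", "Filter", "Limit"
    child: Optional["Node"] = None


# Assumed whitelist: operators whose result does not depend on input order.
ORDER_INSENSITIVE = {"Filter", "Project"}


def remove_sorts(plan, sort_is_dead=False):
    """Drop Sort nodes whose ordering cannot be observed.

    sort_is_dead is True only while every operator between the current node
    and the enclosing scalar subquery root is on the whitelist.
    """
    if plan is None:
        return None
    if plan.op == "Sort" and sort_is_dead:
        # Nothing above can observe this ordering, so skip the node.
        return remove_sorts(plan.child, sort_is_dead)
    if plan.op == "ScalarSubquery":
        below = True                 # start eliminating under the subquery root
    elif plan.op in ORDER_INSENSITIVE:
        below = sort_is_dead         # whitelisted: pass the flag through
    else:
        below = False                # anything else (e.g. Limit) keeps sorts
    return replace(plan, child=remove_sorts(plan.child, below))
```

    With this sketch, a sort under `ScalarSubquery -> Project` is removed, while a sort under `ScalarSubquery -> Limit` or at the top level of the query is kept, which is where the missed opportunities would come from.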
    
    Alternatively, we could extend the analyzed logical plan to explicitly mark the different subquery types (i.e. have an `InlineView` node, a `NestedSubquery` node and so on). That would make these optimizations easier to express, but I have some reservations about the semantics of introducing those nodes. What do you think, @dilipbiswal / @gatorsmile?
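
    For illustration only, the marker-node alternative might look like this on a toy plan model. `Node`, `DROP_SORT_BELOW` and `apply_policy` are hypothetical names, the marker operators do not exist in Spark today, and the policy values are just the conservative starting point discussed above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Node:
    """Toy single-child logical plan node (hypothetical, not a Catalyst class)."""
    op: str
    child: Optional["Node"] = None


# Assumed policy per marker: may a Sort directly below it be dropped?
DROP_SORT_BELOW = {"NestedSubquery": True, "InlineView": False}


def apply_policy(plan):
    """Strip Sort nodes that sit immediately under a marker that allows it."""
    if plan is None:
        return None
    child = plan.child
    if DROP_SORT_BELOW.get(plan.op, False):
        # Skip over any run of Sort nodes right below the marker.
        while child is not None and child.op == "Sort":
            child = child.child
    return Node(plan.op, apply_policy(child))
```

    The rule then only has to match on the marker type, and flipping `InlineView` later is a one-line policy change.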

