You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@spark.apache.org by "Tejas Patil (JIRA)" <ji...@apache.org> on 2016/08/27 03:14:21 UTC

[jira] [Updated] (SPARK-17271) Planner adds un-necessary Sort even if child ordering is semantically same as required ordering

     [ https://issues.apache.org/jira/browse/SPARK-17271?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tejas Patil updated SPARK-17271:
--------------------------------
    Description: 
Found a case when the planner is adding un-needed SORT operation due to bug in the way comparison for `SortOrder` is done at https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/exchange/EnsureRequirements.scala#L253

`SortOrder` needs to be compared semantically because `Expression` within two `SortOrder` can be "semantically equal" but not literally equal objects.

eg. In case of `sql("SELECT * FROM table1 a JOIN table2 b ON a.col1=b.col1")`

Expression in required SortOrder:

{code}
      AttributeReference(
        name = "col1",
        dataType = LongType,
        nullable = false
      ) (exprId = exprId,
        qualifier = Some("a")
      )
{code}

Expression in child SortOrder:

{code}
      AttributeReference(
        name = "col1",
        dataType = LongType,
        nullable = false
      ) (exprId = exprId)
{code}

Notice that the output column has a qualifier but the child attribute does not but the inherent expression is the same and hence in this case we can say that the child satisfies the required sort order.

  was:
Found a case when the planner is adding un-needed SORT operation due to bug in the way comparison for `SortOrder` is done at https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/exchange/EnsureRequirements.scala#L253

`SortOrder` needs to be compared semantically because `Expression` within two `SortOrder` can be "semantically equal" but not literally equal objects.

eg. In case of `sql("SELECT * FROM table1 a JOIN table2 b ON a.col1=b.col1")`

Expression in required SortOrder:

```
      AttributeReference(
        name = "col1",
        dataType = LongType,
        nullable = false
      ) (exprId = exprId,
        qualifier = Some("a")
      )
```

Expression in child SortOrder:

```
      AttributeReference(
        name = "col1",
        dataType = LongType,
        nullable = false
      ) (exprId = exprId)
```

Notice that the output column has a qualifier but the child attribute does not but the inherent expression is the same and hence in this case we can say that the child satisfies the required sort order.


> Planner adds un-necessary Sort even if child ordering is semantically same as required ordering
> -----------------------------------------------------------------------------------------------
>
>                 Key: SPARK-17271
>                 URL: https://issues.apache.org/jira/browse/SPARK-17271
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 1.6.2, 2.0.0
>            Reporter: Tejas Patil
>
> Found a case when the planner is adding un-needed SORT operation due to bug in the way comparison for `SortOrder` is done at https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/exchange/EnsureRequirements.scala#L253
> `SortOrder` needs to be compared semantically because `Expression` within two `SortOrder` can be "semantically equal" but not literally equal objects.
> eg. In case of `sql("SELECT * FROM table1 a JOIN table2 b ON a.col1=b.col1")`
> Expression in required SortOrder:
> {code}
>       AttributeReference(
>         name = "col1",
>         dataType = LongType,
>         nullable = false
>       ) (exprId = exprId,
>         qualifier = Some("a")
>       )
> {code}
> Expression in child SortOrder:
> {code}
>       AttributeReference(
>         name = "col1",
>         dataType = LongType,
>         nullable = false
>       ) (exprId = exprId)
> {code}
> Notice that the output column has a qualifier but the child attribute does not but the inherent expression is the same and hence in this case we can say that the child satisfies the required sort order.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org