Posted to issues@spark.apache.org by "Fei Wang (JIRA)" <ji...@apache.org> on 2015/05/01 17:00:07 UTC
[jira] [Updated] (SPARK-7289) Combine Limit and Sort to avoid total ordering
[ https://issues.apache.org/jira/browse/SPARK-7289?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Fei Wang updated SPARK-7289:
----------------------------
Description:
Optimize the following SQL:
select key from (select * from testData order by key) t limit 5
from
== Parsed Logical Plan ==
'Limit 5
 'Project ['key]
  'Subquery t
   'Sort ['key ASC], true
    'Project [*]
     'UnresolvedRelation [testData], None
== Analyzed Logical Plan ==
Limit 5
 Project [key#0]
  Subquery t
   Sort [key#0 ASC], true
    Project [key#0,value#1]
     Subquery testData
      LogicalRDD [key#0,value#1], MapPartitionsRDD[1]
== Optimized Logical Plan ==
Limit 5
 Project [key#0]
  Sort [key#0 ASC], true
   LogicalRDD [key#0,value#1], MapPartitionsRDD[1]
== Physical Plan ==
Limit 5
 Project [key#0]
  Sort [key#0 ASC], true
   Exchange (RangePartitioning [key#0 ASC], 5), []
    PhysicalRDD [key#0,value#1], MapPartitionsRDD[1]
to
== Parsed Logical Plan ==
'Limit 5
 'Project ['key]
  'Subquery t
   'Sort ['key ASC], true
    'Project [*]
     'UnresolvedRelation [testData], None
== Analyzed Logical Plan ==
Limit 5
 Project [key#0]
  Subquery t
   Sort [key#0 ASC], true
    Project [key#0,value#1]
     Subquery testData
      LogicalRDD [key#0,value#1], MapPartitionsRDD[1]
== Optimized Logical Plan ==
Project [key#0]
 Limit 5
  Sort [key#0 ASC], true
   LogicalRDD [key#0,value#1], MapPartitionsRDD[1]
== Physical Plan ==
Project [key#0]
 TakeOrdered 5, [key#0 ASC]
  PhysicalRDD [key#0,value#1], MapPartitionsRDD[1]
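
To illustrate why a TakeOrdered plan can be cheaper than Sort followed by Limit, here is a minimal Python sketch of the general top-k idea. It is not Spark's implementation: the function names and the list-of-lists stand-in for partitions are assumptions made only for this example. Each partition keeps at most k candidates, and only those candidates are merged, so no total ordering of the data is needed.

```python
import heapq
from typing import Iterable, List, Sequence


def total_sort_then_limit(partitions: Sequence[Iterable[int]], k: int) -> List[int]:
    """Baseline: order every row, then keep the first k (Sort followed by Limit)."""
    all_rows = [row for part in partitions for row in part]
    return sorted(all_rows)[:k]


def take_ordered(partitions: Sequence[Iterable[int]], k: int) -> List[int]:
    """Top-k: keep at most k candidates per partition, then merge the candidates."""
    # Hypothetical stand-in for per-partition work: a bounded selection, not a full sort.
    per_partition = [heapq.nsmallest(k, part) for part in partitions]
    # Final merge only looks at (number of partitions * k) rows at most.
    return heapq.nsmallest(k, (row for top in per_partition for row in top))


if __name__ == "__main__":
    # Made-up keys split across three pretend partitions.
    test_data = [[42, 7, 19, 3], [88, 1, 56], [23, 5, 64, 2, 91]]
    print(take_ordered(test_data, 5))  # prints [1, 2, 3, 5, 7]
```

Both functions return the same rows; the difference is that the second never orders more than k rows per partition plus the merged candidates.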
was:
Optimize the following SQL:
`select key from (select * from testData limit 5) t order by key limit 5`
optimize it from
```
== Parsed Logical Plan ==
'Limit 5
 'Sort ['key ASC], true
  'Project ['key]
   'Subquery t
    'Limit 5
     'Project [*]
      'UnresolvedRelation [testData], None
== Analyzed Logical Plan ==
Limit 5
 Sort [key#0 ASC], true
  Project [key#0]
   Subquery t
    Limit 5
     Project [key#0,value#1]
      Subquery testData
       LogicalRDD [key#0,value#1], MapPartitionsRDD[1]
== Optimized Logical Plan ==
Limit 5
 Sort [key#0 ASC], true
  Project [key#0]
   Limit 5
    LogicalRDD [key#0,value#1], MapPartitionsRDD[1]
== Physical Plan ==
TakeOrdered 5, [key#0 ASC]
 Project [key#0]
  Limit 5
   PhysicalRDD [key#0,value#1], MapPartitionsRDD[1]
```
to
```
== Parsed Logical Plan ==
'Limit 5
 'Sort ['key ASC], true
  'Project ['key]
   'Subquery t
    'Limit 5
     'Project [*]
      'UnresolvedRelation [testData], None
== Analyzed Logical Plan ==
Limit 5
 Sort [key#0 ASC], true
  Project [key#0]
   Subquery t
    Limit 5
     Project [key#0,value#1]
      Subquery testData
       LogicalRDD [key#0,value#1], MapPartitionsRDD[1]
== Optimized Logical Plan ==
Limit 5
 Sort [key#0 ASC], true
  Project [key#0]
   LogicalRDD [key#0,value#1], MapPartitionsRDD[1]
== Physical Plan ==
TakeOrdered 5, [key#0 ASC]
 Project [key#0]
  PhysicalRDD [key#0,value#1], MapPartitionsRDD[1]
```
Summary: Combine Limit and Sort to avoid total ordering (was: push down sort when its child is Limit)
> Combine Limit and Sort to avoid total ordering
> ----------------------------------------------
>
> Key: SPARK-7289
> URL: https://issues.apache.org/jira/browse/SPARK-7289
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 1.3.1
> Reporter: Fei Wang