Posted to reviews@spark.apache.org by guoxiaolongzte <gi...@git.apache.org> on 2018/11/21 13:37:45 UTC
[GitHub] spark pull request #23104: [SPARK-26138][SQL] LimitPushDown cross join requi...
GitHub user guoxiaolongzte opened a pull request:
https://github.com/apache/spark/pull/23104
[SPARK-26138][SQL] LimitPushDown cross join requires maybeBushLocalLimit
## What changes were proposed in this pull request?
In LimitPushDown batch, cross join can push down the limit.
## How was this patch tested?
manual tests
Please review http://spark.apache.org/contributing.html before opening a pull request.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/guoxiaolongzte/spark SPARK-26138
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/23104.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #23104
----
commit deb18093b1a1b1c6b7e6ad1fd148448b761297ea
Author: guoxiaolong <!gxl1811588825>
Date: 2018-11-21T13:36:24Z
[SPARK-26138][SQL] LimitPushDown cross join requires maybeBushLocalLimit
----
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #23104: [SPARK-26138][SQL] Cross join requires push Local...
Posted by guoxiaolongzte <gi...@git.apache.org>.
Github user guoxiaolongzte commented on a diff in the pull request:
https://github.com/apache/spark/pull/23104#discussion_r236589331
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala ---
@@ -459,6 +459,7 @@ object LimitPushDown extends Rule[LogicalPlan] {
val newJoin = joinType match {
case RightOuter => join.copy(right = maybePushLocalLimit(exp, right))
case LeftOuter => join.copy(left = maybePushLocalLimit(exp, left))
+ case Cross => join.copy(left = maybePushLocalLimit(exp, left), right = maybePushLocalLimit(exp, right))
--- End diff --
@cloud-fan
Please give me some advice. Thank you.
---
[GitHub] spark pull request #23104: [SPARK-26138][SQL] LimitPushDown cross join requi...
Posted by guoxiaolongzte <gi...@git.apache.org>.
Github user guoxiaolongzte commented on a diff in the pull request:
https://github.com/apache/spark/pull/23104#discussion_r236115582
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala ---
@@ -459,6 +459,7 @@ object LimitPushDown extends Rule[LogicalPlan] {
val newJoin = joinType match {
case RightOuter => join.copy(right = maybePushLocalLimit(exp, right))
case LeftOuter => join.copy(left = maybePushLocalLimit(exp, left))
+ case Cross => join.copy(left = maybePushLocalLimit(exp, left), right = maybePushLocalLimit(exp, right))
--- End diff --
A = {(a, 0), (b, 1), (c, 2), (d, 0), **(e, 1), (f, 2)**}
B = {**(e, 1), (f, 2)**}
A inner join B limit 2
If limit 2 is pushed to the left side, only {(a, 0), (b, 1)} is joined with {(e, 1), (f, 2)}, and the result is empty. But the real result is not empty.
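This counterexample can be simulated with a minimal Python sketch (hypothetical data, with plain row equality standing in for the real join condition):

```python
# Hypothetical simulation: pushing LocalLimit below an inner join that has
# a join condition can change the result, because the limited left side
# may no longer contain any matching rows.
A = [("a", 0), ("b", 1), ("c", 2), ("d", 0), ("e", 1), ("f", 2)]
B = [("e", 1), ("f", 2)]

def inner_join(left, right):
    # the join condition here is plain row equality
    return [row for row in left for other in right if row == other]

correct = inner_join(A, B)[:2]        # LIMIT 2 applied after the join
pushed = inner_join(A[:2], B)[:2]     # LocalLimit 2 pushed into the left side

assert correct == [("e", 1), ("f", 2)]
assert pushed == []                   # the pushdown changed the answer
```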
---
[GitHub] spark pull request #23104: [SPARK-26138][SQL] LimitPushDown cross join requi...
Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:
https://github.com/apache/spark/pull/23104#discussion_r236137768
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala ---
@@ -459,6 +459,7 @@ object LimitPushDown extends Rule[LogicalPlan] {
val newJoin = joinType match {
case RightOuter => join.copy(right = maybePushLocalLimit(exp, right))
case LeftOuter => join.copy(left = maybePushLocalLimit(exp, left))
+ case Cross => join.copy(left = maybePushLocalLimit(exp, left), right = maybePushLocalLimit(exp, right))
--- End diff --
@guoxiaolongzte nope.
---
[GitHub] spark pull request #23104: [SPARK-26138][SQL] LimitPushDown cross join requi...
Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:
https://github.com/apache/spark/pull/23104#discussion_r236114831
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala ---
@@ -459,6 +459,7 @@ object LimitPushDown extends Rule[LogicalPlan] {
val newJoin = joinType match {
case RightOuter => join.copy(right = maybePushLocalLimit(exp, right))
case LeftOuter => join.copy(left = maybePushLocalLimit(exp, left))
+ case Cross => join.copy(left = maybePushLocalLimit(exp, left), right = maybePushLocalLimit(exp, right))
--- End diff --
maybe we can match `InnerLike` when condition is empty.
---
[GitHub] spark pull request #23104: [SPARK-26138][SQL] Cross join requires push Local...
Posted by guoxiaolongzte <gi...@git.apache.org>.
Github user guoxiaolongzte commented on a diff in the pull request:
https://github.com/apache/spark/pull/23104#discussion_r236143436
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala ---
@@ -459,6 +459,7 @@ object LimitPushDown extends Rule[LogicalPlan] {
val newJoin = joinType match {
case RightOuter => join.copy(right = maybePushLocalLimit(exp, right))
case LeftOuter => join.copy(left = maybePushLocalLimit(exp, left))
+ case Cross => join.copy(left = maybePushLocalLimit(exp, left), right = maybePushLocalLimit(exp, right))
--- End diff --
I think that when spark.sql.crossJoin.enabled=true is set, an Inner join without a condition, a LeftOuter join without a condition, a RightOuter join without a condition, and a FullOuter join without a condition should all push the limit down on both sides, just like the cross join limit in this PR.
---
[GitHub] spark issue #23104: [SPARK-26138][SQL] LimitPushDown cross join requires may...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/23104
Can one of the admins verify this patch?
---
[GitHub] spark issue #23104: [SPARK-26138][SQL] LimitPushDown cross join requires may...
Posted by guoxiaolongzte <gi...@git.apache.org>.
Github user guoxiaolongzte commented on the issue:
https://github.com/apache/spark/pull/23104
Yes, I tested and understood; you are right. @mgaido91
---
[GitHub] spark issue #23104: [SPARK-26138][SQL] Cross join requires push LocalLimit i...
Posted by guoxiaolongzte <gi...@git.apache.org>.
Github user guoxiaolongzte commented on the issue:
https://github.com/apache/spark/pull/23104
> The title has a typo.
Sorry, it has been fixed.
---
[GitHub] spark issue #23104: [SPARK-26138][SQL] Cross join requires push LocalLimit i...
Posted by liu-zhaokun <gi...@git.apache.org>.
Github user liu-zhaokun commented on the issue:
https://github.com/apache/spark/pull/23104
@guoxiaolongzte good job
---
[GitHub] spark pull request #23104: [SPARK-26138][SQL] Cross join requires push Local...
Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:
https://github.com/apache/spark/pull/23104#discussion_r236625590
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala ---
@@ -459,6 +459,7 @@ object LimitPushDown extends Rule[LogicalPlan] {
val newJoin = joinType match {
case RightOuter => join.copy(right = maybePushLocalLimit(exp, right))
case LeftOuter => join.copy(left = maybePushLocalLimit(exp, left))
+ case Cross => join.copy(left = maybePushLocalLimit(exp, left), right = maybePushLocalLimit(exp, right))
--- End diff --
if there is no join condition, I think join type doesn't matter and we can always push down limits.
---
[GitHub] spark issue #23104: [SPARK-26138][SQL] LimitPushDown cross join requires may...
Posted by guoxiaolongzte <gi...@git.apache.org>.
Github user guoxiaolongzte commented on the issue:
https://github.com/apache/spark/pull/23104
OK, I will add some UTs.
---
[GitHub] spark pull request #23104: [SPARK-26138][SQL] LimitPushDown cross join requi...
Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:
https://github.com/apache/spark/pull/23104#discussion_r236114751
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala ---
@@ -459,6 +459,7 @@ object LimitPushDown extends Rule[LogicalPlan] {
val newJoin = joinType match {
case RightOuter => join.copy(right = maybePushLocalLimit(exp, right))
case LeftOuter => join.copy(left = maybePushLocalLimit(exp, left))
+ case Cross => join.copy(left = maybePushLocalLimit(exp, left), right = maybePushLocalLimit(exp, right))
--- End diff --
how about inner join without condition?
---
[GitHub] spark issue #23104: [SPARK-26138][SQL] LimitPushDown cross join requires may...
Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:
https://github.com/apache/spark/pull/23104
The title has a typo.
---
[GitHub] spark issue #23104: [SPARK-26138][SQL] LimitPushDown cross join requires may...
Posted by guoxiaolongzte <gi...@git.apache.org>.
Github user guoxiaolongzte commented on the issue:
https://github.com/apache/spark/pull/23104
In mathematics, the Cartesian product of two sets X and Y, also known as the direct product and written X × Y, is the set of all possible ordered pairs whose first element is a member of X and whose second element is a member of Y. So a cross join must push the limit on the left side.
For example, A={a,b}, B={0,1,2}, then
A × B = {(a, 0), (a, 1), (a, 2), (b, 0), (b, 1), (b, 2)}
B×A={(0, a), (0, b), (1, a), (1, b), (2, a), (2, b)}
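The products above, and the fact that they differ, can be checked with a quick sketch (Python's itertools.product is used here purely as an illustration):

```python
from itertools import product

A = ["a", "b"]
B = [0, 1, 2]

AxB = list(product(A, B))
BxA = list(product(B, A))

assert AxB == [("a", 0), ("a", 1), ("a", 2), ("b", 0), ("b", 1), ("b", 2)]
assert BxA == [(0, "a"), (0, "b"), (1, "a"), (1, "b"), (2, "a"), (2, "b")]
assert AxB != BxA  # the Cartesian product is not commutative
```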
---
[GitHub] spark pull request #23104: [SPARK-26138][SQL] LimitPushDown cross join requi...
Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:
https://github.com/apache/spark/pull/23104#discussion_r236137426
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala ---
@@ -459,6 +459,7 @@ object LimitPushDown extends Rule[LogicalPlan] {
val newJoin = joinType match {
case RightOuter => join.copy(right = maybePushLocalLimit(exp, right))
case LeftOuter => join.copy(left = maybePushLocalLimit(exp, left))
+ case Cross => join.copy(left = maybePushLocalLimit(exp, left), right = maybePushLocalLimit(exp, right))
--- End diff --
+1
---
[GitHub] spark pull request #23104: [SPARK-26138][SQL] LimitPushDown cross join requi...
Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:
https://github.com/apache/spark/pull/23104#discussion_r236118983
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala ---
@@ -459,6 +459,7 @@ object LimitPushDown extends Rule[LogicalPlan] {
val newJoin = joinType match {
case RightOuter => join.copy(right = maybePushLocalLimit(exp, right))
case LeftOuter => join.copy(left = maybePushLocalLimit(exp, left))
+ case Cross => join.copy(left = maybePushLocalLimit(exp, left), right = maybePushLocalLimit(exp, right))
--- End diff --
inner join without condition is literally cross join.
---
[GitHub] spark issue #23104: [SPARK-26138][SQL] LimitPushDown cross join requires may...
Posted by mgaido91 <gi...@git.apache.org>.
Github user mgaido91 commented on the issue:
https://github.com/apache/spark/pull/23104
@guoxiaolongzte still, that doesn't explain why we can push to the right side too. I do believe that it is possible. If the right side contains more than N items, where N is the limit size, the output will contain the combinations of the first item from the left side with the first N items from the right side. If the right side contains fewer than N items, pushing the limit on that side has no effect on the result.
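The argument above can be sketched with a toy Python simulation (not Spark code): pushing the limit to both sides of a condition-free join preserves both the row count and the validity of every emitted row, and LIMIT makes no ordering guarantee beyond that.

```python
from itertools import islice, product

def cross_limit(left, right, n, push=False):
    # toy model of `cross join ... limit n`
    if push:
        left, right = left[:n], right[:n]   # LocalLimit n pushed to both sides
    return list(islice(product(left, right), n))

A = [("a", 0), ("b", 1), ("c", 2)]
B = [("x",), ("y",)]
n = 4

plain = cross_limit(A, B, n)
pushed = cross_limit(A, B, n, push=True)

# same number of rows, and every emitted row is a real row of the product
assert len(pushed) == len(plain) == min(n, len(A) * len(B))
assert set(pushed) <= set(product(A, B))
```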
---
[GitHub] spark pull request #23104: [SPARK-26138][SQL] LimitPushDown cross join requi...
Posted by guoxiaolongzte <gi...@git.apache.org>.
Github user guoxiaolongzte commented on a diff in the pull request:
https://github.com/apache/spark/pull/23104#discussion_r236137253
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala ---
@@ -459,6 +459,7 @@ object LimitPushDown extends Rule[LogicalPlan] {
val newJoin = joinType match {
case RightOuter => join.copy(right = maybePushLocalLimit(exp, right))
case LeftOuter => join.copy(left = maybePushLocalLimit(exp, left))
+ case Cross => join.copy(left = maybePushLocalLimit(exp, left), right = maybePushLocalLimit(exp, right))
--- End diff --
When spark.sql.crossJoin.enabled=true is set,
an inner join without a condition, a LeftOuter join without a condition, a RightOuter join without a condition, a FullOuter join without a condition: are all of these literally cross joins?
---
[GitHub] spark pull request #23104: [SPARK-26138][SQL] Cross join requires push Local...
Posted by guoxiaolongzte <gi...@git.apache.org>.
Github user guoxiaolongzte commented on a diff in the pull request:
https://github.com/apache/spark/pull/23104#discussion_r236929433
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala ---
@@ -459,6 +459,7 @@ object LimitPushDown extends Rule[LogicalPlan] {
val newJoin = joinType match {
case RightOuter => join.copy(right = maybePushLocalLimit(exp, right))
case LeftOuter => join.copy(left = maybePushLocalLimit(exp, left))
+ case Cross => join.copy(left = maybePushLocalLimit(exp, left), right = maybePushLocalLimit(exp, right))
--- End diff --
There are two tables as follows:
CREATE TABLE `**test1**`(`id` int, `name` int);
CREATE TABLE `**test2**`(`id` int, `name` int);
test1 table data:
2,2
1,1
test2 table data:
2,2
3,3
4,4
Execute the SQL: select * from test1 t1 **left anti join** test2 t2 on t1.id=t2.id limit 1; The result is:
1,1
But:
if we push limit 1 to the left side, the limited left side is {(2, 2)}, which has a match in test2, so the result is empty and therefore incorrect;
if we push limit 1 to the right side, matching rows can be dropped from the right side, so left rows that should be filtered out can wrongly survive, and the result can also be incorrect.
So
a left anti join should not push down the limit. Similarly, a left semi join follows the same logic.
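A minimal Python re-creation of the left anti join case above (a hypothetical helper, not Spark code) shows the left-side push breaking the result:

```python
test1 = [(2, 2), (1, 1)]
test2 = [(2, 2), (3, 3), (4, 4)]

def left_anti_join(left, right):
    # keep left rows whose id (first column) has no match on the right
    right_ids = {r[0] for r in right}
    return [row for row in left if row[0] not in right_ids]

correct = left_anti_join(test1, test2)[:1]          # LIMIT 1 after the join
pushed_left = left_anti_join(test1[:1], test2)[:1]  # LocalLimit 1 pushed left

assert correct == [(1, 1)]
assert pushed_left == []   # wrong: the only surviving left row was dropped
```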
---
[GitHub] spark issue #23104: [SPARK-26138][SQL] LimitPushDown cross join requires may...
Posted by guoxiaolongzte <gi...@git.apache.org>.
Github user guoxiaolongzte commented on the issue:
https://github.com/apache/spark/pull/23104
@cloud-fan @dongjoon-hyun @gatorsmile
Please help review the code.
---
[GitHub] spark pull request #23104: [SPARK-26138][SQL] LimitPushDown cross join requi...
Posted by mgaido91 <gi...@git.apache.org>.
Github user mgaido91 commented on a diff in the pull request:
https://github.com/apache/spark/pull/23104#discussion_r235392255
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala ---
@@ -459,6 +459,7 @@ object LimitPushDown extends Rule[LogicalPlan] {
val newJoin = joinType match {
case RightOuter => join.copy(right = maybePushLocalLimit(exp, right))
case LeftOuter => join.copy(left = maybePushLocalLimit(exp, left))
+ case Cross => join.copy(left = maybePushLocalLimit(exp, left))
--- End diff --
what if the other table is empty?
---