Posted to reviews@spark.apache.org by guoxiaolongzte <gi...@git.apache.org> on 2018/11/21 13:37:45 UTC

[GitHub] spark pull request #23104: [SPARK-26138][SQL] LimitPushDown cross join requi...

GitHub user guoxiaolongzte opened a pull request:

    https://github.com/apache/spark/pull/23104

    [SPARK-26138][SQL] LimitPushDown cross join requires maybeBushLocalLimit

    ## What changes were proposed in this pull request?
    
    In the LimitPushDown batch, a cross join can push the limit down to both sides of the join.
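    A minimal sketch of the intended effect (assuming a spark-shell session where `spark` and its implicits are available; the DataFrame names below are made up for illustration):
    
    ```scala
    // A LIMIT over a cross join. With this change, LimitPushDown may place a
    // LocalLimit on top of both join children, so neither side has to be fully
    // materialized before the join.
    import spark.implicits._
    
    val left  = Seq(1, 2, 3, 4, 5).toDF("l")
    val right = Seq("a", "b", "c").toDF("r")
    
    val q = left.crossJoin(right).limit(3)
    q.explain(true)  // with this patch, the optimized plan is expected to show
                     // LocalLimit under both sides of the cross join
    q.show()
    ```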
    
    ## How was this patch tested?
    
    manual tests
    
    Please review http://spark.apache.org/contributing.html before opening a pull request.


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/guoxiaolongzte/spark SPARK-26138

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/23104.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #23104
    
----
commit deb18093b1a1b1c6b7e6ad1fd148448b761297ea
Author: guoxiaolong <!gxl1811588825>
Date:   2018-11-21T13:36:24Z

    [SPARK-26138][SQL] LimitPushDown cross join requires maybeBushLocalLimit

----


---



[GitHub] spark pull request #23104: [SPARK-26138][SQL] Cross join requires push Local...

Posted by guoxiaolongzte <gi...@git.apache.org>.
Github user guoxiaolongzte commented on a diff in the pull request:

    https://github.com/apache/spark/pull/23104#discussion_r236589331
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala ---
    @@ -459,6 +459,7 @@ object LimitPushDown extends Rule[LogicalPlan] {
           val newJoin = joinType match {
             case RightOuter => join.copy(right = maybePushLocalLimit(exp, right))
             case LeftOuter => join.copy(left = maybePushLocalLimit(exp, left))
    +        case Cross => join.copy(left = maybePushLocalLimit(exp, left), right = maybePushLocalLimit(exp, right))
    --- End diff --
    
    @cloud-fan 
    Please give me some advice. Thank you.


---



[GitHub] spark pull request #23104: [SPARK-26138][SQL] LimitPushDown cross join requi...

Posted by guoxiaolongzte <gi...@git.apache.org>.
Github user guoxiaolongzte commented on a diff in the pull request:

    https://github.com/apache/spark/pull/23104#discussion_r236115582
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala ---
    @@ -459,6 +459,7 @@ object LimitPushDown extends Rule[LogicalPlan] {
           val newJoin = joinType match {
             case RightOuter => join.copy(right = maybePushLocalLimit(exp, right))
             case LeftOuter => join.copy(left = maybePushLocalLimit(exp, left))
    +        case Cross => join.copy(left = maybePushLocalLimit(exp, left), right = maybePushLocalLimit(exp, right))
    --- End diff --
    
    A = {(a, 0), (b, 1), (c, 2), (d, 0), **(e, 1), (f, 2)**}
    B = {**(e, 1), (f, 2)**}
    
    A inner join B limit 2
    If limit 2 is pushed to the left side, {(a, 0), (b, 1)} inner join {(e, 1), (f, 2)} gives an empty result, but the real result is not empty.
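    A sketch of this counterexample (spark-shell style, assuming `spark` is available; the column names are made up):
    
    ```scala
    // Pushing LIMIT 2 into the left side of an inner join *with* a join
    // condition changes the result.
    import spark.implicits._
    
    val dfA = Seq(("a", 0), ("b", 1), ("c", 2), ("d", 0), ("e", 1), ("f", 2)).toDF("k", "v")
    val dfB = Seq(("e", 1), ("f", 2)).toDF("k", "v")
    
    // Correct: join first, then take 2 rows -> the matching rows (e, 1) and (f, 2).
    dfA.join(dfB, Seq("k", "v")).limit(2).show()
    
    // Incorrect "push": in this local example, limiting A first keeps only
    // (a, 0) and (b, 1), which match nothing in B, so the join result is empty.
    dfA.limit(2).join(dfB, Seq("k", "v")).show()
    ```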


---



[GitHub] spark pull request #23104: [SPARK-26138][SQL] LimitPushDown cross join requi...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/23104#discussion_r236137768
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala ---
    @@ -459,6 +459,7 @@ object LimitPushDown extends Rule[LogicalPlan] {
           val newJoin = joinType match {
             case RightOuter => join.copy(right = maybePushLocalLimit(exp, right))
             case LeftOuter => join.copy(left = maybePushLocalLimit(exp, left))
    +        case Cross => join.copy(left = maybePushLocalLimit(exp, left), right = maybePushLocalLimit(exp, right))
    --- End diff --
    
    @guoxiaolongzte nope. 


---



[GitHub] spark pull request #23104: [SPARK-26138][SQL] LimitPushDown cross join requi...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/23104#discussion_r236114831
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala ---
    @@ -459,6 +459,7 @@ object LimitPushDown extends Rule[LogicalPlan] {
           val newJoin = joinType match {
             case RightOuter => join.copy(right = maybePushLocalLimit(exp, right))
             case LeftOuter => join.copy(left = maybePushLocalLimit(exp, left))
    +        case Cross => join.copy(left = maybePushLocalLimit(exp, left), right = maybePushLocalLimit(exp, right))
    --- End diff --
    
    Maybe we can match `InnerLike` when the condition is empty.
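    A rough, hypothetical sketch of how that could look inside `LimitPushDown` (the final patch may well differ):
    
    ```scala
    // When the join has no condition, Inner and Cross behave the same,
    // so the limit could be pushed into both children.
    val newJoin = joinType match {
      case RightOuter => join.copy(right = maybePushLocalLimit(exp, right))
      case LeftOuter => join.copy(left = maybePushLocalLimit(exp, left))
      case _: InnerLike if join.condition.isEmpty =>
        join.copy(
          left = maybePushLocalLimit(exp, left),
          right = maybePushLocalLimit(exp, right))
      case _ => join
    }
    ```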


---



[GitHub] spark pull request #23104: [SPARK-26138][SQL] Cross join requires push Local...

Posted by guoxiaolongzte <gi...@git.apache.org>.
Github user guoxiaolongzte commented on a diff in the pull request:

    https://github.com/apache/spark/pull/23104#discussion_r236143436
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala ---
    @@ -459,6 +459,7 @@ object LimitPushDown extends Rule[LogicalPlan] {
           val newJoin = joinType match {
             case RightOuter => join.copy(right = maybePushLocalLimit(exp, right))
             case LeftOuter => join.copy(left = maybePushLocalLimit(exp, left))
    +        case Cross => join.copy(left = maybePushLocalLimit(exp, left), right = maybePushLocalLimit(exp, right))
    --- End diff --
    
    I think that when spark.sql.crossJoin.enabled=true is set, for an Inner join without a condition, a LeftOuter join without a condition, a RightOuter join without a condition, or a FullOuter join without a condition, the limit should be pushed down on both sides, just like the cross join limit in this PR.


---



[GitHub] spark issue #23104: [SPARK-26138][SQL] LimitPushDown cross join requires may...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/23104
  
    Can one of the admins verify this patch?


---



[GitHub] spark issue #23104: [SPARK-26138][SQL] LimitPushDown cross join requires may...

Posted by guoxiaolongzte <gi...@git.apache.org>.
Github user guoxiaolongzte commented on the issue:

    https://github.com/apache/spark/pull/23104
  
    Yes, I tested it and understood; you are right. @mgaido91


---



[GitHub] spark issue #23104: [SPARK-26138][SQL] Cross join requires push LocalLimit i...

Posted by guoxiaolongzte <gi...@git.apache.org>.
Github user guoxiaolongzte commented on the issue:

    https://github.com/apache/spark/pull/23104
  
    > The title has a typo.
    
    Sorry, it has been fixed.


---



[GitHub] spark issue #23104: [SPARK-26138][SQL] Cross join requires push LocalLimit i...

Posted by liu-zhaokun <gi...@git.apache.org>.
Github user liu-zhaokun commented on the issue:

    https://github.com/apache/spark/pull/23104
  
    @guoxiaolongzte  good job


---



[GitHub] spark pull request #23104: [SPARK-26138][SQL] Cross join requires push Local...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/23104#discussion_r236625590
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala ---
    @@ -459,6 +459,7 @@ object LimitPushDown extends Rule[LogicalPlan] {
           val newJoin = joinType match {
             case RightOuter => join.copy(right = maybePushLocalLimit(exp, right))
             case LeftOuter => join.copy(left = maybePushLocalLimit(exp, left))
    +        case Cross => join.copy(left = maybePushLocalLimit(exp, left), right = maybePushLocalLimit(exp, right))
    --- End diff --
    
    If there is no join condition, I think the join type doesn't matter and we can always push down limits.


---



[GitHub] spark issue #23104: [SPARK-26138][SQL] LimitPushDown cross join requires may...

Posted by guoxiaolongzte <gi...@git.apache.org>.
Github user guoxiaolongzte commented on the issue:

    https://github.com/apache/spark/pull/23104
  
    OK,  I will add some UTs.


---



[GitHub] spark pull request #23104: [SPARK-26138][SQL] LimitPushDown cross join requi...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/23104#discussion_r236114751
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala ---
    @@ -459,6 +459,7 @@ object LimitPushDown extends Rule[LogicalPlan] {
           val newJoin = joinType match {
             case RightOuter => join.copy(right = maybePushLocalLimit(exp, right))
             case LeftOuter => join.copy(left = maybePushLocalLimit(exp, left))
    +        case Cross => join.copy(left = maybePushLocalLimit(exp, left), right = maybePushLocalLimit(exp, right))
    --- End diff --
    
    How about an inner join without a condition?


---



[GitHub] spark issue #23104: [SPARK-26138][SQL] LimitPushDown cross join requires may...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/23104
  
    The title has a typo. 


---



[GitHub] spark issue #23104: [SPARK-26138][SQL] LimitPushDown cross join requires may...

Posted by guoxiaolongzte <gi...@git.apache.org>.
Github user guoxiaolongzte commented on the issue:

    https://github.com/apache/spark/pull/23104
  
    In mathematics, the Cartesian product (also known as the direct product) of two sets X and Y, written X × Y, is the set of all ordered pairs whose first element is a member of X and whose second element is a member of Y. So a cross join must push the limit on the left side.
       For example, if A = {a, b} and B = {0, 1, 2}, then
       A × B = {(a, 0), (a, 1), (a, 2), (b, 0), (b, 1), (b, 2)}
       B × A = {(0, a), (0, b), (1, a), (1, b), (2, a), (2, b)}


---



[GitHub] spark issue #23104: [SPARK-26138][SQL] LimitPushDown cross join requires may...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/23104
  
    Can one of the admins verify this patch?


---



[GitHub] spark pull request #23104: [SPARK-26138][SQL] LimitPushDown cross join requi...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/23104#discussion_r236137426
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala ---
    @@ -459,6 +459,7 @@ object LimitPushDown extends Rule[LogicalPlan] {
           val newJoin = joinType match {
             case RightOuter => join.copy(right = maybePushLocalLimit(exp, right))
             case LeftOuter => join.copy(left = maybePushLocalLimit(exp, left))
    +        case Cross => join.copy(left = maybePushLocalLimit(exp, left), right = maybePushLocalLimit(exp, right))
    --- End diff --
    
    +1


---



[GitHub] spark issue #23104: [SPARK-26138][SQL] LimitPushDown cross join requires may...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/23104
  
    Can one of the admins verify this patch?


---



[GitHub] spark pull request #23104: [SPARK-26138][SQL] LimitPushDown cross join requi...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/23104#discussion_r236118983
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala ---
    @@ -459,6 +459,7 @@ object LimitPushDown extends Rule[LogicalPlan] {
           val newJoin = joinType match {
             case RightOuter => join.copy(right = maybePushLocalLimit(exp, right))
             case LeftOuter => join.copy(left = maybePushLocalLimit(exp, left))
    +        case Cross => join.copy(left = maybePushLocalLimit(exp, left), right = maybePushLocalLimit(exp, right))
    --- End diff --
    
    An inner join without a condition is literally a cross join.


---



[GitHub] spark issue #23104: [SPARK-26138][SQL] LimitPushDown cross join requires may...

Posted by mgaido91 <gi...@git.apache.org>.
Github user mgaido91 commented on the issue:

    https://github.com/apache/spark/pull/23104
  
    @guoxiaolongzte Still, that doesn't explain why we can push to the right side too. I do believe that it is possible: if the right side contains more than N items, where N is the limit size, the output will contain the combinations of the first item from the left side and the first N items from the right side. If the right side contains fewer than N items, pushing the limit on its side has no effect on the result.
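    A small sketch of that argument (spark-shell style, assuming `spark` is available; the check compares row counts, since LIMIT does not guarantee which rows are returned):
    
    ```scala
    // With no join condition, any N rows of the cross product are valid output
    // for LIMIT N, so limiting either (or both) inputs first still leaves
    // enough combinations to produce N rows.
    import spark.implicits._
    
    val l = Seq(1, 2, 3).toDF("l")
    val r = Seq("x", "y", "z").toDF("r")
    val n = 2
    
    println(l.crossJoin(r).limit(n).count())                    // 2
    println(l.limit(n).crossJoin(r.limit(n)).limit(n).count())  // still 2
    ```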


---



[GitHub] spark pull request #23104: [SPARK-26138][SQL] LimitPushDown cross join requi...

Posted by guoxiaolongzte <gi...@git.apache.org>.
Github user guoxiaolongzte commented on a diff in the pull request:

    https://github.com/apache/spark/pull/23104#discussion_r236137253
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala ---
    @@ -459,6 +459,7 @@ object LimitPushDown extends Rule[LogicalPlan] {
           val newJoin = joinType match {
             case RightOuter => join.copy(right = maybePushLocalLimit(exp, right))
             case LeftOuter => join.copy(left = maybePushLocalLimit(exp, left))
    +        case Cross => join.copy(left = maybePushLocalLimit(exp, left), right = maybePushLocalLimit(exp, right))
    --- End diff --
    
    When spark.sql.crossJoin.enabled=true is set, are an inner join without a condition, a LeftOuter join without a condition, a RightOuter join without a condition, and a FullOuter join without a condition all literally cross joins?
    



---



[GitHub] spark pull request #23104: [SPARK-26138][SQL] Cross join requires push Local...

Posted by guoxiaolongzte <gi...@git.apache.org>.
Github user guoxiaolongzte commented on a diff in the pull request:

    https://github.com/apache/spark/pull/23104#discussion_r236929433
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala ---
    @@ -459,6 +459,7 @@ object LimitPushDown extends Rule[LogicalPlan] {
           val newJoin = joinType match {
             case RightOuter => join.copy(right = maybePushLocalLimit(exp, right))
             case LeftOuter => join.copy(left = maybePushLocalLimit(exp, left))
    +        case Cross => join.copy(left = maybePushLocalLimit(exp, left), right = maybePushLocalLimit(exp, right))
    --- End diff --
    
    There are two tables as follows:
    CREATE TABLE `**test1**`(`id` int, `name` int);
    CREATE TABLE `**test2**`(`id` int, `name` int);
    
    test1 table data:
    2,2
    1,1
    
    test2 table data:
    2,2
    3,3
    4,4
    
    Execute the SQL select * from test1 t1 **left anti join** test2 t2 on t1.id=t2.id limit 1; the result is:
    1,1
    
    But
       if we push the limit 1 to the left side, the result is not correct: it is empty.
       if we push the limit 1 to the right side, the result is not correct: it is empty.
    
    So
        a left anti join should not push down the limit. Similarly, a left semi join follows the same logic.
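    A sketch of the same scenario (spark-shell style, assuming `spark` is available; it only demonstrates the left-side push, since which row a local limit keeps is not guaranteed in general):
    
    ```scala
    // LEFT ANTI example from above: test1 = {(2,2), (1,1)}, test2 = {(2,2), (3,3), (4,4)}.
    import spark.implicits._
    
    Seq((2, 2), (1, 1)).toDF("id", "name").createOrReplaceTempView("test1")
    Seq((2, 2), (3, 3), (4, 4)).toDF("id", "name").createOrReplaceTempView("test2")
    
    // Correct result: (1, 1).
    spark.sql(
      "SELECT * FROM test1 t1 LEFT ANTI JOIN test2 t2 ON t1.id = t2.id LIMIT 1").show()
    
    // Manually "pushing" LIMIT 1 into the left side: in this local example it keeps
    // only (2, 2), which the anti join then eliminates, so the result is wrongly empty.
    spark.sql(
      """SELECT * FROM (SELECT * FROM test1 LIMIT 1) t1
        |LEFT ANTI JOIN test2 t2 ON t1.id = t2.id LIMIT 1""".stripMargin).show()
    ```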



---



[GitHub] spark issue #23104: [SPARK-26138][SQL] LimitPushDown cross join requires may...

Posted by guoxiaolongzte <gi...@git.apache.org>.
Github user guoxiaolongzte commented on the issue:

    https://github.com/apache/spark/pull/23104
  
    @cloud-fan @dongjoon-hyun @gatorsmile
    Please help review the code.


---



[GitHub] spark pull request #23104: [SPARK-26138][SQL] LimitPushDown cross join requi...

Posted by mgaido91 <gi...@git.apache.org>.
Github user mgaido91 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/23104#discussion_r235392255
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala ---
    @@ -459,6 +459,7 @@ object LimitPushDown extends Rule[LogicalPlan] {
           val newJoin = joinType match {
             case RightOuter => join.copy(right = maybePushLocalLimit(exp, right))
             case LeftOuter => join.copy(left = maybePushLocalLimit(exp, left))
    +        case Cross => join.copy(left = maybePushLocalLimit(exp, left))
    --- End diff --
    
    what if the other table is empty?


---
