Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2021/03/04 16:41:16 UTC

[GitHub] [spark] wangyum opened a new pull request #31739: [SPARK-34622][SQL] Fix push down limit through join

wangyum opened a new pull request #31739:
URL: https://github.com/apache/spark/pull/31739


   ### What changes were proposed in this pull request?
   
   There is a `Project` between `LocalLimit` and `Join` if the `Join`'s output does not match the `LocalLimit`'s output. For example:
   ```scala
   spark.sql("create table t1(a int, b int, c int) using parquet")
   spark.sql("create table t2(x int, y int, z int) using parquet")
   spark.sql("select a from t1 left join t2 on a = x and b = y limit 5").explain("extended")
   ```
   
   ```
   == Optimized Logical Plan ==
   GlobalLimit 5
   +- LocalLimit 5
      +- Project [a#0]
         +- Join LeftOuter, ((a#0 = x#3) AND (b#1 = y#4))
            :- Project [a#0, b#1]
            :  +- Relation default.t1[a#0,b#1,c#2] parquet
            +- Project [x#3, y#4]
               +- Filter (isnotnull(x#3) AND isnotnull(y#4))
                  +- Relation default.t2[x#3,y#4,z#5] parquet
   ```
   
   Thus, `LimitPushDown` cannot optimize the query. This PR fixes the issue.
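
   As a rough illustration of the missed case, the sketch below models the plan shape with a toy algebra. Every name in it (`Plan`, `Relation`, `ToyLimitPushDown`, `pushThroughJoin`, and so on) is a hypothetical stand-in rather than Catalyst's API, and the handling of the `Project` case is an assumption about how such a rule could look, not the code of this patch.

   ```scala
   // Toy plan algebra for illustration only; these are NOT Spark's Catalyst classes.
   sealed trait JoinType
   case object LeftOuter extends JoinType
   case object RightOuter extends JoinType
   case object Inner extends JoinType

   sealed trait Plan
   case class Relation(name: String) extends Plan
   case class Project(columns: Seq[String], child: Plan) extends Plan
   case class Join(left: Plan, right: Plan, joinType: JoinType, condition: Option[String]) extends Plan
   case class LocalLimit(n: Int, child: Plan) extends Plan

   object ToyLimitPushDown {
     // Put a limit on top of `plan` unless it is already limited to n rows or fewer.
     def maybePushLocalLimit(n: Int, plan: Plan): Plan = plan match {
       case LocalLimit(m, _) if m <= n => plan
       case _ => LocalLimit(n, plan)
     }

     // Outer joins: limit the preserved side. Inner joins: only when there is no condition.
     def pushThroughJoin(n: Int, join: Join): Join = join.joinType match {
       case LeftOuter => join.copy(left = maybePushLocalLimit(n, join.left))
       case RightOuter => join.copy(right = maybePushLocalLimit(n, join.right))
       case Inner if join.condition.isEmpty =>
         join.copy(
           left = maybePushLocalLimit(n, join.left),
           right = maybePushLocalLimit(n, join.right))
       case _ => join
     }

     def apply(plan: Plan): Plan = plan match {
       case LocalLimit(n, join: Join) =>
         LocalLimit(n, pushThroughJoin(n, join))
       // Assumed handling: look through a single Project sitting between the limit and the join.
       case LocalLimit(n, project @ Project(_, join: Join)) =>
         LocalLimit(n, project.copy(child = pushThroughJoin(n, join)))
       case other => other
     }
   }
   ```

   In this toy model, applying `ToyLimitPushDown` to `LocalLimit(5, Project(Seq("a"), Join(Relation("t1"), Relation("t2"), LeftOuter, Some("a = x AND b = y"))))` places a `LocalLimit(5, ...)` on the left side of the join, which the first case alone would not do because the limit's child is a `Project`.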
   
   
   ### Why are the changes needed?
   
   Fix bug.
   
   ### Does this PR introduce _any_ user-facing change?
   
   No.
   
   
   ### How was this patch tested?
   
   Unit test.
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] wangyum commented on pull request #31739: [SPARK-34622][SQL] Push down limit through Project with Join

Posted by GitBox <gi...@apache.org>.
wangyum commented on pull request #31739:
URL: https://github.com/apache/spark/pull/31739#issuecomment-805412344


   @maropu Updated it.




[GitHub] [spark] c21 commented on a change in pull request #31739: [SPARK-34622][SQL] Push down limit through Project with Join

Posted by GitBox <gi...@apache.org>.
c21 commented on a change in pull request #31739:
URL: https://github.com/apache/spark/pull/31739#discussion_r599825392



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala
##########
@@ -597,31 +623,11 @@ object LimitPushDown extends Rule[LogicalPlan] {
     // pushdown Limit.
     case LocalLimit(exp, u: Union) =>
       LocalLimit(exp, u.copy(children = u.children.map(maybePushLocalLimit(exp, _))))
-    // Add extra limits below JOIN:
-    // 1. For LEFT OUTER and RIGHT OUTER JOIN, we push limits to the left and right sides,
-    //    respectively.
-    // 2. For INNER and CROSS JOIN, we push limits to both the left and right sides if join
-    //    condition is empty.
-    // 3. For LEFT SEMI and LEFT ANTI JOIN, we push limits to the left side if join condition
-    //    is empty.
-    // It's not safe to push limits below FULL OUTER JOIN in the general case without a more
-    // invasive rewrite. We also need to ensure that this limit pushdown rule will not eventually
-    // introduce limits on both sides if it is applied multiple times. Therefore:
-    //   - If one side is already limited, stack another limit on top if the new limit is smaller.
-    //     The redundant limit will be collapsed by the CombineLimits rule.
-    case LocalLimit(exp, join @ Join(left, right, joinType, conditionOpt, _)) =>
-      val newJoin = joinType match {
-        case RightOuter => join.copy(right = maybePushLocalLimit(exp, right))
-        case LeftOuter => join.copy(left = maybePushLocalLimit(exp, left))
-        case _: InnerLike if conditionOpt.isEmpty =>
-          join.copy(
-            left = maybePushLocalLimit(exp, left),
-            right = maybePushLocalLimit(exp, right))
-        case LeftSemi | LeftAnti if conditionOpt.isEmpty =>
-          join.copy(left = maybePushLocalLimit(exp, left))
-        case _ => join
-      }
-      LocalLimit(exp, newJoin)
+
+    case LocalLimit(exp, join: Join) =>
+      LocalLimit(exp, pushLocalLimitThroughJoin(exp, join))
+    case LocalLimit(exp, project @ Project(_, join: Join)) =>

Review comment:
       @wangyum and @cloud-fan - I think one follow-up could be to pass `LocalLimit` through eligible operators. We would not add more `LocalLimit` operators, but push the existing `LocalLimit` further down in the query plan.






[GitHub] [spark] wangyum commented on a change in pull request #31739: [SPARK-34622][SQL] Push down limit through Project with Join

Posted by GitBox <gi...@apache.org>.
wangyum commented on a change in pull request #31739:
URL: https://github.com/apache/spark/pull/31739#discussion_r599153181



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala
##########
@@ -597,31 +623,11 @@ object LimitPushDown extends Rule[LogicalPlan] {
     // pushdown Limit.
     case LocalLimit(exp, u: Union) =>
       LocalLimit(exp, u.copy(children = u.children.map(maybePushLocalLimit(exp, _))))
-    // Add extra limits below JOIN:
-    // 1. For LEFT OUTER and RIGHT OUTER JOIN, we push limits to the left and right sides,
-    //    respectively.
-    // 2. For INNER and CROSS JOIN, we push limits to both the left and right sides if join
-    //    condition is empty.
-    // 3. For LEFT SEMI and LEFT ANTI JOIN, we push limits to the left side if join condition
-    //    is empty.
-    // It's not safe to push limits below FULL OUTER JOIN in the general case without a more
-    // invasive rewrite. We also need to ensure that this limit pushdown rule will not eventually
-    // introduce limits on both sides if it is applied multiple times. Therefore:
-    //   - If one side is already limited, stack another limit on top if the new limit is smaller.
-    //     The redundant limit will be collapsed by the CombineLimits rule.
-    case LocalLimit(exp, join @ Join(left, right, joinType, conditionOpt, _)) =>
-      val newJoin = joinType match {
-        case RightOuter => join.copy(right = maybePushLocalLimit(exp, right))
-        case LeftOuter => join.copy(left = maybePushLocalLimit(exp, left))
-        case _: InnerLike if conditionOpt.isEmpty =>
-          join.copy(
-            left = maybePushLocalLimit(exp, left),
-            right = maybePushLocalLimit(exp, right))
-        case LeftSemi | LeftAnti if conditionOpt.isEmpty =>
-          join.copy(left = maybePushLocalLimit(exp, left))
-        case _ => join
-      }
-      LocalLimit(exp, newJoin)
+
+    case LocalLimit(exp, join: Join) =>
+      LocalLimit(exp, pushLocalLimitThroughJoin(exp, join))
+    case LocalLimit(exp, project @ Project(_, join: Join)) =>

Review comment:
       This will introduce a useless pushdown. For example:
   ```scala
   spark.range(200L).selectExpr("id AS a", "id AS b").write.saveAsTable("t1")
   spark.range(300L).selectExpr("id AS x", "id AS y").write.saveAsTable("t2")
   spark.sql("SELECT 1 FROM t1 INNER JOIN t2 ON a = x limit 10").explain(true)
   ```
   
   ```
   == Optimized Logical Plan ==
   GlobalLimit 10
   +- LocalLimit 10
      +- Project [1 AS 1#20]
         +- LocalLimit 10
            +- Project
               +- Join Inner, (a#16L = x#18L)
                  :- Project [a#16L]
                  :  +- Filter isnotnull(a#16L)
                  :     +- Relation default.t1[a#16L,b#17L] parquet
                  +- Project [x#18L]
                     +- Filter isnotnull(x#18L)
                        +- Relation default.t2[x#18L,y#19L] parquet
   
   == Physical Plan ==
   AdaptiveSparkPlan isFinalPlan=false
   +- CollectLimit 10
      +- Project [1 AS 1#20]
         +- LocalLimit 10
            +- Project
               +- BroadcastHashJoin [a#16L], [x#18L], Inner, BuildLeft, false
                  :- BroadcastExchange HashedRelationBroadcastMode(List(input[0, bigint, false]),false), [id=#68]
                  :  +- Filter isnotnull(a#16L)
                  :     +- FileScan parquet default.t1[a#16L] 
                  +- Filter isnotnull(x#18L)
                     +- FileScan parquet default.t2[x#18L]
   
   ```
   
   Another example is TPC-DS q32:
   https://github.com/apache/spark/blob/66f5a42ca5d259038f0749ae2b9a04cc2f658880/sql/core/src/test/resources/tpcds/q32.sql#L1-L15
   






[GitHub] [spark] AmplabJenkins removed a comment on pull request #31739: [SPARK-34622][SQL] Push down limit through Project

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #31739:
URL: https://github.com/apache/spark/pull/31739#issuecomment-791113900


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/40366/
   




[GitHub] [spark] wangyum commented on pull request #31739: [SPARK-34622][SQL] Push down limit through Project with Join

Posted by GitBox <gi...@apache.org>.
wangyum commented on pull request #31739:
URL: https://github.com/apache/spark/pull/31739#issuecomment-805359628


   > Last question: do we push down limit through project?
   
   There seems to be no difference:
   ```scala
   spark.range(2000L).selectExpr("id AS a", "id AS b").write.saveAsTable("t1")
   spark.sql("select a, java_method('java.lang.Thread', 'sleep', 3000L) from t1 limit 5").show()
   spark.sql("select a, java_method('java.lang.Thread', 'sleep', 3000L) from (select * from t1 limit 5) t limit 5").show()
   ```




[GitHub] [spark] AmplabJenkins commented on pull request #31739: [SPARK-34622][SQL] Push down limit through Project with Join

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #31739:
URL: https://github.com/apache/spark/pull/31739#issuecomment-791934651


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/40409/
   




[GitHub] [spark] SparkQA commented on pull request #31739: [SPARK-34622][SQL] Push down limit through Project with Join

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #31739:
URL: https://github.com/apache/spark/pull/31739#issuecomment-791931580


   Kubernetes integration test status failure
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/40409/
   




[GitHub] [spark] AmplabJenkins removed a comment on pull request #31739: [SPARK-34622][SQL] Push down limit through Project with Join

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #31739:
URL: https://github.com/apache/spark/pull/31739#issuecomment-803409770








[GitHub] [spark] SparkQA commented on pull request #31739: [SPARK-34622][SQL] Push down limit through Project

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #31739:
URL: https://github.com/apache/spark/pull/31739#issuecomment-791154989


   **[Test build #135784 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/135784/testReport)** for PR 31739 at commit [`c84cd14`](https://github.com/apache/spark/commit/c84cd1445bbb23b3d75be094ccdb50761a605c6a).
    * This patch **fails Spark unit tests**.
    * This patch merges cleanly.
    * This patch adds no public classes.




[GitHub] [spark] maropu commented on pull request #31739: [SPARK-34622][SQL] Push down limit through Project with Join

Posted by GitBox <gi...@apache.org>.
maropu commented on pull request #31739:
URL: https://github.com/apache/spark/pull/31739#issuecomment-805411820


   >> Why are the changes needed?
   > Fix bug.
   
   Btw, not a bug but an improvement?




[GitHub] [spark] SparkQA commented on pull request #31739: [SPARK-34622][SQL] Push down limit through Project with Join

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #31739:
URL: https://github.com/apache/spark/pull/31739#issuecomment-809385015


   **[Test build #136658 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/136658/testReport)** for PR 31739 at commit [`4c7bc54`](https://github.com/apache/spark/commit/4c7bc54c69ca347c01d84ac8d2ab54033d96aadf).




[GitHub] [spark] AmplabJenkins commented on pull request #31739: [SPARK-34622][SQL] Push down limit through Project with Join

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #31739:
URL: https://github.com/apache/spark/pull/31739#issuecomment-803409770


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/40869/
   




[GitHub] [spark] AmplabJenkins removed a comment on pull request #31739: [SPARK-34622][SQL] Push down limit through Project

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #31739:
URL: https://github.com/apache/spark/pull/31739#issuecomment-791374608


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/40391/
   




[GitHub] [spark] SparkQA commented on pull request #31739: [SPARK-34622][SQL] Push down limit through Project with Join

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #31739:
URL: https://github.com/apache/spark/pull/31739#issuecomment-803389532


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/40869/
   




[GitHub] [spark] cloud-fan commented on a change in pull request #31739: [SPARK-34622][SQL] Push down limit through Project with Join

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #31739:
URL: https://github.com/apache/spark/pull/31739#discussion_r598844553



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala
##########
@@ -597,31 +623,11 @@ object LimitPushDown extends Rule[LogicalPlan] {
     // pushdown Limit.
     case LocalLimit(exp, u: Union) =>
       LocalLimit(exp, u.copy(children = u.children.map(maybePushLocalLimit(exp, _))))
-    // Add extra limits below JOIN:
-    // 1. For LEFT OUTER and RIGHT OUTER JOIN, we push limits to the left and right sides,
-    //    respectively.
-    // 2. For INNER and CROSS JOIN, we push limits to both the left and right sides if join
-    //    condition is empty.
-    // 3. For LEFT SEMI and LEFT ANTI JOIN, we push limits to the left side if join condition
-    //    is empty.
-    // It's not safe to push limits below FULL OUTER JOIN in the general case without a more
-    // invasive rewrite. We also need to ensure that this limit pushdown rule will not eventually
-    // introduce limits on both sides if it is applied multiple times. Therefore:
-    //   - If one side is already limited, stack another limit on top if the new limit is smaller.
-    //     The redundant limit will be collapsed by the CombineLimits rule.
-    case LocalLimit(exp, join @ Join(left, right, joinType, conditionOpt, _)) =>
-      val newJoin = joinType match {
-        case RightOuter => join.copy(right = maybePushLocalLimit(exp, right))
-        case LeftOuter => join.copy(left = maybePushLocalLimit(exp, left))
-        case _: InnerLike if conditionOpt.isEmpty =>
-          join.copy(
-            left = maybePushLocalLimit(exp, left),
-            right = maybePushLocalLimit(exp, right))
-        case LeftSemi | LeftAnti if conditionOpt.isEmpty =>
-          join.copy(left = maybePushLocalLimit(exp, left))
-        case _ => join
-      }
-      LocalLimit(exp, newJoin)
+
+    case LocalLimit(exp, join: Join) =>
+      LocalLimit(exp, pushLocalLimitThroughJoin(exp, join))
+    case LocalLimit(exp, project @ Project(_, join: Join)) =>

Review comment:
       Sounds like a good idea. We can have an allowlist of unary operators.
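
   A follow-up of this kind (moving the existing `LocalLimit` below an allowlist of unary operators rather than adding new limits) might be sketched as below. This is a toy model, not Catalyst code: `Node`, `Scan`, `Proj`, `Sort`, `Limit`, `MoveLimitDown` and `isEligible` are hypothetical names, and treating only a projection as eligible is an assumption, since the thread does not say which operators would be on the allowlist.

   ```scala
   // Toy plan nodes for illustration only; not Spark's Catalyst classes.
   sealed trait Node
   case class Scan(table: String) extends Node
   case class Proj(columns: Seq[String], child: Node) extends Node
   case class Sort(keys: Seq[String], child: Node) extends Node
   case class Limit(n: Int, child: Node) extends Node

   object MoveLimitDown {
     // Hypothetical allowlist: unary operators that emit exactly one output row
     // per input row, so limiting before or after them keeps the same row count.
     private def isEligible(node: Node): Boolean = node match {
       case _: Proj => true
       case _ => false
     }

     // Move the existing limit below eligible operators; no additional limit is created.
     def apply(plan: Node): Node = plan match {
       case Limit(n, p @ Proj(_, child)) if isEligible(p) =>
         p.copy(child = apply(Limit(n, child)))
       case other => other
     }
   }
   ```

   With this sketch, a limit above two stacked projections ends up directly above the scan, while a limit above `Sort` stays where it is because `Sort` is not on the assumed allowlist.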






[GitHub] [spark] AmplabJenkins commented on pull request #31739: [SPARK-34622][SQL] Fix push down limit through join

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #31739:
URL: https://github.com/apache/spark/pull/31739#issuecomment-790973414


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/135765/
   




[GitHub] [spark] SparkQA commented on pull request #31739: [SPARK-34622][SQL] Push down limit through Project with Join

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #31739:
URL: https://github.com/apache/spark/pull/31739#issuecomment-791926469


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/40409/
   




[GitHub] [spark] c21 commented on a change in pull request #31739: [SPARK-34622][SQL] Push down limit through Project

Posted by GitBox <gi...@apache.org>.
c21 commented on a change in pull request #31739:
URL: https://github.com/apache/spark/pull/31739#discussion_r588802821



##########
File path: sql/core/src/test/resources/tpcds-plan-stability/approved-plans-v1_4/q32.sf100/explain.txt
##########
@@ -1,35 +1,37 @@
 == Physical Plan ==
-CollectLimit (31)
-+- * Project (30)
-   +- * BroadcastHashJoin Inner BuildRight (29)
-      :- * Project (27)
-      :  +- * BroadcastHashJoin Inner BuildLeft (26)
-      :     :- BroadcastExchange (22)
-      :     :  +- * Project (21)
-      :     :     +- * BroadcastHashJoin Inner BuildLeft (20)
-      :     :        :- BroadcastExchange (5)
-      :     :        :  +- * Project (4)
-      :     :        :     +- * Filter (3)
-      :     :        :        +- * ColumnarToRow (2)
-      :     :        :           +- Scan parquet default.item (1)
-      :     :        +- * Filter (19)
-      :     :           +- * HashAggregate (18)
-      :     :              +- Exchange (17)
-      :     :                 +- * HashAggregate (16)
-      :     :                    +- * Project (15)
-      :     :                       +- * BroadcastHashJoin Inner BuildRight (14)
-      :     :                          :- * Filter (8)
-      :     :                          :  +- * ColumnarToRow (7)
-      :     :                          :     +- Scan parquet default.catalog_sales (6)
-      :     :                          +- BroadcastExchange (13)
-      :     :                             +- * Project (12)
-      :     :                                +- * Filter (11)
-      :     :                                   +- * ColumnarToRow (10)
-      :     :                                      +- Scan parquet default.date_dim (9)
-      :     +- * Filter (25)
-      :        +- * ColumnarToRow (24)
-      :           +- Scan parquet default.catalog_sales (23)
-      +- ReusedExchange (28)
+CollectLimit (33)
++- * Project (32)
+   +- * LocalLimit (31)
+      +- * Project (30)
+         +- * BroadcastHashJoin Inner BuildRight (29)

Review comment:
       > Its child is Inner join and join condition is not empty. We can not pushdown limit through it.
   
   I understand that. My question is why `LocalLimit` sits between two `Project` operators. Shouldn't it be:
   
   ```
   Project
   - Project
     - LocalLimit
       - BroadcastHashJoin
   ```
   
   I might be missing something.






[GitHub] [spark] AmplabJenkins removed a comment on pull request #31739: [SPARK-34622][SQL] Push down limit through Project with Join

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #31739:
URL: https://github.com/apache/spark/pull/31739#issuecomment-809654395


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/41240/
   




[GitHub] [spark] AmplabJenkins removed a comment on pull request #31739: [SPARK-34622][SQL] Fix push down limit through join

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #31739:
URL: https://github.com/apache/spark/pull/31739#issuecomment-790973414


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/135765/
   




[GitHub] [spark] SparkQA commented on pull request #31739: [SPARK-34622][SQL] Push down limit through Project

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #31739:
URL: https://github.com/apache/spark/pull/31739#issuecomment-791111934


   Kubernetes integration test status success
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/40366/
   




[GitHub] [spark] SparkQA commented on pull request #31739: [SPARK-34622][SQL] Push down limit through Project

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #31739:
URL: https://github.com/apache/spark/pull/31739#issuecomment-791337871


   **[Test build #135809 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/135809/testReport)** for PR 31739 at commit [`2f5865d`](https://github.com/apache/spark/commit/2f5865d43a3eeacf76760cfd103269a33134b97a).




[GitHub] [spark] wangyum commented on a change in pull request #31739: [SPARK-34622][SQL] Push down limit through Project

Posted by GitBox <gi...@apache.org>.
wangyum commented on a change in pull request #31739:
URL: https://github.com/apache/spark/pull/31739#discussion_r588800758



##########
File path: sql/core/src/test/resources/tpcds-plan-stability/approved-plans-v1_4/q32.sf100/explain.txt
##########
@@ -1,35 +1,37 @@
 == Physical Plan ==
-CollectLimit (31)
-+- * Project (30)
-   +- * BroadcastHashJoin Inner BuildRight (29)
-      :- * Project (27)
-      :  +- * BroadcastHashJoin Inner BuildLeft (26)
-      :     :- BroadcastExchange (22)
-      :     :  +- * Project (21)
-      :     :     +- * BroadcastHashJoin Inner BuildLeft (20)
-      :     :        :- BroadcastExchange (5)
-      :     :        :  +- * Project (4)
-      :     :        :     +- * Filter (3)
-      :     :        :        +- * ColumnarToRow (2)
-      :     :        :           +- Scan parquet default.item (1)
-      :     :        +- * Filter (19)
-      :     :           +- * HashAggregate (18)
-      :     :              +- Exchange (17)
-      :     :                 +- * HashAggregate (16)
-      :     :                    +- * Project (15)
-      :     :                       +- * BroadcastHashJoin Inner BuildRight (14)
-      :     :                          :- * Filter (8)
-      :     :                          :  +- * ColumnarToRow (7)
-      :     :                          :     +- Scan parquet default.catalog_sales (6)
-      :     :                          +- BroadcastExchange (13)
-      :     :                             +- * Project (12)
-      :     :                                +- * Filter (11)
-      :     :                                   +- * ColumnarToRow (10)
-      :     :                                      +- Scan parquet default.date_dim (9)
-      :     +- * Filter (25)
-      :        +- * ColumnarToRow (24)
-      :           +- Scan parquet default.catalog_sales (23)
-      +- ReusedExchange (28)
+CollectLimit (33)
++- * Project (32)
+   +- * LocalLimit (31)
+      +- * Project (30)
+         +- * BroadcastHashJoin Inner BuildRight (29)

Review comment:
       Its child is an inner join with a non-empty join condition, so we cannot push the limit down through it.
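
   As a rough illustration of why a limit cannot be pushed below an inner join that has a condition, here is a minimal sketch using plain Scala collections instead of Spark plans. The object name `InnerJoinLimitDemo`, the helper `innerJoin`, and the data are hypothetical and only meant to show the row-count mismatch:

   ```scala
   object InnerJoinLimitDemo {
     // Toy data standing in for two tables; not taken from the PR.
     val left: Seq[Int] = Seq(1, 2, 3, 4)
     val right: Seq[Int] = Seq(3, 4)

     // Inner join on equality, modelled with plain collections.
     def innerJoin(l: Seq[Int], r: Seq[Int]): Seq[(Int, Int)] =
       for (a <- l; b <- r if a == b) yield (a, b)

     // Limit applied after the join: the intended semantics.
     val limitAfterJoin: Seq[(Int, Int)] = innerJoin(left, right).take(2)

     // Limit pushed below the join on the left side: rows 3 and 4 are
     // dropped before they can match, so fewer rows come out.
     val limitPushedLeft: Seq[(Int, Int)] = innerJoin(left.take(2), right).take(2)

     def main(args: Array[String]): Unit = {
       println(s"limit after join:  $limitAfterJoin")
       println(s"limit pushed left: $limitPushedLeft")
     }
   }
   ```

   With this data the limit after the join keeps two rows, while the pushed-down variant keeps none, because the join condition filters out every row that survived the early limit.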






[GitHub] [spark] SparkQA commented on pull request #31739: [SPARK-34622][SQL] Fix push down limit through join

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #31739:
URL: https://github.com/apache/spark/pull/31739#issuecomment-790768088


   **[Test build #135765 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/135765/testReport)** for PR 31739 at commit [`b1eee39`](https://github.com/apache/spark/commit/b1eee39d0b9f6b11ea428d5ad93ff6b47983416e).




[GitHub] [spark] SparkQA commented on pull request #31739: [SPARK-34622][SQL] Push down limit through Project

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #31739:
URL: https://github.com/apache/spark/pull/31739#issuecomment-791094631


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/40366/
   




[GitHub] [spark] AmplabJenkins removed a comment on pull request #31739: [SPARK-34622][SQL] Push down limit through Project with Join

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #31739:
URL: https://github.com/apache/spark/pull/31739#issuecomment-809479816


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/136658/
   




[GitHub] [spark] SparkQA removed a comment on pull request #31739: [SPARK-34622][SQL] Push down limit through Project

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #31739:
URL: https://github.com/apache/spark/pull/31739#issuecomment-791337871


   **[Test build #135809 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/135809/testReport)** for PR 31739 at commit [`2f5865d`](https://github.com/apache/spark/commit/2f5865d43a3eeacf76760cfd103269a33134b97a).




[GitHub] [spark] SparkQA removed a comment on pull request #31739: [SPARK-34622][SQL] Push down limit through Project

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #31739:
URL: https://github.com/apache/spark/pull/31739#issuecomment-791075337


   **[Test build #135784 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/135784/testReport)** for PR 31739 at commit [`c84cd14`](https://github.com/apache/spark/commit/c84cd1445bbb23b3d75be094ccdb50761a605c6a).




[GitHub] [spark] wangyum commented on a change in pull request #31739: [SPARK-34622][SQL] Push down limit through Project

Posted by GitBox <gi...@apache.org>.
wangyum commented on a change in pull request #31739:
URL: https://github.com/apache/spark/pull/31739#discussion_r588842455



##########
File path: sql/core/src/test/resources/tpcds-plan-stability/approved-plans-v1_4/q32.sf100/explain.txt
##########
@@ -1,35 +1,37 @@
 == Physical Plan ==
-CollectLimit (31)
-+- * Project (30)
-   +- * BroadcastHashJoin Inner BuildRight (29)
-      :- * Project (27)
-      :  +- * BroadcastHashJoin Inner BuildLeft (26)
-      :     :- BroadcastExchange (22)
-      :     :  +- * Project (21)
-      :     :     +- * BroadcastHashJoin Inner BuildLeft (20)
-      :     :        :- BroadcastExchange (5)
-      :     :        :  +- * Project (4)
-      :     :        :     +- * Filter (3)
-      :     :        :        +- * ColumnarToRow (2)
-      :     :        :           +- Scan parquet default.item (1)
-      :     :        +- * Filter (19)
-      :     :           +- * HashAggregate (18)
-      :     :              +- Exchange (17)
-      :     :                 +- * HashAggregate (16)
-      :     :                    +- * Project (15)
-      :     :                       +- * BroadcastHashJoin Inner BuildRight (14)
-      :     :                          :- * Filter (8)
-      :     :                          :  +- * ColumnarToRow (7)
-      :     :                          :     +- Scan parquet default.catalog_sales (6)
-      :     :                          +- BroadcastExchange (13)
-      :     :                             +- * Project (12)
-      :     :                                +- * Filter (11)
-      :     :                                   +- * ColumnarToRow (10)
-      :     :                                      +- Scan parquet default.date_dim (9)
-      :     +- * Filter (25)
-      :        +- * ColumnarToRow (24)
-      :           +- Scan parquet default.catalog_sales (23)
-      +- ReusedExchange (28)
+CollectLimit (33)
++- * Project (32)
+   +- * LocalLimit (31)
+      +- * Project (30)
+         +- * BroadcastHashJoin Inner BuildRight (29)

Review comment:
       It is removed by `ColumnPruning`:
   ```
   === Applying Rule org.apache.spark.sql.catalyst.optimizer.ColumnPruning ===
    GlobalLimit 10                                                      GlobalLimit 10
    +- LocalLimit 10                                                    +- LocalLimit 10
       +- Project [1 AS 1#10]                                              +- Project [1 AS 1#10]
          +- LocalLimit 10                                                    +- LocalLimit 10
             +- Project                                                          +- Project
                +- LocalLimit 10                                                    +- LocalLimit 10
   !               +- Join Inner, (a#6L = a#8L)                                        +- Project
   !                  :- Project [a#6L]                                                   +- Join Inner, (a#6L = a#8L)
   !                  :  +- Filter isnotnull(a#6L)                                           :- Project [a#6L]
   !                  :     +- Relation default.t1[a#6L,b#7L] parquet                        :  +- Filter isnotnull(a#6L)
   !                  +- Project [a#8L]                                                      :     +- Relation default.t1[a#6L,b#7L] parquet
   !                     +- Filter isnotnull(a#8L)                                           +- Project [a#8L]
   !                        +- Relation default.t2[a#8L,b#9L] parquet                           +- Filter isnotnull(a#8L)
   !                                                                                               +- Relation default.t2[a#8L,b#9L] parquet
    
   ```






[GitHub] [spark] wangyum commented on a change in pull request #31739: [SPARK-34622][SQL] Push down limit through Project

Posted by GitBox <gi...@apache.org>.
wangyum commented on a change in pull request #31739:
URL: https://github.com/apache/spark/pull/31739#discussion_r587943938



##########
File path: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/LimitPushdownSuite.scala
##########
@@ -230,4 +230,13 @@ class LimitPushdownSuite extends PlanTest {
       comparePlans(optimized, correctAnswer)
     }
   }
+
+  test("SPARK-34622 Fix Push down limit through join if join output is not match the LocalLimit") {

Review comment:
       fixed






[GitHub] [spark] SparkQA removed a comment on pull request #31739: [SPARK-34622][SQL] Push down limit through Project with Join

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #31739:
URL: https://github.com/apache/spark/pull/31739#issuecomment-791913491


   **[Test build #135827 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/135827/testReport)** for PR 31739 at commit [`dc2deba`](https://github.com/apache/spark/commit/dc2debaab79fe720774d428039d5f34ed0f0d384).




[GitHub] [spark] AmplabJenkins commented on pull request #31739: [SPARK-34622][SQL] Push down limit through Project

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #31739:
URL: https://github.com/apache/spark/pull/31739#issuecomment-791113900


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/40366/
   




[GitHub] [spark] AmplabJenkins commented on pull request #31739: [SPARK-34622][SQL] Push down limit through Project with Join

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #31739:
URL: https://github.com/apache/spark/pull/31739#issuecomment-797612905


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/136013/
   




[GitHub] [spark] SparkQA commented on pull request #31739: [SPARK-34622][SQL] Fix push down limit through join

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #31739:
URL: https://github.com/apache/spark/pull/31739#issuecomment-790947181


   **[Test build #135765 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/135765/testReport)** for PR 31739 at commit [`b1eee39`](https://github.com/apache/spark/commit/b1eee39d0b9f6b11ea428d5ad93ff6b47983416e).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.




[GitHub] [spark] AmplabJenkins removed a comment on pull request #31739: [SPARK-34622][SQL] Push down limit through Project

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #31739:
URL: https://github.com/apache/spark/pull/31739#issuecomment-791403394


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/135809/
   




[GitHub] [spark] maropu closed pull request #31739: [SPARK-34622][SQL] Push down limit through Project with Join

Posted by GitBox <gi...@apache.org>.
maropu closed pull request #31739:
URL: https://github.com/apache/spark/pull/31739


   




[GitHub] [spark] SparkQA commented on pull request #31739: [SPARK-34622][SQL] Push down limit through Project with Join

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #31739:
URL: https://github.com/apache/spark/pull/31739#issuecomment-809649069


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/41240/
   




[GitHub] [spark] SparkQA commented on pull request #31739: [SPARK-34622][SQL] Push down limit through Project

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #31739:
URL: https://github.com/apache/spark/pull/31739#issuecomment-791368890


   Kubernetes integration test status failure
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/40391/
   




[GitHub] [spark] AmplabJenkins removed a comment on pull request #31739: [SPARK-34622][SQL] Push down limit through Project with Join

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #31739:
URL: https://github.com/apache/spark/pull/31739#issuecomment-797611779


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/40597/
   




[GitHub] [spark] SparkQA commented on pull request #31739: [SPARK-34622][SQL] Push down limit through Project

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #31739:
URL: https://github.com/apache/spark/pull/31739#issuecomment-791354140


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/40391/
   




[GitHub] [spark] SparkQA commented on pull request #31739: [SPARK-34622][SQL] Push down limit through Project with Join

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #31739:
URL: https://github.com/apache/spark/pull/31739#issuecomment-809654148


   Kubernetes integration test status failure
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/41240/
   




[GitHub] [spark] AmplabJenkins commented on pull request #31739: [SPARK-34622][SQL] Push down limit through Project

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #31739:
URL: https://github.com/apache/spark/pull/31739#issuecomment-791374608


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/40391/
   




[GitHub] [spark] AmplabJenkins commented on pull request #31739: [SPARK-34622][SQL] Push down limit through Project

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #31739:
URL: https://github.com/apache/spark/pull/31739#issuecomment-791155542


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/135784/
   




[GitHub] [spark] SparkQA commented on pull request #31739: [SPARK-34622][SQL] Push down limit through Project with Join

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #31739:
URL: https://github.com/apache/spark/pull/31739#issuecomment-803397013


   Kubernetes integration test status failure
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/40869/
   




[GitHub] [spark] wangyum commented on a change in pull request #31739: [SPARK-34622][SQL] Push down limit through Project with Join

Posted by GitBox <gi...@apache.org>.
wangyum commented on a change in pull request #31739:
URL: https://github.com/apache/spark/pull/31739#discussion_r599338879



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala
##########
@@ -597,31 +623,11 @@ object LimitPushDown extends Rule[LogicalPlan] {
     // pushdown Limit.
     case LocalLimit(exp, u: Union) =>
       LocalLimit(exp, u.copy(children = u.children.map(maybePushLocalLimit(exp, _))))
-    // Add extra limits below JOIN:
-    // 1. For LEFT OUTER and RIGHT OUTER JOIN, we push limits to the left and right sides,
-    //    respectively.
-    // 2. For INNER and CROSS JOIN, we push limits to both the left and right sides if join
-    //    condition is empty.
-    // 3. For LEFT SEMI and LEFT ANTI JOIN, we push limits to the left side if join condition
-    //    is empty.
-    // It's not safe to push limits below FULL OUTER JOIN in the general case without a more
-    // invasive rewrite. We also need to ensure that this limit pushdown rule will not eventually
-    // introduce limits on both sides if it is applied multiple times. Therefore:
-    //   - If one side is already limited, stack another limit on top if the new limit is smaller.
-    //     The redundant limit will be collapsed by the CombineLimits rule.
-    case LocalLimit(exp, join @ Join(left, right, joinType, conditionOpt, _)) =>
-      val newJoin = joinType match {
-        case RightOuter => join.copy(right = maybePushLocalLimit(exp, right))
-        case LeftOuter => join.copy(left = maybePushLocalLimit(exp, left))
-        case _: InnerLike if conditionOpt.isEmpty =>
-          join.copy(
-            left = maybePushLocalLimit(exp, left),
-            right = maybePushLocalLimit(exp, right))
-        case LeftSemi | LeftAnti if conditionOpt.isEmpty =>
-          join.copy(left = maybePushLocalLimit(exp, left))
-        case _ => join
-      }
-      LocalLimit(exp, newJoin)
+
+    case LocalLimit(exp, join: Join) =>
+      LocalLimit(exp, pushLocalLimitThroughJoin(exp, join))
+    case LocalLimit(exp, project @ Project(_, join: Join)) =>

Review comment:
       @c21 @maropu @cloud-fan 
   I think `Project` is the most common case, and this will not introduce a useless `LocalLimit`.
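
   To make the idea concrete, here is a minimal sketch of pushing a limit through a `Project` that sits directly on a `Join`. It uses a toy plan ADT rather than Catalyst classes: `Plan`, `Relation`, the string-typed columns and condition, and the simplified `maybePushLocalLimit` are all hypothetical. The quoted hunk ends at the `Project` case header, so the body of that case below is an assumption, not the PR's actual code:

   ```scala
   object LimitThroughProjectSketch {
     sealed trait JoinType
     case object LeftOuter extends JoinType
     case object RightOuter extends JoinType
     case object Inner extends JoinType

     // Toy logical plan; a stand-in for Catalyst's LogicalPlan.
     sealed trait Plan
     case class Relation(name: String) extends Plan
     case class LocalLimit(n: Int, child: Plan) extends Plan
     case class Project(columns: Seq[String], child: Plan) extends Plan
     case class Join(left: Plan, right: Plan, joinType: JoinType,
         condition: Option[String]) extends Plan

     // Simplified: add a limit unless the child is already limited at least as tightly.
     def maybePushLocalLimit(n: Int, plan: Plan): Plan = plan match {
       case LocalLimit(m, _) if m <= n => plan
       case _ => LocalLimit(n, plan)
     }

     // Same join-type rules as the removed block in the diff (semi/anti joins omitted).
     def pushLocalLimitThroughJoin(n: Int, join: Join): Join = join.joinType match {
       case LeftOuter => join.copy(left = maybePushLocalLimit(n, join.left))
       case RightOuter => join.copy(right = maybePushLocalLimit(n, join.right))
       case Inner if join.condition.isEmpty =>
         join.copy(
           left = maybePushLocalLimit(n, join.left),
           right = maybePushLocalLimit(n, join.right))
       case _ => join
     }

     def pushDown(plan: Plan): Plan = plan match {
       case LocalLimit(n, join: Join) =>
         LocalLimit(n, pushLocalLimitThroughJoin(n, join))
       // Assumed body: look through the Project without adding a limit above the Join.
       case LocalLimit(n, project @ Project(_, join: Join)) =>
         LocalLimit(n, project.copy(child = pushLocalLimitThroughJoin(n, join)))
       case other => other
     }

     // Shape of the query from the PR description: select a ... left join ... limit 5.
     val before: Plan = LocalLimit(5, Project(Seq("a"),
       Join(Relation("t1"), Relation("t2"), LeftOuter, Some("a = x AND b = y"))))
     val after: Plan = pushDown(before)
   }
   ```

   In this sketch the limit ends up only on the left side of the left outer join, and applying `pushDown` a second time leaves the plan unchanged, so no extra `LocalLimit` is stacked between the `Project` and the `Join`.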






[GitHub] [spark] AmplabJenkins commented on pull request #31739: [SPARK-34622][SQL] Push down limit through Project with Join

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #31739:
URL: https://github.com/apache/spark/pull/31739#issuecomment-797639947








[GitHub] [spark] wangyum commented on a change in pull request #31739: [SPARK-34622][SQL] Push down limit through Project

Posted by GitBox <gi...@apache.org>.
wangyum commented on a change in pull request #31739:
URL: https://github.com/apache/spark/pull/31739#discussion_r587944814



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala
##########
@@ -597,31 +623,11 @@ object LimitPushDown extends Rule[LogicalPlan] {
     // pushdown Limit.
     case LocalLimit(exp, u: Union) =>
       LocalLimit(exp, u.copy(children = u.children.map(maybePushLocalLimit(exp, _))))
-    // Add extra limits below JOIN:
-    // 1. For LEFT OUTER and RIGHT OUTER JOIN, we push limits to the left and right sides,
-    //    respectively.
-    // 2. For INNER and CROSS JOIN, we push limits to both the left and right sides if join
-    //    condition is empty.
-    // 3. For LEFT SEMI and LEFT ANTI JOIN, we push limits to the left side if join condition
-    //    is empty.
-    // It's not safe to push limits below FULL OUTER JOIN in the general case without a more
-    // invasive rewrite. We also need to ensure that this limit pushdown rule will not eventually
-    // introduce limits on both sides if it is applied multiple times. Therefore:
-    //   - If one side is already limited, stack another limit on top if the new limit is smaller.
-    //     The redundant limit will be collapsed by the CombineLimits rule.
-    case LocalLimit(exp, join @ Join(left, right, joinType, conditionOpt, _)) =>
-      val newJoin = joinType match {
-        case RightOuter => join.copy(right = maybePushLocalLimit(exp, right))
-        case LeftOuter => join.copy(left = maybePushLocalLimit(exp, left))
-        case _: InnerLike if conditionOpt.isEmpty =>
-          join.copy(
-            left = maybePushLocalLimit(exp, left),
-            right = maybePushLocalLimit(exp, right))
-        case LeftSemi | LeftAnti if conditionOpt.isEmpty =>
-          join.copy(left = maybePushLocalLimit(exp, left))
-        case _ => join
-      }
-      LocalLimit(exp, newJoin)
+
+    case LocalLimit(exp, join: Join) =>
+      LocalLimit(exp, pushLocalLimitThroughJoin(exp, join))
+    case LocalLimit(exp, project @ Project(_, join: Join)) =>

Review comment:
       I changed it to `case LocalLimit(exp, project @ Project(_, child)) if !child.isInstanceOf[LeafNode] =>` to avoid some useless work when pushing the limit directly through `Project`:
   ```
   === Applying Rule org.apache.spark.sql.catalyst.optimizer.LimitPushDown ===
    GlobalLimit 5                                                GlobalLimit 5
    +- LocalLimit 5                                              +- LocalLimit 5
       +- Project [a#0]                                             +- Project [a#0]
   !      +- Join LeftOuter, ((a#0 = x#3) AND (b#1 = y#4))             +- LocalLimit 5
   !         :- LocalLimit 5                                              +- Join LeftOuter, ((a#0 = x#3) AND (b#1 = y#4))
   !         :  +- Project [a#0, b#1]                                        :- LocalLimit 5
   !         :     +- Relation default.t1[a#0,b#1,c#2] parquet               :  +- Project [a#0, b#1]
   !         +- Project [x#3, y#4]                                           :     +- LocalLimit 5
   !            +- Filter (isnotnull(x#3) AND isnotnull(y#4))                :        +- Relation default.t1[a#0,b#1,c#2] parquet
   !               +- Relation default.t2[x#3,y#4,z#5] parquet               +- Project [x#3, y#4]
   !                                                                            +- Filter (isnotnull(x#3) AND isnotnull(y#4))
   !                                                                               +- Relation default.t2[x#3,y#4,z#5] parquet
   
   
   === Applying Rule org.apache.spark.sql.catalyst.optimizer.CollapseProject ===
    GlobalLimit 5                                                            GlobalLimit 5
    +- LocalLimit 5                                                          +- LocalLimit 5
   !   +- Project [a#0]                                                         +- LocalLimit 5
   !      +- LocalLimit 5                                                          +- Project [a#0]
   !         +- Project [a#0]                                                         +- Join LeftOuter, ((a#0 = x#3) AND (b#1 = y#4))
   !            +- Join LeftOuter, ((a#0 = x#3) AND (b#1 = y#4))                         :- LocalLimit 5
   !               :- LocalLimit 5                                                       :  +- LocalLimit 5
   !               :  +- Project [a#0, b#1]                                              :     +- Project [a#0, b#1]
   !               :     +- LocalLimit 5                                                 :        +- Relation default.t1[a#0,b#1,c#2] parquet
   !               :        +- Project [a#0, b#1]                                        +- Project [x#3, y#4]
   !               :           +- Relation default.t1[a#0,b#1,c#2] parquet                  +- Filter (isnotnull(x#3) AND isnotnull(y#4))
   !               +- Project [x#3, y#4]                                                       +- Relation default.t2[x#3,y#4,z#5] parquet
   !                  +- Filter (isnotnull(x#3) AND isnotnull(y#4))          
   !                     +- Relation default.t2[x#3,y#4,z#5] parquet         
              
   === Applying Rule org.apache.spark.sql.catalyst.optimizer.EliminateLimits ===
    GlobalLimit 5                                                      GlobalLimit 5
   !+- LocalLimit 5                                                    +- LocalLimit least(5, 5)
   !   +- LocalLimit 5                                                    +- Project [a#0]
   !      +- Project [a#0]                                                   +- Join LeftOuter, ((a#0 = x#3) AND (b#1 = y#4))
   !         +- Join LeftOuter, ((a#0 = x#3) AND (b#1 = y#4))                   :- LocalLimit least(5, 5)
   !            :- LocalLimit 5                                                 :  +- Project [a#0, b#1]
   !            :  +- LocalLimit 5                                              :     +- Relation default.t1[a#0,b#1,c#2] parquet
   !            :     +- Project [a#0, b#1]                                     +- Project [x#3, y#4]
   !            :        +- Relation default.t1[a#0,b#1,c#2] parquet               +- Filter (isnotnull(x#3) AND isnotnull(y#4))
   !            +- Project [x#3, y#4]                                                 +- Relation default.t2[x#3,y#4,z#5] parquet
   !               +- Filter (isnotnull(x#3) AND isnotnull(y#4))       
   !                  +- Relation default.t2[x#3,y#4,z#5] parquet      
    
   ```
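
   As a rough illustration of the intended end state, here is a toy model of the rule. The `Plan` classes and `pushDown` below are simplified stand-ins written for this comment, not Catalyst's real classes, and `maybePushLocalLimit` is reduced to a plain "already limited?" check. The join-type rules follow the code comment removed in the diff above.

   ```scala
   // Toy stand-ins for Catalyst plan nodes; NOT Spark's real classes.
   object LimitPushDownSketch {
     sealed trait JoinType
     case object Inner extends JoinType
     case object LeftOuter extends JoinType
     case object RightOuter extends JoinType
     case object FullOuter extends JoinType

     sealed trait Plan
     case class Relation(name: String) extends Plan
     case class Project(cols: Seq[String], child: Plan) extends Plan
     case class LocalLimit(n: Int, child: Plan) extends Plan
     case class Join(left: Plan, right: Plan, joinType: JoinType,
                     condition: Option[String]) extends Plan

     // Simplified assumption: skip the new limit if the child is already
     // limited at least as tightly.
     def maybePushLocalLimit(n: Int, plan: Plan): Plan = plan match {
       case LocalLimit(m, _) if m <= n => plan
       case _ => LocalLimit(n, plan)
     }

     // Outer joins get a limit on their preserved side; an inner join only
     // when it has no condition; everything else is left untouched.
     def pushLocalLimitThroughJoin(n: Int, join: Join): Join = join.joinType match {
       case LeftOuter => join.copy(left = maybePushLocalLimit(n, join.left))
       case RightOuter => join.copy(right = maybePushLocalLimit(n, join.right))
       case Inner if join.condition.isEmpty =>
         join.copy(
           left = maybePushLocalLimit(n, join.left),
           right = maybePushLocalLimit(n, join.right))
       case _ => join
     }

     // One application of the rule at the root of the plan only.
     def pushDown(plan: Plan): Plan = plan match {
       case LocalLimit(n, j: Join) =>
         LocalLimit(n, pushLocalLimitThroughJoin(n, j))
       case LocalLimit(n, p @ Project(_, j: Join)) =>
         LocalLimit(n, p.copy(child = pushLocalLimitThroughJoin(n, j)))
       case other => other
     }
   }
   ```

   For a `LocalLimit(5, Project(Seq("a"), Join(t1, t2, LeftOuter, ...)))` input, this sketch puts the extra limit on the `t1` side only, and applying `pushDown` a second time leaves the plan unchanged.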






[GitHub] [spark] SparkQA commented on pull request #31739: [SPARK-34622][SQL] Push down limit through Project

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #31739:
URL: https://github.com/apache/spark/pull/31739#issuecomment-791075337


   **[Test build #135784 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/135784/testReport)** for PR 31739 at commit [`c84cd14`](https://github.com/apache/spark/commit/c84cd1445bbb23b3d75be094ccdb50761a605c6a).




[GitHub] [spark] SparkQA removed a comment on pull request #31739: [SPARK-34622][SQL] Push down limit through Project with Join

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #31739:
URL: https://github.com/apache/spark/pull/31739#issuecomment-803373191


   **[Test build #136287 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/136287/testReport)** for PR 31739 at commit [`60520e9`](https://github.com/apache/spark/commit/60520e958043f8263daf59ac42117ca1fae243f8).




[GitHub] [spark] AmplabJenkins removed a comment on pull request #31739: [SPARK-34622][SQL] Push down limit through Project with Join

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #31739:
URL: https://github.com/apache/spark/pull/31739#issuecomment-797612905








[GitHub] [spark] maropu commented on a change in pull request #31739: [SPARK-34622][SQL] Fix push down limit through join

Posted by GitBox <gi...@apache.org>.
maropu commented on a change in pull request #31739:
URL: https://github.com/apache/spark/pull/31739#discussion_r587914242



##########
File path: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/LimitPushdownSuite.scala
##########
@@ -230,4 +230,13 @@ class LimitPushdownSuite extends PlanTest {
       comparePlans(optimized, correctAnswer)
     }
   }
+
+  test("SPARK-34622 Fix Push down limit through join if join output is not match the LocalLimit") {

Review comment:
       nit: `SPARK-34622: `






[GitHub] [spark] AmplabJenkins commented on pull request #31739: [SPARK-34622][SQL] Push down limit through Project with Join

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #31739:
URL: https://github.com/apache/spark/pull/31739#issuecomment-797611779


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/40597/
   




[GitHub] [spark] c21 commented on a change in pull request #31739: [SPARK-34622][SQL] Push down limit through Project

Posted by GitBox <gi...@apache.org>.
c21 commented on a change in pull request #31739:
URL: https://github.com/apache/spark/pull/31739#discussion_r588779803



##########
File path: sql/core/src/test/resources/tpcds-plan-stability/approved-plans-v1_4/q32.sf100/explain.txt
##########
@@ -1,35 +1,37 @@
 == Physical Plan ==
-CollectLimit (31)
-+- * Project (30)
-   +- * BroadcastHashJoin Inner BuildRight (29)
-      :- * Project (27)
-      :  +- * BroadcastHashJoin Inner BuildLeft (26)
-      :     :- BroadcastExchange (22)
-      :     :  +- * Project (21)
-      :     :     +- * BroadcastHashJoin Inner BuildLeft (20)
-      :     :        :- BroadcastExchange (5)
-      :     :        :  +- * Project (4)
-      :     :        :     +- * Filter (3)
-      :     :        :        +- * ColumnarToRow (2)
-      :     :        :           +- Scan parquet default.item (1)
-      :     :        +- * Filter (19)
-      :     :           +- * HashAggregate (18)
-      :     :              +- Exchange (17)
-      :     :                 +- * HashAggregate (16)
-      :     :                    +- * Project (15)
-      :     :                       +- * BroadcastHashJoin Inner BuildRight (14)
-      :     :                          :- * Filter (8)
-      :     :                          :  +- * ColumnarToRow (7)
-      :     :                          :     +- Scan parquet default.catalog_sales (6)
-      :     :                          +- BroadcastExchange (13)
-      :     :                             +- * Project (12)
-      :     :                                +- * Filter (11)
-      :     :                                   +- * ColumnarToRow (10)
-      :     :                                      +- Scan parquet default.date_dim (9)
-      :     +- * Filter (25)
-      :        +- * ColumnarToRow (24)
-      :           +- Scan parquet default.catalog_sales (23)
-      +- ReusedExchange (28)
+CollectLimit (33)
++- * Project (32)
+   +- * LocalLimit (31)
+      +- * Project (30)
+         +- * BroadcastHashJoin Inner BuildRight (29)

Review comment:
       I am wondering why `LocalLimit` is not pushed through the second project (`Project (30)`)?






[GitHub] [spark] AmplabJenkins removed a comment on pull request #31739: [SPARK-34622][SQL] Push down limit through Project

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #31739:
URL: https://github.com/apache/spark/pull/31739#issuecomment-791155542


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/135784/
   




[GitHub] [spark] maropu commented on pull request #31739: [SPARK-34622][SQL] Push down limit through Project with Join

Posted by GitBox <gi...@apache.org>.
maropu commented on pull request #31739:
URL: https://github.com/apache/spark/pull/31739#issuecomment-809842685


   Thanks, all! Merged to master.




[GitHub] [spark] SparkQA commented on pull request #31739: [SPARK-34622][SQL] Push down limit through Project with Join

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #31739:
URL: https://github.com/apache/spark/pull/31739#issuecomment-803373191


   **[Test build #136287 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/136287/testReport)** for PR 31739 at commit [`60520e9`](https://github.com/apache/spark/commit/60520e958043f8263daf59ac42117ca1fae243f8).




[GitHub] [spark] AmplabJenkins removed a comment on pull request #31739: [SPARK-34622][SQL] Push down limit through Project with Join

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #31739:
URL: https://github.com/apache/spark/pull/31739#issuecomment-791978098


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/135827/
   




[GitHub] [spark] cloud-fan commented on pull request #31739: [SPARK-34622][SQL] Push down limit through Project with Join

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on pull request #31739:
URL: https://github.com/apache/spark/pull/31739#issuecomment-805053581


   Last question: do we push down limit through project?




[GitHub] [spark] wangyum commented on a change in pull request #31739: [SPARK-34622][SQL] Push down limit through Project

Posted by GitBox <gi...@apache.org>.
wangyum commented on a change in pull request #31739:
URL: https://github.com/apache/spark/pull/31739#discussion_r588183649



##########
File path: sql/core/src/test/resources/tpcds-plan-stability/approved-plans-v1_4/q32.sf100/explain.txt
##########
@@ -1,35 +1,37 @@
 == Physical Plan ==
-CollectLimit (31)
-+- * Project (30)
-   +- * BroadcastHashJoin Inner BuildRight (29)
-      :- * Project (27)
-      :  +- * BroadcastHashJoin Inner BuildLeft (26)
-      :     :- BroadcastExchange (22)
-      :     :  +- * Project (21)
-      :     :     +- * BroadcastHashJoin Inner BuildLeft (20)
-      :     :        :- BroadcastExchange (5)
-      :     :        :  +- * Project (4)
-      :     :        :     +- * Filter (3)
-      :     :        :        +- * ColumnarToRow (2)
-      :     :        :           +- Scan parquet default.item (1)
-      :     :        +- * Filter (19)
-      :     :           +- * HashAggregate (18)
-      :     :              +- Exchange (17)
-      :     :                 +- * HashAggregate (16)
-      :     :                    +- * Project (15)
-      :     :                       +- * BroadcastHashJoin Inner BuildRight (14)
-      :     :                          :- * Filter (8)
-      :     :                          :  +- * ColumnarToRow (7)
-      :     :                          :     +- Scan parquet default.catalog_sales (6)
-      :     :                          +- BroadcastExchange (13)
-      :     :                             +- * Project (12)
-      :     :                                +- * Filter (11)
-      :     :                                   +- * ColumnarToRow (10)
-      :     :                                      +- Scan parquet default.date_dim (9)
-      :     +- * Filter (25)
-      :        +- * ColumnarToRow (24)
-      :           +- Scan parquet default.catalog_sales (23)
-      +- ReusedExchange (28)
+CollectLimit (33)
++- * Project (32)
+   +- * LocalLimit (31)
+      +- * Project (30)
+         +- * BroadcastHashJoin Inner BuildRight (29)

Review comment:
       @c21 @maropu Pushing `LocalLimit` down through `Project` does not benefit this case.






[GitHub] [spark] AmplabJenkins commented on pull request #31739: [SPARK-34622][SQL] Push down limit through Project with Join

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #31739:
URL: https://github.com/apache/spark/pull/31739#issuecomment-809479816


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/136658/
   




[GitHub] [spark] AmplabJenkins commented on pull request #31739: [SPARK-34622][SQL] Push down limit through Project with Join

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #31739:
URL: https://github.com/apache/spark/pull/31739#issuecomment-803455382


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/136287/
   




[GitHub] [spark] SparkQA commented on pull request #31739: [SPARK-34622][SQL] Push down limit through Project with Join

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #31739:
URL: https://github.com/apache/spark/pull/31739#issuecomment-791913491


   **[Test build #135827 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/135827/testReport)** for PR 31739 at commit [`dc2deba`](https://github.com/apache/spark/commit/dc2debaab79fe720774d428039d5f34ed0f0d384).




[GitHub] [spark] SparkQA commented on pull request #31739: [SPARK-34622][SQL] Push down limit through Project

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #31739:
URL: https://github.com/apache/spark/pull/31739#issuecomment-791388621


   **[Test build #135809 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/135809/testReport)** for PR 31739 at commit [`2f5865d`](https://github.com/apache/spark/commit/2f5865d43a3eeacf76760cfd103269a33134b97a).
    * This patch **fails Spark unit tests**.
    * This patch merges cleanly.
    * This patch adds no public classes.




[GitHub] [spark] AmplabJenkins removed a comment on pull request #31739: [SPARK-34622][SQL] Push down limit through Project with Join

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #31739:
URL: https://github.com/apache/spark/pull/31739#issuecomment-791934651


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/40409/
   




[GitHub] [spark] AmplabJenkins commented on pull request #31739: [SPARK-34622][SQL] Push down limit through Project

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #31739:
URL: https://github.com/apache/spark/pull/31739#issuecomment-791403394


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/135809/
   




[GitHub] [spark] AmplabJenkins commented on pull request #31739: [SPARK-34622][SQL] Fix push down limit through join

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #31739:
URL: https://github.com/apache/spark/pull/31739#issuecomment-790779417


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/40348/
   




[GitHub] [spark] AmplabJenkins removed a comment on pull request #31739: [SPARK-34622][SQL] Fix push down limit through join

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #31739:
URL: https://github.com/apache/spark/pull/31739#issuecomment-790779417


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/40348/
   




[GitHub] [spark] wangyum commented on a change in pull request #31739: [SPARK-34622][SQL] Push down limit through Project with Join

Posted by GitBox <gi...@apache.org>.
wangyum commented on a change in pull request #31739:
URL: https://github.com/apache/spark/pull/31739#discussion_r599153181



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala
##########
@@ -597,31 +623,11 @@ object LimitPushDown extends Rule[LogicalPlan] {
     // pushdown Limit.
     case LocalLimit(exp, u: Union) =>
       LocalLimit(exp, u.copy(children = u.children.map(maybePushLocalLimit(exp, _))))
-    // Add extra limits below JOIN:
-    // 1. For LEFT OUTER and RIGHT OUTER JOIN, we push limits to the left and right sides,
-    //    respectively.
-    // 2. For INNER and CROSS JOIN, we push limits to both the left and right sides if join
-    //    condition is empty.
-    // 3. For LEFT SEMI and LEFT ANTI JOIN, we push limits to the left side if join condition
-    //    is empty.
-    // It's not safe to push limits below FULL OUTER JOIN in the general case without a more
-    // invasive rewrite. We also need to ensure that this limit pushdown rule will not eventually
-    // introduce limits on both sides if it is applied multiple times. Therefore:
-    //   - If one side is already limited, stack another limit on top if the new limit is smaller.
-    //     The redundant limit will be collapsed by the CombineLimits rule.
-    case LocalLimit(exp, join @ Join(left, right, joinType, conditionOpt, _)) =>
-      val newJoin = joinType match {
-        case RightOuter => join.copy(right = maybePushLocalLimit(exp, right))
-        case LeftOuter => join.copy(left = maybePushLocalLimit(exp, left))
-        case _: InnerLike if conditionOpt.isEmpty =>
-          join.copy(
-            left = maybePushLocalLimit(exp, left),
-            right = maybePushLocalLimit(exp, right))
-        case LeftSemi | LeftAnti if conditionOpt.isEmpty =>
-          join.copy(left = maybePushLocalLimit(exp, left))
-        case _ => join
-      }
-      LocalLimit(exp, newJoin)
+
+    case LocalLimit(exp, join: Join) =>
+      LocalLimit(exp, pushLocalLimitThroughJoin(exp, join))
+    case LocalLimit(exp, project @ Project(_, join: Join)) =>

Review comment:
       It will introduce a useless pushdown even if we only allow `Join` as the child of `Project`:
   ```scala
       case LocalLimit(exp, p: Project) if p.child.isInstanceOf[Join] =>
         LocalLimit(exp, p.copy(child = maybePushLocalLimit(exp, p.child)))
   ```
   For example:
   ```scala
   spark.range(200L).selectExpr("id AS a", "id AS b").write.saveAsTable("t1")
   spark.range(300L).selectExpr("id AS x", "id AS y").write.saveAsTable("t2")
   spark.sql("SELECT 1 FROM t1 INNER JOIN t2 ON a = x limit 10").explain(true)
   ```
   
   ```
   == Optimized Logical Plan ==
   GlobalLimit 10
   +- LocalLimit 10
      +- Project [1 AS 1#20]
         +- LocalLimit 10
            +- Project
               +- Join Inner, (a#16L = x#18L)
                  :- Project [a#16L]
                  :  +- Filter isnotnull(a#16L)
                  :     +- Relation default.t1[a#16L,b#17L] parquet
                  +- Project [x#18L]
                     +- Filter isnotnull(x#18L)
                        +- Relation default.t2[x#18L,y#19L] parquet
   
   == Physical Plan ==
   AdaptiveSparkPlan isFinalPlan=false
   +- CollectLimit 10
      +- Project [1 AS 1#20]
         +- LocalLimit 10
            +- Project
               +- BroadcastHashJoin [a#16L], [x#18L], Inner, BuildLeft, false
                  :- BroadcastExchange HashedRelationBroadcastMode(List(input[0, bigint, false]),false), [id=#68]
                  :  +- Filter isnotnull(a#16L)
                  :     +- FileScan parquet default.t1[a#16L] 
                  +- Filter isnotnull(x#18L)
                     +- FileScan parquet default.t2[x#18L]
   
   ```
   
   Another example is TPC-DS q32:
   https://github.com/apache/spark/blob/66f5a42ca5d259038f0749ae2b9a04cc2f658880/sql/core/src/test/resources/tpcds/q32.sql#L1-L15
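
   The extra `LocalLimit 10` directly above the join can be reproduced on a toy plan model. This is only a sketch, not Catalyst code: `Plan`, `pushThroughJoin`, `naiveRule` and `joinAwareRule` are hypothetical names, and `pushThroughJoin` assumes the same per-join-type cases as the code removed in the diff above. The `maybePushLocalLimit` check for already-limited children is left out.

   ```scala
   object LimitThroughProjectSketch {
     sealed trait JoinType
     case object Inner extends JoinType
     case object LeftOuter extends JoinType
     case object RightOuter extends JoinType

     // Minimal logical plan: just enough structure to show where limits end up.
     sealed trait Plan
     case class Relation(name: String) extends Plan
     case class Project(child: Plan) extends Plan
     case class LocalLimit(n: Int, child: Plan) extends Plan
     case class Join(left: Plan, right: Plan, joinType: JoinType, hasCondition: Boolean)
       extends Plan

     // Assumed per-join-type logic: limit only the side(s) where it is safe.
     def pushThroughJoin(n: Int, join: Join): Join = join.joinType match {
       case LeftOuter => join.copy(left = LocalLimit(n, join.left))
       case RightOuter => join.copy(right = LocalLimit(n, join.right))
       case Inner if !join.hasCondition =>
         join.copy(left = LocalLimit(n, join.left), right = LocalLimit(n, join.right))
       case _ => join
     }

     // Naive rule: always put another limit between the Project and the Join.
     def naiveRule(plan: Plan): Plan = plan match {
       case LocalLimit(n, p @ Project(_: Join)) =>
         LocalLimit(n, p.copy(child = LocalLimit(n, p.child)))
       case other => other
     }

     // Join-aware rule: only the join's inputs may receive a limit.
     def joinAwareRule(plan: Plan): Plan = plan match {
       case LocalLimit(n, Project(j: Join)) => LocalLimit(n, Project(pushThroughJoin(n, j)))
       case other => other
     }

     def main(args: Array[String]): Unit = {
       val innerWithCondition = Join(Relation("t1"), Relation("t2"), Inner, hasCondition = true)
       val plan = LocalLimit(10, Project(innerWithCondition))
       println(naiveRule(plan))     // gains a second LocalLimit above the Join
       println(joinAwareRule(plan)) // left unchanged for an inner join with a condition
     }
   }
   ```

   In this model the join-aware variant leaves an inner join with a condition untouched, while a left outer join still gets a limit on its left input.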
   






[GitHub] [spark] SparkQA removed a comment on pull request #31739: [SPARK-34622][SQL] Fix push down limit through join

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #31739:
URL: https://github.com/apache/spark/pull/31739#issuecomment-790768088


   **[Test build #135765 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/135765/testReport)** for PR 31739 at commit [`b1eee39`](https://github.com/apache/spark/commit/b1eee39d0b9f6b11ea428d5ad93ff6b47983416e).




[GitHub] [spark] wangyum edited a comment on pull request #31739: [SPARK-34622][SQL] Push down limit through Project with Join

Posted by GitBox <gi...@apache.org>.
wangyum edited a comment on pull request #31739:
URL: https://github.com/apache/spark/pull/31739#issuecomment-805359628


   > Last question: do we push down limit through project?
   
   It seems that pushing the limit down through `Project` brings no benefit:
   ```scala
   spark.range(2000L).selectExpr("id AS a", "id AS b").write.saveAsTable("t1")
   spark.sql("select a, java_method('java.lang.Thread', 'sleep', 3000L) from t1 limit 5").show()
   spark.sql("select a, java_method('java.lang.Thread', 'sleep', 3000L) from (select * from t1 limit 5) t limit 5").show()
   ```
   
   ```
   == Optimized Logical Plan ==
   GlobalLimit 5
   +- LocalLimit 5
      +- Project [a#16L, java_method(java.lang.Thread, sleep, 3000) AS java_method(java.lang.Thread, sleep, 3000)#18]
         +- Relation default.t1[a#16L,b#17L] parquet
   ```
   
   ```
   == Optimized Logical Plan ==
   Project [a#16L, java_method(java.lang.Thread, sleep, 3000) AS java_method(java.lang.Thread, sleep, 3000)#21]
   +- GlobalLimit 5
      +- LocalLimit 5
         +- Project [a#16L]
            +- Relation default.t1[a#16L,b#17L] parquet
   ```
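
   One possible reading of the two plans above, offered as an assumption rather than something measured in this thread: if rows are pulled through `Project` one at a time, a limit sitting above the projection already stops evaluation after the requested number of rows. The sketch below uses plain Scala iterators as a stand-in for that execution model; `countCalls` and `expensive` are hypothetical names.

   ```scala
   object LazyProjectSketch {
     // Runs a pipeline and returns how often the projected expression was evaluated.
     def countCalls(run: (Int => Int) => Iterator[Int]): Int = {
       var calls = 0
       val expensive: Int => Int = x => { calls += 1; x }
       run(expensive).foreach(_ => ()) // drain the pipeline
       calls
     }

     def main(args: Array[String]): Unit = {
       // Limit above the projection: Iterator.map is lazy, so only the taken rows are projected.
       val limitAbove = countCalls(f => Iterator.range(0, 2000).map(f).take(5))
       // Limit below the projection.
       val limitBelow = countCalls(f => Iterator.range(0, 2000).take(5).map(f))
       println(s"limit above project: $limitAbove calls, limit below project: $limitBelow calls")
     }
   }
   ```

   Under this assumption both orderings evaluate the expensive expression the same number of times, which would be consistent with the observation that the extra pushdown does not help.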




[GitHub] [spark] AmplabJenkins commented on pull request #31739: [SPARK-34622][SQL] Push down limit through Project with Join

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #31739:
URL: https://github.com/apache/spark/pull/31739#issuecomment-791978098


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/135827/
   




[GitHub] [spark] c21 commented on a change in pull request #31739: [SPARK-34622][SQL] Fix push down limit through join

Posted by GitBox <gi...@apache.org>.
c21 commented on a change in pull request #31739:
URL: https://github.com/apache/spark/pull/31739#discussion_r587844932



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala
##########
@@ -597,31 +623,11 @@ object LimitPushDown extends Rule[LogicalPlan] {
     // pushdown Limit.
     case LocalLimit(exp, u: Union) =>
       LocalLimit(exp, u.copy(children = u.children.map(maybePushLocalLimit(exp, _))))
-    // Add extra limits below JOIN:
-    // 1. For LEFT OUTER and RIGHT OUTER JOIN, we push limits to the left and right sides,
-    //    respectively.
-    // 2. For INNER and CROSS JOIN, we push limits to both the left and right sides if join
-    //    condition is empty.
-    // 3. For LEFT SEMI and LEFT ANTI JOIN, we push limits to the left side if join condition
-    //    is empty.
-    // It's not safe to push limits below FULL OUTER JOIN in the general case without a more
-    // invasive rewrite. We also need to ensure that this limit pushdown rule will not eventually
-    // introduce limits on both sides if it is applied multiple times. Therefore:
-    //   - If one side is already limited, stack another limit on top if the new limit is smaller.
-    //     The redundant limit will be collapsed by the CombineLimits rule.
-    case LocalLimit(exp, join @ Join(left, right, joinType, conditionOpt, _)) =>
-      val newJoin = joinType match {
-        case RightOuter => join.copy(right = maybePushLocalLimit(exp, right))
-        case LeftOuter => join.copy(left = maybePushLocalLimit(exp, left))
-        case _: InnerLike if conditionOpt.isEmpty =>
-          join.copy(
-            left = maybePushLocalLimit(exp, left),
-            right = maybePushLocalLimit(exp, right))
-        case LeftSemi | LeftAnti if conditionOpt.isEmpty =>
-          join.copy(left = maybePushLocalLimit(exp, left))
-        case _ => join
-      }
-      LocalLimit(exp, newJoin)
+
+    case LocalLimit(exp, join: Join) =>
+      LocalLimit(exp, pushLocalLimitThroughJoin(exp, join))
+    case LocalLimit(exp, project @ Project(_, join: Join)) =>

Review comment:
       Thanks @wangyum for adding this. I think there might be a list of operators safe to push through besides `Project` - e.g. `Sort`, `RepartitionByExpression`, `ScriptTransformation`, etc.
   
   Shall we add the pushdown through `Project` separately? It should not be restricted to `Project(Join)`, right? It can be `Project(OtherOperator)` as well.
   
   ```
   case LocalLimit(exp, p: Project) =>
     LocalLimit(exp, p.copy(child = maybePushLocalLimit(exp, p.child)))
   ```
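
   A general rule like this is applied repeatedly by the optimizer, so it relies on the helper not stacking the same limit again on each pass. The sketch below is a toy approximation of the behaviour described in the code comment in the diff ("If one side is already limited, stack another limit on top if the new limit is smaller"); `Plan`, `maxRows` and `pushThroughProject` are hypothetical, and the real `maybePushLocalLimit` may differ in detail.

   ```scala
   object MaybePushLimitSketch {
     // maxRows is an upper bound on the number of rows a plan can produce, if known.
     sealed trait Plan { def maxRows: Option[Long] }
     case class Relation(name: String) extends Plan {
       def maxRows: Option[Long] = None
     }
     case class Project(child: Plan) extends Plan {
       def maxRows: Option[Long] = child.maxRows
     }
     case class LocalLimit(n: Long, child: Plan) extends Plan {
       def maxRows: Option[Long] = Some(child.maxRows.fold(n)(bound => math.min(n, bound)))
     }

     // Add a limit only when the child is not already bounded by n rows or fewer.
     def maybePushLocalLimit(n: Long, plan: Plan): Plan = plan.maxRows match {
       case Some(bound) if bound <= n => plan
       case _ => LocalLimit(n, plan)
     }

     // Toy version of the general rule: push the limit below any Project.
     def pushThroughProject(plan: Plan): Plan = plan match {
       case LocalLimit(n, p: Project) =>
         LocalLimit(n, p.copy(child = maybePushLocalLimit(n, p.child)))
       case other => other
     }

     def main(args: Array[String]): Unit = {
       val once = pushThroughProject(LocalLimit(5, Project(Relation("t1"))))
       val twice = pushThroughProject(once)
       println(once)
       println(twice == once) // a second pass adds nothing
     }
   }
   ```

   When the new limit is smaller than an existing one, this model stacks the smaller limit on top and leaves the redundant pair for a rule such as `CombineLimits` to collapse.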






[GitHub] [spark] wangyum commented on pull request #31739: [SPARK-34622][SQL] Push down limit through Project with Join

Posted by GitBox <gi...@apache.org>.
wangyum commented on pull request #31739:
URL: https://github.com/apache/spark/pull/31739#issuecomment-797628725


   cc @cloud-fan @dongjoon-hyun




[GitHub] [spark] AmplabJenkins commented on pull request #31739: [SPARK-34622][SQL] Push down limit through Project with Join

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #31739:
URL: https://github.com/apache/spark/pull/31739#issuecomment-809654395


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/41240/
   




[GitHub] [spark] SparkQA commented on pull request #31739: [SPARK-34622][SQL] Push down limit through Project with Join

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #31739:
URL: https://github.com/apache/spark/pull/31739#issuecomment-791970655


   **[Test build #135827 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/135827/testReport)** for PR 31739 at commit [`dc2deba`](https://github.com/apache/spark/commit/dc2debaab79fe720774d428039d5f34ed0f0d384).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.




[GitHub] [spark] SparkQA commented on pull request #31739: [SPARK-34622][SQL] Push down limit through Project with Join

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #31739:
URL: https://github.com/apache/spark/pull/31739#issuecomment-803452192


   **[Test build #136287 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/136287/testReport)** for PR 31739 at commit [`60520e9`](https://github.com/apache/spark/commit/60520e958043f8263daf59ac42117ca1fae243f8).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.

