Posted to reviews@spark.apache.org by KaiXinXiaoLei <gi...@git.apache.org> on 2018/02/25 07:21:03 UTC

[GitHub] spark pull request #20670: add constranits

GitHub user KaiXinXiaoLei opened a pull request:

    https://github.com/apache/spark/pull/20670

    add constranits

    ## What changes were proposed in this pull request?
    
    (Please fill in changes proposed in this fix)
    I ran this SQL: `select ls.cs_order_number from ls left semi join catalog_sales cs on ls.cs_order_number = cs.cs_order_number`. The `ls` table is small and contains one row. The `catalog_sales` table is large and contains 10 billion rows. The task hangs. I found that `cs_order_number` has many null values in the `catalog_sales` table. I think these null values should be filtered out in the logical plan.
    
    ## How was this patch tested?
    
    (Please explain how this patch was tested. E.g. unit tests, integration tests, manual tests)
    (If this patch involves UI changes, please attach a screenshot; otherwise, remove this)
    
    
    Please review http://spark.apache.org/contributing.html before opening a pull request.
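
The point in the description about null join keys can be illustrated outside Spark. The following is a minimal sketch in plain Scala, not Spark code: `NullKeySemiJoin`, `sqlEquals`, and `leftSemi` are hypothetical names, and `None` stands in for SQL NULL. It only shows why dropping null keys on either side before an equi-join cannot change the result of a left semi join.

```scala
// Plain-Scala model of a left semi equi-join; not Spark code.
// A join key is Option[Long], with None standing in for SQL NULL.
object NullKeySemiJoin {
  type Key = Option[Long]

  // SQL semantics: `l = r` is true only when both sides are non-null and equal.
  def sqlEquals(l: Key, r: Key): Boolean = (l, r) match {
    case (Some(a), Some(b)) => a == b
    case _                  => false // a comparison involving NULL is never true
  }

  // Left semi join: keep each left row that has at least one match on the right.
  def leftSemi(left: Seq[Key], right: Seq[Key]): Seq[Key] =
    left.filter(l => right.exists(r => sqlEquals(l, r)))

  def main(args: Array[String]): Unit = {
    val ls: Seq[Key] = Seq(Some(1L))
    val catalogSales: Seq[Key] = Seq(None, None, Some(1L), None, Some(2L))

    val unfiltered = leftSemi(ls, catalogSales)
    val filtered = leftSemi(ls.filter(_.isDefined), catalogSales.filter(_.isDefined))

    // Dropping null keys before the join does not change the result.
    assert(unfiltered == filtered)
    println(filtered)
  }
}
```

In this model the null rows on the right side are still scanned by `exists` unless they are filtered first, which is the cost the description asks the optimizer to avoid.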


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/KaiXinXiaoLei/spark Spark-23405

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/20670.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #20670
    
----
commit 705ed462bb307871e65199ce02576f12d60d2176
Author: KaiXinXiaoLei <58...@...>
Date:   2018-02-25T06:06:39Z

    add constranits

----


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20670: [SPARK-23405] Generate additional constraints for Join's...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20670
  
    **[Test build #87727 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87727/testReport)** for PR 20670 at commit [`f7d764e`](https://github.com/apache/spark/commit/f7d764efa435327ba34e829da53c16a6ec16f403).


---



[GitHub] spark issue #20670: [SPARK-23405] Generate additional constraints for Join's...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20670
  
    **[Test build #87727 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87727/testReport)** for PR 20670 at commit [`f7d764e`](https://github.com/apache/spark/commit/f7d764efa435327ba34e829da53c16a6ec16f403).
     * This patch **fails Scala style tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark issue #20670: [SPARK-23405] Generate additional constraints for Join's...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20670
  
    **[Test build #87766 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87766/testReport)** for PR 20670 at commit [`b3f2ade`](https://github.com/apache/spark/commit/b3f2ade5f1dc2ad3349f4dc21fe353590e8bbbfd).


---



[GitHub] spark issue #20670: add constranits

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20670
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/1035/
    Test PASSed.


---



[GitHub] spark issue #20670: [SPARK-23405] Generate additional constraints for Join's...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20670
  
    Merged build finished. Test PASSed.


---



[GitHub] spark issue #20670: [SPARK-23405] Generate additional constraints for Join's...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20670
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/1140/
    Test PASSed.


---



[GitHub] spark issue #20670: [SPARK-23405] Generate additional constraints for Join's...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20670
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87817/
    Test FAILed.


---



[GitHub] spark issue #20670: [SPARK-23405] Generate additional constraints for Join's...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20670
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/1144/
    Test PASSed.


---



[GitHub] spark issue #20670: [SPARK-23405] Generate additional constraints for Join's...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20670
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/1109/
    Test PASSed.


---



[GitHub] spark issue #20670: [SPARK-23405] Generate additional constraints for Join's...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20670
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87691/
    Test PASSed.


---



[GitHub] spark issue #20670: [SPARK-23405] Generate additional constraints for Join's...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20670
  
    Merged build finished. Test PASSed.


---



[GitHub] spark issue #20670: [SPARK-23405] Generate additional constraints for Join's...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20670
  
    Merged build finished. Test PASSed.


---



[GitHub] spark issue #20670: [SPARK-23405] Add constranits

Posted by jiangxb1987 <gi...@git.apache.org>.
Github user jiangxb1987 commented on the issue:

    https://github.com/apache/spark/pull/20670
  
    Also, a better title for this PR would be:
    ```
    Generate additional constraints for Join's children
    ```


---



[GitHub] spark issue #20670: add constranits

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20670
  
    **[Test build #87648 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87648/testReport)** for PR 20670 at commit [`705ed46`](https://github.com/apache/spark/commit/705ed462bb307871e65199ce02576f12d60d2176).


---



[GitHub] spark issue #20670: [SPARK-23405] Generate additional constraints for Join's...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/20670
  
    LGTM


---



[GitHub] spark pull request #20670: [SPARK-23405] Add constranits

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20670#discussion_r170529019
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/QueryPlanConstraints.scala ---
    @@ -29,12 +29,26 @@ trait QueryPlanConstraints { self: LogicalPlan =>
        */
       lazy val constraints: ExpressionSet = {
         if (conf.constraintPropagationEnabled) {
    +      var relevantOutPutSet: AttributeSet = outputSet
    +      constraints.foreach {
    +        case eq @ EqualTo(l: Attribute, r: Attribute) =>
    +          if (l.references.subsetOf(relevantOutPutSet)
    --- End diff --
    
    You can avoid computing each `subsetOf` twice here.
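
One way to read this suggestion, sketched in plain Scala rather than Catalyst types: compute each `subsetOf` result once, bind it to a value, and reuse it in both branches. The quoted diff is cut off, so the mirrored second branch below is an assumption, and `SubsetOnce`, `extend`, and the use of `Set[String]` in place of `AttributeSet` are hypothetical.

```scala
// Plain-Scala sketch; Set[String] stands in for Catalyst's AttributeSet.
object SubsetOnce {
  // Grow `known` with whichever side of an equality is not yet covered by it.
  def extend(known: Set[String], l: Set[String], r: Set[String]): Set[String] = {
    val lKnown = l.subsetOf(known) // computed once, reused in both branches
    val rKnown = r.subsetOf(known) // computed once, reused in both branches
    if (lKnown && !rKnown) known ++ r
    else if (rKnown && !lKnown) known ++ l
    else known
  }
}
```

For example, `SubsetOnce.extend(Set("a"), Set("a"), Set("b"))` adds `b` to the known set, while a call where both sides are already known returns the set unchanged.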


---



[GitHub] spark issue #20670: [SPARK-23405] Generate additional constraints for Join's...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20670
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/1190/
    Test PASSed.


---



[GitHub] spark issue #20670: [SPARK-23405] Add constranits

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20670
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87648/
    Test FAILed.


---



[GitHub] spark issue #20670: [SPARK-23405] Generate additional constraints for Join's...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/20670
  
    thanks, merging to master!


---



[GitHub] spark issue #20670: [SPARK-23405] Generate additional constraints for Join's...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20670
  
    **[Test build #87726 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87726/testReport)** for PR 20670 at commit [`1e0f78a`](https://github.com/apache/spark/commit/1e0f78a50bd70a3f94382887a74cc70f7fefe3c6).
     * This patch **fails Scala style tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark issue #20670: [SPARK-23405] Generate additional constraints for Join's...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20670
  
    **[Test build #87772 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87772/testReport)** for PR 20670 at commit [`ed5c170`](https://github.com/apache/spark/commit/ed5c170c35d8786df241921ac19d95520ace3836).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark pull request #20670: [SPARK-23405] Add constranits

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20670#discussion_r170529062
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/QueryPlanConstraints.scala ---
    @@ -29,12 +29,26 @@ trait QueryPlanConstraints { self: LogicalPlan =>
        */
       lazy val constraints: ExpressionSet = {
         if (conf.constraintPropagationEnabled) {
    +      var relevantOutPutSet: AttributeSet = outputSet
    +      constraints.foreach {
    +        case eq @ EqualTo(l: Attribute, r: Attribute) =>
    +          if (l.references.subsetOf(relevantOutPutSet)
    +            && !r.references.subsetOf(relevantOutPutSet)) {
    +            relevantOutPutSet = relevantOutPutSet.++(r.references)
    --- End diff --
    
    Use ` ++ ` syntax rather than writing it as a method invocation.


---



[GitHub] spark pull request #20670: [SPARK-23405] Generate additional constraints for...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20670#discussion_r171182870
  
    --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/InferFiltersFromConstraintsSuite.scala ---
    @@ -192,4 +192,17 @@ class InferFiltersFromConstraintsSuite extends PlanTest {
     
         comparePlans(Optimize.execute(original.analyze), correct.analyze)
       }
    +
    +  test("SPARK-23405:single left-semi join, filter out nulls on either side on equi-join keys") {
    +    val x = testRelation.subquery('x)
    +    val y = testRelation.subquery('y)
    +    val originalQuery = x.join(y, LeftSemi,
    +      condition = Some("x.a".attr === "y.a".attr)).analyze
    --- End diff --
    
    nit: we can create a `val condition = Some("x.a".attr === "y.a".attr)` to reduce duplicated code


---



[GitHub] spark issue #20670: [SPARK-23405] Generate additional constraints for Join's...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20670
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/1177/
    Test PASSed.


---



[GitHub] spark issue #20670: [SPARK-23405] Generate additional constraints for Join's...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20670
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/1168/
    Test PASSed.


---



[GitHub] spark issue #20670: [SPARK-23405] Generate additional constraints for Join's...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20670
  
    **[Test build #87691 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87691/testReport)** for PR 20670 at commit [`f44a92a`](https://github.com/apache/spark/commit/f44a92ad20895a94577cf2b4de54fc320b0f934b).


---



[GitHub] spark issue #20670: [SPARK-23405] Generate additional constraints for Join's...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20670
  
    Merged build finished. Test FAILed.


---



[GitHub] spark pull request #20670: [SPARK-23405] Generate additional constraints for...

Posted by jiangxb1987 <gi...@git.apache.org>.
Github user jiangxb1987 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20670#discussion_r171201415
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/QueryPlanConstraints.scala ---
    @@ -22,21 +22,30 @@ import org.apache.spark.sql.catalyst.expressions._
     
     trait QueryPlanConstraints { self: LogicalPlan =>
     
    +  /**
    +   * An [[ExpressionSet]] that contains an additional set of constraints about equality
    --- End diff --
    
    The comment is not accurate; we may have various kinds of constraints.


---



[GitHub] spark issue #20670: [SPARK-23405] Generate additional constraints for Join's...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20670
  
    **[Test build #87772 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87772/testReport)** for PR 20670 at commit [`ed5c170`](https://github.com/apache/spark/commit/ed5c170c35d8786df241921ac19d95520ace3836).


---



[GitHub] spark issue #20670: [SPARK-23405] Add constranits

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/20670
  
    Good catch! This is a real problem, but the fix looks hacky.
    
    By definition, I think `plan.constraints` should only include constraints that refer to `plan.output`, as that's the promise a plan can make to its parent. However, join is special: `Join.condition` can refer to both join sides, and we add the constraints to `Join.condition`, so in effect we are making a promise to the Join's children, not its parent. My proposal:
    ```
      lazy val constraints: ExpressionSet = {
        if (conf.constraintPropagationEnabled) {
          allConstraints.filter { c =>
            c.references.nonEmpty && c.references.subsetOf(outputSet) && c.deterministic
          }
        } else {
          ExpressionSet(Set.empty)
        }
      }
    
      lazy val allConstraints = ExpressionSet(validConstraints
              .union(inferAdditionalConstraints(validConstraints))
              .union(constructIsNotNullConstraints(validConstraints)))
    ```
    Then we can call `plan.allConstraints` when inferring constraints for join.
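
The split proposed above can be modelled without Catalyst. The following is a rough sketch in plain Scala, assuming a left semi join whose output contains only the left side's attribute: `ConstraintSplit`, `Constraint`, `Node`, and `semiJoin` are hypothetical names, constraints are reduced to a label plus the attribute names they reference, and the `deterministic` check from the proposal is left out for brevity.

```scala
// Plain-Scala model of the constraints / allConstraints split; not Spark code.
object ConstraintSplit {
  // A constraint is reduced to a label plus the attribute names it references.
  final case class Constraint(label: String, references: Set[String])

  // Hypothetical stand-in for a plan node: its output attributes and every
  // constraint known to hold, including ones about attributes it does not output.
  final case class Node(output: Set[String], allConstraints: Set[Constraint]) {
    // Only constraints expressed entirely over the output are promised to the parent.
    def constraints: Set[Constraint] =
      allConstraints.filter(c => c.references.nonEmpty && c.references.subsetOf(output))
  }

  // A left semi join on x.a = y.a outputs only the left side's attribute.
  val semiJoin: Node = Node(
    output = Set("x.a"),
    allConstraints = Set(
      Constraint("x.a = y.a", Set("x.a", "y.a")),
      Constraint("isnotnull(x.a)", Set("x.a")),
      Constraint("isnotnull(y.a)", Set("y.a"))
    )
  )
}
```

In this model `semiJoin.constraints` keeps only `isnotnull(x.a)`, while `semiJoin.allConstraints` still carries `isnotnull(y.a)`, which is the constraint a rule inferring filters for the join's right child would need.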


---



[GitHub] spark issue #20670: [SPARK-23405] Add constranits

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20670
  
    Merged build finished. Test FAILed.


---



[GitHub] spark issue #20670: [SPARK-23405] Generate additional constraints for Join's...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20670
  
    **[Test build #87804 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87804/testReport)** for PR 20670 at commit [`023f2f7`](https://github.com/apache/spark/commit/023f2f709db484d82cde22b00db0bad33ac72279).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark issue #20670: [SPARK-23405] Generate additional constraints for Join's...

Posted by KaiXinXiaoLei <gi...@git.apache.org>.
Github user KaiXinXiaoLei commented on the issue:

    https://github.com/apache/spark/pull/20670
  
    @gatorsmile   thanks.


---



[GitHub] spark pull request #20670: [SPARK-23405] Generate additional constraints for...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20670#discussion_r171102033
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/QueryPlanConstraints.scala ---
    @@ -23,20 +23,23 @@ import org.apache.spark.sql.catalyst.expressions._
     trait QueryPlanConstraints { self: LogicalPlan =>
     
       /**
    -   * An [[ExpressionSet]] that contains invariants about the rows output by this operator. For
    -   * example, if this set contains the expression `a = 2` then that expression is guaranteed to
    -   * evaluate to `true` for all rows produced.
    -   */
    +    * An [[ExpressionSet]] that contains an additional set of constraints about equality
    +    * constraints and `isNotNull` constraints.
    +    */
    +  lazy val allConstraints: ExpressionSet = ExpressionSet(validConstraints
    --- End diff --
    
    This should also be guarded by `constraintPropagationEnabled`


---



[GitHub] spark issue #20670: [SPARK-23405] Generate additional constraints for Join's...

Posted by KaiXinXiaoLei <gi...@git.apache.org>.
Github user KaiXinXiaoLei commented on the issue:

    https://github.com/apache/spark/pull/20670
  
    @cloud-fan @srowen @jiangxb1987 I have changed the code and the title; please help review. Thanks.
    
     


---



[GitHub] spark issue #20670: [SPARK-23405] Generate additional constraints for Join's...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20670
  
    **[Test build #87836 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87836/testReport)** for PR 20670 at commit [`709ed39`](https://github.com/apache/spark/commit/709ed39052a032d0dc2258b2c637ab107d4b4df7).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark pull request #20670: [SPARK-23405] Add constranits

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20670#discussion_r170528989
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/QueryPlanConstraints.scala ---
    @@ -29,12 +29,26 @@ trait QueryPlanConstraints { self: LogicalPlan =>
        */
       lazy val constraints: ExpressionSet = {
         if (conf.constraintPropagationEnabled) {
    +      var relevantOutPutSet: AttributeSet = outputSet
    +      constraints.foreach {
    +        case eq @ EqualTo(l: Attribute, r: Attribute) =>
    --- End diff --
    
    `eq` isn't used


---



[GitHub] spark issue #20670: [SPARK-23405] Add constranits

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20670
  
    **[Test build #87648 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87648/testReport)** for PR 20670 at commit [`705ed46`](https://github.com/apache/spark/commit/705ed462bb307871e65199ce02576f12d60d2176).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark pull request #20670: [SPARK-23405] Generate additional constraints for...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20670#discussion_r171463022
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/QueryPlanConstraints.scala ---
    @@ -22,21 +22,24 @@ import org.apache.spark.sql.catalyst.expressions._
     
     trait QueryPlanConstraints { self: LogicalPlan =>
     
    +  /**
    +   * An [[ExpressionSet]] that contains an additional set of constraints, such as equality
    +   * constraints and `isNotNull` constraints, etc.
    +   */
    +  lazy val allConstraints: ExpressionSet = ExpressionSet(validConstraints
    --- End diff --
    
    We still need `if (conf.constraintPropagationEnabled)`
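
A guard of this kind might look like the sketch below. It is plain Scala under stated assumptions, not the code that was merged: `GuardedConstraints` and `Conf` are hypothetical stand-ins, and constraints are plain strings. The inferred set is passed by name, so the inference work is skipped entirely when propagation is disabled.

```scala
// Plain-Scala sketch of guarding constraint inference behind a config flag.
object GuardedConstraints {
  // Hypothetical stand-in for the SQL conf; only the one flag is modelled.
  final case class Conf(constraintPropagationEnabled: Boolean)

  // `inferred` is a by-name parameter: it is evaluated only if the flag is on.
  def allConstraints(conf: Conf, inferred: => Set[String]): Set[String] =
    if (conf.constraintPropagationEnabled) inferred else Set.empty
}
```

With the flag off, the function returns an empty set and never evaluates the expression passed as `inferred`.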


---



[GitHub] spark issue #20670: [SPARK-23405] Generate additional constraints for Join's...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20670
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/1110/
    Test PASSed.


---



[GitHub] spark pull request #20670: [SPARK-23405] Generate additional constraints for...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20670#discussion_r171276798
  
    --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/InferFiltersFromConstraintsSuite.scala ---
    @@ -192,4 +192,17 @@ class InferFiltersFromConstraintsSuite extends PlanTest {
     
         comparePlans(Optimize.execute(original.analyze), correct.analyze)
       }
    +
    +  test("SPARK-23405: left-semi equal-join should filter out null join keys on both sides") {
    +    val x = testRelation.subquery('x)
    +    val y = testRelation.subquery('y)
    +    val condition = Some("x.a".attr === "y.a".attr)
    +    val originalQuery = x.join(y, LeftSemi, condition).analyze
    +    val left = x.where(IsNotNull('a))
    +    val right = y.where(IsNotNull('a))
    +    val correctAnswer = left.join(right, LeftSemi, condition)
    +        .analyze
    --- End diff --
    
    this doesn't need to be in a new line


---



[GitHub] spark issue #20670: [SPARK-23405] Generate additional constraints for Join's...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20670
  
    **[Test build #87804 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87804/testReport)** for PR 20670 at commit [`023f2f7`](https://github.com/apache/spark/commit/023f2f709db484d82cde22b00db0bad33ac72279).


---



[GitHub] spark issue #20670: [SPARK-23405] Generate additional constraints for Join's...

Posted by jiangxb1987 <gi...@git.apache.org>.
Github user jiangxb1987 commented on the issue:

    https://github.com/apache/spark/pull/20670
  
    LGTM only nits


---



[GitHub] spark issue #20670: [SPARK-23405] Generate additional constraints for Join's...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20670
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87772/
    Test PASSed.


---



[GitHub] spark pull request #20670: [SPARK-23405] Generate additional constraints for...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/20670


---



[GitHub] spark issue #20670: [SPARK-23405] Generate additional constraints for Join's...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20670
  
    Merged build finished. Test PASSed.


---



[GitHub] spark pull request #20670: [SPARK-23405] Generate additional constraints for...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20670#discussion_r170840898
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/QueryPlanConstraints.scala ---
    @@ -27,16 +27,15 @@ trait QueryPlanConstraints { self: LogicalPlan =>
        * example, if this set contains the expression `a = 2` then that expression is guaranteed to
        * evaluate to `true` for all rows produced.
    --- End diff --
    
    The comment belongs to `constraints` not `allConstraints`


---



[GitHub] spark pull request #20670: [SPARK-23405] Generate additional constraints for...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20670#discussion_r171182102
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/QueryPlanConstraints.scala ---
    @@ -22,21 +22,30 @@ import org.apache.spark.sql.catalyst.expressions._
     
     trait QueryPlanConstraints { self: LogicalPlan =>
     
    +  /**
    +   * An [[ExpressionSet]] that contains an additional set of constraints about equality
    +   * constraints and `isNotNull` constraints.
    +   */
    +  lazy val allConstraints: ExpressionSet = {
    +    if (conf.constraintPropagationEnabled) {
    +      ExpressionSet(validConstraints
    +        .union(inferAdditionalConstraints(validConstraints))
    +        .union(constructIsNotNullConstraints(validConstraints)))
    +    } else {
    +      ExpressionSet(Set.empty)
    +    }
    +  }
    +
       /**
        * An [[ExpressionSet]] that contains invariants about the rows output by this operator. For
        * example, if this set contains the expression `a = 2` then that expression is guaranteed to
        * evaluate to `true` for all rows produced.
        */
       lazy val constraints: ExpressionSet = {
         if (conf.constraintPropagationEnabled) {
    --- End diff --
    
    now we don't need this if.
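
One possible reading of this comment, sketched as a toy model rather than the Spark source: if `constraints` is derived from `allConstraints`, and `allConstraints` is already empty when propagation is disabled, a second check in `constraints` is redundant. The names `Constraint`, `Plan`, `valid`, `output` and `propagationEnabled` below are hypothetical stand-ins, and the subset filter is an assumption about how the derived set would be narrowed.

```scala
// Toy model (not Spark code): a constraint is a label plus the attribute names it references.
object ConstraintGateSketch {
  final case class Constraint(text: String, refs: Set[String])

  final case class Plan(
      valid: Set[Constraint],
      output: Set[String],
      propagationEnabled: Boolean) {

    // Gated once: empty when propagation is disabled.
    lazy val allConstraints: Set[Constraint] =
      if (propagationEnabled) valid else Set.empty

    // Derived from allConstraints, so it is already empty when the flag is off.
    // Keeps only constraints that reference output attributes (an assumption).
    lazy val constraints: Set[Constraint] =
      allConstraints.filter(_.refs.subsetOf(output))
  }
}
```

With the flag off, both sets come out empty without `constraints` ever consulting the flag itself.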


---



[GitHub] spark issue #20670: add constranits

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20670
  
    Merged build finished. Test PASSed.


---



[GitHub] spark issue #20670: [SPARK-23405] Add constranits

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on the issue:

    https://github.com/apache/spark/pull/20670
  
    This is still lacking detail about 'why'. It's not my area either. I think you should not have reopened this. 


---



[GitHub] spark pull request #20670: [SPARK-23405] Generate additional constraints for...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20670#discussion_r171462811
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/QueryPlanConstraints.scala ---
    @@ -22,21 +22,24 @@ import org.apache.spark.sql.catalyst.expressions._
     
     trait QueryPlanConstraints { self: LogicalPlan =>
     
    +  /**
    +   * An [[ExpressionSet]] that contains an additional set of constraints, such as equality
    +   * constraints and `isNotNull` constraints, etc.
    +   */
    +  lazy val allConstraints: ExpressionSet = ExpressionSet(validConstraints
    +        .union(inferAdditionalConstraints(validConstraints))
    +        .union(constructIsNotNullConstraints(validConstraints)))
    --- End diff --
    
    Nit: indents


---



[GitHub] spark issue #20670: [SPARK-23405] Generate additional constraints for Join's...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20670
  
    Merged build finished. Test PASSed.


---



[GitHub] spark issue #20670: [SPARK-23405] Generate additional constraints for Join's...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20670
  
    Merged build finished. Test PASSed.


---



[GitHub] spark issue #20670: [SPARK-23405] Generate additional constraints for Join's...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20670
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87727/
    Test FAILed.


---



[GitHub] spark issue #20670: [SPARK-23405] Generate additional constraints for Join's...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20670
  
    Merged build finished. Test PASSed.


---



[GitHub] spark issue #20670: [SPARK-23405] Generate additional constraints for Join's...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20670
  
    **[Test build #87766 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87766/testReport)** for PR 20670 at commit [`b3f2ade`](https://github.com/apache/spark/commit/b3f2ade5f1dc2ad3349f4dc21fe353590e8bbbfd).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark issue #20670: [SPARK-23405] Generate additional constraints for Join's...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/20670
  
    LGTM except several minor comments



---



[GitHub] spark issue #20670: [SPARK-23405] Generate additional constraints for Join's...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20670
  
    Merged build finished. Test PASSed.


---



[GitHub] spark issue #20670: [SPARK-23405] Generate additional constraints for Join's...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20670
  
    Merged build finished. Test PASSed.


---



[GitHub] spark issue #20670: [SPARK-23405] Generate additional constraints for Join's...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20670
  
    **[Test build #87836 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87836/testReport)** for PR 20670 at commit [`709ed39`](https://github.com/apache/spark/commit/709ed39052a032d0dc2258b2c637ab107d4b4df7).


---



[GitHub] spark issue #20670: [SPARK-23405] Generate additional constraints for Join's...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20670
  
    Merged build finished. Test FAILed.


---



[GitHub] spark issue #20670: [SPARK-23405] Generate additional constraints for Join's...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20670
  
    **[Test build #87726 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87726/testReport)** for PR 20670 at commit [`1e0f78a`](https://github.com/apache/spark/commit/1e0f78a50bd70a3f94382887a74cc70f7fefe3c6).


---



[GitHub] spark pull request #20670: [SPARK-23405] Generate additional constraints for...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/20670#discussion_r171182439
  
    --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/InferFiltersFromConstraintsSuite.scala ---
    @@ -192,4 +192,17 @@ class InferFiltersFromConstraintsSuite extends PlanTest {
     
         comparePlans(Optimize.execute(original.analyze), correct.analyze)
       }
    +
    +  test("SPARK-23405:single left-semi join, filter out nulls on either side on equi-join keys") {
    --- End diff --
    
    nit: `SPARK-23405: left-semi equa-join should filter out null join keys on both sides`


---



[GitHub] spark issue #20670: [SPARK-23405] Add constranits

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20670
  
    **[Test build #87651 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87651/testReport)** for PR 20670 at commit [`705ed46`](https://github.com/apache/spark/commit/705ed462bb307871e65199ce02576f12d60d2176).


---



[GitHub] spark issue #20670: [SPARK-23405] Add constranits

Posted by jiangxb1987 <gi...@git.apache.org>.
Github user jiangxb1987 commented on the issue:

    https://github.com/apache/spark/pull/20670
  
    I agree with what @cloud-fan proposed: to have constraints for a plan and its children. However, that requires a relatively wide change as well as a fine set of test cases. Please don't hesitate to ask for help if you run into any issues working on this.


---



[GitHub] spark issue #20670: [SPARK-23405] Generate additional constraints for Join's...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20670
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87766/
    Test FAILed.


---



[GitHub] spark issue #20670: [SPARK-23405] Add constranits

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20670
  
    Merged build finished. Test FAILed.


---



[GitHub] spark issue #20670: [SPARK-23405] Generate additional constraints for Join's...

Posted by kiszk <gi...@git.apache.org>.
Github user kiszk commented on the issue:

    https://github.com/apache/spark/pull/20670
  
    retest this please


---



[GitHub] spark issue #20670: [SPARK-23405] Generate additional constraints for Join's...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20670
  
    **[Test build #87817 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87817/testReport)** for PR 20670 at commit [`709ed39`](https://github.com/apache/spark/commit/709ed39052a032d0dc2258b2c637ab107d4b4df7).
     * This patch **fails from timeout after a configured wait of `300m`**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark issue #20670: [SPARK-23405] Generate additional constraints for Join's...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20670
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87726/
    Test FAILed.


---



[GitHub] spark issue #20670: [SPARK-23405] Generate additional constraints for Join's...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20670
  
    **[Test build #87691 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87691/testReport)** for PR 20670 at commit [`f44a92a`](https://github.com/apache/spark/commit/f44a92ad20895a94577cf2b4de54fc320b0f934b).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark issue #20670: [SPARK-23405] Add constranits

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20670
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87651/
    Test FAILed.


---



[GitHub] spark issue #20670: [SPARK-23405] Add constranits

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20670
  
    Merged build finished. Test PASSed.


---



[GitHub] spark issue #20670: [SPARK-23405] Add constranits

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20670
  
    **[Test build #87651 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87651/testReport)** for PR 20670 at commit [`705ed46`](https://github.com/apache/spark/commit/705ed462bb307871e65199ce02576f12d60d2176).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark issue #20670: [SPARK-23405] Generate additional constraints for Join's...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/20670
  
    LGTM except we should add a test


---



[GitHub] spark issue #20670: [SPARK-23405] Add constranits

Posted by KaiXinXiaoLei <gi...@git.apache.org>.
Github user KaiXinXiaoLei commented on the issue:

    https://github.com/apache/spark/pull/20670
  
    @srowen Let me redescribe the problem. I have a small table `ls` with one row, and a big table `catalog_sales` with one hundred billion rows. In the big table, there are one million non-null values of the `cs_order_number` field.
    I join these tables with the query: `select ls.cs_order_number from ls left semi join catalog_sales cs on ls.cs_order_number = cs.cs_order_number`. While the job runs, there is data skew, and I found that the null values cause it.
    The join condition is `ls.cs_order_number = cs.cs_order_number`. In the Optimized Logical Plan, the left table has a "Filter isnotnull(cs_order_number#1)" action, so I think the right table should have a "Filter isnotnull" action too. Then the right table filters out null values first and joins with the left table second, so null values no longer cause data skew.
    With this change, my SQL runs successfully.
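
The reasoning above can be illustrated with a small plain-Scala sketch. It is a toy model, not Spark code: `None` stands in for SQL NULL, and the names `SemiJoinSketch`, `sqlEquals`, `leftSemiJoin` and `leftSemiJoinFiltered` are hypothetical. It only shows why dropping null join keys on both sides should not change the result of a left semi join on an equality condition.

```scala
// Toy model (not Spark code): each row is reduced to its optional join key.
object SemiJoinSketch {
  type Key = Option[Long]

  // SQL-style equality: a comparison involving NULL is never true.
  def sqlEquals(a: Key, b: Key): Boolean = (a, b) match {
    case (Some(x), Some(y)) => x == y
    case _                  => false
  }

  // Left semi join: keep each left row that has at least one matching right row.
  def leftSemiJoin(left: Seq[Key], right: Seq[Key]): Seq[Key] =
    left.filter(l => right.exists(r => sqlEquals(l, r)))

  // The rewrite discussed in this thread: drop null keys on both sides first.
  def leftSemiJoinFiltered(left: Seq[Key], right: Seq[Key]): Seq[Key] =
    leftSemiJoin(left.filter(_.isDefined), right.filter(_.isDefined))
}
```

Because a null key can never satisfy the equality, the filtered and unfiltered joins return the same rows; the filtered form simply never has to shuffle the null-keyed rows of the big table.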
    



---



[GitHub] spark issue #20670: [SPARK-23405] Generate additional constraints for Join's...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/20670
  
    **[Test build #87817 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87817/testReport)** for PR 20670 at commit [`709ed39`](https://github.com/apache/spark/commit/709ed39052a032d0dc2258b2c637ab107d4b4df7).


---



[GitHub] spark issue #20670: [SPARK-23405] Generate additional constraints for Join's...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20670
  
    Merged build finished. Test FAILed.


---



[GitHub] spark issue #20670: [SPARK-23405] Add constranits

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20670
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/1038/
    Test PASSed.


---



[GitHub] spark issue #20670: [SPARK-23405] Generate additional constraints for Join's...

Posted by jiangxb1987 <gi...@git.apache.org>.
Github user jiangxb1987 commented on the issue:

    https://github.com/apache/spark/pull/20670
  
    You should also add test cases.


---



[GitHub] spark issue #20670: [SPARK-23405] Generate additional constraints for Join's...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20670
  
    Merged build finished. Test PASSed.


---



[GitHub] spark issue #20670: [SPARK-23405] Generate additional constraints for Join's...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20670
  
    Merged build finished. Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #20670: [SPARK-23405] Generate additional constraints for Join's...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20670
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87836/
    Test PASSed.


---



[GitHub] spark issue #20670: [SPARK-23405] Generate additional constraints for Join's...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20670
  
    Merged build finished. Test PASSed.


---



[GitHub] spark issue #20670: [SPARK-23405] Generate additional constraints for Join's...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20670
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/1079/
    Test PASSed.


---



[GitHub] spark issue #20670: [SPARK-23405] Generate additional constraints for Join's...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20670
  
    Merged build finished. Test PASSed.


---



[GitHub] spark issue #20670: [SPARK-23405] Generate additional constraints for Join's...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/20670
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87804/
    Test PASSed.


---



[GitHub] spark issue #20670: [SPARK-23405] Add constranits

Posted by KaiXinXiaoLei <gi...@git.apache.org>.
Github user KaiXinXiaoLei commented on the issue:

    https://github.com/apache/spark/pull/20670
  
    @SparkQA I think this error is not caused by my patch. Please ok to test.


---



[GitHub] spark issue #20670: [SPARK-23405] Add constranits

Posted by KaiXinXiaoLei <gi...@git.apache.org>.
Github user KaiXinXiaoLei commented on the issue:

    https://github.com/apache/spark/pull/20670
  
    @srowen @wangyum please help review, thanks.


---
