Posted to reviews@spark.apache.org by sameeragarwal <gi...@git.apache.org> on 2016/01/20 09:17:38 UTC

[GitHub] spark pull request: [WIP][SQL] Initial support for constraint prop...

GitHub user sameeragarwal opened a pull request:

    https://github.com/apache/spark/pull/10844

    [WIP][SQL] Initial support for constraint propagation in SparkSQL

    

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/sameeragarwal/spark constraints

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/10844.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #10844
    
----
commit 8d050df82225b36d06f5b29fff9df2a49b39f551
Author: Sameer Agarwal <sa...@databricks.com>
Date:   2016-01-19T08:48:17Z

    initial framework

commit fed48b84e4e40a58bddb1ee742bda738a0b63ab1
Author: Sameer Agarwal <sa...@databricks.com>
Date:   2016-01-19T19:28:21Z

    Initial set of constraints

commit 76d27275d9cc0452da2d2fd0c765b8fc5d586474
Author: Sameer Agarwal <sa...@databricks.com>
Date:   2016-01-20T02:03:18Z

    Constraint propagation in Set and Binary operators

commit 67e138d480b97f78e00229b2e407eececd56a599
Author: Sameer Agarwal <sa...@databricks.com>
Date:   2016-01-20T02:53:57Z

    modify test

commit f7251dd8d6d7b132f094fa47da8a057b73f23631
Author: Sameer Agarwal <sa...@databricks.com>
Date:   2016-01-20T08:04:54Z

    fix tests

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [WIP][SPARK-12957][SQL] Initial support for co...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10844#issuecomment-175375684
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/50161/
    Test PASSed.



[GitHub] spark pull request: [WIP][SQL] Initial support for constraint prop...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/10844#issuecomment-173126993
  
    **[Test build #49770 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/49770/consoleFull)** for PR 10844 at commit [`f7251dd`](https://github.com/apache/spark/commit/f7251dd8d6d7b132f094fa47da8a057b73f23631).
     * This patch **fails Scala style tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.



[GitHub] spark pull request: [SPARK-12957][SQL] Initial support for constra...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10844#issuecomment-178329510
  
    Merged build finished. Test PASSed.



[GitHub] spark pull request: [SPARK-12957][SQL] Initial support for constra...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/10844#issuecomment-176958046
  
    **[Test build #50404 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/50404/consoleFull)** for PR 10844 at commit [`8c6bb70`](https://github.com/apache/spark/commit/8c6bb702e90346ec3189857c70e8579d431bb91c).
     * This patch **fails Spark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.



[GitHub] spark pull request: [SPARK-12957][SQL] Initial support for constra...

Posted by sameeragarwal <gi...@git.apache.org>.
Github user sameeragarwal commented on the pull request:

    https://github.com/apache/spark/pull/10844#issuecomment-177051182
  
    Thanks @marmbrus, all comments addressed!



[GitHub] spark pull request: [SPARK-12957][SQL] Initial support for constra...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10844#issuecomment-177062472
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/50429/
    Test PASSed.



[GitHub] spark pull request: [SPARK-12957][SQL] Initial support for constra...

Posted by yhuai <gi...@git.apache.org>.
Github user yhuai commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10844#discussion_r51549492
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicOperators.scala ---
    @@ -157,6 +178,23 @@ case class Union(children: Seq[LogicalPlan]) extends LogicalPlan {
         val sizeInBytes = children.map(_.statistics.sizeInBytes).sum
         Statistics(sizeInBytes = sizeInBytes)
       }
    +
    +  def rewriteConstraints(
    +      planA: LogicalPlan,
    --- End diff --
    
    Maybe give it a more informative name? (how about `reference`?)



[GitHub] spark pull request: [WIP][SQL] Initial support for constraint prop...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/10844#issuecomment-173126739
  
    **[Test build #49770 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/49770/consoleFull)** for PR 10844 at commit [`f7251dd`](https://github.com/apache/spark/commit/f7251dd8d6d7b132f094fa47da8a057b73f23631).



[GitHub] spark pull request: [SPARK-12957][SQL] Initial support for constra...

Posted by yhuai <gi...@git.apache.org>.
Github user yhuai commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10844#discussion_r51549804
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicOperators.scala ---
    @@ -180,6 +218,29 @@ case class Join(
         }
       }
     
    +  override protected def validConstraints: Set[Expression] = {
    +    joinType match {
    +      case Inner if condition.isDefined =>
    +        left.constraints
    +          .union(right.constraints)
    +          .union(splitConjunctivePredicates(condition.get).toSet)
    --- End diff --
    
    Seems this `get` may cause a problem when `condition` is `None`.
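
    A minimal sketch of the Option-based shape this comment points toward (the `isDefined` guard in the quoted diff does protect the `.get`, but folding over the Option sidesteps the concern entirely); assumed shape, not the merged code:

    ```scala
    // Sketch only: contribute the condition's conjuncts when present,
    // nothing otherwise, without calling `.get`.
    override protected def validConstraints: Set[Expression] = joinType match {
      case Inner | LeftSemi =>
        left.constraints
          .union(right.constraints)
          .union(condition.map(splitConjunctivePredicates(_).toSet).getOrElse(Set.empty))
      case _ => Set.empty[Expression]
    }
    ```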



[GitHub] spark pull request: [SPARK-12957][SQL] Initial support for constra...

Posted by sameeragarwal <gi...@git.apache.org>.
Github user sameeragarwal commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10844#discussion_r51506239
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicOperators.scala ---
    @@ -180,6 +221,46 @@ case class Join(
         }
       }
     
    +  private def constructIsNotNullConstraints(condition: Expression): Set[Expression] = {
    --- End diff --
    
    That's a great point. Moved the common logic to QueryPlan.



[GitHub] spark pull request: [WIP][SPARK-12957][SQL] Initial support for co...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/10844#issuecomment-175354244
  
    **[Test build #50161 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/50161/consoleFull)** for PR 10844 at commit [`f15ef96`](https://github.com/apache/spark/commit/f15ef96657603b79a815853fba991835fe3ca50f).



[GitHub] spark pull request: [SPARK-12957][SQL] Initial support for constra...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10844#issuecomment-178291029
  
    Merged build finished. Test FAILed.



[GitHub] spark pull request: [SPARK-12957][SQL] Initial support for constra...

Posted by marmbrus <gi...@git.apache.org>.
Github user marmbrus commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10844#discussion_r51482277
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicOperators.scala ---
    @@ -180,6 +221,46 @@ case class Join(
         }
       }
     
    +  private def constructIsNotNullConstraints(condition: Expression): Set[Expression] = {
    +    // Currently we only propagate constraints if the condition consists of equality
    +    // and ranges. For all other cases, we return an empty set of constraints
    +    splitConjunctivePredicates(condition).map {
    +      case EqualTo(l, r) =>
    +        Set(IsNotNull(l), IsNotNull(r))
    +      case GreaterThan(l, r) =>
    +        Set(IsNotNull(l), IsNotNull(r))
    +      case GreaterThanOrEqual(l, r) =>
    +        Set(IsNotNull(l), IsNotNull(r))
    +      case LessThan(l, r) =>
    +        Set(IsNotNull(l), IsNotNull(r))
    +      case LessThanOrEqual(l, r) =>
    +        Set(IsNotNull(l), IsNotNull(r))
    +      case _ =>
    +        Set.empty[Expression]
    +    }.foldLeft(Set.empty[Expression])(_ union _.toSet)
    +  }
    +
    +  override protected def validConstraints: Set[Expression] = {
    +    joinType match {
    +      case Inner if condition.isDefined =>
    +        left.constraints
    +          .union(right.constraints)
    +          .union(constructIsNotNullConstraints(condition.get))
    --- End diff --
    
    We should also be including the split form of the condition here and below, right?
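
    A rough sketch of what including the split conjuncts alongside the derived null constraints might look like; assumed shape only, not the merged code:

    ```scala
    // Sketch only: propagate the condition's own conjuncts in addition to
    // the IsNotNull constraints derived from it.
    override protected def validConstraints: Set[Expression] = joinType match {
      case Inner if condition.isDefined =>
        left.constraints
          .union(right.constraints)
          .union(splitConjunctivePredicates(condition.get).toSet)
          .union(constructIsNotNullConstraints(condition.get))
      case _ => Set.empty[Expression]
    }
    ```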



[GitHub] spark pull request: [SPARK-12957][SQL] Initial support for constra...

Posted by marmbrus <gi...@git.apache.org>.
Github user marmbrus commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10844#discussion_r51330946
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicOperators.scala ---
    @@ -180,6 +229,55 @@ case class Join(
         }
       }
     
    +  override protected def validConstraints: Set[Expression] = {
    +    // Currently we only propagate constraints if the condition consists of equality
    +    // and ranges. For all other cases, we return an empty set of constraints
    +    def constructIsNotNullConstraints(condition: Expression): Set[Expression] = {
    +      splitConjunctivePredicates(condition).map {
    +        case EqualTo(l, r) =>
    +          Set(IsNotNull(l), IsNotNull(r))
    +        case GreaterThan(l, r) =>
    +          Set(IsNotNull(l), IsNotNull(r))
    +        case GreaterThanOrEqual(l, r) =>
    +          Set(IsNotNull(l), IsNotNull(r))
    +        case LessThan(l, r) =>
    +          Set(IsNotNull(l), IsNotNull(r))
    +        case LessThanOrEqual(l, r) =>
    +          Set(IsNotNull(l), IsNotNull(r))
    +        case _ =>
    +          Set.empty[Expression]
    +      }.foldLeft(Set.empty[Expression])(_ union _.toSet)
    +    }
    +
    +    def constructIsNullConstraints(plan: LogicalPlan): Set[Expression] = {
    +      plan.output.map(IsNull).toSet
    +    }
    +
    +    (joinType match {
    +      case Inner if condition.isDefined =>
    +        getRelevantConstraints(left)
    +          .union(getRelevantConstraints(right))
    +          .union(constructIsNotNullConstraints(condition.get))
    +      case LeftSemi if condition.isDefined =>
    +        getRelevantConstraints(left)
    +          .union(getRelevantConstraints(right))
    +          .union(constructIsNotNullConstraints(condition.get))
    +      case LeftOuter =>
    +        getRelevantConstraints(left)
    +          .union(constructIsNullConstraints(right))
    +      case RightOuter =>
    +        getRelevantConstraints(right)
    +          .union(constructIsNullConstraints(left))
    +      case FullOuter =>
    +        constructIsNullConstraints(left)
    +          .union(constructIsNullConstraints(right))
    +      case _ =>
    --- End diff --
    
    What types of joins are we not handling? It might be better to get an exception if we add a new type and it's not handled, but I'm not sure.



[GitHub] spark pull request: [WIP][SPARK-12957][SQL] Initial support for co...

Posted by marmbrus <gi...@git.apache.org>.
Github user marmbrus commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10844#discussion_r51068376
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/QueryPlan.scala ---
    @@ -17,16 +17,31 @@
     
     package org.apache.spark.sql.catalyst.plans
     
    -import org.apache.spark.sql.catalyst.expressions.{Attribute, AttributeSet, Expression, VirtualColumn}
    +import org.apache.spark.sql.catalyst.expressions._
     import org.apache.spark.sql.catalyst.trees.TreeNode
     import org.apache.spark.sql.types.{DataType, StructType}
     
    -abstract class QueryPlan[PlanType <: TreeNode[PlanType]] extends TreeNode[PlanType] {
    +abstract class QueryPlan[PlanType <: TreeNode[PlanType]]
    +  extends TreeNode[PlanType] with PredicateHelper {
       self: PlanType =>
     
       def output: Seq[Attribute]
     
       /**
    +   * Extracts the output property from a given child.
    +   */
    +  def extractConstraintsFromChild(child: QueryPlan[PlanType]): Set[Expression] = {
    --- End diff --
    
    `protected`?
    
    Also, I'm not sure I get the Scala doc. Maybe `getRelevantConstraints` is a better name? It takes the constraints and removes those that no longer apply because we removed columns, right?
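
    For reference, a sketch of the helper under the suggested name; the later diffs in this thread show essentially this shape:

    ```scala
    // Sketch: keep only the child's constraints whose referenced attributes
    // still appear in this operator's output.
    protected def getRelevantConstraints(child: QueryPlan[PlanType]): Set[Expression] =
      child.constraints.filter(_.references.subsetOf(outputSet))
    ```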



[GitHub] spark pull request: [SPARK-12957][SQL] Initial support for constra...

Posted by sameeragarwal <gi...@git.apache.org>.
Github user sameeragarwal commented on the pull request:

    https://github.com/apache/spark/pull/10844#issuecomment-178280862
  
    @marmbrus comments addressed!



[GitHub] spark pull request: [SPARK-12957][SQL] Initial support for constra...

Posted by marmbrus <gi...@git.apache.org>.
Github user marmbrus commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10844#discussion_r51330321
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/QueryPlan.scala ---
    @@ -17,16 +17,34 @@
     
     package org.apache.spark.sql.catalyst.plans
     
    -import org.apache.spark.sql.catalyst.expressions.{Attribute, AttributeSet, Expression, VirtualColumn}
    +import org.apache.spark.sql.catalyst.expressions._
     import org.apache.spark.sql.catalyst.trees.TreeNode
     import org.apache.spark.sql.types.{DataType, StructType}
     
    -abstract class QueryPlan[PlanType <: TreeNode[PlanType]] extends TreeNode[PlanType] {
    +abstract class QueryPlan[PlanType <: TreeNode[PlanType]]
    +  extends TreeNode[PlanType] with PredicateHelper {
       self: PlanType =>
     
       def output: Seq[Attribute]
     
       /**
    +   * Extracts the relevant per-row constraints from a given child while removing those that
    +   * don't apply anymore.
    +   */
    +  protected def getRelevantConstraints(child: QueryPlan[PlanType]): Set[Expression] = {
    +    child.constraints.filter(_.references.subsetOf(outputSet))
    +  }
    +
    +  /**
    +   * A sequence of expressions that describes the data property of the output rows of this
    +   * operator. For example, if the output of this operator is column `a`, an example `constraints`
    +   * can be `Set(a > 10, a < 20)`.
    +   */
    +  lazy val constraints: Set[Expression] = validConstraints
    --- End diff --
    
    I would consider making this `getRelevantConstraints(validConstraints)` so that each implementor of `validConstraints` doesn't have to remember to do the filtering / canonicalization. They can just focus on augmenting or passing through constraints from children based on the operator's semantics.
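
    A sketch of the structure this suggestion points toward (a later diff in the thread shows a variant where `getRelevantConstraints` takes the constraint set directly); assumed shape, not the merged code:

    ```scala
    // Sketch: filter/canonicalize once in the base class so operators only
    // override validConstraints.
    lazy val constraints: Set[Expression] = getRelevantConstraints(validConstraints)

    protected def validConstraints: Set[Expression] = Set.empty
    ```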



[GitHub] spark pull request: [SPARK-12957][SQL] Initial support for constra...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10844#issuecomment-178902501
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/50603/
    Test PASSed.



[GitHub] spark pull request: [SPARK-12957][SQL] Initial support for constra...

Posted by sameeragarwal <gi...@git.apache.org>.
Github user sameeragarwal commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10844#discussion_r51647631
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/LogicalPlan.scala ---
    @@ -301,10 +301,12 @@ abstract class LeafNode extends LogicalPlan {
     /**
      * A logical plan node with single child.
      */
    -abstract class UnaryNode extends LogicalPlan {
    +abstract class UnaryNode extends LogicalPlan with PredicateHelper {
    --- End diff --
    
    Sounds good. I added this only to `Filter` and `Join`, which need `splitConjunctivePredicates`.



[GitHub] spark pull request: [WIP][SPARK-12957][SQL] Initial support for co...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/10844#issuecomment-175375555
  
    **[Test build #50161 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/50161/consoleFull)** for PR 10844 at commit [`f15ef96`](https://github.com/apache/spark/commit/f15ef96657603b79a815853fba991835fe3ca50f).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.



[GitHub] spark pull request: [SPARK-12957][SQL] Initial support for constra...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10844#issuecomment-177034115
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/50417/
    Test PASSed.



[GitHub] spark pull request: [WIP][SPARK-12957][SQL] Initial support for co...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10844#issuecomment-175375683
  
    Merged build finished. Test PASSed.



[GitHub] spark pull request: [WIP][SQL] Initial support for constraint prop...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10844#issuecomment-173126997
  
    Merged build finished. Test FAILed.



[GitHub] spark pull request: [SPARK-12957][SQL] Initial support for constra...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10844#issuecomment-177062470
  
    Merged build finished. Test PASSed.



[GitHub] spark pull request: [SPARK-12957][SQL] Initial support for constra...

Posted by marmbrus <gi...@git.apache.org>.
Github user marmbrus commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10844#discussion_r51331081
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicOperators.scala ---
    @@ -180,6 +229,55 @@ case class Join(
         }
       }
     
    +  override protected def validConstraints: Set[Expression] = {
    +    // Currently we only propagate constraints if the condition consists of equality
    +    // and ranges. For all other cases, we return an empty set of constraints
    +    def constructIsNotNullConstraints(condition: Expression): Set[Expression] = {
    +      splitConjunctivePredicates(condition).map {
    +        case EqualTo(l, r) =>
    +          Set(IsNotNull(l), IsNotNull(r))
    +        case GreaterThan(l, r) =>
    +          Set(IsNotNull(l), IsNotNull(r))
    +        case GreaterThanOrEqual(l, r) =>
    +          Set(IsNotNull(l), IsNotNull(r))
    +        case LessThan(l, r) =>
    +          Set(IsNotNull(l), IsNotNull(r))
    +        case LessThanOrEqual(l, r) =>
    +          Set(IsNotNull(l), IsNotNull(r))
    +        case _ =>
    +          Set.empty[Expression]
    +      }.foldLeft(Set.empty[Expression])(_ union _.toSet)
    --- End diff --
    
    Is that `toSet` required?



[GitHub] spark pull request: [SPARK-12957][SQL] Initial support for constra...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/10844#issuecomment-176671555
  
    **[Test build #50371 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/50371/consoleFull)** for PR 10844 at commit [`53be837`](https://github.com/apache/spark/commit/53be8379ef7f2b86bcd4398361b6748abe2800d8).



[GitHub] spark pull request: [WIP][SPARK-12957][SQL] Initial support for co...

Posted by marmbrus <gi...@git.apache.org>.
Github user marmbrus commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10844#discussion_r51068566
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicOperators.scala ---
    @@ -146,6 +172,26 @@ case class Union(children: Seq[LogicalPlan]) extends LogicalPlan {
         val sizeInBytes = children.map(_.statistics.sizeInBytes).sum
         Statistics(sizeInBytes = sizeInBytes)
       }
    +
    +  override def extractConstraintsFromChild(child: QueryPlan[LogicalPlan]): Set[Expression] = {
    +    child.constraints.filter(_.references.subsetOf(child.outputSet))
    +  }
    +
    +  def rewriteConstraints(
    +      planA: LogicalPlan,
    +      planB: LogicalPlan,
    +      constraints: Set[Expression]): Set[Expression] = {
    +    require(planA.output.size == planB.output.size)
    +    val attributeRewrites = AttributeMap(planB.output.zip(planA.output))
    +    constraints.map(_ transform {
    +      case a: Attribute => attributeRewrites(a)
    +    })
    +  }
    +
    +  override def constraints: Set[Expression] = {
    +    children.map(child => rewriteConstraints(children.head, child,
    +      extractConstraintsFromChild(child))).reduce(_ intersect _)
    --- End diff --
    
    same style nit



[GitHub] spark pull request: [SPARK-12957][SQL] Initial support for constra...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10844#issuecomment-178873147
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/50602/
    Test FAILed.



[GitHub] spark pull request: [WIP][SQL] Initial support for constraint prop...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10844#issuecomment-173166056
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/49775/
    Test PASSed.



[GitHub] spark pull request: [SPARK-12957][SQL] Initial support for constra...

Posted by sameeragarwal <gi...@git.apache.org>.
Github user sameeragarwal commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10844#discussion_r51506200
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicOperators.scala ---
    @@ -89,9 +89,27 @@ case class Generate(
     
     case class Filter(condition: Expression, child: LogicalPlan) extends UnaryNode {
       override def output: Seq[Attribute] = child.output
    +
    +  override protected def validConstraints: Set[Expression] = {
    +    val newConstraint = splitConjunctivePredicates(condition)
    +      .filter(_.references.subsetOf(outputSet))
    --- End diff --
    
    yes, removed!



[GitHub] spark pull request: [WIP][SPARK-12957][SQL] Initial support for co...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10844#issuecomment-174926676
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/50083/
    Test PASSed.



[GitHub] spark pull request: [SPARK-12957][SQL] Initial support for constra...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10844#issuecomment-177034114
  
    Merged build finished. Test PASSed.



[GitHub] spark pull request: [WIP][SPARK-12957][SQL] Initial support for co...

Posted by marmbrus <gi...@git.apache.org>.
Github user marmbrus commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10844#discussion_r51068499
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicOperators.scala ---
    @@ -88,6 +88,12 @@ case class Generate(
     
     case class Filter(condition: Expression, child: LogicalPlan) extends UnaryNode {
       override def output: Seq[Attribute] = child.output
    +
    +  override def constraints: Set[Expression] = {
    +    val newConstraint = splitConjunctivePredicates(condition).filter(
    +      _.references.subsetOf(outputSet)).toSet
    --- End diff --
    
    style nit: we typically avoid breaking in the middle of a function call and instead prefer to break in between calls (always pick the highest syntactic level)
    
    ```scala
    val newConstraint = splitConjunctivePredicates(condition)
      .filter(_.references.subsetOf(outputSet))
      .toSet
    ```



[GitHub] spark pull request: [SPARK-12957][SQL] Initial support for constra...

Posted by marmbrus <gi...@git.apache.org>.
Github user marmbrus commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10844#discussion_r51330688
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicOperators.scala ---
    @@ -180,6 +229,55 @@ case class Join(
         }
       }
     
    +  override protected def validConstraints: Set[Expression] = {
    +    // Currently we only propagate constraints if the condition consists of equality
    +    // and ranges. For all other cases, we return an empty set of constraints
    +    def constructIsNotNullConstraints(condition: Expression): Set[Expression] = {
    --- End diff --
    
    We typically avoid nested methods unless the function is actually closing over state from the enclosing scope. I'd consider just making this a private method.



[GitHub] spark pull request: [SPARK-12957][SQL] Initial support for constra...

Posted by yhuai <gi...@git.apache.org>.
Github user yhuai commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10844#discussion_r51550418
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/QueryPlan.scala ---
    @@ -17,16 +17,62 @@
     
     package org.apache.spark.sql.catalyst.plans
     
    -import org.apache.spark.sql.catalyst.expressions.{Attribute, AttributeSet, Expression, VirtualColumn}
    +import org.apache.spark.sql.catalyst.expressions._
     import org.apache.spark.sql.catalyst.trees.TreeNode
     import org.apache.spark.sql.types.{DataType, StructType}
     
    -abstract class QueryPlan[PlanType <: TreeNode[PlanType]] extends TreeNode[PlanType] {
    +abstract class QueryPlan[PlanType <: TreeNode[PlanType]]
    +  extends TreeNode[PlanType] with PredicateHelper {
       self: PlanType =>
     
       def output: Seq[Attribute]
     
       /**
    +   * Extracts the relevant constraints from a given set of constraints based on the attributes that
    +   * appear in the [[outputSet]].
    +   */
    +  protected def getRelevantConstraints(constraints: Set[Expression]): Set[Expression] = {
    +    constraints
    +      .union(constructIsNotNullConstraints(constraints))
    +      .filter(constraint =>
    +        constraint.references.nonEmpty && constraint.references.subsetOf(outputSet))
    +  }
    +
    +  private def constructIsNotNullConstraints(constraints: Set[Expression]): Set[Expression] = {
    --- End diff --
    
    Add scala doc?



[GitHub] spark pull request: [WIP][SQL] Initial support for constraint prop...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10844#issuecomment-173127001
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/49770/
    Test FAILed.



[GitHub] spark pull request: [SPARK-12957][SQL] Initial support for constra...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10844#issuecomment-176958115
  
    Merged build finished. Test FAILed.



[GitHub] spark pull request: [SPARK-12957][SQL] Initial support for constra...

Posted by sameeragarwal <gi...@git.apache.org>.
Github user sameeragarwal commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10844#discussion_r51338060
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/QueryPlan.scala ---
    @@ -17,16 +17,34 @@
     
     package org.apache.spark.sql.catalyst.plans
     
    -import org.apache.spark.sql.catalyst.expressions.{Attribute, AttributeSet, Expression, VirtualColumn}
    +import org.apache.spark.sql.catalyst.expressions._
     import org.apache.spark.sql.catalyst.trees.TreeNode
     import org.apache.spark.sql.types.{DataType, StructType}
     
    -abstract class QueryPlan[PlanType <: TreeNode[PlanType]] extends TreeNode[PlanType] {
    +abstract class QueryPlan[PlanType <: TreeNode[PlanType]]
    +  extends TreeNode[PlanType] with PredicateHelper {
       self: PlanType =>
     
       def output: Seq[Attribute]
     
       /**
    +   * Extracts the relevant per-row constraints from a given child while removing those that
    +   * don't apply anymore.
    +   */
    +  protected def getRelevantConstraints(child: QueryPlan[PlanType]): Set[Expression] = {
    +    child.constraints.filter(_.references.subsetOf(outputSet))
    +  }
    +
    +  /**
    +   * A sequence of expressions that describes the data property of the output rows of this
    +   * operator. For example, if the output of this operator is column `a`, an example `constraints`
    +   * can be `Set(a > 10, a < 20)`.
    +   */
    +  lazy val constraints: Set[Expression] = validConstraints
    --- End diff --
    
    Great idea, added.



[GitHub] spark pull request: [WIP][SQL] Initial support for constraint prop...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/10844#issuecomment-173134987
  
    **[Test build #49775 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/49775/consoleFull)** for PR 10844 at commit [`04ff99a`](https://github.com/apache/spark/commit/04ff99ab96957f57c74ae24835b4bfcdd27e06b8).



[GitHub] spark pull request: [SPARK-12957][SQL] Initial support for constra...

Posted by yhuai <gi...@git.apache.org>.
Github user yhuai commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10844#discussion_r51549824
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicOperators.scala ---
    @@ -180,6 +218,29 @@ case class Join(
         }
       }
     
    +  override protected def validConstraints: Set[Expression] = {
    +    joinType match {
    +      case Inner if condition.isDefined =>
    +        left.constraints
    +          .union(right.constraints)
    +          .union(splitConjunctivePredicates(condition.get).toSet)
    +      case LeftSemi if condition.isDefined =>
    +        left.constraints
    +          .union(right.constraints)
    +          .union(splitConjunctivePredicates(condition.get).toSet)
    --- End diff --
    
    Same here



[GitHub] spark pull request: [WIP][SPARK-12957][SQL] Initial support for co...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/10844#issuecomment-174926239
  
    **[Test build #50083 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/50083/consoleFull)** for PR 10844 at commit [`7fb2f9c`](https://github.com/apache/spark/commit/7fb2f9ce01aafe45d720212cb13120eb693a6c06).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.



[GitHub] spark pull request: [SPARK-12957][SQL] Initial support for constra...

Posted by sameeragarwal <gi...@git.apache.org>.
Github user sameeragarwal commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10844#discussion_r51338436
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicOperators.scala ---
    @@ -180,6 +229,55 @@ case class Join(
         }
       }
     
    +  override protected def validConstraints: Set[Expression] = {
    +    // Currently we only propagate constraints if the condition consists of equality
    +    // and ranges. For all other cases, we return an empty set of constraints
    +    def constructIsNotNullConstraints(condition: Expression): Set[Expression] = {
    +      splitConjunctivePredicates(condition).map {
    +        case EqualTo(l, r) =>
    +          Set(IsNotNull(l), IsNotNull(r))
    +        case GreaterThan(l, r) =>
    +          Set(IsNotNull(l), IsNotNull(r))
    +        case GreaterThanOrEqual(l, r) =>
    +          Set(IsNotNull(l), IsNotNull(r))
    +        case LessThan(l, r) =>
    +          Set(IsNotNull(l), IsNotNull(r))
    +        case LessThanOrEqual(l, r) =>
    +          Set(IsNotNull(l), IsNotNull(r))
    +        case _ =>
    +          Set.empty[Expression]
    +      }.foldLeft(Set.empty[Expression])(_ union _.toSet)
    --- End diff --
    
    had to add it to make the compiler happy :sob: 
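
    For context, the inner `toSet` is likely needed because `Set` is invariant: `Set(IsNotNull(l), IsNotNull(r))` infers as `Set[IsNotNull]`, which does not unify with `Set[Expression]`. A sketch of an alternative that avoids it by building covariant `Seq`s and converting once at the end; assumed, not the merged code:

    ```scala
    // Sketch only: Seq is covariant, so the branches unify as Seq[Expression]
    // and a single trailing toSet suffices.
    private def constructIsNotNullConstraints(condition: Expression): Set[Expression] =
      splitConjunctivePredicates(condition).flatMap {
        case EqualTo(l, r) => Seq(IsNotNull(l), IsNotNull(r))
        case GreaterThan(l, r) => Seq(IsNotNull(l), IsNotNull(r))
        case GreaterThanOrEqual(l, r) => Seq(IsNotNull(l), IsNotNull(r))
        case LessThan(l, r) => Seq(IsNotNull(l), IsNotNull(r))
        case LessThanOrEqual(l, r) => Seq(IsNotNull(l), IsNotNull(r))
        case _ => Seq.empty[Expression]
      }.toSet
    ```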



[GitHub] spark pull request: [SPARK-12957][SQL] Initial support for constra...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/10844#issuecomment-177017939
  
    **[Test build #50417 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/50417/consoleFull)** for PR 10844 at commit [`8c6bb70`](https://github.com/apache/spark/commit/8c6bb702e90346ec3189857c70e8579d431bb91c).




[GitHub] spark pull request: [SPARK-12957][SQL] Initial support for constra...

Posted by marmbrus <gi...@git.apache.org>.
Github user marmbrus commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10844#discussion_r51330387
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/QueryPlan.scala ---
    @@ -17,16 +17,34 @@
     
     package org.apache.spark.sql.catalyst.plans
     
    -import org.apache.spark.sql.catalyst.expressions.{Attribute, AttributeSet, Expression, VirtualColumn}
    +import org.apache.spark.sql.catalyst.expressions._
     import org.apache.spark.sql.catalyst.trees.TreeNode
     import org.apache.spark.sql.types.{DataType, StructType}
     
    -abstract class QueryPlan[PlanType <: TreeNode[PlanType]] extends TreeNode[PlanType] {
    +abstract class QueryPlan[PlanType <: TreeNode[PlanType]]
    +  extends TreeNode[PlanType] with PredicateHelper {
       self: PlanType =>
     
       def output: Seq[Attribute]
     
       /**
    +   * Extracts the relevant per-row constraints from a given child while removing those that
    +   * don't apply anymore.
    +   */
    +  protected def getRelevantConstraints(child: QueryPlan[PlanType]): Set[Expression] = {
    +    child.constraints.filter(_.references.subsetOf(outputSet))
    +  }
    +
    +  /**
    +   * A sequence of expressions that describes the data property of the output rows of this
    +   * operator. For example, if the output of this operator is column `a`, an example `constraints`
    +   * can be `Set(a > 10, a < 20)`.
    +   */
    +  lazy val constraints: Set[Expression] = validConstraints
    +
    +  protected def validConstraints: Set[Expression] = Set.empty
    --- End diff --
    
    Can we add a doc comment here suggesting that child classes override this method, along with the semantics (i.e., if we go with the suggestion above, you are supposed to output any constraint that might be valid and it will be canonicalized and filtered automatically)?
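
    A hedged sketch of the kind of doc comment being asked for (wording illustrative, not the text that eventually landed):

        /**
         * This method can be overridden by any child class of QueryPlan to specify a set
         * of constraints based on the given operator's constraint propagation logic.
         * Subclasses only need to return constraints that might be valid; the results are
         * canonicalized and filtered against the operator's [[outputSet]] automatically
         * before being exposed through `constraints`.
         */
        protected def validConstraints: Set[Expression] = Set.empty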




[GitHub] spark pull request: [SPARK-12957][SQL] Initial support for constra...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10844#issuecomment-178873146
  
    Merged build finished. Test FAILed.




[GitHub] spark pull request: [SPARK-12957][SQL] Initial support for constra...

Posted by yhuai <gi...@git.apache.org>.
Github user yhuai commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10844#discussion_r51549912
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicOperators.scala ---
    @@ -180,6 +218,29 @@ case class Join(
         }
       }
     
    +  override protected def validConstraints: Set[Expression] = {
    +    joinType match {
    +      case Inner if condition.isDefined =>
    +        left.constraints
    +          .union(right.constraints)
    +          .union(splitConjunctivePredicates(condition.get).toSet)
    +      case LeftSemi if condition.isDefined =>
    +        left.constraints
    +          .union(right.constraints)
    --- End diff --
    
    For a left semi join, maybe it is not necessary to union the constraints from the right side, since we will not output any columns from the right table?
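
    A hedged sketch of the suggested change (illustrative), in the context of the Join.validConstraints shown above:

        // Only the left side's columns survive a left semi join, so the right
        // child's constraints are dropped; the join condition still contributes.
        case LeftSemi if condition.isDefined =>
          left.constraints
            .union(splitConjunctivePredicates(condition.get).toSet)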




[GitHub] spark pull request: [SPARK-12957][SQL] Initial support for constra...

Posted by marmbrus <gi...@git.apache.org>.
Github user marmbrus commented on the pull request:

    https://github.com/apache/spark/pull/10844#issuecomment-179032881
  
    Thanks!  Merging to master.




[GitHub] spark pull request: [WIP][SPARK-12957][SQL] Initial support for co...

Posted by sameeragarwal <gi...@git.apache.org>.
Github user sameeragarwal commented on the pull request:

    https://github.com/apache/spark/pull/10844#issuecomment-173787321
  
    @hvanhovell added, thanks!




[GitHub] spark pull request: [WIP][SQL] Initial support for constraint prop...

Posted by hvanhovell <gi...@git.apache.org>.
Github user hvanhovell commented on the pull request:

    https://github.com/apache/spark/pull/10844#issuecomment-173275866
  
    Could you add the id of the JIRA ticket to the title?
    
    Could you also add a description, explaining why we want this? Seems cool though!




[GitHub] spark pull request: [WIP][SQL] Initial support for constraint prop...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/10844#issuecomment-173165774
  
    **[Test build #49775 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/49775/consoleFull)** for PR 10844 at commit [`04ff99a`](https://github.com/apache/spark/commit/04ff99ab96957f57c74ae24835b4bfcdd27e06b8).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request: [SPARK-12957][SQL] Initial support for constra...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/10844#issuecomment-178902239
  
    **[Test build #50603 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/50603/consoleFull)** for PR 10844 at commit [`2bd2735`](https://github.com/apache/spark/commit/2bd27356a58eefc52b1f9529e7c708e7cbc4beee).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `abstract class QueryPlan[PlanType <: TreeNode[PlanType]] extends TreeNode[PlanType] `
      * `abstract class UnaryNode extends LogicalPlan `
      * `case class Filter(condition: Expression, child: LogicalPlan)`




[GitHub] spark pull request: [SPARK-12957][SQL] Initial support for constra...

Posted by yhuai <gi...@git.apache.org>.
Github user yhuai commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10844#discussion_r51548751
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/LogicalPlan.scala ---
    @@ -301,10 +301,12 @@ abstract class LeafNode extends LogicalPlan {
     /**
      * A logical plan node with single child.
      */
    -abstract class UnaryNode extends LogicalPlan {
    +abstract class UnaryNode extends LogicalPlan with PredicateHelper {
    --- End diff --
    
    Looks like we do not need `PredicateHelper` here. Maybe it is better to just add `with PredicateHelper` to `Filter`?
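
    A minimal sketch of the suggested restructuring (illustrative, with unrelated members elided):

        // Leave UnaryNode as it was...
        abstract class UnaryNode extends LogicalPlan {
          def child: LogicalPlan
          override def children: Seq[LogicalPlan] = child :: Nil
        }

        // ...and mix PredicateHelper only into the operator that actually needs
        // splitConjunctivePredicates for its constraints.
        case class Filter(condition: Expression, child: LogicalPlan)
          extends UnaryNode with PredicateHelper {
          override def output: Seq[Attribute] = child.output
        }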




[GitHub] spark pull request: [SPARK-12957][SQL] Initial support for constra...

Posted by marmbrus <gi...@git.apache.org>.
Github user marmbrus commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10844#discussion_r51482488
  
    --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/plans/ConstraintPropagationSuite.scala ---
    @@ -0,0 +1,138 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.sql.catalyst.plans
    +
    +import org.apache.spark.SparkFunSuite
    +import org.apache.spark.sql.catalyst.analysis._
    +import org.apache.spark.sql.catalyst.dsl.expressions._
    +import org.apache.spark.sql.catalyst.dsl.plans._
    +import org.apache.spark.sql.catalyst.expressions._
    +import org.apache.spark.sql.catalyst.plans.logical._
    +
    +class ConstraintPropagationSuite extends SparkFunSuite {
    +
    +  private def resolveColumn(tr: LocalRelation, columnName: String): Expression =
    +    tr.analyze.resolveQuoted(columnName, caseInsensitiveResolution).get
    +
    +  private def verifyConstraints(a: Set[Expression], b: Set[Expression]): Unit = {
    +    assert(a.forall(i => b.map(_.semanticEquals(i)).reduce(_ || _)))
    +    assert(b.forall(i => a.map(_.semanticEquals(i)).reduce(_ || _)))
    --- End diff --
    
    I would make this function manually call `fail` with the condition that we can't find, and also differentiate between `missing` and `found but not expected`.
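
    One possible shape for such a helper (names and message format are illustrative); it reports both directions separately and also avoids the reduce over a possibly empty collection:

        private def verifyConstraints(found: Set[Expression], expected: Set[Expression]): Unit = {
          // Constraints are compared with semanticEquals rather than structural equality.
          val missing = expected.filterNot(e => found.exists(_.semanticEquals(e)))
          val extra = found.filterNot(f => expected.exists(_.semanticEquals(f)))
          if (missing.nonEmpty || extra.nonEmpty) {
            fail(
              s"""
                 |== FAIL: Constraints do not match ==
                 |Missing: ${missing.mkString(", ")}
                 |Found but not expected: ${extra.mkString(", ")}
               """.stripMargin)
          }
        }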




[GitHub] spark pull request: [SPARK-12957][SQL] Initial support for constra...

Posted by sameeragarwal <gi...@git.apache.org>.
Github user sameeragarwal commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10844#discussion_r51338405
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicOperators.scala ---
    @@ -180,6 +229,55 @@ case class Join(
         }
       }
     
    +  override protected def validConstraints: Set[Expression] = {
    +    // Currently we only propagate constraints if the condition consists of equality
    +    // and ranges. For all other cases, we return an empty set of constraints
    +    def constructIsNotNullConstraints(condition: Expression): Set[Expression] = {
    +      splitConjunctivePredicates(condition).map {
    +        case EqualTo(l, r) =>
    +          Set(IsNotNull(l), IsNotNull(r))
    +        case GreaterThan(l, r) =>
    +          Set(IsNotNull(l), IsNotNull(r))
    +        case GreaterThanOrEqual(l, r) =>
    +          Set(IsNotNull(l), IsNotNull(r))
    +        case LessThan(l, r) =>
    +          Set(IsNotNull(l), IsNotNull(r))
    +        case LessThanOrEqual(l, r) =>
    +          Set(IsNotNull(l), IsNotNull(r))
    +        case _ =>
    +          Set.empty[Expression]
    +      }.foldLeft(Set.empty[Expression])(_ union _.toSet)
    +    }
    +
    +    def constructIsNullConstraints(plan: LogicalPlan): Set[Expression] = {
    +      plan.output.map(IsNull).toSet
    +    }
    +
    +    (joinType match {
    +      case Inner if condition.isDefined =>
    +        getRelevantConstraints(left)
    +          .union(getRelevantConstraints(right))
    +          .union(constructIsNotNullConstraints(condition.get))
    +      case LeftSemi if condition.isDefined =>
    +        getRelevantConstraints(left)
    +          .union(getRelevantConstraints(right))
    +          .union(constructIsNotNullConstraints(condition.get))
    +      case LeftOuter =>
    +        getRelevantConstraints(left)
    +          .union(constructIsNullConstraints(right))
    +      case RightOuter =>
    +        getRelevantConstraints(right)
    +          .union(constructIsNullConstraints(left))
    --- End diff --
    
    Your concern is completely valid -- I was abusing the `isNull` constraint to infer the latter (i.e., `might be null`). Removed.




[GitHub] spark pull request: [SPARK-12957][SQL] Initial support for constra...

Posted by sameeragarwal <gi...@git.apache.org>.
Github user sameeragarwal commented on the pull request:

    https://github.com/apache/spark/pull/10844#issuecomment-178291758
  
    test this please




[GitHub] spark pull request: [SPARK-12957][SQL] Initial support for constra...

Posted by marmbrus <gi...@git.apache.org>.
Github user marmbrus commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10844#discussion_r51330838
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicOperators.scala ---
    @@ -180,6 +229,55 @@ case class Join(
         }
       }
     
    +  override protected def validConstraints: Set[Expression] = {
    +    // Currently we only propagate constraints if the condition consists of equality
    +    // and ranges. For all other cases, we return an empty set of constraints
    +    def constructIsNotNullConstraints(condition: Expression): Set[Expression] = {
    +      splitConjunctivePredicates(condition).map {
    +        case EqualTo(l, r) =>
    +          Set(IsNotNull(l), IsNotNull(r))
    +        case GreaterThan(l, r) =>
    +          Set(IsNotNull(l), IsNotNull(r))
    +        case GreaterThanOrEqual(l, r) =>
    +          Set(IsNotNull(l), IsNotNull(r))
    +        case LessThan(l, r) =>
    +          Set(IsNotNull(l), IsNotNull(r))
    +        case LessThanOrEqual(l, r) =>
    +          Set(IsNotNull(l), IsNotNull(r))
    +        case _ =>
    +          Set.empty[Expression]
    +      }.foldLeft(Set.empty[Expression])(_ union _.toSet)
    +    }
    +
    +    def constructIsNullConstraints(plan: LogicalPlan): Set[Expression] = {
    +      plan.output.map(IsNull).toSet
    +    }
    +
    +    (joinType match {
    +      case Inner if condition.isDefined =>
    +        getRelevantConstraints(left)
    +          .union(getRelevantConstraints(right))
    +          .union(constructIsNotNullConstraints(condition.get))
    +      case LeftSemi if condition.isDefined =>
    +        getRelevantConstraints(left)
    +          .union(getRelevantConstraints(right))
    +          .union(constructIsNotNullConstraints(condition.get))
    +      case LeftOuter =>
    +        getRelevantConstraints(left)
    +          .union(constructIsNullConstraints(right))
    +      case RightOuter =>
    +        getRelevantConstraints(right)
    +          .union(constructIsNullConstraints(left))
    --- End diff --
    
    Maybe I'm not following what this does, but we don't know that the columns from the left _are_ `null`, they just might be `null`.
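
    A small worked example of the distinction (illustrative data, not from the patch): for t1 RIGHT OUTER JOIN t2 ON t1.key = t2.key with t1 = {1} and t2 = {1, 2}, the output is (1, 1) and (null, 2). t1's columns are null only on the unmatched row, so IsNull(t1.key) does not hold for every output row; at most "may be null" can be inferred.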




[GitHub] spark pull request: [SPARK-12957][SQL] Initial support for constra...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10844#issuecomment-176697402
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/50371/
    Test PASSed.




[GitHub] spark pull request: [SPARK-12957][SQL] Initial support for constra...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/10844#issuecomment-176951259
  
    **[Test build #50404 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/50404/consoleFull)** for PR 10844 at commit [`8c6bb70`](https://github.com/apache/spark/commit/8c6bb702e90346ec3189857c70e8579d431bb91c).




[GitHub] spark pull request: [SPARK-12957][SQL] Initial support for constra...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10844#issuecomment-178329512
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/50518/
    Test PASSed.




[GitHub] spark pull request: [SPARK-12957][SQL] Initial support for constra...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/10844#issuecomment-178295577
  
    **[Test build #50518 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/50518/consoleFull)** for PR 10844 at commit [`b52742a`](https://github.com/apache/spark/commit/b52742a93581b6aa899b652cf1a346960a2888b7).




[GitHub] spark pull request: [SPARK-12957][SQL] Initial support for constra...

Posted by sameeragarwal <gi...@git.apache.org>.
Github user sameeragarwal commented on the pull request:

    https://github.com/apache/spark/pull/10844#issuecomment-178868815
  
    comments addressed!




[GitHub] spark pull request: [WIP][SQL] Initial support for constraint prop...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10844#issuecomment-173166051
  
    Merged build finished. Test PASSed.




[GitHub] spark pull request: [SPARK-12957][SQL] Initial support for constra...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10844#issuecomment-178902496
  
    Merged build finished. Test PASSed.




[GitHub] spark pull request: [SPARK-12957][SQL] Initial support for constra...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10844#issuecomment-178291030
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/50514/
    Test FAILed.




[GitHub] spark pull request: [SPARK-12957][SQL] Initial support for constra...

Posted by sameeragarwal <gi...@git.apache.org>.
Github user sameeragarwal commented on the pull request:

    https://github.com/apache/spark/pull/10844#issuecomment-176948895
  
    Thanks @marmbrus, all comments addressed!




[GitHub] spark pull request: [SPARK-12957][SQL] Initial support for constra...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/10844




[GitHub] spark pull request: [SPARK-12957][SQL] Initial support for constra...

Posted by sameeragarwal <gi...@git.apache.org>.
Github user sameeragarwal commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10844#discussion_r51506264
  
    --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/plans/ConstraintPropagationSuite.scala ---
    @@ -0,0 +1,138 @@
    +/*
    + * Licensed to the Apache Software Foundation (ASF) under one or more
    + * contributor license agreements.  See the NOTICE file distributed with
    + * this work for additional information regarding copyright ownership.
    + * The ASF licenses this file to You under the Apache License, Version 2.0
    + * (the "License"); you may not use this file except in compliance with
    + * the License.  You may obtain a copy of the License at
    + *
    + *    http://www.apache.org/licenses/LICENSE-2.0
    + *
    + * Unless required by applicable law or agreed to in writing, software
    + * distributed under the License is distributed on an "AS IS" BASIS,
    + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    + * See the License for the specific language governing permissions and
    + * limitations under the License.
    + */
    +
    +package org.apache.spark.sql.catalyst.plans
    +
    +import org.apache.spark.SparkFunSuite
    +import org.apache.spark.sql.catalyst.analysis._
    +import org.apache.spark.sql.catalyst.dsl.expressions._
    +import org.apache.spark.sql.catalyst.dsl.plans._
    +import org.apache.spark.sql.catalyst.expressions._
    +import org.apache.spark.sql.catalyst.plans.logical._
    +
    +class ConstraintPropagationSuite extends SparkFunSuite {
    +
    +  private def resolveColumn(tr: LocalRelation, columnName: String): Expression =
    +    tr.analyze.resolveQuoted(columnName, caseInsensitiveResolution).get
    +
    +  private def verifyConstraints(a: Set[Expression], b: Set[Expression]): Unit = {
    +    assert(a.forall(i => b.map(_.semanticEquals(i)).reduce(_ || _)))
    +    assert(b.forall(i => a.map(_.semanticEquals(i)).reduce(_ || _)))
    --- End diff --
    
    sure, good idea.




[GitHub] spark pull request: [SPARK-12957][SQL] Initial support for constra...

Posted by sameeragarwal <gi...@git.apache.org>.
Github user sameeragarwal commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10844#discussion_r51647661
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicOperators.scala ---
    @@ -157,6 +178,23 @@ case class Union(children: Seq[LogicalPlan]) extends LogicalPlan {
         val sizeInBytes = children.map(_.statistics.sizeInBytes).sum
         Statistics(sizeInBytes = sizeInBytes)
       }
    +
    +  def rewriteConstraints(
    +      planA: LogicalPlan,
    --- End diff --
    
    renamed + added doc
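
    The diff above only shows the signature; as a hedged sketch, an attribute-rewriting helper for Union could look roughly like the following (parameter names illustrative, assuming the two children's outputs are positionally aligned):

        private def rewriteConstraints(
            reference: Seq[Attribute],
            original: Seq[Attribute],
            constraints: Set[Expression]): Set[Expression] = {
          require(reference.size == original.size)
          // Map each attribute of `original` to the corresponding attribute of
          // `reference`, then rewrite every constraint in terms of the latter.
          val attributeRewrites = AttributeMap(original.zip(reference))
          constraints.map(_ transform {
            case a: Attribute => attributeRewrites(a)
          })
        }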




[GitHub] spark pull request: [WIP][SPARK-12957][SQL] Initial support for co...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/10844#issuecomment-174883150
  
    **[Test build #50083 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/50083/consoleFull)** for PR 10844 at commit [`7fb2f9c`](https://github.com/apache/spark/commit/7fb2f9ce01aafe45d720212cb13120eb693a6c06).




[GitHub] spark pull request: [WIP][SPARK-12957][SQL] Initial support for co...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the pull request:

    https://github.com/apache/spark/pull/10844#issuecomment-173796157
  
    @sameeragarwal I just saw it. Thank you! I will do the code changes after this is merged. 




[GitHub] spark pull request: [WIP][SPARK-12957][SQL] Initial support for co...

Posted by marmbrus <gi...@git.apache.org>.
Github user marmbrus commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10844#discussion_r51068802
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/QueryPlan.scala ---
    @@ -17,16 +17,31 @@
     
     package org.apache.spark.sql.catalyst.plans
     
    -import org.apache.spark.sql.catalyst.expressions.{Attribute, AttributeSet, Expression, VirtualColumn}
    +import org.apache.spark.sql.catalyst.expressions._
     import org.apache.spark.sql.catalyst.trees.TreeNode
     import org.apache.spark.sql.types.{DataType, StructType}
     
    -abstract class QueryPlan[PlanType <: TreeNode[PlanType]] extends TreeNode[PlanType] {
    +abstract class QueryPlan[PlanType <: TreeNode[PlanType]]
    +  extends TreeNode[PlanType] with PredicateHelper {
       self: PlanType =>
     
       def output: Seq[Attribute]
     
       /**
    +   * Extracts the output property from a given child.
    +   */
    +  def extractConstraintsFromChild(child: QueryPlan[PlanType]): Set[Expression] = {
    +    child.constraints.filter(_.references.subsetOf(outputSet))
    +  }
    +
    +  /**
     +   * A sequence of expressions that describes the data property of the output rows of this
    +   * operator. For example, if the output of this operator is column `a`, an example `constraints`
    +   * can be `Set(a > 10, a < 20)`.
    +   */
    +  def constraints: Set[Expression] = Set.empty
    --- End diff --
    
    This is probably going to be nontrivial to calculate for a large tree.  We might consider having an internal method, `private def validConstraints` or something, that we expand / canonicalize into a `lazy val constraints`
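
    A minimal sketch of that split (illustrative):

        // Subclasses override the cheap hook; the potentially expensive expansion and
        // filtering happens once per node and is cached by the lazy val.
        lazy val constraints: Set[Expression] = getRelevantConstraints(validConstraints)

        protected def validConstraints: Set[Expression] = Set.empty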




[GitHub] spark pull request: [SPARK-12957][SQL] Initial support for constra...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/10844#issuecomment-178873821
  
    **[Test build #50603 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/50603/consoleFull)** for PR 10844 at commit [`2bd2735`](https://github.com/apache/spark/commit/2bd27356a58eefc52b1f9529e7c708e7cbc4beee).




[GitHub] spark pull request: [SPARK-12957][SQL] Initial support for constra...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/10844#issuecomment-177053633
  
    **[Test build #50429 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/50429/consoleFull)** for PR 10844 at commit [`302444f`](https://github.com/apache/spark/commit/302444f5d7940c1e0327fe4826453c1b628a97b9).




[GitHub] spark pull request: [SPARK-12957][SQL] Initial support for constra...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/10844#issuecomment-177062420
  
    **[Test build #50429 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/50429/consoleFull)** for PR 10844 at commit [`302444f`](https://github.com/apache/spark/commit/302444f5d7940c1e0327fe4826453c1b628a97b9).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request: [SPARK-12957][SQL] Initial support for constra...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/10844#issuecomment-176696953
  
    **[Test build #50371 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/50371/consoleFull)** for PR 10844 at commit [`53be837`](https://github.com/apache/spark/commit/53be8379ef7f2b86bcd4398361b6748abe2800d8).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request: [SPARK-12957][SQL] Initial support for constra...

Posted by sameeragarwal <gi...@git.apache.org>.
Github user sameeragarwal commented on the pull request:

    https://github.com/apache/spark/pull/10844#issuecomment-177015688
  
    test this please




[GitHub] spark pull request: [SPARK-12957][SQL] Initial support for constra...

Posted by marmbrus <gi...@git.apache.org>.
Github user marmbrus commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10844#discussion_r51482050
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicOperators.scala ---
    @@ -89,9 +89,27 @@ case class Generate(
     
     case class Filter(condition: Expression, child: LogicalPlan) extends UnaryNode {
       override def output: Seq[Attribute] = child.output
    +
    +  override protected def validConstraints: Set[Expression] = {
    +    val newConstraint = splitConjunctivePredicates(condition)
    +      .filter(_.references.subsetOf(outputSet))
    --- End diff --
    
    not needed right?
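
    A hedged sketch of the simplification (one possible final shape, not necessarily the committed code): since Filter's output equals its child's output, every conjunct of a resolved condition already references only output attributes, so the extra filter can be dropped:

        override protected def validConstraints: Set[Expression] =
          child.constraints.union(splitConjunctivePredicates(condition).toSet)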




[GitHub] spark pull request: [SPARK-12957][SQL] Initial support for constra...

Posted by sameeragarwal <gi...@git.apache.org>.
Github user sameeragarwal commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10844#discussion_r51647871
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/QueryPlan.scala ---
    @@ -17,16 +17,62 @@
     
     package org.apache.spark.sql.catalyst.plans
     
    -import org.apache.spark.sql.catalyst.expressions.{Attribute, AttributeSet, Expression, VirtualColumn}
    +import org.apache.spark.sql.catalyst.expressions._
     import org.apache.spark.sql.catalyst.trees.TreeNode
     import org.apache.spark.sql.types.{DataType, StructType}
     
    -abstract class QueryPlan[PlanType <: TreeNode[PlanType]] extends TreeNode[PlanType] {
    +abstract class QueryPlan[PlanType <: TreeNode[PlanType]]
    +  extends TreeNode[PlanType] with PredicateHelper {
       self: PlanType =>
     
       def output: Seq[Attribute]
     
       /**
    +   * Extracts the relevant constraints from a given set of constraints based on the attributes that
    +   * appear in the [[outputSet]].
    +   */
    +  protected def getRelevantConstraints(constraints: Set[Expression]): Set[Expression] = {
    +    constraints
    +      .union(constructIsNotNullConstraints(constraints))
    +      .filter(constraint =>
    +        constraint.references.nonEmpty && constraint.references.subsetOf(outputSet))
    +  }
    +
    +  private def constructIsNotNullConstraints(constraints: Set[Expression]): Set[Expression] = {
    --- End diff --
    
    added




[GitHub] spark pull request: [SPARK-12957][SQL] Initial support for constra...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/10844#issuecomment-177033963
  
    **[Test build #50417 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/50417/consoleFull)** for PR 10844 at commit [`8c6bb70`](https://github.com/apache/spark/commit/8c6bb702e90346ec3189857c70e8579d431bb91c).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request: [WIP][SPARK-12957][SQL] Initial support for co...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10844#issuecomment-174926671
  
    Merged build finished. Test PASSed.




[GitHub] spark pull request: [SPARK-12957][SQL] Initial support for constra...

Posted by marmbrus <gi...@git.apache.org>.
Github user marmbrus commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10844#discussion_r51478445
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicOperators.scala ---
    @@ -180,6 +221,46 @@ case class Join(
         }
       }
     
    +  private def constructIsNotNullConstraints(condition: Expression): Set[Expression] = {
    +    // Currently we only propagate constraints if the condition consists of equality
    +    // and ranges. For all other cases, we return an empty set of constraints
    +    splitConjunctivePredicates(condition).map {
    +      case EqualTo(l, r) =>
    +        Set(IsNotNull(l), IsNotNull(r))
    +      case GreaterThan(l, r) =>
    +        Set(IsNotNull(l), IsNotNull(r))
    +      case GreaterThanOrEqual(l, r) =>
    +        Set(IsNotNull(l), IsNotNull(r))
    +      case LessThan(l, r) =>
    +        Set(IsNotNull(l), IsNotNull(r))
    +      case LessThanOrEqual(l, r) =>
    +        Set(IsNotNull(l), IsNotNull(r))
    +      case _ =>
    +        Set.empty[Expression]
    +    }.foldLeft(Set.empty[Expression])(_ union _.toSet)
    +  }
    +
    +  override protected def validConstraints: Set[Expression] = {
    +    joinType match {
    +      case Inner if condition.isDefined =>
    +        left.constraints
    +          .union(right.constraints)
    +          .union(constructIsNotNullConstraints(condition.get))
    +      case LeftSemi if condition.isDefined =>
    +        left.constraints
    +          .union(right.constraints)
    +          .union(constructIsNotNullConstraints(condition.get))
    +      case LeftOuter =>
    +        left.constraints
    +      case RightOuter =>
    +        right.constraints
    +      case FullOuter =>
    +        Set.empty[Expression]
    +    }
    +  }
    +
    +  def selfJoinResolved: Boolean = left.outputSet.intersect(right.outputSet).isEmpty
    --- End diff --
    
    I think this was a merging mistake, as it's duplicated with the method below.




[GitHub] spark pull request: [SPARK-12957][SQL] Initial support for constra...

Posted by marmbrus <gi...@git.apache.org>.
Github user marmbrus commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10844#discussion_r51482081
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicOperators.scala ---
    @@ -180,6 +221,46 @@ case class Join(
         }
       }
     
    +  private def constructIsNotNullConstraints(condition: Expression): Set[Expression] = {
    --- End diff --
    
    Why is this done as a special case here, instead of doing it as part of `getRelevantConstraints`?




[GitHub] spark pull request: [SPARK-12957][SQL] Initial support for constra...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/10844#issuecomment-178328771
  
    **[Test build #50518 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/50518/consoleFull)** for PR 10844 at commit [`b52742a`](https://github.com/apache/spark/commit/b52742a93581b6aa899b652cf1a346960a2888b7).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.




[GitHub] spark pull request: [SPARK-12957][SQL] Initial support for constra...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10844#issuecomment-176958116
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/50404/
    Test FAILed.




[GitHub] spark pull request: [SPARK-12957][SQL] Initial support for constra...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10844#issuecomment-176697399
  
    Merged build finished. Test PASSed.




[GitHub] spark pull request: [SPARK-12957][SQL] Initial support for constra...

Posted by sameeragarwal <gi...@git.apache.org>.
Github user sameeragarwal commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10844#discussion_r51490255
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicOperators.scala ---
    @@ -180,6 +221,46 @@ case class Join(
         }
       }
     
    +  private def constructIsNotNullConstraints(condition: Expression): Set[Expression] = {
    +    // Currently we only propagate constraints if the condition consists of equality
    +    // and ranges. For all other cases, we return an empty set of constraints
    +    splitConjunctivePredicates(condition).map {
    +      case EqualTo(l, r) =>
    +        Set(IsNotNull(l), IsNotNull(r))
    +      case GreaterThan(l, r) =>
    +        Set(IsNotNull(l), IsNotNull(r))
    +      case GreaterThanOrEqual(l, r) =>
    +        Set(IsNotNull(l), IsNotNull(r))
    +      case LessThan(l, r) =>
    +        Set(IsNotNull(l), IsNotNull(r))
    +      case LessThanOrEqual(l, r) =>
    +        Set(IsNotNull(l), IsNotNull(r))
    +      case _ =>
    +        Set.empty[Expression]
    +    }.foldLeft(Set.empty[Expression])(_ union _.toSet)
    +  }
    +
    +  override protected def validConstraints: Set[Expression] = {
    +    joinType match {
    +      case Inner if condition.isDefined =>
    +        left.constraints
    +          .union(right.constraints)
    +          .union(constructIsNotNullConstraints(condition.get))
    +      case LeftSemi if condition.isDefined =>
    +        left.constraints
    +          .union(right.constraints)
    +          .union(constructIsNotNullConstraints(condition.get))
    +      case LeftOuter =>
    +        left.constraints
    +      case RightOuter =>
    +        right.constraints
    +      case FullOuter =>
    +        Set.empty[Expression]
    +    }
    +  }
    +
    +  def selfJoinResolved: Boolean = left.outputSet.intersect(right.outputSet).isEmpty
    --- End diff --
    
    oops, fixed!

