You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by liancheng <gi...@git.apache.org> on 2015/12/23 12:28:50 UTC

[GitHub] spark pull request: [SPARK-12498][SQL][MINOR] BooleanSimplication ...

GitHub user liancheng opened a pull request:

    https://github.com/apache/spark/pull/10445

    [SPARK-12498][SQL][MINOR] BooleanSimplication simplification

    Scala syntax and existing Catalyst expression DSL allows us to refactor the `BooleanSimplification` optimization rule to a clearer and more readable way. Actually you hardly need any extra comments to understand most of the transformations.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/liancheng/spark boolean-simplification-simplification

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/10445.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #10445
    
----
commit 41abd4e22b69d2c96e95eb5e4bd329b195731f1b
Author: Cheng Lian <li...@databricks.com>
Date:   2015-12-23T11:03:28Z

    BooleanSimplication simplification

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12498][SQL][MINOR] BooleanSimplication ...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the pull request:

    https://github.com/apache/spark/pull/10445#issuecomment-167915927
  
    It's much more readable! LGTM


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12498][SQL][MINOR] BooleanSimplication ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/10445#issuecomment-167960210
  
    **[Test build #48471 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/48471/consoleFull)** for PR 10445 at commit [`d6ab482`](https://github.com/apache/spark/commit/d6ab482b500d70937b3f37f2593e40cd22a03a09).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12498][SQL][MINOR] BooleanSimplication ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/10445#issuecomment-167974984
  
    **[Test build #48471 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/48471/consoleFull)** for PR 10445 at commit [`d6ab482`](https://github.com/apache/spark/commit/d6ab482b500d70937b3f37f2593e40cd22a03a09).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12498][SQL][MINOR] BooleanSimplication ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/10445#issuecomment-166891869
  
    **[Test build #48237 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/48237/consoleFull)** for PR 10445 at commit [`41abd4e`](https://github.com/apache/spark/commit/41abd4e22b69d2c96e95eb5e4bd329b195731f1b).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12498][SQL][MINOR] BooleanSimplication ...

Posted by liancheng <gi...@git.apache.org>.
Github user liancheng commented on the pull request:

    https://github.com/apache/spark/pull/10445#issuecomment-167960416
  
    Thanks for the review! I reverted the DSL part and those changes that are not obviously correct.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12498][SQL][MINOR] BooleanSimplication ...

Posted by rxin <gi...@git.apache.org>.
Github user rxin commented on the pull request:

    https://github.com/apache/spark/pull/10445#issuecomment-167956491
  
    It seems to me the change is not necessary and might actually complicate the analyzer code itself due to the reliance on the dsl implicits and this "unapply" (I didn't even know it existed.....)
    
    One thing I do like though is the True and False, although I'd rename it TrueLiteral and FalseLiteral, in order to make it clear it is a literal.



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12498][SQL][MINOR] BooleanSimplication ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/10445#issuecomment-166870629
  
    **[Test build #48237 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/48237/consoleFull)** for PR 10445 at commit [`41abd4e`](https://github.com/apache/spark/commit/41abd4e22b69d2c96e95eb5e4bd329b195731f1b).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12498][SQL][MINOR] BooleanSimplication ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10445#issuecomment-167975046
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12498][SQL][MINOR] BooleanSimplication ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10445#issuecomment-167975047
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/48471/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12498][SQL][MINOR] BooleanSimplication ...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/10445


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12498][SQL][MINOR] BooleanSimplication ...

Posted by rxin <gi...@git.apache.org>.
Github user rxin commented on the pull request:

    https://github.com/apache/spark/pull/10445#issuecomment-170765275
  
    I'm going to merge this. Thanks.



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12498][SQL][MINOR] BooleanSimplication ...

Posted by liancheng <gi...@git.apache.org>.
Github user liancheng commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10445#discussion_r48592772
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala ---
    @@ -473,112 +474,97 @@ object OptimizeIn extends Rule[LogicalPlan] {
     object BooleanSimplification extends Rule[LogicalPlan] with PredicateHelper {
       def apply(plan: LogicalPlan): LogicalPlan = plan transform {
         case q: LogicalPlan => q transformExpressionsUp {
    -      case and @ And(left, right) => (left, right) match {
    -        // true && r  =>  r
    -        case (Literal(true, BooleanType), r) => r
    -        // l && true  =>  l
    -        case (l, Literal(true, BooleanType)) => l
    -        // false && r  =>  false
    -        case (Literal(false, BooleanType), _) => Literal(false)
    -        // l && false  =>  false
    -        case (_, Literal(false, BooleanType)) => Literal(false)
    -        // a && a  =>  a
    -        case (l, r) if l fastEquals r => l
    -        // a && (not(a) || b) => a && b
    -        case (l, Or(l1, r)) if (Not(l) == l1) => And(l, r)
    -        case (l, Or(r, l1)) if (Not(l) == l1) => And(l, r)
    -        case (Or(l, l1), r) if (l1 == Not(r)) => And(l, r)
    -        case (Or(l1, l), r) if (l1 == Not(r)) => And(l, r)
    -        // (a || b) && (a || c)  =>  a || (b && c)
    -        case _ =>
    -          // 1. Split left and right to get the disjunctive predicates,
    -          //   i.e. lhs = (a, b), rhs = (a, c)
    -          // 2. Find the common predict between lhsSet and rhsSet, i.e. common = (a)
    -          // 3. Remove common predict from lhsSet and rhsSet, i.e. ldiff = (b), rdiff = (c)
    -          // 4. Apply the formula, get the optimized predicate: common || (ldiff && rdiff)
    -          val lhs = splitDisjunctivePredicates(left)
    -          val rhs = splitDisjunctivePredicates(right)
    -          val common = lhs.filter(e => rhs.exists(e.semanticEquals(_)))
    -          if (common.isEmpty) {
    -            // No common factors, return the original predicate
    -            and
    +      case TrueLiteral And e => e
    +      case e And TrueLiteral => e
    +      case FalseLiteral Or e => e
    +      case e Or FalseLiteral => e
    +
    +      case FalseLiteral And _ => FalseLiteral
    +      case _ And FalseLiteral => FalseLiteral
    +      case TrueLiteral Or _ => TrueLiteral
    +      case _ Or TrueLiteral => TrueLiteral
    +
    +      case a And b if a.semanticEquals(b) => a
    +      case a Or b if a.semanticEquals(b) => a
    +
    +      case a And (b Or c) if Not(a).semanticEquals(b) => And(a, c)
    +      case a And (b Or c) if Not(a).semanticEquals(c) => And(a, b)
    +      case (a Or b) And c if a.semanticEquals(Not(c)) => And(b, c)
    +      case (a Or b) And c if b.semanticEquals(Not(c)) => And(a, c)
    +
    +      case a Or (b And c) if Not(a).semanticEquals(b) => Or(a, c)
    +      case a Or (b And c) if Not(a).semanticEquals(c) => Or(a, b)
    +      case (a And b) Or c if a.semanticEquals(Not(c)) => Or(b, c)
    +      case (a And b) Or c if b.semanticEquals(Not(c)) => Or(a, c)
    +
    +      // Common factor elimination for conjunction
    +      case and @ (left And right) =>
    +        // 1. Split left and right to get the disjunctive predicates,
    +        //   i.e. lhs = (a, b), rhs = (a, c)
    +        // 2. Find the common predict between lhsSet and rhsSet, i.e. common = (a)
    +        // 3. Remove common predict from lhsSet and rhsSet, i.e. ldiff = (b), rdiff = (c)
    +        // 4. Apply the formula, get the optimized predicate: common || (ldiff && rdiff)
    +        val lhs = splitDisjunctivePredicates(left)
    +        val rhs = splitDisjunctivePredicates(right)
    +        val common = lhs.filter(e => rhs.exists(e.semanticEquals))
    +        if (common.isEmpty) {
    +          // No common factors, return the original predicate
    +          and
    +        } else {
    +          val ldiff = lhs.filterNot(e => common.exists(e.semanticEquals))
    +          val rdiff = rhs.filterNot(e => common.exists(e.semanticEquals))
    +          if (ldiff.isEmpty || rdiff.isEmpty) {
    +            // (a || b || c || ...) && (a || b) => (a || b)
    +            common.reduce(Or)
               } else {
    -            val ldiff = lhs.filterNot(e => common.exists(e.semanticEquals(_)))
    -            val rdiff = rhs.filterNot(e => common.exists(e.semanticEquals(_)))
    -            if (ldiff.isEmpty || rdiff.isEmpty) {
    -              // (a || b || c || ...) && (a || b) => (a || b)
    -              common.reduce(Or)
    -            } else {
    -              // (a || b || c || ...) && (a || b || d || ...) =>
    -              // ((c || ...) && (d || ...)) || a || b
    -              (common :+ And(ldiff.reduce(Or), rdiff.reduce(Or))).reduce(Or)
    -            }
    +            // (a || b || c || ...) && (a || b || d || ...) =>
    +            // ((c || ...) && (d || ...)) || a || b
    +            (common :+ And(ldiff.reduce(Or), rdiff.reduce(Or))).reduce(Or)
               }
    -      }  // end of And(left, right)
    -
    -      case or @ Or(left, right) => (left, right) match {
    -        // true || r  =>  true
    -        case (Literal(true, BooleanType), _) => Literal(true)
    -        // r || true  =>  true
    -        case (_, Literal(true, BooleanType)) => Literal(true)
    -        // false || r  =>  r
    -        case (Literal(false, BooleanType), r) => r
    -        // l || false  =>  l
    -        case (l, Literal(false, BooleanType)) => l
    -        // a || a => a
    -        case (l, r) if l fastEquals r => l
    -        // (a && b) || (a && c)  =>  a && (b || c)
    -        case _ =>
    -           // 1. Split left and right to get the conjunctive predicates,
    -           //   i.e.  lhs = (a, b), rhs = (a, c)
    -           // 2. Find the common predict between lhsSet and rhsSet, i.e. common = (a)
    -           // 3. Remove common predict from lhsSet and rhsSet, i.e. ldiff = (b), rdiff = (c)
    -           // 4. Apply the formula, get the optimized predicate: common && (ldiff || rdiff)
    -          val lhs = splitConjunctivePredicates(left)
    -          val rhs = splitConjunctivePredicates(right)
    -          val common = lhs.filter(e => rhs.exists(e.semanticEquals(_)))
    -          if (common.isEmpty) {
    -            // No common factors, return the original predicate
    -            or
    +        }
    +
    +      // Common factor elimination for disjunction
    +      case or @ (left Or right) =>
    +        // 1. Split left and right to get the conjunctive predicates,
    +        //   i.e.  lhs = (a, b), rhs = (a, c)
    +        // 2. Find the common predict between lhsSet and rhsSet, i.e. common = (a)
    +        // 3. Remove common predict from lhsSet and rhsSet, i.e. ldiff = (b), rdiff = (c)
    +        // 4. Apply the formula, get the optimized predicate: common && (ldiff || rdiff)
    +        val lhs = splitConjunctivePredicates(left)
    +        val rhs = splitConjunctivePredicates(right)
    +        val common = lhs.filter(e => rhs.exists(e.semanticEquals))
    +        if (common.isEmpty) {
    +          // No common factors, return the original predicate
    +          or
    +        } else {
    +          val ldiff = lhs.filterNot(e => common.exists(e.semanticEquals))
    +          val rdiff = rhs.filterNot(e => common.exists(e.semanticEquals))
    +          if (ldiff.isEmpty || rdiff.isEmpty) {
    +            // (a && b) || (a && b && c && ...) => a && b
    +            common.reduce(And)
               } else {
    -            val ldiff = lhs.filterNot(e => common.exists(e.semanticEquals(_)))
    -            val rdiff = rhs.filterNot(e => common.exists(e.semanticEquals(_)))
    -            if (ldiff.isEmpty || rdiff.isEmpty) {
    -              // (a && b) || (a && b && c && ...) => a && b
    -              common.reduce(And)
    -            } else {
    -              // (a && b && c && ...) || (a && b && d && ...) =>
    -              // ((c && ...) || (d && ...)) && a && b
    -              (common :+ Or(ldiff.reduce(And), rdiff.reduce(And))).reduce(And)
    -            }
    +            // (a && b && c && ...) || (a && b && d && ...) =>
    +            // ((c && ...) || (d && ...)) && a && b
    +            (common :+ Or(ldiff.reduce(And), rdiff.reduce(And))).reduce(And)
    --- End diff --
    
    (Code for common factor elimination for conjunction and disjunction is untouched except for indentation changes.)


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12498][SQL][MINOR] BooleanSimplication ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10445#issuecomment-166891946
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/48237/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12498][SQL][MINOR] BooleanSimplication ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/10445#issuecomment-166891943
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-12498][SQL][MINOR] BooleanSimplication ...

Posted by rxin <gi...@git.apache.org>.
Github user rxin commented on the pull request:

    https://github.com/apache/spark/pull/10445#issuecomment-167957517
  
    Update: the unapply is probably fine. I still wouldn't have the dsl though.



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org