You are viewing a plain text version of this content. The canonical link for it is here.

Posted to reviews@spark.apache.org by lw-lin <gi...@git.apache.org> on 2017/01/27 15:19:55 UTC

[GitHub] spark pull request #16719: [SPARK-19385][SQL] During canonicalization, `NOT(...

GitHub user lw-lin opened a pull request:

    https://github.com/apache/spark/pull/16719

    [SPARK-19385][SQL] During canonicalization, `NOT(l, r)` should not expect such cases that l.hashcode > r.hashcode

    ## What changes were proposed in this pull request?
    
    During canonicalization, `NOT(l, r)` should not expect such cases that `l.hashcode > r.hashcode`.
    
    Take the rule `case NOT(Greater(l, r)) if l.hashcode > r.hashcode` for example, it should never be matched since `Greater(l, r)` itself would be re-written as `Greater(r, l)` after canonicalization given `l.hashcode > r.hashcode`.
    
    ## How was this patch tested?
    
    This patch expanded the `NOT` test case.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/lw-lin/spark canonicalize

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/16719.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #16719
    
----
commit da0b98c7b255819ca7c24d32209ff5803ce46eab
Author: Liwei Lin <lw...@gmail.com>
Date:   2017-01-22T09:05:39Z

    Fix

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #16719: [SPARK-19385][SQL] During canonicalization, `NOT(l, r)` ...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16719
  
    Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #16719: [SPARK-19385][SQL] During canonicalization, `NOT(l, r)` ...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16719
  
    **[Test build #72075 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72075/testReport)** for PR 16719 at commit [`da0b98c`](https://github.com/apache/spark/commit/da0b98c7b255819ca7c24d32209ff5803ce46eab).
     * This patch **fails Scala style tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #16719: [SPARK-19385][SQL] During canonicalization, `NOT(l, r)` ...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16719
  
    **[Test build #72076 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72076/testReport)** for PR 16719 at commit [`9c42889`](https://github.com/apache/spark/commit/9c428892c302eca3c8f6032a5c8e817501e1548f).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #16719: [SPARK-19385][SQL] During canonicalization, `NOT(l, r)` ...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16719
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/72076/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #16719: [SPARK-19385][SQL] During canonicalization, `NOT(...(l, ...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16719
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/72126/
    Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #16719: [SPARK-19385][SQL] During canonicalization, `NOT(...(l, ...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16719
  
    Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #16719: [SPARK-19385][SQL] During canonicalization, `NOT(...(l, ...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16719
  
    **[Test build #72126 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72126/testReport)** for PR 16719 at commit [`c5fc394`](https://github.com/apache/spark/commit/c5fc394a2ba1bfcaafa9d19bd92c449639a5d576).
     * This patch **fails PySpark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #16719: [SPARK-19385][SQL] During canonicalization, `NOT(...(l, ...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16719
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/72128/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #16719: [SPARK-19385][SQL] During canonicalization, `NOT(...

Posted by gatorsmile <gi...@git.apache.org>.

Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16719#discussion_r98346104
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Canonicalize.scala ---
    @@ -78,14 +78,18 @@ object Canonicalize extends {
         case GreaterThanOrEqual(l, r) if l.hashCode() > r.hashCode() => LessThanOrEqual(r, l)
         case LessThanOrEqual(l, r) if l.hashCode() > r.hashCode() => GreaterThanOrEqual(r, l)
     
    -    case Not(GreaterThan(l, r)) if l.hashCode() > r.hashCode() => GreaterThan(r, l)
    --- End diff --
    
    uh. Just saw the above comment from @dongjoon-hyun . Thanks!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #16719: [SPARK-19385][SQL] During canonicalization, `NOT(...

Posted by lw-lin <gi...@git.apache.org>.

Github user lw-lin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16719#discussion_r98346453
  
    --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/ExpressionSetSuite.scala ---
    @@ -32,6 +32,38 @@ class ExpressionSetSuite extends SparkFunSuite {
     
       val aAndBSet = AttributeSet(aUpper :: bUpper :: Nil)
     
    +  // An [AttributeReference] with almost the maximum hashcode, to make testing canonicalize rules
    +  // like `case GreaterThan(l, r) if l.hashcode > r.hashcode => GreaterThan(r, l)` easier
    +  val maxHash =
    +    Canonicalize.ignoreNamesTypes(
    +      AttributeReference("maxHash", IntegerType)(exprId =
    +        new ExprId(4, NamedExpression.jvmId) {
    +          // maxHash's hashcode is calculated based on this exprId's hashcode, so we set this
    +          // exprId's hashCode to this specific value to make sure maxHash's hashcode is almost
    +          // `Int.MaxValue`
    +          override def hashCode: Int = 826929706
    --- End diff --
    
    thanks.
    
    the reason is in [Canonicalize.scala#ignoreNamesTypes](https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Canonicalize.scala#L41), we're making copies of `e` (`maxHash` in this case):
    ```scala
      private def ignoreNamesTypes(e: Expression): Expression = e match {
        case a: AttributeReference =>
          AttributeReference("none", a.dataType.asNullable)(exprId = a.exprId)
        case _ => e
      }
    ```
    so, even if we `override def hashCode: Int = Int.MaxValue` on `maxHash`, it has nothing to do with the copy's hashcode.
    
    then i took a step back -- defined `exprId`'s hashcode to a specific value (as provided in this patch), which would further affect the copied attribute-reference's hashcode.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #16719: [SPARK-19385][SQL] During canonicalization, `NOT(...

Posted by gatorsmile <gi...@git.apache.org>.

Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16719#discussion_r98345847
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Canonicalize.scala ---
    @@ -78,14 +78,18 @@ object Canonicalize extends {
         case GreaterThanOrEqual(l, r) if l.hashCode() > r.hashCode() => LessThanOrEqual(r, l)
         case LessThanOrEqual(l, r) if l.hashCode() > r.hashCode() => GreaterThanOrEqual(r, l)
     
    -    case Not(GreaterThan(l, r)) if l.hashCode() > r.hashCode() => GreaterThan(r, l)
    --- End diff --
    
    This is a dead code, because our canonicalization order is bottom up, right?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #16719: [SPARK-19385][SQL] During canonicalization, `NOT(...

Posted by asfgit <gi...@git.apache.org>.

Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/16719


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #16719: [SPARK-19385][SQL] During canonicalization, `NOT(l, r)` ...

Posted by dongjoon-hyun <gi...@git.apache.org>.

Github user dongjoon-hyun commented on the issue:

    https://github.com/apache/spark/pull/16719
  
    The original logic was designed to be safe for changing the caller bottom-up code, [here](https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Expression.scala#L154-L157).
    
    ```scala
      lazy val canonicalized: Expression = {
        val canonicalizedChildren = children.map(_.canonicalized)
        Canonicalize.execute(withNewChildren(canonicalizedChildren))
      }
    ```
    But, I agree that it's safe to simplify that with the new @lw-lin 's test cases.
    
    For the `assert` statements, I think @cloud-fan and @gatorsmile can give more insightful advice.
    
    For me, LGTM except that.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #16719: [SPARK-19385][SQL] During canonicalization, `NOT(l, r)` ...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16719
  
    **[Test build #72075 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72075/testReport)** for PR 16719 at commit [`da0b98c`](https://github.com/apache/spark/commit/da0b98c7b255819ca7c24d32209ff5803ce46eab).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #16719: [SPARK-19385][SQL] During canonicalization, `NOT(l, r)` ...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16719
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #16719: [SPARK-19385][SQL] During canonicalization, `NOT(...

Posted by lw-lin <gi...@git.apache.org>.

Github user lw-lin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16719#discussion_r98347229
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Canonicalize.scala ---
    @@ -78,14 +78,18 @@ object Canonicalize extends {
         case GreaterThanOrEqual(l, r) if l.hashCode() > r.hashCode() => LessThanOrEqual(r, l)
         case LessThanOrEqual(l, r) if l.hashCode() > r.hashCode() => GreaterThanOrEqual(r, l)
     
    -    case Not(GreaterThan(l, r)) if l.hashCode() > r.hashCode() => GreaterThan(r, l)
    -    case Not(GreaterThan(l, r)) => LessThanOrEqual(l, r)
    -    case Not(LessThan(l, r)) if l.hashCode() > r.hashCode() => LessThan(r, l)
    -    case Not(LessThan(l, r)) => GreaterThanOrEqual(l, r)
    -    case Not(GreaterThanOrEqual(l, r)) if l.hashCode() > r.hashCode() => GreaterThanOrEqual(r, l)
    -    case Not(GreaterThanOrEqual(l, r)) => LessThan(l, r)
    -    case Not(LessThanOrEqual(l, r)) if l.hashCode() > r.hashCode() => LessThanOrEqual(r, l)
    -    case Not(LessThanOrEqual(l, r)) => GreaterThan(l, r)
    +    case Not(GreaterThan(l, r)) =>
    +      assert(l.hashCode() <= r.hashCode())
    --- End diff --
    
    sure!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #16719: [SPARK-19385][SQL] During canonicalization, `NOT(...

Posted by dongjoon-hyun <gi...@git.apache.org>.

Github user dongjoon-hyun commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16719#discussion_r98360513
  
    --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/ExpressionSetSuite.scala ---
    @@ -32,6 +32,38 @@ class ExpressionSetSuite extends SparkFunSuite {
     
       val aAndBSet = AttributeSet(aUpper :: bUpper :: Nil)
     
    +  // An [AttributeReference] with almost the maximum hashcode, to make testing canonicalize rules
    +  // like `case GreaterThan(l, r) if l.hashcode > r.hashcode => GreaterThan(r, l)` easier
    +  val maxHash =
    +    Canonicalize.ignoreNamesTypes(
    +      AttributeReference("maxHash", IntegerType)(exprId =
    +        new ExprId(4, NamedExpression.jvmId) {
    +          // maxHash's hashcode is calculated based on this exprId's hashcode, so we set this
    +          // exprId's hashCode to this specific value to make sure maxHash's hashcode is almost
    +          // `Int.MaxValue`
    +          override def hashCode: Int = 826929706
    --- End diff --
    
    Great!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #16719: [SPARK-19385][SQL] During canonicalization, `NOT(...(l, ...

Posted by lw-lin <gi...@git.apache.org>.

Github user lw-lin commented on the issue:

    https://github.com/apache/spark/pull/16719
  
    Jenkins retest this please


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #16719: [SPARK-19385][SQL] During canonicalization, `NOT(...

Posted by gatorsmile <gi...@git.apache.org>.

Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16719#discussion_r98346066
  
    --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/ExpressionSetSuite.scala ---
    @@ -32,6 +32,38 @@ class ExpressionSetSuite extends SparkFunSuite {
     
       val aAndBSet = AttributeSet(aUpper :: bUpper :: Nil)
     
    +  // An [AttributeReference] with almost the maximum hashcode, to make testing canonicalize rules
    +  // like `case GreaterThan(l, r) if l.hashcode > r.hashcode => GreaterThan(r, l)` easier
    +  val maxHash =
    +    Canonicalize.ignoreNamesTypes(
    +      AttributeReference("maxHash", IntegerType)(exprId =
    +        new ExprId(4, NamedExpression.jvmId) {
    +          // maxHash's hashcode is calculated based on this exprId's hashcode, so we set this
    +          // exprId's hashCode to this specific value to make sure maxHash's hashcode is almost
    +          // `Int.MaxValue`
    +          override def hashCode: Int = 826929706
    +          // We are implementing this equals() only because the style-checking rule "you should
    +          // implement equals and hashCode together" requires us to
    +          override def equals(obj: Any): Boolean = super.equals(obj)
    +        })).asInstanceOf[AttributeReference]
    +  assert(maxHash.hashCode() == Int.MaxValue - 1)
    +
    +  // An [AttributeReference] with almost the minimum hashcode, to make testing canonicalize rules
    +  // like `case GreaterThan(l, r) if l.hashcode > r.hashcode => GreaterThan(r, l)` easier
    +  val minHash =
    +    Canonicalize.ignoreNamesTypes(
    +      AttributeReference("minHash", IntegerType)(exprId =
    +        new ExprId(5, NamedExpression.jvmId) {
    +          // minHash's hashcode is calculated based on this exprId's hashcode, so we set this
    +          // exprId's hashCode to this specific value to make sure minHash's hashcode is almost
    +          // `Int.MinValue`
    +          override def hashCode: Int = 826929707
    --- End diff --
    
    Why not `override def hashCode: Int = Int.MinValue`?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #16719: [SPARK-19385][SQL] During canonicalization, `NOT(...

Posted by lw-lin <gi...@git.apache.org>.

Github user lw-lin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16719#discussion_r98346981
  
    --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/ExpressionSetSuite.scala ---
    @@ -32,6 +32,38 @@ class ExpressionSetSuite extends SparkFunSuite {
     
       val aAndBSet = AttributeSet(aUpper :: bUpper :: Nil)
     
    +  // An [AttributeReference] with almost the maximum hashcode, to make testing canonicalize rules
    +  // like `case GreaterThan(l, r) if l.hashcode > r.hashcode => GreaterThan(r, l)` easier
    +  val maxHash =
    +    Canonicalize.ignoreNamesTypes(
    +      AttributeReference("maxHash", IntegerType)(exprId =
    +        new ExprId(4, NamedExpression.jvmId) {
    +          // maxHash's hashcode is calculated based on this exprId's hashcode, so we set this
    +          // exprId's hashCode to this specific value to make sure maxHash's hashcode is almost
    +          // `Int.MaxValue`
    +          override def hashCode: Int = 826929706
    --- End diff --
    
    ah, `-1030353449` works great! let me push a commit updating this


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #16719: [SPARK-19385][SQL] During canonicalization, `NOT(...(l, ...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16719
  
    **[Test build #72128 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72128/testReport)** for PR 16719 at commit [`c5fc394`](https://github.com/apache/spark/commit/c5fc394a2ba1bfcaafa9d19bd92c449639a5d576).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #16719: [SPARK-19385][SQL] During canonicalization, `NOT(l, r)` ...

Posted by lw-lin <gi...@git.apache.org>.

Github user lw-lin commented on the issue:

    https://github.com/apache/spark/pull/16719
  
    @cloud-fan @gatorsmile @dongjoon-hyun would you take a look, thanks!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #16719: [SPARK-19385][SQL] During canonicalization, `NOT(...

Posted by lw-lin <gi...@git.apache.org>.

Github user lw-lin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16719#discussion_r98322977
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Canonicalize.scala ---
    @@ -78,14 +78,18 @@ object Canonicalize extends {
         case GreaterThanOrEqual(l, r) if l.hashCode() > r.hashCode() => LessThanOrEqual(r, l)
         case LessThanOrEqual(l, r) if l.hashCode() > r.hashCode() => GreaterThanOrEqual(r, l)
     
    -    case Not(GreaterThan(l, r)) if l.hashCode() > r.hashCode() => GreaterThan(r, l)
    -    case Not(GreaterThan(l, r)) => LessThanOrEqual(l, r)
    -    case Not(LessThan(l, r)) if l.hashCode() > r.hashCode() => LessThan(r, l)
    -    case Not(LessThan(l, r)) => GreaterThanOrEqual(l, r)
    -    case Not(GreaterThanOrEqual(l, r)) if l.hashCode() > r.hashCode() => GreaterThanOrEqual(r, l)
    -    case Not(GreaterThanOrEqual(l, r)) => LessThan(l, r)
    -    case Not(LessThanOrEqual(l, r)) if l.hashCode() > r.hashCode() => LessThanOrEqual(r, l)
    -    case Not(LessThanOrEqual(l, r)) => GreaterThan(l, r)
    +    case Not(GreaterThan(l, r)) =>
    +      assert(l.hashCode() <= r.hashCode())
    --- End diff --
    
    thanks! maybe an alternative way is to add comments saying it's guaranteed that `l.hashcode <= r.hashcode`, otherwise people might wonder why there is no `case Not(LessThanOrEqual(l, r)) if l.hashCode() > r.hashCode()` at their first glance.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #16719: [SPARK-19385][SQL] During canonicalization, `NOT(...

Posted by gatorsmile <gi...@git.apache.org>.

Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16719#discussion_r98346062
  
    --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/ExpressionSetSuite.scala ---
    @@ -32,6 +32,38 @@ class ExpressionSetSuite extends SparkFunSuite {
     
       val aAndBSet = AttributeSet(aUpper :: bUpper :: Nil)
     
    +  // An [AttributeReference] with almost the maximum hashcode, to make testing canonicalize rules
    +  // like `case GreaterThan(l, r) if l.hashcode > r.hashcode => GreaterThan(r, l)` easier
    +  val maxHash =
    +    Canonicalize.ignoreNamesTypes(
    +      AttributeReference("maxHash", IntegerType)(exprId =
    +        new ExprId(4, NamedExpression.jvmId) {
    +          // maxHash's hashcode is calculated based on this exprId's hashcode, so we set this
    +          // exprId's hashCode to this specific value to make sure maxHash's hashcode is almost
    +          // `Int.MaxValue`
    +          override def hashCode: Int = 826929706
    --- End diff --
    
    Why not `override def hashCode: Int = Int.MaxValue`?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #16719: [SPARK-19385][SQL] During canonicalization, `NOT(...

Posted by gatorsmile <gi...@git.apache.org>.

Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16719#discussion_r98345869
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Canonicalize.scala ---
    @@ -78,14 +78,18 @@ object Canonicalize extends {
         case GreaterThanOrEqual(l, r) if l.hashCode() > r.hashCode() => LessThanOrEqual(r, l)
         case LessThanOrEqual(l, r) if l.hashCode() > r.hashCode() => GreaterThanOrEqual(r, l)
     
    -    case Not(GreaterThan(l, r)) if l.hashCode() > r.hashCode() => GreaterThan(r, l)
    -    case Not(GreaterThan(l, r)) => LessThanOrEqual(l, r)
    -    case Not(LessThan(l, r)) if l.hashCode() > r.hashCode() => LessThan(r, l)
    -    case Not(LessThan(l, r)) => GreaterThanOrEqual(l, r)
    -    case Not(GreaterThanOrEqual(l, r)) if l.hashCode() > r.hashCode() => GreaterThanOrEqual(r, l)
    -    case Not(GreaterThanOrEqual(l, r)) => LessThan(l, r)
    -    case Not(LessThanOrEqual(l, r)) if l.hashCode() > r.hashCode() => LessThanOrEqual(r, l)
    -    case Not(LessThanOrEqual(l, r)) => GreaterThan(l, r)
    +    case Not(GreaterThan(l, r)) =>
    +      assert(l.hashCode() <= r.hashCode())
    --- End diff --
    
    It should be fine to get rid of `assert`, as long as we add the code comments and the needed test cases.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #16719: [SPARK-19385][SQL] During canonicalization, `NOT(l, r)` ...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16719
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/72075/
    Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #16719: [SPARK-19385][SQL] During canonicalization, `NOT(...(l, ...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16719
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/72123/
    Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #16719: [SPARK-19385][SQL] During canonicalization, `NOT(...(l, ...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16719
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #16719: [SPARK-19385][SQL] During canonicalization, `NOT(...(l, ...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16719
  
    **[Test build #72123 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72123/testReport)** for PR 16719 at commit [`4c0af3a`](https://github.com/apache/spark/commit/4c0af3ab3c79c8c132ea237d9cf9ad65ecf3594d).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #16719: [SPARK-19385][SQL] During canonicalization, `NOT(...(l, ...

Posted by gatorsmile <gi...@git.apache.org>.

Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/16719
  
    Thanks! Merging to master.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #16719: [SPARK-19385][SQL] During canonicalization, `NOT(...(l, ...

Posted by gatorsmile <gi...@git.apache.org>.

Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/16719
  
    LGTM except one comment. Thanks!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #16719: [SPARK-19385][SQL] During canonicalization, `NOT(...

Posted by gatorsmile <gi...@git.apache.org>.

Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16719#discussion_r98363002
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Canonicalize.scala ---
    @@ -78,13 +78,11 @@ object Canonicalize extends {
         case GreaterThanOrEqual(l, r) if l.hashCode() > r.hashCode() => LessThanOrEqual(r, l)
         case LessThanOrEqual(l, r) if l.hashCode() > r.hashCode() => GreaterThanOrEqual(r, l)
     
    -    case Not(GreaterThan(l, r)) if l.hashCode() > r.hashCode() => GreaterThan(r, l)
    +    // Note in the following `NOT` cases, `l.hashCode() <= r.hashCode()` holds. The reason is that
    +    // canonicalization is conducted bottom-up -- see [[Expression.canonicalized]].
    --- End diff --
    
    To the other reviewers, this PR added test cases in `ExpressionSetSuite.scala` to ensure it. Thus, it is safe to clean the codes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #16719: [SPARK-19385][SQL] During canonicalization, `NOT(...(l, ...

Posted by AmplabJenkins <gi...@git.apache.org>.

Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16719
  
    Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #16719: [SPARK-19385][SQL] During canonicalization, `NOT(...

Posted by gatorsmile <gi...@git.apache.org>.

Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16719#discussion_r98346741
  
    --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/ExpressionSetSuite.scala ---
    @@ -32,6 +32,38 @@ class ExpressionSetSuite extends SparkFunSuite {
     
       val aAndBSet = AttributeSet(aUpper :: bUpper :: Nil)
     
    +  // An [AttributeReference] with almost the maximum hashcode, to make testing canonicalize rules
    +  // like `case GreaterThan(l, r) if l.hashcode > r.hashcode => GreaterThan(r, l)` easier
    +  val maxHash =
    +    Canonicalize.ignoreNamesTypes(
    +      AttributeReference("maxHash", IntegerType)(exprId =
    +        new ExprId(4, NamedExpression.jvmId) {
    +          // maxHash's hashcode is calculated based on this exprId's hashcode, so we set this
    +          // exprId's hashCode to this specific value to make sure maxHash's hashcode is almost
    +          // `Int.MaxValue`
    +          override def hashCode: Int = 826929706
    +          // We are implementing this equals() only because the style-checking rule "you should
    +          // implement equals and hashCode together" requires us to
    +          override def equals(obj: Any): Boolean = super.equals(obj)
    +        })).asInstanceOf[AttributeReference]
    +  assert(maxHash.hashCode() == Int.MaxValue - 1)
    +
    +  // An [AttributeReference] with almost the minimum hashcode, to make testing canonicalize rules
    +  // like `case GreaterThan(l, r) if l.hashcode > r.hashcode => GreaterThan(r, l)` easier
    +  val minHash =
    +    Canonicalize.ignoreNamesTypes(
    +      AttributeReference("minHash", IntegerType)(exprId =
    +        new ExprId(5, NamedExpression.jvmId) {
    +          // minHash's hashcode is calculated based on this exprId's hashcode, so we set this
    +          // exprId's hashCode to this specific value to make sure minHash's hashcode is almost
    +          // `Int.MinValue`
    +          override def hashCode: Int = 826929707
    --- End diff --
    
    To make `minHash.hashCode()` equal to `Int.MinValue`, you can set it to `1407330692`


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #16719: [SPARK-19385][SQL] During canonicalization, `NOT(...

Posted by gatorsmile <gi...@git.apache.org>.

Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16719#discussion_r98347203
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Canonicalize.scala ---
    @@ -78,14 +78,18 @@ object Canonicalize extends {
         case GreaterThanOrEqual(l, r) if l.hashCode() > r.hashCode() => LessThanOrEqual(r, l)
         case LessThanOrEqual(l, r) if l.hashCode() > r.hashCode() => GreaterThanOrEqual(r, l)
     
    -    case Not(GreaterThan(l, r)) if l.hashCode() > r.hashCode() => GreaterThan(r, l)
    -    case Not(GreaterThan(l, r)) => LessThanOrEqual(l, r)
    -    case Not(LessThan(l, r)) if l.hashCode() > r.hashCode() => LessThan(r, l)
    -    case Not(LessThan(l, r)) => GreaterThanOrEqual(l, r)
    -    case Not(GreaterThanOrEqual(l, r)) if l.hashCode() > r.hashCode() => GreaterThanOrEqual(r, l)
    -    case Not(GreaterThanOrEqual(l, r)) => LessThan(l, r)
    -    case Not(LessThanOrEqual(l, r)) if l.hashCode() > r.hashCode() => LessThanOrEqual(r, l)
    -    case Not(LessThanOrEqual(l, r)) => GreaterThan(l, r)
    +    case Not(GreaterThan(l, r)) =>
    +      assert(l.hashCode() <= r.hashCode())
    --- End diff --
    
    I think we can remove `assert`, because the test cases already cover the scenario. You can add a comment to explain. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #16719: [SPARK-19385][SQL] During canonicalization, `NOT(l, r)` ...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16719
  
    **[Test build #72076 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72076/testReport)** for PR 16719 at commit [`9c42889`](https://github.com/apache/spark/commit/9c428892c302eca3c8f6032a5c8e817501e1548f).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #16719: [SPARK-19385][SQL] During canonicalization, `NOT(...

Posted by gatorsmile <gi...@git.apache.org>.

Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16719#discussion_r98346688
  
    --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/ExpressionSetSuite.scala ---
    @@ -32,6 +32,38 @@ class ExpressionSetSuite extends SparkFunSuite {
     
       val aAndBSet = AttributeSet(aUpper :: bUpper :: Nil)
     
    +  // An [AttributeReference] with almost the maximum hashcode, to make testing canonicalize rules
    +  // like `case GreaterThan(l, r) if l.hashcode > r.hashcode => GreaterThan(r, l)` easier
    +  val maxHash =
    +    Canonicalize.ignoreNamesTypes(
    +      AttributeReference("maxHash", IntegerType)(exprId =
    +        new ExprId(4, NamedExpression.jvmId) {
    +          // maxHash's hashcode is calculated based on this exprId's hashcode, so we set this
    +          // exprId's hashCode to this specific value to make sure maxHash's hashcode is almost
    +          // `Int.MaxValue`
    +          override def hashCode: Int = 826929706
    --- End diff --
    
    uh, I did not read the comment carefully. Thanks for the explanation. 
    You can set it to `-1030353449`. Then, `maxHash.hashCode()` will be equal to `Int.MaxValue`


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #16719: [SPARK-19385][SQL] During canonicalization, `NOT(...

Posted by lw-lin <gi...@git.apache.org>.

Github user lw-lin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16719#discussion_r98346999
  
    --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/ExpressionSetSuite.scala ---
    @@ -32,6 +32,38 @@ class ExpressionSetSuite extends SparkFunSuite {
     
       val aAndBSet = AttributeSet(aUpper :: bUpper :: Nil)
     
    +  // An [AttributeReference] with almost the maximum hashcode, to make testing canonicalize rules
    +  // like `case GreaterThan(l, r) if l.hashcode > r.hashcode => GreaterThan(r, l)` easier
    +  val maxHash =
    +    Canonicalize.ignoreNamesTypes(
    +      AttributeReference("maxHash", IntegerType)(exprId =
    +        new ExprId(4, NamedExpression.jvmId) {
    +          // maxHash's hashcode is calculated based on this exprId's hashcode, so we set this
    +          // exprId's hashCode to this specific value to make sure maxHash's hashcode is almost
    +          // `Int.MaxValue`
    +          override def hashCode: Int = 826929706
    +          // We are implementing this equals() only because the style-checking rule "you should
    +          // implement equals and hashCode together" requires us to
    +          override def equals(obj: Any): Boolean = super.equals(obj)
    +        })).asInstanceOf[AttributeReference]
    +  assert(maxHash.hashCode() == Int.MaxValue - 1)
    +
    +  // An [AttributeReference] with almost the minimum hashcode, to make testing canonicalize rules
    +  // like `case GreaterThan(l, r) if l.hashcode > r.hashcode => GreaterThan(r, l)` easier
    +  val minHash =
    +    Canonicalize.ignoreNamesTypes(
    +      AttributeReference("minHash", IntegerType)(exprId =
    +        new ExprId(5, NamedExpression.jvmId) {
    +          // minHash's hashcode is calculated based on this exprId's hashcode, so we set this
    +          // exprId's hashCode to this specific value to make sure minHash's hashcode is almost
    +          // `Int.MinValue`
    +          override def hashCode: Int = 826929707
    --- End diff --
    
    updated, thanks!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #16719: [SPARK-19385][SQL] During canonicalization, `NOT(...

Posted by dongjoon-hyun <gi...@git.apache.org>.

Github user dongjoon-hyun commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16719#discussion_r98280054
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Canonicalize.scala ---
    @@ -78,14 +78,18 @@ object Canonicalize extends {
         case GreaterThanOrEqual(l, r) if l.hashCode() > r.hashCode() => LessThanOrEqual(r, l)
         case LessThanOrEqual(l, r) if l.hashCode() > r.hashCode() => GreaterThanOrEqual(r, l)
     
    -    case Not(GreaterThan(l, r)) if l.hashCode() > r.hashCode() => GreaterThan(r, l)
    -    case Not(GreaterThan(l, r)) => LessThanOrEqual(l, r)
    -    case Not(LessThan(l, r)) if l.hashCode() > r.hashCode() => LessThan(r, l)
    -    case Not(LessThan(l, r)) => GreaterThanOrEqual(l, r)
    -    case Not(GreaterThanOrEqual(l, r)) if l.hashCode() > r.hashCode() => GreaterThanOrEqual(r, l)
    -    case Not(GreaterThanOrEqual(l, r)) => LessThan(l, r)
    -    case Not(LessThanOrEqual(l, r)) if l.hashCode() > r.hashCode() => LessThanOrEqual(r, l)
    -    case Not(LessThanOrEqual(l, r)) => GreaterThan(l, r)
    +    case Not(GreaterThan(l, r)) =>
    +      assert(l.hashCode() <= r.hashCode())
    --- End diff --
    
    Can we remove these `assert`s? It seems to be verified with your test cases now.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #16719: [SPARK-19385][SQL] During canonicalization, `NOT(l, r)` ...

Posted by dongjoon-hyun <gi...@git.apache.org>.

Github user dongjoon-hyun commented on the issue:

    https://github.com/apache/spark/pull/16719
  
    Hi, @lw-lin .
    Thank you for pining me. I'll take a look.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #16719: [SPARK-19385][SQL] During canonicalization, `NOT(...

Posted by dongjoon-hyun <gi...@git.apache.org>.

Github user dongjoon-hyun commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16719#discussion_r98278557
  
    --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/ExpressionSetSuite.scala ---
    @@ -75,10 +107,14 @@ class ExpressionSetSuite extends SparkFunSuite {
       setTest(1, aUpper >= bUpper, bUpper <= aUpper)
     
       // `Not` canonicalization
    -  setTest(1, Not(aUpper > 1), aUpper <= 1, Not(Literal(1) < aUpper), Literal(1) >= aUpper)
    -  setTest(1, Not(aUpper < 1), aUpper >= 1, Not(Literal(1) > aUpper), Literal(1) <= aUpper)
    -  setTest(1, Not(aUpper >= 1), aUpper < 1, Not(Literal(1) <= aUpper), Literal(1) > aUpper)
    -  setTest(1, Not(aUpper <= 1), aUpper > 1, Not(Literal(1) >= aUpper), Literal(1) < aUpper)
    +  setTest(1, Not(maxHash > 1), maxHash <= 1, Not(Literal(1) < maxHash), Literal(1) >= maxHash)
    +  setTest(1, Not(minHash > 1), minHash <= 1, Not(Literal(1) < minHash), Literal(1) >= minHash)
    +  setTest(1, Not(maxHash < 1), maxHash >= 1, Not(Literal(1) > maxHash), Literal(1) <= maxHash)
    +  setTest(1, Not(minHash < 1), minHash >= 1, Not(Literal(1) > minHash), Literal(1) <= minHash)
    +  setTest(1, Not(maxHash >= 1), maxHash < 1, Not(Literal(1) <= maxHash), Literal(1) > maxHash)
    +  setTest(1, Not(minHash >= 1), minHash < 1, Not(Literal(1) <= minHash), Literal(1) > minHash)
    +  setTest(1, Not(maxHash <= 1), maxHash > 1, Not(Literal(1) >= maxHash), Literal(1) < maxHash)
    +  setTest(1, Not(minHash <= 1), minHash > 1, Not(Literal(1) >= minHash), Literal(1) < minHash)
    --- End diff --
    
    These test cases are covered previously correctly. Actually, this PR simplifies the logics only. Am I right?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #16719: [SPARK-19385][SQL] During canonicalization, `NOT(...(l, ...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16719
  
    **[Test build #72126 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72126/testReport)** for PR 16719 at commit [`c5fc394`](https://github.com/apache/spark/commit/c5fc394a2ba1bfcaafa9d19bd92c449639a5d576).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark issue #16719: [SPARK-19385][SQL] During canonicalization, `NOT(...(l, ...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/16719
  
    **[Test build #72128 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/72128/testReport)** for PR 16719 at commit [`c5fc394`](https://github.com/apache/spark/commit/c5fc394a2ba1bfcaafa9d19bd92c449639a5d576).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request #16719: [SPARK-19385][SQL] During canonicalization, `NOT(...

Posted by lw-lin <gi...@git.apache.org>.

Github user lw-lin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16719#discussion_r98322886
  
    --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/ExpressionSetSuite.scala ---
    @@ -75,10 +107,14 @@ class ExpressionSetSuite extends SparkFunSuite {
       setTest(1, aUpper >= bUpper, bUpper <= aUpper)
     
       // `Not` canonicalization
    -  setTest(1, Not(aUpper > 1), aUpper <= 1, Not(Literal(1) < aUpper), Literal(1) >= aUpper)
    -  setTest(1, Not(aUpper < 1), aUpper >= 1, Not(Literal(1) > aUpper), Literal(1) <= aUpper)
    -  setTest(1, Not(aUpper >= 1), aUpper < 1, Not(Literal(1) <= aUpper), Literal(1) > aUpper)
    -  setTest(1, Not(aUpper <= 1), aUpper > 1, Not(Literal(1) >= aUpper), Literal(1) < aUpper)
    +  setTest(1, Not(maxHash > 1), maxHash <= 1, Not(Literal(1) < maxHash), Literal(1) >= maxHash)
    +  setTest(1, Not(minHash > 1), minHash <= 1, Not(Literal(1) < minHash), Literal(1) >= minHash)
    +  setTest(1, Not(maxHash < 1), maxHash >= 1, Not(Literal(1) > maxHash), Literal(1) <= maxHash)
    +  setTest(1, Not(minHash < 1), minHash >= 1, Not(Literal(1) > minHash), Literal(1) <= minHash)
    +  setTest(1, Not(maxHash >= 1), maxHash < 1, Not(Literal(1) <= maxHash), Literal(1) > maxHash)
    +  setTest(1, Not(minHash >= 1), minHash < 1, Not(Literal(1) <= minHash), Literal(1) > minHash)
    +  setTest(1, Not(maxHash <= 1), maxHash > 1, Not(Literal(1) >= maxHash), Literal(1) < maxHash)
    +  setTest(1, Not(minHash <= 1), minHash > 1, Not(Literal(1) >= minHash), Literal(1) < minHash)
    --- End diff --
    
    yea sure they are covered correctly even prior to this patch's changes!
    
    the previous `aUpper`'hashcode is either greater than or less than `1`'s hashcode but can not be both, while this change aims to test both cases -- but I'm quite open to revert the changes if they are considered unnecessary.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org