Posted to reviews@spark.apache.org by liancheng <gi...@git.apache.org> on 2014/10/11 10:03:39 UTC

[GitHub] spark pull request: [SQL] Refactors data type pattern matching

GitHub user liancheng opened a pull request:

    https://github.com/apache/spark/pull/2764

    [SQL] Refactors data type pattern matching

    Refactors/adds extractors for `DataType` and `Binary*` types to ease and simplify data-type-related (nested) pattern matching.
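The extractor style the PR proposes can be sketched in a minimal, self-contained form. All names below are illustrative stand-ins, not the actual Catalyst classes: an extractor object whose `unapply` matches an expression by its data type and returns the expression itself, so that patterns like `DoubleType(e)` can be nested.

```scala
// Illustrative sketch only; names mirror but are not the Catalyst classes.
sealed trait DataType
final case class Expression(name: String, dataType: DataType)

case object DoubleType extends DataType {
  // `case DoubleType(e)` binds `e` when the expression is double-typed.
  def unapply(e: Expression): Option[Expression] =
    if (e.dataType == DoubleType) Some(e) else None
}
case object StringType extends DataType

def describe(e: Expression): String = e match {
  case DoubleType(x) => s"double: ${x.name}"
  case _             => "other"
}
```

Compared with a guard like `case e if e.dataType == DoubleType`, the extractor keeps the type test inside the pattern, which is what makes the nested forms in the diffs below possible.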

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/liancheng/spark datatype-patmat

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/2764.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #2764
    
----
commit f391be51ee91da4c12146c90aad9f63d06f0ac34
Author: Cheng Lian <li...@gmail.com>
Date:   2014-10-11T07:37:25Z

    Refactors data type pattern matching

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SQL] Refactors data type pattern matching

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/2764




[GitHub] spark pull request: [SQL] Refactors data type pattern matching

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/2764#issuecomment-58742085
  
      [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21640/consoleFull) for PR 2764 at commit [`f391be5`](https://github.com/apache/spark/commit/f391be51ee91da4c12146c90aad9f63d06f0ac34).
     * This patch merges cleanly.




[GitHub] spark pull request: [SQL] Refactors data type pattern matching

Posted by marmbrus <gi...@git.apache.org>.
Github user marmbrus commented on the pull request:

    https://github.com/apache/spark/pull/2764#issuecomment-58951446
  
    This does look like a better style than what we were doing before. We should coordinate with the changes @mateiz is making for decimal type.




[GitHub] spark pull request: [SQL] Refactors data type pattern matching

Posted by marmbrus <gi...@git.apache.org>.
Github user marmbrus commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2764#discussion_r19118466
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/HiveTypeCoercion.scala ---
    @@ -107,20 +107,20 @@ trait HiveTypeCoercion {
             case e if !e.childrenResolved => e
     
             /* Double Conversions */
    -        case b: BinaryExpression if b.left == stringNaN && b.right.dataType == DoubleType =>
    -          b.makeCopy(Array(b.right, Literal(Double.NaN)))
    -        case b: BinaryExpression if b.left.dataType == DoubleType && b.right == stringNaN =>
    -          b.makeCopy(Array(Literal(Double.NaN), b.left))
    -        case b: BinaryExpression if b.left == stringNaN && b.right == stringNaN =>
    -          b.makeCopy(Array(Literal(Double.NaN), b.left))
    +        case b @ BinaryExpression(StringNaN, DoubleType(r)) =>
    +          b.makeCopy(Array(r, Literal(Double.NaN)))
    +        case b @ BinaryExpression(DoubleType(l), StringNaN) =>
    +          b.makeCopy(Array(Literal(Double.NaN), l))
    +        case b @ BinaryExpression(l @ StringNaN, StringNaN) =>
    +          b.makeCopy(Array(Literal(Double.NaN), l))
     
             /* Float Conversions */
    -        case b: BinaryExpression if b.left == stringNaN && b.right.dataType == FloatType =>
    -          b.makeCopy(Array(b.right, Literal(Float.NaN)))
    -        case b: BinaryExpression if b.left.dataType == FloatType && b.right == stringNaN =>
    -          b.makeCopy(Array(Literal(Float.NaN), b.left))
    -        case b: BinaryExpression if b.left == stringNaN && b.right == stringNaN =>
    -          b.makeCopy(Array(Literal(Float.NaN), b.left))
    +        case b @ BinaryExpression(StringNaN, FloatType(r)) =>
    +          b.makeCopy(Array(r, Literal(Float.NaN)))
    +        case b @ BinaryExpression(FloatType(l), StringNaN) =>
    +          b.makeCopy(Array(Literal(Float.NaN), l))
    +        case b @ BinaryExpression(l @ StringNaN, StringNaN) =>
    +          b.makeCopy(Array(Literal(Float.NaN), l))
    --- End diff --
    
    Yes, this is probably a copy paste bug.




[GitHub] spark pull request: [SQL] Refactors data type pattern matching

Posted by marmbrus <gi...@git.apache.org>.
Github user marmbrus commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2764#discussion_r19118592
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/types/dataTypes.scala ---
    @@ -319,10 +315,8 @@ case object ByteType extends IntegralType {
     
     /** Matcher for any expressions that evaluate to [[FractionalType]]s */
     object FractionalType {
    -  def unapply(a: Expression): Boolean = a match {
    -    case e: Expression if e.dataType.isInstanceOf[FractionalType] => true
    -    case _ => false
    -  }
    +  def unapply[T <: Expression](a: T): Option[T] =
    --- End diff --
    
    Why are these type-parameterized? Things seem to compile without this added complexity. Extra refinement can be done regardless with nested pattern matching.
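The non-parameterized alternative being suggested here can be sketched as follows (names are assumptions mirroring Catalyst, not the PR's exact code): returning a plain `Expression` is enough, since any extra refinement can be done with a nested pattern at the use site.

```scala
// Illustrative sketch; not the PR's actual code.
trait DataType
trait FractionalType extends DataType
case object FloatType extends FractionalType
case object IntegerType extends DataType

final case class Expression(dataType: DataType)

object FractionalType {
  // No `[T <: Expression]` parameter: callers that need a more specific
  // static type can still refine with a nested pattern, e.g.
  // `case e @ FractionalType(_)`.
  def unapply(e: Expression): Option[Expression] =
    if (e.dataType.isInstanceOf[FractionalType]) Some(e) else None
}

def isFractional(e: Expression): Boolean = e match {
  case FractionalType(_) => true
  case _                 => false
}
```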




[GitHub] spark pull request: [SQL] Refactors data type pattern matching

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/2764#issuecomment-58743014
  
      [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21640/consoleFull) for PR 2764 at commit [`f391be5`](https://github.com/apache/spark/commit/f391be51ee91da4c12146c90aad9f63d06f0ac34).
     * This patch **passes all tests**.
     * This patch merges cleanly.
     * This patch adds the following public classes _(experimental)_:
      * `case class InSet(value: Expression, hset: HashSet[Any], child: Seq[Expression])`





[GitHub] spark pull request: [SQL] Refactors data type pattern matching

Posted by marmbrus <gi...@git.apache.org>.
Github user marmbrus commented on the pull request:

    https://github.com/apache/spark/pull/2764#issuecomment-67381357
  
    Sorry for the delay here.  I think it would be good to clean up how we do pattern matching and this looks like a good start.  Here's what I propose:
     - close this issue for now
     - write up a short description in the PR on the flavors of matching that we are going to support for datatype and expressions.
     - reopen this and we'll merge it quickly this time :)




[GitHub] spark pull request: [SQL] Refactors data type pattern matching

Posted by liancheng <gi...@git.apache.org>.
Github user liancheng commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2764#discussion_r18740906
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/HiveTypeCoercion.scala ---
    @@ -107,20 +107,20 @@ trait HiveTypeCoercion {
             case e if !e.childrenResolved => e
     
             /* Double Conversions */
    -        case b: BinaryExpression if b.left == stringNaN && b.right.dataType == DoubleType =>
    -          b.makeCopy(Array(b.right, Literal(Double.NaN)))
    -        case b: BinaryExpression if b.left.dataType == DoubleType && b.right == stringNaN =>
    -          b.makeCopy(Array(Literal(Double.NaN), b.left))
    -        case b: BinaryExpression if b.left == stringNaN && b.right == stringNaN =>
    -          b.makeCopy(Array(Literal(Double.NaN), b.left))
    +        case b @ BinaryExpression(StringNaN, DoubleType(r)) =>
    +          b.makeCopy(Array(r, Literal(Double.NaN)))
    +        case b @ BinaryExpression(DoubleType(l), StringNaN) =>
    +          b.makeCopy(Array(Literal(Double.NaN), l))
    +        case b @ BinaryExpression(l @ StringNaN, StringNaN) =>
    +          b.makeCopy(Array(Literal(Double.NaN), l))
     
             /* Float Conversions */
    -        case b: BinaryExpression if b.left == stringNaN && b.right.dataType == FloatType =>
    -          b.makeCopy(Array(b.right, Literal(Float.NaN)))
    -        case b: BinaryExpression if b.left.dataType == FloatType && b.right == stringNaN =>
    -          b.makeCopy(Array(Literal(Float.NaN), b.left))
    -        case b: BinaryExpression if b.left == stringNaN && b.right == stringNaN =>
    -          b.makeCopy(Array(Literal(Float.NaN), b.left))
    +        case b @ BinaryExpression(StringNaN, FloatType(r)) =>
    +          b.makeCopy(Array(r, Literal(Float.NaN)))
    +        case b @ BinaryExpression(FloatType(l), StringNaN) =>
    +          b.makeCopy(Array(Literal(Float.NaN), l))
    +        case b @ BinaryExpression(l @ StringNaN, StringNaN) =>
    +          b.makeCopy(Array(Literal(Float.NaN), l))
    --- End diff --
    
    This case branch can never be reached, since the earlier branch at line 114 supersedes it. As a result, "NaN" is always treated as `Double`. Bug?
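The shadowing being pointed out reduces to a toy example (plain strings stand in for the real expressions; this is not the PR's code): once an earlier pattern covers an input, a textually later pattern for the same input never fires, so the both-sides-NaN input is always handled by the first branch it reaches.

```scala
// Toy illustration of match-order shadowing; not the PR's actual code.
def classify(l: String, r: String): String = (l, r) match {
  case ("NaN", _) => "first branch"  // also swallows ("NaN", "NaN")
  case (_, "NaN") => "second branch"
  case _          => "other"
}
```

In the diff above the analogous effect is that the Double-conversion `("NaN", "NaN")` case fires before the Float-conversion one, leaving the latter unreachable.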




[GitHub] spark pull request: [SQL] Refactors data type pattern matching

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/2764#issuecomment-58743015
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/21640/




[GitHub] spark pull request: [SQL] Refactors data type pattern matching

Posted by liancheng <gi...@git.apache.org>.
Github user liancheng commented on the pull request:

    https://github.com/apache/spark/pull/2764#issuecomment-61400524
  
    @marmbrus This should be ready to go since #2983 has been merged.




[GitHub] spark pull request: [SQL] Refactors data type pattern matching

Posted by liancheng <gi...@git.apache.org>.
Github user liancheng commented on the pull request:

    https://github.com/apache/spark/pull/2764#issuecomment-61361922
  
    Rebased to master.




[GitHub] spark pull request: [SQL] Refactors data type pattern matching

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/2764#issuecomment-58742036
  
    Can one of the admins verify this patch?




[GitHub] spark pull request: [SQL] Refactors data type pattern matching

Posted by liancheng <gi...@git.apache.org>.
Github user liancheng commented on a diff in the pull request:

    https://github.com/apache/spark/pull/2764#discussion_r19135372
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/types/dataTypes.scala ---
    @@ -319,10 +315,8 @@ case object ByteType extends IntegralType {
     
     /** Matcher for any expressions that evaluate to [[FractionalType]]s */
     object FractionalType {
    -  def unapply(a: Expression): Boolean = a match {
    -    case e: Expression if e.dataType.isInstanceOf[FractionalType] => true
    -    case _ => false
    -  }
    +  def unapply[T <: Expression](a: T): Option[T] =
    --- End diff --
    
    Right, this type parameter can be removed. Thanks.

