You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by kevinyu98 <gi...@git.apache.org> on 2015/11/15 09:21:40 UTC

[GitHub] spark pull request: [SPARK-11447][SQL] change NullType to StringTy...

GitHub user kevinyu98 opened a pull request:

    https://github.com/apache/spark/pull/9720

    [SPARK-11447][SQL] change NullType to StringType during binaryComparison between NullType and StringType

    During executing PromoteStrings rule, if one side of binaryComparison is StringType and the other side is not StringType, the current code will promote(cast) the StringType to DoubleType, and if the StringType doesn't contain the numbers, it will get null value. So if it is doing <=> (NULL-safe equal) with Null, it will not filter anything, caused the problem reported by this jira.
    
    I proposal to the changes through this PR, can you review my code changes ? 
    
    This problem only happen for <=>, other operators works fine.
    
    scala> val filteredDF = df.filter(df("column") > (new Column(Literal(null))))
    filteredDF: org.apache.spark.sql.DataFrame = [column: string]
    
    scala> filteredDF.show
    +------+
    |column|
    +------+
    +------+
    
    scala> val filteredDF = df.filter(df("column") === (new Column(Literal(null))))
    filteredDF: org.apache.spark.sql.DataFrame = [column: string]
    
    scala> filteredDF.show
    +------+
    |column|
    +------+
    +------+
    
    scala> df.registerTempTable("DF")
    
    scala> sqlContext.sql("select * from DF where 'column' = NULL")
    res27: org.apache.spark.sql.DataFrame = [column: string]
    
    scala> res27.show
    +------+
    |column|
    +------+
    +------+

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/kevinyu98/spark working_on_spark-11447

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/9720.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #9720
    
----
commit b53b85cad4f5fced9ba003351d5a9af1eb5111fc
Author: Kevin Yu <qy...@us.ibm.com>
Date:   2015-11-13T18:11:59Z

    [SPARK-11447]Check NullType before Promote StringType

commit bb705cae18032fcee8f8a532be464f0a995b27cb
Author: Kevin Yu <qy...@us.ibm.com>
Date:   2015-11-15T06:41:48Z

    add testcase in ColumnExpressionSuite

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-11447][SQL] change NullType to StringTy...

Posted by yhuai <gi...@git.apache.org>.
Github user yhuai commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9720#discussion_r44885580
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/HiveTypeCoercion.scala ---
    @@ -280,6 +280,12 @@ object HiveTypeCoercion {
           case p @ BinaryComparison(left @ DateType(), right @ TimestampType()) =>
             p.makeCopy(Array(Cast(left, StringType), Cast(right, StringType)))
     
    +      // Checking NullType
    +      case p @ BinaryComparison(left @ StringType(), right @ NullType()) =>
    +        p.makeCopy(Array(left, Literal.create(null, StringType)))
    +      case p @ BinaryComparison(left @ NullType(), right @ StringType()) =>
    +        p.makeCopy(Array(Literal.create(null, StringType), right))
    +
           case p @ BinaryComparison(left @ StringType(), right) if right.dataType != StringType =>
             p.makeCopy(Array(Cast(left, DoubleType), right))
    --- End diff --
    
    Seems @liancheng added this change.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-11447][SQL] change NullType to StringTy...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9720#issuecomment-156917038
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/45974/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-11447][SQL] change NullType to StringTy...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/9720#discussion_r44883090
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/HiveTypeCoercion.scala ---
    @@ -280,6 +280,12 @@ object HiveTypeCoercion {
           case p @ BinaryComparison(left @ DateType(), right @ TimestampType()) =>
             p.makeCopy(Array(Cast(left, StringType), Cast(right, StringType)))
     
    +      // Checking NullType
    +      case p @ BinaryComparison(left @ StringType(), right @ NullType()) =>
    +        p.makeCopy(Array(left, Literal.create(null, StringType)))
    +      case p @ BinaryComparison(left @ NullType(), right @ StringType()) =>
    +        p.makeCopy(Array(Literal.create(null, StringType), right))
    +
           case p @ BinaryComparison(left @ StringType(), right) if right.dataType != StringType =>
             p.makeCopy(Array(Cast(left, DoubleType), right))
    --- End diff --
    
    this rule looks weird to me, how about cast to tightest common type of left and right? cc @marmbrus  @yhuai @nongli 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-11447][SQL] change NullType to StringTy...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the pull request:

    https://github.com/apache/spark/pull/9720#issuecomment-156887736
  
    ok to test


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-11447][SQL] change NullType to StringTy...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/9720#issuecomment-156888589
  
    **[Test build #45967 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/45967/consoleFull)** for PR 9720 at commit [`bb705ca`](https://github.com/apache/spark/commit/bb705cae18032fcee8f8a532be464f0a995b27cb).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-11447][SQL] change NullType to StringTy...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/9720#issuecomment-156888775
  
    **[Test build #45967 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/45967/consoleFull)** for PR 9720 at commit [`bb705ca`](https://github.com/apache/spark/commit/bb705cae18032fcee8f8a532be464f0a995b27cb).
     * This patch **fails Scala style tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-11447][SQL] change NullType to StringTy...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/9720#issuecomment-156916982
  
    **[Test build #45974 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/45974/consoleFull)** for PR 9720 at commit [`5a5be06`](https://github.com/apache/spark/commit/5a5be06dcd0bb02d2fecd546245106f8d045c724).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-11447][SQL] change NullType to StringTy...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9720#issuecomment-156789931
  
    Can one of the admins verify this patch?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-11447][SQL] change NullType to StringTy...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9720#issuecomment-156888778
  
    Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-11447][SQL] change NullType to StringTy...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9720#issuecomment-156888780
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/45967/
    Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-11447][SQL] change NullType to StringTy...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/9720#issuecomment-156901114
  
    **[Test build #45974 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/45974/consoleFull)** for PR 9720 at commit [`5a5be06`](https://github.com/apache/spark/commit/5a5be06dcd0bb02d2fecd546245106f8d045c724).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request: [SPARK-11447][SQL] change NullType to StringTy...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/9720#issuecomment-156917037
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org