Posted to reviews@spark.apache.org by damnMeddlingKid <gi...@git.apache.org> on 2016/07/12 22:26:11 UTC

[GitHub] spark pull request #14164: Allow comparisons between UDTs and Datatypes

GitHub user damnMeddlingKid opened a pull request:

    https://github.com/apache/spark/pull/14164

    Allow comparisons between UDTs and Datatypes

    ## What changes were proposed in this pull request?
    Currently, UDTs cannot be compared to Datatypes even if their sqlTypes match. This leads to errors like the following:
    
    ```
    
    In [12]: thresholded = df.filter(df['udt_time'] > threshold)
    ---------------------------------------------------------------------------
    AnalysisException                         Traceback (most recent call last)
    /Users/franklyndsouza/dev/starscream/bin/starscream in <module>()
    ----> 1 thresholded = df.filter(df['tick_tock_est'] > threshold)
    
    AnalysisException: u"cannot resolve '(`tick_tock_est` > TIMESTAMP('2015-10-20 01:00:00.0'))' due to data type mismatch: '(`tick_tock_est` > TIMESTAMP('2015-10-20 01:00:00.0'))' requires (boolean or tinyint or smallint or int or bigint or float or double or decimal or timestamp or date or string or binary) type, not pythonuserdefined"
    
    ```
    
    This PR adds some comparisons that allow UDTs to be correctly compared to a Datatype.
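
As a rough illustration of the rule described above, the following is a toy model in plain Python. It is not Spark's analyzer code and does not use pyspark; every class and function name in it (`AtomicType`, `UserDefinedType`, `comparable_strict`, `comparable_relaxed`) is hypothetical. It only contrasts a check that rejects any UDT operand with one that first unwraps a UDT to its underlying SQL type.

```python
from dataclasses import dataclass


# Hypothetical stand-ins for SQL data types; these are NOT the pyspark classes.
@dataclass(frozen=True)
class AtomicType:
    name: str


@dataclass(frozen=True)
class UserDefinedType:
    name: str
    sql_type: AtomicType  # underlying storage type, analogous to a UDT's sqlType


TIMESTAMP = AtomicType("timestamp")
DOUBLE = AtomicType("double")


def underlying(data_type):
    """Unwrap a UDT to its storage type; return any other type unchanged."""
    if isinstance(data_type, UserDefinedType):
        return data_type.sql_type
    return data_type


def comparable_strict(left, right):
    """Model of the reported behaviour: any UDT operand is rejected."""
    if isinstance(left, UserDefinedType) or isinstance(right, UserDefinedType):
        return False
    return left == right


def comparable_relaxed(left, right):
    """Model of the proposed behaviour: compare the underlying SQL types."""
    return underlying(left) == underlying(right)


# A UDT column whose storage type is a timestamp (assumed for illustration).
udt_time = UserDefinedType("udt_time", TIMESTAMP)

print(comparable_strict(udt_time, TIMESTAMP))   # rejected under the strict rule
print(comparable_relaxed(udt_time, TIMESTAMP))  # accepted once the UDT is unwrapped
print(comparable_relaxed(udt_time, DOUBLE))     # still rejected: storage types differ
```

In this sketch the relaxed rule still refuses a comparison when the underlying types differ, which is the "if their sqlTypes match" condition from the description.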
    
    
    ## How was this patch tested?
    
    Built locally and tested in the pyspark repl.
    
    


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/damnMeddlingKid/spark fix-df-filtering

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/14164.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #14164
    
----
commit d0d31ca18c49fd24476d8b7291cb16d5f346ee6e
Author: Franklyn D'souza <fr...@gmail.com>
Date:   2016-07-12T22:17:25Z

    allow comparisons between UDTs and Datatypes

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #14164: Allow comparisons between UDTs and Datatypes

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/14164
  
    Can one of the admins verify this patch?






[GitHub] spark pull request #14164: [SPARK-16629] Allow comparisons between UDTs and ...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/14164#discussion_r71965649
  
    --- Diff: sql/core/src/test/scala/org/apache/spark/sql/DataFrameSuite.scala ---
    @@ -110,6 +110,28 @@ class DataFrameSuite extends QueryTest with SharedSQLContext {
         )
       }
     
    +  test("test filtering with predicates on UDT columns") {
    +    val rowRDD = sparkContext.parallelize(Seq(Row(new ExampleMoney(1.0)), Row(new ExampleMoney(2.0)), Row(new ExampleMoney(3.0))))
    +    val schema = StructType(Array(StructField("dollar", new ExampleMoneyUDT(), false)))
    +    val df = spark.createDataFrame(rowRDD, schema)
    +
    +    checkAnswer(df.filter(df("dollar") < 2.0), Seq(Row(new ExampleMoney(1.0))))
    --- End diff --
    
    cc @mengxr, is UDT designed to work like this?




[GitHub] spark issue #14164: [SPARK-16629] Allow comparisons between UDTs and Datatyp...

Posted by damnMeddlingKid <gi...@git.apache.org>.
Github user damnMeddlingKid commented on the issue:

    https://github.com/apache/spark/pull/14164
  
    I've tested this successfully with int and timestamp types, but it doesn't seem to work with DecimalType. Does anyone know what could be wrong?




[GitHub] spark issue #14164: [SPARK-16629] Allow comparisons between UDTs and Datatyp...

Posted by rxin <gi...@git.apache.org>.
Github user rxin commented on the issue:

    https://github.com/apache/spark/pull/14164
  
    cc @cloud-fan




[GitHub] spark pull request #14164: [SPARK-16629] Allow comparisons between UDTs and ...

Posted by damnMeddlingKid <gi...@git.apache.org>.
Github user damnMeddlingKid closed the pull request at:

    https://github.com/apache/spark/pull/14164




[GitHub] spark issue #14164: Allow comparisons between UDTs and Datatypes

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on the issue:

    https://github.com/apache/spark/pull/14164
  
    Please read https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark and link this PR to your JIRA via its title. I fixed some fields in your JIRA.




[GitHub] spark issue #14164: [SPARK-16629] Allow comparisons between UDTs and Datatyp...

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/14164
  
    Can you add a regression test to your PR? Thanks.

