Posted to issues@spark.apache.org by "Bryan Cutler (JIRA)" <ji...@apache.org> on 2018/10/24 17:05:00 UTC

[jira] [Resolved] (SPARK-25798) Internally document type conversion between Pandas data and SQL types in Pandas UDFs

     [ https://issues.apache.org/jira/browse/SPARK-25798?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Bryan Cutler resolved SPARK-25798.
----------------------------------
       Resolution: Fixed
    Fix Version/s: 3.0.0

Issue resolved by pull request 22795
[https://github.com/apache/spark/pull/22795]

> Internally document type conversion between Pandas data and SQL types in Pandas UDFs
> ------------------------------------------------------------------------------------
>
>                 Key: SPARK-25798
>                 URL: https://issues.apache.org/jira/browse/SPARK-25798
>             Project: Spark
>          Issue Type: Sub-task
>          Components: PySpark
>    Affects Versions: 2.4.0
>            Reporter: Hyukjin Kwon
>            Assignee: Hyukjin Kwon
>            Priority: Minor
>             Fix For: 3.0.0
>
>
> Currently, UDF type coercion is not clearly defined. See also https://github.com/apache/spark/pull/20163 and https://github.com/apache/spark/pull/22610
> This JIRA aims to document the type conversion logic internally. For instance:
> {code}
>     # +----------------------+----------+-------+--------+--------------------+--------------------+--------+---------+---------+---------+---------+-----------------------------------+-----------------------------------------------------+------------+----------------------+-----------+--------------------------------+  # noqa
>     # |SQL Type \ Pandas Type|True(bool)|1(int8)|1(int16)|            1(int32)|            1(int64)|1(uint8)|1(uint16)|1(uint32)|1(uint64)|a(object)|1970-01-01 00:00:00(datetime64[ns])|1970-01-01 00:00:00-05:00(datetime64[ns, US/Eastern])|1.0(float64)|[1 2 3](object(array))|A(category)|1 days 00:00:00(timedelta64[ns])|  # noqa
>     # +----------------------+----------+-------+--------+--------------------+--------------------+--------+---------+---------+---------+---------+-----------------------------------+-----------------------------------------------------+------------+----------------------+-----------+--------------------------------+  # noqa
>     # |               boolean|      True|   True|    True|                True|                True|    True|     True|     True|     True|        X|                              False|                                                False|       False|                     X|          X|                           False|  # noqa
>     # |               tinyint|         1|      1|       1|                   1|                   1|       X|        X|        X|        X|        X|                                  X|                                                    X|           1|                     X|          0|                               X|  # noqa
>     # |              smallint|         1|      1|       1|                   1|                   1|       1|        X|        X|        X|        X|                                  X|                                                    X|           1|                     X|          X|                               X|  # noqa
>     # |                   int|         1|      1|       1|                   1|                   1|       1|        1|        X|        X|        X|                                  X|                                                    X|           1|                     X|          X|                               X|  # noqa
>     # |                bigint|         1|      1|       1|                   1|                   1|       1|        1|        1|        X|        X|                                  0|                                       18000000000000|           1|                     X|          X|                               X|  # noqa
>     # |                string|       u''|u'\x01'| u'\x01'|             u'\x01'|             u'\x01'| u'\x01'|  u'\x01'|  u'\x01'|  u'\x01'|     u'a'|                                  X|                                                    X|         u''|                     X|          X|                               X|  # noqa
>     # |                  date|         X|      X|       X|datetime.date(197...|                   X|       X|        X|        X|        X|        X|               datetime.date(197...|                                                    X|           X|                     X|          X|                               X|  # noqa
>     # |             timestamp|         X|      X|       X|                   X|datetime.datetime...|       X|        X|        X|        X|        X|               datetime.datetime...|                                 datetime.datetime...|           X|                     X|          X|                               X|  # noqa
>     # |                 float|       1.0|    1.0|     1.0|                 1.0|                 1.0|     1.0|      1.0|      1.0|      1.0|        X|                                  X|                                                    X|         1.0|                     X|          X|                               X|  # noqa
>     # |                double|       1.0|    1.0|     1.0|                 1.0|                 1.0|     1.0|      1.0|      1.0|      1.0|        X|                                  X|                                                    X|         1.0|                     X|          X|                               X|  # noqa
>     # |            array<int>|         X|      X|       X|                   X|                   X|       X|        X|        X|        X|        X|                                  X|                                                    X|           X|             [1, 2, 3]|          X|                               X|  # noqa
>     # |                binary|         X|      X|       X|                   X|                   X|       X|        X|        X|        X|        X|                                  X|                                                    X|           X|                     X|          X|                               X|  # noqa
>     # |         decimal(10,0)|         X|      X|       X|                   X|                   X|       X|        X|        X|        X|        X|                                  X|                                                    X|           X|                     X|          X|                               X|  # noqa
>     # |       map<string,int>|         X|      X|       X|                   X|                   X|       X|        X|        X|        X|        X|                                  X|                                                    X|           X|                     X|          X|                               X|  # noqa
>     # |        struct<_1:int>|         X|      X|       X|                   X|                   X|       X|        X|        X|        X|        X|                                  X|                                                    X|           X|                     X|          X|                               X|  # noqa
>     # +----------------------+----------+-------+--------+--------------------+--------------------+--------+---------+---------+---------+---------+-----------------------------------+-----------------------------------------------------+------------+----------------------+-----------+--------------------------------+  # noqa
> {code}
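
A few cells in the table above are easier to read with a small calculation. The sketch below uses only the Python standard library, not Spark or pandas, and rests on one assumption: that the bigint cells for the datetime columns hold the raw nanoseconds-since-epoch value behind datetime64[ns]. The helper name epoch_nanos and the fixed -05:00 offset standing in for US/Eastern are illustrative choices, not part of the issue or the pull request.

```python
from datetime import datetime, timedelta, timezone

NANOS_PER_SECOND = 1_000_000_000


def epoch_nanos(ts):
    """Return nanoseconds since the Unix epoch for a timezone-aware datetime."""
    epoch = datetime(1970, 1, 1, tzinfo=timezone.utc)
    delta = ts - epoch
    whole_seconds = delta.days * 86_400 + delta.seconds
    return whole_seconds * NANOS_PER_SECOND + delta.microseconds * 1_000


# Assumption: a fixed -05:00 offset stands in for US/Eastern on 1970-01-01.
eastern_standard = timezone(timedelta(hours=-5))

# Column "1970-01-01 00:00:00(datetime64[ns])", treated here as UTC.
bigint_from_naive = epoch_nanos(datetime(1970, 1, 1, tzinfo=timezone.utc))

# Column "1970-01-01 00:00:00-05:00(datetime64[ns, US/Eastern])".
bigint_from_eastern = epoch_nanos(datetime(1970, 1, 1, tzinfo=eastern_standard))

# The string row shows u'\x01' for integer input: the single character
# U+0001, not the digit "1".
string_cell = chr(1)

print(bigint_from_naive)    # 0
print(bigint_from_eastern)  # 18000000000000
print(repr(string_cell))    # '\x01'
```

Under that assumption the two results match the bigint row: 0 for the naive timestamp and 18000000000000 (five hours in nanoseconds) for the US/Eastern one. The other rows, such as boolean, do not follow from this arithmetic and are best read directly from the table.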



