Posted to issues@spark.apache.org by "Yuanjian Li (JIRA)" <ji...@apache.org> on 2018/10/22 14:37:00 UTC
[jira] [Updated] (SPARK-25798) Internally document type conversion
between Pandas data and SQL types in Pandas UDFs
[ https://issues.apache.org/jira/browse/SPARK-25798?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Yuanjian Li updated SPARK-25798:
--------------------------------
Description:
Currently, type coercion in UDFs is not cleanly defined. See also https://github.com/apache/spark/pull/20163 and https://github.com/apache/spark/pull/22610
This JIRA aims to document the type conversion logic internally. For instance:
{code}
# +----------------------+----------+-------+--------+--------------------+--------------------+--------+---------+---------+---------+---------+-----------------------------------+-----------------------------------------------------+------------+----------------------+-----------+--------------------------------+ # noqa
# |SQL Type \ Pandas Type|True(bool)|1(int8)|1(int16)|            1(int32)|            1(int64)|1(uint8)|1(uint16)|1(uint32)|1(uint64)|a(object)|1970-01-01 00:00:00(datetime64[ns])|1970-01-01 00:00:00-05:00(datetime64[ns, US/Eastern])|1.0(float64)|[1 2 3](object(array))|A(category)|1 days 00:00:00(timedelta64[ns])| # noqa
# +----------------------+----------+-------+--------+--------------------+--------------------+--------+---------+---------+---------+---------+-----------------------------------+-----------------------------------------------------+------------+----------------------+-----------+--------------------------------+ # noqa
# |               boolean|      True|   True|    True|                True|                True|    True|     True|     True|     True|        X|                              False|                                                False|       False|                     X|          X|                           False| # noqa
# |               tinyint|         1|      1|       1|                   1|                   1|       X|        X|        X|        X|        X|                                  X|                                                    X|           1|                     X|          0|                               X| # noqa
# |              smallint|         1|      1|       1|                   1|                   1|       1|        X|        X|        X|        X|                                  X|                                                    X|           1|                     X|          X|                               X| # noqa
# |                   int|         1|      1|       1|                   1|                   1|       1|        1|        X|        X|        X|                                  X|                                                    X|           1|                     X|          X|                               X| # noqa
# |                bigint|         1|      1|       1|                   1|                   1|       1|        1|        1|        X|        X|                                  0|                                       18000000000000|           1|                     X|          X|                               X| # noqa
# |                string|       u''|u'\x01'| u'\x01'|             u'\x01'|             u'\x01'| u'\x01'|  u'\x01'|  u'\x01'|  u'\x01'|     u'a'|                                  X|                                                    X|         u''|                     X|          X|                               X| # noqa
# |                  date|         X|      X|       X|datetime.date(197...|                   X|       X|        X|        X|        X|        X|               datetime.date(197...|                                                    X|           X|                     X|          X|                               X| # noqa
# |             timestamp|         X|      X|       X|                   X|datetime.datetime...|       X|        X|        X|        X|        X|               datetime.datetime...|                                 datetime.datetime...|           X|                     X|          X|                               X| # noqa
# |                 float|       1.0|    1.0|     1.0|                 1.0|                 1.0|     1.0|      1.0|      1.0|      1.0|        X|                                  X|                                                    X|         1.0|                     X|          X|                               X| # noqa
# |                double|       1.0|    1.0|     1.0|                 1.0|                 1.0|     1.0|      1.0|      1.0|      1.0|        X|                                  X|                                                    X|         1.0|                     X|          X|                               X| # noqa
# |            array<int>|         X|      X|       X|                   X|                   X|       X|        X|        X|        X|        X|                                  X|                                                    X|           X|             [1, 2, 3]|          X|                               X| # noqa
# |                binary|         X|      X|       X|                   X|                   X|       X|        X|        X|        X|        X|                                  X|                                                    X|           X|                     X|          X|                               X| # noqa
# |         decimal(10,0)|         X|      X|       X|                   X|                   X|       X|        X|        X|        X|        X|                                  X|                                                    X|           X|                     X|          X|                               X| # noqa
# |       map<string,int>|         X|      X|       X|                   X|                   X|       X|        X|        X|        X|        X|                                  X|                                                    X|           X|                     X|          X|                               X| # noqa
# |        struct<_1:int>|         X|      X|       X|                   X|                   X|       X|        X|        X|        X|        X|                                  X|                                                    X|           X|                     X|          X|                               X| # noqa
# +----------------------+----------+-------+--------+--------------------+--------------------+--------+---------+---------+---------+---------+-----------------------------------+-----------------------------------------------------+------------+----------------------+-----------+--------------------------------+ # noqa
{code}
was:
Currently, UDF's type coercion is not cleanly defined. See also https://github.com/apache/spark/pull/22610 and https://github.com/apache/spark/pull/22610
This JIRA targets to describe the type conversion logic internally. For instance:
{code}
# +----------------------+----------+-------+--------+--------------------+--------------------+--------+---------+---------+---------+---------+-----------------------------------+-----------------------------------------------------+------------+----------------------+-----------+--------------------------------+ # noqa
# |SQL Type \ Pandas Type|True(bool)|1(int8)|1(int16)|            1(int32)|            1(int64)|1(uint8)|1(uint16)|1(uint32)|1(uint64)|a(object)|1970-01-01 00:00:00(datetime64[ns])|1970-01-01 00:00:00-05:00(datetime64[ns, US/Eastern])|1.0(float64)|[1 2 3](object(array))|A(category)|1 days 00:00:00(timedelta64[ns])| # noqa
# +----------------------+----------+-------+--------+--------------------+--------------------+--------+---------+---------+---------+---------+-----------------------------------+-----------------------------------------------------+------------+----------------------+-----------+--------------------------------+ # noqa
# |               boolean|      True|   True|    True|                True|                True|    True|     True|     True|     True|        X|                              False|                                                False|       False|                     X|          X|                           False| # noqa
# |               tinyint|         1|      1|       1|                   1|                   1|       X|        X|        X|        X|        X|                                  X|                                                    X|           1|                     X|          0|                               X| # noqa
# |              smallint|         1|      1|       1|                   1|                   1|       1|        X|        X|        X|        X|                                  X|                                                    X|           1|                     X|          X|                               X| # noqa
# |                   int|         1|      1|       1|                   1|                   1|       1|        1|        X|        X|        X|                                  X|                                                    X|           1|                     X|          X|                               X| # noqa
# |                bigint|         1|      1|       1|                   1|                   1|       1|        1|        1|        X|        X|                                  0|                                       18000000000000|           1|                     X|          X|                               X| # noqa
# |                string|       u''|u'\x01'| u'\x01'|             u'\x01'|             u'\x01'| u'\x01'|  u'\x01'|  u'\x01'|  u'\x01'|     u'a'|                                  X|                                                    X|         u''|                     X|          X|                               X| # noqa
# |                  date|         X|      X|       X|datetime.date(197...|                   X|       X|        X|        X|        X|        X|               datetime.date(197...|                                                    X|           X|                     X|          X|                               X| # noqa
# |             timestamp|         X|      X|       X|                   X|datetime.datetime...|       X|        X|        X|        X|        X|               datetime.datetime...|                                 datetime.datetime...|           X|                     X|          X|                               X| # noqa
# |                 float|       1.0|    1.0|     1.0|                 1.0|                 1.0|     1.0|      1.0|      1.0|      1.0|        X|                                  X|                                                    X|         1.0|                     X|          X|                               X| # noqa
# |                double|       1.0|    1.0|     1.0|                 1.0|                 1.0|     1.0|      1.0|      1.0|      1.0|        X|                                  X|                                                    X|         1.0|                     X|          X|                               X| # noqa
# |            array<int>|         X|      X|       X|                   X|                   X|       X|        X|        X|        X|        X|                                  X|                                                    X|           X|             [1, 2, 3]|          X|                               X| # noqa
# |                binary|         X|      X|       X|                   X|                   X|       X|        X|        X|        X|        X|                                  X|                                                    X|           X|                     X|          X|                               X| # noqa
# |         decimal(10,0)|         X|      X|       X|                   X|                   X|       X|        X|        X|        X|        X|                                  X|                                                    X|           X|                     X|          X|                               X| # noqa
# |       map<string,int>|         X|      X|       X|                   X|                   X|       X|        X|        X|        X|        X|                                  X|                                                    X|           X|                     X|          X|                               X| # noqa
# |        struct<_1:int>|         X|      X|       X|                   X|                   X|       X|        X|        X|        X|        X|                                  X|                                                    X|           X|                     X|          X|                               X| # noqa
# +----------------------+----------+-------+--------+--------------------+--------------------+--------+---------+---------+---------+---------+-----------------------------------+-----------------------------------------------------+------------+----------------------+-----------+--------------------------------+ # noqa
{code}
> Internally document type conversion between Pandas data and SQL types in Pandas UDFs
> ------------------------------------------------------------------------------------
>
> Key: SPARK-25798
> URL: https://issues.apache.org/jira/browse/SPARK-25798
> Project: Spark
> Issue Type: Sub-task
> Components: PySpark
> Affects Versions: 2.4.0
> Reporter: Hyukjin Kwon
> Priority: Minor