Posted to issues@spark.apache.org by "Hyukjin Kwon (JIRA)" <ji...@apache.org> on 2018/10/22 10:47:00 UTC
[jira] [Created] (SPARK-25798) Internally document type conversion
between Pandas data and SQL types in Pandas UDFs
Hyukjin Kwon created SPARK-25798:
------------------------------------
Summary: Internally document type conversion between Pandas data and SQL types in Pandas UDFs
Key: SPARK-25798
URL: https://issues.apache.org/jira/browse/SPARK-25798
Project: Spark
Issue Type: Sub-task
Components: PySpark
Affects Versions: 2.4.0
Reporter: Hyukjin Kwon
Currently, type coercion in UDFs is not cleanly defined. See also https://github.com/apache/spark/pull/22610
This JIRA aims to document the type conversion logic internally. For instance, the table below shows the value returned for each SQL return type (rows) given a Pandas value of each type (columns); X marks an unsupported conversion:
{code}
# +----------------------+----------+-------+--------+--------------------+--------------------+--------+---------+---------+---------+---------+-----------------------------------+-----------------------------------------------------+------------+----------------------+-----------+--------------------------------+  # noqa
# |SQL Type \ Pandas Type|True(bool)|1(int8)|1(int16)|            1(int32)|            1(int64)|1(uint8)|1(uint16)|1(uint32)|1(uint64)|a(object)|1970-01-01 00:00:00(datetime64[ns])|1970-01-01 00:00:00-05:00(datetime64[ns, US/Eastern])|1.0(float64)|[1 2 3](object(array))|A(category)|1 days 00:00:00(timedelta64[ns])|  # noqa
# +----------------------+----------+-------+--------+--------------------+--------------------+--------+---------+---------+---------+---------+-----------------------------------+-----------------------------------------------------+------------+----------------------+-----------+--------------------------------+  # noqa
# |               boolean|      True|   True|    True|                True|                True|    True|     True|     True|     True|        X|                              False|                                                False|       False|                     X|          X|                           False|  # noqa
# |               tinyint|         1|      1|       1|                   1|                   1|       X|        X|        X|        X|        X|                                  X|                                                    X|           1|                     X|          0|                               X|  # noqa
# |              smallint|         1|      1|       1|                   1|                   1|       1|        X|        X|        X|        X|                                  X|                                                    X|           1|                     X|          X|                               X|  # noqa
# |                   int|         1|      1|       1|                   1|                   1|       1|        1|        X|        X|        X|                                  X|                                                    X|           1|                     X|          X|                               X|  # noqa
# |                bigint|         1|      1|       1|                   1|                   1|       1|        1|        1|        X|        X|                                  0|                                       18000000000000|           1|                     X|          X|                               X|  # noqa
# |                string|       u''|u'\x01'| u'\x01'|             u'\x01'|             u'\x01'| u'\x01'|  u'\x01'|  u'\x01'|  u'\x01'|     u'a'|                                  X|                                                    X|         u''|                     X|          X|                               X|  # noqa
# |                  date|         X|      X|       X|datetime.date(197...|                   X|       X|        X|        X|        X|        X|               datetime.date(197...|                                                    X|           X|                     X|          X|                               X|  # noqa
# |             timestamp|         X|      X|       X|                   X|datetime.datetime...|       X|        X|        X|        X|        X|               datetime.datetime...|                                 datetime.datetime...|           X|                     X|          X|                               X|  # noqa
# |                 float|       1.0|    1.0|     1.0|                 1.0|                 1.0|     1.0|      1.0|      1.0|      1.0|        X|                                  X|                                                    X|         1.0|                     X|          X|                               X|  # noqa
# |                double|       1.0|    1.0|     1.0|                 1.0|                 1.0|     1.0|      1.0|      1.0|      1.0|        X|                                  X|                                                    X|         1.0|                     X|          X|                               X|  # noqa
# |            array<int>|         X|      X|       X|                   X|                   X|       X|        X|        X|        X|        X|                                  X|                                                    X|           X|             [1, 2, 3]|          X|                               X|  # noqa
# |                binary|         X|      X|       X|                   X|                   X|       X|        X|        X|        X|        X|                                  X|                                                    X|           X|                     X|          X|                               X|  # noqa
# |         decimal(10,0)|         X|      X|       X|                   X|                   X|       X|        X|        X|        X|        X|                                  X|                                                    X|           X|                     X|          X|                               X|  # noqa
# |       map<string,int>|         X|      X|       X|                   X|                   X|       X|        X|        X|        X|        X|                                  X|                                                    X|           X|                     X|          X|                               X|  # noqa
# |        struct<_1:int>|         X|      X|       X|                   X|                   X|       X|        X|        X|        X|        X|                                  X|                                                    X|           X|                     X|          X|                               X|  # noqa
# +----------------------+----------+-------+--------+--------------------+--------------------+--------+---------+---------+---------+---------+-----------------------------------+-----------------------------------------------------+------------+----------------------+-----------+--------------------------------+  # noqa
{code}
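The two bigint cells under the datetime columns (0 and 18000000000000) look consistent with the timestamps being read as nanoseconds since the Unix epoch, which is how datetime64[ns] stores its values. Below is a minimal standard-library sketch of that arithmetic. It is not Spark's or Arrow's actual conversion path: the helper name epoch_nanoseconds is hypothetical, naive values are assumed to be UTC, and a fixed -05:00 offset stands in for US/Eastern on that date.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def epoch_nanoseconds(ts):
    """Nanoseconds since the Unix epoch (hypothetical helper for illustration)."""
    # Assumption: a naive timestamp is interpreted as UTC.
    if ts.tzinfo is None:
        ts = ts.replace(tzinfo=timezone.utc)
    delta = ts - EPOCH
    # Integer arithmetic only, so no floating-point rounding is involved.
    seconds = delta.days * 86_400 + delta.seconds
    return seconds * 1_000_000_000 + delta.microseconds * 1_000


# The two sample values from the table's datetime columns.
naive = datetime(1970, 1, 1)
eastern = datetime(1970, 1, 1, tzinfo=timezone(timedelta(hours=-5)))

print(epoch_nanoseconds(naive))    # 0
print(epoch_nanoseconds(eastern))  # 18000000000000
```

Midnight at -05:00 is 05:00 UTC, i.e. 5 * 3600 = 18000 seconds after the epoch, which gives 18000000000000 when expressed in nanoseconds.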