Posted to issues@spark.apache.org by "Davies Liu (JIRA)" <ji...@apache.org> on 2016/04/28 01:05:13 UTC

[jira] [Closed] (SPARK-13323) Type cast support in type inference during merging types.

     [ https://issues.apache.org/jira/browse/SPARK-13323?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Davies Liu closed SPARK-13323.
------------------------------
    Resolution: Not A Problem

> Type cast support in type inference during merging types.
> ---------------------------------------------------------
>
>                 Key: SPARK-13323
>                 URL: https://issues.apache.org/jira/browse/SPARK-13323
>             Project: Spark
>          Issue Type: Improvement
>          Components: PySpark
>    Affects Versions: 2.0.0
>            Reporter: Hyukjin Kwon
>
> {{types.py}} contains the note {{TODO: type cast (such as int -> long)}}.
> Currently, PySpark infers types but, when merging schemas, does not try to find a compatible type if the inferred types differ.
> I think this can be done by mirroring {{HiveTypeCoercion.findTightestCommonTypeOfTwo}} for numeric types and, when either of the two types is {{StringType}}, converting both to string.
> It looks like the possible leaf data types are the following:
> {code}
> # Mapping Python types to Spark SQL DataType
> _type_mappings = {
>     type(None): NullType,
>     bool: BooleanType,
>     int: LongType,
>     float: DoubleType,
>     str: StringType,
>     bytearray: BinaryType,
>     decimal.Decimal: DecimalType,
>     datetime.date: DateType,
>     datetime.datetime: TimestampType,
>     datetime.time: TimestampType,
> }
> {code}
> and they convert cleanly to strings, as shown below:
> {code}
> >>> print str(None)
> None
> >>> print str(True)
> True
> >>> print str(float(0.1))
> 0.1
> >>> str(bytearray([255]))
> '\xff'
> >>> str(decimal.Decimal())
> '0'
> >>> str(datetime.date(1,1,1))
> '0001-01-01'
> >>> str(datetime.datetime(1,1,1))
> '0001-01-01 00:00:00'
> >>> str(datetime.time(1,1,1))
> '01:01:01'
> {code}
> I searched for an existing issue covering this but could not find one. Please mark this as a duplicate if one already exists.
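
The merging rule proposed in the description can be sketched in plain Python. This is only an illustration under stated assumptions, not PySpark's implementation: merge_leaf_types and NUMERIC_PRECEDENCE are hypothetical names, type names are plain strings so the snippet runs without Spark, and the numeric ordering is assumed to follow the usual widening order (byte, short, int, long, float, double). DecimalType is deliberately left out of the numeric rule.

```python
# Hypothetical sketch of the proposed merge rule; not PySpark code.
# Type names are plain strings so the example runs without Spark installed.

# Assumed widening order for non-decimal numeric types, narrowest first.
NUMERIC_PRECEDENCE = [
    "ByteType", "ShortType", "IntegerType",
    "LongType", "FloatType", "DoubleType",
]


def merge_leaf_types(a, b):
    """Return a type name compatible with both a and b, or raise TypeError."""
    if a == b:
        return a
    # A null on one side adopts the type of the other side.
    if a == "NullType":
        return b
    if b == "NullType":
        return a
    # Two numeric types widen to whichever comes later in the precedence list.
    if a in NUMERIC_PRECEDENCE and b in NUMERIC_PRECEDENCE:
        return max(a, b, key=NUMERIC_PRECEDENCE.index)
    # If either side is a string, fall back to string, as the issue suggests.
    if "StringType" in (a, b):
        return "StringType"
    raise TypeError("cannot merge %s and %s" % (a, b))


if __name__ == "__main__":
    print(merge_leaf_types("LongType", "DoubleType"))
    print(merge_leaf_types("DateType", "StringType"))
```

Under these assumptions, merging LongType with DoubleType yields DoubleType, merging DateType with StringType yields StringType, and unrelated pairs such as BooleanType and DateType still raise an error.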



