Posted to issues@spark.apache.org by "Reynold Xin (JIRA)" <ji...@apache.org> on 2015/05/15 00:47:02 UTC

[jira] [Updated] (SPARK-7278) Inconsistent handling of dates in PySpark's Row object

     [ https://issues.apache.org/jira/browse/SPARK-7278?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Reynold Xin updated SPARK-7278:
-------------------------------
    Fix Version/s: 1.3.2

> Inconsistent handling of dates in PySpark's Row object
> ------------------------------------------------------
>
>                 Key: SPARK-7278
>                 URL: https://issues.apache.org/jira/browse/SPARK-7278
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark
>    Affects Versions: 1.3.1
>            Reporter: Kalle Jepsen
>            Assignee: Kalle Jepsen
>             Fix For: 1.3.2, 1.4.0
>
>
> Consider the following Python code:
> {code:none}
> import datetime
> rdd = sc.parallelize([[0, datetime.date(2014, 11, 11)], [1, datetime.date(2015,6,4)]])
> df = rdd.toDF(schema=['rid', 'date'])
> row = df.first()
> {code}
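> For reference, the inferred schema marks the second column as a date. The output below is what {{printSchema}} should produce for this DataFrame (reconstructed for illustration, not copied from an actual session):
> {code:none}
> >>> df.printSchema()
> root
>  |-- rid: long (nullable = true)
>  |-- date: date (nullable = true)
> {code}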
> Accessing the {{date}} column via {{__getitem__}} returns a {{datetime.datetime}} instance
> {code:none}
> >>> row[1]
> datetime.datetime(2014, 11, 11, 0, 0)
> {code}
> while access via {{getattr}} returns a {{datetime.date}} instance:
> {code:none}
> >>> row.date
> datetime.date(2014, 11, 11)
> {code}
> The problem seems to be that Java deserializes the {{datetime.date}} objects to {{datetime.datetime}}. This is taken care of [here|https://github.com/apache/spark/blob/master/python/pyspark/sql/_types.py#L1027] when using {{getattr}}, but is overlooked when the tuple is accessed directly by index.
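> A fix could apply the same conversion on positional access. The sketch below only illustrates the idea; the class name and the per-column flag tuple are hypothetical, and a real patch would derive the flags from the schema's {{DateType}} fields rather than take them as a constructor argument:
> {code:none}
> import datetime
>
> class DateAwareRow(tuple):
>     """Sketch: mirror the __getattr__ date conversion on __getitem__."""
>     def __new__(cls, values, date_columns):
>         self = tuple.__new__(cls, values)
>         self._date_columns = date_columns  # one bool per column, from the schema
>         return self
>
>     def __getitem__(self, i):
>         value = tuple.__getitem__(self, i)
>         if self._date_columns[i] and isinstance(value, datetime.datetime):
>             return value.date()  # same coercion __getattr__ already performs
>         return value
>
> r = DateAwareRow((0, datetime.datetime(2014, 11, 11, 0, 0)), (False, True))
> r[1]  # datetime.date(2014, 11, 11)
> {code}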
> Is there an easy way to fix this?
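> In the meantime, a workaround on the caller's side is to normalize values after index access. This helper is a sketch of mine, not part of PySpark, and it is only safe for columns known to hold dates (a genuine timestamp would be truncated):
> {code:none}
> import datetime
>
> def as_date(value):
>     """Coerce a datetime.datetime back to a date; pass other values through."""
>     if isinstance(value, datetime.datetime):
>         return value.date()
>     return value
>
> as_date(row[1])  # datetime.date(2014, 11, 11), now matching row.date
> {code}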



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org