
Posted to issues@spark.apache.org by "holdenk (JIRA)" <ji...@apache.org> on 2018/09/20 01:35:00 UTC

[jira] [Commented] (SPARK-25467) Python date/datetime objects in dataframes increment by 1 day when converted to JSON

    [ https://issues.apache.org/jira/browse/SPARK-25467?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16621391#comment-16621391 ] 

holdenk commented on SPARK-25467:
---------------------------------

cc [~bryanc]

> Python date/datetime objects in dataframes increment by 1 day when converted to JSON
> ------------------------------------------------------------------------------------
>
>                 Key: SPARK-25467
>                 URL: https://issues.apache.org/jira/browse/SPARK-25467
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark, SQL
>    Affects Versions: 2.3.1
>         Environment: Spark 2.3.1
> Python 3.6.5 | packaged by conda-forge | (default, Apr  6 2018, 13:39:56) 
> [GCC 4.8.2 20140120 (Red Hat 4.8.2-15)]
> openjdk version "1.8.0_181"
> OpenJDK Runtime Environment (build 1.8.0_181-b13)
> OpenJDK 64-Bit Server VM (build 25.181-b13, mixed mode)
> Centos 7 3.10.0-862.11.6.el7.x86_64 #1 SMP Tue Aug 14 21:49:04 UTC 2018 x86_64 x86_64 GNU/Linux
>            Reporter: David V. Hill
>            Priority: Major
>
> When a DataFrame contains datetime.date or datetime.datetime instances and toJSON() is called on it, the dates in the JSON output are shifted forward (by two days in the examples below).
> {code}
> import datetime
> from pyspark.sql import Row, SQLContext
> # Assumption: sqc is a SQLContext built from the pyspark shell's SparkContext sc
> sqc = SQLContext(sc)
> # Create a DataFrame containing datetime.date instances, convert to JSON and display
> rows = [Row(cx=1, cy=2, dates=[datetime.date.fromordinal(1), datetime.date.fromordinal(2)])]
> df = sqc.createDataFrame(rows)
> df.collect()
> # -> [Row(cx=1, cy=2, dates=[datetime.date(1, 1, 1), datetime.date(1, 1, 2)])]
> df.toJSON().collect()
> # -> ['{"cx":1,"cy":2,"dates":["0001-01-03","0001-01-04"]}']
> # The issue also occurs with datetime.datetime instances
> rows = [Row(cx=1, cy=2, dates=[datetime.datetime.fromordinal(1), datetime.datetime.fromordinal(2)])]
> df = sqc.createDataFrame(rows)
> df.collect()
> # -> [Row(cx=1, cy=2, dates=[datetime.datetime(1, 1, 1, 0, 0, fold=1), datetime.datetime(1, 1, 2, 0, 0)])]
> df.toJSON().collect()
> # -> ['{"cx":1,"cy":2,"dates":["0001-01-02T23:50:36.000-06:00","0001-01-03T23:50:36.000-06:00"]}']
> {code}
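A hedged illustration of one plausible cause (an assumption, not something established in this thread): Python's datetime uses the proleptic Gregorian calendar, while Java's legacy java.util.GregorianCalendar switches to the Julian calendar for dates before 15 October 1582. If the JVM side labels pre-1582 days in the Julian calendar, the day Python calls 0001-01-01 would be written as 0001-01-03, which matches the JSON output above. The sketch below uses a hypothetical helper, `julian_calendar_label`, that converts a proleptic Gregorian date to its Julian Day Number and then back to a Julian-calendar label using integer arithmetic.

```python
import datetime

# Offset between Python's proleptic Gregorian ordinal (0001-01-01 == 1)
# and the Julian Day Number (JDN) of the same day.
JDN_OFFSET = 1721425


def julian_calendar_label(d: datetime.date) -> str:
    """Hypothetical helper: label the same physical day in the Julian calendar.

    d is interpreted in the proleptic Gregorian calendar (as Python does),
    converted to a Julian Day Number, then converted back to year/month/day
    using the Julian calendar (no Gregorian century correction).
    """
    jdn = d.toordinal() + JDN_OFFSET
    f = jdn + 1401  # Julian-calendar variant: the Gregorian correction term is omitted
    e = 4 * f + 3
    g = (e % 1461) // 4
    h = 5 * g + 2
    day = (h % 153) // 5 + 1
    month = (h // 153 + 2) % 12 + 1
    year = e // 1461 - 4716 + (12 + 2 - month) // 12
    return f"{year:04d}-{month:02d}-{day:02d}"


# Same inputs as the report: date.fromordinal(1) and date.fromordinal(2)
for ordinal in (1, 2):
    d = datetime.date.fromordinal(ordinal)
    print(d.isoformat(), "->", julian_calendar_label(d))
# 0001-01-01 -> 0001-01-03
# 0001-01-02 -> 0001-01-04
```

The labels produced for the two reported inputs are exactly the strings that appear in the toJSON() output, which is why a Gregorian/Julian mismatch is a reasonable hypothesis here; for modern dates the same helper gives a 13-day difference (1970-01-01 becomes 1969-12-19), so the shift is not a constant "one day".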



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
