Posted to issues@spark.apache.org by "Maxim Gekk (Jira)" <ji...@apache.org> on 2021/02/06 19:32:00 UTC

[jira] [Commented] (SPARK-34386) "Proleptic" date off by 10 days when returned by .collectAsList

    [ https://issues.apache.org/jira/browse/SPARK-34386?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17280268#comment-17280268 ] 

Maxim Gekk commented on SPARK-34386:
------------------------------------

[~bysza] Thanks for the ping. This is actually expected behavior. The collectAsList() method converts internal timestamp values (in the Proleptic Gregorian calendar) to java.sql.Timestamp, which is based on the hybrid calendar (Julian + Gregorian). The timestamp from your example doesn't exist in the hybrid calendar, which skips 1582-10-05 through 1582-10-14, so Spark shifts it to the closest valid date, which is 1582-10-15. If you want to receive timestamps as is from collectAsList(), please switch to Java 8 types (java.time.Instant for timestamps) via *spark.sql.datetime.java8API.enabled*.
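
The calendar mismatch described above can be reproduced without Spark, using only JDK classes. The following is a minimal sketch, not Spark's own conversion code: the object name CalendarDemo and its two helper methods are hypothetical and exist only for illustration. It parses the literal from the report once with java.time (proleptic Gregorian) and once with java.sql.Timestamp (hybrid calendar).

```scala
import java.sql.Timestamp
import java.time.LocalDateTime
import java.time.format.DateTimeFormatter

// Hypothetical helper object, for illustration only (not part of Spark).
object CalendarDemo {
  // Pattern matching the timestamp literal used in the report.
  private val fmt = DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss.SSS")

  // java.time uses the proleptic Gregorian (ISO) calendar, so
  // 1582-10-05 is an ordinary date and round-trips unchanged.
  def viaProleptic(s: String): String =
    LocalDateTime.parse(s, fmt).format(fmt)

  // java.sql.Timestamp resolves its fields through the hybrid
  // Julian + Gregorian calendar, in which 1582-10-05 .. 1582-10-14
  // do not exist; a date in that gap is moved forward.
  def viaHybrid(s: String): String =
    Timestamp.valueOf(s).toString
}
```

Calling CalendarDemo.viaProleptic("1582-10-05 02:12:34.997") should return the input unchanged, while CalendarDemo.viaHybrid on the same string should return 1582-10-15 02:12:34.997, matching the 10-day shift seen in the collectAsList() output.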

> "Proleptic" date off by 10 days when returned by .collectAsList
> ---------------------------------------------------------------
>
>                 Key: SPARK-34386
>                 URL: https://issues.apache.org/jira/browse/SPARK-34386
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 3.0.1
>         Environment: Windows 10
>            Reporter: Marek Byszewski
>            Priority: Major
>
> Run the following commands using Spark 3.0.1:
> scala> spark.sql("select to_timestamp('1582-10-05 02:12:34.997') as data_console").show(false)
> +-----------------------+
> |data_console           |
> +-----------------------+
> |1582-10-05 02:12:34.997|
> +-----------------------+
> scala> spark.sql("select to_timestamp('1582-10-05 02:12:34.997') as data_console")
> res3: org.apache.spark.sql.DataFrame = [data_console: timestamp]
> scala> res3.collectAsList
> res4: java.util.List[org.apache.spark.sql.Row] = [[1582-10-15 02:12:34.997]]
> Notice that the returned date is off by 10 days compared to the date returned by the first command.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
