You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@spark.apache.org by "JinxinTang (Jira)" <ji...@apache.org> on 2020/05/01 07:00:00 UTC

[jira] [Commented] (SPARK-31598) LegacySimpleTimestampFormatter incorrectly interprets pre-Gregorian timestamps

    [ https://issues.apache.org/jira/browse/SPARK-31598?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17097202#comment-17097202 ] 

JinxinTang commented on SPARK-31598:
------------------------------------

already fix: [#anchor]https://issues.apache.org/jira/browse/SPARK-31557

> LegacySimpleTimestampFormatter incorrectly interprets pre-Gregorian timestamps
> ------------------------------------------------------------------------------
>
>                 Key: SPARK-31598
>                 URL: https://issues.apache.org/jira/browse/SPARK-31598
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 3.0.0, 3.1.0
>            Reporter: Bruce Robbins
>            Priority: Major
>
> As per discussion with [~maxgekk]:
> {{LegacySimpleTimestampFormatter#parse}} misinterprets pre-Gregorian timestamps:
> {noformat}
> scala> sql("set spark.sql.legacy.timeParserPolicy=LEGACY")
> res0: org.apache.spark.sql.DataFrame = [key: string, value: string]
> scala> val df1 = Seq("0002-01-01 00:00:00", "1000-01-01 00:00:00", "1800-01-01 00:00:00").toDF("expected")
> df1: org.apache.spark.sql.DataFrame = [expected: string]
> scala> val df2 = df1.select('expected, to_timestamp('expected, "yyyy-MM-dd HH:mm:ss").as("actual"))
> df2: org.apache.spark.sql.DataFrame = [expected: string, actual: timestamp]
> scala> df2.show(truncate=false)
> +-------------------+-------------------+
> |expected           |actual             |
> +-------------------+-------------------+
> |0002-01-01 00:00:00|0001-12-30 00:00:00|
> |1000-01-01 00:00:00|1000-01-06 00:00:00|
> |1800-01-01 00:00:00|1800-01-01 00:00:00|
> +-------------------+-------------------+
> scala> 
> {noformat}
> Legacy timestamp parsing with JSON and CSV files is correct, so apparently {{LegacyFastTimestampFormatter}} does not have this issue (need to double check).



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org