Posted to issues@spark.apache.org by "Tomi Ruokola (Jira)" <ji...@apache.org> on 2020/05/19 09:47:00 UTC

[jira] [Created] (SPARK-31758) Incorrect timestamp parsing from JSON

Tomi Ruokola created SPARK-31758:
------------------------------------

             Summary: Incorrect timestamp parsing from JSON
                 Key: SPARK-31758
                 URL: https://issues.apache.org/jira/browse/SPARK-31758
             Project: Spark
          Issue Type: Bug
          Components: SQL
    Affects Versions: 2.4.5
            Reporter: Tomi Ruokola


Parsing a JSON string into a TimestampType column with from_json can give incorrect results.
{code:python}
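from pyspark.sql.functions import from_json
from pyspark.sql.types import StructType, StructField, TimestampType
# Assumes an active SparkSession named `spark`.
schema = StructType([StructField("timestamp", TimestampType())])
df = spark.createDataFrame([('{"timestamp":"2020-01-01T20:00:00.900125Z"}', )], ["body"])
df.select(from_json("body", schema)).collect(){code}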
Output:
{code:python}
datetime.datetime(2020, 1, 1, 20, 15, 0, 125000){code}
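This seems to happen when the timestamp has sub-millisecond precision and a Z suffix. For example, if the fraction is .900125, the output fraction is .125 and 900 seconds are added to the timestamp. That is consistent with the six fraction digits being read as 900125 milliseconds (900 s + 125 ms) rather than as microseconds.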
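The arithmetic behind the observed value can be checked outside Spark. The following is only a sketch in plain Python, not Spark's parser code: the variable names are hypothetical, and the milliseconds reading is an assumption based on the output above, not a confirmed root cause.

```python
from datetime import datetime, timedelta

# Hypothetical illustration; this is not Spark's parsing code.
base = datetime(2020, 1, 1, 20, 0, 0)
fraction_digits = "900125"  # fraction digits taken from the JSON value

# Intended reading: the six digits are microseconds.
expected = base + timedelta(microseconds=int(fraction_digits))

# Assumed misreading: the same digits treated as milliseconds,
# i.e. 900125 ms = 900 s + 125 ms.
observed = base + timedelta(milliseconds=int(fraction_digits))

print(expected)  # 2020-01-01 20:00:00.900125
print(observed)  # 2020-01-01 20:15:00.125000
```

Under that assumption the second value matches the datetime reported in the output above.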

Workaround: Adding the timestampFormat option fixes the problem, even if the format string is not exactly correct.
{code:python}
df.select(from_json("body", schema, {"timestampFormat": "yyyy-MM-dd HH:mm:ss"})).collect()
{code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
