Posted to issues@spark.apache.org by "Yin Huai (JIRA)" <ji...@apache.org> on 2015/09/09 20:04:46 UTC

[jira] [Updated] (SPARK-10519) Investigate if we should encode timezone information to a timestamp value stored in JSON

     [ https://issues.apache.org/jira/browse/SPARK-10519?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yin Huai updated SPARK-10519:
-----------------------------
    Target Version/s: 1.6.0

> Investigate if we should encode timezone information to a timestamp value stored in JSON
> ----------------------------------------------------------------------------------------
>
>                 Key: SPARK-10519
>                 URL: https://issues.apache.org/jira/browse/SPARK-10519
>             Project: Spark
>          Issue Type: Task
>          Components: SQL
>            Reporter: Yin Huai
>            Priority: Minor
>
> Since Spark 1.3, we have stored timestamps in JSON without encoding timezone information, and the string representation of a timestamp stored in JSON implicitly uses the local timezone (see [1|https://github.com/apache/spark/blob/branch-1.3/sql/core/src/main/scala/org/apache/spark/sql/json/JsonRDD.scala#L454], [2|https://github.com/apache/spark/blob/branch-1.4/sql/core/src/main/scala/org/apache/spark/sql/json/JacksonGenerator.scala#L38], [3|https://github.com/apache/spark/blob/branch-1.5/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/json/JacksonGenerator.scala#L41], [4|https://github.com/apache/spark/blob/branch-1.5/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/json/JacksonGenerator.scala#L93]). This behavior may cause data consumers to get different values when they are in a different timezone from the data producers.
> Since JSON is string based, if we encode timezone information into the timestamp value, downstream applications may need to change their code (for example, java.sql.Timestamp.valueOf only supports the format {{yyyy-\[m]m-\[d]d hh:mm:ss\[.f...]}}).
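As a quick illustration of the compatibility concern above (a minimal sketch, not Spark code): java.sql.Timestamp.valueOf accepts only the zone-less format, so appending a zone offset to the stored string would break consumers that parse it this way. Note also that printing the resulting Timestamp renders it in the JVM's default timezone, which is the implicit-local-timezone behavior described here.

```java
import java.sql.Timestamp;

public class TimestampFormatDemo {
    public static void main(String[] args) {
        // Accepted: yyyy-[m]m-[d]d hh:mm:ss[.f...], with no zone suffix.
        Timestamp ts = Timestamp.valueOf("2015-09-09 20:04:46.123");
        System.out.println(ts); // printed using the JVM's default timezone

        try {
            // A zone-qualified string does not match the supported format.
            Timestamp.valueOf("2015-09-09 20:04:46.123+00:00");
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: zone suffix not supported by valueOf");
        }
    }
}
```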
> We should investigate what to do about this issue. Right now, I can think of three options:
> 1. Encode timezone info in the timestamp value. This can break user code and may change the semantics of timestamps (our timestamp values are timezone-less).
> 2. When saving a timestamp value to JSON, treat the value as being in the local timezone and convert it to UTC time. Then, when saving the data, do not encode timezone info in the value.
> 3. Keep the current behavior, but explicitly document that users need to use a single timezone for their datasets (e.g. always use UTC time).
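Option 2 could be sketched as follows (a hypothetical helper, not the actual Spark implementation): interpret the timestamp in the JVM's default timezone, shift it to UTC, and still emit the zone-less yyyy-MM-dd HH:mm:ss string so that parsers like java.sql.Timestamp.valueOf keep working.

```java
import java.sql.Timestamp;
import java.time.ZoneId;
import java.time.format.DateTimeFormatter;

public class UtcJsonWriter {
    // Zone-less output format, compatible with java.sql.Timestamp.valueOf.
    static final DateTimeFormatter FMT =
        DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss");

    // Hypothetical helper for option 2: the Timestamp's instant already
    // reflects the local-timezone interpretation used when it was created
    // (e.g. by Timestamp.valueOf), so rendering that instant as UTC
    // wall-clock time normalizes the stored string across timezones.
    static String toUtcJsonString(Timestamp ts) {
        return ts.toInstant()
                 .atZone(ZoneId.of("UTC"))
                 .format(FMT);
    }
}
```

With this scheme, a producer in America/Los_Angeles saving local time 2015-09-09 12:00:00 would write 2015-09-09 19:00:00, and a consumer in any timezone reads the same UTC value back.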



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org