You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by davies <gi...@git.apache.org> on 2016/06/20 20:33:00 UTC

[GitHub] spark pull request #13784: [SPARK-16078] [SQL] from_utc_timestamp/to_utc_tim...

GitHub user davies opened a pull request:

    https://github.com/apache/spark/pull/13784

    [SPARK-16078] [SQL] from_utc_timestamp/to_utc_timestamp should not depends on local timezone

    ## What changes were proposed in this pull request?
    
    Currently, we use local timezone to parse or format a timestamp (TimestampType), then use Long as the microseconds since epoch UTC.
    
    In from_utc_timestamp() and to_utc_timestamp(), we did not consider the local timezone, they could return different results with different local timezone.
    
    This PR will do the conversion based on human time (in local timezone), it should return same result in whatever timezone. But because the mapping from absolute timestamp to human time is not exactly one-to-one mapping, it will still return wrong result in some timezone (also in the begging or ending of DST).
    
    This PR is kind of the best effort fix. In long term, we should make the TimestampType be timezone aware to fix this totally.
    
    ## How was this patch tested?
    
    Tested these function in all timezone.


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/davies/spark convert_tz

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/13784.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #13784
    
----
commit 5c60bc6321bda90c85a26180c38be5744d68cc55
Author: Davies Liu <da...@databricks.com>
Date:   2016-06-20T19:48:35Z

    fix fromUTCTime/toUTCTime

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #13784: [SPARK-16078] [SQL] from_utc_timestamp/to_utc_tim...

Posted by hvanhovell <gi...@git.apache.org>.
Github user hvanhovell commented on a diff in the pull request:

    https://github.com/apache/spark/pull/13784#discussion_r67786054
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/datetimeExpressions.scala ---
    @@ -869,16 +870,17 @@ case class ToUTCTimestamp(left: Expression, right: Expression)
              """.stripMargin)
           } else {
             val tzTerm = ctx.freshName("tz")
    +        val utcTerm = ctx.freshName("utc")
             val tzClass = classOf[TimeZone].getName
             ctx.addMutableState(tzClass, tzTerm, s"""$tzTerm = $tzClass.getTimeZone("$tz");""")
    +        ctx.addMutableState(tzClass, utcTerm, s"""$utcTerm = $tzClass.getTimeZone("GMT");""")
    --- End diff --
    
    UTC? Universal Time Coordinated and Greenwich Mean Time are in practice the same (GMT is a timezone, UTC is not); but lets use one for consistency.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #13784: [SPARK-16078] [SQL] from_utc_timestamp/to_utc_timestamp ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/13784
  
    **[Test build #60888 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60888/consoleFull)** for PR 13784 at commit [`4bba902`](https://github.com/apache/spark/commit/4bba9020c4ea86f60a903b69aa4d95aa5705f5d8).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #13784: [SPARK-16078] [SQL] from_utc_timestamp/to_utc_timestamp ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/13784
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #13784: [SPARK-16078] [SQL] from_utc_timestamp/to_utc_timestamp ...

Posted by davies <gi...@git.apache.org>.
Github user davies commented on the issue:

    https://github.com/apache/spark/pull/13784
  
    cc @hvanhovell 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #13784: [SPARK-16078] [SQL] from_utc_timestamp/to_utc_timestamp ...

Posted by hvanhovell <gi...@git.apache.org>.
Github user hvanhovell commented on the issue:

    https://github.com/apache/spark/pull/13784
  
    LGTM - thanks! Merging to master/2.0


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #13784: [SPARK-16078] [SQL] from_utc_timestamp/to_utc_timestamp ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/13784
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/60890/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #13784: [SPARK-16078] [SQL] from_utc_timestamp/to_utc_timestamp ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/13784
  
    **[Test build #60890 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60890/consoleFull)** for PR 13784 at commit [`659c9fe`](https://github.com/apache/spark/commit/659c9fe08e2cc86ddbd6db1c33509611208eff85).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #13784: [SPARK-16078] [SQL] from_utc_timestamp/to_utc_timestamp ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/13784
  
    **[Test build #60890 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60890/consoleFull)** for PR 13784 at commit [`659c9fe`](https://github.com/apache/spark/commit/659c9fe08e2cc86ddbd6db1c33509611208eff85).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #13784: [SPARK-16078] [SQL] from_utc_timestamp/to_utc_tim...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/13784


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #13784: [SPARK-16078] [SQL] from_utc_timestamp/to_utc_tim...

Posted by hvanhovell <gi...@git.apache.org>.
Github user hvanhovell commented on a diff in the pull request:

    https://github.com/apache/spark/pull/13784#discussion_r67914174
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala ---
    @@ -886,23 +886,45 @@ object DateTimeUtils {
       }
     
       /**
    +   * Convert the timestamp `ts` from one timezone to another.
    +   *
    +   * TODO: Because of DST, the conversion between UTC and human time is not exactly one-to-one
    +   * mapping, the conversion here may return wrong result, we should make the timestamp
    +   * timezone-aware.
    +   */
    +  def convertTz(ts: SQLTimestamp, fromZone: TimeZone, toZone: TimeZone): SQLTimestamp = {
    +    // We always use local timezone to parse or format a timestamp
    +    val localZone = threadLocalLocalTimeZone.get()
    +    val utcTs = if (fromZone.getID == localZone.getID) {
    +      ts
    +    } else {
    +      // get the human time using local time zone, that actually is in fromZone.
    +      val localTs = ts + localZone.getOffset(ts / 1000L) * 1000L  // in fromZone
    +      localTs - getOffsetFromLocalMillis(localTs / 1000L, fromZone) * 1000L
    +    }
    +    if (toZone.getID == localZone.getID) {
    +      utcTs
    +    } else {
    +      val localTs2 = utcTs + toZone.getOffset(utcTs / 1000L) * 1000L  // in toZone
    +      // treat it as local timezone, convert to UTC (we could get the expected human time back)
    +      localTs2 - getOffsetFromLocalMillis(localTs2 / 1000L, localZone) * 1000L
    +    }
    +  }
    +
    +  /**
        * Returns a timestamp of given timezone from utc timestamp, with the same string
        * representation in their timezone.
        */
       def fromUTCTime(time: SQLTimestamp, timeZone: String): SQLTimestamp = {
    -    val tz = TimeZone.getTimeZone(timeZone)
    -    val offset = tz.getOffset(time / 1000L)
    -    time + offset * 1000L
    +    convertTz(time, TimeZoneGMT, TimeZone.getTimeZone(timeZone))
    --- End diff --
    
    Lets try to make it correct first. More optimizations are always possible.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #13784: [SPARK-16078] [SQL] from_utc_timestamp/to_utc_timestamp ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/13784
  
    **[Test build #60869 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60869/consoleFull)** for PR 13784 at commit [`5c60bc6`](https://github.com/apache/spark/commit/5c60bc6321bda90c85a26180c38be5744d68cc55).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #13784: [SPARK-16078] [SQL] from_utc_timestamp/to_utc_tim...

Posted by adrian-wang <gi...@git.apache.org>.
Github user adrian-wang commented on a diff in the pull request:

    https://github.com/apache/spark/pull/13784#discussion_r67795207
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala ---
    @@ -886,23 +886,45 @@ object DateTimeUtils {
       }
     
       /**
    +   * Convert the timestamp `ts` from one timezone to another.
    +   *
    +   * TODO: Because of DST, the conversion between UTC and human time is not exactly one-to-one
    +   * mapping, the conversion here may return wrong result, we should make the timestamp
    +   * timezone-aware.
    +   */
    +  def convertTz(ts: SQLTimestamp, fromZone: TimeZone, toZone: TimeZone): SQLTimestamp = {
    +    // We always use local timezone to parse or format a timestamp
    +    val localZone = threadLocalLocalTimeZone.get()
    +    val utcTs = if (fromZone.getID == localZone.getID) {
    +      ts
    +    } else {
    +      // get the human time using local time zone, that actually is in fromZone.
    +      val localTs = ts + localZone.getOffset(ts / 1000L) * 1000L  // in fromZone
    +      localTs - getOffsetFromLocalMillis(localTs / 1000L, fromZone) * 1000L
    +    }
    +    if (toZone.getID == localZone.getID) {
    +      utcTs
    +    } else {
    +      val localTs2 = utcTs + toZone.getOffset(utcTs / 1000L) * 1000L  // in toZone
    +      // treat it as local timezone, convert to UTC (we could get the expected human time back)
    +      localTs2 - getOffsetFromLocalMillis(localTs2 / 1000L, localZone) * 1000L
    +    }
    +  }
    +
    +  /**
        * Returns a timestamp of given timezone from utc timestamp, with the same string
        * representation in their timezone.
        */
       def fromUTCTime(time: SQLTimestamp, timeZone: String): SQLTimestamp = {
    -    val tz = TimeZone.getTimeZone(timeZone)
    -    val offset = tz.getOffset(time / 1000L)
    -    time + offset * 1000L
    +    convertTz(time, TimeZoneGMT, TimeZone.getTimeZone(timeZone))
    --- End diff --
    
    For `fromUTCTime`, this would result in a little bit overhead.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #13784: [SPARK-16078] [SQL] from_utc_timestamp/to_utc_timestamp ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/13784
  
    **[Test build #60869 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60869/consoleFull)** for PR 13784 at commit [`5c60bc6`](https://github.com/apache/spark/commit/5c60bc6321bda90c85a26180c38be5744d68cc55).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #13784: [SPARK-16078] [SQL] from_utc_timestamp/to_utc_timestamp ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/13784
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #13784: [SPARK-16078] [SQL] from_utc_timestamp/to_utc_timestamp ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/13784
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #13784: [SPARK-16078] [SQL] from_utc_timestamp/to_utc_timestamp ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/13784
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/60869/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #13784: [SPARK-16078] [SQL] from_utc_timestamp/to_utc_timestamp ...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/13784
  
    **[Test build #60888 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/60888/consoleFull)** for PR 13784 at commit [`4bba902`](https://github.com/apache/spark/commit/4bba9020c4ea86f60a903b69aa4d95aa5705f5d8).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #13784: [SPARK-16078] [SQL] from_utc_timestamp/to_utc_timestamp ...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/13784
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/60888/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org