Posted to commits@spark.apache.org by we...@apache.org on 2020/06/16 05:01:29 UTC

[spark] branch branch-3.0 updated: [SPARK-31959][SQL][3.0] Fix Gregorian-Julian micros rebasing while switching standard time zone offset

This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.0 by this push:
     new c69bbf4  [SPARK-31959][SQL][3.0] Fix Gregorian-Julian micros rebasing while switching standard time zone offset
c69bbf4 is described below

commit c69bbf4652f8ded5108826f38ae2316f31eb10e7
Author: Max Gekk <ma...@gmail.com>
AuthorDate: Tue Jun 16 04:56:30 2020 +0000

    [SPARK-31959][SQL][3.0] Fix Gregorian-Julian micros rebasing while switching standard time zone offset
    
    ### What changes were proposed in this pull request?
    Fix the bug in microseconds rebasing during transitions from one standard time zone offset to another one. In the PR, I propose to change the implementation of `rebaseGregorianToJulianMicros` which performs rebasing via local timestamps. In the case of overlapping:
    1. Check whether the original instant is the earlier or the later instant of the overlapped local timestamp.
    2. If it is the earlier instant, take the zone and DST offsets from the previous day.
    3. Otherwise, set the time zone offsets of the Julian timestamp from the next day.
    
    Note: The fix assumes that transitions cannot happen more often than once per 2 days.
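    
    The check in the steps above can be illustrated with `java.time` alone. This is a minimal sketch, not the code of the patch: the object `OverlapShiftSketch`, the helper `dayShift` and the hand-built `rules` are hypothetical. The rules are constructed explicitly, with a single JST -> HKT switch, so that the result does not depend on the tzdb version of the JDK.
    
    ```scala
    import java.time.{Instant, LocalDateTime, ZoneOffset}
    import java.time.zone.{ZoneOffsetTransition, ZoneOffsetTransitionRule, ZoneRules}
    import java.util.Collections
    
    object OverlapShiftSketch {
      val jst: ZoneOffset = ZoneOffset.ofHours(9)
      val hkt: ZoneOffset = ZoneOffset.ofHours(8)
    
      // Hypothetical rules with one switch: at 02:00 JST clocks go back to 01:00 HKT.
      private val switch =
        ZoneOffsetTransition.of(LocalDateTime.of(1945, 11, 18, 2, 0), jst, hkt)
      val rules: ZoneRules = ZoneRules.of(
        jst, // base standard offset
        jst, // base wall offset
        Collections.singletonList(switch), // standard offset transitions
        Collections.singletonList(switch), // wall offset transitions
        Collections.emptyList[ZoneOffsetTransitionRule]())
    
      // -1: take offsets from the previous day, 1: from the next day, 0: no overlap.
      def dayShift(instant: Instant): Int = {
        val offset = rules.getOffset(instant)
        val ldt = instant.atOffset(offset).toLocalDateTime
        val trans = rules.getTransition(ldt)
        if (trans != null && trans.isOverlap) {
          // The earlier instant of an overlap still uses the offset before the transition.
          if (trans.getOffsetBefore == offset) -1 else 1
        } else {
          0
        }
      }
    }
    ```
    
    With these rules, the local timestamp 1945-11-18 01:30 has two instants: `dayShift` selects the previous day for the one at UTC+9 and the next day for the one at UTC+8.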
    
    Adapt the test "SPARK-31959: JST -> HKT at Asia/Hong_Kong in 1945" to an outdated tzdb. An old JDK can have an outdated time zone database in which Asia/Hong_Kong doesn't have a timestamp overlap in 1945 at all.
    
    ### Why are the changes needed?
    1. The current implementation handles timestamp overlapping only during daylight saving time transitions, but overlapping can also happen during a transition from one standard time zone offset to another. For example, in the case of `Asia/Hong_Kong`, the time zone switched from `Japan Standard Time` (UTC+9) to `Hong Kong Time` (UTC+8) on _Sunday, 18 November, 1945 01:59:59 AM_. The changes handle this special case as well.
    2. To fix the test failures on old JDKs with an outdated tzdb, such as on the Jenkins machine `research-jenkins-worker-09`.
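    
    To see why the DST offset alone cannot disambiguate such an overlap, consider the sketch below. It is an illustration under assumptions, not part of the patch: `StandardSwitchSketch` and its hand-built `rules` are hypothetical and model only the JST -> HKT switch, independently of the JDK's tzdb. Both instants of the overlapped local timestamp report zero daylight savings, so a calendar that is given only `DST_OFFSET` cannot tell them apart.
    
    ```scala
    import java.time.{Duration, Instant, LocalDateTime, ZoneOffset}
    import java.time.zone.{ZoneOffsetTransition, ZoneOffsetTransitionRule, ZoneRules}
    import java.util.Collections
    
    object StandardSwitchSketch {
      val jst: ZoneOffset = ZoneOffset.ofHours(9)
      val hkt: ZoneOffset = ZoneOffset.ofHours(8)
    
      // Hypothetical rules: the standard offset itself changes from UTC+9 to UTC+8.
      private val switch =
        ZoneOffsetTransition.of(LocalDateTime.of(1945, 11, 18, 2, 0), jst, hkt)
      val rules: ZoneRules = ZoneRules.of(
        jst, jst,
        Collections.singletonList(switch),
        Collections.singletonList(switch),
        Collections.emptyList[ZoneOffsetTransitionRule]())
    
      val ldt: LocalDateTime = LocalDateTime.of(1945, 11, 18, 1, 30)
      val earlier: Instant = ldt.toInstant(jst) // 01:30 before the clocks moved back
      val later: Instant = ldt.toInstant(hkt)   // 01:30 after the clocks moved back
    
      // An overlapped local timestamp has two valid offsets ...
      val validOffsets: java.util.List[ZoneOffset] = rules.getValidOffsets(ldt)
      // ... but the DST component is the same for both instants.
      val dstEarlier: Duration = rules.getDaylightSavings(earlier)
      val dstLater: Duration = rules.getDaylightSavings(later)
    }
    ```
    
    Here the two instants are one hour apart while `dstEarlier` and `dstLater` are both zero, which is why the fix also sets `ZONE_OFFSET`.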
    
    ### Does this PR introduce _any_ user-facing change?
    It might affect the rebasing of microseconds before the Common Era when the non-optimized version of `rebaseGregorianToJulianMicros()` is used directly.
    
    ### How was this patch tested?
    1. By existing tests in `DateTimeUtilsSuite`, `RebaseDateTimeSuite`, `DateFunctionsSuite`, `DateExpressionsSuite` and `TimestampFormatterSuite`.
    2. Added a new test to `RebaseDateTimeSuite`.
    3. Regenerated `gregorian-julian-rebase-micros.json` with a step of 30 minutes, and got the same JSON file. The JSON file isn't affected because it was previously generated with a step of 1 week, so the spike in diffs/switch points during the 1 hour of timestamp overlapping wasn't detected.
    
    Authored-by: Max Gekk <max.gekkgmail.com>
    Signed-off-by: Wenchen Fan <wenchendatabricks.com>
    (cherry picked from commit c259844)
    Signed-off-by: Dongjoon Hyun <dongjoonapache.org>
    (cherry picked from commit eae1747)
    Signed-off-by: Max Gekk <max.gekkgmail.com>
    
    Closes #28809 from MaxGekk/HongKong-tz-1945-3.0.
    
    Authored-by: Max Gekk <ma...@gmail.com>
    Signed-off-by: Wenchen Fan <we...@databricks.com>
---
 .../spark/sql/catalyst/util/RebaseDateTime.scala   | 26 +++++++++++----
 .../sql/catalyst/util/RebaseDateTimeSuite.scala    | 38 ++++++++++++++++++++++
 2 files changed, 58 insertions(+), 6 deletions(-)

diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/RebaseDateTime.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/RebaseDateTime.scala
index e29fa4b..92d76c8 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/RebaseDateTime.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/RebaseDateTime.scala
@@ -326,20 +326,34 @@ object RebaseDateTime {
    */
   private[sql] def rebaseGregorianToJulianMicros(zoneId: ZoneId, micros: Long): Long = {
     val instant = microsToInstant(micros)
-    var ldt = instant.atZone(zoneId).toLocalDateTime
+    val zonedDateTime = instant.atZone(zoneId)
+    var ldt = zonedDateTime.toLocalDateTime
     if (ldt.isAfter(julianEndTs) && ldt.isBefore(gregorianStartTs)) {
       ldt = LocalDateTime.of(gregorianStartDate, ldt.toLocalTime)
     }
     val cal = new Calendar.Builder()
-      // `gregory` is a hybrid calendar that supports both
-      // the Julian and Gregorian calendar systems
+      // `gregory` is a hybrid calendar that supports both the Julian and Gregorian calendar systems
       .setCalendarType("gregory")
       .setDate(ldt.getYear, ldt.getMonthValue - 1, ldt.getDayOfMonth)
       .setTimeOfDay(ldt.getHour, ldt.getMinute, ldt.getSecond)
-      // Local time-line can overlaps, such as at an autumn daylight savings cutover.
-      // This setting selects the original local timestamp mapped to the given `micros`.
-      .set(Calendar.DST_OFFSET, zoneId.getRules.getDaylightSavings(instant).toMillis.toInt)
       .build()
+    // A local timestamp can have 2 instants in the cases of switching from:
+    //  1. Summer to winter time.
+    //  2. One standard time zone to another one. For example, Asia/Hong_Kong switched from JST
+    //     to HKT on 18 November, 1945 01:59:59 AM.
+    // Below we check whether the original `instant` is the earlier or the later one. If earlier,
+    // we take the standard and DST offsets of the previous day; otherwise, those of the next one.
+    val trans = zoneId.getRules.getTransition(ldt)
+    if (trans != null && trans.isOverlap) {
+      val cloned = cal.clone().asInstanceOf[Calendar]
+      // Does the current offset equal the offset before the transition?
+      // If so, we take the zone offsets from the previous day, otherwise from the next day.
+      // This assumes that transitions cannot happen more often than once per 2 days.
+      val shift = if (trans.getOffsetBefore == zonedDateTime.getOffset) -1 else 1
+      cloned.add(Calendar.DAY_OF_MONTH, shift)
+      cal.set(Calendar.ZONE_OFFSET, cloned.get(Calendar.ZONE_OFFSET))
+      cal.set(Calendar.DST_OFFSET, cloned.get(Calendar.DST_OFFSET))
+    }
     fromMillis(cal.getTimeInMillis) + ldt.get(ChronoField.MICRO_OF_SECOND)
   }
 
diff --git a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/util/RebaseDateTimeSuite.scala b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/util/RebaseDateTimeSuite.scala
index 0111fa0..6ecdb05 100644
--- a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/util/RebaseDateTimeSuite.scala
+++ b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/util/RebaseDateTimeSuite.scala
@@ -409,4 +409,42 @@ class RebaseDateTimeSuite extends SparkFunSuite with Matchers with SQLHelper {
       }
     }
   }
+
+  test("SPARK-31959: JST -> HKT at Asia/Hong_Kong in 1945") {
+    // The 'Asia/Hong_Kong' time zone switched from 'Japan Standard Time' (JST = UTC+9)
+    // to 'Hong Kong Time' (HKT = UTC+8). After Sunday, 18 November, 1945 01:59:59 AM,
+    // clocks were moved backward to become Sunday, 18 November, 1945 01:00:00 AM.
+    // In this way, the overlap happened w/o Daylight Saving Time.
+    val hkZid = getZoneId("Asia/Hong_Kong")
+    withDefaultTimeZone(hkZid) {
+      var expected = "1945-11-18 01:30:00.0"
+      var ldt = LocalDateTime.of(1945, 11, 18, 1, 30, 0)
+      var earlierMicros = instantToMicros(ldt.atZone(hkZid).withEarlierOffsetAtOverlap().toInstant)
+      var laterMicros = instantToMicros(ldt.atZone(hkZid).withLaterOffsetAtOverlap().toInstant)
+      var overlapInterval = MICROS_PER_HOUR
+      if (earlierMicros + overlapInterval != laterMicros) {
+        // Old JDK might have an outdated time zone database.
+        // See https://bugs.openjdk.java.net/browse/JDK-8228469: "Hong Kong ... Its 1945 transition
+        // from JST to HKT was on 11-18 at 02:00, not 09-15 at 00:00"
+        expected = "1945-09-14 23:30:00.0"
+        ldt = LocalDateTime.of(1945, 9, 14, 23, 30, 0)
+        earlierMicros = instantToMicros(ldt.atZone(hkZid).withEarlierOffsetAtOverlap().toInstant)
+        laterMicros = instantToMicros(ldt.atZone(hkZid).withLaterOffsetAtOverlap().toInstant)
+        // If time zone db doesn't have overlapping at all, set the overlap interval to zero.
+        overlapInterval = laterMicros - earlierMicros
+      }
+      val rebasedEarlierMicros = rebaseGregorianToJulianMicros(hkZid, earlierMicros)
+      val rebasedLaterMicros = rebaseGregorianToJulianMicros(hkZid, laterMicros)
+      def toTsStr(micros: Long): String = toJavaTimestamp(micros).toString
+      assert(toTsStr(rebasedEarlierMicros) === expected)
+      assert(toTsStr(rebasedLaterMicros) === expected)
+      assert(rebasedEarlierMicros + overlapInterval === rebasedLaterMicros)
+      // Check optimized rebasing
+      assert(rebaseGregorianToJulianMicros(earlierMicros) === rebasedEarlierMicros)
+      assert(rebaseGregorianToJulianMicros(laterMicros) === rebasedLaterMicros)
+      // Check reverse rebasing
+      assert(rebaseJulianToGregorianMicros(rebasedEarlierMicros) === earlierMicros)
+      assert(rebaseJulianToGregorianMicros(rebasedLaterMicros) === laterMicros)
+    }
+  }
 }

