You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@spark.apache.org by we...@apache.org on 2020/04/10 03:53:11 UTC

[spark] branch branch-3.0 updated: [SPARK-31359][DOC][FOLLOWUP] improve code comments in RebaseDateTime

This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.0 by this push:
     new d2ba5b2  [SPARK-31359][DOC][FOLLOWUP] improve code comments in RebaseDateTime
d2ba5b2 is described below

commit d2ba5b2e6962b0069e28243639215ee6f5ee50a3
Author: Wenchen Fan <we...@databricks.com>
AuthorDate: Fri Apr 10 03:43:32 2020 +0000

    [SPARK-31359][DOC][FOLLOWUP] improve code comments in RebaseDateTime
    
    improve the code comment and make them consistent between `rebaseJulianToGregorian*` and `rebaseGregorianToJulian*`
    
    improve readability.
    
    no
    
    N/A
    
    Closes #28166 from cloud-fan/comment.
    
    Authored-by: Wenchen Fan <we...@databricks.com>
    Signed-off-by: Wenchen Fan <we...@databricks.com>
    (cherry picked from commit 148950fa2ba7ad03d7536c8c2201ba18b9345c0e)
    Signed-off-by: Wenchen Fan <we...@databricks.com>
---
 .../spark/sql/catalyst/util/RebaseDateTime.scala   | 176 ++++++++++++---------
 1 file changed, 98 insertions(+), 78 deletions(-)

diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/RebaseDateTime.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/RebaseDateTime.scala
index 793eb17..7440787 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/RebaseDateTime.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/RebaseDateTime.scala
@@ -32,20 +32,20 @@ import org.apache.spark.sql.catalyst.util.DateTimeUtils._
 /**
  * The collection of functions for rebasing days and microseconds from/to the hybrid calendar
  * (Julian + Gregorian since 1582-10-15) which is used by Spark 2.4 and earlier versions
- * to/from Proleptic Gregorian calendar used in Spark SQL since version 3.0, see SPARK-26651.
+ * to/from Proleptic Gregorian calendar which is used by Spark since version 3.0. See SPARK-26651.
  */
 object RebaseDateTime {
   /**
    * Rebases days since the epoch from an original to an target calendar, for instance,
    * from a hybrid (Julian + Gregorian) to Proleptic Gregorian calendar.
    *
-   * It finds the latest switch day which is less than `value`, and adds the difference
-   * in days associated with the switch days to the given `value`.
+   * It finds the latest switch day which is less than the given `days`, and adds the difference
+   * in days associated with the switch days to the given `days`.
    * The function is based on linear search which starts from the most recent switch days.
    * This allows to perform less comparisons for modern dates.
    *
-   * @param switches The days when difference in days between original
-   *                 and target calendar was changed.
+   * @param switches The days when difference in days between original and target calendar
+   *                 was changed.
    * @param diffs The differences in days between calendars.
    * @param days The number of days since the epoch 1970-01-01 to be rebased
    *             to the target calendar.
@@ -71,12 +71,20 @@ object RebaseDateTime {
     -390745, -354220, -317695, -244645, -208120, -171595, -141427)
 
   /**
-   * Converts the given number of days since the epoch day 1970-01-01 to
-   * a local date in Julian calendar, interprets the result as a local
-   * date in Proleptic Gregorian calendar, and take the number of days
-   * since the epoch from the Gregorian date.
+   * Converts the given number of days since the epoch day 1970-01-01 to a local date in Julian
+   * calendar, interprets the result as a local date in Proleptic Gregorian calendar, and takes the
+   * number of days since the epoch from the Gregorian local date.
    *
-   * @param days The number of days since the epoch in Julian calendar.
+   * This is used to guarantee backward compatibility, as Spark 2.4 and earlier versions use
+   * Julian calendar for dates before 1582-10-15, while Spark 3.0 and later use Proleptic Gregorian
+   * calendar. See SPARK-26651.
+   *
+   * For example:
+   *   Julian calendar: 1582-01-01 -> -141704
+   *   Proleptic Gregorian calendar: 1582-01-01 -> -141714
+   * The code below converts -141704 to -141714.
+   *
+   * @param days The number of days since the epoch in Julian calendar. It can be negative.
    * @return The rebased number of days in Gregorian calendar.
    */
   def rebaseJulianToGregorianDays(days: Int): Int = {
@@ -97,51 +105,26 @@ object RebaseDateTime {
     -390750, -354226, -317702, -244653, -208129, -171605, -141427)
 
   /**
-   * Rebasing days since the epoch to store the same number of days
-   * as by Spark 2.4 and earlier versions. Spark 3.0 switched to
-   * Proleptic Gregorian calendar (see SPARK-26651), and as a consequence of that,
-   * this affects dates before 1582-10-15. Spark 2.4 and earlier versions use
-   * Julian calendar for dates before 1582-10-15. So, the same local date may
-   * be mapped to different number of days since the epoch in different calendars.
+   * Converts the given number of days since the epoch day 1970-01-01 to a local date in Proleptic
+   * Gregorian calendar, interprets the result as a local date in Julian calendar, and takes the
+   * number of days since the epoch from the Julian local date.
+   *
+   * This is used to guarantee backward compatibility, as Spark 2.4 and earlier versions use
+   * Julian calendar for dates before 1582-10-15, while Spark 3.0 and later use Proleptic Gregorian
+   * calendar. See SPARK-26651.
    *
    * For example:
    *   Proleptic Gregorian calendar: 1582-01-01 -> -141714
    *   Julian calendar: 1582-01-01 -> -141704
    * The code below converts -141714 to -141704.
    *
-   * @param days The number of days since the epoch 1970-01-01. It can be negative.
+   * @param days The number of days since the epoch in Gregorian calendar. It can be negative.
    * @return The rebased number of days since the epoch in Julian calendar.
    */
   def rebaseGregorianToJulianDays(days: Int): Int = {
     rebaseDays(gregJulianDiffSwitchDay, gregJulianDiffs, days)
   }
 
-  /**
-   * Converts the given microseconds to a local date-time in UTC time zone in Proleptic Gregorian
-   * calendar, interprets the result as a local date-time in Julian calendar in UTC time zone.
-   * And takes microseconds since the epoch from the Julian timestamp.
-   *
-   * @param zoneId The time zone ID at which the rebasing should be performed.
-   * @param micros The number of microseconds since the epoch '1970-01-01T00:00:00Z'
-   *               in Proleptic Gregorian calendar. It can be negative.
-   * @return The rebased microseconds since the epoch in Julian calendar.
-   */
-  private[sql] def rebaseGregorianToJulianMicros(zoneId: ZoneId, micros: Long): Long = {
-    val instant = microsToInstant(micros)
-    val ldt = instant.atZone(zoneId).toLocalDateTime
-    val cal = new Calendar.Builder()
-      // `gregory` is a hybrid calendar that supports both
-      // the Julian and Gregorian calendar systems
-      .setCalendarType("gregory")
-      .setDate(ldt.getYear, ldt.getMonthValue - 1, ldt.getDayOfMonth)
-      .setTimeOfDay(ldt.getHour, ldt.getMinute, ldt.getSecond)
-      // Local time-line can overlaps, such as at an autumn daylight savings cutover.
-      // This setting selects the original local timestamp mapped to the given `micros`.
-      .set(Calendar.DST_OFFSET, zoneId.getRules.getDaylightSavings(instant).toMillis.toInt)
-      .build()
-    fromMillis(cal.getTimeInMillis) + ldt.get(ChronoField.MICRO_OF_SECOND)
-  }
-
 
   /**
    * The class describes JSON records with microseconds rebasing info.
@@ -176,8 +159,8 @@ object RebaseDateTime {
    * Rebases micros since the epoch from an original to an target calendar, for instance,
    * from a hybrid (Julian + Gregorian) to Proleptic Gregorian calendar.
    *
-   * It finds the latest switch micros which is less than `value`, and adds the difference
-   * in micros associated with the switch micros to the given `value`.
+   * It finds the latest switch micros which is less than the given `micros`, and adds the
+   * difference in micros associated with the switch micros to the given `micros`.
    * The function is based on linear search which starts from the most recent switch micros.
    * This allows to perform less comparisons for modern timestamps.
    *
@@ -227,22 +210,53 @@ object RebaseDateTime {
   private val gregJulianRebaseMap = loadRebaseRecords("gregorian-julian-rebase-micros.json")
 
   /**
-   * Rebases the given `micros` to the number of microseconds since the epoch via a local
-   * timestamp that have the same date-time fields in Proleptic Gregorian and in the hybrid
-   * calendars.
+   * Converts the given number of microseconds since the epoch '1970-01-01T00:00:00Z', to a local
+   * date-time in Proleptic Gregorian calendar with timezone identified by `zoneId`, interprets the
+   * result as a local date-time in Julian calendar with the same timezone, and takes microseconds
+   * since the epoch from the Julian local date-time.
+   *
+   * This is used to guarantee backward compatibility, as Spark 2.4 and earlier versions use
+   * Julian calendar for dates before 1582-10-15, while Spark 3.0 and later use Proleptic Gregorian
+   * calendar. See SPARK-26651.
    *
-   * The function may optimize the rebasing by using pre-calculated rebasing maps. If the maps
-   * don't contains information about the current JVM system time zone, the functions falls back
-   * to regular conversion mechanism via building local timestamps.
+   * For example:
+   *   Proleptic Gregorian calendar: 1582-01-01 00:00:00.123456 -> -12244061221876544
+   *   Julian calendar: 1582-01-01 00:00:00.123456 -> -12243196799876544
+   * The code below converts -12244061221876544 to -12243196799876544.
+   *
+   * @param zoneId The time zone ID at which the rebasing should be performed.
+   * @param micros The number of microseconds since the epoch '1970-01-01T00:00:00Z'
+   *               in Proleptic Gregorian calendar. It can be negative.
+   * @return The rebased microseconds since the epoch in Julian calendar.
+   */
+  private[sql] def rebaseGregorianToJulianMicros(zoneId: ZoneId, micros: Long): Long = {
+    val instant = microsToInstant(micros)
+    val ldt = instant.atZone(zoneId).toLocalDateTime
+    val cal = new Calendar.Builder()
+      // `gregory` is a hybrid calendar that supports both
+      // the Julian and Gregorian calendar systems
+      .setCalendarType("gregory")
+      .setDate(ldt.getYear, ldt.getMonthValue - 1, ldt.getDayOfMonth)
+      .setTimeOfDay(ldt.getHour, ldt.getMinute, ldt.getSecond)
+      // Local time-line can overlaps, such as at an autumn daylight savings cutover.
+      // This setting selects the original local timestamp mapped to the given `micros`.
+      .set(Calendar.DST_OFFSET, zoneId.getRules.getDaylightSavings(instant).toMillis.toInt)
+      .build()
+    fromMillis(cal.getTimeInMillis) + ldt.get(ChronoField.MICRO_OF_SECOND)
+  }
+
+  /**
+   * An optimized version of [[rebaseGregorianToJulianMicros(ZoneId, Long)]]. This method leverages
+   * the pre-calculated rebasing maps to save calculation. If the rebasing map doesn't contain
+   * information about the current JVM system time zone, the function falls back to the regular
+   * unoptimized version.
    *
    * Note: The function assumes that the input micros was derived from a local timestamp
    *       at the default system JVM time zone in Proleptic Gregorian calendar.
    *
-   * @param micros The microseconds since the epoch 1970-01-01T00:00:00Z
+   * @param micros The number of microseconds since the epoch '1970-01-01T00:00:00Z'
    *               in Proleptic Gregorian calendar. It can be negative.
-   * @return The microseconds since the epoch that have the same local timestamp representation
-   *         in the hybrid calendar (Julian + Gregorian) as the original `micros` in
-   *         Proleptic Gregorian calendar.
+   * @return The rebased microseconds since the epoch in Julian calendar.
    */
   def rebaseGregorianToJulianMicros(micros: Long): Long = {
     val timeZone = TimeZone.getDefault
@@ -256,14 +270,23 @@ object RebaseDateTime {
   }
 
   /**
-   * Converts the given microseconds to a local date-time in UTC time zone in the hybrid
-   * calendar (Julian + Gregorian since 1582-10-15), interprets the result as a local date-time
-   * in Proleptic Gregorian calendar in UTC time zone. And takes microseconds since the epoch
-   * from the Gregorian timestamp.
+   * Converts the given number of microseconds since the epoch '1970-01-01T00:00:00Z', to a local
+   * date-time in Julian calendar with timezone identified by `zoneId`, interprets the result as a
+   * local date-time in Proleptic Gregorian calendar with the same timezone, and takes microseconds
+   * since the epoch from the Gregorian local date-time.
+   *
+   * This is used to guarantee backward compatibility, as Spark 2.4 and earlier versions use
+   * Julian calendar for dates before 1582-10-15, while Spark 3.0 and later use Proleptic Gregorian
+   * calendar. See SPARK-26651.
+   *
+   * For example:
+   *   Julian calendar: 1582-01-01 00:00:00.123456 -> -12243196799876544
+   *   Proleptic Gregorian calendar: 1582-01-01 00:00:00.123456 -> -12244061221876544
+   * The code below converts -12243196799876544 to -12244061221876544.
    *
    * @param zoneId The time zone ID at which the rebasing should be performed.
    * @param micros The number of microseconds since the epoch '1970-01-01T00:00:00Z'
-   *               in the hybrid calendar (Julian + Gregorian). It can be negative.
+   *               in the Julian calendar. It can be negative.
    * @return The rebased microseconds since the epoch in Proleptic Gregorian calendar.
    */
   private[sql] def rebaseJulianToGregorianMicros(zoneId: ZoneId, micros: Long): Long = {
@@ -287,9 +310,13 @@ object RebaseDateTime {
       (Math.floorMod(micros, MICROS_PER_SECOND) * NANOS_PER_MICROS).toInt)
       .plusDays(cal.get(Calendar.DAY_OF_MONTH) - 1)
     val zonedDateTime = localDateTime.atZone(zoneId)
-    // Zero DST offset means that local clocks have switched to the winter time already.
-    // So, clocks go back one hour. We should correct zoned date-time and change
-    // the zone offset to the later of the two valid offsets at a local time-line overlap.
+    // Assuming the daylight saving switchover time is 2:00, the local clock will go back to
+    // 2:00 after hitting 2:59. This means the local time between [2:00, 3:00) appears twice, and
+    // can map to two different physical times (seconds from the UTC epoch).
+    // Java 8 time API resolves the ambiguity by picking the earlier physical time. This is the same
+    // as Java 7 time API, except for 2:00 where Java 7 picks the later physical time.
+    // Here we detect the "2:00" case and pick the latter physical time, to be compatible with the
+    // Java 7 date-time.
     val adjustedZdt = if (cal.get(Calendar.DST_OFFSET) == 0) {
       zonedDateTime.withLaterOffsetAtOverlap()
     } else {
@@ -306,24 +333,17 @@ object RebaseDateTime {
   private val julianGregRebaseMap = loadRebaseRecords("julian-gregorian-rebase-micros.json")
 
   /**
-   * This is an opposite to `rebaseGregorianToJulianMicros` function which rebases the given
-   * microseconds since the epoch 1970-01-01T00:00:00Z via local timestamps from the
-   * hybrid calendar (Julian + Gregorian) to Proleptic Gregorian calendar.
-   * For example, the input `micros` -12243196799876544 is mapped to 1582-01-01 00:00:00.123456 in
-   * Julian calendar. The same local timestamp in Proleptic Gregorian calendar is mapped to
-   * the different number of micros -12244061221876544 since the epoch. Semantically, the function
-   * performs such conversion either via direct conversion to local timestamps,
-   * or via pre-calculated rebasing tables.
+   * An optimized version of [[rebaseJulianToGregorianMicros(ZoneId, Long)]]. This method leverages
+   * the pre-calculated rebasing maps to save calculation. If the rebasing map doesn't contain
+   * information about the current JVM system time zone, the function falls back to the regular
+   * unoptimized version.
    *
    * Note: The function assumes that the input micros was derived from a local timestamp
-   *       at the default system JVM time zone in the hybrid calendar (Julian and Gregorian
-   *       since 1582-10-15)
+   *       at the default system JVM time zone in the Julian calendar.
    *
-   * @param micros The number of microseconds since the epoch 1970-01-01T00:00:00Z
-   *               in the hybrid calendar (Julian + Gregorian). It can be negative.
-   * @return The rebased number of microseconds since the epoch which is mapped to the same
-   *         local timestamp in Proleptic Gregorian calendar as `micros` in the hybrid calendar
-   *         at the system time zone.
+   * @param micros The number of microseconds since the epoch '1970-01-01T00:00:00Z'
+   *               in the Julian calendar. It can be negative.
+   * @return The rebased microseconds since the epoch in Proleptic Gregorian calendar.
    */
   def rebaseJulianToGregorianMicros(micros: Long): Long = {
     val timeZone = TimeZone.getDefault


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@spark.apache.org
For additional commands, e-mail: commits-help@spark.apache.org