You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@spark.apache.org by we...@apache.org on 2020/04/30 03:17:51 UTC

[spark] branch branch-2.4 updated: [SPARK-31449][SQL][2.4] Fix getting time zone offsets from local milliseconds

This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a commit to branch branch-2.4
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-2.4 by this push:
     new da3748f  [SPARK-31449][SQL][2.4] Fix getting time zone offsets from local milliseconds
da3748f is described below

commit da3748f0119e754c6bb4a5f17ea111ef74f704ff
Author: Max Gekk <ma...@gmail.com>
AuthorDate: Thu Apr 30 03:16:37 2020 +0000

    [SPARK-31449][SQL][2.4] Fix getting time zone offsets from local milliseconds
    
    ### What changes were proposed in this pull request?
    Replace current implementation of `getOffsetFromLocalMillis()` by the code from JDK https://github.com/AdoptOpenJDK/openjdk-jdk8u/blob/aa318070b27849f1fe00d14684b2a40f7b29bf79/jdk/src/share/classes/java/util/GregorianCalendar.java#L2795-L2801:
    ```java
                if (zone instanceof ZoneInfo) {
                    ((ZoneInfo)zone).getOffsetsByWall(millis, zoneOffsets);
                } else {
                    int gmtOffset = isFieldSet(fieldMask, ZONE_OFFSET) ?
                                        internalGet(ZONE_OFFSET) : zone.getRawOffset();
                    zone.getOffsets(millis - gmtOffset, zoneOffsets);
                }
    ```
    
    ### Why are the changes needed?
    Current domestic implementation of `getOffsetFromLocalMillis()` is incompatible with other date-time functions used from JDK's `GregorianCalendar` like `ZoneInfo.getOffsets`, and can return wrong results as it is demonstrated in SPARK-31449. For example, currently the function returns 1h offset but JDK function 0h:
    ```
    Europe/Paris 1916-10-01 23:50:39.0 3600000 0
    ```
    Actually, the timestamp is in a DST interval of shifting wall clocks by 1 hour back
    
    Year | Date & Time | Abbreviation | Time Change | Offset After
    -- | -- | -- | -- | --
    1916 |Tue, 14 Jun, 23:00 | WET → WEST | +1 hour (DST start) | UTC+1h
    1916 |Sun, 2 Oct, 00:00 | WEST → WET | -1 hour (DST end) | UTC
    
    And according the default JDK policy, the latest timestamp should be taken in the case of overlapping but current implementation takes the earliest one. That makes it incompatible with other JDK calls.
    
    ### Does this PR introduce any user-facing change?
    Yes, see differences in SPARK-31449.
    
    ### How was this patch tested?
    By existing test suite `DateTimeUtilsSuite`, `DateFunctionsSuite` and `DateExpressionsSuite`.
    
    Closes #28410 from MaxGekk/fix-tz-offset-by-wallclock-2.4.
    
    Authored-by: Max Gekk <ma...@gmail.com>
    Signed-off-by: Wenchen Fan <we...@databricks.com>
---
 .../spark/sql/catalyst/util/DateTimeUtils.scala    | 36 +++-------------------
 1 file changed, 4 insertions(+), 32 deletions(-)

diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala
index b095007..8c0b8ce 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala
@@ -28,6 +28,7 @@ import javax.xml.bind.DatatypeConverter
 import scala.annotation.tailrec
 
 import org.apache.commons.lang3.time.FastDateFormat
+import sun.util.calendar.ZoneInfo
 
 import org.apache.spark.sql.types.Decimal
 import org.apache.spark.unsafe.types.UTF8String
@@ -1082,39 +1083,10 @@ object DateTimeUtils {
 
   /**
    * Lookup the offset for given millis seconds since 1970-01-01 00:00:00 in given timezone.
-   * TODO: Improve handling of normalization differences.
-   * TODO: Replace with JSR-310 or similar system - see SPARK-16788
    */
-  private[sql] def getOffsetFromLocalMillis(millisLocal: Long, tz: TimeZone): Long = {
-    var guess = tz.getRawOffset
-    // the actual offset should be calculated based on milliseconds in UTC
-    val offset = tz.getOffset(millisLocal - guess)
-    if (offset != guess) {
-      guess = tz.getOffset(millisLocal - offset)
-      if (guess != offset) {
-        // fallback to do the reverse lookup using java.sql.Timestamp
-        // this should only happen near the start or end of DST
-        val days = Math.floor(millisLocal.toDouble / MILLIS_PER_DAY).toInt
-        val year = getYear(days)
-        val month = getMonth(days)
-        val day = getDayOfMonth(days)
-
-        var millisOfDay = (millisLocal % MILLIS_PER_DAY).toInt
-        if (millisOfDay < 0) {
-          millisOfDay += MILLIS_PER_DAY.toInt
-        }
-        val seconds = (millisOfDay / 1000L).toInt
-        val hh = seconds / 3600
-        val mm = seconds / 60 % 60
-        val ss = seconds % 60
-        val ms = millisOfDay % 1000
-        val calendar = Calendar.getInstance(tz)
-        calendar.set(year, month - 1, day, hh, mm, ss)
-        calendar.set(Calendar.MILLISECOND, ms)
-        guess = (millisLocal - calendar.getTimeInMillis()).toInt
-      }
-    }
-    guess
+  private[sql] def getOffsetFromLocalMillis(millisLocal: Long, tz: TimeZone): Long = tz match {
+    case zoneInfo: ZoneInfo => zoneInfo.getOffsetsByWall(millisLocal, null)
+    case timeZone: TimeZone => timeZone.getOffset(millisLocal - timeZone.getRawOffset)
   }
 
   /**


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@spark.apache.org
For additional commands, e-mail: commits-help@spark.apache.org