Posted to commits@spark.apache.org by do...@apache.org on 2019/04/08 16:53:21 UTC

[spark] branch master updated: [SPARK-27405][SQL][TEST] Restrict the range of generated random timestamps

This is an automated email from the ASF dual-hosted git repository.

dongjoon pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
     new 0024173  [SPARK-27405][SQL][TEST] Restrict the range of generated random timestamps
0024173 is described below

commit 00241733a6d7761a4f6e64442319b11dd9d2b57b
Author: Maxim Gekk <ma...@gmail.com>
AuthorDate: Mon Apr 8 09:53:00 2019 -0700

    [SPARK-27405][SQL][TEST] Restrict the range of generated random timestamps
    
    ## What changes were proposed in this pull request?
    
    In the PR, I propose to restrict the range of random timestamp literals generated in `LiteralGenerator.timestampLiteralGen`. The generator creates instances of `java.sql.Timestamp` by passing milliseconds since the epoch as a `Long`. Converting those milliseconds to microseconds can overflow `Long`, because Catalyst's Timestamp type also stores microseconds since the epoch internally as a `Long`. The proposed interval of random milliseconds is `[Long.MinValue / 1000, Long.MaxValue / 1000]`.
    
    For example, the generated timestamp `new java.sql.Timestamp(-3948373668011580000)` causes a `Long` overflow in this method:
    ```scala
    def fromJavaTimestamp(t: Timestamp): SQLTimestamp = {
      ...
      MILLISECONDS.toMicros(t.getTime()) + NANOSECONDS.toMicros(t.getNanos()) % NANOS_PER_MICROS
      ...
    }
    ```
    because `t.getTime()` returns `-3948373668011580000`, which `MILLISECONDS.toMicros` multiplies by `1000`; the true product, `-3948373668011580000000`, is less than `Long.MinValue`.
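    
    The failing arithmetic can be reproduced outside Spark. A minimal sketch, where plain JDK `Long` multiplication and `Math.multiplyExact` stand in for the `MILLISECONDS.toMicros` code path (an illustration, not the Spark code):
    ```scala
    val millis = -3948373668011580000L
    // Plain Long multiplication silently wraps around instead of widening:
    println(millis * 1000L)            // prints a wrapped value, not the true product
    // multiplyExact surfaces the same overflow explicitly:
    Math.multiplyExact(millis, 1000L)  // throws ArithmeticException: long overflow
    ```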
    
    ## How was this patch tested?
    
    By `DateExpressionsSuite` in PR https://github.com/apache/spark/pull/24311
    
    Closes #24316 from MaxGekk/random-timestamps-gen.
    
    Authored-by: Maxim Gekk <ma...@gmail.com>
    Signed-off-by: Dongjoon Hyun <dh...@apple.com>
---
 .../spark/sql/catalyst/expressions/LiteralGenerator.scala   | 13 +++++++++++--
 1 file changed, 11 insertions(+), 2 deletions(-)

diff --git a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/LiteralGenerator.scala b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/LiteralGenerator.scala
index 032aec0..be5fdb5 100644
--- a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/LiteralGenerator.scala
+++ b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/LiteralGenerator.scala
@@ -21,6 +21,7 @@ import java.sql.{Date, Timestamp}
 
 import org.scalacheck.{Arbitrary, Gen}
 
+import org.apache.spark.sql.catalyst.util.DateTimeUtils
 import org.apache.spark.sql.types._
 import org.apache.spark.unsafe.types.CalendarInterval
 
@@ -102,8 +103,16 @@ object LiteralGenerator {
   lazy val dateLiteralGen: Gen[Literal] =
     for { d <- Arbitrary.arbInt.arbitrary } yield Literal.create(new Date(d), DateType)
 
-  lazy val timestampLiteralGen: Gen[Literal] =
-    for { t <- Arbitrary.arbLong.arbitrary } yield Literal.create(new Timestamp(t), TimestampType)
+  lazy val timestampLiteralGen: Gen[Literal] = {
+    // Catalyst's Timestamp type stores number of microseconds since epoch in
+    // a variable of Long type. To prevent arithmetic overflow of Long on
+    // conversion from milliseconds to microseconds, the range of random milliseconds
+    // since epoch is restricted here.
+    val maxMillis = Long.MaxValue / DateTimeUtils.MICROS_PER_MILLIS
+    val minMillis = Long.MinValue / DateTimeUtils.MICROS_PER_MILLIS
+    for { millis <- Gen.choose(minMillis, maxMillis) }
+      yield Literal.create(new Timestamp(millis), TimestampType)
+  }
 
   lazy val calendarIntervalLiterGen: Gen[Literal] =
     for { m <- Arbitrary.arbInt.arbitrary; s <- Arbitrary.arbLong.arbitrary}
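
The same restriction can be sketched as a standalone ScalaCheck generator outside the Spark tree. This is a minimal sketch under stated assumptions: `safeTimestampGen` is a hypothetical name, and the literal `1000L` stands in for `DateTimeUtils.MICROS_PER_MILLIS`:

```scala
import java.sql.Timestamp

import org.scalacheck.Gen

// Restrict random milliseconds so that millis * 1000 (microseconds) always fits in a Long.
val maxMillis = Long.MaxValue / 1000L
val minMillis = Long.MinValue / 1000L

val safeTimestampGen: Gen[Timestamp] =
  Gen.choose(minMillis, maxMillis).map(new Timestamp(_))

// Sanity check: conversion to microseconds never overflows for sampled values.
safeTimestampGen.sample.foreach { ts =>
  val micros = Math.multiplyExact(ts.getTime, 1000L) // safe within the restricted range
  println(s"$ts -> $micros microseconds since epoch")
}
```

Dividing the bounds by 1000 before sampling guarantees that the later multiplication by 1000 stays within `[Long.MinValue, Long.MaxValue]`, which is exactly the invariant the patched generator relies on.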


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@spark.apache.org
For additional commands, e-mail: commits-help@spark.apache.org