Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2020/03/27 21:00:29 UTC

[GitHub] [spark] MaxGekk opened a new pull request #28057: [WIP][SQL] Benchmark rebasing of dates/timestamps

MaxGekk opened a new pull request #28057: [WIP][SQL] Benchmark rebasing of dates/timestamps 
URL: https://github.com/apache/spark/pull/28057
 
 
   TL;DR
   I benchmarked save/load for Parquet with date-time rebasing on vs. off:
   - Saving is **~6 times slower**
   - Loading w/ vectorized **off** is **~4 times slower**
   - Loading w/ vectorized **on** is **~11 times slower**
   
   Here are the results I got on my laptop:
   ```
   Java HotSpot(TM) 64-Bit Server VM 1.8.0_231-b11 on Mac OS X 10.15.3
   Intel(R) Core(TM) i7-4850HQ CPU @ 2.30GHz
   Save timestamps to parquet:               Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
   ------------------------------------------------------------------------------------------------------------------------
   after 1582, noop                                   2265           2265           0         44.2          22.6       1.0X
   after 1582, rebase off                            13689          13689           0          7.3         136.9       0.2X
   after 1582, rebase on                             71073          71073           0          1.4         710.7       0.0X
   before 1582, noop                                  2118           2118           0         47.2          21.2       1.1X
   before 1582, rebase off                           14442          14442           0          6.9         144.4       0.2X
   before 1582, rebase on                            78824          78824           0          1.3         788.2       0.0X
   
   Java HotSpot(TM) 64-Bit Server VM 1.8.0_231-b11 on Mac OS X 10.15.3
   Intel(R) Core(TM) i7-4850HQ CPU @ 2.30GHz
   Load timestamps from parquet:             Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
   ------------------------------------------------------------------------------------------------------------------------
   after 1582, vec off, rebase off                   12025          12326         390          8.3         120.2       1.0X
   after 1582, vec off, rebase on                    51214          52395        1724          2.0         512.1       0.2X
   after 1582, vec on, rebase off                     3798           3848          71         26.3          38.0       3.2X
   after 1582, vec on, rebase on                     42998          43137         138          2.3         430.0       0.3X
   before 1582, vec off, rebase off                  11692          11793         140          8.6         116.9       1.0X
   before 1582, vec off, rebase on                   52789          52973         173          1.9         527.9       0.2X
   before 1582, vec on, rebase off                    3832           3871          48         26.1          38.3       3.1X
   before 1582, vec on, rebase on                    44904          44956          66          2.2         449.0       0.3X
   ```
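   
   The slowdown figures in the TL;DR can be checked against the tables with a few lines of arithmetic. The sketch below is only an illustration: the object and method names are hypothetical, the inputs are the "Best Time(ms)" values of the "after 1582" rows, and subtracting the `noop` row as a baseline for the save case is an assumption about how the ~6x figure was derived, not something the description states.
   
   ```scala
   object RebaseSlowdown {
     // Best Time (ms), copied from the "after 1582" rows of the tables above.
     val saveNoop = 2265.0
     val saveRebaseOff = 13689.0
     val saveRebaseOn = 71073.0
     val loadVecOffRebaseOff = 12025.0
     val loadVecOffRebaseOn = 51214.0
     val loadVecOnRebaseOff = 3798.0
     val loadVecOnRebaseOn = 42998.0
   
     /** How many times slower `on` is than `off`, after subtracting a shared baseline. */
     def slowdown(on: Double, off: Double, baseline: Double = 0.0): Double =
       (on - baseline) / (off - baseline)
   
     def main(args: Array[String]): Unit = {
       // Raw ratio of the two save rows.
       println(f"save, raw:           ${slowdown(saveRebaseOn, saveRebaseOff)}%.1fx")
       // Same ratio with the noop time (data generation only) subtracted from both rows.
       println(f"save, minus noop:    ${slowdown(saveRebaseOn, saveRebaseOff, saveNoop)}%.1fx")
       // The load table has no noop row, so only raw ratios are available.
       println(f"load, vectorized off: ${slowdown(loadVecOffRebaseOn, loadVecOffRebaseOff)}%.1fx")
       println(f"load, vectorized on:  ${slowdown(loadVecOnRebaseOn, loadVecOnRebaseOff)}%.1fx")
     }
   }
   ```
   
   With these inputs the ratios come out at roughly 5.2 (save, raw), 6.0 (save, net of noop), 4.3 (load, vectorized off) and 11.3 (load, vectorized on), which is in line with the rounded figures above.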
   
   ### What changes were proposed in this pull request?
   
   
   ### Why are the changes needed?
   
   
   ### Does this PR introduce any user-facing change?
   
   
   ### How was this patch tested?
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on issue #28057: [SPARK-31296][SQL][TESTS] Benchmark date-time rebasing in Parquet datasource

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #28057: [SPARK-31296][SQL][TESTS] Benchmark date-time rebasing in Parquet datasource
URL: https://github.com/apache/spark/pull/28057#issuecomment-605973641
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/120580/
   Test PASSed.



[GitHub] [spark] AmplabJenkins removed a comment on issue #28057: [WIP][SPARK-31296][SQL] Benchmark date-time rebasing to/from Julian calendar

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #28057: [WIP][SPARK-31296][SQL] Benchmark date-time rebasing to/from Julian calendar
URL: https://github.com/apache/spark/pull/28057#issuecomment-605602801
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/25258/
   Test PASSed.



[GitHub] [spark] AmplabJenkins removed a comment on issue #28057: [WIP][SPARK-31296][SQL] Benchmark date-time rebasing to/from Julian calendar

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #28057: [WIP][SPARK-31296][SQL] Benchmark date-time rebasing to/from Julian calendar
URL: https://github.com/apache/spark/pull/28057#issuecomment-605602799
 
 
   Merged build finished. Test PASSed.



[GitHub] [spark] cloud-fan commented on a change in pull request #28057: [SPARK-31296][SQL][TESTS] Benchmark date-time rebasing in Parquet datasource

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #28057: [SPARK-31296][SQL][TESTS] Benchmark date-time rebasing in Parquet datasource
URL: https://github.com/apache/spark/pull/28057#discussion_r399910407
 
 

 ##########
 File path: sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/DateTimeRebaseBenchmark.scala
 ##########
 @@ -0,0 +1,161 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.benchmark
+
+import java.time.{LocalDate, LocalDateTime, LocalTime, ZoneOffset}
+
+import org.apache.spark.benchmark.Benchmark
+import org.apache.spark.sql.DataFrame
+import org.apache.spark.sql.catalyst.util.DateTimeConstants.SECONDS_PER_DAY
+import org.apache.spark.sql.internal.SQLConf
+
+/**
+ * Synthetic benchmark for rebasing of date and timestamp in read/write.
+ * To run this benchmark:
+ * {{{
+ *   1. without sbt:
+ *      bin/spark-submit --class <this class> --jars <spark core test jar> <sql core test jar>
+ *   2. build/sbt "sql/test:runMain <this class>"
+ *   3. generate result:
+ *      SPARK_GENERATE_BENCHMARK_FILES=1 build/sbt "sql/test:runMain <this class>"
+ *      Results will be written to "benchmarks/DateTimeRebaseBenchmark-results.txt".
+ * }}}
+ */
+object DateTimeRebaseBenchmark extends SqlBasedBenchmark {
+  import spark.implicits._
+
+  private def genTs(cardinality: Int, start: LocalDateTime, end: LocalDateTime): DataFrame = {
+    val startSec = start.toEpochSecond(ZoneOffset.UTC)
+    val endSec = end.toEpochSecond(ZoneOffset.UTC)
+    spark.range(0, cardinality, 1, 1)
+      .select((($"id" % (endSec - startSec)) + startSec).as("seconds"))
+      .select($"seconds".cast("timestamp").as("ts"))
+  }
+
+  private def genTsAfter1582(cardinality: Int): DataFrame = {
+    val start = LocalDateTime.of(1582, 10, 15, 0, 0, 0)
+    val end = LocalDateTime.of(3000, 1, 1, 0, 0, 0)
+    genTs(cardinality, start, end)
+  }
+
+  private def genTsBefore1582(cardinality: Int): DataFrame = {
+    val start = LocalDateTime.of(10, 1, 1, 0, 0, 0)
+    val end = LocalDateTime.of(1580, 1, 1, 0, 0, 0)
+    genTs(cardinality, start, end)
+  }
+
+  private def genDate(cardinality: Int, start: LocalDate, end: LocalDate): DataFrame = {
+    val startSec = LocalDateTime.of(start, LocalTime.MIDNIGHT).toEpochSecond(ZoneOffset.UTC)
+    val endSec = LocalDateTime.of(end, LocalTime.MIDNIGHT).toEpochSecond(ZoneOffset.UTC)
+    spark.range(0, cardinality * SECONDS_PER_DAY, SECONDS_PER_DAY, 1)
+      .select((($"id" % (endSec - startSec)) + startSec).as("seconds"))
+      .select($"seconds".cast("timestamp").as("ts"))
+      .select($"ts".cast("date").as("date"))
+  }
+
+  private def genDateAfter1582(cardinality: Int): DataFrame = {
+    val start = LocalDate.of(1582, 10, 15)
+    val end = LocalDate.of(3000, 1, 1)
+    genDate(cardinality, start, end)
+  }
+
+  private def genDateBefore1582(cardinality: Int): DataFrame = {
+    val start = LocalDate.of(10, 1, 1)
+    val end = LocalDate.of(1580, 1, 1)
+    genDate(cardinality, start, end)
+  }
+
+  private def genDF(cardinality: Int, dateTime: String, after1582: Boolean): DataFrame = {
+    (dateTime, after1582) match {
+      case ("date", true) => genDateAfter1582(cardinality)
+      case ("date", false) => genDateBefore1582(cardinality)
+      case ("timestamp", true) => genTsAfter1582(cardinality)
+      case ("timestamp", false) => genTsBefore1582(cardinality)
+      case _ => throw new IllegalArgumentException(
+        s"cardinality = $cardinality dateTime = $dateTime after1582 = $after1582")
+    }
+  }
+
+  override def runBenchmarkSuite(mainArgs: Array[String]): Unit = {
+    withTempPath { path =>
+      runBenchmark("Parquet read/write") {
+        val rowsNum = 100000000
+        Seq("date", "timestamp").foreach { dateTime =>
+          val benchmark = new Benchmark(s"Save ${dateTime}s to parquet", rowsNum, output = output)
+          benchmark.addCase("after 1582, noop", 1) { _ =>
+            genDF(rowsNum, dateTime, after1582 = true).noop()
 
 Review comment:
   Do you include the DataFrame generation in the benchmark number? I think it should be excluded.
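   
   A minimal sketch of the idea in plain Scala, without Spark: build the input outside the timed region and measure only the operation of interest. `TimedSection` and `timeExcludingSetup` are hypothetical names used for illustration, not part of Spark's benchmark framework. In Spark the input would additionally have to be materialized before timing starts, because DataFrame transformations are evaluated lazily.
   
   ```scala
   object TimedSection {
     /** Runs `setup` untimed, then returns the result of `body` and its elapsed nanoseconds. */
     def timeExcludingSetup[S, R](setup: () => S)(body: S => R): (R, Long) = {
       val input = setup()                    // data generation: not timed
       val start = System.nanoTime()
       val result = body(input)               // measured operation: timed
       val elapsed = System.nanoTime() - start
       (result, elapsed)
     }
   
     def main(args: Array[String]): Unit = {
       // Generating the array is excluded; only the summation is measured.
       val (sum, nanos) =
         timeExcludingSetup(() => Array.tabulate(1000000)(_.toLong))(_.sum)
       println(s"sum = $sum, timed section took ${nanos / 1e6} ms")
     }
   }
   ```
   
   An alternative that keeps generation inside every case is to report a generation-only case (such as the `noop` case here) and subtract it when comparing the other rows.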



[GitHub] [spark] dongjoon-hyun commented on issue #28057: [WIP][SQL] Benchmark rebasing of dates/timestamps

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on issue #28057: [WIP][SQL] Benchmark rebasing of dates/timestamps 
URL: https://github.com/apache/spark/pull/28057#issuecomment-605328003
 
 
   Thank you for the benchmark. Yes, it's an expected drawback.



[GitHub] [spark] AmplabJenkins removed a comment on issue #28057: [WIP][SQL] Benchmark rebasing of dates/timestamps

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #28057: [WIP][SQL] Benchmark rebasing of dates/timestamps 
URL: https://github.com/apache/spark/pull/28057#issuecomment-605313805
 
 
   Merged build finished. Test PASSed.



[GitHub] [spark] AmplabJenkins removed a comment on issue #28057: [SPARK-31296][SQL][TESTS] Benchmark date-time rebasing in Parquet datasource

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #28057: [SPARK-31296][SQL][TESTS] Benchmark date-time rebasing in Parquet datasource
URL: https://github.com/apache/spark/pull/28057#issuecomment-605821271
 
 
   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/120577/
   Test FAILed.



[GitHub] [spark] cloud-fan closed pull request #28057: [SPARK-31296][SQL][TESTS] Benchmark date-time rebasing in Parquet datasource

Posted by GitBox <gi...@apache.org>.
cloud-fan closed pull request #28057: [SPARK-31296][SQL][TESTS] Benchmark date-time rebasing in Parquet datasource
URL: https://github.com/apache/spark/pull/28057
 
 
   



[GitHub] [spark] AmplabJenkins commented on issue #28057: [WIP][SQL] Benchmark rebasing of dates/timestamps

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #28057: [WIP][SQL] Benchmark rebasing of dates/timestamps 
URL: https://github.com/apache/spark/pull/28057#issuecomment-605313805
 
 
   Merged build finished. Test PASSed.



[GitHub] [spark] SparkQA commented on issue #28057: [WIP][SQL] Benchmark rebasing of dates/timestamps

Posted by GitBox <gi...@apache.org>.
SparkQA commented on issue #28057: [WIP][SQL] Benchmark rebasing of dates/timestamps 
URL: https://github.com/apache/spark/pull/28057#issuecomment-605541682
 
 
   **[Test build #120538 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/120538/testReport)** for PR 28057 at commit [`912dee4`](https://github.com/apache/spark/commit/912dee41526ac5d7ae9dd44a790a961d2f04b54f).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.



[GitHub] [spark] AmplabJenkins removed a comment on issue #28057: [WIP][SQL] Benchmark rebasing of dates/timestamps

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #28057: [WIP][SQL] Benchmark rebasing of dates/timestamps 
URL: https://github.com/apache/spark/pull/28057#issuecomment-605541780
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/120538/
   Test PASSed.



[GitHub] [spark] AmplabJenkins removed a comment on issue #28057: [SPARK-31296][SQL][TESTS] Benchmark date-time rebasing in Parquet datasource

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #28057: [SPARK-31296][SQL][TESTS] Benchmark date-time rebasing in Parquet datasource
URL: https://github.com/apache/spark/pull/28057#issuecomment-605821264
 
 
   Merged build finished. Test FAILed.



[GitHub] [spark] AmplabJenkins removed a comment on issue #28057: [SPARK-31296][SQL][TESTS] Benchmark date-time rebasing in Parquet datasource

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #28057: [SPARK-31296][SQL][TESTS] Benchmark date-time rebasing in Parquet datasource
URL: https://github.com/apache/spark/pull/28057#issuecomment-605805340
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/25281/
   Test PASSed.



[GitHub] [spark] AmplabJenkins commented on issue #28057: [SPARK-31296][SQL][TESTS] Benchmark date-time rebasing in Parquet datasource

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #28057: [SPARK-31296][SQL][TESTS] Benchmark date-time rebasing in Parquet datasource
URL: https://github.com/apache/spark/pull/28057#issuecomment-605843876
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/25284/
   Test PASSed.



[GitHub] [spark] AmplabJenkins commented on issue #28057: [SPARK-31296][SQL][TESTS] Benchmark date-time rebasing in Parquet datasource

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #28057: [SPARK-31296][SQL][TESTS] Benchmark date-time rebasing in Parquet datasource
URL: https://github.com/apache/spark/pull/28057#issuecomment-605973626
 
 
   Merged build finished. Test PASSed.



[GitHub] [spark] dongjoon-hyun commented on a change in pull request #28057: [SPARK-31296][SQL][TESTS] Benchmark date-time rebasing in Parquet datasource

Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on a change in pull request #28057: [SPARK-31296][SQL][TESTS] Benchmark date-time rebasing in Parquet datasource
URL: https://github.com/apache/spark/pull/28057#discussion_r399837182
 
 

 ##########
 File path: sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/DateTimeRebaseBenchmark.scala
 ##########
 @@ -0,0 +1,161 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.benchmark
+
+import java.time.{LocalDate, LocalDateTime, LocalTime, ZoneOffset}
+
+import org.apache.spark.benchmark.Benchmark
+import org.apache.spark.sql.DataFrame
+import org.apache.spark.sql.catalyst.util.DateTimeConstants.SECONDS_PER_DAY
+import org.apache.spark.sql.internal.SQLConf
+
+/**
+ * Synthetic benchmark for rebasing of date and timestamp in read/write.
+ * To run this benchmark:
+ * {{{
+ *   1. without sbt:
+ *      bin/spark-submit --class <this class> --jars <spark core test jar> <sql core test jar>
+ *   2. build/sbt "sql/test:runMain <this class>"
+ *   3. generate result:
+ *      SPARK_GENERATE_BENCHMARK_FILES=1 build/sbt "sql/test:runMain <this class>"
+ *      Results will be written to "benchmarks/DateTimeRebaseBenchmark-results.txt".
+ * }}}
+ */
+object DateTimeRebaseBenchmark extends SqlBasedBenchmark {
+  import spark.implicits._
+
+  private def genTs(cardinality: Int, start: LocalDateTime, end: LocalDateTime): DataFrame = {
+    val startSec = start.toEpochSecond(ZoneOffset.UTC)
+    val endSec = end.toEpochSecond(ZoneOffset.UTC)
+    spark.range(0, cardinality, 1, 1)
+      .select((($"id" % (endSec - startSec)) + startSec).as("seconds"))
+      .select($"seconds".cast("timestamp").as("ts"))
+  }
+
+  private def genTsAfter1582(cardinality: Int): DataFrame = {
+    val start = LocalDateTime.of(1582, 10, 15, 0, 0, 0)
+    val end = LocalDateTime.of(3000, 1, 1, 0, 0, 0)
+    genTs(cardinality, start, end)
+  }
+
+  private def genTsBefore1582(cardinality: Int): DataFrame = {
+    val start = LocalDateTime.of(10, 1, 1, 0, 0, 0)
+    val end = LocalDateTime.of(1580, 1, 1, 0, 0, 0)
+    genTs(cardinality, start, end)
+  }
+
+  private def genDate(cardinality: Int, start: LocalDate, end: LocalDate): DataFrame = {
+    val startSec = LocalDateTime.of(start, LocalTime.MIDNIGHT).toEpochSecond(ZoneOffset.UTC)
+    val endSec = LocalDateTime.of(end, LocalTime.MIDNIGHT).toEpochSecond(ZoneOffset.UTC)
+    spark.range(0, cardinality * SECONDS_PER_DAY, SECONDS_PER_DAY, 1)
+      .select((($"id" % (endSec - startSec)) + startSec).as("seconds"))
+      .select($"seconds".cast("timestamp").as("ts"))
+      .select($"ts".cast("date").as("date"))
+  }
+
+  private def genDateAfter1582(cardinality: Int): DataFrame = {
+    val start = LocalDate.of(1582, 10, 15)
+    val end = LocalDate.of(3000, 1, 1)
+    genDate(cardinality, start, end)
+  }
+
+  private def genDateBefore1582(cardinality: Int): DataFrame = {
+    val start = LocalDate.of(10, 1, 1)
+    val end = LocalDate.of(1580, 1, 1)
+    genDate(cardinality, start, end)
+  }
+
+  private def genDF(cardinality: Int, dateTime: String, after1582: Boolean): DataFrame = {
+    (dateTime, after1582) match {
+      case ("date", true) => genDateAfter1582(cardinality)
+      case ("date", false) => genDateBefore1582(cardinality)
+      case ("timestamp", true) => genTsAfter1582(cardinality)
+      case ("timestamp", false) => genTsBefore1582(cardinality)
+      case _ => throw new IllegalArgumentException(
+        s"cardinality = $cardinality dateTime = $dateTime after1582 = $after1582")
+    }
+  }
+
+  override def runBenchmarkSuite(mainArgs: Array[String]): Unit = {
+    withTempPath { path =>
+      runBenchmark("Parquet read/write") {
 
 Review comment:
   Could you use a more specific benchmark title? It is used in the generated files.



[GitHub] [spark] AmplabJenkins removed a comment on issue #28057: [SPARK-31296][SQL][TESTS] Benchmark date-time rebasing in Parquet datasource

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #28057: [SPARK-31296][SQL][TESTS] Benchmark date-time rebasing in Parquet datasource
URL: https://github.com/apache/spark/pull/28057#issuecomment-605805328
 
 
   Merged build finished. Test PASSed.



[GitHub] [spark] AmplabJenkins removed a comment on issue #28057: [SPARK-31296][SQL] Benchmark date-time rebasing in Parquet datasource

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #28057: [SPARK-31296][SQL] Benchmark date-time rebasing in Parquet datasource
URL: https://github.com/apache/spark/pull/28057#issuecomment-605628472
 
 
   Merged build finished. Test PASSed.



[GitHub] [spark] AmplabJenkins commented on issue #28057: [WIP][SQL] Benchmark rebasing of dates/timestamps

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #28057: [WIP][SQL] Benchmark rebasing of dates/timestamps 
URL: https://github.com/apache/spark/pull/28057#issuecomment-605518801
 
 
   Merged build finished. Test PASSed.



[GitHub] [spark] cloud-fan commented on a change in pull request #28057: [SPARK-31296][SQL][TESTS] Benchmark date-time rebasing in Parquet datasource

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #28057: [SPARK-31296][SQL][TESTS] Benchmark date-time rebasing in Parquet datasource
URL: https://github.com/apache/spark/pull/28057#discussion_r399910178
 
 

 ##########
 File path: sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/DateTimeRebaseBenchmark.scala
 ##########
 @@ -0,0 +1,161 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.benchmark
+
+import java.time.{LocalDate, LocalDateTime, LocalTime, ZoneOffset}
+
+import org.apache.spark.benchmark.Benchmark
+import org.apache.spark.sql.DataFrame
+import org.apache.spark.sql.catalyst.util.DateTimeConstants.SECONDS_PER_DAY
+import org.apache.spark.sql.internal.SQLConf
+
+/**
+ * Synthetic benchmark for rebasing of date and timestamp in read/write.
+ * To run this benchmark:
+ * {{{
+ *   1. without sbt:
+ *      bin/spark-submit --class <this class> --jars <spark core test jar> <sql core test jar>
+ *   2. build/sbt "sql/test:runMain <this class>"
+ *   3. generate result:
+ *      SPARK_GENERATE_BENCHMARK_FILES=1 build/sbt "sql/test:runMain <this class>"
+ *      Results will be written to "benchmarks/DateTimeRebaseBenchmark-results.txt".
+ * }}}
+ */
+object DateTimeRebaseBenchmark extends SqlBasedBenchmark {
+  import spark.implicits._
+
+  private def genTs(cardinality: Int, start: LocalDateTime, end: LocalDateTime): DataFrame = {
+    val startSec = start.toEpochSecond(ZoneOffset.UTC)
+    val endSec = end.toEpochSecond(ZoneOffset.UTC)
+    spark.range(0, cardinality, 1, 1)
+      .select((($"id" % (endSec - startSec)) + startSec).as("seconds"))
+      .select($"seconds".cast("timestamp").as("ts"))
+  }
+
+  private def genTsAfter1582(cardinality: Int): DataFrame = {
+    val start = LocalDateTime.of(1582, 10, 15, 0, 0, 0)
+    val end = LocalDateTime.of(3000, 1, 1, 0, 0, 0)
+    genTs(cardinality, start, end)
+  }
+
+  private def genTsBefore1582(cardinality: Int): DataFrame = {
+    val start = LocalDateTime.of(10, 1, 1, 0, 0, 0)
+    val end = LocalDateTime.of(1580, 1, 1, 0, 0, 0)
+    genTs(cardinality, start, end)
+  }
+
+  private def genDate(cardinality: Int, start: LocalDate, end: LocalDate): DataFrame = {
+    val startSec = LocalDateTime.of(start, LocalTime.MIDNIGHT).toEpochSecond(ZoneOffset.UTC)
+    val endSec = LocalDateTime.of(end, LocalTime.MIDNIGHT).toEpochSecond(ZoneOffset.UTC)
+    spark.range(0, cardinality * SECONDS_PER_DAY, SECONDS_PER_DAY, 1)
+      .select((($"id" % (endSec - startSec)) + startSec).as("seconds"))
+      .select($"seconds".cast("timestamp").as("ts"))
+      .select($"ts".cast("date").as("date"))
+  }
+
+  private def genDateAfter1582(cardinality: Int): DataFrame = {
+    val start = LocalDate.of(1582, 10, 15)
+    val end = LocalDate.of(3000, 1, 1)
+    genDate(cardinality, start, end)
+  }
+
+  private def genDateBefore1582(cardinality: Int): DataFrame = {
+    val start = LocalDate.of(10, 1, 1)
+    val end = LocalDate.of(1580, 1, 1)
+    genDate(cardinality, start, end)
+  }
+
+  private def genDF(cardinality: Int, dateTime: String, after1582: Boolean): DataFrame = {
+    (dateTime, after1582) match {
+      case ("date", true) => genDateAfter1582(cardinality)
+      case ("date", false) => genDateBefore1582(cardinality)
+      case ("timestamp", true) => genTsAfter1582(cardinality)
+      case ("timestamp", false) => genTsBefore1582(cardinality)
+      case _ => throw new IllegalArgumentException(
+        s"cardinality = $cardinality dateTime = $dateTime after1582 = $after1582")
+    }
+  }
+
+  override def runBenchmarkSuite(mainArgs: Array[String]): Unit = {
+    withTempPath { path =>
+      runBenchmark("Parquet read/write") {
 
 Review comment:
   +1 to make the title mention second rebase.



[GitHub] [spark] AmplabJenkins removed a comment on issue #28057: [WIP][SQL] Benchmark rebasing of dates/timestamps

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #28057: [WIP][SQL] Benchmark rebasing of dates/timestamps 
URL: https://github.com/apache/spark/pull/28057#issuecomment-605313816
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/25221/
   Test PASSed.



[GitHub] [spark] AmplabJenkins commented on issue #28057: [SPARK-31296][SQL][TESTS] Benchmark date-time rebasing in Parquet datasource

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #28057: [SPARK-31296][SQL][TESTS] Benchmark date-time rebasing in Parquet datasource
URL: https://github.com/apache/spark/pull/28057#issuecomment-605805340
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/25281/
   Test PASSed.



[GitHub] [spark] AmplabJenkins commented on issue #28057: [WIP][SPARK-31296][SQL] Benchmark date-time rebasing to/from Julian calendar

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #28057: [WIP][SPARK-31296][SQL] Benchmark date-time rebasing to/from Julian calendar
URL: https://github.com/apache/spark/pull/28057#issuecomment-605602801
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/25258/
   Test PASSed.



[GitHub] [spark] AmplabJenkins commented on issue #28057: [WIP][SQL] Benchmark rebasing of dates/timestamps

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #28057: [WIP][SQL] Benchmark rebasing of dates/timestamps 
URL: https://github.com/apache/spark/pull/28057#issuecomment-605373021
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/120515/
   Test PASSed.



[GitHub] [spark] AmplabJenkins commented on issue #28057: [WIP][SQL] Benchmark rebasing of dates/timestamps

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #28057: [WIP][SQL] Benchmark rebasing of dates/timestamps 
URL: https://github.com/apache/spark/pull/28057#issuecomment-605518802
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/25244/
   Test PASSed.



[GitHub] [spark] AmplabJenkins commented on issue #28057: [WIP][SQL] Benchmark rebasing of dates/timestamps

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #28057: [WIP][SQL] Benchmark rebasing of dates/timestamps 
URL: https://github.com/apache/spark/pull/28057#issuecomment-605541780
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/120538/
   Test PASSed.



[GitHub] [spark] SparkQA commented on issue #28057: [WIP][SQL] Benchmark rebasing of dates/timestamps

Posted by GitBox <gi...@apache.org>.
SparkQA commented on issue #28057: [WIP][SQL] Benchmark rebasing of dates/timestamps 
URL: https://github.com/apache/spark/pull/28057#issuecomment-605313370
 
 
   **[Test build #120515 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/120515/testReport)** for PR 28057 at commit [`e217139`](https://github.com/apache/spark/commit/e217139ee63c7755c6630354847e8c5b3d447aa7).



[GitHub] [spark] SparkQA commented on issue #28057: [SPARK-31296][SQL][TESTS] Benchmark date-time rebasing in Parquet datasource

Posted by GitBox <gi...@apache.org>.
SparkQA commented on issue #28057: [SPARK-31296][SQL][TESTS] Benchmark date-time rebasing in Parquet datasource
URL: https://github.com/apache/spark/pull/28057#issuecomment-605846660
 
 
   **[Test build #120580 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/120580/testReport)** for PR 28057 at commit [`e0aedf5`](https://github.com/apache/spark/commit/e0aedf5dbf363477ca88b6c6a7fb0038bafe8261).



[GitHub] [spark] SparkQA removed a comment on issue #28057: [WIP][SQL] Benchmark rebasing of dates/timestamps

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on issue #28057: [WIP][SQL] Benchmark rebasing of dates/timestamps 
URL: https://github.com/apache/spark/pull/28057#issuecomment-605497653
 
 
   **[Test build #120535 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/120535/testReport)** for PR 28057 at commit [`8703579`](https://github.com/apache/spark/commit/87035792f6a5c04eb357feed0b73bf75c274b4f9).



[GitHub] [spark] SparkQA commented on issue #28057: [SPARK-31296][SQL][TESTS] Benchmark date-time rebasing in Parquet datasource

Posted by GitBox <gi...@apache.org>.
SparkQA commented on issue #28057: [SPARK-31296][SQL][TESTS] Benchmark date-time rebasing in Parquet datasource
URL: https://github.com/apache/spark/pull/28057#issuecomment-605972808
 
 
   **[Test build #120580 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/120580/testReport)** for PR 28057 at commit [`e0aedf5`](https://github.com/apache/spark/commit/e0aedf5dbf363477ca88b6c6a7fb0038bafe8261).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.



[GitHub] [spark] SparkQA commented on issue #28057: [WIP][SQL] Benchmark rebasing of dates/timestamps

Posted by GitBox <gi...@apache.org>.
SparkQA commented on issue #28057: [WIP][SQL] Benchmark rebasing of dates/timestamps 
URL: https://github.com/apache/spark/pull/28057#issuecomment-605372787
 
 
   **[Test build #120515 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/120515/testReport)** for PR 28057 at commit [`e217139`](https://github.com/apache/spark/commit/e217139ee63c7755c6630354847e8c5b3d447aa7).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.



[GitHub] [spark] AmplabJenkins removed a comment on issue #28057: [SPARK-31296][SQL][TESTS] Benchmark date-time rebasing in Parquet datasource

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #28057: [SPARK-31296][SQL][TESTS] Benchmark date-time rebasing in Parquet datasource
URL: https://github.com/apache/spark/pull/28057#issuecomment-605973626
 
 
   Merged build finished. Test PASSed.



[GitHub] [spark] AmplabJenkins removed a comment on issue #28057: [SPARK-31296][SQL][TESTS] Benchmark date-time rebasing in Parquet datasource

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #28057: [SPARK-31296][SQL][TESTS] Benchmark date-time rebasing in Parquet datasource
URL: https://github.com/apache/spark/pull/28057#issuecomment-605843876
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/25284/
   Test PASSed.



[GitHub] [spark] MaxGekk commented on issue #28057: [WIP][SQL] Benchmark rebasing of dates/timestamps

Posted by GitBox <gi...@apache.org>.
MaxGekk commented on issue #28057: [WIP][SQL] Benchmark rebasing of dates/timestamps 
URL: https://github.com/apache/spark/pull/28057#issuecomment-605401675
 
 
   > It's an expected drawback.
   
   Parquet and Avro perform rebasing only if a SQL config is enabled (and the config is off by default). ORC always performs rebasing, so I would expect some slowdown in ORC too.
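
For readers unfamiliar with the term, the following is a minimal sketch of what "rebasing" a date means, assuming it refers to converting between the hybrid Julian + Gregorian calendar and the Proleptic Gregorian calendar. It is not Spark's implementation: the object `RebaseSketch` and its helpers `prolepticDays` and `hybridDays` are hypothetical names, and only standard JDK classes are used. It shows that the same year-month-day label maps to different day counts before the 1582-10-15 cutover and to identical ones after it.

```scala
import java.time.LocalDate
import java.util.{GregorianCalendar, TimeZone}

// Hypothetical illustration only; not taken from the PR or from Spark.
object RebaseSketch {
  private val MillisPerDay: Long = 24L * 60 * 60 * 1000

  // Days since 1970-01-01 when the fields are read in the Proleptic Gregorian calendar.
  def prolepticDays(year: Int, month: Int, day: Int): Long =
    LocalDate.of(year, month, day).toEpochDay

  // Days since 1970-01-01 when the same fields are read in the hybrid calendar
  // (Julian before 1582-10-15, Gregorian afterwards), as java.util.GregorianCalendar does.
  def hybridDays(year: Int, month: Int, day: Int): Long = {
    val cal = new GregorianCalendar(TimeZone.getTimeZone("UTC"))
    cal.clear()                    // reset all fields so the time of day is midnight UTC
    cal.set(year, month - 1, day)  // Calendar months are 0-based
    java.lang.Math.floorDiv(cal.getTimeInMillis, MillisPerDay)
  }

  def main(args: Array[String]): Unit = {
    // Before the cutover the two calendars disagree, so stored day counts need shifting.
    println(hybridDays(1582, 10, 4) - prolepticDays(1582, 10, 4))
    // After the cutover they agree, so rebasing leaves the value unchanged.
    println(hybridDays(2020, 3, 27) - prolepticDays(2020, 3, 27))
  }
}
```

Under these assumptions, any per-value calendar conversion of this kind adds work on every row, which is consistent with the slowdown discussed in this thread for the "rebase on" cases.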



[GitHub] [spark] AmplabJenkins removed a comment on issue #28057: [WIP][SQL] Benchmark rebasing of dates/timestamps

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #28057: [WIP][SQL] Benchmark rebasing of dates/timestamps 
URL: https://github.com/apache/spark/pull/28057#issuecomment-605518802
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/25244/
   Test PASSed.



[GitHub] [spark] AmplabJenkins removed a comment on issue #28057: [SPARK-31296][SQL] Benchmark date-time rebasing in Parquet datasource

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #28057: [SPARK-31296][SQL] Benchmark date-time rebasing in Parquet datasource
URL: https://github.com/apache/spark/pull/28057#issuecomment-605628473
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/120552/
   Test PASSed.



[GitHub] [spark] SparkQA removed a comment on issue #28057: [SPARK-31296][SQL][TESTS] Benchmark date-time rebasing in Parquet datasource

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on issue #28057: [SPARK-31296][SQL][TESTS] Benchmark date-time rebasing in Parquet datasource
URL: https://github.com/apache/spark/pull/28057#issuecomment-605846660
 
 
   **[Test build #120580 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/120580/testReport)** for PR 28057 at commit [`e0aedf5`](https://github.com/apache/spark/commit/e0aedf5dbf363477ca88b6c6a7fb0038bafe8261).



[GitHub] [spark] AmplabJenkins commented on issue #28057: [WIP][SQL] Benchmark rebasing of dates/timestamps

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #28057: [WIP][SQL] Benchmark rebasing of dates/timestamps 
URL: https://github.com/apache/spark/pull/28057#issuecomment-605525869
 
 
   Merged build finished. Test PASSed.



[GitHub] [spark] SparkQA commented on issue #28057: [WIP][SPARK-31296][SQL] Benchmark date-time rebasing to/from Julian calendar

Posted by GitBox <gi...@apache.org>.
SparkQA commented on issue #28057: [WIP][SPARK-31296][SQL] Benchmark date-time rebasing to/from Julian calendar
URL: https://github.com/apache/spark/pull/28057#issuecomment-605602702
 
 
   **[Test build #120552 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/120552/testReport)** for PR 28057 at commit [`c89f2c9`](https://github.com/apache/spark/commit/c89f2c9a0dd717e4ed12101a05236a2c3bd7252a).



[GitHub] [spark] AmplabJenkins commented on issue #28057: [SPARK-31296][SQL][TESTS] Benchmark date-time rebasing in Parquet datasource

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #28057: [SPARK-31296][SQL][TESTS] Benchmark date-time rebasing in Parquet datasource
URL: https://github.com/apache/spark/pull/28057#issuecomment-605805328
 
 
   Merged build finished. Test PASSed.



[GitHub] [spark] AmplabJenkins commented on issue #28057: [WIP][SQL] Benchmark rebasing of dates/timestamps

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #28057: [WIP][SQL] Benchmark rebasing of dates/timestamps 
URL: https://github.com/apache/spark/pull/28057#issuecomment-605541779
 
 
   Merged build finished. Test PASSed.



[GitHub] [spark] SparkQA commented on issue #28057: [WIP][SQL] Benchmark rebasing of dates/timestamps

Posted by GitBox <gi...@apache.org>.
SparkQA commented on issue #28057: [WIP][SQL] Benchmark rebasing of dates/timestamps 
URL: https://github.com/apache/spark/pull/28057#issuecomment-605525725
 
 
   **[Test build #120535 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/120535/testReport)** for PR 28057 at commit [`8703579`](https://github.com/apache/spark/commit/87035792f6a5c04eb357feed0b73bf75c274b4f9).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.



[GitHub] [spark] AmplabJenkins commented on issue #28057: [WIP][SQL] Benchmark rebasing of dates/timestamps

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #28057: [WIP][SQL] Benchmark rebasing of dates/timestamps 
URL: https://github.com/apache/spark/pull/28057#issuecomment-605373019
 
 
   Merged build finished. Test PASSed.

----------------------------------------------------------------


[GitHub] [spark] MaxGekk edited a comment on issue #28057: [WIP][SQL] Benchmark rebasing of dates/timestamps

Posted by GitBox <gi...@apache.org>.
MaxGekk edited a comment on issue #28057: [WIP][SQL] Benchmark rebasing of dates/timestamps 
URL: https://github.com/apache/spark/pull/28057#issuecomment-605313506
 
 
   @cloud-fan @HyukjinKwon @dongjoon-hyun Here are the intermediate results of benchmarking timestamp rebasing in Parquet.

----------------------------------------------------------------


[GitHub] [spark] AmplabJenkins removed a comment on issue #28057: [WIP][SQL] Benchmark rebasing of dates/timestamps

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #28057: [WIP][SQL] Benchmark rebasing of dates/timestamps 
URL: https://github.com/apache/spark/pull/28057#issuecomment-605373019
 
 
   Merged build finished. Test PASSed.

----------------------------------------------------------------


[GitHub] [spark] AmplabJenkins commented on issue #28057: [SPARK-31296][SQL][TESTS] Benchmark date-time rebasing in Parquet datasource

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #28057: [SPARK-31296][SQL][TESTS] Benchmark date-time rebasing in Parquet datasource
URL: https://github.com/apache/spark/pull/28057#issuecomment-605821271
 
 
   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/120577/
   Test FAILed.

----------------------------------------------------------------


[GitHub] [spark] SparkQA commented on issue #28057: [WIP][SQL] Benchmark rebasing of dates/timestamps

Posted by GitBox <gi...@apache.org>.
SparkQA commented on issue #28057: [WIP][SQL] Benchmark rebasing of dates/timestamps 
URL: https://github.com/apache/spark/pull/28057#issuecomment-605497653
 
 
   **[Test build #120535 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/120535/testReport)** for PR 28057 at commit [`8703579`](https://github.com/apache/spark/commit/87035792f6a5c04eb357feed0b73bf75c274b4f9).

----------------------------------------------------------------


[GitHub] [spark] AmplabJenkins commented on issue #28057: [WIP][SQL] Benchmark rebasing of dates/timestamps

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #28057: [WIP][SQL] Benchmark rebasing of dates/timestamps 
URL: https://github.com/apache/spark/pull/28057#issuecomment-605497821
 
 
   Merged build finished. Test PASSed.

----------------------------------------------------------------


[GitHub] [spark] AmplabJenkins commented on issue #28057: [WIP][SQL] Benchmark rebasing of dates/timestamps

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #28057: [WIP][SQL] Benchmark rebasing of dates/timestamps 
URL: https://github.com/apache/spark/pull/28057#issuecomment-605525872
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/120535/
   Test PASSed.

----------------------------------------------------------------


[GitHub] [spark] SparkQA commented on issue #28057: [SPARK-31296][SQL] Benchmark date-time rebasing in Parquet datasource

Posted by GitBox <gi...@apache.org>.
SparkQA commented on issue #28057: [SPARK-31296][SQL] Benchmark date-time rebasing in Parquet datasource
URL: https://github.com/apache/spark/pull/28057#issuecomment-605628279
 
 
   **[Test build #120552 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/120552/testReport)** for PR 28057 at commit [`c89f2c9`](https://github.com/apache/spark/commit/c89f2c9a0dd717e4ed12101a05236a2c3bd7252a).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.

----------------------------------------------------------------


[GitHub] [spark] AmplabJenkins commented on issue #28057: [WIP][SQL] Benchmark rebasing of dates/timestamps

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #28057: [WIP][SQL] Benchmark rebasing of dates/timestamps 
URL: https://github.com/apache/spark/pull/28057#issuecomment-605313816
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/25221/
   Test PASSed.

----------------------------------------------------------------


[GitHub] [spark] AmplabJenkins removed a comment on issue #28057: [WIP][SQL] Benchmark rebasing of dates/timestamps

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #28057: [WIP][SQL] Benchmark rebasing of dates/timestamps 
URL: https://github.com/apache/spark/pull/28057#issuecomment-605497829
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/25241/
   Test PASSed.

----------------------------------------------------------------


[GitHub] [spark] SparkQA removed a comment on issue #28057: [WIP][SQL] Benchmark rebasing of dates/timestamps

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on issue #28057: [WIP][SQL] Benchmark rebasing of dates/timestamps 
URL: https://github.com/apache/spark/pull/28057#issuecomment-605313370
 
 
   **[Test build #120515 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/120515/testReport)** for PR 28057 at commit [`e217139`](https://github.com/apache/spark/commit/e217139ee63c7755c6630354847e8c5b3d447aa7).

----------------------------------------------------------------


[GitHub] [spark] AmplabJenkins commented on issue #28057: [SPARK-31296][SQL][TESTS] Benchmark date-time rebasing in Parquet datasource

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #28057: [SPARK-31296][SQL][TESTS] Benchmark date-time rebasing in Parquet datasource
URL: https://github.com/apache/spark/pull/28057#issuecomment-605843862
 
 
   Merged build finished. Test PASSed.

----------------------------------------------------------------


[GitHub] [spark] AmplabJenkins removed a comment on issue #28057: [WIP][SQL] Benchmark rebasing of dates/timestamps

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #28057: [WIP][SQL] Benchmark rebasing of dates/timestamps 
URL: https://github.com/apache/spark/pull/28057#issuecomment-605525872
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/120535/
   Test PASSed.

----------------------------------------------------------------


[GitHub] [spark] AmplabJenkins commented on issue #28057: [SPARK-31296][SQL][TESTS] Benchmark date-time rebasing in Parquet datasource

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #28057: [SPARK-31296][SQL][TESTS] Benchmark date-time rebasing in Parquet datasource
URL: https://github.com/apache/spark/pull/28057#issuecomment-605821264
 
 
   Merged build finished. Test FAILed.

----------------------------------------------------------------


[GitHub] [spark] SparkQA commented on issue #28057: [SPARK-31296][SQL][TESTS] Benchmark date-time rebasing in Parquet datasource

Posted by GitBox <gi...@apache.org>.
SparkQA commented on issue #28057: [SPARK-31296][SQL][TESTS] Benchmark date-time rebasing in Parquet datasource
URL: https://github.com/apache/spark/pull/28057#issuecomment-605804944
 
 
   **[Test build #120577 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/120577/testReport)** for PR 28057 at commit [`e0aedf5`](https://github.com/apache/spark/commit/e0aedf5dbf363477ca88b6c6a7fb0038bafe8261).

----------------------------------------------------------------


[GitHub] [spark] AmplabJenkins commented on issue #28057: [SPARK-31296][SQL] Benchmark date-time rebasing in Parquet datasource

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #28057: [SPARK-31296][SQL] Benchmark date-time rebasing in Parquet datasource
URL: https://github.com/apache/spark/pull/28057#issuecomment-605628472
 
 
   Merged build finished. Test PASSed.

----------------------------------------------------------------


[GitHub] [spark] AmplabJenkins removed a comment on issue #28057: [WIP][SQL] Benchmark rebasing of dates/timestamps

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #28057: [WIP][SQL] Benchmark rebasing of dates/timestamps 
URL: https://github.com/apache/spark/pull/28057#issuecomment-605525869
 
 
   Merged build finished. Test PASSed.

----------------------------------------------------------------


[GitHub] [spark] SparkQA commented on issue #28057: [SPARK-31296][SQL][TESTS] Benchmark date-time rebasing in Parquet datasource

Posted by GitBox <gi...@apache.org>.
SparkQA commented on issue #28057: [SPARK-31296][SQL][TESTS] Benchmark date-time rebasing in Parquet datasource
URL: https://github.com/apache/spark/pull/28057#issuecomment-605821163
 
 
   **[Test build #120577 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/120577/testReport)** for PR 28057 at commit [`e0aedf5`](https://github.com/apache/spark/commit/e0aedf5dbf363477ca88b6c6a7fb0038bafe8261).
    * This patch **fails due to an unknown error code, -9**.
    * This patch merges cleanly.
    * This patch adds no public classes.

----------------------------------------------------------------


[GitHub] [spark] AmplabJenkins removed a comment on issue #28057: [WIP][SQL] Benchmark rebasing of dates/timestamps

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #28057: [WIP][SQL] Benchmark rebasing of dates/timestamps 
URL: https://github.com/apache/spark/pull/28057#issuecomment-605373021
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/120515/
   Test PASSed.

----------------------------------------------------------------


[GitHub] [spark] AmplabJenkins removed a comment on issue #28057: [WIP][SQL] Benchmark rebasing of dates/timestamps

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #28057: [WIP][SQL] Benchmark rebasing of dates/timestamps 
URL: https://github.com/apache/spark/pull/28057#issuecomment-605497821
 
 
   Merged build finished. Test PASSed.

----------------------------------------------------------------


[GitHub] [spark] SparkQA commented on issue #28057: [WIP][SQL] Benchmark rebasing of dates/timestamps

Posted by GitBox <gi...@apache.org>.
SparkQA commented on issue #28057: [WIP][SQL] Benchmark rebasing of dates/timestamps 
URL: https://github.com/apache/spark/pull/28057#issuecomment-605519440
 
 
   **[Test build #120538 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/120538/testReport)** for PR 28057 at commit [`912dee4`](https://github.com/apache/spark/commit/912dee41526ac5d7ae9dd44a790a961d2f04b54f).

----------------------------------------------------------------


[GitHub] [spark] HyukjinKwon commented on issue #28057: [SPARK-31296][SQL][TESTS] Benchmark date-time rebasing in Parquet datasource

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on issue #28057: [SPARK-31296][SQL][TESTS] Benchmark date-time rebasing in Parquet datasource
URL: https://github.com/apache/spark/pull/28057#issuecomment-605843102
 
 
   retest this please

----------------------------------------------------------------


[GitHub] [spark] MaxGekk commented on a change in pull request #28057: [SPARK-31296][SQL][TESTS] Benchmark date-time rebasing in Parquet datasource

Posted by GitBox <gi...@apache.org>.
MaxGekk commented on a change in pull request #28057: [SPARK-31296][SQL][TESTS] Benchmark date-time rebasing in Parquet datasource
URL: https://github.com/apache/spark/pull/28057#discussion_r399853876
 
 

 ##########
 File path: sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/DateTimeRebaseBenchmark.scala
 ##########
 @@ -0,0 +1,161 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.benchmark
+
+import java.time.{LocalDate, LocalDateTime, LocalTime, ZoneOffset}
+
+import org.apache.spark.benchmark.Benchmark
+import org.apache.spark.sql.DataFrame
+import org.apache.spark.sql.catalyst.util.DateTimeConstants.SECONDS_PER_DAY
+import org.apache.spark.sql.internal.SQLConf
+
+/**
+ * Synthetic benchmark for rebasing of date and timestamp in read/write.
+ * To run this benchmark:
+ * {{{
+ *   1. without sbt:
+ *      bin/spark-submit --class <this class> --jars <spark core test jar> <sql core test jar>
+ *   2. build/sbt "sql/test:runMain <this class>"
+ *   3. generate result:
+ *      SPARK_GENERATE_BENCHMARK_FILES=1 build/sbt "sql/test:runMain <this class>"
+ *      Results will be written to "benchmarks/DateTimeRebaseBenchmark-results.txt".
+ * }}}
+ */
+object DateTimeRebaseBenchmark extends SqlBasedBenchmark {
+  import spark.implicits._
+
+  private def genTs(cardinality: Int, start: LocalDateTime, end: LocalDateTime): DataFrame = {
+    val startSec = start.toEpochSecond(ZoneOffset.UTC)
+    val endSec = end.toEpochSecond(ZoneOffset.UTC)
+    spark.range(0, cardinality, 1, 1)
+      .select((($"id" % (endSec - startSec)) + startSec).as("seconds"))
+      .select($"seconds".cast("timestamp").as("ts"))
+  }
+
+  private def genTsAfter1582(cardinality: Int): DataFrame = {
+    val start = LocalDateTime.of(1582, 10, 15, 0, 0, 0)
+    val end = LocalDateTime.of(3000, 1, 1, 0, 0, 0)
+    genTs(cardinality, start, end)
+  }
+
+  private def genTsBefore1582(cardinality: Int): DataFrame = {
+    val start = LocalDateTime.of(10, 1, 1, 0, 0, 0)
+    val end = LocalDateTime.of(1580, 1, 1, 0, 0, 0)
+    genTs(cardinality, start, end)
+  }
+
+  private def genDate(cardinality: Int, start: LocalDate, end: LocalDate): DataFrame = {
+    val startSec = LocalDateTime.of(start, LocalTime.MIDNIGHT).toEpochSecond(ZoneOffset.UTC)
+    val endSec = LocalDateTime.of(end, LocalTime.MIDNIGHT).toEpochSecond(ZoneOffset.UTC)
+    spark.range(0, cardinality * SECONDS_PER_DAY, SECONDS_PER_DAY, 1)
+      .select((($"id" % (endSec - startSec)) + startSec).as("seconds"))
+      .select($"seconds".cast("timestamp").as("ts"))
+      .select($"ts".cast("date").as("date"))
+  }
+
+  private def genDateAfter1582(cardinality: Int): DataFrame = {
+    val start = LocalDate.of(1582, 10, 15)
+    val end = LocalDate.of(3000, 1, 1)
+    genDate(cardinality, start, end)
+  }
+
+  private def genDateBefore1582(cardinality: Int): DataFrame = {
+    val start = LocalDate.of(10, 1, 1)
+    val end = LocalDate.of(1580, 1, 1)
+    genDate(cardinality, start, end)
+  }
+
+  private def genDF(cardinality: Int, dateTime: String, after1582: Boolean): DataFrame = {
+    (dateTime, after1582) match {
+      case ("date", true) => genDateAfter1582(cardinality)
+      case ("date", false) => genDateBefore1582(cardinality)
+      case ("timestamp", true) => genTsAfter1582(cardinality)
+      case ("timestamp", false) => genTsBefore1582(cardinality)
+      case _ => throw new IllegalArgumentException(
+        s"cardinality = $cardinality dateTime = $dateTime after1582 = $after1582")
+    }
+  }
+
+  override def runBenchmarkSuite(mainArgs: Array[String]): Unit = {
+    withTempPath { path =>
+      runBenchmark("Parquet read/write") {
 
 Review comment:
   Isn't the name already scoped by the concrete benchmark?
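
   For readers following the quoted diff, the arithmetic in `genTs` can be tried without a Spark session. The sketch below is only an illustration: `GenTsSketch` and `secondsFor` are hypothetical names that do not appear in the PR, and the helper mirrors the column expression `(id % (endSec - startSec)) + startSec` using plain `java.time` calls.

```scala
import java.time.{LocalDateTime, ZoneOffset}

object GenTsSketch {
  // Hypothetical helper (not part of the PR): maps a non-negative row id to
  // epoch seconds inside [start, end), as the genTs column expression does.
  def secondsFor(id: Long, start: LocalDateTime, end: LocalDateTime): Long = {
    val startSec = start.toEpochSecond(ZoneOffset.UTC)
    val endSec = end.toEpochSecond(ZoneOffset.UTC)
    (id % (endSec - startSec)) + startSec
  }

  def main(args: Array[String]): Unit = {
    // Same bounds as genTsAfter1582 in the diff.
    val start = LocalDateTime.of(1582, 10, 15, 0, 0, 0)
    val end = LocalDateTime.of(3000, 1, 1, 0, 0, 0)
    val width = end.toEpochSecond(ZoneOffset.UTC) - start.toEpochSecond(ZoneOffset.UTC)
    // The last id wraps around to the start of the range.
    Seq(0L, 1L, width - 1, width).foreach { id =>
      val s = secondsFor(id, start, end)
      println(s"$id -> ${LocalDateTime.ofEpochSecond(s, 0, ZoneOffset.UTC)}")
    }
  }
}
```

   For ids smaller than the width of the range the modulo is a no-op, so consecutive rows are one second apart starting at `start`.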

----------------------------------------------------------------


[GitHub] [spark] MaxGekk commented on a change in pull request #28057: [SPARK-31296][SQL][TESTS] Benchmark date-time rebasing in Parquet datasource

Posted by GitBox <gi...@apache.org>.
MaxGekk commented on a change in pull request #28057: [SPARK-31296][SQL][TESTS] Benchmark date-time rebasing in Parquet datasource
URL: https://github.com/apache/spark/pull/28057#discussion_r399942411
 
 

 ##########
 File path: sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/DateTimeRebaseBenchmark.scala
 ##########
 @@ -0,0 +1,161 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.benchmark
+
+import java.time.{LocalDate, LocalDateTime, LocalTime, ZoneOffset}
+
+import org.apache.spark.benchmark.Benchmark
+import org.apache.spark.sql.DataFrame
+import org.apache.spark.sql.catalyst.util.DateTimeConstants.SECONDS_PER_DAY
+import org.apache.spark.sql.internal.SQLConf
+
+/**
+ * Synthetic benchmark for rebasing of date and timestamp in read/write.
+ * To run this benchmark:
+ * {{{
+ *   1. without sbt:
+ *      bin/spark-submit --class <this class> --jars <spark core test jar> <sql core test jar>
+ *   2. build/sbt "sql/test:runMain <this class>"
+ *   3. generate result:
+ *      SPARK_GENERATE_BENCHMARK_FILES=1 build/sbt "sql/test:runMain <this class>"
+ *      Results will be written to "benchmarks/DateTimeRebaseBenchmark-results.txt".
+ * }}}
+ */
+object DateTimeRebaseBenchmark extends SqlBasedBenchmark {
+  import spark.implicits._
+
+  private def genTs(cardinality: Int, start: LocalDateTime, end: LocalDateTime): DataFrame = {
+    val startSec = start.toEpochSecond(ZoneOffset.UTC)
+    val endSec = end.toEpochSecond(ZoneOffset.UTC)
+    spark.range(0, cardinality, 1, 1)
+      .select((($"id" % (endSec - startSec)) + startSec).as("seconds"))
+      .select($"seconds".cast("timestamp").as("ts"))
+  }
+
+  private def genTsAfter1582(cardinality: Int): DataFrame = {
+    val start = LocalDateTime.of(1582, 10, 15, 0, 0, 0)
+    val end = LocalDateTime.of(3000, 1, 1, 0, 0, 0)
+    genTs(cardinality, start, end)
+  }
+
+  private def genTsBefore1582(cardinality: Int): DataFrame = {
+    val start = LocalDateTime.of(10, 1, 1, 0, 0, 0)
+    val end = LocalDateTime.of(1580, 1, 1, 0, 0, 0)
+    genTs(cardinality, start, end)
+  }
+
+  private def genDate(cardinality: Int, start: LocalDate, end: LocalDate): DataFrame = {
+    val startSec = LocalDateTime.of(start, LocalTime.MIDNIGHT).toEpochSecond(ZoneOffset.UTC)
+    val endSec = LocalDateTime.of(end, LocalTime.MIDNIGHT).toEpochSecond(ZoneOffset.UTC)
+    spark.range(0, cardinality * SECONDS_PER_DAY, SECONDS_PER_DAY, 1)
+      .select((($"id" % (endSec - startSec)) + startSec).as("seconds"))
+      .select($"seconds".cast("timestamp").as("ts"))
+      .select($"ts".cast("date").as("date"))
+  }
+
+  private def genDateAfter1582(cardinality: Int): DataFrame = {
+    val start = LocalDate.of(1582, 10, 15)
+    val end = LocalDate.of(3000, 1, 1)
+    genDate(cardinality, start, end)
+  }
+
+  private def genDateBefore1582(cardinality: Int): DataFrame = {
+    val start = LocalDate.of(10, 1, 1)
+    val end = LocalDate.of(1580, 1, 1)
+    genDate(cardinality, start, end)
+  }
+
+  private def genDF(cardinality: Int, dateTime: String, after1582: Boolean): DataFrame = {
+    (dateTime, after1582) match {
+      case ("date", true) => genDateAfter1582(cardinality)
+      case ("date", false) => genDateBefore1582(cardinality)
+      case ("timestamp", true) => genTsAfter1582(cardinality)
+      case ("timestamp", false) => genTsBefore1582(cardinality)
+      case _ => throw new IllegalArgumentException(
+        s"cardinality = $cardinality dateTime = $dateTime after1582 = $after1582")
+    }
+  }
+
+  override def runBenchmarkSuite(mainArgs: Array[String]): Unit = {
+    withTempPath { path =>
+      runBenchmark("Parquet read/write") {
 
 Review comment:
   I am going to replace it with "Rebasing dates/timestamps in Parquet datasource".



[GitHub] [spark] AmplabJenkins commented on issue #28057: [SPARK-31296][SQL] Benchmark date-time rebasing in Parquet datasource

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #28057: [SPARK-31296][SQL] Benchmark date-time rebasing in Parquet datasource
URL: https://github.com/apache/spark/pull/28057#issuecomment-605628473
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/120552/
   Test PASSed.



[GitHub] [spark] HyukjinKwon commented on issue #28057: [WIP][SPARK-31294][SQL] Benchmark rebasing of dates/timestamps

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on issue #28057: [WIP][SPARK-31294][SQL] Benchmark rebasing of dates/timestamps 
URL: https://github.com/apache/spark/pull/28057#issuecomment-605588542
 
 
   Yes, thank you so much for the benchmarks.



[GitHub] [spark] MaxGekk commented on a change in pull request #28057: [SPARK-31296][SQL][TESTS] Benchmark date-time rebasing in Parquet datasource

Posted by GitBox <gi...@apache.org>.
MaxGekk commented on a change in pull request #28057: [SPARK-31296][SQL][TESTS] Benchmark date-time rebasing in Parquet datasource
URL: https://github.com/apache/spark/pull/28057#discussion_r399854345
 
 

 ##########
 File path: sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/DateTimeRebaseBenchmark.scala
 ##########
 @@ -0,0 +1,161 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.benchmark
+
+import java.time.{LocalDate, LocalDateTime, LocalTime, ZoneOffset}
+
+import org.apache.spark.benchmark.Benchmark
+import org.apache.spark.sql.DataFrame
+import org.apache.spark.sql.catalyst.util.DateTimeConstants.SECONDS_PER_DAY
+import org.apache.spark.sql.internal.SQLConf
+
+/**
+ * Synthetic benchmark for rebasing of date and timestamp in read/write.
+ * To run this benchmark:
+ * {{{
+ *   1. without sbt:
+ *      bin/spark-submit --class <this class> --jars <spark core test jar> <sql core test jar>
+ *   2. build/sbt "sql/test:runMain <this class>"
+ *   3. generate result:
+ *      SPARK_GENERATE_BENCHMARK_FILES=1 build/sbt "sql/test:runMain <this class>"
+ *      Results will be written to "benchmarks/DateTimeRebaseBenchmark-results.txt".
+ * }}}
+ */
+object DateTimeRebaseBenchmark extends SqlBasedBenchmark {
+  import spark.implicits._
+
+  private def genTs(cardinality: Int, start: LocalDateTime, end: LocalDateTime): DataFrame = {
+    val startSec = start.toEpochSecond(ZoneOffset.UTC)
+    val endSec = end.toEpochSecond(ZoneOffset.UTC)
+    spark.range(0, cardinality, 1, 1)
+      .select((($"id" % (endSec - startSec)) + startSec).as("seconds"))
+      .select($"seconds".cast("timestamp").as("ts"))
+  }
+
+  private def genTsAfter1582(cardinality: Int): DataFrame = {
+    val start = LocalDateTime.of(1582, 10, 15, 0, 0, 0)
+    val end = LocalDateTime.of(3000, 1, 1, 0, 0, 0)
+    genTs(cardinality, start, end)
+  }
+
+  private def genTsBefore1582(cardinality: Int): DataFrame = {
+    val start = LocalDateTime.of(10, 1, 1, 0, 0, 0)
+    val end = LocalDateTime.of(1580, 1, 1, 0, 0, 0)
+    genTs(cardinality, start, end)
+  }
+
+  private def genDate(cardinality: Int, start: LocalDate, end: LocalDate): DataFrame = {
+    val startSec = LocalDateTime.of(start, LocalTime.MIDNIGHT).toEpochSecond(ZoneOffset.UTC)
+    val endSec = LocalDateTime.of(end, LocalTime.MIDNIGHT).toEpochSecond(ZoneOffset.UTC)
+    spark.range(0, cardinality * SECONDS_PER_DAY, SECONDS_PER_DAY, 1)
+      .select((($"id" % (endSec - startSec)) + startSec).as("seconds"))
+      .select($"seconds".cast("timestamp").as("ts"))
+      .select($"ts".cast("date").as("date"))
+  }
+
+  private def genDateAfter1582(cardinality: Int): DataFrame = {
+    val start = LocalDate.of(1582, 10, 15)
+    val end = LocalDate.of(3000, 1, 1)
+    genDate(cardinality, start, end)
+  }
+
+  private def genDateBefore1582(cardinality: Int): DataFrame = {
+    val start = LocalDate.of(10, 1, 1)
+    val end = LocalDate.of(1580, 1, 1)
+    genDate(cardinality, start, end)
+  }
+
+  private def genDF(cardinality: Int, dateTime: String, after1582: Boolean): DataFrame = {
+    (dateTime, after1582) match {
+      case ("date", true) => genDateAfter1582(cardinality)
+      case ("date", false) => genDateBefore1582(cardinality)
+      case ("timestamp", true) => genTsAfter1582(cardinality)
+      case ("timestamp", false) => genTsBefore1582(cardinality)
+      case _ => throw new IllegalArgumentException(
+        s"cardinality = $cardinality dateTime = $dateTime after1582 = $after1582")
+    }
+  }
+
+  override def runBenchmarkSuite(mainArgs: Array[String]): Unit = {
+    withTempPath { path =>
+      runBenchmark("Parquet read/write") {
 
 Review comment:
   I will address these comments together with the other comments, because launching an EC2 instance and re-running the benchmark twice (for JDK 8 and 11) is a time-consuming process.



[GitHub] [spark] AmplabJenkins commented on issue #28057: [WIP][SQL] Benchmark rebasing of dates/timestamps

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #28057: [WIP][SQL] Benchmark rebasing of dates/timestamps 
URL: https://github.com/apache/spark/pull/28057#issuecomment-605497829
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/25241/
   Test PASSed.



[GitHub] [spark] MaxGekk commented on a change in pull request #28057: [SPARK-31296][SQL][TESTS] Benchmark date-time rebasing in Parquet datasource

Posted by GitBox <gi...@apache.org>.
MaxGekk commented on a change in pull request #28057: [SPARK-31296][SQL][TESTS] Benchmark date-time rebasing in Parquet datasource
URL: https://github.com/apache/spark/pull/28057#discussion_r399931369
 
 

 ##########
 File path: sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/DateTimeRebaseBenchmark.scala
 ##########
 @@ -0,0 +1,161 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.benchmark
+
+import java.time.{LocalDate, LocalDateTime, LocalTime, ZoneOffset}
+
+import org.apache.spark.benchmark.Benchmark
+import org.apache.spark.sql.DataFrame
+import org.apache.spark.sql.catalyst.util.DateTimeConstants.SECONDS_PER_DAY
+import org.apache.spark.sql.internal.SQLConf
+
+/**
+ * Synthetic benchmark for rebasing of date and timestamp in read/write.
+ * To run this benchmark:
+ * {{{
+ *   1. without sbt:
+ *      bin/spark-submit --class <this class> --jars <spark core test jar> <sql core test jar>
+ *   2. build/sbt "sql/test:runMain <this class>"
+ *   3. generate result:
+ *      SPARK_GENERATE_BENCHMARK_FILES=1 build/sbt "sql/test:runMain <this class>"
+ *      Results will be written to "benchmarks/DateTimeRebaseBenchmark-results.txt".
+ * }}}
+ */
+object DateTimeRebaseBenchmark extends SqlBasedBenchmark {
+  import spark.implicits._
+
+  private def genTs(cardinality: Int, start: LocalDateTime, end: LocalDateTime): DataFrame = {
+    val startSec = start.toEpochSecond(ZoneOffset.UTC)
+    val endSec = end.toEpochSecond(ZoneOffset.UTC)
+    spark.range(0, cardinality, 1, 1)
+      .select((($"id" % (endSec - startSec)) + startSec).as("seconds"))
+      .select($"seconds".cast("timestamp").as("ts"))
+  }
+
+  private def genTsAfter1582(cardinality: Int): DataFrame = {
+    val start = LocalDateTime.of(1582, 10, 15, 0, 0, 0)
+    val end = LocalDateTime.of(3000, 1, 1, 0, 0, 0)
+    genTs(cardinality, start, end)
+  }
+
+  private def genTsBefore1582(cardinality: Int): DataFrame = {
+    val start = LocalDateTime.of(10, 1, 1, 0, 0, 0)
+    val end = LocalDateTime.of(1580, 1, 1, 0, 0, 0)
+    genTs(cardinality, start, end)
+  }
+
+  private def genDate(cardinality: Int, start: LocalDate, end: LocalDate): DataFrame = {
+    val startSec = LocalDateTime.of(start, LocalTime.MIDNIGHT).toEpochSecond(ZoneOffset.UTC)
+    val endSec = LocalDateTime.of(end, LocalTime.MIDNIGHT).toEpochSecond(ZoneOffset.UTC)
+    spark.range(0, cardinality * SECONDS_PER_DAY, SECONDS_PER_DAY, 1)
+      .select((($"id" % (endSec - startSec)) + startSec).as("seconds"))
+      .select($"seconds".cast("timestamp").as("ts"))
+      .select($"ts".cast("date").as("date"))
+  }
+
+  private def genDateAfter1582(cardinality: Int): DataFrame = {
+    val start = LocalDate.of(1582, 10, 15)
+    val end = LocalDate.of(3000, 1, 1)
+    genDate(cardinality, start, end)
+  }
+
+  private def genDateBefore1582(cardinality: Int): DataFrame = {
+    val start = LocalDate.of(10, 1, 1)
+    val end = LocalDate.of(1580, 1, 1)
+    genDate(cardinality, start, end)
+  }
+
+  private def genDF(cardinality: Int, dateTime: String, after1582: Boolean): DataFrame = {
+    (dateTime, after1582) match {
+      case ("date", true) => genDateAfter1582(cardinality)
+      case ("date", false) => genDateBefore1582(cardinality)
+      case ("timestamp", true) => genTsAfter1582(cardinality)
+      case ("timestamp", false) => genTsBefore1582(cardinality)
+      case _ => throw new IllegalArgumentException(
+        s"cardinality = $cardinality dateTime = $dateTime after1582 = $after1582")
+    }
+  }
+
+  override def runBenchmarkSuite(mainArgs: Array[String]): Unit = {
+    withTempPath { path =>
+      runBenchmark("Parquet read/write") {
+        val rowsNum = 100000000
+        Seq("date", "timestamp").foreach { dateTime =>
+          val benchmark = new Benchmark(s"Save ${dateTime}s to parquet", rowsNum, output = output)
+          benchmark.addCase("after 1582, noop", 1) { _ =>
+            genDF(rowsNum, dateTime, after1582 = true).noop()
 
 Review comment:
   For example:
   ```
   after 1582, noop                                   9272           9272           0         10.8          92.7       1.0X
   ```
   ```
   after 1582, rebase off                            21841          21841           0          4.6         218.4       0.4X
   ```
   The `noop` benchmark shows the unavoidable overhead. If we subtract it, we get 21841 - 9272 = 12569. So the overhead of preparing input data is roughly 42% of the total (9272 / 21841). I do believe this is important info, and we should keep it in the benchmark results.
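
The subtraction described above can be written out as a small Scala sketch. It uses only the two best-time figures quoted in this comment; the object and value names (`NoopBaseline`, `netWriteMs`, `inputShare`) are illustrative assumptions and are not part of the PR.

```scala
object NoopBaseline {
  // Best times in milliseconds, copied from the two result rows quoted above.
  val noopMs: Long = 9272L        // "after 1582, noop"
  val rebaseOffMs: Long = 21841L  // "after 1582, rebase off"

  // Time left for the Parquet write once input generation is subtracted.
  val netWriteMs: Long = rebaseOffMs - noopMs

  // Fraction of the "rebase off" run that is spent preparing the input data.
  val inputShare: Double = noopMs.toDouble / rebaseOffMs

  def main(args: Array[String]): Unit = {
    println(s"net write time: $netWriteMs ms")
    println(f"input preparation share: ${inputShare * 100}%.1f%%")
  }
}
```

The same subtraction would apply to any other row of the same table, as long as the `noop` case was measured on the same input data and row count.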



[GitHub] [spark] SparkQA removed a comment on issue #28057: [WIP][SQL] Benchmark rebasing of dates/timestamps

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on issue #28057: [WIP][SQL] Benchmark rebasing of dates/timestamps 
URL: https://github.com/apache/spark/pull/28057#issuecomment-605519440
 
 
   **[Test build #120538 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/120538/testReport)** for PR 28057 at commit [`912dee4`](https://github.com/apache/spark/commit/912dee41526ac5d7ae9dd44a790a961d2f04b54f).



[GitHub] [spark] SparkQA removed a comment on issue #28057: [SPARK-31296][SQL] Benchmark date-time rebasing in Parquet datasource

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on issue #28057: [SPARK-31296][SQL] Benchmark date-time rebasing in Parquet datasource
URL: https://github.com/apache/spark/pull/28057#issuecomment-605602702
 
 
   **[Test build #120552 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/120552/testReport)** for PR 28057 at commit [`c89f2c9`](https://github.com/apache/spark/commit/c89f2c9a0dd717e4ed12101a05236a2c3bd7252a).



[GitHub] [spark] AmplabJenkins removed a comment on issue #28057: [SPARK-31296][SQL][TESTS] Benchmark date-time rebasing in Parquet datasource

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #28057: [SPARK-31296][SQL][TESTS] Benchmark date-time rebasing in Parquet datasource
URL: https://github.com/apache/spark/pull/28057#issuecomment-605973641
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/120580/
   Test PASSed.



[GitHub] [spark] AmplabJenkins removed a comment on issue #28057: [SPARK-31296][SQL][TESTS] Benchmark date-time rebasing in Parquet datasource

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #28057: [SPARK-31296][SQL][TESTS] Benchmark date-time rebasing in Parquet datasource
URL: https://github.com/apache/spark/pull/28057#issuecomment-605843862
 
 
   Merged build finished. Test PASSed.



[GitHub] [spark] AmplabJenkins commented on issue #28057: [WIP][SPARK-31296][SQL] Benchmark date-time rebasing to/from Julian calendar

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #28057: [WIP][SPARK-31296][SQL] Benchmark date-time rebasing to/from Julian calendar
URL: https://github.com/apache/spark/pull/28057#issuecomment-605602799
 
 
   Merged build finished. Test PASSed.



[GitHub] [spark] MaxGekk commented on issue #28057: [WIP][SQL] Benchmark rebasing of dates/timestamps

Posted by GitBox <gi...@apache.org>.
MaxGekk commented on issue #28057: [WIP][SQL] Benchmark rebasing of dates/timestamps 
URL: https://github.com/apache/spark/pull/28057#issuecomment-605313506
 
 
   @cloud-fan @HyukjinKwon @dongjoon-hyun Here are the intermediate results of benchmarking timestamp rebasing in Parquet.



[GitHub] [spark] AmplabJenkins removed a comment on issue #28057: [WIP][SQL] Benchmark rebasing of dates/timestamps

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #28057: [WIP][SQL] Benchmark rebasing of dates/timestamps 
URL: https://github.com/apache/spark/pull/28057#issuecomment-605541779
 
 
   Merged build finished. Test PASSed.



[GitHub] [spark] AmplabJenkins removed a comment on issue #28057: [WIP][SQL] Benchmark rebasing of dates/timestamps

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #28057: [WIP][SQL] Benchmark rebasing of dates/timestamps 
URL: https://github.com/apache/spark/pull/28057#issuecomment-605518801
 
 
   Merged build finished. Test PASSed.



[GitHub] [spark] cloud-fan commented on issue #28057: [SPARK-31296][SQL][TESTS] Benchmark date-time rebasing in Parquet datasource

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on issue #28057: [SPARK-31296][SQL][TESTS] Benchmark date-time rebasing in Parquet datasource
URL: https://github.com/apache/spark/pull/28057#issuecomment-605866414
 
 
   It's just a benchmark, so there is no need to wait for Jenkins.
   
   Thanks, merging to master/3.0!



[GitHub] [spark] SparkQA removed a comment on issue #28057: [SPARK-31296][SQL][TESTS] Benchmark date-time rebasing in Parquet datasource

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on issue #28057: [SPARK-31296][SQL][TESTS] Benchmark date-time rebasing in Parquet datasource
URL: https://github.com/apache/spark/pull/28057#issuecomment-605804944
 
 
   **[Test build #120577 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/120577/testReport)** for PR 28057 at commit [`e0aedf5`](https://github.com/apache/spark/commit/e0aedf5dbf363477ca88b6c6a7fb0038bafe8261).



[GitHub] [spark] MaxGekk commented on a change in pull request #28057: [SPARK-31296][SQL][TESTS] Benchmark date-time rebasing in Parquet datasource

Posted by GitBox <gi...@apache.org>.
MaxGekk commented on a change in pull request #28057: [SPARK-31296][SQL][TESTS] Benchmark date-time rebasing in Parquet datasource
URL: https://github.com/apache/spark/pull/28057#discussion_r399929050
 
 

 ##########
 File path: sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/DateTimeRebaseBenchmark.scala
 ##########
 @@ -0,0 +1,161 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.benchmark
+
+import java.time.{LocalDate, LocalDateTime, LocalTime, ZoneOffset}
+
+import org.apache.spark.benchmark.Benchmark
+import org.apache.spark.sql.DataFrame
+import org.apache.spark.sql.catalyst.util.DateTimeConstants.SECONDS_PER_DAY
+import org.apache.spark.sql.internal.SQLConf
+
+/**
+ * Synthetic benchmark for rebasing of date and timestamp in read/write.
+ * To run this benchmark:
+ * {{{
+ *   1. without sbt:
+ *      bin/spark-submit --class <this class> --jars <spark core test jar> <sql core test jar>
+ *   2. build/sbt "sql/test:runMain <this class>"
+ *   3. generate result:
+ *      SPARK_GENERATE_BENCHMARK_FILES=1 build/sbt "sql/test:runMain <this class>"
+ *      Results will be written to "benchmarks/DateTimeRebaseBenchmark-results.txt".
+ * }}}
+ */
+object DateTimeRebaseBenchmark extends SqlBasedBenchmark {
+  import spark.implicits._
+
+  private def genTs(cardinality: Int, start: LocalDateTime, end: LocalDateTime): DataFrame = {
+    val startSec = start.toEpochSecond(ZoneOffset.UTC)
+    val endSec = end.toEpochSecond(ZoneOffset.UTC)
+    spark.range(0, cardinality, 1, 1)
+      .select((($"id" % (endSec - startSec)) + startSec).as("seconds"))
+      .select($"seconds".cast("timestamp").as("ts"))
+  }
+
+  private def genTsAfter1582(cardinality: Int): DataFrame = {
+    val start = LocalDateTime.of(1582, 10, 15, 0, 0, 0)
+    val end = LocalDateTime.of(3000, 1, 1, 0, 0, 0)
+    genTs(cardinality, start, end)
+  }
+
+  private def genTsBefore1582(cardinality: Int): DataFrame = {
+    val start = LocalDateTime.of(10, 1, 1, 0, 0, 0)
+    val end = LocalDateTime.of(1580, 1, 1, 0, 0, 0)
+    genTs(cardinality, start, end)
+  }
+
+  private def genDate(cardinality: Int, start: LocalDate, end: LocalDate): DataFrame = {
+    val startSec = LocalDateTime.of(start, LocalTime.MIDNIGHT).toEpochSecond(ZoneOffset.UTC)
+    val endSec = LocalDateTime.of(end, LocalTime.MIDNIGHT).toEpochSecond(ZoneOffset.UTC)
+    spark.range(0, cardinality * SECONDS_PER_DAY, SECONDS_PER_DAY, 1)
+      .select((($"id" % (endSec - startSec)) + startSec).as("seconds"))
+      .select($"seconds".cast("timestamp").as("ts"))
+      .select($"ts".cast("date").as("date"))
+  }
+
+  private def genDateAfter1582(cardinality: Int): DataFrame = {
+    val start = LocalDate.of(1582, 10, 15)
+    val end = LocalDate.of(3000, 1, 1)
+    genDate(cardinality, start, end)
+  }
+
+  private def genDateBefore1582(cardinality: Int): DataFrame = {
+    val start = LocalDate.of(10, 1, 1)
+    val end = LocalDate.of(1580, 1, 1)
+    genDate(cardinality, start, end)
+  }
+
+  private def genDF(cardinality: Int, dateTime: String, after1582: Boolean): DataFrame = {
+    (dateTime, after1582) match {
+      case ("date", true) => genDateAfter1582(cardinality)
+      case ("date", false) => genDateBefore1582(cardinality)
+      case ("timestamp", true) => genTsAfter1582(cardinality)
+      case ("timestamp", false) => genTsBefore1582(cardinality)
+      case _ => throw new IllegalArgumentException(
+        s"cardinality = $cardinality dateTime = $dateTime after1582 = $after1582")
+    }
+  }
+
+  override def runBenchmarkSuite(mainArgs: Array[String]): Unit = {
+    withTempPath { path =>
+      runBenchmark("Parquet read/write") {
+        val rowsNum = 100000000
+        Seq("date", "timestamp").foreach { dateTime =>
+          val benchmark = new Benchmark(s"Save ${dateTime}s to parquet", rowsNum, output = output)
+          benchmark.addCase("after 1582, noop", 1) { _ =>
+            genDF(rowsNum, dateTime, after1582 = true).noop()
 
 Review comment:
   We have already discussed this in PRs for other benchmarks. The overhead of preparing the input dataframe is assumed to be subtracted from the other numbers.
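
For context on what "preparing the input" involves, here is a plain-Scala sketch of the per-row arithmetic used by the quoted `genTs`: each row id is wrapped into the range between the start and end epoch seconds. It deliberately leaves out Spark, the cast to timestamp, and the `noop` sink, so it illustrates the mapping only and says nothing about the measured cost. The names `GenTsSketch` and `secondsFor` are hypothetical.

```scala
import java.time.{LocalDateTime, ZoneOffset}

object GenTsSketch {
  // Same expression as in genTs: (id % (endSec - startSec)) + startSec,
  // which keeps every generated value inside [startSec, endSec).
  def secondsFor(id: Long, start: LocalDateTime, end: LocalDateTime): Long = {
    val startSec = start.toEpochSecond(ZoneOffset.UTC)
    val endSec = end.toEpochSecond(ZoneOffset.UTC)
    (id % (endSec - startSec)) + startSec
  }

  def main(args: Array[String]): Unit = {
    // Bounds taken from genTsAfter1582 in the quoted diff.
    val start = LocalDateTime.of(1582, 10, 15, 0, 0, 0)
    val end = LocalDateTime.of(3000, 1, 1, 0, 0, 0)
    // Print the first few generated instants.
    (0L until 3L).foreach { id =>
      val sec = secondsFor(id, start, end)
      println(LocalDateTime.ofEpochSecond(sec, 0, ZoneOffset.UTC))
    }
  }
}
```

Because the ids are consecutive, the generated timestamps are one second apart and start exactly at the lower bound.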
