Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2020/03/27 21:00:29 UTC
[GitHub] [spark] MaxGekk opened a new pull request #28057: [WIP][SQL]
Benchmark rebasing of dates/timestamps
MaxGekk opened a new pull request #28057: [WIP][SQL] Benchmark rebasing of dates/timestamps
URL: https://github.com/apache/spark/pull/28057
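TL;DR
I benchmarked save/load for Parquet with rebasing on vs. off:
- Saving is **~6 times slower** (after subtracting the data-generation time of the `noop` cases; ~5 times in raw wall-clock time)
- Loading w/ vectorized reader **off** is **~4 times slower**
- Loading w/ vectorized reader **on** is **~11 times slower**
Here are the results I got on my laptop: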
```
Java HotSpot(TM) 64-Bit Server VM 1.8.0_231-b11 on Mac OS X 10.15.3
Intel(R) Core(TM) i7-4850HQ CPU @ 2.30GHz
Save timestamps to parquet:               Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
after 1582, noop                                   2265           2265           0         44.2          22.6       1.0X
after 1582, rebase off                            13689          13689           0          7.3         136.9       0.2X
after 1582, rebase on                             71073          71073           0          1.4         710.7       0.0X
before 1582, noop                                  2118           2118           0         47.2          21.2       1.1X
before 1582, rebase off                           14442          14442           0          6.9         144.4       0.2X
before 1582, rebase on                            78824          78824           0          1.3         788.2       0.0X

Java HotSpot(TM) 64-Bit Server VM 1.8.0_231-b11 on Mac OS X 10.15.3
Intel(R) Core(TM) i7-4850HQ CPU @ 2.30GHz
Load timestamps from parquet:             Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
after 1582, vec off, rebase off                   12025          12326         390          8.3         120.2       1.0X
after 1582, vec off, rebase on                    51214          52395        1724          2.0         512.1       0.2X
after 1582, vec on, rebase off                     3798           3848          71         26.3          38.0       3.2X
after 1582, vec on, rebase on                     42998          43137         138          2.3         430.0       0.3X
before 1582, vec off, rebase off                  11692          11793         140          8.6         116.9       1.0X
before 1582, vec off, rebase on                   52789          52973         173          1.9         527.9       0.2X
before 1582, vec on, rebase off                    3832           3871          48         26.1          38.3       3.1X
before 1582, vec on, rebase on                    44904          44956          66          2.2         449.0       0.3X
```
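The multipliers in the TL;DR can be recomputed from the `Best Time(ms)` column of the timestamp results. A small sketch (the object and method names are hypothetical): the loading ratios come straight from each rebase-on/rebase-off pair, while the saving figure of ~6x appears once the `noop` time, which only covers data generation, is subtracted from both cases.

```scala
// Hypothetical helper: recomputes the TL;DR multipliers from the
// "Best Time(ms)" column of the timestamp results above.
object RebaseSlowdown {
  /** Raw wall-clock ratio between a rebase-on case and its rebase-off twin. */
  def ratio(rebaseOn: Double, rebaseOff: Double): Double = rebaseOn / rebaseOff

  /** Same ratio after subtracting a baseline, here the `noop` case (data generation only). */
  def netRatio(rebaseOn: Double, rebaseOff: Double, baseline: Double): Double =
    (rebaseOn - baseline) / (rebaseOff - baseline)

  val saveAfter1582: Double = netRatio(71073, 13689, 2265)  // about 6.0
  val saveBefore1582: Double = netRatio(78824, 14442, 2118) // about 6.2
  val saveAfter1582Raw: Double = ratio(71073, 13689)        // about 5.2
  val loadVecOff: Double = ratio(51214, 12025)              // about 4.3
  val loadVecOn: Double = ratio(42998, 3798)                // about 11.3

  def main(args: Array[String]): Unit =
    println(f"save: $saveAfter1582%.1fx (raw $saveAfter1582Raw%.1fx), " +
      f"load vec off: $loadVecOff%.1fx, load vec on: $loadVecOn%.1fx")
}
```

Subtracting the `noop` baseline assumes data generation costs about the same whether or not the write follows it; the raw ratios are the safer number to quote if that assumption is in doubt.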
### What changes were proposed in this pull request?
### Why are the changes needed?
### Does this PR introduce any user-facing change?
### How was this patch tested?
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
With regards,
Apache Git Services
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] AmplabJenkins commented on issue #28057:
[SPARK-31296][SQL][TESTS] Benchmark date-time rebasing in Parquet datasource
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #28057: [SPARK-31296][SQL][TESTS] Benchmark date-time rebasing in Parquet datasource
URL: https://github.com/apache/spark/pull/28057#issuecomment-605973641
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/120580/
[GitHub] [spark] AmplabJenkins removed a comment on issue #28057:
[WIP][SPARK-31296][SQL] Benchmark date-time rebasing to/from Julian calendar
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #28057: [WIP][SPARK-31296][SQL] Benchmark date-time rebasing to/from Julian calendar
URL: https://github.com/apache/spark/pull/28057#issuecomment-605602801
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/25258/
[GitHub] [spark] AmplabJenkins removed a comment on issue #28057:
[WIP][SPARK-31296][SQL] Benchmark date-time rebasing to/from Julian calendar
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #28057: [WIP][SPARK-31296][SQL] Benchmark date-time rebasing to/from Julian calendar
URL: https://github.com/apache/spark/pull/28057#issuecomment-605602799
Merged build finished. Test PASSed.
[GitHub] [spark] cloud-fan commented on a change in pull request #28057:
[SPARK-31296][SQL][TESTS] Benchmark date-time rebasing in Parquet datasource
Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #28057: [SPARK-31296][SQL][TESTS] Benchmark date-time rebasing in Parquet datasource
URL: https://github.com/apache/spark/pull/28057#discussion_r399910407
##########
File path: sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/DateTimeRebaseBenchmark.scala
##########
@@ -0,0 +1,161 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.benchmark
+
+import java.time.{LocalDate, LocalDateTime, LocalTime, ZoneOffset}
+
+import org.apache.spark.benchmark.Benchmark
+import org.apache.spark.sql.DataFrame
+import org.apache.spark.sql.catalyst.util.DateTimeConstants.SECONDS_PER_DAY
+import org.apache.spark.sql.internal.SQLConf
+
+/**
+ * Synthetic benchmark for rebasing of date and timestamp in read/write.
+ * To run this benchmark:
+ * {{{
+ *   1. without sbt:
+ *      bin/spark-submit --class <this class> --jars <spark core test jar> <sql core test jar>
+ *   2. build/sbt "sql/test:runMain <this class>"
+ *   3. generate result:
+ *      SPARK_GENERATE_BENCHMARK_FILES=1 build/sbt "sql/test:runMain <this class>"
+ *      Results will be written to "benchmarks/DateTimeRebaseBenchmark-results.txt".
+ * }}}
+ */
+object DateTimeRebaseBenchmark extends SqlBasedBenchmark {
+  import spark.implicits._
+
+  private def genTs(cardinality: Int, start: LocalDateTime, end: LocalDateTime): DataFrame = {
+    val startSec = start.toEpochSecond(ZoneOffset.UTC)
+    val endSec = end.toEpochSecond(ZoneOffset.UTC)
+    spark.range(0, cardinality, 1, 1)
+      .select((($"id" % (endSec - startSec)) + startSec).as("seconds"))
+      .select($"seconds".cast("timestamp").as("ts"))
+  }
+
+  private def genTsAfter1582(cardinality: Int): DataFrame = {
+    val start = LocalDateTime.of(1582, 10, 15, 0, 0, 0)
+    val end = LocalDateTime.of(3000, 1, 1, 0, 0, 0)
+    genTs(cardinality, start, end)
+  }
+
+  private def genTsBefore1582(cardinality: Int): DataFrame = {
+    val start = LocalDateTime.of(10, 1, 1, 0, 0, 0)
+    val end = LocalDateTime.of(1580, 1, 1, 0, 0, 0)
+    genTs(cardinality, start, end)
+  }
+
+  private def genDate(cardinality: Int, start: LocalDate, end: LocalDate): DataFrame = {
+    val startSec = LocalDateTime.of(start, LocalTime.MIDNIGHT).toEpochSecond(ZoneOffset.UTC)
+    val endSec = LocalDateTime.of(end, LocalTime.MIDNIGHT).toEpochSecond(ZoneOffset.UTC)
+    spark.range(0, cardinality * SECONDS_PER_DAY, SECONDS_PER_DAY, 1)
+      .select((($"id" % (endSec - startSec)) + startSec).as("seconds"))
+      .select($"seconds".cast("timestamp").as("ts"))
+      .select($"ts".cast("date").as("date"))
+  }
+
+  private def genDateAfter1582(cardinality: Int): DataFrame = {
+    val start = LocalDate.of(1582, 10, 15)
+    val end = LocalDate.of(3000, 1, 1)
+    genDate(cardinality, start, end)
+  }
+
+  private def genDateBefore1582(cardinality: Int): DataFrame = {
+    val start = LocalDate.of(10, 1, 1)
+    val end = LocalDate.of(1580, 1, 1)
+    genDate(cardinality, start, end)
+  }
+
+  private def genDF(cardinality: Int, dateTime: String, after1582: Boolean): DataFrame = {
+    (dateTime, after1582) match {
+      case ("date", true) => genDateAfter1582(cardinality)
+      case ("date", false) => genDateBefore1582(cardinality)
+      case ("timestamp", true) => genTsAfter1582(cardinality)
+      case ("timestamp", false) => genTsBefore1582(cardinality)
+      case _ => throw new IllegalArgumentException(
+        s"cardinality = $cardinality dateTime = $dateTime after1582 = $after1582")
+    }
+  }
+
+  override def runBenchmarkSuite(mainArgs: Array[String]): Unit = {
+    withTempPath { path =>
+      runBenchmark("Parquet read/write") {
+        val rowsNum = 100000000
+        Seq("date", "timestamp").foreach { dateTime =>
+          val benchmark = new Benchmark(s"Save ${dateTime}s to parquet", rowsNum, output = output)
+          benchmark.addCase("after 1582, noop", 1) { _ =>
+            genDF(rowsNum, dateTime, after1582 = true).noop()
Review comment:
Do you include the DataFrame generation in the benchmark numbers? I think it should be excluded.
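One way to address this is to time only the write or read and keep input generation outside the measured region. Spark's `Benchmark` offers an `addTimerCase` variant whose `Timer` can be started and stopped explicitly; the plain-Scala sketch below shows the same separation without a Spark dependency. The object, case class, and method names are hypothetical.

```scala
import java.util.concurrent.TimeUnit

// Hypothetical helper (not Spark's Benchmark API): runs `setup` first and
// reports its time separately, so only `body` counts as the measured cost.
object SetupExcludedTimer {
  final case class Result(setupMs: Long, measuredMs: Long)

  def time[A](setup: => A)(body: A => Unit): Result = {
    val t0 = System.nanoTime()
    val input = setup // stands in for generating the DataFrame
    val t1 = System.nanoTime()
    body(input)       // stands in for the save/load being benchmarked
    val t2 = System.nanoTime()
    Result(TimeUnit.NANOSECONDS.toMillis(t1 - t0), TimeUnit.NANOSECONDS.toMillis(t2 - t1))
  }

  def main(args: Array[String]): Unit = {
    val r = time { Thread.sleep(200); Vector.tabulate(1000000)(_.toLong) } { xs =>
      require(xs.sum >= 0L)
    }
    println(s"setup: ${r.setupMs} ms (excluded), measured: ${r.measuredMs} ms")
  }
}
```

With Spark's lazy DataFrames, "setup" only excludes work that is actually materialized before the timer starts (for example by writing the generated data once and reading it back); building a lazy plan by itself costs almost nothing.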
[GitHub] [spark] dongjoon-hyun commented on issue #28057: [WIP][SQL]
Benchmark rebasing of dates/timestamps
Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on issue #28057: [WIP][SQL] Benchmark rebasing of dates/timestamps
URL: https://github.com/apache/spark/pull/28057#issuecomment-605328003
Thank you for the benchmark. Ya. It's an expected drawback.
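The "expected drawback" is easier to see with a straightforward per-value conversion in mind: each date is split into year/month/day in one calendar and rebuilt in the other. Below is a minimal sketch of one direction, assuming AD dates and UTC; the names are hypothetical and this is not the code Spark uses.

```scala
import java.time.LocalDate
import java.util.{GregorianCalendar, TimeZone}

// Hypothetical sketch, not Spark's implementation: rebases a day count from
// the proleptic Gregorian calendar (java.time) to the hybrid Julian/Gregorian
// calendar of java.util.GregorianCalendar. Assumes AD dates and UTC.
object DaysRebaseSketch {
  private val MillisPerDay = 24L * 60 * 60 * 1000
  private val Utc = TimeZone.getTimeZone("UTC")

  def gregorianToHybridDays(days: Int): Int = {
    // 1. Split the day count into year/month/day using proleptic Gregorian rules.
    val date = LocalDate.ofEpochDay(days.toLong)
    // 2. Rebuild the same fields in the hybrid calendar (Julian before 1582-10-15).
    val cal = new GregorianCalendar(Utc)
    cal.clear()
    cal.set(date.getYear, date.getMonthValue - 1, date.getDayOfMonth)
    // 3. Convert the resulting instant back to days since 1970-01-01.
    Math.floorDiv(cal.getTimeInMillis, MillisPerDay).toInt
  }

  def main(args: Array[String]): Unit =
    Seq(LocalDate.of(2020, 3, 27), LocalDate.of(1000, 1, 1)).foreach { d =>
      val days = d.toEpochDay.toInt
      println(s"$d: $days -> ${gregorianToHybridDays(days)}")
    }
}
```

Dates from 1582-10-15 onward come back unchanged, while earlier dates shift (by 10 days just before the cutover), which is why the benchmark generates separate "after 1582" and "before 1582" ranges. The conversion work runs for every value in both ranges, which fits the similar slowdowns the results show for both.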
[GitHub] [spark] AmplabJenkins removed a comment on issue #28057: [WIP][SQL]
Benchmark rebasing of dates/timestamps
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #28057: [WIP][SQL] Benchmark rebasing of dates/timestamps
URL: https://github.com/apache/spark/pull/28057#issuecomment-605313805
Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #28057:
[SPARK-31296][SQL][TESTS] Benchmark date-time rebasing in Parquet datasource
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #28057: [SPARK-31296][SQL][TESTS] Benchmark date-time rebasing in Parquet datasource
URL: https://github.com/apache/spark/pull/28057#issuecomment-605821271
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/120577/
[GitHub] [spark] cloud-fan closed pull request #28057:
[SPARK-31296][SQL][TESTS] Benchmark date-time rebasing in Parquet datasource
Posted by GitBox <gi...@apache.org>.
cloud-fan closed pull request #28057: [SPARK-31296][SQL][TESTS] Benchmark date-time rebasing in Parquet datasource
URL: https://github.com/apache/spark/pull/28057
[GitHub] [spark] AmplabJenkins commented on issue #28057: [WIP][SQL]
Benchmark rebasing of dates/timestamps
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #28057: [WIP][SQL] Benchmark rebasing of dates/timestamps
URL: https://github.com/apache/spark/pull/28057#issuecomment-605313805
Merged build finished. Test PASSed.
[GitHub] [spark] SparkQA commented on issue #28057: [WIP][SQL] Benchmark
rebasing of dates/timestamps
Posted by GitBox <gi...@apache.org>.
SparkQA commented on issue #28057: [WIP][SQL] Benchmark rebasing of dates/timestamps
URL: https://github.com/apache/spark/pull/28057#issuecomment-605541682
**[Test build #120538 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/120538/testReport)** for PR 28057 at commit [`912dee4`](https://github.com/apache/spark/commit/912dee41526ac5d7ae9dd44a790a961d2f04b54f).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] [spark] AmplabJenkins removed a comment on issue #28057: [WIP][SQL]
Benchmark rebasing of dates/timestamps
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #28057: [WIP][SQL] Benchmark rebasing of dates/timestamps
URL: https://github.com/apache/spark/pull/28057#issuecomment-605541780
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/120538/
[GitHub] [spark] AmplabJenkins removed a comment on issue #28057:
[SPARK-31296][SQL][TESTS] Benchmark date-time rebasing in Parquet datasource
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #28057: [SPARK-31296][SQL][TESTS] Benchmark date-time rebasing in Parquet datasource
URL: https://github.com/apache/spark/pull/28057#issuecomment-605821264
Merged build finished. Test FAILed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #28057:
[SPARK-31296][SQL][TESTS] Benchmark date-time rebasing in Parquet datasource
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #28057: [SPARK-31296][SQL][TESTS] Benchmark date-time rebasing in Parquet datasource
URL: https://github.com/apache/spark/pull/28057#issuecomment-605805340
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/25281/
[GitHub] [spark] AmplabJenkins commented on issue #28057:
[SPARK-31296][SQL][TESTS] Benchmark date-time rebasing in Parquet datasource
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #28057: [SPARK-31296][SQL][TESTS] Benchmark date-time rebasing in Parquet datasource
URL: https://github.com/apache/spark/pull/28057#issuecomment-605843876
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/25284/
[GitHub] [spark] AmplabJenkins commented on issue #28057:
[SPARK-31296][SQL][TESTS] Benchmark date-time rebasing in Parquet datasource
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #28057: [SPARK-31296][SQL][TESTS] Benchmark date-time rebasing in Parquet datasource
URL: https://github.com/apache/spark/pull/28057#issuecomment-605973626
Merged build finished. Test PASSed.
[GitHub] [spark] dongjoon-hyun commented on a change in pull request #28057:
[SPARK-31296][SQL][TESTS] Benchmark date-time rebasing in Parquet datasource
Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on a change in pull request #28057: [SPARK-31296][SQL][TESTS] Benchmark date-time rebasing in Parquet datasource
URL: https://github.com/apache/spark/pull/28057#discussion_r399837182
##########
File path: sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/DateTimeRebaseBenchmark.scala
##########
@@ -0,0 +1,161 @@
+ runBenchmark("Parquet read/write") {
Review comment:
Could you use a more specific benchmark title? It is used in the generated files.
[GitHub] [spark] AmplabJenkins removed a comment on issue #28057:
[SPARK-31296][SQL][TESTS] Benchmark date-time rebasing in Parquet datasource
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #28057: [SPARK-31296][SQL][TESTS] Benchmark date-time rebasing in Parquet datasource
URL: https://github.com/apache/spark/pull/28057#issuecomment-605805328
Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #28057:
[SPARK-31296][SQL] Benchmark date-time rebasing in Parquet datasource
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #28057: [SPARK-31296][SQL] Benchmark date-time rebasing in Parquet datasource
URL: https://github.com/apache/spark/pull/28057#issuecomment-605628472
Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #28057: [WIP][SQL]
Benchmark rebasing of dates/timestamps
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #28057: [WIP][SQL] Benchmark rebasing of dates/timestamps
URL: https://github.com/apache/spark/pull/28057#issuecomment-605518801
Merged build finished. Test PASSed.
----------------------------------------------------------------
[GitHub] [spark] cloud-fan commented on a change in pull request #28057:
[SPARK-31296][SQL][TESTS] Benchmark date-time rebasing in Parquet datasource
Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #28057: [SPARK-31296][SQL][TESTS] Benchmark date-time rebasing in Parquet datasource
URL: https://github.com/apache/spark/pull/28057#discussion_r399910178
##########
File path: sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/DateTimeRebaseBenchmark.scala
##########
@@ -0,0 +1,161 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.benchmark
+
+import java.time.{LocalDate, LocalDateTime, LocalTime, ZoneOffset}
+
+import org.apache.spark.benchmark.Benchmark
+import org.apache.spark.sql.DataFrame
+import org.apache.spark.sql.catalyst.util.DateTimeConstants.SECONDS_PER_DAY
+import org.apache.spark.sql.internal.SQLConf
+
+/**
+ * Synthetic benchmark for rebasing of date and timestamp in read/write.
+ * To run this benchmark:
+ * {{{
+ * 1. without sbt:
+ * bin/spark-submit --class <this class> --jars <spark core test jar> <sql core test jar>
+ * 2. build/sbt "sql/test:runMain <this class>"
+ * 3. generate result:
+ * SPARK_GENERATE_BENCHMARK_FILES=1 build/sbt "sql/test:runMain <this class>"
+ * Results will be written to "benchmarks/DateTimeRebaseBenchmark-results.txt".
+ * }}}
+ */
+object DateTimeRebaseBenchmark extends SqlBasedBenchmark {
+ import spark.implicits._
+
+ private def genTs(cardinality: Int, start: LocalDateTime, end: LocalDateTime): DataFrame = {
+ val startSec = start.toEpochSecond(ZoneOffset.UTC)
+ val endSec = end.toEpochSecond(ZoneOffset.UTC)
+ spark.range(0, cardinality, 1, 1)
+ .select((($"id" % (endSec - startSec)) + startSec).as("seconds"))
+ .select($"seconds".cast("timestamp").as("ts"))
+ }
+
+ private def genTsAfter1582(cardinality: Int): DataFrame = {
+ val start = LocalDateTime.of(1582, 10, 15, 0, 0, 0)
+ val end = LocalDateTime.of(3000, 1, 1, 0, 0, 0)
+ genTs(cardinality, start, end)
+ }
+
+ private def genTsBefore1582(cardinality: Int): DataFrame = {
+ val start = LocalDateTime.of(10, 1, 1, 0, 0, 0)
+ val end = LocalDateTime.of(1580, 1, 1, 0, 0, 0)
+ genTs(cardinality, start, end)
+ }
+
+ private def genDate(cardinality: Int, start: LocalDate, end: LocalDate): DataFrame = {
+ val startSec = LocalDateTime.of(start, LocalTime.MIDNIGHT).toEpochSecond(ZoneOffset.UTC)
+ val endSec = LocalDateTime.of(end, LocalTime.MIDNIGHT).toEpochSecond(ZoneOffset.UTC)
+ spark.range(0, cardinality * SECONDS_PER_DAY, SECONDS_PER_DAY, 1)
+ .select((($"id" % (endSec - startSec)) + startSec).as("seconds"))
+ .select($"seconds".cast("timestamp").as("ts"))
+ .select($"ts".cast("date").as("date"))
+ }
+
+ private def genDateAfter1582(cardinality: Int): DataFrame = {
+ val start = LocalDate.of(1582, 10, 15)
+ val end = LocalDate.of(3000, 1, 1)
+ genDate(cardinality, start, end)
+ }
+
+ private def genDateBefore1582(cardinality: Int): DataFrame = {
+ val start = LocalDate.of(10, 1, 1)
+ val end = LocalDate.of(1580, 1, 1)
+ genDate(cardinality, start, end)
+ }
+
+ private def genDF(cardinality: Int, dateTime: String, after1582: Boolean): DataFrame = {
+ (dateTime, after1582) match {
+ case ("date", true) => genDateAfter1582(cardinality)
+ case ("date", false) => genDateBefore1582(cardinality)
+ case ("timestamp", true) => genTsAfter1582(cardinality)
+ case ("timestamp", false) => genTsBefore1582(cardinality)
+ case _ => throw new IllegalArgumentException(
+ s"cardinality = $cardinality dateTime = $dateTime after1582 = $after1582")
+ }
+ }
+
+ override def runBenchmarkSuite(mainArgs: Array[String]): Unit = {
+ withTempPath { path =>
+ runBenchmark("Parquet read/write") {
Review comment:
+1 to make the title mention second rebase.
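The generators in the diff above rely on a simple modulo mapping: row `id` becomes `(id % (endSec - startSec)) + startSec` seconds since the epoch. In `genTs` the ids step by 1, so while `cardinality` is smaller than the range, the timestamps are just the first `cardinality` seconds after `start`. In `genDate` the ids step by one day, so it produces one date per day and wraps back to `start` after `end - start` days. Below is a plain-Scala sketch of that arithmetic that does not need Spark. The object `DataGenSketch` and its methods are hypothetical. The timestamp-to-date step assumes a UTC session time zone, whereas Spark's cast uses the session time zone.

```scala
import java.time.{LocalDate, LocalDateTime, LocalTime, ZoneOffset}

// Hypothetical helper mirroring the value mapping of genTs/genDate without Spark.
object DataGenSketch {
  val SecondsPerDay: Long = 24L * 60 * 60

  // Same arithmetic as genTs: ids 0 until n, step 1, wrapped into [startSec, endSec).
  def tsSeconds(n: Int, start: LocalDateTime, end: LocalDateTime): Seq[Long] = {
    val startSec = start.toEpochSecond(ZoneOffset.UTC)
    val endSec = end.toEpochSecond(ZoneOffset.UTC)
    (0L until n.toLong).map(id => id % (endSec - startSec) + startSec)
  }

  // Same arithmetic as genDate: ids step by one day, then the seconds are
  // truncated to a date (assumption: UTC instead of the session time zone).
  def dates(n: Int, start: LocalDate, end: LocalDate): Seq[LocalDate] = {
    val startSec = LocalDateTime.of(start, LocalTime.MIDNIGHT).toEpochSecond(ZoneOffset.UTC)
    val endSec = LocalDateTime.of(end, LocalTime.MIDNIGHT).toEpochSecond(ZoneOffset.UTC)
    (0L until n.toLong * SecondsPerDay by SecondsPerDay).map { id =>
      val sec = id % (endSec - startSec) + startSec
      // floorDiv keeps pre-1970 (negative) seconds on the correct day
      LocalDate.ofEpochDay(Math.floorDiv(sec, SecondsPerDay))
    }
  }
}
```

For example, `DataGenSketch.dates(3, LocalDate.of(2000, 1, 1), LocalDate.of(2000, 1, 3))` covers only a two-day range, so the third row wraps back to 2000-01-01.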
----------------------------------------------------------------
[GitHub] [spark] AmplabJenkins removed a comment on issue #28057: [WIP][SQL]
Benchmark rebasing of dates/timestamps
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #28057: [WIP][SQL] Benchmark rebasing of dates/timestamps
URL: https://github.com/apache/spark/pull/28057#issuecomment-605313816
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/25221/
----------------------------------------------------------------
[GitHub] [spark] AmplabJenkins commented on issue #28057:
[SPARK-31296][SQL][TESTS] Benchmark date-time rebasing in Parquet datasource
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #28057: [SPARK-31296][SQL][TESTS] Benchmark date-time rebasing in Parquet datasource
URL: https://github.com/apache/spark/pull/28057#issuecomment-605805340
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/25281/
----------------------------------------------------------------
[GitHub] [spark] AmplabJenkins commented on issue #28057:
[WIP][SPARK-31296][SQL] Benchmark date-time rebasing to/from Julian calendar
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #28057: [WIP][SPARK-31296][SQL] Benchmark date-time rebasing to/from Julian calendar
URL: https://github.com/apache/spark/pull/28057#issuecomment-605602801
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/25258/
----------------------------------------------------------------
[GitHub] [spark] AmplabJenkins commented on issue #28057: [WIP][SQL]
Benchmark rebasing of dates/timestamps
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #28057: [WIP][SQL] Benchmark rebasing of dates/timestamps
URL: https://github.com/apache/spark/pull/28057#issuecomment-605373021
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/120515/
----------------------------------------------------------------
[GitHub] [spark] AmplabJenkins commented on issue #28057: [WIP][SQL]
Benchmark rebasing of dates/timestamps
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #28057: [WIP][SQL] Benchmark rebasing of dates/timestamps
URL: https://github.com/apache/spark/pull/28057#issuecomment-605518802
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/25244/
----------------------------------------------------------------
[GitHub] [spark] AmplabJenkins commented on issue #28057: [WIP][SQL]
Benchmark rebasing of dates/timestamps
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #28057: [WIP][SQL] Benchmark rebasing of dates/timestamps
URL: https://github.com/apache/spark/pull/28057#issuecomment-605541780
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/120538/
----------------------------------------------------------------
[GitHub] [spark] SparkQA commented on issue #28057: [WIP][SQL] Benchmark
rebasing of dates/timestamps
Posted by GitBox <gi...@apache.org>.
SparkQA commented on issue #28057: [WIP][SQL] Benchmark rebasing of dates/timestamps
URL: https://github.com/apache/spark/pull/28057#issuecomment-605313370
**[Test build #120515 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/120515/testReport)** for PR 28057 at commit [`e217139`](https://github.com/apache/spark/commit/e217139ee63c7755c6630354847e8c5b3d447aa7).
----------------------------------------------------------------
[GitHub] [spark] SparkQA commented on issue #28057:
[SPARK-31296][SQL][TESTS] Benchmark date-time rebasing in Parquet datasource
Posted by GitBox <gi...@apache.org>.
SparkQA commented on issue #28057: [SPARK-31296][SQL][TESTS] Benchmark date-time rebasing in Parquet datasource
URL: https://github.com/apache/spark/pull/28057#issuecomment-605846660
**[Test build #120580 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/120580/testReport)** for PR 28057 at commit [`e0aedf5`](https://github.com/apache/spark/commit/e0aedf5dbf363477ca88b6c6a7fb0038bafe8261).
----------------------------------------------------------------
[GitHub] [spark] SparkQA removed a comment on issue #28057: [WIP][SQL]
Benchmark rebasing of dates/timestamps
Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on issue #28057: [WIP][SQL] Benchmark rebasing of dates/timestamps
URL: https://github.com/apache/spark/pull/28057#issuecomment-605497653
**[Test build #120535 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/120535/testReport)** for PR 28057 at commit [`8703579`](https://github.com/apache/spark/commit/87035792f6a5c04eb357feed0b73bf75c274b4f9).
----------------------------------------------------------------
[GitHub] [spark] SparkQA commented on issue #28057:
[SPARK-31296][SQL][TESTS] Benchmark date-time rebasing in Parquet datasource
Posted by GitBox <gi...@apache.org>.
SparkQA commented on issue #28057: [SPARK-31296][SQL][TESTS] Benchmark date-time rebasing in Parquet datasource
URL: https://github.com/apache/spark/pull/28057#issuecomment-605972808
**[Test build #120580 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/120580/testReport)** for PR 28057 at commit [`e0aedf5`](https://github.com/apache/spark/commit/e0aedf5dbf363477ca88b6c6a7fb0038bafe8261).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
----------------------------------------------------------------
[GitHub] [spark] SparkQA commented on issue #28057: [WIP][SQL] Benchmark
rebasing of dates/timestamps
Posted by GitBox <gi...@apache.org>.
SparkQA commented on issue #28057: [WIP][SQL] Benchmark rebasing of dates/timestamps
URL: https://github.com/apache/spark/pull/28057#issuecomment-605372787
**[Test build #120515 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/120515/testReport)** for PR 28057 at commit [`e217139`](https://github.com/apache/spark/commit/e217139ee63c7755c6630354847e8c5b3d447aa7).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
----------------------------------------------------------------
[GitHub] [spark] AmplabJenkins removed a comment on issue #28057:
[SPARK-31296][SQL][TESTS] Benchmark date-time rebasing in Parquet datasource
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #28057: [SPARK-31296][SQL][TESTS] Benchmark date-time rebasing in Parquet datasource
URL: https://github.com/apache/spark/pull/28057#issuecomment-605973626
Merged build finished. Test PASSed.
----------------------------------------------------------------
[GitHub] [spark] AmplabJenkins removed a comment on issue #28057:
[SPARK-31296][SQL][TESTS] Benchmark date-time rebasing in Parquet datasource
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #28057: [SPARK-31296][SQL][TESTS] Benchmark date-time rebasing in Parquet datasource
URL: https://github.com/apache/spark/pull/28057#issuecomment-605843876
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/25284/
----------------------------------------------------------------
[GitHub] [spark] MaxGekk commented on issue #28057: [WIP][SQL] Benchmark
rebasing of dates/timestamps
Posted by GitBox <gi...@apache.org>.
MaxGekk commented on issue #28057: [WIP][SQL] Benchmark rebasing of dates/timestamps
URL: https://github.com/apache/spark/pull/28057#issuecomment-605401675
> It's an expected drawback.
Parquet and Avro perform rebasing only if a SQL config is enabled (the config is off by default), whereas ORC always performs rebasing. I would expect some slowdown in ORC too.
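Rebasing, as discussed here, means keeping the same local year-month-day while converting a stored day count between the Proleptic Gregorian calendar (`java.time`) and the legacy hybrid Julian + Gregorian calendar (the one behind `java.util.GregorianCalendar`, with a cutover at 1582-10-15). The sketch below shows that idea for dates in plain Scala. It assumes AD dates and UTC. The object and method names are hypothetical, and this is not Spark's implementation.

```scala
import java.time.LocalDate
import java.util.{Calendar, GregorianCalendar, TimeZone}

// Hypothetical sketch: rebase "days since 1970-01-01" between calendars,
// preserving the local year/month/day label.
object RebaseSketch {
  private val MillisPerDay = 24L * 60 * 60 * 1000
  private val Utc = TimeZone.getTimeZone("UTC")

  // Proleptic Gregorian day count -> hybrid-calendar day count with the same y/m/d.
  def gregorianToJulianDays(days: Int): Int = {
    val ld = LocalDate.ofEpochDay(days.toLong)
    val cal = new GregorianCalendar(Utc)
    cal.clear() // midnight, era AD, time zone kept
    // Assumes year >= 1; GregorianCalendar months are 0-based.
    cal.set(ld.getYear, ld.getMonthValue - 1, ld.getDayOfMonth)
    Math.toIntExact(Math.floorDiv(cal.getTimeInMillis, MillisPerDay))
  }

  // Hybrid-calendar day count -> Proleptic Gregorian day count with the same y/m/d.
  def julianToGregorianDays(days: Int): Int = {
    val cal = new GregorianCalendar(Utc)
    cal.setTimeInMillis(days * MillisPerDay)
    val yearOfEra = cal.get(Calendar.YEAR)
    val year = if (cal.get(Calendar.ERA) == GregorianCalendar.BC) 1 - yearOfEra else yearOfEra
    val ld = LocalDate.of(year, cal.get(Calendar.MONTH) + 1, cal.get(Calendar.DAY_OF_MONTH))
    Math.toIntExact(ld.toEpochDay)
  }
}
```

On or after the 1582-10-15 cutover, both calendars agree and the day count is unchanged. For earlier dates the count shifts: 1582-10-04 is day -141438 in the Proleptic Gregorian calendar but day -141428 in the hybrid calendar, where it is immediately followed by 1582-10-15.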
----------------------------------------------------------------
[GitHub] [spark] AmplabJenkins removed a comment on issue #28057: [WIP][SQL]
Benchmark rebasing of dates/timestamps
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #28057: [WIP][SQL] Benchmark rebasing of dates/timestamps
URL: https://github.com/apache/spark/pull/28057#issuecomment-605518802
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/25244/
----------------------------------------------------------------
[GitHub] [spark] AmplabJenkins removed a comment on issue #28057:
[SPARK-31296][SQL] Benchmark date-time rebasing in Parquet datasource
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #28057: [SPARK-31296][SQL] Benchmark date-time rebasing in Parquet datasource
URL: https://github.com/apache/spark/pull/28057#issuecomment-605628473
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/120552/
----------------------------------------------------------------
[GitHub] [spark] SparkQA removed a comment on issue #28057:
[SPARK-31296][SQL][TESTS] Benchmark date-time rebasing in Parquet datasource
Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on issue #28057: [SPARK-31296][SQL][TESTS] Benchmark date-time rebasing in Parquet datasource
URL: https://github.com/apache/spark/pull/28057#issuecomment-605846660
**[Test build #120580 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/120580/testReport)** for PR 28057 at commit [`e0aedf5`](https://github.com/apache/spark/commit/e0aedf5dbf363477ca88b6c6a7fb0038bafe8261).
----------------------------------------------------------------
[GitHub] [spark] AmplabJenkins commented on issue #28057: [WIP][SQL]
Benchmark rebasing of dates/timestamps
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #28057: [WIP][SQL] Benchmark rebasing of dates/timestamps
URL: https://github.com/apache/spark/pull/28057#issuecomment-605525869
Merged build finished. Test PASSed.
----------------------------------------------------------------
[GitHub] [spark] SparkQA commented on issue #28057: [WIP][SPARK-31296][SQL]
Benchmark date-time rebasing to/from Julian calendar
Posted by GitBox <gi...@apache.org>.
SparkQA commented on issue #28057: [WIP][SPARK-31296][SQL] Benchmark date-time rebasing to/from Julian calendar
URL: https://github.com/apache/spark/pull/28057#issuecomment-605602702
**[Test build #120552 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/120552/testReport)** for PR 28057 at commit [`c89f2c9`](https://github.com/apache/spark/commit/c89f2c9a0dd717e4ed12101a05236a2c3bd7252a).
----------------------------------------------------------------
[GitHub] [spark] AmplabJenkins commented on issue #28057:
[SPARK-31296][SQL][TESTS] Benchmark date-time rebasing in Parquet datasource
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #28057: [SPARK-31296][SQL][TESTS] Benchmark date-time rebasing in Parquet datasource
URL: https://github.com/apache/spark/pull/28057#issuecomment-605805328
Merged build finished. Test PASSed.
----------------------------------------------------------------
[GitHub] [spark] AmplabJenkins commented on issue #28057: [WIP][SQL]
Benchmark rebasing of dates/timestamps
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #28057: [WIP][SQL] Benchmark rebasing of dates/timestamps
URL: https://github.com/apache/spark/pull/28057#issuecomment-605541779
Merged build finished. Test PASSed.
----------------------------------------------------------------
[GitHub] [spark] SparkQA commented on issue #28057: [WIP][SQL] Benchmark
rebasing of dates/timestamps
Posted by GitBox <gi...@apache.org>.
SparkQA commented on issue #28057: [WIP][SQL] Benchmark rebasing of dates/timestamps
URL: https://github.com/apache/spark/pull/28057#issuecomment-605525725
**[Test build #120535 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/120535/testReport)** for PR 28057 at commit [`8703579`](https://github.com/apache/spark/commit/87035792f6a5c04eb357feed0b73bf75c274b4f9).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
----------------------------------------------------------------
[GitHub] [spark] AmplabJenkins commented on issue #28057: [WIP][SQL]
Benchmark rebasing of dates/timestamps
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #28057: [WIP][SQL] Benchmark rebasing of dates/timestamps
URL: https://github.com/apache/spark/pull/28057#issuecomment-605373019
Merged build finished. Test PASSed.
----------------------------------------------------------------
[GitHub] [spark] MaxGekk edited a comment on issue #28057: [WIP][SQL]
Benchmark rebasing of dates/timestamps
Posted by GitBox <gi...@apache.org>.
MaxGekk edited a comment on issue #28057: [WIP][SQL] Benchmark rebasing of dates/timestamps
URL: https://github.com/apache/spark/pull/28057#issuecomment-605313506
@cloud-fan @HyukjinKwon @dongjoon-hyun Here are intermediate results of benchmarking timestamp rebasing in Parquet.
----------------------------------------------------------------
[GitHub] [spark] AmplabJenkins removed a comment on issue #28057: [WIP][SQL]
Benchmark rebasing of dates/timestamps
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #28057: [WIP][SQL] Benchmark rebasing of dates/timestamps
URL: https://github.com/apache/spark/pull/28057#issuecomment-605373019
Merged build finished. Test PASSed.
----------------------------------------------------------------
[GitHub] [spark] AmplabJenkins commented on issue #28057:
[SPARK-31296][SQL][TESTS] Benchmark date-time rebasing in Parquet datasource
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #28057: [SPARK-31296][SQL][TESTS] Benchmark date-time rebasing in Parquet datasource
URL: https://github.com/apache/spark/pull/28057#issuecomment-605821271
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/120577/
Test FAILed.
[GitHub] [spark] SparkQA commented on issue #28057: [WIP][SQL] Benchmark
rebasing of dates/timestamps
Posted by GitBox <gi...@apache.org>.
SparkQA commented on issue #28057: [WIP][SQL] Benchmark rebasing of dates/timestamps
URL: https://github.com/apache/spark/pull/28057#issuecomment-605497653
**[Test build #120535 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/120535/testReport)** for PR 28057 at commit [`8703579`](https://github.com/apache/spark/commit/87035792f6a5c04eb357feed0b73bf75c274b4f9).
[GitHub] [spark] AmplabJenkins commented on issue #28057: [WIP][SQL]
Benchmark rebasing of dates/timestamps
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #28057: [WIP][SQL] Benchmark rebasing of dates/timestamps
URL: https://github.com/apache/spark/pull/28057#issuecomment-605497821
Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #28057: [WIP][SQL]
Benchmark rebasing of dates/timestamps
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #28057: [WIP][SQL] Benchmark rebasing of dates/timestamps
URL: https://github.com/apache/spark/pull/28057#issuecomment-605525872
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/120535/
Test PASSed.
[GitHub] [spark] SparkQA commented on issue #28057: [SPARK-31296][SQL]
Benchmark date-time rebasing in Parquet datasource
Posted by GitBox <gi...@apache.org>.
SparkQA commented on issue #28057: [SPARK-31296][SQL] Benchmark date-time rebasing in Parquet datasource
URL: https://github.com/apache/spark/pull/28057#issuecomment-605628279
**[Test build #120552 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/120552/testReport)** for PR 28057 at commit [`c89f2c9`](https://github.com/apache/spark/commit/c89f2c9a0dd717e4ed12101a05236a2c3bd7252a).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] [spark] AmplabJenkins commented on issue #28057: [WIP][SQL]
Benchmark rebasing of dates/timestamps
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #28057: [WIP][SQL] Benchmark rebasing of dates/timestamps
URL: https://github.com/apache/spark/pull/28057#issuecomment-605313816
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/25221/
Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #28057: [WIP][SQL]
Benchmark rebasing of dates/timestamps
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #28057: [WIP][SQL] Benchmark rebasing of dates/timestamps
URL: https://github.com/apache/spark/pull/28057#issuecomment-605497829
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/25241/
Test PASSed.
[GitHub] [spark] SparkQA removed a comment on issue #28057: [WIP][SQL]
Benchmark rebasing of dates/timestamps
Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on issue #28057: [WIP][SQL] Benchmark rebasing of dates/timestamps
URL: https://github.com/apache/spark/pull/28057#issuecomment-605313370
**[Test build #120515 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/120515/testReport)** for PR 28057 at commit [`e217139`](https://github.com/apache/spark/commit/e217139ee63c7755c6630354847e8c5b3d447aa7).
[GitHub] [spark] AmplabJenkins commented on issue #28057:
[SPARK-31296][SQL][TESTS] Benchmark date-time rebasing in Parquet datasource
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #28057: [SPARK-31296][SQL][TESTS] Benchmark date-time rebasing in Parquet datasource
URL: https://github.com/apache/spark/pull/28057#issuecomment-605843862
Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #28057: [WIP][SQL]
Benchmark rebasing of dates/timestamps
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #28057: [WIP][SQL] Benchmark rebasing of dates/timestamps
URL: https://github.com/apache/spark/pull/28057#issuecomment-605525872
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/120535/
Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #28057:
[SPARK-31296][SQL][TESTS] Benchmark date-time rebasing in Parquet datasource
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #28057: [SPARK-31296][SQL][TESTS] Benchmark date-time rebasing in Parquet datasource
URL: https://github.com/apache/spark/pull/28057#issuecomment-605821264
Merged build finished. Test FAILed.
[GitHub] [spark] SparkQA commented on issue #28057:
[SPARK-31296][SQL][TESTS] Benchmark date-time rebasing in Parquet datasource
Posted by GitBox <gi...@apache.org>.
SparkQA commented on issue #28057: [SPARK-31296][SQL][TESTS] Benchmark date-time rebasing in Parquet datasource
URL: https://github.com/apache/spark/pull/28057#issuecomment-605804944
**[Test build #120577 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/120577/testReport)** for PR 28057 at commit [`e0aedf5`](https://github.com/apache/spark/commit/e0aedf5dbf363477ca88b6c6a7fb0038bafe8261).
[GitHub] [spark] AmplabJenkins commented on issue #28057: [SPARK-31296][SQL]
Benchmark date-time rebasing in Parquet datasource
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #28057: [SPARK-31296][SQL] Benchmark date-time rebasing in Parquet datasource
URL: https://github.com/apache/spark/pull/28057#issuecomment-605628472
Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #28057: [WIP][SQL]
Benchmark rebasing of dates/timestamps
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #28057: [WIP][SQL] Benchmark rebasing of dates/timestamps
URL: https://github.com/apache/spark/pull/28057#issuecomment-605525869
Merged build finished. Test PASSed.
[GitHub] [spark] SparkQA commented on issue #28057:
[SPARK-31296][SQL][TESTS] Benchmark date-time rebasing in Parquet datasource
Posted by GitBox <gi...@apache.org>.
SparkQA commented on issue #28057: [SPARK-31296][SQL][TESTS] Benchmark date-time rebasing in Parquet datasource
URL: https://github.com/apache/spark/pull/28057#issuecomment-605821163
**[Test build #120577 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/120577/testReport)** for PR 28057 at commit [`e0aedf5`](https://github.com/apache/spark/commit/e0aedf5dbf363477ca88b6c6a7fb0038bafe8261).
* This patch **fails due to an unknown error code, -9**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] [spark] AmplabJenkins removed a comment on issue #28057: [WIP][SQL]
Benchmark rebasing of dates/timestamps
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #28057: [WIP][SQL] Benchmark rebasing of dates/timestamps
URL: https://github.com/apache/spark/pull/28057#issuecomment-605373021
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/120515/
Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #28057: [WIP][SQL]
Benchmark rebasing of dates/timestamps
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #28057: [WIP][SQL] Benchmark rebasing of dates/timestamps
URL: https://github.com/apache/spark/pull/28057#issuecomment-605497821
Merged build finished. Test PASSed.
[GitHub] [spark] SparkQA commented on issue #28057: [WIP][SQL] Benchmark
rebasing of dates/timestamps
Posted by GitBox <gi...@apache.org>.
SparkQA commented on issue #28057: [WIP][SQL] Benchmark rebasing of dates/timestamps
URL: https://github.com/apache/spark/pull/28057#issuecomment-605519440
**[Test build #120538 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/120538/testReport)** for PR 28057 at commit [`912dee4`](https://github.com/apache/spark/commit/912dee41526ac5d7ae9dd44a790a961d2f04b54f).
[GitHub] [spark] HyukjinKwon commented on issue #28057:
[SPARK-31296][SQL][TESTS] Benchmark date-time rebasing in Parquet datasource
Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on issue #28057: [SPARK-31296][SQL][TESTS] Benchmark date-time rebasing in Parquet datasource
URL: https://github.com/apache/spark/pull/28057#issuecomment-605843102
retest this please
[GitHub] [spark] MaxGekk commented on a change in pull request #28057:
[SPARK-31296][SQL][TESTS] Benchmark date-time rebasing in Parquet datasource
Posted by GitBox <gi...@apache.org>.
MaxGekk commented on a change in pull request #28057: [SPARK-31296][SQL][TESTS] Benchmark date-time rebasing in Parquet datasource
URL: https://github.com/apache/spark/pull/28057#discussion_r399853876
##########
File path: sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/DateTimeRebaseBenchmark.scala
##########
@@ -0,0 +1,161 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.benchmark
+
+import java.time.{LocalDate, LocalDateTime, LocalTime, ZoneOffset}
+
+import org.apache.spark.benchmark.Benchmark
+import org.apache.spark.sql.DataFrame
+import org.apache.spark.sql.catalyst.util.DateTimeConstants.SECONDS_PER_DAY
+import org.apache.spark.sql.internal.SQLConf
+
+/**
+ * Synthetic benchmark for rebasing of date and timestamp in read/write.
+ * To run this benchmark:
+ * {{{
+ * 1. without sbt:
+ * bin/spark-submit --class <this class> --jars <spark core test jar> <sql core test jar>
+ * 2. build/sbt "sql/test:runMain <this class>"
+ * 3. generate result:
+ * SPARK_GENERATE_BENCHMARK_FILES=1 build/sbt "sql/test:runMain <this class>"
+ * Results will be written to "benchmarks/DateTimeRebaseBenchmark-results.txt".
+ * }}}
+ */
+object DateTimeRebaseBenchmark extends SqlBasedBenchmark {
+ import spark.implicits._
+
+ private def genTs(cardinality: Int, start: LocalDateTime, end: LocalDateTime): DataFrame = {
+ val startSec = start.toEpochSecond(ZoneOffset.UTC)
+ val endSec = end.toEpochSecond(ZoneOffset.UTC)
+ spark.range(0, cardinality, 1, 1)
+ .select((($"id" % (endSec - startSec)) + startSec).as("seconds"))
+ .select($"seconds".cast("timestamp").as("ts"))
+ }
+
+ private def genTsAfter1582(cardinality: Int): DataFrame = {
+ val start = LocalDateTime.of(1582, 10, 15, 0, 0, 0)
+ val end = LocalDateTime.of(3000, 1, 1, 0, 0, 0)
+ genTs(cardinality, start, end)
+ }
+
+ private def genTsBefore1582(cardinality: Int): DataFrame = {
+ val start = LocalDateTime.of(10, 1, 1, 0, 0, 0)
+ val end = LocalDateTime.of(1580, 1, 1, 0, 0, 0)
+ genTs(cardinality, start, end)
+ }
+
+ private def genDate(cardinality: Int, start: LocalDate, end: LocalDate): DataFrame = {
+ val startSec = LocalDateTime.of(start, LocalTime.MIDNIGHT).toEpochSecond(ZoneOffset.UTC)
+ val endSec = LocalDateTime.of(end, LocalTime.MIDNIGHT).toEpochSecond(ZoneOffset.UTC)
+ spark.range(0, cardinality * SECONDS_PER_DAY, SECONDS_PER_DAY, 1)
+ .select((($"id" % (endSec - startSec)) + startSec).as("seconds"))
+ .select($"seconds".cast("timestamp").as("ts"))
+ .select($"ts".cast("date").as("date"))
+ }
+
+ private def genDateAfter1582(cardinality: Int): DataFrame = {
+ val start = LocalDate.of(1582, 10, 15)
+ val end = LocalDate.of(3000, 1, 1)
+ genDate(cardinality, start, end)
+ }
+
+ private def genDateBefore1582(cardinality: Int): DataFrame = {
+ val start = LocalDate.of(10, 1, 1)
+ val end = LocalDate.of(1580, 1, 1)
+ genDate(cardinality, start, end)
+ }
+
+ private def genDF(cardinality: Int, dateTime: String, after1582: Boolean): DataFrame = {
+ (dateTime, after1582) match {
+ case ("date", true) => genDateAfter1582(cardinality)
+ case ("date", false) => genDateBefore1582(cardinality)
+ case ("timestamp", true) => genTsAfter1582(cardinality)
+ case ("timestamp", false) => genTsBefore1582(cardinality)
+ case _ => throw new IllegalArgumentException(
+ s"cardinality = $cardinality dateTime = $dateTime after1582 = $after1582")
+ }
+ }
+
+ override def runBenchmarkSuite(mainArgs: Array[String]): Unit = {
+ withTempPath { path =>
+ runBenchmark("Parquet read/write") {
Review comment:
Isn't the name scoped by the concrete benchmark?
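The `genTs` and `genDate` helpers quoted in the hunk above spread consecutive row ids over a fixed interval with modular arithmetic. The plain-Scala sketch below uses no Spark; `GenSketch` and its methods are hypothetical names. It reproduces only that arithmetic so its properties are easy to check:

- every value falls in `[startSec, endSec)`;
- because `genDate` steps ids by `SECONDS_PER_DAY` from a midnight start, every generated value is a multiple of 86,400 seconds, i.e. midnight UTC.

```scala
import java.time.{LocalDate, LocalDateTime, LocalTime, ZoneOffset}

object GenSketch {
  // Mirrors DateTimeConstants.SECONDS_PER_DAY used in the diff
  val SecondsPerDay: Long = 86400L

  // genTs arithmetic: row id i becomes (i % span) + startSec
  def tsSeconds(cardinality: Int, start: LocalDateTime, end: LocalDateTime): Seq[Long] = {
    val startSec = start.toEpochSecond(ZoneOffset.UTC)
    val endSec = end.toEpochSecond(ZoneOffset.UTC)
    (0L until cardinality.toLong).map(id => (id % (endSec - startSec)) + startSec)
  }

  // genDate arithmetic: ids advance by one day, so every value stays on a UTC midnight
  def dateSeconds(cardinality: Int, start: LocalDate, end: LocalDate): Seq[Long] = {
    val startSec = LocalDateTime.of(start, LocalTime.MIDNIGHT).toEpochSecond(ZoneOffset.UTC)
    val endSec = LocalDateTime.of(end, LocalTime.MIDNIGHT).toEpochSecond(ZoneOffset.UTC)
    (0L until cardinality * SecondsPerDay by SecondsPerDay)
      .map(id => (id % (endSec - startSec)) + startSec)
  }

  def main(args: Array[String]): Unit = {
    // Print the first few "before 1582" dates as UTC date-times
    val dates = dateSeconds(3, LocalDate.of(10, 1, 1), LocalDate.of(1580, 1, 1))
    dates.foreach(s => println(LocalDateTime.ofEpochSecond(s, 0, ZoneOffset.UTC)))
  }
}
```

With cardinality far below the interval length, as in the benchmark, the modulo never wraps. The values are then simply consecutive seconds (or days) starting at `start`.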
[GitHub] [spark] MaxGekk commented on a change in pull request #28057:
[SPARK-31296][SQL][TESTS] Benchmark date-time rebasing in Parquet datasource
Posted by GitBox <gi...@apache.org>.
MaxGekk commented on a change in pull request #28057: [SPARK-31296][SQL][TESTS] Benchmark date-time rebasing in Parquet datasource
URL: https://github.com/apache/spark/pull/28057#discussion_r399942411
##########
File path: sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/DateTimeRebaseBenchmark.scala
##########
@@ -0,0 +1,161 @@
+ runBenchmark("Parquet read/write") {
Review comment:
I am going to replace it with "Rebasing dates/timestamps in Parquet datasource"
[GitHub] [spark] AmplabJenkins commented on issue #28057: [SPARK-31296][SQL]
Benchmark date-time rebasing in Parquet datasource
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #28057: [SPARK-31296][SQL] Benchmark date-time rebasing in Parquet datasource
URL: https://github.com/apache/spark/pull/28057#issuecomment-605628473
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/120552/
Test PASSed.
[GitHub] [spark] HyukjinKwon commented on issue #28057:
[WIP][SPARK-31294][SQL] Benchmark rebasing of dates/timestamps
Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on issue #28057: [WIP][SPARK-31294][SQL] Benchmark rebasing of dates/timestamps
URL: https://github.com/apache/spark/pull/28057#issuecomment-605588542
Yes, thank you so much for the benchmarks.
[GitHub] [spark] MaxGekk commented on a change in pull request #28057:
[SPARK-31296][SQL][TESTS] Benchmark date-time rebasing in Parquet datasource
Posted by GitBox <gi...@apache.org>.
MaxGekk commented on a change in pull request #28057: [SPARK-31296][SQL][TESTS] Benchmark date-time rebasing in Parquet datasource
URL: https://github.com/apache/spark/pull/28057#discussion_r399854345
##########
File path: sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/DateTimeRebaseBenchmark.scala
##########
@@ -0,0 +1,161 @@
+ val start = LocalDateTime.of(1582, 10, 15, 0, 0, 0)
+ val end = LocalDateTime.of(3000, 1, 1, 0, 0, 0)
+ genTs(cardinality, start, end)
+ }
+
+ private def genTsBefore1582(cardinality: Int): DataFrame = {
+ val start = LocalDateTime.of(10, 1, 1, 0, 0, 0)
+ val end = LocalDateTime.of(1580, 1, 1, 0, 0, 0)
+ genTs(cardinality, start, end)
+ }
+
+ private def genDate(cardinality: Int, start: LocalDate, end: LocalDate): DataFrame = {
+ val startSec = LocalDateTime.of(start, LocalTime.MIDNIGHT).toEpochSecond(ZoneOffset.UTC)
+ val endSec = LocalDateTime.of(end, LocalTime.MIDNIGHT).toEpochSecond(ZoneOffset.UTC)
+ spark.range(0, cardinality * SECONDS_PER_DAY, SECONDS_PER_DAY, 1)
+ .select((($"id" % (endSec - startSec)) + startSec).as("seconds"))
+ .select($"seconds".cast("timestamp").as("ts"))
+ .select($"ts".cast("date").as("date"))
+ }
+
+ private def genDateAfter1582(cardinality: Int): DataFrame = {
+ val start = LocalDate.of(1582, 10, 15)
+ val end = LocalDate.of(3000, 1, 1)
+ genDate(cardinality, start, end)
+ }
+
+ private def genDateBefore1582(cardinality: Int): DataFrame = {
+ val start = LocalDate.of(10, 1, 1)
+ val end = LocalDate.of(1580, 1, 1)
+ genDate(cardinality, start, end)
+ }
+
+ private def genDF(cardinality: Int, dateTime: String, after1582: Boolean): DataFrame = {
+ (dateTime, after1582) match {
+ case ("date", true) => genDateAfter1582(cardinality)
+ case ("date", false) => genDateBefore1582(cardinality)
+ case ("timestamp", true) => genTsAfter1582(cardinality)
+ case ("timestamp", false) => genTsBefore1582(cardinality)
+ case _ => throw new IllegalArgumentException(
+ s"cardinality = $cardinality dateTime = $dateTime after1582 = $after1582")
+ }
+ }
+
+ override def runBenchmarkSuite(mainArgs: Array[String]): Unit = {
+ withTempPath { path =>
+ runBenchmark("Parquet read/write") {
Review comment:
I will address these comments together with the others, because launching an EC2 instance and re-running the benchmark twice (for JDK 8 and 11) is a time-consuming process.
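The generators above split their input at 1582-10-15, the first day of the Gregorian calendar. Assuming the rebase being benchmarked is between the proleptic Gregorian calendar of `java.time` and the hybrid Julian + Gregorian calendar of `java.util.GregorianCalendar` (as the later PR title "rebasing to/from Julian calendar" suggests), a local date on or after the cutover maps to the same day number in both calendars, while an earlier one needs a shift. A minimal sketch of that shift, not Spark's implementation; `CalendarBoundarySketch` and its methods are hypothetical names:

```scala
import java.time.LocalDate
import java.util.{GregorianCalendar, TimeZone}

// Hypothetical helper: compares day numbers of the same y-m-d label in two calendars.
object CalendarBoundarySketch {
  private val MillisPerDay = 24L * 60 * 60 * 1000

  // Days since 1970-01-01 for a label read in the hybrid Julian + Gregorian calendar
  // of java.util.GregorianCalendar (default cutover 1582-10-15), in UTC.
  def hybridEpochDay(year: Int, month: Int, day: Int): Long = {
    val cal = new GregorianCalendar(TimeZone.getTimeZone("UTC"))
    cal.clear()
    cal.set(year, month - 1, day) // Calendar months are 0-based
    Math.floorDiv(cal.getTimeInMillis, MillisPerDay)
  }

  // Days since 1970-01-01 for the same label in the proleptic Gregorian calendar.
  def prolepticEpochDay(year: Int, month: Int, day: Int): Long =
    LocalDate.of(year, month, day).toEpochDay

  // Shift (in days) that a rebase between the two calendars has to apply to this label.
  def diffDays(year: Int, month: Int, day: Int): Long =
    hybridEpochDay(year, month, day) - prolepticEpochDay(year, month, day)

  def main(args: Array[String]): Unit = {
    println(diffDays(1582, 10, 15)) // on the cutover: both calendars agree
    println(diffDays(1582, 10, 4))  // last Julian day: labels are 10 days apart
  }
}
```

Running `main` should print 0 and then 10: Julian 1582-10-04 was immediately followed by Gregorian 1582-10-15, so the label 1582-10-04 lands 10 days apart in the two calendars, while any label from the cutover on needs no shift.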
----------------------------------------------------------------
[GitHub] [spark] AmplabJenkins commented on issue #28057: [WIP][SQL]
Benchmark rebasing of dates/timestamps
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #28057: [WIP][SQL] Benchmark rebasing of dates/timestamps
URL: https://github.com/apache/spark/pull/28057#issuecomment-605497829
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/25241/
----------------------------------------------------------------
[GitHub] [spark] MaxGekk commented on a change in pull request #28057:
[SPARK-31296][SQL][TESTS] Benchmark date-time rebasing in Parquet datasource
Posted by GitBox <gi...@apache.org>.
MaxGekk commented on a change in pull request #28057: [SPARK-31296][SQL][TESTS] Benchmark date-time rebasing in Parquet datasource
URL: https://github.com/apache/spark/pull/28057#discussion_r399931369
##########
File path: sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/DateTimeRebaseBenchmark.scala
##########
@@ -0,0 +1,161 @@
+ override def runBenchmarkSuite(mainArgs: Array[String]): Unit = {
+ withTempPath { path =>
+ runBenchmark("Parquet read/write") {
+ val rowsNum = 100000000
+ Seq("date", "timestamp").foreach { dateTime =>
+ val benchmark = new Benchmark(s"Save ${dateTime}s to parquet", rowsNum, output = output)
+ benchmark.addCase("after 1582, noop", 1) { _ =>
+ genDF(rowsNum, dateTime, after1582 = true).noop()
Review comment:
For example:
```
after 1582, noop 9272 9272 0 10.8 92.7 1.0X
```
```
after 1582, rebase off 21841 21841 0 4.6 218.4 0.4X
```
The `noop` benchmark shows the unavoidable overhead of preparing the input data. If we subtract it, we get 21841 - 9272 = 12569. So, the overhead of preparing input data is roughly 42% (9272 / 21841) of the total. I do believe this is important information, and we should keep it in the benchmark results.
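The arithmetic in this comment can be reproduced with a small sketch over the two quoted best times; `NoopOverheadSketch` is a hypothetical name. Treating the noop case as pure input preparation, it makes up about 42% of the "rebase off" time:

```scala
// Hypothetical helper reproducing the subtraction above.
object NoopOverheadSketch {
  // Best times (ms) quoted in the comment for one "Save ... to parquet" run.
  val noopMs = 9272L
  val rebaseOffMs = 21841L

  // Time left for the Parquet write once input generation (the noop case) is removed.
  val netWriteMs: Long = rebaseOffMs - noopMs

  // Fraction of the "rebase off" case spent only on generating the input.
  val inputShare: Double = noopMs.toDouble / rebaseOffMs

  def main(args: Array[String]): Unit = {
    println(netWriteMs)                    // 12569
    println(math.round(inputShare * 100))  // 42 (percent)
  }
}
```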
----------------------------------------------------------------
[GitHub] [spark] SparkQA removed a comment on issue #28057: [WIP][SQL]
Benchmark rebasing of dates/timestamps
Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on issue #28057: [WIP][SQL] Benchmark rebasing of dates/timestamps
URL: https://github.com/apache/spark/pull/28057#issuecomment-605519440
**[Test build #120538 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/120538/testReport)** for PR 28057 at commit [`912dee4`](https://github.com/apache/spark/commit/912dee41526ac5d7ae9dd44a790a961d2f04b54f).
----------------------------------------------------------------
[GitHub] [spark] SparkQA removed a comment on issue #28057:
[SPARK-31296][SQL] Benchmark date-time rebasing in Parquet datasource
Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on issue #28057: [SPARK-31296][SQL] Benchmark date-time rebasing in Parquet datasource
URL: https://github.com/apache/spark/pull/28057#issuecomment-605602702
**[Test build #120552 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/120552/testReport)** for PR 28057 at commit [`c89f2c9`](https://github.com/apache/spark/commit/c89f2c9a0dd717e4ed12101a05236a2c3bd7252a).
----------------------------------------------------------------
[GitHub] [spark] AmplabJenkins removed a comment on issue #28057:
[SPARK-31296][SQL][TESTS] Benchmark date-time rebasing in Parquet datasource
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #28057: [SPARK-31296][SQL][TESTS] Benchmark date-time rebasing in Parquet datasource
URL: https://github.com/apache/spark/pull/28057#issuecomment-605973641
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/120580/
----------------------------------------------------------------
[GitHub] [spark] AmplabJenkins removed a comment on issue #28057:
[SPARK-31296][SQL][TESTS] Benchmark date-time rebasing in Parquet datasource
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #28057: [SPARK-31296][SQL][TESTS] Benchmark date-time rebasing in Parquet datasource
URL: https://github.com/apache/spark/pull/28057#issuecomment-605843862
Merged build finished. Test PASSed.
----------------------------------------------------------------
[GitHub] [spark] AmplabJenkins commented on issue #28057:
[WIP][SPARK-31296][SQL] Benchmark date-time rebasing to/from Julian calendar
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #28057: [WIP][SPARK-31296][SQL] Benchmark date-time rebasing to/from Julian calendar
URL: https://github.com/apache/spark/pull/28057#issuecomment-605602799
Merged build finished. Test PASSed.
----------------------------------------------------------------
[GitHub] [spark] MaxGekk commented on issue #28057: [WIP][SQL] Benchmark
rebasing of dates/timestamps
Posted by GitBox <gi...@apache.org>.
MaxGekk commented on issue #28057: [WIP][SQL] Benchmark rebasing of dates/timestamps
URL: https://github.com/apache/spark/pull/28057#issuecomment-605313506
@cloud-fan @HyukjinKwon @dongjoon-hyun Here are intermediate results of benchmarking timestamp rebasing in Parquet.
----------------------------------------------------------------
[GitHub] [spark] AmplabJenkins removed a comment on issue #28057: [WIP][SQL]
Benchmark rebasing of dates/timestamps
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #28057: [WIP][SQL] Benchmark rebasing of dates/timestamps
URL: https://github.com/apache/spark/pull/28057#issuecomment-605541779
Merged build finished. Test PASSed.
----------------------------------------------------------------
[GitHub] [spark] AmplabJenkins removed a comment on issue #28057: [WIP][SQL]
Benchmark rebasing of dates/timestamps
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #28057: [WIP][SQL] Benchmark rebasing of dates/timestamps
URL: https://github.com/apache/spark/pull/28057#issuecomment-605518801
Merged build finished. Test PASSed.
----------------------------------------------------------------
[GitHub] [spark] cloud-fan commented on issue #28057:
[SPARK-31296][SQL][TESTS] Benchmark date-time rebasing in Parquet datasource
Posted by GitBox <gi...@apache.org>.
cloud-fan commented on issue #28057: [SPARK-31296][SQL][TESTS] Benchmark date-time rebasing in Parquet datasource
URL: https://github.com/apache/spark/pull/28057#issuecomment-605866414
It's just a benchmark, so there is no need to wait for Jenkins.
Thanks, merging to master/3.0!
----------------------------------------------------------------
[GitHub] [spark] SparkQA removed a comment on issue #28057:
[SPARK-31296][SQL][TESTS] Benchmark date-time rebasing in Parquet datasource
Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on issue #28057: [SPARK-31296][SQL][TESTS] Benchmark date-time rebasing in Parquet datasource
URL: https://github.com/apache/spark/pull/28057#issuecomment-605804944
**[Test build #120577 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/120577/testReport)** for PR 28057 at commit [`e0aedf5`](https://github.com/apache/spark/commit/e0aedf5dbf363477ca88b6c6a7fb0038bafe8261).
----------------------------------------------------------------
[GitHub] [spark] MaxGekk commented on a change in pull request #28057:
[SPARK-31296][SQL][TESTS] Benchmark date-time rebasing in Parquet datasource
Posted by GitBox <gi...@apache.org>.
MaxGekk commented on a change in pull request #28057: [SPARK-31296][SQL][TESTS] Benchmark date-time rebasing in Parquet datasource
URL: https://github.com/apache/spark/pull/28057#discussion_r399929050
##########
File path: sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/DateTimeRebaseBenchmark.scala
##########
@@ -0,0 +1,161 @@
+ override def runBenchmarkSuite(mainArgs: Array[String]): Unit = {
+ withTempPath { path =>
+ runBenchmark("Parquet read/write") {
+ val rowsNum = 100000000
+ Seq("date", "timestamp").foreach { dateTime =>
+ val benchmark = new Benchmark(s"Save ${dateTime}s to parquet", rowsNum, output = output)
+ benchmark.addCase("after 1582, noop", 1) { _ =>
+ genDF(rowsNum, dateTime, after1582 = true).noop()
Review comment:
We have already discussed this in PRs for other benchmarks. The overhead of preparing the input DataFrame is meant to be subtracted from the other numbers.
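To illustrate that convention, here is a sketch with hypothetical names that subtracts the noop baseline before comparing cases. The best times are copied from the "Save timestamps to parquet" table in the PR description (after 1582 rows). The raw rebase on/off ratio is about 5.2; after subtracting the noop time it is about 6.0, in line with the "~6 times slower" summary in the description:

```scala
// Hypothetical helper: compare benchmark cases after removing input-preparation time.
object NetSlowdownSketch {
  // Best times (ms) for "Save timestamps to parquet", after 1582, from the PR description.
  val noopMs = 2265L
  val rebaseOffMs = 13689L
  val rebaseOnMs = 71073L

  // Remove the input-preparation overhead measured by the noop case.
  def net(totalMs: Long, baselineMs: Long): Long = totalMs - baselineMs

  // Ratio of net times: how much slower "rebase on" is than "rebase off".
  def netSlowdown(offMs: Long, onMs: Long, baselineMs: Long): Double =
    net(onMs, baselineMs).toDouble / net(offMs, baselineMs)

  def main(args: Array[String]): Unit = {
    val raw = rebaseOnMs.toDouble / rebaseOffMs
    val adjusted = netSlowdown(rebaseOffMs, rebaseOnMs, noopMs)
    println(s"raw ratio: $raw, net ratio: $adjusted")
  }
}
```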
----------------------------------------------------------------