Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2020/03/29 19:34:02 UTC
[GitHub] [spark] MaxGekk opened a new pull request #28067:
[WIP][SPARK-31297][SQL] Speed up dates rebasing
MaxGekk opened a new pull request #28067: [WIP][SPARK-31297][SQL] Speed up dates rebasing
URL: https://github.com/apache/spark/pull/28067
### What changes were proposed in this pull request?
Here are the benchmark results:
```
Java HotSpot(TM) 64-Bit Server VM 1.8.0_231-b11 on Mac OS X 10.15.3
Intel(R) Core(TM) i7-4850HQ CPU @ 2.30GHz
Save dates to parquet:              Best Time(ms)  Avg Time(ms)  Stdev(ms)  Rate(M/s)  Per Row(ns)  Relative
------------------------------------------------------------------------------------------------------------
after 1582, noop                             7703          7703          0       13.0         77.0      1.0X
before 1582, noop                            7679          7679          0       13.0         76.8      1.0X
after 1582, rebase off                      17668         17668          0        5.7        176.7      0.4X
after 1582, rebase on                       18527         18527          0        5.4        185.3      0.4X
before 1582, rebase off                     17526         17526          0        5.7        175.3      0.4X
before 1582, rebase on                      18189         18189          0        5.5        181.9      0.4X

Java HotSpot(TM) 64-Bit Server VM 1.8.0_231-b11 on Mac OS X 10.15.3
Intel(R) Core(TM) i7-4850HQ CPU @ 2.30GHz
Load dates from parquet:            Best Time(ms)  Avg Time(ms)  Stdev(ms)  Rate(M/s)  Per Row(ns)  Relative
------------------------------------------------------------------------------------------------------------
after 1582, vec off, rebase off             10582         10694        192        9.5        105.8      1.0X
after 1582, vec off, rebase on              11611         11620         10        8.6        116.1      0.9X
after 1582, vec on, rebase off               2982          3010         38       33.5         29.8      3.5X
after 1582, vec on, rebase on                4448          4538         82       22.5         44.5      2.4X
before 1582, vec off, rebase off            10559         10614         71        9.5        105.6      1.0X
before 1582, vec off, rebase on             11487         11572         74        8.7        114.9      0.9X
before 1582, vec on, rebase off              2894          2951         83       34.6         28.9      3.7X
before 1582, vec on, rebase on               4505          4614        102       22.2         45.1      2.3X
```
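For readers comparing rows, the derived columns in the tables above can be reproduced from `Best Time(ms)`. The sketch below is only an illustration: the row count of 100 million is an assumption inferred from the first row (7703 ms and 77.0 ns per row), it is not stated in this message, and the object and method names are hypothetical.

```scala
object BenchmarkColumnsSketch {
  // Assumption: 100 million rows per benchmark case, inferred from the
  // first row of the table (7703 ms best time, 77.0 ns per row).
  val AssumedRows: Long = 100000000L

  // Nanoseconds spent per row: total time in ns divided by the row count.
  def perRowNs(bestMs: Double, rows: Long = AssumedRows): Double =
    bestMs * 1e6 / rows

  // Throughput in millions of rows per second.
  def rateMPerSec(bestMs: Double, rows: Long = AssumedRows): Double =
    rows / (bestMs / 1000.0) / 1e6

  // Speed relative to the baseline case (the first row of each table).
  def relative(baselineMs: Double, bestMs: Double): Double =
    baselineMs / bestMs
}
```

Under that assumption, `relative(10582, 2982)` is roughly 3.5, which matches the `3.5X` reported for the vectorized reader without rebasing.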
### Why are the changes needed?
### Does this PR introduce any user-facing change?
### How was this patch tested?
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
With regards,
Apache Git Services
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] [spark] AmplabJenkins commented on issue #28067:
[WIP][SPARK-31297][SQL] Speed up dates rebasing
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #28067: [WIP][SPARK-31297][SQL] Speed up dates rebasing
URL: https://github.com/apache/spark/pull/28067#issuecomment-606077922
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/25301/
Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #28067: [SPARK-31297][SQL]
Speed up dates rebasing
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #28067: [SPARK-31297][SQL] Speed up dates rebasing
URL: https://github.com/apache/spark/pull/28067#issuecomment-606216202
Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #28067: [SPARK-31297][SQL]
Speed up dates rebasing
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #28067: [SPARK-31297][SQL] Speed up dates rebasing
URL: https://github.com/apache/spark/pull/28067#issuecomment-606245253
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/120601/
Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #28067:
[WIP][SPARK-31297][SQL] Speed up dates rebasing
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #28067: [WIP][SPARK-31297][SQL] Speed up dates rebasing
URL: https://github.com/apache/spark/pull/28067#issuecomment-605688884
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/25264/
Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #28067: [SPARK-31297][SQL]
Speed up dates rebasing
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #28067: [SPARK-31297][SQL] Speed up dates rebasing
URL: https://github.com/apache/spark/pull/28067#issuecomment-606216212
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/120597/
Test PASSed.
[GitHub] [spark] SparkQA removed a comment on issue #28067:
[SPARK-31297][SQL] Speed up dates rebasing
Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on issue #28067: [SPARK-31297][SQL] Speed up dates rebasing
URL: https://github.com/apache/spark/pull/28067#issuecomment-606077284
**[Test build #120597 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/120597/testReport)** for PR 28067 at commit [`db5badb`](https://github.com/apache/spark/commit/db5badb9c9731068e5ce23ed780ddd1612dc7cef).
[GitHub] [spark] SparkQA commented on issue #28067: [WIP][SPARK-31297][SQL]
Speed up dates rebasing
Posted by GitBox <gi...@apache.org>.
SparkQA commented on issue #28067: [WIP][SPARK-31297][SQL] Speed up dates rebasing
URL: https://github.com/apache/spark/pull/28067#issuecomment-606077284
**[Test build #120597 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/120597/testReport)** for PR 28067 at commit [`db5badb`](https://github.com/apache/spark/commit/db5badb9c9731068e5ce23ed780ddd1612dc7cef).
[GitHub] [spark] SparkQA commented on issue #28067: [WIP][SPARK-31297][SQL]
Speed up dates rebasing
Posted by GitBox <gi...@apache.org>.
SparkQA commented on issue #28067: [WIP][SPARK-31297][SQL] Speed up dates rebasing
URL: https://github.com/apache/spark/pull/28067#issuecomment-605727581
**[Test build #120559 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/120559/testReport)** for PR 28067 at commit [`65f222e`](https://github.com/apache/spark/commit/65f222e03396e43f5629ac4a53853617980ea9a0).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] [spark] SparkQA commented on issue #28067: [WIP][SPARK-31297][SQL]
Speed up dates rebasing
Posted by GitBox <gi...@apache.org>.
SparkQA commented on issue #28067: [WIP][SPARK-31297][SQL] Speed up dates rebasing
URL: https://github.com/apache/spark/pull/28067#issuecomment-606052412
**[Test build #120585 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/120585/testReport)** for PR 28067 at commit [`fd88c56`](https://github.com/apache/spark/commit/fd88c5692bbd34fe55066ecfb893ba4e533aa4d1).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] [spark] AmplabJenkins removed a comment on issue #28067:
[WIP][SPARK-31297][SQL] Speed up dates rebasing
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #28067: [WIP][SPARK-31297][SQL] Speed up dates rebasing
URL: https://github.com/apache/spark/pull/28067#issuecomment-605922963
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/25288/
Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #28067:
[WIP][SPARK-31297][SQL] Speed up dates rebasing
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #28067: [WIP][SPARK-31297][SQL] Speed up dates rebasing
URL: https://github.com/apache/spark/pull/28067#issuecomment-605693366
Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #28067:
[WIP][SPARK-31297][SQL] Speed up dates rebasing
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #28067: [WIP][SPARK-31297][SQL] Speed up dates rebasing
URL: https://github.com/apache/spark/pull/28067#issuecomment-605922955
Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #28067:
[WIP][SPARK-31297][SQL] Speed up dates rebasing
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #28067: [WIP][SPARK-31297][SQL] Speed up dates rebasing
URL: https://github.com/apache/spark/pull/28067#issuecomment-605688881
Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #28067:
[SPARK-31297][SQL] Speed up dates rebasing
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #28067: [SPARK-31297][SQL] Speed up dates rebasing
URL: https://github.com/apache/spark/pull/28067#issuecomment-606245253
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/120601/
Test PASSed.
[GitHub] [spark] MaxGekk commented on issue #28067: [WIP][SPARK-31297][SQL]
Speed up dates rebasing
Posted by GitBox <gi...@apache.org>.
MaxGekk commented on issue #28067: [WIP][SPARK-31297][SQL] Speed up dates rebasing
URL: https://github.com/apache/spark/pull/28067#issuecomment-605807798
> can you briefly explain your idea to optimize it?
@cloud-fan The difference in days between the Proleptic Gregorian calendar and the hybrid calendar (Julian + Gregorian) changes only rarely. If you look at the JIRA ticket [SPARK-31297](https://issues.apache.org/jira/browse/SPARK-31297?focusedCommentId=17070457&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17070457), you can see that it changed 14 times in the interval `1001-01-01`-`2030-01-01`. The idea is to build an array of the days on which the diff changed and, for a given date, to find the interval to which the date belongs. The diff associated with that interval is then added to the date.
> and what's the benchmark numbers before your optimization?
The benchmark has not been merged yet; it is waiting for your approval. You can find the numbers in https://github.com/apache/spark/pull/28057
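To make the idea concrete, here is a minimal sketch of how such a table of switch days could be derived. It is not taken from the PR. It assumes that `java.util.GregorianCalendar` with its default cutover models the hybrid calendar and that `java.time.LocalDate` models the Proleptic Gregorian one; the object and method names (`SwitchDaysSketch`, `hybridDays`, `diffAt`, `switchPoints`) are hypothetical.

```scala
import java.time.LocalDate
import java.util.{GregorianCalendar, TimeZone}

object SwitchDaysSketch {
  private val MillisPerDay = 86400000L

  // Days since 1970-01-01 of the local date year-month-day interpreted in
  // the hybrid (Julian + Gregorian) calendar of java.util.GregorianCalendar.
  def hybridDays(year: Int, month: Int, day: Int): Int = {
    val cal = new GregorianCalendar(TimeZone.getTimeZone("UTC"))
    cal.clear()
    cal.set(year, month - 1, day) // Calendar months are 0-based
    Math.floorDiv(cal.getTimeInMillis, MillisPerDay).toInt
  }

  // The value to add to hybrid days to obtain Proleptic Gregorian days
  // for the same local date.
  def diffAt(date: LocalDate): Int =
    date.toEpochDay.toInt -
      hybridDays(date.getYear, date.getMonthValue, date.getDayOfMonth)

  // Scans day by day over [from, until) and records (hybrid day, new diff)
  // each time the diff differs from the one of the previous day.
  def switchPoints(from: LocalDate, until: LocalDate): Vector[(Int, Int)] = {
    val points = Vector.newBuilder[(Int, Int)]
    var date = from
    var prev = diffAt(from)
    while (date.isBefore(until)) {
      val diff = diffAt(date)
      if (diff != prev) {
        points += ((hybridDays(date.getYear, date.getMonthValue, date.getDayOfMonth), diff))
        prev = diff
      }
      date = date.plusDays(1)
    }
    points.result()
  }
}
```

For example, `SwitchDaysSketch.switchPoints(LocalDate.of(1499, 1, 1), LocalDate.of(1501, 1, 1))` finds a single switch at the Julian date 1500-03-01, where the diff moves from -9 to -10. A one-off scan like this is slow, but its output is a short table, so the per-row cost at read or write time is only the lookup.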
[GitHub] [spark] AmplabJenkins commented on issue #28067:
[WIP][SPARK-31297][SQL] Speed up dates rebasing
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #28067: [WIP][SPARK-31297][SQL] Speed up dates rebasing
URL: https://github.com/apache/spark/pull/28067#issuecomment-605922963
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/25288/
Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #28067:
[SPARK-31297][SQL] Speed up dates rebasing
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #28067: [SPARK-31297][SQL] Speed up dates rebasing
URL: https://github.com/apache/spark/pull/28067#issuecomment-606103462
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/25305/
Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #28067:
[WIP][SPARK-31297][SQL] Speed up dates rebasing
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #28067: [WIP][SPARK-31297][SQL] Speed up dates rebasing
URL: https://github.com/apache/spark/pull/28067#issuecomment-606077901
Merged build finished. Test PASSed.
[GitHub] [spark] SparkQA commented on issue #28067: [WIP][SPARK-31297][SQL]
Speed up dates rebasing
Posted by GitBox <gi...@apache.org>.
SparkQA commented on issue #28067: [WIP][SPARK-31297][SQL] Speed up dates rebasing
URL: https://github.com/apache/spark/pull/28067#issuecomment-605724376
**[Test build #120558 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/120558/testReport)** for PR 28067 at commit [`3aa88bc`](https://github.com/apache/spark/commit/3aa88bca1fb0ccb10124ffa8f78c428a6ced752b).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] [spark] SparkQA commented on issue #28067: [SPARK-31297][SQL] Speed
up dates rebasing
Posted by GitBox <gi...@apache.org>.
SparkQA commented on issue #28067: [SPARK-31297][SQL] Speed up dates rebasing
URL: https://github.com/apache/spark/pull/28067#issuecomment-606215002
**[Test build #120597 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/120597/testReport)** for PR 28067 at commit [`db5badb`](https://github.com/apache/spark/commit/db5badb9c9731068e5ce23ed780ddd1612dc7cef).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] [spark] SparkQA commented on issue #28067: [SPARK-31297][SQL] Speed
up dates rebasing
Posted by GitBox <gi...@apache.org>.
SparkQA commented on issue #28067: [SPARK-31297][SQL] Speed up dates rebasing
URL: https://github.com/apache/spark/pull/28067#issuecomment-606102869
**[Test build #120601 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/120601/testReport)** for PR 28067 at commit [`b8fa18e`](https://github.com/apache/spark/commit/b8fa18ee1968fb8b9aca84daa67f2419c16dca95).
[GitHub] [spark] AmplabJenkins commented on issue #28067:
[WIP][SPARK-31297][SQL] Speed up dates rebasing
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #28067: [WIP][SPARK-31297][SQL] Speed up dates rebasing
URL: https://github.com/apache/spark/pull/28067#issuecomment-605693370
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/25265/
Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #28067:
[WIP][SPARK-31297][SQL] Speed up dates rebasing
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #28067: [WIP][SPARK-31297][SQL] Speed up dates rebasing
URL: https://github.com/apache/spark/pull/28067#issuecomment-605693370
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/25265/
Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #28067:
[WIP][SPARK-31297][SQL] Speed up dates rebasing
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #28067: [WIP][SPARK-31297][SQL] Speed up dates rebasing
URL: https://github.com/apache/spark/pull/28067#issuecomment-605727912
Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #28067:
[WIP][SPARK-31297][SQL] Speed up dates rebasing
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #28067: [WIP][SPARK-31297][SQL] Speed up dates rebasing
URL: https://github.com/apache/spark/pull/28067#issuecomment-605693366
Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #28067:
[WIP][SPARK-31297][SQL] Speed up dates rebasing
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #28067: [WIP][SPARK-31297][SQL] Speed up dates rebasing
URL: https://github.com/apache/spark/pull/28067#issuecomment-606053582
Merged build finished. Test PASSed.
[GitHub] [spark] cloud-fan commented on a change in pull request #28067:
[WIP][SPARK-31297][SQL] Speed up dates rebasing
Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #28067: [WIP][SPARK-31297][SQL] Speed up dates rebasing
URL: https://github.com/apache/spark/pull/28067#discussion_r400130761
##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala
##########
@@ -1033,6 +1033,40 @@ object DateTimeUtils {
instantToMicros(localDateTime.atZone(ZoneId.systemDefault).toInstant)
}
+ /**
+ * Rebases days since the epoch from an original to a target calendar, for instance
+ * from a hybrid (Julian + Gregorian) to the Proleptic Gregorian calendar.
+ *
+ * It finds the latest switch day which is less than or equal to `days`, and adds the
+ * difference in days associated with the switch day to the given `days`. The function is based
+ * on linear search which starts from the most recent switch days. This allows performing
+ * fewer comparisons for modern dates.
+ *
+ * @param switchDays The days when difference in days between original and target
+ * calendar was changed.
+ * @param diffs The differences in days between calendars.
+ * @param days The number of days since the epoch 1970-01-01 to be rebased to the
+ * target calendar.
+ * @return The rebased day
+ */
+ private def rebaseDays(switchDays: Array[Int], diffs: Array[Int], days: Int): Int = {
+ var i = switchDays.length - 1
+ while (i >= 0 && days < switchDays(i)) {
+ i -= 1
+ }
+ val rebased = days + diffs(if (i < 0) 0 else i)
+ rebased
+ }
+
+ // The differences in days between Julian and Proleptic Gregorian dates.
+ // The diff at the index `i` is applicable for all days in the date interval:
+ // [julianGregDiffSwitchDay(i), julianGregDiffSwitchDay(i+1))
Review comment:
Then does that mean `julianGregDiffSwitchDay(0)` is useless? It is not a point at which the difference in days changes.
[GitHub] [spark] SparkQA commented on issue #28067: [WIP][SPARK-31297][SQL]
Speed up dates rebasing
Posted by GitBox <gi...@apache.org>.
SparkQA commented on issue #28067: [WIP][SPARK-31297][SQL] Speed up dates rebasing
URL: https://github.com/apache/spark/pull/28067#issuecomment-605693189
**[Test build #120559 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/120559/testReport)** for PR 28067 at commit [`65f222e`](https://github.com/apache/spark/commit/65f222e03396e43f5629ac4a53853617980ea9a0).
[GitHub] [spark] AmplabJenkins removed a comment on issue #28067:
[WIP][SPARK-31297][SQL] Speed up dates rebasing
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #28067: [WIP][SPARK-31297][SQL] Speed up dates rebasing
URL: https://github.com/apache/spark/pull/28067#issuecomment-606053598
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/120585/
Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #28067:
[WIP][SPARK-31297][SQL] Speed up dates rebasing
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #28067: [WIP][SPARK-31297][SQL] Speed up dates rebasing
URL: https://github.com/apache/spark/pull/28067#issuecomment-605888605
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/25286/
Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #28067:
[WIP][SPARK-31297][SQL] Speed up dates rebasing
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #28067: [WIP][SPARK-31297][SQL] Speed up dates rebasing
URL: https://github.com/apache/spark/pull/28067#issuecomment-605888595
Merged build finished. Test PASSed.
[GitHub] [spark] SparkQA removed a comment on issue #28067:
[SPARK-31297][SQL] Speed up dates rebasing
Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on issue #28067: [SPARK-31297][SQL] Speed up dates rebasing
URL: https://github.com/apache/spark/pull/28067#issuecomment-606102869
**[Test build #120601 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/120601/testReport)** for PR 28067 at commit [`b8fa18e`](https://github.com/apache/spark/commit/b8fa18ee1968fb8b9aca84daa67f2419c16dca95).
[GitHub] [spark] AmplabJenkins removed a comment on issue #28067:
[WIP][SPARK-31297][SQL] Speed up dates rebasing
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #28067: [WIP][SPARK-31297][SQL] Speed up dates rebasing
URL: https://github.com/apache/spark/pull/28067#issuecomment-606013559
Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #28067:
[WIP][SPARK-31297][SQL] Speed up dates rebasing
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #28067: [WIP][SPARK-31297][SQL] Speed up dates rebasing
URL: https://github.com/apache/spark/pull/28067#issuecomment-606013569
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/120583/
Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #28067:
[WIP][SPARK-31297][SQL] Speed up dates rebasing
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #28067: [WIP][SPARK-31297][SQL] Speed up dates rebasing
URL: https://github.com/apache/spark/pull/28067#issuecomment-605688881
Merged build finished. Test PASSed.
[GitHub] [spark] cloud-fan commented on a change in pull request #28067:
[WIP][SPARK-31297][SQL] Speed up dates rebasing
Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #28067: [WIP][SPARK-31297][SQL] Speed up dates rebasing
URL: https://github.com/apache/spark/pull/28067#discussion_r400179977
##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala
##########
@@ -1033,6 +1033,40 @@ object DateTimeUtils {
instantToMicros(localDateTime.atZone(ZoneId.systemDefault).toInstant)
}
+ /**
+ * Rebases days since the epoch from an original to a target calendar, for instance
+ * from a hybrid (Julian + Gregorian) to Proleptic Gregorian calendar.
+ *
+ * It finds the latest switch day which is less than or equal to `days`, and adds the
+ * difference in days associated with the switch day to the given `days`. The function is
+ * based on linear search which starts from the most recent switch days. This allows
+ * performing fewer comparisons for modern dates.
+ *
+ * @param switchDays The days when difference in days between original and target
+ * calendar was changed.
+ * @param diffs The differences in days between calendars.
+ * @param days The number of days since the epoch 1970-01-01 to be rebased to the
+ * target calendar.
+ * @return The rebased day
+ */
+ private def rebaseDays(switchDays: Array[Int], diffs: Array[Int], days: Int): Int = {
+ var i = switchDays.length - 1
+ while (i >= 0 && days < switchDays(i)) {
+ i -= 1
+ }
+ val rebased = days + diffs(if (i < 0) 0 else i)
+ rebased
+ }
+
+ // The differences in days between Julian and Proleptic Gregorian dates.
+ // The diff at the index `i` is applicable for all days in the date interval:
+ // [julianGregDiffSwitchDay(i), julianGregDiffSwitchDay(i+1))
Review comment:
let's document it clearly.
[GitHub] [spark] HyukjinKwon commented on issue #28067: [SPARK-31297][SQL]
Speed up dates rebasing
Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on issue #28067: [SPARK-31297][SQL] Speed up dates rebasing
URL: https://github.com/apache/spark/pull/28067#issuecomment-606525011
+1 from me too
[GitHub] [spark] SparkQA commented on issue #28067: [WIP][SPARK-31297][SQL]
Speed up dates rebasing
Posted by GitBox <gi...@apache.org>.
SparkQA commented on issue #28067: [WIP][SPARK-31297][SQL] Speed up dates rebasing
URL: https://github.com/apache/spark/pull/28067#issuecomment-605887856
**[Test build #120583 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/120583/testReport)** for PR 28067 at commit [`89d35fd`](https://github.com/apache/spark/commit/89d35fd2d2d85c1a09be95f73583b6120a5d6f40).
[GitHub] [spark] AmplabJenkins removed a comment on issue #28067:
[SPARK-31297][SQL] Speed up dates rebasing
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #28067: [SPARK-31297][SQL] Speed up dates rebasing
URL: https://github.com/apache/spark/pull/28067#issuecomment-606216202
Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #28067:
[WIP][SPARK-31297][SQL] Speed up dates rebasing
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #28067: [WIP][SPARK-31297][SQL] Speed up dates rebasing
URL: https://github.com/apache/spark/pull/28067#issuecomment-606053582
Merged build finished. Test PASSed.
[GitHub] [spark] MaxGekk commented on a change in pull request #28067:
[SPARK-31297][SQL] Speed up dates rebasing
Posted by GitBox <gi...@apache.org>.
MaxGekk commented on a change in pull request #28067: [SPARK-31297][SQL] Speed up dates rebasing
URL: https://github.com/apache/spark/pull/28067#discussion_r400320283
##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala
##########
@@ -1033,6 +1033,40 @@ object DateTimeUtils {
instantToMicros(localDateTime.atZone(ZoneId.systemDefault).toInstant)
}
+ /**
+ * Rebases days since the epoch from an original to a target calendar, for instance
+ * from a hybrid (Julian + Gregorian) to Proleptic Gregorian calendar.
+ *
+ * It finds the latest switch day which is less than or equal to `days`, and adds the
+ * difference in days associated with the switch day to the given `days`. The function is
+ * based on linear search which starts from the most recent switch days. This allows
+ * performing fewer comparisons for modern dates.
+ *
+ * @param switchDays The days when difference in days between original and target
+ * calendar was changed.
+ * @param diffs The differences in days between calendars.
+ * @param days The number of days since the epoch 1970-01-01 to be rebased to the
+ * target calendar.
+ * @return The rebased day
+ */
+ private def rebaseDays(switchDays: Array[Int], diffs: Array[Int], days: Int): Int = {
+ var i = switchDays.length - 1
+ while (i >= 0 && days < switchDays(i)) {
+ i -= 1
+ }
+ val rebased = days + diffs(if (i < 0) 0 else i)
+ rebased
+ }
+
+ // The differences in days between Julian and Proleptic Gregorian dates.
+ // The diff at the index `i` is applicable for all days in the date interval:
+ // [julianGregDiffSwitchDay(i), julianGregDiffSwitchDay(i+1))
Review comment:
I updated comments for `gregJulianDiffSwitchDay` and `julianGregDiffSwitchDay`.
[GitHub] [spark] AmplabJenkins removed a comment on issue #28067:
[WIP][SPARK-31297][SQL] Speed up dates rebasing
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #28067: [WIP][SPARK-31297][SQL] Speed up dates rebasing
URL: https://github.com/apache/spark/pull/28067#issuecomment-605724628
Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #28067:
[WIP][SPARK-31297][SQL] Speed up dates rebasing
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #28067: [WIP][SPARK-31297][SQL] Speed up dates rebasing
URL: https://github.com/apache/spark/pull/28067#issuecomment-606053598
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/120585/
Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #28067:
[WIP][SPARK-31297][SQL] Speed up dates rebasing
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #28067: [WIP][SPARK-31297][SQL] Speed up dates rebasing
URL: https://github.com/apache/spark/pull/28067#issuecomment-605724630
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/120558/
Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #28067:
[WIP][SPARK-31297][SQL] Speed up dates rebasing
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #28067: [WIP][SPARK-31297][SQL] Speed up dates rebasing
URL: https://github.com/apache/spark/pull/28067#issuecomment-605727915
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/120559/
Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #28067: [SPARK-31297][SQL]
Speed up dates rebasing
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #28067: [SPARK-31297][SQL] Speed up dates rebasing
URL: https://github.com/apache/spark/pull/28067#issuecomment-606245241
Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #28067:
[WIP][SPARK-31297][SQL] Speed up dates rebasing
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #28067: [WIP][SPARK-31297][SQL] Speed up dates rebasing
URL: https://github.com/apache/spark/pull/28067#issuecomment-606013559
Merged build finished. Test PASSed.
[GitHub] [spark] MaxGekk commented on a change in pull request #28067:
[WIP][SPARK-31297][SQL] Speed up dates rebasing
Posted by GitBox <gi...@apache.org>.
MaxGekk commented on a change in pull request #28067: [WIP][SPARK-31297][SQL] Speed up dates rebasing
URL: https://github.com/apache/spark/pull/28067#discussion_r400120317
##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala
##########
@@ -1033,6 +1033,40 @@ object DateTimeUtils {
instantToMicros(localDateTime.atZone(ZoneId.systemDefault).toInstant)
}
+ /**
+ * Rebases days since the epoch from an original to a target calendar, for instance
+ * from a hybrid (Julian + Gregorian) to Proleptic Gregorian calendar.
+ *
+ * It finds the latest switch day which is less than or equal to `days`, and adds the
+ * difference in days associated with the switch day to the given `days`. The function is
+ * based on linear search which starts from the most recent switch days. This allows
+ * performing fewer comparisons for modern dates.
+ *
+ * @param switchDays The days when difference in days between original and target
+ * calendar was changed.
+ * @param diffs The differences in days between calendars.
+ * @param days The number of days since the epoch 1970-01-01 to be rebased to the
+ * target calendar.
+ * @return The rebased day
+ */
+ private def rebaseDays(switchDays: Array[Int], diffs: Array[Int], days: Int): Int = {
+ var i = switchDays.length - 1
+ while (i >= 0 && days < switchDays(i)) {
+ i -= 1
+ }
+ val rebased = days + diffs(if (i < 0) 0 else i)
+ rebased
+ }
+
+ // The differences in days between Julian and Proleptic Gregorian dates.
+ // The diff at the index `i` is applicable for all days in the date interval:
+ // [julianGregDiffSwitchDay(i), julianGregDiffSwitchDay(i+1))
Review comment:
Dates before `0001-01-01` are outside the supported range; for them the current implementation just returns a constant diff of 2 days.
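To make the boundary behaviour discussed in this review thread concrete, here is a minimal sketch of the same linear search run against small made-up tables. The `RebaseSketch` object and the values in `switchDays` and `diffs` are hypothetical and chosen only for illustration; they are not the arrays defined in `DateTimeUtils`. The sketch shows that a day on a switch day takes that entry's diff, and that a day earlier than every switch day falls back to the diff at index 0.

```scala
object RebaseSketch {
  // Hypothetical tables, for illustration only (not the real Spark arrays).
  // diffs(i) applies to days in [switchDays(i), switchDays(i + 1)).
  val switchDays: Array[Int] = Array(-1000, -500, 0)
  val diffs: Array[Int] = Array(2, 1, 0)

  // Same linear search as `rebaseDays` in the diff quoted above.
  def rebaseDays(switchDays: Array[Int], diffs: Array[Int], days: Int): Int = {
    var i = switchDays.length - 1
    // Walk back from the most recent switch day until one is <= days.
    while (i >= 0 && days < switchDays(i)) {
      i -= 1
    }
    // i < 0 means `days` precedes every switch day: reuse the first diff.
    days + diffs(if (i < 0) 0 else i)
  }

  def main(args: Array[String]): Unit = {
    println(rebaseDays(switchDays, diffs, 10))    // after the last switch day: diff 0
    println(rebaseDays(switchDays, diffs, -500))  // exactly on a switch day: diff 1
    println(rebaseDays(switchDays, diffs, -501))  // one day earlier: diff 2
    println(rebaseDays(switchDays, diffs, -5000)) // before all switch days: diff 2
  }
}
```

With these tables, a modern day is resolved after a single comparison, which is the reason the search starts from the end of the array.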
[GitHub] [spark] cloud-fan commented on issue #28067: [SPARK-31297][SQL]
Speed up dates rebasing
Posted by GitBox <gi...@apache.org>.
cloud-fan commented on issue #28067: [SPARK-31297][SQL] Speed up dates rebasing
URL: https://github.com/apache/spark/pull/28067#issuecomment-606513073
thanks, merging to master/3.0!
[GitHub] [spark] dongjoon-hyun commented on issue #28067:
[WIP][SPARK-31297][SQL] Speed up dates rebasing
Posted by GitBox <gi...@apache.org>.
dongjoon-hyun commented on issue #28067: [WIP][SPARK-31297][SQL] Speed up dates rebasing
URL: https://github.com/apache/spark/pull/28067#issuecomment-605754654
cc @rxin and @gatorsmile
[GitHub] [spark] AmplabJenkins removed a comment on issue #28067:
[WIP][SPARK-31297][SQL] Speed up dates rebasing
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #28067: [WIP][SPARK-31297][SQL] Speed up dates rebasing
URL: https://github.com/apache/spark/pull/28067#issuecomment-605727912
Merged build finished. Test PASSed.
[GitHub] [spark] SparkQA commented on issue #28067: [WIP][SPARK-31297][SQL]
Speed up dates rebasing
Posted by GitBox <gi...@apache.org>.
SparkQA commented on issue #28067: [WIP][SPARK-31297][SQL] Speed up dates rebasing
URL: https://github.com/apache/spark/pull/28067#issuecomment-605688734
**[Test build #120558 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/120558/testReport)** for PR 28067 at commit [`3aa88bc`](https://github.com/apache/spark/commit/3aa88bca1fb0ccb10124ffa8f78c428a6ced752b).
[GitHub] [spark] AmplabJenkins removed a comment on issue #28067:
[WIP][SPARK-31297][SQL] Speed up dates rebasing
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #28067: [WIP][SPARK-31297][SQL] Speed up dates rebasing
URL: https://github.com/apache/spark/pull/28067#issuecomment-605727915
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/120559/
Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #28067:
[WIP][SPARK-31297][SQL] Speed up dates rebasing
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #28067: [WIP][SPARK-31297][SQL] Speed up dates rebasing
URL: https://github.com/apache/spark/pull/28067#issuecomment-605688884
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/25264/
Test PASSed.
[GitHub] [spark] SparkQA removed a comment on issue #28067:
[WIP][SPARK-31297][SQL] Speed up dates rebasing
Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on issue #28067: [WIP][SPARK-31297][SQL] Speed up dates rebasing
URL: https://github.com/apache/spark/pull/28067#issuecomment-605887856
**[Test build #120583 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/120583/testReport)** for PR 28067 at commit [`89d35fd`](https://github.com/apache/spark/commit/89d35fd2d2d85c1a09be95f73583b6120a5d6f40).
[GitHub] [spark] SparkQA removed a comment on issue #28067:
[WIP][SPARK-31297][SQL] Speed up dates rebasing
Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on issue #28067: [WIP][SPARK-31297][SQL] Speed up dates rebasing
URL: https://github.com/apache/spark/pull/28067#issuecomment-605688734
**[Test build #120558 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/120558/testReport)** for PR 28067 at commit [`3aa88bc`](https://github.com/apache/spark/commit/3aa88bca1fb0ccb10124ffa8f78c428a6ced752b).
[GitHub] [spark] AmplabJenkins removed a comment on issue #28067:
[SPARK-31297][SQL] Speed up dates rebasing
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #28067: [SPARK-31297][SQL] Speed up dates rebasing
URL: https://github.com/apache/spark/pull/28067#issuecomment-606103451
Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #28067:
[WIP][SPARK-31297][SQL] Speed up dates rebasing
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #28067: [WIP][SPARK-31297][SQL] Speed up dates rebasing
URL: https://github.com/apache/spark/pull/28067#issuecomment-606077901
Merged build finished. Test PASSed.
[GitHub] [spark] SparkQA commented on issue #28067: [WIP][SPARK-31297][SQL]
Speed up dates rebasing
Posted by GitBox <gi...@apache.org>.
SparkQA commented on issue #28067: [WIP][SPARK-31297][SQL] Speed up dates rebasing
URL: https://github.com/apache/spark/pull/28067#issuecomment-606012444
**[Test build #120583 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/120583/testReport)** for PR 28067 at commit [`89d35fd`](https://github.com/apache/spark/commit/89d35fd2d2d85c1a09be95f73583b6120a5d6f40).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] [spark] AmplabJenkins removed a comment on issue #28067:
[WIP][SPARK-31297][SQL] Speed up dates rebasing
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #28067: [WIP][SPARK-31297][SQL] Speed up dates rebasing
URL: https://github.com/apache/spark/pull/28067#issuecomment-605724630
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/120558/
Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #28067:
[WIP][SPARK-31297][SQL] Speed up dates rebasing
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #28067: [WIP][SPARK-31297][SQL] Speed up dates rebasing
URL: https://github.com/apache/spark/pull/28067#issuecomment-605888605
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/25286/
Test PASSed.
[GitHub] [spark] MaxGekk commented on issue #28067: [WIP][SPARK-31297][SQL]
Speed up dates rebasing
Posted by GitBox <gi...@apache.org>.
MaxGekk commented on issue #28067: [WIP][SPARK-31297][SQL] Speed up dates rebasing
URL: https://github.com/apache/spark/pull/28067#issuecomment-605692063
Here are the results with linear search. They seem better than with binary search.
```
Java HotSpot(TM) 64-Bit Server VM 1.8.0_231-b11 on Mac OS X 10.15.3
Intel(R) Core(TM) i7-4850HQ CPU @ 2.30GHz
Save dates to parquet:                    Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
after 1582, noop                                   8083           8083           0         12.4          80.8       1.0X
before 1582, noop                                  7971           7971           0         12.5          79.7       1.0X
after 1582, rebase off                            17882          17882           0          5.6         178.8       0.5X
after 1582, rebase on                             17677          17677           0          5.7         176.8       0.5X
before 1582, rebase off                           17811          17811           0          5.6         178.1       0.5X
before 1582, rebase on                            17858          17858           0          5.6         178.6       0.5X

Java HotSpot(TM) 64-Bit Server VM 1.8.0_231-b11 on Mac OS X 10.15.3
Intel(R) Core(TM) i7-4850HQ CPU @ 2.30GHz
Load dates from parquet:                  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
after 1582, vec off, rebase off                   10511          10588          87          9.5         105.1       1.0X
after 1582, vec off, rebase on                    10674          10758         143          9.4         106.7       1.0X
after 1582, vec on, rebase off                     2932           2983          52         34.1          29.3       3.6X
after 1582, vec on, rebase on                      4176           4225          52         23.9          41.8       2.5X
before 1582, vec off, rebase off                  10663          10719          52          9.4         106.6       1.0X
before 1582, vec off, rebase on                   11047          11110          80          9.1         110.5       1.0X
before 1582, vec on, rebase off                    2914           2983          81         34.3          29.1       3.6X
before 1582, vec on, rebase on                     4384           4457          64         22.8          43.8       2.4X
```
[GitHub] [spark] AmplabJenkins removed a comment on issue #28067:
[SPARK-31297][SQL] Speed up dates rebasing
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #28067: [SPARK-31297][SQL] Speed up dates rebasing
URL: https://github.com/apache/spark/pull/28067#issuecomment-606245241
Merged build finished. Test PASSed.
[GitHub] [spark] SparkQA removed a comment on issue #28067:
[WIP][SPARK-31297][SQL] Speed up dates rebasing
Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on issue #28067: [WIP][SPARK-31297][SQL] Speed up dates rebasing
URL: https://github.com/apache/spark/pull/28067#issuecomment-605693189
**[Test build #120559 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/120559/testReport)** for PR 28067 at commit [`65f222e`](https://github.com/apache/spark/commit/65f222e03396e43f5629ac4a53853617980ea9a0).
[GitHub] [spark] cloud-fan commented on a change in pull request #28067:
[WIP][SPARK-31297][SQL] Speed up dates rebasing
Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #28067: [WIP][SPARK-31297][SQL] Speed up dates rebasing
URL: https://github.com/apache/spark/pull/28067#discussion_r400110953
##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala
##########
@@ -1033,6 +1033,40 @@ object DateTimeUtils {
instantToMicros(localDateTime.atZone(ZoneId.systemDefault).toInstant)
}
+ /**
+ * Rebases days since the epoch from an original to an target calendar, from instance
+ * from a hybrid (Julian + Gregorian) to Proleptic Gregorian calendar.
+ *
+ * It finds the latest switch day which is less than `days`, and adds the difference
+ * in days associated with the switch day to the given `days`. The function is based
+ * on linear search which starts from the most recent switch days. This allows to perform
+ * less comparisons for modern dates.
+ *
+ * @param switchDays The days when difference in days between original and target
+ * calendar was changed.
+ * @param diffs The differences in days between calendars.
+ * @param days The number of days since the epoch 1970-01-01 to be rebased to the
+ * target calendar.
+ * @return The rebased day
+ */
+ private def rebaseDays(switchDays: Array[Int], diffs: Array[Int], days: Int): Int = {
+ var i = switchDays.length - 1
+ while (i >= 0 && days < switchDays(i)) {
+ i -= 1
+ }
+ val rebased = days + diffs(if (i < 0) 0 else i)
+ rebased
+ }
+
+ // The differences in days between Julian and Proleptic Gregorian dates.
+ // The diff at the index `i` is applicable for all days in the date interval:
+ // [julianGregDiffSwitchDay(i), julianGregDiffSwitchDay(i+1))
Review comment:
This doesn't explain what the diff is for dates before `julianGregDiffSwitchDay(0)`.
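To make the question concrete, here is a minimal sketch of how the quoted `rebaseDays` treats a day that precedes the first switch day. The search logic is copied from the hunk above; the `RebaseSketch` object and the `switchDays` and `diffs` values are hypothetical and are not the real Julian/Gregorian tables from the PR.

```scala
object RebaseSketch {
  // Same search as the quoted rebaseDays: scan backwards from the most recent switch day.
  def rebaseDays(switchDays: Array[Int], diffs: Array[Int], days: Int): Int = {
    var i = switchDays.length - 1
    while (i >= 0 && days < switchDays(i)) {
      i -= 1
    }
    // When `days` precedes every switch day, i ends at -1 and the code falls back to diffs(0).
    days + diffs(if (i < 0) 0 else i)
  }

  // Hypothetical values chosen for illustration only.
  val switchDays: Array[Int] = Array(-100, 0, 100)
  val diffs: Array[Int] = Array(2, 1, 0)

  def main(args: Array[String]): Unit = {
    println(rebaseDays(switchDays, diffs, 150))  // at or after the last switch day: diffs(2)
    println(rebaseDays(switchDays, diffs, 50))   // in [0, 100): diffs(1)
    println(rebaseDays(switchDays, diffs, -50))  // in [-100, 0): diffs(0)
    println(rebaseDays(switchDays, diffs, -500)) // before switchDays(0): also diffs(0)
  }
}
```

Under these assumed inputs, a day earlier than `switchDays(0)` silently reuses `diffs(0)`, which is the case the interval comment does not cover.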
[GitHub] [spark] AmplabJenkins commented on issue #28067: [SPARK-31297][SQL]
Speed up dates rebasing
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #28067: [SPARK-31297][SQL] Speed up dates rebasing
URL: https://github.com/apache/spark/pull/28067#issuecomment-606103451
Merged build finished. Test PASSed.
[GitHub] [spark] SparkQA commented on issue #28067: [SPARK-31297][SQL] Speed
up dates rebasing
Posted by GitBox <gi...@apache.org>.
SparkQA commented on issue #28067: [SPARK-31297][SQL] Speed up dates rebasing
URL: https://github.com/apache/spark/pull/28067#issuecomment-606244391
**[Test build #120601 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/120601/testReport)** for PR 28067 at commit [`b8fa18e`](https://github.com/apache/spark/commit/b8fa18ee1968fb8b9aca84daa67f2419c16dca95).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] [spark] MaxGekk commented on issue #28067: [WIP][SPARK-31297][SQL]
Speed up dates rebasing
Posted by GitBox <gi...@apache.org>.
MaxGekk commented on issue #28067: [WIP][SPARK-31297][SQL] Speed up dates rebasing
URL: https://github.com/apache/spark/pull/28067#issuecomment-605687880
@cloud-fan @HyukjinKwon @dongjoon-hyun Linear search from the end of the arrays should be even faster, I guess.
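A rough way to see why this might hold is to count element comparisons for both strategies on a small sorted array. The sketch below is only an illustration: the `SearchComparisonSketch` object, its helper names, and the switch-day values are hypothetical. Only the array length (14) mirrors the `julianGregDiffs` array quoted elsewhere in this thread.

```scala
object SearchComparisonSketch {
  // Hypothetical sorted switch days: -13000, -12000, ..., 0.
  val switchDays: Array[Int] = Array.tabulate(14)(i => (i - 13) * 1000)

  // Linear scan from the most recent switch day.
  // Returns (index of the latest switch day <= days, or -1; number of element comparisons).
  def linearFromEnd(days: Int): (Int, Int) = {
    var i = switchDays.length - 1
    var comparisons = 0
    var found = false
    while (i >= 0 && !found) {
      comparisons += 1
      if (days < switchDays(i)) i -= 1 else found = true
    }
    (i, comparisons)
  }

  // Binary search for the same index, counting one comparison per probe.
  def binary(days: Int): (Int, Int) = {
    var lo = 0
    var hi = switchDays.length - 1
    var result = -1
    var comparisons = 0
    while (lo <= hi) {
      val mid = (lo + hi) >>> 1
      comparisons += 1
      if (switchDays(mid) <= days) {
        result = mid
        lo = mid + 1
      } else {
        hi = mid - 1
      }
    }
    (result, comparisons)
  }

  def main(args: Array[String]): Unit = {
    println(linearFromEnd(5000))   // modern date: one comparison
    println(binary(5000))          // modern date: four comparisons
    println(linearFromEnd(-20000)) // very old date: one comparison per entry
    println(binary(-20000))        // very old date: three comparisons
  }
}
```

With these assumed values, a modern date costs one comparison with the backward linear scan versus four with binary search, while a date before every switch day costs 14 versus three. So the linear scan should only win when most dates are recent.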
[GitHub] [spark] AmplabJenkins removed a comment on issue #28067:
[WIP][SPARK-31297][SQL] Speed up dates rebasing
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #28067: [WIP][SPARK-31297][SQL] Speed up dates rebasing
URL: https://github.com/apache/spark/pull/28067#issuecomment-605922955
Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #28067:
[WIP][SPARK-31297][SQL] Speed up dates rebasing
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #28067: [WIP][SPARK-31297][SQL] Speed up dates rebasing
URL: https://github.com/apache/spark/pull/28067#issuecomment-606077922
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/25301/
Test PASSed.
[GitHub] [spark] cloud-fan commented on issue #28067:
[WIP][SPARK-31297][SQL] Speed up dates rebasing
Posted by GitBox <gi...@apache.org>.
cloud-fan commented on issue #28067: [WIP][SPARK-31297][SQL] Speed up dates rebasing
URL: https://github.com/apache/spark/pull/28067#issuecomment-605852912
Is there any public document to support your statement?
[GitHub] [spark] AmplabJenkins commented on issue #28067:
[WIP][SPARK-31297][SQL] Speed up dates rebasing
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #28067: [WIP][SPARK-31297][SQL] Speed up dates rebasing
URL: https://github.com/apache/spark/pull/28067#issuecomment-605888595
Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #28067:
[WIP][SPARK-31297][SQL] Speed up dates rebasing
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #28067: [WIP][SPARK-31297][SQL] Speed up dates rebasing
URL: https://github.com/apache/spark/pull/28067#issuecomment-606013569
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/120583/
Test PASSed.
[GitHub] [spark] MaxGekk commented on a change in pull request #28067:
[WIP][SPARK-31297][SQL] Speed up dates rebasing
Posted by GitBox <gi...@apache.org>.
MaxGekk commented on a change in pull request #28067: [WIP][SPARK-31297][SQL] Speed up dates rebasing
URL: https://github.com/apache/spark/pull/28067#discussion_r400175202
##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala
##########
@@ -1033,6 +1033,40 @@ object DateTimeUtils {
instantToMicros(localDateTime.atZone(ZoneId.systemDefault).toInstant)
}
+ /**
+ * Rebases days since the epoch from an original to an target calendar, from instance
+ * from a hybrid (Julian + Gregorian) to Proleptic Gregorian calendar.
+ *
+ * It finds the latest switch day which is less than `days`, and adds the difference
+ * in days associated with the switch day to the given `days`. The function is based
+ * on linear search which starts from the most recent switch days. This allows to perform
+ * less comparisons for modern dates.
+ *
+ * @param switchDays The days when difference in days between original and target
+ * calendar was changed.
+ * @param diffs The differences in days between calendars.
+ * @param days The number of days since the epoch 1970-01-01 to be rebased to the
+ * target calendar.
+ * @return The rebased day
+ */
+ private def rebaseDays(switchDays: Array[Int], diffs: Array[Int], days: Int): Int = {
+ var i = switchDays.length - 1
+ while (i >= 0 && days < switchDays(i)) {
+ i -= 1
+ }
+ val rebased = days + diffs(if (i < 0) 0 else i)
+ rebased
+ }
+
+ // The differences in days between Julian and Proleptic Gregorian dates.
+ // The diff at the index `i` is applicable for all days in the date interval:
+ // [julianGregDiffSwitchDay(i), julianGregDiffSwitchDay(i+1))
Review comment:
It is not completely useless because I need some point at which to stop searching. Let's say this is the day when the diff switches from an undefined diff to a concrete diff (2 days).
[GitHub] [spark] cloud-fan commented on a change in pull request #28067:
[WIP][SPARK-31297][SQL] Speed up dates rebasing
Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #28067: [WIP][SPARK-31297][SQL] Speed up dates rebasing
URL: https://github.com/apache/spark/pull/28067#discussion_r400130966
##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala
##########
@@ -1033,6 +1033,40 @@ object DateTimeUtils {
instantToMicros(localDateTime.atZone(ZoneId.systemDefault).toInstant)
}
+ /**
+ * Rebases days since the epoch from an original to an target calendar, from instance
+ * from a hybrid (Julian + Gregorian) to Proleptic Gregorian calendar.
+ *
+ * It finds the latest switch day which is less than `days`, and adds the difference
+ * in days associated with the switch day to the given `days`. The function is based
+ * on linear search which starts from the most recent switch days. This allows to perform
+ * less comparisons for modern dates.
+ *
+ * @param switchDays The days when difference in days between original and target
+ * calendar was changed.
+ * @param diffs The differences in days between calendars.
+ * @param days The number of days since the epoch 1970-01-01 to be rebased to the
+ * target calendar.
+ * @return The rebased day
+ */
+ private def rebaseDays(switchDays: Array[Int], diffs: Array[Int], days: Int): Int = {
+ var i = switchDays.length - 1
+ while (i >= 0 && days < switchDays(i)) {
+ i -= 1
+ }
+ val rebased = days + diffs(if (i < 0) 0 else i)
+ rebased
+ }
+
+ // The differences in days between Julian and Proleptic Gregorian dates.
+ // The diff at the index `i` is applicable for all days in the date interval:
+ // [julianGregDiffSwitchDay(i), julianGregDiffSwitchDay(i+1))
+ private val julianGregDiffs = Array(2, 1, 0, -1, -2, -3, -4, -5, -6, -7, -8, -9, -10, 0)
+ // The sorted days when difference in days between Julian and Proleptic
Review comment:
`sorted days` -> `sorted days in Julian calendar`?
[GitHub] [spark] MaxGekk commented on a change in pull request #28067:
[WIP][SPARK-31297][SQL] Speed up dates rebasing
Posted by GitBox <gi...@apache.org>.
MaxGekk commented on a change in pull request #28067: [WIP][SPARK-31297][SQL] Speed up dates rebasing
URL: https://github.com/apache/spark/pull/28067#discussion_r400187386
##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala
##########
@@ -1033,6 +1033,40 @@ object DateTimeUtils {
instantToMicros(localDateTime.atZone(ZoneId.systemDefault).toInstant)
}
+ /**
+ * Rebases days since the epoch from an original to an target calendar, from instance
+ * from a hybrid (Julian + Gregorian) to Proleptic Gregorian calendar.
+ *
+ * It finds the latest switch day which is less than `days`, and adds the difference
+ * in days associated with the switch day to the given `days`. The function is based
+ * on linear search which starts from the most recent switch days. This allows to perform
+ * less comparisons for modern dates.
+ *
+ * @param switchDays The days when difference in days between original and target
+ * calendar was changed.
+ * @param diffs The differences in days between calendars.
+ * @param days The number of days since the epoch 1970-01-01 to be rebased to the
+ * target calendar.
+ * @return The rebased day
+ */
+ private def rebaseDays(switchDays: Array[Int], diffs: Array[Int], days: Int): Int = {
+ var i = switchDays.length - 1
+ while (i >= 0 && days < switchDays(i)) {
+ i -= 1
+ }
+ val rebased = days + diffs(if (i < 0) 0 else i)
+ rebased
+ }
+
+ // The differences in days between Julian and Proleptic Gregorian dates.
+ // The diff at the index `i` is applicable for all days in the date interval:
+ // [julianGregDiffSwitchDay(i), julianGregDiffSwitchDay(i+1))
+ private val julianGregDiffs = Array(2, 1, 0, -1, -2, -3, -4, -5, -6, -7, -8, -9, -10, 0)
+ // The sorted days when difference in days between Julian and Proleptic
Review comment:
Changed here and below
[GitHub] [spark] cloud-fan commented on issue #28067:
[WIP][SPARK-31297][SQL] Speed up dates rebasing
Posted by GitBox <gi...@apache.org>.
cloud-fan commented on issue #28067: [WIP][SPARK-31297][SQL] Speed up dates rebasing
URL: https://github.com/apache/spark/pull/28067#issuecomment-605761710
Can you briefly explain your idea for optimizing it? And what are the benchmark numbers before your optimization?
[GitHub] [spark] MaxGekk edited a comment on issue #28067:
[WIP][SPARK-31297][SQL] Speed up dates rebasing
Posted by GitBox <gi...@apache.org>.
MaxGekk edited a comment on issue #28067: [WIP][SPARK-31297][SQL] Speed up dates rebasing
URL: https://github.com/apache/spark/pull/28067#issuecomment-605692063
Here are the results with linear search. They seem better than with binary search for dates after **1582**:
```
Java HotSpot(TM) 64-Bit Server VM 1.8.0_231-b11 on Mac OS X 10.15.3
Intel(R) Core(TM) i7-4850HQ CPU @ 2.30GHz
Save dates to parquet:                    Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
after 1582, noop                                   8083           8083           0         12.4          80.8       1.0X
before 1582, noop                                  7971           7971           0         12.5          79.7       1.0X
after 1582, rebase off                            17882          17882           0          5.6         178.8       0.5X
after 1582, rebase on                             17677          17677           0          5.7         176.8       0.5X
before 1582, rebase off                           17811          17811           0          5.6         178.1       0.5X
before 1582, rebase on                            17858          17858           0          5.6         178.6       0.5X

Java HotSpot(TM) 64-Bit Server VM 1.8.0_231-b11 on Mac OS X 10.15.3
Intel(R) Core(TM) i7-4850HQ CPU @ 2.30GHz
Load dates from parquet:                  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------------------------------
after 1582, vec off, rebase off                   10511          10588          87          9.5         105.1       1.0X
after 1582, vec off, rebase on                    10674          10758         143          9.4         106.7       1.0X
after 1582, vec on, rebase off                     2932           2983          52         34.1          29.3       3.6X
after 1582, vec on, rebase on                      4176           4225          52         23.9          41.8       2.5X
before 1582, vec off, rebase off                  10663          10719          52          9.4         106.6       1.0X
before 1582, vec off, rebase on                   11047          11110          80          9.1         110.5       1.0X
before 1582, vec on, rebase off                    2914           2983          81         34.3          29.1       3.6X
before 1582, vec on, rebase on                     4384           4457          64         22.8          43.8       2.4X
```
[GitHub] [spark] AmplabJenkins commented on issue #28067:
[WIP][SPARK-31297][SQL] Speed up dates rebasing
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #28067: [WIP][SPARK-31297][SQL] Speed up dates rebasing
URL: https://github.com/apache/spark/pull/28067#issuecomment-605724628
Merged build finished. Test PASSed.
[GitHub] [spark] cloud-fan closed pull request #28067: [SPARK-31297][SQL]
Speed up dates rebasing
Posted by GitBox <gi...@apache.org>.
cloud-fan closed pull request #28067: [SPARK-31297][SQL] Speed up dates rebasing
URL: https://github.com/apache/spark/pull/28067
[GitHub] [spark] AmplabJenkins removed a comment on issue #28067:
[SPARK-31297][SQL] Speed up dates rebasing
Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on issue #28067: [SPARK-31297][SQL] Speed up dates rebasing
URL: https://github.com/apache/spark/pull/28067#issuecomment-606216212
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/120597/
[GitHub] [spark] AmplabJenkins commented on issue #28067: [SPARK-31297][SQL]
Speed up dates rebasing
Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on issue #28067: [SPARK-31297][SQL] Speed up dates rebasing
URL: https://github.com/apache/spark/pull/28067#issuecomment-606103462
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/25305/
[GitHub] [spark] SparkQA removed a comment on issue #28067:
[WIP][SPARK-31297][SQL] Speed up dates rebasing
Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on issue #28067: [WIP][SPARK-31297][SQL] Speed up dates rebasing
URL: https://github.com/apache/spark/pull/28067#issuecomment-605922495
**[Test build #120585 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/120585/testReport)** for PR 28067 at commit [`fd88c56`](https://github.com/apache/spark/commit/fd88c5692bbd34fe55066ecfb893ba4e533aa4d1).
[GitHub] [spark] SparkQA commented on issue #28067: [WIP][SPARK-31297][SQL]
Speed up dates rebasing
Posted by GitBox <gi...@apache.org>.
SparkQA commented on issue #28067: [WIP][SPARK-31297][SQL] Speed up dates rebasing
URL: https://github.com/apache/spark/pull/28067#issuecomment-605922495
**[Test build #120585 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/120585/testReport)** for PR 28067 at commit [`fd88c56`](https://github.com/apache/spark/commit/fd88c5692bbd34fe55066ecfb893ba4e533aa4d1).