You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2021/04/14 15:18:10 UTC

[GitHub] [spark] MaxGekk opened a new pull request #32170: [SPARK-35051][SQL] Support add/subtract of a day-time interval to/from a date

MaxGekk opened a new pull request #32170:
URL: https://github.com/apache/spark/pull/32170


   ### What changes were proposed in this pull request?
   Support `date +/- day-time interval`. In the PR, I propose to update the binary arithmetic rules, and cast an input date to a timestamp, and then add a day-time interval to it.
   
   ### Why are the changes needed?
   1. To conform the ANSI SQL standard which requires to support such operation over dates and intervals:
   <img width="811" alt="Screenshot 2021-03-12 at 11 36 14" src="https://user-images.githubusercontent.com/1580697/111081674-865d4900-8515-11eb-86c8-3538ecaf4804.png">
   2. To fix regression against the recent release with default settings.
   
   Before the changes:
   ```sql
   spark-sql> select date'now' + (timestamp'now' - timestamp'yesterday');
   Error in query: cannot resolve 'DATE '2021-04-14' + subtracttimestamps(TIMESTAMP '2021-04-14 18:14:56.497', TIMESTAMP '2021-04-13 00:00:00')' due to data type mismatch: argument 1 requires timestamp type, however, 'DATE '2021-04-14'' is of date type.; line 1 pos 7;
   'Project [unresolvedalias(cast(2021-04-14 + subtracttimestamps(2021-04-14 18:14:56.497, 2021-04-13 00:00:00, false, Some(Europe/Moscow)) as date), None)]
   +- OneRowRelation
   ```
   
   Spark 3.1:
   ```sql
   spark-sql> select date'now' + (timestamp'now' - timestamp'yesterday');
   2021-04-15
   ```
   
   Hive:
   ```sql
   0: jdbc:hive2://localhost:10000/default> select date'2021-04-14' + (timestamp'2020-04-14 18:15:30' - timestamp'2020-04-13 00:00:00');
   +------------------------+
   |          _c0           |
   +------------------------+
   | 2021-04-15 18:15:30.0  |
   +------------------------+
   ```
   
   ### Does this PR introduce _any_ user-facing change?
   Should not since new intervals have not been released yet.
   
   After the changes:
   ```sql
   spark-sql> select date'now' + (timestamp'now' - timestamp'yesterday');
   2021-04-15 18:13:16.555
   ```
   
   ### How was this patch tested?
   By running new tests:
   ```
   $ build/sbt "test:testOnly *ColumnExpressionSuite"
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] MaxGekk closed pull request #32170: [SPARK-35051][SQL] Support add/subtract of a day-time interval to/from a date

Posted by GitBox <gi...@apache.org>.
MaxGekk closed pull request #32170:
URL: https://github.com/apache/spark/pull/32170


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #32170: [SPARK-35051][SQL] Support add/subtract of a day-time interval to/from a date

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #32170:
URL: https://github.com/apache/spark/pull/32170#issuecomment-819615956


   **[Test build #137356 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/137356/testReport)** for PR 32170 at commit [`4a425e1`](https://github.com/apache/spark/commit/4a425e1452d4834c645e3627dec469fd14e7c322).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #32170: [SPARK-35051][SQL] Support add/subtract of a day-time interval to/from a date

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #32170:
URL: https://github.com/apache/spark/pull/32170#issuecomment-819661726


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/41935/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] MaxGekk commented on pull request #32170: [SPARK-35051][SQL] Support add/subtract of a day-time interval to/from a date

Posted by GitBox <gi...@apache.org>.
MaxGekk commented on pull request #32170:
URL: https://github.com/apache/spark/pull/32170#issuecomment-819649885


   All tests passed:
   <img width="773" alt="Screenshot 2021-04-14 at 19 26 12" src="https://user-images.githubusercontent.com/1580697/114745421-4dec9d00-9d57-11eb-9cb4-e27b2b245810.png">
   Merging to master. Thank you @cloud-fan for your review.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins removed a comment on pull request #32170: [SPARK-35051][SQL] Support add/subtract of a day-time interval to/from a date

Posted by GitBox <gi...@apache.org>.
AmplabJenkins removed a comment on pull request #32170:
URL: https://github.com/apache/spark/pull/32170#issuecomment-819816070


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/137356/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #32170: [SPARK-35051][SQL] Support add/subtract of a day-time interval to/from a date

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #32170:
URL: https://github.com/apache/spark/pull/32170#issuecomment-819814404


   **[Test build #137356 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/137356/testReport)** for PR 32170 at commit [`4a425e1`](https://github.com/apache/spark/commit/4a425e1452d4834c645e3627dec469fd14e7c322).
    * This patch passes all tests.
    * This patch merges cleanly.
    * This patch adds no public classes.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #32170: [SPARK-35051][SQL] Support add/subtract of a day-time interval to/from a date

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #32170:
URL: https://github.com/apache/spark/pull/32170#issuecomment-819816070


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/137356/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #32170: [SPARK-35051][SQL] Support add/subtract of a day-time interval to/from a date

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #32170:
URL: https://github.com/apache/spark/pull/32170#issuecomment-819661696


   Kubernetes integration test status failure
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/41935/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] cloud-fan commented on a change in pull request #32170: [SPARK-35051][SQL] Support add/subtract of a day-time interval to/from a date

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #32170:
URL: https://github.com/apache/spark/pull/32170#discussion_r659901849



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
##########
@@ -345,6 +345,8 @@ class Analyzer(override val catalogManager: CatalogManager)
     override def apply(plan: LogicalPlan): LogicalPlan = plan.resolveOperatorsUp {
       case p: LogicalPlan => p.transformExpressionsUp {
         case a @ Add(l, r, f) if a.childrenResolved => (l.dataType, r.dataType) match {
+          case (DateType, DayTimeIntervalType) => TimeAdd(Cast(l, TimestampType), r)
+          case (DayTimeIntervalType, DateType) => TimeAdd(Cast(r, TimestampType), l)

Review comment:
       I doubt this. For old interval types, `Date + Interval` always returns Date:
   ```
   case (DateType, CalendarIntervalType) => DateAddInterval(l, r, ansiEnabled = f)
   case (CalendarIntervalType, DateType) => DateAddInterval(r, l, ansiEnabled = f)
   ```
   
   I think we should do this for new interval types as well, and fail at runtime if the interval has hour, minute, second fields.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] MaxGekk commented on pull request #32170: [SPARK-35051][SQL] Support add/subtract of a day-time interval to/from a date

Posted by GitBox <gi...@apache.org>.
MaxGekk commented on pull request #32170:
URL: https://github.com/apache/spark/pull/32170#issuecomment-819632794


   > Spark 3.1 truncates timestamp to date?
   
   @cloud-fan Yep. Spark 3.1 as well as master when `spark.sql.legacy.interval.enabled` is set to `true` casts the result to the type of the left parameter.
   
   > I think the new behavior makes more sense
   
   It depends how you get the interval. For instance,
   
   Spark 3.1:
   ```sql
   spark-sql> select date'now' + (date'now' - date'yesterday');
   2021-04-15
   ```
   
   After the changes:
   ```sql
   spark-sql> select date'now' + (date'now' - date'yesterday');
   2021-04-15 00:00:00
   ```
   As you can see, Spark 3.1 returns a date as the result which also makes sense.
   
   > shall we add an item in the migration guide?
   
   Adding ANSI day-time interval to a date wasn't supported in Spark 3.1. Actually, the behavior change is related to subtraction of dates/timestamps which has been already documented in the SQL migration guide:
   ```
     - In Spark 3.2, the dates subtraction expression such as `date1 - date2` returns values of `DayTimeIntervalType`. In Spark 3.1 and earlier, the returned type is `CalendarIntervalType`. To restore the behavior before Spark 3.2, you can set `spark.sql.legacy.interval.enabled` to `true`.
   
     - In Spark 3.2, the timestamps subtraction expression such as `timestamp '2021-03-31 23:48:00' - timestamp '2021-01-01 00:00:00'` returns values of `DayTimeIntervalType`. In Spark 3.1 and earlier, the type of the same expression is `CalendarIntervalType`. To restore the behavior before Spark 3.2, you can set `spark.sql.legacy.interval.enabled` to `true`.
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA removed a comment on pull request #32170: [SPARK-35051][SQL] Support add/subtract of a day-time interval to/from a date

Posted by GitBox <gi...@apache.org>.
SparkQA removed a comment on pull request #32170:
URL: https://github.com/apache/spark/pull/32170#issuecomment-819615956


   **[Test build #137356 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/137356/testReport)** for PR 32170 at commit [`4a425e1`](https://github.com/apache/spark/commit/4a425e1452d4834c645e3627dec469fd14e7c322).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] MaxGekk edited a comment on pull request #32170: [SPARK-35051][SQL] Support add/subtract of a day-time interval to/from a date

Posted by GitBox <gi...@apache.org>.
MaxGekk edited a comment on pull request #32170:
URL: https://github.com/apache/spark/pull/32170#issuecomment-819632794


   > Spark 3.1 truncates timestamp to date?
   
   @cloud-fan Yep. Spark 3.1 as well as master when `spark.sql.legacy.interval.enabled` is set to `true` casts the result to the type of the left parameter.
   
   > I think the new behavior makes more sense
   
   It depends on how you get the interval. For instance,
   
   Spark 3.1:
   ```sql
   spark-sql> select date'now' + (date'now' - date'yesterday');
   2021-04-15
   ```
   
   After the changes:
   ```sql
   spark-sql> select date'now' + (date'now' - date'yesterday');
   2021-04-15 00:00:00
   ```
   As you can see, Spark 3.1 returns a date as the result which also makes sense.
   
   > shall we add an item in the migration guide?
   
   Adding ANSI day-time interval to a date wasn't supported in Spark 3.1. Actually, the behavior change is related to subtraction of dates/timestamps which has been already documented in the SQL migration guide:
   ```
     - In Spark 3.2, the dates subtraction expression such as `date1 - date2` returns values of `DayTimeIntervalType`. In Spark 3.1 and earlier, the returned type is `CalendarIntervalType`. To restore the behavior before Spark 3.2, you can set `spark.sql.legacy.interval.enabled` to `true`.
   
     - In Spark 3.2, the timestamps subtraction expression such as `timestamp '2021-03-31 23:48:00' - timestamp '2021-01-01 00:00:00'` returns values of `DayTimeIntervalType`. In Spark 3.1 and earlier, the type of the same expression is `CalendarIntervalType`. To restore the behavior before Spark 3.2, you can set `spark.sql.legacy.interval.enabled` to `true`.
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] github-actions[bot] commented on pull request #32170: [SPARK-35051][SQL] Support add/subtract of a day-time interval to/from a date

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #32170:
URL: https://github.com/apache/spark/pull/32170#issuecomment-819730769


   **[Test build #748727708](https://github.com/MaxGekk/spark/actions/runs/748727708)** for PR 32170 at commit [`4a425e1`](https://github.com/MaxGekk/spark/commit/4a425e1452d4834c645e3627dec469fd14e7c322).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] AmplabJenkins commented on pull request #32170: [SPARK-35051][SQL] Support add/subtract of a day-time interval to/from a date

Posted by GitBox <gi...@apache.org>.
AmplabJenkins commented on pull request #32170:
URL: https://github.com/apache/spark/pull/32170#issuecomment-819661726


   
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/41935/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] cloud-fan commented on pull request #32170: [SPARK-35051][SQL] Support add/subtract of a day-time interval to/from a date

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on pull request #32170:
URL: https://github.com/apache/spark/pull/32170#issuecomment-819621994


   > Spark 3.1:
   > spark-sql> select date'now' + (timestamp'now' - timestamp'yesterday');
   > 2021-04-15
   
   Spark 3.1 truncates timestamp to date? I think the new behavior makes more sense but shall we add an item in the migration guide?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] SparkQA commented on pull request #32170: [SPARK-35051][SQL] Support add/subtract of a day-time interval to/from a date

Posted by GitBox <gi...@apache.org>.
SparkQA commented on pull request #32170:
URL: https://github.com/apache/spark/pull/32170#issuecomment-819658250


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/41935/
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] cloud-fan commented on a change in pull request #32170: [SPARK-35051][SQL] Support add/subtract of a day-time interval to/from a date

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #32170:
URL: https://github.com/apache/spark/pull/32170#discussion_r659901849



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
##########
@@ -345,6 +345,8 @@ class Analyzer(override val catalogManager: CatalogManager)
     override def apply(plan: LogicalPlan): LogicalPlan = plan.resolveOperatorsUp {
       case p: LogicalPlan => p.transformExpressionsUp {
         case a @ Add(l, r, f) if a.childrenResolved => (l.dataType, r.dataType) match {
+          case (DateType, DayTimeIntervalType) => TimeAdd(Cast(l, TimestampType), r)
+          case (DayTimeIntervalType, DateType) => TimeAdd(Cast(r, TimestampType), l)

Review comment:
       I doubt this. For old interval types, `Date + Interval` always returns Date:
   ```
   case (DateType, CalendarIntervalType) => DateAddInterval(l, r, ansiEnabled = f)
   case (CalendarIntervalType, DateType) => DateAddInterval(r, l, ansiEnabled = f)
   ```
   
   I think we should do this for new interval types as well, and fail at runtime if the interval has hour, minute, second fields.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org