Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2022/02/25 19:48:25 UTC

[GitHub] [spark] MaxGekk opened a new pull request #35661: [WIP][SPARK-38332][SQL] Add the `DATEADD()` and `DATE_ADD()` aliases for `TIMESTAMPADD()`

MaxGekk opened a new pull request #35661:
URL: https://github.com/apache/spark/pull/35661


   ### What changes were proposed in this pull request?
   In the PR, I propose to add two aliases for the `TIMESTAMPADD()` function introduced by https://github.com/apache/spark/pull/35502:
   - `DATEADD()`
   - `DATE_ADD()`
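   
   For example (a small sketch based on the new checks added to `date.sql` in this PR), the new aliases and the existing function are interchangeable and resolve to the same `timestampadd` expression:
   ```sql
   -- all three spellings return the same result
   SELECT dateadd(HOUR, -1, timestamp'2022-02-25 01:02:03');       -- 2022-02-25 00:02:03
   SELECT date_add(HOUR, -1, timestamp'2022-02-25 01:02:03');      -- 2022-02-25 00:02:03
   SELECT timestampadd(HOUR, -1, timestamp'2022-02-25 01:02:03');  -- 2022-02-25 00:02:03
   ```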
   
   ### Why are the changes needed?
   1. To make the migration process from other systems to Spark SQL easier.
   2. To achieve feature parity with other DBMSs.
   
   ### Does this PR introduce _any_ user-facing change?
   No. The new aliases just extend the Spark SQL API.
   
   ### How was this patch tested?
   1. By running the existing test suites:
   ```
   $ build/sbt "test:testOnly *SQLKeywordSuite"
   ```
   2. And by running the new checks:
   ```
   $ build/sbt "sql/testOnly org.apache.spark.sql.SQLQueryTestSuite -- -z date.sql"
   ```




[GitHub] [spark] superdupershant commented on a change in pull request #35661: [SPARK-38332][SQL] Add the `DATEADD()` and `DATE_ADD()` aliases for `TIMESTAMPADD()`

Posted by GitBox <gi...@apache.org>.
superdupershant commented on a change in pull request #35661:
URL: https://github.com/apache/spark/pull/35661#discussion_r815827342



##########
File path: sql/core/src/test/resources/sql-tests/results/ansi/date.sql.out
##########
@@ -660,3 +660,83 @@ struct<>
 -- !query output
 org.apache.spark.SparkUpgradeException
 You may get a different result due to the upgrading of Spark 3.0: Fail to recognize 'dd/MMMMM/yyyy' pattern in the DateTimeFormatter. 1) You can set spark.sql.legacy.timeParserPolicy to LEGACY to restore the behavior before Spark 3.0. 2) You can form a valid datetime pattern with the guide from https://spark.apache.org/docs/latest/sql-ref-datetime-pattern.html
+
+
+-- !query
+select dateadd(MICROSECOND, 1001, timestamp'2022-02-25 01:02:03.123')
+-- !query schema
+struct<timestampadd(MICROSECOND, 1001, TIMESTAMP '2022-02-25 01:02:03.123'):timestamp>
+-- !query output
+2022-02-25 01:02:03.124001
+
+
+-- !query
+select date_add(MILLISECOND, -1, timestamp'2022-02-25 01:02:03.456')
+-- !query schema
+struct<timestampadd(MILLISECOND, -1, TIMESTAMP '2022-02-25 01:02:03.456'):timestamp>
+-- !query output
+2022-02-25 01:02:03.455
+
+
+-- !query
+select dateadd(SECOND, 58, timestamp'2022-02-25 01:02:03')
+-- !query schema
+struct<timestampadd(SECOND, 58, TIMESTAMP '2022-02-25 01:02:03'):timestamp>
+-- !query output
+2022-02-25 01:03:01
+
+
+-- !query
+select date_add(MINUTE, -100, date'2022-02-25')
+-- !query schema
+struct<timestampadd(MINUTE, -100, DATE '2022-02-25'):timestamp>
+-- !query output
+2022-02-24 22:20:00
+
+
+-- !query
+select dateadd(HOUR, -1, timestamp'2022-02-25 01:02:03')
+-- !query schema
+struct<timestampadd(HOUR, -1, TIMESTAMP '2022-02-25 01:02:03'):timestamp>
+-- !query output
+2022-02-25 00:02:03
+
+
+-- !query
+select date_add(DAY, 367, date'2022-02-25')

Review comment:
       @cloud-fan No, both expressions can take date and timestamp inputs; it is just a simple alias.
   
    @MaxGekk Yes, I agree that fixing it in a separate PR makes sense.
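    
    For reference, the new queries added to `date.sql` in this PR exercise both input types through the aliases, e.g.:
    ```sql
    -- timestamp input
    SELECT dateadd(SECOND, 58, timestamp'2022-02-25 01:02:03');  -- 2022-02-25 01:03:01
    -- date input (the result is still a timestamp)
    SELECT date_add(MINUTE, -100, date'2022-02-25');             -- 2022-02-24 22:20:00
    ```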






[GitHub] [spark] MaxGekk commented on pull request #35661: [SPARK-38332][SQL] Add the `DATEADD()` and `DATE_ADD()` aliases for `TIMESTAMPADD()`

Posted by GitBox <gi...@apache.org>.
MaxGekk commented on pull request #35661:
URL: https://github.com/apache/spark/pull/35661#issuecomment-1053593278


   @srielau @entong @superdupershant Any feedback is welcome.




[GitHub] [spark] MaxGekk commented on pull request #35661: [SPARK-38332][SQL] Add the `DATEADD()` and `DATE_ADD()` aliases for `TIMESTAMPADD()`

Posted by GitBox <gi...@apache.org>.
MaxGekk commented on pull request #35661:
URL: https://github.com/apache/spark/pull/35661#issuecomment-1054329785


   Merging to master. Thank you @HyukjinKwon, @superdupershant, @cloud-fan, @gengliangwang for the review.




[GitHub] [spark] cloud-fan commented on a change in pull request #35661: [SPARK-38332][SQL] Add the `DATEADD()` and `DATE_ADD()` aliases for `TIMESTAMPADD()`

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #35661:
URL: https://github.com/apache/spark/pull/35661#discussion_r815762212



##########
File path: sql/core/src/test/resources/sql-tests/results/ansi/date.sql.out
##########
@@ -660,3 +660,83 @@ struct<>
 -- !query output
 org.apache.spark.SparkUpgradeException
 You may get a different result due to the upgrading of Spark 3.0: Fail to recognize 'dd/MMMMM/yyyy' pattern in the DateTimeFormatter. 1) You can set spark.sql.legacy.timeParserPolicy to LEGACY to restore the behavior before Spark 3.0. 2) You can form a valid datetime pattern with the guide from https://spark.apache.org/docs/latest/sql-ref-datetime-pattern.html
+
+
+-- !query
+select dateadd(MICROSECOND, 1001, timestamp'2022-02-25 01:02:03.123')
+-- !query schema
+struct<timestampadd(MICROSECOND, 1001, TIMESTAMP '2022-02-25 01:02:03.123'):timestamp>
+-- !query output
+2022-02-25 01:02:03.124001
+
+
+-- !query
+select date_add(MILLISECOND, -1, timestamp'2022-02-25 01:02:03.456')
+-- !query schema
+struct<timestampadd(MILLISECOND, -1, TIMESTAMP '2022-02-25 01:02:03.456'):timestamp>
+-- !query output
+2022-02-25 01:02:03.455
+
+
+-- !query
+select dateadd(SECOND, 58, timestamp'2022-02-25 01:02:03')
+-- !query schema
+struct<timestampadd(SECOND, 58, TIMESTAMP '2022-02-25 01:02:03'):timestamp>
+-- !query output
+2022-02-25 01:03:01
+
+
+-- !query
+select date_add(MINUTE, -100, date'2022-02-25')
+-- !query schema
+struct<timestampadd(MINUTE, -100, DATE '2022-02-25'):timestamp>
+-- !query output
+2022-02-24 22:20:00
+
+
+-- !query
+select dateadd(HOUR, -1, timestamp'2022-02-25 01:02:03')
+-- !query schema
+struct<timestampadd(HOUR, -1, TIMESTAMP '2022-02-25 01:02:03'):timestamp>
+-- !query output
+2022-02-25 00:02:03
+
+
+-- !query
+select date_add(DAY, 367, date'2022-02-25')

Review comment:
       Do you mean timestampadd can take date inputs but dateadd can't take timestamp inputs? Then it's not a simple alias any more.






[GitHub] [spark] superdupershant commented on a change in pull request #35661: [SPARK-38332][SQL] Add the `DATEADD()` and `DATE_ADD()` aliases for `TIMESTAMPADD()`

Posted by GitBox <gi...@apache.org>.
superdupershant commented on a change in pull request #35661:
URL: https://github.com/apache/spark/pull/35661#discussion_r815539428



##########
File path: sql/core/src/test/resources/sql-tests/results/ansi/date.sql.out
##########
@@ -660,3 +660,83 @@ struct<>
 -- !query output
 org.apache.spark.SparkUpgradeException
 You may get a different result due to the upgrading of Spark 3.0: Fail to recognize 'dd/MMMMM/yyyy' pattern in the DateTimeFormatter. 1) You can set spark.sql.legacy.timeParserPolicy to LEGACY to restore the behavior before Spark 3.0. 2) You can form a valid datetime pattern with the guide from https://spark.apache.org/docs/latest/sql-ref-datetime-pattern.html
+
+
+-- !query
+select dateadd(MICROSECOND, 1001, timestamp'2022-02-25 01:02:03.123')
+-- !query schema
+struct<timestampadd(MICROSECOND, 1001, TIMESTAMP '2022-02-25 01:02:03.123'):timestamp>
+-- !query output
+2022-02-25 01:02:03.124001
+
+
+-- !query
+select date_add(MILLISECOND, -1, timestamp'2022-02-25 01:02:03.456')
+-- !query schema
+struct<timestampadd(MILLISECOND, -1, TIMESTAMP '2022-02-25 01:02:03.456'):timestamp>
+-- !query output
+2022-02-25 01:02:03.455
+
+
+-- !query
+select dateadd(SECOND, 58, timestamp'2022-02-25 01:02:03')
+-- !query schema
+struct<timestampadd(SECOND, 58, TIMESTAMP '2022-02-25 01:02:03'):timestamp>
+-- !query output
+2022-02-25 01:03:01
+
+
+-- !query
+select date_add(MINUTE, -100, date'2022-02-25')
+-- !query schema
+struct<timestampadd(MINUTE, -100, DATE '2022-02-25'):timestamp>
+-- !query output
+2022-02-24 22:20:00
+
+
+-- !query
+select dateadd(HOUR, -1, timestamp'2022-02-25 01:02:03')
+-- !query schema
+struct<timestampadd(HOUR, -1, TIMESTAMP '2022-02-25 01:02:03'):timestamp>
+-- !query output
+2022-02-25 00:02:03
+
+
+-- !query
+select date_add(DAY, 367, date'2022-02-25')

Review comment:
       I think the convention is: if the third argument is of type DATE, the expression's return type is
   
    - DATE if the unit is DAY or larger
    - TIMESTAMP if the unit is smaller than DAY, i.e. HOUR, MINUTE, SECOND, etc. (see the sketch below)
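    
    A hypothetical sketch of that convention applied to the queries in this file (today both calls are typed as timestamp, as the `date_add(MINUTE, ...)` schema above shows):
    ```sql
    -- proposed behavior, not what this PR currently produces
    SELECT date_add(DAY, 367, date'2022-02-25');   -- unit >= DAY -> would return DATE
    SELECT date_add(HOUR, -1, date'2022-02-25');   -- unit < DAY  -> would return TIMESTAMP
    ```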








[GitHub] [spark] MaxGekk commented on a change in pull request #35661: [SPARK-38332][SQL] Add the `DATEADD()` and `DATE_ADD()` aliases for `TIMESTAMPADD()`

Posted by GitBox <gi...@apache.org>.
MaxGekk commented on a change in pull request #35661:
URL: https://github.com/apache/spark/pull/35661#discussion_r815627202



##########
File path: sql/core/src/test/resources/sql-tests/results/ansi/date.sql.out
##########
@@ -660,3 +660,83 @@ struct<>
 -- !query output
 org.apache.spark.SparkUpgradeException
 You may get a different result due to the upgrading of Spark 3.0: Fail to recognize 'dd/MMMMM/yyyy' pattern in the DateTimeFormatter. 1) You can set spark.sql.legacy.timeParserPolicy to LEGACY to restore the behavior before Spark 3.0. 2) You can form a valid datetime pattern with the guide from https://spark.apache.org/docs/latest/sql-ref-datetime-pattern.html
+
+
+-- !query
+select dateadd(MICROSECOND, 1001, timestamp'2022-02-25 01:02:03.123')
+-- !query schema
+struct<timestampadd(MICROSECOND, 1001, TIMESTAMP '2022-02-25 01:02:03.123'):timestamp>
+-- !query output
+2022-02-25 01:02:03.124001
+
+
+-- !query
+select date_add(MILLISECOND, -1, timestamp'2022-02-25 01:02:03.456')
+-- !query schema
+struct<timestampadd(MILLISECOND, -1, TIMESTAMP '2022-02-25 01:02:03.456'):timestamp>
+-- !query output
+2022-02-25 01:02:03.455
+
+
+-- !query
+select dateadd(SECOND, 58, timestamp'2022-02-25 01:02:03')
+-- !query schema
+struct<timestampadd(SECOND, 58, TIMESTAMP '2022-02-25 01:02:03'):timestamp>
+-- !query output
+2022-02-25 01:03:01
+
+
+-- !query
+select date_add(MINUTE, -100, date'2022-02-25')
+-- !query schema
+struct<timestampadd(MINUTE, -100, DATE '2022-02-25'):timestamp>
+-- !query output
+2022-02-24 22:20:00
+
+
+-- !query
+select dateadd(HOUR, -1, timestamp'2022-02-25 01:02:03')
+-- !query schema
+struct<timestampadd(HOUR, -1, TIMESTAMP '2022-02-25 01:02:03'):timestamp>
+-- !query output
+2022-02-25 00:02:03
+
+
+-- !query
+select date_add(DAY, 367, date'2022-02-25')

Review comment:
       I don't think we should make this kind of change in a PR that just introduces aliases. Let's do that separately, with tests, if the feature is supported by other DBMSs as well.
   
    Also, I would prefer not to tie such changes to `DATEADD`, since they affect `TIMESTAMPADD` as well. It seems slightly strange to me that a timestamp function would return the date type. Let's discuss this separately.






[GitHub] [spark] MaxGekk closed pull request #35661: [SPARK-38332][SQL] Add the `DATEADD()` and `DATE_ADD()` aliases for `TIMESTAMPADD()`

Posted by GitBox <gi...@apache.org>.
MaxGekk closed pull request #35661:
URL: https://github.com/apache/spark/pull/35661


   



