Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2022/02/13 16:27:58 UTC

[GitHub] [spark] MaxGekk opened a new pull request #35502: [WIP][SPARK-38195][SQL] Add the `TIMESTAMPADD()` function

MaxGekk opened a new pull request #35502:
URL: https://github.com/apache/spark/pull/35502


   ### What changes were proposed in this pull request?
   TODO.
   
   ### Why are the changes needed?
   1. To make the migration process from other systems to Spark SQL easier.
   2. To achieve feature parity with other DBMSs.
   
   ### Does this PR introduce _any_ user-facing change?
   No. This is a new feature.
   
   ### How was this patch tested?
   By running new tests:
   ```
   $ 
   ```




[GitHub] [spark] MaxGekk commented on a change in pull request #35502: [SPARK-38195][SQL] Add the `TIMESTAMPADD()` function

Posted by GitBox <gi...@apache.org>.
MaxGekk commented on a change in pull request #35502:
URL: https://github.com/apache/spark/pull/35502#discussion_r809726416



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala
##########
@@ -1163,4 +1163,40 @@ object DateTimeUtils {
     val localStartTs = getLocalDateTime(startMicros, zoneId)
     ChronoUnit.MICROS.between(localStartTs, localEndTs)
   }
+
+  /**
+   * Adds the specified number of units to a timestamp.
+   *
+   * @param unit A keyword that specifies the interval units to add to the input timestamp.
+   * @param quantity The amount of `unit`s to add. It can be positive or negative.
+   * @param micros The input timestamp value, expressed in microseconds since 1970-01-01 00:00:00Z.
+   * @param zoneId The time zone ID at which the operation is performed.
+   * @return A timestamp value, expressed in microseconds since 1970-01-01 00:00:00Z.
+   */
+  def timestampAdd(unit: String, quantity: Int, micros: Long, zoneId: ZoneId): Long = {
+    unit.toUpperCase(Locale.ROOT) match {
+      case "MICROSECOND" =>
+        timestampAddDayTime(micros, quantity, zoneId)
+      case "MILLISECOND" =>
+        timestampAddDayTime(micros, quantity * MICROS_PER_MILLIS, zoneId)
+      case "SECOND" =>
+        timestampAddDayTime(micros, quantity * MICROS_PER_SECOND, zoneId)
+      case "MINUTE" =>
+        timestampAddDayTime(micros, quantity * MICROS_PER_MINUTE, zoneId)
+      case "HOUR" =>
+        timestampAddDayTime(micros, quantity * MICROS_PER_HOUR, zoneId)
+      case "DAY" | "DAYOFYEAR" =>
+        timestampAddDayTime(micros, quantity * MICROS_PER_DAY, zoneId)
+      case "WEEK" =>
+        timestampAddDayTime(micros, quantity * MICROS_PER_DAY * DAYS_PER_WEEK, zoneId)
+      case "MONTH" =>
+        timestampAddMonths(micros, quantity, zoneId)
+      case "QUARTER" =>
+        timestampAddMonths(micros, quantity * 3, zoneId)
+      case "YEAR" =>
+        timestampAddMonths(micros, quantity * MONTHS_PER_YEAR, zoneId)
+      case _ =>
+        throw QueryExecutionErrors.invalidUnitInTimestampAdd(unit)

Review comment:
       We can, but I just wonder what kind of problem the earlier check would solve. The parser and compiler already do a lot of work; adding more, if it is unnecessary, should be motivated somehow, from my point of view.
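
    For context, here is a minimal sketch of what such an earlier check could look like as a standalone helper. The object and method names are illustrative assumptions, not code from this PR:
    ```scala
    import java.util.Locale

    // Hypothetical early validation of a literal unit name. The supported set
    // mirrors the cases handled by DateTimeUtils.timestampAdd in this PR.
    object TimestampAddUnits {
      private val supported = Set(
        "MICROSECOND", "MILLISECOND", "SECOND", "MINUTE", "HOUR",
        "DAY", "DAYOFYEAR", "WEEK", "MONTH", "QUARTER", "YEAR")

      def validateUnit(unit: String): Unit = {
        if (!supported.contains(unit.toUpperCase(Locale.ROOT))) {
          // In Spark itself this would surface as a parse or analysis error.
          throw new IllegalArgumentException(s"Invalid unit in timestampadd: $unit")
        }
      }
    }
    ```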






[GitHub] [spark] MaxGekk commented on a change in pull request #35502: [SPARK-38195][SQL] Add the `TIMESTAMPADD()` function

Posted by GitBox <gi...@apache.org>.
MaxGekk commented on a change in pull request #35502:
URL: https://github.com/apache/spark/pull/35502#discussion_r809793748



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala
##########

Review comment:
       Also, I would like to add that what you are talking about is actually a mistake in the query, like:
    ```sql
    SELECT timestampadd(YEER, 1, timestampColumn);
    ```
    Such mistakes are not permanent; users usually fix them during the debugging stage. There is not much reason to check for them twice, at parsing and at runtime (and we must check at runtime since `unit` can be non-foldable).






[GitHub] [spark] MaxGekk commented on a change in pull request #35502: [SPARK-38195][SQL] Add the `TIMESTAMPADD()` function

Posted by GitBox <gi...@apache.org>.
MaxGekk commented on a change in pull request #35502:
URL: https://github.com/apache/spark/pull/35502#discussion_r810812193



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala
##########
@@ -4510,4 +4510,15 @@ class AstBuilder extends SqlBaseBaseVisitor[AnyRef] with SQLConfHelper with Logg
   private def alterViewTypeMismatchHint: Option[String] = Some("Please use ALTER TABLE instead.")
 
   private def alterTableTypeMismatchHint: Option[String] = Some("Please use ALTER VIEW instead.")
+
+  /**
+   * Create a TimestampAdd expression.
+   */
+  override def visitTimestampadd(ctx: TimestampaddContext): Expression = withOrigin(ctx) {
+    val arguments = Seq(
+      Literal(ctx.unit.getText),

Review comment:
       This is not the single entry point for `timestampadd`, so the word **must** is not applicable. See the example above (https://github.com/apache/spark/pull/35502#discussion_r810112314) for how this should work if `the unit parameter **must** be foldable`. BTW, I will open a JIRA to implement the optimization for the case when `unit` is foldable.
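
    The idea of that optimization, roughly: when `unit` is foldable, resolve the string match once and reuse a per-row function instead of re-matching on every row. A hedged sketch under that assumption (names, unit subset, and exact semantics below are illustrative, not the JIRA's design):
    ```scala
    import java.time.{Instant, ZoneId}
    import java.time.temporal.ChronoUnit
    import java.util.Locale

    // Hypothetical: map a constant unit to a ChronoUnit once, then return a
    // cheap per-row function that codegen could call directly.
    def compileTimestampAdd(unit: String, zoneId: ZoneId): (Int, Long) => Long = {
      val chrono = unit.toUpperCase(Locale.ROOT) match {
        case "SECOND" => ChronoUnit.SECONDS
        case "HOUR"   => ChronoUnit.HOURS
        case "MONTH"  => ChronoUnit.MONTHS
        // ... remaining units elided for brevity
        case other    => throw new IllegalArgumentException(s"Invalid unit: $other")
      }
      (quantity: Int, micros: Long) => {
        val zoned = Instant.EPOCH.plus(micros, ChronoUnit.MICROS).atZone(zoneId)
        ChronoUnit.MICROS.between(Instant.EPOCH, zoned.plus(quantity.toLong, chrono).toInstant)
      }
    }
    ```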






[GitHub] [spark] cloud-fan commented on a change in pull request #35502: [SPARK-38195][SQL] Add the `TIMESTAMPADD()` function

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #35502:
URL: https://github.com/apache/spark/pull/35502#discussion_r809781924



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala
##########

Review comment:
       Failing earlier is a pretty strong reason, right? It's a waste of resources if we submit a Spark job that fails because of a wrong unit name.






[GitHub] [spark] MaxGekk commented on a change in pull request #35502: [SPARK-38195][SQL] Add the `TIMESTAMPADD()` function

Posted by GitBox <gi...@apache.org>.
MaxGekk commented on a change in pull request #35502:
URL: https://github.com/apache/spark/pull/35502#discussion_r810064290



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala
##########

Review comment:
       > I don't understand why we suddenly want to stop doing it from this PR.
   
    1. The `unit` param can be non-foldable. I made it generic intentionally; if you wonder why, I will answer that separately.
    2. As `unit` can be non-foldable, we need the runtime check.
    3. If we add checks in the parser, we will check twice, at parsing and at execution, which is unnecessary because
    4. we can handle a foldable `unit` in codegen as an optimization, where we (of course) have to check the `unit` value at the optimization phase.
   
    In summary, taking into account that we will optimize a foldable `unit` in codegen in the near future, and will validate the correctness of `unit` there, there is no need to do that in the parser as you proposed.
   
    > Example: EXTRACT, TO_BINARY, TO_NUMBER
   
    Those expressions require one of their params (format, field, etc.) to be **always** foldable. In the case of `TIMESTAMPADD()`, that is an unnecessary restriction, I believe. A few times I have faced the situation where code was deployed in production after testing, and the precision of the intervals needed to be increased. Let's say we had:
    ```sql
    select timestampadd(SECOND, tbl.quantity, tbl.ts1)
    ```
    and we want to bump the precision of `tbl.quantity` to milliseconds. Since `quantity` is a column in a table, we can just multiply it by 1000 during a maintenance window, but what should we do with `SECOND`? We would have to re-deploy the code, going through the whole release cycle, only because a Spark dev forced us to hard-code `SECOND` in the query, for some unclear reason.
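
    A hedged sketch of that scenario (table and column names are made up): once `unit` may be non-foldable, the unit can live in data, and the precision bump becomes a pure data change:
    ```scala
    // Hypothetical one-row config table holding the unit and a scale factor
    // (runnable in spark-shell, where `spark` is the SparkSession).
    spark.sql("CREATE TABLE unit_conf AS SELECT 'SECOND' AS unit, 1 AS scale")

    // The deployed query reads the unit from data instead of hard-coding it.
    spark.sql(
      "SELECT timestampadd(u.unit, t.quantity * u.scale, t.ts1) " +
      "FROM tbl t CROSS JOIN unit_conf u")

    // Maintenance step: recreate unit_conf with ('MILLISECOND', 1000); the
    // deployed query itself never changes.
    ```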






[GitHub] [spark] MaxGekk commented on a change in pull request #35502: [SPARK-38195][SQL] Add the `TIMESTAMPADD()` function

Posted by GitBox <gi...@apache.org>.
MaxGekk commented on a change in pull request #35502:
URL: https://github.com/apache/spark/pull/35502#discussion_r809799636



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala
##########

Review comment:
       Even now, Spark can catch the mistake at the optimization phase (if the params are foldable); see where the exception comes from:
   ```sql
   spark-sql> select timestampadd(YEER, 1, timestamp'now');
   org.apache.spark.SparkIllegalArgumentException: The value of parameter(s) 'unit' in timestampadd is invalid: YEER
   	at org.apache.spark.sql.errors.QueryExecutionErrors$.invalidUnitInTimestampAdd(QueryExecutionErrors.scala:1970)
   	at org.apache.spark.sql.catalyst.util.DateTimeUtils$.timestampAdd(DateTimeUtils.scala:1199)
   	at org.apache.spark.sql.catalyst.expressions.TimestampAdd.nullSafeEval(datetimeExpressions.scala:3125)
   	at org.apache.spark.sql.catalyst.expressions.TernaryExpression.eval(Expression.scala:753)
   	at org.apache.spark.sql.catalyst.optimizer.ConstantFolding$$anonfun$apply$1$$anonfun$applyOrElse$1.applyOrElse(expressions.scala:69)
   ```






[GitHub] [spark] MaxGekk commented on a change in pull request #35502: [SPARK-38195][SQL] Add the `TIMESTAMPADD()` function

Posted by GitBox <gi...@apache.org>.
MaxGekk commented on a change in pull request #35502:
URL: https://github.com/apache/spark/pull/35502#discussion_r809691616



##########
File path: docs/sql-ref-ansi-compliance.md
##########
@@ -573,6 +573,7 @@ Below is a list of all the keywords in Spark SQL.
 |THEN|reserved|non-reserved|reserved|
 |TIME|reserved|non-reserved|reserved|
 |TIMESTAMP|non-reserved|non-reserved|non-reserved|
+|TIMESTAMPADD|non-reserved|non-reserved|non-reserved|

Review comment:
       Let's discuss overloading `DATEADD` in a separate JIRA. This is arguable, and we need to reach a consensus, from my point of view.






[GitHub] [spark] MaxGekk commented on pull request #35502: [SPARK-38195][SQL] Add the `TIMESTAMPADD()` function

Posted by GitBox <gi...@apache.org>.
MaxGekk commented on pull request #35502:
URL: https://github.com/apache/spark/pull/35502#issuecomment-1042916613


   @srielau @entong @superdupershant Any feedback is welcome.




[GitHub] [spark] superdupershant commented on a change in pull request #35502: [SPARK-38195][SQL] Add the `TIMESTAMPADD()` function

Posted by GitBox <gi...@apache.org>.
superdupershant commented on a change in pull request #35502:
URL: https://github.com/apache/spark/pull/35502#discussion_r809619316



##########
File path: docs/sql-ref-ansi-compliance.md
##########

Review comment:
       Do you want to add DATEADD as an alias for the same function in this PR?






[GitHub] [spark] cloud-fan commented on a change in pull request #35502: [SPARK-38195][SQL] Add the `TIMESTAMPADD()` function

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #35502:
URL: https://github.com/apache/spark/pull/35502#discussion_r809971128



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala
##########

Review comment:
       But this violates the convention: we try to fail as early as possible in all other places. I don't understand why we suddenly want to stop doing that in this PR. It's just a few CPU cycles in this case, and I don't see how it's a performance overhead for the driver.






[GitHub] [spark] cloud-fan commented on a change in pull request #35502: [SPARK-38195][SQL] Add the `TIMESTAMPADD()` function

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #35502:
URL: https://github.com/apache/spark/pull/35502#discussion_r810092688



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala
##########

Review comment:
       I'm totally OK with your decision if the unit parameter can be non-foldable, but that seems not to be the case? We even added a special parse rule for this function so that the unit parameter is an identifier.






[GitHub] [spark] cloud-fan commented on a change in pull request #35502: [SPARK-38195][SQL] Add the `TIMESTAMPADD()` function

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #35502:
URL: https://github.com/apache/spark/pull/35502#discussion_r810752074



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala
##########

Review comment:
       @MaxGekk I think this indicates the unit parameter must be foldable?






[GitHub] [spark] MaxGekk commented on a change in pull request #35502: [SPARK-38195][SQL] Add the `TIMESTAMPADD()` function

Posted by GitBox <gi...@apache.org>.
MaxGekk commented on a change in pull request #35502:
URL: https://github.com/apache/spark/pull/35502#discussion_r809709153



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala
##########
@@ -1163,4 +1163,40 @@ object DateTimeUtils {
     val localStartTs = getLocalDateTime(startMicros, zoneId)
     ChronoUnit.MICROS.between(localStartTs, localEndTs)
   }
+
+  /**
+   * Adds the specified number of units to a timestamp.
+   *
+   * @param unit A keyword that specifies the interval units to add to the input timestamp.
+   * @param quantity The amount of `unit`s to add. It can be positive or negative.
+   * @param micros The input timestamp value, expressed in microseconds since 1970-01-01 00:00:00Z.
+   * @param zoneId The time zone ID at which the operation is performed.
+   * @return A timestamp value, expressed in microseconds since 1970-01-01 00:00:00Z.
+   */
+  def timestampAdd(unit: String, quantity: Int, micros: Long, zoneId: ZoneId): Long = {
+    unit.toUpperCase(Locale.ROOT) match {
+      case "MICROSECOND" =>
+        timestampAddDayTime(micros, quantity, zoneId)
+      case "MILLISECOND" =>
+        timestampAddDayTime(micros, quantity * MICROS_PER_MILLIS, zoneId)
+      case "SECOND" =>
+        timestampAddDayTime(micros, quantity * MICROS_PER_SECOND, zoneId)
+      case "MINUTE" =>
+        timestampAddDayTime(micros, quantity * MICROS_PER_MINUTE, zoneId)
+      case "HOUR" =>
+        timestampAddDayTime(micros, quantity * MICROS_PER_HOUR, zoneId)
+      case "DAY" | "DAYOFYEAR" =>
+        timestampAddDayTime(micros, quantity * MICROS_PER_DAY, zoneId)
+      case "WEEK" =>
+        timestampAddDayTime(micros, quantity * MICROS_PER_DAY * DAYS_PER_WEEK, zoneId)
+      case "MONTH" =>
+        timestampAddMonths(micros, quantity, zoneId)
+      case "QUARTER" =>
+        timestampAddMonths(micros, quantity * 3, zoneId)
+      case "YEAR" =>
+        timestampAddMonths(micros, quantity * MONTHS_PER_YEAR, zoneId)
+      case _ =>
+        throw QueryExecutionErrors.invalidUnitInTimestampAdd(unit)

Review comment:
       Since we don't do the optimization for foldable unit names, it doesn't make sense.






[GitHub] [spark] MaxGekk commented on a change in pull request #35502: [SPARK-38195][SQL] Add the `TIMESTAMPADD()` function

Posted by GitBox <gi...@apache.org>.
MaxGekk commented on a change in pull request #35502:
URL: https://github.com/apache/spark/pull/35502#discussion_r810112314



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala
##########

Review comment:
       > so that the unit parameter is an identifier.
   
   It can be an identifier or a string column, see:
   ```sql
   spark-sql> create table tbl as select 'SECOND' as u, 1 as q, timestamp'now' as t;
   spark-sql> select * from tbl;
   SECOND	1	2022-02-18 18:33:34.939
   spark-sql> select timestampadd(tbl.u, q, t) from tbl;
   2022-02-18 18:33:35.939
   ```
   or
   ```sql
   spark-sql> select timestampadd('HOUR', 1, timestamp'now');
   2022-02-18 19:38:54.817
   ```






[GitHub] [spark] cloud-fan commented on a change in pull request #35502: [SPARK-38195][SQL] Add the `TIMESTAMPADD()` function

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #35502:
URL: https://github.com/apache/spark/pull/35502#discussion_r809695261



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala
##########

Review comment:
       Shall we check the unit names in the parser, to fail earlier?






[GitHub] [spark] cloud-fan commented on a change in pull request #35502: [SPARK-38195][SQL] Add the `TIMESTAMPADD()` function

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #35502:
URL: https://github.com/apache/spark/pull/35502#discussion_r810820679



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/AstBuilder.scala
##########

Review comment:
       Ah I see, then I think the current implementation is fine.






[GitHub] [spark] MaxGekk commented on a change in pull request #35502: [SPARK-38195][SQL] Add the `TIMESTAMPADD()` function

Posted by GitBox <gi...@apache.org>.
MaxGekk commented on a change in pull request #35502:
URL: https://github.com/apache/spark/pull/35502#discussion_r810638665



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala
##########

Review comment:
       > the first argument, unit, should be a datetime **interval type**
   
    I didn't get your point. How could it be an interval type?
   
    > ... as in what's used with EXTRACT
   
    I just wonder why you linked `TIMESTAMPADD` to `EXTRACT` and not to `TIMESTAMPDIFF`, for example. Anyway, technically speaking, the type of the first argument is the same: string type.
   
    > ... makes things any simpler.
   
    This PR achieves that goal, I believe: it makes the migration process to Spark SQL simpler, and gives additional benefits for using Spark SQL in real production (see my comment above).
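
    To illustrate the parallel: both functions take a leading keyword that is ultimately handled as a string. A small hypothetical comparison (not taken from the PR's tests):
    ```scala
    // EXTRACT takes a field keyword; TIMESTAMPADD takes a unit keyword.
    spark.sql("SELECT extract(YEAR FROM timestamp'2022-02-14 01:02:03')").show()
    spark.sql("SELECT timestampadd(YEAR, 1, timestamp'2022-02-14 01:02:03')").show()
    ```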
   






[GitHub] [spark] MaxGekk commented on a change in pull request #35502: [SPARK-38195][SQL] Add the `TIMESTAMPADD()` function

Posted by GitBox <gi...@apache.org>.
MaxGekk commented on a change in pull request #35502:
URL: https://github.com/apache/spark/pull/35502#discussion_r809787232



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala
##########

Review comment:
       >  It's a waste of resource if we submit a Spark job which fails with wrong unit name.
   
    Not sure. Can you imagine a cluster of 1000 executors waiting for the driver, which is still processing a query because we eagerly want to check everything even when the user's queries and data don't have any issues? That is a real waste of users' resources.






[GitHub] [spark] cloud-fan commented on a change in pull request #35502: [SPARK-38195][SQL] Add the `TIMESTAMPADD()` function

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #35502:
URL: https://github.com/apache/spark/pull/35502#discussion_r809695762



##########
File path: sql/core/src/test/resources/sql-tests/inputs/timestamp.sql
##########
@@ -142,3 +142,9 @@ select to_timestamp('22 05 2020 Friday', 'dd MM yyyy EEEEE');
 select unix_timestamp('22 05 2020 Friday', 'dd MM yyyy EEEEE');
 select from_json('{"t":"26/October/2015"}', 't Timestamp', map('timestampFormat', 'dd/MMMMM/yyyy'));
 select from_csv('26/October/2015', 't Timestamp', map('timestampFormat', 'dd/MMMMM/yyyy'));
+
+-- Add a number of units to a timestamp or a date
+select timestampadd('MONTH', -1, timestamp'2022-02-14 01:02:03');
+select timestampadd(MINUTE, 58, timestamp'2022-02-14 01:02:03');
+select timestampadd(YEAR, 1, date'2022-02-15');
+select timestampadd('SECOND', -1, date'2022-02-15');

Review comment:
    Can we have some negative tests, e.g. an invalid unit name, overflow, etc.?






[GitHub] [spark] MaxGekk closed pull request #35502: [SPARK-38195][SQL] Add the `TIMESTAMPADD()` function

Posted by GitBox <gi...@apache.org>.
MaxGekk closed pull request #35502:
URL: https://github.com/apache/spark/pull/35502


   




[GitHub] [spark] MaxGekk commented on pull request #35502: [SPARK-38195][SQL] Add the `TIMESTAMPADD()` function

Posted by GitBox <gi...@apache.org>.
MaxGekk commented on pull request #35502:
URL: https://github.com/apache/spark/pull/35502#issuecomment-1043947445


   Merging to master. Thank you, @gengliangwang, @HyukjinKwon and @superdupershant, for the review.




[GitHub] [spark] MaxGekk commented on a change in pull request #35502: [SPARK-38195][SQL] Add the `TIMESTAMPADD()` function

Posted by GitBox <gi...@apache.org>.
MaxGekk commented on a change in pull request #35502:
URL: https://github.com/apache/spark/pull/35502#discussion_r809724649



##########
File path: sql/core/src/test/resources/sql-tests/inputs/timestamp.sql
##########
@@ -142,3 +142,9 @@ select to_timestamp('22 05 2020 Friday', 'dd MM yyyy EEEEE');
 select unix_timestamp('22 05 2020 Friday', 'dd MM yyyy EEEEE');
 select from_json('{"t":"26/October/2015"}', 't Timestamp', map('timestampFormat', 'dd/MMMMM/yyyy'));
 select from_csv('26/October/2015', 't Timestamp', map('timestampFormat', 'dd/MMMMM/yyyy'));
+
+-- Add a number of units to a timestamp or a date
+select timestampadd('MONTH', -1, timestamp'2022-02-14 01:02:03');
+select timestampadd(MINUTE, 58, timestamp'2022-02-14 01:02:03');
+select timestampadd(YEAR, 1, date'2022-02-15');
+select timestampadd('SECOND', -1, date'2022-02-15');

Review comment:
    Regarding overflow: I actually reused the methods that add ANSI intervals to timestamps, so I think we should test overflow both for ANSI intervals and for timestampadd(). I will open a JIRA for that.
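
    A sketch of the kind of overflow test being discussed, assuming a ScalaTest suite and that the reused interval-addition methods raise `ArithmeticException` on overflow (both are assumptions, not verified against this PR):

    ```scala
    // Sketch only: `test` and `intercept` come from an enclosing ScalaTest
    // AnyFunSuite; the exception type is an assumption.
    import java.time.ZoneOffset
    import org.apache.spark.sql.catalyst.util.DateTimeUtils

    test("timestampAdd overflows near the maximum timestamp") {
      val nearMax = Long.MaxValue - 1L  // microseconds since 1970-01-01 00:00:00Z
      intercept[ArithmeticException] {
        DateTimeUtils.timestampAdd("YEAR", 1, nearMax, ZoneOffset.UTC)
      }
    }
    ```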






[GitHub] [spark] cloud-fan commented on a change in pull request #35502: [SPARK-38195][SQL] Add the `TIMESTAMPADD()` function

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #35502:
URL: https://github.com/apache/spark/pull/35502#discussion_r809972875



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala
##########
@@ -1163,4 +1163,40 @@ object DateTimeUtils {
     val localStartTs = getLocalDateTime(startMicros, zoneId)
     ChronoUnit.MICROS.between(localStartTs, localEndTs)
   }
+
+  /**
+   * Adds the specified number of units to a timestamp.
+   *
+   * @param unit A keyword that specifies the interval units to add to the input timestamp.
+   * @param quantity The amount of `unit`s to add. It can be positive or negative.
+   * @param micros The input timestamp value, expressed in microseconds since 1970-01-01 00:00:00Z.
+   * @param zoneId The time zone ID at which the operation is performed.
+   * @return A timestamp value, expressed in microseconds since 1970-01-01 00:00:00Z.
+   */
+  def timestampAdd(unit: String, quantity: Int, micros: Long, zoneId: ZoneId): Long = {
+    unit.toUpperCase(Locale.ROOT) match {
+      case "MICROSECOND" =>
+        timestampAddDayTime(micros, quantity, zoneId)
+      case "MILLISECOND" =>
+        timestampAddDayTime(micros, quantity * MICROS_PER_MILLIS, zoneId)
+      case "SECOND" =>
+        timestampAddDayTime(micros, quantity * MICROS_PER_SECOND, zoneId)
+      case "MINUTE" =>
+        timestampAddDayTime(micros, quantity * MICROS_PER_MINUTE, zoneId)
+      case "HOUR" =>
+        timestampAddDayTime(micros, quantity * MICROS_PER_HOUR, zoneId)
+      case "DAY" | "DAYOFYEAR" =>
+        timestampAddDayTime(micros, quantity * MICROS_PER_DAY, zoneId)
+      case "WEEK" =>
+        timestampAddDayTime(micros, quantity * MICROS_PER_DAY * DAYS_PER_WEEK, zoneId)
+      case "MONTH" =>
+        timestampAddMonths(micros, quantity, zoneId)
+      case "QUARTER" =>
+        timestampAddMonths(micros, quantity * 3, zoneId)
+      case "YEAR" =>
+        timestampAddMonths(micros, quantity * MONTHS_PER_YEAR, zoneId)
+      case _ =>
+        throw QueryExecutionErrors.invalidUnitInTimestampAdd(unit)

Review comment:
    Examples: `EXTRACT`, `TO_BINARY`, `TO_NUMBER`.
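
    For comparison, `EXTRACT` rejects a bad field name during analysis, before any job is submitted; roughly (assuming a spark-shell `spark` session):

    ```scala
    // Sketch only: the exact error message may differ by version.
    spark.sql("select extract(YEER from timestamp'2022-02-14 01:02:03')")
    // => fails with an AnalysisException for the unsupported field 'YEER',
    //    raised at analysis time rather than at execution.
    ```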






[GitHub] [spark] superdupershant commented on a change in pull request #35502: [SPARK-38195][SQL] Add the `TIMESTAMPADD()` function

Posted by GitBox <gi...@apache.org>.
superdupershant commented on a change in pull request #35502:
URL: https://github.com/apache/spark/pull/35502#discussion_r810630151



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala
##########
@@ -1163,4 +1163,40 @@ object DateTimeUtils {
     val localStartTs = getLocalDateTime(startMicros, zoneId)
     ChronoUnit.MICROS.between(localStartTs, localEndTs)
   }
+
+  /**
+   * Adds the specified number of units to a timestamp.
+   *
+   * @param unit A keyword that specifies the interval units to add to the input timestamp.
+   * @param quantity The amount of `unit`s to add. It can be positive or negative.
+   * @param micros The input timestamp value, expressed in microseconds since 1970-01-01 00:00:00Z.
+   * @param zoneId The time zone ID at which the operation is performed.
+   * @return A timestamp value, expressed in microseconds since 1970-01-01 00:00:00Z.
+   */
+  def timestampAdd(unit: String, quantity: Int, micros: Long, zoneId: ZoneId): Long = {
+    unit.toUpperCase(Locale.ROOT) match {
+      case "MICROSECOND" =>
+        timestampAddDayTime(micros, quantity, zoneId)
+      case "MILLISECOND" =>
+        timestampAddDayTime(micros, quantity * MICROS_PER_MILLIS, zoneId)
+      case "SECOND" =>
+        timestampAddDayTime(micros, quantity * MICROS_PER_SECOND, zoneId)
+      case "MINUTE" =>
+        timestampAddDayTime(micros, quantity * MICROS_PER_MINUTE, zoneId)
+      case "HOUR" =>
+        timestampAddDayTime(micros, quantity * MICROS_PER_HOUR, zoneId)
+      case "DAY" | "DAYOFYEAR" =>
+        timestampAddDayTime(micros, quantity * MICROS_PER_DAY, zoneId)
+      case "WEEK" =>
+        timestampAddDayTime(micros, quantity * MICROS_PER_DAY * DAYS_PER_WEEK, zoneId)
+      case "MONTH" =>
+        timestampAddMonths(micros, quantity, zoneId)
+      case "QUARTER" =>
+        timestampAddMonths(micros, quantity * 3, zoneId)
+      case "YEAR" =>
+        timestampAddMonths(micros, quantity * MONTHS_PER_YEAR, zoneId)
+      case _ =>
+        throw QueryExecutionErrors.invalidUnitInTimestampAdd(unit)

Review comment:
    Technically speaking, the first argument, `unit`, should be a datetime interval type, as is used with `EXTRACT`. It isn't meant to be a string, if that makes things any simpler.
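
    Note that the PR's tests (quoted from timestamp.sql earlier in the thread) exercise both spellings; e.g., in spark-shell:

    ```scala
    // Both forms are taken from the PR's own test file; `spark` is the usual
    // spark-shell session.
    spark.sql("select timestampadd(MINUTE, 58, timestamp'2022-02-14 01:02:03')")   // bare keyword
    spark.sql("select timestampadd('MINUTE', 58, timestamp'2022-02-14 01:02:03')") // string literal
    ```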






[GitHub] [spark] MaxGekk commented on a change in pull request #35502: [SPARK-38195][SQL] Add the `TIMESTAMPADD()` function

Posted by GitBox <gi...@apache.org>.
MaxGekk commented on a change in pull request #35502:
URL: https://github.com/apache/spark/pull/35502#discussion_r809708085



##########
File path: sql/core/src/test/resources/sql-tests/inputs/timestamp.sql
##########
@@ -142,3 +142,9 @@ select to_timestamp('22 05 2020 Friday', 'dd MM yyyy EEEEE');
 select unix_timestamp('22 05 2020 Friday', 'dd MM yyyy EEEEE');
 select from_json('{"t":"26/October/2015"}', 't Timestamp', map('timestampFormat', 'dd/MMMMM/yyyy'));
 select from_csv('26/October/2015', 't Timestamp', map('timestampFormat', 'dd/MMMMM/yyyy'));
+
+-- Add a number of units to a timestamp or a date
+select timestampadd('MONTH', -1, timestamp'2022-02-14 01:02:03');
+select timestampadd(MINUTE, 58, timestamp'2022-02-14 01:02:03');
+select timestampadd(YEAR, 1, date'2022-02-15');
+select timestampadd('SECOND', -1, date'2022-02-15');

Review comment:
    The test for an invalid unit name is in this PR; see the test for the error class.






[GitHub] [spark] MaxGekk commented on a change in pull request #35502: [SPARK-38195][SQL] Add the `TIMESTAMPADD()` function

Posted by GitBox <gi...@apache.org>.
MaxGekk commented on a change in pull request #35502:
URL: https://github.com/apache/spark/pull/35502#discussion_r809799636



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala
##########
@@ -1163,4 +1163,40 @@ object DateTimeUtils {
     val localStartTs = getLocalDateTime(startMicros, zoneId)
     ChronoUnit.MICROS.between(localStartTs, localEndTs)
   }
+
+  /**
+   * Adds the specified number of units to a timestamp.
+   *
+   * @param unit A keyword that specifies the interval units to add to the input timestamp.
+   * @param quantity The amount of `unit`s to add. It can be positive or negative.
+   * @param micros The input timestamp value, expressed in microseconds since 1970-01-01 00:00:00Z.
+   * @param zoneId The time zone ID at which the operation is performed.
+   * @return A timestamp value, expressed in microseconds since 1970-01-01 00:00:00Z.
+   */
+  def timestampAdd(unit: String, quantity: Int, micros: Long, zoneId: ZoneId): Long = {
+    unit.toUpperCase(Locale.ROOT) match {
+      case "MICROSECOND" =>
+        timestampAddDayTime(micros, quantity, zoneId)
+      case "MILLISECOND" =>
+        timestampAddDayTime(micros, quantity * MICROS_PER_MILLIS, zoneId)
+      case "SECOND" =>
+        timestampAddDayTime(micros, quantity * MICROS_PER_SECOND, zoneId)
+      case "MINUTE" =>
+        timestampAddDayTime(micros, quantity * MICROS_PER_MINUTE, zoneId)
+      case "HOUR" =>
+        timestampAddDayTime(micros, quantity * MICROS_PER_HOUR, zoneId)
+      case "DAY" | "DAYOFYEAR" =>
+        timestampAddDayTime(micros, quantity * MICROS_PER_DAY, zoneId)
+      case "WEEK" =>
+        timestampAddDayTime(micros, quantity * MICROS_PER_DAY * DAYS_PER_WEEK, zoneId)
+      case "MONTH" =>
+        timestampAddMonths(micros, quantity, zoneId)
+      case "QUARTER" =>
+        timestampAddMonths(micros, quantity * 3, zoneId)
+      case "YEAR" =>
+        timestampAddMonths(micros, quantity * MONTHS_PER_YEAR, zoneId)
+      case _ =>
+        throw QueryExecutionErrors.invalidUnitInTimestampAdd(unit)

Review comment:
    Even now, Spark catches the mistake at the optimization phase; see where the exception comes from:
   ```sql
   spark-sql> select timestampadd(YEER, 1, timestamp'now');
   org.apache.spark.SparkIllegalArgumentException: The value of parameter(s) 'unit' in timestampadd is invalid: YEER
   	at org.apache.spark.sql.errors.QueryExecutionErrors$.invalidUnitInTimestampAdd(QueryExecutionErrors.scala:1970)
   	at org.apache.spark.sql.catalyst.util.DateTimeUtils$.timestampAdd(DateTimeUtils.scala:1199)
   	at org.apache.spark.sql.catalyst.expressions.TimestampAdd.nullSafeEval(datetimeExpressions.scala:3125)
   	at org.apache.spark.sql.catalyst.expressions.TernaryExpression.eval(Expression.scala:753)
   	at org.apache.spark.sql.catalyst.optimizer.ConstantFolding$$anonfun$apply$1$$anonfun$applyOrElse$1.applyOrElse(expressions.scala:69)
   ```
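    (The bottom frame explains why: `ConstantFolding` eagerly evaluates the foldable `timestampadd` call during optimization, so an invalid constant unit surfaces before any job is submitted.)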






[GitHub] [spark] HyukjinKwon commented on a change in pull request #35502: [SPARK-38195][SQL] Add the `TIMESTAMPADD()` function

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on a change in pull request #35502:
URL: https://github.com/apache/spark/pull/35502#discussion_r808587811



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/datetimeExpressions.scala
##########
@@ -3057,3 +3057,87 @@ case class ConvertTimezone(
     copy(sourceTz = newFirst, targetTz = newSecond, sourceTs = newThird)
   }
 }
+
+// scalastyle:off line.size.limit
+@ExpressionDescription(
+  usage = "_FUNC_(unit, interval, timestamp) - Adds the specified number of units to the given timestamp.",
+  arguments = """
+    Arguments:
+      * unit - this indicates the units of datetime that you want to add.
+        Supported string values of `unit` are (case insensitive):
+          - "YEAR", "SQL_TSI_YEAR"
+          - "QUARTER", "SQL_TSI_QUARTER" - 3 months
+          - "MONTH", "SQL_TSI_MONTH"
+          - "WEEK", "SQL_TSI_WEEK" - 7 days
+          - "DAY", "DAYOFYEAR", "SQL_TSI_DAY", "SQL_TSI_DAYOFYEAR"
+          - "HOUR", "SQL_TSI_HOUR"
+          - "MINUTE", "SQL_TSI_MINUTE"
+          - "SECOND", "SQL_TSI_SECOND"
+          - "MILLISECOND", "SQL_TSI_FRAC_SECOND" - milliseconds
+          - "MICROSECOND"
+      * interval - this is the number of units of time that you want to add.
+      * timestamp - This is a timestamp (w/ or w/o timezone) to which you want to add.

Review comment:
       nit ... 
   
   ```suggestion
         * timestamp - this is a timestamp (w/ or w/o timezone) to which you want to add.
   ```



