You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2022/02/22 09:20:49 UTC

[GitHub] [spark] MaxGekk opened a new pull request #35607: [WIP][SPARK-38284][SQL] Add the TIMESTAMPDIFF() function

MaxGekk opened a new pull request #35607:
URL: https://github.com/apache/spark/pull/35607


   ### What changes were proposed in this pull request?
   In the PR, I propose to add new function `TIMESTAMPDIFF` with the following parameters:
   1. `unit` - sthis indicates the units of the difference between the given timestamps. Supported string values of `unit` are (case insensitive):
      - YEAR
      - QUARTER
      - MONTH
      - WEEK
      - DAY, DAYOFYEAR
      - HOUR
      - MINUTE
      - SECOND
      - MILLISECOND
      - MICROSECOND
   2. `timestampStart` - A timestamp which the expression subtracts from `timestampEnd`.
   3. `timestampEnd` - A timestamp from which the expression subtracts `timestampStart`.
   
   ### Why are the changes needed?
   1. To make the migration process from other systems to Spark SQL easier.
   2. To achieve feature parity with other DBMSs.
   
   ### Does this PR introduce _any_ user-facing change?
   No. This is new feature.
   
   ### How was this patch tested?
   By running new tests:
   ```
   $ build/sbt "sql/test:testOnly org.apache.spark.sql.expressions.ExpressionInfoSuite"
   $ build/sbt "sql/testOnly *ExpressionsSchemaSuite"
   ```
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] MaxGekk commented on a change in pull request #35607: [SPARK-38284][SQL] Add the `TIMESTAMPDIFF()` function

Posted by GitBox <gi...@apache.org>.
MaxGekk commented on a change in pull request #35607:
URL: https://github.com/apache/spark/pull/35607#discussion_r813579311



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/datetimeExpressions.scala
##########
@@ -3141,3 +3141,88 @@ case class TimestampAdd(
     copy(unit = newFirst, quantity = newSecond, timestamp = newThird)
   }
 }
+
+// scalastyle:off line.size.limit
+@ExpressionDescription(
+  usage = "_FUNC_(unit, startTimestamp, endTimestamp) - Gets the difference between the timestamps `endTimestamp` and `startTimestamp` in the specified units.",
+  arguments = """
+    Arguments:
+      * unit - this indicates the units of the difference between the given timestamps.

Review comment:
       The behavior has been described in PR's description already:
   "... The function calculates the amount in terms of this unit, and returns a **whole** number, representing the number of **complete units between the two timestamps**. "




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] MaxGekk commented on pull request #35607: [WIP][SPARK-38284][SQL] Add the `TIMESTAMPDIFF()` function

Posted by GitBox <gi...@apache.org>.
MaxGekk commented on pull request #35607:
URL: https://github.com/apache/spark/pull/35607#issuecomment-1048554196


   @HyukjinKwon Is this known issue? One GA is failing with this but I do believe it is not related to my changes.
   ```
   annotations failed mypy checks:
   python/pyspark/pandas/generic.py:585: error: Incompatible return value type (got "Union[ndarray[Any, Any], ExtensionArray]", expected "ndarray[Any, Any]")  [return-value]
   Found 1 error in 1 file (checked 3[24](https://github.com/MaxGekk/spark/runs/5294084279?check_suite_focus=true#step:15:24) source files)
   1
   Error: Process completed with exit code 1.
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] MaxGekk commented on a change in pull request #35607: [SPARK-38284][SQL] Add the `TIMESTAMPDIFF()` function

Posted by GitBox <gi...@apache.org>.
MaxGekk commented on a change in pull request #35607:
URL: https://github.com/apache/spark/pull/35607#discussion_r814834548



##########
File path: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/util/DateTimeUtilsSuite.scala
##########
@@ -989,4 +989,58 @@ class DateTimeUtilsSuite extends SparkFunSuite with Matchers with SQLHelper {
     }
     assert(e.getMessage.contains("invalid: SECS"))
   }
+
+  test("SPARK-38284: difference between two timestamps in units") {
+    outstandingZoneIds.foreach { zid =>
+      assert(timestampDiff("MICROSECOND",
+        date(2022, 2, 14, 11, 27, 0, 0, zid), date(2022, 2, 14, 11, 27, 0, 1, zid), zid) === 1)
+      assert(timestampDiff("MILLISECOND",
+        date(2022, 2, 14, 11, 27, 0, 1000, zid), date(2022, 2, 14, 11, 27, 0, 0, zid), zid) === -1)
+      assert(timestampDiff(
+        "SECOND",
+        date(2022, 2, 14, 11, 27, 0, 1001, zid),
+        date(2022, 2, 14, 11, 27, 0, 1001, zid),
+        zid) === 0)
+      assert(timestampDiff(
+        "MINUTE",
+        date(2022, 2, 14, 11, 0, 1, 1, zid),
+        date(2022, 2, 14, 9, 30, 1, 1, zid),
+        zid) === -90)
+      assert(timestampDiff(
+        "HOUR",
+        date(2022, 2, 14, 11, 0, 1, 0, zid),
+        date(2022, 2, 15, 11, 0, 1, 0, zid),
+        zid) === 24)
+      assert(timestampDiff(
+        "DAY",
+        date(2022, 2, 28, 11, 1, 0, 0, zid),
+        date(2022, 3, 1, 11, 1, 0, 0, zid),
+        zid) === 1)
+      assert(timestampDiff("WEEK",
+        date(2022, 2, 14, 11, 43, 0, 1, zid), date(2022, 2, 21, 11, 43, 0, 1, zid), zid) === 1)

Review comment:
       done. I modified some existing checks.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] cloud-fan commented on a change in pull request #35607: [SPARK-38284][SQL] Add the `TIMESTAMPDIFF()` function

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #35607:
URL: https://github.com/apache/spark/pull/35607#discussion_r815821727



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/datetimeExpressions.scala
##########
@@ -3131,3 +3131,88 @@ case class TimestampAdd(
     copy(unit = newFirst, quantity = newSecond, timestamp = newThird)
   }
 }
+
+// scalastyle:off line.size.limit
+@ExpressionDescription(
+  usage = "_FUNC_(unit, startTimestamp, endTimestamp) - Gets the difference between the timestamps `endTimestamp` and `startTimestamp` in the specified units by truncating the fraction part.",
+  arguments = """
+    Arguments:
+      * unit - this indicates the units of the difference between the given timestamps.
+        Supported string values of `unit` are (case insensitive):
+          - "YEAR"
+          - "QUARTER" - 3 months
+          - "MONTH"
+          - "WEEK" - 7 days
+          - "DAY"
+          - "HOUR"
+          - "MINUTE"
+          - "SECOND"
+          - "MILLISECOND"
+          - "MICROSECOND"
+      * startTimestamp - A timestamp which the expression subtracts from `endTimestamp`.
+      * endTimestamp - A timestamp from which the expression subtracts `startTimestamp`.
+  """,
+  examples = """
+    Examples:
+      > SELECT _FUNC_('HOUR', timestamp_ntz'2022-02-11 20:30:00', timestamp_ntz'2022-02-12 04:30:00');
+       8
+      > SELECT _FUNC_('MONTH', timestamp_ltz'2022-01-01 00:00:00', timestamp_ltz'2022-02-28 00:00:00');
+       1
+      > SELECT _FUNC_(SECOND, date'2022-01-01', timestamp'2021-12-31 23:59:50');
+       -10
+      > SELECT _FUNC_(YEAR, timestamp'2000-01-01 01:02:03.123456', timestamp'2010-01-01 01:02:03.123456');
+       10
+  """,
+  group = "datetime_funcs",
+  since = "3.3.0")
+// scalastyle:on line.size.limit
+case class TimestampDiff(
+    unit: Expression,
+    startTimestamp: Expression,
+    endTimestamp: Expression,
+    timeZoneId: Option[String] = None)
+  extends TernaryExpression
+  with ImplicitCastInputTypes
+  with NullIntolerant
+  with TimeZoneAwareExpression {
+
+  def this(unit: Expression, quantity: Expression, timestamp: Expression) =
+    this(unit, quantity, timestamp, None)
+
+  override def first: Expression = unit
+  override def second: Expression = startTimestamp
+  override def third: Expression = endTimestamp
+
+  override def inputTypes: Seq[AbstractDataType] =
+    Seq(StringType, AnyTimestampType, AnyTimestampType)

Review comment:
       ok seems like a bug and if we require both inputs to be ltz the bug will be fixed.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] MaxGekk closed pull request #35607: [SPARK-38284][SQL] Add the `TIMESTAMPDIFF()` function

Posted by GitBox <gi...@apache.org>.
MaxGekk closed pull request #35607:
URL: https://github.com/apache/spark/pull/35607


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] cloud-fan commented on a change in pull request #35607: [SPARK-38284][SQL] Add the `TIMESTAMPDIFF()` function

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #35607:
URL: https://github.com/apache/spark/pull/35607#discussion_r813580705



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/datetimeExpressions.scala
##########
@@ -3141,3 +3141,88 @@ case class TimestampAdd(
     copy(unit = newFirst, quantity = newSecond, timestamp = newThird)
   }
 }
+
+// scalastyle:off line.size.limit
+@ExpressionDescription(
+  usage = "_FUNC_(unit, startTimestamp, endTimestamp) - Gets the difference between the timestamps `endTimestamp` and `startTimestamp` in the specified units.",
+  arguments = """
+    Arguments:
+      * unit - this indicates the units of the difference between the given timestamps.

Review comment:
       end-users can't see PR descriptions... We need to put them in the function doc.
   
   BTW what does "whole number" mean? 1.6 years will become 2 years?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] gengliangwang commented on a change in pull request #35607: [SPARK-38284][SQL] Add the `TIMESTAMPDIFF()` function

Posted by GitBox <gi...@apache.org>.
gengliangwang commented on a change in pull request #35607:
URL: https://github.com/apache/spark/pull/35607#discussion_r815816857



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/datetimeExpressions.scala
##########
@@ -3131,3 +3131,88 @@ case class TimestampAdd(
     copy(unit = newFirst, quantity = newSecond, timestamp = newThird)
   }
 }
+
+// scalastyle:off line.size.limit
+@ExpressionDescription(
+  usage = "_FUNC_(unit, startTimestamp, endTimestamp) - Gets the difference between the timestamps `endTimestamp` and `startTimestamp` in the specified units by truncating the fraction part.",
+  arguments = """
+    Arguments:
+      * unit - this indicates the units of the difference between the given timestamps.
+        Supported string values of `unit` are (case insensitive):
+          - "YEAR"
+          - "QUARTER" - 3 months
+          - "MONTH"
+          - "WEEK" - 7 days
+          - "DAY"
+          - "HOUR"
+          - "MINUTE"
+          - "SECOND"
+          - "MILLISECOND"
+          - "MICROSECOND"
+      * startTimestamp - A timestamp which the expression subtracts from `endTimestamp`.
+      * endTimestamp - A timestamp from which the expression subtracts `startTimestamp`.
+  """,
+  examples = """
+    Examples:
+      > SELECT _FUNC_('HOUR', timestamp_ntz'2022-02-11 20:30:00', timestamp_ntz'2022-02-12 04:30:00');
+       8
+      > SELECT _FUNC_('MONTH', timestamp_ltz'2022-01-01 00:00:00', timestamp_ltz'2022-02-28 00:00:00');
+       1
+      > SELECT _FUNC_(SECOND, date'2022-01-01', timestamp'2021-12-31 23:59:50');
+       -10
+      > SELECT _FUNC_(YEAR, timestamp'2000-01-01 01:02:03.123456', timestamp'2010-01-01 01:02:03.123456');
+       10
+  """,
+  group = "datetime_funcs",
+  since = "3.3.0")
+// scalastyle:on line.size.limit
+case class TimestampDiff(
+    unit: Expression,
+    startTimestamp: Expression,
+    endTimestamp: Expression,
+    timeZoneId: Option[String] = None)
+  extends TernaryExpression
+  with ImplicitCastInputTypes
+  with NullIntolerant
+  with TimeZoneAwareExpression {
+
+  def this(unit: Expression, quantity: Expression, timestamp: Expression) =
+    this(unit, quantity, timestamp, None)
+
+  override def first: Expression = unit
+  override def second: Expression = startTimestamp
+  override def third: Expression = endTimestamp
+
+  override def inputTypes: Seq[AbstractDataType] =
+    Seq(StringType, AnyTimestampType, AnyTimestampType)

Review comment:
       @cloud-fan good point.
   I just tried running  some tests:
   ```
   > select timestampdiff(second, timestamp_ntz'2021-01-01 00:00:00', timestamp_ltz'2021-01-01 00:00:00'
   28800
   
   > select timestamp_ntz'2021-01-01 00:00:00' - timestamp_ltz'2021-01-01 00:00:00'
   INTERVAL '0 00:00:00' DAY TO SECOND  
   ```
   I think we can simply write the input type as TimestampType and the implicit cast trait will handle the rest input combination.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] MaxGekk commented on a change in pull request #35607: [SPARK-38284][SQL] Add the `TIMESTAMPDIFF()` function

Posted by GitBox <gi...@apache.org>.
MaxGekk commented on a change in pull request #35607:
URL: https://github.com/apache/spark/pull/35607#discussion_r813587668



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/datetimeExpressions.scala
##########
@@ -3141,3 +3141,88 @@ case class TimestampAdd(
     copy(unit = newFirst, quantity = newSecond, timestamp = newThird)
   }
 }
+
+// scalastyle:off line.size.limit
+@ExpressionDescription(
+  usage = "_FUNC_(unit, startTimestamp, endTimestamp) - Gets the difference between the timestamps `endTimestamp` and `startTimestamp` in the specified units.",
+  arguments = """
+    Arguments:
+      * unit - this indicates the units of the difference between the given timestamps.

Review comment:
       whole and complete means 1 year for 1.6 years.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] HyukjinKwon commented on pull request #35607: [SPARK-38284][SQL] Add the `TIMESTAMPDIFF()` function

Posted by GitBox <gi...@apache.org>.
HyukjinKwon commented on pull request #35607:
URL: https://github.com/apache/spark/pull/35607#issuecomment-1050517713


   https://github.com/apache/spark/pull/35607#issuecomment-1048554196 @MaxGekk Just saw this. It's fixed in https://github.com/apache/spark/commit/b46b74ce0521d1d5e7c09cadad0e9639e31214cb


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] cloud-fan commented on a change in pull request #35607: [SPARK-38284][SQL] Add the `TIMESTAMPDIFF()` function

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #35607:
URL: https://github.com/apache/spark/pull/35607#discussion_r815757991



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/datetimeExpressions.scala
##########
@@ -3131,3 +3131,88 @@ case class TimestampAdd(
     copy(unit = newFirst, quantity = newSecond, timestamp = newThird)
   }
 }
+
+// scalastyle:off line.size.limit
+@ExpressionDescription(
+  usage = "_FUNC_(unit, startTimestamp, endTimestamp) - Gets the difference between the timestamps `endTimestamp` and `startTimestamp` in the specified units by truncating the fraction part.",
+  arguments = """
+    Arguments:
+      * unit - this indicates the units of the difference between the given timestamps.
+        Supported string values of `unit` are (case insensitive):
+          - "YEAR"
+          - "QUARTER" - 3 months
+          - "MONTH"
+          - "WEEK" - 7 days
+          - "DAY"
+          - "HOUR"
+          - "MINUTE"
+          - "SECOND"
+          - "MILLISECOND"
+          - "MICROSECOND"
+      * startTimestamp - A timestamp which the expression subtracts from `endTimestamp`.
+      * endTimestamp - A timestamp from which the expression subtracts `startTimestamp`.
+  """,
+  examples = """
+    Examples:
+      > SELECT _FUNC_('HOUR', timestamp_ntz'2022-02-11 20:30:00', timestamp_ntz'2022-02-12 04:30:00');
+       8
+      > SELECT _FUNC_('MONTH', timestamp_ltz'2022-01-01 00:00:00', timestamp_ltz'2022-02-28 00:00:00');
+       1
+      > SELECT _FUNC_(SECOND, date'2022-01-01', timestamp'2021-12-31 23:59:50');
+       -10
+      > SELECT _FUNC_(YEAR, timestamp'2000-01-01 01:02:03.123456', timestamp'2010-01-01 01:02:03.123456');
+       10
+  """,
+  group = "datetime_funcs",
+  since = "3.3.0")
+// scalastyle:on line.size.limit
+case class TimestampDiff(
+    unit: Expression,
+    startTimestamp: Expression,
+    endTimestamp: Expression,
+    timeZoneId: Option[String] = None)
+  extends TernaryExpression
+  with ImplicitCastInputTypes
+  with NullIntolerant
+  with TimeZoneAwareExpression {
+
+  def this(unit: Expression, quantity: Expression, timestamp: Expression) =
+    this(unit, quantity, timestamp, None)
+
+  override def first: Expression = unit
+  override def second: Expression = startTimestamp
+  override def third: Expression = endTimestamp
+
+  override def inputTypes: Seq[AbstractDataType] =
+    Seq(StringType, AnyTimestampType, AnyTimestampType)

Review comment:
       I think another requirement is the start and end timestamp must be of the same type, if one is ntz and the other is ltz, we should cast ntz to ltz. cc @gengliangwang 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] MaxGekk commented on pull request #35607: [SPARK-38284][SQL] Add the `TIMESTAMPDIFF()` function

Posted by GitBox <gi...@apache.org>.
MaxGekk commented on pull request #35607:
URL: https://github.com/apache/spark/pull/35607#issuecomment-1055536966


   All GA tests passed. Merging to master.
   Thank you, @gengliangwang @cloud-fan and @HyukjinKwon for review.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] cloud-fan commented on a change in pull request #35607: [SPARK-38284][SQL] Add the `TIMESTAMPDIFF()` function

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #35607:
URL: https://github.com/apache/spark/pull/35607#discussion_r815758858



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/datetimeExpressions.scala
##########
@@ -3131,3 +3131,88 @@ case class TimestampAdd(
     copy(unit = newFirst, quantity = newSecond, timestamp = newThird)
   }
 }
+
+// scalastyle:off line.size.limit
+@ExpressionDescription(
+  usage = "_FUNC_(unit, startTimestamp, endTimestamp) - Gets the difference between the timestamps `endTimestamp` and `startTimestamp` in the specified units by truncating the fraction part.",
+  arguments = """
+    Arguments:
+      * unit - this indicates the units of the difference between the given timestamps.
+        Supported string values of `unit` are (case insensitive):
+          - "YEAR"
+          - "QUARTER" - 3 months
+          - "MONTH"
+          - "WEEK" - 7 days
+          - "DAY"
+          - "HOUR"
+          - "MINUTE"
+          - "SECOND"
+          - "MILLISECOND"
+          - "MICROSECOND"
+      * startTimestamp - A timestamp which the expression subtracts from `endTimestamp`.
+      * endTimestamp - A timestamp from which the expression subtracts `startTimestamp`.
+  """,
+  examples = """
+    Examples:
+      > SELECT _FUNC_('HOUR', timestamp_ntz'2022-02-11 20:30:00', timestamp_ntz'2022-02-12 04:30:00');
+       8
+      > SELECT _FUNC_('MONTH', timestamp_ltz'2022-01-01 00:00:00', timestamp_ltz'2022-02-28 00:00:00');
+       1
+      > SELECT _FUNC_(SECOND, date'2022-01-01', timestamp'2021-12-31 23:59:50');
+       -10
+      > SELECT _FUNC_(YEAR, timestamp'2000-01-01 01:02:03.123456', timestamp'2010-01-01 01:02:03.123456');
+       10
+  """,
+  group = "datetime_funcs",
+  since = "3.3.0")
+// scalastyle:on line.size.limit
+case class TimestampDiff(
+    unit: Expression,
+    startTimestamp: Expression,
+    endTimestamp: Expression,
+    timeZoneId: Option[String] = None)
+  extends TernaryExpression
+  with ImplicitCastInputTypes
+  with NullIntolerant
+  with TimeZoneAwareExpression {
+
+  def this(unit: Expression, quantity: Expression, timestamp: Expression) =
+    this(unit, quantity, timestamp, None)
+
+  override def first: Expression = unit
+  override def second: Expression = startTimestamp
+  override def third: Expression = endTimestamp
+
+  override def inputTypes: Seq[AbstractDataType] =
+    Seq(StringType, AnyTimestampType, AnyTimestampType)

Review comment:
       or should we just require both types to be ltz?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] cloud-fan commented on a change in pull request #35607: [SPARK-38284][SQL] Add the `TIMESTAMPDIFF()` function

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #35607:
URL: https://github.com/apache/spark/pull/35607#discussion_r815821269



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/datetimeExpressions.scala
##########
@@ -3131,3 +3131,88 @@ case class TimestampAdd(
     copy(unit = newFirst, quantity = newSecond, timestamp = newThird)
   }
 }
+
+// scalastyle:off line.size.limit
+@ExpressionDescription(
+  usage = "_FUNC_(unit, startTimestamp, endTimestamp) - Gets the difference between the timestamps `endTimestamp` and `startTimestamp` in the specified units by truncating the fraction part.",
+  arguments = """
+    Arguments:
+      * unit - this indicates the units of the difference between the given timestamps.
+        Supported string values of `unit` are (case insensitive):
+          - "YEAR"
+          - "QUARTER" - 3 months
+          - "MONTH"
+          - "WEEK" - 7 days
+          - "DAY"
+          - "HOUR"
+          - "MINUTE"
+          - "SECOND"
+          - "MILLISECOND"
+          - "MICROSECOND"
+      * startTimestamp - A timestamp which the expression subtracts from `endTimestamp`.
+      * endTimestamp - A timestamp from which the expression subtracts `startTimestamp`.
+  """,
+  examples = """
+    Examples:
+      > SELECT _FUNC_('HOUR', timestamp_ntz'2022-02-11 20:30:00', timestamp_ntz'2022-02-12 04:30:00');
+       8
+      > SELECT _FUNC_('MONTH', timestamp_ltz'2022-01-01 00:00:00', timestamp_ltz'2022-02-28 00:00:00');
+       1
+      > SELECT _FUNC_(SECOND, date'2022-01-01', timestamp'2021-12-31 23:59:50');
+       -10
+      > SELECT _FUNC_(YEAR, timestamp'2000-01-01 01:02:03.123456', timestamp'2010-01-01 01:02:03.123456');
+       10
+  """,
+  group = "datetime_funcs",
+  since = "3.3.0")
+// scalastyle:on line.size.limit
+case class TimestampDiff(
+    unit: Expression,
+    startTimestamp: Expression,
+    endTimestamp: Expression,
+    timeZoneId: Option[String] = None)
+  extends TernaryExpression
+  with ImplicitCastInputTypes
+  with NullIntolerant
+  with TimeZoneAwareExpression {
+
+  def this(unit: Expression, quantity: Expression, timestamp: Expression) =
+    this(unit, quantity, timestamp, None)
+
+  override def first: Expression = unit
+  override def second: Expression = startTimestamp
+  override def third: Expression = endTimestamp
+
+  override def inputTypes: Seq[AbstractDataType] =
+    Seq(StringType, AnyTimestampType, AnyTimestampType)

Review comment:
       hmm, the results between `timestampdiff` and `-` operator are not consistent now?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] gengliangwang commented on a change in pull request #35607: [SPARK-38284][SQL] Add the `TIMESTAMPDIFF()` function

Posted by GitBox <gi...@apache.org>.
gengliangwang commented on a change in pull request #35607:
URL: https://github.com/apache/spark/pull/35607#discussion_r815645219



##########
File path: sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4
##########
@@ -860,7 +860,8 @@ valueExpression
 
 primaryExpression
     : name=(CURRENT_DATE | CURRENT_TIMESTAMP | CURRENT_USER)                                   #currentLike
-    | TIMESTAMPADD '(' unit=identifier ',' unitsAmount=valueExpression ',' timestamp=valueExpression ')'  #timestampadd
+    | TIMESTAMPADD '(' unit=identifier ',' unitsAmount=valueExpression ',' timestamp=valueExpression ')'     #timestampadd
+    | TIMESTAMPDIFF '(' unit=identifier ',' startTimestamp=valueExpression ',' startEnd=valueExpression ')'  #timestampdiff

Review comment:
       nit: `startEnd` => `endTimestamp`




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] cloud-fan commented on a change in pull request #35607: [SPARK-38284][SQL] Add the `TIMESTAMPDIFF()` function

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #35607:
URL: https://github.com/apache/spark/pull/35607#discussion_r813555126



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/datetimeExpressions.scala
##########
@@ -3141,3 +3141,88 @@ case class TimestampAdd(
     copy(unit = newFirst, quantity = newSecond, timestamp = newThird)
   }
 }
+
+// scalastyle:off line.size.limit
+@ExpressionDescription(
+  usage = "_FUNC_(unit, startTimestamp, endTimestamp) - Gets the difference between the timestamps `endTimestamp` and `startTimestamp` in the specified units.",
+  arguments = """
+    Arguments:
+      * unit - this indicates the units of the difference between the given timestamps.

Review comment:
       shall we define the rounding behavior? e.g. if the actual result is 1.6 year, what do we return?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] cloud-fan commented on a change in pull request #35607: [SPARK-38284][SQL] Add the `TIMESTAMPDIFF()` function

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #35607:
URL: https://github.com/apache/spark/pull/35607#discussion_r813599484



##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/datetimeExpressions.scala
##########
@@ -3141,3 +3141,88 @@ case class TimestampAdd(
     copy(unit = newFirst, quantity = newSecond, timestamp = newThird)
   }
 }
+
+// scalastyle:off line.size.limit
+@ExpressionDescription(
+  usage = "_FUNC_(unit, startTimestamp, endTimestamp) - Gets the difference between the timestamps `endTimestamp` and `startTimestamp` in the specified units.",
+  arguments = """
+    Arguments:
+      * unit - this indicates the units of the difference between the given timestamps.

Review comment:
       ok so it simply truncates the fraction part. Can we clarify it in the function doc?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] [spark] cloud-fan commented on a change in pull request #35607: [SPARK-38284][SQL] Add the `TIMESTAMPDIFF()` function

Posted by GitBox <gi...@apache.org>.
cloud-fan commented on a change in pull request #35607:
URL: https://github.com/apache/spark/pull/35607#discussion_r813875572



##########
File path: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/util/DateTimeUtilsSuite.scala
##########
@@ -989,4 +989,58 @@ class DateTimeUtilsSuite extends SparkFunSuite with Matchers with SQLHelper {
     }
     assert(e.getMessage.contains("invalid: SECS"))
   }
+
+  test("SPARK-38284: difference between two timestamps in units") {
+    outstandingZoneIds.foreach { zid =>
+      assert(timestampDiff("MICROSECOND",
+        date(2022, 2, 14, 11, 27, 0, 0, zid), date(2022, 2, 14, 11, 27, 0, 1, zid), zid) === 1)
+      assert(timestampDiff("MILLISECOND",
+        date(2022, 2, 14, 11, 27, 0, 1000, zid), date(2022, 2, 14, 11, 27, 0, 0, zid), zid) === -1)
+      assert(timestampDiff(
+        "SECOND",
+        date(2022, 2, 14, 11, 27, 0, 1001, zid),
+        date(2022, 2, 14, 11, 27, 0, 1001, zid),
+        zid) === 0)
+      assert(timestampDiff(
+        "MINUTE",
+        date(2022, 2, 14, 11, 0, 1, 1, zid),
+        date(2022, 2, 14, 9, 30, 1, 1, zid),
+        zid) === -90)
+      assert(timestampDiff(
+        "HOUR",
+        date(2022, 2, 14, 11, 0, 1, 0, zid),
+        date(2022, 2, 15, 11, 0, 1, 0, zid),
+        zid) === 24)
+      assert(timestampDiff(
+        "DAY",
+        date(2022, 2, 28, 11, 1, 0, 0, zid),
+        date(2022, 3, 1, 11, 1, 0, 0, zid),
+        zid) === 1)
+      assert(timestampDiff("WEEK",
+        date(2022, 2, 14, 11, 43, 0, 1, zid), date(2022, 2, 21, 11, 43, 0, 1, zid), zid) === 1)

Review comment:
       can we test more corner cases? e.g. `date(2022, 2, 14, 11, 43, 0, 1, zid), date(2022, 2, 21, 11, 43, 0, 0, zid)` (one week minus 1 second)




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org