You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by "berkaysynnada (via GitHub)" <gi...@apache.org> on 2023/03/14 16:23:24 UTC

[GitHub] [arrow-datafusion] berkaysynnada opened a new pull request, #5603: Timestamp subtraction and interval operations for `ScalarValue`

berkaysynnada opened a new pull request, #5603:
URL: https://github.com/apache/arrow-datafusion/pull/5603

   # Which issue does this PR close?
   
   <!--
   We generally require a GitHub issue to be filed for all bug fixes and enhancements and this helps us generate change logs for our releases. You can link an issue to this PR using the GitHub syntax. For example `Closes #123` indicates that this PR will close issue #123.
   -->
   
   Closes #5411 and #5412.
   
   # Rationale for this change
   
   <!--
    Why are you proposing this change? If this is already explained clearly in the issue then this section is not needed.
    Explaining clearly why changes are proposed helps reviewers understand your changes and offer better suggestions for fixes.  
   -->
   
   This PR is composed of two parts but served as one PR since the parts are closely related (they share constants, change similar places in the file). 340 lines are added for implementation, and 670 are for tests.
   
   The first support is to timestamp subtraction within `ScalarValue`. In postgre, the behaviour is like: 
   ```
   CREATE TABLE test (
     id INT,
     ts1 TIMESTAMP,
     ts2 TIMESTAMP
     
   );
   ```
   ```
   INSERT INTO test VALUES (1, '2016-03-13 00:00:00', '2017-04-13 12:11:03');
   INSERT INTO test VALUES (2, '2016-03-04 06:00:00', '2016-03-01 20:33:48');
   INSERT INTO test VALUES (3, '2017-03-04 23:59:59', '2016-03-01 12:00:00');
   ```
   ```
   SELECT
   ts2 - ts1 AS diff_ts
   FROM test;
   ```
   <img width="352" alt="image" src="https://user-images.githubusercontent.com/124376117/223647110-d7381366-b087-43a8-a0aa-1ccbc623734d.png">
   
   We can result in the same way with postgre. The difference is shown in days as the largest time unit. However, since we don't have such detailed fields for an interval, `TimestampSecond` or `TimestampMillisecond` timestamp subtractions give result in `IntervalDayTime` variant, and `TimestampMicrosecond` or `TimestampNanosecond` timestamp subtractions give result in `IntervalMonthDayNano` variant without using the month field. However, I need to underline that we can apply this operation only on scalar values. Supporting columnar operations is in the scope of arrow-rs, and it needs to be implemented in a similar way to have consistency.
   
   The second support is comparing, adding and subtracting two `ScalarValue` interval types, `IntervalYearMonth`, `IntervalDayTime`, `IntervalMonthDayNano`. With this PR, we can apply these operations between both the same and different variants. arrow-rs support is required also here to support columnar values.
   # What changes are included in this PR?
   
   <!--
   There is no need to duplicate the description in the issue here but it is sometimes worth providing a summary of the individual changes in this PR.
   -->
   
   `impl_op` match arms are extended to cover timestamp and interval types. It should be noted that we need the same types of timestamps to apply subtraction. `impl PartialOrd for ScalarValue `and `impl PartialEq for ScalarValue` are extended to handle interval types. However, to be able to compare months and days, we need to assume a month equals to 30 days (postgre behaves in the same way).
   # Are these changes tested?
   
   <!--
   We typically require tests for all PRs in order to:
   1. Prevent the code from being accidentally broken by subsequent changes
   2. Serve as another way to document the expected behavior of the code
   
   If tests are not included in your PR, please explain why (for example, are they covered by existing tests)?
   -->
   
   Yes, there are some tests covering edge cases with timezone options, and randomized tests that first add/subtract an interval type to/from a timestamp, then take the difference between the resulting timestamp and the initial timestamp to assert the equality with the given interval. For the interval operations, some tests are written to cover all variations of operations for different types.
   
   
   
   
   timestamp_next | timestamp_prev | Return as days
   -- | -- | --
   2017-04-16 00:00:00 . 000_000_100 | 2016-03-13 00:00:00 . 000_000_025 | months: 0, days: 399, nanos: 75
   2016-03-13 00:00:00 . 000_000_025 | 2017-04-16 00:00:00 . 000_000_100 | months: 0, days: -399, nanos: -75
   2016-12-16 01:00:00 . 100 | 2016-12-15 00:00:00 . 000 | days: 1, millis: 3600100
   2016-12-15 00:00:00 . 000 | 2016-12-16 01:00:00 . 100 | days: -1, millis: -3600100
   
   
   
   
   # Are there any user-facing changes?
   
   <!--
   If there are user-facing changes then we may require documentation to be updated before approving the PR.
   -->
   
   <!--
   If there are any breaking changes to public APIs, please add the `api change` label.
   -->
   -
   
   # To run end to end queries including timestamp subtraction:
   
   1) `try_new_impl` function in arrow-rs gives an error while validating the schema and columns as such:
   
   > `column types must match schema types, expected Timestamp(Second, None) but found Interval(DayTime) at column index 0`
    
   That expectation of the output column needs to be changed to interval type.
   
   2) In Datafusion, there is a function `coerce_types` returning the output type of applying `op` to an argument of `lhs_type` and `rhs_type`. These codes with the extensions in planner.rs and binary.rs also need to be reshaped. `evaluate` function in datetime.rs has to handle columnar values at each side.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] tustvold commented on a diff in pull request #5603: Timestamp subtraction and interval operations for `ScalarValue`

Posted by "tustvold (via GitHub)" <gi...@apache.org>.
tustvold commented on code in PR #5603:
URL: https://github.com/apache/arrow-datafusion/pull/5603#discussion_r1147905129


##########
datafusion/common/Cargo.toml:
##########
@@ -41,10 +41,14 @@ pyarrow = ["pyo3", "arrow/pyarrow"]
 [dependencies]
 apache-avro = { version = "0.14", default-features = false, features = ["snappy"], optional = true }
 arrow = { workspace = true, default-features = false }
+arrow-array = { version = "35.0.0", default-features = false, features = ["chrono-tz"] }

Review Comment:
   This is depending on arrow-array 35 despite the crate as a whole still depending on arrow 34. This is likely to cause confusion



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] berkaysynnada commented on a diff in pull request #5603: Timestamp subtraction and interval operations for `ScalarValue`

Posted by "berkaysynnada (via GitHub)" <gi...@apache.org>.
berkaysynnada commented on code in PR #5603:
URL: https://github.com/apache/arrow-datafusion/pull/5603#discussion_r1140406495


##########
datafusion/common/src/scalar.rs:
##########
@@ -563,46 +843,154 @@ macro_rules! get_sign {
     };
 }
 
+#[derive(Clone, Copy)]
+enum IntervalMode {
+    Milli,
+    Nano,
+}
+
+/// This function computes subtracts `rhs_ts` from `lhs_ts`, taking timezones
+/// into account when given. Units of the resulting interval is specified by
+/// the argument `mode`.
+/// The default behavior of Datafusion is the following:
+/// - When subtracting timestamps at seconds/milliseconds precision, the output
+///   interval will have the type [`IntervalDayTimeType`].
+/// - When subtracting timestamps at microseconds/nanoseconds precision, the
+///   output interval will have the type [`IntervalMonthDayNanoType`].
+fn ts_sub_to_interval(
+    lhs_ts: i64,
+    rhs_ts: i64,
+    lhs_tz: &Option<String>,
+    rhs_tz: &Option<String>,
+    mode: IntervalMode,
+) -> Result<ScalarValue, DataFusionError> {
+    let lhs_dt = with_timezone_to_naive_datetime(lhs_ts, lhs_tz, mode)?;
+    let rhs_dt = with_timezone_to_naive_datetime(rhs_ts, rhs_tz, mode)?;
+    let delta_secs = lhs_dt.signed_duration_since(rhs_dt);
+
+    match mode {
+        IntervalMode::Milli => {
+            let as_millisecs = delta_secs.num_milliseconds();
+            Ok(ScalarValue::IntervalDayTime(Some(
+                IntervalDayTimeType::make_value(
+                    (as_millisecs / MILLISECS_IN_ONE_DAY) as i32,
+                    (as_millisecs % MILLISECS_IN_ONE_DAY) as i32,
+                ),
+            )))
+        }
+        IntervalMode::Nano => {
+            let as_nanosecs = delta_secs.num_nanoseconds().ok_or_else(|| {
+                DataFusionError::Execution(String::from(
+                    "Can not compute timestamp differences with nanosecond precision",
+                ))
+            })?;
+            Ok(ScalarValue::IntervalMonthDayNano(Some(
+                IntervalMonthDayNanoType::make_value(
+                    0,
+                    (as_nanosecs / NANOSECS_IN_ONE_DAY) as i32,
+                    as_nanosecs % NANOSECS_IN_ONE_DAY,
+                ),
+            )))
+        }
+    }
+}
+
+/// This function creates the [`NaiveDateTime`] object corresponding to the
+/// given timestamp using the units (tick size) implied by argument `mode`.
+#[inline]
+fn with_timezone_to_naive_datetime(

Review Comment:
   > TBC I don't think this blocks this PR, if one of the DF reviewers is happy with it, it was just an idle suggestion as it would save reimplementing parsing and timezone handling logic that already exists upstream, and ensures timezones are handled consistently 😄
   
   Thanks for the suggestion. I added the necessary dependencies and features (arrow-array and chrono-tz), and now we can handle the timezones in the form of the example you gave.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] alamb commented on pull request #5603: Timestamp subtraction and interval operations for `ScalarValue`

Posted by "alamb (via GitHub)" <gi...@apache.org>.
alamb commented on PR #5603:
URL: https://github.com/apache/arrow-datafusion/pull/5603#issuecomment-1474875619

   Thank you everyone!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] alamb commented on a diff in pull request #5603: Timestamp subtraction and interval operations for `ScalarValue`

Posted by "alamb (via GitHub)" <gi...@apache.org>.
alamb commented on code in PR #5603:
URL: https://github.com/apache/arrow-datafusion/pull/5603#discussion_r1140110006


##########
datafusion/common/src/scalar.rs:
##########
@@ -4486,4 +4922,623 @@ mod tests {
             assert!(distance.is_none());
         }
     }
+
+    #[test]
+    fn test_scalar_interval_add() {
+        let cases = [
+            (
+                ScalarValue::IntervalYearMonth(Some(IntervalYearMonthType::make_value(

Review Comment:
   I think you can use https://docs.rs/datafusion/20.0.0/datafusion/scalar/enum.ScalarValue.html#method.new_interval_ym to reduce this boilerplate substantially
   
   ```suggestion
                   ScalarValue::new_iterval_ym(1, 12)
   ```
   
   The same for most of the other tests 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] alamb commented on a diff in pull request #5603: Timestamp subtraction and interval operations for `ScalarValue`

Posted by "alamb (via GitHub)" <gi...@apache.org>.
alamb commented on code in PR #5603:
URL: https://github.com/apache/arrow-datafusion/pull/5603#discussion_r1141032924


##########
datafusion/common/src/scalar.rs:
##########
@@ -4486,4 +4900,513 @@ mod tests {
             assert!(distance.is_none());
         }
     }
+
+    #[test]
+    fn test_scalar_interval_add() {
+        let cases = [
+            (
+                ScalarValue::new_interval_ym(1, 12),
+                ScalarValue::new_interval_ym(1, 12),
+                ScalarValue::new_interval_ym(2, 24),
+            ),
+            (
+                ScalarValue::new_interval_dt(1, 999),
+                ScalarValue::new_interval_dt(1, 999),
+                ScalarValue::new_interval_dt(2, 1998),
+            ),
+            (
+                ScalarValue::new_interval_mdn(12, 15, 123_456),
+                ScalarValue::new_interval_mdn(12, 15, 123_456),
+                ScalarValue::new_interval_mdn(24, 30, 246_912),
+            ),
+            (
+                ScalarValue::new_interval_ym(0, 1),
+                ScalarValue::new_interval_dt(29, 86_390),
+                ScalarValue::new_interval_mdn(1, 29, 86_390_000_000),
+            ),
+            (
+                ScalarValue::new_interval_ym(0, 1),
+                ScalarValue::new_interval_mdn(2, 10, 999_999_999),
+                ScalarValue::new_interval_mdn(3, 10, 999_999_999),
+            ),
+            (
+                ScalarValue::new_interval_dt(400, 123_456),
+                ScalarValue::new_interval_ym(1, 1),
+                ScalarValue::new_interval_mdn(13, 400, 123_456_000_000),
+            ),
+            (
+                ScalarValue::new_interval_dt(65, 321),
+                ScalarValue::new_interval_mdn(2, 5, 1_000_000),
+                ScalarValue::new_interval_mdn(2, 70, 322_000_000),
+            ),
+            (
+                ScalarValue::new_interval_mdn(12, 15, 123_456),
+                ScalarValue::new_interval_ym(2, 0),
+                ScalarValue::new_interval_mdn(36, 15, 123_456),
+            ),
+            (
+                ScalarValue::new_interval_mdn(12, 15, 100_000),
+                ScalarValue::new_interval_dt(370, 1),
+                ScalarValue::new_interval_mdn(12, 385, 1_100_000),
+            ),
+        ];
+        for (lhs, rhs, expected) in cases.iter() {
+            let result = lhs.add(rhs).unwrap();
+            let result_commute = rhs.add(lhs).unwrap();
+            assert_eq!(*expected, result, "lhs:{:?} + rhs:{:?}", lhs, rhs);
+            assert_eq!(*expected, result_commute, "lhs:{:?} + rhs:{:?}", rhs, lhs);
+        }
+    }
+
+    #[test]
+    fn test_scalar_interval_sub() {
+        let cases = [
+            (
+                ScalarValue::new_interval_ym(1, 12),
+                ScalarValue::new_interval_ym(1, 12),
+                ScalarValue::new_interval_ym(0, 0),
+            ),
+            (
+                ScalarValue::new_interval_dt(1, 999),
+                ScalarValue::new_interval_dt(1, 999),
+                ScalarValue::new_interval_dt(0, 0),
+            ),
+            (
+                ScalarValue::new_interval_mdn(12, 15, 123_456),
+                ScalarValue::new_interval_mdn(12, 15, 123_456),
+                ScalarValue::new_interval_mdn(0, 0, 0),
+            ),
+            (
+                ScalarValue::new_interval_ym(0, 1),
+                ScalarValue::new_interval_dt(29, 999_999),
+                ScalarValue::new_interval_mdn(1, -29, -999_999_000_000),
+            ),
+            (
+                ScalarValue::new_interval_ym(0, 1),
+                ScalarValue::new_interval_mdn(2, 10, 999_999_999),
+                ScalarValue::new_interval_mdn(-1, -10, -999_999_999),
+            ),
+            (
+                ScalarValue::new_interval_dt(400, 123_456),
+                ScalarValue::new_interval_ym(1, 1),
+                ScalarValue::new_interval_mdn(-13, 400, 123_456_000_000),
+            ),
+            (
+                ScalarValue::new_interval_dt(65, 321),
+                ScalarValue::new_interval_mdn(2, 5, 1_000_000),
+                ScalarValue::new_interval_mdn(-2, 60, 320_000_000),

Review Comment:
   ❤️  
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] berkaysynnada commented on a diff in pull request #5603: Timestamp subtraction and interval operations for `ScalarValue`

Posted by "berkaysynnada (via GitHub)" <gi...@apache.org>.
berkaysynnada commented on code in PR #5603:
URL: https://github.com/apache/arrow-datafusion/pull/5603#discussion_r1139219837


##########
datafusion/common/src/scalar.rs:
##########
@@ -563,46 +843,154 @@ macro_rules! get_sign {
     };
 }
 
+#[derive(Clone, Copy)]
+enum IntervalMode {
+    Milli,
+    Nano,
+}
+
+/// This function computes subtracts `rhs_ts` from `lhs_ts`, taking timezones
+/// into account when given. Units of the resulting interval is specified by
+/// the argument `mode`.
+/// The default behavior of Datafusion is the following:
+/// - When subtracting timestamps at seconds/milliseconds precision, the output
+///   interval will have the type [`IntervalDayTimeType`].
+/// - When subtracting timestamps at microseconds/nanoseconds precision, the
+///   output interval will have the type [`IntervalMonthDayNanoType`].
+fn ts_sub_to_interval(
+    lhs_ts: i64,
+    rhs_ts: i64,
+    lhs_tz: &Option<String>,
+    rhs_tz: &Option<String>,
+    mode: IntervalMode,
+) -> Result<ScalarValue, DataFusionError> {
+    let lhs_dt = with_timezone_to_naive_datetime(lhs_ts, lhs_tz, mode)?;
+    let rhs_dt = with_timezone_to_naive_datetime(rhs_ts, rhs_tz, mode)?;
+    let delta_secs = lhs_dt.signed_duration_since(rhs_dt);
+
+    match mode {
+        IntervalMode::Milli => {
+            let as_millisecs = delta_secs.num_milliseconds();
+            Ok(ScalarValue::IntervalDayTime(Some(
+                IntervalDayTimeType::make_value(
+                    (as_millisecs / MILLISECS_IN_ONE_DAY) as i32,
+                    (as_millisecs % MILLISECS_IN_ONE_DAY) as i32,
+                ),
+            )))
+        }
+        IntervalMode::Nano => {
+            let as_nanosecs = delta_secs.num_nanoseconds().ok_or_else(|| {
+                DataFusionError::Execution(String::from(
+                    "Can not compute timestamp differences with nanosecond precision",
+                ))
+            })?;
+            Ok(ScalarValue::IntervalMonthDayNano(Some(
+                IntervalMonthDayNanoType::make_value(
+                    0,
+                    (as_nanosecs / NANOSECS_IN_ONE_DAY) as i32,
+                    as_nanosecs % NANOSECS_IN_ONE_DAY,
+                ),
+            )))
+        }
+    }
+}
+
+/// This function creates the [`NaiveDateTime`] object corresponding to the
+/// given timestamp using the units (tick size) implied by argument `mode`.
+#[inline]
+fn with_timezone_to_naive_datetime(

Review Comment:
   In the [example](https://github.com/apache/arrow-rs/pull/3801/files#diff-5f92a7816bbae9b685c2f85ab84a268b85246bfaa14272c5afd339810ad471f3R22) you mentioned, there is a function `string_to_datetime`, taking the timestamp and timezone as a whole string. In our case, we have a timestamp as an integer and a string for the timezone. If I try to implement the part where we parse the timezone string to `Tz`, I need to use `FromStr::from_str` implemented for Tz. However, there are 2 different implementations of that function inside timezone.rs: 
   
   The first one is protected with chrono-tz feature. Even if I add that feature, Tz struct and the functions are under `private` namespace which is not public. 
   
   On the other hand, the second implementation cannot handle timezones such as `"America/Los_Angeles"`, when I try to use it without any modification on arrow and cargo.toml.
   
   I suggest that you can merge this PR, and when the new release that enables us to parse timezone string is announced, I will open a new PR supporting that kind of timezones. 
   
   I don't think it will look good, but I can provide this support by duplicating the parts that will work for me in the arrow code.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] alamb merged pull request #5603: Timestamp subtraction and interval operations for `ScalarValue`

Posted by "alamb (via GitHub)" <gi...@apache.org>.
alamb merged PR #5603:
URL: https://github.com/apache/arrow-datafusion/pull/5603


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] alamb commented on a diff in pull request #5603: Timestamp subtraction and interval operations for `ScalarValue`

Posted by "alamb (via GitHub)" <gi...@apache.org>.
alamb commented on code in PR #5603:
URL: https://github.com/apache/arrow-datafusion/pull/5603#discussion_r1140110006


##########
datafusion/common/src/scalar.rs:
##########
@@ -4486,4 +4922,623 @@ mod tests {
             assert!(distance.is_none());
         }
     }
+
+    #[test]
+    fn test_scalar_interval_add() {
+        let cases = [
+            (
+                ScalarValue::IntervalYearMonth(Some(IntervalYearMonthType::make_value(

Review Comment:
   I think you can use https://docs.rs/datafusion/20.0.0/datafusion/scalar/enum.ScalarValue.html#method.new_interval_ym to reduce this boilerplate substantially
   
   ```suggestion
                   ScalarValue::new_interval_ym(1, 12)
   ```
   
   The same for most of the other tests 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] tustvold commented on a diff in pull request #5603: Timestamp subtraction and interval operations for `ScalarValue`

Posted by "tustvold (via GitHub)" <gi...@apache.org>.
tustvold commented on code in PR #5603:
URL: https://github.com/apache/arrow-datafusion/pull/5603#discussion_r1136922654


##########
datafusion/common/src/scalar.rs:
##########
@@ -563,46 +843,154 @@ macro_rules! get_sign {
     };
 }
 
+#[derive(Clone, Copy)]
+enum IntervalMode {
+    Milli,
+    Nano,
+}
+
+/// This function computes subtracts `rhs_ts` from `lhs_ts`, taking timezones
+/// into account when given. Units of the resulting interval is specified by
+/// the argument `mode`.
+/// The default behavior of Datafusion is the following:
+/// - When subtracting timestamps at seconds/milliseconds precision, the output
+///   interval will have the type [`IntervalDayTimeType`].
+/// - When subtracting timestamps at microseconds/nanoseconds precision, the
+///   output interval will have the type [`IntervalMonthDayNanoType`].
+fn ts_sub_to_interval(
+    lhs_ts: i64,
+    rhs_ts: i64,
+    lhs_tz: &Option<String>,
+    rhs_tz: &Option<String>,
+    mode: IntervalMode,
+) -> Result<ScalarValue, DataFusionError> {
+    let lhs_dt = with_timezone_to_naive_datetime(lhs_ts, lhs_tz, mode)?;
+    let rhs_dt = with_timezone_to_naive_datetime(rhs_ts, rhs_tz, mode)?;
+    let delta_secs = lhs_dt.signed_duration_since(rhs_dt);
+
+    match mode {
+        IntervalMode::Milli => {
+            let as_millisecs = delta_secs.num_milliseconds();
+            Ok(ScalarValue::IntervalDayTime(Some(
+                IntervalDayTimeType::make_value(
+                    (as_millisecs / MILLISECS_IN_ONE_DAY) as i32,
+                    (as_millisecs % MILLISECS_IN_ONE_DAY) as i32,
+                ),
+            )))
+        }
+        IntervalMode::Nano => {
+            let as_nanosecs = delta_secs.num_nanoseconds().ok_or_else(|| {
+                DataFusionError::Execution(String::from(
+                    "Can not compute timestamp differences with nanosecond precision",
+                ))
+            })?;
+            Ok(ScalarValue::IntervalMonthDayNano(Some(
+                IntervalMonthDayNanoType::make_value(
+                    0,
+                    (as_nanosecs / NANOSECS_IN_ONE_DAY) as i32,
+                    as_nanosecs % NANOSECS_IN_ONE_DAY,
+                ),
+            )))
+        }
+    }
+}
+
+/// This function creates the [`NaiveDateTime`] object corresponding to the
+/// given timestamp using the units (tick size) implied by argument `mode`.
+#[inline]
+fn with_timezone_to_naive_datetime(

Review Comment:
   This should probably make use of https://docs.rs/arrow-array/latest/arrow_array/timezone/struct.Tz.html to parse to a `DateTime<Tz>`
   
   Crucially this handles things like daylight savings time



##########
datafusion/common/src/scalar.rs:
##########
@@ -563,46 +843,154 @@ macro_rules! get_sign {
     };
 }
 
+#[derive(Clone, Copy)]
+enum IntervalMode {
+    Milli,
+    Nano,
+}
+
+/// This function computes subtracts `rhs_ts` from `lhs_ts`, taking timezones
+/// into account when given. Units of the resulting interval is specified by
+/// the argument `mode`.
+/// The default behavior of Datafusion is the following:
+/// - When subtracting timestamps at seconds/milliseconds precision, the output
+///   interval will have the type [`IntervalDayTimeType`].
+/// - When subtracting timestamps at microseconds/nanoseconds precision, the
+///   output interval will have the type [`IntervalMonthDayNanoType`].
+fn ts_sub_to_interval(
+    lhs_ts: i64,
+    rhs_ts: i64,
+    lhs_tz: &Option<String>,
+    rhs_tz: &Option<String>,
+    mode: IntervalMode,
+) -> Result<ScalarValue, DataFusionError> {
+    let lhs_dt = with_timezone_to_naive_datetime(lhs_ts, lhs_tz, mode)?;
+    let rhs_dt = with_timezone_to_naive_datetime(rhs_ts, rhs_tz, mode)?;
+    let delta_secs = lhs_dt.signed_duration_since(rhs_dt);
+
+    match mode {
+        IntervalMode::Milli => {
+            let as_millisecs = delta_secs.num_milliseconds();
+            Ok(ScalarValue::IntervalDayTime(Some(
+                IntervalDayTimeType::make_value(
+                    (as_millisecs / MILLISECS_IN_ONE_DAY) as i32,
+                    (as_millisecs % MILLISECS_IN_ONE_DAY) as i32,
+                ),
+            )))
+        }
+        IntervalMode::Nano => {
+            let as_nanosecs = delta_secs.num_nanoseconds().ok_or_else(|| {
+                DataFusionError::Execution(String::from(
+                    "Can not compute timestamp differences with nanosecond precision",
+                ))
+            })?;
+            Ok(ScalarValue::IntervalMonthDayNano(Some(
+                IntervalMonthDayNanoType::make_value(
+                    0,
+                    (as_nanosecs / NANOSECS_IN_ONE_DAY) as i32,
+                    as_nanosecs % NANOSECS_IN_ONE_DAY,
+                ),
+            )))
+        }
+    }
+}
+
+/// This function creates the [`NaiveDateTime`] object corresponding to the
+/// given timestamp using the units (tick size) implied by argument `mode`.
+#[inline]
+fn with_timezone_to_naive_datetime(

Review Comment:
   This should probably make use of https://docs.rs/arrow-array/latest/arrow_array/timezone/struct.Tz.html to parse to a `DateTime<Tz>`
   
   Crucially this handles things like daylight savings time, where the timezone offset depends on the time in question



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] tustvold commented on a diff in pull request #5603: Timestamp subtraction and interval operations for `ScalarValue`

Posted by "tustvold (via GitHub)" <gi...@apache.org>.
tustvold commented on code in PR #5603:
URL: https://github.com/apache/arrow-datafusion/pull/5603#discussion_r1137254192


##########
datafusion/common/src/scalar.rs:
##########
@@ -563,46 +843,154 @@ macro_rules! get_sign {
     };
 }
 
+#[derive(Clone, Copy)]
+enum IntervalMode {
+    Milli,
+    Nano,
+}
+
+/// This function computes subtracts `rhs_ts` from `lhs_ts`, taking timezones
+/// into account when given. Units of the resulting interval is specified by
+/// the argument `mode`.
+/// The default behavior of Datafusion is the following:
+/// - When subtracting timestamps at seconds/milliseconds precision, the output
+///   interval will have the type [`IntervalDayTimeType`].
+/// - When subtracting timestamps at microseconds/nanoseconds precision, the
+///   output interval will have the type [`IntervalMonthDayNanoType`].
+fn ts_sub_to_interval(
+    lhs_ts: i64,
+    rhs_ts: i64,
+    lhs_tz: &Option<String>,
+    rhs_tz: &Option<String>,
+    mode: IntervalMode,
+) -> Result<ScalarValue, DataFusionError> {
+    let lhs_dt = with_timezone_to_naive_datetime(lhs_ts, lhs_tz, mode)?;
+    let rhs_dt = with_timezone_to_naive_datetime(rhs_ts, rhs_tz, mode)?;
+    let delta_secs = lhs_dt.signed_duration_since(rhs_dt);
+
+    match mode {
+        IntervalMode::Milli => {
+            let as_millisecs = delta_secs.num_milliseconds();
+            Ok(ScalarValue::IntervalDayTime(Some(
+                IntervalDayTimeType::make_value(
+                    (as_millisecs / MILLISECS_IN_ONE_DAY) as i32,
+                    (as_millisecs % MILLISECS_IN_ONE_DAY) as i32,
+                ),
+            )))
+        }
+        IntervalMode::Nano => {
+            let as_nanosecs = delta_secs.num_nanoseconds().ok_or_else(|| {
+                DataFusionError::Execution(String::from(
+                    "Can not compute timestamp differences with nanosecond precision",
+                ))
+            })?;
+            Ok(ScalarValue::IntervalMonthDayNano(Some(
+                IntervalMonthDayNanoType::make_value(
+                    0,
+                    (as_nanosecs / NANOSECS_IN_ONE_DAY) as i32,
+                    as_nanosecs % NANOSECS_IN_ONE_DAY,
+                ),
+            )))
+        }
+    }
+}
+
+/// This function creates the [`NaiveDateTime`] object corresponding to the
+/// given timestamp using the units (tick size) implied by argument `mode`.
+#[inline]
+fn with_timezone_to_naive_datetime(

Review Comment:
   > corresponding UTC offsets of these timestamps
   
   The timezones may not be of the form `"+02:00"` they might also be `"America/Los_Angeles"` in which case the offset to UTC depends on the date in question. See https://github.com/apache/arrow-rs/pull/3801/files#diff-5f92a7816bbae9b685c2f85ab84a268b85246bfaa14272c5afd339810ad471f3R22
   
   My suggestion is to use the chrono `DateTime` abstraction to handle applying the offset correctly, along with `Tz` to handle parsing the timezone string correctly.
   
   
   
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] berkaysynnada commented on a diff in pull request #5603: Timestamp subtraction and interval operations for `ScalarValue`

Posted by "berkaysynnada (via GitHub)" <gi...@apache.org>.
berkaysynnada commented on code in PR #5603:
URL: https://github.com/apache/arrow-datafusion/pull/5603#discussion_r1139219837


##########
datafusion/common/src/scalar.rs:
##########
@@ -563,46 +843,154 @@ macro_rules! get_sign {
     };
 }
 
+#[derive(Clone, Copy)]
+enum IntervalMode {
+    Milli,
+    Nano,
+}
+
+/// This function computes subtracts `rhs_ts` from `lhs_ts`, taking timezones
+/// into account when given. Units of the resulting interval is specified by
+/// the argument `mode`.
+/// The default behavior of Datafusion is the following:
+/// - When subtracting timestamps at seconds/milliseconds precision, the output
+///   interval will have the type [`IntervalDayTimeType`].
+/// - When subtracting timestamps at microseconds/nanoseconds precision, the
+///   output interval will have the type [`IntervalMonthDayNanoType`].
+fn ts_sub_to_interval(
+    lhs_ts: i64,
+    rhs_ts: i64,
+    lhs_tz: &Option<String>,
+    rhs_tz: &Option<String>,
+    mode: IntervalMode,
+) -> Result<ScalarValue, DataFusionError> {
+    let lhs_dt = with_timezone_to_naive_datetime(lhs_ts, lhs_tz, mode)?;
+    let rhs_dt = with_timezone_to_naive_datetime(rhs_ts, rhs_tz, mode)?;
+    let delta_secs = lhs_dt.signed_duration_since(rhs_dt);
+
+    match mode {
+        IntervalMode::Milli => {
+            let as_millisecs = delta_secs.num_milliseconds();
+            Ok(ScalarValue::IntervalDayTime(Some(
+                IntervalDayTimeType::make_value(
+                    (as_millisecs / MILLISECS_IN_ONE_DAY) as i32,
+                    (as_millisecs % MILLISECS_IN_ONE_DAY) as i32,
+                ),
+            )))
+        }
+        IntervalMode::Nano => {
+            let as_nanosecs = delta_secs.num_nanoseconds().ok_or_else(|| {
+                DataFusionError::Execution(String::from(
+                    "Can not compute timestamp differences with nanosecond precision",
+                ))
+            })?;
+            Ok(ScalarValue::IntervalMonthDayNano(Some(
+                IntervalMonthDayNanoType::make_value(
+                    0,
+                    (as_nanosecs / NANOSECS_IN_ONE_DAY) as i32,
+                    as_nanosecs % NANOSECS_IN_ONE_DAY,
+                ),
+            )))
+        }
+    }
+}
+
+/// This function creates the [`NaiveDateTime`] object corresponding to the
+/// given timestamp using the units (tick size) implied by argument `mode`.
+#[inline]
+fn with_timezone_to_naive_datetime(

Review Comment:
   In the [example](https://github.com/apache/arrow-rs/pull/3801/files#diff-5f92a7816bbae9b685c2f85ab84a268b85246bfaa14272c5afd339810ad471f3R22) you mentioned, there is a function `string_to_datetime`, taking the timestamp and timezone as a whole string. In our case, we have a timestamp as an integer and a string for the timezone. If I try to implement the part where we parse the timezone string to `Tz`, I need to use `FromStr::from_str` implemented for Tz. However, there are 2 different implementations of that function inside timezone.rs: 
   The first one is protected with chrono-tz feature. Even if I add that feature, Tz struct and the functions are under `private` namespace which is not public. 
   On the other hand, the second implementation cannot handle timezones such as `"America/Los_Angeles"`, when I try to use it without any modification on arrow and cargo.toml.
   I suggest that you can merge this PR, and when the new release that enables us to parse timezone string is announced, I will open a new PR supporting that kind of timezones. 
   I don't think it will look good, but I can provide this support by duplicating the parts that will work for me in the arrow code.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] tustvold commented on a diff in pull request #5603: Timestamp subtraction and interval operations for `ScalarValue`

Posted by "tustvold (via GitHub)" <gi...@apache.org>.
tustvold commented on code in PR #5603:
URL: https://github.com/apache/arrow-datafusion/pull/5603#discussion_r1139221730


##########
datafusion/common/src/scalar.rs:
##########
@@ -563,46 +843,154 @@ macro_rules! get_sign {
     };
 }
 
+#[derive(Clone, Copy)]
+enum IntervalMode {
+    Milli,
+    Nano,
+}
+
+/// This function computes subtracts `rhs_ts` from `lhs_ts`, taking timezones
+/// into account when given. Units of the resulting interval is specified by
+/// the argument `mode`.
+/// The default behavior of Datafusion is the following:
+/// - When subtracting timestamps at seconds/milliseconds precision, the output
+///   interval will have the type [`IntervalDayTimeType`].
+/// - When subtracting timestamps at microseconds/nanoseconds precision, the
+///   output interval will have the type [`IntervalMonthDayNanoType`].
+fn ts_sub_to_interval(
+    lhs_ts: i64,
+    rhs_ts: i64,
+    lhs_tz: &Option<String>,
+    rhs_tz: &Option<String>,
+    mode: IntervalMode,
+) -> Result<ScalarValue, DataFusionError> {
+    let lhs_dt = with_timezone_to_naive_datetime(lhs_ts, lhs_tz, mode)?;
+    let rhs_dt = with_timezone_to_naive_datetime(rhs_ts, rhs_tz, mode)?;
+    let delta_secs = lhs_dt.signed_duration_since(rhs_dt);
+
+    match mode {
+        IntervalMode::Milli => {
+            let as_millisecs = delta_secs.num_milliseconds();
+            Ok(ScalarValue::IntervalDayTime(Some(
+                IntervalDayTimeType::make_value(
+                    (as_millisecs / MILLISECS_IN_ONE_DAY) as i32,
+                    (as_millisecs % MILLISECS_IN_ONE_DAY) as i32,
+                ),
+            )))
+        }
+        IntervalMode::Nano => {
+            let as_nanosecs = delta_secs.num_nanoseconds().ok_or_else(|| {
+                DataFusionError::Execution(String::from(
+                    "Can not compute timestamp differences with nanosecond precision",
+                ))
+            })?;
+            Ok(ScalarValue::IntervalMonthDayNano(Some(
+                IntervalMonthDayNanoType::make_value(
+                    0,
+                    (as_nanosecs / NANOSECS_IN_ONE_DAY) as i32,
+                    as_nanosecs % NANOSECS_IN_ONE_DAY,
+                ),
+            )))
+        }
+    }
+}
+
+/// This function creates the [`NaiveDateTime`] object corresponding to the
+/// given timestamp using the units (tick size) implied by argument `mode`.
+#[inline]
+fn with_timezone_to_naive_datetime(

Review Comment:
   There is always a `FromStr` implementation provided, it just becomes "more complete" with the chrono-tz feature enabled. You should be able to use it regardless, give it a go :smile: 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] ozankabak commented on a diff in pull request #5603: Timestamp subtraction and interval operations for `ScalarValue`

Posted by "ozankabak (via GitHub)" <gi...@apache.org>.
ozankabak commented on code in PR #5603:
URL: https://github.com/apache/arrow-datafusion/pull/5603#discussion_r1140721431


##########
datafusion/common/src/scalar.rs:
##########
@@ -4486,4 +4922,623 @@ mod tests {
             assert!(distance.is_none());
         }
     }
+
+    #[test]
+    fn test_scalar_interval_add() {
+        let cases = [
+            (
+                ScalarValue::IntervalYearMonth(Some(IntervalYearMonthType::make_value(

Review Comment:
   Done



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] tustvold commented on a diff in pull request #5603: Timestamp subtraction and interval operations for `ScalarValue`

Posted by "tustvold (via GitHub)" <gi...@apache.org>.
tustvold commented on code in PR #5603:
URL: https://github.com/apache/arrow-datafusion/pull/5603#discussion_r1136922654


##########
datafusion/common/src/scalar.rs:
##########
@@ -563,46 +843,154 @@ macro_rules! get_sign {
     };
 }
 
+#[derive(Clone, Copy)]
+enum IntervalMode {
+    Milli,
+    Nano,
+}
+
+/// This function computes subtracts `rhs_ts` from `lhs_ts`, taking timezones
+/// into account when given. Units of the resulting interval is specified by
+/// the argument `mode`.
+/// The default behavior of Datafusion is the following:
+/// - When subtracting timestamps at seconds/milliseconds precision, the output
+///   interval will have the type [`IntervalDayTimeType`].
+/// - When subtracting timestamps at microseconds/nanoseconds precision, the
+///   output interval will have the type [`IntervalMonthDayNanoType`].
+fn ts_sub_to_interval(
+    lhs_ts: i64,
+    rhs_ts: i64,
+    lhs_tz: &Option<String>,
+    rhs_tz: &Option<String>,
+    mode: IntervalMode,
+) -> Result<ScalarValue, DataFusionError> {
+    let lhs_dt = with_timezone_to_naive_datetime(lhs_ts, lhs_tz, mode)?;
+    let rhs_dt = with_timezone_to_naive_datetime(rhs_ts, rhs_tz, mode)?;
+    let delta_secs = lhs_dt.signed_duration_since(rhs_dt);
+
+    match mode {
+        IntervalMode::Milli => {
+            let as_millisecs = delta_secs.num_milliseconds();
+            Ok(ScalarValue::IntervalDayTime(Some(
+                IntervalDayTimeType::make_value(
+                    (as_millisecs / MILLISECS_IN_ONE_DAY) as i32,
+                    (as_millisecs % MILLISECS_IN_ONE_DAY) as i32,
+                ),
+            )))
+        }
+        IntervalMode::Nano => {
+            let as_nanosecs = delta_secs.num_nanoseconds().ok_or_else(|| {
+                DataFusionError::Execution(String::from(
+                    "Can not compute timestamp differences with nanosecond precision",
+                ))
+            })?;
+            Ok(ScalarValue::IntervalMonthDayNano(Some(
+                IntervalMonthDayNanoType::make_value(
+                    0,
+                    (as_nanosecs / NANOSECS_IN_ONE_DAY) as i32,
+                    as_nanosecs % NANOSECS_IN_ONE_DAY,
+                ),
+            )))
+        }
+    }
+}
+
+/// This function creates the [`NaiveDateTime`] object corresponding to the
+/// given timestamp using the units (tick size) implied by argument `mode`.
+#[inline]
+fn with_timezone_to_naive_datetime(

Review Comment:
   This should probably make use of https://docs.rs/arrow-array/latest/arrow_array/timezone/struct.Tz.html to parse to a `DateTime<Tz>`
   
   Crucially this handles things like daylight savings time, where the timezone offset depends on the time in question
   
   There is an example of this here - https://github.com/apache/arrow-rs/pull/3795



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] tustvold commented on a diff in pull request #5603: Timestamp subtraction and interval operations for `ScalarValue`

Posted by "tustvold (via GitHub)" <gi...@apache.org>.
tustvold commented on code in PR #5603:
URL: https://github.com/apache/arrow-datafusion/pull/5603#discussion_r1139236034


##########
datafusion/common/src/scalar.rs:
##########
@@ -563,46 +843,154 @@ macro_rules! get_sign {
     };
 }
 
+#[derive(Clone, Copy)]
+enum IntervalMode {
+    Milli,
+    Nano,
+}
+
+/// This function computes subtracts `rhs_ts` from `lhs_ts`, taking timezones
+/// into account when given. Units of the resulting interval is specified by
+/// the argument `mode`.
+/// The default behavior of Datafusion is the following:
+/// - When subtracting timestamps at seconds/milliseconds precision, the output
+///   interval will have the type [`IntervalDayTimeType`].
+/// - When subtracting timestamps at microseconds/nanoseconds precision, the
+///   output interval will have the type [`IntervalMonthDayNanoType`].
+fn ts_sub_to_interval(
+    lhs_ts: i64,
+    rhs_ts: i64,
+    lhs_tz: &Option<String>,
+    rhs_tz: &Option<String>,
+    mode: IntervalMode,
+) -> Result<ScalarValue, DataFusionError> {
+    let lhs_dt = with_timezone_to_naive_datetime(lhs_ts, lhs_tz, mode)?;
+    let rhs_dt = with_timezone_to_naive_datetime(rhs_ts, rhs_tz, mode)?;
+    let delta_secs = lhs_dt.signed_duration_since(rhs_dt);
+
+    match mode {
+        IntervalMode::Milli => {
+            let as_millisecs = delta_secs.num_milliseconds();
+            Ok(ScalarValue::IntervalDayTime(Some(
+                IntervalDayTimeType::make_value(
+                    (as_millisecs / MILLISECS_IN_ONE_DAY) as i32,
+                    (as_millisecs % MILLISECS_IN_ONE_DAY) as i32,
+                ),
+            )))
+        }
+        IntervalMode::Nano => {
+            let as_nanosecs = delta_secs.num_nanoseconds().ok_or_else(|| {
+                DataFusionError::Execution(String::from(
+                    "Can not compute timestamp differences with nanosecond precision",
+                ))
+            })?;
+            Ok(ScalarValue::IntervalMonthDayNano(Some(
+                IntervalMonthDayNanoType::make_value(
+                    0,
+                    (as_nanosecs / NANOSECS_IN_ONE_DAY) as i32,
+                    as_nanosecs % NANOSECS_IN_ONE_DAY,
+                ),
+            )))
+        }
+    }
+}
+
+/// This function creates the [`NaiveDateTime`] object corresponding to the
+/// given timestamp using the units (tick size) implied by argument `mode`.
+#[inline]
+fn with_timezone_to_naive_datetime(

Review Comment:
   TBC I don't think this blocks this PR, if one of the DF reviewers is happy with it, it was just an idle suggestion as it would save reimplementing parsing and timezone handling logic that already exists upstream, and ensures timezones are handled consistently :smile: 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] tustvold commented on a diff in pull request #5603: Timestamp subtraction and interval operations for `ScalarValue`

Posted by "tustvold (via GitHub)" <gi...@apache.org>.
tustvold commented on code in PR #5603:
URL: https://github.com/apache/arrow-datafusion/pull/5603#discussion_r1147917401


##########
datafusion/common/Cargo.toml:
##########
@@ -41,10 +41,14 @@ pyarrow = ["pyo3", "arrow/pyarrow"]
 [dependencies]
 apache-avro = { version = "0.14", default-features = false, features = ["snappy"], optional = true }
 arrow = { workspace = true, default-features = false }
+arrow-array = { version = "35.0.0", default-features = false, features = ["chrono-tz"] }

Review Comment:
   Fix in https://github.com/apache/arrow-datafusion/pull/5724



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] tustvold commented on a diff in pull request #5603: Timestamp subtraction and interval operations for `ScalarValue`

Posted by "tustvold (via GitHub)" <gi...@apache.org>.
tustvold commented on code in PR #5603:
URL: https://github.com/apache/arrow-datafusion/pull/5603#discussion_r1137254192


##########
datafusion/common/src/scalar.rs:
##########
@@ -563,46 +843,154 @@ macro_rules! get_sign {
     };
 }
 
+#[derive(Clone, Copy)]
+enum IntervalMode {
+    Milli,
+    Nano,
+}
+
+/// This function computes subtracts `rhs_ts` from `lhs_ts`, taking timezones
+/// into account when given. Units of the resulting interval is specified by
+/// the argument `mode`.
+/// The default behavior of Datafusion is the following:
+/// - When subtracting timestamps at seconds/milliseconds precision, the output
+///   interval will have the type [`IntervalDayTimeType`].
+/// - When subtracting timestamps at microseconds/nanoseconds precision, the
+///   output interval will have the type [`IntervalMonthDayNanoType`].
+fn ts_sub_to_interval(
+    lhs_ts: i64,
+    rhs_ts: i64,
+    lhs_tz: &Option<String>,
+    rhs_tz: &Option<String>,
+    mode: IntervalMode,
+) -> Result<ScalarValue, DataFusionError> {
+    let lhs_dt = with_timezone_to_naive_datetime(lhs_ts, lhs_tz, mode)?;
+    let rhs_dt = with_timezone_to_naive_datetime(rhs_ts, rhs_tz, mode)?;
+    let delta_secs = lhs_dt.signed_duration_since(rhs_dt);
+
+    match mode {
+        IntervalMode::Milli => {
+            let as_millisecs = delta_secs.num_milliseconds();
+            Ok(ScalarValue::IntervalDayTime(Some(
+                IntervalDayTimeType::make_value(
+                    (as_millisecs / MILLISECS_IN_ONE_DAY) as i32,
+                    (as_millisecs % MILLISECS_IN_ONE_DAY) as i32,
+                ),
+            )))
+        }
+        IntervalMode::Nano => {
+            let as_nanosecs = delta_secs.num_nanoseconds().ok_or_else(|| {
+                DataFusionError::Execution(String::from(
+                    "Can not compute timestamp differences with nanosecond precision",
+                ))
+            })?;
+            Ok(ScalarValue::IntervalMonthDayNano(Some(
+                IntervalMonthDayNanoType::make_value(
+                    0,
+                    (as_nanosecs / NANOSECS_IN_ONE_DAY) as i32,
+                    as_nanosecs % NANOSECS_IN_ONE_DAY,
+                ),
+            )))
+        }
+    }
+}
+
+/// This function creates the [`NaiveDateTime`] object corresponding to the
+/// given timestamp using the units (tick size) implied by argument `mode`.
+#[inline]
+fn with_timezone_to_naive_datetime(

Review Comment:
   > corresponding UTC offsets of these timestamps
   
   The timezones may not be of the form `"+02:00"` they might also be `"America/Los_Angeles"` in which case the offset to UTC depends on the date in question. See https://github.com/apache/arrow-rs/pull/3801/files#diff-5f92a7816bbae9b685c2f85ab84a268b85246bfaa14272c5afd339810ad471f3R22
   
   My suggestion is to use the chrono `DateTime` abstraction to handle applying the offset correctly, along with `Tz` to handle parsing the timezone string correctly. 
   
   In particular [as_datetime](https://docs.rs/arrow-array/latest/arrow_array/temporal_conversions/fn.as_datetime.html) and [as_datetime_with_timezone](https://docs.rs/arrow-array/latest/arrow_array/temporal_conversions/fn.as_datetime_with_timezone.html) handle the cases of no timezone and a timezone respectively. You could perhaps crib from them, or even use them directly
   
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] berkaysynnada commented on a diff in pull request #5603: Timestamp subtraction and interval operations for `ScalarValue`

Posted by "berkaysynnada (via GitHub)" <gi...@apache.org>.
berkaysynnada commented on code in PR #5603:
URL: https://github.com/apache/arrow-datafusion/pull/5603#discussion_r1137247431


##########
datafusion/common/src/scalar.rs:
##########
@@ -563,46 +843,154 @@ macro_rules! get_sign {
     };
 }
 
+#[derive(Clone, Copy)]
+enum IntervalMode {
+    Milli,
+    Nano,
+}
+
+/// This function computes subtracts `rhs_ts` from `lhs_ts`, taking timezones
+/// into account when given. Units of the resulting interval is specified by
+/// the argument `mode`.
+/// The default behavior of Datafusion is the following:
+/// - When subtracting timestamps at seconds/milliseconds precision, the output
+///   interval will have the type [`IntervalDayTimeType`].
+/// - When subtracting timestamps at microseconds/nanoseconds precision, the
+///   output interval will have the type [`IntervalMonthDayNanoType`].
+fn ts_sub_to_interval(
+    lhs_ts: i64,
+    rhs_ts: i64,
+    lhs_tz: &Option<String>,
+    rhs_tz: &Option<String>,
+    mode: IntervalMode,
+) -> Result<ScalarValue, DataFusionError> {
+    let lhs_dt = with_timezone_to_naive_datetime(lhs_ts, lhs_tz, mode)?;
+    let rhs_dt = with_timezone_to_naive_datetime(rhs_ts, rhs_tz, mode)?;
+    let delta_secs = lhs_dt.signed_duration_since(rhs_dt);
+
+    match mode {
+        IntervalMode::Milli => {
+            let as_millisecs = delta_secs.num_milliseconds();
+            Ok(ScalarValue::IntervalDayTime(Some(
+                IntervalDayTimeType::make_value(
+                    (as_millisecs / MILLISECS_IN_ONE_DAY) as i32,
+                    (as_millisecs % MILLISECS_IN_ONE_DAY) as i32,
+                ),
+            )))
+        }
+        IntervalMode::Nano => {
+            let as_nanosecs = delta_secs.num_nanoseconds().ok_or_else(|| {
+                DataFusionError::Execution(String::from(
+                    "Can not compute timestamp differences with nanosecond precision",
+                ))
+            })?;
+            Ok(ScalarValue::IntervalMonthDayNano(Some(
+                IntervalMonthDayNanoType::make_value(
+                    0,
+                    (as_nanosecs / NANOSECS_IN_ONE_DAY) as i32,
+                    as_nanosecs % NANOSECS_IN_ONE_DAY,
+                ),
+            )))
+        }
+    }
+}
+
+/// This function creates the [`NaiveDateTime`] object corresponding to the
+/// given timestamp using the units (tick size) implied by argument `mode`.
+#[inline]
+fn with_timezone_to_naive_datetime(

Review Comment:
   Thank you for the review. Do you mean we can use `timestamp_ns_to_datetime()` and other similar functions to parse from i64 to DateTime, instead of `from_timestamp_opt()` which is parsing i64 to NaiveDateTime?
   I cannot figure out how this code may have a problem with things like daylight savings time. Isn't it sufficient knowing two things to find the time difference correctly in any circumstances?: 1) numeric value of timestamps 2) corresponding UTC offsets of these timestamps. 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] tustvold commented on a diff in pull request #5603: Timestamp subtraction and interval operations for `ScalarValue`

Posted by "tustvold (via GitHub)" <gi...@apache.org>.
tustvold commented on code in PR #5603:
URL: https://github.com/apache/arrow-datafusion/pull/5603#discussion_r1139221730


##########
datafusion/common/src/scalar.rs:
##########
@@ -563,46 +843,154 @@ macro_rules! get_sign {
     };
 }
 
+#[derive(Clone, Copy)]
+enum IntervalMode {
+    Milli,
+    Nano,
+}
+
+/// This function computes subtracts `rhs_ts` from `lhs_ts`, taking timezones
+/// into account when given. Units of the resulting interval is specified by
+/// the argument `mode`.
+/// The default behavior of Datafusion is the following:
+/// - When subtracting timestamps at seconds/milliseconds precision, the output
+///   interval will have the type [`IntervalDayTimeType`].
+/// - When subtracting timestamps at microseconds/nanoseconds precision, the
+///   output interval will have the type [`IntervalMonthDayNanoType`].
+fn ts_sub_to_interval(
+    lhs_ts: i64,
+    rhs_ts: i64,
+    lhs_tz: &Option<String>,
+    rhs_tz: &Option<String>,
+    mode: IntervalMode,
+) -> Result<ScalarValue, DataFusionError> {
+    let lhs_dt = with_timezone_to_naive_datetime(lhs_ts, lhs_tz, mode)?;
+    let rhs_dt = with_timezone_to_naive_datetime(rhs_ts, rhs_tz, mode)?;
+    let delta_secs = lhs_dt.signed_duration_since(rhs_dt);
+
+    match mode {
+        IntervalMode::Milli => {
+            let as_millisecs = delta_secs.num_milliseconds();
+            Ok(ScalarValue::IntervalDayTime(Some(
+                IntervalDayTimeType::make_value(
+                    (as_millisecs / MILLISECS_IN_ONE_DAY) as i32,
+                    (as_millisecs % MILLISECS_IN_ONE_DAY) as i32,
+                ),
+            )))
+        }
+        IntervalMode::Nano => {
+            let as_nanosecs = delta_secs.num_nanoseconds().ok_or_else(|| {
+                DataFusionError::Execution(String::from(
+                    "Can not compute timestamp differences with nanosecond precision",
+                ))
+            })?;
+            Ok(ScalarValue::IntervalMonthDayNano(Some(
+                IntervalMonthDayNanoType::make_value(
+                    0,
+                    (as_nanosecs / NANOSECS_IN_ONE_DAY) as i32,
+                    as_nanosecs % NANOSECS_IN_ONE_DAY,
+                ),
+            )))
+        }
+    }
+}
+
+/// This function creates the [`NaiveDateTime`] object corresponding to the
+/// given timestamp using the units (tick size) implied by argument `mode`.
+#[inline]
+fn with_timezone_to_naive_datetime(

Review Comment:
   There is always a `FromStr` implementation provided, it just becomes "more complete" with the chrono-tz feature enabled. You should be able to use it regardless, give it a go :smile: 
   
   >  when the new release that enables us to parse timezone string is announced
   
   This was added months ago, and is in use already within DataFusion



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] ozankabak commented on a diff in pull request #5603: Timestamp subtraction and interval operations for `ScalarValue`

Posted by "ozankabak (via GitHub)" <gi...@apache.org>.
ozankabak commented on code in PR #5603:
URL: https://github.com/apache/arrow-datafusion/pull/5603#discussion_r1139265782


##########
datafusion/common/src/scalar.rs:
##########
@@ -563,46 +843,154 @@ macro_rules! get_sign {
     };
 }
 
+#[derive(Clone, Copy)]
+enum IntervalMode {
+    Milli,
+    Nano,
+}
+
+/// This function computes subtracts `rhs_ts` from `lhs_ts`, taking timezones
+/// into account when given. Units of the resulting interval is specified by
+/// the argument `mode`.
+/// The default behavior of Datafusion is the following:
+/// - When subtracting timestamps at seconds/milliseconds precision, the output
+///   interval will have the type [`IntervalDayTimeType`].
+/// - When subtracting timestamps at microseconds/nanoseconds precision, the
+///   output interval will have the type [`IntervalMonthDayNanoType`].
+fn ts_sub_to_interval(
+    lhs_ts: i64,
+    rhs_ts: i64,
+    lhs_tz: &Option<String>,
+    rhs_tz: &Option<String>,
+    mode: IntervalMode,
+) -> Result<ScalarValue, DataFusionError> {
+    let lhs_dt = with_timezone_to_naive_datetime(lhs_ts, lhs_tz, mode)?;
+    let rhs_dt = with_timezone_to_naive_datetime(rhs_ts, rhs_tz, mode)?;
+    let delta_secs = lhs_dt.signed_duration_since(rhs_dt);
+
+    match mode {
+        IntervalMode::Milli => {
+            let as_millisecs = delta_secs.num_milliseconds();
+            Ok(ScalarValue::IntervalDayTime(Some(
+                IntervalDayTimeType::make_value(
+                    (as_millisecs / MILLISECS_IN_ONE_DAY) as i32,
+                    (as_millisecs % MILLISECS_IN_ONE_DAY) as i32,
+                ),
+            )))
+        }
+        IntervalMode::Nano => {
+            let as_nanosecs = delta_secs.num_nanoseconds().ok_or_else(|| {
+                DataFusionError::Execution(String::from(
+                    "Can not compute timestamp differences with nanosecond precision",
+                ))
+            })?;
+            Ok(ScalarValue::IntervalMonthDayNano(Some(
+                IntervalMonthDayNanoType::make_value(
+                    0,
+                    (as_nanosecs / NANOSECS_IN_ONE_DAY) as i32,
+                    as_nanosecs % NANOSECS_IN_ONE_DAY,
+                ),
+            )))
+        }
+    }
+}
+
+/// This function creates the [`NaiveDateTime`] object corresponding to the
+/// given timestamp using the units (tick size) implied by argument `mode`.
+#[inline]
+fn with_timezone_to_naive_datetime(

Review Comment:
   Sounds good. We will try to make it work with the `FromStr` impl and if we can not for some reason, we will revisit in a follow-on PR. Thanks for taking a look!



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] ozankabak commented on pull request #5603: Timestamp subtraction and interval operations for `ScalarValue`

Posted by "ozankabak (via GitHub)" <gi...@apache.org>.
ozankabak commented on PR #5603:
URL: https://github.com/apache/arrow-datafusion/pull/5603#issuecomment-1474432417

   @alamb, thanks for reviewing. As you correctly guessed, this is for things like determining window boundaries, interval calculations for pruning etc. It will not be used on raw data when executing queries (we will use arrow kernels there).
   
   @berkaysynnada is working on the arrow kernel parts and we will follow on with another PR when it is ready. It will include both end to end tests (i.e. running queries) and the consistency tests you suggested.
   
   I cleaned up the overly verbose interval construction boilerplate, so this is good to go in terms of the feedback so far. We will follow up with the other PR early next week after this merges.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org