You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by "alexandreyc (via GitHub)" <gi...@apache.org> on 2023/07/20 13:22:45 UTC

[GitHub] [arrow-rs] alexandreyc opened a new pull request, #4546: Fix timezoned timestamp arithmetic

alexandreyc opened a new pull request, #4546:
URL: https://github.com/apache/arrow-rs/pull/4546

   # Which issue does this PR close?
   
   Closes #4457.
   
   # Rationale for this change
    
   Arithmetic on timezoned timestamps was wrong.
   
   # Are there any user-facing changes?
   
   Yes. 
   
   Methods `add_year_months`, `add_day_time`, `add_month_day_nano`, `subtract_year_months`, `subtract_day_time` and `subtract_month_day_nano` on `TimestampSecondType`, `TimestampMillisecondType`, `TimestampMicrosecondType` and `TimestampNanosecondType` now take an additonal parameter `tz: Tz` as these operations are inherently timezone-dependent. Maybe there is a way to do things differently and not break the API. Feel free to suggest any idea.
   
   # Tests
   
   I tested the results against PostgreSQL 14. Feel free to propose new test cases that you find interesting.
   
   Here is the script to reproduce the results:
   
   ```
   postgres=# create temporary table tests as (
   
       select
           (timestamp '1970-01-28 23:00:00' at time zone 'Europe/Paris') as datetime,
           (interval '0 year 1 month') as interval_year_month,
           (interval '0 day 0 millisecond') as interval_day_time,
           (interval '1 month 0 day 0 microsecond') as interval_month_day_nano
   
       union all
   
       select
           (timestamp '1970-01-01 00:00:00' at time zone 'Europe/Paris') as datetime,
           (interval '5 year 34 month') as interval_year_month,
           (interval '5 day 454000 millisecond') as interval_day_time,
           (interval '344 month 34 day -43000000 microsecond') as interval_month_day_nano
   
       union all
   
       select
           (timestamp '2010-04-01 04:00:20' at time zone 'Europe/Paris') as datetime,
           (interval '-2 year 4 month') as interval_year_month,
           (interval '-34 day 0 millisecond') as interval_day_time,
           (interval '-593 month -33 day 13000000 microsecond') as interval_month_day_nano
   
       union all
   
       select
           (timestamp '1960-01-30 04:23:20' at time zone 'Europe/Paris') as datetime,
           (interval '7 year -4 month') as interval_year_month,
           (interval '7 day -4000 millisecond') as interval_day_time,
           (interval '5 month 2 day 493000000 microsecond') as interval_month_day_nano
   
       union all
   
       select
           (timestamp '2023-03-25 14:00:00' at time zone 'Europe/Paris') as datetime,
           (interval '0 year 1 month') as interval_year_month,
           (interval '1 day 0 millisecond') as interval_day_time,
           (interval '1 month 0 day 0 microsecond') as interval_month_day_nano
   
   );
   SELECT 5
   postgres=# select * from tests
   postgres=# select * from tests;
           datetime        | interval_year_month | interval_day_time |       interval_month_day_nano
   ------------------------+---------------------+-------------------+--------------------------------------
    1970-01-28 23:00:00+01 | 1 mon               | 00:00:00          | 1 mon
    1970-01-01 00:00:00+01 | 7 years 10 mons     | 5 days 00:07:34   | 28 years 8 mons 34 days -00:00:43
    2010-04-01 04:00:20+02 | -1 years -8 mons    | -34 days          | -49 years -5 mons -33 days +00:00:13
    1960-01-30 04:23:20+01 | 6 years 8 mons      | 7 days -00:00:04  | 5 mons 2 days 00:08:13
    2023-03-25 14:00:00+01 | 1 mon               | 1 day             | 1 mon
   (5 rows)
   
   postgres=# select datetime + interval_year_month from tests;
           ?column?
   ------------------------
    1970-02-28 23:00:00+01
    1977-11-01 00:00:00+01
    2008-08-01 04:00:20+02
    1966-09-30 04:23:20+01
    2023-04-25 14:00:00+02
   (5 rows)
   
   postgres=# select (datetime + interval_year_month) - interval_year_month from tests;
           ?column?
   ------------------------
    1970-01-28 23:00:00+01
    1970-01-01 00:00:00+01
    2010-04-01 04:00:20+02
    1960-01-30 04:23:20+01
    2023-03-25 14:00:00+01
   (5 rows)
   
   postgres=# select datetime + interval_day_time from tests;
           ?column?
   ------------------------
    1970-01-28 23:00:00+01
    1970-01-06 00:07:34+01
    2010-02-26 04:00:20+01
    1960-02-06 04:23:16+01
    2023-03-26 14:00:00+02
   (5 rows)
   
   postgres=# select (datetime + interval_day_time) - interval_day_time from tests;
           ?column?
   ------------------------
    1970-01-28 23:00:00+01
    1970-01-01 00:00:00+01
    2010-04-01 04:00:20+02
    1960-01-30 04:23:20+01
    2023-03-25 14:00:00+01
   (5 rows)
   
   postgres=# select datetime + interval_month_day_nano from tests;
           ?column?
   ------------------------
    1970-02-28 23:00:00+01
    1998-10-04 23:59:17+02
    1960-09-29 04:00:33+01
    1960-07-02 04:31:33+01
    2023-04-25 14:00:00+02
   (5 rows)
   
   postgres=# select (datetime + interval_month_day_nano) - interval_month_day_nano from tests;
           ?column?
   ------------------------
    1970-01-28 23:00:00+01
    1970-01-02 00:00:00+01
    2010-04-02 04:00:20+02
    1960-01-31 04:23:20+01
    2023-03-25 14:00:00+01
   (5 rows)
   ```
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] tustvold commented on a diff in pull request #4546: Fix timezoned timestamp arithmetic

Posted by "tustvold (via GitHub)" <gi...@apache.org>.
tustvold commented on code in PR #4546:
URL: https://github.com/apache/arrow-rs/pull/4546#discussion_r1270724734


##########
arrow-array/src/types.rs:
##########
@@ -454,25 +469,13 @@ impl TimestampSecondType {
     /// * `timestamp` - The date on which to perform the operation
     /// * `delta` - The interval to add
     pub fn subtract_day_time(
-        timestamp: <TimestampSecondType as ArrowPrimitiveType>::Native,
+        timestamp: <Self as ArrowPrimitiveType>::Native,
         delta: <IntervalDayTimeType as ArrowPrimitiveType>::Native,
-    ) -> Result<<TimestampSecondType as ArrowPrimitiveType>::Native, ArrowError> {
+        tz: Tz,
+    ) -> Result<<Self as ArrowPrimitiveType>::Native, ArrowError> {
         let (days, ms) = IntervalDayTimeType::to_parts(delta);
-        let res = NaiveDateTime::from_timestamp_opt(timestamp, 0).ok_or_else(|| {
-            ArrowError::ComputeError("Timestamp out of range".to_string())
-        })?;
-        let res = res
-            .checked_sub_signed(Duration::days(days as i64))
-            .ok_or_else(|| {
-                ArrowError::ComputeError("Timestamp out of range".to_string())
-            })?;
-        let res = res
-            .checked_sub_signed(Duration::milliseconds(ms as i64))
-            .ok_or_else(|| {
-                ArrowError::ComputeError("Timestamp out of range".to_string())
-            })?;
-        TimestampSecondType::make_value(res)
-            .ok_or_else(|| ArrowError::ComputeError("Timestamp out of range".to_string()))
+        let delta = IntervalDayTimeType::make_value(-days, -ms);

Review Comment:
   Perhaps we could use subtraction instead of negation followed by addition?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] alexandreyc commented on a diff in pull request #4546: Fix timezoned timestamp arithmetic

Posted by "alexandreyc (via GitHub)" <gi...@apache.org>.
alexandreyc commented on code in PR #4546:
URL: https://github.com/apache/arrow-rs/pull/4546#discussion_r1272316901


##########
arrow-array/src/types.rs:
##########
@@ -350,6 +352,59 @@ impl ArrowTimestampType for TimestampNanosecondType {
     }
 }
 
+fn add_year_months<T: ArrowTimestampType>(
+    timestamp: <T as ArrowPrimitiveType>::Native,
+    delta: <IntervalYearMonthType as ArrowPrimitiveType>::Native,
+    tz: Tz,
+) -> Result<<T as ArrowPrimitiveType>::Native, ArrowError> {
+    let months = IntervalYearMonthType::to_months(delta);
+    let res = as_datetime_with_timezone::<T>(timestamp, tz)
+        .ok_or_else(|| ArrowError::ComputeError("Timestamp out of range".to_string()))?;
+    let res = shift_months_datetime(res, months)
+        .ok_or_else(|| ArrowError::ComputeError("Timestamp out of range".to_string()))?;
+    let res = res.naive_utc();
+    T::make_value(res)
+        .ok_or_else(|| ArrowError::ComputeError("Timestamp out of range".to_string()))

Review Comment:
   After the last commits, the error can now be of two kinds: timestamp or interval out of range.
   
   I'm not sure if it's really important to be able to distinguish between those errors... If not we can definitely return an Option here (though we change the API even more).



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] tustvold commented on a diff in pull request #4546: Fix timezoned timestamp arithmetic

Posted by "tustvold (via GitHub)" <gi...@apache.org>.
tustvold commented on code in PR #4546:
URL: https://github.com/apache/arrow-rs/pull/4546#discussion_r1273921418


##########
arrow-array/src/types.rs:
##########
@@ -350,650 +350,6 @@ impl ArrowTimestampType for TimestampNanosecondType {
     }
 }
 
-impl TimestampSecondType {

Review Comment:
   I think removing these methods will cause non-trivial downstream friction, as they are used quite extensively by DataFusion, etc... Perhaps we could deprecate them instead of removing them?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] alexandreyc commented on a diff in pull request #4546: Fix timezoned timestamp arithmetic

Posted by "alexandreyc (via GitHub)" <gi...@apache.org>.
alexandreyc commented on code in PR #4546:
URL: https://github.com/apache/arrow-rs/pull/4546#discussion_r1272443931


##########
arrow-array/src/types.rs:
##########
@@ -454,25 +469,13 @@ impl TimestampSecondType {
     /// * `timestamp` - The date on which to perform the operation
     /// * `delta` - The interval to add
     pub fn subtract_day_time(
-        timestamp: <TimestampSecondType as ArrowPrimitiveType>::Native,
+        timestamp: <Self as ArrowPrimitiveType>::Native,
         delta: <IntervalDayTimeType as ArrowPrimitiveType>::Native,
-    ) -> Result<<TimestampSecondType as ArrowPrimitiveType>::Native, ArrowError> {
+        tz: Tz,
+    ) -> Result<<Self as ArrowPrimitiveType>::Native, ArrowError> {
         let (days, ms) = IntervalDayTimeType::to_parts(delta);
-        let res = NaiveDateTime::from_timestamp_opt(timestamp, 0).ok_or_else(|| {
-            ArrowError::ComputeError("Timestamp out of range".to_string())
-        })?;
-        let res = res
-            .checked_sub_signed(Duration::days(days as i64))
-            .ok_or_else(|| {
-                ArrowError::ComputeError("Timestamp out of range".to_string())
-            })?;
-        let res = res
-            .checked_sub_signed(Duration::milliseconds(ms as i64))
-            .ok_or_else(|| {
-                ArrowError::ComputeError("Timestamp out of range".to_string())
-            })?;
-        TimestampSecondType::make_value(res)
-            .ok_or_else(|| ArrowError::ComputeError("Timestamp out of range".to_string()))
+        let delta = IntervalDayTimeType::make_value(-days, -ms);

Review Comment:
   Right, I didn't know about `unsigned_abs`... Should be good now!



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] tustvold commented on a diff in pull request #4546: Fix timezoned timestamp arithmetic

Posted by "tustvold (via GitHub)" <gi...@apache.org>.
tustvold commented on code in PR #4546:
URL: https://github.com/apache/arrow-rs/pull/4546#discussion_r1272379514


##########
arrow-array/src/types.rs:
##########
@@ -454,25 +469,13 @@ impl TimestampSecondType {
     /// * `timestamp` - The date on which to perform the operation
     /// * `delta` - The interval to add
     pub fn subtract_day_time(
-        timestamp: <TimestampSecondType as ArrowPrimitiveType>::Native,
+        timestamp: <Self as ArrowPrimitiveType>::Native,
         delta: <IntervalDayTimeType as ArrowPrimitiveType>::Native,
-    ) -> Result<<TimestampSecondType as ArrowPrimitiveType>::Native, ArrowError> {
+        tz: Tz,
+    ) -> Result<<Self as ArrowPrimitiveType>::Native, ArrowError> {
         let (days, ms) = IntervalDayTimeType::to_parts(delta);
-        let res = NaiveDateTime::from_timestamp_opt(timestamp, 0).ok_or_else(|| {
-            ArrowError::ComputeError("Timestamp out of range".to_string())
-        })?;
-        let res = res
-            .checked_sub_signed(Duration::days(days as i64))
-            .ok_or_else(|| {
-                ArrowError::ComputeError("Timestamp out of range".to_string())
-            })?;
-        let res = res
-            .checked_sub_signed(Duration::milliseconds(ms as i64))
-            .ok_or_else(|| {
-                ArrowError::ComputeError("Timestamp out of range".to_string())
-            })?;
-        TimestampSecondType::make_value(res)
-            .ok_or_else(|| ArrowError::ComputeError("Timestamp out of range".to_string()))
+        let delta = IntervalDayTimeType::make_value(-days, -ms);

Review Comment:
   You should be able to use a combination of abs_unsigned and using the sign to select between addition and subtraction to avoid overflow?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] tustvold commented on a diff in pull request #4546: Fix timezoned timestamp arithmetic

Posted by "tustvold (via GitHub)" <gi...@apache.org>.
tustvold commented on code in PR #4546:
URL: https://github.com/apache/arrow-rs/pull/4546#discussion_r1273921418


##########
arrow-array/src/types.rs:
##########
@@ -350,650 +350,6 @@ impl ArrowTimestampType for TimestampNanosecondType {
     }
 }
 
-impl TimestampSecondType {

Review Comment:
   I think removing these methods may cause non-trivial downstream friction. Perhaps we could deprecate them as a first step instead of removing them?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] alexandreyc commented on a diff in pull request #4546: Fix timezoned timestamp arithmetic

Posted by "alexandreyc (via GitHub)" <gi...@apache.org>.
alexandreyc commented on code in PR #4546:
URL: https://github.com/apache/arrow-rs/pull/4546#discussion_r1272318457


##########
arrow-arith/Cargo.toml:
##########
@@ -46,3 +46,4 @@ num = { version = "0.4", default-features = false, features = ["std"] }
 
 [features]
 simd = ["arrow-array/simd"]
+chrono-tz = ["arrow-array/chrono-tz"]

Review Comment:
   I will do this change once we've settled on the rest.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] alexandreyc commented on a diff in pull request #4546: Fix timezoned timestamp arithmetic

Posted by "alexandreyc (via GitHub)" <gi...@apache.org>.
alexandreyc commented on code in PR #4546:
URL: https://github.com/apache/arrow-rs/pull/4546#discussion_r1272483885


##########
arrow-array/src/types.rs:
##########
@@ -350,6 +352,59 @@ impl ArrowTimestampType for TimestampNanosecondType {
     }
 }
 
+fn add_year_months<T: ArrowTimestampType>(
+    timestamp: <T as ArrowPrimitiveType>::Native,
+    delta: <IntervalYearMonthType as ArrowPrimitiveType>::Native,
+    tz: Tz,
+) -> Result<<T as ArrowPrimitiveType>::Native, ArrowError> {
+    let months = IntervalYearMonthType::to_months(delta);
+    let res = as_datetime_with_timezone::<T>(timestamp, tz)
+        .ok_or_else(|| ArrowError::ComputeError("Timestamp out of range".to_string()))?;
+    let res = shift_months_datetime(res, months)
+        .ok_or_else(|| ArrowError::ComputeError("Timestamp out of range".to_string()))?;
+    let res = res.naive_utc();
+    T::make_value(res)
+        .ok_or_else(|| ArrowError::ComputeError("Timestamp out of range".to_string()))

Review Comment:
   Done.
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] alexandreyc commented on a diff in pull request #4546: Fix timezoned timestamp arithmetic

Posted by "alexandreyc (via GitHub)" <gi...@apache.org>.
alexandreyc commented on code in PR #4546:
URL: https://github.com/apache/arrow-rs/pull/4546#discussion_r1274796434


##########
arrow-array/src/types.rs:
##########
@@ -350,650 +350,6 @@ impl ArrowTimestampType for TimestampNanosecondType {
     }
 }
 
-impl TimestampSecondType {

Review Comment:
   I reverted the last commit to undo this change because I'm not sure where to put these methods if we deprecate them and don't make `TimestampOp` public... Maybe we can create another trait? I'm not sure it's worth it...



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] tustvold commented on a diff in pull request #4546: Fix timezoned timestamp arithmetic

Posted by "tustvold (via GitHub)" <gi...@apache.org>.
tustvold commented on code in PR #4546:
URL: https://github.com/apache/arrow-rs/pull/4546#discussion_r1269591805


##########
arrow-array/src/types.rs:
##########
@@ -350,6 +352,59 @@ impl ArrowTimestampType for TimestampNanosecondType {
     }
 }
 
+fn add_year_months<T: ArrowTimestampType>(
+    timestamp: <T as ArrowPrimitiveType>::Native,
+    delta: <IntervalYearMonthType as ArrowPrimitiveType>::Native,
+    tz: Tz,
+) -> Result<<T as ArrowPrimitiveType>::Native, ArrowError> {
+    let months = IntervalYearMonthType::to_months(delta);
+    let res = as_datetime_with_timezone::<T>(timestamp, tz)
+        .ok_or_else(|| ArrowError::ComputeError("Timestamp out of range".to_string()))?;
+    let res = shift_months_datetime(res, months)
+        .ok_or_else(|| ArrowError::ComputeError("Timestamp out of range".to_string()))?;
+    let res = res.naive_utc();
+    T::make_value(res)
+        .ok_or_else(|| ArrowError::ComputeError("Timestamp out of range".to_string()))

Review Comment:
   Given the error is always the same, perhaps this method could just return an Option, and this error mapping be handled in the caller?



##########
arrow-arith/Cargo.toml:
##########
@@ -46,3 +46,4 @@ num = { version = "0.4", default-features = false, features = ["std"] }
 
 [features]
 simd = ["arrow-array/simd"]
+chrono-tz = ["arrow-array/chrono-tz"]

Review Comment:
   Typically the way we have handled this is to put these integration style tests in the top-level arrow, as opposed to introducing feature flags on child crates



##########
arrow-array/src/types.rs:
##########
@@ -454,25 +469,13 @@ impl TimestampSecondType {
     /// * `timestamp` - The date on which to perform the operation
     /// * `delta` - The interval to add
     pub fn subtract_day_time(
-        timestamp: <TimestampSecondType as ArrowPrimitiveType>::Native,
+        timestamp: <Self as ArrowPrimitiveType>::Native,
         delta: <IntervalDayTimeType as ArrowPrimitiveType>::Native,
-    ) -> Result<<TimestampSecondType as ArrowPrimitiveType>::Native, ArrowError> {
+        tz: Tz,
+    ) -> Result<<Self as ArrowPrimitiveType>::Native, ArrowError> {
         let (days, ms) = IntervalDayTimeType::to_parts(delta);
-        let res = NaiveDateTime::from_timestamp_opt(timestamp, 0).ok_or_else(|| {
-            ArrowError::ComputeError("Timestamp out of range".to_string())
-        })?;
-        let res = res
-            .checked_sub_signed(Duration::days(days as i64))
-            .ok_or_else(|| {
-                ArrowError::ComputeError("Timestamp out of range".to_string())
-            })?;
-        let res = res
-            .checked_sub_signed(Duration::milliseconds(ms as i64))
-            .ok_or_else(|| {
-                ArrowError::ComputeError("Timestamp out of range".to_string())
-            })?;
-        TimestampSecondType::make_value(res)
-            .ok_or_else(|| ArrowError::ComputeError("Timestamp out of range".to_string()))
+        let delta = IntervalDayTimeType::make_value(-days, -ms);

Review Comment:
   I'm not sure if it matters, but potentially this negation could overflow. In practice I think this will always result in timestamp overflow anyway, so perhaps this doesn't matter



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] alexandreyc commented on a diff in pull request #4546: Fix timezoned timestamp arithmetic

Posted by "alexandreyc (via GitHub)" <gi...@apache.org>.
alexandreyc commented on code in PR #4546:
URL: https://github.com/apache/arrow-rs/pull/4546#discussion_r1269879790


##########
arrow-array/src/types.rs:
##########
@@ -454,25 +469,13 @@ impl TimestampSecondType {
     /// * `timestamp` - The date on which to perform the operation
     /// * `delta` - The interval to add
     pub fn subtract_day_time(
-        timestamp: <TimestampSecondType as ArrowPrimitiveType>::Native,
+        timestamp: <Self as ArrowPrimitiveType>::Native,
         delta: <IntervalDayTimeType as ArrowPrimitiveType>::Native,
-    ) -> Result<<TimestampSecondType as ArrowPrimitiveType>::Native, ArrowError> {
+        tz: Tz,
+    ) -> Result<<Self as ArrowPrimitiveType>::Native, ArrowError> {
         let (days, ms) = IntervalDayTimeType::to_parts(delta);
-        let res = NaiveDateTime::from_timestamp_opt(timestamp, 0).ok_or_else(|| {
-            ArrowError::ComputeError("Timestamp out of range".to_string())
-        })?;
-        let res = res
-            .checked_sub_signed(Duration::days(days as i64))
-            .ok_or_else(|| {
-                ArrowError::ComputeError("Timestamp out of range".to_string())
-            })?;
-        let res = res
-            .checked_sub_signed(Duration::milliseconds(ms as i64))
-            .ok_or_else(|| {
-                ArrowError::ComputeError("Timestamp out of range".to_string())
-            })?;
-        TimestampSecondType::make_value(res)
-            .ok_or_else(|| ArrowError::ComputeError("Timestamp out of range".to_string()))
+        let delta = IntervalDayTimeType::make_value(-days, -ms);

Review Comment:
   The only way to overflow an `x: i32` with negation is when `x = i32::MIN = -2_147_483_648`.
   
   It should not be a problem for years, months or days because they would overflow the timestamp anyway. However it can be problematic for milliseconds and nanoseconds.
   
   We can use checked negation for those operations and return an "Interval out of range" error. The only downside I see is performance wise since this check must be done for every array element.
   
   What's your opinion?
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] tustvold merged pull request #4546: Fix timezoned timestamp arithmetic

Posted by "tustvold (via GitHub)" <gi...@apache.org>.
tustvold merged PR #4546:
URL: https://github.com/apache/arrow-rs/pull/4546


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] alexandreyc commented on pull request #4546: Fix timezoned timestamp arithmetic

Posted by "alexandreyc (via GitHub)" <gi...@apache.org>.
alexandreyc commented on PR #4546:
URL: https://github.com/apache/arrow-rs/pull/4546#issuecomment-1650082711

   I just pushed a commit that make the code simpler and shorter (IMHO).
   
   In summary, I moved all arithmetic methods to the `TimestampOp` trait. This introduces API changes because users now need to have the trait in scope to be able to call these methods. I also slightly renamed some methods.
   
   What's your opinion on this change?
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] tustvold commented on a diff in pull request #4546: Fix timezoned timestamp arithmetic

Posted by "tustvold (via GitHub)" <gi...@apache.org>.
tustvold commented on code in PR #4546:
URL: https://github.com/apache/arrow-rs/pull/4546#discussion_r1273920965


##########
arrow-arith/src/numeric.rs:
##########
@@ -341,50 +344,201 @@ fn float_op<T: ArrowPrimitiveType>(
     Ok(Arc::new(array))
 }
 
-/// Arithmetic trait for timestamp arrays
-trait TimestampOp: ArrowTimestampType {
-    type Duration: ArrowPrimitiveType<Native = i64>;
+/// Add the given number of months to the given datetime.
+///
+/// Returns `None` when it will result in overflow.
+fn add_months_datetime<Tz: TimeZone>(
+    dt: DateTime<Tz>,
+    months: i32,
+) -> Option<DateTime<Tz>> {
+    match months.cmp(&0) {
+        Ordering::Equal => Some(dt),
+        Ordering::Greater => dt.checked_add_months(Months::new(months as u32)),
+        Ordering::Less => dt.checked_sub_months(Months::new(months.unsigned_abs())),
+    }
+}
 
-    fn add_year_month(timestamp: i64, delta: i32) -> Result<i64, ArrowError>;
-    fn add_day_time(timestamp: i64, delta: i64) -> Result<i64, ArrowError>;
-    fn add_month_day_nano(timestamp: i64, delta: i128) -> Result<i64, ArrowError>;
+/// Add the given number of days to the given datetime.
+///
+/// Returns `None` when it will result in overflow.
+fn add_days_datetime<Tz: TimeZone>(dt: DateTime<Tz>, days: i32) -> Option<DateTime<Tz>> {
+    match days.cmp(&0) {
+        Ordering::Equal => Some(dt),
+        Ordering::Greater => dt.checked_add_days(Days::new(days as u64)),
+        Ordering::Less => dt.checked_sub_days(Days::new(days.unsigned_abs() as u64)),
+    }
+}
 
-    fn sub_year_month(timestamp: i64, delta: i32) -> Result<i64, ArrowError>;
-    fn sub_day_time(timestamp: i64, delta: i64) -> Result<i64, ArrowError>;
-    fn sub_month_day_nano(timestamp: i64, delta: i128) -> Result<i64, ArrowError>;
+/// Substract the given number of months to the given datetime.
+///
+/// Returns `None` when it will result in overflow.
+fn sub_months_datetime<Tz: TimeZone>(
+    dt: DateTime<Tz>,
+    months: i32,
+) -> Option<DateTime<Tz>> {
+    match months.cmp(&0) {
+        Ordering::Equal => Some(dt),
+        Ordering::Greater => dt.checked_sub_months(Months::new(months as u32)),
+        Ordering::Less => dt.checked_add_months(Months::new(months.unsigned_abs())),
+    }
 }
 
-macro_rules! timestamp {
-    ($t:ty, $d:ty) => {
-        impl TimestampOp for $t {
-            type Duration = $d;
+/// Substract the given number of days to the given datetime.
+///
+/// Returns `None` when it will result in overflow.
+fn sub_days_datetime<Tz: TimeZone>(dt: DateTime<Tz>, days: i32) -> Option<DateTime<Tz>> {
+    match days.cmp(&0) {
+        Ordering::Equal => Some(dt),
+        Ordering::Greater => dt.checked_sub_days(Days::new(days as u64)),
+        Ordering::Less => dt.checked_add_days(Days::new(days.unsigned_abs() as u64)),
+    }
+}
 
-            fn add_year_month(left: i64, right: i32) -> Result<i64, ArrowError> {
-                Self::add_year_months(left, right)
-            }
+/// Arithmetic trait for timestamp arrays
+pub trait TimestampOp: ArrowTimestampType + Sized {

Review Comment:
   I'm not a fan of making this trait public, as it is intended as an internal implementation detail of how the kernel chooses to implement arithmetic. I could definitely see it evolving in future to support vectorisation or some other extensions



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] tustvold commented on pull request #4546: Fix timezoned timestamp arithmetic

Posted by "tustvold (via GitHub)" <gi...@apache.org>.
tustvold commented on PR #4546:
URL: https://github.com/apache/arrow-rs/pull/4546#issuecomment-1652066631

   Thank you this looks good to me, I will file a follow on PR to remove the chrono-tz feature


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] tustvold commented on a diff in pull request #4546: Fix timezoned timestamp arithmetic

Posted by "tustvold (via GitHub)" <gi...@apache.org>.
tustvold commented on code in PR #4546:
URL: https://github.com/apache/arrow-rs/pull/4546#discussion_r1275179427


##########
arrow-arith/Cargo.toml:
##########
@@ -46,3 +46,4 @@ num = { version = "0.4", default-features = false, features = ["std"] }
 
 [features]
 simd = ["arrow-array/simd"]
+chrono-tz = ["arrow-array/chrono-tz"]

Review Comment:
   https://github.com/apache/arrow-rs/pull/4571
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] alexandreyc commented on a diff in pull request #4546: Fix timezoned timestamp arithmetic

Posted by "alexandreyc (via GitHub)" <gi...@apache.org>.
alexandreyc commented on code in PR #4546:
URL: https://github.com/apache/arrow-rs/pull/4546#discussion_r1272311550


##########
arrow-array/src/types.rs:
##########
@@ -454,25 +469,13 @@ impl TimestampSecondType {
     /// * `timestamp` - The date on which to perform the operation
     /// * `delta` - The interval to add
     pub fn subtract_day_time(
-        timestamp: <TimestampSecondType as ArrowPrimitiveType>::Native,
+        timestamp: <Self as ArrowPrimitiveType>::Native,
         delta: <IntervalDayTimeType as ArrowPrimitiveType>::Native,
-    ) -> Result<<TimestampSecondType as ArrowPrimitiveType>::Native, ArrowError> {
+        tz: Tz,
+    ) -> Result<<Self as ArrowPrimitiveType>::Native, ArrowError> {
         let (days, ms) = IntervalDayTimeType::to_parts(delta);
-        let res = NaiveDateTime::from_timestamp_opt(timestamp, 0).ok_or_else(|| {
-            ArrowError::ComputeError("Timestamp out of range".to_string())
-        })?;
-        let res = res
-            .checked_sub_signed(Duration::days(days as i64))
-            .ok_or_else(|| {
-                ArrowError::ComputeError("Timestamp out of range".to_string())
-            })?;
-        let res = res
-            .checked_sub_signed(Duration::milliseconds(ms as i64))
-            .ok_or_else(|| {
-                ArrowError::ComputeError("Timestamp out of range".to_string())
-            })?;
-        TimestampSecondType::make_value(res)
-            .ok_or_else(|| ArrowError::ComputeError("Timestamp out of range".to_string()))
+        let delta = IntervalDayTimeType::make_value(-days, -ms);

Review Comment:
   I think that ultimately we cannot avoid a checked negation because chrono's `Months::new` accepts only an `u32` and arrow's months are `i32`. Same thing for days.
   
   I've added checked negation everywhere it's needed. 
   
   I may have missed a simple solution without checked ops... Feel free to suggest any idea!
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] tustvold commented on a diff in pull request #4546: Fix timezoned timestamp arithmetic

Posted by "tustvold (via GitHub)" <gi...@apache.org>.
tustvold commented on code in PR #4546:
URL: https://github.com/apache/arrow-rs/pull/4546#discussion_r1273920965


##########
arrow-arith/src/numeric.rs:
##########
@@ -341,50 +344,201 @@ fn float_op<T: ArrowPrimitiveType>(
     Ok(Arc::new(array))
 }
 
-/// Arithmetic trait for timestamp arrays
-trait TimestampOp: ArrowTimestampType {
-    type Duration: ArrowPrimitiveType<Native = i64>;
+/// Add the given number of months to the given datetime.
+///
+/// Returns `None` when it will result in overflow.
+fn add_months_datetime<Tz: TimeZone>(
+    dt: DateTime<Tz>,
+    months: i32,
+) -> Option<DateTime<Tz>> {
+    match months.cmp(&0) {
+        Ordering::Equal => Some(dt),
+        Ordering::Greater => dt.checked_add_months(Months::new(months as u32)),
+        Ordering::Less => dt.checked_sub_months(Months::new(months.unsigned_abs())),
+    }
+}
 
-    fn add_year_month(timestamp: i64, delta: i32) -> Result<i64, ArrowError>;
-    fn add_day_time(timestamp: i64, delta: i64) -> Result<i64, ArrowError>;
-    fn add_month_day_nano(timestamp: i64, delta: i128) -> Result<i64, ArrowError>;
+/// Add the given number of days to the given datetime.
+///
+/// Returns `None` when it will result in overflow.
+fn add_days_datetime<Tz: TimeZone>(dt: DateTime<Tz>, days: i32) -> Option<DateTime<Tz>> {
+    match days.cmp(&0) {
+        Ordering::Equal => Some(dt),
+        Ordering::Greater => dt.checked_add_days(Days::new(days as u64)),
+        Ordering::Less => dt.checked_sub_days(Days::new(days.unsigned_abs() as u64)),
+    }
+}
 
-    fn sub_year_month(timestamp: i64, delta: i32) -> Result<i64, ArrowError>;
-    fn sub_day_time(timestamp: i64, delta: i64) -> Result<i64, ArrowError>;
-    fn sub_month_day_nano(timestamp: i64, delta: i128) -> Result<i64, ArrowError>;
+/// Substract the given number of months to the given datetime.
+///
+/// Returns `None` when it will result in overflow.
+fn sub_months_datetime<Tz: TimeZone>(
+    dt: DateTime<Tz>,
+    months: i32,
+) -> Option<DateTime<Tz>> {
+    match months.cmp(&0) {
+        Ordering::Equal => Some(dt),
+        Ordering::Greater => dt.checked_sub_months(Months::new(months as u32)),
+        Ordering::Less => dt.checked_add_months(Months::new(months.unsigned_abs())),
+    }
 }
 
-macro_rules! timestamp {
-    ($t:ty, $d:ty) => {
-        impl TimestampOp for $t {
-            type Duration = $d;
+/// Substract the given number of days to the given datetime.
+///
+/// Returns `None` when it will result in overflow.
+fn sub_days_datetime<Tz: TimeZone>(dt: DateTime<Tz>, days: i32) -> Option<DateTime<Tz>> {
+    match days.cmp(&0) {
+        Ordering::Equal => Some(dt),
+        Ordering::Greater => dt.checked_sub_days(Days::new(days as u64)),
+        Ordering::Less => dt.checked_add_days(Days::new(days.unsigned_abs() as u64)),
+    }
+}
 
-            fn add_year_month(left: i64, right: i32) -> Result<i64, ArrowError> {
-                Self::add_year_months(left, right)
-            }
+/// Arithmetic trait for timestamp arrays
+pub trait TimestampOp: ArrowTimestampType + Sized {

Review Comment:
   I'm not a fan of making this trait public, as it is intended as an internal implementation detail of how the kernel chooses to implement arithmetic. I could definitely see it evolving in future to support vectorisation or some other extensions.
   
   I think I would prefer to just expose the dyn kernels



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] tustvold commented on a diff in pull request #4546: Fix timezoned timestamp arithmetic

Posted by "tustvold (via GitHub)" <gi...@apache.org>.
tustvold commented on code in PR #4546:
URL: https://github.com/apache/arrow-rs/pull/4546#discussion_r1272377392


##########
arrow-array/src/types.rs:
##########
@@ -350,6 +352,59 @@ impl ArrowTimestampType for TimestampNanosecondType {
     }
 }
 
+fn add_year_months<T: ArrowTimestampType>(
+    timestamp: <T as ArrowPrimitiveType>::Native,
+    delta: <IntervalYearMonthType as ArrowPrimitiveType>::Native,
+    tz: Tz,
+) -> Result<<T as ArrowPrimitiveType>::Native, ArrowError> {
+    let months = IntervalYearMonthType::to_months(delta);
+    let res = as_datetime_with_timezone::<T>(timestamp, tz)
+        .ok_or_else(|| ArrowError::ComputeError("Timestamp out of range".to_string()))?;
+    let res = shift_months_datetime(res, months)
+        .ok_or_else(|| ArrowError::ComputeError("Timestamp out of range".to_string()))?;
+    let res = res.naive_utc();
+    T::make_value(res)
+        .ok_or_else(|| ArrowError::ComputeError("Timestamp out of range".to_string()))

Review Comment:
   I don't think distinguishing them is important



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] tustvold commented on a diff in pull request #4546: Fix timezoned timestamp arithmetic

Posted by "tustvold (via GitHub)" <gi...@apache.org>.
tustvold commented on code in PR #4546:
URL: https://github.com/apache/arrow-rs/pull/4546#discussion_r1273921418


##########
arrow-array/src/types.rs:
##########
@@ -350,650 +350,6 @@ impl ArrowTimestampType for TimestampNanosecondType {
     }
 }
 
-impl TimestampSecondType {

Review Comment:
   I think removing these methods will cause non-trivial downstream friction, as they are used quite extensively by DataFusion, etc...



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org