You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by "berkaysynnada (via GitHub)" <gi...@apache.org> on 2023/03/28 14:58:45 UTC

[GitHub] [arrow-datafusion] berkaysynnada opened a new pull request, #5764: timestamp interval arithmetic query

berkaysynnada opened a new pull request, #5764:
URL: https://github.com/apache/arrow-datafusion/pull/5764

   # Which issue does this PR close?
   
   <!--
   We generally require a GitHub issue to be filed for all bug fixes and enhancements and this helps us generate change logs for our releases. You can link an issue to this PR using the GitHub syntax. For example `Closes #123` indicates that this PR will close issue #123.
   -->
   
   Closes #5704 and #194.
   
   # Rationale for this change
   
   <!--
    Why are you proposing this change? If this is already explained clearly in the issue then this section is not needed.
    Explaining clearly why changes are proposed helps reviewers understand your changes and offer better suggestions for fixes.  
   -->
   
   We can handle such queries now:
   `SELECT val, ts1 - ts2 AS ts_diff FROM table_a`
   `SELECT val, interval1 - interval2 AS interval_diff FROM table_a`
   `SELECT val, ts1 - interval1 AS ts_interval_diff FROM table_a`
   `SELECT val, interval1 + ts1 AS interval_ts_sub FROM table_a`
   
   
   
   # What changes are included in this PR?
   
   <!--
   There is no need to duplicate the description in the issue here but it is sometimes worth providing a summary of the individual changes in this PR.
   -->
   
   
   
     | - | +
   -- | -- | --
   timestamp op timestamp (same type) | OK: second and millisecond types give results in daytime(day+millisecond), microsecond and nanosecond types give result in monthdaynano(month+day+nano, but month field is not used) | N/A
   timestamp op timestamp (different types) | N/A | N/A
   interval op interval (same type) | OK: operations are done field by field, gives the same type | OK: operations are done field by field, gives the same type
   interval op interval (different types) | OK: give result in monthdaynano type | OK: give result in monthdaynano type
   timestamp op interval | OK: give result in the type of the timestamp | OK: give result in the type of the timestamp
   interval op timestamp | N/A | OK: the same of timestamp + interval 
   
   
   Some match expressions in planner.rs, binary.rs, and datetime.rs are extended. Coerced types and allowable operations are shown in the table. 
   
   I try to use existing scalar value functions as much as possible to not duplicate. However, in arrow.rs, subtraction and addition functions are for numeric types, hence I need to add some functions to call with `binary` function. 
   
   In datetime.rs, the `evaluate` function was written to accept only "Array + Scalar" or "Scalar + Scalar" values to evaluate. It is extended to accept "Array + Array", and 4 different variations of that case (Timestamp op Timestamp, Interval op Interval, Timestamp op Interval, Interval op Timestamp) are implemented. "Array + Scalar" evaluations are done with `unary` function in arrow.rs, and I follow the similar pattern with `try_binary_op` function. `try_binary_op` function is a modified version of `binary` function in arrow-rs. The only difference is that it returns `Result` and creates the buffer with `try_from_trusted_len_iter`. Otherwise, we had to unwrap the op function sent to binary.
   
   # Are these changes tested?
   
   <!--
   We typically require tests for all PRs in order to:
   1. Prevent the code from being accidentally broken by subsequent changes
   2. Serve as another way to document the expected behavior of the code
   
   If tests are not included in your PR, please explain why (for example, are they covered by existing tests)?
   -->
   
   Yes, there are tests for each match in timestamp.slt. However, tables with intervals cannot be created like `INTERVAL '1 second'`, since some work is needed in arrow-rs for casting. Timestamp difference case with timezone is also left in timestamp.rs because of a similar reason.
   
   # Are there any user-facing changes?
   
   <!--
   If there are user-facing changes then we may require documentation to be updated before approving the PR.
   -->
   
   <!--
   If there are any breaking changes to public APIs, please add the `api change` label.
   -->


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] tustvold commented on a diff in pull request #5764: Support timestamp and interval arithmetic

Posted by "tustvold (via GitHub)" <gi...@apache.org>.
tustvold commented on code in PR #5764:
URL: https://github.com/apache/arrow-datafusion/pull/5764#discussion_r1152867872


##########
datafusion/common/src/scalar.rs:
##########
@@ -919,12 +1056,14 @@ fn with_timezone_to_naive_datetime(
 /// This function creates the [`NaiveDateTime`] object corresponding to the
 /// given timestamp, whose tick size is specified by `UNIT_NANOS`.
 #[inline]
-fn ticks_to_naive_datetime<const UNIT_NANOS: i64>(ticks: i64) -> Result<NaiveDateTime> {
-    NaiveDateTime::from_timestamp_opt(
-        (ticks * UNIT_NANOS) / 1_000_000_000,
-        ((ticks * UNIT_NANOS) % 1_000_000_000) as u32,
-    )
-    .ok_or_else(|| {
+fn ticks_to_naive_datetime<const UNIT_NANOS: i128>(ticks: i64) -> Result<NaiveDateTime> {

Review Comment:
   I think if we can't parse the timezone we should return an error, silently continuing I think would potentially be quite confusing



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] berkaysynnada commented on a diff in pull request #5764: Support timestamp and interval arithmetic

Posted by "berkaysynnada (via GitHub)" <gi...@apache.org>.
berkaysynnada commented on code in PR #5764:
URL: https://github.com/apache/arrow-datafusion/pull/5764#discussion_r1152826516


##########
datafusion/common/src/scalar.rs:
##########
@@ -919,12 +1056,14 @@ fn with_timezone_to_naive_datetime(
 /// This function creates the [`NaiveDateTime`] object corresponding to the
 /// given timestamp, whose tick size is specified by `UNIT_NANOS`.
 #[inline]
-fn ticks_to_naive_datetime<const UNIT_NANOS: i64>(ticks: i64) -> Result<NaiveDateTime> {
-    NaiveDateTime::from_timestamp_opt(
-        (ticks * UNIT_NANOS) / 1_000_000_000,
-        ((ticks * UNIT_NANOS) % 1_000_000_000) as u32,
-    )
-    .ok_or_else(|| {
+fn ticks_to_naive_datetime<const UNIT_NANOS: i128>(ticks: i64) -> Result<NaiveDateTime> {

Review Comment:
   If one of the timestamps has an incompatible timezone, or has not timezone data, they are both treated as they don't have a timezone data. Is it an acceptable behavior?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] berkaysynnada commented on pull request #5764: Support timestamp and interval arithmetic

Posted by "berkaysynnada (via GitHub)" <gi...@apache.org>.
berkaysynnada commented on PR #5764:
URL: https://github.com/apache/arrow-datafusion/pull/5764#issuecomment-1490122844

   @tustvold I tried to fix the issues you mention, can you please take a quick look?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] alamb commented on pull request #5764: Support timestamp and interval arithmetic

Posted by "alamb (via GitHub)" <gi...@apache.org>.
alamb commented on PR #5764:
URL: https://github.com/apache/arrow-datafusion/pull/5764#issuecomment-1492240696

   >  I plan to insert that removal work to sym hash join PR, if it is not a problem.
   
   If possible, I would recommend a separate PR (that your sym hash join builds on) that moves the code -- this should speed up reviews as each will be smaller and more focused


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] berkaysynnada commented on pull request #5764: Support timestamp and interval arithmetic

Posted by "berkaysynnada (via GitHub)" <gi...@apache.org>.
berkaysynnada commented on PR #5764:
URL: https://github.com/apache/arrow-datafusion/pull/5764#issuecomment-1492214820

   > Thank you @berkaysynnada 🙇 . I think this issue:
   > 
   > > removing the arithmetic code to binary.rs
   > 
   > This is the most valuable part in my opinion as it pays down tech debt and sets us up for a easier migration / porting of the code upstream to arrow-rs -- and since you probably still have all the timestamp / kernel context in your head, you are probably likely to do it more quickly than someone who needs to get up to speed
   
   I am working on symmetric hash join with temporal type inputs, and hence I need to modify `evaluate_array` function in datetime.rs, where the evaluations of Array vs. Scalar values are done (newly added match arms here also use some of these arithmetic functions). I plan to insert that removal work to sym hash join PR, if it is not a problem.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] tustvold commented on pull request #5764: timestamp interval arithmetic query

Posted by "tustvold (via GitHub)" <gi...@apache.org>.
tustvold commented on PR #5764:
URL: https://github.com/apache/arrow-datafusion/pull/5764#issuecomment-1488268241

   If I add 1 month to `2023-03-01T00:00:00 +02` is the output `2023-04-01T00:00:00 +02` or `2023-03-29T00:00:00 +02`.
   
   I _believe_ this PR implements the latter as it performs arithmetic with respect to the UTC epoch? Is that the desired behaviour?
   
   https://play.rust-lang.org/?version=stable&mode=debug&edition=2021&gist=11112707c7217ecca3ef64ceb984beb6 contains an example showing the difference


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] tustvold commented on a diff in pull request #5764: timestamp interval arithmetic query

Posted by "tustvold (via GitHub)" <gi...@apache.org>.
tustvold commented on code in PR #5764:
URL: https://github.com/apache/arrow-datafusion/pull/5764#discussion_r1151156279


##########
datafusion/physical-expr/src/expressions/datetime.rs:
##########
@@ -239,6 +251,542 @@ pub fn evaluate_array(
     Ok(ColumnarValue::Array(ret))
 }
 
+macro_rules! ts_sub_op {
+    ($lhs:ident, $rhs:ident, $lhs_tz:ident, $rhs_tz:ident, $coef:expr, $caster:expr, $op:expr, $ts_unit:expr, $mode:expr, $type_in:ty, $type_out:ty) => {{
+        let prim_array_lhs = $caster(&$lhs)?;
+        let prim_array_rhs = $caster(&$rhs)?;
+        let ret = Arc::new(try_binary_op::<$type_in, $type_in, _, $type_out>(
+            prim_array_lhs,
+            prim_array_rhs,
+            |ts1, ts2| {
+                let (lhs_tz, rhs_tz) =
+                    (parse_timezones($lhs_tz), parse_timezones($rhs_tz));
+                Ok($op(
+                    $ts_unit(&with_timezone_to_naive_datetime::<$mode>(
+                        ts1.mul_wrapping($coef),
+                        &lhs_tz,
+                    )?),
+                    $ts_unit(&with_timezone_to_naive_datetime::<$mode>(
+                        ts2.mul_wrapping($coef),
+                        &rhs_tz,
+                    )?),
+                ))
+            },
+        )?) as ArrayRef;
+        ret
+    }};
+}
+macro_rules! interval_op {
+    ($lhs:ident, $rhs:ident, $caster:expr, $op:expr, $sign:ident, $type_in:ty) => {{
+        let prim_array_lhs = $caster(&$lhs)?;
+        let prim_array_rhs = $caster(&$rhs)?;
+        let ret = Arc::new(binary::<$type_in, $type_in, _, $type_in>(
+            prim_array_lhs,
+            prim_array_rhs,
+            |interval1, interval2| $op(interval1, interval2, $sign),
+        )?) as ArrayRef;
+        ret
+    }};
+}
+macro_rules! interval_cross_op {
+    ($lhs:ident, $rhs:ident, $caster1:expr, $caster2:expr, $op:expr, $sign:ident, $commute:ident, $type_in1:ty, $type_in2:ty) => {{
+        let prim_array_lhs = $caster1(&$lhs)?;
+        let prim_array_rhs = $caster2(&$rhs)?;
+        let ret = Arc::new(binary::<$type_in1, $type_in2, _, IntervalMonthDayNanoType>(
+            prim_array_lhs,
+            prim_array_rhs,
+            |interval1, interval2| $op(interval1, interval2, $sign, $commute),
+        )?) as ArrayRef;
+        ret
+    }};
+}
+macro_rules! ts_interval_op {
+    ($lhs:ident, $rhs:ident, $caster1:expr, $caster2:expr, $op:expr, $sign:ident, $type_in1:ty, $type_in2:ty) => {{
+        let prim_array_lhs = $caster1(&$lhs)?;
+        let prim_array_rhs = $caster2(&$rhs)?;
+        let ret = Arc::new(try_binary_op::<$type_in1, $type_in2, _, $type_in1>(
+            prim_array_lhs,
+            prim_array_rhs,
+            |ts, interval| Ok($op(ts, interval as i128, $sign)?),
+        )?) as ArrayRef;
+        ret
+    }};
+}
+// This function evaluates temporal array operations, such as timestamp - timestamp, interval + interval,
+// timestamp + interval, and interval + timestamp. It takes two arrays as input and an integer sign representing
+// the operation (+1 for addition and -1 for subtraction). It returns a ColumnarValue as output, which can hold
+// either a scalar or an array.
+pub fn evaluate_temporal_arrays(
+    array_lhs: &ArrayRef,
+    sign: i32,
+    array_rhs: &ArrayRef,
+) -> Result<ColumnarValue> {
+    let ret = match (array_lhs.data_type(), array_rhs.data_type()) {
+        // Timestamp - Timestamp operations, operands of only the same types are supported.
+        (DataType::Timestamp(_, _), DataType::Timestamp(_, _)) => {
+            ts_array_op(array_lhs, array_rhs)?
+        }
+        // Interval (+ , -) Interval operations
+        (DataType::Interval(_), DataType::Interval(_)) => {
+            interval_array_op(array_lhs, array_rhs, sign)?
+        }
+        // Timestamp (+ , -) Interval and Interval + Timestamp operations
+        // Interval - Timestamp operation is not rational hence not supported
+        (DataType::Timestamp(_, _), DataType::Interval(_)) => {
+            ts_interval_array_op(array_lhs, sign, array_rhs)?
+        }
+        (DataType::Interval(_), DataType::Timestamp(_, _)) if sign == 1 => {
+            ts_interval_array_op(array_rhs, sign, array_lhs)?
+        }
+        (_, _) => Err(DataFusionError::Execution(format!(
+            "Invalid array types for DateIntervalExpr: {:?} {} {:?}",
+            array_lhs.data_type(),
+            sign,
+            array_rhs.data_type()
+        )))?,
+    };
+    Ok(ColumnarValue::Array(ret))
+}
+
+#[inline]
+unsafe fn build_primitive_array<O: ArrowPrimitiveType>(
+    len: usize,
+    buffer: Buffer,
+    null_count: usize,
+    null_buffer: Option<Buffer>,
+) -> PrimitiveArray<O> {
+    PrimitiveArray::from(ArrayData::new_unchecked(
+        O::DATA_TYPE,
+        len,
+        Some(null_count),
+        null_buffer,
+        0,
+        vec![buffer],
+        vec![],
+    ))
+}
+
+pub fn try_binary_op<A, B, F, O>(

Review Comment:
   https://docs.rs/arrow-arith/latest/arrow_arith/arity/fn.try_binary.html



##########
datafusion/common/src/scalar.rs:
##########
@@ -885,24 +1011,35 @@ fn ts_sub_to_interval(
     }
 }
 
+// This function parses the timezone from string to Tz.
+// If it cannot parse or timezone field is [`None`], it returns [`None`].
+pub fn parse_timezones(tz: &Option<String>) -> Option<Tz> {
+    if let Some(tz) = tz {
+        let parsed_tz: Option<Tz> = FromStr::from_str(tz)
+            .map_err(|_| {

Review Comment:
   Why map_err only to discard it?



##########
datafusion/common/src/scalar.rs:
##########
@@ -919,12 +1056,14 @@ fn with_timezone_to_naive_datetime(
 /// This function creates the [`NaiveDateTime`] object corresponding to the
 /// given timestamp, whose tick size is specified by `UNIT_NANOS`.
 #[inline]
-fn ticks_to_naive_datetime<const UNIT_NANOS: i64>(ticks: i64) -> Result<NaiveDateTime> {
-    NaiveDateTime::from_timestamp_opt(
-        (ticks * UNIT_NANOS) / 1_000_000_000,
-        ((ticks * UNIT_NANOS) % 1_000_000_000) as u32,
-    )
-    .ok_or_else(|| {
+fn ticks_to_naive_datetime<const UNIT_NANOS: i128>(ticks: i64) -> Result<NaiveDateTime> {

Review Comment:
   https://docs.rs/arrow-array/latest/arrow_array/temporal_conversions/fn.as_datetime.html



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] alamb commented on a diff in pull request #5764: Support timestamp and interval arithmetic

Posted by "alamb (via GitHub)" <gi...@apache.org>.
alamb commented on code in PR #5764:
URL: https://github.com/apache/arrow-datafusion/pull/5764#discussion_r1153265676


##########
datafusion/common/src/scalar.rs:
##########
@@ -1011,18 +1011,82 @@ fn ts_sub_to_interval<const TIME_MOD: bool>(
     }
 }
 
-// This function parses the timezone from string to Tz.
-// If it cannot parse or timezone field is [`None`], it returns [`None`].
-pub fn parse_timezones(tz: &Option<String>) -> Option<Tz> {
+/// This function parses the timezone from string to Tz.
+/// If it cannot parse or timezone field is [`None`], it returns [`None`].
+pub fn parse_timezones(tz: &Option<String>) -> Result<Option<Tz>> {
     if let Some(tz) = tz {
-        let parsed_tz: Option<Tz> = FromStr::from_str(tz)
-            .map_err(|_| {
-                DataFusionError::Execution("cannot parse given timezone".to_string())
-            })
-            .ok();
-        parsed_tz
+        let parsed_tz: Tz = FromStr::from_str(tz).map_err(|_| {
+            DataFusionError::Execution("cannot parse given timezone".to_string())

Review Comment:
   It would be nice if the error contained the problematic timezone. Something like
   ```rust
           let parsed_tz: Tz = FromStr::from_str(tz).map_err(|e| {
               DataFusionError::Execution(format!("cannot parse '{tz}' as timezone: {e}".to_string())
   ```
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] alamb commented on a diff in pull request #5764: Support timestamp and interval arithmetic

Posted by "alamb (via GitHub)" <gi...@apache.org>.
alamb commented on code in PR #5764:
URL: https://github.com/apache/arrow-datafusion/pull/5764#discussion_r1153379384


##########
datafusion/physical-expr/Cargo.toml:
##########
@@ -44,6 +44,7 @@ unicode_expressions = ["unicode-segmentation"]
 [dependencies]
 ahash = { version = "0.8", default-features = false, features = ["runtime-rng"] }
 arrow = { workspace = true }
+arrow-array = { version = "34.0.0", default-features = false, features = ["chrono-tz"] }

Review Comment:
   In https://github.com/apache/arrow-datafusion/pull/5794



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] berkaysynnada commented on pull request #5764: timestamp interval arithmetic query

Posted by "berkaysynnada (via GitHub)" <gi...@apache.org>.
berkaysynnada commented on PR #5764:
URL: https://github.com/apache/arrow-datafusion/pull/5764#issuecomment-1488320187

   > > If I add 1 month to 2023-03-01T00:00:00 +02 is the output 2023-04-01T00:00:00 +02 or 2023-03-29T00:00:00 +02.
   > 
   > If I'm not mistaken, pgsql's result will be the former.
   
   Yes, it alters only the month and year fields for month arithmetics. This PR behaves in the same way.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] alamb commented on pull request #5764: timestamp interval arithmetic query

Posted by "alamb (via GitHub)" <gi...@apache.org>.
alamb commented on PR #5764:
URL: https://github.com/apache/arrow-datafusion/pull/5764#issuecomment-1487566497

   I plan to review this carefully tomorrow


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] alamb commented on a diff in pull request #5764: Support timestamp and interval arithmetic

Posted by "alamb (via GitHub)" <gi...@apache.org>.
alamb commented on code in PR #5764:
URL: https://github.com/apache/arrow-datafusion/pull/5764#discussion_r1153370777


##########
datafusion/core/tests/sqllogictests/test_files/timestamps.slt:
##########
@@ -261,6 +261,110 @@ SELECT INTERVAL '8' MONTH + '2000-01-01T00:00:00'::timestamp;
 ----
 2000-09-01T00:00:00
 
+# Interval columns are created with timestamp subtraction in subquery since they are not supported yet

Review Comment:
   I think with https://github.com/apache/arrow-datafusion/pull/5792 we can now write better tests here -- specifically we can create interval constants. 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] tustvold commented on a diff in pull request #5764: timestamp interval arithmetic query

Posted by "tustvold (via GitHub)" <gi...@apache.org>.
tustvold commented on code in PR #5764:
URL: https://github.com/apache/arrow-datafusion/pull/5764#discussion_r1151698714


##########
datafusion/physical-expr/src/expressions/datetime.rs:
##########
@@ -239,6 +251,542 @@ pub fn evaluate_array(
     Ok(ColumnarValue::Array(ret))
 }
 
+macro_rules! ts_sub_op {
+    ($lhs:ident, $rhs:ident, $lhs_tz:ident, $rhs_tz:ident, $coef:expr, $caster:expr, $op:expr, $ts_unit:expr, $mode:expr, $type_in:ty, $type_out:ty) => {{
+        let prim_array_lhs = $caster(&$lhs)?;
+        let prim_array_rhs = $caster(&$rhs)?;
+        let ret = Arc::new(try_binary_op::<$type_in, $type_in, _, $type_out>(
+            prim_array_lhs,
+            prim_array_rhs,
+            |ts1, ts2| {
+                let (lhs_tz, rhs_tz) =
+                    (parse_timezones($lhs_tz), parse_timezones($rhs_tz));
+                Ok($op(
+                    $ts_unit(&with_timezone_to_naive_datetime::<$mode>(
+                        ts1.mul_wrapping($coef),
+                        &lhs_tz,
+                    )?),
+                    $ts_unit(&with_timezone_to_naive_datetime::<$mode>(
+                        ts2.mul_wrapping($coef),
+                        &rhs_tz,
+                    )?),
+                ))
+            },
+        )?) as ArrayRef;
+        ret
+    }};
+}
+macro_rules! interval_op {
+    ($lhs:ident, $rhs:ident, $caster:expr, $op:expr, $sign:ident, $type_in:ty) => {{
+        let prim_array_lhs = $caster(&$lhs)?;
+        let prim_array_rhs = $caster(&$rhs)?;
+        let ret = Arc::new(binary::<$type_in, $type_in, _, $type_in>(
+            prim_array_lhs,
+            prim_array_rhs,
+            |interval1, interval2| $op(interval1, interval2, $sign),
+        )?) as ArrayRef;
+        ret
+    }};
+}
+macro_rules! interval_cross_op {
+    ($lhs:ident, $rhs:ident, $caster1:expr, $caster2:expr, $op:expr, $sign:ident, $commute:ident, $type_in1:ty, $type_in2:ty) => {{
+        let prim_array_lhs = $caster1(&$lhs)?;
+        let prim_array_rhs = $caster2(&$rhs)?;
+        let ret = Arc::new(binary::<$type_in1, $type_in2, _, IntervalMonthDayNanoType>(
+            prim_array_lhs,
+            prim_array_rhs,
+            |interval1, interval2| $op(interval1, interval2, $sign, $commute),
+        )?) as ArrayRef;
+        ret
+    }};
+}
+macro_rules! ts_interval_op {
+    ($lhs:ident, $rhs:ident, $caster1:expr, $caster2:expr, $op:expr, $sign:ident, $type_in1:ty, $type_in2:ty) => {{
+        let prim_array_lhs = $caster1(&$lhs)?;
+        let prim_array_rhs = $caster2(&$rhs)?;
+        let ret = Arc::new(try_binary_op::<$type_in1, $type_in2, _, $type_in1>(
+            prim_array_lhs,
+            prim_array_rhs,
+            |ts, interval| Ok($op(ts, interval as i128, $sign)?),
+        )?) as ArrayRef;
+        ret
+    }};
+}
+// This function evaluates temporal array operations, such as timestamp - timestamp, interval + interval,
+// timestamp + interval, and interval + timestamp. It takes two arrays as input and an integer sign representing
+// the operation (+1 for addition and -1 for subtraction). It returns a ColumnarValue as output, which can hold
+// either a scalar or an array.
+pub fn evaluate_temporal_arrays(
+    array_lhs: &ArrayRef,
+    sign: i32,
+    array_rhs: &ArrayRef,
+) -> Result<ColumnarValue> {
+    let ret = match (array_lhs.data_type(), array_rhs.data_type()) {
+        // Timestamp - Timestamp operations, operands of only the same types are supported.
+        (DataType::Timestamp(_, _), DataType::Timestamp(_, _)) => {
+            ts_array_op(array_lhs, array_rhs)?
+        }
+        // Interval (+ , -) Interval operations
+        (DataType::Interval(_), DataType::Interval(_)) => {
+            interval_array_op(array_lhs, array_rhs, sign)?
+        }
+        // Timestamp (+ , -) Interval and Interval + Timestamp operations
+        // Interval - Timestamp operation is not rational hence not supported
+        (DataType::Timestamp(_, _), DataType::Interval(_)) => {
+            ts_interval_array_op(array_lhs, sign, array_rhs)?
+        }
+        (DataType::Interval(_), DataType::Timestamp(_, _)) if sign == 1 => {
+            ts_interval_array_op(array_rhs, sign, array_lhs)?
+        }
+        (_, _) => Err(DataFusionError::Execution(format!(
+            "Invalid array types for DateIntervalExpr: {:?} {} {:?}",
+            array_lhs.data_type(),
+            sign,
+            array_rhs.data_type()
+        )))?,
+    };
+    Ok(ColumnarValue::Array(ret))
+}
+
+#[inline]
+unsafe fn build_primitive_array<O: ArrowPrimitiveType>(
+    len: usize,
+    buffer: Buffer,
+    null_count: usize,
+    null_buffer: Option<Buffer>,
+) -> PrimitiveArray<O> {
+    PrimitiveArray::from(ArrayData::new_unchecked(
+        O::DATA_TYPE,
+        len,
+        Some(null_count),
+        null_buffer,
+        0,
+        vec![buffer],
+        vec![],
+    ))
+}
+
+pub fn try_binary_op<A, B, F, O>(
+    a: &PrimitiveArray<A>,
+    b: &PrimitiveArray<B>,
+    op: F,
+) -> Result<PrimitiveArray<O>, ArrowError>
+where
+    A: ArrowPrimitiveType,
+    B: ArrowPrimitiveType,
+    O: ArrowPrimitiveType,
+    F: Fn(A::Native, B::Native) -> Result<O::Native, ArrowError>,
+{
+    if a.len() != b.len() {
+        return Err(ArrowError::ComputeError(
+            "Cannot perform binary operation on arrays of different length".to_string(),
+        ));
+    }
+    let len = a.len();
+
+    if a.is_empty() {
+        return Ok(PrimitiveArray::from(ArrayData::new_empty(&O::DATA_TYPE)));
+    }
+
+    let null_buffer = combine_option_bitmap(&[a.data(), b.data()], len);
+    let null_count = null_buffer
+        .as_ref()
+        .map(|x| len - x.count_set_bits_offset(0, len))
+        .unwrap_or_default();
+
+    let values = a.values().iter().zip(b.values()).map(|(l, r)| op(*l, *r));
+    // JUSTIFICATION
+    //  Benefit
+    //      ~60% speedup
+    //  Soundness
+    //      `values` is an iterator with a known size from a PrimitiveArray
+    let buffer = unsafe { Buffer::try_from_trusted_len_iter(values) }?;
+
+    Ok(unsafe { build_primitive_array(len, buffer, null_count, null_buffer) })
+}
+
+/// Performs a timestamp subtraction operation on two arrays and returns the resulting array.
+fn ts_array_op(array_lhs: &ArrayRef, array_rhs: &ArrayRef) -> Result<ArrayRef> {
+    match (array_lhs.data_type(), array_rhs.data_type()) {
+        (
+            DataType::Timestamp(TimeUnit::Second, opt_tz_lhs),
+            DataType::Timestamp(TimeUnit::Second, opt_tz_rhs),
+        ) => Ok(ts_sub_op!(
+            array_lhs,
+            array_rhs,
+            opt_tz_lhs,
+            opt_tz_rhs,
+            1000i64,
+            as_timestamp_second_array,
+            seconds_sub,
+            NaiveDateTime::timestamp,
+            MILLISECOND_MODE,
+            TimestampSecondType,
+            IntervalDayTimeType
+        )),
+        (
+            DataType::Timestamp(TimeUnit::Millisecond, opt_tz_lhs),
+            DataType::Timestamp(TimeUnit::Millisecond, opt_tz_rhs),
+        ) => Ok(ts_sub_op!(
+            array_lhs,
+            array_rhs,
+            opt_tz_lhs,
+            opt_tz_rhs,
+            1i64,
+            as_timestamp_millisecond_array,
+            milliseconds_sub,
+            NaiveDateTime::timestamp_millis,
+            MILLISECOND_MODE,
+            TimestampMillisecondType,
+            IntervalDayTimeType
+        )),
+        (
+            DataType::Timestamp(TimeUnit::Microsecond, opt_tz_lhs),
+            DataType::Timestamp(TimeUnit::Microsecond, opt_tz_rhs),
+        ) => Ok(ts_sub_op!(
+            array_lhs,
+            array_rhs,
+            opt_tz_lhs,
+            opt_tz_rhs,
+            1000i64,
+            as_timestamp_microsecond_array,
+            microseconds_sub,
+            NaiveDateTime::timestamp_micros,
+            NANOSECOND_MODE,
+            TimestampMicrosecondType,
+            IntervalMonthDayNanoType
+        )),
+        (
+            DataType::Timestamp(TimeUnit::Nanosecond, opt_tz_lhs),
+            DataType::Timestamp(TimeUnit::Nanosecond, opt_tz_rhs),
+        ) => Ok(ts_sub_op!(
+            array_lhs,
+            array_rhs,
+            opt_tz_lhs,
+            opt_tz_rhs,
+            1i64,
+            as_timestamp_nanosecond_array,
+            nanoseconds_sub,
+            NaiveDateTime::timestamp_nanos,
+            NANOSECOND_MODE,
+            TimestampNanosecondType,
+            IntervalMonthDayNanoType
+        )),
+        (_, _) => Err(DataFusionError::Execution(format!(
+            "Invalid array types for Timestamp subtraction: {:?} - {:?}",
+            array_lhs.data_type(),
+            array_rhs.data_type()
+        ))),
+    }
+}
+/// Performs an interval operation on two arrays and returns the resulting array.
+/// The operation sign determines whether to perform addition or subtraction.
+/// The data type and unit of the two input arrays must match the supported combinations.
+fn interval_array_op(
+    array_lhs: &ArrayRef,
+    array_rhs: &ArrayRef,
+    sign: i32,
+) -> Result<ArrayRef> {
+    match (array_lhs.data_type(), array_rhs.data_type()) {
+        (
+            DataType::Interval(IntervalUnit::YearMonth),
+            DataType::Interval(IntervalUnit::YearMonth),
+        ) => Ok(interval_op!(
+            array_lhs,
+            array_rhs,
+            as_interval_ym_array,
+            op_ym,
+            sign,
+            IntervalYearMonthType
+        )),
+        (
+            DataType::Interval(IntervalUnit::YearMonth),
+            DataType::Interval(IntervalUnit::DayTime),
+        ) => Ok(interval_cross_op!(
+            array_lhs,
+            array_rhs,
+            as_interval_ym_array,
+            as_interval_dt_array,
+            op_ym_dt,
+            sign,
+            false,
+            IntervalYearMonthType,
+            IntervalDayTimeType
+        )),
+        (
+            DataType::Interval(IntervalUnit::YearMonth),
+            DataType::Interval(IntervalUnit::MonthDayNano),
+        ) => Ok(interval_cross_op!(
+            array_lhs,
+            array_rhs,
+            as_interval_ym_array,
+            as_interval_mdn_array,
+            op_ym_mdn,
+            sign,
+            false,
+            IntervalYearMonthType,
+            IntervalMonthDayNanoType
+        )),
+        (
+            DataType::Interval(IntervalUnit::DayTime),
+            DataType::Interval(IntervalUnit::YearMonth),
+        ) => Ok(interval_cross_op!(
+            array_rhs,
+            array_lhs,
+            as_interval_ym_array,
+            as_interval_dt_array,
+            op_ym_dt,
+            sign,
+            true,
+            IntervalYearMonthType,
+            IntervalDayTimeType
+        )),
+        (
+            DataType::Interval(IntervalUnit::DayTime),
+            DataType::Interval(IntervalUnit::DayTime),
+        ) => Ok(interval_op!(
+            array_lhs,
+            array_rhs,
+            as_interval_dt_array,
+            op_dt,
+            sign,
+            IntervalDayTimeType
+        )),
+        (
+            DataType::Interval(IntervalUnit::DayTime),
+            DataType::Interval(IntervalUnit::MonthDayNano),
+        ) => Ok(interval_cross_op!(
+            array_lhs,
+            array_rhs,
+            as_interval_dt_array,
+            as_interval_mdn_array,
+            op_dt_mdn,
+            sign,
+            false,
+            IntervalDayTimeType,
+            IntervalMonthDayNanoType
+        )),
+        (
+            DataType::Interval(IntervalUnit::MonthDayNano),
+            DataType::Interval(IntervalUnit::YearMonth),
+        ) => Ok(interval_cross_op!(
+            array_rhs,
+            array_lhs,
+            as_interval_ym_array,
+            as_interval_mdn_array,
+            op_ym_mdn,
+            sign,
+            true,
+            IntervalYearMonthType,
+            IntervalMonthDayNanoType
+        )),
+        (
+            DataType::Interval(IntervalUnit::MonthDayNano),
+            DataType::Interval(IntervalUnit::DayTime),
+        ) => Ok(interval_cross_op!(
+            array_rhs,
+            array_lhs,
+            as_interval_dt_array,
+            as_interval_mdn_array,
+            op_dt_mdn,
+            sign,
+            true,
+            IntervalDayTimeType,
+            IntervalMonthDayNanoType
+        )),
+        (
+            DataType::Interval(IntervalUnit::MonthDayNano),
+            DataType::Interval(IntervalUnit::MonthDayNano),
+        ) => Ok(interval_op!(
+            array_lhs,
+            array_rhs,
+            as_interval_mdn_array,
+            op_mdn,
+            sign,
+            IntervalMonthDayNanoType
+        )),
+        (_, _) => Err(DataFusionError::Execution(format!(
+            "Invalid array types for Interval operation: {:?} {} {:?}",
+            array_lhs.data_type(),
+            sign,
+            array_rhs.data_type()
+        ))),
+    }
+}
+/// Performs a timestamp/interval operation on two arrays and returns the resulting array.
+/// The operation sign determines whether to perform addition or subtraction.
+/// The data type and unit of the two input arrays must match the supported combinations.
+fn ts_interval_array_op(

Review Comment:
   I think this discards the timezone of the input



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] doki23 commented on pull request #5764: timestamp interval arithmetic query

Posted by "doki23 (via GitHub)" <gi...@apache.org>.
doki23 commented on PR #5764:
URL: https://github.com/apache/arrow-datafusion/pull/5764#issuecomment-1488289654

   > If I add 1 month to 2023-03-01T00:00:00 +02 is the output 2023-04-01T00:00:00 +02 or 2023-03-29T00:00:00 +02.
   
   If I'm not mistaken, pgsql's result will be the former.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] berkaysynnada commented on pull request #5764: Support timestamp and interval arithmetic

Posted by "berkaysynnada (via GitHub)" <gi...@apache.org>.
berkaysynnada commented on PR #5764:
URL: https://github.com/apache/arrow-datafusion/pull/5764#issuecomment-1490585449

   > 
   
   Thanks for the support. I add these issues to my to-do's and will open the PRs as I progress.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] berkaysynnada commented on pull request #5764: Support timestamp and interval arithmetic

Posted by "berkaysynnada (via GitHub)" <gi...@apache.org>.
berkaysynnada commented on PR #5764:
URL: https://github.com/apache/arrow-datafusion/pull/5764#issuecomment-1491548025

   > 
   
   
   
   > > @alamb Thanks for the support. I add these issues to my to-do's and will open the PRs as I progress.
   > 
   > Thanks @berkaysynnada -- can you be specific about which items you have added to the todo list?
   
   I meant #5803, which you have completed, and removing the arithmetic code to binary.rs, but I can spare time for the issues that you see as relevant in #5753 and [#3958](https://github.com/apache/arrow-rs/issues/3958) 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] berkaysynnada commented on pull request #5764: timestamp interval arithmetic query

Posted by "berkaysynnada (via GitHub)" <gi...@apache.org>.
berkaysynnada commented on PR #5764:
URL: https://github.com/apache/arrow-datafusion/pull/5764#issuecomment-1488317572

   > If I add 1 month to `2023-03-01T00:00:00 +02` is the output `2023-04-01T00:00:00 +02` or `2023-03-29T00:00:00 +02`.
   > 
   > I _believe_ this PR implements the latter as it performs arithmetic with respect to the UTC epoch? Is that the desired behaviour?
   > 
   > https://play.rust-lang.org/?version=stable&mode=debug&edition=2021&gist=11112707c7217ecca3ef64ceb984beb6 contains an example showing the difference
   
   ```
   #[tokio::test]
   async fn interval_ts_add() -> Result<()> {
       let ctx = SessionContext::new();
       let table_a = {
           let schema = Arc::new(Schema::new(vec![
               Field::new("ts1", DataType::Timestamp(TimeUnit::Second, None), false),
               Field::new(
                   "interval1",
                   DataType::Interval(IntervalUnit::YearMonth),
                   false,
               ),
           ]));
           let array1 = PrimitiveArray::<TimestampSecondType>::from_iter_values(vec![
               1_677_628_800i64, // 2023-03-01T00:00:00
           ]);
           let array2 =
               PrimitiveArray::<IntervalYearMonthType>::from_iter_values(vec![1i32]);
           let data = RecordBatch::try_new(
               schema.clone(),
               vec![Arc::new(array1), Arc::new(array2)],
           )?;
           let table = MemTable::try_new(schema, vec![vec![data]])?;
           Arc::new(table)
       };
   
       ctx.register_table("table_a", table_a)?;
       let sql = "SELECT ts1, ts1 + interval1 from table_a";
       let actual = execute_to_batches(&ctx, sql).await;
       let expected = vec![
           "+---------------------+---------------------------------+",
           "| ts1                 | table_a.ts1 + table_a.interval1 |",
           "+---------------------+---------------------------------+",
           "| 2023-03-01T00:00:00 | 2023-04-01T00:00:00             |",
           "+---------------------+---------------------------------+",
       ];
       assert_batches_eq!(expected, &actual);
       Ok(())
   }
   ```
   The former result is produced, you can reproduce it with this test. Existing `do_date_time_math` functionality is adopted. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] alamb commented on a diff in pull request #5764: Support timestamp and interval arithmetic

Posted by "alamb (via GitHub)" <gi...@apache.org>.
alamb commented on code in PR #5764:
URL: https://github.com/apache/arrow-datafusion/pull/5764#discussion_r1152340222


##########
datafusion/core/tests/sqllogictests/test_files/timestamps.slt:
##########
@@ -237,4 +237,110 @@ SELECT INTERVAL '8' YEAR + '2000-01-01T00:00:00'::timestamp;
 query P
 SELECT INTERVAL '8' MONTH + '2000-01-01T00:00:00'::timestamp;
 ----
-2000-09-01T00:00:00
\ No newline at end of file
+2000-09-01T00:00:00
+
+# Interval columns are created with timestamp subtraction in subquery since they are not supported yet
+statement ok
+create table foo (val int, ts1 timestamp, ts2 timestamp) as values 
+(1, '2023-03-15T15:00:20.000000123'::timestamp, '2023-01-20T23:00:00.000000099'::timestamp),
+(2, '2023-02-28T12:01:55.000123456'::timestamp, '2000-02-23T11:00:00.123000001'::timestamp),
+(3, '2033-11-02T23:22:13.000123456'::timestamp, '1990-03-01T00:00:00.333000001'::timestamp),
+(4, '2003-07-11T01:31:15.000123456'::timestamp, '2045-04-11T15:00:00.000000001'::timestamp);
+
+# Timestamp - Timestamp
+query I?
+SELECT val, ts1 - ts2 FROM foo ORDER BY ts2 - ts1;
+----
+4 0 years 0 mons -15250 days -13 hours -28 mins -44.999876545 secs
+3 0 years 0 mons 15952 days 23 hours 22 mins 12.667123455 secs
+2 0 years 0 mons 8406 days 1 hours 1 mins 54.877123455 secs
+1 0 years 0 mons 53 days 16 hours 0 mins 20.000000024 secs
+
+# Interval - Interval
+query ?
+SELECT subq.interval1 - subq.interval2
+FROM (
+  SELECT ts1 - ts2 AS interval1,
+  ts2 - ts1 AS interval2
+  FROM foo
+) AS subq;
+----
+0 years 0 mons 106 days 32 hours 0 mins 40.000000048 secs
+0 years 0 mons 16812 days 2 hours 3 mins 49.754246910 secs
+0 years 0 mons 31904 days 46 hours 44 mins 25.334246910 secs
+0 years 0 mons -30500 days -26 hours -57 mins -29.999753090 secs
+
+# Interval + Interval
+query ?
+SELECT subq.interval1 + subq.interval2
+FROM (
+  SELECT ts1 - ts2 AS interval1,
+  ts2 - ts1 AS interval2
+  FROM foo
+) AS subq;
+----
+0 years 0 mons 0 days 0 hours 0 mins 0.000000000 secs
+0 years 0 mons 0 days 0 hours 0 mins 0.000000000 secs
+0 years 0 mons 0 days 0 hours 0 mins 0.000000000 secs
+0 years 0 mons 0 days 0 hours 0 mins 0.000000000 secs
+
+# Timestamp - Interval
+query P
+SELECT subq.ts1 - subq.interval1
+FROM (
+  SELECT ts1,
+  ts1 - ts2 AS interval1
+  FROM foo
+) AS subq;
+----
+2023-01-20T23:00:00.000000099
+2000-02-23T11:00:00.123000001
+1990-03-01T00:00:00.333000001
+2045-04-11T15:00:00.000000001
+
+# Interval + Timestamp
+query P
+SELECT subq.interval1 + subq.ts1
+FROM (
+  SELECT ts1,
+  ts1 - ts2 AS interval1
+  FROM foo
+) AS subq;
+----
+2023-05-08T07:00:40.000000147
+2046-03-05T13:03:49.877246911
+2077-07-07T22:44:25.667246911
+1961-10-08T12:02:30.000246911
+
+# Timestamp + Interval
+query P
+SELECT  subq.ts1 + subq.interval1
+FROM (
+  SELECT ts1,
+  ts1 - ts2 AS interval1
+  FROM foo
+) AS subq;
+----
+2023-05-08T07:00:40.000000147
+2046-03-05T13:03:49.877246911
+2077-07-07T22:44:25.667246911
+1961-10-08T12:02:30.000246911
+
+# Timestamp + Timestamp => error

Review Comment:
   👍 



##########
datafusion/physical-expr/src/expressions/datetime.rs:
##########
@@ -142,14 +138,30 @@ impl PhysicalExpr for DateTimeIntervalExpr {
                 return Err(DataFusionError::Internal(msg.to_string()));
             }
         };
+        // RHS is first checked. If it is a Scalar, there are 2 options:

Review Comment:
   Longer term I think it would be good to move the date_time arithmetic into https://github.com/apache/arrow-datafusion/tree/main/datafusion/physical-expr/src/expressions/binary as these really are binary operations
   
   That would also set us up so when the kernels are added to arrow-rs (aka part of https://github.com/apache/arrow-rs/issues/3958) it would be easier to migrate. 
   
   I like how this PR followed the existing pattern in `DateTimeIntervalExpr` even if that may not be our ideal end state



##########
datafusion/physical-expr/src/expressions/datetime.rs:
##########
@@ -239,6 +251,542 @@ pub fn evaluate_array(
     Ok(ColumnarValue::Array(ret))
 }
 
+macro_rules! ts_sub_op {
+    ($lhs:ident, $rhs:ident, $lhs_tz:ident, $rhs_tz:ident, $coef:expr, $caster:expr, $op:expr, $ts_unit:expr, $mode:expr, $type_in:ty, $type_out:ty) => {{
+        let prim_array_lhs = $caster(&$lhs)?;
+        let prim_array_rhs = $caster(&$rhs)?;
+        let ret = Arc::new(try_binary_op::<$type_in, $type_in, _, $type_out>(
+            prim_array_lhs,
+            prim_array_rhs,
+            |ts1, ts2| {
+                let (lhs_tz, rhs_tz) =
+                    (parse_timezones($lhs_tz), parse_timezones($rhs_tz));
+                Ok($op(
+                    $ts_unit(&with_timezone_to_naive_datetime::<$mode>(
+                        ts1.mul_wrapping($coef),
+                        &lhs_tz,
+                    )?),
+                    $ts_unit(&with_timezone_to_naive_datetime::<$mode>(
+                        ts2.mul_wrapping($coef),
+                        &rhs_tz,
+                    )?),
+                ))
+            },
+        )?) as ArrayRef;
+        ret
+    }};
+}
+macro_rules! interval_op {
+    ($lhs:ident, $rhs:ident, $caster:expr, $op:expr, $sign:ident, $type_in:ty) => {{
+        let prim_array_lhs = $caster(&$lhs)?;
+        let prim_array_rhs = $caster(&$rhs)?;
+        let ret = Arc::new(binary::<$type_in, $type_in, _, $type_in>(
+            prim_array_lhs,
+            prim_array_rhs,
+            |interval1, interval2| $op(interval1, interval2, $sign),
+        )?) as ArrayRef;
+        ret
+    }};
+}
+macro_rules! interval_cross_op {
+    ($lhs:ident, $rhs:ident, $caster1:expr, $caster2:expr, $op:expr, $sign:ident, $commute:ident, $type_in1:ty, $type_in2:ty) => {{
+        let prim_array_lhs = $caster1(&$lhs)?;
+        let prim_array_rhs = $caster2(&$rhs)?;
+        let ret = Arc::new(binary::<$type_in1, $type_in2, _, IntervalMonthDayNanoType>(
+            prim_array_lhs,
+            prim_array_rhs,
+            |interval1, interval2| $op(interval1, interval2, $sign, $commute),
+        )?) as ArrayRef;
+        ret
+    }};
+}
+macro_rules! ts_interval_op {
+    ($lhs:ident, $rhs:ident, $caster1:expr, $caster2:expr, $op:expr, $sign:ident, $type_in1:ty, $type_in2:ty) => {{
+        let prim_array_lhs = $caster1(&$lhs)?;
+        let prim_array_rhs = $caster2(&$rhs)?;
+        let ret = Arc::new(try_binary_op::<$type_in1, $type_in2, _, $type_in1>(
+            prim_array_lhs,
+            prim_array_rhs,
+            |ts, interval| Ok($op(ts, interval as i128, $sign)?),
+        )?) as ArrayRef;
+        ret
+    }};
+}
+// This function evaluates temporal array operations, such as timestamp - timestamp, interval + interval,
+// timestamp + interval, and interval + timestamp. It takes two arrays as input and an integer sign representing
+// the operation (+1 for addition and -1 for subtraction). It returns a ColumnarValue as output, which can hold
+// either a scalar or an array.
+pub fn evaluate_temporal_arrays(
+    array_lhs: &ArrayRef,
+    sign: i32,
+    array_rhs: &ArrayRef,
+) -> Result<ColumnarValue> {
+    let ret = match (array_lhs.data_type(), array_rhs.data_type()) {
+        // Timestamp - Timestamp operations, operands of only the same types are supported.
+        (DataType::Timestamp(_, _), DataType::Timestamp(_, _)) => {
+            ts_array_op(array_lhs, array_rhs)?
+        }
+        // Interval (+ , -) Interval operations
+        (DataType::Interval(_), DataType::Interval(_)) => {
+            interval_array_op(array_lhs, array_rhs, sign)?
+        }
+        // Timestamp (+ , -) Interval and Interval + Timestamp operations
+        // Interval - Timestamp operation is not rational hence not supported
+        (DataType::Timestamp(_, _), DataType::Interval(_)) => {
+            ts_interval_array_op(array_lhs, sign, array_rhs)?
+        }
+        (DataType::Interval(_), DataType::Timestamp(_, _)) if sign == 1 => {
+            ts_interval_array_op(array_rhs, sign, array_lhs)?
+        }
+        (_, _) => Err(DataFusionError::Execution(format!(
+            "Invalid array types for DateIntervalExpr: {:?} {} {:?}",
+            array_lhs.data_type(),
+            sign,
+            array_rhs.data_type()
+        )))?,
+    };
+    Ok(ColumnarValue::Array(ret))
+}
+
+#[inline]
+unsafe fn build_primitive_array<O: ArrowPrimitiveType>(
+    len: usize,
+    buffer: Buffer,
+    null_count: usize,
+    null_buffer: Option<Buffer>,
+) -> PrimitiveArray<O> {
+    PrimitiveArray::from(ArrayData::new_unchecked(
+        O::DATA_TYPE,
+        len,
+        Some(null_count),
+        null_buffer,
+        0,
+        vec![buffer],
+        vec![],
+    ))
+}
+
+pub fn try_binary_op<A, B, F, O>(

Review Comment:
   I tried this and it seems to work well. @berkaysynnada  I made a PR here for your consideration https://github.com/synnada-ai/arrow-datafusion/pull/68
   
   It also has the very nice property of removing `unsafe` code from this PR



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] tustvold commented on pull request #5764: timestamp interval arithmetic query

Posted by "tustvold (via GitHub)" <gi...@apache.org>.
tustvold commented on PR #5764:
URL: https://github.com/apache/arrow-datafusion/pull/5764#issuecomment-1488333351

   > The former result is produced
   
   This example has a timezone of `None`, not `+02:00` as is necessary to demonstrate the potential inconsistency?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] berkaysynnada commented on pull request #5764: timestamp interval arithmetic query

Posted by "berkaysynnada (via GitHub)" <gi...@apache.org>.
berkaysynnada commented on PR #5764:
URL: https://github.com/apache/arrow-datafusion/pull/5764#issuecomment-1487066242

   @alamb Is it better to carry the modified binary function `try_binary_op` to arrow? If you agree we can create a quick PR. Can you also give me a suggestion about how to handle when an incorrect timezone is given?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] alamb merged pull request #5764: Support timestamp and interval arithmetic

Posted by "alamb (via GitHub)" <gi...@apache.org>.
alamb merged PR #5764:
URL: https://github.com/apache/arrow-datafusion/pull/5764


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] berkaysynnada commented on pull request #5764: timestamp interval arithmetic query

Posted by "berkaysynnada (via GitHub)" <gi...@apache.org>.
berkaysynnada commented on PR #5764:
URL: https://github.com/apache/arrow-datafusion/pull/5764#issuecomment-1488467743

   > > The former result is produced
   > 
   > This example has a timezone of `None`, not `+02:00` as is necessary to demonstrate the potential inconsistency?
   
   Now I understand what you mean. Postgre gives the result as in the former. If no objection, I will fix it that way.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] alamb commented on a diff in pull request #5764: Support timestamp and interval arithmetic

Posted by "alamb (via GitHub)" <gi...@apache.org>.
alamb commented on code in PR #5764:
URL: https://github.com/apache/arrow-datafusion/pull/5764#discussion_r1153371981


##########
datafusion/physical-expr/Cargo.toml:
##########
@@ -44,6 +44,7 @@ unicode_expressions = ["unicode-segmentation"]
 [dependencies]
 ahash = { version = "0.8", default-features = false, features = ["runtime-rng"] }
 arrow = { workspace = true }
+arrow-array = { version = "34.0.0", default-features = false, features = ["chrono-tz"] }

Review Comment:
   The rest of datafusion now uses arrow 36, but this uses arrow 34
   
   ```suggestion
   arrow-array = { workspace = true }
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] alamb commented on pull request #5764: Support timestamp and interval arithmetic

Posted by "alamb (via GitHub)" <gi...@apache.org>.
alamb commented on PR #5764:
URL: https://github.com/apache/arrow-datafusion/pull/5764#issuecomment-1490735934

   >  @alamb Thanks for the support. I add these issues to my to-do's and will open the PRs as I progress.
   
   Thanks @berkaysynnada  -- can you be specific about which items you have added to the todo list?
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] berkaysynnada commented on pull request #5764: Support timestamp and interval arithmetic

Posted by "berkaysynnada (via GitHub)" <gi...@apache.org>.
berkaysynnada commented on PR #5764:
URL: https://github.com/apache/arrow-datafusion/pull/5764#issuecomment-1489784338

   > First of all, thank you so much @berkaysynnada
   > 
   > I think this is a significant improvement to DataFusion -- while longer term I would prefer to see the interval arithmetic logic moved into arrow-rs, starting with an implementation in the DataFusion repo has worked well in the past and I think will work well here too.
   > 
   > Can you please respond to @tustvold 's comments? I think they are good questions, but then I think we could merge this PR and file a follow on tickets
   > 
   > 1. Move the arithmetic code into binary.rs (following the existing models, as a step towards getting them upstream in arrow).
   > 2. File a ticket about not handling timezones properly
   > 
   > cc @waitingkuo @avantgardnerio @andygrove @liukun4515
   
   I am working on @tustvold 's comments, and when I finalize them I will commit. Thanks for the support of `try_binary`. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] alamb commented on a diff in pull request #5764: Support timestamp and interval arithmetic

Posted by "alamb (via GitHub)" <gi...@apache.org>.
alamb commented on code in PR #5764:
URL: https://github.com/apache/arrow-datafusion/pull/5764#discussion_r1153266211


##########
datafusion/physical-expr/src/expressions/datetime.rs:
##########
@@ -348,63 +340,6 @@ pub fn evaluate_temporal_arrays(
     Ok(ColumnarValue::Array(ret))
 }
 
-#[inline]

Review Comment:
   🎉 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] alamb commented on pull request #5764: Support timestamp and interval arithmetic

Posted by "alamb (via GitHub)" <gi...@apache.org>.
alamb commented on PR #5764:
URL: https://github.com/apache/arrow-datafusion/pull/5764#issuecomment-1491998637

   Thank you @berkaysynnada 🙇  . I think this issue:
   
   >  removing the arithmetic code to binary.rs
   
   This is the most valuable part in my opinion as it  pays down tech debt and  sets us up for a easier migration / porting of the code upstream to arrow-rs -- and since you probably still have all the timestamp / kernel context in your head, you are probably likely to do it more quickly than someone who needs to get up to speed
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org