You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by "Weijun-H (via GitHub)" <gi...@apache.org> on 2023/04/05 09:56:11 UTC

[GitHub] [arrow-rs] Weijun-H opened a new pull request, #4020: feat: cast from/to interval and duration

Weijun-H opened a new pull request, #4020:
URL: https://github.com/apache/arrow-rs/pull/4020

   # Which issue does this PR close?
   
   <!--
   We generally require a GitHub issue to be filed for all bug fixes and enhancements and this helps us generate change logs for our releases. You can link an issue to this PR using the GitHub syntax. For example `Closes #123` indicates that this PR will close issue #123.
   -->
   
   Closes #3998
   
   # Rationale for this change
    
   <!--
   Why are you proposing this change? If this is already explained clearly in the issue then this section is not needed.
   Explaining clearly why changes are proposed helps reviewers understand your changes and offer better suggestions for fixes.
   -->
   
   # What changes are included in this PR?
   
   <!--
   There is no need to duplicate the description in the issue here but it is sometimes worth providing a summary of the individual changes in this PR.
   -->
   
   # Are there any user-facing changes?
   
   
   <!--
   If there are user-facing changes then we may require documentation to be updated before approving the PR.
   -->
   
   <!---
   If there are any breaking changes to public APIs, please add the `breaking change` label.
   -->
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] viirya commented on a diff in pull request #4020: feat: cast from/to interval and duration

Posted by "viirya (via GitHub)" <gi...@apache.org>.
viirya commented on code in PR #4020:
URL: https://github.com/apache/arrow-rs/pull/4020#discussion_r1158899802


##########
arrow-cast/src/cast.rs:
##########
@@ -458,6 +460,93 @@ where
     }
 }
 
+/// Cast the array from interval to duration
+fn cast_interval_to_duration<D: ArrowTemporalType<Native = i64>>(
+    array: &dyn Array,
+    cast_options: &CastOptions,
+) -> Result<ArrayRef, ArrowError> {
+    let array = array
+        .as_any()
+        .downcast_ref::<IntervalMonthDayNanoArray>()
+        .unwrap();
+
+    let mut builder = PrimitiveBuilder::<D>::new();
+
+    let scale = match D::DATA_TYPE {
+        DataType::Duration(TimeUnit::Second) => 1_000_000_000,
+        DataType::Duration(TimeUnit::Millisecond) => 1_000_000,
+        DataType::Duration(TimeUnit::Microsecond) => 1_000,
+        DataType::Duration(TimeUnit::Nanosecond) => 1,
+        _ => unreachable!(),
+    };
+
+    for i in 0..array.len() {
+        if array.is_null(i) {
+            builder.append_null();
+        } else {
+            let v = array.value(i) / scale;
+            if v > i64::MAX as i128 {
+                if cast_options.safe {
+                    builder.append_null();
+                } else {
+                    return Err(ArrowError::ComputeError(format!(
+                        "Cannot cast to {:?}. Overflowing on {:?}",
+                        D::DATA_TYPE,
+                        v
+                    )));
+                }
+            } else {
+                builder.append_value(v as i64);
+            }
+        }
+    }
+
+    Ok(Arc::new(builder.finish()) as ArrayRef)
+}
+
+/// Cast the array from duration and interval
+fn cast_duration_to_interval<D: ArrowTemporalType<Native = i64>>(
+    array: &dyn Array,
+    cast_options: &CastOptions,
+) -> Result<ArrayRef, ArrowError> {
+    let array = array.as_any().downcast_ref::<PrimitiveArray<D>>().unwrap();
+
+    let mut builder = IntervalMonthDayNanoBuilder::new();
+
+    let scale = match array.data_type() {
+        DataType::Duration(TimeUnit::Second) => 1_000_000_000,
+        DataType::Duration(TimeUnit::Millisecond) => 1_000_000,
+        DataType::Duration(TimeUnit::Microsecond) => 1_000,
+        DataType::Duration(TimeUnit::Nanosecond) => 1,
+        _ => unreachable!(),
+    };
+
+    for i in 0..array.len() {
+        if array.is_null(i) {
+            builder.append_null();
+        } else {
+            let v = array.value(i) as i128;
+            let v = v.mul_checked(scale);
+            match v {
+                Ok(v) => builder.append_value(v),
+                Err(e) => {
+                    if cast_options.safe {
+                        builder.append_null()
+                    } else {
+                        return Err(ArrowError::ComputeError(format!(
+                            "Cannot cast to {:?}. Overflowing on {:?}",
+                            D::DATA_TYPE,
+                            e

Review Comment:
   `e` is `ArrowError`, so you will get nested `ArrowError`. 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] Weijun-H commented on pull request #4020: feat: cast from/to interval and duration

Posted by "Weijun-H (via GitHub)" <gi...@apache.org>.
Weijun-H commented on PR #4020:
URL: https://github.com/apache/arrow-rs/pull/4020#issuecomment-1510298083

   > I think this PR is ready to go. I don't totally understand why [#4020 (files)](https://github.com/apache/arrow-rs/pull/4020/files#r1167579187) is not needed, but the test coverage
   > 
   > ```
   >         let array = vec![i64::MAX];
   >         let casted_array = cast_from_duration_to_interval::<DurationMillisecondType>(
   >             array.clone(),
   >             &DEFAULT_CAST_OPTIONS,
   >         )
   >         .unwrap();
   >         assert!(!casted_array.is_valid(0));
   > ```
   > 
   > covers the case I was thinking of so 👍
   > 
   > Thank you @Weijun-H
   
   Because v is `i64` in [#4020 (files)](https://github.com/apache/arrow-rs/pull/4020/files#r1167579187), `checked_mul` will return `None` when overflowing. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] viirya commented on a diff in pull request #4020: feat: cast from/to interval and duration

Posted by "viirya (via GitHub)" <gi...@apache.org>.
viirya commented on code in PR #4020:
URL: https://github.com/apache/arrow-rs/pull/4020#discussion_r1158894954


##########
arrow-cast/src/cast.rs:
##########
@@ -458,6 +460,93 @@ where
     }
 }
 
+/// Cast the array from interval to duration
+fn cast_interval_to_duration<D: ArrowTemporalType<Native = i64>>(
+    array: &dyn Array,
+    cast_options: &CastOptions,
+) -> Result<ArrayRef, ArrowError> {
+    let array = array
+        .as_any()
+        .downcast_ref::<IntervalMonthDayNanoArray>()
+        .unwrap();
+
+    let mut builder = PrimitiveBuilder::<D>::new();
+
+    let scale = match D::DATA_TYPE {
+        DataType::Duration(TimeUnit::Second) => 1_000_000_000,
+        DataType::Duration(TimeUnit::Millisecond) => 1_000_000,
+        DataType::Duration(TimeUnit::Microsecond) => 1_000,
+        DataType::Duration(TimeUnit::Nanosecond) => 1,
+        _ => unreachable!(),
+    };
+
+    for i in 0..array.len() {
+        if array.is_null(i) {
+            builder.append_null();
+        } else {
+            let v = array.value(i) / scale;
+            if v > i64::MAX as i128 {
+                if cast_options.safe {
+                    builder.append_null();
+                } else {
+                    return Err(ArrowError::ComputeError(format!(
+                        "Cannot cast to {:?}. Overflowing on {:?}",
+                        D::DATA_TYPE,
+                        v
+                    )));
+                }
+            } else {
+                builder.append_value(v as i64);
+            }
+        }
+    }
+
+    Ok(Arc::new(builder.finish()) as ArrayRef)
+}
+
+/// Cast the array from duration and interval
+fn cast_duration_to_interval<D: ArrowTemporalType<Native = i64>>(
+    array: &dyn Array,
+    cast_options: &CastOptions,
+) -> Result<ArrayRef, ArrowError> {
+    let array = array.as_any().downcast_ref::<PrimitiveArray<D>>().unwrap();

Review Comment:
   Can we propagate the error?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] alamb commented on a diff in pull request #4020: feat: cast from/to interval and duration

Posted by "alamb (via GitHub)" <gi...@apache.org>.
alamb commented on code in PR #4020:
URL: https://github.com/apache/arrow-rs/pull/4020#discussion_r1167582233


##########
arrow-cast/src/cast.rs:
##########
@@ -8430,6 +8430,13 @@ mod tests {
         assert_eq!(casted_array.value(0), 1234567000000000);
 
         let array = vec![i64::MAX];
+        let casted_array = cast_from_duration_to_interval::<DurationSecondType>(
+            array.clone(),
+            &DEFAULT_CAST_OPTIONS,
+        )
+        .unwrap();
+        assert_eq!(casted_array.value(0), 0);

Review Comment:
   I think this check should be that the casted_array.valid(0) is false, right? 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] alamb commented on a diff in pull request #4020: feat: cast from/to interval and duration

Posted by "alamb (via GitHub)" <gi...@apache.org>.
alamb commented on code in PR #4020:
URL: https://github.com/apache/arrow-rs/pull/4020#discussion_r1167485528


##########
arrow-cast/src/cast.rs:
##########
@@ -8386,136 +8386,219 @@ mod tests {
         assert_eq!("Invalid argument error: 1234567000 is too large to store in a Decimal256 of precision 7. Max is 9999999", err.unwrap_err().to_string());
     }
 
+    /// helper function to test casting from duration to interval
+    fn cast_from_duration_to_interval<T: ArrowTemporalType>(
+        array: Vec<i64>,
+        cast_options: &CastOptions,
+    ) -> Result<PrimitiveArray<IntervalMonthDayNanoType>, ArrowError>
+    where
+        arrow_array::PrimitiveArray<T>: From<Vec<i64>>,
+    {
+        let array = PrimitiveArray::<T>::from(array);
+        let array = Arc::new(array) as ArrayRef;
+        let casted_array = cast_with_options(
+            &array,
+            &DataType::Interval(IntervalUnit::MonthDayNano),
+            cast_options,
+        )?;
+        casted_array
+            .as_any()
+            .downcast_ref::<IntervalMonthDayNanoArray>()
+            .ok_or_else(|| {
+                ArrowError::ComputeError(
+                    "Failed to downcast to IntervalMonthDayNanoArray".to_string(),
+                )
+            })
+            .cloned()
+    }
+
     #[test]
     fn test_cast_from_duration_to_interval() {
         // from duration second to interval month day nano
         let array = vec![1234567];
-        let array = DurationSecondArray::from(array);
-        let array = Arc::new(array) as ArrayRef;
-        let casted_array =
-            cast(&array, &DataType::Interval(IntervalUnit::MonthDayNano)).unwrap();
-        let casted_array = casted_array
-            .as_any()
-            .downcast_ref::<IntervalMonthDayNanoArray>()
-            .unwrap();
+        let casted_array = cast_from_duration_to_interval::<DurationSecondType>(
+            array,
+            &DEFAULT_CAST_OPTIONS,
+        )
+        .unwrap();
         assert_eq!(
             casted_array.data_type(),
             &DataType::Interval(IntervalUnit::MonthDayNano)
         );
         assert_eq!(casted_array.value(0), 1234567000000000);
 
+        let array = vec![i64::MAX];
+        let casted_array = cast_from_duration_to_interval::<DurationSecondType>(
+            array,
+            &CastOptions { safe: false },
+        )
+        .unwrap();
+        assert_eq!(casted_array.value(0), 9223372036854775807000000000);
+
         // from duration millisecond to interval month day nano
         let array = vec![1234567];
-        let array = DurationMillisecondArray::from(array);
-        let array = Arc::new(array) as ArrayRef;
-        let casted_array =
-            cast(&array, &DataType::Interval(IntervalUnit::MonthDayNano)).unwrap();
-        let casted_array = casted_array
-            .as_any()
-            .downcast_ref::<IntervalMonthDayNanoArray>()
-            .unwrap();
+        let casted_array = cast_from_duration_to_interval::<DurationMillisecondType>(
+            array,
+            &DEFAULT_CAST_OPTIONS,
+        )
+        .unwrap();
         assert_eq!(
             casted_array.data_type(),
             &DataType::Interval(IntervalUnit::MonthDayNano)
         );
         assert_eq!(casted_array.value(0), 1234567000000);
 
+        let array = vec![i64::MAX];
+        let casted_array = cast_from_duration_to_interval::<DurationMillisecondType>(
+            array,
+            &CastOptions { safe: false },
+        )
+        .unwrap();
+        assert_eq!(casted_array.value(0), 9223372036854775807000000);

Review Comment:
   So if I convert this value back:
   
   ```rust
       let (months, days, nanos) = IntervalMonthDayNanoType::to_parts(9223372036854775807000000);
       println!("months: {months}, days: {days}, nanos: {nanos}");
   
   ```
   
   it prints out:
   
   ```
   months: 0, days: 499999, nanos: -1000000
   
   ```
   
   Which doesn't seem right for `9223372036854775807` milliseconds 
   
   I think what is happening can be explained here (the nanoseconds fields have overflowed into the days)
   
   https://docs.rs/arrow-array/37.0.0/src/arrow_array/types.rs.html#435-445
   
   ```
           /*
           https://github.com/apache/arrow/blob/02c8598d264c839a5b5cf3109bfd406f3b8a6ba5/cpp/src/arrow/type.h#L1475
           struct MonthDayNanos {
               int32_t months;
               int32_t days;
               int64_t nanoseconds;
           }
           128     112     96      80      64      48      32      16      0
           +-------+-------+-------+-------+-------+-------+-------+-------+
           |     months    |      days     |             nanos             |
           +-------+-------+-------+-------+-------+-------+-------+-------+
           */
   ```
   
   I think what should happen is 
   1. (preferred): the cast should return an error if the nanosecond field would have overflowed
   2. (secondarily): if the nanosecond field overflows the overflow should be converted into days
   
   I think erroring (at least at first) is preferred because it avoids questions like "how many seconds are there in a day" which is not actually a fixed quantity given leap seconds, etc.
   
   If someone has need of casting such durations to intervals, we can then sort out the mechanics



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] metesynnada commented on a diff in pull request #4020: feat: cast from/to interval and duration

Posted by "metesynnada (via GitHub)" <gi...@apache.org>.
metesynnada commented on code in PR #4020:
URL: https://github.com/apache/arrow-rs/pull/4020#discussion_r1161265013


##########
arrow-cast/src/cast.rs:
##########
@@ -458,6 +460,93 @@ where
     }
 }
 
+/// Cast the array from interval to duration
+fn cast_interval_to_duration<D: ArrowTemporalType<Native = i64>>(
+    array: &dyn Array,
+    cast_options: &CastOptions,
+) -> Result<ArrayRef, ArrowError> {
+    let array = array
+        .as_any()
+        .downcast_ref::<IntervalMonthDayNanoArray>()
+        .unwrap();
+
+    let mut builder = PrimitiveBuilder::<D>::new();
+
+    let scale = match D::DATA_TYPE {
+        DataType::Duration(TimeUnit::Second) => 1_000_000_000,
+        DataType::Duration(TimeUnit::Millisecond) => 1_000_000,
+        DataType::Duration(TimeUnit::Microsecond) => 1_000,
+        DataType::Duration(TimeUnit::Nanosecond) => 1,
+        _ => unreachable!(),
+    };
+
+    for i in 0..array.len() {
+        if array.is_null(i) {
+            builder.append_null();
+        } else {
+            let v = array.value(i) / scale;
+            if v > i64::MAX as i128 {
+                if cast_options.safe {
+                    builder.append_null();
+                } else {
+                    return Err(ArrowError::ComputeError(format!(
+                        "Cannot cast to {:?}. Overflowing on {:?}",
+                        D::DATA_TYPE,
+                        v
+                    )));
+                }
+            } else {
+                builder.append_value(v as i64);
+            }
+        }
+    }
+
+    Ok(Arc::new(builder.finish()) as ArrayRef)
+}
+
+/// Cast the array from duration and interval
+fn cast_duration_to_interval<D: ArrowTemporalType<Native = i64>>(
+    array: &dyn Array,
+    cast_options: &CastOptions,
+) -> Result<ArrayRef, ArrowError> {
+    let array = array.as_any().downcast_ref::<PrimitiveArray<D>>().unwrap();
+
+    let mut builder = IntervalMonthDayNanoBuilder::new();
+
+    let scale = match array.data_type() {
+        DataType::Duration(TimeUnit::Second) => 1_000_000_000,
+        DataType::Duration(TimeUnit::Millisecond) => 1_000_000,
+        DataType::Duration(TimeUnit::Microsecond) => 1_000,
+        DataType::Duration(TimeUnit::Nanosecond) => 1,
+        _ => unreachable!(),
+    };
+
+    for i in 0..array.len() {
+        if array.is_null(i) {
+            builder.append_null();
+        } else {

Review Comment:
   `PrimitiveArray::from_trusted_len_iter` may fit here.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] alamb commented on a diff in pull request #4020: feat: cast from/to interval and duration

Posted by "alamb (via GitHub)" <gi...@apache.org>.
alamb commented on code in PR #4020:
URL: https://github.com/apache/arrow-rs/pull/4020#discussion_r1167579187


##########
arrow-cast/src/cast.rs:
##########
@@ -458,6 +460,122 @@ where
     }
 }
 
+/// Cast the array from interval to duration
+fn cast_interval_to_duration<D: ArrowTemporalType<Native = i64>>(
+    array: &dyn Array,
+    cast_options: &CastOptions,
+) -> Result<ArrayRef, ArrowError> {
+    let array = array
+        .as_any()
+        .downcast_ref::<IntervalMonthDayNanoArray>()
+        .ok_or_else(|| {
+            ArrowError::ComputeError(
+                "Internal Error: Cannot cast interval to IntervalArray of expected type"
+                    .to_string(),
+            )
+        })?;
+
+    let scale = match D::DATA_TYPE {
+        DataType::Duration(TimeUnit::Second) => 1_000_000_000,
+        DataType::Duration(TimeUnit::Millisecond) => 1_000_000,
+        DataType::Duration(TimeUnit::Microsecond) => 1_000,
+        DataType::Duration(TimeUnit::Nanosecond) => 1,
+        _ => unreachable!(),
+    };
+
+    if cast_options.safe {
+        let iter = array.iter().map(|v| {
+            v.and_then(|v| {
+                let v = v / scale;
+                if v > i64::MAX as i128 {
+                    None
+                } else {
+                    Some(v as i64)
+                }
+            })
+        });
+        Ok(Arc::new(unsafe {
+            PrimitiveArray::<D>::from_trusted_len_iter(iter)
+        }))
+    } else {
+        let vec = array
+            .iter()
+            .map(|v| {
+                v.map(|v| {
+                    let v = v / scale;
+                    if v > i64::MAX as i128 {
+                        Err(ArrowError::ComputeError(format!(
+                            "Cannot cast to {:?}. Overflowing on {:?}",
+                            D::DATA_TYPE,
+                            v
+                        )))
+                    } else {
+                        Ok(v as i64)
+                    }
+                })
+                .transpose()
+            })
+            .collect::<Result<Vec<_>, _>>()?;
+        Ok(Arc::new(unsafe {
+            PrimitiveArray::<D>::from_trusted_len_iter(vec.iter())
+        }))
+    }
+}
+
+/// Cast the array from duration and interval
+fn cast_duration_to_interval<D: ArrowTemporalType<Native = i64>>(
+    array: &dyn Array,
+    cast_options: &CastOptions,
+) -> Result<ArrayRef, ArrowError> {
+    let array = array
+        .as_any()
+        .downcast_ref::<PrimitiveArray<D>>()
+        .ok_or_else(|| {
+            ArrowError::ComputeError(
+                "Internal Error: Cannot cast duration to DurationArray of expected type"
+                    .to_string(),
+            )
+        })?;
+
+    let scale = match array.data_type() {
+        DataType::Duration(TimeUnit::Second) => 1_000_000_000,
+        DataType::Duration(TimeUnit::Millisecond) => 1_000_000,
+        DataType::Duration(TimeUnit::Microsecond) => 1_000,
+        DataType::Duration(TimeUnit::Nanosecond) => 1,
+        _ => unreachable!(),
+    };
+
+    if cast_options.safe {
+        let iter = array
+            .iter()
+            .map(|v| v.and_then(|v| v.checked_mul(scale).map(|v| v as i128)));

Review Comment:
   I think we need to check the overflow in the safe case as well -- and and set the result to Null / NONE when overflow happened



##########
arrow-cast/src/cast.rs:
##########
@@ -8246,4 +8387,217 @@ mod tests {
         );
         assert_eq!("Invalid argument error: 1234567000 is too large to store in a Decimal256 of precision 7. Max is 9999999", err.unwrap_err().to_string());
     }
+
+    /// helper function to test casting from duration to interval
+    fn cast_from_duration_to_interval<T: ArrowTemporalType>(
+        array: Vec<i64>,
+        cast_options: &CastOptions,
+    ) -> Result<PrimitiveArray<IntervalMonthDayNanoType>, ArrowError>
+    where
+        arrow_array::PrimitiveArray<T>: From<Vec<i64>>,
+    {
+        let array = PrimitiveArray::<T>::from(array);
+        let array = Arc::new(array) as ArrayRef;
+        let casted_array = cast_with_options(
+            &array,
+            &DataType::Interval(IntervalUnit::MonthDayNano),
+            cast_options,
+        )?;
+        casted_array
+            .as_any()
+            .downcast_ref::<IntervalMonthDayNanoArray>()
+            .ok_or_else(|| {
+                ArrowError::ComputeError(
+                    "Failed to downcast to IntervalMonthDayNanoArray".to_string(),
+                )
+            })
+            .cloned()
+    }
+
+    #[test]
+    fn test_cast_from_duration_to_interval() {
+        // from duration second to interval month day nano
+        let array = vec![1234567];
+        let casted_array = cast_from_duration_to_interval::<DurationSecondType>(
+            array,
+            &DEFAULT_CAST_OPTIONS,
+        )
+        .unwrap();
+        assert_eq!(
+            casted_array.data_type(),
+            &DataType::Interval(IntervalUnit::MonthDayNano)
+        );
+        assert_eq!(casted_array.value(0), 1234567000000000);
+
+        let array = vec![i64::MAX];
+        let casted_array = cast_from_duration_to_interval::<DurationSecondType>(
+            array,
+            &CastOptions { safe: false },
+        );
+        assert!(casted_array.is_err());
+
+        // from duration millisecond to interval month day nano
+        let array = vec![1234567];
+        let casted_array = cast_from_duration_to_interval::<DurationMillisecondType>(
+            array,
+            &DEFAULT_CAST_OPTIONS,
+        )
+        .unwrap();
+        assert_eq!(
+            casted_array.data_type(),
+            &DataType::Interval(IntervalUnit::MonthDayNano)
+        );
+        assert_eq!(casted_array.value(0), 1234567000000);
+
+        let array = vec![i64::MAX];
+        let casted_array = cast_from_duration_to_interval::<DurationMillisecondType>(
+            array,
+            &CastOptions { safe: false },
+        );
+        assert!(casted_array.is_err());
+
+        // from duration microsecond to interval month day nano
+        let array = vec![1234567];
+        let casted_array = cast_from_duration_to_interval::<DurationMicrosecondType>(
+            array,
+            &DEFAULT_CAST_OPTIONS,
+        )
+        .unwrap();
+        assert_eq!(
+            casted_array.data_type(),
+            &DataType::Interval(IntervalUnit::MonthDayNano)
+        );
+        assert_eq!(casted_array.value(0), 1234567000);
+
+        let array = vec![i64::MAX];

Review Comment:
   Can you also please add a test here (and the cases below) showing what happens when an overflow happens with DEFAULT_CAST_OPTIONS? I would expect the result to be Null



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] viirya commented on a diff in pull request #4020: feat: cast from/to interval and duration

Posted by "viirya (via GitHub)" <gi...@apache.org>.
viirya commented on code in PR #4020:
URL: https://github.com/apache/arrow-rs/pull/4020#discussion_r1158897023


##########
arrow-cast/src/cast.rs:
##########
@@ -458,6 +460,93 @@ where
     }
 }
 
+/// Cast the array from interval to duration
+fn cast_interval_to_duration<D: ArrowTemporalType<Native = i64>>(
+    array: &dyn Array,
+    cast_options: &CastOptions,
+) -> Result<ArrayRef, ArrowError> {
+    let array = array
+        .as_any()
+        .downcast_ref::<IntervalMonthDayNanoArray>()
+        .unwrap();
+
+    let mut builder = PrimitiveBuilder::<D>::new();
+
+    let scale = match D::DATA_TYPE {
+        DataType::Duration(TimeUnit::Second) => 1_000_000_000,
+        DataType::Duration(TimeUnit::Millisecond) => 1_000_000,
+        DataType::Duration(TimeUnit::Microsecond) => 1_000,
+        DataType::Duration(TimeUnit::Nanosecond) => 1,
+        _ => unreachable!(),
+    };
+
+    for i in 0..array.len() {
+        if array.is_null(i) {
+            builder.append_null();
+        } else {
+            let v = array.value(i) / scale;
+            if v > i64::MAX as i128 {
+                if cast_options.safe {
+                    builder.append_null();
+                } else {
+                    return Err(ArrowError::ComputeError(format!(
+                        "Cannot cast to {:?}. Overflowing on {:?}",
+                        D::DATA_TYPE,
+                        v
+                    )));
+                }
+            } else {
+                builder.append_value(v as i64);
+            }
+        }
+    }
+
+    Ok(Arc::new(builder.finish()) as ArrayRef)
+}
+
+/// Cast the array from duration and interval
+fn cast_duration_to_interval<D: ArrowTemporalType<Native = i64>>(
+    array: &dyn Array,
+    cast_options: &CastOptions,
+) -> Result<ArrayRef, ArrowError> {
+    let array = array.as_any().downcast_ref::<PrimitiveArray<D>>().unwrap();
+
+    let mut builder = IntervalMonthDayNanoBuilder::new();
+
+    let scale = match array.data_type() {
+        DataType::Duration(TimeUnit::Second) => 1_000_000_000,
+        DataType::Duration(TimeUnit::Millisecond) => 1_000_000,
+        DataType::Duration(TimeUnit::Microsecond) => 1_000,
+        DataType::Duration(TimeUnit::Nanosecond) => 1,
+        _ => unreachable!(),
+    };
+
+    for i in 0..array.len() {
+        if array.is_null(i) {
+            builder.append_null();
+        } else {
+            let v = array.value(i) as i128;
+            let v = v.mul_checked(scale);

Review Comment:
   For scale 1 case, maybe we can skip this call?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] metesynnada commented on pull request #4020: feat: cast from/to interval and duration

Posted by "metesynnada (via GitHub)" <gi...@apache.org>.
metesynnada commented on PR #4020:
URL: https://github.com/apache/arrow-rs/pull/4020#issuecomment-1503613624

   LGTM, thanks for the effort.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] viirya commented on a diff in pull request #4020: feat: cast from/to interval and duration

Posted by "viirya (via GitHub)" <gi...@apache.org>.
viirya commented on code in PR #4020:
URL: https://github.com/apache/arrow-rs/pull/4020#discussion_r1158896572


##########
arrow-cast/src/cast.rs:
##########
@@ -458,6 +460,93 @@ where
     }
 }
 
+/// Cast the array from interval to duration
+fn cast_interval_to_duration<D: ArrowTemporalType<Native = i64>>(
+    array: &dyn Array,
+    cast_options: &CastOptions,
+) -> Result<ArrayRef, ArrowError> {
+    let array = array
+        .as_any()
+        .downcast_ref::<IntervalMonthDayNanoArray>()
+        .unwrap();
+
+    let mut builder = PrimitiveBuilder::<D>::new();
+
+    let scale = match D::DATA_TYPE {
+        DataType::Duration(TimeUnit::Second) => 1_000_000_000,
+        DataType::Duration(TimeUnit::Millisecond) => 1_000_000,
+        DataType::Duration(TimeUnit::Microsecond) => 1_000,
+        DataType::Duration(TimeUnit::Nanosecond) => 1,
+        _ => unreachable!(),
+    };
+
+    for i in 0..array.len() {
+        if array.is_null(i) {
+            builder.append_null();
+        } else {
+            let v = array.value(i) / scale;
+            if v > i64::MAX as i128 {
+                if cast_options.safe {
+                    builder.append_null();
+                } else {
+                    return Err(ArrowError::ComputeError(format!(
+                        "Cannot cast to {:?}. Overflowing on {:?}",
+                        D::DATA_TYPE,
+                        v
+                    )));
+                }
+            } else {
+                builder.append_value(v as i64);
+            }
+        }
+    }
+
+    Ok(Arc::new(builder.finish()) as ArrayRef)
+}
+
+/// Cast the array from duration and interval
+fn cast_duration_to_interval<D: ArrowTemporalType<Native = i64>>(
+    array: &dyn Array,
+    cast_options: &CastOptions,
+) -> Result<ArrayRef, ArrowError> {
+    let array = array.as_any().downcast_ref::<PrimitiveArray<D>>().unwrap();
+
+    let mut builder = IntervalMonthDayNanoBuilder::new();
+
+    let scale = match array.data_type() {
+        DataType::Duration(TimeUnit::Second) => 1_000_000_000,
+        DataType::Duration(TimeUnit::Millisecond) => 1_000_000,
+        DataType::Duration(TimeUnit::Microsecond) => 1_000,
+        DataType::Duration(TimeUnit::Nanosecond) => 1,
+        _ => unreachable!(),
+    };
+
+    for i in 0..array.len() {
+        if array.is_null(i) {
+            builder.append_null();
+        } else {

Review Comment:
   For array with no nulls, I think we can have a faster path?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] alamb commented on a diff in pull request #4020: feat: cast from/to interval and duration

Posted by "alamb (via GitHub)" <gi...@apache.org>.
alamb commented on code in PR #4020:
URL: https://github.com/apache/arrow-rs/pull/4020#discussion_r1167249827


##########
arrow-cast/src/cast.rs:
##########
@@ -458,6 +460,120 @@ where
     }
 }
 
+/// Cast the array from interval to duration
+fn cast_interval_to_duration<D: ArrowTemporalType<Native = i64>>(
+    array: &dyn Array,
+    cast_options: &CastOptions,
+) -> Result<ArrayRef, ArrowError> {
+    let array = array
+        .as_any()
+        .downcast_ref::<IntervalMonthDayNanoArray>()
+        .ok_or_else(|| {
+            ArrowError::ComputeError(
+                "Internal Error: Cannot cast interval to IntervalArray of expected type"
+                    .to_string(),
+            )
+        })?;
+
+    let scale = match D::DATA_TYPE {
+        DataType::Duration(TimeUnit::Second) => 1_000_000_000,
+        DataType::Duration(TimeUnit::Millisecond) => 1_000_000,
+        DataType::Duration(TimeUnit::Microsecond) => 1_000,
+        DataType::Duration(TimeUnit::Nanosecond) => 1,
+        _ => unreachable!(),
+    };
+
+    if cast_options.safe {
+        let iter = array.iter().map(|v| {
+            v.and_then(|v| {
+                let v = v / scale;
+                if v > i64::MAX as i128 {
+                    None
+                } else {
+                    Some(v as i64)
+                }
+            })
+        });
+        Ok(Arc::new(unsafe {
+            PrimitiveArray::<D>::from_trusted_len_iter(iter)
+        }))
+    } else {
+        let vec = array
+            .iter()
+            .map(|v| {
+                v.map(|v| {
+                    let v = v / scale;
+                    if v > i64::MAX as i128 {
+                        Err(ArrowError::ComputeError(format!(
+                            "Cannot cast to {:?}. Overflowing on {:?}",
+                            D::DATA_TYPE,
+                            v
+                        )))
+                    } else {
+                        Ok(v as i64)
+                    }
+                })
+                .transpose()
+            })
+            .collect::<Result<Vec<_>, _>>()?;
+        Ok(Arc::new(unsafe {
+            PrimitiveArray::<D>::from_trusted_len_iter(vec.iter())
+        }))
+    }
+}
+
+/// Cast the array from duration and interval
+fn cast_duration_to_interval<D: ArrowTemporalType<Native = i64>>(
+    array: &dyn Array,
+    cast_options: &CastOptions,
+) -> Result<ArrayRef, ArrowError> {
+    let array = array
+        .as_any()
+        .downcast_ref::<PrimitiveArray<D>>()
+        .ok_or_else(|| {
+            ArrowError::ComputeError(
+                "Internal Error: Cannot cast duration to DurationArray of expected type"
+                    .to_string(),
+            )
+        })?;
+
+    let scale = match array.data_type() {
+        DataType::Duration(TimeUnit::Second) => 1_000_000_000,
+        DataType::Duration(TimeUnit::Millisecond) => 1_000_000,
+        DataType::Duration(TimeUnit::Microsecond) => 1_000,
+        DataType::Duration(TimeUnit::Nanosecond) => 1,
+        _ => unreachable!(),
+    };
+
+    if cast_options.safe {
+        let iter = array
+            .iter()
+            .map(|v| v.and_then(|v| ((v as i128).checked_mul(scale))));

Review Comment:
   I am not sure this code correctly converts `v` into an `IntervalMonthDayNanoType` -- the `IntervalMonthDayNanoType` is represented as two fields but this is treating it as a single native type)
   
   I think we should construct IntervalMonthDayNano with this method:
   
   https://docs.rs/arrow/37.0.0/arrow/datatypes/struct.IntervalMonthDayNanoType.html#method.make_value
   
   So something like
   ```something
   let months = 0;
   let days = 0;
   let nanos: i64 = v.checked_mul(scale)?;
   Ok(IntervalMonthDayNanoType::make_value(months, days, nanos))
   ```
   
   



##########
arrow-cast/src/cast.rs:
##########
@@ -8246,4 +8385,137 @@ mod tests {
         );
         assert_eq!("Invalid argument error: 1234567000 is too large to store in a Decimal256 of precision 7. Max is 9999999", err.unwrap_err().to_string());
     }
+
+    #[test]
+    fn test_cast_from_duration_to_interval() {
+        // from duration second to interval month day nano
+        let array = vec![1234567];

Review Comment:
   Can you also please test a value if `i64::MAX` here and ensure an error is returned?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] alamb commented on a diff in pull request #4020: feat: cast from/to interval and duration

Posted by "alamb (via GitHub)" <gi...@apache.org>.
alamb commented on code in PR #4020:
URL: https://github.com/apache/arrow-rs/pull/4020#discussion_r1167251885


##########
arrow-cast/src/cast.rs:
##########
@@ -8246,4 +8385,137 @@ mod tests {
         );
         assert_eq!("Invalid argument error: 1234567000 is too large to store in a Decimal256 of precision 7. Max is 9999999", err.unwrap_err().to_string());
     }
+
+    #[test]
+    fn test_cast_from_duration_to_interval() {
+        // from duration second to interval month day nano
+        let array = vec![1234567];
+        let array = DurationSecondArray::from(array);
+        let array = Arc::new(array) as ArrayRef;
+        let casted_array =
+            cast(&array, &DataType::Interval(IntervalUnit::MonthDayNano)).unwrap();
+        let casted_array = casted_array
+            .as_any()
+            .downcast_ref::<IntervalMonthDayNanoArray>()
+            .unwrap();
+        assert_eq!(
+            casted_array.data_type(),
+            &DataType::Interval(IntervalUnit::MonthDayNano)
+        );
+        assert_eq!(casted_array.value(0), 1234567000000000);
+
+        // from duration millisecond to interval month day nano
+        let array = vec![1234567];
+        let array = DurationMillisecondArray::from(array);
+        let array = Arc::new(array) as ArrayRef;
+        let casted_array =
+            cast(&array, &DataType::Interval(IntervalUnit::MonthDayNano)).unwrap();
+        let casted_array = casted_array
+            .as_any()
+            .downcast_ref::<IntervalMonthDayNanoArray>()
+            .unwrap();
+        assert_eq!(
+            casted_array.data_type(),
+            &DataType::Interval(IntervalUnit::MonthDayNano)
+        );
+        assert_eq!(casted_array.value(0), 1234567000000);
+
+        // from duration microsecond to interval month day nano
+        let array = vec![1234567];
+        let array = DurationMicrosecondArray::from(array);
+        let array = Arc::new(array) as ArrayRef;
+        let casted_array =
+            cast(&array, &DataType::Interval(IntervalUnit::MonthDayNano)).unwrap();
+        let casted_array = casted_array
+            .as_any()
+            .downcast_ref::<IntervalMonthDayNanoArray>()
+            .unwrap();
+        assert_eq!(
+            casted_array.data_type(),
+            &DataType::Interval(IntervalUnit::MonthDayNano)
+        );
+        assert_eq!(casted_array.value(0), 1234567000);
+
+        // from duration nanosecond to interval month day nano
+        let array = vec![1234567];
+        let array = DurationNanosecondArray::from(array);
+        let array = Arc::new(array) as ArrayRef;
+        let casted_array =
+            cast(&array, &DataType::Interval(IntervalUnit::MonthDayNano)).unwrap();
+        let casted_array = casted_array
+            .as_any()
+            .downcast_ref::<IntervalMonthDayNanoArray>()
+            .unwrap();
+        assert_eq!(
+            casted_array.data_type(),
+            &DataType::Interval(IntervalUnit::MonthDayNano)
+        );
+        assert_eq!(casted_array.value(0), 1234567);
+    }
+
+    #[test]
+    fn test_cast_from_interval_to_duration() {
+        // from interval month day nano to duration second
+        let array = vec![1234567];
+        let array = IntervalMonthDayNanoArray::from(array);
+        let array = Arc::new(array) as ArrayRef;
+        let casted_array = cast(&array, &DataType::Duration(TimeUnit::Second)).unwrap();
+        let casted_array = casted_array
+            .as_any()
+            .downcast_ref::<DurationSecondArray>()
+            .unwrap();
+
+        assert_eq!(
+            casted_array.data_type(),
+            &DataType::Duration(TimeUnit::Second)
+        );
+        assert_eq!(casted_array.value(0), 0);
+
+        // from interval month day nano to duration millisecond
+        let array = vec![1234567];

Review Comment:
   You could probably refactor this repetition into a function and avoid so much boiler plate in the tests



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] alamb commented on a diff in pull request #4020: feat: cast from/to interval and duration

Posted by "alamb (via GitHub)" <gi...@apache.org>.
alamb commented on code in PR #4020:
URL: https://github.com/apache/arrow-rs/pull/4020#discussion_r1167485528


##########
arrow-cast/src/cast.rs:
##########
@@ -8386,136 +8386,219 @@ mod tests {
         assert_eq!("Invalid argument error: 1234567000 is too large to store in a Decimal256 of precision 7. Max is 9999999", err.unwrap_err().to_string());
     }
 
+    /// helper function to test casting from duration to interval
+    fn cast_from_duration_to_interval<T: ArrowTemporalType>(
+        array: Vec<i64>,
+        cast_options: &CastOptions,
+    ) -> Result<PrimitiveArray<IntervalMonthDayNanoType>, ArrowError>
+    where
+        arrow_array::PrimitiveArray<T>: From<Vec<i64>>,
+    {
+        let array = PrimitiveArray::<T>::from(array);
+        let array = Arc::new(array) as ArrayRef;
+        let casted_array = cast_with_options(
+            &array,
+            &DataType::Interval(IntervalUnit::MonthDayNano),
+            cast_options,
+        )?;
+        casted_array
+            .as_any()
+            .downcast_ref::<IntervalMonthDayNanoArray>()
+            .ok_or_else(|| {
+                ArrowError::ComputeError(
+                    "Failed to downcast to IntervalMonthDayNanoArray".to_string(),
+                )
+            })
+            .cloned()
+    }
+
     #[test]
     fn test_cast_from_duration_to_interval() {
         // from duration second to interval month day nano
         let array = vec![1234567];
-        let array = DurationSecondArray::from(array);
-        let array = Arc::new(array) as ArrayRef;
-        let casted_array =
-            cast(&array, &DataType::Interval(IntervalUnit::MonthDayNano)).unwrap();
-        let casted_array = casted_array
-            .as_any()
-            .downcast_ref::<IntervalMonthDayNanoArray>()
-            .unwrap();
+        let casted_array = cast_from_duration_to_interval::<DurationSecondType>(
+            array,
+            &DEFAULT_CAST_OPTIONS,
+        )
+        .unwrap();
         assert_eq!(
             casted_array.data_type(),
             &DataType::Interval(IntervalUnit::MonthDayNano)
         );
         assert_eq!(casted_array.value(0), 1234567000000000);
 
+        let array = vec![i64::MAX];
+        let casted_array = cast_from_duration_to_interval::<DurationSecondType>(
+            array,
+            &CastOptions { safe: false },
+        )
+        .unwrap();
+        assert_eq!(casted_array.value(0), 9223372036854775807000000000);
+
         // from duration millisecond to interval month day nano
         let array = vec![1234567];
-        let array = DurationMillisecondArray::from(array);
-        let array = Arc::new(array) as ArrayRef;
-        let casted_array =
-            cast(&array, &DataType::Interval(IntervalUnit::MonthDayNano)).unwrap();
-        let casted_array = casted_array
-            .as_any()
-            .downcast_ref::<IntervalMonthDayNanoArray>()
-            .unwrap();
+        let casted_array = cast_from_duration_to_interval::<DurationMillisecondType>(
+            array,
+            &DEFAULT_CAST_OPTIONS,
+        )
+        .unwrap();
         assert_eq!(
             casted_array.data_type(),
             &DataType::Interval(IntervalUnit::MonthDayNano)
         );
         assert_eq!(casted_array.value(0), 1234567000000);
 
+        let array = vec![i64::MAX];
+        let casted_array = cast_from_duration_to_interval::<DurationMillisecondType>(
+            array,
+            &CastOptions { safe: false },
+        )
+        .unwrap();
+        assert_eq!(casted_array.value(0), 9223372036854775807000000);

Review Comment:
   So if I convert this value back into fields:
   
   ```rust
       let (months, days, nanos) = IntervalMonthDayNanoType::to_parts(9223372036854775807000000);
       println!("months: {months}, days: {days}, nanos: {nanos}");
   
   ```
   
   it prints out:
   
   ```
   months: 0, days: 499999, nanos: -1000000
   
   ```
   
   Which doesn't seem right for `9223372036854775807` milliseconds 
   
   I think what is happening can be explained here (the nanoseconds fields have overflowed into the days)
   
   https://docs.rs/arrow-array/37.0.0/src/arrow_array/types.rs.html#435-445
   
   ```
           /*
           https://github.com/apache/arrow/blob/02c8598d264c839a5b5cf3109bfd406f3b8a6ba5/cpp/src/arrow/type.h#L1475
           struct MonthDayNanos {
               int32_t months;
               int32_t days;
               int64_t nanoseconds;
           }
           128     112     96      80      64      48      32      16      0
           +-------+-------+-------+-------+-------+-------+-------+-------+
           |     months    |      days     |             nanos             |
           +-------+-------+-------+-------+-------+-------+-------+-------+
           */
   ```
   
   I think what should happen is 
   1. (preferred): the cast should return an error if the nanosecond field would have overflowed
   2. (secondarily): if the nanosecond field overflows the overflow should be converted into days
   
   I think erroring (at least at first) is preferred because it avoids questions like "how many seconds are there in a day" which is not actually a fixed quantity given leap seconds, etc.
   
   If someone has need of casting such durations to intervals, we can then sort out the mechanics



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] alamb commented on pull request #4020: feat: cast from/to interval and duration

Posted by "alamb (via GitHub)" <gi...@apache.org>.
alamb commented on PR #4020:
URL: https://github.com/apache/arrow-rs/pull/4020#issuecomment-1509754962

   Thanks @Weijun-H  -- I think this is getting close. From my perspecitive all that is left is handling nanosecond overflow correctly and this will be good to go


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] alamb merged pull request #4020: feat: cast from/to interval and duration

Posted by "alamb (via GitHub)" <gi...@apache.org>.
alamb merged PR #4020:
URL: https://github.com/apache/arrow-rs/pull/4020


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org