Posted to github@arrow.apache.org by "alamb (via GitHub)" <gi...@apache.org> on 2023/03/27 13:53:11 UTC

[GitHub] [arrow-rs] alamb opened a new issue, #3959: Support casting to/from different interval units (`YearDay` --> `MonthDayNano`) etc

alamb opened a new issue, #3959:
URL: https://github.com/apache/arrow-rs/issues/3959

   **Is your feature request related to a problem or challenge? Please describe what you are trying to do.**
   In DataFusion we are trying to add interval support, including from data that comes from non-SQL sources. This data can have Intervals of type:
   
   ```
   DataType::Interval(IntervalUnit::YearMonth)
   DataType::Interval(IntervalUnit::DayTime)
   DataType::Interval(IntervalUnit::MonthDayNano)
   ```
   
   However, there is currently no easy way to convert between these types.
   
   **Describe the solution you'd like**
   Add support for casting between interval types to the `cast` kernel: https://github.com/apache/arrow-rs/blob/master/arrow-cast/src/cast.rs#L18-L36
   
   The following casts should always be supported, as they are lossless:
   * `Interval(YearMonth)` -> `Interval(MonthDayNano)`
   * `Interval(DayTime)` -> `Interval(MonthDayNano)`
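   
   A minimal sketch of why these two casts are lossless, assuming the Arrow layout of the three interval types (`YearMonth`: one `i32` of months; `DayTime`: `i32` days plus `i32` milliseconds; `MonthDayNano`: `i32` months, `i32` days, `i64` nanoseconds). The `MonthDayNano` struct and both function names are hypothetical stand-ins written in plain Rust, not arrow-rs APIs:
   
   ```rust
   /// Hypothetical stand-in for a single `Interval(MonthDayNano)` value.
   #[derive(Debug, Clone, Copy, PartialEq, Eq)]
   pub struct MonthDayNano {
       pub months: i32,
       pub days: i32,
       pub nanos: i64,
   }

   const NANOS_PER_MILLI: i64 = 1_000_000;

   /// `Interval(YearMonth)` holds one i32 month count, which maps directly
   /// onto the month field; days and nanoseconds are zero.
   pub fn year_month_to_month_day_nano(months: i32) -> MonthDayNano {
       MonthDayNano { months, days: 0, nanos: 0 }
   }

   /// `Interval(DayTime)` holds i32 days and i32 milliseconds.
   /// Any i32 millisecond count times 1_000_000 fits in an i64,
   /// so the widening multiplication cannot overflow.
   pub fn day_time_to_month_day_nano(days: i32, millis: i32) -> MonthDayNano {
       MonthDayNano {
           months: 0,
           days,
           nanos: i64::from(millis) * NANOS_PER_MILLI,
       }
   }
   ```
   
   Under these assumptions, "1 year 5 months" (17 months) becomes `{ months: 17, days: 0, nanos: 0 }`, and "2 days 7 milliseconds" becomes `{ months: 0, days: 2, nanos: 7_000_000 }`.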
   
   These casts should not be supported, as the two types cover different data ranges (months vs. days and milliseconds):
   * `Interval(YearMonth)` -> `Interval(DayTime)`
   * `Interval(DayTime)` -> `Interval(YearMonth)`
   
   These casts should behave like the other arithmetic kernels
   * `Interval(MonthDayNano)` -> `Interval(YearMonth)`
   * `Interval(MonthDayNano)` -> `Interval(DayTime)`
   
   
   
   Specifically, they should follow the meaning of `safe` in https://docs.rs/arrow/35.0.0/arrow/compute/struct.CastOptions.html#structfield.safe
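   
   One possible reading of `safe` for the narrowing casts, sketched under the assumption that a cast "fails" whenever it would lose information (the thread leaves the exact rule open). Per the `CastOptions` documentation, `safe: true` turns a failed cast into a null and `safe: false` returns an error. `MonthDayNano`, `DayTime`, `to_day_time_lossless`, and `cast_to_day_time` are hypothetical names in plain Rust, not arrow-rs APIs:
   
   ```rust
   use std::convert::TryFrom;

   /// Hypothetical stand-in for a single `Interval(MonthDayNano)` value.
   #[derive(Debug, Clone, Copy, PartialEq, Eq)]
   pub struct MonthDayNano {
       pub months: i32,
       pub days: i32,
       pub nanos: i64,
   }

   /// Hypothetical stand-in for `Interval(DayTime)`: (days, milliseconds).
   pub type DayTime = (i32, i32);

   const NANOS_PER_MILLI: i64 = 1_000_000;

   /// Assumed rule: succeed only if nothing is lost, i.e. there are no
   /// months, no sub-millisecond remainder, and the milliseconds fit in i32.
   pub fn to_day_time_lossless(v: MonthDayNano) -> Option<DayTime> {
       if v.months != 0 || v.nanos % NANOS_PER_MILLI != 0 {
           return None;
       }
       let millis = i32::try_from(v.nanos / NANOS_PER_MILLI).ok()?;
       Some((v.days, millis))
   }

   /// `safe == true`: a failed value becomes null (`None`).
   /// `safe == false`: the first failed value aborts the cast with an error.
   pub fn cast_to_day_time(
       values: &[Option<MonthDayNano>],
       safe: bool,
   ) -> Result<Vec<Option<DayTime>>, String> {
       values
           .iter()
           .map(|slot| match slot {
               None => Ok(None), // nulls pass through unchanged
               Some(v) => match to_day_time_lossless(*v) {
                   Some(dt) => Ok(Some(dt)),
                   None if safe => Ok(None),
                   None => Err(format!("Can't cast value {v:?} to Interval(DayTime)")),
               },
           })
           .collect()
   }
   ```
   
   With this rule, "2 days 7 milliseconds" survives the cast in either mode, while "2 days 7 nanoseconds" becomes null when `safe` is true and an error when it is false.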
   
   
   
   
   
   
   
   
   **Describe alternatives you've considered**
   
   **Additional context**
   
   # Example Interval casts
   
   ```rust
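   use arrow::array::{IntervalDayTimeArray, IntervalMonthDayNanoArray, IntervalYearMonthArray};
   use arrow::datatypes::{IntervalDayTimeType, IntervalMonthDayNanoType, IntervalYearMonthType};
   // The commented-out casts below additionally need:
   // use arrow::compute::cast;
   // use arrow::datatypes::{DataType, IntervalUnit};
   
   fn cast_interval_units() {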
       // want to be able to cast to/from different interval units
       let interval_year_month = IntervalYearMonthArray::from(vec![
           // 1 year 5 months
           Some(IntervalYearMonthType::make_value(1, 5)),
           None,
       ]);
   
       // thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: CastError("Casting from Interval(YearMonth) to Interval(MonthDayNano) not supported")', src/main.rs:55:112
       // let interval_month_day_nanos = cast(&interval_year_month, &DataType::Interval(IntervalUnit::MonthDayNano)).unwrap();
   
       let interval_day_time = IntervalDayTimeArray::from(vec![
           // 2 days 7 milliseconds
           Some(IntervalDayTimeType::make_value(2, 7)),
           None,
       ]);
   
       // thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: CastError("Casting from Interval(DayTime) to Interval(MonthDayNano) not supported")', src/main.rs:65:110
       // let interval_month_day_nanos = cast(&interval_day_time, &DataType::Interval(IntervalUnit::MonthDayNano)).unwrap();
   
   
       // Somewhat trickier is how to go from MonthDayNano back to lower precision intervals:
       let interval_month_day_nano = IntervalMonthDayNanoArray::from(vec![
           // 1 month 5 days 0 nanoseconds
           // (the month field fits Interval(YearMonth), but the 5 days would be lost;
           // Interval(DayTime) has no month field)
           Some(IntervalMonthDayNanoType::make_value(1, 5, 0)),
   
           // 0 months 2 days and 7 milliseconds
           // (could losslessly cast to Interval(DayTime) but not Interval(YearMonth))
           Some(IntervalMonthDayNanoType::make_value(0, 2, 7 * 1_000_000)),
   
           // 0 months 2 days and 7 nanoseconds
           // (cannot be losslessly cast to either Interval(DayTime) or Interval(YearMonth))
           Some(IntervalMonthDayNanoType::make_value(0, 2, 7)),
           None,
       ]);
   }
   ```
   
   
   # Example integer casts
   
   ```rust
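   use arrow::array::Int32Array;
   use arrow::compute::{cast, CastOptions};
   use arrow::datatypes::DataType;
   // pretty printing requires the arrow crate's `prettyprint` feature
   use arrow::util::pretty::pretty_format_columns;
   // The commented-out cast below additionally needs:
   // use arrow::compute::cast_with_options;
   
   fn cast_ints() {
       // 400 does not fit in a u8
       let arr = Int32Array::from(vec![1, 400]);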
       let out = cast(&arr, &DataType::UInt8).unwrap();
   
       println!(
           "output:\n{}",
           pretty_format_columns("out", &[out]).unwrap()
       );
   
       let options = CastOptions {
           safe: false
       };
   
       // thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: CastError("Can't cast value 400 to type UInt8")', src/main.rs:61:67
       // let out = cast_with_options(&arr, &DataType::UInt8, &options).unwrap();
   
       // println!(
       //     "output:\n{}",
       //     pretty_format_columns("out", &[out]).unwrap()
       // );
   }
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] tustvold commented on issue #3959: Support casting to/from different interval units (eg `YearDay` --> `MonthDayNano`, etc)

Posted by "tustvold (via GitHub)" <gi...@apache.org>.
tustvold commented on issue #3959:
URL: https://github.com/apache/arrow-rs/issues/3959#issuecomment-1485163733

   > These casts should behave like the other arithmetic kernels
   
   I think they should behave the same as the current timestamp casts, which I believe allow truncation but error on overflow (this needs to be verified)
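   
   To make the "truncate but error on overflow" idea concrete, here is a sketch for just the sub-day field when going from `MonthDayNano` to `DayTime`. It assumes truncation toward zero and treats a millisecond count that does not fit in `i32` as the overflow case; whether the timestamp casts actually behave this way is, as noted, unverified. `nanos_to_millis_truncating` is a hypothetical helper, not an arrow-rs function:
   
   ```rust
   use std::convert::TryFrom;

   const NANOS_PER_MILLI: i64 = 1_000_000;

   /// Drops any sub-millisecond remainder (integer division truncates
   /// toward zero) and returns `None` if the result overflows an i32.
   pub fn nanos_to_millis_truncating(nanos: i64) -> Option<i32> {
       i32::try_from(nanos / NANOS_PER_MILLI).ok()
   }
   ```
   
   Here 7 nanoseconds truncates to 0 milliseconds rather than failing, while a nanosecond count near `i64::MAX` is rejected because the millisecond result cannot be stored in `DayTime`.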




[GitHub] [arrow-rs] alamb commented on issue #3959: Support casting to/from different interval units (`YearDay` --> `MonthDayNano`) etc

Posted by "alamb (via GitHub)" <gi...@apache.org>.
alamb commented on issue #3959:
URL: https://github.com/apache/arrow-rs/issues/3959#issuecomment-1485142802

   FYI @tustvold I believe this is consistent with what we discussed




[GitHub] [arrow-rs] tustvold commented on issue #3959: Support casting to/from different interval units (eg `YearDay` --> `MonthDayNano`, etc)

Posted by "tustvold (via GitHub)" <gi...@apache.org>.
tustvold commented on issue #3959:
URL: https://github.com/apache/arrow-rs/issues/3959#issuecomment-1563297206

   > any thoughts?
   
   I don't think anything to do with intervals is a good first issue :sweat_smile:. As for what the semantics of casting between intervals should be, I seem to remember Postgres assumes a month has 30 days or something similar, but whoever picks up this ticket would likely need to do some research and use it to build consensus on the correct approach.




[GitHub] [arrow-rs] alamb commented on issue #3959: Support casting to/from different interval units (eg `YearDay` --> `MonthDayNano`, etc)

Posted by "alamb (via GitHub)" <gi...@apache.org>.
alamb commented on issue #3959:
URL: https://github.com/apache/arrow-rs/issues/3959#issuecomment-1563289352

   🤔 Maybe not such a good first issue (unless the semantics are clear) -- @tustvold any thoughts?
   




[GitHub] [arrow-rs] tustvold closed issue #3959: Support casting to/from different interval units (eg `YearDay` --> `MonthDayNano`, etc)

Posted by "tustvold (via GitHub)" <gi...@apache.org>.
tustvold closed issue #3959: Support casting to/from different interval units (eg `YearDay` --> `MonthDayNano`, etc)
URL: https://github.com/apache/arrow-rs/issues/3959




[GitHub] [arrow-rs] alamb commented on issue #3959: Support casting to/from different interval units (eg `YearDay` --> `MonthDayNano`, etc)

Posted by "alamb (via GitHub)" <gi...@apache.org>.
alamb commented on issue #3959:
URL: https://github.com/apache/arrow-rs/issues/3959#issuecomment-1485640880

   I think this is a good first issue because the semantics are well defined and there are some existing examples.




[GitHub] [arrow-rs] 2010YOUY01 commented on issue #3959: Support casting to/from different interval units (eg `YearDay` --> `MonthDayNano`, etc)

Posted by "2010YOUY01 (via GitHub)" <gi...@apache.org>.
2010YOUY01 commented on issue #3959:
URL: https://github.com/apache/arrow-rs/issues/3959#issuecomment-1562003684

   The independence of the fields inside the `Interval` type makes this really tricky 🤔
   Suppose a `DataType::Interval(IntervalUnit::MonthDayNano)` has the value "1 month 40 days 0 nanos" and is cast to `DataType::Interval(IntervalUnit::YearMonth)`: should the result be "1 month" or "2 months"?
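   
   The two candidate answers can be written down side by side. This sketch assumes a fixed 30-day month for the second option (the convention Postgres's `justify_days` uses); both function names are hypothetical, and neither is claimed to be what arrow-rs does:
   
   ```rust
   /// Assumption for illustration only: every month has exactly 30 days.
   const DAYS_PER_MONTH: i32 = 30;

   /// Option A: treat the fields as independent and keep only the months.
   pub fn months_fieldwise(months: i32, _days: i32) -> i32 {
       months
   }

   /// Option B: fold whole 30-day blocks into the month count first.
   /// Returns `None` if the month count would overflow an i32.
   pub fn months_normalized(months: i32, days: i32) -> Option<i32> {
       months.checked_add(days / DAYS_PER_MONTH)
   }
   ```
   
   For "1 month 40 days", option A gives 1 month and option B gives 2 months (with 10 days left over that `YearMonth` cannot hold), which is exactly the ambiguity described above.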




[GitHub] [arrow-rs] alamb commented on issue #3959: Support casting to/from different interval units (eg `YearDay` --> `MonthDayNano`, etc)

Posted by "alamb (via GitHub)" <gi...@apache.org>.
alamb commented on issue #3959:
URL: https://github.com/apache/arrow-rs/issues/3959#issuecomment-1485165085

   > I think they should behave the same as the current timestamp casts, which I believe allow truncation but error on overflow (this needs to be verified)
   
   I will verify




[GitHub] [arrow-rs] mr-brobot commented on issue #3959: Support casting to/from different interval units (eg `YearDay` --> `MonthDayNano`, etc)

Posted by "mr-brobot (via GitHub)" <gi...@apache.org>.
mr-brobot commented on issue #3959:
URL: https://github.com/apache/arrow-rs/issues/3959#issuecomment-1575683148

   @alamb @tustvold Heads up: I think this might be closed by #4182.

