You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2021/07/06 13:21:49 UTC

[GitHub] [arrow-rs] alamb opened a new issue #527: Add temporal kernels for arithmetic with timestamps and durations

alamb opened a new issue #527:
URL: https://github.com/apache/arrow-rs/issues/527


   **Is your feature request related to a problem or challenge? Please describe what you are trying to do.**
   I am trying to arithmetically create date / time columns. In my case I want to add some duration (7 days) to a Date32 column 
   
   **Describe the solution you'd like**
   
   I would like temporal kernels that allowed operations such as:
   
   TimestampNanosecondArray + Duration --> TimestampNanosecondArray
   Date32Array + Duration --> Date32Array
   etc.
   
   I would like to be able to do something like
   ```rust
       let date_strings = vec![
           Some("2020-01-01"),
           Some("2020-01-02"),
           Some("2020-01-03"),
           None,
           Some("2020-01-04"),
       ];
       let arr_date32 = arrow::compute::cast(&arr_string, &DataType::Date32).expect("casting to date");
   
       // note there is no such thing as DurationArray at this time
       let offset_seconds = DurationArray::from_days(vec![1, 2, 3, 4, 5]);
       let arr_date32_offset = compute::kernels::add(arr_date32, offset_seconds)
   
       // expect the output is 
       // 2021-01-02
       // 2021-01-04
       // 2021-01-06
       // None
       // 2021-01-08
   ```
   
   **Describe alternatives you've considered**
   It might be nice to extend the `arithmetic` kernels to take `Arc<dyn Array>` rather than the `PrimitiveArray` as they do now and do the casting / calling specific implementations directly. 
   
   **Additional context**
   Add any other context or screenshots about the feature request here.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-rs] alamb commented on issue #527: Add temporal kernels for arithmetic with timestamps and durations

Posted by GitBox <gi...@apache.org>.
alamb commented on issue #527:
URL: https://github.com/apache/arrow-rs/issues/527#issuecomment-880628261


   > Or at least, leveraging existing math kernels would make a ton of sense.
   
   I agree -- it may be that since the date/time arrays are `PrimitiveArrays` it might already work. But I remember problems when I tried.
   
   My selfish initial desire is to be able to write a filter like "the last 5 minutes" in DataFusion so something like
   
   ```sql
   SELECT ...
   WHERE
      timestamp_column > (now() - interval `5 minutes')
   ```
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-rs] alamb commented on issue #527: Add temporal kernels for arithmetic with timestamps and durations

Posted by GitBox <gi...@apache.org>.
alamb commented on issue #527:
URL: https://github.com/apache/arrow-rs/issues/527#issuecomment-885826018


   > From an implementation, I think we can do this using chrono and chronoutil. Something like:
   
   That looks like a great idea to me. Nice


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-rs] bjchambers commented on issue #527: Add temporal kernels for arithmetic with timestamps and durations

Posted by GitBox <gi...@apache.org>.
bjchambers commented on issue #527:
URL: https://github.com/apache/arrow-rs/issues/527#issuecomment-883716820


   I have some kernels doing time arithmetic for a personal project. I think it is helpful to consider more than just durations since, those are the easiest case. Specifically, we should also be able to add intervals to timestamps.
   
   I think it may be difficult to do these using existing arithmetic kernels. For the case of fixed durations (5000 seconds) it may be possible, but when handling intervals we'll need special kernels (eg., add 5 calendar days). One possible interface would be something like `add_time<Time, Delta>(time: &PrimitiveArray<Time>, delta: &PrimitiveArray<Delta>) -> PrimitiveArray<Time>`, for some traits that define temporal types and deltas.
   
   Then we can use `chrono` and `chronoutil` to support adding a duration or relative duration by having implementations of something like an `ArrowTimeDelta` trait for the data types that represent fixed or relative durations.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-rs] bjchambers commented on issue #527: Add temporal kernels for arithmetic with timestamps and durations

Posted by GitBox <gi...@apache.org>.
bjchambers commented on issue #527:
URL: https://github.com/apache/arrow-rs/issues/527#issuecomment-890560059


   Added a rough draft. The temporal values are a bit odd -- it looks like there is `as_datetime` but that doesn't respect the timestamp of the input, and seems like it would be doing a match inside the loop, so I added a method to `ArrowTimestampType`. Not sure if there is a better way to operate on the date time -- maybe a method that returns a closure that gets inlined away?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-rs] Jimexist commented on issue #527: Add temporal kernels for arithmetic with timestamps and durations

Posted by GitBox <gi...@apache.org>.
Jimexist commented on issue #527:
URL: https://github.com/apache/arrow-rs/issues/527#issuecomment-880028583


   anyone taking this? also is there any reference implementation in other languages?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-rs] bjchambers commented on issue #527: Add temporal kernels for arithmetic with timestamps and durations

Posted by GitBox <gi...@apache.org>.
bjchambers commented on issue #527:
URL: https://github.com/apache/arrow-rs/issues/527#issuecomment-885250399


   I don't think that entirely works for two reasons -- first, there are actually multiple `Duration` types (in Rust) for the same `DataType`. Second, the delta may also be an interval. Specifically, it needs to be constrained in a way that supports the following arrays:
   
   * `PrimitiveArray<DurationSecondType>` - add a fixed number of seconds
   * `PrimitiveArray<DurationMicrosecondType>` - add a fixed number of microseconds
   * `PrimitiveArray<DurationMillisecondType>` - add a fixed number of milliseconds
   * `PrimitiveArray<DurationNanosecondType>` - add a fixed number of nanoseconds
   * `PrimitiveArray<IntervalDayTimeType>` - add a number of calendar days (and seconds) (description says this is a pair of i32 packed into an i64)
   * `PrimitiveArray<IntervalYearMonthType>` - add a number of calendar months
   
   From an implementation, I think we can do this using `chrono` and `chronoutil`. Something like:
   
   ```
   pub trait ArrowTimeDelta: ArrowPrimitiveType {
       fn add_to(time: NaiveDateTime, value: Self::Native) -> NaiveDateTime;
   }
   
   impl ArrowTimeDelta<RelativeDuration> for IntervalDayTimeType {
       fn get_delta(value: Self::Native) -> RelativeDuration {
           // DayTime is represented as a 64 bit value -- 32 bit day and 32 bit seconds.
           let days = RelativeDuration::days(value >> 32);
           let seconds = RelativeDuration::seconds((value as i32) as i64);
           days + seconds
       }
   }
   
   impl ArrowTimeDelta for IntervalYearMonthType {
       fn add_to(time: NaiveDateTime, value: Self::Native) -> NaiveDateTime {
           // YearMonth is represented as a 32 bit value containing the number of months.
           time + RelativeDuration::months(value)
       }
   }
   
   impl ArrowTimeDelta for DurationSecondType {
       fn add_to(time: NaiveDateTime, value: Self::Native) -> NaiveDateTime {
           time + Duration::seconds(value)
       }
   }
   
   impl ArrowTimeDelta for DurationMillisecondType {
       fn add_to(time: NaiveDateTime, value: Self::Native) -> NaiveDateTime {
           time + Duration::milliseconds(value)
       }
   }
   
   impl ArrowTimeDelta for DurationMicrosecondType {
       fn add_to(time: NaiveDateTime, value: Self::Native) -> NaiveDateTime {
           time + Duration::microseconds(value)
       }
   }
   
   impl ArrowTimeDelta for DurationNanosecondType {
       fn add_to(time: NaiveDateTime, value: Self::Native) -> NaiveDateTime {
           time + Duration::nanoseconds(value)
       }
   }
   ```
   
   Then the implementation is basically iterating over a pair of arrays and callling `add_to`.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-rs] alamb commented on issue #527: Add temporal kernels for arithmetic with timestamps and durations

Posted by GitBox <gi...@apache.org>.
alamb commented on issue #527:
URL: https://github.com/apache/arrow-rs/issues/527#issuecomment-874755310


   FYI @velvia 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-rs] alamb commented on issue #527: Add temporal kernels for arithmetic with timestamps and durations

Posted by GitBox <gi...@apache.org>.
alamb commented on issue #527:
URL: https://github.com/apache/arrow-rs/issues/527#issuecomment-880175659


   @Jimexist  no one that I know of is taking this on. If you were interested that would be awesome
   
   The reference implementation would probably still be postgres in my mind. There is a nice table https://www.postgresql.org/docs/current/functions-datetime.html#OPERATORS-DATETIME-TABLE that describes all the operators, input types, and output types. I think it helps because certain combinations, such as `date + date` don't really make logical sense so having the reference to guide what is needed would help a lot
   
   @velvia / @andygrove / @jorgecarleitao  any thoughts?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-rs] alamb commented on issue #527: Add temporal kernels for arithmetic with timestamps and durations

Posted by GitBox <gi...@apache.org>.
alamb commented on issue #527:
URL: https://github.com/apache/arrow-rs/issues/527#issuecomment-880667448


   > I would say that we need to generalize the compute arithmetic kernels to support &dyn Array and then specialize based on DataType or bail if the operation is not valid for that datatype.
   
   I very much support this notion -- I have filed https://github.com/apache/arrow-rs/issues/558 to track it.
   
   I can take a crack at the proposed interface if this seems reasonable to people


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-rs] jorgecarleitao commented on issue #527: Add temporal kernels for arithmetic with timestamps and durations

Posted by GitBox <gi...@apache.org>.
jorgecarleitao commented on issue #527:
URL: https://github.com/apache/arrow-rs/issues/527#issuecomment-880649326


   I would say that we need to generalize the compute arithmetic kernels to support `&dyn Array` and then specialize based on `DataType` or bail if the operation is not valid for that datatype.
   
   We do this in arrow2 thanks to @elferherrera, here: https://github.com/jorgecarleitao/arrow2/blob/main/src/compute/arithmetics/mod.rs#L126


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-rs] velvia commented on issue #527: Add temporal kernels for arithmetic with timestamps and durations

Posted by GitBox <gi...@apache.org>.
velvia commented on issue #527:
URL: https://github.com/apache/arrow-rs/issues/527#issuecomment-880418323


   This is definitely a good feature.
   
   It seems to me that in most cases, that time/duration manipulations could be based on existing arithmetic kernels, that'd be one way to go about it.
   ie some combination of cast(date32 -> i32) -> math add kernel -> cast back for example.
   
   Or at least, leveraging existing math kernels would make a ton of sense.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-rs] alamb commented on issue #527: Add temporal kernels for arithmetic with timestamps and durations

Posted by GitBox <gi...@apache.org>.
alamb commented on issue #527:
URL: https://github.com/apache/arrow-rs/issues/527#issuecomment-885218175


   Yeah, handling intervals will take some finagling / handling intervals that aren't fixed numbers of seconds.
   
   However, I think the type system can represent such durations (using a `DataType::Duration`: https://github.com/apache/arrow-rs/blob/master/arrow/src/datatypes/datatype.rs#L97) so we can perhaps use `PrimitiveArray<Duration>` when you refer to `Delta` above


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-rs] nevi-me commented on issue #527: Add temporal kernels for arithmetic with timestamps and durations

Posted by GitBox <gi...@apache.org>.
nevi-me commented on issue #527:
URL: https://github.com/apache/arrow-rs/issues/527#issuecomment-880177931


   this also relates to #45 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org