You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2021/08/01 17:39:40 UTC

[GitHub] [arrow-rs] bjchambers opened a new pull request #651: rough draft of time arithmetic

bjchambers opened a new pull request #651:
URL: https://github.com/apache/arrow-rs/pull/651


   # Which issue does this PR close?
   
   Closes 527.
   
   # What changes are included in this PR?
   
   Adds an `ArrowTimeDeltaType` for primitive types that can be added to time values. These allow writing the generic `add_time` kernel in a straightforward way.
   
   The `add_time` kernel for time arithmetic.
   
   # Are there any user-facing changes?
   
   Adds the `add_time` kernel for time arithmetic.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-rs] alamb closed pull request #651: rough draft of time arithmetic

Posted by GitBox <gi...@apache.org>.
alamb closed pull request #651:
URL: https://github.com/apache/arrow-rs/pull/651


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-rs] alamb commented on pull request #651: rough draft of time arithmetic

Posted by GitBox <gi...@apache.org>.
alamb commented on pull request #651:
URL: https://github.com/apache/arrow-rs/pull/651#issuecomment-893080333


   Thanks @bjchambers  -- I hope to check this out tomorrow


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-rs] bjchambers commented on pull request #651: rough draft of time arithmetic

Posted by GitBox <gi...@apache.org>.
bjchambers commented on pull request #651:
URL: https://github.com/apache/arrow-rs/pull/651#issuecomment-1001649891


   Apologies for also losing the thread on this one. I'm not sure where we want to take this. The main thread of discussion above seems to be on whether (and how) we want to support arithmetic on all types.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-rs] alamb commented on a change in pull request #651: rough draft of time arithmetic

Posted by GitBox <gi...@apache.org>.
alamb commented on a change in pull request #651:
URL: https://github.com/apache/arrow-rs/pull/651#discussion_r683376794



##########
File path: arrow/Cargo.toml
##########
@@ -50,6 +50,7 @@ regex = "1.3"
 lazy_static = "1.4"
 packed_simd = { version = "0.3", optional = true, package = "packed_simd_2" }
 chrono = "0.4"
+chronoutil = "0.2"

Review comment:
       since `chronoutil` appears to only used in tests, perhaps we could put it in `dev-dependencies` instead

##########
File path: arrow/src/compute/kernels/temporal.rs
##########
@@ -166,8 +168,62 @@ where
     Ok(b.finish())
 }
 
+/// Add the given `time_delta` to each time in the `date_time` array.
+///
+/// # Errors
+/// If the arrays of different lengths.
+pub fn add_time<DateTime, TimeDelta>(
+    date_time: &PrimitiveArray<DateTime>,
+    time_delta: &PrimitiveArray<TimeDelta>,
+) -> Result<PrimitiveArray<DateTime>>
+where
+    DateTime: ArrowTimestampType,

Review comment:
       `ArrowTimestampType` type seems to only allow timestamp calculations, not for example adding a `Date32` and a `Duration`
   
   https://docs.rs/arrow/5.1.0/arrow/datatypes/trait.ArrowTimestampType.html 
   
   I wonder if it would make sense to define a `ArrowDateTimeType` or something that was implemented for all the timestamp types as well as `Date32`, `Time32`, etc and then use `ArrowDateTimeType` instead of `ArrowTimestampType` here

##########
File path: arrow/src/datatypes/types.rs
##########
@@ -161,25 +169,106 @@ impl ArrowTemporalType for DurationNanosecondType {}
 pub trait ArrowTimestampType: ArrowTemporalType {
     /// Returns the `TimeUnit` of this timestamp.
     fn get_time_unit() -> TimeUnit;
+
+    /// Return the `chrono::NavieDateTime` from the given value.
+    fn to_datetime(value: Self::Native) -> NaiveDateTime;
+
+    fn from_datetime(datetime: NaiveDateTime) -> Self::Native;
 }
 
 impl ArrowTimestampType for TimestampSecondType {
     fn get_time_unit() -> TimeUnit {
         TimeUnit::Second
     }
+
+    fn to_datetime(value: Self::Native) -> NaiveDateTime {
+        timestamp_s_to_datetime(value)
+    }
+
+    fn from_datetime(datetime: NaiveDateTime) -> Self::Native {
+        datetime.timestamp()
+    }
 }
 impl ArrowTimestampType for TimestampMillisecondType {
     fn get_time_unit() -> TimeUnit {
         TimeUnit::Millisecond
     }
+
+    fn to_datetime(value: Self::Native) -> NaiveDateTime {
+        timestamp_ms_to_datetime(value)
+    }
+
+    fn from_datetime(datetime: NaiveDateTime) -> Self::Native {
+        datetime.timestamp_millis()
+    }
 }
 impl ArrowTimestampType for TimestampMicrosecondType {
     fn get_time_unit() -> TimeUnit {
         TimeUnit::Microsecond
     }
+
+    fn to_datetime(value: Self::Native) -> NaiveDateTime {
+        timestamp_us_to_datetime(value)
+    }
+
+    fn from_datetime(datetime: NaiveDateTime) -> Self::Native {
+        datetime.timestamp_nanos() / 1000
+    }
 }
 impl ArrowTimestampType for TimestampNanosecondType {
     fn get_time_unit() -> TimeUnit {
         TimeUnit::Nanosecond
     }
+
+    fn to_datetime(value: Self::Native) -> NaiveDateTime {
+        timestamp_ns_to_datetime(value)
+    }
+
+    fn from_datetime(datetime: NaiveDateTime) -> Self::Native {
+        datetime.timestamp_nanos()
+    }
+}
+
+pub trait ArrowTimeDeltaType: ArrowPrimitiveType {

Review comment:
       👍 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-rs] alamb commented on a change in pull request #651: rough draft of time arithmetic

Posted by GitBox <gi...@apache.org>.
alamb commented on a change in pull request #651:
URL: https://github.com/apache/arrow-rs/pull/651#discussion_r776704911



##########
File path: arrow/src/compute/kernels/temporal.rs
##########
@@ -166,8 +168,62 @@ where
     Ok(b.finish())
 }
 
+/// Add the given `time_delta` to each time in the `date_time` array.
+///
+/// # Errors
+/// If the arrays of different lengths.
+pub fn add_time<DateTime, TimeDelta>(
+    date_time: &PrimitiveArray<DateTime>,
+    time_delta: &PrimitiveArray<TimeDelta>,
+) -> Result<PrimitiveArray<DateTime>>
+where
+    DateTime: ArrowTimestampType,

Review comment:
       > The alternative would be to have fewer limitations at the type level and instead make it a runtime error to pass an invalid time type
   
   This is my personal preference. The rationale being:
   1. Any code that uses this will likely be dynamically switching on types anyways (as the kernels would get `&dyn Array`)
   2. The overhead of a runtime check once per pair of arrays is likely quite small compared to the actual calculation being performed
   3. Encoding logic in trait bounds, while powerful, makes the code harder to use, especially for those relatively new to Rust (see, for example, [`tower::Service`](https://docs.rs/tower/0.4.11/tower/trait.Service.html)) -- given the runtime overhead is likely so small, I don't think the complexity is worth it.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-rs] houqp commented on a change in pull request #651: rough draft of time arithmetic

Posted by GitBox <gi...@apache.org>.
houqp commented on a change in pull request #651:
URL: https://github.com/apache/arrow-rs/pull/651#discussion_r685619334



##########
File path: arrow/src/compute/kernels/temporal.rs
##########
@@ -166,8 +168,62 @@ where
     Ok(b.finish())
 }
 
+/// Add the given `time_delta` to each time in the `date_time` array.
+///
+/// # Errors
+/// If the arrays of different lengths.
+pub fn add_time<DateTime, TimeDelta>(
+    date_time: &PrimitiveArray<DateTime>,
+    time_delta: &PrimitiveArray<TimeDelta>,
+) -> Result<PrimitiveArray<DateTime>>
+where
+    DateTime: ArrowTimestampType,

Review comment:
       +1 for a more generic `ArrowTimestampType` trait.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-rs] alamb commented on pull request #651: rough draft of time arithmetic

Posted by GitBox <gi...@apache.org>.
alamb commented on pull request #651:
URL: https://github.com/apache/arrow-rs/pull/651#issuecomment-893080333


   Thanks @bjchambers  -- I hope to check this out tomorrow


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-rs] codecov-commenter commented on pull request #651: rough draft of time arithmetic

Posted by GitBox <gi...@apache.org>.
codecov-commenter commented on pull request #651:
URL: https://github.com/apache/arrow-rs/pull/651#issuecomment-890682046


   # [Codecov](https://codecov.io/gh/apache/arrow-rs/pull/651?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) Report
   > Merging [#651](https://codecov.io/gh/apache/arrow-rs/pull/651?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (e670b4a) into [master](https://codecov.io/gh/apache/arrow-rs/commit/b38a4b6c29ba8b9be02460183c61de86bd9ba7df?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (b38a4b6) will **decrease** coverage by `0.01%`.
   > The diff coverage is `68.11%`.
   
   > :exclamation: Current head e670b4a differs from pull request most recent head e1a757b. Consider uploading reports for the commit e1a757b to get more accurate results
   [![Impacted file tree graph](https://codecov.io/gh/apache/arrow-rs/pull/651/graphs/tree.svg?width=650&height=150&src=pr&token=pq9V9qWZ1N&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)](https://codecov.io/gh/apache/arrow-rs/pull/651?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   
   ```diff
   @@            Coverage Diff             @@
   ##           master     #651      +/-   ##
   ==========================================
   - Coverage   82.50%   82.48%   -0.02%     
   ==========================================
     Files         168      168              
     Lines       47237    47306      +69     
   ==========================================
   + Hits        38971    39019      +48     
   - Misses       8266     8287      +21     
   ```
   
   
   | [Impacted Files](https://codecov.io/gh/apache/arrow-rs/pull/651?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | Coverage Δ | |
   |---|---|---|
   | [arrow/src/datatypes/types.rs](https://codecov.io/gh/apache/arrow-rs/pull/651/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-YXJyb3cvc3JjL2RhdGF0eXBlcy90eXBlcy5ycw==) | `46.15% <33.33%> (-42.74%)` | :arrow_down: |
   | [arrow/src/compute/kernels/temporal.rs](https://codecov.io/gh/apache/arrow-rs/pull/651/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-YXJyb3cvc3JjL2NvbXB1dGUva2VybmVscy90ZW1wb3JhbC5ycw==) | `90.36% <94.87%> (+1.38%)` | :arrow_up: |
   | [arrow/src/array/transform/boolean.rs](https://codecov.io/gh/apache/arrow-rs/pull/651/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-YXJyb3cvc3JjL2FycmF5L3RyYW5zZm9ybS9ib29sZWFuLnJz) | `84.61% <0.00%> (+7.69%)` | :arrow_up: |
   
   ------
   
   [Continue to review full report at Codecov](https://codecov.io/gh/apache/arrow-rs/pull/651?src=pr&el=continue&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
   > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation)
   > `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
   > Powered by [Codecov](https://codecov.io/gh/apache/arrow-rs/pull/651?src=pr&el=footer&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Last update [b38a4b6...e1a757b](https://codecov.io/gh/apache/arrow-rs/pull/651?src=pr&el=lastupdated&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-rs] bjchambers commented on a change in pull request #651: rough draft of time arithmetic

Posted by GitBox <gi...@apache.org>.
bjchambers commented on a change in pull request #651:
URL: https://github.com/apache/arrow-rs/pull/651#discussion_r686454285



##########
File path: arrow/Cargo.toml
##########
@@ -50,6 +50,7 @@ regex = "1.3"
 lazy_static = "1.4"
 packed_simd = { version = "0.3", optional = true, package = "packed_simd_2" }
 chrono = "0.4"
+chronoutil = "0.2"

Review comment:
       `chronoutil` is used in the implementation of `ArrowTimeDeltaType` to perform arithmetic involving relative time units such as months/days/etc.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-rs] alamb commented on pull request #651: rough draft of time arithmetic

Posted by GitBox <gi...@apache.org>.
alamb commented on pull request #651:
URL: https://github.com/apache/arrow-rs/pull/651#issuecomment-1020465460


   Closing this one down to keep the review list clean. Please reopen if that is a mistake


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-rs] alamb closed pull request #651: rough draft of time arithmetic

Posted by GitBox <gi...@apache.org>.
alamb closed pull request #651:
URL: https://github.com/apache/arrow-rs/pull/651


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-rs] alamb commented on pull request #651: rough draft of time arithmetic

Posted by GitBox <gi...@apache.org>.
alamb commented on pull request #651:
URL: https://github.com/apache/arrow-rs/pull/651#issuecomment-1020465460


   Closing this one down to keep the review list clean. Please reopen if that is a mistake


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-rs] alamb commented on pull request #651: rough draft of time arithmetic

Posted by GitBox <gi...@apache.org>.
alamb commented on pull request #651:
URL: https://github.com/apache/arrow-rs/pull/651#issuecomment-998101046


   Marking PRs that are over a month old as stale -- please let us know if there is additional work planned or if we should close them. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-rs] bjchambers commented on a change in pull request #651: rough draft of time arithmetic

Posted by GitBox <gi...@apache.org>.
bjchambers commented on a change in pull request #651:
URL: https://github.com/apache/arrow-rs/pull/651#discussion_r686455865



##########
File path: arrow/src/compute/kernels/temporal.rs
##########
@@ -166,8 +168,62 @@ where
     Ok(b.finish())
 }
 
+/// Add the given `time_delta` to each time in the `date_time` array.
+///
+/// # Errors
+/// If the arrays of different lengths.
+pub fn add_time<DateTime, TimeDelta>(
+    date_time: &PrimitiveArray<DateTime>,
+    time_delta: &PrimitiveArray<TimeDelta>,
+) -> Result<PrimitiveArray<DateTime>>
+where
+    DateTime: ArrowTimestampType,

Review comment:
       One potential issue there is that you can't necessarily add months to a `Time32`, since the latter doesn't have a date component. It would be possible to express that as a trait, but it would be more complex. For instance:
   
   ```
   trait ArrowTimeDeltaType<T: ArrowTemporalType> {
     fn add_time(temporal: T::Native, delta: Self::Native) -> T::Native;
   }
   
   impl ArrowTimeDeltaType<Date32> for IntervalYearMonthType {
     ...
   }
   
   ...
   ```
   
   Specifically, instead of having the trait allow adding the time delta to *anything* that is a `DateTime`, it could have an additional parameter indicating what it can be added to. This would require more implementation blocks, would express what kinds of arithmetic were supported.
   
   Eg., we could have `impl ArrowTimeDeltaType<Time32> for DurationSecondType { ... }` but we wouldn't have one for `IntervalYearMonthType`.
   
   Does that level of encoding the supported operations in the trait make sense? The alternative would be to have fewer limitations at the type level and instead make it a runtime error to pass an invalid time type. In that world, the delta could have two functions -- one for adding to a `DateTime` and one for adding to a `Time`, and could choose to throw an exception if it wasn't supported on `Time`.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org