You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/08/01 03:23:37 UTC

[GitHub] [arrow-rs] stuartcarnie opened a new pull request, #2251: feat: Implement string cast operations for Time32 and Time64

stuartcarnie opened a new pull request, #2251:
URL: https://github.com/apache/arrow-rs/pull/2251

   # Which issue does this PR close?
   
   Closes #2053 and helps apache/arrow-datafusion#2883.
   
   # Rationale for this change
    
   N/A
   
   # What changes are included in this PR?
   
   Implements cast operations following precedence of existing implementations.
   
   # Are there any user-facing changes?
   
   The `cast` API now supports string -> Time32 and Time64 transformations.
   
   <!---
   If there are any breaking changes to public APIs, please add the `breaking change` label.
   -->
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] ursabot commented on pull request #2251: feat: Implement string cast operations for Time32 and Time64

Posted by GitBox <gi...@apache.org>.
ursabot commented on PR #2251:
URL: https://github.com/apache/arrow-rs/pull/2251#issuecomment-1202405148

   Benchmark runs are scheduled for baseline = ed9fc565f6b3bf0653ff342b523dbd1e2192d847 and contender = 9a4b1c99d7e5a3bd3c6e3bce3ba0ee154720827f. 9a4b1c99d7e5a3bd3c6e3bce3ba0ee154720827f is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
   Conbench compare runs links:
   [Skipped :warning: Benchmarking of arrow-rs-commits is not supported on ec2-t3-xlarge-us-east-2] [ec2-t3-xlarge-us-east-2](https://conbench.ursa.dev/compare/runs/ac0c9cdfc82847d5b5f665cd8f77f0d7...28dfc479a1874c52b89c795d663fe175/)
   [Skipped :warning: Benchmarking of arrow-rs-commits is not supported on test-mac-arm] [test-mac-arm](https://conbench.ursa.dev/compare/runs/653461bc5deb4f5399cb9fbd6b4c604e...dd374a22c18a40758a962ed5fb674406/)
   [Skipped :warning: Benchmarking of arrow-rs-commits is not supported on ursa-i9-9960x] [ursa-i9-9960x](https://conbench.ursa.dev/compare/runs/a9693d1879a24ed1933d45ad76e0c810...c69e12537a4847ffa9555734805745af/)
   [Skipped :warning: Benchmarking of arrow-rs-commits is not supported on ursa-thinkcentre-m75q] [ursa-thinkcentre-m75q](https://conbench.ursa.dev/compare/runs/da74caae5796421eaccb66e6009fb203...347b98834e444b64ae266ca03dec564a/)
   Buildkite builds:
   Supported benchmarks:
   ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only benchmarks with cloud = True
   test-mac-arm: Supported benchmark langs: C++, Python, R
   ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
   ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] alamb merged pull request #2251: feat: Implement string cast operations for Time32 and Time64

Posted by GitBox <gi...@apache.org>.
alamb merged PR #2251:
URL: https://github.com/apache/arrow-rs/pull/2251


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] alamb commented on a diff in pull request #2251: feat: Implement string cast operations for Time32 and Time64

Posted by GitBox <gi...@apache.org>.
alamb commented on code in PR #2251:
URL: https://github.com/apache/arrow-rs/pull/2251#discussion_r934903273


##########
arrow/src/compute/kernels/cast.rs:
##########
@@ -1584,6 +1625,303 @@ fn cast_string_to_date64<Offset: OffsetSizeTrait>(
     Ok(Arc::new(array) as ArrayRef)
 }
 
+fn seconds_since_midnight(time: &chrono::NaiveTime) -> i32 {
+    let sec = time.num_seconds_from_midnight();
+    let frac = time.nanosecond();
+    let adjust = if frac < 1_000_000_000 { 0 } else { 1 };

Review Comment:
   It took me a while to grok this -- I think it is leap second handling
   https://docs.rs/chrono/0.4.19/chrono/trait.Timelike.html#tymethod.nanosecond
   
   Maybe we could add a comment explaining what was going on
   
   ```suggestion
       // handle leap second
       // see https://docs.rs/chrono/0.4.19/chrono/trait.Timelike.html#tymethod.nanosecond
       let adjust = if frac < 1_000_000_000 { 0 } else { 1 };
   ```



##########
arrow/src/compute/kernels/cast.rs:
##########
@@ -1584,6 +1625,303 @@ fn cast_string_to_date64<Offset: OffsetSizeTrait>(
     Ok(Arc::new(array) as ArrayRef)
 }
 
+fn seconds_since_midnight(time: &chrono::NaiveTime) -> i32 {

Review Comment:
   https://github.com/apache/arrow-rs/blob/master/arrow/src/temporal_conversions.rs may be a good place to put these functions too (so they have a chance of being found / reused)



##########
arrow/src/compute/kernels/cast.rs:
##########
@@ -136,9 +137,25 @@ pub fn can_cast_types(from_type: &DataType, to_type: &DataType) -> bool {
 
         (Utf8, LargeUtf8) => true,
         (LargeUtf8, Utf8) => true,
-        (Utf8, Date32 | Date64 | Timestamp(TimeUnit::Nanosecond, None)) => true,
+        (Utf8,
+            Date32
+            | Date64
+            | Time32(TimeUnit::Second)

Review Comment:
   ❤️ 



##########
arrow/src/compute/kernels/cast.rs:
##########
@@ -1584,6 +1625,303 @@ fn cast_string_to_date64<Offset: OffsetSizeTrait>(
     Ok(Arc::new(array) as ArrayRef)
 }
 
+fn seconds_since_midnight(time: &chrono::NaiveTime) -> i32 {
+    let sec = time.num_seconds_from_midnight();
+    let frac = time.nanosecond();
+    let adjust = if frac < 1_000_000_000 { 0 } else { 1 };
+    (sec + adjust) as i32
+}
+
+fn milliseconds_since_midnight(time: &chrono::NaiveTime) -> i32 {
+    /// The number of nanoseconds per millisecond.
+    const NANOS_PER_MILLI: u32 = 1_000_000;
+    /// The number of milliseconds per second.
+    const MILLIS_PER_SEC: u32 = 1_000;
+
+    let sec = time.num_seconds_from_midnight() * MILLIS_PER_SEC;
+    let frac = time.nanosecond();
+    let (frac, adjust) = if frac < 1_000_000_000 {
+        (frac, 0)
+    } else {
+        (frac - 1_000_000_000, MILLIS_PER_SEC)
+    };
+    (sec + adjust + frac / NANOS_PER_MILLI) as i32

Review Comment:
   There is probably a good reason, but my feeble mind can't figure it out. Why is it important to break up `frac` and `adjust`? 
   
   For example, isn't this equivalent?
   
   ```suggestion
       (sec + frac / NANOS_PER_MILLI) as i32
   ```



##########
arrow/src/compute/kernels/cast.rs:
##########
@@ -1584,6 +1625,303 @@ fn cast_string_to_date64<Offset: OffsetSizeTrait>(
     Ok(Arc::new(array) as ArrayRef)
 }
 
+fn seconds_since_midnight(time: &chrono::NaiveTime) -> i32 {
+    let sec = time.num_seconds_from_midnight();
+    let frac = time.nanosecond();
+    let adjust = if frac < 1_000_000_000 { 0 } else { 1 };
+    (sec + adjust) as i32
+}
+
+fn milliseconds_since_midnight(time: &chrono::NaiveTime) -> i32 {
+    /// The number of nanoseconds per millisecond.
+    const NANOS_PER_MILLI: u32 = 1_000_000;
+    /// The number of milliseconds per second.
+    const MILLIS_PER_SEC: u32 = 1_000;
+
+    let sec = time.num_seconds_from_midnight() * MILLIS_PER_SEC;
+    let frac = time.nanosecond();
+    let (frac, adjust) = if frac < 1_000_000_000 {
+        (frac, 0)
+    } else {
+        (frac - 1_000_000_000, MILLIS_PER_SEC)
+    };
+    (sec + adjust + frac / NANOS_PER_MILLI) as i32
+}
+
+/// Casts generic string arrays to `Time32SecondArray`
+fn cast_string_to_time32second<Offset: OffsetSizeTrait>(
+    array: &dyn Array,
+    cast_options: &CastOptions,
+) -> Result<ArrayRef> {
+    let string_array = array
+        .as_any()
+        .downcast_ref::<GenericStringArray<Offset>>()
+        .unwrap();
+
+    let array = if cast_options.safe {
+        let iter = (0..string_array.len()).map(|i| {
+            if string_array.is_null(i) {
+                None
+            } else {
+                string_array
+                    .value(i)
+                    .parse::<chrono::NaiveTime>()
+                    .map(|time| seconds_since_midnight(&time))
+                    .ok()
+            }
+        });
+
+        // Benefit:
+        //     20% performance improvement
+        // Soundness:
+        //     The iterator is trustedLen because it comes from an `StringArray`.
+        unsafe { Time32SecondArray::from_trusted_len_iter(iter) }
+    } else {
+        let vec = (0..string_array.len())
+            .map(|i| {
+                if string_array.is_null(i) {
+                    Ok(None)
+                } else {
+                    let string = string_array
+                        .value(i);
+                    chrono::Duration::days(3);

Review Comment:
   The
   ```
                       chrono::Duration::days(3);
   ```
   
   Seems like a leftover?
   



##########
arrow/src/compute/kernels/cast.rs:
##########
@@ -1584,6 +1625,303 @@ fn cast_string_to_date64<Offset: OffsetSizeTrait>(
     Ok(Arc::new(array) as ArrayRef)
 }
 
+fn seconds_since_midnight(time: &chrono::NaiveTime) -> i32 {
+    let sec = time.num_seconds_from_midnight();
+    let frac = time.nanosecond();
+    let adjust = if frac < 1_000_000_000 { 0 } else { 1 };
+    (sec + adjust) as i32
+}
+
+fn milliseconds_since_midnight(time: &chrono::NaiveTime) -> i32 {
+    /// The number of nanoseconds per millisecond.
+    const NANOS_PER_MILLI: u32 = 1_000_000;
+    /// The number of milliseconds per second.
+    const MILLIS_PER_SEC: u32 = 1_000;
+
+    let sec = time.num_seconds_from_midnight() * MILLIS_PER_SEC;
+    let frac = time.nanosecond();
+    let (frac, adjust) = if frac < 1_000_000_000 {
+        (frac, 0)
+    } else {
+        (frac - 1_000_000_000, MILLIS_PER_SEC)
+    };
+    (sec + adjust + frac / NANOS_PER_MILLI) as i32
+}
+
+/// Casts generic string arrays to `Time32SecondArray`
+fn cast_string_to_time32second<Offset: OffsetSizeTrait>(
+    array: &dyn Array,
+    cast_options: &CastOptions,
+) -> Result<ArrayRef> {
+    let string_array = array
+        .as_any()
+        .downcast_ref::<GenericStringArray<Offset>>()
+        .unwrap();
+
+    let array = if cast_options.safe {
+        let iter = (0..string_array.len()).map(|i| {
+            if string_array.is_null(i) {
+                None
+            } else {
+                string_array
+                    .value(i)
+                    .parse::<chrono::NaiveTime>()
+                    .map(|time| seconds_since_midnight(&time))
+                    .ok()
+            }
+        });

Review Comment:
   I think the same type of transformation can be applied to the iterator below this as well. 



##########
arrow/src/compute/kernels/cast.rs:
##########
@@ -1584,6 +1625,303 @@ fn cast_string_to_date64<Offset: OffsetSizeTrait>(
     Ok(Arc::new(array) as ArrayRef)
 }
 
+fn seconds_since_midnight(time: &chrono::NaiveTime) -> i32 {
+    let sec = time.num_seconds_from_midnight();
+    let frac = time.nanosecond();
+    let adjust = if frac < 1_000_000_000 { 0 } else { 1 };
+    (sec + adjust) as i32
+}
+
+fn milliseconds_since_midnight(time: &chrono::NaiveTime) -> i32 {
+    /// The number of nanoseconds per millisecond.
+    const NANOS_PER_MILLI: u32 = 1_000_000;
+    /// The number of milliseconds per second.
+    const MILLIS_PER_SEC: u32 = 1_000;
+
+    let sec = time.num_seconds_from_midnight() * MILLIS_PER_SEC;
+    let frac = time.nanosecond();
+    let (frac, adjust) = if frac < 1_000_000_000 {
+        (frac, 0)
+    } else {
+        (frac - 1_000_000_000, MILLIS_PER_SEC)
+    };
+    (sec + adjust + frac / NANOS_PER_MILLI) as i32
+}
+
+/// Casts generic string arrays to `Time32SecondArray`
+fn cast_string_to_time32second<Offset: OffsetSizeTrait>(
+    array: &dyn Array,
+    cast_options: &CastOptions,
+) -> Result<ArrayRef> {
+    let string_array = array
+        .as_any()
+        .downcast_ref::<GenericStringArray<Offset>>()
+        .unwrap();
+
+    let array = if cast_options.safe {
+        let iter = (0..string_array.len()).map(|i| {
+            if string_array.is_null(i) {
+                None
+            } else {
+                string_array
+                    .value(i)
+                    .parse::<chrono::NaiveTime>()
+                    .map(|time| seconds_since_midnight(&time))
+                    .ok()
+            }
+        });

Review Comment:
   I think this will be faster as it will not require checking bounds for calls to `is_null` or `value`:
   
   ```suggestion
           let iter = string_array
               .iter()
               .flat_map(|v| {
                   v.map(|v| {
                       v.parse::<chrono::NaiveTime>()
                           .map(|time| seconds_since_midnight(&time))
                           .ok()
                   })
               });
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] stuartcarnie commented on a diff in pull request #2251: feat: Implement string cast operations for Time32 and Time64

Posted by GitBox <gi...@apache.org>.
stuartcarnie commented on code in PR #2251:
URL: https://github.com/apache/arrow-rs/pull/2251#discussion_r934943090


##########
arrow/src/compute/kernels/cast.rs:
##########
@@ -1584,6 +1625,303 @@ fn cast_string_to_date64<Offset: OffsetSizeTrait>(
     Ok(Arc::new(array) as ArrayRef)
 }
 
+fn seconds_since_midnight(time: &chrono::NaiveTime) -> i32 {
+    let sec = time.num_seconds_from_midnight();
+    let frac = time.nanosecond();
+    let adjust = if frac < 1_000_000_000 { 0 } else { 1 };
+    (sec + adjust) as i32
+}
+
+fn milliseconds_since_midnight(time: &chrono::NaiveTime) -> i32 {
+    /// The number of nanoseconds per millisecond.
+    const NANOS_PER_MILLI: u32 = 1_000_000;
+    /// The number of milliseconds per second.
+    const MILLIS_PER_SEC: u32 = 1_000;
+
+    let sec = time.num_seconds_from_midnight() * MILLIS_PER_SEC;
+    let frac = time.nanosecond();
+    let (frac, adjust) = if frac < 1_000_000_000 {
+        (frac, 0)
+    } else {
+        (frac - 1_000_000_000, MILLIS_PER_SEC)
+    };
+    (sec + adjust + frac / NANOS_PER_MILLI) as i32
+}
+
+/// Casts generic string arrays to `Time32SecondArray`
+fn cast_string_to_time32second<Offset: OffsetSizeTrait>(
+    array: &dyn Array,
+    cast_options: &CastOptions,
+) -> Result<ArrayRef> {
+    let string_array = array
+        .as_any()
+        .downcast_ref::<GenericStringArray<Offset>>()
+        .unwrap();
+
+    let array = if cast_options.safe {
+        let iter = (0..string_array.len()).map(|i| {
+            if string_array.is_null(i) {
+                None
+            } else {
+                string_array
+                    .value(i)
+                    .parse::<chrono::NaiveTime>()
+                    .map(|time| seconds_since_midnight(&time))
+                    .ok()
+            }
+        });

Review Comment:
   Sounds good – it looks like the string to date transformation functions would benefit from the same treatment, but I'll leave that to another PR so as to not cloud this one.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] avantgardnerio commented on a diff in pull request #2251: feat: Implement string cast operations for Time32 and Time64

Posted by GitBox <gi...@apache.org>.
avantgardnerio commented on code in PR #2251:
URL: https://github.com/apache/arrow-rs/pull/2251#discussion_r934934282


##########
arrow/src/compute/kernels/cast.rs:
##########
@@ -1584,6 +1625,303 @@ fn cast_string_to_date64<Offset: OffsetSizeTrait>(
     Ok(Arc::new(array) as ArrayRef)
 }
 
+fn seconds_since_midnight(time: &chrono::NaiveTime) -> i32 {
+    let sec = time.num_seconds_from_midnight();
+    let frac = time.nanosecond();
+    let adjust = if frac < 1_000_000_000 { 0 } else { 1 };

Review Comment:
   Ah, it can read unix TZif files to get the historical data https://github.com/chronotope/chrono/blob/ab688c384f7c797466fca69d69bfd5ee3c2cba96/src/offset/local/tz_info/timezone.rs#L662
   
   https://manpages.ubuntu.com/manpages/xenial/man5/tzfile.5.html



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] avantgardnerio commented on a diff in pull request #2251: feat: Implement string cast operations for Time32 and Time64

Posted by GitBox <gi...@apache.org>.
avantgardnerio commented on code in PR #2251:
URL: https://github.com/apache/arrow-rs/pull/2251#discussion_r934921087


##########
arrow/src/compute/kernels/cast.rs:
##########
@@ -1584,6 +1625,303 @@ fn cast_string_to_date64<Offset: OffsetSizeTrait>(
     Ok(Arc::new(array) as ArrayRef)
 }
 
+fn seconds_since_midnight(time: &chrono::NaiveTime) -> i32 {
+    let sec = time.num_seconds_from_midnight();
+    let frac = time.nanosecond();
+    let adjust = if frac < 1_000_000_000 { 0 } else { 1 };

Review Comment:
   Chrono docs https://docs.rs/chrono/0.3.1/chrono/naive/time/index.html



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] stuartcarnie commented on a diff in pull request #2251: feat: Implement string cast operations for Time32 and Time64

Posted by GitBox <gi...@apache.org>.
stuartcarnie commented on code in PR #2251:
URL: https://github.com/apache/arrow-rs/pull/2251#discussion_r934943090


##########
arrow/src/compute/kernels/cast.rs:
##########
@@ -1584,6 +1625,303 @@ fn cast_string_to_date64<Offset: OffsetSizeTrait>(
     Ok(Arc::new(array) as ArrayRef)
 }
 
+fn seconds_since_midnight(time: &chrono::NaiveTime) -> i32 {
+    let sec = time.num_seconds_from_midnight();
+    let frac = time.nanosecond();
+    let adjust = if frac < 1_000_000_000 { 0 } else { 1 };
+    (sec + adjust) as i32
+}
+
+fn milliseconds_since_midnight(time: &chrono::NaiveTime) -> i32 {
+    /// The number of nanoseconds per millisecond.
+    const NANOS_PER_MILLI: u32 = 1_000_000;
+    /// The number of milliseconds per second.
+    const MILLIS_PER_SEC: u32 = 1_000;
+
+    let sec = time.num_seconds_from_midnight() * MILLIS_PER_SEC;
+    let frac = time.nanosecond();
+    let (frac, adjust) = if frac < 1_000_000_000 {
+        (frac, 0)
+    } else {
+        (frac - 1_000_000_000, MILLIS_PER_SEC)
+    };
+    (sec + adjust + frac / NANOS_PER_MILLI) as i32
+}
+
+/// Casts generic string arrays to `Time32SecondArray`
+fn cast_string_to_time32second<Offset: OffsetSizeTrait>(
+    array: &dyn Array,
+    cast_options: &CastOptions,
+) -> Result<ArrayRef> {
+    let string_array = array
+        .as_any()
+        .downcast_ref::<GenericStringArray<Offset>>()
+        .unwrap();
+
+    let array = if cast_options.safe {
+        let iter = (0..string_array.len()).map(|i| {
+            if string_array.is_null(i) {
+                None
+            } else {
+                string_array
+                    .value(i)
+                    .parse::<chrono::NaiveTime>()
+                    .map(|time| seconds_since_midnight(&time))
+                    .ok()
+            }
+        });

Review Comment:
   Sounds good – it looks like the string to date transformation functions would benefit from the same treatment, but I'll leave that to another PR so as to not cloudy this one.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] codecov-commenter commented on pull request #2251: feat: Implement string cast operations for Time32 and Time64

Posted by GitBox <gi...@apache.org>.
codecov-commenter commented on PR #2251:
URL: https://github.com/apache/arrow-rs/pull/2251#issuecomment-1201799597

   # [Codecov](https://codecov.io/gh/apache/arrow-rs/pull/2251?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) Report
   > Merging [#2251](https://codecov.io/gh/apache/arrow-rs/pull/2251?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (7340b19) into [master](https://codecov.io/gh/apache/arrow-rs/commit/3032a521c9691d4569a9d277046304bd4e4098fb?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (3032a52) will **decrease** coverage by `0.03%`.
   > The diff coverage is `73.44%`.
   
   ```diff
   @@            Coverage Diff             @@
   ##           master    #2251      +/-   ##
   ==========================================
   - Coverage   82.29%   82.26%   -0.04%     
   ==========================================
     Files         243      245       +2     
     Lines       62443    62863     +420     
   ==========================================
   + Hits        51387    51713     +326     
   - Misses      11056    11150      +94     
   ```
   
   
   | [Impacted Files](https://codecov.io/gh/apache/arrow-rs/pull/2251?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | Coverage Δ | |
   |---|---|---|
   | [arrow/src/compute/kernels/cast.rs](https://codecov.io/gh/apache/arrow-rs/pull/2251/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-YXJyb3cvc3JjL2NvbXB1dGUva2VybmVscy9jYXN0LnJz) | `94.12% <73.44%> (-1.71%)` | :arrow_down: |
   | [...row/src/array/builder/string\_dictionary\_builder.rs](https://codecov.io/gh/apache/arrow-rs/pull/2251/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-YXJyb3cvc3JjL2FycmF5L2J1aWxkZXIvc3RyaW5nX2RpY3Rpb25hcnlfYnVpbGRlci5ycw==) | `90.64% <0.00%> (-0.72%)` | :arrow_down: |
   | [parquet/src/encodings/encoding/dict\_encoder.rs](https://codecov.io/gh/apache/arrow-rs/pull/2251/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGFycXVldC9zcmMvZW5jb2RpbmdzL2VuY29kaW5nL2RpY3RfZW5jb2Rlci5ycw==) | `90.74% <0.00%> (-0.49%)` | :arrow_down: |
   | [...rquet/src/arrow/record\_reader/definition\_levels.rs](https://codecov.io/gh/apache/arrow-rs/pull/2251/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGFycXVldC9zcmMvYXJyb3cvcmVjb3JkX3JlYWRlci9kZWZpbml0aW9uX2xldmVscy5ycw==) | `88.60% <0.00%> (-0.43%)` | :arrow_down: |
   | [parquet/src/column/writer/mod.rs](https://codecov.io/gh/apache/arrow-rs/pull/2251/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGFycXVldC9zcmMvY29sdW1uL3dyaXRlci9tb2QucnM=) | `92.85% <0.00%> (-0.15%)` | :arrow_down: |
   | [arrow/src/array/mod.rs](https://codecov.io/gh/apache/arrow-rs/pull/2251/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-YXJyb3cvc3JjL2FycmF5L21vZC5ycw==) | `100.00% <0.00%> (ø)` | |
   | [parquet/src/arrow/arrow\_writer/byte\_array.rs](https://codecov.io/gh/apache/arrow-rs/pull/2251/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGFycXVldC9zcmMvYXJyb3cvYXJyb3dfd3JpdGVyL2J5dGVfYXJyYXkucnM=) | `76.72% <0.00%> (ø)` | |
   | [arrow/src/array/array\_fixed\_size\_list.rs](https://codecov.io/gh/apache/arrow-rs/pull/2251/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-YXJyb3cvc3JjL2FycmF5L2FycmF5X2ZpeGVkX3NpemVfbGlzdC5ycw==) | `92.91% <0.00%> (ø)` | |
   | [parquet/src/arrow/arrow\_writer/mod.rs](https://codecov.io/gh/apache/arrow-rs/pull/2251/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGFycXVldC9zcmMvYXJyb3cvYXJyb3dfd3JpdGVyL21vZC5ycw==) | `97.66% <0.00%> (+0.01%)` | :arrow_up: |
   | [parquet/src/arrow/arrow\_reader.rs](https://codecov.io/gh/apache/arrow-rs/pull/2251/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-cGFycXVldC9zcmMvYXJyb3cvYXJyb3dfcmVhZGVyLnJz) | `95.12% <0.00%> (+0.12%)` | :arrow_up: |
   | ... and [10 more](https://codecov.io/gh/apache/arrow-rs/pull/2251/diff?src=pr&el=tree-more&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | |
   
   Help us with your feedback. Take ten seconds to tell us [how you rate us](https://about.codecov.io/nps?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] stuartcarnie commented on a diff in pull request #2251: feat: Implement string cast operations for Time32 and Time64

Posted by GitBox <gi...@apache.org>.
stuartcarnie commented on code in PR #2251:
URL: https://github.com/apache/arrow-rs/pull/2251#discussion_r934943239


##########
arrow/src/compute/kernels/cast.rs:
##########
@@ -1584,6 +1625,303 @@ fn cast_string_to_date64<Offset: OffsetSizeTrait>(
     Ok(Arc::new(array) as ArrayRef)
 }
 
+fn seconds_since_midnight(time: &chrono::NaiveTime) -> i32 {
+    let sec = time.num_seconds_from_midnight();
+    let frac = time.nanosecond();
+    let adjust = if frac < 1_000_000_000 { 0 } else { 1 };
+    (sec + adjust) as i32
+}
+
+fn milliseconds_since_midnight(time: &chrono::NaiveTime) -> i32 {
+    /// The number of nanoseconds per millisecond.
+    const NANOS_PER_MILLI: u32 = 1_000_000;
+    /// The number of milliseconds per second.
+    const MILLIS_PER_SEC: u32 = 1_000;
+
+    let sec = time.num_seconds_from_midnight() * MILLIS_PER_SEC;
+    let frac = time.nanosecond();
+    let (frac, adjust) = if frac < 1_000_000_000 {
+        (frac, 0)
+    } else {
+        (frac - 1_000_000_000, MILLIS_PER_SEC)
+    };
+    (sec + adjust + frac / NANOS_PER_MILLI) as i32
+}
+
+/// Casts generic string arrays to `Time32SecondArray`
+fn cast_string_to_time32second<Offset: OffsetSizeTrait>(
+    array: &dyn Array,
+    cast_options: &CastOptions,
+) -> Result<ArrayRef> {
+    let string_array = array
+        .as_any()
+        .downcast_ref::<GenericStringArray<Offset>>()
+        .unwrap();
+
+    let array = if cast_options.safe {
+        let iter = (0..string_array.len()).map(|i| {
+            if string_array.is_null(i) {
+                None
+            } else {
+                string_array
+                    .value(i)
+                    .parse::<chrono::NaiveTime>()
+                    .map(|time| seconds_since_midnight(&time))
+                    .ok()
+            }
+        });
+
+        // Benefit:
+        //     20% performance improvement
+        // Soundness:
+        //     The iterator is trustedLen because it comes from an `StringArray`.
+        unsafe { Time32SecondArray::from_trusted_len_iter(iter) }
+    } else {
+        let vec = (0..string_array.len())
+            .map(|i| {
+                if string_array.is_null(i) {
+                    Ok(None)
+                } else {
+                    let string = string_array
+                        .value(i);
+                    chrono::Duration::days(3);

Review Comment:
   Indeed – it is gone now, thanks for the spot



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] stuartcarnie commented on a diff in pull request #2251: feat: Implement string cast operations for Time32 and Time64

Posted by GitBox <gi...@apache.org>.
stuartcarnie commented on code in PR #2251:
URL: https://github.com/apache/arrow-rs/pull/2251#discussion_r934957161


##########
arrow/src/compute/kernels/cast.rs:
##########
@@ -1584,6 +1625,303 @@ fn cast_string_to_date64<Offset: OffsetSizeTrait>(
     Ok(Arc::new(array) as ArrayRef)
 }
 
+fn seconds_since_midnight(time: &chrono::NaiveTime) -> i32 {
+    let sec = time.num_seconds_from_midnight();
+    let frac = time.nanosecond();
+    let adjust = if frac < 1_000_000_000 { 0 } else { 1 };
+    (sec + adjust) as i32
+}
+
+fn milliseconds_since_midnight(time: &chrono::NaiveTime) -> i32 {
+    /// The number of nanoseconds per millisecond.
+    const NANOS_PER_MILLI: u32 = 1_000_000;
+    /// The number of milliseconds per second.
+    const MILLIS_PER_SEC: u32 = 1_000;
+
+    let sec = time.num_seconds_from_midnight() * MILLIS_PER_SEC;
+    let frac = time.nanosecond();
+    let (frac, adjust) = if frac < 1_000_000_000 {
+        (frac, 0)
+    } else {
+        (frac - 1_000_000_000, MILLIS_PER_SEC)
+    };
+    (sec + adjust + frac / NANOS_PER_MILLI) as i32
+}
+
+/// Casts generic string arrays to `Time32SecondArray`
+fn cast_string_to_time32second<Offset: OffsetSizeTrait>(
+    array: &dyn Array,
+    cast_options: &CastOptions,
+) -> Result<ArrayRef> {
+    let string_array = array
+        .as_any()
+        .downcast_ref::<GenericStringArray<Offset>>()
+        .unwrap();
+
+    let array = if cast_options.safe {
+        let iter = (0..string_array.len()).map(|i| {
+            if string_array.is_null(i) {
+                None
+            } else {
+                string_array
+                    .value(i)
+                    .parse::<chrono::NaiveTime>()
+                    .map(|time| seconds_since_midnight(&time))
+                    .ok()
+            }
+        });

Review Comment:
   I tried, but unfortunately it fails at runtime with a panic:
   
   ```
   thread 'compute::kernels::cast::tests::test_cast_string_to_time32second' panicked at 'trusted_len_unzip requires an upper limit', arrow/src/array/array_primitive.rs:470:25
   ```
   
   at
   
   https://github.com/apache/arrow-rs/blob/91b2d7e66b066d54035774625b29ff226c755172/arrow/src/array/array_primitive.rs#L469-L470
   
   The previous `iter` was ultimately of type `Range<usize>`, which returns `Some(_)` for the `size_hint`:
   
   https://github.com/rust-lang/rust/blob/fe3342816a282949f014caa05ea2e669ff9d3d3c/library/core/src/iter/range.rs#L714-L722
   
   whereas our new `iter` does not, as it does not know the size. 
   
   There are probably good reasons, but it is unfortunate that `size_hint` was not a separate trait, so this could be caught at compile time.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] alamb commented on a diff in pull request #2251: feat: Implement string cast operations for Time32 and Time64

Posted by GitBox <gi...@apache.org>.
alamb commented on code in PR #2251:
URL: https://github.com/apache/arrow-rs/pull/2251#discussion_r935569173


##########
arrow/src/compute/kernels/cast.rs:
##########
@@ -1584,6 +1625,303 @@ fn cast_string_to_date64<Offset: OffsetSizeTrait>(
     Ok(Arc::new(array) as ArrayRef)
 }
 
+fn seconds_since_midnight(time: &chrono::NaiveTime) -> i32 {
+    let sec = time.num_seconds_from_midnight();
+    let frac = time.nanosecond();
+    let adjust = if frac < 1_000_000_000 { 0 } else { 1 };
+    (sec + adjust) as i32
+}
+
+fn milliseconds_since_midnight(time: &chrono::NaiveTime) -> i32 {
+    /// The number of nanoseconds per millisecond.
+    const NANOS_PER_MILLI: u32 = 1_000_000;
+    /// The number of milliseconds per second.
+    const MILLIS_PER_SEC: u32 = 1_000;
+
+    let sec = time.num_seconds_from_midnight() * MILLIS_PER_SEC;
+    let frac = time.nanosecond();
+    let (frac, adjust) = if frac < 1_000_000_000 {
+        (frac, 0)
+    } else {
+        (frac - 1_000_000_000, MILLIS_PER_SEC)
+    };
+    (sec + adjust + frac / NANOS_PER_MILLI) as i32
+}
+
+/// Casts generic string arrays to `Time32SecondArray`
+fn cast_string_to_time32second<Offset: OffsetSizeTrait>(
+    array: &dyn Array,
+    cast_options: &CastOptions,
+) -> Result<ArrayRef> {
+    let string_array = array
+        .as_any()
+        .downcast_ref::<GenericStringArray<Offset>>()
+        .unwrap();
+
+    let array = if cast_options.safe {
+        let iter = (0..string_array.len()).map(|i| {
+            if string_array.is_null(i) {
+                None
+            } else {
+                string_array
+                    .value(i)
+                    .parse::<chrono::NaiveTime>()
+                    .map(|time| seconds_since_midnight(&time))
+                    .ok()
+            }
+        });

Review Comment:
   I found a way to do this cleanly -- in https://github.com/apache/arrow-rs/pull/2284



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] stuartcarnie commented on a diff in pull request #2251: feat: Implement string cast operations for Time32 and Time64

Posted by GitBox <gi...@apache.org>.
stuartcarnie commented on code in PR #2251:
URL: https://github.com/apache/arrow-rs/pull/2251#discussion_r934957161


##########
arrow/src/compute/kernels/cast.rs:
##########
@@ -1584,6 +1625,303 @@ fn cast_string_to_date64<Offset: OffsetSizeTrait>(
     Ok(Arc::new(array) as ArrayRef)
 }
 
+fn seconds_since_midnight(time: &chrono::NaiveTime) -> i32 {
+    let sec = time.num_seconds_from_midnight();
+    let frac = time.nanosecond();
+    let adjust = if frac < 1_000_000_000 { 0 } else { 1 };
+    (sec + adjust) as i32
+}
+
+fn milliseconds_since_midnight(time: &chrono::NaiveTime) -> i32 {
+    /// The number of nanoseconds per millisecond.
+    const NANOS_PER_MILLI: u32 = 1_000_000;
+    /// The number of milliseconds per second.
+    const MILLIS_PER_SEC: u32 = 1_000;
+
+    let sec = time.num_seconds_from_midnight() * MILLIS_PER_SEC;
+    let frac = time.nanosecond();
+    let (frac, adjust) = if frac < 1_000_000_000 {
+        (frac, 0)
+    } else {
+        (frac - 1_000_000_000, MILLIS_PER_SEC)
+    };
+    (sec + adjust + frac / NANOS_PER_MILLI) as i32
+}
+
+/// Casts generic string arrays to `Time32SecondArray`
+fn cast_string_to_time32second<Offset: OffsetSizeTrait>(
+    array: &dyn Array,
+    cast_options: &CastOptions,
+) -> Result<ArrayRef> {
+    let string_array = array
+        .as_any()
+        .downcast_ref::<GenericStringArray<Offset>>()
+        .unwrap();
+
+    let array = if cast_options.safe {
+        let iter = (0..string_array.len()).map(|i| {
+            if string_array.is_null(i) {
+                None
+            } else {
+                string_array
+                    .value(i)
+                    .parse::<chrono::NaiveTime>()
+                    .map(|time| seconds_since_midnight(&time))
+                    .ok()
+            }
+        });

Review Comment:
   I tried, but unfortunately it fails at runtime with a panic:
   
   ```
   thread 'compute::kernels::cast::tests::test_cast_string_to_time32second' panicked at 'trusted_len_unzip requires an upper limit', arrow/src/array/array_primitive.rs:470:25
   ```
   
   at
   
   https://github.com/apache/arrow-rs/blob/91b2d7e66b066d54035774625b29ff226c755172/arrow/src/array/array_primitive.rs#L469-L470
   
   The previous `iter` was ultimately of type `Range<usize>`, which returns `Some(_)` for the `size_hint`:
   
   https://github.com/rust-lang/rust/blob/fe3342816a282949f014caa05ea2e669ff9d3d3c/library/core/src/iter/range.rs#L714-L722
   
   whereas our new `iter` does not. 
   
   There are probably good reasons, but it is unfortunate that `size_hint` was not a separate trait, so this could be caught at compile time.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] avantgardnerio commented on a diff in pull request #2251: feat: Implement string cast operations for Time32 and Time64

Posted by GitBox <gi...@apache.org>.
avantgardnerio commented on code in PR #2251:
URL: https://github.com/apache/arrow-rs/pull/2251#discussion_r934927554


##########
arrow/src/compute/kernels/cast.rs:
##########
@@ -1584,6 +1625,303 @@ fn cast_string_to_date64<Offset: OffsetSizeTrait>(
     Ok(Arc::new(array) as ArrayRef)
 }
 
+fn seconds_since_midnight(time: &chrono::NaiveTime) -> i32 {
+    let sec = time.num_seconds_from_midnight();
+    let frac = time.nanosecond();
+    let adjust = if frac < 1_000_000_000 { 0 } else { 1 };

Review Comment:
   Looks like they can model them, but don't keep a database of them? https://github.com/chronotope/chrono/blob/ab688c384f7c797466fca69d69bfd5ee3c2cba96/src/naive/time/mod.rs#L480



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] alamb commented on a diff in pull request #2251: feat: Implement string cast operations for Time32 and Time64

Posted by GitBox <gi...@apache.org>.
alamb commented on code in PR #2251:
URL: https://github.com/apache/arrow-rs/pull/2251#discussion_r935481593


##########
arrow/src/compute/kernels/cast.rs:
##########
@@ -1584,6 +1625,303 @@ fn cast_string_to_date64<Offset: OffsetSizeTrait>(
     Ok(Arc::new(array) as ArrayRef)
 }
 
+fn seconds_since_midnight(time: &chrono::NaiveTime) -> i32 {
+    let sec = time.num_seconds_from_midnight();
+    let frac = time.nanosecond();
+    let adjust = if frac < 1_000_000_000 { 0 } else { 1 };
+    (sec + adjust) as i32
+}
+
+fn milliseconds_since_midnight(time: &chrono::NaiveTime) -> i32 {
+    /// The number of nanoseconds per millisecond.
+    const NANOS_PER_MILLI: u32 = 1_000_000;
+    /// The number of milliseconds per second.
+    const MILLIS_PER_SEC: u32 = 1_000;
+
+    let sec = time.num_seconds_from_midnight() * MILLIS_PER_SEC;
+    let frac = time.nanosecond();
+    let (frac, adjust) = if frac < 1_000_000_000 {
+        (frac, 0)
+    } else {
+        (frac - 1_000_000_000, MILLIS_PER_SEC)
+    };
+    (sec + adjust + frac / NANOS_PER_MILLI) as i32
+}
+
+/// Casts generic string arrays to `Time32SecondArray`
+fn cast_string_to_time32second<Offset: OffsetSizeTrait>(
+    array: &dyn Array,
+    cast_options: &CastOptions,
+) -> Result<ArrayRef> {
+    let string_array = array
+        .as_any()
+        .downcast_ref::<GenericStringArray<Offset>>()
+        .unwrap();
+
+    let array = if cast_options.safe {
+        let iter = (0..string_array.len()).map(|i| {
+            if string_array.is_null(i) {
+                None
+            } else {
+                string_array
+                    .value(i)
+                    .parse::<chrono::NaiveTime>()
+                    .map(|time| seconds_since_midnight(&time))
+                    .ok()
+            }
+        });

Review Comment:
   Thanks for checking @stuartcarnie  -- I wonder if flat_map is the problem -- let me take another crack at this



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] alamb commented on a diff in pull request #2251: feat: Implement string cast operations for Time32 and Time64

Posted by GitBox <gi...@apache.org>.
alamb commented on code in PR #2251:
URL: https://github.com/apache/arrow-rs/pull/2251#discussion_r935482492


##########
arrow/src/compute/kernels/cast.rs:
##########
@@ -2854,6 +3172,102 @@ mod tests {
         }
     }
 
+    #[test]
+    fn test_cast_string_to_time32second() {
+        let a1 = Arc::new(StringArray::from(vec![
+            Some("08:08:35.091323414"),
+            Some("08:08:60.091323414"), // leap second
+            Some("08:08:61.091323414"), // not valid
+            Some("Not a valid time"),
+            None,
+        ])) as ArrayRef;
+        let a2 = Arc::new(LargeStringArray::from(vec![
+            Some("08:08:35.091323414"),
+            Some("08:08:60.091323414"), // leap second
+            Some("08:08:61.091323414"), // not valid
+            Some("Not a valid time"),
+            None,
+        ])) as ArrayRef;
+        for array in &[a1, a2] {
+            let b = cast(array, &DataType::Time32(TimeUnit::Second)).unwrap();
+            let c = b.as_any().downcast_ref::<Time32SecondArray>().unwrap();
+            assert_eq!(29315, c.value(0));
+            assert_eq!(29340, c.value(1));
+            assert!(c.is_null(2));
+            assert!(c.is_null(3));
+            assert!(c.is_null(4));
+        }
+    }
+
+    #[test]
+    fn test_cast_string_to_time32millisecond() {
+        let a1 = Arc::new(StringArray::from(vec![
+            Some("08:08:35.091323414"),
+            Some("08:08:60.091323414"), // leap second
+            Some("08:08:61.091323414"), // not valid
+            Some("Not a valid time"),
+            None,
+        ])) as ArrayRef;
+        let a2 = Arc::new(LargeStringArray::from(vec![
+            Some("08:08:35.091323414"),
+            Some("08:08:60.091323414"), // leap second

Review Comment:
   💯  for the leap second



##########
arrow/src/compute/kernels/cast.rs:
##########
@@ -2854,6 +3172,102 @@ mod tests {
         }
     }
 
+    #[test]
+    fn test_cast_string_to_time32second() {

Review Comment:
   👍  these are great tests.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] stuartcarnie commented on a diff in pull request #2251: feat: Implement string cast operations for Time32 and Time64

Posted by GitBox <gi...@apache.org>.
stuartcarnie commented on code in PR #2251:
URL: https://github.com/apache/arrow-rs/pull/2251#discussion_r934942068


##########
arrow/src/compute/kernels/cast.rs:
##########
@@ -1584,6 +1625,303 @@ fn cast_string_to_date64<Offset: OffsetSizeTrait>(
     Ok(Arc::new(array) as ArrayRef)
 }
 
+fn seconds_since_midnight(time: &chrono::NaiveTime) -> i32 {
+    let sec = time.num_seconds_from_midnight();
+    let frac = time.nanosecond();
+    let adjust = if frac < 1_000_000_000 { 0 } else { 1 };
+    (sec + adjust) as i32
+}
+
+fn milliseconds_since_midnight(time: &chrono::NaiveTime) -> i32 {
+    /// The number of nanoseconds per millisecond.
+    const NANOS_PER_MILLI: u32 = 1_000_000;
+    /// The number of milliseconds per second.
+    const MILLIS_PER_SEC: u32 = 1_000;
+
+    let sec = time.num_seconds_from_midnight() * MILLIS_PER_SEC;
+    let frac = time.nanosecond();
+    let (frac, adjust) = if frac < 1_000_000_000 {
+        (frac, 0)
+    } else {
+        (frac - 1_000_000_000, MILLIS_PER_SEC)
+    };
+    (sec + adjust + frac / NANOS_PER_MILLI) as i32

Review Comment:
   Yes indeed, I concluded the same thing and pushed up the changes, thanks! I will move the functions back inline as I originally had them, before I overcomplicated it 😂  



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] avantgardnerio commented on a diff in pull request #2251: feat: Implement string cast operations for Time32 and Time64

Posted by GitBox <gi...@apache.org>.
avantgardnerio commented on code in PR #2251:
URL: https://github.com/apache/arrow-rs/pull/2251#discussion_r934918636


##########
arrow/src/compute/kernels/cast.rs:
##########
@@ -1584,6 +1625,303 @@ fn cast_string_to_date64<Offset: OffsetSizeTrait>(
     Ok(Arc::new(array) as ArrayRef)
 }
 
+fn seconds_since_midnight(time: &chrono::NaiveTime) -> i32 {
+    let sec = time.num_seconds_from_midnight();
+    let frac = time.nanosecond();
+    let adjust = if frac < 1_000_000_000 { 0 } else { 1 };

Review Comment:
   I'm a bit confused about this myself. It was my understanding that these were added manually by timekeepers? Does chrono keep a list of historical leap seconds? https://en.wikipedia.org/wiki/Leap_second#:~:text=Between%201972%20and%202020%2C%20a,every%2021%20months%2C%20on%20average.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] stuartcarnie commented on a diff in pull request #2251: feat: Implement string cast operations for Time32 and Time64

Posted by GitBox <gi...@apache.org>.
stuartcarnie commented on code in PR #2251:
URL: https://github.com/apache/arrow-rs/pull/2251#discussion_r934940959


##########
arrow/src/compute/kernels/cast.rs:
##########
@@ -1584,6 +1625,303 @@ fn cast_string_to_date64<Offset: OffsetSizeTrait>(
     Ok(Arc::new(array) as ArrayRef)
 }
 
+fn seconds_since_midnight(time: &chrono::NaiveTime) -> i32 {
+    let sec = time.num_seconds_from_midnight();
+    let frac = time.nanosecond();
+    let adjust = if frac < 1_000_000_000 { 0 } else { 1 };

Review Comment:
   It is possible to parse a time with a leap second, using the 60th second:
   
   https://docs.rs/chrono/0.3.1/chrono/naive/time/index.html#reading-and-writing-leap-seconds
   
   The leap second is stored as a fractional second of 1_000_000_000 nanoseconds, but I realise now that I don't need any logic to handle it. The code is much simpler 😂



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-rs] alamb commented on a diff in pull request #2251: feat: Implement string cast operations for Time32 and Time64

Posted by GitBox <gi...@apache.org>.
alamb commented on code in PR #2251:
URL: https://github.com/apache/arrow-rs/pull/2251#discussion_r935500422


##########
arrow/src/compute/kernels/cast.rs:
##########
@@ -2854,6 +3172,102 @@ mod tests {
         }
     }
 
+    #[test]
+    fn test_cast_string_to_time32second() {
+        let a1 = Arc::new(StringArray::from(vec![
+            Some("08:08:35.091323414"),
+            Some("08:08:60.091323414"), // leap second
+            Some("08:08:61.091323414"), // not valid
+            Some("Not a valid time"),
+            None,
+        ])) as ArrayRef;
+        let a2 = Arc::new(LargeStringArray::from(vec![
+            Some("08:08:35.091323414"),
+            Some("08:08:60.091323414"), // leap second
+            Some("08:08:61.091323414"), // not valid
+            Some("Not a valid time"),
+            None,
+        ])) as ArrayRef;
+        for array in &[a1, a2] {
+            let b = cast(array, &DataType::Time32(TimeUnit::Second)).unwrap();
+            let c = b.as_any().downcast_ref::<Time32SecondArray>().unwrap();
+            assert_eq!(29315, c.value(0));
+            assert_eq!(29340, c.value(1));
+            assert!(c.is_null(2));
+            assert!(c.is_null(3));
+            assert!(c.is_null(4));
+        }
+    }
+
+    #[test]
+    fn test_cast_string_to_time32millisecond() {
+        let a1 = Arc::new(StringArray::from(vec![
+            Some("08:08:35.091323414"),
+            Some("08:08:60.091323414"), // leap second
+            Some("08:08:61.091323414"), // not valid
+            Some("Not a valid time"),
+            None,
+        ])) as ArrayRef;
+        let a2 = Arc::new(LargeStringArray::from(vec![
+            Some("08:08:35.091323414"),
+            Some("08:08:60.091323414"), // leap second
+            Some("08:08:61.091323414"), // not valid
+            Some("Not a valid time"),
+            None,
+        ])) as ArrayRef;
+        for array in &[a1, a2] {
+            let b = cast(array, &DataType::Time32(TimeUnit::Millisecond)).unwrap();
+            let c = b.as_any().downcast_ref::<Time32MillisecondArray>().unwrap();
+            assert_eq!(29315091, c.value(0));
+            assert_eq!(29340091, c.value(1));
+            assert!(c.is_null(2));
+            assert!(c.is_null(3));
+            assert!(c.is_null(4));
+        }

Review Comment:
   I didn't see any tests for the fallable path (aka that throws an error). I want to have another crack at using the iterators so I'll add those tests in a follow up



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org