You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/07/29 19:33:15 UTC

[GitHub] [arrow-datafusion] alamb opened a new pull request, #2984: [Minor] Extract interval parsing logic, add unit tests

alamb opened a new pull request, #2984:
URL: https://github.com/apache/arrow-datafusion/pull/2984

   # Which issue does this PR close?
   
   N/A
   
    # Rationale for this change
   
   Something I noticed while working on https://github.com/apache/arrow-datafusion/pull/2981/files was that the parsing of strings like `3 days` to interval constants was in the sql parser logic directly which made it somewhat hard to follow
   
   Since https://github.com/sqlparser-rs/sqlparser-rs/pull/517  added support for arbitrary `Expr`s as interval arguments (rather than just strings), I thought refactoring this code into a function would be a step towards supporting generic arguments
   
   It also made it it easier to write tests
   
   I was thinking eventually we could use some of the code from @avantgardnerio from https://github.com/apache/arrow-rs/pull/2031  in Arrow but before I did that  I wanted to make sure the existing code had reasonable test coverage (to make sure we didn't break things).
   
   # What changes are included in this PR?
   1. Pull interval parsing logic into its own module and function. I tried to not make any logical changes
   2. Add some (very) basic tests
   3. Add functions to easily create `ScalarValue::Interval*`
   
   # Are there any user-facing changes?
   <!--
   If there are user-facing changes then we may require documentation to be updated before approving the PR.
   -->
   
   <!--
   If there are any breaking changes to public APIs, please add the `api change` label.
   -->


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] codecov-commenter commented on pull request #2984: [Minor] Extract interval parsing logic, add unit tests

Posted by GitBox <gi...@apache.org>.
codecov-commenter commented on PR #2984:
URL: https://github.com/apache/arrow-datafusion/pull/2984#issuecomment-1200147953

   # [Codecov](https://codecov.io/gh/apache/arrow-datafusion/pull/2984?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) Report
   > Merging [#2984](https://codecov.io/gh/apache/arrow-datafusion/pull/2984?src=pr&el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (46851dc) into [master](https://codecov.io/gh/apache/arrow-datafusion/commit/2d23860df9e449e5ec26307d82d8a30b7e133bb6?el=desc&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) (2d23860) will **increase** coverage by `0.05%`.
   > The diff coverage is `86.74%`.
   
   ```diff
   @@            Coverage Diff             @@
   ##           master    #2984      +/-   ##
   ==========================================
   + Coverage   85.76%   85.81%   +0.05%     
   ==========================================
     Files         281      282       +1     
     Lines       51515    51531      +16     
   ==========================================
   + Hits        44182    44222      +40     
   + Misses       7333     7309      -24     
   ```
   
   
   | [Impacted Files](https://codecov.io/gh/apache/arrow-datafusion/pull/2984?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation) | Coverage Δ | |
   |---|---|---|
   | [datafusion/common/src/scalar.rs](https://codecov.io/gh/apache/arrow-datafusion/pull/2984/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-ZGF0YWZ1c2lvbi9jb21tb24vc3JjL3NjYWxhci5ycw==) | `84.88% <66.66%> (+0.02%)` | :arrow_up: |
   | [datafusion/sql/src/interval.rs](https://codecov.io/gh/apache/arrow-datafusion/pull/2984/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-ZGF0YWZ1c2lvbi9zcWwvc3JjL2ludGVydmFsLnJz) | `88.57% <88.57%> (ø)` | |
   | [datafusion/sql/src/planner.rs](https://codecov.io/gh/apache/arrow-datafusion/pull/2984/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-ZGF0YWZ1c2lvbi9zcWwvc3JjL3BsYW5uZXIucnM=) | `81.88% <100.00%> (+0.72%)` | :arrow_up: |
   | [datafusion/expr/src/logical\_plan/plan.rs](https://codecov.io/gh/apache/arrow-datafusion/pull/2984/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-ZGF0YWZ1c2lvbi9leHByL3NyYy9sb2dpY2FsX3BsYW4vcGxhbi5ycw==) | `77.60% <0.00%> (-0.18%)` | :arrow_down: |
   | [datafusion/core/src/physical\_plan/metrics/value.rs](https://codecov.io/gh/apache/arrow-datafusion/pull/2984/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-ZGF0YWZ1c2lvbi9jb3JlL3NyYy9waHlzaWNhbF9wbGFuL21ldHJpY3MvdmFsdWUucnM=) | `87.43% <0.00%> (+0.50%)` | :arrow_up: |
   | [datafusion/core/tests/parquet\_pruning.rs](https://codecov.io/gh/apache/arrow-datafusion/pull/2984/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation#diff-ZGF0YWZ1c2lvbi9jb3JlL3Rlc3RzL3BhcnF1ZXRfcHJ1bmluZy5ycw==) | `99.43% <0.00%> (+1.17%)` | :arrow_up: |
   
   Help us with your feedback. Take ten seconds to tell us [how you rate us](https://about.codecov.io/nps?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=The+Apache+Software+Foundation).
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] avantgardnerio commented on a diff in pull request #2984: [Minor] Extract interval parsing logic, add unit tests

Posted by GitBox <gi...@apache.org>.
avantgardnerio commented on code in PR #2984:
URL: https://github.com/apache/arrow-datafusion/pull/2984#discussion_r933611556


##########
datafusion/sql/src/interval.rs:
##########
@@ -0,0 +1,214 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+//! Interval parsing logic
+use datafusion_common::{DataFusionError, Result, ScalarValue};
+use std::str::FromStr;
+
+const SECONDS_PER_HOUR: f32 = 3_600_f32;
+const MILLIS_PER_SECOND: f32 = 1_000_f32;
+
+/// Parses a string with an interval like `'0.5 MONTH'` to an
+/// appropriately typed [`ScalarValue`]
+pub(crate) fn parse_interval(leading_field: &str, value: &str) -> Result<ScalarValue> {
+    // We are storing parts as integers, it's why we need to align parts fractional
+    // INTERVAL '0.5 MONTH' = 15 days, INTERVAL '1.5 MONTH' = 1 month 15 days
+    // INTERVAL '0.5 DAY' = 12 hours, INTERVAL '1.5 DAY' = 1 day 12 hours
+    let align_interval_parts =
+        |month_part: f32, mut day_part: f32, mut milles_part: f32| -> (i32, i32, f32) {
+            // Convert fractional month to days, It's not supported by Arrow types, but anyway
+            day_part += (month_part - (month_part as i32) as f32) * 30_f32;
+
+            // Convert fractional days to hours
+            milles_part += (day_part - ((day_part as i32) as f32))
+                * 24_f32
+                * SECONDS_PER_HOUR
+                * MILLIS_PER_SECOND;
+
+            (month_part as i32, day_part as i32, milles_part)
+        };
+
+    let calculate_from_part =
+        |interval_period_str: &str, interval_type: &str| -> Result<(i32, i32, f32)> {
+            // @todo It's better to use Decimal in order to protect rounding errors
+            // Wait https://github.com/apache/arrow/pull/9232
+            let interval_period = match f32::from_str(interval_period_str) {
+                Ok(n) => n,
+                Err(_) => {
+                    return Err(DataFusionError::NotImplemented(format!(
+                        "Unsupported Interval Expression with value {:?}",
+                        value
+                    )));
+                }
+            };
+
+            if interval_period > (i32::MAX as f32) {
+                return Err(DataFusionError::NotImplemented(format!(
+                    "Interval field value out of range: {:?}",
+                    value
+                )));
+            }
+
+            match interval_type.to_lowercase().as_str() {
+                "year" => Ok(align_interval_parts(interval_period * 12_f32, 0.0, 0.0)),
+                "month" => Ok(align_interval_parts(interval_period, 0.0, 0.0)),
+                "day" | "days" => Ok(align_interval_parts(0.0, interval_period, 0.0)),
+                "hour" | "hours" => {
+                    Ok((0, 0, interval_period * SECONDS_PER_HOUR * MILLIS_PER_SECOND))
+                }
+                "minutes" | "minute" => {
+                    Ok((0, 0, interval_period * 60_f32 * MILLIS_PER_SECOND))
+                }
+                "seconds" | "second" => Ok((0, 0, interval_period * MILLIS_PER_SECOND)),
+                "milliseconds" | "millisecond" => Ok((0, 0, interval_period)),
+                _ => Err(DataFusionError::NotImplemented(format!(
+                    "Invalid input syntax for type interval: {:?}",
+                    value
+                ))),
+            }
+        };
+
+    let mut result_month: i64 = 0;
+    let mut result_days: i64 = 0;
+    let mut result_millis: i64 = 0;
+
+    let mut parts = value.split_whitespace();
+
+    loop {
+        let interval_period_str = parts.next();
+        if interval_period_str.is_none() {
+            break;
+        }
+
+        let unit = parts.next().unwrap_or(leading_field);
+
+        let (diff_month, diff_days, diff_millis) =
+            calculate_from_part(interval_period_str.unwrap(), unit)?;
+
+        result_month += diff_month as i64;
+
+        if result_month > (i32::MAX as i64) {
+            return Err(DataFusionError::NotImplemented(format!(
+                "Interval field value out of range: {:?}",
+                value
+            )));
+        }
+
+        result_days += diff_days as i64;
+
+        if result_days > (i32::MAX as i64) {
+            return Err(DataFusionError::NotImplemented(format!(
+                "Interval field value out of range: {:?}",
+                value
+            )));
+        }
+
+        result_millis += diff_millis as i64;
+
+        if result_millis > (i32::MAX as i64) {
+            return Err(DataFusionError::NotImplemented(format!(
+                "Interval field value out of range: {:?}",
+                value
+            )));
+        }
+    }
+
+    // Interval is tricky thing
+    // 1 day is not 24 hours because timezones, 1 year != 365/364! 30 days != 1 month
+    // The true way to store and calculate intervals is to store it as it defined
+    // It's why we there are 3 different interval types in Arrow
+    if result_month != 0 && (result_days != 0 || result_millis != 0) {
+        let result: i128 = ((result_month as i128) << 96)
+                | ((result_days as i128) << 64)
+                // IntervalMonthDayNano uses nanos, but IntervalDayTime uses milles
+                | ((result_millis * 1_000_000_i64) as i128);
+
+        return Ok(ScalarValue::IntervalMonthDayNano(Some(result)));
+    }
+
+    // Month interval
+    if result_month != 0 {
+        return Ok(ScalarValue::IntervalYearMonth(Some(result_month as i32)));
+    }
+
+    let result: i64 = (result_days << 32) | result_millis;
+    Ok(ScalarValue::IntervalDayTime(Some(result)))
+}
+
+#[cfg(test)]
+mod test {
+    use crate::assert_contains;
+
+    use super::*;
+
+    #[test]
+    fn test_parse_ym() {
+        assert_eq!(
+            parse_interval("months", "1 month").unwrap(),
+            ScalarValue::new_interval_ym(0, 1)
+        );
+
+        assert_eq!(
+            parse_interval("months", "2 month").unwrap(),
+            ScalarValue::new_interval_ym(0, 2)
+        );
+
+        assert_eq!(
+            parse_interval("months", "3 year 1 month").unwrap(),
+            ScalarValue::new_interval_ym(3, 1)
+        );
+
+        assert_contains!(
+            parse_interval("months", "1 years 1 month")
+                .unwrap_err()
+                .to_string(),
+            r#"Invalid input syntax for type interval: "1 years 1 month""#
+        );
+    }
+
+    #[test]
+    fn test_dt() {
+        assert_eq!(
+            parse_interval("months", "5 days").unwrap(),
+            ScalarValue::new_interval_dt(5, 0)
+        );
+
+        assert_eq!(
+            parse_interval("months", "7 days 3 hours").unwrap(),
+            ScalarValue::new_interval_dt(
+                7,
+                (3.0 * SECONDS_PER_HOUR * MILLIS_PER_SECOND) as i32
+            )
+        );
+
+        assert_eq!(
+            parse_interval("months", "7 days 5 minutes").unwrap(),
+            ScalarValue::new_interval_dt(7, (5.0 * 60.0 * MILLIS_PER_SECOND) as i32)
+        );
+    }
+
+    #[test]
+    // TODO file a ticket to track

Review Comment:
   Relevant conversation:
   https://github.com/apache/arrow-datafusion/pull/2797/files#r910183035
   https://github.com/apache/arrow-datafusion/pull/2797/files#r910186338



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] alamb commented on a diff in pull request #2984: [Minor] Extract interval parsing logic, add unit tests

Posted by GitBox <gi...@apache.org>.
alamb commented on code in PR #2984:
URL: https://github.com/apache/arrow-datafusion/pull/2984#discussion_r933566263


##########
datafusion/sql/src/planner.rs:
##########
@@ -2216,146 +2217,12 @@ impl<'a, S: ContextProvider> SqlToRel<'a, S> {
             }
         };
 
-        const SECONDS_PER_HOUR: f32 = 3_600_f32;
-        const MILLIS_PER_SECOND: f32 = 1_000_f32;
-
-        // We are storing parts as integers, it's why we need to align parts fractional
-        // INTERVAL '0.5 MONTH' = 15 days, INTERVAL '1.5 MONTH' = 1 month 15 days
-        // INTERVAL '0.5 DAY' = 12 hours, INTERVAL '1.5 DAY' = 1 day 12 hours
-        let align_interval_parts = |month_part: f32,
-                                    mut day_part: f32,
-                                    mut milles_part: f32|
-         -> (i32, i32, f32) {
-            // Convert fractional month to days, It's not supported by Arrow types, but anyway
-            day_part += (month_part - (month_part as i32) as f32) * 30_f32;
-
-            // Convert fractional days to hours
-            milles_part += (day_part - ((day_part as i32) as f32))
-                * 24_f32
-                * SECONDS_PER_HOUR
-                * MILLIS_PER_SECOND;
-
-            (month_part as i32, day_part as i32, milles_part)
-        };
-
-        let calculate_from_part = |interval_period_str: &str,
-                                   interval_type: &str|
-         -> Result<(i32, i32, f32)> {
-            // @todo It's better to use Decimal in order to protect rounding errors
-            // Wait https://github.com/apache/arrow/pull/9232
-            let interval_period = match f32::from_str(interval_period_str) {
-                Ok(n) => n,
-                Err(_) => {
-                    return Err(DataFusionError::SQL(ParserError(format!(
-                        "Unsupported Interval Expression with value {:?}",
-                        value
-                    ))));
-                }
-            };
-
-            if interval_period > (i32::MAX as f32) {
-                return Err(DataFusionError::NotImplemented(format!(
-                    "Interval field value out of range: {:?}",
-                    value
-                )));
-            }
-
-            match interval_type.to_lowercase().as_str() {
-                "year" => Ok(align_interval_parts(interval_period * 12_f32, 0.0, 0.0)),
-                "month" => Ok(align_interval_parts(interval_period, 0.0, 0.0)),
-                "day" | "days" => Ok(align_interval_parts(0.0, interval_period, 0.0)),
-                "hour" | "hours" => {
-                    Ok((0, 0, interval_period * SECONDS_PER_HOUR * MILLIS_PER_SECOND))
-                }
-                "minutes" | "minute" => {
-                    Ok((0, 0, interval_period * 60_f32 * MILLIS_PER_SECOND))
-                }
-                "seconds" | "second" => Ok((0, 0, interval_period * MILLIS_PER_SECOND)),
-                "milliseconds" | "millisecond" => Ok((0, 0, interval_period)),
-                _ => Err(DataFusionError::NotImplemented(format!(
-                    "Invalid input syntax for type interval: {:?}",
-                    value
-                ))),
-            }
-        };
-
-        let mut result_month: i64 = 0;
-        let mut result_days: i64 = 0;
-        let mut result_millis: i64 = 0;
-
-        let mut parts = value.split_whitespace();
-
-        loop {
-            let interval_period_str = parts.next();
-            if interval_period_str.is_none() {
-                break;
-            }
-
-            let leading_field = leading_field
-                .as_ref()
-                .map(|dt| dt.to_string())
-                .unwrap_or_else(|| "second".to_string());
-
-            let unit = parts
-                .next()
-                .map(|part| part.to_string())
-                .unwrap_or(leading_field);
-
-            let (diff_month, diff_days, diff_millis) =
-                calculate_from_part(interval_period_str.unwrap(), &unit)?;
-
-            result_month += diff_month as i64;
-
-            if result_month > (i32::MAX as i64) {
-                return Err(DataFusionError::NotImplemented(format!(
-                    "Interval field value out of range: {:?}",
-                    value
-                )));
-            }
-
-            result_days += diff_days as i64;
-
-            if result_days > (i32::MAX as i64) {
-                return Err(DataFusionError::NotImplemented(format!(
-                    "Interval field value out of range: {:?}",
-                    value
-                )));
-            }
-
-            result_millis += diff_millis as i64;
-
-            if result_millis > (i32::MAX as i64) {
-                return Err(DataFusionError::NotImplemented(format!(
-                    "Interval field value out of range: {:?}",
-                    value
-                )));
-            }
-        }
-
-        // Interval is tricky thing
-        // 1 day is not 24 hours because timezones, 1 year != 365/364! 30 days != 1 month
-        // The true way to store and calculate intervals is to store it as it defined
-        // It's why we there are 3 different interval types in Arrow
-        if result_month != 0 && (result_days != 0 || result_millis != 0) {
-            let result: i128 = ((result_month as i128) << 96)
-                | ((result_days as i128) << 64)
-                // IntervalMonthDayNano uses nanos, but IntervalDayTime uses milles
-                | ((result_millis * 1_000_000_i64) as i128);
-
-            return Ok(Expr::Literal(ScalarValue::IntervalMonthDayNano(Some(
-                result,
-            ))));
-        }
-
-        // Month interval
-        if result_month != 0 {
-            return Ok(Expr::Literal(ScalarValue::IntervalYearMonth(Some(
-                result_month as i32,
-            ))));
-        }
+        let leading_field = leading_field
+            .as_ref()
+            .map(|dt| dt.to_string())
+            .unwrap_or_else(|| "second".to_string());

Review Comment:
   This is the same code as used previously, but it has been raised up and passed in https://github.com/apache/arrow-datafusion/pull/2984/files#diff-c01a34949db6258aa1593f011ecf90f62cbde406acd5cdbf8b9b60b970ace1cfL2294-L2297



##########
datafusion/sql/src/interval.rs:
##########
@@ -0,0 +1,214 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+//! Interval parsing logic
+use datafusion_common::{DataFusionError, Result, ScalarValue};
+use std::str::FromStr;
+
+const SECONDS_PER_HOUR: f32 = 3_600_f32;
+const MILLIS_PER_SECOND: f32 = 1_000_f32;
+
+/// Parses a string with an interval like `'0.5 MONTH'` to an
+/// appropriately typed [`ScalarValue`]
+pub(crate) fn parse_interval(leading_field: &str, value: &str) -> Result<ScalarValue> {
+    // We are storing parts as integers, it's why we need to align parts fractional
+    // INTERVAL '0.5 MONTH' = 15 days, INTERVAL '1.5 MONTH' = 1 month 15 days
+    // INTERVAL '0.5 DAY' = 12 hours, INTERVAL '1.5 DAY' = 1 day 12 hours
+    let align_interval_parts =
+        |month_part: f32, mut day_part: f32, mut milles_part: f32| -> (i32, i32, f32) {
+            // Convert fractional month to days, It's not supported by Arrow types, but anyway
+            day_part += (month_part - (month_part as i32) as f32) * 30_f32;
+
+            // Convert fractional days to hours
+            milles_part += (day_part - ((day_part as i32) as f32))
+                * 24_f32
+                * SECONDS_PER_HOUR
+                * MILLIS_PER_SECOND;
+
+            (month_part as i32, day_part as i32, milles_part)
+        };
+
+    let calculate_from_part =
+        |interval_period_str: &str, interval_type: &str| -> Result<(i32, i32, f32)> {
+            // @todo It's better to use Decimal in order to protect rounding errors
+            // Wait https://github.com/apache/arrow/pull/9232
+            let interval_period = match f32::from_str(interval_period_str) {
+                Ok(n) => n,
+                Err(_) => {
+                    return Err(DataFusionError::NotImplemented(format!(
+                        "Unsupported Interval Expression with value {:?}",
+                        value
+                    )));
+                }
+            };
+
+            if interval_period > (i32::MAX as f32) {
+                return Err(DataFusionError::NotImplemented(format!(
+                    "Interval field value out of range: {:?}",
+                    value
+                )));
+            }
+
+            match interval_type.to_lowercase().as_str() {
+                "year" => Ok(align_interval_parts(interval_period * 12_f32, 0.0, 0.0)),
+                "month" => Ok(align_interval_parts(interval_period, 0.0, 0.0)),
+                "day" | "days" => Ok(align_interval_parts(0.0, interval_period, 0.0)),
+                "hour" | "hours" => {
+                    Ok((0, 0, interval_period * SECONDS_PER_HOUR * MILLIS_PER_SECOND))
+                }
+                "minutes" | "minute" => {
+                    Ok((0, 0, interval_period * 60_f32 * MILLIS_PER_SECOND))
+                }
+                "seconds" | "second" => Ok((0, 0, interval_period * MILLIS_PER_SECOND)),
+                "milliseconds" | "millisecond" => Ok((0, 0, interval_period)),
+                _ => Err(DataFusionError::NotImplemented(format!(
+                    "Invalid input syntax for type interval: {:?}",
+                    value
+                ))),
+            }
+        };
+
+    let mut result_month: i64 = 0;
+    let mut result_days: i64 = 0;
+    let mut result_millis: i64 = 0;
+
+    let mut parts = value.split_whitespace();
+
+    loop {
+        let interval_period_str = parts.next();
+        if interval_period_str.is_none() {
+            break;
+        }
+
+        let unit = parts.next().unwrap_or(leading_field);
+
+        let (diff_month, diff_days, diff_millis) =
+            calculate_from_part(interval_period_str.unwrap(), unit)?;
+
+        result_month += diff_month as i64;
+
+        if result_month > (i32::MAX as i64) {
+            return Err(DataFusionError::NotImplemented(format!(
+                "Interval field value out of range: {:?}",
+                value
+            )));
+        }
+
+        result_days += diff_days as i64;
+
+        if result_days > (i32::MAX as i64) {
+            return Err(DataFusionError::NotImplemented(format!(
+                "Interval field value out of range: {:?}",
+                value
+            )));
+        }
+
+        result_millis += diff_millis as i64;
+
+        if result_millis > (i32::MAX as i64) {
+            return Err(DataFusionError::NotImplemented(format!(
+                "Interval field value out of range: {:?}",
+                value
+            )));
+        }
+    }
+
+    // Interval is tricky thing
+    // 1 day is not 24 hours because timezones, 1 year != 365/364! 30 days != 1 month
+    // The true way to store and calculate intervals is to store it as it defined
+    // It's why we there are 3 different interval types in Arrow
+    if result_month != 0 && (result_days != 0 || result_millis != 0) {
+        let result: i128 = ((result_month as i128) << 96)
+                | ((result_days as i128) << 64)
+                // IntervalMonthDayNano uses nanos, but IntervalDayTime uses milles
+                | ((result_millis * 1_000_000_i64) as i128);
+
+        return Ok(ScalarValue::IntervalMonthDayNano(Some(result)));
+    }
+
+    // Month interval
+    if result_month != 0 {
+        return Ok(ScalarValue::IntervalYearMonth(Some(result_month as i32)));
+    }
+
+    let result: i64 = (result_days << 32) | result_millis;
+    Ok(ScalarValue::IntervalDayTime(Some(result)))
+}
+
+#[cfg(test)]
+mod test {
+    use crate::assert_contains;
+
+    use super::*;
+
+    #[test]
+    fn test_parse_ym() {

Review Comment:
   These tests are new



##########
datafusion/sql/src/interval.rs:
##########
@@ -0,0 +1,214 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+//! Interval parsing logic
+use datafusion_common::{DataFusionError, Result, ScalarValue};
+use std::str::FromStr;
+
+const SECONDS_PER_HOUR: f32 = 3_600_f32;
+const MILLIS_PER_SECOND: f32 = 1_000_f32;
+
+/// Parses a string with an interval like `'0.5 MONTH'` to an
+/// appropriately typed [`ScalarValue`]
+pub(crate) fn parse_interval(leading_field: &str, value: &str) -> Result<ScalarValue> {
+    // We are storing parts as integers, it's why we need to align parts fractional
+    // INTERVAL '0.5 MONTH' = 15 days, INTERVAL '1.5 MONTH' = 1 month 15 days
+    // INTERVAL '0.5 DAY' = 12 hours, INTERVAL '1.5 DAY' = 1 day 12 hours
+    let align_interval_parts =
+        |month_part: f32, mut day_part: f32, mut milles_part: f32| -> (i32, i32, f32) {
+            // Convert fractional month to days, It's not supported by Arrow types, but anyway
+            day_part += (month_part - (month_part as i32) as f32) * 30_f32;
+
+            // Convert fractional days to hours
+            milles_part += (day_part - ((day_part as i32) as f32))
+                * 24_f32
+                * SECONDS_PER_HOUR
+                * MILLIS_PER_SECOND;
+
+            (month_part as i32, day_part as i32, milles_part)
+        };
+
+    let calculate_from_part =
+        |interval_period_str: &str, interval_type: &str| -> Result<(i32, i32, f32)> {
+            // @todo It's better to use Decimal in order to protect rounding errors
+            // Wait https://github.com/apache/arrow/pull/9232
+            let interval_period = match f32::from_str(interval_period_str) {
+                Ok(n) => n,
+                Err(_) => {
+                    return Err(DataFusionError::NotImplemented(format!(
+                        "Unsupported Interval Expression with value {:?}",
+                        value
+                    )));
+                }
+            };
+
+            if interval_period > (i32::MAX as f32) {
+                return Err(DataFusionError::NotImplemented(format!(
+                    "Interval field value out of range: {:?}",
+                    value
+                )));
+            }
+
+            match interval_type.to_lowercase().as_str() {
+                "year" => Ok(align_interval_parts(interval_period * 12_f32, 0.0, 0.0)),
+                "month" => Ok(align_interval_parts(interval_period, 0.0, 0.0)),
+                "day" | "days" => Ok(align_interval_parts(0.0, interval_period, 0.0)),
+                "hour" | "hours" => {
+                    Ok((0, 0, interval_period * SECONDS_PER_HOUR * MILLIS_PER_SECOND))
+                }
+                "minutes" | "minute" => {
+                    Ok((0, 0, interval_period * 60_f32 * MILLIS_PER_SECOND))
+                }
+                "seconds" | "second" => Ok((0, 0, interval_period * MILLIS_PER_SECOND)),
+                "milliseconds" | "millisecond" => Ok((0, 0, interval_period)),
+                _ => Err(DataFusionError::NotImplemented(format!(
+                    "Invalid input syntax for type interval: {:?}",
+                    value
+                ))),
+            }
+        };
+
+    let mut result_month: i64 = 0;
+    let mut result_days: i64 = 0;
+    let mut result_millis: i64 = 0;
+
+    let mut parts = value.split_whitespace();
+
+    loop {
+        let interval_period_str = parts.next();
+        if interval_period_str.is_none() {
+            break;
+        }
+
+        let unit = parts.next().unwrap_or(leading_field);
+
+        let (diff_month, diff_days, diff_millis) =
+            calculate_from_part(interval_period_str.unwrap(), unit)?;
+
+        result_month += diff_month as i64;
+
+        if result_month > (i32::MAX as i64) {
+            return Err(DataFusionError::NotImplemented(format!(
+                "Interval field value out of range: {:?}",
+                value
+            )));
+        }
+
+        result_days += diff_days as i64;
+
+        if result_days > (i32::MAX as i64) {
+            return Err(DataFusionError::NotImplemented(format!(
+                "Interval field value out of range: {:?}",
+                value
+            )));
+        }
+
+        result_millis += diff_millis as i64;
+
+        if result_millis > (i32::MAX as i64) {
+            return Err(DataFusionError::NotImplemented(format!(
+                "Interval field value out of range: {:?}",
+                value
+            )));
+        }
+    }
+
+    // Interval is tricky thing
+    // 1 day is not 24 hours because timezones, 1 year != 365/364! 30 days != 1 month
+    // The true way to store and calculate intervals is to store it as it defined
+    // It's why we there are 3 different interval types in Arrow
+    if result_month != 0 && (result_days != 0 || result_millis != 0) {
+        let result: i128 = ((result_month as i128) << 96)
+                | ((result_days as i128) << 64)
+                // IntervalMonthDayNano uses nanos, but IntervalDayTime uses milles
+                | ((result_millis * 1_000_000_i64) as i128);
+
+        return Ok(ScalarValue::IntervalMonthDayNano(Some(result)));
+    }
+
+    // Month interval
+    if result_month != 0 {
+        return Ok(ScalarValue::IntervalYearMonth(Some(result_month as i32)));
+    }
+
+    let result: i64 = (result_days << 32) | result_millis;
+    Ok(ScalarValue::IntervalDayTime(Some(result)))
+}
+
+#[cfg(test)]
+mod test {
+    use crate::assert_contains;
+
+    use super::*;
+
+    #[test]
+    fn test_parse_ym() {
+        assert_eq!(
+            parse_interval("months", "1 month").unwrap(),
+            ScalarValue::new_interval_ym(0, 1)
+        );
+
+        assert_eq!(
+            parse_interval("months", "2 month").unwrap(),
+            ScalarValue::new_interval_ym(0, 2)
+        );
+
+        assert_eq!(
+            parse_interval("months", "3 year 1 month").unwrap(),
+            ScalarValue::new_interval_ym(3, 1)
+        );
+
+        assert_contains!(
+            parse_interval("months", "1 years 1 month")
+                .unwrap_err()
+                .to_string(),
+            r#"Invalid input syntax for type interval: "1 years 1 month""#
+        );
+    }
+
+    #[test]
+    fn test_dt() {
+        assert_eq!(
+            parse_interval("months", "5 days").unwrap(),
+            ScalarValue::new_interval_dt(5, 0)
+        );
+
+        assert_eq!(
+            parse_interval("months", "7 days 3 hours").unwrap(),
+            ScalarValue::new_interval_dt(
+                7,
+                (3.0 * SECONDS_PER_HOUR * MILLIS_PER_SECOND) as i32
+            )
+        );
+
+        assert_eq!(
+            parse_interval("months", "7 days 5 minutes").unwrap(),
+            ScalarValue::new_interval_dt(7, (5.0 * 60.0 * MILLIS_PER_SECOND) as i32)
+        );
+    }
+
+    #[test]
+    // TODO file a ticket to track

Review Comment:
   I am not quite sure what to make of this -- my current interpretation is that `IntervalMonthDayNano` and the interval parsing logic are inconsistent but I wanted to double check
   
   cc @avantgardnerio 



##########
datafusion/sql/src/interval.rs:
##########
@@ -0,0 +1,214 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+//! Interval parsing logic
+use datafusion_common::{DataFusionError, Result, ScalarValue};
+use std::str::FromStr;
+
+const SECONDS_PER_HOUR: f32 = 3_600_f32;
+const MILLIS_PER_SECOND: f32 = 1_000_f32;
+
+/// Parses a string with an interval like `'0.5 MONTH'` to an
+/// appropriately typed [`ScalarValue`]
+pub(crate) fn parse_interval(leading_field: &str, value: &str) -> Result<ScalarValue> {

Review Comment:
   This code was moved as closely as possible from parser.rs



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] avantgardnerio commented on a diff in pull request #2984: [Minor] Extract interval parsing logic, add unit tests

Posted by GitBox <gi...@apache.org>.
avantgardnerio commented on code in PR #2984:
URL: https://github.com/apache/arrow-datafusion/pull/2984#discussion_r933644641


##########
datafusion/sql/src/interval.rs:
##########
@@ -0,0 +1,214 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+//! Interval parsing logic
+use datafusion_common::{DataFusionError, Result, ScalarValue};
+use std::str::FromStr;
+
+const SECONDS_PER_HOUR: f32 = 3_600_f32;
+const MILLIS_PER_SECOND: f32 = 1_000_f32;
+
+/// Parses a string with an interval like `'0.5 MONTH'` to an
+/// appropriately typed [`ScalarValue`]
+pub(crate) fn parse_interval(leading_field: &str, value: &str) -> Result<ScalarValue> {
+    // We are storing parts as integers, it's why we need to align parts fractional
+    // INTERVAL '0.5 MONTH' = 15 days, INTERVAL '1.5 MONTH' = 1 month 15 days
+    // INTERVAL '0.5 DAY' = 12 hours, INTERVAL '1.5 DAY' = 1 day 12 hours
+    let align_interval_parts =
+        |month_part: f32, mut day_part: f32, mut milles_part: f32| -> (i32, i32, f32) {
+            // Convert fractional month to days, It's not supported by Arrow types, but anyway
+            day_part += (month_part - (month_part as i32) as f32) * 30_f32;
+
+            // Convert fractional days to hours
+            milles_part += (day_part - ((day_part as i32) as f32))
+                * 24_f32
+                * SECONDS_PER_HOUR
+                * MILLIS_PER_SECOND;
+
+            (month_part as i32, day_part as i32, milles_part)
+        };
+
+    let calculate_from_part =
+        |interval_period_str: &str, interval_type: &str| -> Result<(i32, i32, f32)> {
+            // @todo It's better to use Decimal in order to protect rounding errors
+            // Wait https://github.com/apache/arrow/pull/9232
+            let interval_period = match f32::from_str(interval_period_str) {
+                Ok(n) => n,
+                Err(_) => {
+                    return Err(DataFusionError::NotImplemented(format!(
+                        "Unsupported Interval Expression with value {:?}",
+                        value
+                    )));
+                }
+            };
+
+            if interval_period > (i32::MAX as f32) {
+                return Err(DataFusionError::NotImplemented(format!(
+                    "Interval field value out of range: {:?}",
+                    value
+                )));
+            }
+
+            match interval_type.to_lowercase().as_str() {
+                "year" => Ok(align_interval_parts(interval_period * 12_f32, 0.0, 0.0)),
+                "month" => Ok(align_interval_parts(interval_period, 0.0, 0.0)),
+                "day" | "days" => Ok(align_interval_parts(0.0, interval_period, 0.0)),
+                "hour" | "hours" => {
+                    Ok((0, 0, interval_period * SECONDS_PER_HOUR * MILLIS_PER_SECOND))
+                }
+                "minutes" | "minute" => {
+                    Ok((0, 0, interval_period * 60_f32 * MILLIS_PER_SECOND))
+                }
+                "seconds" | "second" => Ok((0, 0, interval_period * MILLIS_PER_SECOND)),
+                "milliseconds" | "millisecond" => Ok((0, 0, interval_period)),
+                _ => Err(DataFusionError::NotImplemented(format!(
+                    "Invalid input syntax for type interval: {:?}",
+                    value
+                ))),
+            }
+        };
+
+    let mut result_month: i64 = 0;
+    let mut result_days: i64 = 0;
+    let mut result_millis: i64 = 0;
+
+    let mut parts = value.split_whitespace();
+
+    loop {
+        let interval_period_str = parts.next();
+        if interval_period_str.is_none() {
+            break;
+        }
+
+        let unit = parts.next().unwrap_or(leading_field);
+
+        let (diff_month, diff_days, diff_millis) =
+            calculate_from_part(interval_period_str.unwrap(), unit)?;
+
+        result_month += diff_month as i64;
+
+        if result_month > (i32::MAX as i64) {
+            return Err(DataFusionError::NotImplemented(format!(
+                "Interval field value out of range: {:?}",
+                value
+            )));
+        }
+
+        result_days += diff_days as i64;
+
+        if result_days > (i32::MAX as i64) {
+            return Err(DataFusionError::NotImplemented(format!(
+                "Interval field value out of range: {:?}",
+                value
+            )));
+        }
+
+        result_millis += diff_millis as i64;
+
+        if result_millis > (i32::MAX as i64) {
+            return Err(DataFusionError::NotImplemented(format!(
+                "Interval field value out of range: {:?}",
+                value
+            )));
+        }
+    }
+
+    // Interval is tricky thing
+    // 1 day is not 24 hours because timezones, 1 year != 365/364! 30 days != 1 month
+    // The true way to store and calculate intervals is to store it as it defined
+    // It's why we there are 3 different interval types in Arrow
+    if result_month != 0 && (result_days != 0 || result_millis != 0) {
+        let result: i128 = ((result_month as i128) << 96)
+                | ((result_days as i128) << 64)
+                // IntervalMonthDayNano uses nanos, but IntervalDayTime uses milles
+                | ((result_millis * 1_000_000_i64) as i128);
+
+        return Ok(ScalarValue::IntervalMonthDayNano(Some(result)));
+    }
+
+    // Month interval
+    if result_month != 0 {
+        return Ok(ScalarValue::IntervalYearMonth(Some(result_month as i32)));
+    }
+
+    let result: i64 = (result_days << 32) | result_millis;
+    Ok(ScalarValue::IntervalDayTime(Some(result)))
+}
+
+#[cfg(test)]
+mod test {
+    use crate::assert_contains;
+
+    use super::*;
+
+    #[test]
+    fn test_parse_ym() {
+        assert_eq!(
+            parse_interval("months", "1 month").unwrap(),
+            ScalarValue::new_interval_ym(0, 1)
+        );
+
+        assert_eq!(
+            parse_interval("months", "2 month").unwrap(),
+            ScalarValue::new_interval_ym(0, 2)
+        );
+
+        assert_eq!(
+            parse_interval("months", "3 year 1 month").unwrap(),
+            ScalarValue::new_interval_ym(3, 1)
+        );
+
+        assert_contains!(
+            parse_interval("months", "1 years 1 month")
+                .unwrap_err()
+                .to_string(),
+            r#"Invalid input syntax for type interval: "1 years 1 month""#
+        );
+    }
+
+    #[test]
+    fn test_dt() {
+        assert_eq!(
+            parse_interval("months", "5 days").unwrap(),
+            ScalarValue::new_interval_dt(5, 0)
+        );
+
+        assert_eq!(
+            parse_interval("months", "7 days 3 hours").unwrap(),
+            ScalarValue::new_interval_dt(
+                7,
+                (3.0 * SECONDS_PER_HOUR * MILLIS_PER_SECOND) as i32
+            )
+        );
+
+        assert_eq!(
+            parse_interval("months", "7 days 5 minutes").unwrap(),
+            ScalarValue::new_interval_dt(7, (5.0 * 60.0 * MILLIS_PER_SECOND) as i32)
+        );
+    }
+
+    #[test]
+    // TODO file a ticket to track

Review Comment:
   I think I found the C impl, and I think `parse_interval()` looks correct https://github.com/apache/arrow/blob/02c8598d264c839a5b5cf3109bfd406f3b8a6ba5/cpp/src/arrow/type.h#L1475
   
   I can make a PR to fix shortly if we're sure this is the correct format. It would be good to have an integration test between C & Rust. Do we have anything like that now?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] avantgardnerio commented on a diff in pull request #2984: [Minor] Extract interval parsing logic, add unit tests

Posted by GitBox <gi...@apache.org>.
avantgardnerio commented on code in PR #2984:
URL: https://github.com/apache/arrow-datafusion/pull/2984#discussion_r933602355


##########
datafusion/sql/src/interval.rs:
##########
@@ -0,0 +1,214 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+//! Interval parsing logic
+use datafusion_common::{DataFusionError, Result, ScalarValue};
+use std::str::FromStr;
+
+const SECONDS_PER_HOUR: f32 = 3_600_f32;
+const MILLIS_PER_SECOND: f32 = 1_000_f32;
+
+/// Parses a string with an interval like `'0.5 MONTH'` to an
+/// appropriately typed [`ScalarValue`]
+pub(crate) fn parse_interval(leading_field: &str, value: &str) -> Result<ScalarValue> {
+    // We are storing parts as integers, it's why we need to align parts fractional
+    // INTERVAL '0.5 MONTH' = 15 days, INTERVAL '1.5 MONTH' = 1 month 15 days
+    // INTERVAL '0.5 DAY' = 12 hours, INTERVAL '1.5 DAY' = 1 day 12 hours
+    let align_interval_parts =
+        |month_part: f32, mut day_part: f32, mut milles_part: f32| -> (i32, i32, f32) {
+            // Convert fractional month to days, It's not supported by Arrow types, but anyway
+            day_part += (month_part - (month_part as i32) as f32) * 30_f32;
+
+            // Convert fractional days to hours
+            milles_part += (day_part - ((day_part as i32) as f32))
+                * 24_f32
+                * SECONDS_PER_HOUR
+                * MILLIS_PER_SECOND;
+
+            (month_part as i32, day_part as i32, milles_part)
+        };
+
+    let calculate_from_part =
+        |interval_period_str: &str, interval_type: &str| -> Result<(i32, i32, f32)> {
+            // @todo It's better to use Decimal in order to protect rounding errors
+            // Wait https://github.com/apache/arrow/pull/9232
+            let interval_period = match f32::from_str(interval_period_str) {
+                Ok(n) => n,
+                Err(_) => {
+                    return Err(DataFusionError::NotImplemented(format!(
+                        "Unsupported Interval Expression with value {:?}",
+                        value
+                    )));
+                }
+            };
+
+            if interval_period > (i32::MAX as f32) {
+                return Err(DataFusionError::NotImplemented(format!(
+                    "Interval field value out of range: {:?}",
+                    value
+                )));
+            }
+
+            match interval_type.to_lowercase().as_str() {
+                "year" => Ok(align_interval_parts(interval_period * 12_f32, 0.0, 0.0)),
+                "month" => Ok(align_interval_parts(interval_period, 0.0, 0.0)),
+                "day" | "days" => Ok(align_interval_parts(0.0, interval_period, 0.0)),
+                "hour" | "hours" => {
+                    Ok((0, 0, interval_period * SECONDS_PER_HOUR * MILLIS_PER_SECOND))
+                }
+                "minutes" | "minute" => {
+                    Ok((0, 0, interval_period * 60_f32 * MILLIS_PER_SECOND))
+                }
+                "seconds" | "second" => Ok((0, 0, interval_period * MILLIS_PER_SECOND)),
+                "milliseconds" | "millisecond" => Ok((0, 0, interval_period)),
+                _ => Err(DataFusionError::NotImplemented(format!(
+                    "Invalid input syntax for type interval: {:?}",
+                    value
+                ))),
+            }
+        };
+
+    let mut result_month: i64 = 0;
+    let mut result_days: i64 = 0;
+    let mut result_millis: i64 = 0;
+
+    let mut parts = value.split_whitespace();
+
+    loop {
+        let interval_period_str = parts.next();
+        if interval_period_str.is_none() {
+            break;
+        }
+
+        let unit = parts.next().unwrap_or(leading_field);
+
+        let (diff_month, diff_days, diff_millis) =
+            calculate_from_part(interval_period_str.unwrap(), unit)?;
+
+        result_month += diff_month as i64;
+
+        if result_month > (i32::MAX as i64) {
+            return Err(DataFusionError::NotImplemented(format!(
+                "Interval field value out of range: {:?}",
+                value
+            )));
+        }
+
+        result_days += diff_days as i64;
+
+        if result_days > (i32::MAX as i64) {
+            return Err(DataFusionError::NotImplemented(format!(
+                "Interval field value out of range: {:?}",
+                value
+            )));
+        }
+
+        result_millis += diff_millis as i64;
+
+        if result_millis > (i32::MAX as i64) {
+            return Err(DataFusionError::NotImplemented(format!(
+                "Interval field value out of range: {:?}",
+                value
+            )));
+        }
+    }
+
+    // Interval is tricky thing
+    // 1 day is not 24 hours because timezones, 1 year != 365/364! 30 days != 1 month
+    // The true way to store and calculate intervals is to store it as it defined
+    // It's why we there are 3 different interval types in Arrow
+    if result_month != 0 && (result_days != 0 || result_millis != 0) {
+        let result: i128 = ((result_month as i128) << 96)
+                | ((result_days as i128) << 64)
+                // IntervalMonthDayNano uses nanos, but IntervalDayTime uses milles
+                | ((result_millis * 1_000_000_i64) as i128);
+
+        return Ok(ScalarValue::IntervalMonthDayNano(Some(result)));
+    }
+
+    // Month interval
+    if result_month != 0 {
+        return Ok(ScalarValue::IntervalYearMonth(Some(result_month as i32)));
+    }
+
+    let result: i64 = (result_days << 32) | result_millis;
+    Ok(ScalarValue::IntervalDayTime(Some(result)))
+}
+
+#[cfg(test)]
+mod test {
+    use crate::assert_contains;
+
+    use super::*;
+
+    #[test]
+    fn test_parse_ym() {
+        assert_eq!(
+            parse_interval("months", "1 month").unwrap(),
+            ScalarValue::new_interval_ym(0, 1)
+        );
+
+        assert_eq!(
+            parse_interval("months", "2 month").unwrap(),
+            ScalarValue::new_interval_ym(0, 2)
+        );
+
+        assert_eq!(
+            parse_interval("months", "3 year 1 month").unwrap(),
+            ScalarValue::new_interval_ym(3, 1)
+        );
+
+        assert_contains!(
+            parse_interval("months", "1 years 1 month")
+                .unwrap_err()
+                .to_string(),
+            r#"Invalid input syntax for type interval: "1 years 1 month""#
+        );
+    }
+
+    #[test]
+    fn test_dt() {
+        assert_eq!(
+            parse_interval("months", "5 days").unwrap(),
+            ScalarValue::new_interval_dt(5, 0)
+        );
+
+        assert_eq!(
+            parse_interval("months", "7 days 3 hours").unwrap(),
+            ScalarValue::new_interval_dt(
+                7,
+                (3.0 * SECONDS_PER_HOUR * MILLIS_PER_SECOND) as i32
+            )
+        );
+
+        assert_eq!(
+            parse_interval("months", "7 days 5 minutes").unwrap(),
+            ScalarValue::new_interval_dt(7, (5.0 * 60.0 * MILLIS_PER_SECOND) as i32)
+        );
+    }
+
+    #[test]
+    // TODO file a ticket to track

Review Comment:
   When I was doing my implementation, `MonthDayNano` was the one I could never figure out how to test, because I could not get a SQL query to cause one to be generated. Clearly I was unaware of the `parse_interval()` function. I inferred the order of the values based on github conversations, documentation, and how they were ordered in `IntervalDayTimeType`.
   
   So it's possible that `new_interval_mdn()` is not correct in its assumption:
   ```
           let m = months as u128 & u32::MAX as u128;
           let d = (days as u128 & u32::MAX as u128) << 32;
           let n = (nanos as u128) << 64;
   ```
   
   I'll pull this branch down and see if I can add more clarity.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] avantgardnerio commented on a diff in pull request #2984: [Minor] Extract interval parsing logic, add unit tests

Posted by GitBox <gi...@apache.org>.
avantgardnerio commented on code in PR #2984:
URL: https://github.com/apache/arrow-datafusion/pull/2984#discussion_r933606743


##########
datafusion/sql/src/interval.rs:
##########
@@ -0,0 +1,214 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+//! Interval parsing logic
+use datafusion_common::{DataFusionError, Result, ScalarValue};
+use std::str::FromStr;
+
+const SECONDS_PER_HOUR: f32 = 3_600_f32;
+const MILLIS_PER_SECOND: f32 = 1_000_f32;
+
+/// Parses a string with an interval like `'0.5 MONTH'` to an
+/// appropriately typed [`ScalarValue`]
+pub(crate) fn parse_interval(leading_field: &str, value: &str) -> Result<ScalarValue> {
+    // We are storing parts as integers, it's why we need to align parts fractional
+    // INTERVAL '0.5 MONTH' = 15 days, INTERVAL '1.5 MONTH' = 1 month 15 days
+    // INTERVAL '0.5 DAY' = 12 hours, INTERVAL '1.5 DAY' = 1 day 12 hours
+    let align_interval_parts =
+        |month_part: f32, mut day_part: f32, mut milles_part: f32| -> (i32, i32, f32) {
+            // Convert fractional month to days, It's not supported by Arrow types, but anyway
+            day_part += (month_part - (month_part as i32) as f32) * 30_f32;
+
+            // Convert fractional days to hours
+            milles_part += (day_part - ((day_part as i32) as f32))
+                * 24_f32
+                * SECONDS_PER_HOUR
+                * MILLIS_PER_SECOND;
+
+            (month_part as i32, day_part as i32, milles_part)
+        };
+
+    let calculate_from_part =
+        |interval_period_str: &str, interval_type: &str| -> Result<(i32, i32, f32)> {
+            // @todo It's better to use Decimal in order to protect rounding errors
+            // Wait https://github.com/apache/arrow/pull/9232
+            let interval_period = match f32::from_str(interval_period_str) {
+                Ok(n) => n,
+                Err(_) => {
+                    return Err(DataFusionError::NotImplemented(format!(
+                        "Unsupported Interval Expression with value {:?}",
+                        value
+                    )));
+                }
+            };
+
+            if interval_period > (i32::MAX as f32) {
+                return Err(DataFusionError::NotImplemented(format!(
+                    "Interval field value out of range: {:?}",
+                    value
+                )));
+            }
+
+            match interval_type.to_lowercase().as_str() {
+                "year" => Ok(align_interval_parts(interval_period * 12_f32, 0.0, 0.0)),
+                "month" => Ok(align_interval_parts(interval_period, 0.0, 0.0)),
+                "day" | "days" => Ok(align_interval_parts(0.0, interval_period, 0.0)),
+                "hour" | "hours" => {
+                    Ok((0, 0, interval_period * SECONDS_PER_HOUR * MILLIS_PER_SECOND))
+                }
+                "minutes" | "minute" => {
+                    Ok((0, 0, interval_period * 60_f32 * MILLIS_PER_SECOND))
+                }
+                "seconds" | "second" => Ok((0, 0, interval_period * MILLIS_PER_SECOND)),
+                "milliseconds" | "millisecond" => Ok((0, 0, interval_period)),
+                _ => Err(DataFusionError::NotImplemented(format!(
+                    "Invalid input syntax for type interval: {:?}",
+                    value
+                ))),
+            }
+        };
+
+    let mut result_month: i64 = 0;
+    let mut result_days: i64 = 0;
+    let mut result_millis: i64 = 0;
+
+    let mut parts = value.split_whitespace();
+
+    loop {
+        let interval_period_str = parts.next();
+        if interval_period_str.is_none() {
+            break;
+        }
+
+        let unit = parts.next().unwrap_or(leading_field);
+
+        let (diff_month, diff_days, diff_millis) =
+            calculate_from_part(interval_period_str.unwrap(), unit)?;
+
+        result_month += diff_month as i64;
+
+        if result_month > (i32::MAX as i64) {
+            return Err(DataFusionError::NotImplemented(format!(
+                "Interval field value out of range: {:?}",
+                value
+            )));
+        }
+
+        result_days += diff_days as i64;
+
+        if result_days > (i32::MAX as i64) {
+            return Err(DataFusionError::NotImplemented(format!(
+                "Interval field value out of range: {:?}",
+                value
+            )));
+        }
+
+        result_millis += diff_millis as i64;
+
+        if result_millis > (i32::MAX as i64) {
+            return Err(DataFusionError::NotImplemented(format!(
+                "Interval field value out of range: {:?}",
+                value
+            )));
+        }
+    }
+
+    // Interval is tricky thing
+    // 1 day is not 24 hours because timezones, 1 year != 365/364! 30 days != 1 month
+    // The true way to store and calculate intervals is to store it as it defined
+    // It's why we there are 3 different interval types in Arrow
+    if result_month != 0 && (result_days != 0 || result_millis != 0) {
+        let result: i128 = ((result_month as i128) << 96)
+                | ((result_days as i128) << 64)
+                // IntervalMonthDayNano uses nanos, but IntervalDayTime uses milles
+                | ((result_millis * 1_000_000_i64) as i128);
+
+        return Ok(ScalarValue::IntervalMonthDayNano(Some(result)));
+    }
+
+    // Month interval
+    if result_month != 0 {
+        return Ok(ScalarValue::IntervalYearMonth(Some(result_month as i32)));
+    }
+
+    let result: i64 = (result_days << 32) | result_millis;
+    Ok(ScalarValue::IntervalDayTime(Some(result)))
+}
+
+#[cfg(test)]
+mod test {
+    use crate::assert_contains;
+
+    use super::*;
+
+    #[test]
+    fn test_parse_ym() {
+        assert_eq!(
+            parse_interval("months", "1 month").unwrap(),
+            ScalarValue::new_interval_ym(0, 1)
+        );
+
+        assert_eq!(
+            parse_interval("months", "2 month").unwrap(),
+            ScalarValue::new_interval_ym(0, 2)
+        );
+
+        assert_eq!(
+            parse_interval("months", "3 year 1 month").unwrap(),
+            ScalarValue::new_interval_ym(3, 1)
+        );
+
+        assert_contains!(
+            parse_interval("months", "1 years 1 month")
+                .unwrap_err()
+                .to_string(),
+            r#"Invalid input syntax for type interval: "1 years 1 month""#
+        );
+    }
+
+    #[test]
+    fn test_dt() {
+        assert_eq!(
+            parse_interval("months", "5 days").unwrap(),
+            ScalarValue::new_interval_dt(5, 0)
+        );
+
+        assert_eq!(
+            parse_interval("months", "7 days 3 hours").unwrap(),
+            ScalarValue::new_interval_dt(
+                7,
+                (3.0 * SECONDS_PER_HOUR * MILLIS_PER_SECOND) as i32
+            )
+        );
+
+        assert_eq!(
+            parse_interval("months", "7 days 5 minutes").unwrap(),
+            ScalarValue::new_interval_dt(7, (5.0 * 60.0 * MILLIS_PER_SECOND) as i32)
+        );
+    }
+
+    #[test]
+    // TODO file a ticket to track

Review Comment:
   Looks like `parse_interval()` sees it as:
   ```
       if result_month != 0 && (result_days != 0 || result_millis != 0) {
           let result: i128 = ((result_month as i128) << 96)
                   | ((result_days as i128) << 64)
                   // IntervalMonthDayNano uses nanos, but IntervalDayTime uses milles
                   | ((result_millis * 1_000_000_i64) as i128);
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] alamb merged pull request #2984: [Minor] Extract interval parsing logic, add unit tests

Posted by GitBox <gi...@apache.org>.
alamb merged PR #2984:
URL: https://github.com/apache/arrow-datafusion/pull/2984


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] avantgardnerio commented on a diff in pull request #2984: [Minor] Extract interval parsing logic, add unit tests

Posted by GitBox <gi...@apache.org>.
avantgardnerio commented on code in PR #2984:
URL: https://github.com/apache/arrow-datafusion/pull/2984#discussion_r933911474


##########
datafusion/common/src/scalar.rs:
##########
@@ -618,6 +619,28 @@ impl ScalarValue {
             precision, scale
         )))
     }
+
+    /// Returns a [`ScalarValue::IntervalYearMonth`] representing
+    /// `years` years and `months` months

Review Comment:
   These seem very nice to have. I think more convenience functions like this are the key to terser and more readable code.



##########
datafusion/sql/src/interval.rs:
##########
@@ -0,0 +1,214 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+//! Interval parsing logic
+use datafusion_common::{DataFusionError, Result, ScalarValue};
+use std::str::FromStr;
+
+const SECONDS_PER_HOUR: f32 = 3_600_f32;
+const MILLIS_PER_SECOND: f32 = 1_000_f32;
+
+/// Parses a string with an interval like `'0.5 MONTH'` to an
+/// appropriately typed [`ScalarValue`]
+pub(crate) fn parse_interval(leading_field: &str, value: &str) -> Result<ScalarValue> {
+    // We are storing parts as integers, it's why we need to align parts fractional
+    // INTERVAL '0.5 MONTH' = 15 days, INTERVAL '1.5 MONTH' = 1 month 15 days
+    // INTERVAL '0.5 DAY' = 12 hours, INTERVAL '1.5 DAY' = 1 day 12 hours
+    let align_interval_parts =
+        |month_part: f32, mut day_part: f32, mut milles_part: f32| -> (i32, i32, f32) {
+            // Convert fractional month to days, It's not supported by Arrow types, but anyway
+            day_part += (month_part - (month_part as i32) as f32) * 30_f32;
+
+            // Convert fractional days to hours
+            milles_part += (day_part - ((day_part as i32) as f32))
+                * 24_f32
+                * SECONDS_PER_HOUR
+                * MILLIS_PER_SECOND;
+
+            (month_part as i32, day_part as i32, milles_part)
+        };
+
+    let calculate_from_part =
+        |interval_period_str: &str, interval_type: &str| -> Result<(i32, i32, f32)> {
+            // @todo It's better to use Decimal in order to protect rounding errors
+            // Wait https://github.com/apache/arrow/pull/9232
+            let interval_period = match f32::from_str(interval_period_str) {
+                Ok(n) => n,
+                Err(_) => {
+                    return Err(DataFusionError::NotImplemented(format!(
+                        "Unsupported Interval Expression with value {:?}",
+                        value
+                    )));
+                }
+            };
+
+            if interval_period > (i32::MAX as f32) {
+                return Err(DataFusionError::NotImplemented(format!(
+                    "Interval field value out of range: {:?}",
+                    value
+                )));
+            }
+
+            match interval_type.to_lowercase().as_str() {
+                "year" => Ok(align_interval_parts(interval_period * 12_f32, 0.0, 0.0)),
+                "month" => Ok(align_interval_parts(interval_period, 0.0, 0.0)),
+                "day" | "days" => Ok(align_interval_parts(0.0, interval_period, 0.0)),
+                "hour" | "hours" => {
+                    Ok((0, 0, interval_period * SECONDS_PER_HOUR * MILLIS_PER_SECOND))
+                }
+                "minutes" | "minute" => {
+                    Ok((0, 0, interval_period * 60_f32 * MILLIS_PER_SECOND))
+                }
+                "seconds" | "second" => Ok((0, 0, interval_period * MILLIS_PER_SECOND)),
+                "milliseconds" | "millisecond" => Ok((0, 0, interval_period)),
+                _ => Err(DataFusionError::NotImplemented(format!(
+                    "Invalid input syntax for type interval: {:?}",
+                    value
+                ))),
+            }
+        };
+
+    let mut result_month: i64 = 0;
+    let mut result_days: i64 = 0;
+    let mut result_millis: i64 = 0;
+
+    let mut parts = value.split_whitespace();
+
+    loop {
+        let interval_period_str = parts.next();
+        if interval_period_str.is_none() {
+            break;
+        }
+
+        let unit = parts.next().unwrap_or(leading_field);
+
+        let (diff_month, diff_days, diff_millis) =
+            calculate_from_part(interval_period_str.unwrap(), unit)?;
+
+        result_month += diff_month as i64;
+
+        if result_month > (i32::MAX as i64) {
+            return Err(DataFusionError::NotImplemented(format!(
+                "Interval field value out of range: {:?}",
+                value
+            )));
+        }
+
+        result_days += diff_days as i64;
+
+        if result_days > (i32::MAX as i64) {
+            return Err(DataFusionError::NotImplemented(format!(
+                "Interval field value out of range: {:?}",
+                value
+            )));
+        }
+
+        result_millis += diff_millis as i64;
+
+        if result_millis > (i32::MAX as i64) {
+            return Err(DataFusionError::NotImplemented(format!(
+                "Interval field value out of range: {:?}",
+                value
+            )));
+        }
+    }
+
+    // Interval is tricky thing
+    // 1 day is not 24 hours because timezones, 1 year != 365/364! 30 days != 1 month
+    // The true way to store and calculate intervals is to store it as it defined
+    // It's why we there are 3 different interval types in Arrow
+    if result_month != 0 && (result_days != 0 || result_millis != 0) {
+        let result: i128 = ((result_month as i128) << 96)
+                | ((result_days as i128) << 64)
+                // IntervalMonthDayNano uses nanos, but IntervalDayTime uses milles
+                | ((result_millis * 1_000_000_i64) as i128);
+
+        return Ok(ScalarValue::IntervalMonthDayNano(Some(result)));
+    }
+
+    // Month interval
+    if result_month != 0 {
+        return Ok(ScalarValue::IntervalYearMonth(Some(result_month as i32)));
+    }
+
+    let result: i64 = (result_days << 32) | result_millis;
+    Ok(ScalarValue::IntervalDayTime(Some(result)))
+}
+
+#[cfg(test)]
+mod test {
+    use crate::assert_contains;
+
+    use super::*;
+
+    #[test]
+    fn test_parse_ym() {

Review Comment:
   And thank you for them! Comparing the results of differing implementations is an extremely useful test.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] ursabot commented on pull request #2984: [Minor] Extract interval parsing logic, add unit tests

Posted by GitBox <gi...@apache.org>.
ursabot commented on PR #2984:
URL: https://github.com/apache/arrow-datafusion/pull/2984#issuecomment-1200402675

   Benchmark runs are scheduled for baseline = 0fa6a93b8e309b044f037ac96dea87698eabb8c6 and contender = c7fa789e85025a631ed634881e60c1ed71e8d269. c7fa789e85025a631ed634881e60c1ed71e8d269 is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
   Conbench compare runs links:
   [Skipped :warning: Benchmarking of arrow-datafusion-commits is not supported on ec2-t3-xlarge-us-east-2] [ec2-t3-xlarge-us-east-2](https://conbench.ursa.dev/compare/runs/f1b8940e80954ee5b5dcb857a1c19292...4bad206b6dcf405b8b60b46f4829a699/)
   [Skipped :warning: Benchmarking of arrow-datafusion-commits is not supported on test-mac-arm] [test-mac-arm](https://conbench.ursa.dev/compare/runs/b5d32082a8b34698833a0f83c902ed13...76d2ab1d2ef84f8f972f4cf1e61f5bbf/)
   [Skipped :warning: Benchmarking of arrow-datafusion-commits is not supported on ursa-i9-9960x] [ursa-i9-9960x](https://conbench.ursa.dev/compare/runs/17f148189eba4586bdabfe0999b34600...579fc966ed6b43348df0bc7887056433/)
   [Skipped :warning: Benchmarking of arrow-datafusion-commits is not supported on ursa-thinkcentre-m75q] [ursa-thinkcentre-m75q](https://conbench.ursa.dev/compare/runs/e8b38689ee5c4da6807bd5c6474c2505...2a838d07599b45878d0e78322e6bc092/)
   Buildkite builds:
   Supported benchmarks:
   ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only benchmarks with cloud = True
   test-mac-arm: Supported benchmark langs: C++, Python, R
   ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
   ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] alamb commented on a diff in pull request #2984: [Minor] Extract interval parsing logic, add unit tests

Posted by GitBox <gi...@apache.org>.
alamb commented on code in PR #2984:
URL: https://github.com/apache/arrow-datafusion/pull/2984#discussion_r933610666


##########
datafusion/sql/src/interval.rs:
##########
@@ -0,0 +1,214 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+//! Interval parsing logic
+use datafusion_common::{DataFusionError, Result, ScalarValue};
+use std::str::FromStr;
+
+const SECONDS_PER_HOUR: f32 = 3_600_f32;
+const MILLIS_PER_SECOND: f32 = 1_000_f32;
+
+/// Parses a string with an interval like `'0.5 MONTH'` to an
+/// appropriately typed [`ScalarValue`]
+pub(crate) fn parse_interval(leading_field: &str, value: &str) -> Result<ScalarValue> {
+    // We are storing parts as integers, it's why we need to align parts fractional
+    // INTERVAL '0.5 MONTH' = 15 days, INTERVAL '1.5 MONTH' = 1 month 15 days
+    // INTERVAL '0.5 DAY' = 12 hours, INTERVAL '1.5 DAY' = 1 day 12 hours
+    let align_interval_parts =
+        |month_part: f32, mut day_part: f32, mut milles_part: f32| -> (i32, i32, f32) {
+            // Convert fractional month to days, It's not supported by Arrow types, but anyway
+            day_part += (month_part - (month_part as i32) as f32) * 30_f32;
+
+            // Convert fractional days to hours
+            milles_part += (day_part - ((day_part as i32) as f32))
+                * 24_f32
+                * SECONDS_PER_HOUR
+                * MILLIS_PER_SECOND;
+
+            (month_part as i32, day_part as i32, milles_part)
+        };
+
+    let calculate_from_part =
+        |interval_period_str: &str, interval_type: &str| -> Result<(i32, i32, f32)> {
+            // @todo It's better to use Decimal in order to protect rounding errors
+            // Wait https://github.com/apache/arrow/pull/9232
+            let interval_period = match f32::from_str(interval_period_str) {
+                Ok(n) => n,
+                Err(_) => {
+                    return Err(DataFusionError::NotImplemented(format!(
+                        "Unsupported Interval Expression with value {:?}",
+                        value
+                    )));
+                }
+            };
+
+            if interval_period > (i32::MAX as f32) {
+                return Err(DataFusionError::NotImplemented(format!(
+                    "Interval field value out of range: {:?}",
+                    value
+                )));
+            }
+
+            match interval_type.to_lowercase().as_str() {
+                "year" => Ok(align_interval_parts(interval_period * 12_f32, 0.0, 0.0)),
+                "month" => Ok(align_interval_parts(interval_period, 0.0, 0.0)),
+                "day" | "days" => Ok(align_interval_parts(0.0, interval_period, 0.0)),
+                "hour" | "hours" => {
+                    Ok((0, 0, interval_period * SECONDS_PER_HOUR * MILLIS_PER_SECOND))
+                }
+                "minutes" | "minute" => {
+                    Ok((0, 0, interval_period * 60_f32 * MILLIS_PER_SECOND))
+                }
+                "seconds" | "second" => Ok((0, 0, interval_period * MILLIS_PER_SECOND)),
+                "milliseconds" | "millisecond" => Ok((0, 0, interval_period)),
+                _ => Err(DataFusionError::NotImplemented(format!(
+                    "Invalid input syntax for type interval: {:?}",
+                    value
+                ))),
+            }
+        };
+
+    let mut result_month: i64 = 0;
+    let mut result_days: i64 = 0;
+    let mut result_millis: i64 = 0;
+
+    let mut parts = value.split_whitespace();
+
+    loop {
+        let interval_period_str = parts.next();
+        if interval_period_str.is_none() {
+            break;
+        }
+
+        let unit = parts.next().unwrap_or(leading_field);
+
+        let (diff_month, diff_days, diff_millis) =
+            calculate_from_part(interval_period_str.unwrap(), unit)?;
+
+        result_month += diff_month as i64;
+
+        if result_month > (i32::MAX as i64) {
+            return Err(DataFusionError::NotImplemented(format!(
+                "Interval field value out of range: {:?}",
+                value
+            )));
+        }
+
+        result_days += diff_days as i64;
+
+        if result_days > (i32::MAX as i64) {
+            return Err(DataFusionError::NotImplemented(format!(
+                "Interval field value out of range: {:?}",
+                value
+            )));
+        }
+
+        result_millis += diff_millis as i64;
+
+        if result_millis > (i32::MAX as i64) {
+            return Err(DataFusionError::NotImplemented(format!(
+                "Interval field value out of range: {:?}",
+                value
+            )));
+        }
+    }
+
+    // Interval is tricky thing
+    // 1 day is not 24 hours because timezones, 1 year != 365/364! 30 days != 1 month
+    // The true way to store and calculate intervals is to store it as it defined
+    // It's why we there are 3 different interval types in Arrow
+    if result_month != 0 && (result_days != 0 || result_millis != 0) {
+        let result: i128 = ((result_month as i128) << 96)
+                | ((result_days as i128) << 64)
+                // IntervalMonthDayNano uses nanos, but IntervalDayTime uses milles
+                | ((result_millis * 1_000_000_i64) as i128);
+
+        return Ok(ScalarValue::IntervalMonthDayNano(Some(result)));
+    }
+
+    // Month interval
+    if result_month != 0 {
+        return Ok(ScalarValue::IntervalYearMonth(Some(result_month as i32)));
+    }
+
+    let result: i64 = (result_days << 32) | result_millis;
+    Ok(ScalarValue::IntervalDayTime(Some(result)))
+}
+
+#[cfg(test)]
+mod test {
+    use crate::assert_contains;
+
+    use super::*;
+
+    #[test]
+    fn test_parse_ym() {
+        assert_eq!(
+            parse_interval("months", "1 month").unwrap(),
+            ScalarValue::new_interval_ym(0, 1)
+        );
+
+        assert_eq!(
+            parse_interval("months", "2 month").unwrap(),
+            ScalarValue::new_interval_ym(0, 2)
+        );
+
+        assert_eq!(
+            parse_interval("months", "3 year 1 month").unwrap(),
+            ScalarValue::new_interval_ym(3, 1)
+        );
+
+        assert_contains!(
+            parse_interval("months", "1 years 1 month")
+                .unwrap_err()
+                .to_string(),
+            r#"Invalid input syntax for type interval: "1 years 1 month""#
+        );
+    }
+
+    #[test]
+    fn test_dt() {
+        assert_eq!(
+            parse_interval("months", "5 days").unwrap(),
+            ScalarValue::new_interval_dt(5, 0)
+        );
+
+        assert_eq!(
+            parse_interval("months", "7 days 3 hours").unwrap(),
+            ScalarValue::new_interval_dt(
+                7,
+                (3.0 * SECONDS_PER_HOUR * MILLIS_PER_SECOND) as i32
+            )
+        );
+
+        assert_eq!(
+            parse_interval("months", "7 days 5 minutes").unwrap(),
+            ScalarValue::new_interval_dt(7, (5.0 * 60.0 * MILLIS_PER_SECOND) as i32)
+        );
+    }
+
+    #[test]
+    // TODO file a ticket to track

Review Comment:
   It is also entirely possible the but is in  `parse_interval` -- I am not sure anyone uses the `IntervalMonthDayNano` yet -- it was added relatively recently as I recall



##########
datafusion/sql/src/interval.rs:
##########
@@ -0,0 +1,214 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+//! Interval parsing logic
+use datafusion_common::{DataFusionError, Result, ScalarValue};
+use std::str::FromStr;
+
+const SECONDS_PER_HOUR: f32 = 3_600_f32;
+const MILLIS_PER_SECOND: f32 = 1_000_f32;
+
+/// Parses a string with an interval like `'0.5 MONTH'` to an
+/// appropriately typed [`ScalarValue`]
+pub(crate) fn parse_interval(leading_field: &str, value: &str) -> Result<ScalarValue> {
+    // We are storing parts as integers, it's why we need to align parts fractional
+    // INTERVAL '0.5 MONTH' = 15 days, INTERVAL '1.5 MONTH' = 1 month 15 days
+    // INTERVAL '0.5 DAY' = 12 hours, INTERVAL '1.5 DAY' = 1 day 12 hours
+    let align_interval_parts =
+        |month_part: f32, mut day_part: f32, mut milles_part: f32| -> (i32, i32, f32) {
+            // Convert fractional month to days, It's not supported by Arrow types, but anyway
+            day_part += (month_part - (month_part as i32) as f32) * 30_f32;
+
+            // Convert fractional days to hours
+            milles_part += (day_part - ((day_part as i32) as f32))
+                * 24_f32
+                * SECONDS_PER_HOUR
+                * MILLIS_PER_SECOND;
+
+            (month_part as i32, day_part as i32, milles_part)
+        };
+
+    let calculate_from_part =
+        |interval_period_str: &str, interval_type: &str| -> Result<(i32, i32, f32)> {
+            // @todo It's better to use Decimal in order to protect rounding errors
+            // Wait https://github.com/apache/arrow/pull/9232
+            let interval_period = match f32::from_str(interval_period_str) {
+                Ok(n) => n,
+                Err(_) => {
+                    return Err(DataFusionError::NotImplemented(format!(
+                        "Unsupported Interval Expression with value {:?}",
+                        value
+                    )));
+                }
+            };
+
+            if interval_period > (i32::MAX as f32) {
+                return Err(DataFusionError::NotImplemented(format!(
+                    "Interval field value out of range: {:?}",
+                    value
+                )));
+            }
+
+            match interval_type.to_lowercase().as_str() {
+                "year" => Ok(align_interval_parts(interval_period * 12_f32, 0.0, 0.0)),
+                "month" => Ok(align_interval_parts(interval_period, 0.0, 0.0)),
+                "day" | "days" => Ok(align_interval_parts(0.0, interval_period, 0.0)),
+                "hour" | "hours" => {
+                    Ok((0, 0, interval_period * SECONDS_PER_HOUR * MILLIS_PER_SECOND))
+                }
+                "minutes" | "minute" => {
+                    Ok((0, 0, interval_period * 60_f32 * MILLIS_PER_SECOND))
+                }
+                "seconds" | "second" => Ok((0, 0, interval_period * MILLIS_PER_SECOND)),
+                "milliseconds" | "millisecond" => Ok((0, 0, interval_period)),
+                _ => Err(DataFusionError::NotImplemented(format!(
+                    "Invalid input syntax for type interval: {:?}",
+                    value
+                ))),
+            }
+        };
+
+    let mut result_month: i64 = 0;
+    let mut result_days: i64 = 0;
+    let mut result_millis: i64 = 0;
+
+    let mut parts = value.split_whitespace();
+
+    loop {
+        let interval_period_str = parts.next();
+        if interval_period_str.is_none() {
+            break;
+        }
+
+        let unit = parts.next().unwrap_or(leading_field);
+
+        let (diff_month, diff_days, diff_millis) =
+            calculate_from_part(interval_period_str.unwrap(), unit)?;
+
+        result_month += diff_month as i64;
+
+        if result_month > (i32::MAX as i64) {
+            return Err(DataFusionError::NotImplemented(format!(
+                "Interval field value out of range: {:?}",
+                value
+            )));
+        }
+
+        result_days += diff_days as i64;
+
+        if result_days > (i32::MAX as i64) {
+            return Err(DataFusionError::NotImplemented(format!(
+                "Interval field value out of range: {:?}",
+                value
+            )));
+        }
+
+        result_millis += diff_millis as i64;
+
+        if result_millis > (i32::MAX as i64) {
+            return Err(DataFusionError::NotImplemented(format!(
+                "Interval field value out of range: {:?}",
+                value
+            )));
+        }
+    }
+
+    // Interval is tricky thing
+    // 1 day is not 24 hours because timezones, 1 year != 365/364! 30 days != 1 month
+    // The true way to store and calculate intervals is to store it as it defined
+    // It's why we there are 3 different interval types in Arrow
+    if result_month != 0 && (result_days != 0 || result_millis != 0) {
+        let result: i128 = ((result_month as i128) << 96)
+                | ((result_days as i128) << 64)
+                // IntervalMonthDayNano uses nanos, but IntervalDayTime uses milles
+                | ((result_millis * 1_000_000_i64) as i128);
+
+        return Ok(ScalarValue::IntervalMonthDayNano(Some(result)));
+    }
+
+    // Month interval
+    if result_month != 0 {
+        return Ok(ScalarValue::IntervalYearMonth(Some(result_month as i32)));
+    }
+
+    let result: i64 = (result_days << 32) | result_millis;
+    Ok(ScalarValue::IntervalDayTime(Some(result)))
+}
+
+#[cfg(test)]
+mod test {
+    use crate::assert_contains;
+
+    use super::*;
+
+    #[test]
+    fn test_parse_ym() {
+        assert_eq!(
+            parse_interval("months", "1 month").unwrap(),
+            ScalarValue::new_interval_ym(0, 1)
+        );
+
+        assert_eq!(
+            parse_interval("months", "2 month").unwrap(),
+            ScalarValue::new_interval_ym(0, 2)
+        );
+
+        assert_eq!(
+            parse_interval("months", "3 year 1 month").unwrap(),
+            ScalarValue::new_interval_ym(3, 1)
+        );
+
+        assert_contains!(
+            parse_interval("months", "1 years 1 month")
+                .unwrap_err()
+                .to_string(),
+            r#"Invalid input syntax for type interval: "1 years 1 month""#
+        );
+    }
+
+    #[test]
+    fn test_dt() {
+        assert_eq!(
+            parse_interval("months", "5 days").unwrap(),
+            ScalarValue::new_interval_dt(5, 0)
+        );
+
+        assert_eq!(
+            parse_interval("months", "7 days 3 hours").unwrap(),
+            ScalarValue::new_interval_dt(
+                7,
+                (3.0 * SECONDS_PER_HOUR * MILLIS_PER_SECOND) as i32
+            )
+        );
+
+        assert_eq!(
+            parse_interval("months", "7 days 5 minutes").unwrap(),
+            ScalarValue::new_interval_dt(7, (5.0 * 60.0 * MILLIS_PER_SECOND) as i32)
+        );
+    }
+
+    #[test]
+    // TODO file a ticket to track

Review Comment:
   It is also entirely possible the bug is in  `parse_interval` -- I am not sure anyone uses the `IntervalMonthDayNano` yet -- it was added relatively recently as I recall



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] alamb commented on a diff in pull request #2984: [Minor] Extract interval parsing logic, add unit tests

Posted by GitBox <gi...@apache.org>.
alamb commented on code in PR #2984:
URL: https://github.com/apache/arrow-datafusion/pull/2984#discussion_r933796593


##########
datafusion/sql/src/interval.rs:
##########
@@ -0,0 +1,214 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+//! Interval parsing logic
+use datafusion_common::{DataFusionError, Result, ScalarValue};
+use std::str::FromStr;
+
+const SECONDS_PER_HOUR: f32 = 3_600_f32;
+const MILLIS_PER_SECOND: f32 = 1_000_f32;
+
+/// Parses a string with an interval like `'0.5 MONTH'` to an
+/// appropriately typed [`ScalarValue`]
+pub(crate) fn parse_interval(leading_field: &str, value: &str) -> Result<ScalarValue> {
+    // We are storing parts as integers, it's why we need to align parts fractional
+    // INTERVAL '0.5 MONTH' = 15 days, INTERVAL '1.5 MONTH' = 1 month 15 days
+    // INTERVAL '0.5 DAY' = 12 hours, INTERVAL '1.5 DAY' = 1 day 12 hours
+    let align_interval_parts =
+        |month_part: f32, mut day_part: f32, mut milles_part: f32| -> (i32, i32, f32) {
+            // Convert fractional month to days, It's not supported by Arrow types, but anyway
+            day_part += (month_part - (month_part as i32) as f32) * 30_f32;
+
+            // Convert fractional days to hours
+            milles_part += (day_part - ((day_part as i32) as f32))
+                * 24_f32
+                * SECONDS_PER_HOUR
+                * MILLIS_PER_SECOND;
+
+            (month_part as i32, day_part as i32, milles_part)
+        };
+
+    let calculate_from_part =
+        |interval_period_str: &str, interval_type: &str| -> Result<(i32, i32, f32)> {
+            // @todo It's better to use Decimal in order to protect rounding errors
+            // Wait https://github.com/apache/arrow/pull/9232
+            let interval_period = match f32::from_str(interval_period_str) {
+                Ok(n) => n,
+                Err(_) => {
+                    return Err(DataFusionError::NotImplemented(format!(
+                        "Unsupported Interval Expression with value {:?}",
+                        value
+                    )));
+                }
+            };
+
+            if interval_period > (i32::MAX as f32) {
+                return Err(DataFusionError::NotImplemented(format!(
+                    "Interval field value out of range: {:?}",
+                    value
+                )));
+            }
+
+            match interval_type.to_lowercase().as_str() {
+                "year" => Ok(align_interval_parts(interval_period * 12_f32, 0.0, 0.0)),
+                "month" => Ok(align_interval_parts(interval_period, 0.0, 0.0)),
+                "day" | "days" => Ok(align_interval_parts(0.0, interval_period, 0.0)),
+                "hour" | "hours" => {
+                    Ok((0, 0, interval_period * SECONDS_PER_HOUR * MILLIS_PER_SECOND))
+                }
+                "minutes" | "minute" => {
+                    Ok((0, 0, interval_period * 60_f32 * MILLIS_PER_SECOND))
+                }
+                "seconds" | "second" => Ok((0, 0, interval_period * MILLIS_PER_SECOND)),
+                "milliseconds" | "millisecond" => Ok((0, 0, interval_period)),
+                _ => Err(DataFusionError::NotImplemented(format!(
+                    "Invalid input syntax for type interval: {:?}",
+                    value
+                ))),
+            }
+        };
+
+    let mut result_month: i64 = 0;
+    let mut result_days: i64 = 0;
+    let mut result_millis: i64 = 0;
+
+    let mut parts = value.split_whitespace();
+
+    loop {
+        let interval_period_str = parts.next();
+        if interval_period_str.is_none() {
+            break;
+        }
+
+        let unit = parts.next().unwrap_or(leading_field);
+
+        let (diff_month, diff_days, diff_millis) =
+            calculate_from_part(interval_period_str.unwrap(), unit)?;
+
+        result_month += diff_month as i64;
+
+        if result_month > (i32::MAX as i64) {
+            return Err(DataFusionError::NotImplemented(format!(
+                "Interval field value out of range: {:?}",
+                value
+            )));
+        }
+
+        result_days += diff_days as i64;
+
+        if result_days > (i32::MAX as i64) {
+            return Err(DataFusionError::NotImplemented(format!(
+                "Interval field value out of range: {:?}",
+                value
+            )));
+        }
+
+        result_millis += diff_millis as i64;
+
+        if result_millis > (i32::MAX as i64) {
+            return Err(DataFusionError::NotImplemented(format!(
+                "Interval field value out of range: {:?}",
+                value
+            )));
+        }
+    }
+
+    // Interval is tricky thing
+    // 1 day is not 24 hours because timezones, 1 year != 365/364! 30 days != 1 month
+    // The true way to store and calculate intervals is to store it as it defined
+    // It's why we there are 3 different interval types in Arrow
+    if result_month != 0 && (result_days != 0 || result_millis != 0) {
+        let result: i128 = ((result_month as i128) << 96)
+                | ((result_days as i128) << 64)
+                // IntervalMonthDayNano uses nanos, but IntervalDayTime uses milles
+                | ((result_millis * 1_000_000_i64) as i128);
+
+        return Ok(ScalarValue::IntervalMonthDayNano(Some(result)));
+    }
+
+    // Month interval
+    if result_month != 0 {
+        return Ok(ScalarValue::IntervalYearMonth(Some(result_month as i32)));
+    }
+
+    let result: i64 = (result_days << 32) | result_millis;
+    Ok(ScalarValue::IntervalDayTime(Some(result)))
+}
+
+#[cfg(test)]
+mod test {
+    use crate::assert_contains;
+
+    use super::*;
+
+    #[test]
+    fn test_parse_ym() {
+        assert_eq!(
+            parse_interval("months", "1 month").unwrap(),
+            ScalarValue::new_interval_ym(0, 1)
+        );
+
+        assert_eq!(
+            parse_interval("months", "2 month").unwrap(),
+            ScalarValue::new_interval_ym(0, 2)
+        );
+
+        assert_eq!(
+            parse_interval("months", "3 year 1 month").unwrap(),
+            ScalarValue::new_interval_ym(3, 1)
+        );
+
+        assert_contains!(
+            parse_interval("months", "1 years 1 month")
+                .unwrap_err()
+                .to_string(),
+            r#"Invalid input syntax for type interval: "1 years 1 month""#
+        );
+    }
+
+    #[test]
+    fn test_dt() {
+        assert_eq!(
+            parse_interval("months", "5 days").unwrap(),
+            ScalarValue::new_interval_dt(5, 0)
+        );
+
+        assert_eq!(
+            parse_interval("months", "7 days 3 hours").unwrap(),
+            ScalarValue::new_interval_dt(
+                7,
+                (3.0 * SECONDS_PER_HOUR * MILLIS_PER_SECOND) as i32
+            )
+        );
+
+        assert_eq!(
+            parse_interval("months", "7 days 5 minutes").unwrap(),
+            ScalarValue::new_interval_dt(7, (5.0 * 60.0 * MILLIS_PER_SECOND) as i32)
+        );
+    }
+
+    #[test]
+    // TODO file a ticket to track

Review Comment:
   ```suggestion
       // https://github.com/apache/arrow-rs/pull/2235
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] avantgardnerio commented on a diff in pull request #2984: [Minor] Extract interval parsing logic, add unit tests

Posted by GitBox <gi...@apache.org>.
avantgardnerio commented on code in PR #2984:
URL: https://github.com/apache/arrow-datafusion/pull/2984#discussion_r933691406


##########
datafusion/sql/src/interval.rs:
##########
@@ -0,0 +1,214 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+//! Interval parsing logic
+use datafusion_common::{DataFusionError, Result, ScalarValue};
+use std::str::FromStr;
+
+const SECONDS_PER_HOUR: f32 = 3_600_f32;
+const MILLIS_PER_SECOND: f32 = 1_000_f32;
+
+/// Parses a string with an interval like `'0.5 MONTH'` to an
+/// appropriately typed [`ScalarValue`]
+pub(crate) fn parse_interval(leading_field: &str, value: &str) -> Result<ScalarValue> {
+    // We are storing parts as integers, it's why we need to align parts fractional
+    // INTERVAL '0.5 MONTH' = 15 days, INTERVAL '1.5 MONTH' = 1 month 15 days
+    // INTERVAL '0.5 DAY' = 12 hours, INTERVAL '1.5 DAY' = 1 day 12 hours
+    let align_interval_parts =
+        |month_part: f32, mut day_part: f32, mut milles_part: f32| -> (i32, i32, f32) {
+            // Convert fractional month to days, It's not supported by Arrow types, but anyway
+            day_part += (month_part - (month_part as i32) as f32) * 30_f32;
+
+            // Convert fractional days to hours
+            milles_part += (day_part - ((day_part as i32) as f32))
+                * 24_f32
+                * SECONDS_PER_HOUR
+                * MILLIS_PER_SECOND;
+
+            (month_part as i32, day_part as i32, milles_part)
+        };
+
+    let calculate_from_part =
+        |interval_period_str: &str, interval_type: &str| -> Result<(i32, i32, f32)> {
+            // @todo It's better to use Decimal in order to protect rounding errors
+            // Wait https://github.com/apache/arrow/pull/9232
+            let interval_period = match f32::from_str(interval_period_str) {
+                Ok(n) => n,
+                Err(_) => {
+                    return Err(DataFusionError::NotImplemented(format!(
+                        "Unsupported Interval Expression with value {:?}",
+                        value
+                    )));
+                }
+            };
+
+            if interval_period > (i32::MAX as f32) {
+                return Err(DataFusionError::NotImplemented(format!(
+                    "Interval field value out of range: {:?}",
+                    value
+                )));
+            }
+
+            match interval_type.to_lowercase().as_str() {
+                "year" => Ok(align_interval_parts(interval_period * 12_f32, 0.0, 0.0)),
+                "month" => Ok(align_interval_parts(interval_period, 0.0, 0.0)),
+                "day" | "days" => Ok(align_interval_parts(0.0, interval_period, 0.0)),
+                "hour" | "hours" => {
+                    Ok((0, 0, interval_period * SECONDS_PER_HOUR * MILLIS_PER_SECOND))
+                }
+                "minutes" | "minute" => {
+                    Ok((0, 0, interval_period * 60_f32 * MILLIS_PER_SECOND))
+                }
+                "seconds" | "second" => Ok((0, 0, interval_period * MILLIS_PER_SECOND)),
+                "milliseconds" | "millisecond" => Ok((0, 0, interval_period)),
+                _ => Err(DataFusionError::NotImplemented(format!(
+                    "Invalid input syntax for type interval: {:?}",
+                    value
+                ))),
+            }
+        };
+
+    let mut result_month: i64 = 0;
+    let mut result_days: i64 = 0;
+    let mut result_millis: i64 = 0;
+
+    let mut parts = value.split_whitespace();
+
+    loop {
+        let interval_period_str = parts.next();
+        if interval_period_str.is_none() {
+            break;
+        }
+
+        let unit = parts.next().unwrap_or(leading_field);
+
+        let (diff_month, diff_days, diff_millis) =
+            calculate_from_part(interval_period_str.unwrap(), unit)?;
+
+        result_month += diff_month as i64;
+
+        if result_month > (i32::MAX as i64) {
+            return Err(DataFusionError::NotImplemented(format!(
+                "Interval field value out of range: {:?}",
+                value
+            )));
+        }
+
+        result_days += diff_days as i64;
+
+        if result_days > (i32::MAX as i64) {
+            return Err(DataFusionError::NotImplemented(format!(
+                "Interval field value out of range: {:?}",
+                value
+            )));
+        }
+
+        result_millis += diff_millis as i64;
+
+        if result_millis > (i32::MAX as i64) {
+            return Err(DataFusionError::NotImplemented(format!(
+                "Interval field value out of range: {:?}",
+                value
+            )));
+        }
+    }
+
+    // Interval is tricky thing
+    // 1 day is not 24 hours because timezones, 1 year != 365/364! 30 days != 1 month
+    // The true way to store and calculate intervals is to store it as it defined
+    // It's why we there are 3 different interval types in Arrow
+    if result_month != 0 && (result_days != 0 || result_millis != 0) {
+        let result: i128 = ((result_month as i128) << 96)
+                | ((result_days as i128) << 64)
+                // IntervalMonthDayNano uses nanos, but IntervalDayTime uses milles
+                | ((result_millis * 1_000_000_i64) as i128);
+
+        return Ok(ScalarValue::IntervalMonthDayNano(Some(result)));
+    }
+
+    // Month interval
+    if result_month != 0 {
+        return Ok(ScalarValue::IntervalYearMonth(Some(result_month as i32)));
+    }
+
+    let result: i64 = (result_days << 32) | result_millis;
+    Ok(ScalarValue::IntervalDayTime(Some(result)))
+}
+
+#[cfg(test)]
+mod test {
+    use crate::assert_contains;
+
+    use super::*;
+
+    #[test]
+    fn test_parse_ym() {
+        assert_eq!(
+            parse_interval("months", "1 month").unwrap(),
+            ScalarValue::new_interval_ym(0, 1)
+        );
+
+        assert_eq!(
+            parse_interval("months", "2 month").unwrap(),
+            ScalarValue::new_interval_ym(0, 2)
+        );
+
+        assert_eq!(
+            parse_interval("months", "3 year 1 month").unwrap(),
+            ScalarValue::new_interval_ym(3, 1)
+        );
+
+        assert_contains!(
+            parse_interval("months", "1 years 1 month")
+                .unwrap_err()
+                .to_string(),
+            r#"Invalid input syntax for type interval: "1 years 1 month""#
+        );
+    }
+
+    #[test]
+    fn test_dt() {
+        assert_eq!(
+            parse_interval("months", "5 days").unwrap(),
+            ScalarValue::new_interval_dt(5, 0)
+        );
+
+        assert_eq!(
+            parse_interval("months", "7 days 3 hours").unwrap(),
+            ScalarValue::new_interval_dt(
+                7,
+                (3.0 * SECONDS_PER_HOUR * MILLIS_PER_SECOND) as i32
+            )
+        );
+
+        assert_eq!(
+            parse_interval("months", "7 days 5 minutes").unwrap(),
+            ScalarValue::new_interval_dt(7, (5.0 * 60.0 * MILLIS_PER_SECOND) as i32)
+        );
+    }
+
+    #[test]
+    // TODO file a ticket to track

Review Comment:
   Linking to https://github.com/apache/arrow-rs/pull/2235 for resolution.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org