You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by "alamb (via GitHub)" <gi...@apache.org> on 2023/03/30 21:24:54 UTC

[GitHub] [arrow-datafusion] alamb opened a new pull request, #5806: Fix `interval` to use consistent units and arrow parser

alamb opened a new pull request, #5806:
URL: https://github.com/apache/arrow-datafusion/pull/5806

   Draft until:
   - [ ] Get all tests passing
   - [ ] Debug why some intervals seem to be parsing incorrectly
   
   # Which issue does this PR close?
   
   Closes https://github.com/apache/arrow-datafusion/issues/5801
   
   # Rationale for this change
   
   1. To consistently support intervals we need to uniformly handle that type
   2. @doki23 ported the interval parsing code uptream in https://github.com/apache/arrow-rs/pull/3762 so we should use that (in preparation of supporting intervals more fully in DataFusion)
   
   # What changes are included in this PR?
   
   1. Remove datafusion special `parse_interval` function and use arrow kernel everyhere
   
   # Are these changes tested?
   yes, both existing and new tests
   
   # Are there any user-facing changes?
   More consistency with how `interval ` is handled


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] jackwener commented on a diff in pull request #5806: Fix `interval` to use consistent units and arrow parser

Posted by "jackwener (via GitHub)" <gi...@apache.org>.
jackwener commented on code in PR #5806:
URL: https://github.com/apache/arrow-datafusion/pull/5806#discussion_r1155994250


##########
datafusion/core/tests/sqllogictests/test_files/interval.slt:
##########
@@ -23,7 +23,113 @@ select
   arrow_typeof(interval '5 months'),
   arrow_typeof(interval '5 days 3 nanoseconds')
 ----
-Interval(YearMonth) Interval(MonthDayNano)
+Interval(MonthDayNano) Interval(MonthDayNano)
+
+
+## This is incredibly confusing but document it in tests:
+#
+# years is parsed as a column name
+# year is parsed as part of the interval type.
+#
+# postgres=# select interval '5' year;
+#  interval
+# ----------
+#  5 years
+# (1 row)
+#
+# postgres=# select interval '5' years;
+#   years
+# ----------
+#  00:00:05
+# (1 row)
+query ?
+select interval '5' years
+----
+0 years 0 mons 0 days 0 hours 0 mins 5.000000000 secs
+
+
+# check all different kinds of intervals
+query ?
+select interval '5' year
+----
+0 years 60 mons 0 days 0 hours 0 mins 0.000000000 secs

Review Comment:
   A little strange, I think it will be reasonable to use '5' year.
   
   I think `interval '24' month -> 0 years 24 mons` is reasonable.
   But it's strange `interval '2' year -> 0 years 24 mons`



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] alamb commented on a diff in pull request #5806: Fix `interval` to use consistent units and arrow parser

Posted by "alamb (via GitHub)" <gi...@apache.org>.
alamb commented on code in PR #5806:
URL: https://github.com/apache/arrow-datafusion/pull/5806#discussion_r1155095725


##########
datafusion/core/tests/sql/expr.rs:
##########
@@ -843,153 +843,153 @@ async fn test_interval_expressions() -> Result<()> {
     // day nano intervals
     test_expression!(
         "interval '1'",
-        "0 years 0 mons 0 days 0 hours 0 mins 1.000 secs"
+        "0 years 0 mons 0 days 0 hours 0 mins 1.000000000 secs"
     );
     test_expression!(
         "interval '1 second'",
-        "0 years 0 mons 0 days 0 hours 0 mins 1.000 secs"
+        "0 years 0 mons 0 days 0 hours 0 mins 1.000000000 secs"
     );
     test_expression!(
         "interval '500 milliseconds'",
-        "0 years 0 mons 0 days 0 hours 0 mins 0.500 secs"
+        "0 years 0 mons 0 days 0 hours 0 mins 0.500000000 secs"
     );
     test_expression!(
         "interval '5 second'",
-        "0 years 0 mons 0 days 0 hours 0 mins 5.000 secs"
+        "0 years 0 mons 0 days 0 hours 0 mins 5.000000000 secs"
     );
     test_expression!(
         "interval '0.5 minute'",
-        "0 years 0 mons 0 days 0 hours 0 mins 30.000 secs"
+        "0 years 0 mons 0 days 0 hours 0 mins 30.000000000 secs"
     );
     test_expression!(
         "interval '.5 minute'",
-        "0 years 0 mons 0 days 0 hours 0 mins 30.000 secs"
+        "0 years 0 mons 0 days 0 hours 0 mins 30.000000000 secs"
     );
     test_expression!(
         "interval '5 minute'",
-        "0 years 0 mons 0 days 0 hours 5 mins 0.000 secs"
+        "0 years 0 mons 0 days 0 hours 5 mins 0.000000000 secs"
     );
     test_expression!(
         "interval '5 minute 1 second'",
-        "0 years 0 mons 0 days 0 hours 5 mins 1.000 secs"
+        "0 years 0 mons 0 days 0 hours 5 mins 1.000000000 secs"
     );
     test_expression!(
         "interval '1 hour'",
-        "0 years 0 mons 0 days 1 hours 0 mins 0.000 secs"
+        "0 years 0 mons 0 days 1 hours 0 mins 0.000000000 secs"
     );
     test_expression!(
         "interval '5 hour'",
-        "0 years 0 mons 0 days 5 hours 0 mins 0.000 secs"
+        "0 years 0 mons 0 days 5 hours 0 mins 0.000000000 secs"
     );
     test_expression!(
         "interval '1 day'",
-        "0 years 0 mons 1 days 0 hours 0 mins 0.000 secs"
+        "0 years 0 mons 1 days 0 hours 0 mins 0.000000000 secs"
     );
     test_expression!(
         "interval '1 week'",
-        "0 years 0 mons 7 days 0 hours 0 mins 0.000 secs"
+        "0 years 0 mons 7 days 0 hours 0 mins 0.000000000 secs"
     );
     test_expression!(
         "interval '2 weeks'",
-        "0 years 0 mons 14 days 0 hours 0 mins 0.000 secs"
+        "0 years 0 mons 14 days 0 hours 0 mins 0.000000000 secs"
     );
     test_expression!(
         "interval '1 day 1'",
-        "0 years 0 mons 1 days 0 hours 0 mins 1.000 secs"
+        "0 years 0 mons 1 days 0 hours 0 mins 1.000000000 secs"
     );
     test_expression!(
         "interval '0.5'",
-        "0 years 0 mons 0 days 0 hours 0 mins 0.500 secs"
+        "0 years 0 mons 0 days 0 hours 0 mins 0.500000000 secs"
     );
     test_expression!(
         "interval '0.5 day 1'",
-        "0 years 0 mons 0 days 12 hours 0 mins 1.000 secs"
+        "0 years 0 mons 0 days 12 hours 0 mins 1.000000000 secs"
     );
     test_expression!(
         "interval '0.49 day'",
-        "0 years 0 mons 0 days 11 hours 45 mins 36.000 secs"
+        "0 years 0 mons 0 days 11 hours 45 mins 36.000000000 secs"
     );
     test_expression!(
         "interval '0.499 day'",
-        "0 years 0 mons 0 days 11 hours 58 mins 33.600 secs"
+        "0 years 0 mons 0 days 11 hours 58 mins 33.600000000 secs"
     );
     test_expression!(
         "interval '0.4999 day'",
-        "0 years 0 mons 0 days 11 hours 59 mins 51.360 secs"
+        "0 years 0 mons 0 days 11 hours 59 mins 51.360000000 secs"
     );
     test_expression!(
         "interval '0.49999 day'",
-        "0 years 0 mons 0 days 11 hours 59 mins 59.136 secs"
+        "0 years 0 mons 0 days 11 hours 59 mins 59.136000000 secs"
     );
     test_expression!(
         "interval '0.49999999999 day'",
         "0 years 0 mons 0 days 11 hours 59 mins 59.999999136 secs"
     );
     test_expression!(
         "interval '5 day'",
-        "0 years 0 mons 5 days 0 hours 0 mins 0.000 secs"
+        "0 years 0 mons 5 days 0 hours 0 mins 0.000000000 secs"
     );
     // Hour is ignored, this matches PostgreSQL
     test_expression!(
         "interval '5 day' hour",
-        "0 years 0 mons 5 days 0 hours 0 mins 0.000 secs"
+        "0 years 0 mons 5 days 0 hours 0 mins 0.000000000 secs"
     );
     test_expression!(
         "interval '5 day 4 hours 3 minutes 2 seconds 100 milliseconds'",
-        "0 years 0 mons 5 days 4 hours 3 mins 2.100 secs"
+        "0 years 0 mons 5 days 4 hours 3 mins 2.100000000 secs"
     );
     // month intervals
     test_expression!(
         "interval '0.5 month'",
-        "0 years 0 mons 15 days 0 hours 0 mins 0.000 secs"
+        "0 years 0 mons 15 days 0 hours 0 mins 0.000000000 secs"
     );
     test_expression!(
         "interval '0.5' month",
-        "0 years 0 mons 15 days 0 hours 0 mins 0.000 secs"
+        "0 years 0 mons 15 days 0 hours 0 mins 0.000000000 secs"
     );
     test_expression!(
         "interval '1 month'",
-        "0 years 1 mons 0 days 0 hours 0 mins 0.00 secs"
+        "0 years 1 mons 0 days 0 hours 0 mins 0.000000000 secs"
     );
     test_expression!(
         "interval '1' MONTH",
-        "0 years 1 mons 0 days 0 hours 0 mins 0.00 secs"
+        "0 years 1 mons 0 days 0 hours 0 mins 0.000000000 secs"
     );
     test_expression!(
         "interval '5 month'",
-        "0 years 5 mons 0 days 0 hours 0 mins 0.00 secs"
+        "0 years 5 mons 0 days 0 hours 0 mins 0.000000000 secs"
     );
     test_expression!(
         "interval '13 month'",
-        "1 years 1 mons 0 days 0 hours 0 mins 0.00 secs"
+        "0 years 13 mons 0 days 0 hours 0 mins 0.000000000 secs"

Review Comment:
   @waitingkuo  and @ovr  -- Do you have any thoughts about this proposed change (that the reported intervals are not in the same units as the original expression)?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] alamb commented on a diff in pull request #5806: Fix `interval` to use consistent units and arrow parser

Posted by "alamb (via GitHub)" <gi...@apache.org>.
alamb commented on code in PR #5806:
URL: https://github.com/apache/arrow-datafusion/pull/5806#discussion_r1157493781


##########
datafusion/core/tests/sqllogictests/test_files/interval.slt:
##########
@@ -23,7 +23,113 @@ select
   arrow_typeof(interval '5 months'),
   arrow_typeof(interval '5 days 3 nanoseconds')
 ----
-Interval(YearMonth) Interval(MonthDayNano)
+Interval(MonthDayNano) Interval(MonthDayNano)
+
+
+## This is incredibly confusing but document it in tests:
+#
+# years is parsed as a column name
+# year is parsed as part of the interval type.
+#
+# postgres=# select interval '5' year;
+#  interval
+# ----------
+#  5 years
+# (1 row)
+#
+# postgres=# select interval '5' years;
+#   years
+# ----------
+#  00:00:05
+# (1 row)
+query ?
+select interval '5' years
+----
+0 years 0 mons 0 days 0 hours 0 mins 5.000000000 secs
+
+
+# check all different kinds of intervals
+query ?
+select interval '5' year
+----
+0 years 60 mons 0 days 0 hours 0 mins 0.000000000 secs

Review Comment:
   Yeah, I agree it is a little strange. I can't figure out how important it is to to mirror back exactly what the user said.
   
   For some things like `5 years` we could pick `Interval(YearMonth)` and encode exactly what the user specified
   
   For things like `5 years 3 days` to express that properly I think we need to use `Interval(MonthDayNano)` which is going to be formatted like `60 months ... 3 days`
   
   Or maybe we should just make the interval printing prettier 🤔  like convert `60 months` to 5 years...



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] jackwener commented on a diff in pull request #5806: Fix `interval` to use consistent units and arrow parser

Posted by "jackwener (via GitHub)" <gi...@apache.org>.
jackwener commented on code in PR #5806:
URL: https://github.com/apache/arrow-datafusion/pull/5806#discussion_r1155994250


##########
datafusion/core/tests/sqllogictests/test_files/interval.slt:
##########
@@ -23,7 +23,113 @@ select
   arrow_typeof(interval '5 months'),
   arrow_typeof(interval '5 days 3 nanoseconds')
 ----
-Interval(YearMonth) Interval(MonthDayNano)
+Interval(MonthDayNano) Interval(MonthDayNano)
+
+
+## This is incredibly confusing but document it in tests:
+#
+# years is parsed as a column name
+# year is parsed as part of the interval type.
+#
+# postgres=# select interval '5' year;
+#  interval
+# ----------
+#  5 years
+# (1 row)
+#
+# postgres=# select interval '5' years;
+#   years
+# ----------
+#  00:00:05
+# (1 row)
+query ?
+select interval '5' years
+----
+0 years 0 mons 0 days 0 hours 0 mins 5.000000000 secs
+
+
+# check all different kinds of intervals
+query ?
+select interval '5' year
+----
+0 years 60 mons 0 days 0 hours 0 mins 0.000000000 secs

Review Comment:
   A little strange behavior.
   
   I think `interval '24' month -> 0 years 24 mons` is reasonable.
   But it's strange `interval '2' year -> 0 years 24 mons`.
   
   Maybe in the future, if we can add `year` in `interval`, it will can be resolved easily



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] alamb commented on a diff in pull request #5806: Fix `interval` to use consistent units and arrow parser

Posted by "alamb (via GitHub)" <gi...@apache.org>.
alamb commented on code in PR #5806:
URL: https://github.com/apache/arrow-datafusion/pull/5806#discussion_r1154771201


##########
datafusion/core/tests/sql/expr.rs:
##########
@@ -843,153 +843,153 @@ async fn test_interval_expressions() -> Result<()> {
     // day nano intervals
     test_expression!(
         "interval '1'",
-        "0 years 0 mons 0 days 0 hours 0 mins 1.000 secs"
+        "0 years 0 mons 0 days 0 hours 0 mins 1.000000000 secs"
     );
     test_expression!(
         "interval '1 second'",
-        "0 years 0 mons 0 days 0 hours 0 mins 1.000 secs"
+        "0 years 0 mons 0 days 0 hours 0 mins 1.000000000 secs"
     );
     test_expression!(
         "interval '500 milliseconds'",
-        "0 years 0 mons 0 days 0 hours 0 mins 0.500 secs"
+        "0 years 0 mons 0 days 0 hours 0 mins 0.500000000 secs"
     );
     test_expression!(
         "interval '5 second'",
-        "0 years 0 mons 0 days 0 hours 0 mins 5.000 secs"
+        "0 years 0 mons 0 days 0 hours 0 mins 5.000000000 secs"
     );
     test_expression!(
         "interval '0.5 minute'",
-        "0 years 0 mons 0 days 0 hours 0 mins 30.000 secs"
+        "0 years 0 mons 0 days 0 hours 0 mins 30.000000000 secs"
     );
     test_expression!(
         "interval '.5 minute'",
-        "0 years 0 mons 0 days 0 hours 0 mins 30.000 secs"
+        "0 years 0 mons 0 days 0 hours 0 mins 30.000000000 secs"
     );
     test_expression!(
         "interval '5 minute'",
-        "0 years 0 mons 0 days 0 hours 5 mins 0.000 secs"
+        "0 years 0 mons 0 days 0 hours 5 mins 0.000000000 secs"
     );
     test_expression!(
         "interval '5 minute 1 second'",
-        "0 years 0 mons 0 days 0 hours 5 mins 1.000 secs"
+        "0 years 0 mons 0 days 0 hours 5 mins 1.000000000 secs"
     );
     test_expression!(
         "interval '1 hour'",
-        "0 years 0 mons 0 days 1 hours 0 mins 0.000 secs"
+        "0 years 0 mons 0 days 1 hours 0 mins 0.000000000 secs"
     );
     test_expression!(
         "interval '5 hour'",
-        "0 years 0 mons 0 days 5 hours 0 mins 0.000 secs"
+        "0 years 0 mons 0 days 5 hours 0 mins 0.000000000 secs"
     );
     test_expression!(
         "interval '1 day'",
-        "0 years 0 mons 1 days 0 hours 0 mins 0.000 secs"
+        "0 years 0 mons 1 days 0 hours 0 mins 0.000000000 secs"
     );
     test_expression!(
         "interval '1 week'",
-        "0 years 0 mons 7 days 0 hours 0 mins 0.000 secs"
+        "0 years 0 mons 7 days 0 hours 0 mins 0.000000000 secs"
     );
     test_expression!(
         "interval '2 weeks'",
-        "0 years 0 mons 14 days 0 hours 0 mins 0.000 secs"
+        "0 years 0 mons 14 days 0 hours 0 mins 0.000000000 secs"
     );
     test_expression!(
         "interval '1 day 1'",
-        "0 years 0 mons 1 days 0 hours 0 mins 1.000 secs"
+        "0 years 0 mons 1 days 0 hours 0 mins 1.000000000 secs"
     );
     test_expression!(
         "interval '0.5'",
-        "0 years 0 mons 0 days 0 hours 0 mins 0.500 secs"
+        "0 years 0 mons 0 days 0 hours 0 mins 0.500000000 secs"
     );
     test_expression!(
         "interval '0.5 day 1'",
-        "0 years 0 mons 0 days 12 hours 0 mins 1.000 secs"
+        "0 years 0 mons 0 days 12 hours 0 mins 1.000000000 secs"
     );
     test_expression!(
         "interval '0.49 day'",
-        "0 years 0 mons 0 days 11 hours 45 mins 36.000 secs"
+        "0 years 0 mons 0 days 11 hours 45 mins 36.000000000 secs"
     );
     test_expression!(
         "interval '0.499 day'",
-        "0 years 0 mons 0 days 11 hours 58 mins 33.600 secs"
+        "0 years 0 mons 0 days 11 hours 58 mins 33.600000000 secs"
     );
     test_expression!(
         "interval '0.4999 day'",
-        "0 years 0 mons 0 days 11 hours 59 mins 51.360 secs"
+        "0 years 0 mons 0 days 11 hours 59 mins 51.360000000 secs"
     );
     test_expression!(
         "interval '0.49999 day'",
-        "0 years 0 mons 0 days 11 hours 59 mins 59.136 secs"
+        "0 years 0 mons 0 days 11 hours 59 mins 59.136000000 secs"
     );
     test_expression!(
         "interval '0.49999999999 day'",
         "0 years 0 mons 0 days 11 hours 59 mins 59.999999136 secs"
     );
     test_expression!(
         "interval '5 day'",
-        "0 years 0 mons 5 days 0 hours 0 mins 0.000 secs"
+        "0 years 0 mons 5 days 0 hours 0 mins 0.000000000 secs"
     );
     // Hour is ignored, this matches PostgreSQL
     test_expression!(
         "interval '5 day' hour",
-        "0 years 0 mons 5 days 0 hours 0 mins 0.000 secs"
+        "0 years 0 mons 5 days 0 hours 0 mins 0.000000000 secs"
     );
     test_expression!(
         "interval '5 day 4 hours 3 minutes 2 seconds 100 milliseconds'",
-        "0 years 0 mons 5 days 4 hours 3 mins 2.100 secs"
+        "0 years 0 mons 5 days 4 hours 3 mins 2.100000000 secs"
     );
     // month intervals
     test_expression!(
         "interval '0.5 month'",
-        "0 years 0 mons 15 days 0 hours 0 mins 0.000 secs"
+        "0 years 0 mons 15 days 0 hours 0 mins 0.000000000 secs"
     );
     test_expression!(
         "interval '0.5' month",
-        "0 years 0 mons 15 days 0 hours 0 mins 0.000 secs"
+        "0 years 0 mons 15 days 0 hours 0 mins 0.000000000 secs"
     );
     test_expression!(
         "interval '1 month'",
-        "0 years 1 mons 0 days 0 hours 0 mins 0.00 secs"
+        "0 years 1 mons 0 days 0 hours 0 mins 0.000000000 secs"
     );
     test_expression!(
         "interval '1' MONTH",
-        "0 years 1 mons 0 days 0 hours 0 mins 0.00 secs"
+        "0 years 1 mons 0 days 0 hours 0 mins 0.000000000 secs"
     );
     test_expression!(
         "interval '5 month'",
-        "0 years 5 mons 0 days 0 hours 0 mins 0.00 secs"
+        "0 years 5 mons 0 days 0 hours 0 mins 0.000000000 secs"
     );
     test_expression!(
         "interval '13 month'",
-        "1 years 1 mons 0 days 0 hours 0 mins 0.00 secs"
+        "0 years 13 mons 0 days 0 hours 0 mins 0.000000000 secs"

Review Comment:
   I am not sure this matters -- but `interval` parsing now is `month/day/nano` always and thus is not quite as nice a display as it was previously (no years, instead it is always months). However perhaps if anyone cares we can improve the display in arrow-rs for this kind of interval



##########
datafusion/core/tests/sql/timestamp.rs:
##########
@@ -968,52 +968,52 @@ async fn timestamp_array_add_interval() -> Result<()> {
     let sql = "SELECT ts, ts - INTERVAL '8' MILLISECONDS FROM table_a";
     let actual = execute_to_batches(&ctx, sql).await;
     let expected = vec![
-        "+----------------------------+-----------------------------------+",
-        "| ts                         | table_a.ts - IntervalDayTime(\"8\") |",
-        "+----------------------------+-----------------------------------+",
-        "| 2020-09-08T13:42:29.190855 | 2020-09-08T13:42:29.182855        |",
-        "| 2020-09-08T12:42:29.190855 | 2020-09-08T12:42:29.182855        |",
-        "| 2020-09-08T11:42:29.190855 | 2020-09-08T11:42:29.182855        |",
-        "+----------------------------+-----------------------------------+",
+        "+----------------------------+----------------------------------------------+",
+        "| ts                         | table_a.ts - IntervalMonthDayNano(\"8000000\") |",
+        "+----------------------------+----------------------------------------------+",
+        "| 2020-09-08T13:42:29.190855 | 2020-09-08T13:42:29.182855                   |",
+        "| 2020-09-08T12:42:29.190855 | 2020-09-08T12:42:29.182855                   |",
+        "| 2020-09-08T11:42:29.190855 | 2020-09-08T11:42:29.182855                   |",
+        "+----------------------------+----------------------------------------------+",
     ];
     assert_batches_eq!(expected, &actual);
 
     let sql = "SELECT ts, ts + INTERVAL '1' SECOND FROM table_b";
     let actual = execute_to_batches(&ctx, sql).await;
     let expected = vec![
-        "+----------------------------+--------------------------------------+",
-        "| ts                         | table_b.ts + IntervalDayTime(\"1000\") |",
-        "+----------------------------+--------------------------------------+",
-        "| 2020-09-08T13:42:29.190855 | 2020-09-08T13:42:30.190855           |",
-        "| 2020-09-08T12:42:29.190855 | 2020-09-08T12:42:30.190855           |",
-        "| 2020-09-08T11:42:29.190855 | 2020-09-08T11:42:30.190855           |",
-        "+----------------------------+--------------------------------------+",
+        "+----------------------------+-------------------------------------------------+",
+        "| ts                         | table_b.ts + IntervalMonthDayNano(\"1000000000\") |",

Review Comment:
   This is pretty terrible formatting for the column name 🤮  but I don't think it is fixable easily -- see #2027 for more discussion



##########
datafusion/sql/src/expr/value.rs:
##########
@@ -202,11 +203,55 @@ impl<'a, S: ContextProvider> SqlToRel<'a, S> {
             }
         };
 
-        let leading_field = leading_field
-            .as_ref()
-            .map(|dt| dt.to_string())
-            .unwrap_or_else(|| "second".to_string());
+        let value = if has_units(&value) {

Review Comment:
   This is a pretty terrible hack but it seems to be necessary to get the tests to pass (aka to have postgres compatible syntax) 🤷 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] alamb merged pull request #5806: Fix `interval` to use consistent units and arrow parser

Posted by "alamb (via GitHub)" <gi...@apache.org>.
alamb merged PR #5806:
URL: https://github.com/apache/arrow-datafusion/pull/5806


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] tustvold commented on a diff in pull request #5806: Fix `interval` to use consistent units and arrow parser

Posted by "tustvold (via GitHub)" <gi...@apache.org>.
tustvold commented on code in PR #5806:
URL: https://github.com/apache/arrow-datafusion/pull/5806#discussion_r1155103879


##########
datafusion/core/tests/sql/expr.rs:
##########
@@ -843,153 +843,153 @@ async fn test_interval_expressions() -> Result<()> {
     // day nano intervals
     test_expression!(
         "interval '1'",
-        "0 years 0 mons 0 days 0 hours 0 mins 1.000 secs"
+        "0 years 0 mons 0 days 0 hours 0 mins 1.000000000 secs"
     );
     test_expression!(
         "interval '1 second'",
-        "0 years 0 mons 0 days 0 hours 0 mins 1.000 secs"
+        "0 years 0 mons 0 days 0 hours 0 mins 1.000000000 secs"
     );
     test_expression!(
         "interval '500 milliseconds'",
-        "0 years 0 mons 0 days 0 hours 0 mins 0.500 secs"
+        "0 years 0 mons 0 days 0 hours 0 mins 0.500000000 secs"
     );
     test_expression!(
         "interval '5 second'",
-        "0 years 0 mons 0 days 0 hours 0 mins 5.000 secs"
+        "0 years 0 mons 0 days 0 hours 0 mins 5.000000000 secs"
     );
     test_expression!(
         "interval '0.5 minute'",
-        "0 years 0 mons 0 days 0 hours 0 mins 30.000 secs"
+        "0 years 0 mons 0 days 0 hours 0 mins 30.000000000 secs"
     );
     test_expression!(
         "interval '.5 minute'",
-        "0 years 0 mons 0 days 0 hours 0 mins 30.000 secs"
+        "0 years 0 mons 0 days 0 hours 0 mins 30.000000000 secs"
     );
     test_expression!(
         "interval '5 minute'",
-        "0 years 0 mons 0 days 0 hours 5 mins 0.000 secs"
+        "0 years 0 mons 0 days 0 hours 5 mins 0.000000000 secs"
     );
     test_expression!(
         "interval '5 minute 1 second'",
-        "0 years 0 mons 0 days 0 hours 5 mins 1.000 secs"
+        "0 years 0 mons 0 days 0 hours 5 mins 1.000000000 secs"
     );
     test_expression!(
         "interval '1 hour'",
-        "0 years 0 mons 0 days 1 hours 0 mins 0.000 secs"
+        "0 years 0 mons 0 days 1 hours 0 mins 0.000000000 secs"
     );
     test_expression!(
         "interval '5 hour'",
-        "0 years 0 mons 0 days 5 hours 0 mins 0.000 secs"
+        "0 years 0 mons 0 days 5 hours 0 mins 0.000000000 secs"
     );
     test_expression!(
         "interval '1 day'",
-        "0 years 0 mons 1 days 0 hours 0 mins 0.000 secs"
+        "0 years 0 mons 1 days 0 hours 0 mins 0.000000000 secs"
     );
     test_expression!(
         "interval '1 week'",
-        "0 years 0 mons 7 days 0 hours 0 mins 0.000 secs"
+        "0 years 0 mons 7 days 0 hours 0 mins 0.000000000 secs"
     );
     test_expression!(
         "interval '2 weeks'",
-        "0 years 0 mons 14 days 0 hours 0 mins 0.000 secs"
+        "0 years 0 mons 14 days 0 hours 0 mins 0.000000000 secs"
     );
     test_expression!(
         "interval '1 day 1'",
-        "0 years 0 mons 1 days 0 hours 0 mins 1.000 secs"
+        "0 years 0 mons 1 days 0 hours 0 mins 1.000000000 secs"
     );
     test_expression!(
         "interval '0.5'",
-        "0 years 0 mons 0 days 0 hours 0 mins 0.500 secs"
+        "0 years 0 mons 0 days 0 hours 0 mins 0.500000000 secs"
     );
     test_expression!(
         "interval '0.5 day 1'",
-        "0 years 0 mons 0 days 12 hours 0 mins 1.000 secs"
+        "0 years 0 mons 0 days 12 hours 0 mins 1.000000000 secs"
     );
     test_expression!(
         "interval '0.49 day'",
-        "0 years 0 mons 0 days 11 hours 45 mins 36.000 secs"
+        "0 years 0 mons 0 days 11 hours 45 mins 36.000000000 secs"
     );
     test_expression!(
         "interval '0.499 day'",
-        "0 years 0 mons 0 days 11 hours 58 mins 33.600 secs"
+        "0 years 0 mons 0 days 11 hours 58 mins 33.600000000 secs"
     );
     test_expression!(
         "interval '0.4999 day'",
-        "0 years 0 mons 0 days 11 hours 59 mins 51.360 secs"
+        "0 years 0 mons 0 days 11 hours 59 mins 51.360000000 secs"
     );
     test_expression!(
         "interval '0.49999 day'",
-        "0 years 0 mons 0 days 11 hours 59 mins 59.136 secs"
+        "0 years 0 mons 0 days 11 hours 59 mins 59.136000000 secs"
     );
     test_expression!(
         "interval '0.49999999999 day'",
         "0 years 0 mons 0 days 11 hours 59 mins 59.999999136 secs"
     );
     test_expression!(
         "interval '5 day'",
-        "0 years 0 mons 5 days 0 hours 0 mins 0.000 secs"
+        "0 years 0 mons 5 days 0 hours 0 mins 0.000000000 secs"
     );
     // Hour is ignored, this matches PostgreSQL
     test_expression!(
         "interval '5 day' hour",
-        "0 years 0 mons 5 days 0 hours 0 mins 0.000 secs"
+        "0 years 0 mons 5 days 0 hours 0 mins 0.000000000 secs"
     );
     test_expression!(
         "interval '5 day 4 hours 3 minutes 2 seconds 100 milliseconds'",
-        "0 years 0 mons 5 days 4 hours 3 mins 2.100 secs"
+        "0 years 0 mons 5 days 4 hours 3 mins 2.100000000 secs"
     );
     // month intervals
     test_expression!(
         "interval '0.5 month'",
-        "0 years 0 mons 15 days 0 hours 0 mins 0.000 secs"
+        "0 years 0 mons 15 days 0 hours 0 mins 0.000000000 secs"
     );
     test_expression!(
         "interval '0.5' month",
-        "0 years 0 mons 15 days 0 hours 0 mins 0.000 secs"
+        "0 years 0 mons 15 days 0 hours 0 mins 0.000000000 secs"
     );
     test_expression!(
         "interval '1 month'",
-        "0 years 1 mons 0 days 0 hours 0 mins 0.00 secs"
+        "0 years 1 mons 0 days 0 hours 0 mins 0.000000000 secs"
     );
     test_expression!(
         "interval '1' MONTH",
-        "0 years 1 mons 0 days 0 hours 0 mins 0.00 secs"
+        "0 years 1 mons 0 days 0 hours 0 mins 0.000000000 secs"
     );
     test_expression!(
         "interval '5 month'",
-        "0 years 5 mons 0 days 0 hours 0 mins 0.00 secs"
+        "0 years 5 mons 0 days 0 hours 0 mins 0.000000000 secs"
     );
     test_expression!(
         "interval '13 month'",
-        "1 years 1 mons 0 days 0 hours 0 mins 0.00 secs"
+        "0 years 13 mons 0 days 0 hours 0 mins 0.000000000 secs"

Review Comment:
   FWIW this was a conscious change when the parsing logic was ported, to faithfully preserve what the user specified. This was mainly in the context of rounding days to months, which was simply incorrect, but months to years got the same treatment



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] jacksonrnewhouse commented on a diff in pull request #5806: Fix `interval` to use consistent units and arrow parser

Posted by "jacksonrnewhouse (via GitHub)" <gi...@apache.org>.
jacksonrnewhouse commented on code in PR #5806:
URL: https://github.com/apache/arrow-datafusion/pull/5806#discussion_r1164700546


##########
datafusion/core/tests/sql/expr.rs:
##########
@@ -843,153 +843,153 @@ async fn test_interval_expressions() -> Result<()> {
     // day nano intervals
     test_expression!(
         "interval '1'",
-        "0 years 0 mons 0 days 0 hours 0 mins 1.000 secs"
+        "0 years 0 mons 0 days 0 hours 0 mins 1.000000000 secs"
     );
     test_expression!(
         "interval '1 second'",
-        "0 years 0 mons 0 days 0 hours 0 mins 1.000 secs"
+        "0 years 0 mons 0 days 0 hours 0 mins 1.000000000 secs"
     );
     test_expression!(
         "interval '500 milliseconds'",
-        "0 years 0 mons 0 days 0 hours 0 mins 0.500 secs"
+        "0 years 0 mons 0 days 0 hours 0 mins 0.500000000 secs"
     );
     test_expression!(
         "interval '5 second'",
-        "0 years 0 mons 0 days 0 hours 0 mins 5.000 secs"
+        "0 years 0 mons 0 days 0 hours 0 mins 5.000000000 secs"
     );
     test_expression!(
         "interval '0.5 minute'",
-        "0 years 0 mons 0 days 0 hours 0 mins 30.000 secs"
+        "0 years 0 mons 0 days 0 hours 0 mins 30.000000000 secs"
     );
     test_expression!(
         "interval '.5 minute'",
-        "0 years 0 mons 0 days 0 hours 0 mins 30.000 secs"
+        "0 years 0 mons 0 days 0 hours 0 mins 30.000000000 secs"
     );
     test_expression!(
         "interval '5 minute'",
-        "0 years 0 mons 0 days 0 hours 5 mins 0.000 secs"
+        "0 years 0 mons 0 days 0 hours 5 mins 0.000000000 secs"
     );
     test_expression!(
         "interval '5 minute 1 second'",
-        "0 years 0 mons 0 days 0 hours 5 mins 1.000 secs"
+        "0 years 0 mons 0 days 0 hours 5 mins 1.000000000 secs"
     );
     test_expression!(
         "interval '1 hour'",
-        "0 years 0 mons 0 days 1 hours 0 mins 0.000 secs"
+        "0 years 0 mons 0 days 1 hours 0 mins 0.000000000 secs"
     );
     test_expression!(
         "interval '5 hour'",
-        "0 years 0 mons 0 days 5 hours 0 mins 0.000 secs"
+        "0 years 0 mons 0 days 5 hours 0 mins 0.000000000 secs"
     );
     test_expression!(
         "interval '1 day'",
-        "0 years 0 mons 1 days 0 hours 0 mins 0.000 secs"
+        "0 years 0 mons 1 days 0 hours 0 mins 0.000000000 secs"
     );
     test_expression!(
         "interval '1 week'",
-        "0 years 0 mons 7 days 0 hours 0 mins 0.000 secs"
+        "0 years 0 mons 7 days 0 hours 0 mins 0.000000000 secs"
     );
     test_expression!(
         "interval '2 weeks'",
-        "0 years 0 mons 14 days 0 hours 0 mins 0.000 secs"
+        "0 years 0 mons 14 days 0 hours 0 mins 0.000000000 secs"
     );
     test_expression!(
         "interval '1 day 1'",
-        "0 years 0 mons 1 days 0 hours 0 mins 1.000 secs"
+        "0 years 0 mons 1 days 0 hours 0 mins 1.000000000 secs"
     );
     test_expression!(
         "interval '0.5'",
-        "0 years 0 mons 0 days 0 hours 0 mins 0.500 secs"
+        "0 years 0 mons 0 days 0 hours 0 mins 0.500000000 secs"
     );
     test_expression!(
         "interval '0.5 day 1'",
-        "0 years 0 mons 0 days 12 hours 0 mins 1.000 secs"
+        "0 years 0 mons 0 days 12 hours 0 mins 1.000000000 secs"
     );
     test_expression!(
         "interval '0.49 day'",
-        "0 years 0 mons 0 days 11 hours 45 mins 36.000 secs"
+        "0 years 0 mons 0 days 11 hours 45 mins 36.000000000 secs"
     );
     test_expression!(
         "interval '0.499 day'",
-        "0 years 0 mons 0 days 11 hours 58 mins 33.600 secs"
+        "0 years 0 mons 0 days 11 hours 58 mins 33.600000000 secs"
     );
     test_expression!(
         "interval '0.4999 day'",
-        "0 years 0 mons 0 days 11 hours 59 mins 51.360 secs"
+        "0 years 0 mons 0 days 11 hours 59 mins 51.360000000 secs"
     );
     test_expression!(
         "interval '0.49999 day'",
-        "0 years 0 mons 0 days 11 hours 59 mins 59.136 secs"
+        "0 years 0 mons 0 days 11 hours 59 mins 59.136000000 secs"
     );
     test_expression!(
         "interval '0.49999999999 day'",
         "0 years 0 mons 0 days 11 hours 59 mins 59.999999136 secs"
     );
     test_expression!(
         "interval '5 day'",
-        "0 years 0 mons 5 days 0 hours 0 mins 0.000 secs"
+        "0 years 0 mons 5 days 0 hours 0 mins 0.000000000 secs"
     );
     // Hour is ignored, this matches PostgreSQL
     test_expression!(
         "interval '5 day' hour",
-        "0 years 0 mons 5 days 0 hours 0 mins 0.000 secs"
+        "0 years 0 mons 5 days 0 hours 0 mins 0.000000000 secs"
     );
     test_expression!(
         "interval '5 day 4 hours 3 minutes 2 seconds 100 milliseconds'",
-        "0 years 0 mons 5 days 4 hours 3 mins 2.100 secs"
+        "0 years 0 mons 5 days 4 hours 3 mins 2.100000000 secs"
     );
     // month intervals
     test_expression!(
         "interval '0.5 month'",
-        "0 years 0 mons 15 days 0 hours 0 mins 0.000 secs"
+        "0 years 0 mons 15 days 0 hours 0 mins 0.000000000 secs"
     );
     test_expression!(
         "interval '0.5' month",
-        "0 years 0 mons 15 days 0 hours 0 mins 0.000 secs"
+        "0 years 0 mons 15 days 0 hours 0 mins 0.000000000 secs"
     );
     test_expression!(
         "interval '1 month'",
-        "0 years 1 mons 0 days 0 hours 0 mins 0.00 secs"
+        "0 years 1 mons 0 days 0 hours 0 mins 0.000000000 secs"
     );
     test_expression!(
         "interval '1' MONTH",
-        "0 years 1 mons 0 days 0 hours 0 mins 0.00 secs"
+        "0 years 1 mons 0 days 0 hours 0 mins 0.000000000 secs"
     );
     test_expression!(
         "interval '5 month'",
-        "0 years 5 mons 0 days 0 hours 0 mins 0.00 secs"
+        "0 years 5 mons 0 days 0 hours 0 mins 0.000000000 secs"
     );
     test_expression!(
         "interval '13 month'",
-        "1 years 1 mons 0 days 0 hours 0 mins 0.00 secs"
+        "0 years 13 mons 0 days 0 hours 0 mins 0.000000000 secs"

Review Comment:
   Exploring how postgres handles this, if you do EXTRACT(EPOCH FROM interval), it treats months as 30 days, until you get to 12 months, and then it is 365.25 days. Not sure how much it matters to match postgres, but that's what I've seen.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] alamb commented on a diff in pull request #5806: Fix `interval` to use consistent units and arrow parser

Posted by "alamb (via GitHub)" <gi...@apache.org>.
alamb commented on code in PR #5806:
URL: https://github.com/apache/arrow-datafusion/pull/5806#discussion_r1154723845


##########
datafusion/sql/Cargo.toml:
##########
@@ -38,6 +38,7 @@ unicode_expressions = []
 
 [dependencies]
 arrow-schema = { workspace = true }
+arrow = { workspace = true }
 datafusion-common = { path = "../common", version = "21.0.0" }

Review Comment:
   `datafusion-common` already depends on arrow so this is no new (net) dependency



##########
datafusion/sql/src/expr/value.rs:
##########
@@ -202,11 +203,17 @@ impl<'a, S: ContextProvider> SqlToRel<'a, S> {
             }
         };
 
-        let leading_field = leading_field
-            .as_ref()
-            .map(|dt| dt.to_string())
-            .unwrap_or_else(|| "second".to_string());
+        // leading_field really means the unit if specified
+        // for example, "month" in  `INTERVAL '5' month`
 
-        Ok(lit(parse_interval(&leading_field, &value)?))
+        let value = if let Some(leading_field) = leading_field.as_ref() {
+            format!("{value} {leading_field}")
+        } else {
+            // default to seconds for the units
+            format!("{value} seconds")

Review Comment:
   this is the same default as previously



##########
datafusion/optimizer/src/type_coercion.rs:
##########
@@ -601,14 +593,7 @@ fn coerce_arguments_for_signature(
 
 /// Cast `expr` to the specified type, if possible
 fn cast_expr(expr: &Expr, to_type: &DataType, schema: &DFSchema) -> Result<Expr> {
-    // Special case until Interval coercion is handled in arrow-rs

Review Comment:
   🎉  for avoiding the special cases



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] alamb commented on a diff in pull request #5806: Fix `interval` to use consistent units and arrow parser

Posted by "alamb (via GitHub)" <gi...@apache.org>.
alamb commented on code in PR #5806:
URL: https://github.com/apache/arrow-datafusion/pull/5806#discussion_r1165863406


##########
datafusion/core/tests/sql/expr.rs:
##########
@@ -843,153 +843,153 @@ async fn test_interval_expressions() -> Result<()> {
     // day nano intervals
     test_expression!(
         "interval '1'",
-        "0 years 0 mons 0 days 0 hours 0 mins 1.000 secs"
+        "0 years 0 mons 0 days 0 hours 0 mins 1.000000000 secs"
     );
     test_expression!(
         "interval '1 second'",
-        "0 years 0 mons 0 days 0 hours 0 mins 1.000 secs"
+        "0 years 0 mons 0 days 0 hours 0 mins 1.000000000 secs"
     );
     test_expression!(
         "interval '500 milliseconds'",
-        "0 years 0 mons 0 days 0 hours 0 mins 0.500 secs"
+        "0 years 0 mons 0 days 0 hours 0 mins 0.500000000 secs"
     );
     test_expression!(
         "interval '5 second'",
-        "0 years 0 mons 0 days 0 hours 0 mins 5.000 secs"
+        "0 years 0 mons 0 days 0 hours 0 mins 5.000000000 secs"
     );
     test_expression!(
         "interval '0.5 minute'",
-        "0 years 0 mons 0 days 0 hours 0 mins 30.000 secs"
+        "0 years 0 mons 0 days 0 hours 0 mins 30.000000000 secs"
     );
     test_expression!(
         "interval '.5 minute'",
-        "0 years 0 mons 0 days 0 hours 0 mins 30.000 secs"
+        "0 years 0 mons 0 days 0 hours 0 mins 30.000000000 secs"
     );
     test_expression!(
         "interval '5 minute'",
-        "0 years 0 mons 0 days 0 hours 5 mins 0.000 secs"
+        "0 years 0 mons 0 days 0 hours 5 mins 0.000000000 secs"
     );
     test_expression!(
         "interval '5 minute 1 second'",
-        "0 years 0 mons 0 days 0 hours 5 mins 1.000 secs"
+        "0 years 0 mons 0 days 0 hours 5 mins 1.000000000 secs"
     );
     test_expression!(
         "interval '1 hour'",
-        "0 years 0 mons 0 days 1 hours 0 mins 0.000 secs"
+        "0 years 0 mons 0 days 1 hours 0 mins 0.000000000 secs"
     );
     test_expression!(
         "interval '5 hour'",
-        "0 years 0 mons 0 days 5 hours 0 mins 0.000 secs"
+        "0 years 0 mons 0 days 5 hours 0 mins 0.000000000 secs"
     );
     test_expression!(
         "interval '1 day'",
-        "0 years 0 mons 1 days 0 hours 0 mins 0.000 secs"
+        "0 years 0 mons 1 days 0 hours 0 mins 0.000000000 secs"
     );
     test_expression!(
         "interval '1 week'",
-        "0 years 0 mons 7 days 0 hours 0 mins 0.000 secs"
+        "0 years 0 mons 7 days 0 hours 0 mins 0.000000000 secs"
     );
     test_expression!(
         "interval '2 weeks'",
-        "0 years 0 mons 14 days 0 hours 0 mins 0.000 secs"
+        "0 years 0 mons 14 days 0 hours 0 mins 0.000000000 secs"
     );
     test_expression!(
         "interval '1 day 1'",
-        "0 years 0 mons 1 days 0 hours 0 mins 1.000 secs"
+        "0 years 0 mons 1 days 0 hours 0 mins 1.000000000 secs"
     );
     test_expression!(
         "interval '0.5'",
-        "0 years 0 mons 0 days 0 hours 0 mins 0.500 secs"
+        "0 years 0 mons 0 days 0 hours 0 mins 0.500000000 secs"
     );
     test_expression!(
         "interval '0.5 day 1'",
-        "0 years 0 mons 0 days 12 hours 0 mins 1.000 secs"
+        "0 years 0 mons 0 days 12 hours 0 mins 1.000000000 secs"
     );
     test_expression!(
         "interval '0.49 day'",
-        "0 years 0 mons 0 days 11 hours 45 mins 36.000 secs"
+        "0 years 0 mons 0 days 11 hours 45 mins 36.000000000 secs"
     );
     test_expression!(
         "interval '0.499 day'",
-        "0 years 0 mons 0 days 11 hours 58 mins 33.600 secs"
+        "0 years 0 mons 0 days 11 hours 58 mins 33.600000000 secs"
     );
     test_expression!(
         "interval '0.4999 day'",
-        "0 years 0 mons 0 days 11 hours 59 mins 51.360 secs"
+        "0 years 0 mons 0 days 11 hours 59 mins 51.360000000 secs"
     );
     test_expression!(
         "interval '0.49999 day'",
-        "0 years 0 mons 0 days 11 hours 59 mins 59.136 secs"
+        "0 years 0 mons 0 days 11 hours 59 mins 59.136000000 secs"
     );
     test_expression!(
         "interval '0.49999999999 day'",
         "0 years 0 mons 0 days 11 hours 59 mins 59.999999136 secs"
     );
     test_expression!(
         "interval '5 day'",
-        "0 years 0 mons 5 days 0 hours 0 mins 0.000 secs"
+        "0 years 0 mons 5 days 0 hours 0 mins 0.000000000 secs"
     );
     // Hour is ignored, this matches PostgreSQL
     test_expression!(
         "interval '5 day' hour",
-        "0 years 0 mons 5 days 0 hours 0 mins 0.000 secs"
+        "0 years 0 mons 5 days 0 hours 0 mins 0.000000000 secs"
     );
     test_expression!(
         "interval '5 day 4 hours 3 minutes 2 seconds 100 milliseconds'",
-        "0 years 0 mons 5 days 4 hours 3 mins 2.100 secs"
+        "0 years 0 mons 5 days 4 hours 3 mins 2.100000000 secs"
     );
     // month intervals
     test_expression!(
         "interval '0.5 month'",
-        "0 years 0 mons 15 days 0 hours 0 mins 0.000 secs"
+        "0 years 0 mons 15 days 0 hours 0 mins 0.000000000 secs"
     );
     test_expression!(
         "interval '0.5' month",
-        "0 years 0 mons 15 days 0 hours 0 mins 0.000 secs"
+        "0 years 0 mons 15 days 0 hours 0 mins 0.000000000 secs"
     );
     test_expression!(
         "interval '1 month'",
-        "0 years 1 mons 0 days 0 hours 0 mins 0.00 secs"
+        "0 years 1 mons 0 days 0 hours 0 mins 0.000000000 secs"
     );
     test_expression!(
         "interval '1' MONTH",
-        "0 years 1 mons 0 days 0 hours 0 mins 0.00 secs"
+        "0 years 1 mons 0 days 0 hours 0 mins 0.000000000 secs"
     );
     test_expression!(
         "interval '5 month'",
-        "0 years 5 mons 0 days 0 hours 0 mins 0.00 secs"
+        "0 years 5 mons 0 days 0 hours 0 mins 0.000000000 secs"
     );
     test_expression!(
         "interval '13 month'",
-        "1 years 1 mons 0 days 0 hours 0 mins 0.00 secs"
+        "0 years 13 mons 0 days 0 hours 0 mins 0.000000000 secs"

Review Comment:
   🤯 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] alamb commented on a diff in pull request #5806: Fix `interval` to use consistent units and arrow parser

Posted by "alamb (via GitHub)" <gi...@apache.org>.
alamb commented on code in PR #5806:
URL: https://github.com/apache/arrow-datafusion/pull/5806#discussion_r1154724554


##########
datafusion/optimizer/src/type_coercion.rs:
##########
@@ -601,14 +593,7 @@ fn coerce_arguments_for_signature(
 
 /// Cast `expr` to the specified type, if possible
 fn cast_expr(expr: &Expr, to_type: &DataType, schema: &DFSchema) -> Result<Expr> {
-    // Special case until Interval coercion is handled in arrow-rs

Review Comment:
   🎉  for avoiding the special cases now that arrow-rs has string --> Interval suppot



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] alamb commented on a diff in pull request #5806: Fix `interval` to use consistent units and arrow parser

Posted by "alamb (via GitHub)" <gi...@apache.org>.
alamb commented on code in PR #5806:
URL: https://github.com/apache/arrow-datafusion/pull/5806#discussion_r1155091108


##########
datafusion/core/tests/sqllogictests/test_files/interval.slt:
##########
@@ -23,7 +23,113 @@ select
   arrow_typeof(interval '5 months'),
   arrow_typeof(interval '5 days 3 nanoseconds')
 ----
-Interval(YearMonth) Interval(MonthDayNano)
+Interval(MonthDayNano) Interval(MonthDayNano)
+
+
+## This is incredibly confusing but document it in tests:
+#
+# years is parsed as a column name

Review Comment:
   DataFusion is consistent with postgres so in that sense it is not a bug 
   
   This behavior is due to how the statement is parsed (sqlparser-rs treats `years` as the alias)
   
   It is incredibly confusing (I was confused for a while)
   
   Now that we have proper `Utf8` -> `Interval` parsing (thanks to @doki23 ) in arrow-rs I think the preferred way to create intervals should be `'5 months'::interval` which does the right thing and doesn't have the wacky syntax
   
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] tustvold commented on a diff in pull request #5806: Fix `interval` to use consistent units and arrow parser

Posted by "tustvold (via GitHub)" <gi...@apache.org>.
tustvold commented on code in PR #5806:
URL: https://github.com/apache/arrow-datafusion/pull/5806#discussion_r1155103879


##########
datafusion/core/tests/sql/expr.rs:
##########
@@ -843,153 +843,153 @@ async fn test_interval_expressions() -> Result<()> {
     // day nano intervals
     test_expression!(
         "interval '1'",
-        "0 years 0 mons 0 days 0 hours 0 mins 1.000 secs"
+        "0 years 0 mons 0 days 0 hours 0 mins 1.000000000 secs"
     );
     test_expression!(
         "interval '1 second'",
-        "0 years 0 mons 0 days 0 hours 0 mins 1.000 secs"
+        "0 years 0 mons 0 days 0 hours 0 mins 1.000000000 secs"
     );
     test_expression!(
         "interval '500 milliseconds'",
-        "0 years 0 mons 0 days 0 hours 0 mins 0.500 secs"
+        "0 years 0 mons 0 days 0 hours 0 mins 0.500000000 secs"
     );
     test_expression!(
         "interval '5 second'",
-        "0 years 0 mons 0 days 0 hours 0 mins 5.000 secs"
+        "0 years 0 mons 0 days 0 hours 0 mins 5.000000000 secs"
     );
     test_expression!(
         "interval '0.5 minute'",
-        "0 years 0 mons 0 days 0 hours 0 mins 30.000 secs"
+        "0 years 0 mons 0 days 0 hours 0 mins 30.000000000 secs"
     );
     test_expression!(
         "interval '.5 minute'",
-        "0 years 0 mons 0 days 0 hours 0 mins 30.000 secs"
+        "0 years 0 mons 0 days 0 hours 0 mins 30.000000000 secs"
     );
     test_expression!(
         "interval '5 minute'",
-        "0 years 0 mons 0 days 0 hours 5 mins 0.000 secs"
+        "0 years 0 mons 0 days 0 hours 5 mins 0.000000000 secs"
     );
     test_expression!(
         "interval '5 minute 1 second'",
-        "0 years 0 mons 0 days 0 hours 5 mins 1.000 secs"
+        "0 years 0 mons 0 days 0 hours 5 mins 1.000000000 secs"
     );
     test_expression!(
         "interval '1 hour'",
-        "0 years 0 mons 0 days 1 hours 0 mins 0.000 secs"
+        "0 years 0 mons 0 days 1 hours 0 mins 0.000000000 secs"
     );
     test_expression!(
         "interval '5 hour'",
-        "0 years 0 mons 0 days 5 hours 0 mins 0.000 secs"
+        "0 years 0 mons 0 days 5 hours 0 mins 0.000000000 secs"
     );
     test_expression!(
         "interval '1 day'",
-        "0 years 0 mons 1 days 0 hours 0 mins 0.000 secs"
+        "0 years 0 mons 1 days 0 hours 0 mins 0.000000000 secs"
     );
     test_expression!(
         "interval '1 week'",
-        "0 years 0 mons 7 days 0 hours 0 mins 0.000 secs"
+        "0 years 0 mons 7 days 0 hours 0 mins 0.000000000 secs"
     );
     test_expression!(
         "interval '2 weeks'",
-        "0 years 0 mons 14 days 0 hours 0 mins 0.000 secs"
+        "0 years 0 mons 14 days 0 hours 0 mins 0.000000000 secs"
     );
     test_expression!(
         "interval '1 day 1'",
-        "0 years 0 mons 1 days 0 hours 0 mins 1.000 secs"
+        "0 years 0 mons 1 days 0 hours 0 mins 1.000000000 secs"
     );
     test_expression!(
         "interval '0.5'",
-        "0 years 0 mons 0 days 0 hours 0 mins 0.500 secs"
+        "0 years 0 mons 0 days 0 hours 0 mins 0.500000000 secs"
     );
     test_expression!(
         "interval '0.5 day 1'",
-        "0 years 0 mons 0 days 12 hours 0 mins 1.000 secs"
+        "0 years 0 mons 0 days 12 hours 0 mins 1.000000000 secs"
     );
     test_expression!(
         "interval '0.49 day'",
-        "0 years 0 mons 0 days 11 hours 45 mins 36.000 secs"
+        "0 years 0 mons 0 days 11 hours 45 mins 36.000000000 secs"
     );
     test_expression!(
         "interval '0.499 day'",
-        "0 years 0 mons 0 days 11 hours 58 mins 33.600 secs"
+        "0 years 0 mons 0 days 11 hours 58 mins 33.600000000 secs"
     );
     test_expression!(
         "interval '0.4999 day'",
-        "0 years 0 mons 0 days 11 hours 59 mins 51.360 secs"
+        "0 years 0 mons 0 days 11 hours 59 mins 51.360000000 secs"
     );
     test_expression!(
         "interval '0.49999 day'",
-        "0 years 0 mons 0 days 11 hours 59 mins 59.136 secs"
+        "0 years 0 mons 0 days 11 hours 59 mins 59.136000000 secs"
     );
     test_expression!(
         "interval '0.49999999999 day'",
         "0 years 0 mons 0 days 11 hours 59 mins 59.999999136 secs"
     );
     test_expression!(
         "interval '5 day'",
-        "0 years 0 mons 5 days 0 hours 0 mins 0.000 secs"
+        "0 years 0 mons 5 days 0 hours 0 mins 0.000000000 secs"
     );
     // Hour is ignored, this matches PostgreSQL
     test_expression!(
         "interval '5 day' hour",
-        "0 years 0 mons 5 days 0 hours 0 mins 0.000 secs"
+        "0 years 0 mons 5 days 0 hours 0 mins 0.000000000 secs"
     );
     test_expression!(
         "interval '5 day 4 hours 3 minutes 2 seconds 100 milliseconds'",
-        "0 years 0 mons 5 days 4 hours 3 mins 2.100 secs"
+        "0 years 0 mons 5 days 4 hours 3 mins 2.100000000 secs"
     );
     // month intervals
     test_expression!(
         "interval '0.5 month'",
-        "0 years 0 mons 15 days 0 hours 0 mins 0.000 secs"
+        "0 years 0 mons 15 days 0 hours 0 mins 0.000000000 secs"
     );
     test_expression!(
         "interval '0.5' month",
-        "0 years 0 mons 15 days 0 hours 0 mins 0.000 secs"
+        "0 years 0 mons 15 days 0 hours 0 mins 0.000000000 secs"
     );
     test_expression!(
         "interval '1 month'",
-        "0 years 1 mons 0 days 0 hours 0 mins 0.00 secs"
+        "0 years 1 mons 0 days 0 hours 0 mins 0.000000000 secs"
     );
     test_expression!(
         "interval '1' MONTH",
-        "0 years 1 mons 0 days 0 hours 0 mins 0.00 secs"
+        "0 years 1 mons 0 days 0 hours 0 mins 0.000000000 secs"
     );
     test_expression!(
         "interval '5 month'",
-        "0 years 5 mons 0 days 0 hours 0 mins 0.00 secs"
+        "0 years 5 mons 0 days 0 hours 0 mins 0.000000000 secs"
     );
     test_expression!(
         "interval '13 month'",
-        "1 years 1 mons 0 days 0 hours 0 mins 0.00 secs"
+        "0 years 13 mons 0 days 0 hours 0 mins 0.000000000 secs"

Review Comment:
   FWIW this was a conscious change when the parsing logic was ported, to faithfully preserve what the user specified. This was mainly in the context of rounding days to months and hours to days, which were simply incorrect, but months to years got the same treatment



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] alamb commented on a diff in pull request #5806: Fix `interval` to use consistent units and arrow parser

Posted by "alamb (via GitHub)" <gi...@apache.org>.
alamb commented on code in PR #5806:
URL: https://github.com/apache/arrow-datafusion/pull/5806#discussion_r1154770224


##########
datafusion/common/src/parsers.rs:
##########
@@ -76,337 +72,3 @@ impl CompressionTypeVariant {
         !matches!(self, &Self::UNCOMPRESSED)
     }
 }
-

Review Comment:
   The point of this PR is to remove this code (which has been upstreamed to arrow-rs)



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] tustvold commented on a diff in pull request #5806: Fix `interval` to use consistent units and arrow parser

Posted by "tustvold (via GitHub)" <gi...@apache.org>.
tustvold commented on code in PR #5806:
URL: https://github.com/apache/arrow-datafusion/pull/5806#discussion_r1155088439


##########
datafusion/core/tests/sqllogictests/test_files/interval.slt:
##########
@@ -23,7 +23,113 @@ select
   arrow_typeof(interval '5 months'),
   arrow_typeof(interval '5 days 3 nanoseconds')
 ----
-Interval(YearMonth) Interval(MonthDayNano)
+Interval(MonthDayNano) Interval(MonthDayNano)
+
+
+## This is incredibly confusing but document it in tests:
+#
+# years is parsed as a column name

Review Comment:
   Is this a bug?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] tustvold commented on a diff in pull request #5806: Fix `interval` to use consistent units and arrow parser

Posted by "tustvold (via GitHub)" <gi...@apache.org>.
tustvold commented on code in PR #5806:
URL: https://github.com/apache/arrow-datafusion/pull/5806#discussion_r1155123914


##########
datafusion/sql/src/expr/value.rs:
##########
@@ -202,11 +203,55 @@ impl<'a, S: ContextProvider> SqlToRel<'a, S> {
             }
         };
 
-        let leading_field = leading_field
-            .as_ref()
-            .map(|dt| dt.to_string())
-            .unwrap_or_else(|| "second".to_string());
+        let value = if has_units(&value) {

Review Comment:
   Would it be valid to just search for a character that isn't a digit or decimal point?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] jackwener commented on a diff in pull request #5806: Fix `interval` to use consistent units and arrow parser

Posted by "jackwener (via GitHub)" <gi...@apache.org>.
jackwener commented on code in PR #5806:
URL: https://github.com/apache/arrow-datafusion/pull/5806#discussion_r1155989213


##########
datafusion/core/tests/sql/expr.rs:
##########
@@ -843,153 +843,153 @@ async fn test_interval_expressions() -> Result<()> {
     // day nano intervals
     test_expression!(
         "interval '1'",
-        "0 years 0 mons 0 days 0 hours 0 mins 1.000 secs"
+        "0 years 0 mons 0 days 0 hours 0 mins 1.000000000 secs"
     );
     test_expression!(
         "interval '1 second'",
-        "0 years 0 mons 0 days 0 hours 0 mins 1.000 secs"
+        "0 years 0 mons 0 days 0 hours 0 mins 1.000000000 secs"
     );
     test_expression!(
         "interval '500 milliseconds'",
-        "0 years 0 mons 0 days 0 hours 0 mins 0.500 secs"
+        "0 years 0 mons 0 days 0 hours 0 mins 0.500000000 secs"
     );
     test_expression!(
         "interval '5 second'",
-        "0 years 0 mons 0 days 0 hours 0 mins 5.000 secs"
+        "0 years 0 mons 0 days 0 hours 0 mins 5.000000000 secs"
     );
     test_expression!(
         "interval '0.5 minute'",
-        "0 years 0 mons 0 days 0 hours 0 mins 30.000 secs"
+        "0 years 0 mons 0 days 0 hours 0 mins 30.000000000 secs"
     );
     test_expression!(
         "interval '.5 minute'",
-        "0 years 0 mons 0 days 0 hours 0 mins 30.000 secs"
+        "0 years 0 mons 0 days 0 hours 0 mins 30.000000000 secs"
     );
     test_expression!(
         "interval '5 minute'",
-        "0 years 0 mons 0 days 0 hours 5 mins 0.000 secs"
+        "0 years 0 mons 0 days 0 hours 5 mins 0.000000000 secs"
     );
     test_expression!(
         "interval '5 minute 1 second'",
-        "0 years 0 mons 0 days 0 hours 5 mins 1.000 secs"
+        "0 years 0 mons 0 days 0 hours 5 mins 1.000000000 secs"
     );
     test_expression!(
         "interval '1 hour'",
-        "0 years 0 mons 0 days 1 hours 0 mins 0.000 secs"
+        "0 years 0 mons 0 days 1 hours 0 mins 0.000000000 secs"
     );
     test_expression!(
         "interval '5 hour'",
-        "0 years 0 mons 0 days 5 hours 0 mins 0.000 secs"
+        "0 years 0 mons 0 days 5 hours 0 mins 0.000000000 secs"
     );
     test_expression!(
         "interval '1 day'",
-        "0 years 0 mons 1 days 0 hours 0 mins 0.000 secs"
+        "0 years 0 mons 1 days 0 hours 0 mins 0.000000000 secs"
     );
     test_expression!(
         "interval '1 week'",
-        "0 years 0 mons 7 days 0 hours 0 mins 0.000 secs"
+        "0 years 0 mons 7 days 0 hours 0 mins 0.000000000 secs"
     );
     test_expression!(
         "interval '2 weeks'",
-        "0 years 0 mons 14 days 0 hours 0 mins 0.000 secs"
+        "0 years 0 mons 14 days 0 hours 0 mins 0.000000000 secs"
     );
     test_expression!(
         "interval '1 day 1'",
-        "0 years 0 mons 1 days 0 hours 0 mins 1.000 secs"
+        "0 years 0 mons 1 days 0 hours 0 mins 1.000000000 secs"
     );
     test_expression!(
         "interval '0.5'",
-        "0 years 0 mons 0 days 0 hours 0 mins 0.500 secs"
+        "0 years 0 mons 0 days 0 hours 0 mins 0.500000000 secs"
     );
     test_expression!(
         "interval '0.5 day 1'",
-        "0 years 0 mons 0 days 12 hours 0 mins 1.000 secs"
+        "0 years 0 mons 0 days 12 hours 0 mins 1.000000000 secs"
     );
     test_expression!(
         "interval '0.49 day'",
-        "0 years 0 mons 0 days 11 hours 45 mins 36.000 secs"
+        "0 years 0 mons 0 days 11 hours 45 mins 36.000000000 secs"
     );
     test_expression!(
         "interval '0.499 day'",
-        "0 years 0 mons 0 days 11 hours 58 mins 33.600 secs"
+        "0 years 0 mons 0 days 11 hours 58 mins 33.600000000 secs"
     );
     test_expression!(
         "interval '0.4999 day'",
-        "0 years 0 mons 0 days 11 hours 59 mins 51.360 secs"
+        "0 years 0 mons 0 days 11 hours 59 mins 51.360000000 secs"
     );
     test_expression!(
         "interval '0.49999 day'",
-        "0 years 0 mons 0 days 11 hours 59 mins 59.136 secs"
+        "0 years 0 mons 0 days 11 hours 59 mins 59.136000000 secs"
     );
     test_expression!(
         "interval '0.49999999999 day'",
         "0 years 0 mons 0 days 11 hours 59 mins 59.999999136 secs"
     );
     test_expression!(
         "interval '5 day'",
-        "0 years 0 mons 5 days 0 hours 0 mins 0.000 secs"
+        "0 years 0 mons 5 days 0 hours 0 mins 0.000000000 secs"
     );
     // Hour is ignored, this matches PostgreSQL
     test_expression!(
         "interval '5 day' hour",
-        "0 years 0 mons 5 days 0 hours 0 mins 0.000 secs"
+        "0 years 0 mons 5 days 0 hours 0 mins 0.000000000 secs"
     );
     test_expression!(
         "interval '5 day 4 hours 3 minutes 2 seconds 100 milliseconds'",
-        "0 years 0 mons 5 days 4 hours 3 mins 2.100 secs"
+        "0 years 0 mons 5 days 4 hours 3 mins 2.100000000 secs"
     );
     // month intervals
     test_expression!(
         "interval '0.5 month'",
-        "0 years 0 mons 15 days 0 hours 0 mins 0.000 secs"
+        "0 years 0 mons 15 days 0 hours 0 mins 0.000000000 secs"
     );
     test_expression!(
         "interval '0.5' month",
-        "0 years 0 mons 15 days 0 hours 0 mins 0.000 secs"
+        "0 years 0 mons 15 days 0 hours 0 mins 0.000000000 secs"
     );
     test_expression!(
         "interval '1 month'",
-        "0 years 1 mons 0 days 0 hours 0 mins 0.00 secs"
+        "0 years 1 mons 0 days 0 hours 0 mins 0.000000000 secs"
     );
     test_expression!(
         "interval '1' MONTH",
-        "0 years 1 mons 0 days 0 hours 0 mins 0.00 secs"
+        "0 years 1 mons 0 days 0 hours 0 mins 0.000000000 secs"
     );
     test_expression!(
         "interval '5 month'",
-        "0 years 5 mons 0 days 0 hours 0 mins 0.00 secs"
+        "0 years 5 mons 0 days 0 hours 0 mins 0.000000000 secs"
     );
     test_expression!(
         "interval '13 month'",
-        "1 years 1 mons 0 days 0 hours 0 mins 0.00 secs"
+        "0 years 13 mons 0 days 0 hours 0 mins 0.000000000 secs"

Review Comment:
   > FWIW this was a conscious change when the parsing logic was ported, to faithfully preserve what the user specified. This was mainly in the context of rounding days to months and hours to days, which were simply incorrect, but months to years got the same treatment
   
   Make sense to me👍



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org