You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by "stuartcarnie (via GitHub)" <gi...@apache.org> on 2023/03/23 07:57:50 UTC

[GitHub] [arrow-datafusion] stuartcarnie opened a new pull request, #5698: feat: `date_bin` supports MonthDayNano, microsecond and nanosecond units

stuartcarnie opened a new pull request, #5698:
URL: https://github.com/apache/arrow-datafusion/pull/5698

   # Which issue does this PR close?
   
   Closes #5697
   
   # Rationale for this change
   
   Teach `date_bin` to support for microsecond and nanosecond precision of intervals via the `MonthDayNano` type. Note that months are complicated to support and will require additional work. In doing so, we can also address #5689.
   
   # What changes are included in this PR?
   
   `date_bin` is now capable of parsing fractional intervals with a precision of microseconds and greater. The interval parser is also capable of interpreting the additional units `microsecond`, `microseconds`, `nanosecond` and `nanoseconds`, which should be additive.
   
   # Are these changes tested?
   
   Yes, unit tests for parsing intervals, validating arguments to `date_bin` and new SQL tests in the `timestamps.slt` file.
   
   # Are there any user-facing changes?
   
   Yes, users can now use additional units for intervals, such as:
   
   ```sql
   select date_bin('500 microseconds', ...)
   ```
   
   and the `date_bin` function is capable of binning microsecond and nanosecond precision intervals.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] stuartcarnie commented on a diff in pull request #5698: feat: `date_bin` supports MonthDayNano, microsecond and nanosecond units

Posted by "stuartcarnie (via GitHub)" <gi...@apache.org>.
stuartcarnie commented on code in PR #5698:
URL: https://github.com/apache/arrow-datafusion/pull/5698#discussion_r1146768915


##########
datafusion/physical-expr/src/datetime_expressions.rs:
##########
@@ -354,6 +354,24 @@ fn date_bin_impl(
                 }
             }
         }
+        ColumnarValue::Scalar(ScalarValue::IntervalMonthDayNano(Some(v))) => {
+            let (months, days, nanos) = IntervalMonthDayNanoType::to_parts(*v);
+            if months != 0 {
+                return Err(DataFusionError::NotImplemented(
+                    "DATE_BIN stride does not support month intervals".to_string(),

Review Comment:
   Correct.
   
   We'll need to round to the first of the month, and those nanosecond intervals are not regular. We'll probably be best served using the `chrono` package to handle month and year rounding.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] Weijun-H commented on a diff in pull request #5698: feat: `date_bin` supports MonthDayNano, microsecond and nanosecond units

Posted by "Weijun-H (via GitHub)" <gi...@apache.org>.
Weijun-H commented on code in PR #5698:
URL: https://github.com/apache/arrow-datafusion/pull/5698#discussion_r1145888642


##########
datafusion/common/src/parsers.rs:
##########
@@ -77,19 +77,22 @@ impl CompressionTypeVariant {
     }
 }
 
+#[rustfmt::skip]
 #[derive(Clone, Copy)]
 #[repr(u16)]
 enum IntervalType {
-    Century = 0b_00_0000_0001,
-    Decade = 0b_00_0000_0010,
-    Year = 0b_00_0000_0100,
-    Month = 0b_00_0000_1000,
-    Week = 0b_00_0001_0000,
-    Day = 0b_00_0010_0000,
-    Hour = 0b_00_0100_0000,
-    Minute = 0b_00_1000_0000,
-    Second = 0b_01_0000_0000,
-    Millisecond = 0b_10_0000_0000,
+    Century     = 0b_0000_0000_0001,

Review Comment:
   Nice Format 👍 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] Weijun-H commented on a diff in pull request #5698: feat: `date_bin` supports MonthDayNano, microsecond and nanosecond units

Posted by "Weijun-H (via GitHub)" <gi...@apache.org>.
Weijun-H commented on code in PR #5698:
URL: https://github.com/apache/arrow-datafusion/pull/5698#discussion_r1145889336


##########
datafusion/common/src/parsers.rs:
##########
@@ -77,19 +77,22 @@ impl CompressionTypeVariant {
     }
 }
 
+#[rustfmt::skip]
 #[derive(Clone, Copy)]
 #[repr(u16)]
 enum IntervalType {
-    Century = 0b_00_0000_0001,
-    Decade = 0b_00_0000_0010,
-    Year = 0b_00_0000_0100,
-    Month = 0b_00_0000_1000,
-    Week = 0b_00_0001_0000,
-    Day = 0b_00_0010_0000,
-    Hour = 0b_00_0100_0000,
-    Minute = 0b_00_1000_0000,
-    Second = 0b_01_0000_0000,
-    Millisecond = 0b_10_0000_0000,
+    Century     = 0b_0000_0000_0001,
+    Decade      = 0b_0000_0000_0010,
+    Year        = 0b_0000_0000_0100,
+    Month       = 0b_0000_0000_1000,
+    Week        = 0b_0000_0001_0000,
+    Day         = 0b_0000_0010_0000,
+    Hour        = 0b_0000_0100_0000,
+    Minute      = 0b_0000_1000_0000,
+    Second      = 0b_0001_0000_0000,
+    Millisecond = 0b_0010_0000_0000,
+    Microsecond = 0b_0100_0000_0000,
+    Nanosecond  = 0b_1000_0000_0000,

Review Comment:
   Nice Format 👍 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] alamb commented on pull request #5698: feat: `date_bin` supports MonthDayNano, microsecond and nanosecond units

Posted by "alamb (via GitHub)" <gi...@apache.org>.
alamb commented on PR #5698:
URL: https://github.com/apache/arrow-datafusion/pull/5698#issuecomment-1481834334

   Thanks @stuartcarnie  -- I actually already ported over the code in https://github.com/apache/arrow-rs/pull/3916 (because I figured it would be just as fast as writing up a ticket)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] alamb merged pull request #5698: feat: `date_bin` supports MonthDayNano, microsecond and nanosecond units

Posted by "alamb (via GitHub)" <gi...@apache.org>.
alamb merged PR #5698:
URL: https://github.com/apache/arrow-datafusion/pull/5698


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] alamb commented on a diff in pull request #5698: feat: `date_bin` supports MonthDayNano, microsecond and nanosecond units

Posted by "alamb (via GitHub)" <gi...@apache.org>.
alamb commented on code in PR #5698:
URL: https://github.com/apache/arrow-datafusion/pull/5698#discussion_r1146263257


##########
datafusion/physical-expr/src/datetime_expressions.rs:
##########
@@ -354,6 +354,24 @@ fn date_bin_impl(
                 }
             }
         }
+        ColumnarValue::Scalar(ScalarValue::IntervalMonthDayNano(Some(v))) => {
+            let (months, days, nanos) = IntervalMonthDayNanoType::to_parts(*v);
+            if months != 0 {
+                return Err(DataFusionError::NotImplemented(
+                    "DATE_BIN stride does not support month intervals".to_string(),

Review Comment:
   👍  this is because months are not a fixed number of nanoseconds, right?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] alamb commented on pull request #5698: feat: `date_bin` supports MonthDayNano, microsecond and nanosecond units

Posted by "alamb (via GitHub)" <gi...@apache.org>.
alamb commented on PR #5698:
URL: https://github.com/apache/arrow-datafusion/pull/5698#issuecomment-1481302577

   Filed https://github.com/apache/arrow-rs/pull/3916 in arrow-rs


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] stuartcarnie commented on pull request #5698: feat: `date_bin` supports MonthDayNano, microsecond and nanosecond units

Posted by "stuartcarnie (via GitHub)" <gi...@apache.org>.
stuartcarnie commented on PR #5698:
URL: https://github.com/apache/arrow-datafusion/pull/5698#issuecomment-1481824410

   @alamb thank you for the review. I can help port that code over to `arrow-rs` if I have some spare cycles – I see that will help us with implementing `::interval` casting, which is 💯 .


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org