You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/06/27 14:45:01 UTC

[GitHub] [arrow] rok opened a new pull request, #13440: ARROW-14819: [R] Binding for lubridate::qday

rok opened a new pull request, #13440:
URL: https://github.com/apache/arrow/pull/13440

   This is to resolve [ARROW-14819](https://issues.apache.org/jira/browse/ARROW-14819).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] rok commented on a diff in pull request #13440: ARROW-14819: [R] Binding for lubridate::qday

Posted by GitBox <gi...@apache.org>.
rok commented on code in PR #13440:
URL: https://github.com/apache/arrow/pull/13440#discussion_r911404868


##########
r/R/dplyr-funcs-datetime.R:
##########
@@ -172,6 +172,15 @@ register_bindings_datetime_components <- function() {
     (call_binding("yday", x) - 1) %/% 7 + 1
   })
 
+  register_binding("qday", function(x) {
+    floored_x <- Expression$create("floor_temporal", x, options = list(unit = 9L, calendar_based_origin = FALSE))
+    days_between <- Expression$create("days_between", floored_x, x)
+    if (call_binding("is.Date", x)) {
+        return(Expression$create("add", days_between, Expression$scalar(1L)))
+    }
+    days_between

Review Comment:
   Having found the c++ timezone bug I switched back to `days_between` kernel and added a comment about what this is doing.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] rok commented on a diff in pull request #13440: ARROW-14819: [R] Binding for lubridate::qday

Posted by GitBox <gi...@apache.org>.
rok commented on code in PR #13440:
URL: https://github.com/apache/arrow/pull/13440#discussion_r908346786


##########
r/R/dplyr-funcs-datetime.R:
##########
@@ -172,6 +172,15 @@ register_bindings_datetime_components <- function() {
     (call_binding("yday", x) - 1) %/% 7 + 1
   })
 
+  register_binding("qday", function(x) {
+    floored_x <- Expression$create("floor_temporal", x, options = list(unit = 9L, calendar_based_origin = FALSE))
+    days_between <- Expression$create("days_between", floored_x, x)
+    if (call_binding("is.Date", x)) {
+        return(Expression$create("add", days_between, Expression$scalar(1L)))
+    }
+    days_between

Review Comment:
   I thought it was about date counting as the end of the day but it turned out to be a timezone issue in disguise. Fixing.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] rok merged pull request #13440: ARROW-14819: [R] Binding for lubridate::qday

Posted by GitBox <gi...@apache.org>.
rok merged PR #13440:
URL: https://github.com/apache/arrow/pull/13440


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] rok commented on pull request #13440: ARROW-14819: [R] Binding for lubridate::qday

Posted by GitBox <gi...@apache.org>.
rok commented on PR #13440:
URL: https://github.com/apache/arrow/pull/13440#issuecomment-1175360706

   @thisisnic after the rebase this is now (almost) c++ free :).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] paleolimbot commented on a diff in pull request #13440: ARROW-14819: [R] Binding for lubridate::qday

Posted by GitBox <gi...@apache.org>.
paleolimbot commented on code in PR #13440:
URL: https://github.com/apache/arrow/pull/13440#discussion_r914302572


##########
r/tests/testthat/test-dplyr-funcs-datetime.R:
##########
@@ -719,6 +734,22 @@ test_that("extract yday from date", {
   )
 })
 
+test_that("extract qday from date", {
+  compare_dplyr_binding(
+    .input %>%
+      mutate(x = qday(date)) %>%
+      collect(),
+    test_df

Review Comment:
   Same as above for this (use a tibble where `date` is a sequence spanning a year to make sure that boundary days are handled).



##########
r/tests/testthat/test-dplyr-funcs-datetime.R:
##########
@@ -533,6 +533,21 @@ test_that("extract yday from timestamp", {
   )
 })
 
+test_that("extract qday from timestamp", {
+  compare_dplyr_binding(
+    .input %>%
+      mutate(x = qday(datetime)) %>%
+      collect(),
+    test_df

Review Comment:
   Rather than `test_df`, it might make sense for this binding to use a sequence of dates (or datetimes) that spans a year (or maybe that spans a year and a leap year if that matters here). Maybe something like
   
   ``` r
   tibble::tibble(date = seq(as.Date("2000-01-01"), as.Date("2000-12-31"), by = "day"))
   #> # A tibble: 366 × 1
   #>    date      
   #>    <date>    
   #>  1 2000-01-01
   #>  2 2000-01-02
   #>  3 2000-01-03
   #>  4 2000-01-04
   #>  5 2000-01-05
   #>  6 2000-01-06
   #>  7 2000-01-07
   #>  8 2000-01-08
   #>  9 2000-01-09
   #> 10 2000-01-10
   #> # … with 356 more rows
   tibble::tibble(datetime = seq(as.POSIXct("2000-01-01 00:00:00"), as.POSIXct("2000-12-31 23:00:00"), by = "day"))
   #> # A tibble: 366 × 1
   #>    datetime           
   #>    <dttm>             
   #>  1 2000-01-01 00:00:00
   #>  2 2000-01-02 00:00:00
   #>  3 2000-01-03 00:00:00
   #>  4 2000-01-04 00:00:00
   #>  5 2000-01-05 00:00:00
   #>  6 2000-01-06 00:00:00
   #>  7 2000-01-07 00:00:00
   #>  8 2000-01-08 00:00:00
   #>  9 2000-01-09 00:00:00
   #> 10 2000-01-10 00:00:00
   #> # … with 356 more rows
   ```
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] jonkeane commented on a diff in pull request #13440: ARROW-14819: [R] Binding for lubridate::qday

Posted by GitBox <gi...@apache.org>.
jonkeane commented on code in PR #13440:
URL: https://github.com/apache/arrow/pull/13440#discussion_r923805105


##########
r/R/dplyr-funcs-datetime.R:
##########
@@ -209,6 +209,13 @@ register_bindings_datetime_components <- function() {
     build_expr("month", x)
   })
 
+  register_binding("lubridate::qday", function(x) {
+    # We calculate day of quarter by flooring timestamp to beginning of quarter and
+    # calculating days between beginning of quarter and timestamp/date in question.
+    floored_x <- build_expr("floor_temporal", x, options = list(unit = 9L))
+    build_expr("days_between", floored_x, x) + Expression$scalar(1L)

Review Comment:
   This is extremely minor, and take it or leave it: but it did take me a second to think through why we are adding 1 here. Maybe we could add it to the comment up above?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] rok commented on a diff in pull request #13440: ARROW-14819: [R] Binding for lubridate::qday

Posted by GitBox <gi...@apache.org>.
rok commented on code in PR #13440:
URL: https://github.com/apache/arrow/pull/13440#discussion_r924717072


##########
r/tests/testthat/test-dplyr-funcs-datetime.R:
##########
@@ -574,6 +574,26 @@ test_that("extract yday from timestamp", {
   )
 })
 
+test_that("extract qday from timestamp", {
+  test_df <- tibble::tibble(
+    time = as.POSIXct(seq(as.Date("1999-12-31"), as.Date("2001-01-01"), by = "day"))

Review Comment:
   That's a good point @paleolimbot! It's as if calling `qday` with `mutate` in `compare_dplyr_binding` returns an `int64` with `tzone = "UTC"`. Meanwhile calling it with `transmute` returns correctly.
   
   Would this be considered a sharp edge?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] rok commented on a diff in pull request #13440: ARROW-14819: [R] Binding for lubridate::qday

Posted by GitBox <gi...@apache.org>.
rok commented on code in PR #13440:
URL: https://github.com/apache/arrow/pull/13440#discussion_r914709427


##########
r/tests/testthat/test-dplyr-funcs-datetime.R:
##########
@@ -533,6 +533,21 @@ test_that("extract yday from timestamp", {
   )
 })
 
+test_that("extract qday from timestamp", {
+  compare_dplyr_binding(
+    .input %>%
+      mutate(x = qday(datetime)) %>%
+      collect(),
+    test_df

Review Comment:
   I love the idea of more complete tests! I would actually propose a development-time test suite (it seems overkill for CI) that tests every moment over the past century.
   
   The test you're proposing however hits this bug where rounding kernels interpret 32 bit arrays as 64 bit ones ([ARROW-16142](https://issues.apache.org/jira/browse/ARROW-16142)) so I suppose we really need to fix this now.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] rok commented on a diff in pull request #13440: ARROW-14819: [R] Binding for lubridate::qday

Posted by GitBox <gi...@apache.org>.
rok commented on code in PR #13440:
URL: https://github.com/apache/arrow/pull/13440#discussion_r925046266


##########
r/tests/testthat/test-dplyr-funcs-datetime.R:
##########
@@ -574,6 +574,26 @@ test_that("extract yday from timestamp", {
   )
 })
 
+test_that("extract qday from timestamp", {
+  test_df <- tibble::tibble(
+    time = as.POSIXct(seq(as.Date("1999-12-31"), as.Date("2001-01-01"), by = "day"))

Review Comment:
   https://issues.apache.org/jira/browse/ARROW-17132



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] thisisnic commented on a diff in pull request #13440: ARROW-14819: [R] Binding for lubridate::qday

Posted by GitBox <gi...@apache.org>.
thisisnic commented on code in PR #13440:
URL: https://github.com/apache/arrow/pull/13440#discussion_r908192224


##########
r/R/dplyr-funcs-datetime.R:
##########
@@ -172,6 +172,15 @@ register_bindings_datetime_components <- function() {
     (call_binding("yday", x) - 1) %/% 7 + 1
   })
 
+  register_binding("qday", function(x) {
+    floored_x <- Expression$create("floor_temporal", x, options = list(unit = 9L, calendar_based_origin = FALSE))
+    days_between <- Expression$create("days_between", floored_x, x)
+    if (call_binding("is.Date", x)) {
+        return(Expression$create("add", days_between, Expression$scalar(1L)))
+    }
+    days_between

Review Comment:
   Please can you add a brief comment on why the difference here?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] thisisnic commented on a diff in pull request #13440: ARROW-14819: [R] Binding for lubridate::qday

Posted by GitBox <gi...@apache.org>.
thisisnic commented on code in PR #13440:
URL: https://github.com/apache/arrow/pull/13440#discussion_r909328054


##########
r/R/dplyr-funcs-datetime.R:
##########
@@ -172,6 +172,15 @@ register_bindings_datetime_components <- function() {
     (call_binding("yday", x) - 1) %/% 7 + 1
   })
 
+  register_binding("qday", function(x) {
+    x <- build_expr("floor_temporal", x, options = list(unit = 6L, calendar_based_origin = FALSE))
+    x <- build_expr("cast", x, options = cast_options(to_type = timestamp("s")))
+    floored_x <- build_expr("floor_temporal", x, options = list(unit = 9L, calendar_based_origin = FALSE))
+    duration_between <- build_expr("subtract", x, floored_x)
+    seconds_between <- build_expr("cast", duration_between, options = cast_options(to_type = int64()))
+    build_expr("floor", seconds_between / Expression$scalar(86400L) + Expression$scalar(1L))

Review Comment:
   Please can you add some comments here explaining why we need these steps?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] paleolimbot commented on a diff in pull request #13440: ARROW-14819: [R] Binding for lubridate::qday

Posted by GitBox <gi...@apache.org>.
paleolimbot commented on code in PR #13440:
URL: https://github.com/apache/arrow/pull/13440#discussion_r923971589


##########
r/tests/testthat/test-dplyr-funcs-datetime.R:
##########
@@ -574,6 +574,26 @@ test_that("extract yday from timestamp", {
   )
 })
 
+test_that("extract qday from timestamp", {
+  test_df <- tibble::tibble(
+    time = as.POSIXct(seq(as.Date("1999-12-31"), as.Date("2001-01-01"), by = "day"))

Review Comment:
   Would `seq(as.POSIXct("1999-12-31", tz = "UTC"), as.POSIXct("2001-01-01", tz = "UTC"), by = "day")` let you drop the `ignore_attr` bit below? As a new reader of this code I'm wondering why the timezone needs to be ignored.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] paleolimbot commented on a diff in pull request #13440: ARROW-14819: [R] Binding for lubridate::qday

Posted by GitBox <gi...@apache.org>.
paleolimbot commented on code in PR #13440:
URL: https://github.com/apache/arrow/pull/13440#discussion_r914796111


##########
r/tests/testthat/test-dplyr-funcs-datetime.R:
##########
@@ -533,6 +533,21 @@ test_that("extract yday from timestamp", {
   )
 })
 
+test_that("extract qday from timestamp", {
+  compare_dplyr_binding(
+    .input %>%
+      mutate(x = qday(datetime)) %>%
+      collect(),
+    test_df

Review Comment:
   You could try that and see how long it takes...it might only be a few ms and then I'd say keep it in the regular test suite. We do have a mechanism for running extra tests but right now it's limited to the large memory tests (via the env var ARROW_LARGE_MEMORY_TESTS). Given that a single real-world poke at this exposed an error, I'd  say at least a year is a must in our normal test suite.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] paleolimbot commented on pull request #13440: ARROW-14819: [R] Binding for lubridate::qday

Posted by GitBox <gi...@apache.org>.
paleolimbot commented on PR #13440:
URL: https://github.com/apache/arrow/pull/13440#issuecomment-1168049310

   (I'm on vacation this week but look forward to taking a look on Monday!)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] kou commented on a diff in pull request #13440: ARROW-14819: [R] Binding for lubridate::qday

Posted by GitBox <gi...@apache.org>.
kou commented on code in PR #13440:
URL: https://github.com/apache/arrow/pull/13440#discussion_r907904202


##########
r/NEWS.md:
##########
@@ -17,6 +17,12 @@
   under the License.
 -->
 
+# arrow 9.0.0
+
+## Enhancements to date and time support
+
+* added `lubridate::qday()` (day of quarter)

Review Comment:
   Could you  put this to `# arrow 8.0.0.9000` instead of creating `# arrow 9.0.0`?
   `# arrow 8.0.0.9000` will be renamed to `# arrow 9.0.0` in our release process.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] paleolimbot commented on a diff in pull request #13440: ARROW-14819: [R] Binding for lubridate::qday

Posted by GitBox <gi...@apache.org>.
paleolimbot commented on code in PR #13440:
URL: https://github.com/apache/arrow/pull/13440#discussion_r924790914


##########
r/tests/testthat/test-dplyr-funcs-datetime.R:
##########
@@ -574,6 +574,26 @@ test_that("extract yday from timestamp", {
   )
 })
 
+test_that("extract qday from timestamp", {
+  test_df <- tibble::tibble(
+    time = as.POSIXct(seq(as.Date("1999-12-31"), as.Date("2001-01-01"), by = "day"))

Review Comment:
   I believe that's an error in restoring the R metadata (this PR doesn't touch R attributes to my reading). It would be helpful to file a `reprex()` in a JIRA so that we can fix this later (an int64 should never have a timezone attribute!), but it's definitely not this PR's problem.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] rok commented on a diff in pull request #13440: ARROW-14819: [R] Binding for lubridate::qday

Posted by GitBox <gi...@apache.org>.
rok commented on code in PR #13440:
URL: https://github.com/apache/arrow/pull/13440#discussion_r924860370


##########
r/tests/testthat/test-dplyr-funcs-datetime.R:
##########
@@ -574,6 +574,26 @@ test_that("extract yday from timestamp", {
   )
 })
 
+test_that("extract qday from timestamp", {
+  test_df <- tibble::tibble(
+    time = as.POSIXct(seq(as.Date("1999-12-31"), as.Date("2001-01-01"), by = "day"))

Review Comment:
   In that case I'll open a Jira and merge this.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] rok commented on a diff in pull request #13440: ARROW-14819: [R] Binding for lubridate::qday

Posted by GitBox <gi...@apache.org>.
rok commented on code in PR #13440:
URL: https://github.com/apache/arrow/pull/13440#discussion_r909101940


##########
r/R/dplyr-funcs-datetime.R:
##########
@@ -172,6 +172,15 @@ register_bindings_datetime_components <- function() {
     (call_binding("yday", x) - 1) %/% 7 + 1
   })
 
+  register_binding("qday", function(x) {
+    floored_x <- Expression$create("floor_temporal", x, options = list(unit = 9L, calendar_based_origin = FALSE))
+    days_between <- Expression$create("days_between", floored_x, x)
+    if (call_binding("is.Date", x)) {
+        return(Expression$create("add", days_between, Expression$scalar(1L)))
+    }
+    days_between

Review Comment:
   It turns out that ["days_between" acero kernel](https://arrow.apache.org/docs/python/generated/pyarrow.compute.days_between.html) is doing something different than [qday is](https://github.com/tidyverse/lubridate/blob/c4714046ec412e669906ab8d1cd9a8e4fb125e13/R/accessors-day.r#L114-L117). This has way more moving pieces now.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] rok commented on a diff in pull request #13440: ARROW-14819: [R] Binding for lubridate::qday

Posted by GitBox <gi...@apache.org>.
rok commented on code in PR #13440:
URL: https://github.com/apache/arrow/pull/13440#discussion_r917955333


##########
r/tests/testthat/test-dplyr-funcs-datetime.R:
##########
@@ -533,6 +533,21 @@ test_that("extract yday from timestamp", {
   )
 })
 
+test_that("extract qday from timestamp", {
+  compare_dplyr_binding(
+    .input %>%
+      mutate(x = qday(datetime)) %>%
+      collect(),
+    test_df

Review Comment:
   Ok, [ARROW-16142](https://issues.apache.org/jira/browse/ARROW-16142) was resolved so this is now ready.



##########
r/tests/testthat/test-dplyr-funcs-datetime.R:
##########
@@ -719,6 +734,22 @@ test_that("extract yday from date", {
   )
 })
 
+test_that("extract qday from date", {
+  compare_dplyr_binding(
+    .input %>%
+      mutate(x = qday(date)) %>%
+      collect(),
+    test_df

Review Comment:
   Done.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] rok commented on a diff in pull request #13440: ARROW-14819: [R] Binding for lubridate::qday

Posted by GitBox <gi...@apache.org>.
rok commented on code in PR #13440:
URL: https://github.com/apache/arrow/pull/13440#discussion_r914810441


##########
r/tests/testthat/test-dplyr-funcs-datetime.R:
##########
@@ -533,6 +533,21 @@ test_that("extract yday from timestamp", {
   )
 })
 
+test_that("extract qday from timestamp", {
+  compare_dplyr_binding(
+    .input %>%
+      mutate(x = qday(datetime)) %>%
+      collect(),
+    test_df

Review Comment:
   Yeah, why not. I'll give it a try.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] github-actions[bot] commented on pull request #13440: ARROW-14819: [R] Binding for lubridate::qday

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on PR #13440:
URL: https://github.com/apache/arrow/pull/13440#issuecomment-1167461519

   :warning: Ticket **has not been started in JIRA**, please click 'Start Progress'.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] dragosmg commented on a diff in pull request #13440: ARROW-14819: [R] Binding for lubridate::qday

Posted by GitBox <gi...@apache.org>.
dragosmg commented on code in PR #13440:
URL: https://github.com/apache/arrow/pull/13440#discussion_r908643919


##########
r/tests/testthat/test-dplyr-funcs-datetime.R:
##########
@@ -719,6 +728,15 @@ test_that("extract yday from date", {
   )
 })
 
+test_that("extract qday from date", {
+  compare_dplyr_binding(
+    .input %>%
+      mutate(x = qday(date)) %>%
+      collect(),
+    test_df
+  )
+})
+

Review Comment:
   Looks great so far. Could you add tests with a regular R object? Something like:
   ```r
   compare_dplyr_binding(
   .input %>%
     mutate(y = qday(as.Date("2022-06-28")) %>%
     collect(),
   test_df
   )
   ```
   
   I think this would fail, but there's an easy solution: replace the `Expression$create()` call with a call to `build_expr()` (which automatically converts an R object into a scalar).



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] dragosmg commented on a diff in pull request #13440: ARROW-14819: [R] Binding for lubridate::qday

Posted by GitBox <gi...@apache.org>.
dragosmg commented on code in PR #13440:
URL: https://github.com/apache/arrow/pull/13440#discussion_r909521107


##########
r/tests/testthat/test-dplyr-funcs-datetime.R:
##########
@@ -533,6 +533,15 @@ test_that("extract yday from timestamp", {
   )
 })
 
+test_that("extract qday from timestamp", {
+  compare_dplyr_binding(
+    .input %>%
+      mutate(x = qday(datetime)) %>%
+      collect(),
+    test_df
+  )
+})

Review Comment:
   ```suggestion
     compare_dplyr_binding(
        .input %>%
          mutate(x = qday(as.POSIXct("2022-06-29 12:35))) %>%
          collect(),
        test_df
      )
   })
   ```



##########
r/tests/testthat/test-dplyr-funcs-datetime.R:
##########
@@ -719,6 +728,22 @@ test_that("extract yday from date", {
   )
 })
 
+test_that("extract qday from date", {
+  compare_dplyr_binding(
+    .input %>%
+      mutate(x = qday(date)) %>%
+      collect(),
+    test_df
+  )
+
+  compare_dplyr_binding(
+    .input %>%
+       mutate(y = qday(as.Date(date))) %>%

Review Comment:
   ```suggestion
          mutate(y = qday(as.Date("2022-06-29"))) %>%
   ```



##########
r/tests/testthat/test-dplyr-funcs-datetime.R:
##########
@@ -719,6 +728,15 @@ test_that("extract yday from date", {
   )
 })
 
+test_that("extract qday from date", {
+  compare_dplyr_binding(
+    .input %>%
+      mutate(x = qday(date)) %>%
+      collect(),
+    test_df
+  )
+})
+

Review Comment:
   I added 2 suggestions. 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] rok commented on a diff in pull request #13440: ARROW-14819: [R] Binding for lubridate::qday

Posted by GitBox <gi...@apache.org>.
rok commented on code in PR #13440:
URL: https://github.com/apache/arrow/pull/13440#discussion_r907692924


##########
r/src/compute.cpp:
##########
@@ -519,6 +519,35 @@ std::shared_ptr<arrow::compute::FunctionOptions> make_compute_options(
     return out;
   }
 
+  if (func_name == "round_temporal" || func_name == "floor_temporal" ||
+      func_name == "ceil_temporal") {
+    using Options = arrow::compute::RoundTemporalOptions;
+
+    int64_t multiple = 1;
+    enum arrow::compute::CalendarUnit unit = arrow::compute::CalendarUnit::DAY;
+    bool week_starts_monday = true;
+    bool ceil_is_strictly_greater = true;
+    bool calendar_based_origin = true;
+
+    if (!Rf_isNull(options["multiple"])) {
+      multiple = cpp11::as_cpp<int64_t>(options["multiple"]);
+    }
+    if (!Rf_isNull(options["unit"])) {
+      unit = cpp11::as_cpp<enum arrow::compute::CalendarUnit>(options["unit"]);
+    }
+    if (!Rf_isNull(options["week_starts_monday"])) {
+      week_starts_monday = cpp11::as_cpp<bool>(options["week_starts_monday"]);
+    }
+    if (!Rf_isNull(options["ceil_is_strictly_greater"])) {
+      ceil_is_strictly_greater = cpp11::as_cpp<bool>(options["ceil_is_strictly_greater"]);
+    }
+    if (!Rf_isNull(options["calendar_based_origin"])) {
+      calendar_based_origin = cpp11::as_cpp<bool>(options["calendar_based_origin"]);
+    }
+    return std::make_shared<Options>(multiple, unit, week_starts_monday,
+                                     ceil_is_strictly_greater, calendar_based_origin);
+  }
+

Review Comment:
   This is lifted from #12154.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] rok commented on a diff in pull request #13440: ARROW-14819: [R] Binding for lubridate::qday

Posted by GitBox <gi...@apache.org>.
rok commented on code in PR #13440:
URL: https://github.com/apache/arrow/pull/13440#discussion_r908108387


##########
r/NEWS.md:
##########
@@ -17,6 +17,12 @@
   under the License.
 -->
 
+# arrow 9.0.0
+
+## Enhancements to date and time support
+
+* added `lubridate::qday()` (day of quarter)

Review Comment:
   I wasn't aware, thanks and done! :)



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] ursabot commented on pull request #13440: ARROW-14819: [R] Binding for lubridate::qday

Posted by GitBox <gi...@apache.org>.
ursabot commented on PR #13440:
URL: https://github.com/apache/arrow/pull/13440#issuecomment-1191153340

   Benchmark runs are scheduled for baseline = 39980dcdbfbc74ee1bdd345591a728dbc6e21dfe and contender = 0330353a93eff799616bf66e0e994236393458f7. 0330353a93eff799616bf66e0e994236393458f7 is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
   Conbench compare runs links:
   [Failed :arrow_down:0.0% :arrow_up:0.0%] [ec2-t3-xlarge-us-east-2](https://conbench.ursa.dev/compare/runs/14cbdfdd35df4bf9add736aac64d3853...f6291a15518e4d0f9ab8f5ae8a815af0/)
   [Failed :arrow_down:0.1% :arrow_up:0.0%] [test-mac-arm](https://conbench.ursa.dev/compare/runs/dc123a514b864e75843f994f363cd6c0...5efba42132f54cf2975fda287c4edbfc/)
   [Failed :arrow_down:0.0% :arrow_up:0.0%] [ursa-i9-9960x](https://conbench.ursa.dev/compare/runs/3ffaea3794e14180a7779a399c270fd7...d86d24695178400cb10f9229238a1303/)
   [Finished :arrow_down:0.36% :arrow_up:0.04%] [ursa-thinkcentre-m75q](https://conbench.ursa.dev/compare/runs/19c734896f4f4477849aedc5391b8375...fcc23fcd8b7b4afa8f6f8e781e85931e/)
   Buildkite builds:
   [Failed] [`0330353a` ec2-t3-xlarge-us-east-2](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ec2-t3-xlarge-us-east-2/builds/1155)
   [Failed] [`0330353a` test-mac-arm](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-test-mac-arm/builds/1160)
   [Failed] [`0330353a` ursa-i9-9960x](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ursa-i9-9960x/builds/1149)
   [Finished] [`0330353a` ursa-thinkcentre-m75q](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ursa-thinkcentre-m75q/builds/1166)
   [Failed] [`39980dcd` ec2-t3-xlarge-us-east-2](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ec2-t3-xlarge-us-east-2/builds/1154)
   [Failed] [`39980dcd` test-mac-arm](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-test-mac-arm/builds/1159)
   [Failed] [`39980dcd` ursa-i9-9960x](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ursa-i9-9960x/builds/1148)
   [Finished] [`39980dcd` ursa-thinkcentre-m75q](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ursa-thinkcentre-m75q/builds/1165)
   Supported benchmarks:
   ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only benchmarks with cloud = True
   test-mac-arm: Supported benchmark langs: C++, Python, R
   ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
   ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] rok commented on a diff in pull request #13440: ARROW-14819: [R] Binding for lubridate::qday

Posted by GitBox <gi...@apache.org>.
rok commented on code in PR #13440:
URL: https://github.com/apache/arrow/pull/13440#discussion_r909094312


##########
r/tests/testthat/test-dplyr-funcs-datetime.R:
##########
@@ -719,6 +728,15 @@ test_that("extract yday from date", {
   )
 })
 
+test_that("extract qday from date", {
+  compare_dplyr_binding(
+    .input %>%
+      mutate(x = qday(date)) %>%
+      collect(),
+    test_df
+  )
+})
+

Review Comment:
   Done. Can you please check if it's what you had in mind?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] rok commented on a diff in pull request #13440: ARROW-14819: [R] Binding for lubridate::qday

Posted by GitBox <gi...@apache.org>.
rok commented on code in PR #13440:
URL: https://github.com/apache/arrow/pull/13440#discussion_r909549309


##########
r/R/dplyr-funcs-datetime.R:
##########
@@ -172,6 +172,15 @@ register_bindings_datetime_components <- function() {
     (call_binding("yday", x) - 1) %/% 7 + 1
   })
 
+  register_binding("qday", function(x) {
+    x <- build_expr("floor_temporal", x, options = list(unit = 6L, calendar_based_origin = FALSE))
+    x <- build_expr("cast", x, options = cast_options(to_type = timestamp("s")))
+    floored_x <- build_expr("floor_temporal", x, options = list(unit = 9L, calendar_based_origin = FALSE))
+    duration_between <- build_expr("subtract", x, floored_x)
+    seconds_between <- build_expr("cast", duration_between, options = cast_options(to_type = int64()))
+    build_expr("floor", seconds_between / Expression$scalar(86400L) + Expression$scalar(1L))

Review Comment:
   I'm starting to suspect `days_between` will work once [ARROW-16932](https://issues.apache.org/jira/browse/ARROW-16932) is in. Will test.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] rok commented on a diff in pull request #13440: ARROW-14819: [R] Binding for lubridate::qday

Posted by GitBox <gi...@apache.org>.
rok commented on code in PR #13440:
URL: https://github.com/apache/arrow/pull/13440#discussion_r923872217


##########
r/R/dplyr-funcs-datetime.R:
##########
@@ -209,6 +209,13 @@ register_bindings_datetime_components <- function() {
     build_expr("month", x)
   })
 
+  register_binding("lubridate::qday", function(x) {
+    # We calculate day of quarter by flooring timestamp to beginning of quarter and
+    # calculating days between beginning of quarter and timestamp/date in question.
+    floored_x <- build_expr("floor_temporal", x, options = list(unit = 9L))
+    build_expr("days_between", floored_x, x) + Expression$scalar(1L)

Review Comment:
   ```suggestion
       # We calculate day of quarter by flooring timestamp to beginning of quarter and
       # calculating days between beginning of quarter and timestamp/date in question.
       # Since we use one one-based numbering we add one.
       floored_x <- build_expr("floor_temporal", x, options = list(unit = 9L))
       build_expr("days_between", floored_x, x) + Expression$scalar(1L)
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] rok commented on a diff in pull request #13440: ARROW-14819: [R] Binding for lubridate::qday

Posted by GitBox <gi...@apache.org>.
rok commented on code in PR #13440:
URL: https://github.com/apache/arrow/pull/13440#discussion_r923873308


##########
r/R/dplyr-funcs-datetime.R:
##########
@@ -209,6 +209,13 @@ register_bindings_datetime_components <- function() {
     build_expr("month", x)
   })
 
+  register_binding("lubridate::qday", function(x) {
+    # We calculate day of quarter by flooring timestamp to beginning of quarter and
+    # calculating days between beginning of quarter and timestamp/date in question.
+    floored_x <- build_expr("floor_temporal", x, options = list(unit = 9L))
+    build_expr("days_between", floored_x, x) + Expression$scalar(1L)

Review Comment:
   @jonkeane that's a fair question especially since documentation on `qday` is not really available.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] dragosmg commented on a diff in pull request #13440: ARROW-14819: [R] Binding for lubridate::qday

Posted by GitBox <gi...@apache.org>.
dragosmg commented on code in PR #13440:
URL: https://github.com/apache/arrow/pull/13440#discussion_r908643919


##########
r/tests/testthat/test-dplyr-funcs-datetime.R:
##########
@@ -719,6 +728,15 @@ test_that("extract yday from date", {
   )
 })
 
+test_that("extract qday from date", {
+  compare_dplyr_binding(
+    .input %>%
+      mutate(x = qday(date)) %>%
+      collect(),
+    test_df
+  )
+})
+

Review Comment:
   Looks great so far. Could you add a test with a regular R object? Something like:
   ```r
   compare_dplyr_binding(
   .input %>%
     mutate(y = qday(as.Date("2022-06-28")) %>%
     collect(),
   test_df
   )
   ```
   
   I think this would fail, but there's an easy solution: replace the `Expression$create()` call with a `build_expr()` one (which automatically converts an R object into a scalar).



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] github-actions[bot] commented on pull request #13440: ARROW-14819: [R] Binding for lubridate::qday

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on PR #13440:
URL: https://github.com/apache/arrow/pull/13440#issuecomment-1167461470

   https://issues.apache.org/jira/browse/ARROW-14819


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] rok commented on a diff in pull request #13440: ARROW-14819: [R] Binding for lubridate::qday

Posted by GitBox <gi...@apache.org>.
rok commented on code in PR #13440:
URL: https://github.com/apache/arrow/pull/13440#discussion_r909344760


##########
r/R/dplyr-funcs-datetime.R:
##########
@@ -172,6 +172,15 @@ register_bindings_datetime_components <- function() {
     (call_binding("yday", x) - 1) %/% 7 + 1
   })
 
+  register_binding("qday", function(x) {
+    x <- build_expr("floor_temporal", x, options = list(unit = 6L, calendar_based_origin = FALSE))
+    x <- build_expr("cast", x, options = cast_options(to_type = timestamp("s")))
+    floored_x <- build_expr("floor_temporal", x, options = list(unit = 9L, calendar_based_origin = FALSE))
+    duration_between <- build_expr("subtract", x, floored_x)
+    seconds_between <- build_expr("cast", duration_between, options = cast_options(to_type = int64()))
+    build_expr("floor", seconds_between / Expression$scalar(86400L) + Expression$scalar(1L))

Review Comment:
   Will do. I've not yet given up on optimising it some too :).



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] rok commented on pull request #13440: ARROW-14819: [R] Binding for lubridate::qday

Posted by GitBox <gi...@apache.org>.
rok commented on PR #13440:
URL: https://github.com/apache/arrow/pull/13440#issuecomment-1169902628

   > @dragosmg 
   > I added 2 suggestions.
   
   Merged, thank you!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org