You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/03/04 22:35:07 UTC

[GitHub] [arrow] jonkeane commented on a change in pull request #12482: ARROW-15701 [R] month() should allow integer inputs

jonkeane commented on a change in pull request #12482:
URL: https://github.com/apache/arrow/pull/12482#discussion_r819959724



##########
File path: r/R/dplyr-funcs-datetime.R
##########
@@ -105,7 +105,22 @@ register_bindings_datetime <- function() {
     (call_binding("yday", x) - 1) %/% 7 + 1
   })
 
-  register_binding("month", function(x, label = FALSE, abbr = TRUE, locale = Sys.getlocale("LC_TIME")) {
+  register_binding("month", function(x,
+                                     label = FALSE,
+                                     abbr = TRUE,
+                                     locale = Sys.getlocale("LC_TIME")) {
+
+    if (call_binding("is.integer", x)) {
+      if (is.integer(x)) {
+        x <- build_expr("cast", x, options = cast_options(to_type = int32()))
+      }

Review comment:
       This bit looks repetitive (these should already be integers — and this cast would then attempt to cast them to integers)? Are you trying to ensure that things aren't int64()? Or something else? 

##########
File path: r/tests/testthat/test-dplyr-funcs-datetime.R
##########
@@ -820,6 +816,79 @@ test_that("dst extracts daylight savings time correctly", {
   )
 })
 
+test_that("month() supports integer input", {
+    test_df_month <- tibble(
+      month_as_int = c(1:12, NA)
+    )
+
+    compare_dplyr_binding(
+      .input %>%
+        mutate(month_int_input = month(month_as_int)) %>%
+        collect(),
+      test_df_month
+    )
+
+    skip_on_os("windows") # https://issues.apache.org/jira/browse/ARROW-13168
+
+    compare_dplyr_binding(
+      .input %>%
+        # R returns ordered factor whereas Arrow returns character
+        mutate(
+          month_int_input = as.character(month(month_as_int, label = TRUE))
+        ) %>%
+        collect(),
+      test_df_month
+    )
+
+    compare_dplyr_binding(
+      .input %>%
+        # R returns ordered factor whereas Arrow returns character
+        mutate(
+          month_int_input = as.character(
+            month(month_as_int, label = TRUE, abbr = FALSE)
+          )) %>%

Review comment:
       ```suggestion
             )
           ) %>%
   ```
   
   Minor style nit

##########
File path: r/R/dplyr-funcs-datetime.R
##########
@@ -105,7 +105,22 @@ register_bindings_datetime <- function() {
     (call_binding("yday", x) - 1) %/% 7 + 1
   })
 
-  register_binding("month", function(x, label = FALSE, abbr = TRUE, locale = Sys.getlocale("LC_TIME")) {
+  register_binding("month", function(x,
+                                     label = FALSE,
+                                     abbr = TRUE,
+                                     locale = Sys.getlocale("LC_TIME")) {
+
+    if (call_binding("is.integer", x)) {
+      if (is.integer(x)) {
+        x <- build_expr("cast", x, options = cast_options(to_type = int32()))
+      }
+      x <- call_binding("if_else",
+                        call_binding("between", x, 1, 12),
+                        x,
+                        NA_integer_)
+      x <- build_expr("cast", x * 28L, options = cast_options(to_type = date32()))

Review comment:
       We could actually short-circuit here and return the expression with simply `if_else` when `label = FALSE`, since that will return an integer itself, yeah? 
   
   I agree that this is a little bit unorthodox, but I _think_ it works. An alternative would be to put this value into a date string with a call to the `paste` binding and then doing `strptime` on that. Both aren't super ideal — but I suspect that this integer multiplication is going to be more performant (though if we are basing a decision on that, we probably should measure it!). 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org