You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/02/04 16:22:08 UTC

[GitHub] [arrow] jonkeane commented on a change in pull request #12319: ARROW-14199 [R] bindings for format (where possible)

jonkeane commented on a change in pull request #12319:
URL: https://github.com/apache/arrow/pull/12319#discussion_r798743253



##########
File path: r/R/dplyr-funcs-type.R
##########
@@ -248,3 +249,33 @@ register_bindings_type_elementwise <- function() {
     is_inf & !call_binding("is.na", is_inf)
   })
 }
+
+register_bindings_type_format <- function() {
+  register_binding("format", function(x, ...) {
+    if (inherits(x, "Expression") &&
+        x$type_id() %in% Type[c("TIMESTAMP", "DATE32", "DATE64")]) {
+      binding_format_datetime(x, ...)
+    } else {
+      abort(paste0("`format()` not yet supported for `", class(x$type())[[1]], "`"))
+    }
+  })
+}
+
+binding_format_datetime <- function(x, format = "", tz = "", usetz = FALSE) {
+
+  if (usetz) {
+    format <- paste(format, "%Z")
+  }
+
+  if (call_binding("is.POSIXct", x)) {
+    if (tz == "" && x$type()$timezone() != "") {
+      tz <- x$type()$timezone()
+    } else if (tz == "") {
+      tz <- Sys.timezone()
+    }
+    ts <- Expression$create("cast", x, options = list(to_type = timestamp(x$type()$unit(), tz)))

Review comment:
       Especially with #12240 I don't think we'll need this cast part here. Even without that PR, I'm not sure there's a huge amount of benefit of assuming (and then converting) the timezones like R does it (and that's actually counter to what Arrow is supposed to do with timestamps that don't have timezones)

##########
File path: r/tests/testthat/test-dplyr-funcs-type.R
##########
@@ -768,3 +768,84 @@ test_that("nested structs can be created from scalars and existing data frames",
     tibble(a = 1:2)
   )
 })
+
+test_that("format date/time", {
+  skip_on_os("windows") # https://issues.apache.org/jira/browse/ARROW-13168
+
+  times <- tibble(
+    datetime = c(lubridate::ymd_hms("2018-10-07 19:04:05", tz = "Etc/GMT+6"), NA),
+    date = c(as.Date("2021-01-01"), NA)
+  )
+  formats <- "%a %A %w %d %b %B %m %y %Y %H %I %p %M %z %Z %j %U %W %x %X %% %G %V %u"
+  formats_date <- "%a %A %w %d %b %B %m %y %Y %H %I %p %M %j %U %W %x %X %% %G %V %u"
+
+  compare_dplyr_binding(
+    .input %>%
+      mutate(x = format(datetime, format = formats)) %>%
+      collect(),
+    times
+  )
+
+  compare_dplyr_binding(
+    .input %>%
+      mutate(x = format(date, format = formats_date)) %>%
+      collect(),
+    times
+  )
+
+  compare_dplyr_binding(
+    .input %>%
+      mutate(x = format(datetime, format = formats, tz = "Pacific/Marquesas")) %>%
+      collect(),
+    times
+  )
+
+  compare_dplyr_binding(
+    .input %>%
+      mutate(x = format(datetime, format = formats, tz = "EST", usetz = TRUE)) %>%

Review comment:
       Like here, if we wanted to assert that formating with a different timezone gives us the right thing, that would be a good place to do something other than `"Pacific/Marquesas"`

##########
File path: r/tests/testthat/test-dplyr-funcs-type.R
##########
@@ -768,3 +768,84 @@ test_that("nested structs can be created from scalars and existing data frames",
     tibble(a = 1:2)
   )
 })
+
+test_that("format date/time", {
+  skip_on_os("windows") # https://issues.apache.org/jira/browse/ARROW-13168
+
+  times <- tibble(
+    datetime = c(lubridate::ymd_hms("2018-10-07 19:04:05", tz = "Etc/GMT+6"), NA),

Review comment:
       Is there something particular about `"Etc/GMT+6"`? We use `"Pacific/Marquesas"` elsewhere to be 1. a timezone most people don't already have configured and 2. a timezone that is different from UTC and 3. a timezone that isn't even on the hour all of which to attempt to catch places where we assume any of those three. _Generally_ if we're picking a random timezone we should favor `"Pacific/Marquesas"` unless there are specific reasons to do something else.
   
   




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org