Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2021/05/27 17:33:49 UTC

[GitHub] [arrow] jonkeane commented on a change in pull request #10334: ARROW-12198: [R] bindings for strptime

jonkeane commented on a change in pull request #10334:
URL: https://github.com/apache/arrow/pull/10334#discussion_r640790181



##########
File path: r/R/dplyr-functions.R
##########
@@ -338,3 +338,24 @@ get_stringr_pattern_options <- function(pattern) {
 contains_regex <- function(string) {
   grepl("[.\\|()[{^$*+?]", string)
 }
+
+nse_funcs$strptime <- function(x, format = "%Y-%m-%d %H:%M:%S", tz = NULL, unit = 1L) {

Review comment:
       Should this default be the more readable "ms"? The current default, `1L`, corresponds to `TimeUnit$MILLI`.

##########
File path: r/tests/testthat/test-dplyr-string-functions.R
##########
@@ -493,3 +493,81 @@ test_that("edge cases in string detection and replacement", {
     tibble(x = c("ABC"))
   )
 })
+
+test_that("strptime", {
+
+  t_string <- tibble(x = c("2018-10-07 19:04:05", NA))
+  t_stamp <- tibble(x = c(lubridate::ymd_hms("2018-10-07 19:04:05"), NA))
+  t_stampPDT <- tibble(x = c(lubridate::ymd_hms("2018-10-07 19:04:05", tz = "PDT"), NA))
+
+  expect_equivalent(
+    t_string %>%
+      Table$create() %>%
+      mutate(
+        x = strptime(x)
+      ) %>%
+      collect(),
+    t_stamp,
+    check.tzone = FALSE
+  )
+
+  expect_equivalent(
+    t_string %>%
+      Table$create() %>%
+      mutate(
+        x = strptime(x, format = "%Y-%m-%d %H:%M:%S")
+      ) %>%
+      collect(),
+    t_stamp,
+    check.tzone = FALSE
+  )
+
+  expect_equivalent(
+    t_string %>%
+      Table$create() %>%
+      mutate(
+        x = strptime(x, format = "%Y-%m-%d %H:%M:%S", unit = TimeUnit$NANO)

Review comment:
       Would it be possible to change this to:
   
   ```suggestion
           x = strptime(x, format = "%Y-%m-%d %H:%M:%S", unit = "ns")
   ```
   
   So it's a little closer to what we expect a person would use in practice?

##########
File path: r/tests/testthat/test-dplyr-string-functions.R
##########
@@ -493,3 +493,81 @@ test_that("edge cases in string detection and replacement", {
     tibble(x = c("ABC"))
   )
 })
+
+test_that("strptime", {
+
+  t_string <- tibble(x = c("2018-10-07 19:04:05", NA))
+  t_stamp <- tibble(x = c(lubridate::ymd_hms("2018-10-07 19:04:05"), NA))
+  t_stampPDT <- tibble(x = c(lubridate::ymd_hms("2018-10-07 19:04:05", tz = "PDT"), NA))

Review comment:
       It doesn't look like this is used later, though I could be missing something. If not, could you remove it?

##########
File path: r/tests/testthat/test-dplyr-string-functions.R
##########
@@ -493,3 +493,81 @@ test_that("edge cases in string detection and replacement", {
     tibble(x = c("ABC"))
   )
 })
+
+test_that("strptime", {
+
+  t_string <- tibble(x = c("2018-10-07 19:04:05", NA))
+  t_stamp <- tibble(x = c(lubridate::ymd_hms("2018-10-07 19:04:05"), NA))
+  t_stampPDT <- tibble(x = c(lubridate::ymd_hms("2018-10-07 19:04:05", tz = "PDT"), NA))
+
+  expect_equivalent(
+    t_string %>%
+      Table$create() %>%
+      mutate(
+        x = strptime(x)
+      ) %>%
+      collect(),
+    t_stamp,
+    check.tzone = FALSE
+  )
+
+  expect_equivalent(
+    t_string %>%
+      Table$create() %>%
+      mutate(
+        x = strptime(x, format = "%Y-%m-%d %H:%M:%S")
+      ) %>%
+      collect(),
+    t_stamp,
+    check.tzone = FALSE
+  )
+
+  expect_equivalent(

Review comment:
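       Do you know if these would be equal, i.e. whether parsing with `unit = TimeUnit$NANO` and collecting gives the same values as `t_stamp`? I'm not very familiar with how this precision is measured and handled in lubridate/R.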
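A small base R sketch bearing on this question. It assumes that the arrow package converts a timestamp column to POSIXct by scaling its tick count back down to seconds (stored as a double). `round_trip()` and `ticks_per_second` are hypothetical names that mimic that scaling for each unit, applied to the whole-second value used in `t_string`; this is not arrow's actual conversion code.

```r
# Assumption: arrow turns a timestamp column into POSIXct by scaling its
# tick count back to seconds. round_trip() is a hypothetical stand-in for
# that conversion, not arrow's actual code.
t <- as.POSIXct("2018-10-07 19:04:05", tz = "UTC")

ticks_per_second <- c(s = 1, ms = 1e3, us = 1e6, ns = 1e9)

round_trip <- function(x, unit) {
  per_sec <- ticks_per_second[[unit]]
  ticks <- round(as.numeric(x) * per_sec)  # whole ticks in `unit` (held as a double)
  .POSIXct(ticks / per_sec, tz = "UTC")    # back to POSIXct seconds
}

# TRUE for a unit when the round trip reproduces `t` exactly
same <- vapply(names(ticks_per_second),
               function(u) identical(round_trip(t, u), t),
               logical(1))
same
```

If the assumption holds, a value with no fractional seconds, like this one, comes back identical for every unit, so the nanosecond test and `t_stamp` should compare equal.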

##########
File path: r/tests/testthat/test-dplyr-string-functions.R
##########
@@ -493,3 +493,81 @@ test_that("edge cases in string detection and replacement", {
     tibble(x = c("ABC"))
   )
 })
+
+test_that("strptime", {
+
+  t_string <- tibble(x = c("2018-10-07 19:04:05", NA))
+  t_stamp <- tibble(x = c(lubridate::ymd_hms("2018-10-07 19:04:05"), NA))
+  t_stampPDT <- tibble(x = c(lubridate::ymd_hms("2018-10-07 19:04:05", tz = "PDT"), NA))
+
+  expect_equivalent(
+    t_string %>%
+      Table$create() %>%
+      mutate(
+        x = strptime(x)
+      ) %>%
+      collect(),
+    t_stamp,
+    check.tzone = FALSE
+  )
+
+  expect_equivalent(
+    t_string %>%
+      Table$create() %>%
+      mutate(
+        x = strptime(x, format = "%Y-%m-%d %H:%M:%S")
+      ) %>%
+      collect(),
+    t_stamp,
+    check.tzone = FALSE
+  )
+
+  expect_equivalent(
+    t_string %>%
+      Table$create() %>%
+      mutate(
+        x = strptime(x, format = "%Y-%m-%d %H:%M:%S", unit = TimeUnit$NANO)
+      ) %>%
+      collect(),
+    t_stamp,
+    check.tzone = FALSE
+  )
+
+  expect_equivalent(
+    t_string %>%
+      Table$create() %>%
+      mutate(
+        x = strptime(x, format = "%Y-%m-%d %H:%M:%S", unit = "s")
+      ) %>%
+      collect(),
+    t_stamp,
+    check.tzone = FALSE
+  )
+
+  expect_equivalent(
+    t_string %>%
+      Table$create() %>%
+      mutate(
+        x = strptime(x, format = "%Y-%m-%d %H:%M:%S", tz="PDT")
+      ) %>%
+      collect(),
+    t_stamp,
+    check.tzone = FALSE
+  )

Review comment:
       I'm curious what you're testing with this test. Could you explain a little bit more about the case that it's testing?

##########
File path: r/R/dplyr-functions.R
##########
@@ -338,3 +338,24 @@ get_stringr_pattern_options <- function(pattern) {
 contains_regex <- function(string) {
   grepl("[.\\|()[{^$*+?]", string)
 }
+
+nse_funcs$strptime <- function(x, format = "%Y-%m-%d %H:%M:%S", tz = NULL, unit = 1L) {
+  # Arrow uses unit for time parsing, strptime() does not.
+  # Arrow has no default option for strptime (format, unit),
+  # we suggest following format = "%Y-%m-%d %H:%M:%S", unit = MILLI/1L/"ms",
+  # (ARROW-12809)
+
+  # ParseTimestampStrptime currently ignores the timezone information (ARROW-12820).
+  # Stop if tz is provided.
+  if (is.character(tz)) {
+    arrow_not_supported("Time zone argument")
+  }
+
+  t_unit <- make_valid_time_unit(unit,c("s" = TimeUnit$SECOND, "ms" = TimeUnit$MILLI, "us" = TimeUnit$MICRO, "ns" = TimeUnit$NANO))

Review comment:
       To match `timestamp()` a little more closely:
   
   ```suggestion
     unit <- make_valid_time_unit(unit, c(valid_time64_units, valid_time32_units))
   ```
   
   And then change `t_unit` to `unit` below (unless you have a need to keep both around?)
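A minimal base R sketch of the unit-resolution idea behind this suggestion: accept either a readable string or an integer code, and validate it against a named vector of allowed units. `resolve_time_unit()` and `time_unit_codes` are hypothetical names, not the arrow package's internals. The integer codes assume Arrow's TimeUnit numbering (SECOND = 0 through NANO = 3), which is consistent with the code comment in the binding treating `MILLI`, `1L` and `"ms"` as the same unit.

```r
# Hypothetical helper, NOT the arrow package's make_valid_time_unit():
# integer codes follow Arrow's TimeUnit enum (SECOND = 0, MILLI = 1,
# MICRO = 2, NANO = 3); the names are the readable unit strings.
time_unit_codes <- c(s = 0L, ms = 1L, us = 2L, ns = 3L)

resolve_time_unit <- function(unit, valid_units = time_unit_codes) {
  # Accept a readable string such as "ms" and look up its integer code
  if (is.character(unit)) {
    unit <- valid_units[match.arg(unit, names(valid_units))]
  }
  unit <- unname(as.integer(unit))
  # Reject anything that is not exactly one known code
  if (length(unit) != 1L || !unit %in% valid_units) {
    stop("`unit` must be one of: ", paste(names(valid_units), collapse = ", "),
         call. = FALSE)
  }
  unit
}

# Both spellings resolve to the same code, so "ms" could replace
# the 1L default without changing behaviour.
resolve_time_unit("ms")
resolve_time_unit(1L)
```

Validating once at the top of the binding means the rest of the function only ever sees the integer code, whichever spelling the user passed.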

##########
File path: r/tests/testthat/test-dplyr-string-functions.R
##########
@@ -493,3 +493,81 @@ test_that("edge cases in string detection and replacement", {
     tibble(x = c("ABC"))
   )
 })
+
+test_that("strptime", {
+
+  t_string <- tibble(x = c("2018-10-07 19:04:05", NA))
+  t_stamp <- tibble(x = c(lubridate::ymd_hms("2018-10-07 19:04:05"), NA))
+  t_stampPDT <- tibble(x = c(lubridate::ymd_hms("2018-10-07 19:04:05", tz = "PDT"), NA))
+
+  expect_equivalent(
+    t_string %>%
+      Table$create() %>%
+      mutate(
+        x = strptime(x)
+      ) %>%
+      collect(),
+    t_stamp,
+    check.tzone = FALSE
+  )
+
+  expect_equivalent(
+    t_string %>%
+      Table$create() %>%
+      mutate(
+        x = strptime(x, format = "%Y-%m-%d %H:%M:%S")
+      ) %>%
+      collect(),
+    t_stamp,
+    check.tzone = FALSE
+  )
+
+  expect_equivalent(
+    t_string %>%
+      Table$create() %>%
+      mutate(
+        x = strptime(x, format = "%Y-%m-%d %H:%M:%S", unit = TimeUnit$NANO)
+      ) %>%
+      collect(),
+    t_stamp,
+    check.tzone = FALSE
+  )
+
+  expect_equivalent(
+    t_string %>%
+      Table$create() %>%
+      mutate(
+        x = strptime(x, format = "%Y-%m-%d %H:%M:%S", unit = "s")
+      ) %>%
+      collect(),
+    t_stamp,
+    check.tzone = FALSE
+  )
+
+  expect_equivalent(
+    t_string %>%
+      Table$create() %>%
+      mutate(
+        x = strptime(x, format = "%Y-%m-%d %H:%M:%S", tz="PDT")
+      ) %>%
+      collect(),
+    t_stamp,
+    check.tzone = FALSE
+  )
+
+  tstring <- tibble(x = c("08-05-2008", NA))
+  tstamp <- tibble(x = c(lubridate::mdy("08/05/2008"), NA))
+  tstamp[[1]] <- as.POSIXct(tstamp[[1]])

Review comment:
       I wonder if it would be clearer to do something like `strptime("08-05-2008", format = "%m-%d-%Y")` to generate the expectation here?
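A base R sketch of this suggestion. It keeps the test's `"%m-%d-%Y"` format. Passing `tz = "UTC"` is an added assumption so that the expectation does not depend on the machine's local time zone. `via_date` reproduces the current lubridate-based expectation using base R's `as.Date()` in place of `lubridate::mdy()`, to check that the two approaches agree.

```r
fmt <- "%m-%d-%Y"

# Expectation built directly with strptime(), NA included, then converted
# to POSIXct so it can go in a tibble column like the other expectations
expected <- as.POSIXct(strptime(c("08-05-2008", NA), format = fmt, tz = "UTC"))

# The current approach: parse to a Date, then convert to POSIXct
# (midnight UTC); as.Date() stands in for lubridate::mdy() here
via_date <- as.POSIXct(as.Date("2008-08-05"))

# Both should denote the same instant, and the NA should stay NA
as.numeric(expected[1]) == as.numeric(via_date)
is.na(expected[2])
```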




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org