You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2021/05/16 22:44:56 UTC

[GitHub] [arrow] nealrichardson commented on a change in pull request #10334: ARROW-12198: [R] bindings for strptime

nealrichardson commented on a change in pull request #10334:
URL: https://github.com/apache/arrow/pull/10334#discussion_r633148838



##########
File path: r/R/compute.R
##########
@@ -286,3 +286,8 @@ cast_options <- function(safe = TRUE, ...) {
   )
   modifyList(opts, list(...))
 }
+
+strptime_arrow <- function(..., format, unit){
+  a <- collect_arrays_from_dots(list(...))

Review comment:
       I don't think you want `collect_arrays_from_dots` here. This function exists to support the base R behavior like:
   
   ```
   > sum(1, 2)
   [1] 3
   ```
   
   But `strptime` doesn't take `...` like that.

##########
File path: r/R/compute.R
##########
@@ -286,3 +286,8 @@ cast_options <- function(safe = TRUE, ...) {
   )
   modifyList(opts, list(...))
 }
+
+strptime_arrow <- function(..., format, unit){

Review comment:
       I'm not sure how useful this function is since it is a thin wrapper around `call_function()` and we can't set it as an S3 method.. More useful would be to add a version of this in the `nse_funcs` in dplyr-functions.R.
   
   In either case, we should match the `base::strptime()` signature: `function (x, format, tz = "")` with the possible addition of `unit` if that's an Arrow feature. 
   
   Also, should `format` and `unit` have default arguments?

##########
File path: r/tests/testthat/test-Array.R
##########
@@ -291,6 +291,17 @@ test_that("Timezone handling in Arrow roundtrip (ARROW-3543)", {
   expect_identical(read_feather(feather_file), df)
 })
 
+test_that("strptime", {
+  # array of strings
+  time_strings <- Array$create(c("2018-10-07 19:04:05", NA))
+  # array of timestamps (doesn't work if tz="" is added!)
+  timestamps <- Array$create(c(as.POSIXct("2018-10-07 19:04:05"), NA))
+  # array of parsed timestamps
+  parsed_timestamps <- strptime_arrow(time_strings, format = "%Y-%m-%d %H:%M:%S", unit = TimeUnit$MICRO)

Review comment:
       `unit` should also take human-friendly strings ("s", "ms", etc.); see how this is done in the `timestamp()` function in type.R.

##########
File path: r/tests/testthat/test-Array.R
##########
@@ -291,6 +291,17 @@ test_that("Timezone handling in Arrow roundtrip (ARROW-3543)", {
   expect_identical(read_feather(feather_file), df)
 })
 
+test_that("strptime", {
+  # array of strings
+  time_strings <- Array$create(c("2018-10-07 19:04:05", NA))
+  # array of timestamps (doesn't work if tz="" is added!)

Review comment:
       Why not? 

##########
File path: r/src/compute.cpp
##########
@@ -233,6 +233,12 @@ std::shared_ptr<arrow::compute::FunctionOptions> make_compute_options(
                                      max_replacements);
   }
 
+  if (func_name == "strptime") {
+    using Options = arrow::compute::StrptimeOptions;

Review comment:
       Does `StrptimeOptions` have a `Defaults()` method in C++? If so, we should call it.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org