You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2021/07/05 15:05:57 UTC

[GitHub] [arrow] thisisnic commented on a change in pull request #10624: ARROW-12992: [R] bindings for substr(), substring(), str_sub()

thisisnic commented on a change in pull request #10624:
URL: https://github.com/apache/arrow/pull/10624#discussion_r662899545



##########
File path: r/R/expression.R
##########
@@ -32,6 +32,8 @@
   "str_to_upper" = "utf8_upper",
   "str_reverse" = "utf8_reverse",
   # str_trim is defined in dplyr-functions.R
+  # str_sub is defined in dplyr-functions.R
+  # substr is defined in dplyr-functions.R
   "year" = "year",

Review comment:
       Should we mention `substring` here as well?

##########
File path: r/tests/testthat/test-dplyr-string-functions.R
##########
@@ -866,3 +866,57 @@ test_that("str_like", {
     df
   )
 })
+
+test_that("substrings", {
+  df <- tibble(
+    x = "Apache Arrow"
+  )
+
+  expect_dplyr_equal(
+    input %>%
+      mutate(
+        foo = "Apache Arrow",
+        bar1 = substr(foo, 1, 6), # Apache
+        bar2 = substr(foo, 0, 6), # Apache
+        bar3 = substr(foo, -1, 6), # Apache
+        bar4 = substr(foo, 6, 1), # ""
+        bar5 = substr(foo, -1, -2), # ""
+        bar6 = substr(foo, 8, 12) # Arrow

Review comment:
       No need to set up string "Apache Arrow" via variable `foo` as it's already been set up as variable `x`.
   ```suggestion
           bar1 = substr(x, 1, 6), # Apache
           bar2 = substr(x, 0, 6), # Apache
           bar3 = substr(x, -1, 6), # Apache
           bar4 = substr(x, 6, 1), # ""
           bar5 = substr(x, -1, -2), # ""
           bar6 = substr(x, 8, 12) # Arrow
   ```

##########
File path: r/R/dplyr-functions.R
##########
@@ -280,6 +280,45 @@ nse_funcs$str_trim <- function(string, side = c("both", "left", "right")) {
   Expression$create(trim_fun, string)
 }
 
+nse_funcs$substr <- function(string, start, stop) {
+  assert_that(
+    length(start) == 1,
+    msg = "Start of length != 1 not supported in Arrow"
+  )
+  assert_that(
+    length(end) == 1,
+    msg = "End of length != 1 not supported in Arrow"
+  )
+
+  if (start > stop) {
+    start <- 0
+    stop <- 0
+  } else {
+    start <- max(0, start - 1)
+    stop <- max(0, stop)
+    start_stop <- c(min(start, stop), max(start, stop))
+    start <- start_stop[1]
+    stop <- start_stop[2]
+  }

Review comment:
       I'm a little confused here - please can you give me an example of values that could be supplied as `start` and `stop` which would make it necessary to use `min()` and `max()` here to put the values back in the correct order? 

##########
File path: r/tests/testthat/test-dplyr-string-functions.R
##########
@@ -866,3 +866,57 @@ test_that("str_like", {
     df
   )
 })
+
+test_that("substrings", {
+  df <- tibble(
+    x = "Apache Arrow"
+  )
+
+  expect_dplyr_equal(
+    input %>%
+      mutate(
+        foo = "Apache Arrow",
+        bar1 = substr(foo, 1, 6), # Apache

Review comment:
       Please can you split these out into separate tests?  It makes it easier to follow and fix issues.

##########
File path: r/tests/testthat/test-dplyr-string-functions.R
##########
@@ -866,3 +866,57 @@ test_that("str_like", {
     df
   )
 })
+
+test_that("substrings", {
+  df <- tibble(
+    x = "Apache Arrow"
+  )
+
+  expect_dplyr_equal(
+    input %>%
+      mutate(
+        foo = "Apache Arrow",
+        bar1 = substr(foo, 1, 6), # Apache
+        bar2 = substr(foo, 0, 6), # Apache
+        bar3 = substr(foo, -1, 6), # Apache
+        bar4 = substr(foo, 6, 1), # ""
+        bar5 = substr(foo, -1, -2), # ""
+        bar6 = substr(foo, 8, 12) # Arrow
+      ) %>%
+      select(bar1:bar6) %>%

Review comment:
       ```suggestion
   
   ```




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org