You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/03/24 23:21:26 UTC

[GitHub] [arrow] ateucher opened a new pull request #12711: ARROW-16007: [R] grepl bindings return FALSE for NA inputs

ateucher opened a new pull request #12711:
URL: https://github.com/apache/arrow/pull/12711


   This ensures that the `grepl` binding mimics R's base `grepl` by returning `FALSE` for `NA` inputs (previously it returned `NA`). As several other bindings called the `grepl` binding and we don't want the `grepl` behaviour with `NA` to propagate to those bindings (they all return `NA` with `NA` inputs), I had to change how they were constructed as well. I abstracted out the main parts of the `register_binding` for the string matching functions (those that return a `logical`) into a helper function `create_string_match_expr()`, which is used by the bindings for `grepl`, `str_detect`, `str_starts`, and `str_ends`.
   
   I added several tests for `NA` behaviour - which [failed before the changes](https://github.com/ateucher/arrow/actions/runs/2037010015) and now pass (at least locally, will wait for CI to finish here)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] ateucher commented on a change in pull request #12711: ARROW-16007: [R] grepl bindings return FALSE for NA inputs

Posted by GitBox <gi...@apache.org>.
ateucher commented on a change in pull request #12711:
URL: https://github.com/apache/arrow/pull/12711#discussion_r835419994



##########
File path: r/tests/testthat/test-dplyr-funcs-string.R
##########
@@ -248,14 +254,26 @@ test_that("grepl with ignore.case = TRUE and fixed = TRUE", {
   expect_equal(
     df %>%
       Table$create() %>%
-      filter(x = grepl("^B.+", x, ignore.case = TRUE, fixed = TRUE)) %>%
+      filter(grepl("^B.+", x, ignore.case = TRUE, fixed = TRUE)) %>%
       collect(),
     tibble(x = character(0))
   )
+  expect_equal(
+    df %>%
+      Table$create() %>%
+      mutate(
+        a = grepl("O", x, ignore.case = TRUE, fixed = TRUE)
+      ) %>%
+      collect(),
+    tibble(
+      x = c("Foo", "bar", NA_character_),
+      a = c(TRUE, FALSE, FALSE)
+    )
+  )

Review comment:
       Yeah, I was impressed with that `compare_dplyr_bindings` expectation. Very clever. I left it this way (explicitly comparing to the manually-created tibble) due to the comment [here](https://github.com/apache/arrow/blob/acc6c2eb8ae580085296f608c1a0e8564269456d/r/tests/testthat/test-dplyr-funcs-string.R#L239-L240)




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] ursabot edited a comment on pull request #12711: ARROW-16007: [R] grepl bindings return FALSE for NA inputs

Posted by GitBox <gi...@apache.org>.
ursabot edited a comment on pull request #12711:
URL: https://github.com/apache/arrow/pull/12711#issuecomment-1079533870


   Benchmark runs are scheduled for baseline = bfa3bca91968bb59e36ad5ddd931d636165b52f6 and contender = 5bd4d8ec279d1e9263b99a6bac619d80fe7dee42. 5bd4d8ec279d1e9263b99a6bac619d80fe7dee42 is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
   Conbench compare runs links:
   [Scheduled] [ec2-t3-xlarge-us-east-2](https://conbench.ursa.dev/compare/runs/e403fd6b526543e3b506e544f86e8280...e41751e728ce4dfcbb5358c6ea401bb2/)
   [Scheduled] [test-mac-arm](https://conbench.ursa.dev/compare/runs/1f7efd4f37cb48f1947573b1dd5cc4dd...e6f545f5e8414a6c8c9e67170d195418/)
   [Scheduled] [ursa-i9-9960x](https://conbench.ursa.dev/compare/runs/58e6938e2d114fbdb000115d14f6a1d1...f053678d25a64e86b8104dab34d1a79b/)
   [Finished :arrow_down:0.38% :arrow_up:0.0%] [ursa-thinkcentre-m75q](https://conbench.ursa.dev/compare/runs/93c48de9c49b4f558cbce418db9082a2...f5e70fd8cff84a5e885fa7c9bc1dccbe/)
   Supported benchmarks:
   ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only benchmarks with cloud = True
   test-mac-arm: Supported benchmark langs: C++, Python, R
   ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
   ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] ateucher commented on pull request #12711: ARROW-16007: [R] grepl bindings return FALSE for NA inputs

Posted by GitBox <gi...@apache.org>.
ateucher commented on pull request #12711:
URL: https://github.com/apache/arrow/pull/12711#issuecomment-1079512119


   My pleasure! Thank you for the help and tips... and kudos on the Arrow dev guide. It's really good.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] ursabot edited a comment on pull request #12711: ARROW-16007: [R] grepl bindings return FALSE for NA inputs

Posted by GitBox <gi...@apache.org>.
ursabot edited a comment on pull request #12711:
URL: https://github.com/apache/arrow/pull/12711#issuecomment-1079533870


   Benchmark runs are scheduled for baseline = bfa3bca91968bb59e36ad5ddd931d636165b52f6 and contender = 5bd4d8ec279d1e9263b99a6bac619d80fe7dee42. 5bd4d8ec279d1e9263b99a6bac619d80fe7dee42 is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
   Conbench compare runs links:
   [Scheduled] [ec2-t3-xlarge-us-east-2](https://conbench.ursa.dev/compare/runs/e403fd6b526543e3b506e544f86e8280...e41751e728ce4dfcbb5358c6ea401bb2/)
   [Finished :arrow_down:0.75% :arrow_up:0.21%] [test-mac-arm](https://conbench.ursa.dev/compare/runs/1f7efd4f37cb48f1947573b1dd5cc4dd...e6f545f5e8414a6c8c9e67170d195418/)
   [Scheduled] [ursa-i9-9960x](https://conbench.ursa.dev/compare/runs/58e6938e2d114fbdb000115d14f6a1d1...f053678d25a64e86b8104dab34d1a79b/)
   [Finished :arrow_down:0.38% :arrow_up:0.0%] [ursa-thinkcentre-m75q](https://conbench.ursa.dev/compare/runs/93c48de9c49b4f558cbce418db9082a2...f5e70fd8cff84a5e885fa7c9bc1dccbe/)
   Supported benchmarks:
   ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only benchmarks with cloud = True
   test-mac-arm: Supported benchmark langs: C++, Python, R
   ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
   ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] jonkeane commented on a change in pull request #12711: ARROW-16007: [R] grepl bindings return FALSE for NA inputs

Posted by GitBox <gi...@apache.org>.
jonkeane commented on a change in pull request #12711:
URL: https://github.com/apache/arrow/pull/12711#discussion_r835316494



##########
File path: r/tests/testthat/test-dplyr-funcs-string.R
##########
@@ -1103,6 +1130,14 @@ test_that("str_starts, str_ends, startsWith, endsWith", {
     df
   )
 
+  compare_dplyr_binding(
+    .input %>%
+      transmute(a = str_ends(x, "r"),
+                b = str_ends(x, "r", negate = TRUE),
+                d = str_ends(x, fixed("r"))) %>%

Review comment:
       ```suggestion
                   b = str_ends(x, "r", negate = TRUE),
                   c = str_ends(x, fixed("r")),
                   d = str_ends(x, fixed("r"), negate = TRUE)) %>%
   ```
   
   I think this would catch what I was thinking up above?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] ateucher commented on a change in pull request #12711: ARROW-16007: [R] grepl bindings return FALSE for NA inputs

Posted by GitBox <gi...@apache.org>.
ateucher commented on a change in pull request #12711:
URL: https://github.com/apache/arrow/pull/12711#discussion_r835416505



##########
File path: r/R/dplyr-funcs-string.R
##########
@@ -271,11 +285,13 @@ register_bindings_string_regex <- function() {
     if (opts$fixed) {
       out <- call_binding("endsWith", x = string, suffix = opts$pattern)
     } else {
-      out <- call_binding("grepl", pattern = paste0(opts$pattern, "$"), x = string, fixed = FALSE)
-    }
-
-    if (negate) {
-      out <- !out
+      out <- create_string_match_expr(
+        arrow_fun = "match_substring_regex",
+        string = string,
+        pattern = paste0(opts$pattern, "$"),
+        ignore_case = opts$ignore_case,
+        negate = negate
+      )

Review comment:
       Oh that's a great catch. I'll add a test and modify accordingly. It's probably worth taking that out of `create_string_match_expr()` and just doing it explicitly in the three functions that need it, as before.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] ateucher commented on a change in pull request #12711: ARROW-16007: [R] grepl bindings return FALSE for NA inputs

Posted by GitBox <gi...@apache.org>.
ateucher commented on a change in pull request #12711:
URL: https://github.com/apache/arrow/pull/12711#discussion_r835420273



##########
File path: r/tests/testthat/test-dplyr-funcs-string.R
##########
@@ -1103,6 +1130,14 @@ test_that("str_starts, str_ends, startsWith, endsWith", {
     df
   )
 
+  compare_dplyr_binding(
+    .input %>%
+      transmute(a = str_ends(x, "r"),
+                b = str_ends(x, "r", negate = TRUE),
+                d = str_ends(x, fixed("r"))) %>%

Review comment:
       Perfect thanks!




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] jonkeane commented on a change in pull request #12711: ARROW-16007: [R] grepl bindings return FALSE for NA inputs

Posted by GitBox <gi...@apache.org>.
jonkeane commented on a change in pull request #12711:
URL: https://github.com/apache/arrow/pull/12711#discussion_r835452214



##########
File path: r/tests/testthat/test-dplyr-funcs-string.R
##########
@@ -248,14 +254,26 @@ test_that("grepl with ignore.case = TRUE and fixed = TRUE", {
   expect_equal(
     df %>%
       Table$create() %>%
-      filter(x = grepl("^B.+", x, ignore.case = TRUE, fixed = TRUE)) %>%
+      filter(grepl("^B.+", x, ignore.case = TRUE, fixed = TRUE)) %>%
       collect(),
     tibble(x = character(0))
   )
+  expect_equal(
+    df %>%
+      Table$create() %>%
+      mutate(
+        a = grepl("O", x, ignore.case = TRUE, fixed = TRUE)
+      ) %>%
+      collect(),
+    tibble(
+      x = c("Foo", "bar", NA_character_),
+      a = c(TRUE, FALSE, FALSE)
+    )
+  )

Review comment:
       Aaah, yup you're absolutely right. I missed that comment up there




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] ursabot edited a comment on pull request #12711: ARROW-16007: [R] grepl bindings return FALSE for NA inputs

Posted by GitBox <gi...@apache.org>.
ursabot edited a comment on pull request #12711:
URL: https://github.com/apache/arrow/pull/12711#issuecomment-1079533870


   Benchmark runs are scheduled for baseline = bfa3bca91968bb59e36ad5ddd931d636165b52f6 and contender = 5bd4d8ec279d1e9263b99a6bac619d80fe7dee42. 5bd4d8ec279d1e9263b99a6bac619d80fe7dee42 is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
   Conbench compare runs links:
   [Scheduled] [ec2-t3-xlarge-us-east-2](https://conbench.ursa.dev/compare/runs/e403fd6b526543e3b506e544f86e8280...e41751e728ce4dfcbb5358c6ea401bb2/)
   [Finished :arrow_down:0.75% :arrow_up:0.21%] [test-mac-arm](https://conbench.ursa.dev/compare/runs/1f7efd4f37cb48f1947573b1dd5cc4dd...e6f545f5e8414a6c8c9e67170d195418/)
   [Finished :arrow_down:2.14% :arrow_up:0.0%] [ursa-i9-9960x](https://conbench.ursa.dev/compare/runs/58e6938e2d114fbdb000115d14f6a1d1...f053678d25a64e86b8104dab34d1a79b/)
   [Finished :arrow_down:0.38% :arrow_up:0.0%] [ursa-thinkcentre-m75q](https://conbench.ursa.dev/compare/runs/93c48de9c49b4f558cbce418db9082a2...f5e70fd8cff84a5e885fa7c9bc1dccbe/)
   Supported benchmarks:
   ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only benchmarks with cloud = True
   test-mac-arm: Supported benchmark langs: C++, Python, R
   ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
   ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] jonkeane commented on a change in pull request #12711: ARROW-16007: [R] grepl bindings return FALSE for NA inputs

Posted by GitBox <gi...@apache.org>.
jonkeane commented on a change in pull request #12711:
URL: https://github.com/apache/arrow/pull/12711#discussion_r835659090



##########
File path: r/tests/testthat/test-dplyr-funcs-string.R
##########
@@ -248,14 +254,26 @@ test_that("grepl with ignore.case = TRUE and fixed = TRUE", {
   expect_equal(
     df %>%
       Table$create() %>%
-      filter(x = grepl("^B.+", x, ignore.case = TRUE, fixed = TRUE)) %>%
+      filter(grepl("^B.+", x, ignore.case = TRUE, fixed = TRUE)) %>%
       collect(),
     tibble(x = character(0))
   )
+  expect_equal(
+    df %>%
+      Table$create() %>%
+      mutate(
+        a = grepl("O", x, ignore.case = TRUE, fixed = TRUE)
+      ) %>%
+      collect(),
+    tibble(
+      x = c("Foo", "bar", NA_character_),
+      a = c(TRUE, FALSE, FALSE)
+    )
+  )

Review comment:
       Hmmm, yeah maybe. Would you mind opening a Jira for that (we can get some discussion there + implement it separately so as not to extend the scope of this PR too much!)




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] ateucher commented on a change in pull request #12711: ARROW-16007: [R] grepl bindings return FALSE for NA inputs

Posted by GitBox <gi...@apache.org>.
ateucher commented on a change in pull request #12711:
URL: https://github.com/apache/arrow/pull/12711#discussion_r835482760



##########
File path: r/tests/testthat/test-dplyr-funcs-string.R
##########
@@ -248,14 +254,26 @@ test_that("grepl with ignore.case = TRUE and fixed = TRUE", {
   expect_equal(
     df %>%
       Table$create() %>%
-      filter(x = grepl("^B.+", x, ignore.case = TRUE, fixed = TRUE)) %>%
+      filter(grepl("^B.+", x, ignore.case = TRUE, fixed = TRUE)) %>%
       collect(),
     tibble(x = character(0))
   )
+  expect_equal(
+    df %>%
+      Table$create() %>%
+      mutate(
+        a = grepl("O", x, ignore.case = TRUE, fixed = TRUE)
+      ) %>%
+      collect(),
+    tibble(
+      x = c("Foo", "bar", NA_character_),
+      a = c(TRUE, FALSE, FALSE)
+    )
+  )

Review comment:
       Given that base `grepl` gives that warning when `ignore.case = TRUE` and `fixed = TRUE`, should the binding for `grepl` (as well as for `sub` and `gsub`) also emit that warning?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] ateucher commented on a change in pull request #12711: ARROW-16007: [R] grepl bindings return FALSE for NA inputs

Posted by GitBox <gi...@apache.org>.
ateucher commented on a change in pull request #12711:
URL: https://github.com/apache/arrow/pull/12711#discussion_r835419994



##########
File path: r/tests/testthat/test-dplyr-funcs-string.R
##########
@@ -248,14 +254,26 @@ test_that("grepl with ignore.case = TRUE and fixed = TRUE", {
   expect_equal(
     df %>%
       Table$create() %>%
-      filter(x = grepl("^B.+", x, ignore.case = TRUE, fixed = TRUE)) %>%
+      filter(grepl("^B.+", x, ignore.case = TRUE, fixed = TRUE)) %>%
       collect(),
     tibble(x = character(0))
   )
+  expect_equal(
+    df %>%
+      Table$create() %>%
+      mutate(
+        a = grepl("O", x, ignore.case = TRUE, fixed = TRUE)
+      ) %>%
+      collect(),
+    tibble(
+      x = c("Foo", "bar", NA_character_),
+      a = c(TRUE, FALSE, FALSE)
+    )
+  )

Review comment:
       Yeah, I was impressed with that `compare_dplyr_bindings` expectation. Very clever! I left it this way (explicitly comparing to the manually-created tibble) due to the comment [here](https://github.com/apache/arrow/blob/acc6c2eb8ae580085296f608c1a0e8564269456d/r/tests/testthat/test-dplyr-funcs-string.R#L239-L240).




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] ursabot commented on pull request #12711: ARROW-16007: [R] grepl bindings return FALSE for NA inputs

Posted by GitBox <gi...@apache.org>.
ursabot commented on pull request #12711:
URL: https://github.com/apache/arrow/pull/12711#issuecomment-1079533870


   Benchmark runs are scheduled for baseline = bfa3bca91968bb59e36ad5ddd931d636165b52f6 and contender = 5bd4d8ec279d1e9263b99a6bac619d80fe7dee42. 5bd4d8ec279d1e9263b99a6bac619d80fe7dee42 is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
   Conbench compare runs links:
   [Scheduled] [ec2-t3-xlarge-us-east-2](https://conbench.ursa.dev/compare/runs/e403fd6b526543e3b506e544f86e8280...e41751e728ce4dfcbb5358c6ea401bb2/)
   [Scheduled] [test-mac-arm](https://conbench.ursa.dev/compare/runs/1f7efd4f37cb48f1947573b1dd5cc4dd...e6f545f5e8414a6c8c9e67170d195418/)
   [Scheduled] [ursa-i9-9960x](https://conbench.ursa.dev/compare/runs/58e6938e2d114fbdb000115d14f6a1d1...f053678d25a64e86b8104dab34d1a79b/)
   [Scheduled] [ursa-thinkcentre-m75q](https://conbench.ursa.dev/compare/runs/93c48de9c49b4f558cbce418db9082a2...f5e70fd8cff84a5e885fa7c9bc1dccbe/)
   Supported benchmarks:
   ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only benchmarks with cloud = True
   test-mac-arm: Supported benchmark langs: C++, Python, R
   ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
   ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] ursabot edited a comment on pull request #12711: ARROW-16007: [R] grepl bindings return FALSE for NA inputs

Posted by GitBox <gi...@apache.org>.
ursabot edited a comment on pull request #12711:
URL: https://github.com/apache/arrow/pull/12711#issuecomment-1079533870


   Benchmark runs are scheduled for baseline = bfa3bca91968bb59e36ad5ddd931d636165b52f6 and contender = 5bd4d8ec279d1e9263b99a6bac619d80fe7dee42. 5bd4d8ec279d1e9263b99a6bac619d80fe7dee42 is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
   Conbench compare runs links:
   [Finished :arrow_down:0.0% :arrow_up:25.0% :warning: Contender and baseline run contexts do not match] [ec2-t3-xlarge-us-east-2](https://conbench.ursa.dev/compare/runs/e403fd6b526543e3b506e544f86e8280...e41751e728ce4dfcbb5358c6ea401bb2/)
   [Finished :arrow_down:0.75% :arrow_up:0.21%] [test-mac-arm](https://conbench.ursa.dev/compare/runs/1f7efd4f37cb48f1947573b1dd5cc4dd...e6f545f5e8414a6c8c9e67170d195418/)
   [Finished :arrow_down:2.14% :arrow_up:0.0%] [ursa-i9-9960x](https://conbench.ursa.dev/compare/runs/58e6938e2d114fbdb000115d14f6a1d1...f053678d25a64e86b8104dab34d1a79b/)
   [Finished :arrow_down:0.38% :arrow_up:0.0%] [ursa-thinkcentre-m75q](https://conbench.ursa.dev/compare/runs/93c48de9c49b4f558cbce418db9082a2...f5e70fd8cff84a5e885fa7c9bc1dccbe/)
   Supported benchmarks:
   ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only benchmarks with cloud = True
   test-mac-arm: Supported benchmark langs: C++, Python, R
   ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
   ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] github-actions[bot] commented on pull request #12711: ARROW-16007: [R] grepl bindings return FALSE for NA inputs

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #12711:
URL: https://github.com/apache/arrow/pull/12711#issuecomment-1078475154






-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] ateucher commented on pull request #12711: ARROW-16007: [R] grepl bindings return FALSE for NA inputs

Posted by GitBox <gi...@apache.org>.
ateucher commented on pull request #12711:
URL: https://github.com/apache/arrow/pull/12711#issuecomment-1079435623


   I think I've addressed all of your comments now @jonkeane - let me know what you need me to do next!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] jonkeane commented on a change in pull request #12711: ARROW-16007: [R] grepl bindings return FALSE for NA inputs

Posted by GitBox <gi...@apache.org>.
jonkeane commented on a change in pull request #12711:
URL: https://github.com/apache/arrow/pull/12711#discussion_r835310289



##########
File path: r/R/dplyr-funcs-string.R
##########
@@ -271,11 +285,13 @@ register_bindings_string_regex <- function() {
     if (opts$fixed) {
       out <- call_binding("endsWith", x = string, suffix = opts$pattern)
     } else {
-      out <- call_binding("grepl", pattern = paste0(opts$pattern, "$"), x = string, fixed = FALSE)
-    }
-
-    if (negate) {
-      out <- !out
+      out <- create_string_match_expr(
+        arrow_fun = "match_substring_regex",
+        string = string,
+        pattern = paste0(opts$pattern, "$"),
+        ignore_case = opts$ignore_case,
+        negate = negate
+      )

Review comment:
       Hmm, I don't think we have tests that would cover this (though we should!) but what would happen if someone called `str_ends("bar_foo", fixed("foo"), negate = TRUE)`? I think we might still need the separate `if (negate) {...}` block here

##########
File path: r/R/dplyr-funcs-string.R
##########
@@ -257,11 +269,13 @@ register_bindings_string_regex <- function() {
     if (opts$fixed) {
       out <- call_binding("startsWith", x = string, prefix = opts$pattern)
     } else {
-      out <- call_binding("grepl", pattern = paste0("^", opts$pattern), x = string, fixed = FALSE)
-    }
-
-    if (negate) {
-      out <- !out
+      out <- create_string_match_expr(
+        arrow_fun = "match_substring_regex",
+        string = string,
+        pattern = paste0("^", opts$pattern),
+        ignore_case = opts$ignore_case,
+        negate = negate
+      )

Review comment:
       Same thing here about negate as `str_ends` below (sorry the comments are backwards! I saw it there first 🙈 )

##########
File path: r/tests/testthat/test-dplyr-funcs-string.R
##########
@@ -1075,6 +1093,15 @@ test_that("str_starts, str_ends, startsWith, endsWith", {
     df
   )
 
+  compare_dplyr_binding(
+    .input %>%
+      transmute(a = str_starts(x, "b.*"),
+                b = str_starts(x, "b.*", negate = TRUE),
+                d = str_starts(x, fixed("b"))) %>%
+      collect(),
+    df
+  )
+

Review comment:
       Ah, nice! I see you're already used to `compare_dplyr_binding` here. Nice additions!

##########
File path: r/tests/testthat/test-dplyr-funcs-string.R
##########
@@ -248,14 +254,26 @@ test_that("grepl with ignore.case = TRUE and fixed = TRUE", {
   expect_equal(
     df %>%
       Table$create() %>%
-      filter(x = grepl("^B.+", x, ignore.case = TRUE, fixed = TRUE)) %>%
+      filter(grepl("^B.+", x, ignore.case = TRUE, fixed = TRUE)) %>%
       collect(),
     tibble(x = character(0))
   )
+  expect_equal(
+    df %>%
+      Table$create() %>%
+      mutate(
+        a = grepl("O", x, ignore.case = TRUE, fixed = TRUE)
+      ) %>%
+      collect(),
+    tibble(
+      x = c("Foo", "bar", NA_character_),
+      a = c(TRUE, FALSE, FALSE)
+    )
+  )

Review comment:
       ```suggestion
     compare_dplyr_binding(
       .input %>%
         mutate(
           a = grepl("O", x, ignore.case = TRUE, fixed = TRUE)
         ) %>%
         collect(),
       df
     )
   ```
   
   This is more stylistic than anything else, but you should be able to swap out this code
   
   This uses one of our custom expectations, which can be a little hairy at first, but test a few different routes (as a table, as a record batch) + confirm we're getting the same behavior as base R | the tidyverse functions we are binding
   
   https://github.com/apache/arrow/blob/acc6c2eb8ae580085296f608c1a0e8564269456d/r/tests/testthat/helper-expectation.R#L69-L137

##########
File path: r/tests/testthat/test-dplyr-funcs-string.R
##########
@@ -1103,6 +1130,14 @@ test_that("str_starts, str_ends, startsWith, endsWith", {
     df
   )
 
+  compare_dplyr_binding(
+    .input %>%
+      transmute(a = str_ends(x, "r"),
+                b = str_ends(x, "r", negate = TRUE),
+                d = str_ends(x, fixed("r"))) %>%

Review comment:
       ```suggestion
                   b = str_ends(x, "r", negate = TRUE),
                   c = str_ends(x, fixed("r")),
                   d = str_ends(x,  fixed("r"), negate = TRUE)) %>%
   ```
   
   I think this would catch what I was thinking up above?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] jonkeane closed pull request #12711: ARROW-16007: [R] grepl bindings return FALSE for NA inputs

Posted by GitBox <gi...@apache.org>.
jonkeane closed pull request #12711:
URL: https://github.com/apache/arrow/pull/12711


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org