You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2021/04/14 18:45:13 UTC

[GitHub] [arrow] thisisnic opened a new pull request #10032: ARROW-12185: [R] Bindings for any, all

thisisnic opened a new pull request #10032:
URL: https://github.com/apache/arrow/pull/10032


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] nealrichardson closed pull request #10032: ARROW-12185: [R] Bindings for any, all

Posted by GitBox <gi...@apache.org>.
nealrichardson closed pull request #10032:
URL: https://github.com/apache/arrow/pull/10032


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] thisisnic commented on pull request #10032: ARROW-12185: [R] Bindings for any, all

Posted by GitBox <gi...@apache.org>.
thisisnic commented on pull request #10032:
URL: https://github.com/apache/arrow/pull/10032#issuecomment-824105454


   > @thisisnic it looks like when you ran `devtools::document()`, it built some docs changes unrelated to this PR. This happens sometimes (e.g. when someone else commits a PR without running `devtools::document()`) and it's no big deal, but it'd be a good idea to push a commit to undo those unrelated docs changes, to avoid any merge conflicts.
   
   Fixed now!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] nealrichardson commented on a change in pull request #10032: ARROW-12185: [R] Bindings for any, all

Posted by GitBox <gi...@apache.org>.
nealrichardson commented on a change in pull request #10032:
URL: https://github.com/apache/arrow/pull/10032#discussion_r614329094



##########
File path: r/R/compute.R
##########
@@ -186,6 +186,32 @@ unique.ArrowDatum <- function(x, incomparables = FALSE, ...) {
   call_function("unique", x)
 }
 
+#' @export
+any.ArrowDatum <- function(..., na.rm = FALSE){
+  
+  a <- collect_arrays_from_dots(list(...))
+  result <- call_function("any", a)
+
+  if(!as.vector(result) && a$null_count > 0 && !na.rm){

Review comment:
       In `scalar_aggregate()` and `median.ArrowDatum`, we check na.rm and null_count before calling the C++ function. Why should any/all do it differently?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] ianmcook commented on a change in pull request #10032: ARROW-12185: [R] Bindings for any, all

Posted by GitBox <gi...@apache.org>.
ianmcook commented on a change in pull request #10032:
URL: https://github.com/apache/arrow/pull/10032#discussion_r614334376



##########
File path: r/R/compute.R
##########
@@ -186,6 +186,32 @@ unique.ArrowDatum <- function(x, incomparables = FALSE, ...) {
   call_function("unique", x)
 }
 
+#' @export
+any.ArrowDatum <- function(..., na.rm = FALSE){
+  
+  a <- collect_arrays_from_dots(list(...))
+  result <- call_function("any", a)
+
+  if(!as.vector(result) && a$null_count > 0 && !na.rm){

Review comment:
       The case with `any` and `all` is different—it's about three-valued logic, e.g. with the default `na.rm = FALSE`, `any(c(TRUE, NA))` returns `TRUE` but `any(c(FALSE, NA))` returns `NA`




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] nealrichardson commented on a change in pull request #10032: ARROW-12185: [R] Bindings for any, all

Posted by GitBox <gi...@apache.org>.
nealrichardson commented on a change in pull request #10032:
URL: https://github.com/apache/arrow/pull/10032#discussion_r617834001



##########
File path: r/R/compute.R
##########
@@ -186,6 +186,32 @@ unique.ArrowDatum <- function(x, incomparables = FALSE, ...) {
   call_function("unique", x)
 }
 
+#' @export
+any.ArrowDatum <- function(..., na.rm = FALSE){
+  
+  a <- collect_arrays_from_dots(list(...))
+  result <- call_function("any", a)
+
+  if(!na.rm && a$null_count > 0 && !as.vector(result)){
+    Scalar$create(NA)

Review comment:
       ```suggestion
     if (!na.rm && a$null_count > 0 && !as.vector(result)) {
       # Three-valued logic: with na.rm = FALSE, any(c(TRUE, NA)) returns TRUE but any(c(FALSE, NA)) returns NA
       # TODO: C++ library should take na.rm for any/all (like ARROW-9054)
       Scalar$create(NA)
   ```
   
   FYI @rok, not sure if you want to add this to your PR for ARROW-9054 or if you'd like to do a followup to add any/all to it, up to you.

##########
File path: r/R/compute.R
##########
@@ -186,6 +186,32 @@ unique.ArrowDatum <- function(x, incomparables = FALSE, ...) {
   call_function("unique", x)
 }
 
+#' @export
+any.ArrowDatum <- function(..., na.rm = FALSE){
+  
+  a <- collect_arrays_from_dots(list(...))
+  result <- call_function("any", a)
+
+  if(!na.rm && a$null_count > 0 && !as.vector(result)){
+    Scalar$create(NA)
+  } else {
+    result
+  }
+}
+
+#' @export
+all.ArrowDatum <- function(..., na.rm = FALSE){
+  
+  a <- collect_arrays_from_dots(list(...))
+  result <- call_function("all", a)
+  
+  if(!na.rm &&  a$null_count > 0 && as.vector(result)){

Review comment:
       ```suggestion
     if (!na.rm && a$null_count > 0 && as.vector(result)) {
       # See comment above in any() about three-valued logic
   ```




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] jorisvandenbossche commented on a change in pull request #10032: ARROW-12185: [R] Bindings for any, all

Posted by GitBox <gi...@apache.org>.
jorisvandenbossche commented on a change in pull request #10032:
URL: https://github.com/apache/arrow/pull/10032#discussion_r613976734



##########
File path: r/R/compute.R
##########
@@ -186,6 +186,32 @@ unique.ArrowDatum <- function(x, incomparables = FALSE, ...) {
   call_function("unique", x)
 }
 
+#' @export
+any.ArrowDatum <- function(..., na.rm = FALSE){
+  
+  a <- collect_arrays_from_dots(list(...))
+  result <- call_function("any", a)
+
+  if(!as.vector(result) && a$null_count > 0 && !na.rm){

Review comment:
       Small question: what's the `!as.vector(result)` for?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] github-actions[bot] commented on pull request #10032: ARROW-12185: [R] Bindings for any, all

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #10032:
URL: https://github.com/apache/arrow/pull/10032#issuecomment-819885621


   https://issues.apache.org/jira/browse/ARROW-12185


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] thisisnic commented on a change in pull request #10032: ARROW-12185: [R] Bindings for any, all

Posted by GitBox <gi...@apache.org>.
thisisnic commented on a change in pull request #10032:
URL: https://github.com/apache/arrow/pull/10032#discussion_r614610345



##########
File path: r/R/compute.R
##########
@@ -186,6 +186,32 @@ unique.ArrowDatum <- function(x, incomparables = FALSE, ...) {
   call_function("unique", x)
 }
 
+#' @export
+any.ArrowDatum <- function(..., na.rm = FALSE){
+  
+  a <- collect_arrays_from_dots(list(...))
+  result <- call_function("any", a)
+
+  if(!as.vector(result) && a$null_count > 0 && !na.rm){

Review comment:
       Thanks for the suggestion, have updated this now!




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] rok commented on a change in pull request #10032: ARROW-12185: [R] Bindings for any, all

Posted by GitBox <gi...@apache.org>.
rok commented on a change in pull request #10032:
URL: https://github.com/apache/arrow/pull/10032#discussion_r617897892



##########
File path: r/R/compute.R
##########
@@ -186,6 +186,32 @@ unique.ArrowDatum <- function(x, incomparables = FALSE, ...) {
   call_function("unique", x)
 }
 
+#' @export
+any.ArrowDatum <- function(..., na.rm = FALSE){
+  
+  a <- collect_arrays_from_dots(list(...))
+  result <- call_function("any", a)
+
+  if(!na.rm && a$null_count > 0 && !as.vector(result)){
+    Scalar$create(NA)

Review comment:
       Agreed. Made [ARROW-12499](https://issues.apache.org/jira/browse/ARROW-12499) to capture this.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] thisisnic commented on a change in pull request #10032: ARROW-12185: [R] Bindings for any, all

Posted by GitBox <gi...@apache.org>.
thisisnic commented on a change in pull request #10032:
URL: https://github.com/apache/arrow/pull/10032#discussion_r614076651



##########
File path: r/R/compute.R
##########
@@ -186,6 +186,32 @@ unique.ArrowDatum <- function(x, incomparables = FALSE, ...) {
   call_function("unique", x)
 }
 
+#' @export
+any.ArrowDatum <- function(..., na.rm = FALSE){
+  
+  a <- collect_arrays_from_dots(list(...))
+  result <- call_function("any", a)
+
+  if(!as.vector(result) && a$null_count > 0 && !na.rm){

Review comment:
       We discussed this just now on a call, but for transparency, I'll answer here too!  `any()` and `all()` use three-valuered logic, so here if there are `NA`s in the data and `na.rm` is set to `FALSE`, we return an NA if the initial result is not `TRUE`.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] ianmcook commented on a change in pull request #10032: ARROW-12185: [R] Bindings for any, all

Posted by GitBox <gi...@apache.org>.
ianmcook commented on a change in pull request #10032:
URL: https://github.com/apache/arrow/pull/10032#discussion_r616057116



##########
File path: r/tests/testthat/test-compute-aggregate.R
##########
@@ -351,3 +351,71 @@ test_that("value_counts", {
   expect_identical(as.data.frame(value_counts(a)), result_df)
   expect_identical(as.vector(value_counts(a)$counts), result_df$counts)
 })
+
+test_that("any.Array", {
+  
+  data <- c(1:10, NA, NA)
+  array_data <- Array$create(data)
+  
+  expect_equal(as.vector(any(array_data > 5)), any(data > 5))
+  expect_equal(as.vector(any(array_data < 1)), any(data < 1))
+  expect_equal(as.vector(any(array_data < 1, na.rm = TRUE)), any(data < 1, na.rm = TRUE))

Review comment:
       These tests could be simplified by using the helper function `expect_vector_equal` (which is defined in `helper-expectation.R`). E.g.:
   ```suggestion
     expect_vector_equal(any(input > 5), data)
     expect_vector_equal(any(input < 1), data)
     expect_vector_equal(any(input < 1, na.rm = TRUE), data)
   ```




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] ianmcook commented on a change in pull request #10032: ARROW-12185: [R] Bindings for any, all

Posted by GitBox <gi...@apache.org>.
ianmcook commented on a change in pull request #10032:
URL: https://github.com/apache/arrow/pull/10032#discussion_r616057116



##########
File path: r/tests/testthat/test-compute-aggregate.R
##########
@@ -351,3 +351,71 @@ test_that("value_counts", {
   expect_identical(as.data.frame(value_counts(a)), result_df)
   expect_identical(as.vector(value_counts(a)$counts), result_df$counts)
 })
+
+test_that("any.Array", {
+  
+  data <- c(1:10, NA, NA)
+  array_data <- Array$create(data)
+  
+  expect_equal(as.vector(any(array_data > 5)), any(data > 5))
+  expect_equal(as.vector(any(array_data < 1)), any(data < 1))
+  expect_equal(as.vector(any(array_data < 1, na.rm = TRUE)), any(data < 1, na.rm = TRUE))

Review comment:
       These tests could be simplified by using the helper function `expect_vector_equal()` (which is defined in `helper-expectation.R`). E.g.:
   ```suggestion
     expect_vector_equal(any(input > 5), data)
     expect_vector_equal(any(input < 1), data)
     expect_vector_equal(any(input < 1, na.rm = TRUE), data)
   ```

##########
File path: r/tests/testthat/test-compute-aggregate.R
##########
@@ -351,3 +351,71 @@ test_that("value_counts", {
   expect_identical(as.data.frame(value_counts(a)), result_df)
   expect_identical(as.vector(value_counts(a)$counts), result_df$counts)
 })
+
+test_that("any.Array", {
+  
+  data <- c(1:10, NA, NA)
+  array_data <- Array$create(data)
+  
+  expect_equal(as.vector(any(array_data > 5)), any(data > 5))
+  expect_equal(as.vector(any(array_data < 1)), any(data < 1))
+  expect_equal(as.vector(any(array_data < 1, na.rm = TRUE)), any(data < 1, na.rm = TRUE))

Review comment:
       Note that `expect_vector_equal()` tests both `Array` and `ChunkedArray` so you'll be able to reduce the number of tests used here




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] ianmcook commented on pull request #10032: ARROW-12185: [R] Bindings for any, all

Posted by GitBox <gi...@apache.org>.
ianmcook commented on pull request #10032:
URL: https://github.com/apache/arrow/pull/10032#issuecomment-822671980


   @thisisnic it looks like when you ran `devtools::document()`, it built some docs changes unrelated to this PR. This happens sometimes (e.g. when someone else commits a PR without running `devtools::document()`) and it's no big deal, but it'd be a good idea to push a commit to undo those unrelated docs changes, to avoid any merge conflicts.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] thisisnic commented on a change in pull request #10032: ARROW-12185: [R] Bindings for any, all

Posted by GitBox <gi...@apache.org>.
thisisnic commented on a change in pull request #10032:
URL: https://github.com/apache/arrow/pull/10032#discussion_r616464997



##########
File path: r/tests/testthat/test-compute-aggregate.R
##########
@@ -351,3 +351,71 @@ test_that("value_counts", {
   expect_identical(as.data.frame(value_counts(a)), result_df)
   expect_identical(as.vector(value_counts(a)$counts), result_df$counts)
 })
+
+test_that("any.Array", {
+  
+  data <- c(1:10, NA, NA)
+  array_data <- Array$create(data)
+  
+  expect_equal(as.vector(any(array_data > 5)), any(data > 5))
+  expect_equal(as.vector(any(array_data < 1)), any(data < 1))
+  expect_equal(as.vector(any(array_data < 1, na.rm = TRUE)), any(data < 1, na.rm = TRUE))

Review comment:
       Thanks, updated!




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] ianmcook commented on a change in pull request #10032: ARROW-12185: [R] Bindings for any, all

Posted by GitBox <gi...@apache.org>.
ianmcook commented on a change in pull request #10032:
URL: https://github.com/apache/arrow/pull/10032#discussion_r614086125



##########
File path: r/R/compute.R
##########
@@ -186,6 +186,32 @@ unique.ArrowDatum <- function(x, incomparables = FALSE, ...) {
   call_function("unique", x)
 }
 
+#' @export
+any.ArrowDatum <- function(..., na.rm = FALSE){
+  
+  a <- collect_arrays_from_dots(list(...))
+  result <- call_function("any", a)
+
+  if(!as.vector(result) && a$null_count > 0 && !na.rm){

Review comment:
       It would be best to put the `!na.rm` at the beginning here, because it's the only one of these three logicals that does not have any compute cost to calculate. Putting it at the beginning allows the expression to short-circuit and never evaluate the other two if `na.rm` is `TRUE`




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] jorisvandenbossche commented on a change in pull request #10032: ARROW-12185: [R] Bindings for any, all

Posted by GitBox <gi...@apache.org>.
jorisvandenbossche commented on a change in pull request #10032:
URL: https://github.com/apache/arrow/pull/10032#discussion_r617840727



##########
File path: r/R/compute.R
##########
@@ -186,6 +186,32 @@ unique.ArrowDatum <- function(x, incomparables = FALSE, ...) {
   call_function("unique", x)
 }
 
+#' @export
+any.ArrowDatum <- function(..., na.rm = FALSE){
+  
+  a <- collect_arrays_from_dots(list(...))
+  result <- call_function("any", a)
+
+  if(!na.rm && a$null_count > 0 && !as.vector(result)){
+    Scalar$create(NA)

Review comment:
       > FYI @rok, not sure if you want to add this to your PR for ARROW-9054 or if you'd like to do a followup to add any/all to it, up to you.
   
   I would maybe keep that as a separate followup PR, ARROW-9054 is already getting quite big ;)




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] nealrichardson commented on a change in pull request #10032: ARROW-12185: [R] Bindings for any, all

Posted by GitBox <gi...@apache.org>.
nealrichardson commented on a change in pull request #10032:
URL: https://github.com/apache/arrow/pull/10032#discussion_r614417180



##########
File path: r/R/compute.R
##########
@@ -186,6 +186,32 @@ unique.ArrowDatum <- function(x, incomparables = FALSE, ...) {
   call_function("unique", x)
 }
 
+#' @export
+any.ArrowDatum <- function(..., na.rm = FALSE){
+  
+  a <- collect_arrays_from_dots(list(...))
+  result <- call_function("any", a)
+
+  if(!as.vector(result) && a$null_count > 0 && !na.rm){

Review comment:
       Ah right, got it, thanks. 
   
   




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] ianmcook commented on a change in pull request #10032: ARROW-12185: [R] Bindings for any, all

Posted by GitBox <gi...@apache.org>.
ianmcook commented on a change in pull request #10032:
URL: https://github.com/apache/arrow/pull/10032#discussion_r614087301



##########
File path: r/R/compute.R
##########
@@ -186,6 +186,32 @@ unique.ArrowDatum <- function(x, incomparables = FALSE, ...) {
   call_function("unique", x)
 }
 
+#' @export
+any.ArrowDatum <- function(..., na.rm = FALSE){
+  
+  a <- collect_arrays_from_dots(list(...))
+  result <- call_function("any", a)
+
+  if(!as.vector(result) && a$null_count > 0 && !na.rm){

Review comment:
       ditto for the same line in the `all` definition below




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] ianmcook commented on a change in pull request #10032: ARROW-12185: [R] Bindings for any, all

Posted by GitBox <gi...@apache.org>.
ianmcook commented on a change in pull request #10032:
URL: https://github.com/apache/arrow/pull/10032#discussion_r614086125



##########
File path: r/R/compute.R
##########
@@ -186,6 +186,32 @@ unique.ArrowDatum <- function(x, incomparables = FALSE, ...) {
   call_function("unique", x)
 }
 
+#' @export
+any.ArrowDatum <- function(..., na.rm = FALSE){
+  
+  a <- collect_arrays_from_dots(list(...))
+  result <- call_function("any", a)
+
+  if(!as.vector(result) && a$null_count > 0 && !na.rm){

Review comment:
       It would be best to put the `!na.rm` at the beginning here, because it's the only one of these three logicals that does not have any compute cost to calculate. Putting it the beginning allows the expression to short-circuit and never evaluate the other two if `na.rm` is `TRUE`




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] ianmcook commented on a change in pull request #10032: ARROW-12185: [R] Bindings for any, all

Posted by GitBox <gi...@apache.org>.
ianmcook commented on a change in pull request #10032:
URL: https://github.com/apache/arrow/pull/10032#discussion_r616058924



##########
File path: r/tests/testthat/test-compute-aggregate.R
##########
@@ -351,3 +351,71 @@ test_that("value_counts", {
   expect_identical(as.data.frame(value_counts(a)), result_df)
   expect_identical(as.vector(value_counts(a)$counts), result_df$counts)
 })
+
+test_that("any.Array", {
+  
+  data <- c(1:10, NA, NA)
+  array_data <- Array$create(data)
+  
+  expect_equal(as.vector(any(array_data > 5)), any(data > 5))
+  expect_equal(as.vector(any(array_data < 1)), any(data < 1))
+  expect_equal(as.vector(any(array_data < 1, na.rm = TRUE)), any(data < 1, na.rm = TRUE))

Review comment:
       Note that `expect_vector_equal` tests both `Array` and `ChunkedArray` so you'll be able to reduce the number of tests used here




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org