You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2021/04/19 19:47:06 UTC

[GitHub] [arrow] jonkeane commented on a change in pull request #10056: ARROW-12184: [R] Bindings for na.fail, na.omit, na.exclude, na.pass

jonkeane commented on a change in pull request #10056:
URL: https://github.com/apache/arrow/pull/10056#discussion_r616110601



##########
File path: r/R/arrow-datum.R
##########
@@ -46,6 +46,22 @@ as.vector.ArrowDatum <- function(x, mode) {
   )
 }
 
+#'@export

Review comment:
       This is super minor/nit picky (though it looks like the rd tag works without it which I was a bit surprised by): 
   
   ```suggestion
   #' @export
   ```

##########
File path: r/R/arrow-tabular.R
##########
@@ -211,6 +211,31 @@ head.ArrowTabular <- head.ArrowDatum
 #' @export
 tail.ArrowTabular <- tail.ArrowDatum
 
+#' @export
+na.fail.ArrowTabular <- function(object, ...){
+  
+  na_count <- sum(purrr::map_int(object$columns, ~.x$null_count))
+  if(na_count > 0){
+    stop("missing values in object")
+  }
+  object
+  
+}
+
+#' @export
+na.omit.ArrowTabular <- function(object, ...){
+  
+  na_expr <- paste0("!is.na(", names(object), ")", collapse = ",")
+  filter_expr <- paste0("dplyr::filter(object,", na_expr, ")")
+  expression <- rlang::parse_expr(filter_expr)
+  
+  rlang::eval_tidy(expression)
+

Review comment:
       Very minor:
   ```suggestion
   ```

##########
File path: r/R/arrow-tabular.R
##########
@@ -211,6 +211,31 @@ head.ArrowTabular <- head.ArrowDatum
 #' @export
 tail.ArrowTabular <- tail.ArrowDatum
 
+#' @export
+na.fail.ArrowTabular <- function(object, ...){
+  

Review comment:
       Very minor:
   ```suggestion
   ```

##########
File path: r/tests/testthat/test-na-omit.R
##########
@@ -0,0 +1,73 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+data_no_na <- c(2:10)
+data_na <- c(NA_real_, data_no_na)

Review comment:
       Would it be possible to either make the `NA` the last element or an element in the middle? I don't have any reason to think this is true here, but I've been bitten by testing the only the first element in the past.

##########
File path: r/R/arrow-tabular.R
##########
@@ -211,6 +211,31 @@ head.ArrowTabular <- head.ArrowDatum
 #' @export
 tail.ArrowTabular <- tail.ArrowDatum
 
+#' @export
+na.fail.ArrowTabular <- function(object, ...){
+  
+  na_count <- sum(purrr::map_int(object$columns, ~.x$null_count))
+  if(na_count > 0){
+    stop("missing values in object")
+  }
+  object
+  
+}
+
+#' @export
+na.omit.ArrowTabular <- function(object, ...){
+  
+  na_expr <- paste0("!is.na(", names(object), ")", collapse = ",")
+  filter_expr <- paste0("dplyr::filter(object,", na_expr, ")")

Review comment:
       Is it possible to implement this without `dplyr`? We only suggest `dplyr`, and it would be nice to not have to have it installed/loaded for `na.omit()` to work. This might not be the right approach, but my first thought: would it be possible to iterate `objects` and use the `na.omit()` method above? (or use the `$Filter` method you use above)? 

##########
File path: r/tests/testthat/helper-expectation.R
##########
@@ -189,3 +187,109 @@ expect_vector_equal <- function(expr, # A vectorized R expression containing `in
     skip(paste(skip_msg, collpase = "\n"))
   }
 }
+
+expect_vector_equivalent <- function(expr, # A vectorized R expression containing `input` as its input

Review comment:
       Another argument for `expect_vector_equal(ignore_attributes = TRUE)` is that testthat 3e deprecates `expect_equivalent()` in favor of `expect_equal(ignore_attr = TRUE)`.  And if we can name the argument, I would vote for `ignore_attr` over `ignore_attributes` to match testthat (well, I guess technically it's waldo at that point).
   

##########
File path: r/R/arrow-tabular.R
##########
@@ -211,6 +211,31 @@ head.ArrowTabular <- head.ArrowDatum
 #' @export
 tail.ArrowTabular <- tail.ArrowDatum
 
+#' @export
+na.fail.ArrowTabular <- function(object, ...){
+  
+  na_count <- sum(purrr::map_int(object$columns, ~.x$null_count))
+  if(na_count > 0){
+    stop("missing values in object")
+  }
+  object
+  
+}
+
+#' @export
+na.omit.ArrowTabular <- function(object, ...){
+  

Review comment:
       Very minor:
   ```suggestion
   ```

##########
File path: r/R/arrow-tabular.R
##########
@@ -211,6 +211,31 @@ head.ArrowTabular <- head.ArrowDatum
 #' @export
 tail.ArrowTabular <- tail.ArrowDatum
 
+#' @export
+na.fail.ArrowTabular <- function(object, ...){
+  
+  na_count <- sum(purrr::map_int(object$columns, ~.x$null_count))
+  if(na_count > 0){
+    stop("missing values in object")
+  }
+  object
+  

Review comment:
       Very minor:
   
   ```suggestion
   ```




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org