You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2021/05/24 14:26:42 UTC

[GitHub] [arrow] thisisnic opened a new pull request #10387: ARROW-12761: [R] Better error handling for write_to_raw

thisisnic opened a new pull request #10387:
URL: https://github.com/apache/arrow/pull/10387


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] thisisnic commented on a change in pull request #10387: ARROW-12761: [R] Better error handling for write_to_raw

Posted by GitBox <gi...@apache.org>.
thisisnic commented on a change in pull request #10387:
URL: https://github.com/apache/arrow/pull/10387#discussion_r644333487



##########
File path: r/R/util.R
##########
@@ -110,3 +110,11 @@ handle_embedded_nul_error <- function(e) {
   }
   stop(e)
 }
+
+is_writable_table <- function(x) {
+  inherits(x, c("data.frame", "ArrowTabular"))
+}
+
+attr(is_writable_table, "fail") <- function(call, env){
+  paste0(eval(substitute(substitute(y, env), list(y = call$x)))," is not an object of class 'data.frame', 'RecordBatch', or 'Table'.")

Review comment:
       Ah, thanks, had got mixed up with something else you said, but makes sense now.
   
   Worked out a way of putting the object's class back into the error message so have pushed that change now, as well as the comment about what that attribute does.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] thisisnic commented on a change in pull request #10387: ARROW-12761: [R] Better error handling for write_to_raw

Posted by GitBox <gi...@apache.org>.
thisisnic commented on a change in pull request #10387:
URL: https://github.com/apache/arrow/pull/10387#discussion_r638834438



##########
File path: r/R/util.R
##########
@@ -110,3 +110,9 @@ handle_embedded_nul_error <- function(e) {
   }
   stop(e)
 }
+
+check_tabular <- function(x, format){
+  if(!inherits(x, "ArrowTabular")){
+    stop(paste0("Cannot write to ", format, ". 'x' must be an object of class 'data.frame', 'RecordBatch', or 'Table', not '", class(x)[1], "'"))

Review comment:
       Great idea, have changed to use `rlang::abort`. 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] thisisnic commented on a change in pull request #10387: ARROW-12761: [R] Better error handling for write_to_raw

Posted by GitBox <gi...@apache.org>.
thisisnic commented on a change in pull request #10387:
URL: https://github.com/apache/arrow/pull/10387#discussion_r639909406



##########
File path: r/tests/testthat/test-csv.R
##########
@@ -317,7 +317,7 @@ test_that("Write a CSV file with different batch sizes", {
 test_that("Write a CSV file with invalid input type", {
   expect_error(
     write_csv_arrow(Array$create(1:5), csv_file),
-    regexp = 'x must be a "ArrowTabular"'
+    regexp = "Cannot write to CSV. 'x' must be an object of class 'data.frame', 'RecordBatch', or 'Table', not 'Array'"

Review comment:
       Sounds good - Jon opened a ticket to do that




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] nealrichardson closed pull request #10387: ARROW-12761: [R] Better error handling for write_to_raw

Posted by GitBox <gi...@apache.org>.
nealrichardson closed pull request #10387:
URL: https://github.com/apache/arrow/pull/10387


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] nealrichardson commented on a change in pull request #10387: ARROW-12761: [R] Better error handling for write_to_raw

Posted by GitBox <gi...@apache.org>.
nealrichardson commented on a change in pull request #10387:
URL: https://github.com/apache/arrow/pull/10387#discussion_r644135689



##########
File path: r/R/util.R
##########
@@ -110,3 +110,11 @@ handle_embedded_nul_error <- function(e) {
   }
   stop(e)
 }
+
+is_writable_table <- function(x) {
+  inherits(x, c("data.frame", "ArrowTabular"))
+}
+
+attr(is_writable_table, "fail") <- function(call, env){

Review comment:
       Can you add a comment pointing to `assertthat` and how this works?

##########
File path: r/R/util.R
##########
@@ -110,3 +110,11 @@ handle_embedded_nul_error <- function(e) {
   }
   stop(e)
 }
+
+is_writable_table <- function(x) {
+  inherits(x, c("data.frame", "ArrowTabular"))
+}
+
+attr(is_writable_table, "fail") <- function(call, env){
+  paste0(eval(substitute(substitute(y, env), list(y = call$x)))," is not an object of class 'data.frame', 'RecordBatch', or 'Table'.")

Review comment:
       What do you think about saying "must be" instead of "is not"?
   
   Also, previously you had said what the class of `x` was: is that feasible here? (Ok if not or not worth it, just observing that it was nice.)

##########
File path: r/R/util.R
##########
@@ -110,3 +110,11 @@ handle_embedded_nul_error <- function(e) {
   }
   stop(e)
 }
+
+is_writable_table <- function(x) {
+  inherits(x, c("data.frame", "ArrowTabular"))
+}
+
+attr(is_writable_table, "fail") <- function(call, env){
+  paste0(eval(substitute(substitute(y, env), list(y = call$x)))," is not an object of class 'data.frame', 'RecordBatch', or 'Table'.")

Review comment:
       Also, I don't think the eval(substitute(substitute())) is totally safe: 
   
   ```
   > assert_that(is_writable_table(c("a", "b")))
   Error in stop(assertError(attr(res, "msg"))) : bad error message
   ```
   
   `deparse(call$x)` seems to work:
   
   ```
   > attr(is_writable_table, "fail") <- function(call, env){
     paste0(deparse(call$x)," is not an object of class 'data.frame', 'RecordBatch', or 'Table'.")
   }
   > assert_that(is_writable_table(c("a", "b")))
   Error: c("a", "b") is not an object of class 'data.frame', 'RecordBatch', or 'Table'.
   > assert_that(is_writable_table(letters))
   Error: letters is not an object of class 'data.frame', 'RecordBatch', or 'Table'.
   ```




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] romainfrancois commented on a change in pull request #10387: ARROW-12761: [R] Better error handling for write_to_raw

Posted by GitBox <gi...@apache.org>.
romainfrancois commented on a change in pull request #10387:
URL: https://github.com/apache/arrow/pull/10387#discussion_r638725724



##########
File path: r/tests/testthat/test-csv.R
##########
@@ -317,7 +317,7 @@ test_that("Write a CSV file with different batch sizes", {
 test_that("Write a CSV file with invalid input type", {
   expect_error(
     write_csv_arrow(Array$create(1:5), csv_file),
-    regexp = 'x must be a "ArrowTabular"'
+    regexp = "Cannot write to CSV. 'x' must be an object of class 'data.frame', 'RecordBatch', or 'Table', not 'Array'"

Review comment:
       This is also not strictly about this pull request, but I believe the 📦 would benefit from switching to testthat 3rd edition `expect_snaphot()` ... https://testthat.r-lib.org/articles/third-edition.html

##########
File path: r/R/util.R
##########
@@ -110,3 +110,9 @@ handle_embedded_nul_error <- function(e) {
   }
   stop(e)
 }
+
+check_tabular <- function(x, format){
+  if(!inherits(x, "ArrowTabular")){
+    stop(paste0("Cannot write to ", format, ". 'x' must be an object of class 'data.frame', 'RecordBatch', or 'Table', not '", class(x)[1], "'"))

Review comment:
       This might be more of a general thing than related to this specific pull request, but perhaps we could use `rlang::abort()` and take advantage of the bullets, e.g. 
   
   ```r
   abort(c(
     paste0("Cannot write to ", format, ".), 
     x = paste0("'x' must be an object of class 'data.frame', 'RecordBatch', or 'Table', not '", class(x)[1], "'")
   ))
   ```
   
   Or if we were willing to use `glue::` 
   
   ```r
   abort(c(
     glue("Cannot write to { format }."), 
     x = glue("'x' must be an object of class 'data.frame', 'RecordBatch', or 'Table', not '{ class(x)[1] }'")
   ))
   ```
   
   Relatedly, we tend to use backticks when naming objects and calls, i.e. 
   
   ```r
   glue("`x` must be an object") ...
   ```




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] thisisnic commented on a change in pull request #10387: ARROW-12761: [R] Better error handling for write_to_raw

Posted by GitBox <gi...@apache.org>.
thisisnic commented on a change in pull request #10387:
URL: https://github.com/apache/arrow/pull/10387#discussion_r644177675



##########
File path: r/R/util.R
##########
@@ -110,3 +110,11 @@ handle_embedded_nul_error <- function(e) {
   }
   stop(e)
 }
+
+is_writable_table <- function(x) {
+  inherits(x, c("data.frame", "ArrowTabular"))
+}
+
+attr(is_writable_table, "fail") <- function(call, env){
+  paste0(eval(substitute(substitute(y, env), list(y = call$x)))," is not an object of class 'data.frame', 'RecordBatch', or 'Table'.")

Review comment:
       I think that "must be" sounds better than "is not" after reading the tidyverse style guide's approach to errors since updating this PR ;) Yeah, makes sense - is a little more informative, will update.
   
   I'll have a look into alternatives.  My problem was that when I try it with `deparse(call$x)` it works when called directly like you have above, but when I run the unit tests and it's being called from the `write_*` functions, the error message then reads `x must be an object of class 'data.frame', 'RecordBatch', or 'Table'.`.  I guess it's looking at the wrong level, but I wasn't sure how to get it to look at the right one.  Any ideas?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] nealrichardson commented on a change in pull request #10387: ARROW-12761: [R] Better error handling for write_to_raw

Posted by GitBox <gi...@apache.org>.
nealrichardson commented on a change in pull request #10387:
URL: https://github.com/apache/arrow/pull/10387#discussion_r644285045



##########
File path: r/R/util.R
##########
@@ -110,3 +110,11 @@ handle_embedded_nul_error <- function(e) {
   }
   stop(e)
 }
+
+is_writable_table <- function(x) {
+  inherits(x, c("data.frame", "ArrowTabular"))
+}
+
+attr(is_writable_table, "fail") <- function(call, env){
+  paste0(eval(substitute(substitute(y, env), list(y = call$x)))," is not an object of class 'data.frame', 'RecordBatch', or 'Table'.")

Review comment:
       I don't think that's bad because `x` is the name of the argument in these functions. If the argument were named something else, it would use that name:
   
   ```
   > f <- function(zzz) {
     assert_that(is_writable_table(zzz))
   }
   > f(letters)
   Error: zzz is not an object of class 'data.frame', 'RecordBatch', or 'Table'.
   > f(c("a", "b"))
   Error: zzz is not an object of class 'data.frame', 'RecordBatch', or 'Table'.
   ```
   
   This is how `assertthat` does it too. See https://github.com/hadley/assertthat/blob/master/R/base-is.r for example.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] romainfrancois commented on a change in pull request #10387: ARROW-12761: [R] Better error handling for write_to_raw

Posted by GitBox <gi...@apache.org>.
romainfrancois commented on a change in pull request #10387:
URL: https://github.com/apache/arrow/pull/10387#discussion_r640381291



##########
File path: r/R/util.R
##########
@@ -110,3 +110,18 @@ handle_embedded_nul_error <- function(e) {
   }
   stop(e)
 }
+
+check_tabular <- function(x, format) {
+  if (!inherits(x, "ArrowTabular")) {
+    abort(
+      c(
+        paste0("Cannot write to ", format, "."),
+        x = paste0(
+          "`x` must be an object of class 'data.frame', 'RecordBatch', or 'Table', not '",
+          class(x)[1],
+          "'"
+        )
+      )
+    )
+  }
+}

Review comment:
       ```suggestion
   }
   
   ```
   
   GitHub seems concerned that this does not have a newline. 
   
   Should this perhaps end with `invisible(x)` ? Probably not important with the current use case, but that would allow e.g. `do_something(check_tabular(x), ...)`
   
   




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] thisisnic commented on a change in pull request #10387: ARROW-12761: [R] Better error handling for write_to_raw

Posted by GitBox <gi...@apache.org>.
thisisnic commented on a change in pull request #10387:
URL: https://github.com/apache/arrow/pull/10387#discussion_r644333487



##########
File path: r/R/util.R
##########
@@ -110,3 +110,11 @@ handle_embedded_nul_error <- function(e) {
   }
   stop(e)
 }
+
+is_writable_table <- function(x) {
+  inherits(x, c("data.frame", "ArrowTabular"))
+}
+
+attr(is_writable_table, "fail") <- function(call, env){
+  paste0(eval(substitute(substitute(y, env), list(y = call$x)))," is not an object of class 'data.frame', 'RecordBatch', or 'Table'.")

Review comment:
       Ah, thanks, had got mixed up with something else you said, but makes sense now.
   
   Worked out a way of putting the object's class back into the error message so have pushed that change now, as well as the comment about what that attribute does.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] github-actions[bot] commented on pull request #10387: ARROW-12761: [R] Better error handling for write_to_raw

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #10387:
URL: https://github.com/apache/arrow/pull/10387#issuecomment-847083731


   https://issues.apache.org/jira/browse/ARROW-12761


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] thisisnic commented on a change in pull request #10387: ARROW-12761: [R] Better error handling for write_to_raw

Posted by GitBox <gi...@apache.org>.
thisisnic commented on a change in pull request #10387:
URL: https://github.com/apache/arrow/pull/10387#discussion_r643894665



##########
File path: r/R/util.R
##########
@@ -110,3 +110,19 @@ handle_embedded_nul_error <- function(e) {
   }
   stop(e)
 }
+
+check_tabular <- function(x, format) {

Review comment:
       Cheers for the feedback @nealrichardson, and thanks for the `assertthat` hint, I have learned a LOT about evaluation this morning working that one out :grin: 
   
   I've made a number of changes - let me know how it looks now or if there's still more to do.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] nealrichardson commented on a change in pull request #10387: ARROW-12761: [R] Better error handling for write_to_raw

Posted by GitBox <gi...@apache.org>.
nealrichardson commented on a change in pull request #10387:
URL: https://github.com/apache/arrow/pull/10387#discussion_r643343363



##########
File path: r/R/util.R
##########
@@ -110,3 +110,19 @@ handle_embedded_nul_error <- function(e) {
   }
   stop(e)
 }
+
+check_tabular <- function(x, format) {

Review comment:
       I like factoring out a function to encapsulate common logic, but I have some concerns with this one. The name is too generic--I would expect `check_tabular <- function(x) inherits(x, "ArrowTabular")`--but this is much more specific and doesn't really describe what it does, which I think will make this harder to maintain and use going forward.
   
   * Nothing about the function name suggests that it is only for writing to a format, but the error message says that
   * It checks whether you have `ArrowTabular` but then the error message says data.frame or ArrowTabular. This happens to work because you call it after converting data.frame to Table inside each function where this is used, but it wouldn't necessarily work if you used it in another function.
   * It says "`x` must be ..." but nothing guarantees that the argument name in the parent function is called `x`
   
   Some suggestions:
   
   * Rename to `assert_` something since, like the other assert_ functions, this raises an error if the condition is not met. 
   * Either make the full error message (i.e. what currently is `paste0("Cannot write to ", format, ".")` a function argument so that the function isn't specific to writing, or put "write" somewhere in the function name so that it's clear that it's only for validating writing.
   * Use some `call` parsing to get the actual argument name from the caller (that is, don't assume it's called `x`) (Would `asserthat` do this for you, or have code you can refer to for inspiration?)
   * Change the assertion to `if (!inherits(x, c("data.frame", "ArrowTabular")))` so that it matches the message. 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] thisisnic commented on a change in pull request #10387: ARROW-12761: [R] Better error handling for write_to_raw

Posted by GitBox <gi...@apache.org>.
thisisnic commented on a change in pull request #10387:
URL: https://github.com/apache/arrow/pull/10387#discussion_r640456625



##########
File path: r/R/util.R
##########
@@ -110,3 +110,18 @@ handle_embedded_nul_error <- function(e) {
   }
   stop(e)
 }
+
+check_tabular <- function(x, format) {
+  if (!inherits(x, "ArrowTabular")) {
+    abort(
+      c(
+        paste0("Cannot write to ", format, "."),
+        x = paste0(
+          "`x` must be an object of class 'data.frame', 'RecordBatch', or 'Table', not '",
+          class(x)[1],
+          "'"
+        )
+      )
+    )
+  }
+}

Review comment:
       Thanks for looking at this- have updated it to do this now to return `invisible(x)` & will update the tests to use `expect_snapshot_error` when making a PR to upgrade to testthat 3e.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org