You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2021/04/05 22:55:02 UTC

[GitHub] [arrow] nealrichardson opened a new pull request #9899: ARROW-11478: [R] Consider ways to make arrow.skip_nul option more user-friendly

nealrichardson opened a new pull request #9899:
URL: https://github.com/apache/arrow/pull/9899


   Following a similar approach to ARROW-11766. This also pulls in `withr` as a test dependency and refactors some unrelated tests to use it.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] github-actions[bot] commented on pull request #9899: ARROW-11478: [R] Consider ways to make arrow.skip_nul option more user-friendly

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #9899:
URL: https://github.com/apache/arrow/pull/9899#issuecomment-813696456


   https://issues.apache.org/jira/browse/ARROW-11478


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] nealrichardson commented on a change in pull request #9899: ARROW-11478: [R] Consider ways to make arrow.skip_nul option more user-friendly

Posted by GitBox <gi...@apache.org>.
nealrichardson commented on a change in pull request #9899:
URL: https://github.com/apache/arrow/pull/9899#discussion_r607914010



##########
File path: r/R/util.R
##########
@@ -86,3 +86,11 @@ all_names <- function(expr) {
 is_constant <- function(expr) {
   length(all_vars(expr)) == 0
 }
+
+handle_embedded_nul_error <- function(e) {
+  msg <- conditionMessage(e)
+  if (grepl(" nul ", msg)) {
+    e$message <- paste0(msg, "; to strip nuls when converting from Arrow to R, set options(arrow.skip_nul = TRUE)")
+  }
+  stop(e)
+}

Review comment:
       Confirmed: 
   
   ```
   > f <- function(...) stop(...)
   > f("boo")
   Error in f("boo") : boo
   > f("boo", call. = FALSE)
   Error: boo
   > tryCatch(f("boo"), error = function(e) stop(e))
   Error in f("boo") : boo
   > tryCatch(f("boo", call. = FALSE), error = function(e) stop(e))
   Error: boo
   ```




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] nealrichardson closed pull request #9899: ARROW-11478: [R] Consider ways to make arrow.skip_nul option more user-friendly

Posted by GitBox <gi...@apache.org>.
nealrichardson closed pull request #9899:
URL: https://github.com/apache/arrow/pull/9899


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] nealrichardson commented on a change in pull request #9899: ARROW-11478: [R] Consider ways to make arrow.skip_nul option more user-friendly

Posted by GitBox <gi...@apache.org>.
nealrichardson commented on a change in pull request #9899:
URL: https://github.com/apache/arrow/pull/9899#discussion_r607447880



##########
File path: r/R/util.R
##########
@@ -86,3 +86,11 @@ all_names <- function(expr) {
 is_constant <- function(expr) {
   length(all_vars(expr)) == 0
 }
+
+handle_embedded_nul_error <- function(e) {
+  msg <- conditionMessage(e)
+  if (grepl(" nul ", msg)) {
+    e$message <- paste0(msg, "; to strip nuls when converting from Arrow to R, set options(arrow.skip_nul = TRUE)")
+  }
+  stop(e)
+}

Review comment:
       I thought (though I could be remembering wrong) that when you call `stop()` on a caught `simpleError` class object, you just re-raise it and that preserves however the original source of the `stop()` had said `call.` or whatever options. I'll review/confirm tomorrow.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] westonpace commented on pull request #9899: ARROW-11478: [R] Consider ways to make arrow.skip_nul option more user-friendly

Posted by GitBox <gi...@apache.org>.
westonpace commented on pull request #9899:
URL: https://github.com/apache/arrow/pull/9899#issuecomment-813791413


   > The error on centos7 is something that I see on about 1 out of every 10 runs locally on the as.data.frame(RecordBatch) test. Locally it looks like:
   > 
   > ```
   > libc++abi.dylib: Pure virtual function called!
   > /bin/sh: line 1: 93200 Abort trap: 6
   > ```
   > 
   > I haven't observed this on repeated runs of the array/chunked-array tests.
   > 
   > Any ideas @romainfrancois @bkietz? Even though my changes are purely in the R code, googling the error message suggests it's a C++ issue.
   
   It's a bug in `to_dataframe_parallel` (or possibly somewhere deeper).  Something is calling `stop` which is causing the code to bail before `tg` (the `arrow::internal::TaskGroup`) is destroyed.  The converters are then destroyed first for whatever reason (I don't really know how R cleans this stuff up).  When the `TaskGroup` is destroyed it waits until all running tasks cease.  Since this hasn't happened the tasks are still running.  The tasks reference `this` implicitly where `this` is the converter.  Thus, the tasks have a pointer to an object that has been destroyed and they attempt to make a call on it.  This yields the error you saw.
   
   A quick patch (but I'm not sure if the right one as I don't understand R's `stop` well enough) could be...
   
   ```
   diff --git a/r/src/array_to_vector.cpp b/r/src/array_to_vector.cpp
   index ddcb74946..ef57fa660 100644
   --- a/r/src/array_to_vector.cpp
   +++ b/r/src/array_to_vector.cpp
   @@ -88,11 +88,12 @@ class Converter {
      // for each array, add a task to the task group
      //
      // The task group is Finish() in the caller
   -  void IngestParallel(SEXP data, const std::shared_ptr<arrow::internal::TaskGroup>& tg) {
   +  void IngestParallel(SEXP data, const std::shared_ptr<arrow::internal::TaskGroup>& tg,
   +                      std::shared_ptr<Converter> self) {
        R_xlen_t k = 0, i = 0;
        for (const auto& array : arrays_) {
          auto n_chunk = array->length();
   -      tg->Append([=] { return IngestOne(data, array, k, n_chunk, i); });
   +      tg->Append([=] { return self->IngestOne(data, array, k, n_chunk, i); });
          k += n_chunk;
          i++;
        }
   @@ -1242,7 +1243,7 @@ cpp11::writable::list to_dataframe_parallel(
    
        // add a task to ingest data of that column if that can be done in parallel
        if (converters[i]->Parallel()) {
   -      converters[i]->IngestParallel(column, tg);
   +      converters[i]->IngestParallel(column, tg, converters[i]);
        }
      }
   ```
   
   This copies the `shared_ptr` into the task and uses that (instead of an implicit `this` capture).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] nealrichardson commented on pull request #9899: ARROW-11478: [R] Consider ways to make arrow.skip_nul option more user-friendly

Posted by GitBox <gi...@apache.org>.
nealrichardson commented on pull request #9899:
URL: https://github.com/apache/arrow/pull/9899#issuecomment-813729704


   The error on centos7 is something that I see on about 1 out of every 10 runs locally on the as.data.frame(RecordBatch) test. Locally it looks like:
   
   ```
   libc++abi.dylib: Pure virtual function called!
   /bin/sh: line 1: 93200 Abort trap: 6
   ```
   
   I haven't observed this on repeated runs of the array/chunked-array tests.
   
   Any ideas @romainfrancois @bkietz? Even though my changes are purely in the R code, googling the error message suggests it's a C++ issue.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] ianmcook commented on a change in pull request #9899: ARROW-11478: [R] Consider ways to make arrow.skip_nul option more user-friendly

Posted by GitBox <gi...@apache.org>.
ianmcook commented on a change in pull request #9899:
URL: https://github.com/apache/arrow/pull/9899#discussion_r607445520



##########
File path: r/R/util.R
##########
@@ -86,3 +86,11 @@ all_names <- function(expr) {
 is_constant <- function(expr) {
   length(all_vars(expr)) == 0
 }
+
+handle_embedded_nul_error <- function(e) {
+  msg <- conditionMessage(e)
+  if (grepl(" nul ", msg)) {
+    e$message <- paste0(msg, "; to strip nuls when converting from Arrow to R, set options(arrow.skip_nul = TRUE)")
+  }
+  stop(e)
+}

Review comment:
       Is the call info in the error message at all informative? If not, you might want `stop(msg, call. = FALSE)` inside the conditional (after appending to the message of course).




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] ianmcook commented on pull request #9899: ARROW-11478: [R] Consider ways to make arrow.skip_nul option more user-friendly

Posted by GitBox <gi...@apache.org>.
ianmcook commented on pull request #9899:
URL: https://github.com/apache/arrow/pull/9899#issuecomment-813782420


   Looks good to me 👍 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org