Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/07/19 17:19:42 UTC

[GitHub] [arrow] paleolimbot commented on a diff in pull request #13650: ARROW-16703: [R] Refactor map_batches() so it can stream results

paleolimbot commented on code in PR #13650:
URL: https://github.com/apache/arrow/pull/13650#discussion_r924776076


##########
r/tests/testthat/test-dataset-write.R:
##########
@@ -703,6 +703,7 @@ test_that("Dataset min_rows_per_group", {
 
   row_group_sizes <- ds %>%
     map_batches(~ record_batch(nrows = .$num_rows)) %>%
+    (function(x) x$read_table()) %>%

Review Comment:
   It doesn't right now because the requisite `RunWithCapturedR()` call isn't there yet (it gets added in https://github.com/apache/arrow/pull/13397/files#diff-0d1ff6f17f571f6a348848af7de9c05ed588d3339f46dd3bcf2808489f7dca92R132-R144).
   
   <details>
   
   ``` r
   library(arrow, warn.conflicts = FALSE)
   #> Some features are not enabled in this build of Arrow. Run `arrow_info()` for more information.
   
   source_reader <- RecordBatchReader$create(
     batches = list(
       as_record_batch(mtcars[1:10, ]),
       as_record_batch(mtcars[11:20, ]),
       as_record_batch(mtcars[21:nrow(mtcars), ])
     )
   )
   
   reader <- source_reader |> 
     map_batches(~rbind(as.data.frame(.), as.data.frame(.))) 
   
   dplyr::collect(reader)
   #> Error in `dplyr::collect()` at r/R/dplyr-collect.R:43:48:
   #> ! NotImplemented: Call to R from a non-R thread without calling RunWithCapturedR
   #> /Users/deweydunnington/Desktop/rscratch/arrow/cpp/src/arrow/record_batch.h:242  ReadNext(&batch)
   #> /Users/deweydunnington/Desktop/rscratch/arrow/cpp/src/arrow/util/iterator.h:428  it_.Next()
   #> /Users/deweydunnington/Desktop/rscratch/arrow/cpp/src/arrow/compute/exec/exec_plan.cc:559  iterator_.Next()
   #> /Users/deweydunnington/Desktop/rscratch/arrow/cpp/src/arrow/record_batch.cc:337  ReadNext(&batch)
   #> /Users/deweydunnington/Desktop/rscratch/arrow/cpp/src/arrow/record_batch.cc:351  ToRecordBatches()
   ```
   
   <sup>Created on 2022-07-19 by the [reprex package](https://reprex.tidyverse.org) (v2.0.1)</sup>
   
   </details>


