Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/07/19 17:19:42 UTC
[GitHub] [arrow] paleolimbot commented on a diff in pull request #13650: ARROW-16703: [R] Refactor map_batches() so it can stream results
paleolimbot commented on code in PR #13650:
URL: https://github.com/apache/arrow/pull/13650#discussion_r924776076
##########
r/tests/testthat/test-dataset-write.R:
##########
@@ -703,6 +703,7 @@ test_that("Dataset min_rows_per_group", {
   row_group_sizes <- ds %>%
     map_batches(~ record_batch(nrows = .$num_rows)) %>%
+    (function(x) x$read_table()) %>%
Review Comment:
It doesn't right now because the requisite `RunWithCapturedR()` isn't there yet (it gets added here: https://github.com/apache/arrow/pull/13397/files#diff-0d1ff6f17f571f6a348848af7de9c05ed588d3339f46dd3bcf2808489f7dca92R132-R144 )
<details>
``` r
library(arrow, warn.conflicts = FALSE)
#> Some features are not enabled in this build of Arrow. Run `arrow_info()` for more information.
source_reader <- RecordBatchReader$create(
  batches = list(
    as_record_batch(mtcars[1:10, ]),
    as_record_batch(mtcars[11:20, ]),
    as_record_batch(mtcars[21:nrow(mtcars), ])
  )
)
reader <- source_reader |>
  map_batches(~rbind(as.data.frame(.), as.data.frame(.)))
dplyr::collect(reader)
#> Error in `dplyr::collect()` at r/R/dplyr-collect.R:43:48:
#> ! NotImplemented: Call to R from a non-R thread without calling RunWithCapturedR
#> /Users/deweydunnington/Desktop/rscratch/arrow/cpp/src/arrow/record_batch.h:242 ReadNext(&batch)
#> /Users/deweydunnington/Desktop/rscratch/arrow/cpp/src/arrow/util/iterator.h:428 it_.Next()
#> /Users/deweydunnington/Desktop/rscratch/arrow/cpp/src/arrow/compute/exec/exec_plan.cc:559 iterator_.Next()
#> /Users/deweydunnington/Desktop/rscratch/arrow/cpp/src/arrow/record_batch.cc:337 ReadNext(&batch)
#> /Users/deweydunnington/Desktop/rscratch/arrow/cpp/src/arrow/record_batch.cc:351 ToRecordBatches()
```
<sup>Created on 2022-07-19 by the [reprex package](https://reprex.tidyverse.org) (v2.0.1)</sup>
</details>
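For comparison, here is a minimal sketch of the pattern the test in this diff uses instead: reading the result with the reader's own `$read_table()` method rather than `dplyr::collect()`. It assumes (as the added test line suggests, but this is not verified here) that `$read_table()` pulls batches on the calling R thread, so the mapped R function never runs on a non-R thread. The variable name `tab` is only for illustration.

``` r
library(arrow, warn.conflicts = FALSE)

# Same three-batch source as in the reprex above
source_reader <- RecordBatchReader$create(
  batches = list(
    as_record_batch(mtcars[1:10, ]),
    as_record_batch(mtcars[11:20, ]),
    as_record_batch(mtcars[21:nrow(mtcars), ])
  )
)

# map_batches() returns a RecordBatchReader; each batch is doubled by the R function
reader <- source_reader |>
  map_batches(~ rbind(as.data.frame(.), as.data.frame(.)))

# Assumption: draining the reader directly keeps the R calls on the R thread,
# which is why the test uses this instead of dplyr::collect()
tab <- reader$read_table()
tab$num_rows
```

Once the `RunWithCapturedR()` wrapper from the linked PR is in place, the extra `$read_table()` step should presumably no longer be needed.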