Posted to jira@arrow.apache.org by "Dewey Dunnington (Jira)" <ji...@apache.org> on 2022/10/20 01:36:00 UTC

[jira] [Commented] (ARROW-18101) [R] RecordBatchReaderHead from ExecPlan with UDF cannot be read

    [ https://issues.apache.org/jira/browse/ARROW-18101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17620669#comment-17620669 ] 

Dewey Dunnington commented on ARROW-18101:
------------------------------------------

After some sleuthing, it seems this is because an {{ExecPlanReader}} is collected into a table explicitly using {{RunWithCapturedR()}}:

https://github.com/apache/arrow/blob/5984d8a81a6fe8d53e89457d113c931aae59dcd8/r/src/compute-exec.cpp#L191-L198

...but a regular {{RecordBatchReader}} (like the one created by {{head.RecordBatchReader()}}) is not:

https://github.com/apache/arrow/blob/5984d8a81a6fe8d53e89457d113c931aae59dcd8/r/src/recordbatchreader.cpp#L113-L117

I believe this was done on purpose, to limit the number of places where {{RunWithCapturedR()}} is used, since it might be causing problems in the valgrind/autobrew nightlies (ARROW-17879). Ideally, {{RunWithCapturedR()}} would cause no problems, in which case it would be safe to use it to collect any {{RecordBatchReader}} into a table.
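
The difference between the two collection paths can be modeled in C++, the language of the linked sources. This is a simplified sketch under stated assumptions, not the arrow package's implementation: {{MainThreadExecutor}}, {{SafeCallIntoMain()}}, {{RunWithCapturedMain()}}, {{ReadAllOnWorker()}} and {{Collect()}} are hypothetical names, the "interpreter" is just a callback that must run on the main thread, and {{RunWithCapturedMain()}} is only loosely modeled on {{RunWithCapturedR()}}. A worker thread that needs the interpreter posts a task to a queue, and the main thread services that queue only while it is inside the capture wrapper.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <exception>
#include <functional>
#include <future>
#include <memory>
#include <mutex>
#include <queue>
#include <stdexcept>
#include <string>
#include <thread>

// Recorded during static initialization, which runs on the main ("R") thread.
const std::thread::id g_main_thread_id = std::this_thread::get_id();

// Hypothetical queue of tasks that must run on the main thread.
class MainThreadExecutor {
 public:
  void Post(std::function<void()> task) {
    { std::lock_guard<std::mutex> lock(mu_); tasks_.push(std::move(task)); }
    cv_.notify_one();
  }
  void Stop() {
    { std::lock_guard<std::mutex> lock(mu_); stopped_ = true; }
    cv_.notify_one();
  }
  // Runs on the main thread: execute posted tasks until Stop() is called.
  void Loop() {
    for (;;) {
      std::function<void()> task;
      {
        std::unique_lock<std::mutex> lock(mu_);
        cv_.wait(lock, [this] { return stopped_ || !tasks_.empty(); });
        if (tasks_.empty()) return;  // stopped and nothing left to run
        task = std::move(tasks_.front());
        tasks_.pop();
      }
      task();
    }
  }
 private:
  std::mutex mu_;
  std::condition_variable cv_;
  std::queue<std::function<void()>> tasks_;
  bool stopped_ = false;
};

// Non-null only while the main thread is inside RunWithCapturedMain().
std::atomic<MainThreadExecutor*> g_executor{nullptr};

// Hypothetical: run `fn` on the main thread, or fail if nobody is servicing the queue.
std::string SafeCallIntoMain(const std::function<std::string()>& fn) {
  if (std::this_thread::get_id() == g_main_thread_id) return fn();
  MainThreadExecutor* executor = g_executor.load();
  if (executor == nullptr)
    throw std::runtime_error("NotImplemented: call from a non-main thread from an unsupported context");
  auto task = std::make_shared<std::packaged_task<std::string()>>(fn);
  std::future<std::string> result = task->get_future();
  executor->Post([task] { (*task)(); });
  return result.get();  // the worker blocks until the main thread has run `fn`
}

// Hypothetical analogue of RunWithCapturedR(): run `work` on a background
// thread while the main thread services calls posted via SafeCallIntoMain().
template <typename T>
T RunWithCapturedMain(const std::function<T()>& work) {
  MainThreadExecutor executor;
  g_executor.store(&executor);
  std::future<T> done = std::async(std::launch::async, [&]() -> T {
    // Ends Loop() once `work` returns or throws.
    struct Guard { MainThreadExecutor& e; ~Guard() { e.Stop(); } } guard{executor};
    return work();
  });
  executor.Loop();
  g_executor.store(nullptr);
  return done.get();  // rethrows any exception raised by `work`
}

// Hypothetical reader: resolving the UDF output type happens on a worker
// thread but needs the single-threaded interpreter.
std::string ReadAllOnWorker() {
  std::future<std::string> batch = std::async(std::launch::async, [] {
    return SafeCallIntoMain([] {
      assert(std::this_thread::get_id() == g_main_thread_id);
      return std::string("float64");
    });
  });
  return batch.get();
}

// Collect with or without the capture wrapper and report the outcome as text.
std::string Collect(bool captured) {
  try {
    return "ok: " + (captured ? RunWithCapturedMain<std::string>(ReadAllOnWorker) : ReadAllOnWorker());
  } catch (const std::exception& e) {
    return std::string("error: ") + e.what();
  }
}
```

Under these assumptions, {{Collect(false)}} returns a string starting with "error: NotImplemented", mirroring the error in the report, and {{Collect(true)}} returns "ok: float64". Outside the wrapper nothing services the queue, so in this model refusing the call is the alternative to deadlocking the worker.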

> [R] RecordBatchReaderHead from ExecPlan with UDF cannot be read
> ---------------------------------------------------------------
>
>                 Key: ARROW-18101
>                 URL: https://issues.apache.org/jira/browse/ARROW-18101
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: R
>            Reporter: Neal Richardson
>            Assignee: Dewey Dunnington
>            Priority: Major
>
> {code}
>   library(arrow)
>   library(dplyr)
>   register_scalar_function(
>     "times_32",
>     function(context, x) x * 32.0,
>     int32(),
>     float64(),
>     auto_convert = TRUE
>   )
>   record_batch(a = 1:1000) %>%
>     dplyr::mutate(b = times_32(a)) %>%
>     as_record_batch_reader() %>%
>     head(11) %>%
>     as_arrow_table()
> # Error: NotImplemented: Call to R (resolve scalar user-defined function output data type) from a non-R thread from an unsupported context
> # /arrow/cpp/src/arrow/compute/exec.cc:649  kernel_->signature->out_type().Resolve(kernel_ctx_, args.inputs)
> # /arrow/cpp/src/arrow/compute/exec/expression.cc:602  executor->Init(&kernel_context, {kernel, types, options})
> # /arrow/cpp/src/arrow/compute/exec/project_node.cc:91  ExecuteScalarExpression(simplified_expr, target, plan()->exec_context())
> # /arrow/cpp/src/arrow/record_batch.cc:336  ReadNext(&batch)
> # /arrow/cpp/src/arrow/record_batch.cc:350  ToRecordBatches()
> {code}
> It works fine if you don't call {{as_record_batch_reader()}} in the middle. Oddly, it also works fine if you add {{as_adq()}} (aka {{collapse()}}) after {{head()}} and before evaluating to a table; that is, if you run it through an ExecPlan again, it doesn't error.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)