Posted to jira@arrow.apache.org by "Antoine Pitrou (Jira)" <ji...@apache.org> on 2022/07/16 10:44:00 UTC

[jira] [Comment Edited] (ARROW-11841) [R][C++] Allow cancelling long-running commands

    [ https://issues.apache.org/jira/browse/ARROW-11841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17565132#comment-17565132 ] 

Antoine Pitrou edited comment on ARROW-11841 at 7/16/22 10:43 AM:
------------------------------------------------------------------

From the R perspective, you can call
{code}
SafeCallIntoRVoid([]() { cpp11::check_user_interrupt(); })
{code}
anywhere in C++ and it will return a non-OK status if there's a pending interrupt. This only works for tasks run with {{RunWithCapturedR()}}, although that covers most of the operations one would want to cancel (e.g., reading CSV or Feather files, and query engine execution after ARROW-16444). It also only works in places where the R package can insert some kind of cancel callback.
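
As a rough illustration of what "inserting a check" means, here is a minimal, self-contained sketch of a chunked loop that polls a check function and stops at the first non-OK status. The {{Status}} struct and {{ProcessChunks}} are hypothetical stand-ins, not Arrow or R package APIs; in the R package the check would be the {{SafeCallIntoRVoid}} call shown above.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical minimal status type standing in for arrow::Status.
struct Status {
  bool ok;
  std::string message;

  static Status OK() { return Status{true, ""}; }
  static Status Cancelled(std::string msg) {
    return Status{false, std::move(msg)};
  }
};

// Hypothetical long-running operation: sums the chunks one by one and calls
// `check_interrupt` before each chunk. It stops at the first non-OK status
// and leaves the partial results in `sum` and `processed`.
Status ProcessChunks(const std::vector<int>& chunks,
                     const std::function<Status()>& check_interrupt,
                     long* sum, std::size_t* processed) {
  assert(sum != nullptr && processed != nullptr);
  *sum = 0;
  *processed = 0;
  for (int chunk : chunks) {
    Status st = check_interrupt();
    if (!st.ok) {
      return st;  // propagate the cancellation to the caller
    }
    *sum += chunk;
    ++*processed;
  }
  return Status::OK();
}
```

The loop can only react at the points where the check is called, which is why the approach is limited to code where such a callback can be inserted.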

If we have a Future that we can cancel, we could rig something similar, maybe using our own event loop (currently we use Arrow's RunInSerialExecutor and I don't know how customizable that is).

In addition to RMonitor, there's the 'later' package ( https://github.com/r-lib/later ), which can also run event loops, although I don't know how customizable they are.

In the R package we have the {{RMainThread}} singleton which could be an appropriate place to register a C++ cancel callback (e.g., a {{std::function}} that calls some Future's cancel callback). In most cases, though, we don't have a future we can cancel.
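
A minimal sketch of that registration idea, assuming nothing about the actual {{RMainThread}} interface: {{CancelRegistry}}, {{ScopedCancelCallback}} and {{FakeFuture}} are hypothetical names invented for illustration, and {{FakeFuture}} only stands in for something that exposes a cancel operation.

```cpp
#include <atomic>
#include <cassert>
#include <functional>
#include <mutex>
#include <utility>

// Hypothetical singleton holding at most one cancel callback. It only
// illustrates the idea; it is not the R package's RMainThread class.
class CancelRegistry {
 public:
  static CancelRegistry& GetInstance() {
    static CancelRegistry instance;
    return instance;
  }

  void SetCancelCallback(std::function<void()> callback) {
    std::lock_guard<std::mutex> lock(mutex_);
    cancel_callback_ = std::move(callback);
  }

  void ClearCancelCallback() { SetCancelCallback(nullptr); }

  // Invokes the registered callback, if any. Returns true if one was called.
  bool RequestCancel() {
    std::function<void()> callback;
    {
      // Copy under the lock, call outside it, so the callback itself may
      // touch the registry without deadlocking.
      std::lock_guard<std::mutex> lock(mutex_);
      callback = cancel_callback_;
    }
    if (!callback) return false;
    callback();
    return true;
  }

 private:
  CancelRegistry() = default;
  std::mutex mutex_;
  std::function<void()> cancel_callback_;
};

// RAII guard: registers the callback for the lifetime of one operation and
// clears it afterwards, so a stale callback is never invoked.
class ScopedCancelCallback {
 public:
  explicit ScopedCancelCallback(std::function<void()> callback) {
    assert(callback != nullptr);
    CancelRegistry::GetInstance().SetCancelCallback(std::move(callback));
  }
  ~ScopedCancelCallback() {
    CancelRegistry::GetInstance().ClearCancelCallback();
  }
  ScopedCancelCallback(const ScopedCancelCallback&) = delete;
  ScopedCancelCallback& operator=(const ScopedCancelCallback&) = delete;
};

// Hypothetical cancellable operation standing in for a Future.
struct FakeFuture {
  std::atomic<bool> cancelled{false};
  void Cancel() { cancelled.store(true); }
};
```

Whatever notices the interrupt on the main thread would call {{RequestCancel()}}; when no operation has registered a callback, the call does nothing and returns false, which matches the common case where there is no future to cancel.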



was (Author: paleolimbot):
From the R perspective, you can call {{SafeCallIntoRVoid([]() { cpp11::check_user_interrupt(); })}} anywhere in C++ and it will return a non-OK status if there's a pending interrupt. That will only work for tasks run with {{RunWithCapturedR()}}, although that includes most of the operations one would want to cancel (e.g., reading CSV, Feather, and query engine execution after ARROW-16444). It also only works for places where the R package can insert some kind of cancel callback.

If we have a Future that we can cancel, we could rig something similar, maybe using our own event loop (currently we use Arrow's RunInSerialExecutor and I don't know how customizable that is).

In addition to RMonitor, there's also the 'later' package ( https://github.com/r-lib/later ) which can also run event loops although I don't know how customizable they are.

In the R package we have the {{RMainThread}} singleton which could be an appropriate place to register a C++ cancel callback (e.g., a {{std::function}} that calls some Future's cancel callback). In most cases, though, we don't have a future we can cancel.


> [R][C++] Allow cancelling long-running commands
> -----------------------------------------------
>
>                 Key: ARROW-11841
>                 URL: https://issues.apache.org/jira/browse/ARROW-11841
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: C++, R
>            Reporter: Antoine Pitrou
>            Priority: Major
>             Fix For: 10.0.0
>
>
> When calling a long-running task (for example reading a CSV file) from the R prompt, users may want to interrupt with Ctrl-C.
> Allowing this will require integrating R's user interruption facility with the cancellation API that's going to be exposed in C++ (see ARROW-8732).
> Below is some information I've gathered on the topic:
> There is some hairy discussion of how to interrupt C++ code from R at https://stackoverflow.com/questions/40563522/r-how-to-write-interruptible-c-function-and-recover-partial-results and https://stat.ethz.ch/pipermail/r-devel/2011-April/060714.html .
> It seems it may involve polling cpp11::check_user_interrupt() and catching any cpp11::unwind_exception that may signal an interruption. A complication is that apparently R APIs should only be called from the main thread. There's also a small library which claims to make writing all this easier: https://github.com/tnagler/RcppThread/blob/master/inst/include/RcppThread/RMonitor.hpp
> But since user interruptions will only be noticed by the R main thread, the solution may be to launch heavy computations (e.g. CSV reading) in a separate thread and have the main R thread periodically poll for interrupts while waiting for the separate thread. This is what this dedicated thread class does in its join method: https://github.com/tnagler/RcppThread/blob/master/inst/include/RcppThread/Thread.hpp#L79
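
The "separate thread plus polling main thread" idea from the description can be sketched as below. This is an illustration under assumptions, not RcppThread's or Arrow's implementation: RunInterruptible, RunResult and Task are hypothetical names, and interrupt_pending stands in for whatever check the main R thread would perform.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <functional>
#include <future>
#include <thread>

// Hypothetical result type: the task's value plus whether a cancel was requested.
struct RunResult {
  bool interrupted;
  long value;
};

// The task receives a flag that it is expected to check periodically.
using Task = std::function<long(const std::atomic<bool>& cancel_requested)>;

// Runs `task` on a separate thread. The calling thread (the main thread in
// the scenario above) wakes up every `poll_interval`, asks `interrupt_pending`
// whether the user interrupted, and if so raises the cancel flag. It then
// keeps waiting until the task has actually returned.
RunResult RunInterruptible(const Task& task,
                           const std::function<bool()>& interrupt_pending,
                           std::chrono::milliseconds poll_interval) {
  assert(poll_interval.count() > 0);
  std::atomic<bool> cancel_requested{false};
  std::future<long> result = std::async(
      std::launch::async, [&] { return task(cancel_requested); });
  while (result.wait_for(poll_interval) != std::future_status::ready) {
    if (!cancel_requested.load() && interrupt_pending()) {
      cancel_requested.store(true);
    }
  }
  return RunResult{cancel_requested.load(), result.get()};
}
```

Only the calling thread ever runs interrupt_pending, which respects the constraint that R APIs are called from the main thread; the worker only reads an atomic flag.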



--
This message was sent by Atlassian Jira
(v8.20.10#820010)