Posted to jira@arrow.apache.org by "Neal Richardson (Jira)" <ji...@apache.org> on 2021/09/28 18:23:00 UTC

[jira] [Commented] (ARROW-8379) [R] Investigate/fix thread safety issues (esp. Windows)

    [ https://issues.apache.org/jira/browse/ARROW-8379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17421595#comment-17421595 ] 

Neal Richardson commented on ARROW-8379:
----------------------------------------

Surveying the open issues (now linked here), it looks like:

* On 32-bit Rtools 3.5, async multithreading simply does not work, so we should disable the dataset features entirely on this build. Given how configure.win works, it does not appear that we can conditionally disable ARROW_DATASET for this build alone, so we should do the check in the R code instead. We can make arrow_with_dataset() check the OS, R version, and architecture, which will make tests and examples skip appropriately. Then we can check the same condition inside Dataset$create() and raise an informative error, so that users on this setup don't try it.
* Multithreaded conversion from Arrow to R is prone to issues on Windows across the board, regardless of Rtools version or 32/64-bit architecture. We should set option.use_threads = FALSE on Windows in .onLoad.
    * This will have the side effect of also disabling multithreading in other parts of the C++ code where use_threads is an option. It is not clear that this is strictly required, but it will happen unless we separate the use_threads controls we expose.
* There may be more work to do on the CPU and IO threadpools, which Arrow C++ uses internally, but I think it might be best to release with these fixes and see whether we still get error reports.
    * Relatedly, an alternative to setting use_threads = FALSE globally would be to leave multithreading on but reduce the size of the CPU and IO threadpools. Some reports suggest that setting them to fewer threads than the number of CPUs allows things to work. It's not clear, though, whether this fixes the problem or just makes the deadlocks less frequent.
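The platform checks proposed in the first two bullets could be expressed as a small decision function. The sketch below is written in Python purely for illustration. The real checks would live in the package's R code: arrow_with_dataset(), Dataset$create(), and .onLoad. The Platform type, the function names, and the "windows"/"i386" string values are hypothetical stand-ins and are not part of the arrow package. The sketch assumes that Rtools 3.5 corresponds to R 3.x on Windows, since R 4.0 moved to Rtools40.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Platform:
    """Hypothetical description of the running R build (names are illustrative)."""
    os: str           # e.g. "windows", "linux", "darwin"
    r_version: tuple  # (major, minor, patch), e.g. (3, 6, 3)
    arch: str         # e.g. "i386" for 32-bit, "x86_64" for 64-bit


def uses_rtools35(p: Platform) -> bool:
    # Assumption: R 3.x on Windows is built with Rtools 3.5; R 4.0 moved to Rtools40.
    return p.os == "windows" and p.r_version < (4, 0, 0)


def dataset_supported(p: Platform) -> bool:
    # Bullet 1: async multithreading does not work on 32-bit Rtools 3.5,
    # so dataset features are reported as unavailable on that build only.
    return not (uses_rtools35(p) and p.arch == "i386")


def default_use_threads(p: Platform) -> bool:
    # Bullet 2: disable multithreaded Arrow-to-R conversion on every Windows
    # build, regardless of Rtools version or 32/64-bit architecture.
    return p.os != "windows"


def create_dataset(p: Platform, path: str) -> str:
    # Hypothetical stand-in for the proposed guard in Dataset$create():
    # fail early with an informative message instead of hanging or crashing.
    if not dataset_supported(p):
        raise RuntimeError(
            "Dataset features are not supported on 32-bit Windows builds "
            "using Rtools 3.5 (R < 4.0)"
        )
    return f"dataset at {path}"


if __name__ == "__main__":
    for p in (
        Platform("windows", (3, 6, 3), "i386"),
        Platform("windows", (4, 1, 1), "x86_64"),
        Platform("linux", (4, 1, 1), "x86_64"),
    ):
        print(p, dataset_supported(p), default_use_threads(p))
```

Keeping the Rtools 3.5 test separate from the architecture test makes it easy to widen or narrow the dataset gate later, if more builds turn out to be affected, without touching the use_threads default.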

> [R] Investigate/fix thread safety issues (esp. Windows)
> -------------------------------------------------------
>
>                 Key: ARROW-8379
>                 URL: https://issues.apache.org/jira/browse/ARROW-8379
>             Project: Apache Arrow
>          Issue Type: New Feature
>          Components: R
>            Reporter: Neal Richardson
>            Priority: Major
>
> There have been a number of issues where the R bindings' multithreading has been implicated in unstable behavior (ARROW-7844 for example). In ARROW-8375 I disabled {{use_threads}} in the Windows tests, and it appeared that the mysterious Windows segfaults stopped. We should fix whatever the underlying issues are.


