Posted to issues@arrow.apache.org by "Andrew C Thomas (Jira)" <ji...@apache.org> on 2022/04/29 20:48:00 UTC

[jira] [Created] (ARROW-16423) R arrow/dplyr: simple join and collect crashes session

Andrew C Thomas created ARROW-16423:
---------------------------------------

             Summary: R arrow/dplyr: simple join and collect crashes session
                 Key: ARROW-16423
                 URL: https://issues.apache.org/jira/browse/ARROW-16423
             Project: Apache Arrow
          Issue Type: Bug
          Components: R
    Affects Versions: 7.0.0
            Reporter: Andrew C Thomas


Using an inner join as a filter on a dataset opened with open_dataset() crashes the R session on collect(). The crash is intermittent: it does not reliably occur on the first attempt and sometimes takes a few tries.

Reprex follows.

------------------------------------------------------

library(arrow)
library(dplyr)
library(tidyr)

# Build a 10 x 10 x 10000 grid; grouping by A and B makes write_dataset() partition on them.
DataSet <- expand_grid(A = 1:10, B = 1:10, C = 1:10000) %>%
  group_by(A, B)
write_dataset(DataSet, "TestBreakData")

# The crash is intermittent, so repeat the join and collect.
for (DoThisUntilItBreaks in 1:100) {
  message(DoThisUntilItBreaks)
  D2 <- open_dataset("TestBreakData") %>%
    inner_join(data.frame(A = 1L, B = 1:5)) %>%
    collect()
}


