Posted to issues@arrow.apache.org by "Neal Richardson (Jira)" <ji...@apache.org> on 2019/11/13 19:08:00 UTC
[jira] [Commented] (ARROW-7156) [R] [C++] Large Batches Cause Error / Crashes
[ https://issues.apache.org/jira/browse/ARROW-7156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16973607#comment-16973607 ]
Neal Richardson commented on ARROW-7156:
----------------------------------------
Searching for the error message, I see https://github.com/apache/arrow/blob/a33bd3acae41f89972c71ad5bd559a3cecf3e197/cpp/src/arrow/memory_pool.cc#L285
which suggests an overflow somewhere.
Could you please clarify what "Crashes R Studio!" means?
https://issues.apache.org/jira/issues/?jql=project%20%3D%20ARROW%20AND%20fixVersion%20%3D%201.0.0%20AND%20component%20%3D%20R%20AND%20text%20~%20%222gb%22 finds 3 known issues about large string columns--could that be involved?
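To illustrate the kind of overflow that could produce a "negative malloc size", here is a minimal sketch. It assumes that somewhere a 64-bit byte count is narrowed to a signed 32-bit integer; this is only a guess at the mechanism and has not been confirmed as the cause in this issue. The helper names `narrow_unchecked` and `fits_in_int32` are hypothetical and are not part of Arrow.

```cpp
#include <cstdint>
#include <limits>

// Hypothetical helper (not an Arrow API): narrows a 64-bit byte count to a
// signed 32-bit value the way an unchecked cast would. On two's-complement
// platforms a value above 2^31 - 1 wraps around and comes out negative
// (this wrap is guaranteed from C++20; earlier standards leave it
// implementation-defined, though mainstream compilers behave the same way).
inline std::int32_t narrow_unchecked(std::int64_t nbytes) {
  return static_cast<std::int32_t>(nbytes);
}

// Hypothetical guard: true only when the byte count is non-negative and
// fits in a signed 32-bit integer.
inline bool fits_in_int32(std::int64_t nbytes) {
  return nbytes >= 0 &&
         nbytes <= std::numeric_limits<std::int32_t>::max();
}
```

Under that assumption, a buffer of exactly 2 GiB (2147483648 bytes) passed through `narrow_unchecked` yields a negative size, while `fits_in_int32` rejects it before any allocation is attempted. That would be consistent with only the largest batch, and the file over 2 gigs, failing.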
> [R] [C++] Large Batches Cause Error / Crashes
> ---------------------------------------------
>
> Key: ARROW-7156
> URL: https://issues.apache.org/jira/browse/ARROW-7156
> Project: Apache Arrow
> Issue Type: Bug
> Components: C++, R
> Affects Versions: 0.14.1, 0.15.1
> Reporter: Anthony Abate
> Priority: Major
>
> I have a 30 GB Arrow file with 100 batches. The largest batch in the file causes get_batch to fail; all other batches load fine. In 14.1.1, reading that batch raises an error. In 15.1.1, the batch loads but crashes RStudio when it is used.
> *14.1.1*
> {code:java}
> > rbn <- data_rbfr$get_batch(x)
> Error in ipc__RecordBatchFileReader_ReadRecordBatch(self, i) :
> Invalid: negative malloc size
> {code}
> *15.1.1*
> {code:java}
> rbn <- data_rbfr$get_batch(x)  # works!
> df <- as.data.frame(rbn)  # Crashes R Studio!
> {code}
>
> Update
> I put the data in the batch into a separate file. The file size is over 2 gigs.
> Using 15.1.1, when I try to load this entire file via read_arrow it also fails.
> {code:java}
> ar <- arrow::read_arrow("e:\\temp\\file.arrow")
> Error in Table__from_RecordBatchFileReader(batch_reader) :
> Invalid: negative malloc size{code}
--
This message was sent by Atlassian Jira
(v8.3.4#803005)