Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2020/11/21 10:40:38 UTC

[GitHub] [arrow] jangorecki opened a new issue #8732: arrow::write_feather error: Capacity error: array cannot contain more than 2147483646 bytes

jangorecki opened a new issue #8732:
URL: https://github.com/apache/arrow/issues/8732


   I tried to log in to my existing Jira account, but I keep getting the error
   ```
   Sorry, your userid is required to answer a CAPTCHA question correctly.
   ```
   so I am reporting the issue here.
   
   I cannot write an Arrow file because of an array size limit. Reproducible example:
   ```r
   library(arrow)
   d = data.frame(id1 = factor(paste0("id",1:4e8)))
   write_feather(d, "data.feather")
   #Error in Table__from_dots(dots, schema) : 
   #  Capacity error: array cannot contain more than 2147483646 bytes, have 2147483649
   ```
   R 4.0.3
   arrow 2.0.0
   
   Ubuntu 18.04
   kernel 5.4.0
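   As a rough size check: the 4e8 distinct strings from `"id1"` to `"id400000000"` become the factor's levels. Their characters alone add up to about 4.29 GB, roughly twice the 2147483646-byte limit quoted in the error. The base-R sketch below computes that total arithmetically, one digit length at a time, so the strings never have to be materialized. The helper name `level_bytes` is hypothetical. The count assumes each level's bytes are stored once, back to back, and ignores offsets and other overhead. Under that assumption, the `have 2147483649` in the message would mark the point where the conversion overflowed rather than the full size.

   ```r
   # Hypothetical helper: total bytes of paste0(prefix, 1:n), counting only the
   # characters themselves (no offsets or other per-string overhead).
   level_bytes <- function(n, prefix = "id") {
     total <- nchar(prefix) * n        # every value carries the prefix
     lo <- 1
     digits <- 1
     while (lo <= n) {
       hi <- min(n, 10^digits - 1)     # largest integer with `digits` digits
       total <- total + digits * (hi - lo + 1)
       lo <- 10^digits
       digits <- digits + 1
     }
     total
   }

   limit <- 2147483646                 # limit quoted in the error message
   needed <- level_bytes(4e8)
   needed                              # 4288888898
   needed > limit                      # TRUE
   ```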


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] nealrichardson commented on issue #8732: arrow::write_feather error: Capacity error: array cannot contain more than 2147483646 bytes

nealrichardson commented on issue #8732:
URL: https://github.com/apache/arrow/issues/8732#issuecomment-732298256


   You could also generate the data in Python (`pyarrow`), which I believe would handle the chunking correctly.





[GitHub] [arrow] nealrichardson commented on issue #8732: arrow::write_feather error: Capacity error: array cannot contain more than 2147483646 bytes

nealrichardson commented on issue #8732:
URL: https://github.com/apache/arrow/issues/8732#issuecomment-732297194


   Thanks for the report. That's odd about your ASF Jira account; maybe a password reset would fix it?
   
   The issue is that the R package doesn't chunk the data.frame when converting it to Arrow. In other words, this is not a limitation of the Arrow/Feather format, only of the R package as it currently stands. We're working on it (https://issues.apache.org/jira/browse/ARROW-9293 among others) and hope to have some improvements in the next release.
   
   In the meantime, you may be able to work around this by doing the chunking yourself, something like:
   
   ```r
   library(arrow)
   write_feather(Table$create(d[1:2e8, , drop = FALSE], d[2e8 + 1:2e8, , drop = FALSE]), "data.feather")
   ```
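   The same split can be generalized to any number of chunks. The base-R sketch below uses two hypothetical helpers, `chunk_bounds()` and `split_rows()`, neither of which is part of arrow, to compute contiguous, nearly equal row ranges. For `n = 4e8` and `k = 2` the ranges are exactly `1:2e8` and `2e8 + 1:2e8` from the command above. Handing the pieces to `Table$create()` simply follows the pattern of that command and is left untested here. Row subsetting also keeps every level of a factor column, so each chunk still carries all of the original levels.

   ```r
   # Hypothetical helper: start/end rows of k contiguous, nearly equal chunks of 1..n.
   chunk_bounds <- function(n, k) {
     breaks <- floor(seq(0, n, length.out = k + 1))
     data.frame(start = breaks[-(k + 1)] + 1, end = breaks[-1])
   }

   # Hypothetical helper: slice a data.frame into k row chunks.
   split_rows <- function(d, k) {
     b <- chunk_bounds(nrow(d), k)
     lapply(seq_len(nrow(b)), function(i) d[b$start[i]:b$end[i], , drop = FALSE])
   }

   chunk_bounds(4e8, 2)
   # Following the command above (untested assumption):
   # library(arrow)
   # write_feather(do.call(Table$create, split_rows(d, 2)), "data.feather")
   ```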





[GitHub] [arrow] nealrichardson closed issue #8732: arrow::write_feather error: Capacity error: array cannot contain more than 2147483646 bytes

nealrichardson closed issue #8732:
URL: https://github.com/apache/arrow/issues/8732


   

