Posted to jira@arrow.apache.org by "Todd Farmer (Jira)" <ji...@apache.org> on 2022/07/12 14:05:03 UTC

[jira] [Assigned] (ARROW-9293) [R] Add chunk_size to Table$create()

     [ https://issues.apache.org/jira/browse/ARROW-9293?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Todd Farmer reassigned ARROW-9293:
----------------------------------

    Assignee:     (was: Romain Francois)

This issue was last updated over 90 days ago, which may indicate that it is no longer being actively worked on. To better reflect the current state, the issue is being unassigned. Please feel free to take assignment of the issue again if it is being actively worked on, or if you plan to start that work soon.

> [R] Add chunk_size to Table$create()
> ------------------------------------
>
>                 Key: ARROW-9293
>                 URL: https://issues.apache.org/jira/browse/ARROW-9293
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: R
>            Reporter: Neal Richardson
>            Priority: Major
>
> While working on ARROW-3308, I noticed that write_feather has a chunk_size argument, which by default writes batches of 64k rows into the file. In principle, a chunking strategy like this would remove the need to bump up to large_utf8 when ingesting a large character vector, because you'd end up with many chunks that each fit into a regular utf8 type. However, as the function currently works, the data.frame is first converted to a Table whose ChunkedArrays each contain a single chunk, and that is where the large_utf8 type gets set. If Table$create() could be instructed to make multiple chunks, this would be resolved.
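>
> A minimal sketch of the proposed usage (note: the chunk_size argument to
> Table$create() is the feature being requested here, not an existing
> parameter; 65536 mirrors write_feather's default batch size):
>
>     library(arrow)
>
>     df <- data.frame(x = rep("a", 2e5), stringsAsFactors = FALSE)
>
>     # Current behavior: each column becomes a ChunkedArray with a single
>     # chunk, so a character vector whose total data exceeds the 2 GB
>     # int32 offset limit would be promoted to large_utf8.
>     tbl <- Table$create(df)
>
>     # Proposed behavior: split each column into chunks of at most 64k
>     # rows, so each chunk can stay a regular utf8 array.
>     tbl <- Table$create(df, chunk_size = 65536L)
>     tbl$column(0)$num_chunks  # would be ceiling(nrow(df) / 65536) = 4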



--
This message was sent by Atlassian Jira
(v8.20.10#820010)