Posted to issues@arrow.apache.org by "Wes McKinney (Jira)" <ji...@apache.org> on 2020/05/25 19:54:00 UTC

[jira] [Commented] (ARROW-7012) [C++] Clarify ChunkedArray chunking strategy and policy

    [ https://issues.apache.org/jira/browse/ARROW-7012?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17116220#comment-17116220 ] 

Wes McKinney commented on ARROW-7012:
-------------------------------------

In general, this is not something that users should need to be concerned with. The new kernels framework provides a configurability knob ({{ExecContext::exec_chunksize}}) for selecting the upper limit on the size of the chunks that are processed.

> [C++] Clarify ChunkedArray chunking strategy and policy
> -------------------------------------------------------
>
>                 Key: ARROW-7012
>                 URL: https://issues.apache.org/jira/browse/ARROW-7012
>             Project: Apache Arrow
>          Issue Type: New Feature
>          Components: C++
>            Reporter: Neal Richardson
>            Priority: Major
>             Fix For: 1.0.0
>
>
> See discussion on ARROW-6784 and [https://github.com/apache/arrow/pull/5686]. Among the questions:
>  * Do Arrow users control the chunking, or is it an internal implementation detail they should not manage?
>  * If users control it, how do they control it? E.g. if I call Take and use a ChunkedArray for the indices to take, does the chunking follow how the indices are chunked? Or should we attempt to preserve the mapping of data to their chunks in the input table/chunked array?
>  * If it's an implementation detail, what is the optimal chunk size? And when is it worth reshaping (concatenating, slicing) input data to attain this optimal size? 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)