Posted to issues@arrow.apache.org by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2018/01/23 23:29:00 UTC
[jira] [Updated] (ARROW-2019) Control the memory allocated for inner vector in LIST
[ https://issues.apache.org/jira/browse/ARROW-2019?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
ASF GitHub Bot updated ARROW-2019:
----------------------------------
Labels: pull-request-available (was: )
> Control the memory allocated for inner vector in LIST
> -----------------------------------------------------
>
> Key: ARROW-2019
> URL: https://issues.apache.org/jira/browse/ARROW-2019
> Project: Apache Arrow
> Issue Type: Improvement
> Reporter: Siddharth Teotia
> Assignee: Siddharth Teotia
> Priority: Critical
> Labels: pull-request-available
>
> In our external sort code, we have observed cases where the memory actually allocated for a record batch is more than necessary and also more than what the operator had reserved for special purposes. As a result, queries fail with OOM.
> The usual way to control the memory allocated by vector.allocateNew() is to call setInitialCapacity() first; the latter modifies the vector state variables, which are then used to allocate memory. However, due to the multiplier of 5 used in ListVector, we end up asking for more memory than necessary. For example, for a value count of 4095, we asked for 128KB of memory for the offset buffer of a VarCharVector, for a field that was a list of varchars.
> The computation was ((4095 * 5) + 1) * 4 bytes = 81904 bytes (about 80KB), which was then rounded up to 128KB (power-of-2 allocation).
> We had earlier made changes to setInitialCapacity() of ListVector when we were facing problems with deeply nested lists and decided to use the multiplier only for the leaf scalar vector.
> It looks like there is a need for a specialized setInitialCapacity() for ListVector where the caller dictates the repeatedness.
> There is also a separate bug in setInitialCapacity(): the allocation of the validity buffer doesn't obey the capacity specified in setInitialCapacity().
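
The arithmetic in the description can be checked with a small standalone sketch. It does not use the Arrow API: the class and method names below are hypothetical, and the 4-byte offset width and power-of-2 rounding are taken from the numbers quoted in the issue. The multiplier is a parameter to illustrate how a caller-supplied repeatedness would change the request.

```java
// Hypothetical helper, not part of Arrow: reproduces the sizing arithmetic from the issue.
final class OffsetBufferSizing {
    // Assumption: each offset entry is 4 bytes, matching the "* 4" in the issue.
    static final int OFFSET_WIDTH_BYTES = 4;

    // Bytes requested for an offset buffer: (valueCount * multiplier + 1) offsets.
    static long requestedBytes(int valueCount, int multiplier) {
        return ((long) valueCount * multiplier + 1) * OFFSET_WIDTH_BYTES;
    }

    // Rounds n up to the next power of two (returns n if it already is one).
    static long roundUpToPowerOfTwo(long n) {
        if (n <= 1) {
            return 1;
        }
        long highest = Long.highestOneBit(n);
        return highest == n ? n : highest << 1;
    }

    public static void main(String[] args) {
        int valueCount = 4095;

        // Multiplier of 5, as in the issue: 81904 bytes requested, 131072 (128KB) allocated.
        long withMultiplier = requestedBytes(valueCount, 5);
        System.out.println(withMultiplier + " -> " + roundUpToPowerOfTwo(withMultiplier));

        // Multiplier of 1 (caller states one inner value per list): 16384 bytes, already a power of two.
        long withoutMultiplier = requestedBytes(valueCount, 1);
        System.out.println(withoutMultiplier + " -> " + roundUpToPowerOfTwo(withoutMultiplier));
    }
}
```

With a repeatedness of 1 the same value count would need 16KB instead of 128KB, which is the kind of saving a caller-controlled setInitialCapacity() is meant to allow.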
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)