Posted to issues@arrow.apache.org by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2018/01/23 23:29:00 UTC

[jira] [Updated] (ARROW-2019) Control the memory allocated for inner vector in LIST

     [ https://issues.apache.org/jira/browse/ARROW-2019?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated ARROW-2019:
----------------------------------
    Labels: pull-request-available  (was: )

> Control the memory allocated for inner vector in LIST
> -----------------------------------------------------
>
>                 Key: ARROW-2019
>                 URL: https://issues.apache.org/jira/browse/ARROW-2019
>             Project: Apache Arrow
>          Issue Type: Improvement
>            Reporter: Siddharth Teotia
>            Assignee: Siddharth Teotia
>            Priority: Critical
>              Labels: pull-request-available
>
> We have observed cases in our external sort code where the amount of memory actually allocated for a record batch turns out to be more than necessary, and also more than what the operator had reserved for special purposes. As a result, queries fail with OOM.
> The usual way to control the memory allocated by vector.allocateNew() is to call setInitialCapacity() first; the latter modifies the vector state variables, which are then used to allocate memory. However, due to the multiplier of 5 used in ListVector, we end up asking for more memory than necessary. For example, for a value count of 4095, we asked for 128KB of memory for the offset buffer of a VarCharVector, for a field that was a list of varchars.
> The calculation was ((4095 * 5) + 1) * 4 bytes => ~80KB, which becomes 128KB once rounded up to a power-of-2 allocation.
> We had earlier made changes to setInitialCapacity() of ListVector when we were facing problems with deeply nested lists and decided to use the multiplier only for the leaf scalar vector. 
> It looks like there is a need for a specialized setInitialCapacity() for ListVector where the caller dictates the repeatedness.
> Also, there is another bug in setInitialCapacity(): the allocation of the validity buffer doesn't obey the capacity specified in the call.
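
The sizing arithmetic in the description can be reproduced with a small standalone sketch. It does not use the Arrow API. The class name, the method names, and the repeatedness value of 2.0 are assumptions made for illustration; only the value count of 4095, the multiplier of 5, the 4-byte offset width, and the power-of-2 rounding come from the description above.

```java
// Standalone sketch of the offset-buffer sizing described in ARROW-2019.
// All names here are hypothetical; this is not Arrow code.
public final class ListOffsetSizing {
    /** Bytes per offset entry (the "* 4" in the description). */
    public static final int OFFSET_WIDTH = 4;

    /** Inner value count for a list vector, given elements per list. */
    public static long innerValueCount(int valueCount, double repeatedness) {
        return (long) Math.ceil(valueCount * repeatedness);
    }

    /** Offset buffer size: one offset per inner value, plus one trailing offset. */
    public static long offsetBufferBytes(long innerValueCount) {
        return (innerValueCount + 1) * OFFSET_WIDTH;
    }

    /** Rounds a positive size up to the next power of two. */
    public static long roundUpToPowerOfTwo(long bytes) {
        long highest = Long.highestOneBit(bytes);
        return highest == bytes ? bytes : highest << 1;
    }

    public static void main(String[] args) {
        int valueCount = 4095;

        // Fixed multiplier of 5, as in the description.
        long fixed = offsetBufferBytes(innerValueCount(valueCount, 5.0));
        System.out.println(fixed + " -> " + roundUpToPowerOfTwo(fixed));
        // prints "81904 -> 131072" (about 80KB requested, 128KB allocated)

        // Caller-supplied repeatedness; 2.0 is an assumed example value.
        long tuned = offsetBufferBytes(innerValueCount(valueCount, 2.0));
        System.out.println(tuned + " -> " + roundUpToPowerOfTwo(tuned));
        // prints "32764 -> 32768" (32KB allocated)
    }
}
```

With the same value count, letting the caller state the expected repeatedness lets the allocation track the actual data rather than a fixed worst-case guess, which is the motivation for the specialized setInitialCapacity() proposed above.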



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)