Posted to issues@drill.apache.org by "Jason Altekruse (JIRA)" <ji...@apache.org> on 2015/05/08 20:23:03 UTC

[jira] [Commented] (DRILL-2996) ValueVectors shouldn't call reAlloc() in a while() loop

    [ https://issues.apache.org/jira/browse/DRILL-2996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14535167#comment-14535167 ] 

Jason Altekruse commented on DRILL-2996:
----------------------------------------

Along with this functional refactoring, we need to come up with a hard specification of how much repetition we support. It is the combination of large variable-length values and large lists that brings out these scenarios with excessive allocation.

We have a hard limit that a batch of records can only have 65K elements (so that we know we can use a two-byte unsigned int to index into them). Currently we impose no specific limit on the number of child values a list can have, but lists will still hit limits because their inner values are themselves stored in vectors. The inner vector can therefore hit this 65K limit: if we have 65K lists, it takes only a single-element list at every index plus a few extra elements in just one of the lists. Most cases are handled in regular execution by the field reader/writer abstractions, as well as by our default behavior of filling vectors with only ~4000 values. At the ValueVector level, however, we do not enforce a hard limit on these inner values, and I think that is part of the problem.
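
To make the scenario above concrete, here is a minimal Java sketch of the arithmetic. It is not Drill code: the class and method names are hypothetical, and the exact limit of 2^16 = 65,536 is an assumption derived from the two-byte unsigned index mentioned above.

```java
public final class BatchLimitSketch {
    // Assumption: a two-byte unsigned index can address 2^16 = 65,536 values.
    static final int MAX_BATCH_VALUES = 1 << 16;

    // Hypothetical check: do the lists, and the inner vector that holds all of
    // their child values, both stay within the per-batch limit?
    static boolean innerValuesFit(int[] listLengths) {
        long totalChildren = 0; // long, so the running sum cannot overflow
        for (int length : listLengths) {
            if (length < 0) {
                throw new IllegalArgumentException("negative list length: " + length);
            }
            totalChildren += length;
        }
        return listLengths.length <= MAX_BATCH_VALUES
                && totalChildren <= MAX_BATCH_VALUES;
    }
}
```

With 65,536 single-element lists the inner vector is exactly full, so adding a few extra elements to just one list pushes the child count past the limit even though the number of lists is still legal.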

> ValueVectors shouldn't call reAlloc() in a while() loop
> -------------------------------------------------------
>
>                 Key: DRILL-2996
>                 URL: https://issues.apache.org/jira/browse/DRILL-2996
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Execution - Data Types
>            Reporter: Chris Westin
>            Assignee: Hanifi Gunes
>
> Instead, reAlloc() should be changed to take a "new minimum size" as an argument. This value is just the value used to determine the while loop's termination. Then reAlloc() can figure out how much more to allocate once and for all, instead of possibly reallocating and copying more than once, and it can make sure that the size doesn't overflow (we've seen some instances of the allocator being called with negative sizes).
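
The proposal in the description could be sketched roughly as follows. This is a hedged illustration, not the actual ValueVector code: it uses a plain byte[] in place of Drill's buffers, the names ReAllocSketch, newCapacity and MAX_CAPACITY are hypothetical, and rounding up to the next power of two is an assumption standing in for whatever growth policy the existing loop uses.

```java
import java.util.Arrays;

public final class ReAllocSketch {
    // Hypothetical upper bound; the real allocator limit is an assumption here.
    static final int MAX_CAPACITY = 1 << 30;

    // Compute the target capacity once, instead of growing in a loop.
    static int newCapacity(int currentCapacity, int minCapacity) {
        if (currentCapacity < 0 || minCapacity < 0) {
            // Reject sizes that have already overflowed into negative values.
            throw new IllegalArgumentException("negative capacity requested");
        }
        if (minCapacity > MAX_CAPACITY) {
            throw new IllegalArgumentException("requested size exceeds " + MAX_CAPACITY);
        }
        if (minCapacity <= currentCapacity) {
            return currentCapacity; // already large enough, nothing to do
        }
        // Smallest power of two >= minCapacity, computed in long to avoid overflow.
        long target = Long.highestOneBit(minCapacity);
        if (target < minCapacity) {
            target <<= 1;
        }
        return (int) Math.min(target, MAX_CAPACITY);
    }

    // Reallocate and copy at most once.
    static byte[] reAlloc(byte[] buffer, int minCapacity) {
        int capacity = newCapacity(buffer.length, minCapacity);
        return capacity == buffer.length ? buffer : Arrays.copyOf(buffer, capacity);
    }
}
```

A caller that previously wrote a loop of the form "while the capacity is too small, reAlloc()" would instead pass the size it needs and get back a buffer that was copied at most once.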



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)