Posted to issues@drill.apache.org by "Paul Rogers (JIRA)" <ji...@apache.org> on 2017/07/08 01:08:00 UTC

[jira] [Updated] (DRILL-5446) Offset Vector in VariableLengthVectors may waste up to 256KB per value vector

     [ https://issues.apache.org/jira/browse/DRILL-5446?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Paul Rogers updated DRILL-5446:
-------------------------------
    Fix Version/s:     (was: 1.11.0)

> Offset Vector in VariableLengthVectors may waste up to 256KB per value vector
> -----------------------------------------------------------------------------
>
>                 Key: DRILL-5446
>                 URL: https://issues.apache.org/jira/browse/DRILL-5446
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Execution - Relational Operators
>    Affects Versions: 1.10.0
>            Reporter: Boaz Ben-Zvi
>            Assignee: Boaz Ben-Zvi
>
> In exec/vector/src/main/codegen/templates/VariableLengthVectors.java, the implementation uses an "offset vector" to record the BEGINNING of each variable-length element. To find the length (i.e., the END) of an element, the code must read the offset of the FOLLOWING element.
>   This requires the "offset vector" to have ONE MORE entry than the total number of elements, in order to find the END of the LAST element.
>   Some places in the code (e.g., the hash table) use the maximum number of elements, 64K (= 65536). Each entry in the "offset vector" is a 4-byte UInt4, so the vector appears to need 256KB.
>   However, because of that "ONE MORE" entry, the code in this case allocates for 65537 entries and, after rounding up to the next power of 2, allocates 512KB, of which nearly half is unused.
>   (This waste applies to each varchar value vector in each batch; e.g., in the QA test Functional/aggregates/tpcds_variants/text/aggregate25.q, which has 10 key columns, each hash-table batch wastes about 2.5MB.)
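The arithmetic in the description can be checked with a short sketch. This is not Drill code: the class and method names are hypothetical, and the round-up-to-a-power-of-2 behavior is taken from the description rather than from the allocator source.

```java
// Hypothetical sketch of the sizing arithmetic described in DRILL-5446.
// Names here are illustrative only and do not exist in Drill.
class OffsetVectorSizing {
    // Each offset entry is a 4-byte UInt4, per the issue description.
    static final int OFFSET_WIDTH = 4;

    // Round up to the next power of two (assumed allocator behavior).
    static long nextPowerOfTwo(long n) {
        long highest = Long.highestOneBit(n);
        return highest == n ? n : highest << 1;
    }

    // Bytes actually needed: one offset per value, plus ONE MORE for the last END.
    static long neededBytes(int valueCount) {
        return (long) (valueCount + 1) * OFFSET_WIDTH;
    }

    // Bytes allocated after rounding up.
    static long allocatedBytes(int valueCount) {
        return nextPowerOfTwo(neededBytes(valueCount));
    }

    // Bytes allocated but never used.
    static long wastedBytes(int valueCount) {
        return allocatedBytes(valueCount) - neededBytes(valueCount);
    }

    public static void main(String[] args) {
        int valueCount = 65536; // 64K values, as used by the hash table
        System.out.println("needed    = " + neededBytes(valueCount));
        System.out.println("allocated = " + allocatedBytes(valueCount));
        System.out.println("wasted    = " + wastedBytes(valueCount));
        System.out.println("wasted over 10 key columns = " + 10 * wastedBytes(valueCount));
    }
}
```

For 65536 values the needed size is 4 bytes over 256KB, which is what pushes the allocation to 512KB; with 65535 values the same calculation rounds to exactly 256KB with no waste.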
> Possible fix: change the logic in VariableLengthVectors.java to store the END point of each variable-length element instead; the first element always begins at ZERO, so its starting offset need not be stored.
>  
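A minimal sketch of the proposed END-offset layout follows, to show that N values then need exactly N offset entries. It uses plain Java arrays rather than Drill's buffers, and every name in it is hypothetical; it illustrates the idea only and is not the change that would be made to VariableLengthVectors.java.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical illustration of storing END offsets instead of BEGIN offsets.
// Not Drill's API; plain arrays stand in for the data and offset buffers.
class EndOffsetSketch {
    private final int[] ends;   // ends[i] = position just past value i in data
    private final byte[] data;  // all values concatenated

    EndOffsetSketch(String... values) {
        byte[][] encoded = new byte[values.length][];
        int total = 0;
        for (int i = 0; i < values.length; i++) {
            encoded[i] = values[i].getBytes(StandardCharsets.UTF_8);
            total += encoded[i].length;
        }
        data = new byte[total];
        ends = new int[values.length]; // exactly one entry per value, no extra slot
        int pos = 0;
        for (int i = 0; i < values.length; i++) {
            System.arraycopy(encoded[i], 0, data, pos, encoded[i].length);
            pos += encoded[i].length;
            ends[i] = pos;
        }
    }

    // The first value always begins at zero; every other value begins
    // where the previous one ended.
    int start(int i) {
        return i == 0 ? 0 : ends[i - 1];
    }

    int length(int i) {
        return ends[i] - start(i);
    }

    String get(int i) {
        return new String(data, start(i), length(i), StandardCharsets.UTF_8);
    }

    int offsetEntries() {
        return ends.length;
    }

    public static void main(String[] args) {
        EndOffsetSketch v = new EndOffsetSketch("drill", "", "vector");
        for (int i = 0; i < v.offsetEntries(); i++) {
            System.out.println(i + ": '" + v.get(i) + "' length=" + v.length(i));
        }
    }
}
```

With this layout, 65536 values need 65536 four-byte entries, which is exactly 256KB and already a power of 2. The cost is a branch (or equivalent special case) when reading the start of the first value.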



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)