Posted to issues@drill.apache.org by "Paul Rogers (JIRA)" <ji...@apache.org> on 2016/12/04 00:31:58 UTC

[jira] [Updated] (DRILL-5100) External Sort does not manage memory requirements of a schema change

     [ https://issues.apache.org/jira/browse/DRILL-5100?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Paul Rogers updated DRILL-5100:
-------------------------------
    Issue Type: Sub-task  (was: Bug)
        Parent: DRILL-5080

> External Sort does not manage memory requirements of a schema change
> --------------------------------------------------------------------
>
>                 Key: DRILL-5100
>                 URL: https://issues.apache.org/jira/browse/DRILL-5100
>             Project: Apache Drill
>          Issue Type: Sub-task
>    Affects Versions: 1.8.0
>            Reporter: Paul Rogers
>            Assignee: Paul Rogers
>
> The external sort is given a fixed amount of memory to hold buffered in-memory batches prior to spilling. External sort also handles certain schema changes when union vectors are enabled. When a schema change occurs, existing vectors are coerced into the new schema format, perhaps replacing an existing vector with a new union vector.
> This conversion requires (direct) memory. If it happens when the external sort has already nearly filled its in-memory buffer, the conversion can exceed the memory limit and cause the sort to fail.
> The following shows the allocated memory before and after schema changes in the unit test {{TestExternalSort.testNumericTypes}}:
> {code}
> Before: 134144
> After: 150528
> Before: 150528
> After: 166912
> {code}
> Union vectors appear to be larger than the original BIGINT vectors: allocated memory grew by 16384 at each schema change shown above. External sort must anticipate this growth and, if necessary, spill to ensure sufficient room exists for the new, larger vectors.
> Further, the conversion process itself requires that two copies of each vector be in memory: the original and the new, converted one. The external sort does not check to ensure this much working memory is available, leading to potential OOM errors during each vector conversion.
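
The check described above can be sketched roughly as follows. This is only an illustration of the arithmetic, not Drill's external sort code: the class SchemaChangeHeadroom, its method names, and the idea of a single numeric memory limit are all assumptions made for the example. It assumes the original vector stays allocated until the converted copy is complete, so the extra memory needed during a conversion is the full size of the converted vector.

```java
// Hypothetical helper for illustration only; not part of Apache Drill.
final class SchemaChangeHeadroom {
    private SchemaChangeHeadroom() {}

    // Net growth in allocated memory observed across one schema change.
    public static long growth(long allocatedBefore, long allocatedAfter) {
        return allocatedAfter - allocatedBefore;
    }

    // True if the sort should spill before converting a vector.
    // While converting, the original and the converted copy coexist,
    // so the converted vector's full size must fit under the limit
    // on top of what is already allocated (assumed model).
    public static boolean mustSpillBeforeConversion(
            long allocated, long memoryLimit, long convertedVectorSize) {
        return allocated + convertedVectorSize > memoryLimit;
    }
}
```

With the figures reported above, growth(134144, 150528) and growth(150528, 166912) both give 16384. Under a hypothetical limit of 160000, a conversion needing 16384 would fit when 134144 is allocated but would require a spill first when 150528 is allocated.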



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)