Posted to issues@drill.apache.org by "Paul Rogers (JIRA)" <ji...@apache.org> on 2017/08/31 22:26:00 UTC

[jira] [Comment Edited] (DRILL-5758) Rollup of external sort fixes to issues found by QA

    [ https://issues.apache.org/jira/browse/DRILL-5758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16149708#comment-16149708 ] 

Paul Rogers edited comment on DRILL-5758 at 8/31/17 10:25 PM:
--------------------------------------------------------------

The external sort memory manager works by anticipating the allocation size of each batch: input, spill, merge, and so on. It does this using the "record batch sizer", which determines data sizes by observing the input vectors. The sizer then generates a set of allocation "hints" used to allocate properly sized vectors for the various batches. If the memory calculations are wrong, a batch might become larger than expected, causing OOM errors. One way to check whether batch sizes are underestimated is to see whether the sort code ends up having to double the size of the vectors in a batch. This does, in fact, occur:

{code}
Spilling 42 batches, into spill batches of 11397 rows, to /tmp/drill/spill/...
Initial output batch allocation: 2392064 bytes
vector.BigIntVector - Reallocating vector [$data$(BIGINT:REQUIRED)]. # of bytes: [91176] -> [182352]
vector.UInt4Vector - Reallocating vector [$offsets$(UINT4:REQUIRED)]. # of bytes: [45592] -> [91184]
vector.UInt4Vector - Reallocating vector [$offsets$(UINT4:REQUIRED)]. # of bytes: [45592] -> [91184]
vector.UInt1Vector - Reallocating vector [$bits$(UINT1:REQUIRED)]. # of bytes: [11397] -> [22794]
vector.UInt1Vector - Reallocating vector [$bits$(UINT1:REQUIRED)]. # of bytes: [11397] -> [22794]
vector.UInt1Vector - Reallocating vector [$bits$(UINT1:REQUIRED)]. # of bytes: [11397] -> [22794]
vector.BigIntVector - Reallocating vector [c(BIGINT:OPTIONAL)]. # of bytes: [91176] -> [182352]
vector.UInt1Vector - Reallocating vector [$bits$(UINT1:REQUIRED)]. # of bytes: [11397] -> [22794]
vector.Float8Vector - Reallocating vector [d(FLOAT8:OPTIONAL)]. # of bytes: [91176] -> [182352]
vector.BigIntVector - Reallocating vector [$data$(BIGINT:REQUIRED)]. # of bytes: [182352] -> [364704]
Took 130655 us to merge 11397 records, consuming 3244032 bytes of memory
{code}

The above tells us that the estimates are off by no more than 50% (otherwise the vectors would have doubled more than once). Still, the estimates are off and must be corrected. Since even the offset vectors are reallocated, the record count given to the allocation code must differ from the number of records actually written to the batch.
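
As a rough aid for reading logs like the one above, the following Python sketch parses the "Reallocating vector" lines and reports the growth factor and the number of values each vector held before it grew. The log line format is taken from the excerpt; the function name, the `WIDTHS` table, and the assumption that each type has the fixed byte width shown are illustrative only and are not part of Drill.

```python
import re

# Distinct log lines copied from the excerpt above.
LOG = """\
vector.BigIntVector - Reallocating vector [$data$(BIGINT:REQUIRED)]. # of bytes: [91176] -> [182352]
vector.UInt4Vector - Reallocating vector [$offsets$(UINT4:REQUIRED)]. # of bytes: [45592] -> [91184]
vector.UInt1Vector - Reallocating vector [$bits$(UINT1:REQUIRED)]. # of bytes: [11397] -> [22794]
vector.BigIntVector - Reallocating vector [c(BIGINT:OPTIONAL)]. # of bytes: [91176] -> [182352]
vector.Float8Vector - Reallocating vector [d(FLOAT8:OPTIONAL)]. # of bytes: [91176] -> [182352]
vector.BigIntVector - Reallocating vector [$data$(BIGINT:REQUIRED)]. # of bytes: [182352] -> [364704]
"""

# Assumed bytes per value for the types that appear in the excerpt.
WIDTHS = {"BIGINT": 8, "FLOAT8": 8, "UINT4": 4, "UINT1": 1}

PATTERN = re.compile(
    r"Reallocating vector \[(?P<name>[^(\]]+)\((?P<type>\w+):(?P<mode>\w+)\)\]\. "
    r"# of bytes: \[(?P<old>\d+)\] -> \[(?P<new>\d+)\]"
)


def parse_reallocations(text):
    """Return one dict per 'Reallocating vector' line found in text."""
    events = []
    for match in PATTERN.finditer(text):
        old, new = int(match["old"]), int(match["new"])
        width = WIDTHS[match["type"]]
        events.append({
            "name": match["name"],
            "type": match["type"],
            "old_bytes": old,
            "new_bytes": new,
            "growth": new / old,          # 2.0 means the vector doubled
            "old_values": old // width,   # values that fit before growing
        })
    return events


for event in parse_reallocations(LOG):
    print(event["name"], event["old_values"], event["growth"])
```

Under the assumed widths, the first reallocation of each vector starts from a size that holds 11397 values (11398 entries for the offsets vector), which is the same as the spill batch row count reported in the log, and every reallocation is an exact doubling.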

* Original estimate for the batch size (from elsewhere in the logs): 1,572,786
* Actual initial allocation size: 2,392,064
* Final actual allocation size: 3,244,032

This tells us that the calculations are wrong somewhere.
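
To put the three numbers above in proportion, here is a small Python sketch of the arithmetic. The byte counts are copied from the list; the variable and function names are only for illustration.

```python
# Sizes in bytes, copied from the list above.
original_estimate = 1_572_786
initial_allocation = 2_392_064
final_allocation = 3_244_032


def ratio(actual, expected):
    """Return actual / expected, rounded to two decimals."""
    return round(actual / expected, 2)


initial_vs_estimate = ratio(initial_allocation, original_estimate)
final_vs_estimate = ratio(final_allocation, original_estimate)
final_vs_initial = ratio(final_allocation, initial_allocation)
growth_bytes = final_allocation - initial_allocation

print(initial_vs_estimate, final_vs_estimate, final_vs_initial, growth_bytes)
```

The initial allocation is about 1.52 times the original estimate, the final allocation is about 2.06 times the estimate, and the batch grew by 851,968 bytes (about 1.36 times) between its initial and final allocation.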


> Rollup of external sort fixes to issues found by QA
> ---------------------------------------------------
>
>                 Key: DRILL-5758
>                 URL: https://issues.apache.org/jira/browse/DRILL-5758
>             Project: Apache Drill
>          Issue Type: Task
>    Affects Versions: 1.12.0
>            Reporter: Paul Rogers
>            Assignee: Paul Rogers
>             Fix For: 1.12.0
>
>
> Tracking JIRA to be used for the PR that combines fixes for various JIRA entries. Bugs fixed in this task are given by the linked issues.


