Posted to issues@drill.apache.org by "Paul Rogers (JIRA)" <ji...@apache.org> on 2017/02/23 01:36:44 UTC

[jira] [Commented] (DRILL-5294) Managed External Sort throws an OOM during the merge and spill phase

    [ https://issues.apache.org/jira/browse/DRILL-5294?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15879664#comment-15879664 ] 

Paul Rogers commented on DRILL-5294:
------------------------------------

Basic statistics:

{code}
ExternalSortBatch - Config: memory limit = 126322567, spill file size = 268435456, batch size = 8388608, 
    merge limit = 2147483647, merge batch size = 12632257
{code}

The above line appears 17 times in the log, showing that the query has 17 slices (also known as minor fragments). The minor fragment number in the log line confirms this, since numbering starts at 0 and the highest value is 16:

{code}
[2751ce6d-67e6-ae08-3b68-e33b29f9d2a3:frag:1:16] ... ExternalSortBatch - Config
{code}
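
As a rough sketch, the slice count could be confirmed by collecting the distinct fragment IDs from the Config lines. The regex and the helper name {{count_sort_slices}} are assumptions made for illustration, and the sample lines below only imitate the format of the excerpt above; they are not copied from the attached log.

```python
import re

# Matches the "[<query-id>:frag:<major>:<minor>]" tag at the start of a log entry.
FRAG_TAG = re.compile(r"\[([0-9a-f-]+):frag:(\d+):(\d+)\]")

def count_sort_slices(lines):
    """Return the set of (major, minor) fragments that logged a sort Config line."""
    slices = set()
    for line in lines:
        if "ExternalSortBatch - Config" not in line:
            continue
        match = FRAG_TAG.search(line)
        if match:
            slices.add((int(match.group(2)), int(match.group(3))))
    return slices

# Illustrative input only: one Config line per minor fragment 0..16.
sample = [
    "[2751ce6d-67e6-ae08-3b68-e33b29f9d2a3:frag:1:%d] ... ExternalSortBatch - Config" % i
    for i in range(17)
]
print(len(count_sort_slices(sample)))
```

Using a set means repeated Config lines from the same fragment would be counted once.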

The node was given 32 GB of direct memory. Each sort is given 126,322,567 bytes (about 126 MB) of memory, for a total of 2 GB across the 17 slices. So, the query is running with the default value of max query memory per node and is using just 1/16 of available direct memory.
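
The arithmetic can be checked with a few lines. This is only a sketch: it assumes that "32G" means 32 GiB and that all 17 sorts receive the same limit, as the Config line suggests.

```python
GIB = 1024 ** 3

slices = 17
per_sort_limit = 126_322_567      # "memory limit" from the Config line
direct_memory = 32 * GIB          # DRILL_MAX_DIRECT_MEMORY="32G", assumed to mean GiB

total_sort_memory = slices * per_sort_limit
share_of_direct = total_sort_memory / direct_memory

print(total_sort_memory)          # 2147483639, just under 2 GiB
print(round(1 / share_of_direct)) # 16, i.e. about 1/16 of direct memory
```

The total is 9 bytes short of 2^31, which is consistent with a 2 GiB per-node budget divided evenly among 17 sorts and rounded down.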

Something is amiss with the "record batch sizer":

{code}
ExternalSortBatch - Memory delta: 526336, actual batch size: 365313, Diff: 161023
{code}

This does not cause the fault here, but the "diff" should be 0 if the sizer does its job.
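
To see how large the discrepancy is, the three numbers in that log line can be pulled out and compared. The pattern and the helper name {{sizer_error}} are illustrative assumptions, not part of Drill.

```python
import re

LINE = "ExternalSortBatch - Memory delta: 526336, actual batch size: 365313, Diff: 161023"

PATTERN = re.compile(
    r"Memory delta: (\d+), actual batch size: (\d+), Diff: (-?\d+)"
)

def sizer_error(line):
    """Return (delta, batch_size, diff, diff as a fraction of delta), or None."""
    match = PATTERN.search(line)
    if match is None:
        return None
    delta, batch_size, diff = (int(g) for g in match.groups())
    return delta, batch_size, diff, diff / delta

delta, batch_size, diff, fraction = sizer_error(LINE)
print(delta - batch_size == diff)   # True: the logged diff is delta minus batch size
print(round(fraction * 100, 1))     # percentage of the memory delta unaccounted for
```

For this line, roughly 30% of the memory delta is not accounted for by the reported batch size.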

No log lines suggest a spill occurred. One of the sorts got to the point where it was about to spill to disk (we see the code generated for {{PriorityQueueCopierGen56}}). At this point, some slice ran out of memory while spilling to disk.

The data file:

{code}
36,951,000,000 3500cols.tbl
{code}

Given that the data file is 37 GB in size, and sort memory is 2 GB (spread over 17 slices), considerable spilling should occur.
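
A back-of-the-envelope estimate of how far the data exceeds sort memory is sketched below. It rests on two loud assumptions: that rows are spread evenly over the 17 slices, and that the volume each sort handles is comparable to the file size on disk. The query projects only a subset of columns, so the real ratio may be lower.

```python
data_bytes = 36_951_000_000       # size of 3500cols.tbl as listed above
slices = 17
per_sort_limit = 126_322_567      # "memory limit" from the Config line

# Assumption: even distribution of the input across slices.
per_slice_data = data_bytes / slices

# Assumption: in-memory volume is comparable to on-disk volume (an upper bound).
ratio = per_slice_data / per_sort_limit

print(round(per_slice_data / 1e9, 2))   # GB of input per slice
print(round(ratio, 1))                  # input per slice as a multiple of the sort's memory
```

Under these assumptions each slice sees about 17 times more data than it can hold in memory.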

It may be that the copier needs more memory than was anticipated, and so the sort held more rows in memory than it should have before beginning to spill.

> Managed External Sort throws an OOM during the merge and spill phase
> --------------------------------------------------------------------
>
>                 Key: DRILL-5294
>                 URL: https://issues.apache.org/jira/browse/DRILL-5294
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Execution - Relational Operators
>            Reporter: Rahul Challapalli
>            Assignee: Paul Rogers
>             Fix For: 1.10.0
>
>         Attachments: 2751ce6d-67e6-ae08-3b68-e33b29f9d2a3.sys.drill, drillbit.log
>
>
> commit # : 38f816a45924654efd085bf7f1da7d97a4a51e38
> The below query fails with managed sort while it succeeds on the old sort
> {code}
> select * from (select columns[433] col433, columns[0], columns[1],columns[2],columns[3],columns[4],columns[5],columns[6],columns[7],columns[8],columns[9],columns[10],columns[11] from dfs.`/drill/testdata/resource-manager/3500cols.tbl` order by columns[450],columns[330],columns[230],columns[220],columns[110],columns[90],columns[80],columns[70],columns[40],columns[10],columns[20],columns[30],columns[40],columns[50]) d where d.col433 = 'sjka skjf';
> Error: RESOURCE ERROR: External Sort encountered an error while spilling to disk
> Fragment 1:11
> [Error Id: 0aa20284-cfcc-450f-89b3-645c280f33a4 on qa-node190.qa.lab:31010] (state=,code=0)
> {code}
> Env : 
> {code}
> No of Drillbits : 1
> DRILL_MAX_DIRECT_MEMORY="32G"
> DRILL_MAX_HEAP="4G"
> {code}
> Attached the logs and profile. Data is too large for a jira


