Posted to issues@drill.apache.org by "Jinfeng Ni (JIRA)" <ji...@apache.org> on 2017/06/16 17:59:00 UTC

[jira] [Commented] (DRILL-5211) Queries fail due to direct memory fragmentation

    [ https://issues.apache.org/jira/browse/DRILL-5211?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16052193#comment-16052193 ] 

Jinfeng Ni commented on DRILL-5211:
-----------------------------------

My 2 cents:

I understand the cause of the direct memory fragmentation.

The proposal in ApacheDrillVectorSizeLimits.pdf seems to be a reversal of the changes in DRILL-1960.  You are right that a value vector's setSafe() will do a realloc() when it runs out of space in the DrillBuf. The realloc() doubles the DrillBuf, which may eventually lead to memory fragmentation / OOM.
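
To make the failure mode concrete, here is a minimal sketch of that doubling-on-overflow behavior -- simplified names, plain ByteBuffer instead of DrillBuf, not Drill's actual ValueVector code:

{code:java}
// Illustrative sketch only -- simplified names, plain ByteBuffer instead of
// Drill's DrillBuf; not the actual ValueVector implementation.
import java.nio.ByteBuffer;

class GrowableDirectBuffer {
    private ByteBuffer buf = ByteBuffer.allocateDirect(1 << 20);  // 1 MB to start

    // Analogue of setSafe(): when the value does not fit, double the buffer.
    void setSafe(int offset, byte[] value) {
        while (offset + value.length > buf.capacity()) {
            // Allocate a block twice as large, copy, and drop the old one.
            // Each grow needs a contiguous region twice the previous size;
            // across many vectors the freed "holes" no longer satisfy the
            // ever-larger requests -- direct memory fragmentation, and
            // eventually OOM.
            ByteBuffer bigger = ByteBuffer.allocateDirect(buf.capacity() * 2);
            buf.rewind();
            bigger.put(buf);
            buf = bigger;
        }
        buf.position(offset);
        buf.put(value);
    }
}
{code}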

Before DRILL-1960, setSafe() actually returned a boolean to indicate whether the method completed successfully (instead of throwing an exception as in your proposed "setScalar" method). When it returned false, it was each operator's responsibility to 1) rewind / replay the overflow row, 2) break the ongoing batch into two, and 3) pass the full batch downstream and continue working with the overflow row plus the rest of the incoming rows.  I guess that put significant complexity into the operators, which is why DRILL-1960 was proposed to move that complexity from the operators into the value vectors.  [~sphillips] would know the background better than I do.
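
A rough sketch of the two contracts and the operator-side burden, with hypothetical names (the real pre-DRILL-1960 code and the proposed setScalar API differ in detail):

{code:java}
// Hypothetical sketch of the two overflow-handling contracts discussed above;
// all class and method names are illustrative, not Drill's actual APIs.
class OverflowSketch {

    /** Stand-in for a value vector with a hard capacity (no realloc). */
    static class Vector {
        final int[] data;
        Vector(int capacity) { data = new int[capacity]; }

        /** Pre-DRILL-1960 contract: report overflow through the return value. */
        boolean setSafe(int index, int value) {
            if (index >= data.length) return false;
            data[index] = value;
            return true;
        }

        /** Proposed contract: a size-aware writer that throws on overflow. */
        void setScalar(int index, int value) {
            if (!setSafe(index, value)) throw new IllegalStateException("overflow");
        }
    }

    /**
     * The operator-side logic the boolean contract forces on every operator:
     * when a row does not fit, ship the current (full) batch downstream,
     * break the batch there, and replay the overflow row into a fresh batch.
     */
    static Vector writeRow(Vector current, int row, int value) {
        if (current.setSafe(row, value)) {
            return current;                              // row fit; keep going
        }
        sendDownstream(current);                         // 1) pass the full batch on
        Vector next = new Vector(current.data.length);   // 2) start a new batch
        next.setSafe(0, value);                          // 3) replay the overflow row
        return next;
    }

    static void sendDownstream(Vector batch) {
        // placeholder: hand the completed batch to the downstream operator
    }
}
{code}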

Also, the proposal seems to focus on the scan operator.  Many other operators might produce a value vector beyond the allowed size; Project is one example. Exchange operators, the selection vector remover, aggregators, etc., which have to evaluate expressions or reshuffle/copy data, could run into the same situation.  My guess is that it would require a huge effort to modify all the impacted operators to enforce the "size-aware" vector writer policy.


1. https://issues.apache.org/jira/browse/DRILL-1960

> Queries fail due to direct memory fragmentation
> -----------------------------------------------
>
>                 Key: DRILL-5211
>                 URL: https://issues.apache.org/jira/browse/DRILL-5211
>             Project: Apache Drill
>          Issue Type: Bug
>            Reporter: Paul Rogers
>            Assignee: Paul Rogers
>             Fix For: 1.9.0
>
>         Attachments: ApacheDrillMemoryFragmentationBackground.pdf, ApacheDrillVectorSizeLimits.pdf, EnhancedScanOperator.pdf, ScanSchemaManagement.pdf
>
>
> Consider a test of the external sort as follows:
> * Direct memory: 3GB
> * Input file: 18 GB, with one Varchar column of 8K width
> The sort runs, spilling to disk. Once all data arrives, the sort begins to merge the results. But, to do that, it must first do an intermediate merge. For example, in this sort, there are 190 spill files, but only 19 can be merged at a time. (Each merge file contains 128 MB batches, and only 19 can fit in memory, giving a total footprint of 2.5 GB, well below the 3 GB limit.)
> Yet, when loading batch xx, Drill fails with an OOM error. At that point, total available direct memory is 3,817,865,216. (Obtained from {{maxMemory}} in the {{Bits}} class in the JDK.)
> It appears that Drill wants to allocate 58,257,868 bytes, but the {{totalCapacity}} (again in {{Bits}}) is already 3,800,769,206, causing an OOM.
> The problem is that, at this point, the external sort should not ask the system for more memory. The allocator for the external sort is at just 1,192,350,366 before the allocation request. Plenty of spare memory should be available, released when the in-memory batches were spilled to disk prior to merging. Indeed, earlier in the run, the sort had reached a peak memory usage of 2,710,716,416 bytes. This memory should be available for reuse during merging, and is plenty sufficient to fill the particular request in question.
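
(For reference, a back-of-the-envelope check of the merge arithmetic in the quoted description above; the constants come from the report, the code itself is only illustrative.)

{code:java}
// Back-of-the-envelope check of the merge arithmetic described in the report.
public class MergeFanIn {
    public static void main(String[] args) {
        long directMemory = 3L * 1024 * 1024 * 1024;   // 3 GB direct memory budget
        long batchSize    = 128L * 1024 * 1024;        // 128 MB batch per spill file
        int  spillFiles   = 190;
        int  fanIn        = 19;                        // spill files merged at a time

        long footprint = fanIn * batchSize;            // 19 x 128 MB, roughly 2.5 GB
        System.out.printf("Merge footprint: %,d of %,d bytes%n", footprint, directMemory);
        System.out.printf("Intermediate merges: %d%n",
                (spillFiles + fanIn - 1) / fanIn);     // 190 / 19 = 10 intermediate merges
        // The report's point: this footprint fits comfortably in the 3 GB budget,
        // yet a ~58 MB allocation still fails because direct memory is fragmented.
    }
}
{code}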



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)