Posted to issues@drill.apache.org by "Paul Rogers (JIRA)" <ji...@apache.org> on 2016/11/09 06:22:58 UTC

[jira] [Commented] (DRILL-5023) ExternalSortBatch does not spill fully, throws off spill calculations

    [ https://issues.apache.org/jira/browse/DRILL-5023?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15649938#comment-15649938 ] 

Paul Rogers commented on DRILL-5023:
------------------------------------

More detail. This behavior seems to be an artifact of the way that {{BatchGroup}} was written. It seems to require that each group has a "current container." When spilling, there really is no need for a current container. But because the close and other methods assume one, it appears that the code simply adds a container just to get things to work.

The result of this hack is that one spill batch is kept in memory per spill session. This "overhead" is not considered when determining when to spill next, causing an unaccounted-for accumulation of in-memory buffered rows.
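
To make the accounting gap concrete, here is a toy model of the spill decision. It is not Drill code: the class, the method names, and the byte sizes are all hypothetical, chosen only to show how one held-back batch per spill can push real memory use past the budget without triggering a spill.

```java
// Toy model only: names and sizes are hypothetical, not taken from Drill.
public class SpillAccountingSketch {

    /** Bytes pinned in memory after the given number of spills, one held-back batch per spill. */
    public static long unaccountedBytes(int spillSessions, long heldBatchBytes) {
        return (long) spillSessions * heldBatchBytes;
    }

    /** Spill decision that looks only at newly buffered rows (the behavior described above). */
    public static boolean shouldSpillNaive(long bufferedBytes, long budgetBytes) {
        return bufferedBytes >= budgetBytes;
    }

    /** Spill decision that also counts the batches held back by earlier spills. */
    public static boolean shouldSpillAccounted(long bufferedBytes, long heldBytes, long budgetBytes) {
        return bufferedBytes + heldBytes >= budgetBytes;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        long budget = 100 * mb;                     // assumed memory budget
        long held = unaccountedBytes(5, 8 * mb);    // five spills, 8 MB held back each time
        long buffered = 70 * mb;                    // rows buffered since the last spill

        // The naive check sees 70 MB against a 100 MB budget and does not spill,
        // even though 110 MB is actually in memory.
        System.out.println("naive: " + shouldSpillNaive(buffered, budget));
        System.out.println("accounted: " + shouldSpillAccounted(buffered, held, budget));
    }
}
```

In this model the two checks disagree as soon as the held-back batches plus the newly buffered rows cross the budget, which is the accumulation described above.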

The proper solution is to modify the {{BatchGroup}} class for the spill case so that it does not require a spurious container.
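
One possible shape for that change, as a rough sketch only: split the in-memory and spilled cases so that the spilled case carries a file reference and no container. The types below are hypothetical and do not mirror the real {{BatchGroup}} API; the actual fix may look quite different.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Toy model only: these types are hypothetical, not Drill's BatchGroup.
public class BatchGroupSketch {

    /** Common view of a group of sorted batches. */
    public interface Group {
        long memoryBytes();   // bytes this group currently pins in memory
        void close();
    }

    /** A group whose data is still buffered in memory. */
    public static final class InMemoryGroup implements Group {
        private long bytes;
        public InMemoryGroup(long bytes) { this.bytes = bytes; }
        @Override public long memoryBytes() { return bytes; }
        @Override public void close() { bytes = 0; }   // release the buffered data
    }

    /** A group whose data lives on disk; it holds no container at all. */
    public static final class SpilledGroup implements Group {
        private final Path spillFile;
        private final int recordCount;
        public SpilledGroup(Path spillFile, int recordCount) {
            this.spillFile = spillFile;
            this.recordCount = recordCount;
        }
        public Path spillFile() { return spillFile; }
        public int recordCount() { return recordCount; }
        @Override public long memoryBytes() { return 0; }
        @Override public void close() { /* nothing in memory to release */ }
    }

    public static void main(String[] args) {
        Group spilled = new SpilledGroup(Paths.get("spill-0"), 4096);
        Group buffered = new InMemoryGroup(8L * 1024 * 1024);
        // Only the in-memory group contributes to the memory total.
        System.out.println(spilled.memoryBytes() + buffered.memoryBytes());
    }
}
```

With the spilled case reporting zero bytes and closing without a container, nothing needs to be held back just to satisfy the close path.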

> ExternalSortBatch does not spill fully, throws off spill calculations
> ---------------------------------------------------------------------
>
>                 Key: DRILL-5023
>                 URL: https://issues.apache.org/jira/browse/DRILL-5023
>             Project: Apache Drill
>          Issue Type: Bug
>    Affects Versions: 1.8.0
>            Reporter: Paul Rogers
>            Priority: Minor
>
> The {{ExternalSortBatch}} (ESB) operator sorts records, spilling to disk as needed to operate within a defined memory budget.
> When needed, ESB spills accumulated record batches to disk. However, when doing so, the ESB carves off the first spillable batch and holds it in memory:
> {code}
>     // 1 output container is kept in memory, so we want to hold on to it and transferClone
>     // allows keeping ownership
>     VectorContainer c1 = VectorContainer.getTransferClone(outputContainer, oContext);
>     c1.buildSchema(BatchSchema.SelectionVectorMode.NONE);
>     c1.setRecordCount(count);
> ...
>     BatchGroup newGroup = new BatchGroup(c1, fs, outputFile, oContext);
> {code}
> When the spill batch size gets larger (to fix DRILL-5022), the result is that nothing is spilled as the first spillable batch is simply stored back into memory on the (supposedly) spilled batches list.
> The desired behavior is for all spillable batches to be written to disk. If the first batch is held back to work around some issue (to keep a schema, say?), then find a different solution that allows the actual data to spill.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)