Posted to issues@drill.apache.org by "Paul Rogers (JIRA)" <ji...@apache.org> on 2017/02/17 22:32:44 UTC

[jira] [Updated] (DRILL-5023) ExternalSortBatch does not spill fully, throws off spill calculations

     [ https://issues.apache.org/jira/browse/DRILL-5023?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Paul Rogers updated DRILL-5023:
-------------------------------
    Fix Version/s: 1.10.0

> ExternalSortBatch does not spill fully, throws off spill calculations
> ---------------------------------------------------------------------
>
>                 Key: DRILL-5023
>                 URL: https://issues.apache.org/jira/browse/DRILL-5023
>             Project: Apache Drill
>          Issue Type: Sub-task
>    Affects Versions: 1.8.0
>            Reporter: Paul Rogers
>            Assignee: Paul Rogers
>            Priority: Minor
>             Fix For: 1.10.0
>
>
> The {{ExternalSortBatch}} (ESB) operator sorts records, spilling to disk as needed to operate within a defined memory budget.
> When needed, ESB spills accumulated record batches to disk. However, when doing so, the ESB carves off the first spillable batch and holds it in memory:
> {code}
>     // 1 output container is kept in memory, so we want to hold on to it and transferClone
>     // allows keeping ownership
>     VectorContainer c1 = VectorContainer.getTransferClone(outputContainer, oContext);
>     c1.buildSchema(BatchSchema.SelectionVectorMode.NONE);
>     c1.setRecordCount(count);
> ...
>     BatchGroup newGroup = new BatchGroup(c1, fs, outputFile, oContext);
> {code}
> When the spill batch size grows large enough to cover all the spillable data (as needed to fix DRILL-5022), nothing is spilled at all. The first, and only, spillable batch is simply stored back in memory on the (supposedly) spilled batches list.
> The desired behavior is for all spillable batches to be written to disk. If the first batch is held back to work around some issue (to keep a schema, say?), then find a different solution that allows the actual data to spill.
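The arithmetic behind the report can be illustrated with a toy model. This is a hypothetical sketch, not Drill code. `SpillSketch`, `spill`, and `holdFirstBatch` are invented names, and batches are reduced to byte counts. The model assumes that a spill pass cuts the data into output batches of at most the spill batch size. With `holdFirstBatch` set, it keeps the first of those batches in memory, as the quoted excerpt does.

```java
/**
 * Hypothetical, simplified model of the spill step described in DRILL-5023.
 * Batches are represented only by their size in bytes; no Drill classes are used.
 */
public class SpillSketch {

    /** Outcome of one spill pass: bytes written to disk vs. bytes still held in memory. */
    public static final class Result {
        public final long spilledBytes;
        public final long retainedBytes;

        private Result(long spilledBytes, long retainedBytes) {
            this.spilledBytes = spilledBytes;
            this.retainedBytes = retainedBytes;
        }
    }

    /**
     * Splits totalBytes into output batches of at most spillBatchBytes each.
     * If holdFirstBatch is true (the reported behavior), the first output batch
     * stays in memory and only the remaining batches count as spilled.
     */
    public static Result spill(long totalBytes, long spillBatchBytes, boolean holdFirstBatch) {
        if (spillBatchBytes <= 0) {
            throw new IllegalArgumentException("spillBatchBytes must be positive");
        }
        long retained = 0;
        long spilled = 0;
        long remaining = totalBytes;
        boolean first = true;
        while (remaining > 0) {
            long batch = Math.min(spillBatchBytes, remaining);
            remaining -= batch;
            if (first && holdFirstBatch) {
                retained += batch; // first output batch is kept in memory
            } else {
                spilled += batch;  // batch is written to the spill file
            }
            first = false;
        }
        return new Result(spilled, retained);
    }

    public static void main(String[] args) {
        long mb = 1L << 20;
        long total = 100 * mb;
        for (long batch : new long[] {10 * mb, 100 * mb}) {
            Result held = spill(total, batch, true);
            Result full = spill(total, batch, false);
            System.out.printf(
                "spill batch %d MB: hold-first spills %d MB (keeps %d MB); spill-all spills %d MB%n",
                batch / mb, held.spilledBytes / mb, held.retainedBytes / mb, full.spilledBytes / mb);
        }
    }
}
```

With a 10 MB spill batch, holding back the first batch still frees 90 of the 100 MB. Once the spill batch reaches 100 MB, the same rule spills nothing, which matches the behavior described in the issue. Writing every batch, the `holdFirstBatch = false` path, always frees the full amount.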



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)