Posted to issues@drill.apache.org by "Paul Rogers (JIRA)" <ji...@apache.org> on 2017/02/17 22:22:44 UTC
[jira] [Resolved] (DRILL-5023) ExternalSortBatch does not spill fully, throws off spill calculations
[ https://issues.apache.org/jira/browse/DRILL-5023?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Paul Rogers resolved DRILL-5023.
--------------------------------
Resolution: Fixed
> ExternalSortBatch does not spill fully, throws off spill calculations
> ---------------------------------------------------------------------
>
> Key: DRILL-5023
> URL: https://issues.apache.org/jira/browse/DRILL-5023
> Project: Apache Drill
> Issue Type: Sub-task
> Affects Versions: 1.8.0
> Reporter: Paul Rogers
> Assignee: Paul Rogers
> Priority: Minor
>
> The {{ExternalSortBatch}} (ESB) operator sorts records, spilling to disk as needed to operate within a defined memory budget.
> When needed, ESB spills accumulated record batches to disk. However, when doing so, the ESB carves off the first spillable batch and holds it in memory:
> {code}
> // 1 output container is kept in memory, so we want to hold on to it and transferClone
> // allows keeping ownership
> VectorContainer c1 = VectorContainer.getTransferClone(outputContainer, oContext);
> c1.buildSchema(BatchSchema.SelectionVectorMode.NONE);
> c1.setRecordCount(count);
> ...
> BatchGroup newGroup = new BatchGroup(c1, fs, outputFile, oContext);
> {code}
> When the spill batch size grows (to fix DRILL-5022), nothing is spilled at all, because the first spillable batch is simply stored back in memory on the (supposedly) spilled batches list.
> The desired behavior is for all spillable batches to be written to disk. If the first batch is held back to work around some issue (to keep a schema, say?), then find a different solution that allows the actual data to spill.
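The effect on spill accounting can be illustrated with a toy model. This is a minimal sketch, not Drill code: the class SpillSketch, its methods, and the idea of representing each output container by its size in bytes are all hypothetical, and it assumes the spill simply cuts the spillable data into containers of at most the spill batch size.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical toy model, not Drill code: each Long is the size in bytes
// of one output container produced during a spill.
final class SpillSketch {

    // Cuts totalBytes of spillable data into containers of at most
    // spillBatchBytes each; the last container may be smaller.
    static List<Long> outputContainers(long totalBytes, long spillBatchBytes) {
        if (spillBatchBytes <= 0) {
            throw new IllegalArgumentException("spillBatchBytes must be positive");
        }
        List<Long> containers = new ArrayList<>();
        for (long left = totalBytes; left > 0; left -= spillBatchBytes) {
            containers.add(Math.min(left, spillBatchBytes));
        }
        return containers;
    }

    // Behavior described in the issue (as modeled here): the first output
    // container is kept in memory, so only the remaining ones reach disk.
    static long bytesWrittenHoldingFirst(long totalBytes, long spillBatchBytes) {
        List<Long> containers = outputContainers(totalBytes, spillBatchBytes);
        long written = 0;
        for (int i = 1; i < containers.size(); i++) {
            written += containers.get(i);
        }
        return written;
    }

    // Desired behavior: every output container is written to disk.
    static long bytesWrittenSpillingAll(long totalBytes, long spillBatchBytes) {
        long written = 0;
        for (long size : outputContainers(totalBytes, spillBatchBytes)) {
            written += size;
        }
        return written;
    }
}
```

Under these assumptions, 64 bytes of spillable data with a 16-byte spill batch writes 48 bytes when the first container is held back, but with a 64-byte spill batch it writes nothing, even though the spill logic believes it freed memory. Spilling every container writes all 64 bytes in both cases.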