You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@drill.apache.org by "Paul Rogers (JIRA)" <ji...@apache.org> on 2017/06/30 05:11:01 UTC
[jira] [Created] (DRILL-5636) External sort should not copy data
prior to spilling
Paul Rogers created DRILL-5636:
----------------------------------
Summary: External sort should not copy data prior to spilling
Key: DRILL-5636
URL: https://issues.apache.org/jira/browse/DRILL-5636
Project: Apache Drill
Issue Type: Improvement
Affects Versions: 1.8.0
Reporter: Paul Rogers
Assignee: Paul Rogers
Priority: Minor
The external sort spills data to disk under memory pressure. The sort code uses a generic mechanism to do the spilling:
* Use a "priority queue copier" to copy sorted records into a new batch
* Spill the new batch by writing the vectors for the newly-created batch
The above works fine when memory is plentiful. But, under low-memory conditions, the intermediate copy can cause OOM errors.
An improved algorithm is:
* Priority queue copier works vector-by-vector
* Serialize each vector to disk
* Release its memory
* Repeat for the next vector
The advantages of the above:
* Less intermediate memory use
* Perhaps better CPU cache performance through greater locality (all writes happen to a single vector at a time, rather than row by row)
* No change in disk format or disk write performance (because data is buffered prior to write anyway.)
*
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)