Posted to issues-all@impala.apache.org by "Tim Armstrong (Jira)" <ji...@apache.org> on 2020/06/13 01:09:00 UTC

[jira] [Commented] (IMPALA-3471) TopN should be able to spill

    [ https://issues.apache.org/jira/browse/IMPALA-3471?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17134613#comment-17134613 ] 

Tim Armstrong commented on IMPALA-3471:
---------------------------------------

I think we still want to use the regular external sort implementation for large limits, but we could make some further optimisations so that it spills less data. Specifically, in SortCurrentInputRun() we could truncate the in-memory sorted run to its first OFFSET + LIMIT rows, and then apply the limit again when merging sorted runs.
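A rough sketch of those two steps, written as standalone C++ rather than against Impala's actual sorter classes. Rows are reduced to int sort keys, and SortAndTruncateRun() and MergeRunsWithLimit() are hypothetical names, not existing Impala functions. Each run is cut down to OFFSET + LIMIT rows before it would be spilled, and the merge stops as soon as that many rows have been produced.

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <queue>
#include <tuple>
#include <vector>

using Row = int;               // hypothetical: a row is reduced to its sort key
using Run = std::vector<Row>;  // one in-memory (or spilled) sorted run

// Sort the current run and keep only the first offset + limit rows; no row
// past that position can appear in the final result.
void SortAndTruncateRun(Run& run, std::size_t offset, std::size_t limit) {
  std::sort(run.begin(), run.end());
  if (run.size() > offset + limit) run.resize(offset + limit);
}

// K-way merge of sorted runs with a min-heap. Stops once offset + limit rows
// have been produced and skips the first offset of them.
std::vector<Row> MergeRunsWithLimit(const std::vector<Run>& runs,
                                    std::size_t offset, std::size_t limit) {
  // Heap entry: (value, run index, position within run).
  using Entry = std::tuple<Row, std::size_t, std::size_t>;
  std::priority_queue<Entry, std::vector<Entry>, std::greater<Entry>> heap;
  for (std::size_t r = 0; r < runs.size(); ++r) {
    if (!runs[r].empty()) heap.emplace(runs[r][0], r, 0);
  }
  std::vector<Row> out;
  std::size_t produced = 0;
  while (!heap.empty() && produced < offset + limit) {
    auto [value, r, i] = heap.top();
    heap.pop();
    if (produced >= offset) out.push_back(value);  // apply OFFSET
    ++produced;
    // Advance within the run the popped row came from.
    if (i + 1 < runs[r].size()) heap.emplace(runs[r][i + 1], r, i + 1);
  }
  return out;
}
```

With runs {5, 1, 9, 3}, {8, 2, 7} and {6, 4}, OFFSET 0 and LIMIT 2, each run is truncated to two rows and the merge returns {1, 2} after popping only two entries.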

There are additional tricks that we could add to optimise this further for spilling sorts, mostly various ways to keep track of an upper bound on the sort key, so that rows past the threshold can be recognised and dropped early.
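One way such an upper-bound trick could look, again as a hypothetical standalone sketch (TopNBound is not an Impala class, and keys are plain ints). Once any sorted run holds at least k = OFFSET + LIMIT rows, its k-th key is a valid bound, because k rows at or below it are already known. Incoming rows with a strictly larger key can then be dropped before they are buffered or spilled, and each later full run can only tighten the bound.

```cpp
#include <algorithm>
#include <cstddef>
#include <optional>
#include <vector>

// Hypothetical helper: tracks an upper bound on the sort key of rows that can
// still reach the top k = offset + limit. Rows are reduced to int sort keys.
class TopNBound {
 public:
  explicit TopNBound(std::size_t k) : k_(k) {}

  // Call after a run has been sorted (and possibly truncated to k rows).
  // If the run holds at least k rows, its k-th key is a valid bound.
  void UpdateFromSortedRun(const std::vector<int>& sorted_run) {
    if (k_ == 0 || sorted_run.size() < k_) return;
    int candidate = sorted_run[k_ - 1];
    bound_ = bound_ ? std::min(*bound_, candidate) : candidate;
  }

  // Rows strictly greater than the bound cannot make the cut. Rows equal to
  // the bound are kept, which is conservative with respect to ties.
  bool CanDiscard(int key) const { return bound_ && key > *bound_; }

  std::optional<int> bound() const { return bound_; }

 private:
  std::size_t k_;
  std::optional<int> bound_;  // empty until some run has k rows
};
```

The bound only ever decreases, so checking it on every incoming row is cheap, and a row that fails the check never costs any spill I/O.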

> TopN should be able to spill
> ----------------------------
>
>                 Key: IMPALA-3471
>                 URL: https://issues.apache.org/jira/browse/IMPALA-3471
>             Project: IMPALA
>          Issue Type: Improvement
>          Components: Backend
>    Affects Versions: Impala 2.6.0
>            Reporter: Jim Apple
>            Priority: Minor
>
> TopN nodes store OFFSET + LIMIT tuples in memory. (In fact, in a vector, which will throw an exception if allocation fails.) It would be nice to check allocations before they fail and spill when there isn't enough memory.
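A minimal sketch of the behaviour the description asks for, assuming a hypothetical MemoryBudget in place of Impala's real memory tracking and int keys in place of tuples. The TopN asks the budget for room before its heap grows and reports kNeedsSpill when the reservation fails, so the caller learns about the shortage up front rather than only through an exception from the container. How the caller would then spill is left out.

```cpp
#include <cstddef>
#include <queue>
#include <vector>

// Hypothetical memory budget standing in for a real memory tracker.
class MemoryBudget {
 public:
  explicit MemoryBudget(std::size_t limit_bytes) : limit_(limit_bytes) {}
  // Reserve bytes if they fit in the budget; never exceeds the limit.
  bool TryConsume(std::size_t bytes) {
    if (used_ + bytes > limit_) return false;
    used_ += bytes;
    return true;
  }

 private:
  std::size_t limit_;
  std::size_t used_ = 0;
};

enum class AddResult { kAdded, kDiscarded, kNeedsSpill };

// Keeps the k = offset + limit smallest keys in a max-heap. The budget is
// charged sizeof(int) per stored key; container overhead is ignored here.
class BudgetedTopN {
 public:
  BudgetedTopN(std::size_t k, MemoryBudget* budget) : k_(k), budget_(budget) {}

  AddResult Add(int key) {
    if (heap_.size() < k_) {
      // Check the budget *before* growing, instead of failing mid-allocation.
      if (!budget_->TryConsume(sizeof(int))) return AddResult::kNeedsSpill;
      heap_.push(key);
      return AddResult::kAdded;
    }
    // Heap is full: replacing the current largest key needs no new memory.
    if (k_ == 0 || key >= heap_.top()) return AddResult::kDiscarded;
    heap_.pop();
    heap_.push(key);
    return AddResult::kAdded;
  }

  std::size_t size() const { return heap_.size(); }

 private:
  std::size_t k_;
  MemoryBudget* budget_;
  std::priority_queue<int> heap_;  // max-heap: top() is the current k-th smallest
};
```

Only the growth phase (fewer than k rows held) needs memory; once the heap is full, each new row either replaces the largest key in place or is discarded.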



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
