Posted to issues@impala.apache.org by "Bikramjeet Vig (JIRA)" <ji...@apache.org> on 2017/07/20 20:38:00 UTC

[jira] [Resolved] (IMPALA-5520) TopN node does not reuse string memory

     [ https://issues.apache.org/jira/browse/IMPALA-5520?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Bikramjeet Vig resolved IMPALA-5520.
------------------------------------
    Resolution: Fixed


IMPALA-5520: TopN node periodically reclaims old allocations

Currently, the TopN node retains old string allocations in a tuple pool
that is held longer than necessary, resulting in unnecessary memory usage.
With this commit, the TopN node periodically re-materialises the
rows stored in the priority queue and reclaims the old allocations.
This is done when the number of rows removed from the priority queue
exceeds twice N, where N = limit + offset. In addition, a new counter
called "TuplePoolReclamations" is added to the TopN node to track
the number of times the tuple pool is reclaimed.
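
The reclamation policy described above can be illustrated with a small,
self-contained sketch. This is not the Impala implementation: StringPool,
TopNSketch and all of their members are hypothetical names, rows are
modelled as plain strings rather than tuples, and the pool counts only
string payload bytes. The only detail taken from the commit message is the
trigger: reclaim once more than 2 * N rows have been removed from the
priority queue.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <memory>
#include <queue>
#include <string>
#include <vector>

// Hypothetical stand-in for a tuple pool: it owns string buffers and can only
// free them all at once, when the pool itself is destroyed.
class StringPool {
 public:
  const std::string* Copy(const std::string& s) {
    strings_.push_back(std::make_unique<std::string>(s));
    bytes_ += s.size();  // payload bytes only, for illustration
    return strings_.back().get();
  }
  std::size_t bytes() const { return bytes_; }

 private:
  std::vector<std::unique_ptr<std::string>> strings_;
  std::size_t bytes_ = 0;
};

// Keeps the N smallest rows seen so far (ascending order by string value).
class TopNSketch {
 public:
  explicit TopNSketch(std::size_t n)
      : n_(n), pool_(std::make_unique<StringPool>()) {}

  void Insert(const std::string& row) {
    if (n_ == 0) return;
    if (heap_.size() == n_) {
      if (!(row < *heap_.top())) return;  // not among the N smallest
      heap_.pop();       // the evicted row's buffer stays in the pool
      ++rows_removed_;
    }
    heap_.push(pool_->Copy(row));
    assert(heap_.size() <= n_);
    peak_pool_bytes_ = std::max(peak_pool_bytes_, pool_->bytes());
    // Trigger from the commit message: more than 2 * N rows removed.
    if (rows_removed_ > 2 * n_) Reclaim();
  }

  // Returns the retained rows in ascending order without modifying the heap.
  std::vector<std::string> SortedRows() const {
    Heap copy = heap_;
    std::vector<std::string> out;
    while (!copy.empty()) {
      out.push_back(*copy.top());
      copy.pop();
    }
    std::reverse(out.begin(), out.end());
    return out;
  }

  std::size_t reclamations() const { return reclamations_; }
  std::size_t peak_pool_bytes() const { return peak_pool_bytes_; }

 private:
  struct Less {
    bool operator()(const std::string* a, const std::string* b) const {
      return *a < *b;  // max-heap: top() is the largest retained row
    }
  };
  using Heap = std::priority_queue<const std::string*,
                                   std::vector<const std::string*>, Less>;

  // Re-materialise the live rows into a fresh pool, then drop the old pool,
  // which frees the buffers of every row evicted since the last reclaim.
  void Reclaim() {
    auto fresh = std::make_unique<StringPool>();
    Heap rebuilt;
    while (!heap_.empty()) {
      rebuilt.push(fresh->Copy(*heap_.top()));
      heap_.pop();
    }
    heap_ = std::move(rebuilt);
    pool_ = std::move(fresh);  // old pool destroyed here
    rows_removed_ = 0;
    ++reclamations_;
  }

  std::size_t n_;
  std::unique_ptr<StringPool> pool_;
  Heap heap_;
  std::size_t rows_removed_ = 0;
  std::size_t reclamations_ = 0;
  std::size_t peak_pool_bytes_ = 0;
};
```

In this sketch, feeding rows in descending order (every row evicts one)
keeps the pool at no more than 3N + 1 rows between reclaims, instead of
growing with the size of the input. The two pools briefly coexist inside
Reclaim(), which the peak counter here does not account for.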

Testing:
Added a test to test_queries.py that sets a mem_limit low enough
that the query fails if reclamation is not implemented and passes
otherwise.

Performance:
Query 1 (expected general case):
select * from tpch.lineitem order by l_orderkey desc limit 10;

Query 2 (example worst case: data is sorted in reverse order before
being fed to the last TopN node):
select * from (select * from tpch.lineitem order by l_orderkey desc
limit 6001215) tb order by l_orderkey limit 10;


{noformat}
                       With Reclaim           Without Reclaim
                   Query 1     Query 2      Query 1     Query 2
MaxTuplePoolMem    3.96 KB     3.43 KB      110.2 MB    708.8 MB
Time (mean)        2s 218ms    6s 391ms     2s 021ms    6s 406ms
Time (stdev)       74.38ms     67.45ms      102.71ms    70.44ms
Reclaims           910         5861         N/A         N/A
{noformat}


With reclamation, the maximum tuple pool memory is several orders of
magnitude lower (e.g. 3.96 KB instead of 110.2 MB for Query 1), while
query runtimes remain similar. Cluster perf testing will be done later.

Change-Id: I968f57f0ff2905bd581908bc5c5ee486b31e6aa8
Reviewed-on: http://gerrit.cloudera.org:8080/7400
Reviewed-by: Matthew Jacobs <mj...@cloudera.com>
Tested-by: Impala Public Jenkins

> TopN node does not reuse string memory
> --------------------------------------
>
>                 Key: IMPALA-5520
>                 URL: https://issues.apache.org/jira/browse/IMPALA-5520
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Backend
>            Reporter: Tim Armstrong
>            Assignee: Bikramjeet Vig
>              Labels: ramp-up, resource-management
>
> In some cases TopN will use excessive memory. E.g. if you have a large number of input rows containing strings sorted in reverse order, it will allocate memory for all of the strings and never free it.
> We should either recycle the allocations or periodically re-materialise and garbage collect the old allocations.
> There is a TODO in the code already.
> {code}
>       // TODO: DeepCopy() will allocate new buffers for the string data. This needs
>       // to be fixed to use a freelist
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)