Posted to issues@impala.apache.org by "Tim Armstrong (JIRA)" <ji...@apache.org> on 2017/07/11 03:24:00 UTC

[jira] [Resolved] (IMPALA-5629) list::size() in BufferedTupleStreamV2::AdvanceWritePage() is expensive

     [ https://issues.apache.org/jira/browse/IMPALA-5629?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tim Armstrong resolved IMPALA-5629.
-----------------------------------
       Resolution: Fixed
    Fix Version/s: Impala 2.10.0


IMPALA-5629: avoid expensive list::size() call

As a workaround until we move to GCC5+, explicitly track the size of
the pages_ list. This is not too bad in practice since the list is
only mutated in 3 places.
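
For illustration only, the idea of tracking the size explicitly can be sketched as below. This is a minimal sketch, not the actual Impala change: the names CountedPageList, Page, and num_pages_ are hypothetical, and the three mutating methods are just examples of mutation sites rather than the three places referred to above. The point is that every mutation of the list updates a counter, so callers can read the count in O(1) without relying on list::size(), which was O(n) in libstdc++ before GCC 5.

```cpp
#include <cassert>
#include <cstdint>
#include <list>
#include <utility>

// Hypothetical stand-in for a stream page; not Impala's actual page type.
struct Page {
  int64_t len;
};

// Hypothetical wrapper that keeps an explicit element count next to a
// std::list, so the count can be read without calling list::size().
class CountedPageList {
 public:
  // Appends a page and bumps the tracked count.
  void PushBack(Page page) {
    pages_.push_back(std::move(page));
    ++num_pages_;
  }

  // Removes the first page and decrements the tracked count.
  void PopFront() {
    assert(!pages_.empty());
    pages_.pop_front();
    --num_pages_;
  }

  // Drops all pages and resets the tracked count.
  void Clear() {
    pages_.clear();
    num_pages_ = 0;
  }

  // O(1) regardless of the standard library's list::size() complexity.
  int64_t num_pages() const { return num_pages_; }

  const std::list<Page>& pages() const { return pages_; }

 private:
  std::list<Page> pages_;
  int64_t num_pages_ = 0;  // Must be updated at every mutation of pages_.
};
```

The cost of this approach is that the counter must be kept in sync by hand, which is manageable only while the number of mutation sites stays small.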

Testing:
Ran buffered-tuple-stream-v2-test (the only coverage of
BufferedTupleStreamV2 currently).

Reran the query with the perf issue, confirmed that it was no longer
spending lots of time in BufferedTupleStreamV2::AdvanceWritePage().

Change-Id: Id83fcf68dcc3ea729df167885f999ff32b861e66
Reviewed-on: http://gerrit.cloudera.org:8080/7382
Reviewed-by: Dan Hecht <dh...@cloudera.com>
Tested-by: Impala Public Jenkins

> list::size() in BufferedTupleStreamV2::AdvanceWritePage() is expensive
> ----------------------------------------------------------------------
>
>                 Key: IMPALA-5629
>                 URL: https://issues.apache.org/jira/browse/IMPALA-5629
>             Project: IMPALA
>          Issue Type: Sub-task
>          Components: Backend
>    Affects Versions: Impala 2.10.0
>            Reporter: Tim Armstrong
>            Assignee: Tim Armstrong
>              Labels: perf
>             Fix For: Impala 2.10.0
>
>
> In a test run executing a very large join, I saw a lot of CPU being burnt in BufferedTupleStreamV2::AdvanceWritePage().
> It looks like all of that time is spent iterating over the pages_ linked list: list::size() is an O(n) operation in some implementations.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)