Posted to issues@impala.apache.org by "Tianyi Wang (JIRA)" <ji...@apache.org> on 2017/09/06 20:25:00 UTC

[jira] [Resolved] (IMPALA-5210) Nested types : Scans spend 30% of CPU in impala::RuntimeProfile::Counter::Add and 8% in apic_timer_interrupt

     [ https://issues.apache.org/jira/browse/IMPALA-5210?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tianyi Wang resolved IMPALA-5210.
---------------------------------
       Resolution: Fixed
    Fix Version/s: Impala 2.10.0


IMPALA-5210: Count rows and collection items in parquet scanner separately

This patch adds collection_items_read_counter to the scan node, makes
rows_read_counter count only top-level rows, and updates both counters
less frequently.
When scanning nested columns, the current code counts both top-level rows
and nested rows in rows_read_counter, which is inconsistent with
rows_returned_counter. Furthermore, rows_read_counter is updated eagerly
whenever a batch of collection items is read. As a result, the scanner
spends around 10% of its time updating the counter in the following
simple query:
>select count(*) from
> customer c,
> c.c_orders o,
> o.o_lineitems l
>where
> c_mktsegment = 'BUILDING'
> and o_orderdate < '1995-03-15'
> and l_shipdate > '1995-03-15' and o_orderkey = 10;
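A minimal C++ sketch of the eager update pattern described above, under a heavily simplified model. `Counter`, `Customer`, `Order`, and `ScanEager` are hypothetical names, not Impala classes, and the `fetch_add` inside `Counter::Add` merely stands in for `impala::RuntimeProfile::Counter::Add`. It illustrates both problems: nested items leak into `rows_read`, and the number of atomic adds grows with the number of collections read.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a profile counter. Each Add() is an atomic
// read-modify-write, the operation that dominated the profile in this issue.
struct Counter {
  std::atomic<int64_t> value{0};
  int64_t num_updates = 0;  // atomic adds issued so far (single-threaded bookkeeping)

  void Add(int64_t delta) {
    assert(delta >= 0);  // counters in this sketch only grow
    value.fetch_add(delta);
    ++num_updates;
  }
};

// Hypothetical, heavily simplified nested rows: customer -> orders -> line items.
struct Order { int num_lineitems; };
struct Customer { std::vector<Order> orders; };

// Eager pattern (simplified model of the pre-patch behavior): top-level rows
// are added once per row batch, but rows_read is also bumped every time a
// collection's items are read, so nested items are mixed into rows_read and
// the number of atomic adds grows with the number of collections.
void ScanEager(const std::vector<std::vector<Customer>>& row_batches,
               Counter& rows_read) {
  for (const std::vector<Customer>& batch : row_batches) {
    rows_read.Add(static_cast<int64_t>(batch.size()));
    for (const Customer& c : batch) {
      rows_read.Add(static_cast<int64_t>(c.orders.size()));  // c_orders items
      for (const Order& o : c.orders) {
        rows_read.Add(o.num_lineitems);                       // o_lineitems items
      }
    }
  }
}
```

For one row batch of two customers holding three orders and six line items in total, rows_read ends at 11 after six atomic adds, although only two top-level rows were read.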

This patch moves collection-item counting into
collection_items_read_counter. Both counters are now updated once per row
batch read. For the query above, scan time decreases by 10.4%.
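A minimal C++ sketch of the batched pattern described above, assuming the same kind of simplified model: `Counter`, `Customer`, `Order`, and `ScanBatched` are hypothetical names, not Impala's actual classes. Collection items are summed in a plain local variable while a row batch is assembled, and each counter then receives a single atomic add per batch.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a profile counter (one atomic add per update).
struct Counter {
  std::atomic<int64_t> value{0};
  int64_t num_updates = 0;  // atomic adds issued so far (single-threaded bookkeeping)

  void Add(int64_t delta) {
    assert(delta >= 0);  // counters in this sketch only grow
    value.fetch_add(delta);
    ++num_updates;
  }
};

// Hypothetical, heavily simplified nested rows: customer -> orders -> line items.
struct Order { int num_lineitems; };
struct Customer { std::vector<Order> orders; };

// Batched pattern: accumulate locally while assembling a row batch, then
// publish once per batch to two separate counters.
void ScanBatched(const std::vector<std::vector<Customer>>& row_batches,
                 Counter& rows_read, Counter& collection_items_read) {
  for (const std::vector<Customer>& batch : row_batches) {
    int64_t items = 0;  // plain local: no atomics on the per-item path
    for (const Customer& c : batch) {
      items += static_cast<int64_t>(c.orders.size());
      for (const Order& o : c.orders) items += o.num_lineitems;
    }
    rows_read.Add(static_cast<int64_t>(batch.size()));  // top-level rows only
    collection_items_read.Add(items);                   // nested items only
  }
}
```

For a row batch of two customers with three orders and six line items in total, this yields rows_read = 2 and collection_items_read = 9, with one atomic add per counter regardless of how many collections the batch contains.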

Change-Id: I7f6efddaea18507482940f5bdab7326b6482b067
Reviewed-on: http://gerrit.cloudera.org:8080/7776
Reviewed-by: Alex Behm <al...@cloudera.com>
Tested-by: Impala Public Jenkins



> Nested types : Scans spend 30% of CPU in impala::RuntimeProfile::Counter::Add and 8% in apic_timer_interrupt
> ------------------------------------------------------------------------------------------------------------
>
>                 Key: IMPALA-5210
>                 URL: https://issues.apache.org/jira/browse/IMPALA-5210
>             Project: IMPALA
>          Issue Type: Bug
>          Components: Backend
>    Affects Versions: Impala 2.7.0, Impala 2.8.0, Impala 2.9.0
>            Reporter: Mostafa Mokhtar
>            Assignee: Tianyi Wang
>              Labels: nested_types, newbie, perfomance
>             Fix For: Impala 2.10.0
>
>         Attachments: hotspots.csv, top-down.csv.zip, tpch-q3-nested-simplified.txt
>
>
> Methods that call RuntimeProfile::Counter::Add
> ||Callers||Effective Time by Utilization||
> |impala::RuntimeProfile::Counter::Add|100.00%|
> |  impala::HdfsParquetScanner::AssembleCollection|99.20%|
> |  impala::ScopedTimer<impala::MonotonicStopWatch>::UpdateCounter|0.40%|
> |  impala::NestedLoopJoinNode::FindBuildMatches|0.30%|
> |  [Unknown stack frame(s)]|0.10%|
> |  impala::BlockingJoinNode::GetFirstProbeRow|0.00%|
> |  impala::HdfsParquetScanner::AssembleRows|0.00%|
> Methods called by RuntimeProfile::Counter::Add
> ||Callees||Effective Time by Utilization||Effective Time by Utilization||
> |impala::RuntimeProfile::Counter::Add|100.00%|0.00%|
> |  impala::internal::AtomicInt<long>::Add|100.00%|0.00%|
> |    Barrier_AtomicIncrement|100.00%|62.80%|
> |      apic_timer_interrupt|29.30%|29.30%|
> |      retint_careful|7.40%|7.40%|
> |      common_interrupt|0.40%|0.10%|
> |      paranoid_schedule|0.00%|0.00%|
> |      __strncmp_sse42|0.00%|0.00%|
> |      [Outside any known module]|0.00%|0.00%|
> |      operator delete|0.00%|0.00%|
> |      impala::ExprContext::GetBooleanVal|0.00%|0.00%|
> Methods that call apic_timer_interrupt
> ||Callers||Effective Time by Utilization||Effective Time by Utilization||
> |apic_timer_interrupt|100.00%|99.70%|
> |  impala::HdfsParquetScanner::AssembleCollection|32.10%|32.10%|
> |  Barrier_AtomicIncrement|32.00%|31.90%|
> |  impala::CollectionValueBuilder::GetFreeMemory|3.20%|3.20%|
> |  impala::DictDecoder<impala::StringValue>::GetNextValue|3.00%|3.00%|
> |  GetValue<int>|2.10%|2.10%|
> |  impala::HdfsParquetScanner::ReadCollectionItem|1.90%|1.90%|
> Query
> {code}
> select count(*) from
>   customer c,
>   c.c_orders o,
>   o.o_lineitems l
> where
>   c_mktsegment = 'BUILDING'
>   and o_orderdate < '1995-03-15'
>   and l_shipdate > '1995-03-15' and o_orderkey = 10;
> {code}
> Table
> {code}
> +--------------+------------------------------------+---------+
> | name         | type                               | comment |
> +--------------+------------------------------------+---------+
> | c_custkey    | bigint                             |         |
> | c_name       | string                             |         |
> | c_address    | string                             |         |
> | c_nationkey  | bigint                             |         |
> | c_phone      | string                             |         |
> | c_acctbal    | decimal(12,2)                      |         |
> | c_mktsegment | string                             |         |
> | c_comment    | string                             |         |
> | c_orders     | array<struct<                      |         |
> |              |   o_orderkey:bigint,               |         |
> |              |   o_orderstatus:string,            |         |
> |              |   o_totalprice:decimal(12,2),      |         |
> |              |   o_orderdate:string,              |         |
> |              |   o_orderpriority:string,          |         |
> |              |   o_clerk:string,                  |         |
> |              |   o_shippriority:bigint,           |         |
> |              |   o_comment:string,                |         |
> |              |   o_lineitems:array<struct<        |         |
> |              |     l_partkey:bigint,              |         |
> |              |     l_suppkey:bigint,              |         |
> |              |     l_linenumber:bigint,           |         |
> |              |     l_quantity:decimal(12,2),      |         |
> |              |     l_extendedprice:decimal(12,2), |         |
> |              |     l_discount:decimal(12,2),      |         |
> |              |     l_tax:decimal(12,2),           |         |
> |              |     l_returnflag:string,           |         |
> |              |     l_linestatus:string,           |         |
> |              |     l_shipdate:string,             |         |
> |              |     l_commitdate:string,           |         |
> |              |     l_receiptdate:string,          |         |
> |              |     l_shipinstruct:string,         |         |
> |              |     l_shipmode:string,             |         |
> |              |     l_comment:string               |         |
> |              |   >>                               |         |
> |              | >>                                 |         |
> +--------------+------------------------------------+---------+
> {code}
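For readers unfamiliar with nested-type queries, a minimal C++ sketch of what the query above computes, assuming hypothetical in-memory structs that mirror a subset of the schema shown. Each correlated reference (`c.c_orders o`, `o.o_lineitems l`) pairs a parent row only with its own collection items, like a nested loop. This models the query's result only, not how Impala executes it; `LineItem`, `Order`, `Customer`, and `CountMatches` are illustrative names.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical structs mirroring a subset of the customer schema.
struct LineItem { std::string l_shipdate; };
struct Order {
  int64_t o_orderkey;
  std::string o_orderdate;
  std::vector<LineItem> o_lineitems;
};
struct Customer {
  std::string c_mktsegment;
  std::vector<Order> c_orders;
};

// count(*) over customer c, c.c_orders o, o.o_lineitems l with the query's
// predicates. Dates are 'YYYY-MM-DD' strings, so lexicographic comparison
// matches chronological order.
int64_t CountMatches(const std::vector<Customer>& customers) {
  int64_t count = 0;
  for (const Customer& c : customers) {
    if (c.c_mktsegment != "BUILDING") continue;
    for (const Order& o : c.c_orders) {
      if (!(o.o_orderdate < "1995-03-15") || o.o_orderkey != 10) continue;
      for (const LineItem& l : o.o_lineitems) {
        if (l.l_shipdate > "1995-03-15") ++count;
      }
    }
  }
  return count;
}
```

In this model the scanner reads one top-level row per customer, while every order and line item it materializes is a collection item, which is the distinction the patch draws between rows_read_counter and collection_items_read_counter.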



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)