Posted to issues@impala.apache.org by "Tianyi Wang (JIRA)" <ji...@apache.org> on 2017/09/06 20:25:00 UTC
[jira] [Resolved] (IMPALA-5210) Nested types : Scans spend 30% of
CPU in impala::RuntimeProfile::Counter::Add and 8% in apic_timer_interrupt
[ https://issues.apache.org/jira/browse/IMPALA-5210?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Tianyi Wang resolved IMPALA-5210.
---------------------------------
Resolution: Fixed
Fix Version/s: Impala 2.10.0
IMPALA-5210: Count rows and collection items in parquet scanner separately
This patch adds collection_items_read_counter to the scan node, makes
rows_read_counter count top-level rows only, and updates both counters
less frequently.
When scanning nested columns, the current code counts both top-level
rows and nested rows in rows_read_counter, which is inconsistent with
rows_returned_counter. Furthermore, rows_read_counter is updated eagerly
whenever a batch of collection items is read. As a result, around 10% of
the time is spent updating the counter in the following simple query:
>select count(*) from
> customer c,
> c.c_orders o,
> o.o_lineitems l
>where
> c_mktsegment = 'BUILDING'
> and o_orderdate < '1995-03-15'
> and l_shipdate > '1995-03-15' and o_orderkey = 10;
This patch moves the counting of collection items into
collection_items_read_counter. Both counters are now updated once per
row batch read. For the query above, scanning time decreases by 10.4%.
Change-Id: I7f6efddaea18507482940f5bdab7326b6482b067
Reviewed-on: http://gerrit.cloudera.org:8080/7776
Reviewed-by: Alex Behm <al...@cloudera.com>
Tested-by: Impala Public Jenkins
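The change described above can be illustrated with a minimal C++ sketch.
This is not the Impala code: the Counter class, ProcessBatchEager and
ProcessBatchDeferred are hypothetical names invented for this example,
and a row batch is simplified to a list of collection sizes, one per
top-level row. It only contrasts one atomic update per collection with
one update per row batch, and shows rows and collection items being
counted separately.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a profile counter backed by an atomic integer.
// num_updates() exists only so the example can show how often Add() is called.
class Counter {
 public:
  void Add(int64_t delta) {
    value_.fetch_add(delta, std::memory_order_relaxed);
    num_updates_.fetch_add(1, std::memory_order_relaxed);
  }
  int64_t value() const { return value_.load(std::memory_order_relaxed); }
  int64_t num_updates() const {
    return num_updates_.load(std::memory_order_relaxed);
  }

 private:
  std::atomic<int64_t> value_{0};
  std::atomic<int64_t> num_updates_{0};
};

// Eager style (assumed shape of the old behavior): nested items are added to
// the same counter as top-level rows, with one atomic update per collection.
void ProcessBatchEager(const std::vector<int64_t>& items_per_row,
                       Counter* rows_read) {
  for (int64_t n : items_per_row) {
    assert(n >= 0);
    rows_read->Add(n);  // one atomic update for every collection read
  }
  rows_read->Add(static_cast<int64_t>(items_per_row.size()));
}

// Deferred style (assumed shape of the new behavior): accumulate locally,
// then update each counter once per row batch.
void ProcessBatchDeferred(const std::vector<int64_t>& items_per_row,
                          Counter* rows_read,
                          Counter* collection_items_read) {
  int64_t local_items = 0;
  for (int64_t n : items_per_row) {
    assert(n >= 0);
    local_items += n;  // plain integer add, no atomic operation
  }
  rows_read->Add(static_cast<int64_t>(items_per_row.size()));
  collection_items_read->Add(local_items);
}
```

For a batch of three top-level rows holding 3, 0 and 5 collection items,
the eager version performs four atomic updates and reports 11 "rows",
while the deferred version performs two updates and reports 3 rows and
8 collection items.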
> Nested types : Scans spend 30% of CPU in impala::RuntimeProfile::Counter::Add and 8% in apic_timer_interrupt
> ------------------------------------------------------------------------------------------------------------
>
> Key: IMPALA-5210
> URL: https://issues.apache.org/jira/browse/IMPALA-5210
> Project: IMPALA
> Issue Type: Bug
> Components: Backend
> Affects Versions: Impala 2.7.0, Impala 2.8.0, Impala 2.9.0
> Reporter: Mostafa Mokhtar
> Assignee: Tianyi Wang
> Labels: nested_types, newbie, perfomance
> Fix For: Impala 2.10.0
>
> Attachments: hotspots.csv, top-down.csv.zip, tpch-q3-nested-simplified.txt
>
>
> Methods that call RuntimeProfile::Counter::Add
> |Callers||Effective Time by Utilization|
> |impala::RuntimeProfile::Counter::Add|100.00%|
> | impala::HdfsParquetScanner::AssembleCollection|99.20%|
> | impala::ScopedTimer<impala::MonotonicStopWatch>::UpdateCounter|0.40%|
> | impala::NestedLoopJoinNode::FindBuildMatches|0.30%|
> | [Unknown stack frame(s)]|0.10%|
> | impala::BlockingJoinNode::GetFirstProbeRow|0.00%|
> | impala::HdfsParquetScanner::AssembleRows|0.00%|
> Methods called by RuntimeProfile::Counter::Add
> |Callees||Effective Time by Utilization (Total)||Effective Time by Utilization (Self)|
> |impala::RuntimeProfile::Counter::Add|100.00%|0.00%|
> | impala::internal::AtomicInt<long>::Add|100.00%|0.00%|
> | Barrier_AtomicIncrement|100.00%|62.80%|
> | apic_timer_interrupt|29.30%|29.30%|
> | retint_careful|7.40%|7.40%|
> | common_interrupt|0.40%|0.10%|
> | paranoid_schedule|0.00%|0.00%|
> | __strncmp_sse42|0.00%|0.00%|
> | [Outside any known module]|0.00%|0.00%|
> | operator delete|0.00%|0.00%|
> | impala::ExprContext::GetBooleanVal|0.00%|0.00%|
> Methods that call apic_timer_interrupt
> |Callers||Effective Time by Utilization (Total)||Effective Time by Utilization (Self)|
> |apic_timer_interrupt|100.00%|99.70%|
> | impala::HdfsParquetScanner::AssembleCollection|32.10%|32.10%|
> | Barrier_AtomicIncrement|32.00%|31.90%|
> | impala::CollectionValueBuilder::GetFreeMemory|3.20%|3.20%|
> | impala::DictDecoder<impala::StringValue>::GetNextValue|3.00%|3.00%|
> | GetValue<int>|2.10%|2.10%|
> | impala::HdfsParquetScanner::ReadCollectionItem|1.90%|1.90%|
> Query
> {code}
> select count(*) from
> customer c,
> c.c_orders o,
> o.o_lineitems l
> where
> c_mktsegment = 'BUILDING'
> and o_orderdate < '1995-03-15'
> and l_shipdate > '1995-03-15' and o_orderkey = 10;
> {code}
> Table
> {code}
> +--------------+------------------------------------+---------+
> | name | type | comment |
> +--------------+------------------------------------+---------+
> | c_custkey | bigint | |
> | c_name | string | |
> | c_address | string | |
> | c_nationkey | bigint | |
> | c_phone | string | |
> | c_acctbal | decimal(12,2) | |
> | c_mktsegment | string | |
> | c_comment | string | |
> | c_orders | array<struct< | |
> | | o_orderkey:bigint, | |
> | | o_orderstatus:string, | |
> | | o_totalprice:decimal(12,2), | |
> | | o_orderdate:string, | |
> | | o_orderpriority:string, | |
> | | o_clerk:string, | |
> | | o_shippriority:bigint, | |
> | | o_comment:string, | |
> | | o_lineitems:array<struct< | |
> | | l_partkey:bigint, | |
> | | l_suppkey:bigint, | |
> | | l_linenumber:bigint, | |
> | | l_quantity:decimal(12,2), | |
> | | l_extendedprice:decimal(12,2), | |
> | | l_discount:decimal(12,2), | |
> | | l_tax:decimal(12,2), | |
> | | l_returnflag:string, | |
> | | l_linestatus:string, | |
> | | l_shipdate:string, | |
> | | l_commitdate:string, | |
> | | l_receiptdate:string, | |
> | | l_shipinstruct:string, | |
> | | l_shipmode:string, | |
> | | l_comment:string | |
> | | >> | |
> | | >> | |
> +--------------+------------------------------------+---------+
> {code}