Posted to issues@hive.apache.org by "Matt McCline (JIRA)" <ji...@apache.org> on 2018/08/30 05:00:02 UTC
[jira] [Resolved] (HIVE-9068) Hive : With CBO disabled Vectorization in Map join disabled causing 100% increase in elapsed time and CPU (possibly due to redundant filter operator)
[ https://issues.apache.org/jira/browse/HIVE-9068?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Matt McCline resolved HIVE-9068.
--------------------------------
Resolution: Incomplete
> Hive : With CBO disabled Vectorization in Map join disabled causing 100% increase in elapsed time and CPU (possibly due to redundant filter operator)
> -----------------------------------------------------------------------------------------------------------------------------------------------------
>
> Key: HIVE-9068
> URL: https://issues.apache.org/jira/browse/HIVE-9068
> Project: Hive
> Issue Type: Bug
> Components: Vectorization
> Affects Versions: 0.14.0
> Reporter: Mostafa Mokhtar
> Assignee: Matt McCline
> Priority: Major
> Fix For: 0.14.1
>
>
> With CBO off, the plan contains a redundant Filter Operator after the map joins:
> {code}
> Filter Operator
> predicate: ((null is null and (_col22 = _col51)) and (_col1 = _col26)) (type: boolean)
> {code}
> This may be why vectorization is disabled for Map 1 with CBO off; the operator does not appear in the plan with CBO on.
> Query
> {code}
> select
> count(*)
> from
> (SELECT
> 'store' as channel,
> 'ss_addr_sk' col_name,
> d_year,
> d_qoy,
> i_category,
> ss_ext_sales_price ext_sales_price
> FROM
> store_sales, item, date_dim
> WHERE
> ss_addr_sk IS NULL
> AND store_sales.ss_sold_date_sk = date_dim.d_date_sk
> AND store_sales.ss_item_sk = item.i_item_sk) a;
> {code}
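> The two plans below can presumably be reproduced by toggling CBO in the session before running EXPLAIN on the query above. This is only a minimal sketch and not necessarily the session used for this report: hive.execution.engine, hive.vectorized.execution.enabled and hive.cbo.enable are standard Hive properties, but the values actually in effect are not stated in this issue and are assumed here.
> {code}
> -- Assumed session settings; the report does not list them.
> set hive.execution.engine=tez;
> set hive.vectorized.execution.enabled=true;
> -- CBO off: per the report, Map 1 lacks "Execution mode: vectorized".
> set hive.cbo.enable=false;
> explain
> select count(*)
> from (SELECT 'store' as channel, 'ss_addr_sk' col_name, d_year, d_qoy,
>              i_category, ss_ext_sales_price ext_sales_price
>       FROM store_sales, item, date_dim
>       WHERE ss_addr_sk IS NULL
>         AND store_sales.ss_sold_date_sk = date_dim.d_date_sk
>         AND store_sales.ss_item_sk = item.i_item_sk) a;
> -- CBO on: rerun the same EXPLAIN after flipping the flag.
> set hive.cbo.enable=true;
> {code}
> Comparing the "Execution mode" line under each vertex in the two outputs shows which map tasks were vectorized.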
> Explain with CBO off
> {code}
> STAGE DEPENDENCIES:
> Stage-1 is a root stage
> Stage-0 depends on stages: Stage-1
> STAGE PLANS:
> Stage: Stage-1
> Tez
> Edges:
> Map 1 <- Map 3 (BROADCAST_EDGE), Map 4 (BROADCAST_EDGE)
> Reducer 2 <- Map 1 (SIMPLE_EDGE)
> DagName: mmokhtar_20141210171212_02c36f60-ceea-4e18-a266-5baecfd023f2:6
> Vertices:
> Map 1
> Map Operator Tree:
> TableScan
> alias: store_sales
> filterExpr: (ss_item_sk is not null and ss_addr_sk is null) (type: boolean)
> Statistics: Num rows: 82510879939 Data size: 6873789738208 Basic stats: COMPLETE Column stats: COMPLETE
> Filter Operator
> predicate: (ss_item_sk is not null and ss_addr_sk is null) (type: boolean)
> Statistics: Num rows: 1946839900 Data size: 23178336456 Basic stats: COMPLETE Column stats: COMPLETE
> Map Join Operator
> condition map:
> Inner Join 0 to 1
> condition expressions:
> 0 {ss_item_sk} {ss_sold_date_sk}
> 1 {i_item_sk}
> keys:
> 0 ss_item_sk (type: int)
> 1 i_item_sk (type: int)
> outputColumnNames: _col1, _col22, _col26
> input vertices:
> 1 Map 4
> Statistics: Num rows: 1946839936 Data size: 23362079232 Basic stats: COMPLETE Column stats: COMPLETE
> Map Join Operator
> condition map:
> Inner Join 0 to 1
> condition expressions:
> 0 {_col1} {_col22} {_col26}
> 1 {d_date_sk}
> keys:
> 0 _col22 (type: int)
> 1 d_date_sk (type: int)
> outputColumnNames: _col1, _col22, _col26, _col51
> input vertices:
> 1 Map 3
> Statistics: Num rows: 2176800197 Data size: 34828803152 Basic stats: COMPLETE Column stats: COMPLETE
> Filter Operator
> predicate: ((null is null and (_col22 = _col51)) and (_col1 = _col26)) (type: boolean)
> Statistics: Num rows: 272100024 Data size: 4353600384 Basic stats: COMPLETE Column stats: COMPLETE
> Select Operator
> Statistics: Num rows: 272100024 Data size: 4353600384 Basic stats: COMPLETE Column stats: COMPLETE
> Group By Operator
> aggregations: count()
> mode: hash
> outputColumnNames: _col0
> Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: COMPLETE
> Reduce Output Operator
> sort order:
> Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: COMPLETE
> value expressions: _col0 (type: bigint)
> Map 3
> Map Operator Tree:
> TableScan
> alias: date_dim
> filterExpr: d_date_sk is not null (type: boolean)
> Statistics: Num rows: 73049 Data size: 81741831 Basic stats: COMPLETE Column stats: COMPLETE
> Filter Operator
> predicate: d_date_sk is not null (type: boolean)
> Statistics: Num rows: 73049 Data size: 292196 Basic stats: COMPLETE Column stats: COMPLETE
> Reduce Output Operator
> key expressions: d_date_sk (type: int)
> sort order: +
> Map-reduce partition columns: d_date_sk (type: int)
> Statistics: Num rows: 73049 Data size: 292196 Basic stats: COMPLETE Column stats: COMPLETE
> Select Operator
> expressions: d_date_sk (type: int)
> outputColumnNames: _col0
> Statistics: Num rows: 73049 Data size: 292196 Basic stats: COMPLETE Column stats: COMPLETE
> Group By Operator
> keys: _col0 (type: int)
> mode: hash
> outputColumnNames: _col0
> Statistics: Num rows: 36524 Data size: 146096 Basic stats: COMPLETE Column stats: COMPLETE
> Dynamic Partitioning Event Operator
> Target Input: store_sales
> Partition key expr: ss_sold_date_sk
> Statistics: Num rows: 36524 Data size: 146096 Basic stats: COMPLETE Column stats: COMPLETE
> Target column: ss_sold_date_sk
> Target Vertex: Map 1
> Execution mode: vectorized
> Map 4
> Map Operator Tree:
> TableScan
> alias: item
> filterExpr: i_item_sk is not null (type: boolean)
> Statistics: Num rows: 462000 Data size: 663862160 Basic stats: COMPLETE Column stats: COMPLETE
> Filter Operator
> predicate: i_item_sk is not null (type: boolean)
> Statistics: Num rows: 462000 Data size: 1848000 Basic stats: COMPLETE Column stats: COMPLETE
> Reduce Output Operator
> key expressions: i_item_sk (type: int)
> sort order: +
> Map-reduce partition columns: i_item_sk (type: int)
> Statistics: Num rows: 462000 Data size: 1848000 Basic stats: COMPLETE Column stats: COMPLETE
> Execution mode: vectorized
> Reducer 2
> Reduce Operator Tree:
> Group By Operator
> aggregations: count(VALUE._col0)
> mode: mergepartial
> outputColumnNames: _col0
> Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: COMPLETE
> Select Operator
> expressions: _col0 (type: bigint)
> outputColumnNames: _col0
> Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: COMPLETE
> File Output Operator
> compressed: false
> Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: COMPLETE
> table:
> input format: org.apache.hadoop.mapred.TextInputFormat
> output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
> serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
> Stage: Stage-0
> Fetch Operator
> limit: -1
> Processor Tree:
> ListSink
> {code}
> Explain with CBO on
> {code}
> STAGE PLANS:
> Stage: Stage-1
> Tez
> Edges:
> Map 1 <- Map 3 (BROADCAST_EDGE), Map 4 (BROADCAST_EDGE)
> Reducer 2 <- Map 1 (SIMPLE_EDGE)
> DagName: mmokhtar_20141210171212_495d0eb9-d176-43d3-8101-84821a0c0fdf:5
> Vertices:
> Map 1
> Map Operator Tree:
> TableScan
> alias: store_sales
> filterExpr: (ss_addr_sk is null and ss_item_sk is not null) (type: boolean)
> Statistics: Num rows: 82510879939 Data size: 6873789738208 Basic stats: COMPLETE Column stats: COMPLETE
> Filter Operator
> predicate: (ss_addr_sk is null and ss_item_sk is not null) (type: boolean)
> Statistics: Num rows: 1946839900 Data size: 23178336456 Basic stats: COMPLETE Column stats: COMPLETE
> Select Operator
> expressions: ss_item_sk (type: int), ss_sold_date_sk (type: int)
> outputColumnNames: _col0, _col2
> Statistics: Num rows: 1946839900 Data size: 15574719200 Basic stats: COMPLETE Column stats: COMPLETE
> Map Join Operator
> condition map:
> Inner Join 0 to 1
> condition expressions:
> 0
> 1 {_col2}
> keys:
> 0 _col0 (type: int)
> 1 _col0 (type: int)
> outputColumnNames: _col3
> input vertices:
> 0 Map 4
> Statistics: Num rows: 1946839936 Data size: 7787359744 Basic stats: COMPLETE Column stats: COMPLETE
> Select Operator
> expressions: _col3 (type: int)
> outputColumnNames: _col3
> Statistics: Num rows: 1946839936 Data size: 7787359744 Basic stats: COMPLETE Column stats: COMPLETE
> Map Join Operator
> condition map:
> Inner Join 0 to 1
> condition expressions:
> 0
> 1
> keys:
> 0 _col0 (type: int)
> 1 _col3 (type: int)
> input vertices:
> 0 Map 3
> Statistics: Num rows: 3232152511019 Data size: 0 Basic stats: PARTIAL Column stats: COMPLETE
> Select Operator
> Statistics: Num rows: 3232152511019 Data size: 0 Basic stats: PARTIAL Column stats: COMPLETE
> Group By Operator
> aggregations: count()
> mode: hash
> outputColumnNames: _col0
> Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: COMPLETE
> Reduce Output Operator
> sort order:
> Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: COMPLETE
> value expressions: _col0 (type: bigint)
> Execution mode: vectorized
> Map 3
> Map Operator Tree:
> TableScan
> alias: date_dim
> filterExpr: d_date_sk is not null (type: boolean)
> Statistics: Num rows: 73049 Data size: 81741831 Basic stats: COMPLETE Column stats: COMPLETE
> Filter Operator
> predicate: d_date_sk is not null (type: boolean)
> Statistics: Num rows: 73049 Data size: 292196 Basic stats: COMPLETE Column stats: COMPLETE
> Select Operator
> expressions: d_date_sk (type: int)
> outputColumnNames: _col0
> Statistics: Num rows: 73049 Data size: 292196 Basic stats: COMPLETE Column stats: COMPLETE
> Reduce Output Operator
> key expressions: _col0 (type: int)
> sort order: +
> Map-reduce partition columns: _col0 (type: int)
> Statistics: Num rows: 73049 Data size: 292196 Basic stats: COMPLETE Column stats: COMPLETE
> Select Operator
> expressions: _col0 (type: int)
> outputColumnNames: _col0
> Statistics: Num rows: 73049 Data size: 292196 Basic stats: COMPLETE Column stats: COMPLETE
> Group By Operator
> keys: _col0 (type: int)
> mode: hash
> outputColumnNames: _col0
> Statistics: Num rows: 36524 Data size: 146096 Basic stats: COMPLETE Column stats: COMPLETE
> Dynamic Partitioning Event Operator
> Target Input: store_sales
> Partition key expr: ss_sold_date_sk
> Statistics: Num rows: 36524 Data size: 146096 Basic stats: COMPLETE Column stats: COMPLETE
> Target column: ss_sold_date_sk
> Target Vertex: Map 1
> Execution mode: vectorized
> Map 4
> Map Operator Tree:
> TableScan
> alias: item
> filterExpr: i_item_sk is not null (type: boolean)
> Statistics: Num rows: 462000 Data size: 663862160 Basic stats: COMPLETE Column stats: COMPLETE
> Filter Operator
> predicate: i_item_sk is not null (type: boolean)
> Statistics: Num rows: 462000 Data size: 1848000 Basic stats: COMPLETE Column stats: COMPLETE
> Select Operator
> expressions: i_item_sk (type: int)
> outputColumnNames: _col0
> Statistics: Num rows: 462000 Data size: 1848000 Basic stats: COMPLETE Column stats: COMPLETE
> Reduce Output Operator
> key expressions: _col0 (type: int)
> sort order: +
> Map-reduce partition columns: _col0 (type: int)
> Statistics: Num rows: 462000 Data size: 1848000 Basic stats: COMPLETE Column stats: COMPLETE
> Execution mode: vectorized
> Reducer 2
> Reduce Operator Tree:
> Group By Operator
> aggregations: count(VALUE._col0)
> mode: mergepartial
> outputColumnNames: _col0
> Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: COMPLETE
> Select Operator
> expressions: _col0 (type: bigint)
> outputColumnNames: _col0
> Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: COMPLETE
> File Output Operator
> compressed: false
> Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: COMPLETE
> table:
> input format: org.apache.hadoop.mapred.TextInputFormat
> output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
> serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
> Stage: Stage-0
> Fetch Operator
> limit: -1
> Processor Tree:
> ListSink
> Time taken: 3.874 seconds, Fetched: 144 row(s)
> {code}