You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@hive.apache.org by "Rajesh Balamohan (Jira)" <ji...@apache.org> on 2022/01/31 04:00:00 UTC
[jira] [Created] (HIVE-25913) Dynamic Partition Pruning Operator: Not working in iceberg tables
Rajesh Balamohan created HIVE-25913:
---------------------------------------
Summary: Dynamic Partition Pruning Operator: Not working in iceberg tables
Key: HIVE-25913
URL: https://issues.apache.org/jira/browse/HIVE-25913
Project: Hive
Issue Type: Improvement
Reporter: Rajesh Balamohan
Notice "Dynamic Partitioning Event Operator" missing in Map 3 in iceberg tables.
This causes heavy IO in iceberg tables leading to perf degradation.
{noformat}
ACID table
==========
explain select count(*) from store_sales, date_dim where d_month_seq between 1212 and 1212+11 and ss_store_sk is not null and ss_sold_date_sk=d_date_sk;
Explain
STAGE DEPENDENCIES:
Stage-1 is a root stage
Stage-0 depends on stages: Stage-1STAGE PLANS:
Stage: Stage-1
Tez
DagId: hive_20220131032425_be2fab7f-7943-4aa1-bbdd-289139ea0f90:17
Edges:
Map 1 <- Map 3 (BROADCAST_EDGE)
Reducer 2 <- Map 1 (CUSTOM_SIMPLE_EDGE)
DagName: hive_20220131032425_be2fab7f-7943-4aa1-bbdd-289139ea0f90:17
Vertices:
Map 1
Map Operator Tree:
TableScan
alias: store_sales
filterExpr: ss_store_sk is not null (type: boolean)
Statistics: Num rows: 27503885621 Data size: 434880571744 Basic stats: COMPLETE Column stats: COMPLETE
Filter Operator
predicate: ss_store_sk is not null (type: boolean)
Statistics: Num rows: 26856185846 Data size: 424639398832 Basic stats: COMPLETE Column stats: COMPLETE
Select Operator
expressions: ss_sold_date_sk (type: bigint)
outputColumnNames: _col0
Statistics: Num rows: 26856185846 Data size: 214849486768 Basic stats: COMPLETE Column stats: COMPLETE
Map Join Operator
condition map:
Inner Join 0 to 1
keys:
0 _col0 (type: bigint)
1 _col0 (type: bigint)
input vertices:
1 Map 3
Statistics: Num rows: 5279977323 Data size: 42239818584 Basic stats: COMPLETE Column stats: COMPLETE
Group By Operator
aggregations: count()
minReductionHashAggr: 0.99
mode: hash
outputColumnNames: _col0
Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: COMPLETE
Reduce Output Operator
null sort order:
sort order:
Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: COMPLETE
value expressions: _col0 (type: bigint)
Execution mode: vectorized, llap
LLAP IO: may be used (ACID table)
Map 3
Map Operator Tree:
TableScan
alias: date_dim
filterExpr: (d_month_seq BETWEEN 1212 AND 1223 and d_date_sk is not null) (type: boolean)
Statistics: Num rows: 73049 Data size: 876588 Basic stats: COMPLETE Column stats: COMPLETE
Filter Operator
predicate: (d_month_seq BETWEEN 1212 AND 1223 and d_date_sk is not null) (type: boolean)
Statistics: Num rows: 359 Data size: 4308 Basic stats: COMPLETE Column stats: COMPLETE
Select Operator
expressions: d_date_sk (type: bigint)
outputColumnNames: _col0
Statistics: Num rows: 359 Data size: 2872 Basic stats: COMPLETE Column stats: COMPLETE
Reduce Output Operator
key expressions: _col0 (type: bigint)
null sort order: a
sort order: +
Map-reduce partition columns: _col0 (type: bigint)
Statistics: Num rows: 359 Data size: 2872 Basic stats: COMPLETE Column stats: COMPLETE
Select Operator
expressions: _col0 (type: bigint)
outputColumnNames: _col0
Statistics: Num rows: 359 Data size: 2872 Basic stats: COMPLETE Column stats: COMPLETE
Group By Operator
keys: _col0 (type: bigint)
minReductionHashAggr: 0.5013927
mode: hash
outputColumnNames: _col0
Statistics: Num rows: 179 Data size: 1432 Basic stats: COMPLETE Column stats: COMPLETE
Dynamic Partitioning Event Operator
Target column: ss_sold_date_sk (bigint)
Target Input: store_sales
Partition key expr: ss_sold_date_sk
Statistics: Num rows: 179 Data size: 1432 Basic stats: COMPLETE Column stats: COMPLETE
Target Vertex: Map 1
Execution mode: vectorized, llap
LLAP IO: may be used (ACID table)
Reducer 2
Execution mode: vectorized, llap
Reduce Operator Tree:
Group By Operator
aggregations: count(VALUE._col0)
mode: mergepartial
outputColumnNames: _col0
Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: COMPLETE
File Output Operator
compressed: false
Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: COMPLETE
table:
input format: org.apache.hadoop.mapred.SequenceFileInputFormat
output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe Stage: Stage-0
Iceberg table:
==============
Explain
STAGE DEPENDENCIES:
Stage-1 is a root stage
Stage-0 depends on stages: Stage-1STAGE PLANS:
Stage: Stage-1
Tez
DagId: hive_20220131032448_d1948a30-39f4-48e5-8fb3-9b9a7220824f:18
Edges:
Map 1 <- Map 3 (BROADCAST_EDGE)
Reducer 2 <- Map 1 (CUSTOM_SIMPLE_EDGE)
DagName: hive_20220131032448_d1948a30-39f4-48e5-8fb3-9b9a7220824f:18
Vertices:
Map 1
Map Operator Tree:
TableScan
alias: store_sales
filterExpr: (ss_store_sk is not null and ss_sold_date_sk is not null) (type: boolean)
Statistics: Num rows: 28800426268 Data size: 437766480832 Basic stats: COMPLETE Column stats: NONE
Filter Operator
predicate: (ss_store_sk is not null and ss_sold_date_sk is not null) (type: boolean)
Statistics: Num rows: 25920383834 Data size: 393989835679 Basic stats: COMPLETE Column stats: NONE
Select Operator
expressions: ss_sold_date_sk (type: bigint)
outputColumnNames: _col0
Statistics: Num rows: 25920383834 Data size: 393989835679 Basic stats: COMPLETE Column stats: NONE
Map Join Operator
condition map:
Inner Join 0 to 1
keys:
0 _col0 (type: bigint)
1 _col0 (type: bigint)
input vertices:
1 Map 3
Statistics: Num rows: 28512422835 Data size: 433388828640 Basic stats: COMPLETE Column stats: NONE
Group By Operator
aggregations: count()
minReductionHashAggr: 0.99
mode: hash
outputColumnNames: _col0
Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: NONE
Reduce Output Operator
null sort order:
sort order:
Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: NONE
value expressions: _col0 (type: bigint)
Execution mode: vectorized, llap
LLAP IO: all inputs (cache only)
Map 3
Map Operator Tree:
TableScan
alias: date_dim
filterExpr: (d_month_seq BETWEEN 1212 AND 1223 and d_date_sk is not null) (type: boolean)
Statistics: Num rows: 73049 Data size: 832776 Basic stats: COMPLETE Column stats: NONE
Filter Operator
predicate: (d_month_seq BETWEEN 1212 AND 1223 and d_date_sk is not null) (type: boolean)
Statistics: Num rows: 1 Data size: 11 Basic stats: COMPLETE Column stats: NONE
Select Operator
expressions: d_date_sk (type: bigint)
outputColumnNames: _col0
Statistics: Num rows: 1 Data size: 11 Basic stats: COMPLETE Column stats: NONE
Reduce Output Operator
key expressions: _col0 (type: bigint)
null sort order: a
sort order: +
Map-reduce partition columns: _col0 (type: bigint)
Statistics: Num rows: 1 Data size: 11 Basic stats: COMPLETE Column stats: NONE
Execution mode: vectorized, llap
LLAP IO: all inputs (cache only)
Reducer 2
Execution mode: vectorized, llap
Reduce Operator Tree:
Group By Operator
aggregations: count(VALUE._col0)
mode: mergepartial
outputColumnNames: _col0
Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: NONE
File Output Operator
compressed: false
Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: NONE
table:
input format: org.apache.hadoop.mapred.SequenceFileInputFormat
output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe Stage: Stage-0{noformat}
--
This message was sent by Atlassian Jira
(v8.20.1#820001)