You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@hive.apache.org by "Rajesh Balamohan (Jira)" <ji...@apache.org> on 2022/02/02 03:37:00 UTC
[jira] [Resolved] (HIVE-25913) Dynamic Partition Pruning Operator: Not working in iceberg tables
[ https://issues.apache.org/jira/browse/HIVE-25913?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Rajesh Balamohan resolved HIVE-25913.
-------------------------------------
Resolution: Not A Bug
> Dynamic Partition Pruning Operator: Not working in iceberg tables
> -----------------------------------------------------------------
>
> Key: HIVE-25913
> URL: https://issues.apache.org/jira/browse/HIVE-25913
> Project: Hive
> Issue Type: Improvement
> Reporter: Rajesh Balamohan
> Priority: Major
>
> Notice "Dynamic Partitioning Event Operator" missing in Map 3 in iceberg tables. Partition keys are added as embedded fields in iceberg tables. This causes issues in DynamicPartitionPruningOptimisation codepath. E.g [https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/optimizer/DynamicPartitionPruningOptimization.java#L182]
> This causes heavy IO in iceberg tables leading to perf degradation.
> {noformat}
> ACID table
> ==========
> explain select count(*) from store_sales, date_dim where d_month_seq between 1212 and 1212+11 and ss_store_sk is not null and ss_sold_date_sk=d_date_sk;
> Explain
> STAGE DEPENDENCIES:
> Stage-1 is a root stage
> Stage-0 depends on stages: Stage-1STAGE PLANS:
> Stage: Stage-1
> Tez
> DagId: hive_20220131032425_be2fab7f-7943-4aa1-bbdd-289139ea0f90:17
> Edges:
> Map 1 <- Map 3 (BROADCAST_EDGE)
> Reducer 2 <- Map 1 (CUSTOM_SIMPLE_EDGE)
> DagName: hive_20220131032425_be2fab7f-7943-4aa1-bbdd-289139ea0f90:17
> Vertices:
> Map 1
> Map Operator Tree:
> TableScan
> alias: store_sales
> filterExpr: ss_store_sk is not null (type: boolean)
> Statistics: Num rows: 27503885621 Data size: 434880571744 Basic stats: COMPLETE Column stats: COMPLETE
> Filter Operator
> predicate: ss_store_sk is not null (type: boolean)
> Statistics: Num rows: 26856185846 Data size: 424639398832 Basic stats: COMPLETE Column stats: COMPLETE
> Select Operator
> expressions: ss_sold_date_sk (type: bigint)
> outputColumnNames: _col0
> Statistics: Num rows: 26856185846 Data size: 214849486768 Basic stats: COMPLETE Column stats: COMPLETE
> Map Join Operator
> condition map:
> Inner Join 0 to 1
> keys:
> 0 _col0 (type: bigint)
> 1 _col0 (type: bigint)
> input vertices:
> 1 Map 3
> Statistics: Num rows: 5279977323 Data size: 42239818584 Basic stats: COMPLETE Column stats: COMPLETE
> Group By Operator
> aggregations: count()
> minReductionHashAggr: 0.99
> mode: hash
> outputColumnNames: _col0
> Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: COMPLETE
> Reduce Output Operator
> null sort order:
> sort order:
> Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: COMPLETE
> value expressions: _col0 (type: bigint)
> Execution mode: vectorized, llap
> LLAP IO: may be used (ACID table)
> Map 3
> Map Operator Tree:
> TableScan
> alias: date_dim
> filterExpr: (d_month_seq BETWEEN 1212 AND 1223 and d_date_sk is not null) (type: boolean)
> Statistics: Num rows: 73049 Data size: 876588 Basic stats: COMPLETE Column stats: COMPLETE
> Filter Operator
> predicate: (d_month_seq BETWEEN 1212 AND 1223 and d_date_sk is not null) (type: boolean)
> Statistics: Num rows: 359 Data size: 4308 Basic stats: COMPLETE Column stats: COMPLETE
> Select Operator
> expressions: d_date_sk (type: bigint)
> outputColumnNames: _col0
> Statistics: Num rows: 359 Data size: 2872 Basic stats: COMPLETE Column stats: COMPLETE
> Reduce Output Operator
> key expressions: _col0 (type: bigint)
> null sort order: a
> sort order: +
> Map-reduce partition columns: _col0 (type: bigint)
> Statistics: Num rows: 359 Data size: 2872 Basic stats: COMPLETE Column stats: COMPLETE
> Select Operator
> expressions: _col0 (type: bigint)
> outputColumnNames: _col0
> Statistics: Num rows: 359 Data size: 2872 Basic stats: COMPLETE Column stats: COMPLETE
> Group By Operator
> keys: _col0 (type: bigint)
> minReductionHashAggr: 0.5013927
> mode: hash
> outputColumnNames: _col0
> Statistics: Num rows: 179 Data size: 1432 Basic stats: COMPLETE Column stats: COMPLETE
> Dynamic Partitioning Event Operator
> Target column: ss_sold_date_sk (bigint)
> Target Input: store_sales
> Partition key expr: ss_sold_date_sk
> Statistics: Num rows: 179 Data size: 1432 Basic stats: COMPLETE Column stats: COMPLETE
> Target Vertex: Map 1
> Execution mode: vectorized, llap
> LLAP IO: may be used (ACID table)
> Reducer 2
> Execution mode: vectorized, llap
> Reduce Operator Tree:
> Group By Operator
> aggregations: count(VALUE._col0)
> mode: mergepartial
> outputColumnNames: _col0
> Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: COMPLETE
> File Output Operator
> compressed: false
> Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: COMPLETE
> table:
> input format: org.apache.hadoop.mapred.SequenceFileInputFormat
> output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
> serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe Stage: Stage-0
>
> Iceberg table:
> ==============
> Explain
> STAGE DEPENDENCIES:
> Stage-1 is a root stage
> Stage-0 depends on stages: Stage-1STAGE PLANS:
> Stage: Stage-1
> Tez
> DagId: hive_20220131032448_d1948a30-39f4-48e5-8fb3-9b9a7220824f:18
> Edges:
> Map 1 <- Map 3 (BROADCAST_EDGE)
> Reducer 2 <- Map 1 (CUSTOM_SIMPLE_EDGE)
> DagName: hive_20220131032448_d1948a30-39f4-48e5-8fb3-9b9a7220824f:18
> Vertices:
> Map 1
> Map Operator Tree:
> TableScan
> alias: store_sales
> filterExpr: (ss_store_sk is not null and ss_sold_date_sk is not null) (type: boolean)
> Statistics: Num rows: 28800426268 Data size: 437766480832 Basic stats: COMPLETE Column stats: NONE
> Filter Operator
> predicate: (ss_store_sk is not null and ss_sold_date_sk is not null) (type: boolean)
> Statistics: Num rows: 25920383834 Data size: 393989835679 Basic stats: COMPLETE Column stats: NONE
> Select Operator
> expressions: ss_sold_date_sk (type: bigint)
> outputColumnNames: _col0
> Statistics: Num rows: 25920383834 Data size: 393989835679 Basic stats: COMPLETE Column stats: NONE
> Map Join Operator
> condition map:
> Inner Join 0 to 1
> keys:
> 0 _col0 (type: bigint)
> 1 _col0 (type: bigint)
> input vertices:
> 1 Map 3
> Statistics: Num rows: 28512422835 Data size: 433388828640 Basic stats: COMPLETE Column stats: NONE
> Group By Operator
> aggregations: count()
> minReductionHashAggr: 0.99
> mode: hash
> outputColumnNames: _col0
> Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: NONE
> Reduce Output Operator
> null sort order:
> sort order:
> Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: NONE
> value expressions: _col0 (type: bigint)
> Execution mode: vectorized, llap
> LLAP IO: all inputs (cache only)
> Map 3
> Map Operator Tree:
> TableScan
> alias: date_dim
> filterExpr: (d_month_seq BETWEEN 1212 AND 1223 and d_date_sk is not null) (type: boolean)
> Statistics: Num rows: 73049 Data size: 832776 Basic stats: COMPLETE Column stats: NONE
> Filter Operator
> predicate: (d_month_seq BETWEEN 1212 AND 1223 and d_date_sk is not null) (type: boolean)
> Statistics: Num rows: 1 Data size: 11 Basic stats: COMPLETE Column stats: NONE
> Select Operator
> expressions: d_date_sk (type: bigint)
> outputColumnNames: _col0
> Statistics: Num rows: 1 Data size: 11 Basic stats: COMPLETE Column stats: NONE
> Reduce Output Operator
> key expressions: _col0 (type: bigint)
> null sort order: a
> sort order: +
> Map-reduce partition columns: _col0 (type: bigint)
> Statistics: Num rows: 1 Data size: 11 Basic stats: COMPLETE Column stats: NONE
> Execution mode: vectorized, llap
> LLAP IO: all inputs (cache only)
> Reducer 2
> Execution mode: vectorized, llap
> Reduce Operator Tree:
> Group By Operator
> aggregations: count(VALUE._col0)
> mode: mergepartial
> outputColumnNames: _col0
> Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: NONE
> File Output Operator
> compressed: false
> Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: NONE
> table:
> input format: org.apache.hadoop.mapred.SequenceFileInputFormat
> output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
> serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe Stage: Stage-0{noformat}
--
This message was sent by Atlassian Jira
(v8.20.1#820001)