You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@hive.apache.org by "liyunzhang_intel (JIRA)" <ji...@apache.org> on 2017/07/25 03:16:00 UTC
[jira] [Commented] (HIVE-16948) Invalid explain when running dynamic partition pruning query in HOS

    [ https://issues.apache.org/jira/browse/HIVE-16948?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16099424#comment-16099424 ] 

liyunzhang_intel commented on HIVE-16948:
-----------------------------------------

the reason why Map4 does not exist in the explain is because of [CombineEquivalentWorkResolver|https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/optimizer/spark/CombineEquivalentWorkResolver.java]

before CombineEquivalentWorkResolver optimization is enabled,Map4 exists in the explain, after CombineEquivalentWorkResolver which will find and combine equivalent works.  Map4 is deleted because Map4 equals Map1. So we need to remove the spark dynamic pruning sink branch in Reducer 11 and Reducer 13 in Stage-2.

[~lirui], [~stakiar], [~csun] please help review, thanks!


> Invalid explain when running dynamic partition pruning query in HOS
> -------------------------------------------------------------------
>
>                 Key: HIVE-16948
>                 URL: https://issues.apache.org/jira/browse/HIVE-16948
>             Project: Hive
>          Issue Type: Bug
>            Reporter: liyunzhang_intel
>            Assignee: liyunzhang_intel
>
>  union_subquery.q 
> {code}
> set hive.optimize.ppd=true;
> set hive.ppd.remove.duplicatefilters=true;
> set hive.spark.dynamic.partition.pruning=true;
> set hive.optimize.metadataonly=false;
> set hive.optimize.index.filter=true;
> set hive.strict.checks.cartesian.product=false;
> explain select ds from (select distinct(ds) as ds from srcpart union all select distinct(ds) as ds from srcpart) s where s.ds in (select max(srcpart.ds) from srcpart union all select min(srcpart.ds) from srcpart);
> {code}
> explain 
> {code}
> STAGE DEPENDENCIES:
>   Stage-2 is a root stage
>   Stage-1 depends on stages: Stage-2
>   Stage-0 depends on stages: Stage-1
> STAGE PLANS:
>   Stage: Stage-2
>     Spark
>       Edges:
>         Reducer 11 <- Map 10 (GROUP, 1)
>         Reducer 13 <- Map 12 (GROUP, 1)
>       DagName: root_20170622231525_20a777e5-e659-4138-b605-65f8395e18e2:2
>       Vertices:
>         Map 10 
>             Map Operator Tree:
>                 TableScan
>                   alias: srcpart
>                   Statistics: Num rows: 1 Data size: 23248 Basic stats: PARTIAL Column stats: NONE
>                   Select Operator
>                     expressions: ds (type: string)
>                     outputColumnNames: ds
>                     Statistics: Num rows: 1 Data size: 23248 Basic stats: PARTIAL Column stats: NONE
>                     Group By Operator
>                       aggregations: max(ds)
>                       mode: hash
>                       outputColumnNames: _col0
>                       Statistics: Num rows: 1 Data size: 184 Basic stats: COMPLETE Column stats: NONE
>                       Reduce Output Operator
>                         sort order: 
>                         Statistics: Num rows: 1 Data size: 184 Basic stats: COMPLETE Column stats: NONE
>                         value expressions: _col0 (type: string)
>         Map 12 
>             Map Operator Tree:
>                 TableScan
>                   alias: srcpart
>                   Statistics: Num rows: 1 Data size: 23248 Basic stats: PARTIAL Column stats: NONE
>                   Select Operator
>                     expressions: ds (type: string)
>                     outputColumnNames: ds
>                     Statistics: Num rows: 1 Data size: 23248 Basic stats: PARTIAL Column stats: NONE
>                     Group By Operator
>                       aggregations: min(ds)
>                       mode: hash
>                       outputColumnNames: _col0
>                       Statistics: Num rows: 1 Data size: 184 Basic stats: COMPLETE Column stats: NONE
>                       Reduce Output Operator
>                         sort order: 
>                         Statistics: Num rows: 1 Data size: 184 Basic stats: COMPLETE Column stats: NONE
>                         value expressions: _col0 (type: string)
>         Reducer 11 
>             Reduce Operator Tree:
>               Group By Operator
>                 aggregations: max(VALUE._col0)
>                 mode: mergepartial
>                 outputColumnNames: _col0
>                 Statistics: Num rows: 1 Data size: 184 Basic stats: COMPLETE Column stats: NONE
>                 Filter Operator
>                   predicate: _col0 is not null (type: boolean)
>                   Statistics: Num rows: 1 Data size: 184 Basic stats: COMPLETE Column stats: NONE
>                   Group By Operator
>                     keys: _col0 (type: string)
>                     mode: hash
>                     outputColumnNames: _col0
>                     Statistics: Num rows: 2 Data size: 368 Basic stats: COMPLETE Column stats: NONE
>                     Select Operator
>                       expressions: _col0 (type: string)
>                       outputColumnNames: _col0
>                       Statistics: Num rows: 2 Data size: 368 Basic stats: COMPLETE Column stats: NONE
>                       Group By Operator
>                         keys: _col0 (type: string)
>                         mode: hash
>                         outputColumnNames: _col0
>                         Statistics: Num rows: 2 Data size: 368 Basic stats: COMPLETE Column stats: NONE
>                         Spark Partition Pruning Sink Operator
>                           partition key expr: ds
>                           Statistics: Num rows: 2 Data size: 368 Basic stats: COMPLETE Column stats: NONE
>                           target column name: ds
>                           target work: Map 1
>                     Select Operator
>                       expressions: _col0 (type: string)
>                       outputColumnNames: _col0
>                       Statistics: Num rows: 2 Data size: 368 Basic stats: COMPLETE Column stats: NONE
>                       Group By Operator
>                         keys: _col0 (type: string)
>                         mode: hash
>                         outputColumnNames: _col0
>                         Statistics: Num rows: 2 Data size: 368 Basic stats: COMPLETE Column stats: NONE
>                         Spark Partition Pruning Sink Operator
>                           partition key expr: ds
>                           Statistics: Num rows: 2 Data size: 368 Basic stats: COMPLETE Column stats: NONE
>                           target column name: ds
>                           target work: Map 4
>         Reducer 13 
>             Reduce Operator Tree:
>               Group By Operator
>                 aggregations: min(VALUE._col0)
>                 mode: mergepartial
>                 outputColumnNames: _col0
>                 Statistics: Num rows: 1 Data size: 184 Basic stats: COMPLETE Column stats: NONE
>                 Filter Operator
>                   predicate: _col0 is not null (type: boolean)
>                   Statistics: Num rows: 1 Data size: 184 Basic stats: COMPLETE Column stats: NONE
>                   Group By Operator
>                     keys: _col0 (type: string)
>                     mode: hash
>                     outputColumnNames: _col0
>                     Statistics: Num rows: 2 Data size: 368 Basic stats: COMPLETE Column stats: NONE
>                     Select Operator
>                       expressions: _col0 (type: string)
>                       outputColumnNames: _col0
>                       Statistics: Num rows: 2 Data size: 368 Basic stats: COMPLETE Column stats: NONE
>                       Group By Operator
>                         keys: _col0 (type: string)
>                         mode: hash
>                         outputColumnNames: _col0
>                         Statistics: Num rows: 2 Data size: 368 Basic stats: COMPLETE Column stats: NONE
>                         Spark Partition Pruning Sink Operator
>                           partition key expr: ds
>                           Statistics: Num rows: 2 Data size: 368 Basic stats: COMPLETE Column stats: NONE
>                           target column name: ds
>                           target work: Map 1
>                     Select Operator
>                       expressions: _col0 (type: string)
>                       outputColumnNames: _col0
>                       Statistics: Num rows: 2 Data size: 368 Basic stats: COMPLETE Column stats: NONE
>                       Group By Operator
>                         keys: _col0 (type: string)
>                         mode: hash
>                         outputColumnNames: _col0
>                         Statistics: Num rows: 2 Data size: 368 Basic stats: COMPLETE Column stats: NONE
>                         Spark Partition Pruning Sink Operator
>                           partition key expr: ds
>                           Statistics: Num rows: 2 Data size: 368 Basic stats: COMPLETE Column stats: NONE
>                           target column name: ds
>                           target work: Map 4
>   Stage: Stage-1
>     Spark
>       Edges:
>         Reducer 2 <- Map 1 (GROUP, 2)
>         Reducer 3 <- Reducer 2 (PARTITION-LEVEL SORT, 2), Reducer 2 (PARTITION-LEVEL SORT, 2), Reducer 7 (PARTITION-LEVEL SORT, 2), Reducer 9 (PARTITION-LEVEL SORT, 2)
>         Reducer 7 <- Map 6 (GROUP, 1)
>         Reducer 9 <- Map 8 (GROUP, 1)
>       DagName: root_20170622231525_20a777e5-e659-4138-b605-65f8395e18e2:1
>       Vertices:
>         Map 1 
>             Map Operator Tree:
>                 TableScan
>                   alias: srcpart
>                   filterExpr: ds is not null (type: boolean)
>                   Statistics: Num rows: 1 Data size: 23248 Basic stats: PARTIAL Column stats: NONE
>                   Group By Operator
>                     keys: ds (type: string)
>                     mode: hash
>                     outputColumnNames: _col0
>                     Statistics: Num rows: 1 Data size: 23248 Basic stats: COMPLETE Column stats: NONE
>                     Reduce Output Operator
>                       key expressions: _col0 (type: string)
>                       sort order: +
>                       Map-reduce partition columns: _col0 (type: string)
>                       Statistics: Num rows: 1 Data size: 23248 Basic stats: COMPLETE Column stats: NONE
>         Map 6 
>             Map Operator Tree:
>                 TableScan
>                   alias: srcpart
>                   Statistics: Num rows: 1 Data size: 23248 Basic stats: PARTIAL Column stats: NONE
>                   Select Operator
>                     expressions: ds (type: string)
>                     outputColumnNames: ds
>                     Statistics: Num rows: 1 Data size: 23248 Basic stats: PARTIAL Column stats: NONE
>                     Group By Operator
>                       aggregations: max(ds)
>                       mode: hash
>                       outputColumnNames: _col0
>                       Statistics: Num rows: 1 Data size: 184 Basic stats: COMPLETE Column stats: NONE
>                       Reduce Output Operator
>                         sort order: 
>                         Statistics: Num rows: 1 Data size: 184 Basic stats: COMPLETE Column stats: NONE
>                         value expressions: _col0 (type: string)
>         Map 8 
>             Map Operator Tree:
>                 TableScan
>                   alias: srcpart
>                   Statistics: Num rows: 1 Data size: 23248 Basic stats: PARTIAL Column stats: NONE
>                   Select Operator
>                     expressions: ds (type: string)
>                     outputColumnNames: ds
>                     Statistics: Num rows: 1 Data size: 23248 Basic stats: PARTIAL Column stats: NONE
>                     Group By Operator
>                       aggregations: min(ds)
>                       mode: hash
>                       outputColumnNames: _col0
>                       Statistics: Num rows: 1 Data size: 184 Basic stats: COMPLETE Column stats: NONE
>                       Reduce Output Operator
>                         sort order: 
>                         Statistics: Num rows: 1 Data size: 184 Basic stats: COMPLETE Column stats: NONE
>                         value expressions: _col0 (type: string)
>         Reducer 2 
>             Reduce Operator Tree:
>               Group By Operator
>                 keys: KEY._col0 (type: string)
>                 mode: mergepartial
>                 outputColumnNames: _col0
>                 Statistics: Num rows: 1 Data size: 23248 Basic stats: COMPLETE Column stats: NONE
>                 Reduce Output Operator
>                   key expressions: _col0 (type: string)
>                   sort order: +
>                   Map-reduce partition columns: _col0 (type: string)
>                   Statistics: Num rows: 2 Data size: 46496 Basic stats: COMPLETE Column stats: NONE
>         Reducer 3 
>             Reduce Operator Tree:
>               Join Operator
>                 condition map:
>                      Left Semi Join 0 to 1
>                 keys:
>                   0 _col0 (type: string)
>                   1 _col0 (type: string)
>                 outputColumnNames: _col0
>                 Statistics: Num rows: 2 Data size: 51145 Basic stats: COMPLETE Column stats: NONE
>                 File Output Operator
>                   compressed: false
>                   Statistics: Num rows: 2 Data size: 51145 Basic stats: COMPLETE Column stats: NONE
>                   table:
>                       input format: org.apache.hadoop.mapred.SequenceFileInputFormat
>                       output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
>                       serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
>         Reducer 7 
>             Reduce Operator Tree:
>               Group By Operator
>                 aggregations: max(VALUE._col0)
>                 mode: mergepartial
>                 outputColumnNames: _col0
>                 Statistics: Num rows: 1 Data size: 184 Basic stats: COMPLETE Column stats: NONE
>                 Filter Operator
>                   predicate: _col0 is not null (type: boolean)
>                   Statistics: Num rows: 1 Data size: 184 Basic stats: COMPLETE Column stats: NONE
>                   Group By Operator
>                     keys: _col0 (type: string)
>                     mode: hash
>                     outputColumnNames: _col0
>                     Statistics: Num rows: 2 Data size: 368 Basic stats: COMPLETE Column stats: NONE
>                     Reduce Output Operator
>                       key expressions: _col0 (type: string)
>                       sort order: +
>                       Map-reduce partition columns: _col0 (type: string)
>                       Statistics: Num rows: 2 Data size: 368 Basic stats: COMPLETE Column stats: NONE
>         Reducer 9 
>             Reduce Operator Tree:
>               Group By Operator
>                 aggregations: min(VALUE._col0)
>                 mode: mergepartial
>                 outputColumnNames: _col0
>                 Statistics: Num rows: 1 Data size: 184 Basic stats: COMPLETE Column stats: NONE
>                 Filter Operator
>                   predicate: _col0 is not null (type: boolean)
>                   Statistics: Num rows: 1 Data size: 184 Basic stats: COMPLETE Column stats: NONE
>                   Group By Operator
>                     keys: _col0 (type: string)
>                     mode: hash
>                     outputColumnNames: _col0
>                     Statistics: Num rows: 2 Data size: 368 Basic stats: COMPLETE Column stats: NONE
>                     Reduce Output Operator
>                       key expressions: _col0 (type: string)
>                       sort order: +
>                       Map-reduce partition columns: _col0 (type: string)
>                       Statistics: Num rows: 2 Data size: 368 Basic stats: COMPLETE Column stats: NONE
>   Stage: Stage-0
>     Fetch Operator
>       limit: -1
>       Processor Tree:
>         ListSink
> {code}
> the target work of Reducer11 and Reducer13 is Map4 , but Map4 does not exist in the explain 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)