Posted to issues@hive.apache.org by "Gopal V (JIRA)" <ji...@apache.org> on 2017/06/12 22:52:00 UTC
[jira] [Updated] (HIVE-16793) Scalar sub-query: Scalar safety checks for explicit group-bys
[ https://issues.apache.org/jira/browse/HIVE-16793?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Gopal V updated HIVE-16793:
---------------------------
Description:
This query gets an sq_count_check, though the check is useless when the group-by key is a constant.
{code}
hive> explain select * from part where p_size > (select max(p_size) from part where p_type = '1' group by p_type);
Warning: Map Join MAPJOIN[37][bigTable=?] in task 'Map 1' is a cross product
Warning: Map Join MAPJOIN[36][bigTable=?] in task 'Map 1' is a cross product
OK
Plan optimized by CBO.
Vertex dependency in root stage
Map 1 <- Reducer 4 (BROADCAST_EDGE), Reducer 6 (BROADCAST_EDGE)
Reducer 3 <- Map 2 (SIMPLE_EDGE)
Reducer 4 <- Reducer 3 (CUSTOM_SIMPLE_EDGE)
Reducer 6 <- Map 5 (SIMPLE_EDGE)
Stage-0
Fetch Operator
limit:-1
Stage-1
Map 1 vectorized, llap
File Output Operator [FS_64]
Select Operator [SEL_63] (rows=66666666 width=621)
Output:["_col0","_col1","_col2","_col3","_col4","_col5","_col6","_col7","_col8"]
Filter Operator [FIL_62] (rows=66666666 width=625)
predicate:(_col5 > _col10)
Map Join Operator [MAPJOIN_61] (rows=200000000 width=625)
Conds:(Inner),Output:["_col0","_col1","_col2","_col3","_col4","_col5","_col6","_col7","_col8","_col10"]
<-Reducer 6 [BROADCAST_EDGE] vectorized, llap
BROADCAST [RS_58]
Select Operator [SEL_57] (rows=1 width=4)
Output:["_col0"]
Group By Operator [GBY_56] (rows=1 width=89)
Output:["_col0","_col1"],aggregations:["max(VALUE._col0)"],keys:KEY._col0
<-Map 5 [SIMPLE_EDGE] vectorized, llap
SHUFFLE [RS_55]
PartitionCols:_col0
Group By Operator [GBY_54] (rows=86 width=89)
Output:["_col0","_col1"],aggregations:["max(_col1)"],keys:'1'
Select Operator [SEL_53] (rows=1212121 width=109)
Output:["_col1"]
Filter Operator [FIL_52] (rows=1212121 width=109)
predicate:(p_type = '1')
TableScan [TS_17] (rows=200000000 width=109)
tpch_flat_orc_1000@part,part,Tbl:COMPLETE,Col:COMPLETE,Output:["p_type","p_size"]
<-Map Join Operator [MAPJOIN_60] (rows=200000000 width=621)
Conds:(Inner),Output:["_col0","_col1","_col2","_col3","_col4","_col5","_col6","_col7","_col8"]
<-Reducer 4 [BROADCAST_EDGE] vectorized, llap
BROADCAST [RS_51]
Select Operator [SEL_50] (rows=1 width=8)
Filter Operator [FIL_49] (rows=1 width=8)
predicate:(sq_count_check(_col0) <= 1)
Group By Operator [GBY_48] (rows=1 width=8)
Output:["_col0"],aggregations:["count(VALUE._col0)"]
<-Reducer 3 [CUSTOM_SIMPLE_EDGE] vectorized, llap
PARTITION_ONLY_SHUFFLE [RS_47]
Group By Operator [GBY_46] (rows=1 width=8)
Output:["_col0"],aggregations:["count()"]
Select Operator [SEL_45] (rows=1 width=85)
Group By Operator [GBY_44] (rows=1 width=85)
Output:["_col0"],keys:KEY._col0
<-Map 2 [SIMPLE_EDGE] vectorized, llap
SHUFFLE [RS_43]
PartitionCols:_col0
Group By Operator [GBY_42] (rows=83 width=85)
Output:["_col0"],keys:'1'
Select Operator [SEL_41] (rows=1212121 width=105)
Filter Operator [FIL_40] (rows=1212121 width=105)
predicate:(p_type = '1')
TableScan [TS_2] (rows=200000000 width=105)
tpch_flat_orc_1000@part,part,Tbl:COMPLETE,Col:COMPLETE,Output:["p_type"]
<-Select Operator [SEL_59] (rows=200000000 width=621)
Output:["_col0","_col1","_col2","_col3","_col4","_col5","_col6","_col7","_col8"]
TableScan [TS_0] (rows=200000000 width=621)
tpch_flat_orc_1000@part,part,Tbl:COMPLETE,Col:COMPLETE,Output:["p_partkey","p_name","p_mfgr","p_brand","p_type","p_size","p_container","p_retailprice","p_comment"]
{code}
-The other version without the filter is missing the check, though the compiler cannot assume the nDV of p_type.- Fixed by HIVE-16330
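The reasoning behind the report can be sketched in a few lines of Python. This is only an illustration on made-up rows, not Hive code: `scalar_subquery_groups` and `count_check` are hypothetical helpers, the latter standing in for the guard that the plan applies through `sq_count_check`. With an equality filter on the group-by column the sub-query can yield at most one group, so the guard can never fire; without the filter the number of groups depends on the data.

```python
def scalar_subquery_groups(rows, type_filter=None):
    """Return {p_type: max(p_size)} for rows passing the optional equality filter."""
    groups = {}
    for p_type, p_size in rows:
        if type_filter is not None and p_type != type_filter:
            continue
        # Keep the running maximum per group-by key.
        groups[p_type] = max(p_size, groups.get(p_type, p_size))
    return groups


def count_check(num_rows):
    """Hypothetical stand-in for the runtime guard on a scalar sub-query."""
    if num_rows > 1:
        raise ValueError("scalar sub-query returned more than one row")
    return num_rows


# Made-up (p_type, p_size) pairs, for illustration only.
rows = [("1", 10), ("1", 25), ("2", 7), ("3", 40)]

# where p_type = '1' group by p_type: the key is a constant, so at most one group.
filtered = scalar_subquery_groups(rows, type_filter="1")

# group by p_type with no filter: one group per distinct p_type in the data.
unfiltered = scalar_subquery_groups(rows)
```

Here `count_check(len(filtered))` always passes, which is why the extra count branch in the plan above is wasted work, while `count_check(len(unfiltered))` raises for this sample data.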
was:
This query gets an sq_count_check, though the check is useless when the group-by key is a constant.
{code}
hive> explain select * from part where p_size > (select max(p_size) from part where p_type = '1' group by p_type);
Warning: Map Join MAPJOIN[37][bigTable=?] in task 'Map 1' is a cross product
Warning: Map Join MAPJOIN[36][bigTable=?] in task 'Map 1' is a cross product
OK
Plan optimized by CBO.
Vertex dependency in root stage
Map 1 <- Reducer 4 (BROADCAST_EDGE), Reducer 6 (BROADCAST_EDGE)
Reducer 3 <- Map 2 (SIMPLE_EDGE)
Reducer 4 <- Reducer 3 (CUSTOM_SIMPLE_EDGE)
Reducer 6 <- Map 5 (SIMPLE_EDGE)
Stage-0
Fetch Operator
limit:-1
Stage-1
Map 1 vectorized, llap
File Output Operator [FS_64]
Select Operator [SEL_63] (rows=66666666 width=621)
Output:["_col0","_col1","_col2","_col3","_col4","_col5","_col6","_col7","_col8"]
Filter Operator [FIL_62] (rows=66666666 width=625)
predicate:(_col5 > _col10)
Map Join Operator [MAPJOIN_61] (rows=200000000 width=625)
Conds:(Inner),Output:["_col0","_col1","_col2","_col3","_col4","_col5","_col6","_col7","_col8","_col10"]
<-Reducer 6 [BROADCAST_EDGE] vectorized, llap
BROADCAST [RS_58]
Select Operator [SEL_57] (rows=1 width=4)
Output:["_col0"]
Group By Operator [GBY_56] (rows=1 width=89)
Output:["_col0","_col1"],aggregations:["max(VALUE._col0)"],keys:KEY._col0
<-Map 5 [SIMPLE_EDGE] vectorized, llap
SHUFFLE [RS_55]
PartitionCols:_col0
Group By Operator [GBY_54] (rows=86 width=89)
Output:["_col0","_col1"],aggregations:["max(_col1)"],keys:'1'
Select Operator [SEL_53] (rows=1212121 width=109)
Output:["_col1"]
Filter Operator [FIL_52] (rows=1212121 width=109)
predicate:(p_type = '1')
TableScan [TS_17] (rows=200000000 width=109)
tpch_flat_orc_1000@part,part,Tbl:COMPLETE,Col:COMPLETE,Output:["p_type","p_size"]
<-Map Join Operator [MAPJOIN_60] (rows=200000000 width=621)
Conds:(Inner),Output:["_col0","_col1","_col2","_col3","_col4","_col5","_col6","_col7","_col8"]
<-Reducer 4 [BROADCAST_EDGE] vectorized, llap
BROADCAST [RS_51]
Select Operator [SEL_50] (rows=1 width=8)
Filter Operator [FIL_49] (rows=1 width=8)
predicate:(sq_count_check(_col0) <= 1)
Group By Operator [GBY_48] (rows=1 width=8)
Output:["_col0"],aggregations:["count(VALUE._col0)"]
<-Reducer 3 [CUSTOM_SIMPLE_EDGE] vectorized, llap
PARTITION_ONLY_SHUFFLE [RS_47]
Group By Operator [GBY_46] (rows=1 width=8)
Output:["_col0"],aggregations:["count()"]
Select Operator [SEL_45] (rows=1 width=85)
Group By Operator [GBY_44] (rows=1 width=85)
Output:["_col0"],keys:KEY._col0
<-Map 2 [SIMPLE_EDGE] vectorized, llap
SHUFFLE [RS_43]
PartitionCols:_col0
Group By Operator [GBY_42] (rows=83 width=85)
Output:["_col0"],keys:'1'
Select Operator [SEL_41] (rows=1212121 width=105)
Filter Operator [FIL_40] (rows=1212121 width=105)
predicate:(p_type = '1')
TableScan [TS_2] (rows=200000000 width=105)
tpch_flat_orc_1000@part,part,Tbl:COMPLETE,Col:COMPLETE,Output:["p_type"]
<-Select Operator [SEL_59] (rows=200000000 width=621)
Output:["_col0","_col1","_col2","_col3","_col4","_col5","_col6","_col7","_col8"]
TableScan [TS_0] (rows=200000000 width=621)
tpch_flat_orc_1000@part,part,Tbl:COMPLETE,Col:COMPLETE,Output:["p_partkey","p_name","p_mfgr","p_brand","p_type","p_size","p_container","p_retailprice","p_comment"]
{code}
The other version, without the filter, is missing the check, even though the compiler cannot assume that the nDV (number of distinct values) of p_type is 1.
{code}
hive> explain select * from part where p_size > (select max(p_size) from part group by p_type);
Warning: Map Join MAPJOIN[14][bigTable=?] in task 'Map 1' is a cross product
OK
Plan optimized by CBO.
Vertex dependency in root stage
Map 1 <- Reducer 3 (BROADCAST_EDGE)
Reducer 3 <- Map 2 (SIMPLE_EDGE)
Stage-0
Fetch Operator
limit:-1
Stage-1
Map 1 vectorized, llap
File Output Operator [FS_26]
Select Operator [SEL_25] (rows=11000000000 width=621)
Output:["_col0","_col1","_col2","_col3","_col4","_col5","_col6","_col7","_col8"]
Filter Operator [FIL_24] (rows=11000000000 width=625)
predicate:(_col5 > _col9)
Map Join Operator [MAPJOIN_23] (rows=33000000000 width=625)
Conds:(Inner),Output:["_col0","_col1","_col2","_col3","_col4","_col5","_col6","_col7","_col8","_col9"]
<-Reducer 3 [BROADCAST_EDGE] vectorized, llap
BROADCAST [RS_21]
Select Operator [SEL_20] (rows=165 width=4)
Output:["_col0"]
Group By Operator [GBY_19] (rows=165 width=109)
Output:["_col0","_col1"],aggregations:["max(VALUE._col0)"],keys:KEY._col0
<-Map 2 [SIMPLE_EDGE] vectorized, llap
SHUFFLE [RS_18]
PartitionCols:_col0
Group By Operator [GBY_17] (rows=14190 width=109)
Output:["_col0","_col1"],aggregations:["max(p_size)"],keys:p_type
Select Operator [SEL_16] (rows=200000000 width=109)
Output:["p_type","p_size"]
TableScan [TS_2] (rows=200000000 width=109)
tpch_flat_orc_1000@part,part,Tbl:COMPLETE,Col:COMPLETE,Output:["p_type","p_size"]
<-Select Operator [SEL_22] (rows=200000000 width=621)
Output:["_col0","_col1","_col2","_col3","_col4","_col5","_col6","_col7","_col8"]
TableScan [TS_0] (rows=200000000 width=621)
tpch_flat_orc_1000@part,part,Tbl:COMPLETE,Col:COMPLETE,Output:["p_partkey","p_name","p_mfgr","p_brand","p_type","p_size","p_container","p_retailprice","p_comment"]
{code}
> Scalar sub-query: Scalar safety checks for explicit group-bys
> -------------------------------------------------------------
>
> Key: HIVE-16793
> URL: https://issues.apache.org/jira/browse/HIVE-16793
> Project: Hive
> Issue Type: Bug
> Components: SQL
> Affects Versions: 3.0.0
> Reporter: Gopal V
> Assignee: Vineet Garg