You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@hive.apache.org by "Mostafa Mokhtar (JIRA)" <ji...@apache.org> on 2015/04/02 19:35:52 UTC
[jira] [Created] (HIVE-10194) CBO (Calcite Return Path): Q94
generates cross product
Mostafa Mokhtar created HIVE-10194:
--------------------------------------
Summary: CBO (Calcite Return Path): Q94 generates cross product
Key: HIVE-10194
URL: https://issues.apache.org/jira/browse/HIVE-10194
Project: Hive
Issue Type: Sub-task
Reporter: Mostafa Mokhtar
Assignee: Jesus Camacho Rodriguez
Query
{code}
SELECT count(distinct ws_order_number) as order_count,
sum(ws_ext_ship_cost) as total_shipping_cost,
sum(ws_net_profit) as total_net_profit
FROM web_sales ws1
JOIN customer_address ca ON (ws1.ws_ship_addr_sk = ca.ca_address_sk)
JOIN web_site s ON (ws1.ws_web_site_sk = s.web_site_sk)
JOIN date_dim d ON (ws1.ws_ship_date_sk = d.d_date_sk)
LEFT SEMI JOIN (SELECT ws2.ws_order_number as ws_order_number
FROM web_sales ws2 JOIN web_sales ws3
ON (ws2.ws_order_number = ws3.ws_order_number)
WHERE ws2.ws_warehouse_sk <> ws3.ws_warehouse_sk
) ws_wh1
ON (ws1.ws_order_number = ws_wh1.ws_order_number)
LEFT OUTER JOIN web_returns wr1 ON (ws1.ws_order_number = wr1.wr_order_number)
WHERE d.d_date between '1999-05-01' and '1999-07-01' and
ca.ca_state = 'TX' and
s.web_company_name = 'pri' and
wr1.wr_order_number is null
limit 100
{code}
Plan
{code}
OK
Time taken: 0.23 seconds
Warning: Map Join MAPJOIN[83][bigTable=ws1] in task 'Map 2' is a cross product
OK
STAGE DEPENDENCIES:
Stage-1 is a root stage
Stage-0 depends on stages: Stage-1
STAGE PLANS:
Stage: Stage-1
Tez
Edges:
Map 2 <- Map 1 (BROADCAST_EDGE)
Map 8 <- Reducer 4 (BROADCAST_EDGE)
Reducer 3 <- Map 2 (SIMPLE_EDGE), Map 5 (BROADCAST_EDGE), Map 6 (BROADCAST_EDGE), Map 7 (SIMPLE_EDGE)
Reducer 4 <- Map 10 (SIMPLE_EDGE), Reducer 3 (SIMPLE_EDGE)
Reducer 9 <- Map 8 (SIMPLE_EDGE)
DagName: mmokhtar_20150402132417_1bc8688b-59a0-4909-82a4-b9d386065bbd:3
Vertices:
Map 1
Map Operator Tree:
TableScan
alias: ws1
filterExpr: (((ws_ship_addr_sk = ws_order_number) and (ws_ship_date_sk <> ws_web_site_sk)) and ws_ship_addr_sk is not null) (type: boolean)
Statistics: Num rows: 143966864 Data size: 33110363004 Basic stats: COMPLETE Column stats: COMPLETE
Filter Operator
predicate: (((ws_ship_addr_sk = ws_order_number) and (ws_ship_date_sk <> ws_web_site_sk)) and ws_ship_addr_sk is not null) (type: boolean)
Statistics: Num rows: 71974471 Data size: 1151483592 Basic stats: COMPLETE Column stats: COMPLETE
Select Operator
expressions: ws_ship_addr_sk (type: int)
outputColumnNames: _col1
Statistics: Num rows: 71974471 Data size: 287862044 Basic stats: COMPLETE Column stats: COMPLETE
Reduce Output Operator
sort order:
Statistics: Num rows: 71974471 Data size: 287862044 Basic stats: COMPLETE Column stats: COMPLETE
value expressions: _col1 (type: int)
Execution mode: vectorized
Map 10
Map Operator Tree:
TableScan
alias: wr1
Statistics: Num rows: 13749816 Data size: 2585240312 Basic stats: COMPLETE Column stats: COMPLETE
Reduce Output Operator
key expressions: wr_order_number (type: int)
sort order: +
Map-reduce partition columns: wr_order_number (type: int)
Statistics: Num rows: 13749816 Data size: 2585240312 Basic stats: COMPLETE Column stats: COMPLETE
Execution mode: vectorized
Map 2
Map Operator Tree:
TableScan
alias: ws1
Statistics: Num rows: 143966864 Data size: 33110363004 Basic stats: COMPLETE Column stats: COMPLETE
Map Join Operator
condition map:
Inner Join 0 to 1
keys:
0
1
outputColumnNames: _col1
input vertices:
0 Map 1
Statistics: Num rows: 5180969438964472 Data size: 20723877755857888 Basic stats: COMPLETE Column stats: COMPLETE
Select Operator
expressions: _col1 (type: int)
outputColumnNames: _col0
Statistics: Num rows: 5180969438964472 Data size: 20723877755857888 Basic stats: COMPLETE Column stats: COMPLETE
Group By Operator
keys: _col0 (type: int)
mode: hash
outputColumnNames: _col0
Statistics: Num rows: 73333928460636 Data size: 293335713842544 Basic stats: COMPLETE Column stats: COMPLETE
Reduce Output Operator
key expressions: _col0 (type: int)
sort order: +
Map-reduce partition columns: _col0 (type: int)
Statistics: Num rows: 73333928460636 Data size: 293335713842544 Basic stats: COMPLETE Column stats: COMPLETE
Execution mode: vectorized
Map 5
Map Operator Tree:
TableScan
alias: ca
filterExpr: ((ca_state = 'TX') and ca_address_sk is not null) (type: boolean)
Statistics: Num rows: 800000 Data size: 811903688 Basic stats: COMPLETE Column stats: COMPLETE
Filter Operator
predicate: ((ca_state = 'TX') and ca_address_sk is not null) (type: boolean)
Statistics: Num rows: 15686 Data size: 1411740 Basic stats: COMPLETE Column stats: COMPLETE
Select Operator
expressions: ca_address_sk (type: int)
outputColumnNames: _col0
Statistics: Num rows: 15686 Data size: 62744 Basic stats: COMPLETE Column stats: COMPLETE
Reduce Output Operator
key expressions: _col0 (type: int)
sort order: +
Map-reduce partition columns: _col0 (type: int)
Statistics: Num rows: 15686 Data size: 62744 Basic stats: COMPLETE Column stats: COMPLETE
Execution mode: vectorized
Map 6
Map Operator Tree:
TableScan
alias: d
filterExpr: (d_date BETWEEN '1999-05-01' AND '1999-07-01' and d_date_sk is not null) (type: boolean)
Statistics: Num rows: 73049 Data size: 81741831 Basic stats: COMPLETE Column stats: COMPLETE
Filter Operator
predicate: (d_date BETWEEN '1999-05-01' AND '1999-07-01' and d_date_sk is not null) (type: boolean)
Statistics: Num rows: 36524 Data size: 3579352 Basic stats: COMPLETE Column stats: COMPLETE
Select Operator
expressions: d_date_sk (type: int)
outputColumnNames: _col0
Statistics: Num rows: 36524 Data size: 146096 Basic stats: COMPLETE Column stats: COMPLETE
Reduce Output Operator
key expressions: _col0 (type: int)
sort order: +
Map-reduce partition columns: _col0 (type: int)
Statistics: Num rows: 36524 Data size: 146096 Basic stats: COMPLETE Column stats: COMPLETE
Execution mode: vectorized
Map 7
Map Operator Tree:
TableScan
alias: ws1
filterExpr: (((ws_order_number is not null and ws_ship_addr_sk is not null) and ws_ship_date_sk is not null) and ws_web_site_sk is not null) (type: boolean)
Statistics: Num rows: 143966864 Data size: 33110363004 Basic stats: COMPLETE Column stats: COMPLETE
Filter Operator
predicate: (((ws_order_number is not null and ws_ship_addr_sk is not null) and ws_ship_date_sk is not null) and ws_web_site_sk is not null) (type: boolean)
Statistics: Num rows: 143912892 Data size: 3453621540 Basic stats: COMPLETE Column stats: COMPLETE
Reduce Output Operator
key expressions: ws_order_number (type: int)
sort order: +
Map-reduce partition columns: ws_order_number (type: int)
Statistics: Num rows: 143912892 Data size: 3453621540 Basic stats: COMPLETE Column stats: COMPLETE
value expressions: ws_ship_date_sk (type: int), ws_ship_addr_sk (type: int), ws_web_site_sk (type: int), ws_ext_ship_cost (type: float), ws_net_profit (type: float)
Execution mode: vectorized
Map 8
Map Operator Tree:
TableScan
alias: s
filterExpr: ((web_company_name = 'pri') and web_site_sk is not null) (type: boolean)
Statistics: Num rows: 38 Data size: 70614 Basic stats: COMPLETE Column stats: COMPLETE
Filter Operator
predicate: ((web_company_name = 'pri') and web_site_sk is not null) (type: boolean)
Statistics: Num rows: 5 Data size: 460 Basic stats: COMPLETE Column stats: COMPLETE
Select Operator
expressions: web_site_sk (type: int)
outputColumnNames: _col0
Statistics: Num rows: 5 Data size: 20 Basic stats: COMPLETE Column stats: COMPLETE
Map Join Operator
condition map:
Inner Join 0 to 1
keys:
0 _col30 (type: int)
1 _col0 (type: int)
outputColumnNames: _col31, _col32, _col33
input vertices:
0 Reducer 4
Statistics: Num rows: 5 Data size: 22 Basic stats: COMPLETE Column stats: NONE
Select Operator
expressions: _col31 (type: int), _col32 (type: float), _col33 (type: float)
outputColumnNames: _col0, _col1, _col2
Statistics: Num rows: 5 Data size: 22 Basic stats: COMPLETE Column stats: NONE
Group By Operator
aggregations: count(DISTINCT _col0), sum(_col1), sum(_col2)
keys: _col0 (type: int)
mode: hash
outputColumnNames: _col0, _col1, _col2, _col3
Statistics: Num rows: 5 Data size: 22 Basic stats: COMPLETE Column stats: NONE
Reduce Output Operator
key expressions: _col0 (type: int)
sort order: +
Statistics: Num rows: 5 Data size: 22 Basic stats: COMPLETE Column stats: NONE
TopN Hash Memory Usage: 0.04
value expressions: _col2 (type: double), _col3 (type: double)
Execution mode: vectorized
Reducer 3
Reduce Operator Tree:
Merge Join Operator
condition map:
Left Semi Join 0 to 1
keys:
0 ws_order_number (type: int)
1 _col0 (type: int)
outputColumnNames: _col2, _col11, _col13, _col17, _col28, _col33
Statistics: Num rows: 718857633877870 Data size: 17252583213068880 Basic stats: COMPLETE Column stats: COMPLETE
Map Join Operator
condition map:
Inner Join 0 to 1
keys:
0 _col11 (type: int)
1 _col0 (type: int)
outputColumnNames: _col2, _col13, _col17, _col28, _col33
input vertices:
1 Map 5
Statistics: Num rows: 14094999486464 Data size: 281899989729280 Basic stats: COMPLETE Column stats: COMPLETE
Map Join Operator
condition map:
Inner Join 0 to 1
keys:
0 _col2 (type: int)
1 _col0 (type: int)
outputColumnNames: _col13, _col17, _col28, _col33
input vertices:
1 Map 6
Statistics: Num rows: 7047403274240 Data size: 112758452387840 Basic stats: COMPLETE Column stats: COMPLETE
Select Operator
expressions: _col13 (type: int), _col17 (type: int), _col28 (type: float), _col33 (type: float)
outputColumnNames: _col2, _col3, _col4, _col5
Statistics: Num rows: 7047403274240 Data size: 112758452387840 Basic stats: COMPLETE Column stats: COMPLETE
Reduce Output Operator
key expressions: _col3 (type: int)
sort order: +
Map-reduce partition columns: _col3 (type: int)
Statistics: Num rows: 7047403274240 Data size: 112758452387840 Basic stats: COMPLETE Column stats: COMPLETE
value expressions: _col2 (type: int), _col4 (type: float), _col5 (type: float)
Reducer 4
Reduce Operator Tree:
Merge Join Operator
condition map:
Right Outer Join0 to 1
keys:
0 wr_order_number (type: int)
1 _col3 (type: int)
outputColumnNames: _col13, _col30, _col31, _col32, _col33
Statistics: Num rows: 12100482980189 Data size: 242009659603780 Basic stats: COMPLETE Column stats: COMPLETE
Filter Operator
predicate: _col13 is null (type: boolean)
Statistics: Num rows: 0 Data size: 0 Basic stats: NONE Column stats: COMPLETE
Reduce Output Operator
key expressions: _col30 (type: int)
sort order: +
Map-reduce partition columns: _col30 (type: int)
Statistics: Num rows: 0 Data size: 0 Basic stats: NONE Column stats: COMPLETE
value expressions: _col31 (type: int), _col32 (type: float), _col33 (type: float)
Reducer 9
Reduce Operator Tree:
Group By Operator
aggregations: count(DISTINCT KEY._col0:0._col0), sum(VALUE._col1), sum(VALUE._col2)
mode: mergepartial
outputColumnNames: _col0, _col1, _col2
Statistics: Num rows: 1 Data size: 32 Basic stats: COMPLETE Column stats: NONE
Limit
Number of rows: 100
Statistics: Num rows: 1 Data size: 32 Basic stats: COMPLETE Column stats: NONE
File Output Operator
compressed: false
Statistics: Num rows: 1 Data size: 32 Basic stats: COMPLETE Column stats: NONE
table:
input format: org.apache.hadoop.mapred.TextInputFormat
output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
Stage: Stage-0
Fetch Operator
limit: 100
Processor Tree:
ListSink
{code}
Logical plan
{code}
HiveSort(fetch=[100]): rowcount = 1.0, cumulative cost = {4.594004456323317E8 rows, 5.882012320482085E9 cpu, 9.353802456E12 io}, id = 1080
HiveAggregate(group=[{}], agg#0=[count(DISTINCT $0)], agg#1=[sum($1)], agg#2=[sum($2)]): rowcount = 1.0, cumulative cost = {4.594004336323317E8 rows, 5.882012319482085E9 cpu, 9.353802456E12 io}, id = 1078
HiveProject($f0=[$3], $f1=[$4], $f2=[$5]): rowcount = 464.4485504122314, cumulative cost = {4.594004336323317E8 rows, 5.882012319482085E9 cpu, 9.353802456E12 io}, id = 1076
HiveFilter(condition=[isnull($12)]): rowcount = 464.4485504122314, cumulative cost = {4.594004336323317E8 rows, 5.882012319482085E9 cpu, 9.353802456E12 io}, id = 1074
HiveProject(ws_ship_date_sk=[$1], ws_ship_addr_sk=[$2], ws_web_site_sk=[$3], ws_order_number=[$4], ws_ext_ship_cost=[$5], ws_net_profit=[$6], ca_address_sk=[$7], ca_state=[$8], web_site_sk=[$11], web_company_name=[$12], d_date_sk=[$9], d_date=[$10], wr_order_number=[$0]): rowcount = 1229395.3129411766, cumulative cost = {4.594004336323317E8 rows, 5.882012319482085E9 cpu, 9.353802456E12 io}, id = 1185
HiveJoin(condition=[=($3, $11)], joinType=[inner], joinAlgorithm=[map_join], cost=[{1.3749818235294119E7 rows, 1.3749820470588237E7 cpu, 0.0 io}]): rowcount = 1229395.3129411766, cumulative cost = {4.594004336323317E8 rows, 5.882012319482085E9 cpu, 9.353802456E12 io}, id = 1183
HiveJoin(condition=[=($4, $0)], joinType=[right], joinAlgorithm=[map_join], cost=[{1.3749817E7 rows, 2.7499633E7 cpu, 2.47496688E10 io}]): rowcount = 1.3749816E7, cumulative cost = {4.456506153970376E8 rows, 5.868262499011497E9 cpu, 9.353802456E12 io}, id = 1181
HiveTableScan(table=[[tpcds_bin_orc_200.web_returns]]): rowcount = 1.3749816E7, cumulative cost = {0}, id = 974
HiveJoin(condition=[=($0, $8)], joinType=[inner], joinAlgorithm=[map_join], cost=[{204.39703763146147 rows, 205.39703763146147 cpu, 0.0 io}]): rowcount = 1.0, cumulative cost = {4.319007983970376E8 rows, 5.840762866011497E9 cpu, 9.3290527872E12 io}, id = 1179
HiveJoin(condition=[=($1, $6)], joinType=[inner], joinAlgorithm=[map_join], cost=[{1.43966865E8 rows, 1.43966866E8 cpu, 0.0 io}]): rowcount = 203.39703763146147, cumulative cost = {4.31900594E8 rows, 5.840762660614459E9 cpu, 9.3290527872E12 io}, id = 1110
SemiJoin(condition=[=($3, $6)], joinType=[inner]): rowcount = 1.43966864E8, cumulative cost = {2.87933729E8 rows, 5.696795794614459E9 cpu, 9.3290527872E12 io}, id = 1058
HiveTableScan(table=[[tpcds_bin_orc_200.web_sales]]): rowcount = 1.43966864E8, cumulative cost = {0}, id = 958
HiveProject(ws_order_number=[$1]): rowcount = 1.655760644524185E11, cumulative cost = {2.87933728E8 rows, 5.696795793614459E9 cpu, 9.3290527872E12 io}, id = 1056
HiveFilter(condition=[<>($0, $2)]): rowcount = 1.655760644524185E11, cumulative cost = {2.87933728E8 rows, 5.696795793614459E9 cpu, 9.3290527872E12 io}, id = 1054
HiveJoin(condition=[=($1, $3)], joinType=[inner], joinAlgorithm=[common_join], cost=[{2.87933728E8 rows, 5.696795793614459E9 cpu, 9.3290527872E12 io}]): rowcount = 1.6564734127740878E11, cumulative cost = {2.87933728E8 rows, 5.696795793614459E9 cpu, 9.3290527872E12 io}, id = 1103
HiveTableScan(table=[[tpcds_bin_orc_200.web_sales]]): rowcount = 1.43966864E8, cumulative cost = {0}, id = 958
HiveTableScan(table=[[tpcds_bin_orc_200.web_sales]]): rowcount = 1.43966864E8, cumulative cost = {0}, id = 958
HiveFilter(condition=[=($1, 'TX')]): rowcount = 1.0, cumulative cost = {0.0 rows, 0.0 cpu, 0.0 io}, id = 1060
HiveTableScan(table=[[tpcds_bin_orc_200.customer_address]]): rowcount = 800000.0, cumulative cost = {0}, id = 965
HiveFilter(condition=[between(false, $1, '1999-05-01', '1999-07-01')]): rowcount = 1.0, cumulative cost = {0.0 rows, 0.0 cpu, 0.0 io}, id = 1068
HiveTableScan(table=[[tpcds_bin_orc_200.date_dim]]): rowcount = 73049.0, cumulative cost = {0}, id = 971
HiveFilter(condition=[=($1, 'pri')]): rowcount = 2.235294117647059, cumulative cost = {0.0 rows, 0.0 cpu, 0.0 io}, id = 1064
HiveTableScan(table=[[tpcds_bin_orc_200.web_site]]): rowcount = 38.0, cumulative cost = {0}, id = 968
{code}
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)