You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@hive.apache.org by "Mostafa Mokhtar (JIRA)" <ji...@apache.org> on 2015/02/18 04:47:11 UTC
[jira] [Updated] (HIVE-9713) CBO : inefficient join order created
for left join outer condition
[ https://issues.apache.org/jira/browse/HIVE-9713?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Mostafa Mokhtar updated HIVE-9713:
----------------------------------
Description:
For the query below which is a subset of TPC-DS Query 80, CBO joins catalog_sales with catalog_returns first although the CE of the join is relatively high.
catalog_sales should be joined with the selective dimension tables first.
{code}
select count(*)
from
catalog_sales
,warehouse
,date_dim
,time_dim
,ship_mode
where
catalog_sales.cs_warehouse_sk = warehouse.w_warehouse_sk
and catalog_sales.cs_sold_date_sk = date_dim.d_date_sk
and catalog_sales.cs_sold_time_sk = time_dim.t_time_sk
and catalog_sales.cs_ship_mode_sk = ship_mode.sm_ship_mode_sk
and d_year = 2002
and t_time between 49530 AND 49530+28800
and sm_carrier in ('DIAMOND','AIRBORNE')
group by
w_warehouse_name
,w_warehouse_sq_ft
,w_city
,w_county
,w_state
,w_country
,d_year
{code}
Logical plan from CBO debug logs
{code}
2015-02-17 22:34:04,577 DEBUG [main]: parse.CalcitePlanner (CalcitePlanner.java:apply(743)) - Plan After Join Reordering:
HiveProject(catalog_page_id=[$0], sales=[$1], returns=[$2], profit=[$3]): rowcount = 10590.0, cumulative cost = {8.25242586823495E15 rows, 0.0 cpu, 0.0 io}, id = 1395
HiveAggregate(group=[{0}], agg#0=[sum($1)], agg#1=[sum($2)], agg#2=[sum($3)]): rowcount = 10590.0, cumulative cost = {8.25242586823495E15 rows, 0.0 cpu, 0.0 io}, id = 1393
HiveProject($f0=[$14], $f1=[$5], $f2=[coalesce($9, 0)], $f3=[-($6, coalesce($10, 0))]): rowcount = 1.368586152225262E8, cumulative cost = {8.25242586823495E15 rows, 0.0 cpu, 0.0 io}, id = 1391
HiveJoin(condition=[=($3, $17)], joinType=[inner]): rowcount = 1.368586152225262E8, cumulative cost = {8.25242586823495E15 rows, 0.0 cpu, 0.0 io}, id = 1508
HiveJoin(condition=[=($2, $15)], joinType=[inner]): rowcount = 2.737172304450524E8, cumulative cost = {8.252425594517495E15 rows, 0.0 cpu, 0.0 io}, id = 1506
HiveJoin(condition=[=($1, $13)], joinType=[inner]): rowcount = 8.211516913351573E8, cumulative cost = {8.252424773349804E15 rows, 0.0 cpu, 0.0 io}, id = 1504
HiveJoin(condition=[=($0, $11)], joinType=[inner]): rowcount = 1.1296953399027347E11, cumulative cost = {8.252311803804096E15 rows, 0.0 cpu, 0.0 io}, id = 1418
HiveJoin(condition=[AND(=($2, $7), =($4, $8))], joinType=[left]): rowcount = 8.252311488455487E15, cumulative cost = {3.15348608E8 rows, 0.0 cpu, 0.0 io}, id = 1413
HiveProject(cs_sold_date_sk=[$0], cs_catalog_page_sk=[$12], cs_item_sk=[$15], cs_promo_sk=[$16], cs_order_number=[$17], cs_ext_sales_price=[$23], cs_net_profit=[$33]): rowcount = 2.86549727E8, cumulative cost = {0.0 rows, 0.0 cpu, 0.0 io}, id = 1324
HiveTableScan(table=[[tpcds_bin_orc_200.catalog_sales]]): rowcount = 2.86549727E8, cumulative cost = {0}, id = 1136
HiveProject(cr_item_sk=[$2], cr_order_number=[$16], cr_return_amount=[$18], cr_net_loss=[$26]): rowcount = 2.8798881E7, cumulative cost = {0.0 rows, 0.0 cpu, 0.0 io}, id = 1327
HiveTableScan(table=[[tpcds_bin_orc_200.catalog_returns]]): rowcount = 2.8798881E7, cumulative cost = {0}, id = 1137
HiveProject(d_date_sk=[$0], d_date=[$2]): rowcount = 1.0, cumulative cost = {0.0 rows, 0.0 cpu, 0.0 io}, id = 1371
HiveFilter(condition=[between(false, $2, CAST('1998-08-04'):DATE, CAST('1998-09-04'):DATE)]): rowcount = 1.0, cumulative cost = {0.0 rows, 0.0 cpu, 0.0 io}, id = 1369
HiveTableScan(table=[[tpcds_bin_orc_200.date_dim]]): rowcount = 73049.0, cumulative cost = {0}, id = 1138
HiveProject(cp_catalog_page_sk=[$0], cp_catalog_page_id=[$1]): rowcount = 11718.0, cumulative cost = {0.0 rows, 0.0 cpu, 0.0 io}, id = 1375
HiveTableScan(table=[[tpcds_bin_orc_200.catalog_page]]): rowcount = 11718.0, cumulative cost = {0}, id = 1139
HiveProject(i_item_sk=[$0], i_current_price=[$5]): rowcount = 16000.0, cumulative cost = {0.0 rows, 0.0 cpu, 0.0 io}, id = 1381
HiveFilter(condition=[>($5, 5E1)]): rowcount = 16000.0, cumulative cost = {0.0 rows, 0.0 cpu, 0.0 io}, id = 1379
HiveTableScan(table=[[tpcds_bin_orc_200.item]]): rowcount = 48000.0, cumulative cost = {0}, id = 1140
HiveProject(p_promo_sk=[$0], p_channel_tv=[$11]): rowcount = 225.0, cumulative cost = {0.0 rows, 0.0 cpu, 0.0 io}, id = 1387
HiveFilter(condition=[=($11, 'N')]): rowcount = 225.0, cumulative cost = {0.0 rows, 0.0 cpu, 0.0 io}, id = 1385
HiveTableScan(table=[[tpcds_bin_orc_200.promotion]]): rowcount = 450.0, cumulative cost = {0}, id = 1141
{code}
Explain plan
{code}
STAGE DEPENDENCIES:
Stage-1 is a root stage
Stage-0 depends on stages: Stage-1
STAGE PLANS:
Stage: Stage-1
Tez
Edges:
Map 1 <- Map 2 (BROADCAST_EDGE)
Map 3 <- Map 1 (BROADCAST_EDGE)
Map 4 <- Map 3 (BROADCAST_EDGE), Map 6 (BROADCAST_EDGE), Map 7 (BROADCAST_EDGE)
Reducer 5 <- Map 4 (SIMPLE_EDGE)
DagName: mmokhtar_20150217223434_d0ab6fa9-a1a3-47a5-8138-ba7435d9aea5:4
Vertices:
Map 1
Map Operator Tree:
TableScan
alias: catalog_sales
filterExpr: (((cs_sold_date_sk is not null and cs_catalog_page_sk is not null) and cs_item_sk is not null) and cs_promo_sk is not null) (type: boolean)
Statistics: Num rows: 286549727 Data size: 65825832570 Basic stats: COMPLETE Column stats: COMPLETE
Filter Operator
predicate: (((cs_sold_date_sk is not null and cs_catalog_page_sk is not null) and cs_item_sk is not null) and cs_promo_sk is not null) (type: boolean)
Statistics: Num rows: 285112475 Data size: 7974560516 Basic stats: COMPLETE Column stats: COMPLETE
Select Operator
expressions: cs_sold_date_sk (type: int), cs_catalog_page_sk (type: int), cs_item_sk (type: int), cs_promo_sk (type: int), cs_order_number (type: int), cs_ext_sales_price (type: float), cs_net_profit (type: float)
outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5, _col6
Statistics: Num rows: 285112475 Data size: 7974560516 Basic stats: COMPLETE Column stats: COMPLETE
Map Join Operator
condition map:
Left Outer Join0 to 1
keys:
0 _col2 (type: int), _col4 (type: int)
1 _col0 (type: int), _col1 (type: int)
outputColumnNames: _col0, _col1, _col2, _col3, _col5, _col6, _col9, _col10
input vertices:
1 Map 2
Statistics: Num rows: 2911 Data size: 93152 Basic stats: COMPLETE Column stats: COMPLETE
Reduce Output Operator
key expressions: _col0 (type: int)
sort order: +
Map-reduce partition columns: _col0 (type: int)
Statistics: Num rows: 2911 Data size: 93152 Basic stats: COMPLETE Column stats: COMPLETE
value expressions: _col1 (type: int), _col2 (type: int), _col3 (type: int), _col5 (type: float), _col6 (type: float), _col9 (type: float), _col10 (type: float)
Execution mode: vectorized
Map 2
Map Operator Tree:
TableScan
alias: catalog_returns
filterExpr: cr_item_sk is not null (type: boolean)
Statistics: Num rows: 28798881 Data size: 5764329494 Basic stats: COMPLETE Column stats: COMPLETE
Filter Operator
predicate: cr_item_sk is not null (type: boolean)
Statistics: Num rows: 28798881 Data size: 456171072 Basic stats: COMPLETE Column stats: COMPLETE
Select Operator
expressions: cr_item_sk (type: int), cr_order_number (type: int), cr_return_amount (type: float), cr_net_loss (type: float)
outputColumnNames: _col0, _col1, _col2, _col3
Statistics: Num rows: 28798881 Data size: 456171072 Basic stats: COMPLETE Column stats: COMPLETE
Reduce Output Operator
key expressions: _col0 (type: int), _col1 (type: int)
sort order: ++
Map-reduce partition columns: _col0 (type: int), _col1 (type: int)
Statistics: Num rows: 28798881 Data size: 456171072 Basic stats: COMPLETE Column stats: COMPLETE
value expressions: _col2 (type: float), _col3 (type: float)
Execution mode: vectorized
Map 3
Map Operator Tree:
TableScan
alias: date_dim
filterExpr: (d_date BETWEEN 1998-08-04 AND 1998-09-04 and d_date_sk is not null) (type: boolean)
Statistics: Num rows: 73049 Data size: 81741831 Basic stats: COMPLETE Column stats: COMPLETE
Filter Operator
predicate: (d_date BETWEEN 1998-08-04 AND 1998-09-04 and d_date_sk is not null) (type: boolean)
Statistics: Num rows: 36524 Data size: 3579352 Basic stats: COMPLETE Column stats: COMPLETE
Select Operator
expressions: d_date_sk (type: int)
outputColumnNames: _col0
Statistics: Num rows: 36524 Data size: 146096 Basic stats: COMPLETE Column stats: COMPLETE
Map Join Operator
condition map:
Inner Join 0 to 1
keys:
0 _col0 (type: int)
1 _col0 (type: int)
outputColumnNames: _col1, _col2, _col3, _col5, _col6, _col9, _col10
input vertices:
0 Map 1
Statistics: Num rows: 1456 Data size: 40768 Basic stats: COMPLETE Column stats: COMPLETE
Reduce Output Operator
key expressions: _col1 (type: int)
sort order: +
Map-reduce partition columns: _col1 (type: int)
Statistics: Num rows: 1456 Data size: 40768 Basic stats: COMPLETE Column stats: COMPLETE
value expressions: _col2 (type: int), _col3 (type: int), _col5 (type: float), _col6 (type: float), _col9 (type: float), _col10 (type: float)
Execution mode: vectorized
Map 4
Map Operator Tree:
TableScan
alias: catalog_page
filterExpr: cp_catalog_page_sk is not null (type: boolean)
Statistics: Num rows: 11718 Data size: 5400282 Basic stats: COMPLETE Column stats: COMPLETE
Filter Operator
predicate: cp_catalog_page_sk is not null (type: boolean)
Statistics: Num rows: 11718 Data size: 1218672 Basic stats: COMPLETE Column stats: COMPLETE
Select Operator
expressions: cp_catalog_page_sk (type: int), cp_catalog_page_id (type: string)
outputColumnNames: _col0, _col1
Statistics: Num rows: 11718 Data size: 1218672 Basic stats: COMPLETE Column stats: COMPLETE
Map Join Operator
condition map:
Inner Join 0 to 1
keys:
0 _col1 (type: int)
1 _col0 (type: int)
outputColumnNames: _col2, _col3, _col5, _col6, _col9, _col10, _col14
input vertices:
0 Map 3
Statistics: Num rows: 1456 Data size: 180544 Basic stats: COMPLETE Column stats: COMPLETE
Map Join Operator
condition map:
Inner Join 0 to 1
keys:
0 _col2 (type: int)
1 _col0 (type: int)
outputColumnNames: _col3, _col5, _col6, _col9, _col10, _col14
input vertices:
1 Map 6
Statistics: Num rows: 486 Data size: 58320 Basic stats: COMPLETE Column stats: COMPLETE
Map Join Operator
condition map:
Inner Join 0 to 1
keys:
0 _col3 (type: int)
1 _col0 (type: int)
outputColumnNames: _col5, _col6, _col9, _col10, _col14
input vertices:
1 Map 7
Statistics: Num rows: 243 Data size: 28188 Basic stats: COMPLETE Column stats: COMPLETE
Select Operator
expressions: _col14 (type: string), _col5 (type: float), COALESCE(_col9,0) (type: float), (_col6 - COALESCE(_col10,0)) (type: float)
outputColumnNames: _col0, _col1, _col2, _col3
Statistics: Num rows: 243 Data size: 28188 Basic stats: COMPLETE Column stats: COMPLETE
Group By Operator
aggregations: sum(_col1), sum(_col2), sum(_col3)
keys: _col0 (type: string)
mode: hash
outputColumnNames: _col0, _col1, _col2, _col3
Statistics: Num rows: 121 Data size: 15004 Basic stats: COMPLETE Column stats: COMPLETE
Reduce Output Operator
key expressions: _col0 (type: string)
sort order: +
Map-reduce partition columns: _col0 (type: string)
Statistics: Num rows: 121 Data size: 15004 Basic stats: COMPLETE Column stats: COMPLETE
value expressions: _col1 (type: double), _col2 (type: double), _col3 (type: double)
Execution mode: vectorized
Map 6
Map Operator Tree:
TableScan
alias: item
filterExpr: ((i_current_price > 50.0) and i_item_sk is not null) (type: boolean)
Statistics: Num rows: 48000 Data size: 68732712 Basic stats: COMPLETE Column stats: COMPLETE
Filter Operator
predicate: ((i_current_price > 50.0) and i_item_sk is not null) (type: boolean)
Statistics: Num rows: 16000 Data size: 127832 Basic stats: COMPLETE Column stats: COMPLETE
Select Operator
expressions: i_item_sk (type: int)
outputColumnNames: _col0
Statistics: Num rows: 16000 Data size: 64000 Basic stats: COMPLETE Column stats: COMPLETE
Reduce Output Operator
key expressions: _col0 (type: int)
sort order: +
Map-reduce partition columns: _col0 (type: int)
Statistics: Num rows: 16000 Data size: 64000 Basic stats: COMPLETE Column stats: COMPLETE
Execution mode: vectorized
Map 7
Map Operator Tree:
TableScan
alias: promotion
filterExpr: ((p_channel_tv = 'N') and p_promo_sk is not null) (type: boolean)
Statistics: Num rows: 450 Data size: 530848 Basic stats: COMPLETE Column stats: COMPLETE
Filter Operator
predicate: ((p_channel_tv = 'N') and p_promo_sk is not null) (type: boolean)
Statistics: Num rows: 225 Data size: 20025 Basic stats: COMPLETE Column stats: COMPLETE
Select Operator
expressions: p_promo_sk (type: int)
outputColumnNames: _col0
Statistics: Num rows: 225 Data size: 900 Basic stats: COMPLETE Column stats: COMPLETE
Reduce Output Operator
key expressions: _col0 (type: int)
sort order: +
Map-reduce partition columns: _col0 (type: int)
Statistics: Num rows: 225 Data size: 900 Basic stats: COMPLETE Column stats: COMPLETE
Execution mode: vectorized
Reducer 5
Reduce Operator Tree:
Group By Operator
aggregations: sum(VALUE._col0), sum(VALUE._col1), sum(VALUE._col2)
keys: KEY._col0 (type: string)
mode: mergepartial
outputColumnNames: _col0, _col1, _col2, _col3
Statistics: Num rows: 121 Data size: 15004 Basic stats: COMPLETE Column stats: COMPLETE
File Output Operator
compressed: false
Statistics: Num rows: 121 Data size: 15004 Basic stats: COMPLETE Column stats: COMPLETE
table:
input format: org.apache.hadoop.mapred.TextInputFormat
output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
Stage: Stage-0
Fetch Operator
limit: -1
Processor Tree:
ListSink
{code}
was:
For the query below which is a subset of TPC-DS Query 66, CBO joins catalog_sales with catalog_returns first although the CE of the join is relatively high.
catalog_sales should be joined with the selective dimension tables first.
{code}
select count(*)
from
catalog_sales
,warehouse
,date_dim
,time_dim
,ship_mode
where
catalog_sales.cs_warehouse_sk = warehouse.w_warehouse_sk
and catalog_sales.cs_sold_date_sk = date_dim.d_date_sk
and catalog_sales.cs_sold_time_sk = time_dim.t_time_sk
and catalog_sales.cs_ship_mode_sk = ship_mode.sm_ship_mode_sk
and d_year = 2002
and t_time between 49530 AND 49530+28800
and sm_carrier in ('DIAMOND','AIRBORNE')
group by
w_warehouse_name
,w_warehouse_sq_ft
,w_city
,w_county
,w_state
,w_country
,d_year
{code}
Logical plan from CBO debug logs
{code}
2015-02-17 22:34:04,577 DEBUG [main]: parse.CalcitePlanner (CalcitePlanner.java:apply(743)) - Plan After Join Reordering:
HiveProject(catalog_page_id=[$0], sales=[$1], returns=[$2], profit=[$3]): rowcount = 10590.0, cumulative cost = {8.25242586823495E15 rows, 0.0 cpu, 0.0 io}, id = 1395
HiveAggregate(group=[{0}], agg#0=[sum($1)], agg#1=[sum($2)], agg#2=[sum($3)]): rowcount = 10590.0, cumulative cost = {8.25242586823495E15 rows, 0.0 cpu, 0.0 io}, id = 1393
HiveProject($f0=[$14], $f1=[$5], $f2=[coalesce($9, 0)], $f3=[-($6, coalesce($10, 0))]): rowcount = 1.368586152225262E8, cumulative cost = {8.25242586823495E15 rows, 0.0 cpu, 0.0 io}, id = 1391
HiveJoin(condition=[=($3, $17)], joinType=[inner]): rowcount = 1.368586152225262E8, cumulative cost = {8.25242586823495E15 rows, 0.0 cpu, 0.0 io}, id = 1508
HiveJoin(condition=[=($2, $15)], joinType=[inner]): rowcount = 2.737172304450524E8, cumulative cost = {8.252425594517495E15 rows, 0.0 cpu, 0.0 io}, id = 1506
HiveJoin(condition=[=($1, $13)], joinType=[inner]): rowcount = 8.211516913351573E8, cumulative cost = {8.252424773349804E15 rows, 0.0 cpu, 0.0 io}, id = 1504
HiveJoin(condition=[=($0, $11)], joinType=[inner]): rowcount = 1.1296953399027347E11, cumulative cost = {8.252311803804096E15 rows, 0.0 cpu, 0.0 io}, id = 1418
HiveJoin(condition=[AND(=($2, $7), =($4, $8))], joinType=[left]): rowcount = 8.252311488455487E15, cumulative cost = {3.15348608E8 rows, 0.0 cpu, 0.0 io}, id = 1413
HiveProject(cs_sold_date_sk=[$0], cs_catalog_page_sk=[$12], cs_item_sk=[$15], cs_promo_sk=[$16], cs_order_number=[$17], cs_ext_sales_price=[$23], cs_net_profit=[$33]): rowcount = 2.86549727E8, cumulative cost = {0.0 rows, 0.0 cpu, 0.0 io}, id = 1324
HiveTableScan(table=[[tpcds_bin_orc_200.catalog_sales]]): rowcount = 2.86549727E8, cumulative cost = {0}, id = 1136
HiveProject(cr_item_sk=[$2], cr_order_number=[$16], cr_return_amount=[$18], cr_net_loss=[$26]): rowcount = 2.8798881E7, cumulative cost = {0.0 rows, 0.0 cpu, 0.0 io}, id = 1327
HiveTableScan(table=[[tpcds_bin_orc_200.catalog_returns]]): rowcount = 2.8798881E7, cumulative cost = {0}, id = 1137
HiveProject(d_date_sk=[$0], d_date=[$2]): rowcount = 1.0, cumulative cost = {0.0 rows, 0.0 cpu, 0.0 io}, id = 1371
HiveFilter(condition=[between(false, $2, CAST('1998-08-04'):DATE, CAST('1998-09-04'):DATE)]): rowcount = 1.0, cumulative cost = {0.0 rows, 0.0 cpu, 0.0 io}, id = 1369
HiveTableScan(table=[[tpcds_bin_orc_200.date_dim]]): rowcount = 73049.0, cumulative cost = {0}, id = 1138
HiveProject(cp_catalog_page_sk=[$0], cp_catalog_page_id=[$1]): rowcount = 11718.0, cumulative cost = {0.0 rows, 0.0 cpu, 0.0 io}, id = 1375
HiveTableScan(table=[[tpcds_bin_orc_200.catalog_page]]): rowcount = 11718.0, cumulative cost = {0}, id = 1139
HiveProject(i_item_sk=[$0], i_current_price=[$5]): rowcount = 16000.0, cumulative cost = {0.0 rows, 0.0 cpu, 0.0 io}, id = 1381
HiveFilter(condition=[>($5, 5E1)]): rowcount = 16000.0, cumulative cost = {0.0 rows, 0.0 cpu, 0.0 io}, id = 1379
HiveTableScan(table=[[tpcds_bin_orc_200.item]]): rowcount = 48000.0, cumulative cost = {0}, id = 1140
HiveProject(p_promo_sk=[$0], p_channel_tv=[$11]): rowcount = 225.0, cumulative cost = {0.0 rows, 0.0 cpu, 0.0 io}, id = 1387
HiveFilter(condition=[=($11, 'N')]): rowcount = 225.0, cumulative cost = {0.0 rows, 0.0 cpu, 0.0 io}, id = 1385
HiveTableScan(table=[[tpcds_bin_orc_200.promotion]]): rowcount = 450.0, cumulative cost = {0}, id = 1141
{code}
Explain plan
{code}
STAGE DEPENDENCIES:
Stage-1 is a root stage
Stage-0 depends on stages: Stage-1
STAGE PLANS:
Stage: Stage-1
Tez
Edges:
Map 1 <- Map 2 (BROADCAST_EDGE)
Map 3 <- Map 1 (BROADCAST_EDGE)
Map 4 <- Map 3 (BROADCAST_EDGE), Map 6 (BROADCAST_EDGE), Map 7 (BROADCAST_EDGE)
Reducer 5 <- Map 4 (SIMPLE_EDGE)
DagName: mmokhtar_20150217223434_d0ab6fa9-a1a3-47a5-8138-ba7435d9aea5:4
Vertices:
Map 1
Map Operator Tree:
TableScan
alias: catalog_sales
filterExpr: (((cs_sold_date_sk is not null and cs_catalog_page_sk is not null) and cs_item_sk is not null) and cs_promo_sk is not null) (type: boolean)
Statistics: Num rows: 286549727 Data size: 65825832570 Basic stats: COMPLETE Column stats: COMPLETE
Filter Operator
predicate: (((cs_sold_date_sk is not null and cs_catalog_page_sk is not null) and cs_item_sk is not null) and cs_promo_sk is not null) (type: boolean)
Statistics: Num rows: 285112475 Data size: 7974560516 Basic stats: COMPLETE Column stats: COMPLETE
Select Operator
expressions: cs_sold_date_sk (type: int), cs_catalog_page_sk (type: int), cs_item_sk (type: int), cs_promo_sk (type: int), cs_order_number (type: int), cs_ext_sales_price (type: float), cs_net_profit (type: float)
outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5, _col6
Statistics: Num rows: 285112475 Data size: 7974560516 Basic stats: COMPLETE Column stats: COMPLETE
Map Join Operator
condition map:
Left Outer Join0 to 1
keys:
0 _col2 (type: int), _col4 (type: int)
1 _col0 (type: int), _col1 (type: int)
outputColumnNames: _col0, _col1, _col2, _col3, _col5, _col6, _col9, _col10
input vertices:
1 Map 2
Statistics: Num rows: 2911 Data size: 93152 Basic stats: COMPLETE Column stats: COMPLETE
Reduce Output Operator
key expressions: _col0 (type: int)
sort order: +
Map-reduce partition columns: _col0 (type: int)
Statistics: Num rows: 2911 Data size: 93152 Basic stats: COMPLETE Column stats: COMPLETE
value expressions: _col1 (type: int), _col2 (type: int), _col3 (type: int), _col5 (type: float), _col6 (type: float), _col9 (type: float), _col10 (type: float)
Execution mode: vectorized
Map 2
Map Operator Tree:
TableScan
alias: catalog_returns
filterExpr: cr_item_sk is not null (type: boolean)
Statistics: Num rows: 28798881 Data size: 5764329494 Basic stats: COMPLETE Column stats: COMPLETE
Filter Operator
predicate: cr_item_sk is not null (type: boolean)
Statistics: Num rows: 28798881 Data size: 456171072 Basic stats: COMPLETE Column stats: COMPLETE
Select Operator
expressions: cr_item_sk (type: int), cr_order_number (type: int), cr_return_amount (type: float), cr_net_loss (type: float)
outputColumnNames: _col0, _col1, _col2, _col3
Statistics: Num rows: 28798881 Data size: 456171072 Basic stats: COMPLETE Column stats: COMPLETE
Reduce Output Operator
key expressions: _col0 (type: int), _col1 (type: int)
sort order: ++
Map-reduce partition columns: _col0 (type: int), _col1 (type: int)
Statistics: Num rows: 28798881 Data size: 456171072 Basic stats: COMPLETE Column stats: COMPLETE
value expressions: _col2 (type: float), _col3 (type: float)
Execution mode: vectorized
Map 3
Map Operator Tree:
TableScan
alias: date_dim
filterExpr: (d_date BETWEEN 1998-08-04 AND 1998-09-04 and d_date_sk is not null) (type: boolean)
Statistics: Num rows: 73049 Data size: 81741831 Basic stats: COMPLETE Column stats: COMPLETE
Filter Operator
predicate: (d_date BETWEEN 1998-08-04 AND 1998-09-04 and d_date_sk is not null) (type: boolean)
Statistics: Num rows: 36524 Data size: 3579352 Basic stats: COMPLETE Column stats: COMPLETE
Select Operator
expressions: d_date_sk (type: int)
outputColumnNames: _col0
Statistics: Num rows: 36524 Data size: 146096 Basic stats: COMPLETE Column stats: COMPLETE
Map Join Operator
condition map:
Inner Join 0 to 1
keys:
0 _col0 (type: int)
1 _col0 (type: int)
outputColumnNames: _col1, _col2, _col3, _col5, _col6, _col9, _col10
input vertices:
0 Map 1
Statistics: Num rows: 1456 Data size: 40768 Basic stats: COMPLETE Column stats: COMPLETE
Reduce Output Operator
key expressions: _col1 (type: int)
sort order: +
Map-reduce partition columns: _col1 (type: int)
Statistics: Num rows: 1456 Data size: 40768 Basic stats: COMPLETE Column stats: COMPLETE
value expressions: _col2 (type: int), _col3 (type: int), _col5 (type: float), _col6 (type: float), _col9 (type: float), _col10 (type: float)
Execution mode: vectorized
Map 4
Map Operator Tree:
TableScan
alias: catalog_page
filterExpr: cp_catalog_page_sk is not null (type: boolean)
Statistics: Num rows: 11718 Data size: 5400282 Basic stats: COMPLETE Column stats: COMPLETE
Filter Operator
predicate: cp_catalog_page_sk is not null (type: boolean)
Statistics: Num rows: 11718 Data size: 1218672 Basic stats: COMPLETE Column stats: COMPLETE
Select Operator
expressions: cp_catalog_page_sk (type: int), cp_catalog_page_id (type: string)
outputColumnNames: _col0, _col1
Statistics: Num rows: 11718 Data size: 1218672 Basic stats: COMPLETE Column stats: COMPLETE
Map Join Operator
condition map:
Inner Join 0 to 1
keys:
0 _col1 (type: int)
1 _col0 (type: int)
outputColumnNames: _col2, _col3, _col5, _col6, _col9, _col10, _col14
input vertices:
0 Map 3
Statistics: Num rows: 1456 Data size: 180544 Basic stats: COMPLETE Column stats: COMPLETE
Map Join Operator
condition map:
Inner Join 0 to 1
keys:
0 _col2 (type: int)
1 _col0 (type: int)
outputColumnNames: _col3, _col5, _col6, _col9, _col10, _col14
input vertices:
1 Map 6
Statistics: Num rows: 486 Data size: 58320 Basic stats: COMPLETE Column stats: COMPLETE
Map Join Operator
condition map:
Inner Join 0 to 1
keys:
0 _col3 (type: int)
1 _col0 (type: int)
outputColumnNames: _col5, _col6, _col9, _col10, _col14
input vertices:
1 Map 7
Statistics: Num rows: 243 Data size: 28188 Basic stats: COMPLETE Column stats: COMPLETE
Select Operator
expressions: _col14 (type: string), _col5 (type: float), COALESCE(_col9,0) (type: float), (_col6 - COALESCE(_col10,0)) (type: float)
outputColumnNames: _col0, _col1, _col2, _col3
Statistics: Num rows: 243 Data size: 28188 Basic stats: COMPLETE Column stats: COMPLETE
Group By Operator
aggregations: sum(_col1), sum(_col2), sum(_col3)
keys: _col0 (type: string)
mode: hash
outputColumnNames: _col0, _col1, _col2, _col3
Statistics: Num rows: 121 Data size: 15004 Basic stats: COMPLETE Column stats: COMPLETE
Reduce Output Operator
key expressions: _col0 (type: string)
sort order: +
Map-reduce partition columns: _col0 (type: string)
Statistics: Num rows: 121 Data size: 15004 Basic stats: COMPLETE Column stats: COMPLETE
value expressions: _col1 (type: double), _col2 (type: double), _col3 (type: double)
Execution mode: vectorized
Map 6
Map Operator Tree:
TableScan
alias: item
filterExpr: ((i_current_price > 50.0) and i_item_sk is not null) (type: boolean)
Statistics: Num rows: 48000 Data size: 68732712 Basic stats: COMPLETE Column stats: COMPLETE
Filter Operator
predicate: ((i_current_price > 50.0) and i_item_sk is not null) (type: boolean)
Statistics: Num rows: 16000 Data size: 127832 Basic stats: COMPLETE Column stats: COMPLETE
Select Operator
expressions: i_item_sk (type: int)
outputColumnNames: _col0
Statistics: Num rows: 16000 Data size: 64000 Basic stats: COMPLETE Column stats: COMPLETE
Reduce Output Operator
key expressions: _col0 (type: int)
sort order: +
Map-reduce partition columns: _col0 (type: int)
Statistics: Num rows: 16000 Data size: 64000 Basic stats: COMPLETE Column stats: COMPLETE
Execution mode: vectorized
Map 7
Map Operator Tree:
TableScan
alias: promotion
filterExpr: ((p_channel_tv = 'N') and p_promo_sk is not null) (type: boolean)
Statistics: Num rows: 450 Data size: 530848 Basic stats: COMPLETE Column stats: COMPLETE
Filter Operator
predicate: ((p_channel_tv = 'N') and p_promo_sk is not null) (type: boolean)
Statistics: Num rows: 225 Data size: 20025 Basic stats: COMPLETE Column stats: COMPLETE
Select Operator
expressions: p_promo_sk (type: int)
outputColumnNames: _col0
Statistics: Num rows: 225 Data size: 900 Basic stats: COMPLETE Column stats: COMPLETE
Reduce Output Operator
key expressions: _col0 (type: int)
sort order: +
Map-reduce partition columns: _col0 (type: int)
Statistics: Num rows: 225 Data size: 900 Basic stats: COMPLETE Column stats: COMPLETE
Execution mode: vectorized
Reducer 5
Reduce Operator Tree:
Group By Operator
aggregations: sum(VALUE._col0), sum(VALUE._col1), sum(VALUE._col2)
keys: KEY._col0 (type: string)
mode: mergepartial
outputColumnNames: _col0, _col1, _col2, _col3
Statistics: Num rows: 121 Data size: 15004 Basic stats: COMPLETE Column stats: COMPLETE
File Output Operator
compressed: false
Statistics: Num rows: 121 Data size: 15004 Basic stats: COMPLETE Column stats: COMPLETE
table:
input format: org.apache.hadoop.mapred.TextInputFormat
output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
Stage: Stage-0
Fetch Operator
limit: -1
Processor Tree:
ListSink
{code}
> CBO : inefficient join order created for left join outer condition
> ------------------------------------------------------------------
>
> Key: HIVE-9713
> URL: https://issues.apache.org/jira/browse/HIVE-9713
> Project: Hive
> Issue Type: Bug
> Components: CBO
> Affects Versions: 0.14.0
> Reporter: Mostafa Mokhtar
> Assignee: Laljo John Pullokkaran
> Fix For: 1.2.0
>
>
> For the query below which is a subset of TPC-DS Query 80, CBO joins catalog_sales with catalog_returns first although the CE of the join is relatively high.
> catalog_sales should be joined with the selective dimension tables first.
> {code}
> select count(*)
> from
> catalog_sales
> ,warehouse
> ,date_dim
> ,time_dim
> ,ship_mode
> where
> catalog_sales.cs_warehouse_sk = warehouse.w_warehouse_sk
> and catalog_sales.cs_sold_date_sk = date_dim.d_date_sk
> and catalog_sales.cs_sold_time_sk = time_dim.t_time_sk
> and catalog_sales.cs_ship_mode_sk = ship_mode.sm_ship_mode_sk
> and d_year = 2002
> and t_time between 49530 AND 49530+28800
> and sm_carrier in ('DIAMOND','AIRBORNE')
> group by
> w_warehouse_name
> ,w_warehouse_sq_ft
> ,w_city
> ,w_county
> ,w_state
> ,w_country
> ,d_year
> {code}
> Logical plan from CBO debug logs
> {code}
> 2015-02-17 22:34:04,577 DEBUG [main]: parse.CalcitePlanner (CalcitePlanner.java:apply(743)) - Plan After Join Reordering:
> HiveProject(catalog_page_id=[$0], sales=[$1], returns=[$2], profit=[$3]): rowcount = 10590.0, cumulative cost = {8.25242586823495E15 rows, 0.0 cpu, 0.0 io}, id = 1395
> HiveAggregate(group=[{0}], agg#0=[sum($1)], agg#1=[sum($2)], agg#2=[sum($3)]): rowcount = 10590.0, cumulative cost = {8.25242586823495E15 rows, 0.0 cpu, 0.0 io}, id = 1393
> HiveProject($f0=[$14], $f1=[$5], $f2=[coalesce($9, 0)], $f3=[-($6, coalesce($10, 0))]): rowcount = 1.368586152225262E8, cumulative cost = {8.25242586823495E15 rows, 0.0 cpu, 0.0 io}, id = 1391
> HiveJoin(condition=[=($3, $17)], joinType=[inner]): rowcount = 1.368586152225262E8, cumulative cost = {8.25242586823495E15 rows, 0.0 cpu, 0.0 io}, id = 1508
> HiveJoin(condition=[=($2, $15)], joinType=[inner]): rowcount = 2.737172304450524E8, cumulative cost = {8.252425594517495E15 rows, 0.0 cpu, 0.0 io}, id = 1506
> HiveJoin(condition=[=($1, $13)], joinType=[inner]): rowcount = 8.211516913351573E8, cumulative cost = {8.252424773349804E15 rows, 0.0 cpu, 0.0 io}, id = 1504
> HiveJoin(condition=[=($0, $11)], joinType=[inner]): rowcount = 1.1296953399027347E11, cumulative cost = {8.252311803804096E15 rows, 0.0 cpu, 0.0 io}, id = 1418
> HiveJoin(condition=[AND(=($2, $7), =($4, $8))], joinType=[left]): rowcount = 8.252311488455487E15, cumulative cost = {3.15348608E8 rows, 0.0 cpu, 0.0 io}, id = 1413
> HiveProject(cs_sold_date_sk=[$0], cs_catalog_page_sk=[$12], cs_item_sk=[$15], cs_promo_sk=[$16], cs_order_number=[$17], cs_ext_sales_price=[$23], cs_net_profit=[$33]): rowcount = 2.86549727E8, cumulative cost = {0.0 rows, 0.0 cpu, 0.0 io}, id = 1324
> HiveTableScan(table=[[tpcds_bin_orc_200.catalog_sales]]): rowcount = 2.86549727E8, cumulative cost = {0}, id = 1136
> HiveProject(cr_item_sk=[$2], cr_order_number=[$16], cr_return_amount=[$18], cr_net_loss=[$26]): rowcount = 2.8798881E7, cumulative cost = {0.0 rows, 0.0 cpu, 0.0 io}, id = 1327
> HiveTableScan(table=[[tpcds_bin_orc_200.catalog_returns]]): rowcount = 2.8798881E7, cumulative cost = {0}, id = 1137
> HiveProject(d_date_sk=[$0], d_date=[$2]): rowcount = 1.0, cumulative cost = {0.0 rows, 0.0 cpu, 0.0 io}, id = 1371
> HiveFilter(condition=[between(false, $2, CAST('1998-08-04'):DATE, CAST('1998-09-04'):DATE)]): rowcount = 1.0, cumulative cost = {0.0 rows, 0.0 cpu, 0.0 io}, id = 1369
> HiveTableScan(table=[[tpcds_bin_orc_200.date_dim]]): rowcount = 73049.0, cumulative cost = {0}, id = 1138
> HiveProject(cp_catalog_page_sk=[$0], cp_catalog_page_id=[$1]): rowcount = 11718.0, cumulative cost = {0.0 rows, 0.0 cpu, 0.0 io}, id = 1375
> HiveTableScan(table=[[tpcds_bin_orc_200.catalog_page]]): rowcount = 11718.0, cumulative cost = {0}, id = 1139
> HiveProject(i_item_sk=[$0], i_current_price=[$5]): rowcount = 16000.0, cumulative cost = {0.0 rows, 0.0 cpu, 0.0 io}, id = 1381
> HiveFilter(condition=[>($5, 5E1)]): rowcount = 16000.0, cumulative cost = {0.0 rows, 0.0 cpu, 0.0 io}, id = 1379
> HiveTableScan(table=[[tpcds_bin_orc_200.item]]): rowcount = 48000.0, cumulative cost = {0}, id = 1140
> HiveProject(p_promo_sk=[$0], p_channel_tv=[$11]): rowcount = 225.0, cumulative cost = {0.0 rows, 0.0 cpu, 0.0 io}, id = 1387
> HiveFilter(condition=[=($11, 'N')]): rowcount = 225.0, cumulative cost = {0.0 rows, 0.0 cpu, 0.0 io}, id = 1385
> HiveTableScan(table=[[tpcds_bin_orc_200.promotion]]): rowcount = 450.0, cumulative cost = {0}, id = 1141
> {code}
> Explain plan
> {code}
> STAGE DEPENDENCIES:
> Stage-1 is a root stage
> Stage-0 depends on stages: Stage-1
> STAGE PLANS:
> Stage: Stage-1
> Tez
> Edges:
> Map 1 <- Map 2 (BROADCAST_EDGE)
> Map 3 <- Map 1 (BROADCAST_EDGE)
> Map 4 <- Map 3 (BROADCAST_EDGE), Map 6 (BROADCAST_EDGE), Map 7 (BROADCAST_EDGE)
> Reducer 5 <- Map 4 (SIMPLE_EDGE)
> DagName: mmokhtar_20150217223434_d0ab6fa9-a1a3-47a5-8138-ba7435d9aea5:4
> Vertices:
> Map 1
> Map Operator Tree:
> TableScan
> alias: catalog_sales
> filterExpr: (((cs_sold_date_sk is not null and cs_catalog_page_sk is not null) and cs_item_sk is not null) and cs_promo_sk is not null) (type: boolean)
> Statistics: Num rows: 286549727 Data size: 65825832570 Basic stats: COMPLETE Column stats: COMPLETE
> Filter Operator
> predicate: (((cs_sold_date_sk is not null and cs_catalog_page_sk is not null) and cs_item_sk is not null) and cs_promo_sk is not null) (type: boolean)
> Statistics: Num rows: 285112475 Data size: 7974560516 Basic stats: COMPLETE Column stats: COMPLETE
> Select Operator
> expressions: cs_sold_date_sk (type: int), cs_catalog_page_sk (type: int), cs_item_sk (type: int), cs_promo_sk (type: int), cs_order_number (type: int), cs_ext_sales_price (type: float), cs_net_profit (type: float)
> outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5, _col6
> Statistics: Num rows: 285112475 Data size: 7974560516 Basic stats: COMPLETE Column stats: COMPLETE
> Map Join Operator
> condition map:
> Left Outer Join0 to 1
> keys:
> 0 _col2 (type: int), _col4 (type: int)
> 1 _col0 (type: int), _col1 (type: int)
> outputColumnNames: _col0, _col1, _col2, _col3, _col5, _col6, _col9, _col10
> input vertices:
> 1 Map 2
> Statistics: Num rows: 2911 Data size: 93152 Basic stats: COMPLETE Column stats: COMPLETE
> Reduce Output Operator
> key expressions: _col0 (type: int)
> sort order: +
> Map-reduce partition columns: _col0 (type: int)
> Statistics: Num rows: 2911 Data size: 93152 Basic stats: COMPLETE Column stats: COMPLETE
> value expressions: _col1 (type: int), _col2 (type: int), _col3 (type: int), _col5 (type: float), _col6 (type: float), _col9 (type: float), _col10 (type: float)
> Execution mode: vectorized
> Map 2
> Map Operator Tree:
> TableScan
> alias: catalog_returns
> filterExpr: cr_item_sk is not null (type: boolean)
> Statistics: Num rows: 28798881 Data size: 5764329494 Basic stats: COMPLETE Column stats: COMPLETE
> Filter Operator
> predicate: cr_item_sk is not null (type: boolean)
> Statistics: Num rows: 28798881 Data size: 456171072 Basic stats: COMPLETE Column stats: COMPLETE
> Select Operator
> expressions: cr_item_sk (type: int), cr_order_number (type: int), cr_return_amount (type: float), cr_net_loss (type: float)
> outputColumnNames: _col0, _col1, _col2, _col3
> Statistics: Num rows: 28798881 Data size: 456171072 Basic stats: COMPLETE Column stats: COMPLETE
> Reduce Output Operator
> key expressions: _col0 (type: int), _col1 (type: int)
> sort order: ++
> Map-reduce partition columns: _col0 (type: int), _col1 (type: int)
> Statistics: Num rows: 28798881 Data size: 456171072 Basic stats: COMPLETE Column stats: COMPLETE
> value expressions: _col2 (type: float), _col3 (type: float)
> Execution mode: vectorized
> Map 3
> Map Operator Tree:
> TableScan
> alias: date_dim
> filterExpr: (d_date BETWEEN 1998-08-04 AND 1998-09-04 and d_date_sk is not null) (type: boolean)
> Statistics: Num rows: 73049 Data size: 81741831 Basic stats: COMPLETE Column stats: COMPLETE
> Filter Operator
> predicate: (d_date BETWEEN 1998-08-04 AND 1998-09-04 and d_date_sk is not null) (type: boolean)
> Statistics: Num rows: 36524 Data size: 3579352 Basic stats: COMPLETE Column stats: COMPLETE
> Select Operator
> expressions: d_date_sk (type: int)
> outputColumnNames: _col0
> Statistics: Num rows: 36524 Data size: 146096 Basic stats: COMPLETE Column stats: COMPLETE
> Map Join Operator
> condition map:
> Inner Join 0 to 1
> keys:
> 0 _col0 (type: int)
> 1 _col0 (type: int)
> outputColumnNames: _col1, _col2, _col3, _col5, _col6, _col9, _col10
> input vertices:
> 0 Map 1
> Statistics: Num rows: 1456 Data size: 40768 Basic stats: COMPLETE Column stats: COMPLETE
> Reduce Output Operator
> key expressions: _col1 (type: int)
> sort order: +
> Map-reduce partition columns: _col1 (type: int)
> Statistics: Num rows: 1456 Data size: 40768 Basic stats: COMPLETE Column stats: COMPLETE
> value expressions: _col2 (type: int), _col3 (type: int), _col5 (type: float), _col6 (type: float), _col9 (type: float), _col10 (type: float)
> Execution mode: vectorized
> Map 4
> Map Operator Tree:
> TableScan
> alias: catalog_page
> filterExpr: cp_catalog_page_sk is not null (type: boolean)
> Statistics: Num rows: 11718 Data size: 5400282 Basic stats: COMPLETE Column stats: COMPLETE
> Filter Operator
> predicate: cp_catalog_page_sk is not null (type: boolean)
> Statistics: Num rows: 11718 Data size: 1218672 Basic stats: COMPLETE Column stats: COMPLETE
> Select Operator
> expressions: cp_catalog_page_sk (type: int), cp_catalog_page_id (type: string)
> outputColumnNames: _col0, _col1
> Statistics: Num rows: 11718 Data size: 1218672 Basic stats: COMPLETE Column stats: COMPLETE
> Map Join Operator
> condition map:
> Inner Join 0 to 1
> keys:
> 0 _col1 (type: int)
> 1 _col0 (type: int)
> outputColumnNames: _col2, _col3, _col5, _col6, _col9, _col10, _col14
> input vertices:
> 0 Map 3
> Statistics: Num rows: 1456 Data size: 180544 Basic stats: COMPLETE Column stats: COMPLETE
> Map Join Operator
> condition map:
> Inner Join 0 to 1
> keys:
> 0 _col2 (type: int)
> 1 _col0 (type: int)
> outputColumnNames: _col3, _col5, _col6, _col9, _col10, _col14
> input vertices:
> 1 Map 6
> Statistics: Num rows: 486 Data size: 58320 Basic stats: COMPLETE Column stats: COMPLETE
> Map Join Operator
> condition map:
> Inner Join 0 to 1
> keys:
> 0 _col3 (type: int)
> 1 _col0 (type: int)
> outputColumnNames: _col5, _col6, _col9, _col10, _col14
> input vertices:
> 1 Map 7
> Statistics: Num rows: 243 Data size: 28188 Basic stats: COMPLETE Column stats: COMPLETE
> Select Operator
> expressions: _col14 (type: string), _col5 (type: float), COALESCE(_col9,0) (type: float), (_col6 - COALESCE(_col10,0)) (type: float)
> outputColumnNames: _col0, _col1, _col2, _col3
> Statistics: Num rows: 243 Data size: 28188 Basic stats: COMPLETE Column stats: COMPLETE
> Group By Operator
> aggregations: sum(_col1), sum(_col2), sum(_col3)
> keys: _col0 (type: string)
> mode: hash
> outputColumnNames: _col0, _col1, _col2, _col3
> Statistics: Num rows: 121 Data size: 15004 Basic stats: COMPLETE Column stats: COMPLETE
> Reduce Output Operator
> key expressions: _col0 (type: string)
> sort order: +
> Map-reduce partition columns: _col0 (type: string)
> Statistics: Num rows: 121 Data size: 15004 Basic stats: COMPLETE Column stats: COMPLETE
> value expressions: _col1 (type: double), _col2 (type: double), _col3 (type: double)
> Execution mode: vectorized
> Map 6
> Map Operator Tree:
> TableScan
> alias: item
> filterExpr: ((i_current_price > 50.0) and i_item_sk is not null) (type: boolean)
> Statistics: Num rows: 48000 Data size: 68732712 Basic stats: COMPLETE Column stats: COMPLETE
> Filter Operator
> predicate: ((i_current_price > 50.0) and i_item_sk is not null) (type: boolean)
> Statistics: Num rows: 16000 Data size: 127832 Basic stats: COMPLETE Column stats: COMPLETE
> Select Operator
> expressions: i_item_sk (type: int)
> outputColumnNames: _col0
> Statistics: Num rows: 16000 Data size: 64000 Basic stats: COMPLETE Column stats: COMPLETE
> Reduce Output Operator
> key expressions: _col0 (type: int)
> sort order: +
> Map-reduce partition columns: _col0 (type: int)
> Statistics: Num rows: 16000 Data size: 64000 Basic stats: COMPLETE Column stats: COMPLETE
> Execution mode: vectorized
> Map 7
> Map Operator Tree:
> TableScan
> alias: promotion
> filterExpr: ((p_channel_tv = 'N') and p_promo_sk is not null) (type: boolean)
> Statistics: Num rows: 450 Data size: 530848 Basic stats: COMPLETE Column stats: COMPLETE
> Filter Operator
> predicate: ((p_channel_tv = 'N') and p_promo_sk is not null) (type: boolean)
> Statistics: Num rows: 225 Data size: 20025 Basic stats: COMPLETE Column stats: COMPLETE
> Select Operator
> expressions: p_promo_sk (type: int)
> outputColumnNames: _col0
> Statistics: Num rows: 225 Data size: 900 Basic stats: COMPLETE Column stats: COMPLETE
> Reduce Output Operator
> key expressions: _col0 (type: int)
> sort order: +
> Map-reduce partition columns: _col0 (type: int)
> Statistics: Num rows: 225 Data size: 900 Basic stats: COMPLETE Column stats: COMPLETE
> Execution mode: vectorized
> Reducer 5
> Reduce Operator Tree:
> Group By Operator
> aggregations: sum(VALUE._col0), sum(VALUE._col1), sum(VALUE._col2)
> keys: KEY._col0 (type: string)
> mode: mergepartial
> outputColumnNames: _col0, _col1, _col2, _col3
> Statistics: Num rows: 121 Data size: 15004 Basic stats: COMPLETE Column stats: COMPLETE
> File Output Operator
> compressed: false
> Statistics: Num rows: 121 Data size: 15004 Basic stats: COMPLETE Column stats: COMPLETE
> table:
> input format: org.apache.hadoop.mapred.TextInputFormat
> output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
> serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
> Stage: Stage-0
> Fetch Operator
> limit: -1
> Processor Tree:
> ListSink
> {code}
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)