You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@hive.apache.org by "Mostafa Mokhtar (JIRA)" <ji...@apache.org> on 2015/03/27 00:26:52 UTC
[jira] [Created] (HIVE-10107) Union All : Vertex missing stats
resulting in OOM and in-efficient plans
Mostafa Mokhtar created HIVE-10107:
--------------------------------------
Summary: Union All : Vertex missing stats resulting in OOM and in-efficient plans
Key: HIVE-10107
URL: https://issues.apache.org/jira/browse/HIVE-10107
Project: Hive
Issue Type: Bug
Components: Physical Optimizer
Affects Versions: 0.14.0
Reporter: Mostafa Mokhtar
Assignee: Prasanth Jayachandran
Fix For: 1.2.0
Reducer Vertices sending data to a Union all edge are missing statistics and as a result we either use very few reducers in the UNION ALL edge or decide to broadcast the results of UNION ALL.
Query
{code}
select
count(*) rowcount
from
(select
ss_item_sk, ss_ticket_number, ss_store_sk
from
store_sales a, store_returns b
where
a.ss_item_sk = b.sr_item_sk
and a.ss_ticket_number = b.sr_ticket_number union all select
ss_item_sk, ss_ticket_number, ss_store_sk
from
store_sales c, store_returns d
where
c.ss_item_sk = d.sr_item_sk
and c.ss_ticket_number = d.sr_ticket_number) t
group by t.ss_store_sk , t.ss_item_sk , t.ss_ticket_number
having rowcount > 100000000;
{code}
Plan snippet
{code}
Edges:
Reducer 2 <- Map 1 (SIMPLE_EDGE), Map 5 (SIMPLE_EDGE), Union 3 (CONTAINS)
Reducer 4 <- Union 3 (SIMPLE_EDGE)
Reducer 7 <- Map 6 (SIMPLE_EDGE), Map 8 (SIMPLE_EDGE), Union 3 (CONTAINS)
Reducer 4
Reduce Operator Tree:
Group By Operator
aggregations: count(VALUE._col0)
keys: KEY._col0 (type: int), KEY._col1 (type: int), KEY._col2 (type: int)
mode: mergepartial
outputColumnNames: _col0, _col1, _col2, _col3
Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: COMPLETE
Filter Operator
predicate: (_col3 > 100000000) (type: boolean)
Statistics: Num rows: 0 Data size: 0 Basic stats: NONE Column stats: COMPLETE
Select Operator
expressions: _col3 (type: bigint)
outputColumnNames: _col0
Statistics: Num rows: 0 Data size: 0 Basic stats: NONE Column stats: COMPLETE
File Output Operator
compressed: false
Statistics: Num rows: 0 Data size: 0 Basic stats: NONE Column stats: COMPLETE
table:
input format: org.apache.hadoop.mapred.TextInputFormat
output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
Reducer 7
Reduce Operator Tree:
Merge Join Operator
condition map:
Inner Join 0 to 1
keys:
0 ss_item_sk (type: int), ss_ticket_number (type: int)
1 sr_item_sk (type: int), sr_ticket_number (type: int)
outputColumnNames: _col1, _col6, _col8, _col27, _col34
Filter Operator
predicate: ((_col1 = _col27) and (_col8 = _col34)) (type: boolean)
Select Operator
expressions: _col1 (type: int), _col8 (type: int), _col6 (type: int)
outputColumnNames: _col0, _col1, _col2
Group By Operator
aggregations: count()
keys: _col2 (type: int), _col0 (type: int), _col1 (type: int)
mode: hash
outputColumnNames: _col0, _col1, _col2, _col3
Reduce Output Operator
key expressions: _col0 (type: int), _col1 (type: int), _col2 (type: int)
sort order: +++
Map-reduce partition columns: _col0 (type: int), _col1 (type: int), _col2 (type: int)
value expressions: _col3 (type: bigint)
{code}
The full explain plan
{code}
STAGE DEPENDENCIES:
Stage-1 is a root stage
Stage-0 depends on stages: Stage-1
STAGE PLANS:
Stage: Stage-1
Tez
Edges:
Reducer 2 <- Map 1 (SIMPLE_EDGE), Map 5 (SIMPLE_EDGE), Union 3 (CONTAINS)
Reducer 4 <- Union 3 (SIMPLE_EDGE)
Reducer 7 <- Map 6 (SIMPLE_EDGE), Map 8 (SIMPLE_EDGE), Union 3 (CONTAINS)
DagName: mmokhtar_20150214132727_95878ea1-ee6a-4b7e-bc86-843abd5cf664:7
Vertices:
Map 1
Map Operator Tree:
TableScan
alias: a
filterExpr: (ss_item_sk is not null and ss_ticket_number is not null) (type: boolean)
Statistics: Num rows: 550076554 Data size: 47370018896 Basic stats: COMPLETE Column stats: COMPLETE
Filter Operator
predicate: (ss_item_sk is not null and ss_ticket_number is not null) (type: boolean)
Statistics: Num rows: 550076554 Data size: 6549093948 Basic stats: COMPLETE Column stats: COMPLETE
Reduce Output Operator
key expressions: ss_item_sk (type: int), ss_ticket_number (type: int)
sort order: ++
Map-reduce partition columns: ss_item_sk (type: int), ss_ticket_number (type: int)
Statistics: Num rows: 550076554 Data size: 6549093948 Basic stats: COMPLETE Column stats: COMPLETE
value expressions: ss_store_sk (type: int)
Map 5
Map Operator Tree:
TableScan
alias: b
filterExpr: (sr_item_sk is not null and sr_ticket_number is not null) (type: boolean)
Statistics: Num rows: 55578005 Data size: 4155315616 Basic stats: COMPLETE Column stats: COMPLETE
Filter Operator
predicate: (sr_item_sk is not null and sr_ticket_number is not null) (type: boolean)
Statistics: Num rows: 55578005 Data size: 444624040 Basic stats: COMPLETE Column stats: COMPLETE
Reduce Output Operator
key expressions: sr_item_sk (type: int), sr_ticket_number (type: int)
sort order: ++
Map-reduce partition columns: sr_item_sk (type: int), sr_ticket_number (type: int)
Statistics: Num rows: 55578005 Data size: 444624040 Basic stats: COMPLETE Column stats: COMPLETE
Map 6
Map Operator Tree:
TableScan
alias: c
filterExpr: (ss_item_sk is not null and ss_ticket_number is not null) (type: boolean)
Statistics: Num rows: 550076554 Data size: 47370018896 Basic stats: COMPLETE Column stats: COMPLETE
Filter Operator
predicate: (ss_item_sk is not null and ss_ticket_number is not null) (type: boolean)
Statistics: Num rows: 550076554 Data size: 6549093948 Basic stats: COMPLETE Column stats: COMPLETE
Reduce Output Operator
key expressions: ss_item_sk (type: int), ss_ticket_number (type: int)
sort order: ++
Map-reduce partition columns: ss_item_sk (type: int), ss_ticket_number (type: int)
Statistics: Num rows: 550076554 Data size: 6549093948 Basic stats: COMPLETE Column stats: COMPLETE
value expressions: ss_store_sk (type: int)
Map 8
Map Operator Tree:
TableScan
alias: d
filterExpr: (sr_item_sk is not null and sr_ticket_number is not null) (type: boolean)
Statistics: Num rows: 55578005 Data size: 4155315616 Basic stats: COMPLETE Column stats: COMPLETE
Filter Operator
predicate: (sr_item_sk is not null and sr_ticket_number is not null) (type: boolean)
Statistics: Num rows: 55578005 Data size: 444624040 Basic stats: COMPLETE Column stats: COMPLETE
Reduce Output Operator
key expressions: sr_item_sk (type: int), sr_ticket_number (type: int)
sort order: ++
Map-reduce partition columns: sr_item_sk (type: int), sr_ticket_number (type: int)
Statistics: Num rows: 55578005 Data size: 444624040 Basic stats: COMPLETE Column stats: COMPLETE
Reducer 2
Reduce Operator Tree:
Merge Join Operator
condition map:
Inner Join 0 to 1
keys:
0 ss_item_sk (type: int), ss_ticket_number (type: int)
1 sr_item_sk (type: int), sr_ticket_number (type: int)
outputColumnNames: _col1, _col6, _col8, _col27, _col34
Filter Operator
predicate: ((_col1 = _col27) and (_col8 = _col34)) (type: boolean)
Select Operator
expressions: _col1 (type: int), _col8 (type: int), _col6 (type: int)
outputColumnNames: _col0, _col1, _col2
Group By Operator
aggregations: count()
keys: _col2 (type: int), _col0 (type: int), _col1 (type: int)
mode: hash
outputColumnNames: _col0, _col1, _col2, _col3
Reduce Output Operator
key expressions: _col0 (type: int), _col1 (type: int), _col2 (type: int)
sort order: +++
Map-reduce partition columns: _col0 (type: int), _col1 (type: int), _col2 (type: int)
value expressions: _col3 (type: bigint)
Reducer 4
Reduce Operator Tree:
Group By Operator
aggregations: count(VALUE._col0)
keys: KEY._col0 (type: int), KEY._col1 (type: int), KEY._col2 (type: int)
mode: mergepartial
outputColumnNames: _col0, _col1, _col2, _col3
Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: COMPLETE
Filter Operator
predicate: (_col3 > 100000000) (type: boolean)
Statistics: Num rows: 0 Data size: 0 Basic stats: NONE Column stats: COMPLETE
Select Operator
expressions: _col3 (type: bigint)
outputColumnNames: _col0
Statistics: Num rows: 0 Data size: 0 Basic stats: NONE Column stats: COMPLETE
File Output Operator
compressed: false
Statistics: Num rows: 0 Data size: 0 Basic stats: NONE Column stats: COMPLETE
table:
input format: org.apache.hadoop.mapred.TextInputFormat
output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
Reducer 7
Reduce Operator Tree:
Merge Join Operator
condition map:
Inner Join 0 to 1
keys:
0 ss_item_sk (type: int), ss_ticket_number (type: int)
1 sr_item_sk (type: int), sr_ticket_number (type: int)
outputColumnNames: _col1, _col6, _col8, _col27, _col34
Filter Operator
predicate: ((_col1 = _col27) and (_col8 = _col34)) (type: boolean)
Select Operator
expressions: _col1 (type: int), _col8 (type: int), _col6 (type: int)
outputColumnNames: _col0, _col1, _col2
Group By Operator
aggregations: count()
keys: _col2 (type: int), _col0 (type: int), _col1 (type: int)
mode: hash
outputColumnNames: _col0, _col1, _col2, _col3
Reduce Output Operator
key expressions: _col0 (type: int), _col1 (type: int), _col2 (type: int)
sort order: +++
Map-reduce partition columns: _col0 (type: int), _col1 (type: int), _col2 (type: int)
value expressions: _col3 (type: bigint)
Union 3
Vertex: Union 3
Stage: Stage-0
Fetch Operator
limit: -1
Processor Tree:
ListSink
{code}
Also TPC-DS Q54 fails with OOM, this failure happens when we chose a different plan.
The OOM happens in vertexName=Map 14
{code}
explain
with my_customers as (
select c_customer_sk
, c_current_addr_sk
from
( select cs_sold_date_sk sold_date_sk,
cs_bill_customer_sk customer_sk,
cs_item_sk item_sk
from catalog_sales
union all
select ws_sold_date_sk sold_date_sk,
ws_bill_customer_sk customer_sk,
ws_item_sk item_sk
from web_sales
) cs_or_ws_sales,
item,
date_dim,
customer
where sold_date_sk = d_date_sk
and item_sk = i_item_sk
and i_category = 'Jewelry'
and i_class = 'football'
and c_customer_sk = cs_or_ws_sales.customer_sk
and d_moy = 3
and d_year = 2000
group by c_customer_sk
, c_current_addr_sk
)
, my_revenue as (
select c_customer_sk,
sum(ss_ext_sales_price) as revenue
from my_customers,
store_sales,
customer_address,
store,
date_dim
where c_current_addr_sk = ca_address_sk
and ca_county = s_county
and ca_state = s_state
and ss_sold_date_sk = d_date_sk
and c_customer_sk = ss_customer_sk
and d_month_seq between (1203)
and (1205)
group by c_customer_sk
)
, segments as
(select cast((revenue/50) as int) as segment
from my_revenue
)
select segment, count(*) as num_customers, segment*50 as segment_base
from segments
group by segment
order by segment, num_customers
limit 100
OK
STAGE DEPENDENCIES:
Stage-1 is a root stage
Stage-0 depends on stages: Stage-1
STAGE PLANS:
Stage: Stage-1
Tez
Edges:
Map 1 <- Map 5 (BROADCAST_EDGE), Map 6 (BROADCAST_EDGE)
Map 10 <- Map 13 (BROADCAST_EDGE), Union 11 (CONTAINS)
Map 12 <- Map 13 (BROADCAST_EDGE), Union 11 (CONTAINS)
Map 14 <- Union 11 (BROADCAST_EDGE)
Map 6 <- Map 7 (BROADCAST_EDGE), Reducer 9 (BROADCAST_EDGE)
Map 8 <- Map 14 (BROADCAST_EDGE)
Reducer 2 <- Map 1 (SIMPLE_EDGE)
Reducer 3 <- Reducer 2 (SIMPLE_EDGE)
Reducer 4 <- Reducer 3 (SIMPLE_EDGE)
Reducer 9 <- Map 8 (SIMPLE_EDGE)
DagName: mmokhtar_20150208232525_9976b56b-8f4b-48c8-a909-aa653c20051c:1
Vertices:
Map 1
Map Operator Tree:
TableScan
alias: store_sales
filterExpr: ss_customer_sk is not null (type: boolean)
Statistics: Num rows: 82510879939 Data size: 6873789738208 Basic stats: COMPLETE Column stats: COMPLETE
Filter Operator
predicate: ss_customer_sk is not null (type: boolean)
Statistics: Num rows: 80566020964 Data size: 951594129356 Basic stats: COMPLETE Column stats: COMPLETE
Select Operator
expressions: ss_customer_sk (type: int), ss_ext_sales_price (type: float), ss_sold_date_sk (type: int)
outputColumnNames: _col0, _col1, _col2
Statistics: Num rows: 80566020964 Data size: 951594129356 Basic stats: COMPLETE Column stats: COMPLETE
Map Join Operator
condition map:
Inner Join 0 to 1
keys:
0 _col2 (type: int)
1 _col0 (type: int)
outputColumnNames: _col0, _col1
input vertices:
1 Map 5
Statistics: Num rows: 90081226648 Data size: 720649813184 Basic stats: COMPLETE Column stats: COMPLETE
Map Join Operator
condition map:
Inner Join 0 to 1
keys:
0 _col0 (type: int)
1 _col5 (type: int)
outputColumnNames: _col1, _col10
input vertices:
1 Map 6
Statistics: Num rows: 99089351460 Data size: 792714811684 Basic stats: COMPLETE Column stats: NONE
Select Operator
expressions: _col10 (type: int), _col1 (type: float)
outputColumnNames: _col0, _col1
Statistics: Num rows: 99089351460 Data size: 792714811684 Basic stats: COMPLETE Column stats: NONE
Group By Operator
aggregations: sum(_col1)
keys: _col0 (type: int)
mode: hash
outputColumnNames: _col0, _col1
Statistics: Num rows: 99089351460 Data size: 792714811684 Basic stats: COMPLETE Column stats: NONE
Reduce Output Operator
key expressions: _col0 (type: int)
sort order: +
Map-reduce partition columns: _col0 (type: int)
Statistics: Num rows: 99089351460 Data size: 792714811684 Basic stats: COMPLETE Column stats: NONE
value expressions: _col1 (type: double)
Execution mode: vectorized
Map 10
Map Operator Tree:
TableScan
alias: catalog_sales
filterExpr: (cs_item_sk is not null and cs_bill_customer_sk is not null) (type: boolean)
Filter Operator
predicate: (cs_item_sk is not null and cs_bill_customer_sk is not null) (type: boolean)
Select Operator
expressions: cs_sold_date_sk (type: int), cs_bill_customer_sk (type: int), cs_item_sk (type: int)
outputColumnNames: _col0, _col1, _col2
Map Join Operator
condition map:
Inner Join 0 to 1
keys:
0 _col0 (type: int)
1 _col0 (type: int)
outputColumnNames: _col1, _col2
input vertices:
1 Map 13
Reduce Output Operator
key expressions: _col2 (type: int)
sort order: +
Map-reduce partition columns: _col2 (type: int)
value expressions: _col1 (type: int)
Execution mode: vectorized
Map 12
Map Operator Tree:
TableScan
alias: web_sales
filterExpr: (ws_item_sk is not null and ws_bill_customer_sk is not null) (type: boolean)
Filter Operator
predicate: (ws_item_sk is not null and ws_bill_customer_sk is not null) (type: boolean)
Select Operator
expressions: ws_sold_date_sk (type: int), ws_bill_customer_sk (type: int), ws_item_sk (type: int)
outputColumnNames: _col0, _col1, _col2
Map Join Operator
condition map:
Inner Join 0 to 1
keys:
0 _col0 (type: int)
1 _col0 (type: int)
outputColumnNames: _col1, _col2
input vertices:
1 Map 13
Reduce Output Operator
key expressions: _col2 (type: int)
sort order: +
Map-reduce partition columns: _col2 (type: int)
value expressions: _col1 (type: int)
Execution mode: vectorized
Map 13
Map Operator Tree:
TableScan
alias: date_dim
filterExpr: (((d_moy = 3) and (d_year = 2000)) and d_date_sk is not null) (type: boolean)
Statistics: Num rows: 73049 Data size: 81741831 Basic stats: COMPLETE Column stats: COMPLETE
Filter Operator
predicate: (((d_moy = 3) and (d_year = 2000)) and d_date_sk is not null) (type: boolean)
Statistics: Num rows: 624 Data size: 7488 Basic stats: COMPLETE Column stats: COMPLETE
Select Operator
expressions: d_date_sk (type: int)
outputColumnNames: _col0
Statistics: Num rows: 624 Data size: 2496 Basic stats: COMPLETE Column stats: COMPLETE
Reduce Output Operator
key expressions: _col0 (type: int)
sort order: +
Map-reduce partition columns: _col0 (type: int)
Statistics: Num rows: 624 Data size: 2496 Basic stats: COMPLETE Column stats: COMPLETE
Select Operator
expressions: _col0 (type: int)
outputColumnNames: _col0
Statistics: Num rows: 624 Data size: 2496 Basic stats: COMPLETE Column stats: COMPLETE
Group By Operator
keys: _col0 (type: int)
mode: hash
outputColumnNames: _col0
Statistics: Num rows: 312 Data size: 1248 Basic stats: COMPLETE Column stats: COMPLETE
Dynamic Partitioning Event Operator
Target Input: catalog_sales
Partition key expr: cs_sold_date_sk
Statistics: Num rows: 312 Data size: 1248 Basic stats: COMPLETE Column stats: COMPLETE
Target column: cs_sold_date_sk
Target Vertex: Map 10
Select Operator
expressions: _col0 (type: int)
outputColumnNames: _col0
Statistics: Num rows: 624 Data size: 2496 Basic stats: COMPLETE Column stats: COMPLETE
Group By Operator
keys: _col0 (type: int)
mode: hash
outputColumnNames: _col0
Statistics: Num rows: 312 Data size: 1248 Basic stats: COMPLETE Column stats: COMPLETE
Dynamic Partitioning Event Operator
Target Input: web_sales
Partition key expr: ws_sold_date_sk
Statistics: Num rows: 312 Data size: 1248 Basic stats: COMPLETE Column stats: COMPLETE
Target column: ws_sold_date_sk
Target Vertex: Map 12
Reduce Output Operator
key expressions: _col0 (type: int)
sort order: +
Map-reduce partition columns: _col0 (type: int)
Statistics: Num rows: 624 Data size: 2496 Basic stats: COMPLETE Column stats: COMPLETE
Execution mode: vectorized
Map 14
Map Operator Tree:
TableScan
alias: item
filterExpr: (((i_category = 'Jewelry') and (i_class = 'football')) and i_item_sk is not null) (type: boolean)
Statistics: Num rows: 462000 Data size: 663862160 Basic stats: COMPLETE Column stats: COMPLETE
Filter Operator
predicate: (((i_category = 'Jewelry') and (i_class = 'football')) and i_item_sk is not null) (type: boolean)
Statistics: Num rows: 4200 Data size: 781200 Basic stats: COMPLETE Column stats: COMPLETE
Select Operator
expressions: i_item_sk (type: int)
outputColumnNames: _col0
Statistics: Num rows: 4200 Data size: 16800 Basic stats: COMPLETE Column stats: COMPLETE
Map Join Operator
condition map:
Inner Join 0 to 1
keys:
0 _col2 (type: int)
1 _col0 (type: int)
outputColumnNames: _col1
input vertices:
0 Union 11
Statistics: Num rows: 79189328781 Data size: 0 Basic stats: PARTIAL Column stats: NONE
Reduce Output Operator
key expressions: _col1 (type: int)
sort order: +
Map-reduce partition columns: _col1 (type: int)
Statistics: Num rows: 79189328781 Data size: 0 Basic stats: PARTIAL Column stats: NONE
Execution mode: vectorized
Map 5
Map Operator Tree:
TableScan
alias: date_dim
filterExpr: (d_month_seq BETWEEN 1203 AND 1205 and d_date_sk is not null) (type: boolean)
Statistics: Num rows: 73049 Data size: 81741831 Basic stats: COMPLETE Column stats: COMPLETE
Filter Operator
predicate: (d_month_seq BETWEEN 1203 AND 1205 and d_date_sk is not null) (type: boolean)
Statistics: Num rows: 36524 Data size: 292192 Basic stats: COMPLETE Column stats: COMPLETE
Select Operator
expressions: d_date_sk (type: int)
outputColumnNames: _col0
Statistics: Num rows: 36524 Data size: 146096 Basic stats: COMPLETE Column stats: COMPLETE
Reduce Output Operator
key expressions: _col0 (type: int)
sort order: +
Map-reduce partition columns: _col0 (type: int)
Statistics: Num rows: 36524 Data size: 146096 Basic stats: COMPLETE Column stats: COMPLETE
Select Operator
expressions: _col0 (type: int)
outputColumnNames: _col0
Statistics: Num rows: 36524 Data size: 146096 Basic stats: COMPLETE Column stats: COMPLETE
Group By Operator
keys: _col0 (type: int)
mode: hash
outputColumnNames: _col0
Statistics: Num rows: 18262 Data size: 73048 Basic stats: COMPLETE Column stats: COMPLETE
Dynamic Partitioning Event Operator
Target Input: store_sales
Partition key expr: ss_sold_date_sk
Statistics: Num rows: 18262 Data size: 73048 Basic stats: COMPLETE Column stats: COMPLETE
Target column: ss_sold_date_sk
Target Vertex: Map 1
Execution mode: vectorized
Map 6
Map Operator Tree:
TableScan
alias: customer_address
filterExpr: ((ca_county is not null and ca_state is not null) and ca_address_sk is not null) (type: boolean)
Statistics: Num rows: 40000000 Data size: 40595195284 Basic stats: COMPLETE Column stats: COMPLETE
Filter Operator
predicate: ((ca_county is not null and ca_state is not null) and ca_address_sk is not null) (type: boolean)
Statistics: Num rows: 40000000 Data size: 7520000000 Basic stats: COMPLETE Column stats: COMPLETE
Select Operator
expressions: ca_address_sk (type: int), ca_county (type: string), ca_state (type: string)
outputColumnNames: _col0, _col1, _col2
Statistics: Num rows: 40000000 Data size: 7520000000 Basic stats: COMPLETE Column stats: COMPLETE
Map Join Operator
condition map:
Inner Join 0 to 1
keys:
0 _col1 (type: string), _col2 (type: string)
1 _col0 (type: string), _col1 (type: string)
outputColumnNames: _col0
input vertices:
1 Map 7
Statistics: Num rows: 778829 Data size: 3115316 Basic stats: COMPLETE Column stats: COMPLETE
Map Join Operator
condition map:
Inner Join 0 to 1
keys:
0 _col0 (type: int)
1 _col1 (type: int)
outputColumnNames: _col5
input vertices:
1 Reducer 9
Statistics: Num rows: 47909545988 Data size: 0 Basic stats: PARTIAL Column stats: NONE
Reduce Output Operator
key expressions: _col5 (type: int)
sort order: +
Map-reduce partition columns: _col5 (type: int)
Statistics: Num rows: 47909545988 Data size: 0 Basic stats: PARTIAL Column stats: NONE
Execution mode: vectorized
Map 7
Map Operator Tree:
TableScan
alias: store
filterExpr: (s_county is not null and s_state is not null) (type: boolean)
Statistics: Num rows: 1704 Data size: 3256276 Basic stats: COMPLETE Column stats: COMPLETE
Filter Operator
predicate: (s_county is not null and s_state is not null) (type: boolean)
Statistics: Num rows: 1704 Data size: 313536 Basic stats: COMPLETE Column stats: COMPLETE
Select Operator
expressions: s_county (type: string), s_state (type: string)
outputColumnNames: _col0, _col1
Statistics: Num rows: 1704 Data size: 313536 Basic stats: COMPLETE Column stats: COMPLETE
Reduce Output Operator
key expressions: _col0 (type: string), _col1 (type: string)
sort order: ++
Map-reduce partition columns: _col0 (type: string), _col1 (type: string)
Statistics: Num rows: 1704 Data size: 313536 Basic stats: COMPLETE Column stats: COMPLETE
Execution mode: vectorized
Map 8
Map Operator Tree:
TableScan
alias: customer
filterExpr: (c_customer_sk is not null and c_current_addr_sk is not null) (type: boolean)
Statistics: Num rows: 80000000 Data size: 68801615852 Basic stats: COMPLETE Column stats: COMPLETE
Filter Operator
predicate: (c_customer_sk is not null and c_current_addr_sk is not null) (type: boolean)
Statistics: Num rows: 80000000 Data size: 640000000 Basic stats: COMPLETE Column stats: COMPLETE
Select Operator
expressions: c_customer_sk (type: int), c_current_addr_sk (type: int)
outputColumnNames: _col0, _col1
Statistics: Num rows: 80000000 Data size: 640000000 Basic stats: COMPLETE Column stats: COMPLETE
Map Join Operator
condition map:
Inner Join 0 to 1
keys:
0 _col0 (type: int)
1 _col1 (type: int)
outputColumnNames: _col0, _col1
input vertices:
1 Map 14
Statistics: Num rows: 87108263547 Data size: 0 Basic stats: PARTIAL Column stats: NONE
Group By Operator
keys: _col0 (type: int), _col1 (type: int)
mode: hash
outputColumnNames: _col0, _col1
Statistics: Num rows: 87108263547 Data size: 0 Basic stats: PARTIAL Column stats: NONE
Reduce Output Operator
key expressions: _col0 (type: int), _col1 (type: int)
sort order: ++
Map-reduce partition columns: _col0 (type: int), _col1 (type: int)
Statistics: Num rows: 87108263547 Data size: 0 Basic stats: PARTIAL Column stats: NONE
Execution mode: vectorized
Reducer 2
Reduce Operator Tree:
Group By Operator
aggregations: sum(VALUE._col0)
keys: KEY._col0 (type: int)
mode: mergepartial
outputColumnNames: _col0, _col1
Statistics: Num rows: 49544675730 Data size: 396357405842 Basic stats: COMPLETE Column stats: NONE
Select Operator
expressions: UDFToInteger((_col1 / 50.0)) (type: int)
outputColumnNames: _col0
Statistics: Num rows: 49544675730 Data size: 396357405842 Basic stats: COMPLETE Column stats: NONE
Group By Operator
aggregations: count()
keys: _col0 (type: int)
mode: hash
outputColumnNames: _col0, _col1
Statistics: Num rows: 49544675730 Data size: 396357405842 Basic stats: COMPLETE Column stats: NONE
Reduce Output Operator
key expressions: _col0 (type: int)
sort order: +
Map-reduce partition columns: _col0 (type: int)
Statistics: Num rows: 49544675730 Data size: 396357405842 Basic stats: COMPLETE Column stats: NONE
value expressions: _col1 (type: bigint)
Reducer 3
Reduce Operator Tree:
Group By Operator
aggregations: count(VALUE._col0)
keys: KEY._col0 (type: int)
mode: mergepartial
outputColumnNames: _col0, _col1
Statistics: Num rows: 24772337865 Data size: 198178702921 Basic stats: COMPLETE Column stats: NONE
Select Operator
expressions: _col0 (type: int), _col1 (type: bigint), (_col0 * 50) (type: int)
outputColumnNames: _col0, _col1, _col2
Statistics: Num rows: 24772337865 Data size: 198178702921 Basic stats: COMPLETE Column stats: NONE
Reduce Output Operator
key expressions: _col0 (type: int), _col1 (type: bigint)
sort order: ++
Statistics: Num rows: 24772337865 Data size: 198178702921 Basic stats: COMPLETE Column stats: NONE
TopN Hash Memory Usage: 0.04
value expressions: _col2 (type: int)
Reducer 4
Reduce Operator Tree:
Select Operator
expressions: KEY.reducesinkkey0 (type: int), KEY.reducesinkkey1 (type: bigint), VALUE._col0 (type: int)
outputColumnNames: _col0, _col1, _col2
Statistics: Num rows: 24772337865 Data size: 198178702921 Basic stats: COMPLETE Column stats: NONE
Limit
Number of rows: 100
Statistics: Num rows: 100 Data size: 800 Basic stats: COMPLETE Column stats: NONE
File Output Operator
compressed: false
Statistics: Num rows: 100 Data size: 800 Basic stats: COMPLETE Column stats: NONE
table:
input format: org.apache.hadoop.mapred.TextInputFormat
output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
Reducer 9
Reduce Operator Tree:
Group By Operator
keys: KEY._col0 (type: int), KEY._col1 (type: int)
mode: mergepartial
outputColumnNames: _col0, _col1
Statistics: Num rows: 43554131773 Data size: 0 Basic stats: PARTIAL Column stats: NONE
Reduce Output Operator
key expressions: _col1 (type: int)
sort order: +
Map-reduce partition columns: _col1 (type: int)
Statistics: Num rows: 43554131773 Data size: 0 Basic stats: PARTIAL Column stats: NONE
value expressions: _col0 (type: int)
Union 11
Vertex: Union 11
Stage: Stage-0
Fetch Operator
limit: 100
Processor Tree:
ListSink
{code}
In Map 14 Data size is 0
{code}
p 14
Map Operator Tree:
TableScan
alias: item
filterExpr: (((i_category = 'Jewelry') and (i_class = 'football')) and i_item_sk is not null) (type: boolean)
Statistics: Num rows: 462000 Data size: 663862160 Basic stats: COMPLETE Column stats: COMPLETE
Filter Operator
predicate: (((i_category = 'Jewelry') and (i_class = 'football')) and i_item_sk is not null) (type: boolean)
Statistics: Num rows: 4200 Data size: 781200 Basic stats: COMPLETE Column stats: COMPLETE
Select Operator
expressions: i_item_sk (type: int)
outputColumnNames: _col0
Statistics: Num rows: 4200 Data size: 16800 Basic stats: COMPLETE Column stats: COMPLETE
Map Join Operator
condition map:
Inner Join 0 to 1
keys:
0 _col2 (type: int)
1 _col0 (type: int)
outputColumnNames: _col1
input vertices:
0 Union 11
Statistics: Num rows: 79189328781 Data size: 0 Basic stats: PARTIAL Column stats: NONE
Reduce Output Operator
key expressions: _col1 (type: int)
sort order: +
Map-reduce partition columns: _col1 (type: int)
Statistics: Num rows: 79189328781 Data size: 0 Basic stats: PARTIAL Column stats: NONE
Execution mode: vectorized
{code}
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)