Posted to issues@hive.apache.org by "Mostafa Mokhtar (JIRA)" <ji...@apache.org> on 2015/06/16 01:02:02 UTC
[jira] [Updated] (HIVE-10107) Union All : Vertex missing stats resulting in OOM and in-efficient plans
[ https://issues.apache.org/jira/browse/HIVE-10107?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Mostafa Mokhtar updated HIVE-10107:
-----------------------------------
Fix Version/s: 1.2.1
> Union All : Vertex missing stats resulting in OOM and in-efficient plans
> ------------------------------------------------------------------------
>
> Key: HIVE-10107
> URL: https://issues.apache.org/jira/browse/HIVE-10107
> Project: Hive
> Issue Type: Bug
> Components: Physical Optimizer
> Affects Versions: 0.14.0
> Reporter: Mostafa Mokhtar
> Assignee: Pengcheng Xiong
> Fix For: 1.2.1
>
>
> Reducer vertices that send data to a UNION ALL edge are missing statistics. As a result, we either use very few reducers in the UNION ALL edge or decide to broadcast the results of the UNION ALL.
> Query
> {code}
> select
> count(*) rowcount
> from
> (select
> ss_item_sk, ss_ticket_number, ss_store_sk
> from
> store_sales a, store_returns b
> where
> a.ss_item_sk = b.sr_item_sk
> and a.ss_ticket_number = b.sr_ticket_number
> union all
> select
> ss_item_sk, ss_ticket_number, ss_store_sk
> from
> store_sales c, store_returns d
> where
> c.ss_item_sk = d.sr_item_sk
> and c.ss_ticket_number = d.sr_ticket_number) t
> group by t.ss_store_sk , t.ss_item_sk , t.ss_ticket_number
> having rowcount > 100000000;
> {code}
> Plan snippet
> {code}
> Edges:
> Reducer 2 <- Map 1 (SIMPLE_EDGE), Map 5 (SIMPLE_EDGE), Union 3 (CONTAINS)
> Reducer 4 <- Union 3 (SIMPLE_EDGE)
> Reducer 7 <- Map 6 (SIMPLE_EDGE), Map 8 (SIMPLE_EDGE), Union 3 (CONTAINS)
> Reducer 4
> Reduce Operator Tree:
> Group By Operator
> aggregations: count(VALUE._col0)
> keys: KEY._col0 (type: int), KEY._col1 (type: int), KEY._col2 (type: int)
> mode: mergepartial
> outputColumnNames: _col0, _col1, _col2, _col3
> Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: COMPLETE
> Filter Operator
> predicate: (_col3 > 100000000) (type: boolean)
> Statistics: Num rows: 0 Data size: 0 Basic stats: NONE Column stats: COMPLETE
> Select Operator
> expressions: _col3 (type: bigint)
> outputColumnNames: _col0
> Statistics: Num rows: 0 Data size: 0 Basic stats: NONE Column stats: COMPLETE
> File Output Operator
> compressed: false
> Statistics: Num rows: 0 Data size: 0 Basic stats: NONE Column stats: COMPLETE
> table:
> input format: org.apache.hadoop.mapred.TextInputFormat
> output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
> serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
> Reducer 7
> Reduce Operator Tree:
> Merge Join Operator
> condition map:
> Inner Join 0 to 1
> keys:
> 0 ss_item_sk (type: int), ss_ticket_number (type: int)
> 1 sr_item_sk (type: int), sr_ticket_number (type: int)
> outputColumnNames: _col1, _col6, _col8, _col27, _col34
> Filter Operator
> predicate: ((_col1 = _col27) and (_col8 = _col34)) (type: boolean)
> Select Operator
> expressions: _col1 (type: int), _col8 (type: int), _col6 (type: int)
> outputColumnNames: _col0, _col1, _col2
> Group By Operator
> aggregations: count()
> keys: _col2 (type: int), _col0 (type: int), _col1 (type: int)
> mode: hash
> outputColumnNames: _col0, _col1, _col2, _col3
> Reduce Output Operator
> key expressions: _col0 (type: int), _col1 (type: int), _col2 (type: int)
> sort order: +++
> Map-reduce partition columns: _col0 (type: int), _col1 (type: int), _col2 (type: int)
> value expressions: _col3 (type: bigint)
> {code}
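
To make the "very few reducers" symptom concrete, here is a minimal sketch of a size-based reducer estimate. It is a simplified model, not Hive's actual code. The settings `hive.exec.reducers.bytes.per.reducer` and `hive.exec.reducers.max` exist in Hive, but the values used below are assumed for illustration. `estimate_reducers` and the "both branches" figure are hypothetical names and numbers, introduced only for contrast.

```python
import math

# Assumed values for illustration; the real ones come from the cluster's
# hive.exec.reducers.bytes.per.reducer and hive.exec.reducers.max settings.
BYTES_PER_REDUCER = 256 * 1024 * 1024
MAX_REDUCERS = 1009


def estimate_reducers(input_bytes,
                      bytes_per_reducer=BYTES_PER_REDUCER,
                      max_reducers=MAX_REDUCERS):
    """Simplified size-based estimate (hypothetical helper).

    One reducer per `bytes_per_reducer` of estimated input,
    clamped to the range [1, max_reducers].
    """
    wanted = math.ceil(input_bytes / bytes_per_reducer)
    return max(1, min(max_reducers, wanted))


# Data sizes copied from the plans in this report.
reducer_4_reported = 8               # Reducer 4: "Num rows: 1 Data size: 8"
store_sales_filtered = 6549093948    # Map 1 / Map 6 after the filter
store_returns_filtered = 444624040   # Map 5 / Map 8 after the filter

# Illustrative stand-in for the missing estimate: the filtered input of both
# join branches. This is NOT what Hive would compute; it only shows the scale.
both_branches = 2 * (store_sales_filtered + store_returns_filtered)

print("reducers from reported size:", estimate_reducers(reducer_4_reported))
print("reducers from stand-in size:", estimate_reducers(both_branches))
```

With only 8 bytes reported for Reducer 4, any estimate of this shape collapses to a single reducer, regardless of how much data the two join branches actually forward.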
> The full explain plan
> {code}
> STAGE DEPENDENCIES:
> Stage-1 is a root stage
> Stage-0 depends on stages: Stage-1
> STAGE PLANS:
> Stage: Stage-1
> Tez
> Edges:
> Reducer 2 <- Map 1 (SIMPLE_EDGE), Map 5 (SIMPLE_EDGE), Union 3 (CONTAINS)
> Reducer 4 <- Union 3 (SIMPLE_EDGE)
> Reducer 7 <- Map 6 (SIMPLE_EDGE), Map 8 (SIMPLE_EDGE), Union 3 (CONTAINS)
> DagName: mmokhtar_20150214132727_95878ea1-ee6a-4b7e-bc86-843abd5cf664:7
> Vertices:
> Map 1
> Map Operator Tree:
> TableScan
> alias: a
> filterExpr: (ss_item_sk is not null and ss_ticket_number is not null) (type: boolean)
> Statistics: Num rows: 550076554 Data size: 47370018896 Basic stats: COMPLETE Column stats: COMPLETE
> Filter Operator
> predicate: (ss_item_sk is not null and ss_ticket_number is not null) (type: boolean)
> Statistics: Num rows: 550076554 Data size: 6549093948 Basic stats: COMPLETE Column stats: COMPLETE
> Reduce Output Operator
> key expressions: ss_item_sk (type: int), ss_ticket_number (type: int)
> sort order: ++
> Map-reduce partition columns: ss_item_sk (type: int), ss_ticket_number (type: int)
> Statistics: Num rows: 550076554 Data size: 6549093948 Basic stats: COMPLETE Column stats: COMPLETE
> value expressions: ss_store_sk (type: int)
> Map 5
> Map Operator Tree:
> TableScan
> alias: b
> filterExpr: (sr_item_sk is not null and sr_ticket_number is not null) (type: boolean)
> Statistics: Num rows: 55578005 Data size: 4155315616 Basic stats: COMPLETE Column stats: COMPLETE
> Filter Operator
> predicate: (sr_item_sk is not null and sr_ticket_number is not null) (type: boolean)
> Statistics: Num rows: 55578005 Data size: 444624040 Basic stats: COMPLETE Column stats: COMPLETE
> Reduce Output Operator
> key expressions: sr_item_sk (type: int), sr_ticket_number (type: int)
> sort order: ++
> Map-reduce partition columns: sr_item_sk (type: int), sr_ticket_number (type: int)
> Statistics: Num rows: 55578005 Data size: 444624040 Basic stats: COMPLETE Column stats: COMPLETE
> Map 6
> Map Operator Tree:
> TableScan
> alias: c
> filterExpr: (ss_item_sk is not null and ss_ticket_number is not null) (type: boolean)
> Statistics: Num rows: 550076554 Data size: 47370018896 Basic stats: COMPLETE Column stats: COMPLETE
> Filter Operator
> predicate: (ss_item_sk is not null and ss_ticket_number is not null) (type: boolean)
> Statistics: Num rows: 550076554 Data size: 6549093948 Basic stats: COMPLETE Column stats: COMPLETE
> Reduce Output Operator
> key expressions: ss_item_sk (type: int), ss_ticket_number (type: int)
> sort order: ++
> Map-reduce partition columns: ss_item_sk (type: int), ss_ticket_number (type: int)
> Statistics: Num rows: 550076554 Data size: 6549093948 Basic stats: COMPLETE Column stats: COMPLETE
> value expressions: ss_store_sk (type: int)
> Map 8
> Map Operator Tree:
> TableScan
> alias: d
> filterExpr: (sr_item_sk is not null and sr_ticket_number is not null) (type: boolean)
> Statistics: Num rows: 55578005 Data size: 4155315616 Basic stats: COMPLETE Column stats: COMPLETE
> Filter Operator
> predicate: (sr_item_sk is not null and sr_ticket_number is not null) (type: boolean)
> Statistics: Num rows: 55578005 Data size: 444624040 Basic stats: COMPLETE Column stats: COMPLETE
> Reduce Output Operator
> key expressions: sr_item_sk (type: int), sr_ticket_number (type: int)
> sort order: ++
> Map-reduce partition columns: sr_item_sk (type: int), sr_ticket_number (type: int)
> Statistics: Num rows: 55578005 Data size: 444624040 Basic stats: COMPLETE Column stats: COMPLETE
> Reducer 2
> Reduce Operator Tree:
> Merge Join Operator
> condition map:
> Inner Join 0 to 1
> keys:
> 0 ss_item_sk (type: int), ss_ticket_number (type: int)
> 1 sr_item_sk (type: int), sr_ticket_number (type: int)
> outputColumnNames: _col1, _col6, _col8, _col27, _col34
> Filter Operator
> predicate: ((_col1 = _col27) and (_col8 = _col34)) (type: boolean)
> Select Operator
> expressions: _col1 (type: int), _col8 (type: int), _col6 (type: int)
> outputColumnNames: _col0, _col1, _col2
> Group By Operator
> aggregations: count()
> keys: _col2 (type: int), _col0 (type: int), _col1 (type: int)
> mode: hash
> outputColumnNames: _col0, _col1, _col2, _col3
> Reduce Output Operator
> key expressions: _col0 (type: int), _col1 (type: int), _col2 (type: int)
> sort order: +++
> Map-reduce partition columns: _col0 (type: int), _col1 (type: int), _col2 (type: int)
> value expressions: _col3 (type: bigint)
> Reducer 4
> Reduce Operator Tree:
> Group By Operator
> aggregations: count(VALUE._col0)
> keys: KEY._col0 (type: int), KEY._col1 (type: int), KEY._col2 (type: int)
> mode: mergepartial
> outputColumnNames: _col0, _col1, _col2, _col3
> Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: COMPLETE
> Filter Operator
> predicate: (_col3 > 100000000) (type: boolean)
> Statistics: Num rows: 0 Data size: 0 Basic stats: NONE Column stats: COMPLETE
> Select Operator
> expressions: _col3 (type: bigint)
> outputColumnNames: _col0
> Statistics: Num rows: 0 Data size: 0 Basic stats: NONE Column stats: COMPLETE
> File Output Operator
> compressed: false
> Statistics: Num rows: 0 Data size: 0 Basic stats: NONE Column stats: COMPLETE
> table:
> input format: org.apache.hadoop.mapred.TextInputFormat
> output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
> serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
> Reducer 7
> Reduce Operator Tree:
> Merge Join Operator
> condition map:
> Inner Join 0 to 1
> keys:
> 0 ss_item_sk (type: int), ss_ticket_number (type: int)
> 1 sr_item_sk (type: int), sr_ticket_number (type: int)
> outputColumnNames: _col1, _col6, _col8, _col27, _col34
> Filter Operator
> predicate: ((_col1 = _col27) and (_col8 = _col34)) (type: boolean)
> Select Operator
> expressions: _col1 (type: int), _col8 (type: int), _col6 (type: int)
> outputColumnNames: _col0, _col1, _col2
> Group By Operator
> aggregations: count()
> keys: _col2 (type: int), _col0 (type: int), _col1 (type: int)
> mode: hash
> outputColumnNames: _col0, _col1, _col2, _col3
> Reduce Output Operator
> key expressions: _col0 (type: int), _col1 (type: int), _col2 (type: int)
> sort order: +++
> Map-reduce partition columns: _col0 (type: int), _col1 (type: int), _col2 (type: int)
> value expressions: _col3 (type: bigint)
> Union 3
> Vertex: Union 3
> Stage: Stage-0
> Fetch Operator
> limit: -1
> Processor Tree:
> ListSink
> {code}
> TPC-DS Q54 also fails with an OOM; this failure happens when a different plan is chosen.
> The OOM happens in vertexName=Map 14.
> {code}
> explain
> with my_customers as (
> select c_customer_sk
> , c_current_addr_sk
> from
> ( select cs_sold_date_sk sold_date_sk,
> cs_bill_customer_sk customer_sk,
> cs_item_sk item_sk
> from catalog_sales
> union all
> select ws_sold_date_sk sold_date_sk,
> ws_bill_customer_sk customer_sk,
> ws_item_sk item_sk
> from web_sales
> ) cs_or_ws_sales,
> item,
> date_dim,
> customer
> where sold_date_sk = d_date_sk
> and item_sk = i_item_sk
> and i_category = 'Jewelry'
> and i_class = 'football'
> and c_customer_sk = cs_or_ws_sales.customer_sk
> and d_moy = 3
> and d_year = 2000
> group by c_customer_sk
> , c_current_addr_sk
> )
> , my_revenue as (
> select c_customer_sk,
> sum(ss_ext_sales_price) as revenue
> from my_customers,
> store_sales,
> customer_address,
> store,
> date_dim
> where c_current_addr_sk = ca_address_sk
> and ca_county = s_county
> and ca_state = s_state
> and ss_sold_date_sk = d_date_sk
> and c_customer_sk = ss_customer_sk
> and d_month_seq between (1203)
> and (1205)
> group by c_customer_sk
> )
> , segments as
> (select cast((revenue/50) as int) as segment
> from my_revenue
> )
> select segment, count(*) as num_customers, segment*50 as segment_base
> from segments
> group by segment
> order by segment, num_customers
> limit 100
> OK
> STAGE DEPENDENCIES:
> Stage-1 is a root stage
> Stage-0 depends on stages: Stage-1
> STAGE PLANS:
> Stage: Stage-1
> Tez
> Edges:
> Map 1 <- Map 5 (BROADCAST_EDGE), Map 6 (BROADCAST_EDGE)
> Map 10 <- Map 13 (BROADCAST_EDGE), Union 11 (CONTAINS)
> Map 12 <- Map 13 (BROADCAST_EDGE), Union 11 (CONTAINS)
> Map 14 <- Union 11 (BROADCAST_EDGE)
> Map 6 <- Map 7 (BROADCAST_EDGE), Reducer 9 (BROADCAST_EDGE)
> Map 8 <- Map 14 (BROADCAST_EDGE)
> Reducer 2 <- Map 1 (SIMPLE_EDGE)
> Reducer 3 <- Reducer 2 (SIMPLE_EDGE)
> Reducer 4 <- Reducer 3 (SIMPLE_EDGE)
> Reducer 9 <- Map 8 (SIMPLE_EDGE)
> DagName: mmokhtar_20150208232525_9976b56b-8f4b-48c8-a909-aa653c20051c:1
> Vertices:
> Map 1
> Map Operator Tree:
> TableScan
> alias: store_sales
> filterExpr: ss_customer_sk is not null (type: boolean)
> Statistics: Num rows: 82510879939 Data size: 6873789738208 Basic stats: COMPLETE Column stats: COMPLETE
> Filter Operator
> predicate: ss_customer_sk is not null (type: boolean)
> Statistics: Num rows: 80566020964 Data size: 951594129356 Basic stats: COMPLETE Column stats: COMPLETE
> Select Operator
> expressions: ss_customer_sk (type: int), ss_ext_sales_price (type: float), ss_sold_date_sk (type: int)
> outputColumnNames: _col0, _col1, _col2
> Statistics: Num rows: 80566020964 Data size: 951594129356 Basic stats: COMPLETE Column stats: COMPLETE
> Map Join Operator
> condition map:
> Inner Join 0 to 1
> keys:
> 0 _col2 (type: int)
> 1 _col0 (type: int)
> outputColumnNames: _col0, _col1
> input vertices:
> 1 Map 5
> Statistics: Num rows: 90081226648 Data size: 720649813184 Basic stats: COMPLETE Column stats: COMPLETE
> Map Join Operator
> condition map:
> Inner Join 0 to 1
> keys:
> 0 _col0 (type: int)
> 1 _col5 (type: int)
> outputColumnNames: _col1, _col10
> input vertices:
> 1 Map 6
> Statistics: Num rows: 99089351460 Data size: 792714811684 Basic stats: COMPLETE Column stats: NONE
> Select Operator
> expressions: _col10 (type: int), _col1 (type: float)
> outputColumnNames: _col0, _col1
> Statistics: Num rows: 99089351460 Data size: 792714811684 Basic stats: COMPLETE Column stats: NONE
> Group By Operator
> aggregations: sum(_col1)
> keys: _col0 (type: int)
> mode: hash
> outputColumnNames: _col0, _col1
> Statistics: Num rows: 99089351460 Data size: 792714811684 Basic stats: COMPLETE Column stats: NONE
> Reduce Output Operator
> key expressions: _col0 (type: int)
> sort order: +
> Map-reduce partition columns: _col0 (type: int)
> Statistics: Num rows: 99089351460 Data size: 792714811684 Basic stats: COMPLETE Column stats: NONE
> value expressions: _col1 (type: double)
> Execution mode: vectorized
> Map 10
> Map Operator Tree:
> TableScan
> alias: catalog_sales
> filterExpr: (cs_item_sk is not null and cs_bill_customer_sk is not null) (type: boolean)
> Filter Operator
> predicate: (cs_item_sk is not null and cs_bill_customer_sk is not null) (type: boolean)
> Select Operator
> expressions: cs_sold_date_sk (type: int), cs_bill_customer_sk (type: int), cs_item_sk (type: int)
> outputColumnNames: _col0, _col1, _col2
> Map Join Operator
> condition map:
> Inner Join 0 to 1
> keys:
> 0 _col0 (type: int)
> 1 _col0 (type: int)
> outputColumnNames: _col1, _col2
> input vertices:
> 1 Map 13
> Reduce Output Operator
> key expressions: _col2 (type: int)
> sort order: +
> Map-reduce partition columns: _col2 (type: int)
> value expressions: _col1 (type: int)
> Execution mode: vectorized
> Map 12
> Map Operator Tree:
> TableScan
> alias: web_sales
> filterExpr: (ws_item_sk is not null and ws_bill_customer_sk is not null) (type: boolean)
> Filter Operator
> predicate: (ws_item_sk is not null and ws_bill_customer_sk is not null) (type: boolean)
> Select Operator
> expressions: ws_sold_date_sk (type: int), ws_bill_customer_sk (type: int), ws_item_sk (type: int)
> outputColumnNames: _col0, _col1, _col2
> Map Join Operator
> condition map:
> Inner Join 0 to 1
> keys:
> 0 _col0 (type: int)
> 1 _col0 (type: int)
> outputColumnNames: _col1, _col2
> input vertices:
> 1 Map 13
> Reduce Output Operator
> key expressions: _col2 (type: int)
> sort order: +
> Map-reduce partition columns: _col2 (type: int)
> value expressions: _col1 (type: int)
> Execution mode: vectorized
> Map 13
> Map Operator Tree:
> TableScan
> alias: date_dim
> filterExpr: (((d_moy = 3) and (d_year = 2000)) and d_date_sk is not null) (type: boolean)
> Statistics: Num rows: 73049 Data size: 81741831 Basic stats: COMPLETE Column stats: COMPLETE
> Filter Operator
> predicate: (((d_moy = 3) and (d_year = 2000)) and d_date_sk is not null) (type: boolean)
> Statistics: Num rows: 624 Data size: 7488 Basic stats: COMPLETE Column stats: COMPLETE
> Select Operator
> expressions: d_date_sk (type: int)
> outputColumnNames: _col0
> Statistics: Num rows: 624 Data size: 2496 Basic stats: COMPLETE Column stats: COMPLETE
> Reduce Output Operator
> key expressions: _col0 (type: int)
> sort order: +
> Map-reduce partition columns: _col0 (type: int)
> Statistics: Num rows: 624 Data size: 2496 Basic stats: COMPLETE Column stats: COMPLETE
> Select Operator
> expressions: _col0 (type: int)
> outputColumnNames: _col0
> Statistics: Num rows: 624 Data size: 2496 Basic stats: COMPLETE Column stats: COMPLETE
> Group By Operator
> keys: _col0 (type: int)
> mode: hash
> outputColumnNames: _col0
> Statistics: Num rows: 312 Data size: 1248 Basic stats: COMPLETE Column stats: COMPLETE
> Dynamic Partitioning Event Operator
> Target Input: catalog_sales
> Partition key expr: cs_sold_date_sk
> Statistics: Num rows: 312 Data size: 1248 Basic stats: COMPLETE Column stats: COMPLETE
> Target column: cs_sold_date_sk
> Target Vertex: Map 10
> Select Operator
> expressions: _col0 (type: int)
> outputColumnNames: _col0
> Statistics: Num rows: 624 Data size: 2496 Basic stats: COMPLETE Column stats: COMPLETE
> Group By Operator
> keys: _col0 (type: int)
> mode: hash
> outputColumnNames: _col0
> Statistics: Num rows: 312 Data size: 1248 Basic stats: COMPLETE Column stats: COMPLETE
> Dynamic Partitioning Event Operator
> Target Input: web_sales
> Partition key expr: ws_sold_date_sk
> Statistics: Num rows: 312 Data size: 1248 Basic stats: COMPLETE Column stats: COMPLETE
> Target column: ws_sold_date_sk
> Target Vertex: Map 12
> Reduce Output Operator
> key expressions: _col0 (type: int)
> sort order: +
> Map-reduce partition columns: _col0 (type: int)
> Statistics: Num rows: 624 Data size: 2496 Basic stats: COMPLETE Column stats: COMPLETE
> Execution mode: vectorized
> Map 14
> Map Operator Tree:
> TableScan
> alias: item
> filterExpr: (((i_category = 'Jewelry') and (i_class = 'football')) and i_item_sk is not null) (type: boolean)
> Statistics: Num rows: 462000 Data size: 663862160 Basic stats: COMPLETE Column stats: COMPLETE
> Filter Operator
> predicate: (((i_category = 'Jewelry') and (i_class = 'football')) and i_item_sk is not null) (type: boolean)
> Statistics: Num rows: 4200 Data size: 781200 Basic stats: COMPLETE Column stats: COMPLETE
> Select Operator
> expressions: i_item_sk (type: int)
> outputColumnNames: _col0
> Statistics: Num rows: 4200 Data size: 16800 Basic stats: COMPLETE Column stats: COMPLETE
> Map Join Operator
> condition map:
> Inner Join 0 to 1
> keys:
> 0 _col2 (type: int)
> 1 _col0 (type: int)
> outputColumnNames: _col1
> input vertices:
> 0 Union 11
> Statistics: Num rows: 79189328781 Data size: 0 Basic stats: PARTIAL Column stats: NONE
> Reduce Output Operator
> key expressions: _col1 (type: int)
> sort order: +
> Map-reduce partition columns: _col1 (type: int)
> Statistics: Num rows: 79189328781 Data size: 0 Basic stats: PARTIAL Column stats: NONE
> Execution mode: vectorized
> Map 5
> Map Operator Tree:
> TableScan
> alias: date_dim
> filterExpr: (d_month_seq BETWEEN 1203 AND 1205 and d_date_sk is not null) (type: boolean)
> Statistics: Num rows: 73049 Data size: 81741831 Basic stats: COMPLETE Column stats: COMPLETE
> Filter Operator
> predicate: (d_month_seq BETWEEN 1203 AND 1205 and d_date_sk is not null) (type: boolean)
> Statistics: Num rows: 36524 Data size: 292192 Basic stats: COMPLETE Column stats: COMPLETE
> Select Operator
> expressions: d_date_sk (type: int)
> outputColumnNames: _col0
> Statistics: Num rows: 36524 Data size: 146096 Basic stats: COMPLETE Column stats: COMPLETE
> Reduce Output Operator
> key expressions: _col0 (type: int)
> sort order: +
> Map-reduce partition columns: _col0 (type: int)
> Statistics: Num rows: 36524 Data size: 146096 Basic stats: COMPLETE Column stats: COMPLETE
> Select Operator
> expressions: _col0 (type: int)
> outputColumnNames: _col0
> Statistics: Num rows: 36524 Data size: 146096 Basic stats: COMPLETE Column stats: COMPLETE
> Group By Operator
> keys: _col0 (type: int)
> mode: hash
> outputColumnNames: _col0
> Statistics: Num rows: 18262 Data size: 73048 Basic stats: COMPLETE Column stats: COMPLETE
> Dynamic Partitioning Event Operator
> Target Input: store_sales
> Partition key expr: ss_sold_date_sk
> Statistics: Num rows: 18262 Data size: 73048 Basic stats: COMPLETE Column stats: COMPLETE
> Target column: ss_sold_date_sk
> Target Vertex: Map 1
> Execution mode: vectorized
> Map 6
> Map Operator Tree:
> TableScan
> alias: customer_address
> filterExpr: ((ca_county is not null and ca_state is not null) and ca_address_sk is not null) (type: boolean)
> Statistics: Num rows: 40000000 Data size: 40595195284 Basic stats: COMPLETE Column stats: COMPLETE
> Filter Operator
> predicate: ((ca_county is not null and ca_state is not null) and ca_address_sk is not null) (type: boolean)
> Statistics: Num rows: 40000000 Data size: 7520000000 Basic stats: COMPLETE Column stats: COMPLETE
> Select Operator
> expressions: ca_address_sk (type: int), ca_county (type: string), ca_state (type: string)
> outputColumnNames: _col0, _col1, _col2
> Statistics: Num rows: 40000000 Data size: 7520000000 Basic stats: COMPLETE Column stats: COMPLETE
> Map Join Operator
> condition map:
> Inner Join 0 to 1
> keys:
> 0 _col1 (type: string), _col2 (type: string)
> 1 _col0 (type: string), _col1 (type: string)
> outputColumnNames: _col0
> input vertices:
> 1 Map 7
> Statistics: Num rows: 778829 Data size: 3115316 Basic stats: COMPLETE Column stats: COMPLETE
> Map Join Operator
> condition map:
> Inner Join 0 to 1
> keys:
> 0 _col0 (type: int)
> 1 _col1 (type: int)
> outputColumnNames: _col5
> input vertices:
> 1 Reducer 9
> Statistics: Num rows: 47909545988 Data size: 0 Basic stats: PARTIAL Column stats: NONE
> Reduce Output Operator
> key expressions: _col5 (type: int)
> sort order: +
> Map-reduce partition columns: _col5 (type: int)
> Statistics: Num rows: 47909545988 Data size: 0 Basic stats: PARTIAL Column stats: NONE
> Execution mode: vectorized
> Map 7
> Map Operator Tree:
> TableScan
> alias: store
> filterExpr: (s_county is not null and s_state is not null) (type: boolean)
> Statistics: Num rows: 1704 Data size: 3256276 Basic stats: COMPLETE Column stats: COMPLETE
> Filter Operator
> predicate: (s_county is not null and s_state is not null) (type: boolean)
> Statistics: Num rows: 1704 Data size: 313536 Basic stats: COMPLETE Column stats: COMPLETE
> Select Operator
> expressions: s_county (type: string), s_state (type: string)
> outputColumnNames: _col0, _col1
> Statistics: Num rows: 1704 Data size: 313536 Basic stats: COMPLETE Column stats: COMPLETE
> Reduce Output Operator
> key expressions: _col0 (type: string), _col1 (type: string)
> sort order: ++
> Map-reduce partition columns: _col0 (type: string), _col1 (type: string)
> Statistics: Num rows: 1704 Data size: 313536 Basic stats: COMPLETE Column stats: COMPLETE
> Execution mode: vectorized
> Map 8
> Map Operator Tree:
> TableScan
> alias: customer
> filterExpr: (c_customer_sk is not null and c_current_addr_sk is not null) (type: boolean)
> Statistics: Num rows: 80000000 Data size: 68801615852 Basic stats: COMPLETE Column stats: COMPLETE
> Filter Operator
> predicate: (c_customer_sk is not null and c_current_addr_sk is not null) (type: boolean)
> Statistics: Num rows: 80000000 Data size: 640000000 Basic stats: COMPLETE Column stats: COMPLETE
> Select Operator
> expressions: c_customer_sk (type: int), c_current_addr_sk (type: int)
> outputColumnNames: _col0, _col1
> Statistics: Num rows: 80000000 Data size: 640000000 Basic stats: COMPLETE Column stats: COMPLETE
> Map Join Operator
> condition map:
> Inner Join 0 to 1
> keys:
> 0 _col0 (type: int)
> 1 _col1 (type: int)
> outputColumnNames: _col0, _col1
> input vertices:
> 1 Map 14
> Statistics: Num rows: 87108263547 Data size: 0 Basic stats: PARTIAL Column stats: NONE
> Group By Operator
> keys: _col0 (type: int), _col1 (type: int)
> mode: hash
> outputColumnNames: _col0, _col1
> Statistics: Num rows: 87108263547 Data size: 0 Basic stats: PARTIAL Column stats: NONE
> Reduce Output Operator
> key expressions: _col0 (type: int), _col1 (type: int)
> sort order: ++
> Map-reduce partition columns: _col0 (type: int), _col1 (type: int)
> Statistics: Num rows: 87108263547 Data size: 0 Basic stats: PARTIAL Column stats: NONE
> Execution mode: vectorized
> Reducer 2
> Reduce Operator Tree:
> Group By Operator
> aggregations: sum(VALUE._col0)
> keys: KEY._col0 (type: int)
> mode: mergepartial
> outputColumnNames: _col0, _col1
> Statistics: Num rows: 49544675730 Data size: 396357405842 Basic stats: COMPLETE Column stats: NONE
> Select Operator
> expressions: UDFToInteger((_col1 / 50.0)) (type: int)
> outputColumnNames: _col0
> Statistics: Num rows: 49544675730 Data size: 396357405842 Basic stats: COMPLETE Column stats: NONE
> Group By Operator
> aggregations: count()
> keys: _col0 (type: int)
> mode: hash
> outputColumnNames: _col0, _col1
> Statistics: Num rows: 49544675730 Data size: 396357405842 Basic stats: COMPLETE Column stats: NONE
> Reduce Output Operator
> key expressions: _col0 (type: int)
> sort order: +
> Map-reduce partition columns: _col0 (type: int)
> Statistics: Num rows: 49544675730 Data size: 396357405842 Basic stats: COMPLETE Column stats: NONE
> value expressions: _col1 (type: bigint)
> Reducer 3
> Reduce Operator Tree:
> Group By Operator
> aggregations: count(VALUE._col0)
> keys: KEY._col0 (type: int)
> mode: mergepartial
> outputColumnNames: _col0, _col1
> Statistics: Num rows: 24772337865 Data size: 198178702921 Basic stats: COMPLETE Column stats: NONE
> Select Operator
> expressions: _col0 (type: int), _col1 (type: bigint), (_col0 * 50) (type: int)
> outputColumnNames: _col0, _col1, _col2
> Statistics: Num rows: 24772337865 Data size: 198178702921 Basic stats: COMPLETE Column stats: NONE
> Reduce Output Operator
> key expressions: _col0 (type: int), _col1 (type: bigint)
> sort order: ++
> Statistics: Num rows: 24772337865 Data size: 198178702921 Basic stats: COMPLETE Column stats: NONE
> TopN Hash Memory Usage: 0.04
> value expressions: _col2 (type: int)
> Reducer 4
> Reduce Operator Tree:
> Select Operator
> expressions: KEY.reducesinkkey0 (type: int), KEY.reducesinkkey1 (type: bigint), VALUE._col0 (type: int)
> outputColumnNames: _col0, _col1, _col2
> Statistics: Num rows: 24772337865 Data size: 198178702921 Basic stats: COMPLETE Column stats: NONE
> Limit
> Number of rows: 100
> Statistics: Num rows: 100 Data size: 800 Basic stats: COMPLETE Column stats: NONE
> File Output Operator
> compressed: false
> Statistics: Num rows: 100 Data size: 800 Basic stats: COMPLETE Column stats: NONE
> table:
> input format: org.apache.hadoop.mapred.TextInputFormat
> output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
> serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
> Reducer 9
> Reduce Operator Tree:
> Group By Operator
> keys: KEY._col0 (type: int), KEY._col1 (type: int)
> mode: mergepartial
> outputColumnNames: _col0, _col1
> Statistics: Num rows: 43554131773 Data size: 0 Basic stats: PARTIAL Column stats: NONE
> Reduce Output Operator
> key expressions: _col1 (type: int)
> sort order: +
> Map-reduce partition columns: _col1 (type: int)
> Statistics: Num rows: 43554131773 Data size: 0 Basic stats: PARTIAL Column stats: NONE
> value expressions: _col0 (type: int)
> Union 11
> Vertex: Union 11
> Stage: Stage-0
> Fetch Operator
> limit: 100
> Processor Tree:
> ListSink
> {code}
> In Map 14 the estimated Data size is 0, even though the estimated row count is in the billions:
> {code}
> Map 14
> Map Operator Tree:
> TableScan
> alias: item
> filterExpr: (((i_category = 'Jewelry') and (i_class = 'football')) and i_item_sk is not null) (type: boolean)
> Statistics: Num rows: 462000 Data size: 663862160 Basic stats: COMPLETE Column stats: COMPLETE
> Filter Operator
> predicate: (((i_category = 'Jewelry') and (i_class = 'football')) and i_item_sk is not null) (type: boolean)
> Statistics: Num rows: 4200 Data size: 781200 Basic stats: COMPLETE Column stats: COMPLETE
> Select Operator
> expressions: i_item_sk (type: int)
> outputColumnNames: _col0
> Statistics: Num rows: 4200 Data size: 16800 Basic stats: COMPLETE Column stats: COMPLETE
> Map Join Operator
> condition map:
> Inner Join 0 to 1
> keys:
> 0 _col2 (type: int)
> 1 _col0 (type: int)
> outputColumnNames: _col1
> input vertices:
> 0 Union 11
> Statistics: Num rows: 79189328781 Data size: 0 Basic stats: PARTIAL Column stats: NONE
> Reduce Output Operator
> key expressions: _col1 (type: int)
> sort order: +
> Map-reduce partition columns: _col1 (type: int)
> Statistics: Num rows: 79189328781 Data size: 0 Basic stats: PARTIAL Column stats: NONE
> Execution mode: vectorized
> {code}
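
The broadcast symptom can be sketched the same way. This is a simplified model of a size-threshold check, not Hive's actual map-join conversion logic. The setting `hive.auto.convert.join.noconditionaltask.size` exists in Hive, but the threshold value, the helper names, and the 4-bytes-per-row lower bound are assumptions made for illustration. Row counts and data sizes are copied from the Q54 plan above.

```python
# Assumed threshold for illustration; in Hive the corresponding setting is
# hive.auto.convert.join.noconditionaltask.size.
SMALL_TABLE_THRESHOLD_BYTES = 10 * 1000 * 1000


def looks_broadcastable(data_size_bytes,
                        threshold=SMALL_TABLE_THRESHOLD_BYTES):
    """Hypothetical check: an input 'fits' if its size estimate is small."""
    return data_size_bytes <= threshold


def stats_inconsistent(num_rows, data_size_bytes):
    """Flag estimates that claim rows but no bytes."""
    return num_rows > 0 and data_size_bytes == 0


def lower_bound_bytes(num_rows, min_row_width=4):
    """Assumed floor: every row carries at least one 4-byte int column."""
    return num_rows * min_row_width


# (num_rows, data_size) for the inputs of the broadcast edges in the Q54 plan.
plan_estimates = {
    "Map 14 output (broadcast to Map 8)": (79189328781, 0),
    "Reducer 9 output (broadcast to Map 6)": (43554131773, 0),
    "Map 7 output (broadcast to Map 6)": (1704, 313536),
}

for name, (rows, size) in plan_estimates.items():
    print(name)
    print("  passes size check as reported:", looks_broadcastable(size))
    print("  rows without bytes:", stats_inconsistent(rows, size))
    print("  passes with assumed floor:",
          looks_broadcastable(lower_bound_bytes(rows)))
```

Under these assumptions, the inputs reported with Data size 0 pass a pure size check even though their row counts imply hundreds of gigabytes, which is consistent with the OOM reported in Map 14.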