You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@hive.apache.org by "Laljo John Pullokkaran (JIRA)" <ji...@apache.org> on 2014/10/11 03:55:36 UTC
[jira] [Commented] (HIVE-7985) With CBO enabled cross product is
generated when a subquery is present
[ https://issues.apache.org/jira/browse/HIVE-7985?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14167938#comment-14167938 ]
Laljo John Pullokkaran commented on HIVE-7985:
----------------------------------------------
Need to wait for OPTIQ-438 to get checked in.
> With CBO enabled cross product is generated when a subquery is present
> ----------------------------------------------------------------------
>
> Key: HIVE-7985
> URL: https://issues.apache.org/jira/browse/HIVE-7985
> Project: Hive
> Issue Type: Bug
> Components: CBO
> Affects Versions: 0.14.0
> Reporter: Mostafa Mokhtar
> Assignee: Laljo John Pullokkaran
> Priority: Critical
> Fix For: 0.14.0
>
> Attachments: HIVE-7985.patch
>
>
> This is a regression introduced in the latest build of the CBO branch.
> Removing the subquery for item will remove the cross products
> Query
> {code}
> select i_item_id,sum(ss_ext_sales_price) total_sales from store_sales, date_dim, item where item.i_item_id in (select i.i_item_id from item i where i_color in ('purple','burlywood','indian')) and ss_item_sk = i_item_sk and ss_sold_date_sk = d_date_sk and d_year = 2001 and d_moy = 1 group by i_item_id;
> {code}
> {code}
> Warning: Map Join MAPJOIN[38][bigTable=?] in task 'Map 1' is a cross product
> Warning: Map Join MAPJOIN[39][bigTable=store_sales] in task 'Map 4' is a cross product
> OK
> STAGE DEPENDENCIES:
> Stage-1 is a root stage
> Stage-0 depends on stages: Stage-1
> STAGE PLANS:
> Stage: Stage-1
> Tez
> Edges:
> Map 1 <- Map 3 (BROADCAST_EDGE)
> Map 4 <- Map 1 (BROADCAST_EDGE), Map 2 (BROADCAST_EDGE)
> Reducer 5 <- Map 4 (SIMPLE_EDGE)
> DagName: mmokhtar_20140904141313_9c253f7e-aad1-4ca4-9be1-ea45e3d34496:1
> Vertices:
> Map 1
> Map Operator Tree:
> TableScan
> alias: item
> filterExpr: (true and i_item_id is not null) (type: boolean)
> Statistics: Num rows: 462000 Data size: 663862160 Basic stats: COMPLETE Column stats: NONE
> Filter Operator
> predicate: i_item_id is not null (type: boolean)
> Statistics: Num rows: 231000 Data size: 331931080 Basic stats: COMPLETE Column stats: NONE
> Map Join Operator
> condition map:
> Inner Join 0 to 1
> condition expressions:
> 0 {i_item_sk} {i_item_id}
> 1 {d_date_sk}
> keys:
> 0
> 1
> outputColumnNames: _col0, _col1, _col25
> input vertices:
> 1 Map 3
> Statistics: Num rows: 254100 Data size: 365124192 Basic stats: COMPLETE Column stats: NONE
> Select Operator
> expressions: _col0 (type: int), _col1 (type: string), _col25 (type: int)
> outputColumnNames: _col0, _col1, _col25
> Statistics: Num rows: 254100 Data size: 365124192 Basic stats: COMPLETE Column stats: NONE
> Reduce Output Operator
> sort order:
> Statistics: Num rows: 254100 Data size: 365124192 Basic stats: COMPLETE Column stats: NONE
> value expressions: _col0 (type: int), _col1 (type: string), _col25 (type: int)
> Execution mode: vectorized
> Map 2
> Map Operator Tree:
> TableScan
> alias: i
> filterExpr: ((i_color) IN ('purple', 'burlywood', 'indian') and i_item_id is not null) (type: boolean)
> Statistics: Num rows: 462000 Data size: 663862160 Basic stats: COMPLETE Column stats: NONE
> Filter Operator
> predicate: ((i_color) IN ('purple', 'burlywood', 'indian') and i_item_id is not null) (type: boolean)
> Statistics: Num rows: 115500 Data size: 165965540 Basic stats: COMPLETE Column stats: NONE
> Select Operator
> expressions: i_item_id (type: string)
> outputColumnNames: _col0
> Statistics: Num rows: 115500 Data size: 165965540 Basic stats: COMPLETE Column stats: NONE
> Group By Operator
> keys: _col0 (type: string)
> mode: hash
> outputColumnNames: _col0
> Statistics: Num rows: 115500 Data size: 165965540 Basic stats: COMPLETE Column stats: NONE
> Reduce Output Operator
> key expressions: _col0 (type: string)
> sort order: +
> Map-reduce partition columns: _col0 (type: string)
> Statistics: Num rows: 115500 Data size: 165965540 Basic stats: COMPLETE Column stats: NONE
> Execution mode: vectorized
> Map 3
> Map Operator Tree:
> TableScan
> alias: date_dim
> filterExpr: ((d_year = 2001) and (d_moy = 1)) (type: boolean)
> Statistics: Num rows: 73049 Data size: 81741831 Basic stats: COMPLETE Column stats: NONE
> Filter Operator
> predicate: ((d_year = 2001) and (d_moy = 1)) (type: boolean)
> Statistics: Num rows: 18262 Data size: 20435178 Basic stats: COMPLETE Column stats: NONE
> Reduce Output Operator
> sort order:
> Statistics: Num rows: 18262 Data size: 20435178 Basic stats: COMPLETE Column stats: NONE
> value expressions: d_date_sk (type: int)
> Execution mode: vectorized
> Map 4
> Map Operator Tree:
> TableScan
> alias: store_sales
> Statistics: Num rows: 82510879939 Data size: 7203833257964 Basic stats: COMPLETE Column stats: NONE
> Map Join Operator
> condition map:
> Inner Join 0 to 1
> condition expressions:
> 0 {ss_sold_date_sk} {ss_item_sk} {ss_ext_sales_price}
> 1 {_col0} {_col1} {_col25}
> keys:
> 0
> 1
> outputColumnNames: _col0, _col2, _col15, _col27, _col28, _col52
> input vertices:
> 1 Map 1
> Statistics: Num rows: 90761969664 Data size: 7924217282560 Basic stats: COMPLETE Column stats: NONE
> Filter Operator
> predicate: ((_col2 = _col27) and (_col0 = _col52)) (type: boolean)
> Statistics: Num rows: 22690492416 Data size: 1981054320640 Basic stats: COMPLETE Column stats: NONE
> Select Operator
> expressions: _col15 (type: float), _col28 (type: string)
> outputColumnNames: _col15, _col59
> Statistics: Num rows: 22690492416 Data size: 1981054320640 Basic stats: COMPLETE Column stats: NONE
> Map Join Operator
> condition map:
> Left Semi Join 0 to 1
> condition expressions:
> 0 {_col15} {_col59}
> 1
> keys:
> 0 _col59 (type: string)
> 1 _col0 (type: string)
> outputColumnNames: _col15, _col59
> input vertices:
> 1 Map 2
> Statistics: Num rows: 24959541248 Data size: 2179159818240 Basic stats: COMPLETE Column stats: NONE
> Select Operator
> expressions: _col59 (type: string), _col15 (type: float)
> outputColumnNames: _col0, _col1
> Statistics: Num rows: 24959541248 Data size: 2179159818240 Basic stats: COMPLETE Column stats: NONE
> Group By Operator
> aggregations: sum(_col1)
> keys: _col0 (type: string)
> mode: hash
> outputColumnNames: _col0, _col1
> Statistics: Num rows: 24959541248 Data size: 2179159818240 Basic stats: COMPLETE Column stats: NONE
> Reduce Output Operator
> key expressions: _col0 (type: string)
> sort order: +
> Map-reduce partition columns: _col0 (type: string)
> Statistics: Num rows: 24959541248 Data size: 2179159818240 Basic stats: COMPLETE Column stats: NONE
> value expressions: _col1 (type: double)
> Execution mode: vectorized
> Reducer 5
> Reduce Operator Tree:
> Group By Operator
> aggregations: sum(VALUE._col0)
> keys: KEY._col0 (type: string)
> mode: mergepartial
> outputColumnNames: _col0, _col1
> Statistics: Num rows: 12479770624 Data size: 1089579909120 Basic stats: COMPLETE Column stats: NONE
> Select Operator
> expressions: _col0 (type: string), _col1 (type: double)
> outputColumnNames: _col0, _col1
> Statistics: Num rows: 12479770624 Data size: 1089579909120 Basic stats: COMPLETE Column stats: NONE
> File Output Operator
> compressed: false
> Statistics: Num rows: 12479770624 Data size: 1089579909120 Basic stats: COMPLETE Column stats: NONE
> table:
> input format: org.apache.hadoop.mapred.TextInputFormat
> output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
> serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
> Stage: Stage-0
> Fetch Operator
> limit: -1
> Processor Tree:
> ListSink
> {code}
> No cross product generated for this query
> {code}
> select i_item_id,sum(ss_ext_sales_price) total_sales
> from
> store_sales,
> date_dim,
> item
> where i_color in ('purple','burlywood','indian')
> and ss_item_sk = i_item_sk
> and ss_sold_date_sk = d_date_sk
> and d_year = 2001
> and d_moy = 1
> group by i_item_id;
> {code}
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)