Posted to issues@calcite.apache.org by "Hari Sankar Sivarama Subramaniyan (JIRA)" <ji...@apache.org> on 2015/12/10 22:25:11 UTC

[jira] [Closed] (CALCITE-1017) hive.mapred.mode=strict throws an error even if the final plan does not have cartesian product in it.

     [ https://issues.apache.org/jira/browse/CALCITE-1017?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hari Sankar Sivarama Subramaniyan closed CALCITE-1017.
------------------------------------------------------
    Resolution: Invalid

This is a Hive issue; I accidentally created the JIRA under Calcite. Sorry for the confusion.

> hive.mapred.mode=strict throws an error even if the final plan does not have cartesian product in it.
> -----------------------------------------------------------------------------------------------------
>
>                 Key: CALCITE-1017
>                 URL: https://issues.apache.org/jira/browse/CALCITE-1017
>             Project: Calcite
>          Issue Type: Bug
>            Reporter: Hari Sankar Sivarama Subramaniyan
>            Assignee: Julian Hyde
>
> {code}
> Vertex dependency in root stage
> Reducer 10 <- Reducer 9 (SIMPLE_EDGE)
> Reducer 2 <- Map 1 (SIMPLE_EDGE), Map 11 (SIMPLE_EDGE)
> Reducer 3 <- Map 12 (SIMPLE_EDGE), Reducer 2 (SIMPLE_EDGE)
> Reducer 4 <- Map 13 (SIMPLE_EDGE), Reducer 3 (SIMPLE_EDGE)
> Reducer 5 <- Map 14 (SIMPLE_EDGE), Reducer 4 (SIMPLE_EDGE)
> Reducer 6 <- Map 15 (SIMPLE_EDGE), Reducer 5 (SIMPLE_EDGE)
> Reducer 7 <- Map 16 (SIMPLE_EDGE), Reducer 6 (SIMPLE_EDGE)
> Reducer 8 <- Map 17 (SIMPLE_EDGE), Reducer 7 (SIMPLE_EDGE)
> Reducer 9 <- Reducer 8 (SIMPLE_EDGE)
> Stage-0
>    Fetch Operator
>       limit:100
>       Stage-1
>          Reducer 10
>          File Output Operator [FS_63]
>             compressed:false
>             Statistics:Num rows: 100 Data size: 143600 Basic stats: COMPLETE Column stats: NONE
>             table:{"input format:":"org.apache.hadoop.mapred.TextInputFormat","output format:":"org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat","serde:":"org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe"}
>             Limit [LIM_62]
>                Number of rows:100
>                Statistics:Num rows: 100 Data size: 143600 Basic stats: COMPLETE Column stats: NONE
>                Select Operator [SEL_61]
>                |  outputColumnNames:["_col0","_col1","_col2","_col3","_col4","_col5","_col6","_col7","_col8","_col9","_col10","_col11","_col12","_col13","_col14"]
>                |  Statistics:Num rows: 127050 Data size: 182479129 Basic stats: COMPLETE Column stats: NONE
>                |<-Reducer 9 [SIMPLE_EDGE]
>                   Reduce Output Operator [RS_60]
>                      key expressions:_col0 (type: string), _col1 (type: string), _col2 (type: string)
>                      sort order:+++
>                      Statistics:Num rows: 127050 Data size: 182479129 Basic stats: COMPLETE Column stats: NONE
>                      value expressions:_col3 (type: bigint), _col4 (type: double), _col5 (type: double), _col6 (type: double), _col7 (type: bigint), _col8 (type: double), _col9 (type: double), _col10 (type: double), _col11 (type: bigint), _col12 (type: double), _col13 (type: double)
>                      Select Operator [SEL_58]
>                         outputColumnNames:["_col0","_col1","_col10","_col11","_col12","_col13","_col2","_col3","_col4","_col5","_col6","_col7","_col8","_col9"]
>                         Statistics:Num rows: 127050 Data size: 182479129 Basic stats: COMPLETE Column stats: NONE
>                         Group By Operator [GBY_57]
>                         |  aggregations:["count(VALUE._col0)","avg(VALUE._col1)","stddev_samp(VALUE._col2)","count(VALUE._col3)","avg(VALUE._col4)","stddev_samp(VALUE._col5)","count(VALUE._col6)","avg(VALUE._col7)","stddev_samp(VALUE._col8)"]
>                         |  keys:KEY._col0 (type: string), KEY._col1 (type: string), KEY._col2 (type: string)
>                         |  outputColumnNames:["_col0","_col1","_col2","_col3","_col4","_col5","_col6","_col7","_col8","_col9","_col10","_col11"]
>                         |  Statistics:Num rows: 127050 Data size: 182479129 Basic stats: COMPLETE Column stats: NONE
>                         |<-Reducer 8 [SIMPLE_EDGE]
>                            Reduce Output Operator [RS_56]
>                               key expressions:_col0 (type: string), _col1 (type: string), _col2 (type: string)
>                               Map-reduce partition columns:_col0 (type: string), _col1 (type: string), _col2 (type: string)
>                               sort order:+++
>                               Statistics:Num rows: 254100 Data size: 364958258 Basic stats: COMPLETE Column stats: NONE
>                               value expressions:_col3 (type: bigint), _col4 (type: struct<count:bigint,sum:double,input:int>), _col5 (type: struct<count:bigint,sum:double,variance:double>), _col6 (type: bigint), _col7 (type: struct<count:bigint,sum:double,input:int>), _col8 (type: struct<count:bigint,sum:double,variance:double>), _col9 (type: bigint), _col10 (type: struct<count:bigint,sum:double,input:int>), _col11 (type: struct<count:bigint,sum:double,variance:double>)
>                               Group By Operator [GBY_55]
>                                  aggregations:["count(_col5)","avg(_col5)","stddev_samp(_col5)","count(_col10)","avg(_col10)","stddev_samp(_col10)","count(_col14)","avg(_col14)","stddev_samp(_col14)"]
>                                  keys:_col22 (type: string), _col24 (type: string), _col25 (type: string)
>                                  outputColumnNames:["_col0","_col1","_col2","_col3","_col4","_col5","_col6","_col7","_col8","_col9","_col10","_col11"]
>                                  Statistics:Num rows: 254100 Data size: 364958258 Basic stats: COMPLETE Column stats: NONE
>                                  Select Operator [SEL_54]
>                                     outputColumnNames:["_col22","_col24","_col25","_col5","_col10","_col14"]
>                                     Statistics:Num rows: 254100 Data size: 364958258 Basic stats: COMPLETE Column stats: NONE
>                                     Merge Join Operator [MERGEJOIN_113]
>                                     |  condition map:[{"":"Inner Join 0 to 1"}]
>                                     |  keys:{"0":"_col1 (type: int)","1":"_col0 (type: int)"}
>                                     |  outputColumnNames:["_col5","_col10","_col14","_col22","_col24","_col25"]
>                                     |  Statistics:Num rows: 254100 Data size: 364958258 Basic stats: COMPLETE Column stats: NONE
>                                     |<-Map 17 [SIMPLE_EDGE]
>                                     |  Reduce Output Operator [RS_52]
>                                     |     key expressions:_col0 (type: int)
>                                     |     Map-reduce partition columns:_col0 (type: int)
>                                     |     sort order:+
>                                     |     Statistics:Num rows: 231000 Data size: 331780228 Basic stats: COMPLETE Column stats: NONE
>                                     |     value expressions:_col1 (type: string), _col2 (type: string)
>                                     |     Select Operator [SEL_18]
>                                     |        outputColumnNames:["_col0","_col1","_col2"]
>                                     |        Statistics:Num rows: 231000 Data size: 331780228 Basic stats: COMPLETE Column stats: NONE
>                                     |        Filter Operator [FIL_106]
>                                     |           predicate:i_item_sk is not null (type: boolean)
>                                     |           Statistics:Num rows: 231000 Data size: 331780228 Basic stats: COMPLETE Column stats: NONE
>                                     |           TableScan [TS_17]
>                                     |              alias:item
>                                     |              Statistics:Num rows: 462000 Data size: 663560457 Basic stats: COMPLETE Column stats: NONE
>                                     |<-Reducer 7 [SIMPLE_EDGE]
>                                        Reduce Output Operator [RS_50]
>                                           key expressions:_col1 (type: int)
>                                           Map-reduce partition columns:_col1 (type: int)
>                                           sort order:+
>                                           Statistics:Num rows: 26735 Data size: 29919145 Basic stats: COMPLETE Column stats: NONE
>                                           value expressions:_col5 (type: int), _col10 (type: int), _col14 (type: int), _col22 (type: string)
>                                           Merge Join Operator [MERGEJOIN_112]
>                                           |  condition map:[{"":"Inner Join 0 to 1"}]
>                                           |  keys:{"0":"_col3 (type: int)","1":"_col0 (type: int)"}
>                                           |  outputColumnNames:["_col1","_col5","_col10","_col14","_col22"]
>                                           |  Statistics:Num rows: 26735 Data size: 29919145 Basic stats: COMPLETE Column stats: NONE
>                                           |<-Map 16 [SIMPLE_EDGE]
>                                           |  Reduce Output Operator [RS_47]
>                                           |     key expressions:_col0 (type: int)
>                                           |     Map-reduce partition columns:_col0 (type: int)
>                                           |     sort order:+
>                                           |     Statistics:Num rows: 852 Data size: 1628138 Basic stats: COMPLETE Column stats: NONE
>                                           |     value expressions:_col1 (type: string)
>                                           |     Select Operator [SEL_16]
>                                           |        outputColumnNames:["_col0","_col1"]
>                                           |        Statistics:Num rows: 852 Data size: 1628138 Basic stats: COMPLETE Column stats: NONE
>                                           |        Filter Operator [FIL_105]
>                                           |           predicate:s_store_sk is not null (type: boolean)
>                                           |           Statistics:Num rows: 852 Data size: 1628138 Basic stats: COMPLETE Column stats: NONE
>                                           |           TableScan [TS_15]
>                                           |              alias:store
>                                           |              Statistics:Num rows: 1704 Data size: 3256276 Basic stats: COMPLETE Column stats: NONE
>                                           |<-Reducer 6 [SIMPLE_EDGE]
>                                              Reduce Output Operator [RS_45]
>                                                 key expressions:_col3 (type: int)
>                                                 Map-reduce partition columns:_col3 (type: int)
>                                                 sort order:+
>                                                 Statistics:Num rows: 24305 Data size: 27199223 Basic stats: COMPLETE Column stats: NONE
>                                                 value expressions:_col1 (type: int), _col5 (type: int), _col10 (type: int), _col14 (type: int)
>                                                 Merge Join Operator [MERGEJOIN_111]
>                                                 |  condition map:[{"":"Inner Join 0 to 1"}]
>                                                 |  keys:{"0":"_col11 (type: int)","1":"_col0 (type: int)"}
>                                                 |  outputColumnNames:["_col1","_col3","_col5","_col10","_col14"]
>                                                 |  Statistics:Num rows: 24305 Data size: 27199223 Basic stats: COMPLETE Column stats: NONE
>                                                 |<-Map 15 [SIMPLE_EDGE]
>                                                 |  Reduce Output Operator [RS_42]
>                                                 |     key expressions:_col0 (type: int)
>                                                 |     Map-reduce partition columns:_col0 (type: int)
>                                                 |     sort order:+
>                                                 |     Statistics:Num rows: 18262 Data size: 20435178 Basic stats: COMPLETE Column stats: NONE
>                                                 |     Select Operator [SEL_14]
>                                                 |        outputColumnNames:["_col0"]
>                                                 |        Statistics:Num rows: 18262 Data size: 20435178 Basic stats: COMPLETE Column stats: NONE
>                                                 |        Filter Operator [FIL_104]
>                                                 |           predicate:((d_quarter_name) IN ('2000Q1', '2000Q2', '2000Q3') and d_date_sk is not null) (type: boolean)
>                                                 |           Statistics:Num rows: 18262 Data size: 20435178 Basic stats: COMPLETE Column stats: NONE
>                                                 |           TableScan [TS_12]
>                                                 |              alias:d1
>                                                 |              Statistics:Num rows: 73049 Data size: 81741831 Basic stats: COMPLETE Column stats: NONE
>                                                 |<-Reducer 5 [SIMPLE_EDGE]
>                                                    Reduce Output Operator [RS_40]
>                                                       key expressions:_col11 (type: int)
>                                                       Map-reduce partition columns:_col11 (type: int)
>                                                       sort order:+
>                                                       Statistics:Num rows: 22096 Data size: 24726566 Basic stats: COMPLETE Column stats: NONE
>                                                       value expressions:_col1 (type: int), _col3 (type: int), _col5 (type: int), _col10 (type: int), _col14 (type: int)
>                                                       Merge Join Operator [MERGEJOIN_110]
>                                                       |  condition map:[{"":"Inner Join 0 to 1"}]
>                                                       |  keys:{"0":"_col6 (type: int)","1":"_col0 (type: int)"}
>                                                       |  outputColumnNames:["_col1","_col3","_col5","_col10","_col11","_col14"]
>                                                       |  Statistics:Num rows: 22096 Data size: 24726566 Basic stats: COMPLETE Column stats: NONE
>                                                       |<-Map 14 [SIMPLE_EDGE]
>                                                       |  Reduce Output Operator [RS_37]
>                                                       |     key expressions:_col0 (type: int)
>                                                       |     Map-reduce partition columns:_col0 (type: int)
>                                                       |     sort order:+
>                                                       |     Statistics:Num rows: 18262 Data size: 20435178 Basic stats: COMPLETE Column stats: NONE
>                                                       |     Select Operator [SEL_11]
>                                                       |        outputColumnNames:["_col0"]
>                                                       |        Statistics:Num rows: 18262 Data size: 20435178 Basic stats: COMPLETE Column stats: NONE
>                                                       |        Filter Operator [FIL_103]
>                                                       |           predicate:((d_quarter_name) IN ('2000Q1', '2000Q2', '2000Q3') and d_date_sk is not null) (type: boolean)
>                                                       |           Statistics:Num rows: 18262 Data size: 20435178 Basic stats: COMPLETE Column stats: NONE
>                                                       |           TableScan [TS_9]
>                                                       |              alias:d1
>                                                       |              Statistics:Num rows: 73049 Data size: 81741831 Basic stats: COMPLETE Column stats: NONE
>                                                       |<-Reducer 4 [SIMPLE_EDGE]
>                                                          Reduce Output Operator [RS_35]
>                                                             key expressions:_col6 (type: int)
>                                                             Map-reduce partition columns:_col6 (type: int)
>                                                             sort order:+
>                                                             Statistics:Num rows: 20088 Data size: 22478696 Basic stats: COMPLETE Column stats: NONE
>                                                             value expressions:_col1 (type: int), _col3 (type: int), _col5 (type: int), _col10 (type: int), _col11 (type: int), _col14 (type: int)
>                                                             Merge Join Operator [MERGEJOIN_109]
>                                                             |  condition map:[{"":"Inner Join 0 to 1"}]
>                                                             |  keys:{"0":"_col0 (type: int)","1":"_col0 (type: int)"}
>                                                             |  outputColumnNames:["_col1","_col3","_col5","_col6","_col10","_col11","_col14"]
>                                                             |  Statistics:Num rows: 20088 Data size: 22478696 Basic stats: COMPLETE Column stats: NONE
>                                                             |<-Map 13 [SIMPLE_EDGE]
>                                                             |  Reduce Output Operator [RS_32]
>                                                             |     key expressions:_col0 (type: int)
>                                                             |     Map-reduce partition columns:_col0 (type: int)
>                                                             |     sort order:+
>                                                             |     Statistics:Num rows: 18262 Data size: 20435178 Basic stats: COMPLETE Column stats: NONE
>                                                             |     Select Operator [SEL_8]
>                                                             |        outputColumnNames:["_col0"]
>                                                             |        Statistics:Num rows: 18262 Data size: 20435178 Basic stats: COMPLETE Column stats: NONE
>                                                             |        Filter Operator [FIL_102]
>                                                             |           predicate:((d_quarter_name = '2000Q1') and d_date_sk is not null) (type: boolean)
>                                                             |           Statistics:Num rows: 18262 Data size: 20435178 Basic stats: COMPLETE Column stats: NONE
>                                                             |           TableScan [TS_6]
>                                                             |              alias:d1
>                                                             |              Statistics:Num rows: 73049 Data size: 81741831 Basic stats: COMPLETE Column stats: NONE
>                                                             |<-Reducer 3 [SIMPLE_EDGE]
>                                                                Reduce Output Operator [RS_30]
>                                                                   key expressions:_col0 (type: int)
>                                                                   Map-reduce partition columns:_col0 (type: int)
>                                                                   sort order:+
>                                                                   Statistics:Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
>                                                                   value expressions:_col1 (type: int), _col3 (type: int), _col5 (type: int), _col6 (type: int), _col10 (type: int), _col11 (type: int), _col14 (type: int)
>                                                                   Merge Join Operator [MERGEJOIN_108]
>                                                                   |  condition map:[{"":"Inner Join 0 to 1"}]
>                                                                   |  keys:{"0":"_col8 (type: int), _col7 (type: int)","1":"_col1 (type: int), _col2 (type: int)"}
>                                                                   |  outputColumnNames:["_col0","_col1","_col3","_col5","_col6","_col10","_col11","_col14"]
>                                                                   |  Statistics:Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
>                                                                   |<-Map 12 [SIMPLE_EDGE]
>                                                                   |  Reduce Output Operator [RS_27]
>                                                                   |     key expressions:_col1 (type: int), _col2 (type: int)
>                                                                   |     Map-reduce partition columns:_col1 (type: int), _col2 (type: int)
>                                                                   |     sort order:++
>                                                                   |     Statistics:Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
>                                                                   |     value expressions:_col0 (type: int), _col3 (type: int)
>                                                                   |     Select Operator [SEL_5]
>                                                                   |        outputColumnNames:["_col0","_col1","_col2","_col3"]
>                                                                   |        Statistics:Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
>                                                                   |        Filter Operator [FIL_101]
>                                                                   |           predicate:((cs_bill_customer_sk is not null and cs_item_sk is not null) and cs_sold_date_sk is not null) (type: boolean)
>                                                                   |           Statistics:Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
>                                                                   |           TableScan [TS_4]
>                                                                   |              alias:catalog_sales
>                                                                   |              Statistics:Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
>                                                                   |<-Reducer 2 [SIMPLE_EDGE]
>                                                                      Reduce Output Operator [RS_25]
>                                                                         key expressions:_col8 (type: int), _col7 (type: int)
>                                                                         Map-reduce partition columns:_col8 (type: int), _col7 (type: int)
>                                                                         sort order:++
>                                                                         Statistics:Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
>                                                                         value expressions:_col0 (type: int), _col1 (type: int), _col3 (type: int), _col5 (type: int), _col6 (type: int), _col10 (type: int)
>                                                                         Merge Join Operator [MERGEJOIN_107]
>                                                                         |  condition map:[{"":"Inner Join 0 to 1"}]
>                                                                         |  keys:{"0":"_col2 (type: int), _col1 (type: int), _col4 (type: int)","1":"_col2 (type: int), _col1 (type: int), _col3 (type: int)"}
>                                                                         |  outputColumnNames:["_col0","_col1","_col3","_col5","_col6","_col7","_col8","_col10"]
>                                                                         |  Statistics:Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
>                                                                         |<-Map 1 [SIMPLE_EDGE]
>                                                                         |  Reduce Output Operator [RS_20]
>                                                                         |     key expressions:_col2 (type: int), _col1 (type: int), _col4 (type: int)
>                                                                         |     Map-reduce partition columns:_col2 (type: int), _col1 (type: int), _col4 (type: int)
>                                                                         |     sort order:+++
>                                                                         |     Statistics:Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
>                                                                         |     value expressions:_col0 (type: int), _col3 (type: int), _col5 (type: int)
>                                                                         |     Select Operator [SEL_1]
>                                                                         |        outputColumnNames:["_col0","_col1","_col2","_col3","_col4","_col5"]
>                                                                         |        Statistics:Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
>                                                                         |        Filter Operator [FIL_99]
>                                                                         |           predicate:((((ss_customer_sk is not null and ss_item_sk is not null) and ss_ticket_number is not null) and ss_sold_date_sk is not null) and ss_store_sk is not null) (type: boolean)
>                                                                         |           Statistics:Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
>                                                                         |           TableScan [TS_0]
>                                                                         |              alias:store_sales
>                                                                         |              Statistics:Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
>                                                                         |<-Map 11 [SIMPLE_EDGE]
>                                                                            Reduce Output Operator [RS_22]
>                                                                               key expressions:_col2 (type: int), _col1 (type: int), _col3 (type: int)
>                                                                               Map-reduce partition columns:_col2 (type: int), _col1 (type: int), _col3 (type: int)
>                                                                               sort order:+++
>                                                                               Statistics:Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
>                                                                               value expressions:_col0 (type: int), _col4 (type: int)
>                                                                               Select Operator [SEL_3]
>                                                                                  outputColumnNames:["_col0","_col1","_col2","_col3","_col4"]
>                                                                                  Statistics:Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
>                                                                                  Filter Operator [FIL_100]
>                                                                                     predicate:(((sr_customer_sk is not null and sr_item_sk is not null) and sr_ticket_number is not null) and sr_returned_date_sk is not null) (type: boolean)
>                                                                                     Statistics:Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
>                                                                                     TableScan [TS_2]
>                                                                                        alias:store_returns
>                                                                                        Statistics:Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
> {code}
> The query is:
> {code}
>  explain select i_item_desc ,i_category ,i_class ,i_current_price ,i_item_id ,sum(ws_ext_sales_price) as itemrevenue ,sum(ws_ext_sales_price)*100/sum(sum(ws_ext_sales_price)) over (partition by i_class) as revenueratio from web_sales ,item ,date_dim where web_sales.ws_item_sk = item.i_item_sk and item.i_category in ('Jewelry', 'Sports', 'Books') and web_sales.ws_sold_date_sk = date_dim.d_date_sk and date_dim.d_date between '2001-01-12' and '2001-02-11' group by i_item_id ,i_item_desc ,i_category ,i_class ,i_current_price order by i_category ,i_class ,i_item_id ,i_item_desc ,revenueratio limit 100;
> {code}
> It seems that SemanticAnalyzer.genJoinReduceSinkChild() looks for join predicates only in the 'ON' clause. If the join condition appears in the 'WHERE' clause instead, strict mode assumes the join is a cartesian product and throws an exception prematurely, even though the final plan contains none. We should defer this check until after the physical optimizer has run and the plan is complete.
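
The difference between the two checks described above can be illustrated with a small toy model. This is only a sketch and not Hive's implementation: the function names, the representation of a join predicate as a pair of table names, and the include_where flag are all assumptions made for illustration. The tables are treated as nodes of a graph and each equi-join predicate as an edge; a cartesian product is needed only if the graph is disconnected.

```python
# Toy model only: not Hive's SemanticAnalyzer. All names here are hypothetical.

def join_components(tables, predicates):
    """Group tables into components connected by equi-join predicates (union-find)."""
    parent = {t: t for t in tables}

    def find(t):
        # Walk to the component root, compressing the path along the way.
        while parent[t] != t:
            parent[t] = parent[parent[t]]
            t = parent[t]
        return t

    for left, right in predicates:
        parent[find(left)] = find(right)

    groups = {}
    for t in tables:
        groups.setdefault(find(t), set()).add(t)
    return list(groups.values())


def needs_cartesian_product(tables, on_predicates, where_predicates, include_where=True):
    """Return True if some table is not joined to the rest by any predicate considered."""
    predicates = list(on_predicates)
    if include_where:
        predicates += list(where_predicates)
    return len(join_components(tables, predicates)) > 1


# The query from the report: comma-separated FROM list, join conditions in WHERE.
TABLES = ["web_sales", "item", "date_dim"]
ON_PREDICATES = []  # the query has no ON clauses
WHERE_PREDICATES = [
    ("web_sales", "item"),      # web_sales.ws_item_sk = item.i_item_sk
    ("web_sales", "date_dim"),  # web_sales.ws_sold_date_sk = date_dim.d_date_sk
]

# Looking only at ON clauses flags a cartesian product (the behaviour reported).
on_only = needs_cartesian_product(TABLES, ON_PREDICATES, WHERE_PREDICATES, include_where=False)
# Taking WHERE predicates into account shows that every table is joined.
with_where = needs_cartesian_product(TABLES, ON_PREDICATES, WHERE_PREDICATES, include_where=True)
```

In this model, on_only is True and with_where is False, which mirrors the report: the early check sees no ON predicates and rejects the query, while the completed plan joins all three tables on keys.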



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)