Posted to issues@calcite.apache.org by "Hari Sankar Sivarama Subramaniyan (JIRA)" <ji...@apache.org> on 2015/12/10 21:37:10 UTC

[jira] [Created] (CALCITE-1017) hive.mapred.mode=strict throws an error even if the final plan does not have cartesian product in it.

Hari Sankar Sivarama Subramaniyan created CALCITE-1017:
----------------------------------------------------------

             Summary: hive.mapred.mode=strict throws an error even if the final plan does not have cartesian product in it.
                 Key: CALCITE-1017
                 URL: https://issues.apache.org/jira/browse/CALCITE-1017
             Project: Calcite
          Issue Type: Bug
            Reporter: Hari Sankar Sivarama Subramaniyan
            Assignee: Julian Hyde


{code}
Vertex dependency in root stage
Reducer 10 <- Reducer 9 (SIMPLE_EDGE)
Reducer 2 <- Map 1 (SIMPLE_EDGE), Map 11 (SIMPLE_EDGE)
Reducer 3 <- Map 12 (SIMPLE_EDGE), Reducer 2 (SIMPLE_EDGE)
Reducer 4 <- Map 13 (SIMPLE_EDGE), Reducer 3 (SIMPLE_EDGE)
Reducer 5 <- Map 14 (SIMPLE_EDGE), Reducer 4 (SIMPLE_EDGE)
Reducer 6 <- Map 15 (SIMPLE_EDGE), Reducer 5 (SIMPLE_EDGE)
Reducer 7 <- Map 16 (SIMPLE_EDGE), Reducer 6 (SIMPLE_EDGE)
Reducer 8 <- Map 17 (SIMPLE_EDGE), Reducer 7 (SIMPLE_EDGE)
Reducer 9 <- Reducer 8 (SIMPLE_EDGE)

Stage-0
   Fetch Operator
      limit:100
      Stage-1
         Reducer 10
         File Output Operator [FS_63]
            compressed:false
            Statistics:Num rows: 100 Data size: 143600 Basic stats: COMPLETE Column stats: NONE
            table:{"input format:":"org.apache.hadoop.mapred.TextInputFormat","output format:":"org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat","serde:":"org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe"}
            Limit [LIM_62]
               Number of rows:100
               Statistics:Num rows: 100 Data size: 143600 Basic stats: COMPLETE Column stats: NONE
               Select Operator [SEL_61]
               |  outputColumnNames:["_col0","_col1","_col2","_col3","_col4","_col5","_col6","_col7","_col8","_col9","_col10","_col11","_col12","_col13","_col14"]
               |  Statistics:Num rows: 127050 Data size: 182479129 Basic stats: COMPLETE Column stats: NONE
               |<-Reducer 9 [SIMPLE_EDGE]
                  Reduce Output Operator [RS_60]
                     key expressions:_col0 (type: string), _col1 (type: string), _col2 (type: string)
                     sort order:+++
                     Statistics:Num rows: 127050 Data size: 182479129 Basic stats: COMPLETE Column stats: NONE
                     value expressions:_col3 (type: bigint), _col4 (type: double), _col5 (type: double), _col6 (type: double), _col7 (type: bigint), _col8 (type: double), _col9 (type: double), _col10 (type: double), _col11 (type: bigint), _col12 (type: double), _col13 (type: double)
                     Select Operator [SEL_58]
                        outputColumnNames:["_col0","_col1","_col10","_col11","_col12","_col13","_col2","_col3","_col4","_col5","_col6","_col7","_col8","_col9"]
                        Statistics:Num rows: 127050 Data size: 182479129 Basic stats: COMPLETE Column stats: NONE
                        Group By Operator [GBY_57]
                        |  aggregations:["count(VALUE._col0)","avg(VALUE._col1)","stddev_samp(VALUE._col2)","count(VALUE._col3)","avg(VALUE._col4)","stddev_samp(VALUE._col5)","count(VALUE._col6)","avg(VALUE._col7)","stddev_samp(VALUE._col8)"]
                        |  keys:KEY._col0 (type: string), KEY._col1 (type: string), KEY._col2 (type: string)
                        |  outputColumnNames:["_col0","_col1","_col2","_col3","_col4","_col5","_col6","_col7","_col8","_col9","_col10","_col11"]
                        |  Statistics:Num rows: 127050 Data size: 182479129 Basic stats: COMPLETE Column stats: NONE
                        |<-Reducer 8 [SIMPLE_EDGE]
                           Reduce Output Operator [RS_56]
                              key expressions:_col0 (type: string), _col1 (type: string), _col2 (type: string)
                              Map-reduce partition columns:_col0 (type: string), _col1 (type: string), _col2 (type: string)
                              sort order:+++
                              Statistics:Num rows: 254100 Data size: 364958258 Basic stats: COMPLETE Column stats: NONE
                              value expressions:_col3 (type: bigint), _col4 (type: struct<count:bigint,sum:double,input:int>), _col5 (type: struct<count:bigint,sum:double,variance:double>), _col6 (type: bigint), _col7 (type: struct<count:bigint,sum:double,input:int>), _col8 (type: struct<count:bigint,sum:double,variance:double>), _col9 (type: bigint), _col10 (type: struct<count:bigint,sum:double,input:int>), _col11 (type: struct<count:bigint,sum:double,variance:double>)
                              Group By Operator [GBY_55]
                                 aggregations:["count(_col5)","avg(_col5)","stddev_samp(_col5)","count(_col10)","avg(_col10)","stddev_samp(_col10)","count(_col14)","avg(_col14)","stddev_samp(_col14)"]
                                 keys:_col22 (type: string), _col24 (type: string), _col25 (type: string)
                                 outputColumnNames:["_col0","_col1","_col2","_col3","_col4","_col5","_col6","_col7","_col8","_col9","_col10","_col11"]
                                 Statistics:Num rows: 254100 Data size: 364958258 Basic stats: COMPLETE Column stats: NONE
                                 Select Operator [SEL_54]
                                    outputColumnNames:["_col22","_col24","_col25","_col5","_col10","_col14"]
                                    Statistics:Num rows: 254100 Data size: 364958258 Basic stats: COMPLETE Column stats: NONE
                                    Merge Join Operator [MERGEJOIN_113]
                                    |  condition map:[{"":"Inner Join 0 to 1"}]
                                    |  keys:{"0":"_col1 (type: int)","1":"_col0 (type: int)"}
                                    |  outputColumnNames:["_col5","_col10","_col14","_col22","_col24","_col25"]
                                    |  Statistics:Num rows: 254100 Data size: 364958258 Basic stats: COMPLETE Column stats: NONE
                                    |<-Map 17 [SIMPLE_EDGE]
                                    |  Reduce Output Operator [RS_52]
                                    |     key expressions:_col0 (type: int)
                                    |     Map-reduce partition columns:_col0 (type: int)
                                    |     sort order:+
                                    |     Statistics:Num rows: 231000 Data size: 331780228 Basic stats: COMPLETE Column stats: NONE
                                    |     value expressions:_col1 (type: string), _col2 (type: string)
                                    |     Select Operator [SEL_18]
                                    |        outputColumnNames:["_col0","_col1","_col2"]
                                    |        Statistics:Num rows: 231000 Data size: 331780228 Basic stats: COMPLETE Column stats: NONE
                                    |        Filter Operator [FIL_106]
                                    |           predicate:i_item_sk is not null (type: boolean)
                                    |           Statistics:Num rows: 231000 Data size: 331780228 Basic stats: COMPLETE Column stats: NONE
                                    |           TableScan [TS_17]
                                    |              alias:item
                                    |              Statistics:Num rows: 462000 Data size: 663560457 Basic stats: COMPLETE Column stats: NONE
                                    |<-Reducer 7 [SIMPLE_EDGE]
                                       Reduce Output Operator [RS_50]
                                          key expressions:_col1 (type: int)
                                          Map-reduce partition columns:_col1 (type: int)
                                          sort order:+
                                          Statistics:Num rows: 26735 Data size: 29919145 Basic stats: COMPLETE Column stats: NONE
                                          value expressions:_col5 (type: int), _col10 (type: int), _col14 (type: int), _col22 (type: string)
                                          Merge Join Operator [MERGEJOIN_112]
                                          |  condition map:[{"":"Inner Join 0 to 1"}]
                                          |  keys:{"0":"_col3 (type: int)","1":"_col0 (type: int)"}
                                          |  outputColumnNames:["_col1","_col5","_col10","_col14","_col22"]
                                          |  Statistics:Num rows: 26735 Data size: 29919145 Basic stats: COMPLETE Column stats: NONE
                                          |<-Map 16 [SIMPLE_EDGE]
                                          |  Reduce Output Operator [RS_47]
                                          |     key expressions:_col0 (type: int)
                                          |     Map-reduce partition columns:_col0 (type: int)
                                          |     sort order:+
                                          |     Statistics:Num rows: 852 Data size: 1628138 Basic stats: COMPLETE Column stats: NONE
                                          |     value expressions:_col1 (type: string)
                                          |     Select Operator [SEL_16]
                                          |        outputColumnNames:["_col0","_col1"]
                                          |        Statistics:Num rows: 852 Data size: 1628138 Basic stats: COMPLETE Column stats: NONE
                                          |        Filter Operator [FIL_105]
                                          |           predicate:s_store_sk is not null (type: boolean)
                                          |           Statistics:Num rows: 852 Data size: 1628138 Basic stats: COMPLETE Column stats: NONE
                                          |           TableScan [TS_15]
                                          |              alias:store
                                          |              Statistics:Num rows: 1704 Data size: 3256276 Basic stats: COMPLETE Column stats: NONE
                                          |<-Reducer 6 [SIMPLE_EDGE]
                                             Reduce Output Operator [RS_45]
                                                key expressions:_col3 (type: int)
                                                Map-reduce partition columns:_col3 (type: int)
                                                sort order:+
                                                Statistics:Num rows: 24305 Data size: 27199223 Basic stats: COMPLETE Column stats: NONE
                                                value expressions:_col1 (type: int), _col5 (type: int), _col10 (type: int), _col14 (type: int)
                                                Merge Join Operator [MERGEJOIN_111]
                                                |  condition map:[{"":"Inner Join 0 to 1"}]
                                                |  keys:{"0":"_col11 (type: int)","1":"_col0 (type: int)"}
                                                |  outputColumnNames:["_col1","_col3","_col5","_col10","_col14"]
                                                |  Statistics:Num rows: 24305 Data size: 27199223 Basic stats: COMPLETE Column stats: NONE
                                                |<-Map 15 [SIMPLE_EDGE]
                                                |  Reduce Output Operator [RS_42]
                                                |     key expressions:_col0 (type: int)
                                                |     Map-reduce partition columns:_col0 (type: int)
                                                |     sort order:+
                                                |     Statistics:Num rows: 18262 Data size: 20435178 Basic stats: COMPLETE Column stats: NONE
                                                |     Select Operator [SEL_14]
                                                |        outputColumnNames:["_col0"]
                                                |        Statistics:Num rows: 18262 Data size: 20435178 Basic stats: COMPLETE Column stats: NONE
                                                |        Filter Operator [FIL_104]
                                                |           predicate:((d_quarter_name) IN ('2000Q1', '2000Q2', '2000Q3') and d_date_sk is not null) (type: boolean)
                                                |           Statistics:Num rows: 18262 Data size: 20435178 Basic stats: COMPLETE Column stats: NONE
                                                |           TableScan [TS_12]
                                                |              alias:d1
                                                |              Statistics:Num rows: 73049 Data size: 81741831 Basic stats: COMPLETE Column stats: NONE
                                                |<-Reducer 5 [SIMPLE_EDGE]
                                                   Reduce Output Operator [RS_40]
                                                      key expressions:_col11 (type: int)
                                                      Map-reduce partition columns:_col11 (type: int)
                                                      sort order:+
                                                      Statistics:Num rows: 22096 Data size: 24726566 Basic stats: COMPLETE Column stats: NONE
                                                      value expressions:_col1 (type: int), _col3 (type: int), _col5 (type: int), _col10 (type: int), _col14 (type: int)
                                                      Merge Join Operator [MERGEJOIN_110]
                                                      |  condition map:[{"":"Inner Join 0 to 1"}]
                                                      |  keys:{"0":"_col6 (type: int)","1":"_col0 (type: int)"}
                                                      |  outputColumnNames:["_col1","_col3","_col5","_col10","_col11","_col14"]
                                                      |  Statistics:Num rows: 22096 Data size: 24726566 Basic stats: COMPLETE Column stats: NONE
                                                      |<-Map 14 [SIMPLE_EDGE]
                                                      |  Reduce Output Operator [RS_37]
                                                      |     key expressions:_col0 (type: int)
                                                      |     Map-reduce partition columns:_col0 (type: int)
                                                      |     sort order:+
                                                      |     Statistics:Num rows: 18262 Data size: 20435178 Basic stats: COMPLETE Column stats: NONE
                                                      |     Select Operator [SEL_11]
                                                      |        outputColumnNames:["_col0"]
                                                      |        Statistics:Num rows: 18262 Data size: 20435178 Basic stats: COMPLETE Column stats: NONE
                                                      |        Filter Operator [FIL_103]
                                                      |           predicate:((d_quarter_name) IN ('2000Q1', '2000Q2', '2000Q3') and d_date_sk is not null) (type: boolean)
                                                      |           Statistics:Num rows: 18262 Data size: 20435178 Basic stats: COMPLETE Column stats: NONE
                                                      |           TableScan [TS_9]
                                                      |              alias:d1
                                                      |              Statistics:Num rows: 73049 Data size: 81741831 Basic stats: COMPLETE Column stats: NONE
                                                      |<-Reducer 4 [SIMPLE_EDGE]
                                                         Reduce Output Operator [RS_35]
                                                            key expressions:_col6 (type: int)
                                                            Map-reduce partition columns:_col6 (type: int)
                                                            sort order:+
                                                            Statistics:Num rows: 20088 Data size: 22478696 Basic stats: COMPLETE Column stats: NONE
                                                            value expressions:_col1 (type: int), _col3 (type: int), _col5 (type: int), _col10 (type: int), _col11 (type: int), _col14 (type: int)
                                                            Merge Join Operator [MERGEJOIN_109]
                                                            |  condition map:[{"":"Inner Join 0 to 1"}]
                                                            |  keys:{"0":"_col0 (type: int)","1":"_col0 (type: int)"}
                                                            |  outputColumnNames:["_col1","_col3","_col5","_col6","_col10","_col11","_col14"]
                                                            |  Statistics:Num rows: 20088 Data size: 22478696 Basic stats: COMPLETE Column stats: NONE
                                                            |<-Map 13 [SIMPLE_EDGE]
                                                            |  Reduce Output Operator [RS_32]
                                                            |     key expressions:_col0 (type: int)
                                                            |     Map-reduce partition columns:_col0 (type: int)
                                                            |     sort order:+
                                                            |     Statistics:Num rows: 18262 Data size: 20435178 Basic stats: COMPLETE Column stats: NONE
                                                            |     Select Operator [SEL_8]
                                                            |        outputColumnNames:["_col0"]
                                                            |        Statistics:Num rows: 18262 Data size: 20435178 Basic stats: COMPLETE Column stats: NONE
                                                            |        Filter Operator [FIL_102]
                                                            |           predicate:((d_quarter_name = '2000Q1') and d_date_sk is not null) (type: boolean)
                                                            |           Statistics:Num rows: 18262 Data size: 20435178 Basic stats: COMPLETE Column stats: NONE
                                                            |           TableScan [TS_6]
                                                            |              alias:d1
                                                            |              Statistics:Num rows: 73049 Data size: 81741831 Basic stats: COMPLETE Column stats: NONE
                                                            |<-Reducer 3 [SIMPLE_EDGE]
                                                               Reduce Output Operator [RS_30]
                                                                  key expressions:_col0 (type: int)
                                                                  Map-reduce partition columns:_col0 (type: int)
                                                                  sort order:+
                                                                  Statistics:Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
                                                                  value expressions:_col1 (type: int), _col3 (type: int), _col5 (type: int), _col6 (type: int), _col10 (type: int), _col11 (type: int), _col14 (type: int)
                                                                  Merge Join Operator [MERGEJOIN_108]
                                                                  |  condition map:[{"":"Inner Join 0 to 1"}]
                                                                  |  keys:{"0":"_col8 (type: int), _col7 (type: int)","1":"_col1 (type: int), _col2 (type: int)"}
                                                                  |  outputColumnNames:["_col0","_col1","_col3","_col5","_col6","_col10","_col11","_col14"]
                                                                  |  Statistics:Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
                                                                  |<-Map 12 [SIMPLE_EDGE]
                                                                  |  Reduce Output Operator [RS_27]
                                                                  |     key expressions:_col1 (type: int), _col2 (type: int)
                                                                  |     Map-reduce partition columns:_col1 (type: int), _col2 (type: int)
                                                                  |     sort order:++
                                                                  |     Statistics:Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
                                                                  |     value expressions:_col0 (type: int), _col3 (type: int)
                                                                  |     Select Operator [SEL_5]
                                                                  |        outputColumnNames:["_col0","_col1","_col2","_col3"]
                                                                  |        Statistics:Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
                                                                  |        Filter Operator [FIL_101]
                                                                  |           predicate:((cs_bill_customer_sk is not null and cs_item_sk is not null) and cs_sold_date_sk is not null) (type: boolean)
                                                                  |           Statistics:Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
                                                                  |           TableScan [TS_4]
                                                                  |              alias:catalog_sales
                                                                  |              Statistics:Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
                                                                  |<-Reducer 2 [SIMPLE_EDGE]
                                                                     Reduce Output Operator [RS_25]
                                                                        key expressions:_col8 (type: int), _col7 (type: int)
                                                                        Map-reduce partition columns:_col8 (type: int), _col7 (type: int)
                                                                        sort order:++
                                                                        Statistics:Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
                                                                        value expressions:_col0 (type: int), _col1 (type: int), _col3 (type: int), _col5 (type: int), _col6 (type: int), _col10 (type: int)
                                                                        Merge Join Operator [MERGEJOIN_107]
                                                                        |  condition map:[{"":"Inner Join 0 to 1"}]
                                                                        |  keys:{"0":"_col2 (type: int), _col1 (type: int), _col4 (type: int)","1":"_col2 (type: int), _col1 (type: int), _col3 (type: int)"}
                                                                        |  outputColumnNames:["_col0","_col1","_col3","_col5","_col6","_col7","_col8","_col10"]
                                                                        |  Statistics:Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
                                                                        |<-Map 1 [SIMPLE_EDGE]
                                                                        |  Reduce Output Operator [RS_20]
                                                                        |     key expressions:_col2 (type: int), _col1 (type: int), _col4 (type: int)
                                                                        |     Map-reduce partition columns:_col2 (type: int), _col1 (type: int), _col4 (type: int)
                                                                        |     sort order:+++
                                                                        |     Statistics:Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
                                                                        |     value expressions:_col0 (type: int), _col3 (type: int), _col5 (type: int)
                                                                        |     Select Operator [SEL_1]
                                                                        |        outputColumnNames:["_col0","_col1","_col2","_col3","_col4","_col5"]
                                                                        |        Statistics:Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
                                                                        |        Filter Operator [FIL_99]
                                                                        |           predicate:((((ss_customer_sk is not null and ss_item_sk is not null) and ss_ticket_number is not null) and ss_sold_date_sk is not null) and ss_store_sk is not null) (type: boolean)
                                                                        |           Statistics:Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
                                                                        |           TableScan [TS_0]
                                                                        |              alias:store_sales
                                                                        |              Statistics:Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
                                                                        |<-Map 11 [SIMPLE_EDGE]
                                                                           Reduce Output Operator [RS_22]
                                                                              key expressions:_col2 (type: int), _col1 (type: int), _col3 (type: int)
                                                                              Map-reduce partition columns:_col2 (type: int), _col1 (type: int), _col3 (type: int)
                                                                              sort order:+++
                                                                              Statistics:Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
                                                                              value expressions:_col0 (type: int), _col4 (type: int)
                                                                              Select Operator [SEL_3]
                                                                                 outputColumnNames:["_col0","_col1","_col2","_col3","_col4"]
                                                                                 Statistics:Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
                                                                                 Filter Operator [FIL_100]
                                                                                    predicate:(((sr_customer_sk is not null and sr_item_sk is not null) and sr_ticket_number is not null) and sr_returned_date_sk is not null) (type: boolean)
                                                                                    Statistics:Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
                                                                                    TableScan [TS_2]
                                                                                       alias:store_returns
                                                                                       Statistics:Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
{code}

The query is:
{code}
 explain select i_item_desc ,i_category ,i_class ,i_current_price ,i_item_id ,sum(ws_ext_sales_price) as itemrevenue ,sum(ws_ext_sales_price)*100/sum(sum(ws_ext_sales_price)) over (partition by i_class) as revenueratio from web_sales ,item ,date_dim where web_sales.ws_item_sk = item.i_item_sk and item.i_category in ('Jewelry', 'Sports', 'Books') and web_sales.ws_sold_date_sk = date_dim.d_date_sk and date_dim.d_date between '2001-01-12' and '2001-02-11' group by i_item_id ,i_item_desc ,i_category ,i_class ,i_current_price order by i_category ,i_class ,i_item_id ,i_item_desc ,revenueratio limit 100;
{code}

It seems that SemanticAnalyzer.genJoinReduceSinkChild() looks for join predicates only in the 'ON' clause. If the join condition appears in the 'WHERE' clause of the query, we aggressively throw an exception in strict mode, assuming the join is a cartesian product. We should delay this check until after the physical optimizer has run and the plan is complete.
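
To make the proposal concrete, here is a rough sketch of what a check on the finished plan could look like. It is written in Python for brevity and is not Hive code: the Op class, its fields, and both function names are hypothetical, and the only assumption is that a join in the final plan exposes its join keys the way the "keys:" entries do in the explain output above.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Op:
    """Hypothetical plan operator used only for this sketch (not a Hive class)."""
    name: str
    kind: str = "other"  # "join" marks a join operator
    join_keys: List[str] = field(default_factory=list)
    children: List["Op"] = field(default_factory=list)


def find_cartesian_joins(root: Op) -> List[str]:
    """Walk the finished plan and return the names of joins that have no keys."""
    found = []
    stack = [root]
    while stack:
        op = stack.pop()
        if op.kind == "join" and not op.join_keys:
            found.append(op.name)
        stack.extend(op.children)
    return found


def check_strict_mode(root: Op, strict: bool = True) -> None:
    """Raise only if the final plan still contains a keyless join."""
    bad = find_cartesian_joins(root)
    if strict and bad:
        raise ValueError("cartesian product in strict mode: " + ", ".join(bad))
```

With a check of this shape, a join whose predicate was written in the 'WHERE' clause would pass as long as the optimizer has turned it into a keyed join by the time the plan is final, and only a join that still has no keys would be rejected.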



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)