Posted to dev@hive.apache.org by "Chaoyu Tang (JIRA)" <ji...@apache.org> on 2016/02/26 04:30:18 UTC
[jira] [Created] (HIVE-13164) Predicate pushdown may cause cross-product in left semi join

Chaoyu Tang created HIVE-13164:
----------------------------------

             Summary: Predicate pushdown may cause cross-product in left semi join
                 Key: HIVE-13164
                 URL: https://issues.apache.org/jira/browse/HIVE-13164
             Project: Hive
          Issue Type: Bug
          Components: Query Processor
            Reporter: Chaoyu Tang
            Assignee: Chaoyu Tang


For some left semi join queries, such as the following:
select count(1) from (select value from t1 where key = 0) t1 left semi join (select value from t2 where key = 0) t2 on t2.value = 'val_0';
or 
select count(1) from (select value from t1 where key = 0) t1 left semi join (select value from t2 where key = 0) t2 on t1.value = 'val_0';
The plans show that both queries have been converted into a keyless cross product, because the ON predicate is pushed down to one side and the ON condition is then dropped from the join.
{code}
LOGICAL PLAN:
t1:t1 
  TableScan (TS_0)
    alias: t1
    Statistics: Num rows: 1453 Data size: 5812 Basic stats: COMPLETE Column stats: NONE
    Filter Operator (FIL_18)
      predicate: (key = 0) (type: boolean)
      Statistics: Num rows: 726 Data size: 2904 Basic stats: COMPLETE Column stats: NONE
      Select Operator (SEL_2)
        Statistics: Num rows: 726 Data size: 2904 Basic stats: COMPLETE Column stats: NONE
        Reduce Output Operator (RS_9)
          sort order: 
          Statistics: Num rows: 726 Data size: 2904 Basic stats: COMPLETE Column stats: NONE
          Join Operator (JOIN_11)
            condition map:
                 Left Semi Join 0 to 1
            keys:
              0 
              1 
            Statistics: Num rows: 798 Data size: 3194 Basic stats: COMPLETE Column stats: NONE
            Group By Operator (GBY_13)
              aggregations: count(1)
              mode: hash
              outputColumnNames: _col0
              Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: NONE
              Reduce Output Operator (RS_14)
                sort order: 
                Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: NONE
                value expressions: _col0 (type: bigint)
                Group By Operator (GBY_15)
                  aggregations: count(VALUE._col0)
                  mode: mergepartial
                  outputColumnNames: _col0
                  Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: NONE
                  File Output Operator (FS_17)
                    compressed: false
                    Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: NONE
                    table:
                        input format: org.apache.hadoop.mapred.SequenceFileInputFormat
                        output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
                        serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
t2:t2 
  TableScan (TS_3)
    alias: t2
    Statistics: Num rows: 645 Data size: 5812 Basic stats: COMPLETE Column stats: NONE
    Filter Operator (FIL_19)
      predicate: ((key = 0) and (value = 'val_0')) (type: boolean)
      Statistics: Num rows: 161 Data size: 1450 Basic stats: COMPLETE Column stats: NONE
      Select Operator (SEL_5)
        Statistics: Num rows: 161 Data size: 1450 Basic stats: COMPLETE Column stats: NONE
        Group By Operator (GBY_8)
          keys: 'val_0' (type: string)
          mode: hash
          outputColumnNames: _col0
          Statistics: Num rows: 161 Data size: 1450 Basic stats: COMPLETE Column stats: NONE
          Reduce Output Operator (RS_10)
            sort order: 
            Statistics: Num rows: 161 Data size: 1450 Basic stats: COMPLETE Column stats: NONE
            Join Operator (JOIN_11)
              condition map:
                   Left Semi Join 0 to 1
              keys:
                0 
                1 
              Statistics: Num rows: 798 Data size: 3194 Basic stats: COMPLETE Column stats: NONE
{code}
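To make the shape of the plan above concrete, here is a minimal Python sketch that compares reference LEFT SEMI JOIN semantics with what a keyless plan computes. It assumes hypothetical toy `(key, value)` rows (not data from this issue), and every function name in it is illustrative, not a Hive API. For the first query it mirrors the plan shown: the right-only predicate `value = 'val_0'` moves into t2's filter (FIL_19), the right side is deduplicated (GBY_8), and JOIN_11 has empty key lists. The plan for the second query is not shown, so the sketch assumes that its left-only predicate is pushed into t1's filter instead. The comparison is only a sanity check on toy data.

```python
from typing import Callable, Iterable, List, Tuple

Row = Tuple[int, str]

# Hypothetical toy rows of (key, value); not taken from the issue.
T1: List[Row] = [(0, "val_0"), (0, "val_0"), (0, "val_1"), (1, "val_0"), (2, "val_2")]
T2: List[Row] = [(0, "val_0"), (0, "val_0"), (0, "val_3"), (5, "val_0")]


def subquery(rows: Iterable[Row]) -> List[str]:
    """select value from t where key = 0"""
    return [value for key, value in rows if key == 0]


def left_semi_join(left: List[str], right: List[str],
                   on: Callable[[str, str], bool]) -> List[str]:
    """Reference semantics: keep each left row once if any right row satisfies `on`."""
    return [l for l in left if any(on(l, r) for r in right)]


def keyless_semi_join(left: List[str], right: List[str]) -> List[str]:
    """Semi join with empty key lists: every left row survives iff the right side is non-empty."""
    return list(left) if right else []


def reference_q1(t1: List[Row], t2: List[Row]) -> int:
    # ... left semi join ... on t2.value = 'val_0'
    return len(left_semi_join(subquery(t1), subquery(t2), lambda l, r: r == "val_0"))


def pushed_plan_q1(t1: List[Row], t2: List[Row]) -> int:
    # FIL_19: right-only ON predicate pushed into t2's filter; GBY_8: dedup of the right side
    right = sorted({v for v in subquery(t2) if v == "val_0"})
    # JOIN_11 with no keys: each t1 row is paired with the whole (filtered) right side
    return len(keyless_semi_join(subquery(t1), right))


def reference_q2(t1: List[Row], t2: List[Row]) -> int:
    # ... left semi join ... on t1.value = 'val_0'
    return len(left_semi_join(subquery(t1), subquery(t2), lambda l, r: l == "val_0"))


def pushed_plan_q2(t1: List[Row], t2: List[Row]) -> int:
    # Assumption: the left-only ON predicate is pushed into t1's filter
    left = [v for v in subquery(t1) if v == "val_0"]
    return len(keyless_semi_join(left, subquery(t2)))


if __name__ == "__main__":
    print(reference_q1(T1, T2), pushed_plan_q1(T1, T2))  # 3 3
    print(reference_q2(T1, T2), pushed_plan_q2(T1, T2))  # 2 2
```

Because the join has no keys, every qualifying t1 row is matched against the whole right side, which is why the plan is reported as a cross product even though a semi join emits each left row at most once.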
[~gopalv], do you think these plans are valid? Thanks.


