Posted to issues@hive.apache.org by "Jesus Camacho Rodriguez (JIRA)" <ji...@apache.org> on 2016/05/26 14:32:12 UTC
[jira] [Updated] (HIVE-13816) Infer constants directly when we create semijoin
[ https://issues.apache.org/jira/browse/HIVE-13816?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jesus Camacho Rodriguez updated HIVE-13816:
-------------------------------------------
Target Version/s: (was: 2.1.0)
> Infer constants directly when we create semijoin
> ------------------------------------------------
>
> Key: HIVE-13816
> URL: https://issues.apache.org/jira/browse/HIVE-13816
> Project: Hive
> Issue Type: Sub-task
> Components: Parser
> Affects Versions: 2.1.0
> Reporter: Jesus Camacho Rodriguez
> Assignee: Jesus Camacho Rodriguez
>
> Follow-up on HIVE-13068.
> When we create a left semijoin, we add a GB (Group By) on the right-hand side to remove duplicates. At that point we could infer the constants from the SEL (Select) below it.
> Ex. ql/src/test/results/clientpositive/constprog_semijoin.q.out
> {noformat}
> explain select table1.id, table1.val, table1.val1 from table1 left semi join table3 on table1.dimid = table3.id and table3.id = 100 where table1.dimid = 100;
> {noformat}
> Plan:
> {noformat}
> STAGE DEPENDENCIES:
>   Stage-1 is a root stage
>   Stage-0 depends on stages: Stage-1
> STAGE PLANS:
>   Stage: Stage-1
>     Map Reduce
>       Map Operator Tree:
>           TableScan
>             alias: table1
>             Statistics: Num rows: 10 Data size: 200 Basic stats: COMPLETE Column stats: NONE
>             Filter Operator
>               predicate: (((dimid = 100) = true) and (dimid = 100)) (type: boolean)
>               Statistics: Num rows: 2 Data size: 40 Basic stats: COMPLETE Column stats: NONE
>               Select Operator
>                 expressions: id (type: int), val (type: string), val1 (type: string)
>                 outputColumnNames: _col0, _col1, _col2
>                 Statistics: Num rows: 2 Data size: 40 Basic stats: COMPLETE Column stats: NONE
>                 Reduce Output Operator
>                   key expressions: 100 (type: int), true (type: boolean)
>                   sort order: ++
>                   Map-reduce partition columns: 100 (type: int), true (type: boolean)
>                   Statistics: Num rows: 2 Data size: 40 Basic stats: COMPLETE Column stats: NONE
>                   value expressions: _col0 (type: int), _col1 (type: string), _col2 (type: string)
>           TableScan
>             alias: table3
>             Statistics: Num rows: 5 Data size: 15 Basic stats: COMPLETE Column stats: NONE
>             Filter Operator
>               predicate: (((id = 100) = true) and (id = 100)) (type: boolean)
>               Statistics: Num rows: 1 Data size: 3 Basic stats: COMPLETE Column stats: NONE
>               Select Operator
>                 expressions: 100 (type: int), true (type: boolean)
>                 outputColumnNames: _col0, _col1
>                 Statistics: Num rows: 1 Data size: 3 Basic stats: COMPLETE Column stats: NONE
>                 Group By Operator
>                   keys: _col0 (type: int), _col1 (type: boolean)
>                   mode: hash
>                   outputColumnNames: _col0, _col1
>                   Statistics: Num rows: 1 Data size: 3 Basic stats: COMPLETE Column stats: NONE
>                   Reduce Output Operator
>                     key expressions: _col0 (type: int), _col1 (type: boolean)
>                     sort order: ++
>                     Map-reduce partition columns: _col0 (type: int), _col1 (type: boolean)
>                     Statistics: Num rows: 1 Data size: 3 Basic stats: COMPLETE Column stats: NONE
>       Reduce Operator Tree:
>         Join Operator
>           condition map:
>                Left Semi Join 0 to 1
>           keys:
>             0 100 (type: int), true (type: boolean)
>             1 _col0 (type: int), _col1 (type: boolean)
>           outputColumnNames: _col0, _col1, _col2
>           Statistics: Num rows: 2 Data size: 44 Basic stats: COMPLETE Column stats: NONE
>           File Output Operator
>             compressed: false
>             Statistics: Num rows: 2 Data size: 44 Basic stats: COMPLETE Column stats: NONE
>             table:
>                 input format: org.apache.hadoop.mapred.SequenceFileInputFormat
>                 output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
>                 serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
>   Stage: Stage-0
>     Fetch Operator
>       limit: -1
>       Processor Tree:
>         ListSink
> {noformat}
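
Illustrative note (not part of the original issue text): the inference described above can be sketched with a toy model in Python. This is not Hive code. `Const`, `ColRef` and `infer_constant_keys` are hypothetical names invented here, and the model only assumes that the Select (SEL) below the Group By (GB) can be viewed as a mapping from output column name to expression. For each GB key that refers to a constant SEL output, the sketch substitutes the constant itself.

```python
from dataclasses import dataclass
from typing import Dict, List, Union


@dataclass(frozen=True)
class Const:
    """A literal value, e.g. `100 (type: int)` in the plan."""
    value: object
    type: str


@dataclass(frozen=True)
class ColRef:
    """A reference to a column, e.g. `_col0 (type: int)` in the plan."""
    name: str
    type: str


Expr = Union[Const, ColRef]


def infer_constant_keys(select_exprs: Dict[str, Expr],
                        gb_keys: List[ColRef]) -> List[Expr]:
    """Replace each Group By key whose source in the Select is a constant.

    Keys that do not resolve to a constant are returned unchanged.
    """
    inferred: List[Expr] = []
    for key in gb_keys:
        source = select_exprs.get(key.name)
        inferred.append(source if isinstance(source, Const) else key)
    return inferred


# Right-hand side (table3) of the plan above: the Select emits two constants
# and the Group By keys refer to them by column name.
select_exprs = {
    "_col0": Const(100, "int"),
    "_col1": Const(True, "boolean"),
}
gb_keys = [ColRef("_col0", "int"), ColRef("_col1", "boolean")]

folded = infer_constant_keys(select_exprs, gb_keys)
```

In this toy model, `folded` holds the constants `100` and `true` instead of `_col0` and `_col1`, which is the form the left-hand side's join keys already have in the plan above.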
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)