Posted to issues@hive.apache.org by "David Mollitor (Jira)" <ji...@apache.org> on 2019/10/22 16:00:00 UTC

[jira] [Commented] (HIVE-16792) Estimate Rows When Joining BIGINT to INT Column

    [ https://issues.apache.org/jira/browse/HIVE-16792?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16957193#comment-16957193 ] 

David Mollitor commented on HIVE-16792:
---------------------------------------

Any updates on this, watchers?

> Estimate Rows When Joining BIGINT to INT Column
> -----------------------------------------------
>
>                 Key: HIVE-16792
>                 URL: https://issues.apache.org/jira/browse/HIVE-16792
>             Project: Hive
>          Issue Type: Improvement
>    Affects Versions: 2.1.1
>            Reporter: David Mollitor
>            Priority: Minor
>
> {code:sql}
> create table test1
> (a int);
> create table test2
> (z bigint);
> INSERT INTO test1 VALUES (1);
> INSERT INTO test2 VALUES (2147483648);
> analyze table test1 compute statistics for columns;
> analyze table test2 compute statistics for columns;
> EXPLAIN SELECT * FROM test1 t1 INNER JOIN test2 t2 ON t1.a=t2.z;
> {code}
> {code}
> Explain
> STAGE DEPENDENCIES:
>   Stage-4 is a root stage
>   Stage-3 depends on stages: Stage-4
>   Stage-0 depends on stages: Stage-3
>
> STAGE PLANS:
>   Stage: Stage-4
>     Map Reduce Local Work
>       Alias -> Map Local Tables:
>         t2 
>           Fetch Operator
>             limit: -1
>       Alias -> Map Local Operator Tree:
>         t2 
>           TableScan
>             alias: t2
>             filterExpr: z is not null (type: boolean)
>             Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: COMPLETE
>             Filter Operator
>               predicate: z is not null (type: boolean)
>               Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: COMPLETE
>               HashTable Sink Operator
>                 keys:
>                   0 UDFToLong(a) (type: bigint)
>                   1 z (type: bigint)
>   Stage: Stage-3
>     Map Reduce
>       Map Operator Tree:
>           TableScan
>             alias: t1
>             filterExpr: UDFToLong(a) is not null (type: boolean)
>             Statistics: Num rows: 1 Data size: 1 Basic stats: COMPLETE Column stats: NONE
>             Filter Operator
>               predicate: UDFToLong(a) is not null (type: boolean)
>               Statistics: Num rows: 1 Data size: 1 Basic stats: COMPLETE Column stats: NONE
>               Map Join Operator
>                 condition map:
>                      Inner Join 0 to 1
>                 keys:
>                   0 UDFToLong(a) (type: bigint)
>                   1 z (type: bigint)
>                 outputColumnNames: _col0, _col4
>                 Statistics: Num rows: 1 Data size: 1 Basic stats: COMPLETE Column stats: NONE
>                 Select Operator
>                   expressions: _col0 (type: int), _col4 (type: bigint)
>                   outputColumnNames: _col0, _col1
>                   Statistics: Num rows: 1 Data size: 1 Basic stats: COMPLETE Column stats: NONE
>                   File Output Operator
>                     compressed: false
>                     Statistics: Num rows: 1 Data size: 1 Basic stats: COMPLETE Column stats: NONE
>                     table:
>                         input format: org.apache.hadoop.mapred.TextInputFormat
>                         output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
>                         serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
>       Local Work:
>         Map Reduce Local Work
>   Stage: Stage-0
>     Fetch Operator
>       limit: -1
>       Processor Tree:
>         ListSink
> {code}
> I would expect Hive to recognize that this join cannot produce any rows, because the minimum value of column z in table test2 (2147483648) is greater than Integer.MAX_VALUE (2147483647). Instead, the plan still estimates 1 row for the Map Join Operator.
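>
> As a rough illustration of the check being requested, the query below computes the same condition by hand. It is only a sketch, not existing Hive behavior: the alias join_is_provably_empty is made up for this example, and it assumes the optimizer could read the equivalent minimum and maximum from the column statistics gathered by the ANALYZE statements above.
> {code:sql}
> -- INT covers -2147483648 .. 2147483647. If every value of z falls outside
> -- that range, no test1.a can equal any test2.z, so the inner join is empty.
> SELECT MIN(z) > 2147483647 OR MAX(z) < -2147483648 AS join_is_provably_empty
> FROM test2;
> {code}
> For the sample data, MIN(z) is 2147483648, so the expression evaluates to true.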



--
This message was sent by Atlassian Jira
(v8.3.4#803005)