Posted to issues@hive.apache.org by "Zoltan Haindrich (Jira)" <ji...@apache.org> on 2020/01/31 15:43:00 UTC

[jira] [Updated] (HIVE-22811) Stat based min/max aggregates are not serviced from colstats in nested cases

     [ https://issues.apache.org/jira/browse/HIVE-22811?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zoltan Haindrich updated HIVE-22811:
------------------------------------
    Summary: Stat based min/max aggregates are not serviced from colstats in nested cases  (was: Statistics are not exploit in nested cases)

> Stat based min/max aggregates are not serviced from colstats in nested cases
> ----------------------------------------------------------------------------
>
>                 Key: HIVE-22811
>                 URL: https://issues.apache.org/jira/browse/HIVE-22811
>             Project: Hive
>          Issue Type: Improvement
>          Components: Statistics
>            Reporter: Zoltan Haindrich
>            Priority: Major
>
> The StatsOptimizer can use column statistics (min, max, etc.) to answer simple aggregate queries without scanning the table:
> {code}
> (select max(id) from t t0)
> {code}
> However, the same does not happen when the aggregate is nested in a subquery:
> {code}
> explain select * from u where u.id>(select max(id) from t t0);
> {code}
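> The explain output shows that max(id) is still computed by scanning t (TS_3, GBY_21) instead of being read from column statistics: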
> {code}
> | Plan optimized by CBO.                             |
> |                                                    |
> | Vertex dependency in root stage                    |
> | Reducer 3 <- Map 1 (BROADCAST_EDGE), Map 2 (CUSTOM_SIMPLE_EDGE) |
> |                                                    |
> | Stage-0                                            |
> |   Fetch Operator                                   |
> |     limit:-1                                       |
> |     Stage-1                                        |
> |       Reducer 3 vectorized                         |
> |       File Output Operator [FS_31]                 |
> |         Select Operator [SEL_30] (rows=1 width=8)  |
> |           Output:["_col0","_col1"]                 |
> |           Filter Operator [FIL_29] (rows=1 width=12) |
> |             predicate:(_col0 > _col2)              |
> |             Map Join Operator [MAPJOIN_28] (rows=3 width=12) |
> |               Conds:(Inner),Output:["_col0","_col1","_col2"] |
> |             <-Map 1 [BROADCAST_EDGE] vectorized    |
> |               BROADCAST [RS_25]                    |
> |                 Select Operator [SEL_24] (rows=3 width=8) |
> |                   Output:["_col0","_col1"]         |
> |                   Filter Operator [FIL_23] (rows=3 width=8) |
> |                     predicate:id is not null       |
> |                     TableScan [TS_0] (rows=3 width=8) |
> |                       default@u,u,Tbl:COMPLETE,Col:COMPLETE,Output:["id","cnt"] |
> |             <-Filter Operator [FIL_27] (rows=1 width=4) |
> |                 predicate:_col0 is not null        |
> |                 Group By Operator [GBY_26] (rows=1 width=4) |
> |                   Output:["_col0"],aggregations:["max(VALUE._col0)"] |
> |                 <-Map 2 [CUSTOM_SIMPLE_EDGE] vectorized |
> |                   PARTITION_ONLY_SHUFFLE [RS_22]   |
> |                     Group By Operator [GBY_21] (rows=1 width=4) |
> |                       Output:["_col0"],aggregations:["max(id)"] |
> |                       Select Operator [SEL_20] (rows=4 width=4) |
> |                         Output:["id"]              |
> |                         TableScan [TS_3] (rows=4 width=4) |
> |                           default@t,t0,Tbl:COMPLETE,Col:COMPLETE,Output:["id"] |
> {code}
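>
> A minimal reproduction sketch follows. The table definitions and inserted values are assumptions chosen only to match the column names and row counts visible in the plan above (u with id and cnt, 3 rows; t with id, 4 rows); the original schema is not part of this report. It also assumes hive.compute.query.using.stats is enabled and that column statistics are up to date.
> {code}
> -- Hypothetical schema and data; only the column names and row counts come from the plan above.
> create table t (id int);
> create table u (id int, cnt int);
> insert into t values (1), (2), (3), (4);
> insert into u values (1, 10), (2, 20), (3, 30);
>
> -- Make sure table and column statistics are present and accurate.
> analyze table t compute statistics for columns;
> analyze table u compute statistics for columns;
>
> -- Allow simple aggregates to be answered from metastore statistics.
> set hive.compute.query.using.stats=true;
>
> -- Simple case: expected to be answered from column statistics.
> explain select max(id) from t t0;
>
> -- Nested case: the plan still contains a TableScan and Group By on t.
> explain select * from u where u.id > (select max(id) from t t0);
> {code}
> Comparing the two explain outputs should show whether the subquery's max(id) is read from statistics or recomputed by scanning t.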


