Posted to issues@hive.apache.org by "Wei Zhang (Jira)" <ji...@apache.org> on 2021/06/01 06:17:00 UTC

[jira] [Updated] (HIVE-25170) Data error in constant propagation caused by wrong colExprMap generated in SemanticAnalyzer

     [ https://issues.apache.org/jira/browse/HIVE-25170?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wei Zhang updated HIVE-25170:
-----------------------------
    Status: Patch Available  (was: Open)

> Data error in constant propagation caused by wrong colExprMap generated in SemanticAnalyzer
> -------------------------------------------------------------------------------------------
>
>                 Key: HIVE-25170
>                 URL: https://issues.apache.org/jira/browse/HIVE-25170
>             Project: Hive
>          Issue Type: Bug
>          Components: Query Planning
>    Affects Versions: 3.1.2
>            Reporter: Wei Zhang
>            Assignee: Wei Zhang
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
>  
> {code:java}
> SET hive.remove.orderby.in.subquery=false;
> EXPLAIN
> SELECT constant_col, key, max(value)
> FROM
> (
>   SELECT 'constant' as constant_col, key, value
>   FROM src
>   DISTRIBUTE BY constant_col, key
>   SORT BY constant_col, key, value
> ) a
> GROUP BY constant_col, key
> LIMIT 10;
> OK
> Vertex dependency in root stage
> Reducer 2 <- Map 1 (SIMPLE_EDGE)
> Reducer 3 <- Reducer 2 (SIMPLE_EDGE)
>
> Stage-0
>   Fetch Operator
>     limit:10
>     Stage-1
>       Reducer 3
>       File Output Operator [FS_10]
>         Limit [LIM_9] (rows=1 width=368)
>           Number of rows:10
>           Select Operator [SEL_8] (rows=1 width=368)
>             Output:["_col0","_col1","_col2"]
>             Group By Operator [GBY_7] (rows=1 width=368)
>               Output:["_col0","_col1","_col2"],aggregations:["max(VALUE._col0)"],keys:'constant', 'constant'
>             <-Reducer 2 [SIMPLE_EDGE]
>               SHUFFLE [RS_6]
>                 PartitionCols:'constant', 'constant'
>                 Group By Operator [GBY_5] (rows=1 width=368)
>                   Output:["_col0","_col1","_col2"],aggregations:["max(_col2)"],keys:'constant', 'constant'
>                   Select Operator [SEL_3] (rows=500 width=178)
>                     Output:["_col2"]
>                   <-Map 1 [SIMPLE_EDGE]
>                     SHUFFLE [RS_2]
>                       PartitionCols:'constant', _col1
>                       Select Operator [SEL_1] (rows=500 width=178)
>                         Output:["_col1","_col2"]
>                         TableScan [TS_0] (rows=500 width=10)
>                           src,src,Tbl:COMPLETE,Col:COMPLETE,Output:["key","value"]
> {code}
> The PartitionCols in Reducer 2 (RS_6) are wrong: instead of 'constant', 'constant', they should be 'constant', _col1.
>  
> That's because, after HIVE-13808, SemanticAnalyzer builds the key part of colExprMap from sortCols, while the key columns themselves are generated from newSortCols. This leads to a mismatch between columns and expressions when the constant column is not the trailing column among the key columns.
> The constant propagation optimizer consumes this colExprMap, finds an extra constant expression in the mismatched map, and produces this error.
>  
> In fact, colExprMap is used by multiple optimizers, which makes this quite a serious problem.
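
The mismatch described above can be illustrated with a small toy model. This is only a sketch under stated assumptions, not Hive's SemanticAnalyzer code: the class ColExprMapSketch, its methods, the use of plain strings as expressions, and the "KEY.reducesinkkey" + index naming are all hypothetical stand-ins chosen for illustration. It assumes that newSortCols is sortCols with the constant expressions dropped, and contrasts a map whose expressions are read from sortCols with one whose expressions are read from newSortCols.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy model only: all names here are hypothetical; this is not Hive's actual code.
final class ColExprMapSketch {

    // In this sketch an expression is a string; a quoted literal stands for a constant.
    static boolean isConstant(String expr) {
        return expr.startsWith("'");
    }

    // Assumed meaning of newSortCols: the sort columns with constants dropped.
    static List<String> dropConstants(List<String> sortCols) {
        List<String> kept = new ArrayList<>();
        for (String expr : sortCols) {
            if (!isConstant(expr)) {
                kept.add(expr);
            }
        }
        return kept;
    }

    // Mismatched variant: one entry per key column (newSortCols),
    // but the expression is looked up by index in the original sortCols.
    static Map<String, String> buildMismatched(List<String> sortCols, List<String> newSortCols) {
        Map<String, String> colExprMap = new LinkedHashMap<>();
        for (int i = 0; i < newSortCols.size(); i++) {
            colExprMap.put("KEY.reducesinkkey" + i, sortCols.get(i));
        }
        return colExprMap;
    }

    // Consistent variant: names and expressions both come from newSortCols.
    static Map<String, String> buildConsistent(List<String> newSortCols) {
        Map<String, String> colExprMap = new LinkedHashMap<>();
        for (int i = 0; i < newSortCols.size(); i++) {
            colExprMap.put("KEY.reducesinkkey" + i, newSortCols.get(i));
        }
        return colExprMap;
    }

    public static void main(String[] args) {
        // The constant is the leading sort column, as in the query above.
        List<String> sortCols = Arrays.asList("'constant'", "key", "value");
        List<String> newSortCols = dropConstants(sortCols);

        System.out.println("mismatched: " + buildMismatched(sortCols, newSortCols));
        System.out.println("consistent: " + buildConsistent(newSortCols));
    }
}
```

In this toy model the mismatched map binds the first key slot to 'constant' even though that slot carries key, which is the kind of entry a constant-folding pass would wrongly treat as a constant. When the constant is the trailing sort column, both variants produce the same map, which matches the condition stated in the description.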



--
This message was sent by Atlassian Jira
(v8.3.4#803005)