Posted to issues@hive.apache.org by "Wei Zhang (Jira)" <ji...@apache.org> on 2021/05/27 04:04:00 UTC

[jira] [Updated] (HIVE-25170) Data error in constant propagation caused by wrong colExprMap generated in SemanticAnalyzer

     [ https://issues.apache.org/jira/browse/HIVE-25170?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wei Zhang updated HIVE-25170:
-----------------------------
    Description: 
 
{code:java}
EXPLAIN
SELECT constant_col, key, max(value)
FROM
(
  SELECT 'constant' as constant_col, key, value
  FROM src
  DISTRIBUTE BY constant_col, key
  SORT BY constant_col, key, value
) a
GROUP BY constant_col, key
LIMIT 10;

OK
Vertex dependency in root stage
Reducer 2 <- Map 1 (SIMPLE_EDGE)
Reducer 3 <- Reducer 2 (SIMPLE_EDGE)

Stage-0
  Fetch Operator
    limit:10
    Stage-1
      Reducer 3
      File Output Operator [FS_10]
        Limit [LIM_9] (rows=1 width=368)
          Number of rows:10
          Select Operator [SEL_8] (rows=1 width=368)
            Output:["_col0","_col1","_col2"]
            Group By Operator [GBY_7] (rows=1 width=368)
              Output:["_col0","_col1","_col2"],aggregations:["max(VALUE._col0)"],keys:'constant', 'constant'
            <-Reducer 2 [SIMPLE_EDGE]
              SHUFFLE [RS_6]
                PartitionCols:'constant', 'constant'
                Group By Operator [GBY_5] (rows=1 width=368)
                  Output:["_col0","_col1","_col2"],aggregations:["max(_col2)"],keys:'constant', 'constant'
                  Select Operator [SEL_3] (rows=500 width=178)
                    Output:["_col2"]
                  <-Map 1 [SIMPLE_EDGE]
                    SHUFFLE [RS_2]
                      PartitionCols:'constant', _col1
                      Select Operator [SEL_1] (rows=500 width=178)
                        Output:["_col1","_col2"]
                        TableScan [TS_0] (rows=500 width=10)
                          src,src,Tbl:COMPLETE,Col:COMPLETE,Output:["key","value"]{code}
The PartitionCols of RS_6 in Reducer 2 are clearly wrong. Instead of 'constant', 'constant', they should be 'constant', _col1.

 

This happens because, since HIVE-13808, SemanticAnalyzer builds the key part of colExprMap from sortCols, while the key columns themselves are generated from newSortCols. When the constant column is not the trailing column among the key columns, the column names and their expressions no longer line up.

The constant propagation optimizer reads this colExprMap and finds an extra constant expression in the mismatched map, which results in this error.
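
The mismatch can be illustrated with a small standalone sketch. It is a simplified model, not Hive's actual code: expressions are plain strings, the class and helper names (ColExprMapSketch, dropConstants, buildKeyColExprMap) are hypothetical, the "KEY.reducesinkkey" names are only illustrative, and it assumes that newSortCols is sortCols with the constant expressions removed.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Simplified, hypothetical model of the mismatch; this is not Hive's actual code.
class ColExprMapSketch {

    // Hypothetical convention: a quoted string stands for a constant expression.
    static boolean isConstant(String expr) {
        return expr.startsWith("'");
    }

    // Assumption: newSortCols is sortCols with the constant expressions removed.
    static List<String> dropConstants(List<String> sortCols) {
        List<String> result = new ArrayList<>();
        for (String col : sortCols) {
            if (!isConstant(col)) {
                result.add(col);
            }
        }
        return result;
    }

    // Maps the i-th key column name to the i-th entry of exprs.
    static Map<String, String> buildKeyColExprMap(int keyCount, List<String> exprs) {
        Map<String, String> colExprMap = new LinkedHashMap<>();
        for (int i = 0; i < keyCount; i++) {
            colExprMap.put("KEY.reducesinkkey" + i, exprs.get(i));
        }
        return colExprMap;
    }

    public static void main(String[] args) {
        List<String> sortCols = Arrays.asList("'constant'", "key", "value");
        List<String> newSortCols = dropConstants(sortCols);

        // Mismatched: key names follow newSortCols, expressions come from sortCols.
        System.out.println(buildKeyColExprMap(newSortCols.size(), sortCols));
        // Aligned: key names and expressions both follow newSortCols.
        System.out.println(buildKeyColExprMap(newSortCols.size(), newSortCols));
    }
}
```

In the mismatched map the first key column is paired with 'constant' even though it actually carries key, so an optimizer that trusts the map would treat a real column as a constant. Building the map from the same list that produced the key columns keeps the two aligned in this model.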

 

In fact, colExprMap is used by multiple optimizers, which makes this quite a serious problem.

  was:
 
{code:java}
EXPLAIN
SELECT constant_col, key, max(value)
FROM
(
  SELECT 'constant' as constant_col, key, value
  FROM src
  DISTRIBUTE BY constant_col, key
  SORT BY constant_col, key, value
) a
GROUP BY constant_col, key
LIMIT 10;

OK
Vertex dependency in root stage
Reducer 2 <- Map 1 (SIMPLE_EDGE)
Reducer 3 <- Reducer 2 (SIMPLE_EDGE)

Stage-0
  Fetch Operator
    limit:10
    Stage-1
      Reducer 3
      File Output Operator [FS_10]
        Limit [LIM_9] (rows=1 width=368)
          Number of rows:10
          Select Operator [SEL_8] (rows=1 width=368)
            Output:["_col0","_col1","_col2"]
            Group By Operator [GBY_7] (rows=1 width=368)
              Output:["_col0","_col1","_col2"],aggregations:["max(VALUE._col0)"],keys:'constant', 'constant'
            <-Reducer 2 [SIMPLE_EDGE]
              SHUFFLE [RS_6]
                PartitionCols:'constant', 'constant'
                Group By Operator [GBY_5] (rows=1 width=368)
                  Output:["_col0","_col1","_col2"],aggregations:["max(_col2)"],keys:'constant', 'constant'
                  Select Operator [SEL_3] (rows=500 width=178)
                    Output:["_col2"]
                  <-Map 1 [SIMPLE_EDGE]
                    SHUFFLE [RS_2]
                      PartitionCols:'constant', _col1
                      Select Operator [SEL_1] (rows=500 width=178)
                        Output:["_col1","_col2"]
                        TableScan [TS_0] (rows=500 width=10)
                          src,src,Tbl:COMPLETE,Col:COMPLETE,Output:["key","value"]{code}
The `PartitionCols` of RS_6 in Reducer 2 are clearly wrong. Instead of `'constant', 'constant'`, they should be `'constant', _col1`.

 

This happens because, since HIVE-13808, `SemanticAnalyzer` builds the key part of `colExprMap` from `sortCols`, while the key columns themselves are generated from `newSortCols`. When the constant column is not the trailing column among the key columns, the column names and their expressions no longer line up.

 


> Data error in constant propagation caused by wrong colExprMap generated in SemanticAnalyzer
> -------------------------------------------------------------------------------------------
>
>                 Key: HIVE-25170
>                 URL: https://issues.apache.org/jira/browse/HIVE-25170
>             Project: Hive
>          Issue Type: Bug
>          Components: Query Planning
>    Affects Versions: 3.1.2
>            Reporter: Wei Zhang
>            Assignee: Wei Zhang
>            Priority: Major
>
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)