Posted to dev@hive.apache.org by "Wei Zhang (Jira)" <ji...@apache.org> on 2021/05/27 03:57:00 UTC

[jira] [Created] (HIVE-25170) Data error in constant propagation caused by wrong colExprMap generated in SemanticAnalyzer

Wei Zhang created HIVE-25170:
--------------------------------

             Summary: Data error in constant propagation caused by wrong colExprMap generated in SemanticAnalyzer
                 Key: HIVE-25170
                 URL: https://issues.apache.org/jira/browse/HIVE-25170
             Project: Hive
          Issue Type: Bug
          Components: Query Planning
    Affects Versions: 3.1.2
            Reporter: Wei Zhang
            Assignee: Wei Zhang


 
{code:java}
EXPLAIN
SELECT constant_col, key, max(value)
FROM
(
  SELECT 'constant' as constant_col, key, value
  FROM src
  DISTRIBUTE BY constant_col, key
  SORT BY constant_col, key, value
) a
GROUP BY constant_col, key
LIMIT 10;

OK
Vertex dependency in root stage
Reducer 2 <- Map 1 (SIMPLE_EDGE)
Reducer 3 <- Reducer 2 (SIMPLE_EDGE)

Stage-0
  Fetch Operator
    limit:10
    Stage-1
      Reducer 3
      File Output Operator [FS_10]
        Limit [LIM_9] (rows=1 width=368)
          Number of rows:10
          Select Operator [SEL_8] (rows=1 width=368)
            Output:["_col0","_col1","_col2"]
            Group By Operator [GBY_7] (rows=1 width=368)
              Output:["_col0","_col1","_col2"],aggregations:["max(VALUE._col0)"],keys:'constant', 'constant'
            <-Reducer 2 [SIMPLE_EDGE]
              SHUFFLE [RS_6]
                PartitionCols:'constant', 'constant'
                Group By Operator [GBY_5] (rows=1 width=368)
                  Output:["_col0","_col1","_col2"],aggregations:["max(_col2)"],keys:'constant', 'constant'
                  Select Operator [SEL_3] (rows=500 width=178)
                    Output:["_col2"]
                  <-Map 1 [SIMPLE_EDGE]
                    SHUFFLE [RS_2]
                      PartitionCols:'constant', _col1
                      Select Operator [SEL_1] (rows=500 width=178)
                        Output:["_col1","_col2"]
                        TableScan [TS_0] (rows=500 width=10)
                          src,src,Tbl:COMPLETE,Col:COMPLETE,Output:["key","value"]{code}
The `PartitionCols` of `RS_6` in Reducer 2 is wrong: it is `'constant', 'constant'` but should be `'constant', _col1`. The `keys` of `GBY_5` and `GBY_7` show the same duplicated constant.

 

This happens because, after HIVE-13808, `SemanticAnalyzer` builds the key part of the `colExprMap` from `sortCols`, while the key columns themselves are generated from `newSortCols`. The two lists do not line up whenever the constant column is not the trailing key column, so key columns are mapped to the wrong expressions.
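
The index mismatch can be illustrated with a small standalone sketch. This is not the Hive code: the class, the helper methods, the string representation of expressions, and the `KEY.reducesinkkeyN` names are all hypothetical, and it assumes that `newSortCols` is `sortCols` with the constant expressions dropped.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class ColExprMapMismatch {

    // Assumption: newSortCols is sortCols without constant expressions.
    // Constants are modelled here as strings that start with a quote.
    static List<String> dropConstants(List<String> sortCols) {
        List<String> result = new ArrayList<>();
        for (String col : sortCols) {
            if (!col.startsWith("'")) {
                result.add(col);
            }
        }
        return result;
    }

    // Hypothetical key column names, one per generated key column.
    static List<String> keyNames(int count) {
        List<String> names = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            names.add("KEY.reducesinkkey" + i);
        }
        return names;
    }

    // Pairs the i-th key name with the i-th expression of the given list.
    static Map<String, String> buildMap(List<String> names, List<String> exprs) {
        Map<String, String> map = new LinkedHashMap<>();
        for (int i = 0; i < names.size(); i++) {
            map.put(names.get(i), exprs.get(i));
        }
        return map;
    }

    public static void main(String[] args) {
        List<String> sortCols = List.of("'constant'", "key", "value");
        List<String> newSortCols = dropConstants(sortCols);
        List<String> names = keyNames(newSortCols.size());

        // Names come from newSortCols, expressions from sortCols: shifted by one.
        System.out.println("mismatched: " + buildMap(names, sortCols));
        // Names and expressions both come from newSortCols: aligned.
        System.out.println("aligned:    " + buildMap(names, newSortCols));
    }
}
```

In the mismatched map, the first key column points at `'constant'` instead of `key`, which is the kind of entry that would let constant propagation replace a real column with the constant. If the constant were the last entry of `sortCols`, both maps would be identical, which matches the observation that the problem appears only when the constant is not the trailing key column.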

 


