Posted to issues@hive.apache.org by "ASF GitHub Bot (Jira)" <ji...@apache.org> on 2021/08/01 00:10:00 UTC

[jira] [Work logged] (HIVE-25170) Data error in constant propagation caused by wrong colExprMap generated in SemanticAnalyzer

     [ https://issues.apache.org/jira/browse/HIVE-25170?focusedWorklogId=632014&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-632014 ]

ASF GitHub Bot logged work on HIVE-25170:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 01/Aug/21 00:09
            Start Date: 01/Aug/21 00:09
    Worklog Time Spent: 10m 
      Work Description: github-actions[bot] commented on pull request #2331:
URL: https://github.com/apache/hive/pull/2331#issuecomment-890420049


   This pull request has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.
   Feel free to reach out on the dev@hive.apache.org list if the patch is in need of reviews.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: gitbox-unsubscribe@hive.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Issue Time Tracking
-------------------

    Worklog Id:     (was: 632014)
    Time Spent: 20m  (was: 10m)

> Data error in constant propagation caused by wrong colExprMap generated in SemanticAnalyzer
> -------------------------------------------------------------------------------------------
>
>                 Key: HIVE-25170
>                 URL: https://issues.apache.org/jira/browse/HIVE-25170
>             Project: Hive
>          Issue Type: Bug
>          Components: Query Planning
>    Affects Versions: 3.1.2
>            Reporter: Wei Zhang
>            Assignee: Wei Zhang
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 20m
>  Remaining Estimate: 0h
>
>  
> {code:java}
> SET hive.remove.orderby.in.subquery=false;
> EXPLAIN
> SELECT constant_col, key, max(value)
> FROM
> (
>   SELECT 'constant' as constant_col, key, value
>   FROM src
>   DISTRIBUTE BY constant_col, key
>   SORT BY constant_col, key, value
> ) a
> GROUP BY constant_col, key
> LIMIT 10;
> OK
> Vertex dependency in root stage
> Reducer 2 <- Map 1 (SIMPLE_EDGE)
> Reducer 3 <- Reducer 2 (SIMPLE_EDGE)
> Stage-0
>   Fetch Operator
>     limit:10
>     Stage-1
>       Reducer 3
>       File Output Operator [FS_10]
>         Limit [LIM_9] (rows=1 width=368)
>           Number of rows:10
>           Select Operator [SEL_8] (rows=1 width=368)
>             Output:["_col0","_col1","_col2"]
>             Group By Operator [GBY_7] (rows=1 width=368)
>               Output:["_col0","_col1","_col2"],aggregations:["max(VALUE._col0)"],keys:'constant', 'constant'
>             <-Reducer 2 [SIMPLE_EDGE]
>               SHUFFLE [RS_6]
>                 PartitionCols:'constant', 'constant'
>                 Group By Operator [GBY_5] (rows=1 width=368)
>                   Output:["_col0","_col1","_col2"],aggregations:["max(_col2)"],keys:'constant', 'constant'
>                   Select Operator [SEL_3] (rows=500 width=178)
>                     Output:["_col2"]
>                   <-Map 1 [SIMPLE_EDGE]
>                     SHUFFLE [RS_2]
>                       PartitionCols:'constant', _col1
>                       Select Operator [SEL_1] (rows=500 width=178)
>                         Output:["_col1","_col2"]
>                         TableScan [TS_0] (rows=500 width=10)
>                           src,src,Tbl:COMPLETE,Col:COMPLETE,Output:["key","value"]{code}
> The PartitionCols in Reducer 2 are clearly wrong: instead of 'constant', 'constant', they should be 'constant', _col1.
>  
> This is because, after HIVE-13808, SemanticAnalyzer generates the key part of colExprMap from sortCols, while the key columns themselves are generated from newSortCols. This leads to a column/expression mismatch whenever the constant column is not the trailing column among the key columns.
> The constant propagation optimizer uses this colExprMap and finds an extra constant expression in the mismatched map, which results in this error.
>  
> In fact, colExprMap is used by multiple optimizers, which makes this quite a serious problem.
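
The mismatch described above can be illustrated with a toy model. This is only a sketch and not Hive's actual code: expressions are plain strings, the class name and the helpers isConstant, dropConstants and buildKeyMap are hypothetical, the key names are illustrative, and it assumes that newSortCols is simply sortCols with the constant expressions removed.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class ColExprMapSketch {

    // Hypothetical constant check: in this toy model a quoted string is a constant.
    static boolean isConstant(String expr) {
        return expr.startsWith("'");
    }

    // Assumption: newSortCols is sortCols with constant expressions dropped.
    static List<String> dropConstants(List<String> sortCols) {
        List<String> result = new ArrayList<>();
        for (String expr : sortCols) {
            if (!isConstant(expr)) {
                result.add(expr);
            }
        }
        return result;
    }

    // Pairs the i-th key column (one per entry of keyCols) with the i-th
    // expression taken from exprSource. The pairing is only correct when
    // exprSource is the same list the key columns were generated from.
    static Map<String, String> buildKeyMap(List<String> keyCols, List<String> exprSource) {
        Map<String, String> colExprMap = new LinkedHashMap<>();
        for (int i = 0; i < keyCols.size(); i++) {
            colExprMap.put("KEY.reducesinkkey" + i, exprSource.get(i));
        }
        return colExprMap;
    }

    public static void main(String[] args) {
        List<String> sortCols = Arrays.asList("'constant'", "key", "value");
        List<String> newSortCols = dropConstants(sortCols);

        // Key columns come from newSortCols, expressions from sortCols:
        // the first key is wrongly mapped to the constant.
        System.out.println("mismatched: " + buildKeyMap(newSortCols, sortCols));

        // Both come from newSortCols: each key maps to its own expression.
        System.out.println("consistent: " + buildKeyMap(newSortCols, newSortCols));
    }
}
```

In the mismatched map the first key column points at 'constant' even though the column actually carries key, which is the kind of entry that would let a constant-folding pass replace a real column with a literal. If the constant were the trailing sort column, both maps would agree, matching the condition stated in the description.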



--
This message was sent by Atlassian Jira
(v8.3.4#803005)