You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@hive.apache.org by "Attila Magyar (Jira)" <ji...@apache.org> on 2020/04/22 11:11:00 UTC

[jira] [Work started] (HIVE-23265) Duplicate rowsets are returned with Limit and Offset ste

     [ https://issues.apache.org/jira/browse/HIVE-23265?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Work on HIVE-23265 started by Attila Magyar.
--------------------------------------------
> Duplicate rowsets are returned with Limit and Offset ste
> --------------------------------------------------------
>
>                 Key: HIVE-23265
>                 URL: https://issues.apache.org/jira/browse/HIVE-23265
>             Project: Hive
>          Issue Type: Bug
>          Components: HiveServer2, Vectorization
>    Affects Versions: 3.1.0, 3.1.2
>            Reporter: Chiran Ravani
>            Assignee: Attila Magyar
>            Priority: Critical
>         Attachments: 000000_0
>
>
> We have a query which produces duplicate results even when there is no duplicate records in underlying tables.
> Sample Query
> {code:java}
> select * from orderdatatest_ext order by col1 limit 1000,50
> {code}
> The problem appears when order by clause is used with col1 having non-unique rows. Apparently the duplicates are being produced during reducer phase of the query.
> set hive.vectorized.execution.reduce.enabled=false does not cause the problem.
> Data in table is as follows.
> {code:java}
> 1,1
> 1,2
> 1,3
> .
> .
> 1,1500
> {code}
> Results with hive.vectorized.execution.reduce.enabled=true
> {code:java}
> +-------------------------+-------------------------+
> | orderdatatest_ext.col1  | orderdatatest_ext.col2  |
> +-------------------------+-------------------------+
> | 1                       | 1001                    |
> | 1                       | 1002                    |
> | 1                       | 1003                    |
> | 1                       | 1004                    |
> | 1                       | 1005                    |
> | 1                       | 1006                    |
> | 1                       | 1007                    |
> | 1                       | 1008                    |
> | 1                       | 1009                    |
> | 1                       | 1010                    |
> | 1                       | 1011                    |
> | 1                       | 1012                    |
> | 1                       | 1013                    |
> | 1                       | 1014                    |
> | 1                       | 1015                    |
> | 1                       | 1016                    |
> | 1                       | 1017                    |
> | 1                       | 1018                    |
> | 1                       | 1019                    |
> | 1                       | 1020                    |
> | 1                       | 1021                    |
> | 1                       | 1022                    |
> | 1                       | 1023                    |
> | 1                       | 1024                    |
> | 1                       | 1                       |
> | 1                       | 1                       |
> | 1                       | 1                       |
> | 1                       | 1                       |
> | 1                       | 1                       |
> | 1                       | 1                       |
> | 1                       | 1                       |
> | 1                       | 1                       |
> | 1                       | 1                       |
> | 1                       | 1                       |
> | 1                       | 1                       |
> | 1                       | 1                       |
> | 1                       | 1                       |
> | 1                       | 1                       |
> | 1                       | 1                       |
> | 1                       | 1                       |
> | 1                       | 1                       |
> | 1                       | 1                       |
> | 1                       | 1                       |
> | 1                       | 1                       |
> | 1                       | 1                       |
> | 1                       | 1                       |
> | 1                       | 1                       |
> | 1                       | 1                       |
> | 1                       | 1                       |
> | 1                       | 1                       |
> +-------------------------+-------------------------+
> {code}
> Results with hive.vectorized.execution.reduce.enabled=false
> {code:java}
> +-------------------------+-------------------------+
> | orderdatatest_ext.col1  | orderdatatest_ext.col2  |
> +-------------------------+-------------------------+
> | 1                       | 1001                    |
> | 1                       | 1002                    |
> | 1                       | 1003                    |
> | 1                       | 1004                    |
> | 1                       | 1005                    |
> | 1                       | 1006                    |
> | 1                       | 1007                    |
> | 1                       | 1008                    |
> | 1                       | 1009                    |
> | 1                       | 1010                    |
> | 1                       | 1011                    |
> | 1                       | 1012                    |
> | 1                       | 1013                    |
> | 1                       | 1014                    |
> | 1                       | 1015                    |
> | 1                       | 1016                    |
> | 1                       | 1017                    |
> | 1                       | 1018                    |
> | 1                       | 1019                    |
> | 1                       | 1020                    |
> | 1                       | 1021                    |
> | 1                       | 1022                    |
> | 1                       | 1023                    |
> | 1                       | 1024                    |
> | 1                       | 1025                    |
> | 1                       | 1026                    |
> | 1                       | 1027                    |
> | 1                       | 1028                    |
> | 1                       | 1029                    |
> | 1                       | 1030                    |
> | 1                       | 1031                    |
> | 1                       | 1032                    |
> | 1                       | 1033                    |
> | 1                       | 1034                    |
> | 1                       | 1035                    |
> | 1                       | 1036                    |
> | 1                       | 1037                    |
> | 1                       | 1038                    |
> | 1                       | 1039                    |
> | 1                       | 1040                    |
> | 1                       | 1041                    |
> | 1                       | 1042                    |
> | 1                       | 1043                    |
> | 1                       | 1044                    |
> | 1                       | 1045                    |
> | 1                       | 1046                    |
> | 1                       | 1047                    |
> | 1                       | 1048                    |
> | 1                       | 1049                    |
> | 1                       | 1050                    |
> +-------------------------+-------------------------+
> {code}
> Table DDL
> {code:java}
> CREATE EXTERNAL TABLE  orderdatatest_ext (col1 int, col2 int) stored as orc
> {code}
> Attached sample ORC file.
> Problem appears to be with VectorLimitOperator.
> {code}
> 2020-04-20 15:35:49,693 [INFO] [TezChild] |vector.VectorSelectOperator|: Initializing operator SEL[6]
> 2020-04-20 15:35:49,747 [INFO] [TezChild] |vector.VectorSelectOperator|: RECORDS_OUT_INTERMEDIATE_Map_1:0, RECORDS_OUT_OPERATOR_SEL_6:1500,
> 2020-04-20 15:35:50,142 [INFO] [TezChild] |vector.VectorSelectOperator|: Initializing operator SEL[8]
> 2020-04-20 15:35:50,303 [INFO] [TezChild] |vector.VectorSelectOperator|: RECORDS_OUT_OPERATOR_SEL_8:1050, RECORDS_OUT_INTERMEDIATE_Reducer_2:0,
> 2020-04-20 15:35:50,142 [INFO] [TezChild] |vector.VectorLimitOperator|: Initializing operator LIM[9]
> 2020-04-20 15:35:50,303 [INFO] [TezChild] |vector.VectorLimitOperator|: RECORDS_OUT_INTERMEDIATE_Reducer_2:0, RECORDS_OUT_OPERATOR_LIM_9:1050,
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)