You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@hive.apache.org by "Attila Magyar (Jira)" <ji...@apache.org> on 2020/04/22 08:15:00 UTC
[jira] [Assigned] (HIVE-23265) Duplicate rowsets are returned with
Limit and Offset ste
[ https://issues.apache.org/jira/browse/HIVE-23265?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Attila Magyar reassigned HIVE-23265:
------------------------------------
Assignee: Attila Magyar
> Duplicate rowsets are returned with Limit and Offset ste
> --------------------------------------------------------
>
> Key: HIVE-23265
> URL: https://issues.apache.org/jira/browse/HIVE-23265
> Project: Hive
> Issue Type: Bug
> Components: HiveServer2, Vectorization
> Affects Versions: 3.1.0, 3.1.2
> Reporter: Chiran Ravani
> Assignee: Attila Magyar
> Priority: Critical
> Attachments: 000000_0
>
>
> We have a query which produces duplicate results even when there is no duplicate records in underlying tables.
> Sample Query
> {code:java}
> select * from orderdatatest_ext order by col1 limit 1000,50
> {code}
> The problem appears when order by clause is used with col1 having non-unique rows. Apparently the duplicates are being produced during reducer phase of the query.
> set hive.vectorized.execution.reduce.enabled=false does not cause the problem.
> Data in table is as follows.
> {code:java}
> 1,1
> 1,2
> 1,3
> .
> .
> 1,1500
> {code}
> Results with hive.vectorized.execution.reduce.enabled=true
> {code:java}
> +-------------------------+-------------------------+
> | orderdatatest_ext.col1 | orderdatatest_ext.col2 |
> +-------------------------+-------------------------+
> | 1 | 1001 |
> | 1 | 1002 |
> | 1 | 1003 |
> | 1 | 1004 |
> | 1 | 1005 |
> | 1 | 1006 |
> | 1 | 1007 |
> | 1 | 1008 |
> | 1 | 1009 |
> | 1 | 1010 |
> | 1 | 1011 |
> | 1 | 1012 |
> | 1 | 1013 |
> | 1 | 1014 |
> | 1 | 1015 |
> | 1 | 1016 |
> | 1 | 1017 |
> | 1 | 1018 |
> | 1 | 1019 |
> | 1 | 1020 |
> | 1 | 1021 |
> | 1 | 1022 |
> | 1 | 1023 |
> | 1 | 1024 |
> | 1 | 1 |
> | 1 | 1 |
> | 1 | 1 |
> | 1 | 1 |
> | 1 | 1 |
> | 1 | 1 |
> | 1 | 1 |
> | 1 | 1 |
> | 1 | 1 |
> | 1 | 1 |
> | 1 | 1 |
> | 1 | 1 |
> | 1 | 1 |
> | 1 | 1 |
> | 1 | 1 |
> | 1 | 1 |
> | 1 | 1 |
> | 1 | 1 |
> | 1 | 1 |
> | 1 | 1 |
> | 1 | 1 |
> | 1 | 1 |
> | 1 | 1 |
> | 1 | 1 |
> | 1 | 1 |
> | 1 | 1 |
> +-------------------------+-------------------------+
> {code}
> Results with hive.vectorized.execution.reduce.enabled=false
> {code:java}
> +-------------------------+-------------------------+
> | orderdatatest_ext.col1 | orderdatatest_ext.col2 |
> +-------------------------+-------------------------+
> | 1 | 1001 |
> | 1 | 1002 |
> | 1 | 1003 |
> | 1 | 1004 |
> | 1 | 1005 |
> | 1 | 1006 |
> | 1 | 1007 |
> | 1 | 1008 |
> | 1 | 1009 |
> | 1 | 1010 |
> | 1 | 1011 |
> | 1 | 1012 |
> | 1 | 1013 |
> | 1 | 1014 |
> | 1 | 1015 |
> | 1 | 1016 |
> | 1 | 1017 |
> | 1 | 1018 |
> | 1 | 1019 |
> | 1 | 1020 |
> | 1 | 1021 |
> | 1 | 1022 |
> | 1 | 1023 |
> | 1 | 1024 |
> | 1 | 1025 |
> | 1 | 1026 |
> | 1 | 1027 |
> | 1 | 1028 |
> | 1 | 1029 |
> | 1 | 1030 |
> | 1 | 1031 |
> | 1 | 1032 |
> | 1 | 1033 |
> | 1 | 1034 |
> | 1 | 1035 |
> | 1 | 1036 |
> | 1 | 1037 |
> | 1 | 1038 |
> | 1 | 1039 |
> | 1 | 1040 |
> | 1 | 1041 |
> | 1 | 1042 |
> | 1 | 1043 |
> | 1 | 1044 |
> | 1 | 1045 |
> | 1 | 1046 |
> | 1 | 1047 |
> | 1 | 1048 |
> | 1 | 1049 |
> | 1 | 1050 |
> +-------------------------+-------------------------+
> {code}
> Table DDL
> {code:java}
> CREATE EXTERNAL TABLE orderdatatest_ext (col1 int, col2 int) stored as orc
> {code}
> Attached sample ORC file.
> Problem appears to be with VectorLimitOperator.
> {code}
> 2020-04-20 15:35:49,693 [INFO] [TezChild] |vector.VectorSelectOperator|: Initializing operator SEL[6]
> 2020-04-20 15:35:49,747 [INFO] [TezChild] |vector.VectorSelectOperator|: RECORDS_OUT_INTERMEDIATE_Map_1:0, RECORDS_OUT_OPERATOR_SEL_6:1500,
> 2020-04-20 15:35:50,142 [INFO] [TezChild] |vector.VectorSelectOperator|: Initializing operator SEL[8]
> 2020-04-20 15:35:50,303 [INFO] [TezChild] |vector.VectorSelectOperator|: RECORDS_OUT_OPERATOR_SEL_8:1050, RECORDS_OUT_INTERMEDIATE_Reducer_2:0,
> 2020-04-20 15:35:50,142 [INFO] [TezChild] |vector.VectorLimitOperator|: Initializing operator LIM[9]
> 2020-04-20 15:35:50,303 [INFO] [TezChild] |vector.VectorLimitOperator|: RECORDS_OUT_INTERMEDIATE_Reducer_2:0, RECORDS_OUT_OPERATOR_LIM_9:1050,
> {code}
--
This message was sent by Atlassian Jira
(v8.3.4#803005)