You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@hive.apache.org by "Chiran Ravani (Jira)" <ji...@apache.org> on 2020/04/21 16:53:00 UTC
[jira] [Updated] (HIVE-23265) Duplicate rowsets are returned with
Limit and Offset ste
[ https://issues.apache.org/jira/browse/HIVE-23265?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Chiran Ravani updated HIVE-23265:
---------------------------------
Description:
We have a query which produces duplicate results even when there is no duplicate records in underlying tables.
Sample Query
{code:java}
select * from orderdatatest_ext order by col1 limit 1000,50
{code}
The problem appears when order by clause is used with col1 having non-unique rows. Apparently the duplicates are being produced during reducer phase of the query.
set hive.vectorized.execution.reduce.enabled=false does not cause the problem.
Data in table is as follows.
{code:java}
1,1
1,2
1,3
.
.
1,1500
{code}
Results with hive.vectorized.execution.reduce.enabled=true
{code:java}
+-------------------------+-------------------------+
| orderdatatest_ext.col1 | orderdatatest_ext.col2 |
+-------------------------+-------------------------+
| 1 | 1001 |
| 1 | 1002 |
| 1 | 1003 |
| 1 | 1004 |
| 1 | 1005 |
| 1 | 1006 |
| 1 | 1007 |
| 1 | 1008 |
| 1 | 1009 |
| 1 | 1010 |
| 1 | 1011 |
| 1 | 1012 |
| 1 | 1013 |
| 1 | 1014 |
| 1 | 1015 |
| 1 | 1016 |
| 1 | 1017 |
| 1 | 1018 |
| 1 | 1019 |
| 1 | 1020 |
| 1 | 1021 |
| 1 | 1022 |
| 1 | 1023 |
| 1 | 1024 |
| 1 | 1 |
| 1 | 1 |
| 1 | 1 |
| 1 | 1 |
| 1 | 1 |
| 1 | 1 |
| 1 | 1 |
| 1 | 1 |
| 1 | 1 |
| 1 | 1 |
| 1 | 1 |
| 1 | 1 |
| 1 | 1 |
| 1 | 1 |
| 1 | 1 |
| 1 | 1 |
| 1 | 1 |
| 1 | 1 |
| 1 | 1 |
| 1 | 1 |
| 1 | 1 |
| 1 | 1 |
| 1 | 1 |
| 1 | 1 |
| 1 | 1 |
| 1 | 1 |
+-------------------------+-------------------------+
{code}
Results with hive.vectorized.execution.reduce.enabled=false
{code:java}
+-------------------------+-------------------------+
| orderdatatest_ext.col1 | orderdatatest_ext.col2 |
+-------------------------+-------------------------+
| 1 | 1001 |
| 1 | 1002 |
| 1 | 1003 |
| 1 | 1004 |
| 1 | 1005 |
| 1 | 1006 |
| 1 | 1007 |
| 1 | 1008 |
| 1 | 1009 |
| 1 | 1010 |
| 1 | 1011 |
| 1 | 1012 |
| 1 | 1013 |
| 1 | 1014 |
| 1 | 1015 |
| 1 | 1016 |
| 1 | 1017 |
| 1 | 1018 |
| 1 | 1019 |
| 1 | 1020 |
| 1 | 1021 |
| 1 | 1022 |
| 1 | 1023 |
| 1 | 1024 |
| 1 | 1025 |
| 1 | 1026 |
| 1 | 1027 |
| 1 | 1028 |
| 1 | 1029 |
| 1 | 1030 |
| 1 | 1031 |
| 1 | 1032 |
| 1 | 1033 |
| 1 | 1034 |
| 1 | 1035 |
| 1 | 1036 |
| 1 | 1037 |
| 1 | 1038 |
| 1 | 1039 |
| 1 | 1040 |
| 1 | 1041 |
| 1 | 1042 |
| 1 | 1043 |
| 1 | 1044 |
| 1 | 1045 |
| 1 | 1046 |
| 1 | 1047 |
| 1 | 1048 |
| 1 | 1049 |
| 1 | 1050 |
+-------------------------+-------------------------+
{code}
Table DDL
{code:java}
CREATE EXTERNAL TABLE orderdatatest_ext (col1 int, col2 int) stored as orc
{code}
Attached sample ORC file.
Problem appears to be with VectorLimitOperator.
{code}
2020-04-20 15:35:49,693 [INFO] [TezChild] |vector.VectorSelectOperator|: Initializing operator SEL[6]
2020-04-20 15:35:49,747 [INFO] [TezChild] |vector.VectorSelectOperator|: RECORDS_OUT_INTERMEDIATE_Map_1:0, RECORDS_OUT_OPERATOR_SEL_6:1500,
2020-04-20 15:35:50,142 [INFO] [TezChild] |vector.VectorSelectOperator|: Initializing operator SEL[8]
2020-04-20 15:35:50,303 [INFO] [TezChild] |vector.VectorSelectOperator|: RECORDS_OUT_OPERATOR_SEL_8:1050, RECORDS_OUT_INTERMEDIATE_Reducer_2:0,
2020-04-20 15:35:50,142 [INFO] [TezChild] |vector.VectorLimitOperator|: Initializing operator LIM[9]
2020-04-20 15:35:50,303 [INFO] [TezChild] |vector.VectorLimitOperator|: RECORDS_OUT_INTERMEDIATE_Reducer_2:0, RECORDS_OUT_OPERATOR_LIM_9:1050,
{code}
was:
We have a query which produces duplicate results even when there is no duplicate records in underlying tables.
Sample Query
{code:java}
select * from orderdatatest_ext order by col1 limit 1000,50
{code}
The problem appears when order by clause is used with col1 having non-unique rows. Apparently the duplicates are being produced during reducer phase of the query.
set hive.vectorized.execution.reduce.enabled=false does not cause the problem.
Data in table is as follows.
{code:java}
1,1
1,2
1,3
.
.
1,1500
{code}
Results with hive.vectorized.execution.reduce.enabled=true
{code:java}
+-------------------------+-------------------------+
| orderdatatest_ext.col1 | orderdatatest_ext.col2 |
+-------------------------+-------------------------+
| 1 | 1001 |
| 1 | 1002 |
| 1 | 1003 |
| 1 | 1004 |
| 1 | 1005 |
| 1 | 1006 |
| 1 | 1007 |
| 1 | 1008 |
| 1 | 1009 |
| 1 | 1010 |
| 1 | 1011 |
| 1 | 1012 |
| 1 | 1013 |
| 1 | 1014 |
| 1 | 1015 |
| 1 | 1016 |
| 1 | 1017 |
| 1 | 1018 |
| 1 | 1019 |
| 1 | 1020 |
| 1 | 1021 |
| 1 | 1022 |
| 1 | 1023 |
| 1 | 1024 |
| 1 | 1 |
| 1 | 1 |
| 1 | 1 |
| 1 | 1 |
| 1 | 1 |
| 1 | 1 |
| 1 | 1 |
| 1 | 1 |
| 1 | 1 |
| 1 | 1 |
| 1 | 1 |
| 1 | 1 |
| 1 | 1 |
| 1 | 1 |
| 1 | 1 |
| 1 | 1 |
| 1 | 1 |
| 1 | 1 |
| 1 | 1 |
| 1 | 1 |
| 1 | 1 |
| 1 | 1 |
| 1 | 1 |
| 1 | 1 |
| 1 | 1 |
| 1 | 1 |
+-------------------------+-------------------------+
{code}
Results with hive.vectorized.execution.reduce.enabled=false
{code:java}
+-------------------------+-------------------------+
| orderdatatest_ext.col1 | orderdatatest_ext.col2 |
+-------------------------+-------------------------+
| 1 | 1001 |
| 1 | 1002 |
| 1 | 1003 |
| 1 | 1004 |
| 1 | 1005 |
| 1 | 1006 |
| 1 | 1007 |
| 1 | 1008 |
| 1 | 1009 |
| 1 | 1010 |
| 1 | 1011 |
| 1 | 1012 |
| 1 | 1013 |
| 1 | 1014 |
| 1 | 1015 |
| 1 | 1016 |
| 1 | 1017 |
| 1 | 1018 |
| 1 | 1019 |
| 1 | 1020 |
| 1 | 1021 |
| 1 | 1022 |
| 1 | 1023 |
| 1 | 1024 |
| 1 | 1025 |
| 1 | 1026 |
| 1 | 1027 |
| 1 | 1028 |
| 1 | 1029 |
| 1 | 1030 |
| 1 | 1031 |
| 1 | 1032 |
| 1 | 1033 |
| 1 | 1034 |
| 1 | 1035 |
| 1 | 1036 |
| 1 | 1037 |
| 1 | 1038 |
| 1 | 1039 |
| 1 | 1040 |
| 1 | 1041 |
| 1 | 1042 |
| 1 | 1043 |
| 1 | 1044 |
| 1 | 1045 |
| 1 | 1046 |
| 1 | 1047 |
| 1 | 1048 |
| 1 | 1049 |
| 1 | 1050 |
+-------------------------+-------------------------+
{code}
Table DDL
{code:java}
CREATE EXTERNAL TABLE orderdatatest_ext (col1 int, col2 int) stored as orc
{code}
Attached sample ORC file.
> Duplicate rowsets are returned with Limit and Offset ste
> --------------------------------------------------------
>
> Key: HIVE-23265
> URL: https://issues.apache.org/jira/browse/HIVE-23265
> Project: Hive
> Issue Type: Bug
> Components: HiveServer2, Vectorization
> Affects Versions: 3.1.0, 3.1.2
> Reporter: Chiran Ravani
> Priority: Critical
> Attachments: 000000_0
>
>
> We have a query which produces duplicate results even when there is no duplicate records in underlying tables.
> Sample Query
> {code:java}
> select * from orderdatatest_ext order by col1 limit 1000,50
> {code}
> The problem appears when order by clause is used with col1 having non-unique rows. Apparently the duplicates are being produced during reducer phase of the query.
> set hive.vectorized.execution.reduce.enabled=false does not cause the problem.
> Data in table is as follows.
> {code:java}
> 1,1
> 1,2
> 1,3
> .
> .
> 1,1500
> {code}
> Results with hive.vectorized.execution.reduce.enabled=true
> {code:java}
> +-------------------------+-------------------------+
> | orderdatatest_ext.col1 | orderdatatest_ext.col2 |
> +-------------------------+-------------------------+
> | 1 | 1001 |
> | 1 | 1002 |
> | 1 | 1003 |
> | 1 | 1004 |
> | 1 | 1005 |
> | 1 | 1006 |
> | 1 | 1007 |
> | 1 | 1008 |
> | 1 | 1009 |
> | 1 | 1010 |
> | 1 | 1011 |
> | 1 | 1012 |
> | 1 | 1013 |
> | 1 | 1014 |
> | 1 | 1015 |
> | 1 | 1016 |
> | 1 | 1017 |
> | 1 | 1018 |
> | 1 | 1019 |
> | 1 | 1020 |
> | 1 | 1021 |
> | 1 | 1022 |
> | 1 | 1023 |
> | 1 | 1024 |
> | 1 | 1 |
> | 1 | 1 |
> | 1 | 1 |
> | 1 | 1 |
> | 1 | 1 |
> | 1 | 1 |
> | 1 | 1 |
> | 1 | 1 |
> | 1 | 1 |
> | 1 | 1 |
> | 1 | 1 |
> | 1 | 1 |
> | 1 | 1 |
> | 1 | 1 |
> | 1 | 1 |
> | 1 | 1 |
> | 1 | 1 |
> | 1 | 1 |
> | 1 | 1 |
> | 1 | 1 |
> | 1 | 1 |
> | 1 | 1 |
> | 1 | 1 |
> | 1 | 1 |
> | 1 | 1 |
> | 1 | 1 |
> +-------------------------+-------------------------+
> {code}
> Results with hive.vectorized.execution.reduce.enabled=false
> {code:java}
> +-------------------------+-------------------------+
> | orderdatatest_ext.col1 | orderdatatest_ext.col2 |
> +-------------------------+-------------------------+
> | 1 | 1001 |
> | 1 | 1002 |
> | 1 | 1003 |
> | 1 | 1004 |
> | 1 | 1005 |
> | 1 | 1006 |
> | 1 | 1007 |
> | 1 | 1008 |
> | 1 | 1009 |
> | 1 | 1010 |
> | 1 | 1011 |
> | 1 | 1012 |
> | 1 | 1013 |
> | 1 | 1014 |
> | 1 | 1015 |
> | 1 | 1016 |
> | 1 | 1017 |
> | 1 | 1018 |
> | 1 | 1019 |
> | 1 | 1020 |
> | 1 | 1021 |
> | 1 | 1022 |
> | 1 | 1023 |
> | 1 | 1024 |
> | 1 | 1025 |
> | 1 | 1026 |
> | 1 | 1027 |
> | 1 | 1028 |
> | 1 | 1029 |
> | 1 | 1030 |
> | 1 | 1031 |
> | 1 | 1032 |
> | 1 | 1033 |
> | 1 | 1034 |
> | 1 | 1035 |
> | 1 | 1036 |
> | 1 | 1037 |
> | 1 | 1038 |
> | 1 | 1039 |
> | 1 | 1040 |
> | 1 | 1041 |
> | 1 | 1042 |
> | 1 | 1043 |
> | 1 | 1044 |
> | 1 | 1045 |
> | 1 | 1046 |
> | 1 | 1047 |
> | 1 | 1048 |
> | 1 | 1049 |
> | 1 | 1050 |
> +-------------------------+-------------------------+
> {code}
> Table DDL
> {code:java}
> CREATE EXTERNAL TABLE orderdatatest_ext (col1 int, col2 int) stored as orc
> {code}
> Attached sample ORC file.
> Problem appears to be with VectorLimitOperator.
> {code}
> 2020-04-20 15:35:49,693 [INFO] [TezChild] |vector.VectorSelectOperator|: Initializing operator SEL[6]
> 2020-04-20 15:35:49,747 [INFO] [TezChild] |vector.VectorSelectOperator|: RECORDS_OUT_INTERMEDIATE_Map_1:0, RECORDS_OUT_OPERATOR_SEL_6:1500,
> 2020-04-20 15:35:50,142 [INFO] [TezChild] |vector.VectorSelectOperator|: Initializing operator SEL[8]
> 2020-04-20 15:35:50,303 [INFO] [TezChild] |vector.VectorSelectOperator|: RECORDS_OUT_OPERATOR_SEL_8:1050, RECORDS_OUT_INTERMEDIATE_Reducer_2:0,
> 2020-04-20 15:35:50,142 [INFO] [TezChild] |vector.VectorLimitOperator|: Initializing operator LIM[9]
> 2020-04-20 15:35:50,303 [INFO] [TezChild] |vector.VectorLimitOperator|: RECORDS_OUT_INTERMEDIATE_Reducer_2:0, RECORDS_OUT_OPERATOR_LIM_9:1050,
> {code}
--
This message was sent by Atlassian Jira
(v8.3.4#803005)