You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@hive.apache.org by "Chiran Ravani (Jira)" <ji...@apache.org> on 2020/04/21 16:53:00 UTC

[jira] [Updated] (HIVE-23265) Duplicate rowsets are returned with Limit and Offset ste

     [ https://issues.apache.org/jira/browse/HIVE-23265?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chiran Ravani updated HIVE-23265:
---------------------------------
    Description: 
We have a query which produces duplicate results even when there is no duplicate records in underlying tables.

Sample Query
{code:java}
select * from orderdatatest_ext order by col1 limit 1000,50
{code}

The problem appears when order by clause is used with col1 having non-unique rows. Apparently the duplicates are being produced during reducer phase of the query.
set hive.vectorized.execution.reduce.enabled=false does not cause the problem.

Data in table is as follows.
{code:java}
1,1
1,2
1,3
.
.
1,1500
{code}

Results with hive.vectorized.execution.reduce.enabled=true

{code:java}
+-------------------------+-------------------------+
| orderdatatest_ext.col1  | orderdatatest_ext.col2  |
+-------------------------+-------------------------+
| 1                       | 1001                    |
| 1                       | 1002                    |
| 1                       | 1003                    |
| 1                       | 1004                    |
| 1                       | 1005                    |
| 1                       | 1006                    |
| 1                       | 1007                    |
| 1                       | 1008                    |
| 1                       | 1009                    |
| 1                       | 1010                    |
| 1                       | 1011                    |
| 1                       | 1012                    |
| 1                       | 1013                    |
| 1                       | 1014                    |
| 1                       | 1015                    |
| 1                       | 1016                    |
| 1                       | 1017                    |
| 1                       | 1018                    |
| 1                       | 1019                    |
| 1                       | 1020                    |
| 1                       | 1021                    |
| 1                       | 1022                    |
| 1                       | 1023                    |
| 1                       | 1024                    |
| 1                       | 1                       |
| 1                       | 1                       |
| 1                       | 1                       |
| 1                       | 1                       |
| 1                       | 1                       |
| 1                       | 1                       |
| 1                       | 1                       |
| 1                       | 1                       |
| 1                       | 1                       |
| 1                       | 1                       |
| 1                       | 1                       |
| 1                       | 1                       |
| 1                       | 1                       |
| 1                       | 1                       |
| 1                       | 1                       |
| 1                       | 1                       |
| 1                       | 1                       |
| 1                       | 1                       |
| 1                       | 1                       |
| 1                       | 1                       |
| 1                       | 1                       |
| 1                       | 1                       |
| 1                       | 1                       |
| 1                       | 1                       |
| 1                       | 1                       |
| 1                       | 1                       |
+-------------------------+-------------------------+
{code}

Results with hive.vectorized.execution.reduce.enabled=false

{code:java}
+-------------------------+-------------------------+
| orderdatatest_ext.col1  | orderdatatest_ext.col2  |
+-------------------------+-------------------------+
| 1                       | 1001                    |
| 1                       | 1002                    |
| 1                       | 1003                    |
| 1                       | 1004                    |
| 1                       | 1005                    |
| 1                       | 1006                    |
| 1                       | 1007                    |
| 1                       | 1008                    |
| 1                       | 1009                    |
| 1                       | 1010                    |
| 1                       | 1011                    |
| 1                       | 1012                    |
| 1                       | 1013                    |
| 1                       | 1014                    |
| 1                       | 1015                    |
| 1                       | 1016                    |
| 1                       | 1017                    |
| 1                       | 1018                    |
| 1                       | 1019                    |
| 1                       | 1020                    |
| 1                       | 1021                    |
| 1                       | 1022                    |
| 1                       | 1023                    |
| 1                       | 1024                    |
| 1                       | 1025                    |
| 1                       | 1026                    |
| 1                       | 1027                    |
| 1                       | 1028                    |
| 1                       | 1029                    |
| 1                       | 1030                    |
| 1                       | 1031                    |
| 1                       | 1032                    |
| 1                       | 1033                    |
| 1                       | 1034                    |
| 1                       | 1035                    |
| 1                       | 1036                    |
| 1                       | 1037                    |
| 1                       | 1038                    |
| 1                       | 1039                    |
| 1                       | 1040                    |
| 1                       | 1041                    |
| 1                       | 1042                    |
| 1                       | 1043                    |
| 1                       | 1044                    |
| 1                       | 1045                    |
| 1                       | 1046                    |
| 1                       | 1047                    |
| 1                       | 1048                    |
| 1                       | 1049                    |
| 1                       | 1050                    |
+-------------------------+-------------------------+
{code}

Table DDL
{code:java}
CREATE EXTERNAL TABLE  orderdatatest_ext (col1 int, col2 int) stored as orc
{code}

Attached sample ORC file.



Problem appears to be with VectorLimitOperator.
{code}
2020-04-20 15:35:49,693 [INFO] [TezChild] |vector.VectorSelectOperator|: Initializing operator SEL[6]
2020-04-20 15:35:49,747 [INFO] [TezChild] |vector.VectorSelectOperator|: RECORDS_OUT_INTERMEDIATE_Map_1:0, RECORDS_OUT_OPERATOR_SEL_6:1500,
2020-04-20 15:35:50,142 [INFO] [TezChild] |vector.VectorSelectOperator|: Initializing operator SEL[8]
2020-04-20 15:35:50,303 [INFO] [TezChild] |vector.VectorSelectOperator|: RECORDS_OUT_OPERATOR_SEL_8:1050, RECORDS_OUT_INTERMEDIATE_Reducer_2:0,


2020-04-20 15:35:50,142 [INFO] [TezChild] |vector.VectorLimitOperator|: Initializing operator LIM[9]
2020-04-20 15:35:50,303 [INFO] [TezChild] |vector.VectorLimitOperator|: RECORDS_OUT_INTERMEDIATE_Reducer_2:0, RECORDS_OUT_OPERATOR_LIM_9:1050,
{code}

  was:
We have a query which produces duplicate results even when there is no duplicate records in underlying tables.

Sample Query
{code:java}
select * from orderdatatest_ext order by col1 limit 1000,50
{code}

The problem appears when order by clause is used with col1 having non-unique rows. Apparently the duplicates are being produced during reducer phase of the query.
set hive.vectorized.execution.reduce.enabled=false does not cause the problem.

Data in table is as follows.
{code:java}
1,1
1,2
1,3
.
.
1,1500
{code}

Results with hive.vectorized.execution.reduce.enabled=true

{code:java}
+-------------------------+-------------------------+
| orderdatatest_ext.col1  | orderdatatest_ext.col2  |
+-------------------------+-------------------------+
| 1                       | 1001                    |
| 1                       | 1002                    |
| 1                       | 1003                    |
| 1                       | 1004                    |
| 1                       | 1005                    |
| 1                       | 1006                    |
| 1                       | 1007                    |
| 1                       | 1008                    |
| 1                       | 1009                    |
| 1                       | 1010                    |
| 1                       | 1011                    |
| 1                       | 1012                    |
| 1                       | 1013                    |
| 1                       | 1014                    |
| 1                       | 1015                    |
| 1                       | 1016                    |
| 1                       | 1017                    |
| 1                       | 1018                    |
| 1                       | 1019                    |
| 1                       | 1020                    |
| 1                       | 1021                    |
| 1                       | 1022                    |
| 1                       | 1023                    |
| 1                       | 1024                    |
| 1                       | 1                       |
| 1                       | 1                       |
| 1                       | 1                       |
| 1                       | 1                       |
| 1                       | 1                       |
| 1                       | 1                       |
| 1                       | 1                       |
| 1                       | 1                       |
| 1                       | 1                       |
| 1                       | 1                       |
| 1                       | 1                       |
| 1                       | 1                       |
| 1                       | 1                       |
| 1                       | 1                       |
| 1                       | 1                       |
| 1                       | 1                       |
| 1                       | 1                       |
| 1                       | 1                       |
| 1                       | 1                       |
| 1                       | 1                       |
| 1                       | 1                       |
| 1                       | 1                       |
| 1                       | 1                       |
| 1                       | 1                       |
| 1                       | 1                       |
| 1                       | 1                       |
+-------------------------+-------------------------+
{code}

Results with hive.vectorized.execution.reduce.enabled=false

{code:java}
+-------------------------+-------------------------+
| orderdatatest_ext.col1  | orderdatatest_ext.col2  |
+-------------------------+-------------------------+
| 1                       | 1001                    |
| 1                       | 1002                    |
| 1                       | 1003                    |
| 1                       | 1004                    |
| 1                       | 1005                    |
| 1                       | 1006                    |
| 1                       | 1007                    |
| 1                       | 1008                    |
| 1                       | 1009                    |
| 1                       | 1010                    |
| 1                       | 1011                    |
| 1                       | 1012                    |
| 1                       | 1013                    |
| 1                       | 1014                    |
| 1                       | 1015                    |
| 1                       | 1016                    |
| 1                       | 1017                    |
| 1                       | 1018                    |
| 1                       | 1019                    |
| 1                       | 1020                    |
| 1                       | 1021                    |
| 1                       | 1022                    |
| 1                       | 1023                    |
| 1                       | 1024                    |
| 1                       | 1025                    |
| 1                       | 1026                    |
| 1                       | 1027                    |
| 1                       | 1028                    |
| 1                       | 1029                    |
| 1                       | 1030                    |
| 1                       | 1031                    |
| 1                       | 1032                    |
| 1                       | 1033                    |
| 1                       | 1034                    |
| 1                       | 1035                    |
| 1                       | 1036                    |
| 1                       | 1037                    |
| 1                       | 1038                    |
| 1                       | 1039                    |
| 1                       | 1040                    |
| 1                       | 1041                    |
| 1                       | 1042                    |
| 1                       | 1043                    |
| 1                       | 1044                    |
| 1                       | 1045                    |
| 1                       | 1046                    |
| 1                       | 1047                    |
| 1                       | 1048                    |
| 1                       | 1049                    |
| 1                       | 1050                    |
+-------------------------+-------------------------+
{code}

Table DDL
{code:java}
CREATE EXTERNAL TABLE  orderdatatest_ext (col1 int, col2 int) stored as orc
{code}

Attached sample ORC file.


> Duplicate rowsets are returned with Limit and Offset ste
> --------------------------------------------------------
>
>                 Key: HIVE-23265
>                 URL: https://issues.apache.org/jira/browse/HIVE-23265
>             Project: Hive
>          Issue Type: Bug
>          Components: HiveServer2, Vectorization
>    Affects Versions: 3.1.0, 3.1.2
>            Reporter: Chiran Ravani
>            Priority: Critical
>         Attachments: 000000_0
>
>
> We have a query which produces duplicate results even when there is no duplicate records in underlying tables.
> Sample Query
> {code:java}
> select * from orderdatatest_ext order by col1 limit 1000,50
> {code}
> The problem appears when order by clause is used with col1 having non-unique rows. Apparently the duplicates are being produced during reducer phase of the query.
> set hive.vectorized.execution.reduce.enabled=false does not cause the problem.
> Data in table is as follows.
> {code:java}
> 1,1
> 1,2
> 1,3
> .
> .
> 1,1500
> {code}
> Results with hive.vectorized.execution.reduce.enabled=true
> {code:java}
> +-------------------------+-------------------------+
> | orderdatatest_ext.col1  | orderdatatest_ext.col2  |
> +-------------------------+-------------------------+
> | 1                       | 1001                    |
> | 1                       | 1002                    |
> | 1                       | 1003                    |
> | 1                       | 1004                    |
> | 1                       | 1005                    |
> | 1                       | 1006                    |
> | 1                       | 1007                    |
> | 1                       | 1008                    |
> | 1                       | 1009                    |
> | 1                       | 1010                    |
> | 1                       | 1011                    |
> | 1                       | 1012                    |
> | 1                       | 1013                    |
> | 1                       | 1014                    |
> | 1                       | 1015                    |
> | 1                       | 1016                    |
> | 1                       | 1017                    |
> | 1                       | 1018                    |
> | 1                       | 1019                    |
> | 1                       | 1020                    |
> | 1                       | 1021                    |
> | 1                       | 1022                    |
> | 1                       | 1023                    |
> | 1                       | 1024                    |
> | 1                       | 1                       |
> | 1                       | 1                       |
> | 1                       | 1                       |
> | 1                       | 1                       |
> | 1                       | 1                       |
> | 1                       | 1                       |
> | 1                       | 1                       |
> | 1                       | 1                       |
> | 1                       | 1                       |
> | 1                       | 1                       |
> | 1                       | 1                       |
> | 1                       | 1                       |
> | 1                       | 1                       |
> | 1                       | 1                       |
> | 1                       | 1                       |
> | 1                       | 1                       |
> | 1                       | 1                       |
> | 1                       | 1                       |
> | 1                       | 1                       |
> | 1                       | 1                       |
> | 1                       | 1                       |
> | 1                       | 1                       |
> | 1                       | 1                       |
> | 1                       | 1                       |
> | 1                       | 1                       |
> | 1                       | 1                       |
> +-------------------------+-------------------------+
> {code}
> Results with hive.vectorized.execution.reduce.enabled=false
> {code:java}
> +-------------------------+-------------------------+
> | orderdatatest_ext.col1  | orderdatatest_ext.col2  |
> +-------------------------+-------------------------+
> | 1                       | 1001                    |
> | 1                       | 1002                    |
> | 1                       | 1003                    |
> | 1                       | 1004                    |
> | 1                       | 1005                    |
> | 1                       | 1006                    |
> | 1                       | 1007                    |
> | 1                       | 1008                    |
> | 1                       | 1009                    |
> | 1                       | 1010                    |
> | 1                       | 1011                    |
> | 1                       | 1012                    |
> | 1                       | 1013                    |
> | 1                       | 1014                    |
> | 1                       | 1015                    |
> | 1                       | 1016                    |
> | 1                       | 1017                    |
> | 1                       | 1018                    |
> | 1                       | 1019                    |
> | 1                       | 1020                    |
> | 1                       | 1021                    |
> | 1                       | 1022                    |
> | 1                       | 1023                    |
> | 1                       | 1024                    |
> | 1                       | 1025                    |
> | 1                       | 1026                    |
> | 1                       | 1027                    |
> | 1                       | 1028                    |
> | 1                       | 1029                    |
> | 1                       | 1030                    |
> | 1                       | 1031                    |
> | 1                       | 1032                    |
> | 1                       | 1033                    |
> | 1                       | 1034                    |
> | 1                       | 1035                    |
> | 1                       | 1036                    |
> | 1                       | 1037                    |
> | 1                       | 1038                    |
> | 1                       | 1039                    |
> | 1                       | 1040                    |
> | 1                       | 1041                    |
> | 1                       | 1042                    |
> | 1                       | 1043                    |
> | 1                       | 1044                    |
> | 1                       | 1045                    |
> | 1                       | 1046                    |
> | 1                       | 1047                    |
> | 1                       | 1048                    |
> | 1                       | 1049                    |
> | 1                       | 1050                    |
> +-------------------------+-------------------------+
> {code}
> Table DDL
> {code:java}
> CREATE EXTERNAL TABLE  orderdatatest_ext (col1 int, col2 int) stored as orc
> {code}
> Attached sample ORC file.
> Problem appears to be with VectorLimitOperator.
> {code}
> 2020-04-20 15:35:49,693 [INFO] [TezChild] |vector.VectorSelectOperator|: Initializing operator SEL[6]
> 2020-04-20 15:35:49,747 [INFO] [TezChild] |vector.VectorSelectOperator|: RECORDS_OUT_INTERMEDIATE_Map_1:0, RECORDS_OUT_OPERATOR_SEL_6:1500,
> 2020-04-20 15:35:50,142 [INFO] [TezChild] |vector.VectorSelectOperator|: Initializing operator SEL[8]
> 2020-04-20 15:35:50,303 [INFO] [TezChild] |vector.VectorSelectOperator|: RECORDS_OUT_OPERATOR_SEL_8:1050, RECORDS_OUT_INTERMEDIATE_Reducer_2:0,
> 2020-04-20 15:35:50,142 [INFO] [TezChild] |vector.VectorLimitOperator|: Initializing operator LIM[9]
> 2020-04-20 15:35:50,303 [INFO] [TezChild] |vector.VectorLimitOperator|: RECORDS_OUT_INTERMEDIATE_Reducer_2:0, RECORDS_OUT_OPERATOR_LIM_9:1050,
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)