You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@hive.apache.org by "gaozhan ding (Jira)" <ji...@apache.org> on 2021/05/21 10:30:00 UTC

[jira] [Created] (HIVE-25147) Limit offset query in CTAS will cause data loss

gaozhan ding created HIVE-25147:
-----------------------------------

             Summary: Limit offset query in CTAS will cause data loss 
                 Key: HIVE-25147
                 URL: https://issues.apache.org/jira/browse/HIVE-25147
             Project: Hive
          Issue Type: Bug
          Components: Parser
    Affects Versions: 3.1.0
            Reporter: gaozhan ding


query like:

 
{code:java}
create table ... as select ... from ... limit  offset 
or:
insert overwrite table ... select ... from ... limit offset 
{code}
will cause data loss.

 

reproduce step:
{code:java}
create table test_limit_offset (id int);
insert into test_limit_offset values(1),(2),(3),(4),(5),(6),(7),(8),(9),(10),(11),(12),(13),(14),(15),(16);
drop table if exists test_limit_offset2;
create table test_limit_offset2 as select * from test_limit_offset limit 5 offset 2;
{code}
query test_limit_offset2
{code:java}
+------------------------+
| test_limit_offset2.id  |
+------------------------+
| 5                      |
| 6                      |
| 7                      |
+------------------------+

{code}
expected 5 numbers while got 3;

We can see the problem from the execution plan
{code:java}
+----------------------------------------------------+
|                      Explain                       |
+----------------------------------------------------+
| Plan optimized by CBO.                             |
|                                                    |
| Vertex dependency in root stage                    |
| Reducer 2 <- Map 1 (CUSTOM_SIMPLE_EDGE)            |
|                                                    |
| Stage-3                                            |
|   Stats Work{}                                     |
|     Stage-9                                        |
|       Create Table Operator:                       |
|         name:dgz.test_limit_offset2                |
|         Stage-2                                    |
|           Dependency Collection{}                  |
|             Stage-5(CONDITIONAL)                   |
|               Move Operator                        |
|                 Stage-8(CONDITIONAL CHILD TASKS: Stage-5, Stage-4, Stage-6) |
|                   Conditional Operator             |
|                     Stage-1                        |
|                       Reducer 2                    |
|                       File Output Operator [FS_6]  |
|                         table:{"name:":"dgz.test_limit_offset2"} |
|                         Limit [LIM_5] (rows=5 width=1) |   //reduce side full limit offset
|                           Number of rows:5,Offset of rows:2 |
|                           Select Operator [SEL_4] (rows=5 width=1) |
|                             Output:["_col0"]       |
|                           <-Map 1 [CUSTOM_SIMPLE_EDGE] |
|                             PARTITION_ONLY_SHUFFLE [RS_3] |
|                               Limit [LIM_2] (rows=5 width=1) |  //map side full limit offset
|                                 Number of rows:5,Offset of rows:2 |
|                                 Select Operator [SEL_1] (rows=13 width=1) |
|                                   Output:["_col0"] |
|                                   TableScan [TS_0] (rows=13 width=1) |
|                                     dgz@test_limit_offset,test_limit_offset,Tbl:COMPLETE,Col:NONE,Output:["id"] |
|             Stage-4(CONDITIONAL)                   |
|               File Merge                           |
|                  Please refer to the previous Stage-8(CONDITIONAL CHILD TASKS: Stage-5, Stage-4, Stage-6) |
|             Stage-7                                |
|               Move Operator                        |
|                 Stage-6(CONDITIONAL)               |
|                   File Merge                       |
|                      Please refer to the previous Stage-8(CONDITIONAL CHILD TASKS: Stage-5, Stage-4, Stage-6) |
|         Stage-0                                    |
|           Move Operator                            |
|              Please refer to the previous Stage-5(CONDITIONAL) |
|              Please refer to the previous Stage-4(CONDITIONAL) |
|              Please refer to the previous Stage-7  |
|                                                    |
+----------------------------------------------------+

{code}
It generate limit operator on both map and reduce side. 

 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)