Posted to commits@doris.apache.org by GitBox <gi...@apache.org> on 2021/11/06 13:23:00 UTC

[GitHub] [incubator-doris] renyuankun opened a new issue #7027: [Bug]

renyuankun opened a new issue #7027:
URL: https://github.com/apache/incubator-doris/issues/7027


   ### Search before asking
   
   - [X] I had searched in the [issues](https://github.com/apache/incubator-doris/issues?q=is%3Aissue) and found no similar issues.
   
   
   ### Version
   
   0.14.12.4
   
   ### What's Wrong?
   
   Following the `Colocation Join` description in the official documentation, I created a table `test_table`.
   Table creation statement:
   ```
   CREATE TABLE `test_table` (
     `id` bigint(20) NULL COMMENT "Continuously increase from 1 without repeating",
     `user_id` bigint(20) NULL COMMENT "Will repeat",
     INDEX id_index (`id`) COMMENT '',
     INDEX user_id_index (`user_id`) COMMENT ''
   ) ENGINE=OLAP
   COMMENT ""
   DISTRIBUTED BY HASH(`user_id`) BUCKETS 10
   PROPERTIES (
   "replication_num" = "3",
   "colocate_with" = "group_test",
   "in_memory" = "false",
   "storage_format" = "V2"
   );
   ```
   
   After the table is created, do the following:
   1. Insert 10 million rows into the table, with `id` values that increase continuously from 1 and never repeat.
   2. Execute the statement:
   ```
   select * from test_table where id BETWEEN 1 and 100
   ```
   The query returns only 88 rows, and all 88 rows have the same `user_id` (this `user_id` has only 88 rows in total).
   3. Change the statement to:
   ```
   select * from test_table where id BETWEEN 89 and 100
   ```
   This time the remaining 12 rows are returned, and these 12 rows all have a different `user_id` value.
   
   If step 1 inserts fewer rows, for example only 10,000, then step 2 returns the correct result. Changing `DISTRIBUTED BY HASH(`user_id`)` to `DISTRIBUTED BY HASH(`id`)` when creating the table also makes the query behave normally.
   
   Removing `"colocate_with" = "group_test",` from the table creation statement has no effect on the problem.
   
   
   ### What You Expected?
   
   The single-table query in step 2 should return 100 rows instead of 88.
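   
   As a rough sketch, the mismatch can be checked without reading individual rows by aggregating over the same predicate. This assumes the `test_table` definition above; the column aliases (`row_count`, `distinct_user_ids`, `min_id`, `max_id`) are illustrative names only and do not appear elsewhere in this report.
   
   ```sql
   -- Count the rows matched by the step 2 predicate and summarize them.
   -- If ids 1 through 100 are all present, row_count should be 100.
   SELECT
     COUNT(*)                AS row_count,
     COUNT(DISTINCT user_id) AS distinct_user_ids,
     MIN(id)                 AS min_id,
     MAX(id)                 AS max_id
   FROM test_table
   WHERE id BETWEEN 1 AND 100;
   ```
   
   A `row_count` of 88 with a single distinct `user_id` would match the behavior described above, where the rows belonging to the second `user_id` are missing from the result.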
   
   ### How to Reproduce?
   
   _No response_
   
   ### Anything Else?
   
   The explain results are as follows:
   
   ```
   PLAN FRAGMENT 0
    OUTPUT EXPRS:`default_cluster:test_data.test_table`.`id` | `default_cluster:test_data.test_table`.`user_id`
     PARTITION: UNPARTITIONED
   
     RESULT SINK
   
     1:EXCHANGE
   
   PLAN FRAGMENT 1
    OUTPUT EXPRS:
     PARTITION: HASH_PARTITIONED: `default_cluster:test_data`.`test_table`.`user_id`
   
     STREAM DATA SINK
       EXCHANGE ID: 01
       UNPARTITIONED
   
     0:OlapScanNode
        TABLE: test_table
        PREAGGREGATION: ON
        PREDICATES: `id` >= 1, `id` <= 100
        partitions=1/1
        rollup: test_table
        tabletRatio=10/10
        tabletList=70118,70122,70126,70130,70134,70138,70142,70146,70150,70154
        cardinality=0
        avgRowSize=16.0
        numNodes=1
   ```
   
   ### Are you willing to submit PR?
   
   - [ ] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [X] I agree to follow this project's [Code of Conduct](https://www.apache.org/foundation/policies/conduct)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@doris.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org