Posted to commits@hudi.apache.org by "Sagar Sumit (Jira)" <ji...@apache.org> on 2023/04/13 17:16:00 UTC

[jira] [Closed] (HUDI-5990) Incremental queries on MOR sometimes miss data

     [ https://issues.apache.org/jira/browse/HUDI-5990?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sagar Sumit closed HUDI-5990.
-----------------------------
    Resolution: Fixed

> Incremental queries on MOR sometimes miss data
> ----------------------------------------------
>
>                 Key: HUDI-5990
>                 URL: https://issues.apache.org/jira/browse/HUDI-5990
>             Project: Apache Hudi
>          Issue Type: Bug
>          Components: spark-sql
>    Affects Versions: 0.12.2, 0.13.0
>            Reporter: ruofan
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 0.14.0
>
>
> env: hudi-0.12.2 spark-3.2.0
> Currently, we have the following Hudi timeline and data files.
> {code:java}
> -rw-r--r-- 1 rfyu rfyu 1.5K 3月  26 09:58 20230326095758155.deltacommit
> -rw-r--r-- 1 rfyu rfyu    0 3月  26 09:57 20230326095758155.deltacommit.inflight
> -rw-r--r-- 1 rfyu rfyu    0 3月  26 09:57 20230326095758155.deltacommit.requested
> -rw-r--r-- 1 rfyu rfyu 1.6K 3月  26 09:58 20230326095810406.deltacommit
> -rw-r--r-- 1 rfyu rfyu    0 3月  26 09:58 20230326095810406.deltacommit.inflight
> -rw-r--r-- 1 rfyu rfyu    0 3月  26 09:58 20230326095810406.deltacommit.requested
> -rw-r--r-- 1 rfyu rfyu 1.7K 3月  26 09:58 20230326095811072.deltacommit
> -rw-r--r-- 1 rfyu rfyu    0 3月  26 09:58 20230326095811072.deltacommit.inflight
> -rw-r--r-- 1 rfyu rfyu    0 3月  26 09:58 20230326095811072.deltacommit.requested
> -rw-r--r-- 1 rfyu rfyu 1.7K 3月  26 09:58 20230326095820974.deltacommit
> -rw-r--r-- 1 rfyu rfyu    0 3月  26 09:58 20230326095820974.deltacommit.inflight
> -rw-r--r-- 1 rfyu rfyu    0 3月  26 09:58 20230326095820974.deltacommit.requested
> -rw-r--r-- 1 rfyu rfyu 1.8K 3月  26 09:58 20230326095830980.deltacommit
> -rw-r--r-- 1 rfyu rfyu    0 3月  26 09:58 20230326095830980.deltacommit.inflight
> -rw-r--r-- 1 rfyu rfyu    0 3月  26 09:58 20230326095830980.deltacommit.requested
> -rw-r--r-- 1 rfyu rfyu 1.8K 3月  26 09:58 20230326095840978.compaction.requested
> -rw-r--r-- 1 rfyu rfyu 1.5K 3月  26 09:58 20230326095841125.deltacommit
> -rw-r--r-- 1 rfyu rfyu    0 3月  26 09:58 20230326095841125.deltacommit.inflight
> -rw-r--r-- 1 rfyu rfyu    0 3月  26 09:58 20230326095841125.deltacommit.requested
> -rw-r--r-- 1 rfyu rfyu 1.6K 3月  26 09:59 20230326095850994.deltacommit
> -rw-r--r-- 1 rfyu rfyu    0 3月  26 09:58 20230326095850994.deltacommit.inflight
> -rw-r--r-- 1 rfyu rfyu    0 3月  26 09:58 20230326095850994.deltacommit.requested
> -rw-r--r-- 1 rfyu rfyu 1.7K 3月  26 09:59 20230326095900988.deltacommit
> -rw-r--r-- 1 rfyu rfyu    0 3月  26 09:59 20230326095900988.deltacommit.inflight
> -rw-r--r-- 1 rfyu rfyu    0 3月  26 09:59 20230326095900988.deltacommit.requested
> -rw-r--r-- 1 rfyu rfyu 1.7K 3月  26 09:59 20230326095910983.deltacommit
> -rw-r--r-- 1 rfyu rfyu    0 3月  26 09:59 20230326095910983.deltacommit.inflight
> -rw-r--r-- 1 rfyu rfyu    0 3月  26 09:59 20230326095910983.deltacommit.requested
> -rw-r--r-- 1 rfyu rfyu    0 3月  26 09:59 20230326095920986.deltacommit.inflight
> -rw-r--r-- 1 rfyu rfyu    0 3月  26 09:59 20230326095920986.deltacommit.requested
> -rw-r--r--  1 rfyu rfyu 1.5K 3月  26 09:58 .b9f3a322-b0fe-4f70-8ad8-aa2664be957c_20230326095758155.log.1_0-1-0
> -rw-r--r--  1 rfyu rfyu 3.0K 3月  26 09:58 .b9f3a322-b0fe-4f70-8ad8-aa2664be957c_20230326095758155.log.2_0-1-0
> -rw-r--r--  1 rfyu rfyu 3.0K 3月  26 09:58 .b9f3a322-b0fe-4f70-8ad8-aa2664be957c_20230326095758155.log.3_0-1-0
> -rw-r--r--  1 rfyu rfyu 3.0K 3月  26 09:58 .b9f3a322-b0fe-4f70-8ad8-aa2664be957c_20230326095758155.log.4_0-1-0
> -rw-r--r--  1 rfyu rfyu 3.0K 3月  26 09:58 .b9f3a322-b0fe-4f70-8ad8-aa2664be957c_20230326095758155.log.5_0-1-0
> -rw-r--r--  1 rfyu rfyu 3.0K 3月  26 09:58 .b9f3a322-b0fe-4f70-8ad8-aa2664be957c_20230326095840978.log.1_0-1-0
> -rw-r--r--  1 rfyu rfyu 3.0K 3月  26 09:59 .b9f3a322-b0fe-4f70-8ad8-aa2664be957c_20230326095840978.log.2_0-1-0
> -rw-r--r--  1 rfyu rfyu 3.0K 3月  26 09:59 .b9f3a322-b0fe-4f70-8ad8-aa2664be957c_20230326095840978.log.3_0-1-0
> -rw-r--r--  1 rfyu rfyu 3.0K 3月  26 09:59 .b9f3a322-b0fe-4f70-8ad8-aa2664be957c_20230326095840978.log.4_0-1-0 {code}
> We use Spark to run incremental queries against this Hudi table. Data may go missing when the incremental range contains an incomplete (pending) compaction plan.
> Here is an example of such an incremental query. Between begin_instance_time (exclusive) and end_instance_time (inclusive), 6 commits should be found, but only 3 are returned.
> {code:java}
> sql:
> call copy_to_table(table=>'hudi_table',new_table=>'incremental_table',query_type=>'incremental',begin_instance_time=>'20230326095810406',end_instance_time=>'20230326095900988');
> select _hoodie_commit_time,count(*) from incremental_table group by _hoodie_commit_time order by _hoodie_commit_time desc;
> actual result: 
> +-------------------+--------+
> |_hoodie_commit_time|count(1)|
> +-------------------+--------+
> |20230326095830980  |10      |
> |20230326095820974  |10      |
> |20230326095811072  |10      |
> +-------------------+--------+
> expected result:
> +-------------------+--------+
> |_hoodie_commit_time|count(1)|
> +-------------------+--------+
> |20230326095900988  |10      |
> |20230326095850994  |10      |
> |20230326095841125  |10      |
> |20230326095830980  |10      |
> |20230326095820974  |10      |
> |20230326095811072  |10      |
> +-------------------+--------+ {code}
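
The expected result can be cross-checked against the timeline listing with a short standalone script. The following is only a sketch in plain Python, not Hudi code: the helper names (timeline_files, parse_instants, and so on) are hypothetical, the file names are rebuilt from the listing in the report, and the range is assumed to be exclusive at the begin instant and inclusive at the end instant, which is consistent with the expected result above. It also assumes that a completed compaction would appear on the timeline as a `<instant>.commit` file, so a `compaction.requested` file without one is treated as pending.

```python
# Sketch only: plain Python over timeline file names, not the Hudi API.
BEGIN = "20230326095810406"
END = "20230326095900988"

STATE_RANK = {"requested": 0, "inflight": 1, "completed": 2}


def timeline_files():
    """Rebuild the timeline file names shown in the report's listing."""
    completed = [
        "20230326095758155", "20230326095810406", "20230326095811072",
        "20230326095820974", "20230326095830980", "20230326095841125",
        "20230326095850994", "20230326095900988", "20230326095910983",
    ]
    names = []
    for ts in completed:
        names += [f"{ts}.deltacommit",
                  f"{ts}.deltacommit.inflight",
                  f"{ts}.deltacommit.requested"]
    # The last deltacommit is still in flight, and the compaction is only requested.
    names += ["20230326095920986.deltacommit.inflight",
              "20230326095920986.deltacommit.requested",
              "20230326095840978.compaction.requested"]
    return names


def parse_instants(names):
    """Map (instant, action) to the most advanced state seen for it."""
    best = {}
    for name in names:
        parts = name.split(".")
        key = (parts[0], parts[1])
        state = parts[2] if len(parts) > 2 else "completed"
        if key not in best or STATE_RANK[state] > STATE_RANK[best[key]]:
            best[key] = state
    return best


def completed_deltacommits_in_range(instants, begin, end):
    """Completed deltacommits with begin < instant <= end (assumed semantics)."""
    return sorted(ts for (ts, action), state in instants.items()
                  if action == "deltacommit" and state == "completed"
                  and begin < ts <= end)


def pending_compactions_in_range(instants, begin, end):
    """Compaction instants in range with no completed .commit (assumption)."""
    return sorted(ts for (ts, action) in instants
                  if action == "compaction"
                  and instants.get((ts, "commit")) != "completed"
                  and begin < ts <= end)


if __name__ == "__main__":
    instants = parse_instants(timeline_files())
    expected = completed_deltacommits_in_range(instants, BEGIN, END)
    pending = pending_compactions_in_range(instants, BEGIN, END)
    print("expected commits:", expected)
    print("pending compactions in range:", pending)
    for p in pending:
        print("commits before", p, ":", [ts for ts in expected if ts < p])
```

With the listing above, this yields the six commit times of the expected result and one pending compaction, 20230326095840978, inside the range. The three commits that sort before that instant are exactly the three returned in the actual result, which matches the reporter's description of the range containing an incomplete compaction plan.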



--
This message was sent by Atlassian Jira
(v8.20.10#820010)