Posted to commits@hudi.apache.org by "ruofan (Jira)" <ji...@apache.org> on 2023/03/27 07:31:00 UTC

[jira] [Updated] (HUDI-5990) Incremental queries on MOR sometimes miss data

     [ https://issues.apache.org/jira/browse/HUDI-5990?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ruofan updated HUDI-5990:
-------------------------
    Description: 
Environment: Hudi 0.12.2, Spark 3.2.0

Currently, we have the following Hudi timeline and data files:
{code:java}
-rw-r--r-- 1 rfyu rfyu 1.5K Mar 26 09:58 20230326095758155.deltacommit
-rw-r--r-- 1 rfyu rfyu    0 Mar 26 09:57 20230326095758155.deltacommit.inflight
-rw-r--r-- 1 rfyu rfyu    0 Mar 26 09:57 20230326095758155.deltacommit.requested
-rw-r--r-- 1 rfyu rfyu 1.6K Mar 26 09:58 20230326095810406.deltacommit
-rw-r--r-- 1 rfyu rfyu    0 Mar 26 09:58 20230326095810406.deltacommit.inflight
-rw-r--r-- 1 rfyu rfyu    0 Mar 26 09:58 20230326095810406.deltacommit.requested
-rw-r--r-- 1 rfyu rfyu 1.7K Mar 26 09:58 20230326095811072.deltacommit
-rw-r--r-- 1 rfyu rfyu    0 Mar 26 09:58 20230326095811072.deltacommit.inflight
-rw-r--r-- 1 rfyu rfyu    0 Mar 26 09:58 20230326095811072.deltacommit.requested
-rw-r--r-- 1 rfyu rfyu 1.7K Mar 26 09:58 20230326095820974.deltacommit
-rw-r--r-- 1 rfyu rfyu    0 Mar 26 09:58 20230326095820974.deltacommit.inflight
-rw-r--r-- 1 rfyu rfyu    0 Mar 26 09:58 20230326095820974.deltacommit.requested
-rw-r--r-- 1 rfyu rfyu 1.8K Mar 26 09:58 20230326095830980.deltacommit
-rw-r--r-- 1 rfyu rfyu    0 Mar 26 09:58 20230326095830980.deltacommit.inflight
-rw-r--r-- 1 rfyu rfyu    0 Mar 26 09:58 20230326095830980.deltacommit.requested
-rw-r--r-- 1 rfyu rfyu 1.8K Mar 26 09:58 20230326095840978.compaction.requested
-rw-r--r-- 1 rfyu rfyu 1.5K Mar 26 09:58 20230326095841125.deltacommit
-rw-r--r-- 1 rfyu rfyu    0 Mar 26 09:58 20230326095841125.deltacommit.inflight
-rw-r--r-- 1 rfyu rfyu    0 Mar 26 09:58 20230326095841125.deltacommit.requested
-rw-r--r-- 1 rfyu rfyu 1.6K Mar 26 09:59 20230326095850994.deltacommit
-rw-r--r-- 1 rfyu rfyu    0 Mar 26 09:58 20230326095850994.deltacommit.inflight
-rw-r--r-- 1 rfyu rfyu    0 Mar 26 09:58 20230326095850994.deltacommit.requested
-rw-r--r-- 1 rfyu rfyu 1.7K Mar 26 09:59 20230326095900988.deltacommit
-rw-r--r-- 1 rfyu rfyu    0 Mar 26 09:59 20230326095900988.deltacommit.inflight
-rw-r--r-- 1 rfyu rfyu    0 Mar 26 09:59 20230326095900988.deltacommit.requested
-rw-r--r-- 1 rfyu rfyu 1.7K Mar 26 09:59 20230326095910983.deltacommit
-rw-r--r-- 1 rfyu rfyu    0 Mar 26 09:59 20230326095910983.deltacommit.inflight
-rw-r--r-- 1 rfyu rfyu    0 Mar 26 09:59 20230326095910983.deltacommit.requested
-rw-r--r-- 1 rfyu rfyu    0 Mar 26 09:59 20230326095920986.deltacommit.inflight
-rw-r--r-- 1 rfyu rfyu    0 Mar 26 09:59 20230326095920986.deltacommit.requested


-rw-r--r--  1 rfyu rfyu 1.5K Mar 26 09:58 .b9f3a322-b0fe-4f70-8ad8-aa2664be957c_20230326095758155.log.1_0-1-0
-rw-r--r--  1 rfyu rfyu 3.0K Mar 26 09:58 .b9f3a322-b0fe-4f70-8ad8-aa2664be957c_20230326095758155.log.2_0-1-0
-rw-r--r--  1 rfyu rfyu 3.0K Mar 26 09:58 .b9f3a322-b0fe-4f70-8ad8-aa2664be957c_20230326095758155.log.3_0-1-0
-rw-r--r--  1 rfyu rfyu 3.0K Mar 26 09:58 .b9f3a322-b0fe-4f70-8ad8-aa2664be957c_20230326095758155.log.4_0-1-0
-rw-r--r--  1 rfyu rfyu 3.0K Mar 26 09:58 .b9f3a322-b0fe-4f70-8ad8-aa2664be957c_20230326095758155.log.5_0-1-0
-rw-r--r--  1 rfyu rfyu 3.0K Mar 26 09:58 .b9f3a322-b0fe-4f70-8ad8-aa2664be957c_20230326095840978.log.1_0-1-0
-rw-r--r--  1 rfyu rfyu 3.0K Mar 26 09:59 .b9f3a322-b0fe-4f70-8ad8-aa2664be957c_20230326095840978.log.2_0-1-0
-rw-r--r--  1 rfyu rfyu 3.0K Mar 26 09:59 .b9f3a322-b0fe-4f70-8ad8-aa2664be957c_20230326095840978.log.3_0-1-0
-rw-r--r--  1 rfyu rfyu 3.0K Mar 26 09:59 .b9f3a322-b0fe-4f70-8ad8-aa2664be957c_20230326095840978.log.4_0-1-0 {code}
We use Spark to query this Hudi table incrementally. Data may go missing when the incremental range contains an incomplete compaction plan.
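
The suspected cause can be illustrated with a toy model of the timeline. This is only a sketch in Python, not Hudi code: the instant names are copied from the listing in this report (the .inflight/.requested markers of completed instants are omitted for brevity), and the rule "the reader stops at the earliest pending compaction" is an assumption made here because it reproduces the 6 versus 3 counts, not a confirmed description of Hudi internals. The helper names are hypothetical.

```python
# Instant files copied from the timeline listing in this report.
TIMELINE_FILES = [
    "20230326095758155.deltacommit",
    "20230326095810406.deltacommit",
    "20230326095811072.deltacommit",
    "20230326095820974.deltacommit",
    "20230326095830980.deltacommit",
    "20230326095840978.compaction.requested",
    "20230326095841125.deltacommit",
    "20230326095850994.deltacommit",
    "20230326095900988.deltacommit",
    "20230326095910983.deltacommit",
    "20230326095920986.deltacommit.inflight",
]


def parse(name):
    """Split '<timestamp>.<action>[.<state>]' into its three parts."""
    parts = name.split(".")
    state = parts[2] if len(parts) > 2 else "completed"
    return parts[0], parts[1], state


def completed_deltacommits(files, begin, end):
    """Completed deltacommits with begin < ts <= end (begin is exclusive,
    which matches the expected result in the report)."""
    return sorted(
        ts
        for ts, action, state in map(parse, files)
        if action == "deltacommit" and state == "completed" and begin < ts <= end
    )


def earliest_pending_compaction(files):
    """Timestamp of the first compaction that has not completed, if any."""
    pending = [
        ts
        for ts, action, state in map(parse, files)
        if action == "compaction" and state != "completed"
    ]
    return min(pending, default=None)


BEGIN, END = "20230326095810406", "20230326095900988"
expected = completed_deltacommits(TIMELINE_FILES, BEGIN, END)
barrier = earliest_pending_compaction(TIMELINE_FILES)

# ASSUMPTION: the reader ignores everything at or after the pending compaction.
observed = [ts for ts in expected if barrier is None or ts < barrier]

print(len(expected), len(observed))  # 6 3
```

Under this assumption, the three commits written after compaction plan 20230326095840978 are exactly the ones missing from the actual result.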

Here is an example of an incremental query. Normally, 6 commits should be found between begin_instance_time and end_instance_time, but only 3 were found.
{code:java}
sql:
call copy_to_table(table=>'hudi_table',new_table=>'incremental_table',query_type=>'incremental',begin_instance_time=>'20230326095810406',end_instance_time=>'20230326095900988');
select _hoodie_commit_time,count(*) from incremental_table group by _hoodie_commit_time order by _hoodie_commit_time desc;

actual result:
+-------------------+--------+
|_hoodie_commit_time|count(1)|
+-------------------+--------+
|20230326095830980  |10      |
|20230326095820974  |10      |
|20230326095811072  |10      |
+-------------------+--------+

expected result:
+-------------------+--------+
|_hoodie_commit_time|count(1)|
+-------------------+--------+
|20230326095830980  |10      |
|20230326095820974  |10      |
|20230326095811072  |10      |
|20230326095841125  |10      |
|20230326095850994  |10      |
|20230326095900988  |10      |
+-------------------+--------+
{code}
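
The same range can probably also be read through the Spark DataFrame reader, which may help to check whether the problem is specific to the copy_to_table procedure. The sketch below only builds the reader options in Python; the three option keys are Hudi's documented datasource read options, while the helper name, the SparkSession `spark`, and `base_path` are assumptions that are not part of this report. That the procedure's parameters map onto these options is also an assumption.

```python
def incremental_read_options(begin_instant, end_instant):
    """Build Hudi datasource options for an incremental read (hypothetical helper)."""
    return {
        "hoodie.datasource.query.type": "incremental",
        "hoodie.datasource.read.begin.instanttime": begin_instant,
        "hoodie.datasource.read.end.instanttime": end_instant,
    }


opts = incremental_read_options("20230326095810406", "20230326095900988")

# With an active SparkSession `spark` and the table's base path (both assumed):
#   df = spark.read.format("hudi").options(**opts).load(base_path)
#   df.groupBy("_hoodie_commit_time").count().show()
```

If this read returns all six commits while the procedure returns three, the issue would lie in the procedure; if both return three, it would point to the incremental read path itself.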



> Incremental queries on MOR sometimes miss data
> ----------------------------------------------
>
>                 Key: HUDI-5990
>                 URL: https://issues.apache.org/jira/browse/HUDI-5990
>             Project: Apache Hudi
>          Issue Type: Bug
>          Components: spark-sql
>    Affects Versions: 0.12.2, 0.13.0
>            Reporter: ruofan
>            Priority: Major
>             Fix For: 0.14.0
>



--
This message was sent by Atlassian Jira
(v8.20.10#820010)