Posted to commits@hudi.apache.org by "loukey_j (Jira)" <ji...@apache.org> on 2022/05/22 08:09:00 UTC

[jira] [Updated] (HUDI-4133) Spark query mor by snapshot query lost data

     [ https://issues.apache.org/jira/browse/HUDI-4133?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

loukey_j updated HUDI-4133:
---------------------------
    Component/s: flink
                 spark-sql

> Spark query mor by snapshot query lost data
> ---------------------------------------------
>
>                 Key: HUDI-4133
>                 URL: https://issues.apache.org/jira/browse/HUDI-4133
>             Project: Apache Hudi
>          Issue Type: Bug
>          Components: core, flink, spark-sql
>            Reporter: loukey_j
>            Priority: Major
>
> Suppose two non-overlapping batches of data are written in turn by Flink to a new, non-partitioned Hudi MOR table.
> The Hoodie timeline and log files are as follows:
>  
> hdfs dfs -ls hdfs://xxx/mor_test/.hoodie
>      0 2022-05-21 16:41 hdfs://xxx/mor_test/.hoodie/.aux
>      0 2022-05-21 16:41 hdfs://xxx/mor_test/.hoodie/.schema
>      0 2022-05-21 16:42 hdfs://xxx/mor_test/.hoodie/.temp
>   5291 2022-05-21 16:42 hdfs://xxx/mor_test/.hoodie/20220521164201245.deltacommit
>      0 2022-05-21 16:42 hdfs://xxx/mor_test/.hoodie/20220521164201245.deltacommit.inflight
>      0 2022-05-21 16:42 hdfs://xxx/mor_test/.hoodie/20220521164201245.deltacommit.requested
>   5291 2022-05-21 16:42 hdfs://xxx/mor_test/.hoodie/20220521164214473.deltacommit
>      0 2022-05-21 16:42 hdfs://xxx/mor_test/.hoodie/20220521164214473.deltacommit.inflight
>      0 2022-05-21 16:42 hdfs://xxx/mor_test/.hoodie/20220521164214473.deltacommit.requested
>      0 2022-05-21 16:41 hdfs://xxx/mor_test/.hoodie/archived
>    798 2022-05-21 16:41 hdfs://xxx/mor_test/.hoodie/hoodie.properties
> hdfs dfs -ls hdfs://xxx/mor_test/
>  13316 2022-05-21 16:42 hdfs://xxx/mor_test/.00000000-1dd6-4395-9c90-53f8a6c6eed3_20220521164201245.log.1_0-2-0
>  28395 2022-05-21 16:42 hdfs://xxx/mor_test/.00000000-1dd6-4395-9c90-53f8a6c6eed3_20220521164214473.log.1_0-2-0
>      0 2022-05-21 16:42 hdfs://xxx/mor_test/.hoodie
>    100 2022-05-21 16:42 hdfs://xxx/mor_test/.hoodie_partition_metadata
>  
> Run the following SQL as a Spark snapshot query: 'select distinct _hoodie_commit_time from mor_test_rt'
> The expected result is both 20220521164201245 and 20220521164214473, but the actual result is only 20220521164214473.
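
One detail visible in the listing above is that both log files share the same file ID prefix (00000000-1dd6-4395-9c90-53f8a6c6eed3) and the same version number, and differ only in the instant time embedded in the name. The following is a minimal sketch that groups the listed names to make that visible. The name layout `.<fileId>_<instantTime>.log.<version>_<writeToken>` is an assumption inferred from the listing, and `group_log_files` is a hypothetical helper, not part of Hudi. It only illustrates the observation and does not establish the cause of the bug.

```python
import re
from collections import defaultdict

# Assumed name layout, inferred from the listing in this report:
#   .<fileId>_<instantTime>.log.<version>_<writeToken>
LOG_NAME = re.compile(
    r"^\.(?P<file_id>[^_]+)_(?P<instant>\d+)\.log\.(?P<version>\d+)_(?P<token>.+)$"
)

def group_log_files(names):
    """Hypothetical helper: map file ID -> sorted (instant, version) pairs."""
    groups = defaultdict(list)
    for name in names:
        match = LOG_NAME.match(name)
        if match:  # names that are not log files (e.g. metadata) are skipped
            groups[match.group("file_id")].append(
                (match.group("instant"), int(match.group("version")))
            )
    return {file_id: sorted(entries) for file_id, entries in groups.items()}

# File names copied from the `hdfs dfs -ls` output in the report.
listing = [
    ".00000000-1dd6-4395-9c90-53f8a6c6eed3_20220521164201245.log.1_0-2-0",
    ".00000000-1dd6-4395-9c90-53f8a6c6eed3_20220521164214473.log.1_0-2-0",
    ".hoodie_partition_metadata",
]

groups = group_log_files(listing)
for file_id, entries in groups.items():
    print(file_id, entries)
```

If a snapshot reader picked up only one of these two entries for the file ID, only one value of _hoodie_commit_time would come back, which matches the reported symptom. Whether that is what actually happens here is not confirmed by this report.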



--
This message was sent by Atlassian Jira
(v8.20.7#820007)