Posted to commits@hudi.apache.org by "loukey_j (Jira)" <ji...@apache.org> on 2022/05/22 08:09:00 UTC
[jira] [Updated] (HUDI-4133) Spark query mor by snapshot query lost data
[ https://issues.apache.org/jira/browse/HUDI-4133?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
loukey_j updated HUDI-4133:
---------------------------
Component/s: flink, spark-sql
> Spark query mor by snapshot query lost data
> ---------------------------------------------
>
> Key: HUDI-4133
> URL: https://issues.apache.org/jira/browse/HUDI-4133
> Project: Apache Hudi
> Issue Type: Bug
> Components: core, flink, spark-sql
> Reporter: loukey_j
> Priority: Major
>
> Suppose Flink writes two non-overlapping batches of data, one after the other, to a new non-partitioned Hudi MOR table.
> The Hoodie timeline and log files are as follows:
>
> hdfs dfs -ls hdfs://xxx/mor_test/.hoodie
> 0 2022-05-21 16:41 hdfs://xxx/mor_test/.hoodie/.aux
> 0 2022-05-21 16:41 hdfs://xxx/mor_test/.hoodie/.schema
> 0 2022-05-21 16:42 hdfs://xxx/mor_test/.hoodie/.temp
> 5291 2022-05-21 16:42 hdfs://xxx/mor_test/.hoodie/20220521164201245.deltacommit
> 0 2022-05-21 16:42 hdfs://xxx/mor_test/.hoodie/20220521164201245.deltacommit.inflight
> 0 2022-05-21 16:42 hdfs://xxx/mor_test/.hoodie/20220521164201245.deltacommit.requested
> 5291 2022-05-21 16:42 hdfs://xxx/mor_test/.hoodie/20220521164214473.deltacommit
> 0 2022-05-21 16:42 hdfs://xxx/mor_test/.hoodie/20220521164214473.deltacommit.inflight
> 0 2022-05-21 16:42 hdfs://xxx/mor_test/.hoodie/20220521164214473.deltacommit.requested
> 0 2022-05-21 16:41 hdfs://xxx/mor_test/.hoodie/archived
> 798 2022-05-21 16:41 hdfs://xxx/mor_test/.hoodie/hoodie.properties
> hdfs dfs -ls hdfs://xxx/mor_test/
> 13316 2022-05-21 16:42 hdfs://xxx/mor_test/.00000000-1dd6-4395-9c90-53f8a6c6eed3_20220521164201245.log.1_0-2-0
> 28395 2022-05-21 16:42 hdfs://xxx/mor_test/.00000000-1dd6-4395-9c90-53f8a6c6eed3_20220521164214473.log.1_0-2-0
> 0 2022-05-21 16:42 hdfs://xxx/mor_test/.hoodie
> 100 2022-05-21 16:42 hdfs://xxx/mor_test/.hoodie_partition_metadata
>
> Run the following SQL as a Spark snapshot query: 'select distinct _hoodie_commit_time from mor_test_rt'
> The expected result is both 20220521164201245 and 20220521164214473, but the actual result is only 20220521164214473.
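
The expected result can be cross-checked against the listings above without running Spark. The following is a minimal sketch, not part of Hudi: the helper names and regular expressions are hypothetical, and it assumes that the 17-digit segment in each timeline and log file name is the instant time. It shows that both completed deltacommits have a matching log file, so a snapshot query would be expected to return both commit times.

```python
import re

# File names copied from the `hdfs dfs -ls` listings in the report.
TIMELINE_FILES = [
    "20220521164201245.deltacommit",
    "20220521164201245.deltacommit.inflight",
    "20220521164201245.deltacommit.requested",
    "20220521164214473.deltacommit",
    "20220521164214473.deltacommit.inflight",
    "20220521164214473.deltacommit.requested",
]
LOG_FILES = [
    ".00000000-1dd6-4395-9c90-53f8a6c6eed3_20220521164201245.log.1_0-2-0",
    ".00000000-1dd6-4395-9c90-53f8a6c6eed3_20220521164214473.log.1_0-2-0",
]

# Hypothetical patterns (assumption): a completed deltacommit has no
# ".inflight"/".requested" suffix, and a log file name embeds a
# 17-digit timestamp between "_" and ".log.".
COMPLETED = re.compile(r"^(\d{17})\.deltacommit$")
LOG_NAME = re.compile(r"^\.(?P<file_id>[0-9a-f-]+)_(?P<instant>\d{17})\.log\.\d+_.+$")


def completed_instants(names):
    """Return the sorted instant times of completed deltacommits."""
    return sorted(m.group(1) for n in names if (m := COMPLETED.match(n)))


def log_instants(names):
    """Return the sorted, de-duplicated timestamps embedded in log file names."""
    return sorted({m.group("instant") for n in names if (m := LOG_NAME.match(n))})


expected = completed_instants(TIMELINE_FILES)
print("completed deltacommits:", expected)
print("log files cover them:  ", log_instants(LOG_FILES) == expected)
```

Both log files carry the same file ID prefix (00000000-1dd6-4395-9c90-53f8a6c6eed3) but different timestamps, which may be a useful detail when investigating why only the later commit is returned.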
--
This message was sent by Atlassian Jira
(v8.20.7#820007)