Posted to commits@hudi.apache.org by "Xiaoqiao He (Jira)" <ji...@apache.org> on 2022/03/10 04:52:00 UTC
[jira] [Created] (HUDI-3599) Not atomicity commit could cause streaming read loss data
Xiaoqiao He created HUDI-3599:
---------------------------------
Summary: Not atomicity commit could cause streaming read loss data
Key: HUDI-3599
URL: https://issues.apache.org/jira/browse/HUDI-3599
Project: Apache Hudi
Issue Type: Bug
Components: core
Reporter: Xiaoqiao He
The call hierarchy of the current `commit` implementation is shown below; `transitionState` completes the commit by writing the deltacommit file. However, writing a file is not an atomic operation, on HDFS for instance.
{code:java}
HoodieActiveTimeline.transitionState(HoodieInstant, HoodieInstant, Option<byte[]>, boolean) (org.apache.hudi.common.table.timeline)
HoodieActiveTimeline.transitionState(HoodieInstant, HoodieInstant, Option<byte[]>) (org.apache.hudi.common.table.timeline)
HoodieActiveTimeline.saveAsComplete(HoodieInstant, Option<byte[]>) (org.apache.hudi.common.table.timeline)
BaseHoodieWriteClient.commit(HoodieTable, String, String, HoodieCommitMetadata, List<HoodieWriteStat>) (org.apache.hudi.client)
BaseHoodieWriteClient.commitStats(String, List<HoodieWriteStat>, Option<Map<String, String>>, String, Map<String, List<String>>) (org.apache.hudi.client)
HoodieFlinkWriteClient.commit(String, List<WriteStatus>, Option<Map<String, String>>, String, Map<String, List<String>>) (org.apache.hudi.client)
HoodieJavaWriteClient.commit(String, List<WriteStatus>, Option<Map<String, String>>, String, Map<String, List<String>>) (org.apache.hudi.client)
{code}
As org.apache.hudi.common.table.timeline.HoodieActiveTimeline#createImmutableFileInPath shows below, writing the data takes three steps: A. create the file, B. write the data, C. close the file handle. If `StreamReadMonitoring` scans the timeline and picks up this deltacommit file between steps A and B, while its content is still empty, it reads nothing in that loop iteration. IMO this could cause streaming reads to lose some committed data.
{code:java}
private void createImmutableFileInPath(Path fullPath, Option<byte[]> content) {
  FSDataOutputStream fsout = null;
  try {
    fsout = metaClient.getFs().create(fullPath, false);
    if (content.isPresent()) {
      fsout.write(content.get());
    }
  } catch (IOException e) {
    throw new HoodieIOException("Failed to create file " + fullPath, e);
  } finally {
    try {
      if (null != fsout) {
        fsout.close();
      }
    } catch (IOException e) {
      throw new HoodieIOException("Failed to close file " + fullPath, e);
    }
  }
}
{code}
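To make the window between steps A and B concrete, here is a minimal single-threaded sketch. It is not Hudi code: it assumes the local filesystem via `java.nio.file` instead of Hadoop's `FileSystem`, and the class name, method name, and instant file name are hypothetical. It only shows that a file is already visible, with zero bytes, once it has been created and before any data is written.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical demo class; uses the local filesystem, not HDFS.
class PartialWriteDemo {

  /** Returns the file size an observer sees after step A (create) but before step B (write). */
  static long bytesVisibleBetweenCreateAndWrite(Path file, byte[] content) throws IOException {
    // Step A: create the file; CREATE_NEW fails if it already exists,
    // similar in spirit to create(fullPath, false).
    try (OutputStream out =
        Files.newOutputStream(file, StandardOpenOption.CREATE_NEW, StandardOpenOption.WRITE)) {
      // A reader listing the directory at this point already finds the file,
      // but it has no content yet.
      long seen = Files.size(file);
      // Step B: write the data.
      out.write(content);
      return seen;
    } // Step C: the stream is closed by try-with-resources.
  }

  public static void main(String[] args) throws IOException {
    Path dir = Files.createTempDirectory("timeline");
    Path instant = dir.resolve("20220310045200.deltacommit"); // hypothetical instant file name
    byte[] metadata = "{}".getBytes(StandardCharsets.UTF_8);

    long seen = bytesVisibleBetweenCreateAndWrite(instant, metadata);
    System.out.println("bytes visible between create and write: " + seen);
    System.out.println("bytes visible after close: " + Files.size(instant));
  }
}
```

A reader that treats the mere existence of the file as "commit complete" can therefore observe an empty commit if it looks at the wrong moment.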
To avoid this corner case, I think we should rely on a `rename` operation to complete the commit rather than the create-write-close flow. Please correct me if I missed something.
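A rough sketch of the rename-based idea, under stated assumptions: it uses `java.nio.file` on the local filesystem as a stand-in for Hadoop's `FileSystem`, and the class name, method name, and `.tmp` suffix are hypothetical, not existing Hudi conventions. The content is fully written and closed under a temporary name first, and only then moved to the final name, so a reader never sees the final name with partial content.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical sketch; not the Hudi implementation.
class RenameCommitSketch {

  /** Writes content to a sibling temp file, then publishes it under finalPath via an atomic move. */
  static void commitViaRename(Path finalPath, byte[] content) throws IOException {
    // Keep the temp file in the same directory so the move stays on one filesystem.
    Path tmp = finalPath.resolveSibling(finalPath.getFileName() + ".tmp");

    // Create, write, and close under the temporary name; readers are assumed to ignore *.tmp.
    Files.write(tmp, content);

    try {
      // Publish the complete file under its final name in a single step.
      Files.move(tmp, finalPath, StandardCopyOption.ATOMIC_MOVE);
    } catch (IOException e) {
      // Do not leave a stray temp file behind if the move fails.
      Files.deleteIfExists(tmp);
      throw e;
    }
  }

  public static void main(String[] args) throws IOException {
    Path dir = Files.createTempDirectory("timeline");
    Path instant = dir.resolve("20220310045200.deltacommit"); // hypothetical instant file name
    commitViaRename(instant, "{}".getBytes(StandardCharsets.UTF_8));
    System.out.println("committed bytes: " + Files.size(instant));
  }
}
```

Two caveats apply to this sketch. With `ATOMIC_MOVE`, whether an existing target is replaced is implementation specific, whereas the current code calls `create(fullPath, false)` and fails if the file exists, so a real change would need to preserve that check. Rename is atomic on HDFS, but that does not hold for every storage backend (object stores in particular), so the approach would have to be evaluated per filesystem.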
--
This message was sent by Atlassian Jira
(v8.20.1#820001)