Posted to commits@hudi.apache.org by "Yue Zhang (Jira)" <ji...@apache.org> on 2021/08/26 06:41:00 UTC
[jira] [Comment Edited] (HUDI-2355) after clustering with archive meet data incorrect
[ https://issues.apache.org/jira/browse/HUDI-2355?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17404991#comment-17404991 ]
Yue Zhang edited comment on HUDI-2355 at 8/26/21, 6:40 AM:
-----------------------------------------------------------
Actually, this problem does exist on the current master branch; the assumption that the cleaner runs first and archival executes afterwards does not hold.
```
protected void postCommit(HoodieTable<T, I, K, O> table, HoodieCommitMetadata metadata, String instantTime, Option<Map<String, String>> extraMetadata) {
  try {
    // Delete the marker directory for the instant.
    WriteMarkersFactory.get(config.getMarkersType(), table, instantTime)
        .quietDeleteMarkerDir(context, config.getMarkersDeleteParallelism());
    // We cannot have unbounded commit files. Archive commits if we have to archive
    HoodieTimelineArchiveLog archiveLog = new HoodieTimelineArchiveLog(config, table);
    archiveLog.archiveIfRequired(context);
    if (operationType != null && operationType != WriteOperationType.CLUSTER && operationType != WriteOperationType.COMPACT) {
      syncTableMetadata();
    }
  } catch (IOException ioe) {
    throw new HoodieIOException(ioe.getMessage(), ioe);
  } finally {
    this.heartbeatClient.stop(instantTime);
  }
}
```
Even in async cleaner mode, archival does not wait for the async cleaner service to finish; it starts archiving/deleting commits right away.
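As a minimal sketch of what "archival waits for the async cleaner" could mean, the toy client below models the in-flight clean as a `CompletableFuture` and lets archival block on it with `join()` before running. This is an assumption-laden illustration, not Hudi code: `AwaitAsyncCleanSketch`, `startAsyncClean`, `postCommit(boolean)` and the event names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical sketch, NOT Hudi's write client: archival optionally waits for an async clean.
public class AwaitAsyncCleanSketch {
    // Order in which the simulated table services finished.
    private final List<String> events = Collections.synchronizedList(new ArrayList<>());
    private final ExecutorService executor = Executors.newSingleThreadExecutor();
    private CompletableFuture<Void> asyncClean = CompletableFuture.completedFuture(null);

    // Start a simulated, slow clean on a background thread.
    void startAsyncClean() {
        asyncClean = CompletableFuture.runAsync(() -> {
            sleep(100);
            events.add("clean");
        }, executor);
    }

    // Hypothetical postCommit: optionally block until the in-flight clean finishes, then archive.
    void postCommit(boolean waitForClean) {
        if (waitForClean) {
            asyncClean.join();
        }
        events.add("archive");
    }

    private static void sleep(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    static List<String> run(boolean waitForClean) {
        AwaitAsyncCleanSketch client = new AwaitAsyncCleanSketch();
        try {
            client.startAsyncClean();
            client.postCommit(waitForClean);
            client.asyncClean.join(); // let the clean finish so the recorded order is complete
            return new ArrayList<>(client.events);
        } finally {
            client.executor.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println("without waiting: " + run(false)); // order depends on thread timing
        System.out.println("with waiting:    " + run(true));  // [clean, archive]
    }
}
```

With `waitForClean` false, the recorded order depends on thread timing, which mirrors the race described above; with it true, the clean always completes before archival starts.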
I just raised a PR to fix this problem by adjusting the execution order so that the cleaner runs first and archival runs afterwards.
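To illustrate why the order matters, here is a hypothetical toy model (not Hudi code; the class, the in-memory "active timeline", the instant times and the file names are all assumptions). It assumes the cleaner can discover replaced files only through replacecommit instants that are still on the active timeline, and that archival removes older instants from that timeline.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

// Hypothetical toy model of the clean/archive ordering issue; none of these names are Hudi APIs.
public class CleanArchiveOrderSketch {
    // Active timeline in instant order: instant time -> files that instant replaced.
    private final Map<String, List<String>> activeTimeline = new LinkedHashMap<>();
    // Data files currently on storage (sorted for stable printing).
    private final TreeSet<String> storage = new TreeSet<>();

    // Record a commit that writes newFiles and, for a replacecommit, replaces replacedFiles.
    void commit(String instant, List<String> replacedFiles, List<String> newFiles) {
        activeTimeline.put(instant, new ArrayList<>(replacedFiles));
        storage.addAll(newFiles);
    }

    // Cleaner: replaced files are only discoverable via instants still on the active timeline.
    void clean() {
        for (List<String> replaced : activeTimeline.values()) {
            storage.removeAll(replaced);
        }
    }

    // Archival: drop all but the newest `keep` instants from the active timeline.
    void archive(int keep) {
        int toArchive = activeTimeline.size() - keep;
        Iterator<String> it = activeTimeline.keySet().iterator();
        while (toArchive > 0 && it.hasNext()) {
            it.next();
            it.remove();
            toArchive--;
        }
    }

    // The two orderings discussed in the comment.
    void postCommit(boolean cleanBeforeArchive, int keep) {
        if (cleanBeforeArchive) {
            clean();
            archive(keep);
        } else {
            archive(keep);
            clean();
        }
    }

    static TreeSet<String> run(boolean cleanBeforeArchive) {
        CleanArchiveOrderSketch table = new CleanArchiveOrderSketch();
        table.commit("001", new ArrayList<>(), Arrays.asList("f1", "f2"));   // insert
        table.commit("002", Arrays.asList("f1", "f2"), Arrays.asList("f3")); // clustering replacecommit
        table.commit("003", new ArrayList<>(), Arrays.asList("f4"));         // later insert
        table.postCommit(cleanBeforeArchive, 1);
        return table.storage;
    }

    public static void main(String[] args) {
        System.out.println("archive then clean: " + run(false)); // [f1, f2, f3, f4]
        System.out.println("clean then archive: " + run(true));  // [f3, f4]
    }
}
```

In this model, archiving first moves the clustering replacecommit `002` off the active timeline, so the replaced files `f1` and `f2` are never cleaned; cleaning first removes them before the instant is archived.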
was (Author: zhangyue19921010):
Actually, this problem does exist on the current master branch; the assumption that the cleaner runs first and archival executes afterwards does not hold.
```
protected void postCommit(HoodieTable<T, I, K, O> table, HoodieCommitMetadata metadata, String instantTime, Option<Map<String, String>> extraMetadata) {
  try {
    // Delete the marker directory for the instant.
    WriteMarkersFactory.get(config.getMarkersType(), table, instantTime)
        .quietDeleteMarkerDir(context, config.getMarkersDeleteParallelism());
    // We cannot have unbounded commit files. Archive commits if we have to archive
    HoodieTimelineArchiveLog archiveLog = new HoodieTimelineArchiveLog(config, table);
    archiveLog.archiveIfRequired(context);
    if (operationType != null && operationType != WriteOperationType.CLUSTER && operationType != WriteOperationType.COMPACT) {
      syncTableMetadata();
    }
  } catch (IOException ioe) {
    throw new HoodieIOException(ioe.getMessage(), ioe);
  } finally {
    this.heartbeatClient.stop(instantTime);
  }
}
```
Even in async cleaner mode, archival does not wait for the async cleaner service to finish; it starts archiving/deleting commits right away.
I just raised a PR to fix this problem by adjusting the execution order so that the cleaner runs first and archival runs afterwards.
> after clustering with archive meet data incorrect
> --------------------------------------------------
>
> Key: HUDI-2355
> URL: https://issues.apache.org/jira/browse/HUDI-2355
> Project: Apache Hudi
> Issue Type: Sub-task
> Reporter: liwei
> Assignee: liwei
> Priority: Major
> Labels: pull-request-available
>
> After [https://github.com/apache/hudi/pull/3310], replaced data files are cleaned by the clean action. But if the replacecommit file has been deleted, the cleaner can no longer find those replaced data files.