Posted to commits@hudi.apache.org by "Yue Zhang (Jira)" <ji...@apache.org> on 2021/08/26 06:41:00 UTC

[jira] [Comment Edited] (HUDI-2355) after clustering with archive meet data incorrect

    [ https://issues.apache.org/jira/browse/HUDI-2355?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17404991#comment-17404991 ] 

Yue Zhang edited comment on HUDI-2355 at 8/26/21, 6:40 AM:
-----------------------------------------------------------

Actually, this problem does exist on the current master branch, which does not guarantee that the cleaner runs before archival is executed.

```
protected void postCommit(HoodieTable<T, I, K, O> table, HoodieCommitMetadata metadata, String instantTime, Option<Map<String, String>> extraMetadata) {
  try {
    // Delete the marker directory for the instant.
    WriteMarkersFactory.get(config.getMarkersType(), table, instantTime)
        .quietDeleteMarkerDir(context, config.getMarkersDeleteParallelism());
    // We cannot have unbounded commit files. Archive commits if we have to archive
    HoodieTimelineArchiveLog archiveLog = new HoodieTimelineArchiveLog(config, table);
    archiveLog.archiveIfRequired(context);
    if (operationType != null && operationType != WriteOperationType.CLUSTER && operationType != WriteOperationType.COMPACT) {
      syncTableMetadata();
    }
  } catch (IOException ioe) {
    throw new HoodieIOException(ioe.getMessage(), ioe);
  } finally {
    this.heartbeatClient.stop(instantTime);
  }
}
```

Even in async cleaner mode, archival does not wait for the async cleaner service to finish before it starts to archive/delete commits.

 

I have just raised a PR to fix this problem by adjusting the execution order, so that the cleaner executes first and archival runs afterwards.
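
To illustrate why the order matters, here is a minimal toy model in plain Java. It is only a sketch under simplifying assumptions: the class, its fields, and the file and instant names are hypothetical and are not Hudi APIs. It assumes the cleaner can only remove replaced data files while the corresponding replacecommit is still on the active timeline, and that archival removes that instant from the active timeline.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy model only: none of these names are real Hudi classes or methods. */
class PostCommitOrderSketch {

    /** Active timeline: replacecommit instant -> data files it replaced (hypothetical names). */
    private final Map<String, List<String>> activeReplaceCommits = new HashMap<>();

    /** Data files currently present on storage. */
    private final List<String> dataFiles = new ArrayList<>();

    PostCommitOrderSketch() {
        dataFiles.add("old-file-1");   // replaced by clustering, should be cleaned
        dataFiles.add("new-file-1");   // written by clustering, must stay
        activeReplaceCommits.put("001.replacecommit", Arrays.asList("old-file-1"));
    }

    /** Assumption: the cleaner learns which files were replaced only from active replacecommits. */
    void clean() {
        for (List<String> replaced : activeReplaceCommits.values()) {
            dataFiles.removeAll(replaced);
        }
    }

    /** Assumption: archival removes the instants from the active timeline. */
    void archive() {
        activeReplaceCommits.clear();
    }

    /** Archive first, then clean: the replaced file is never removed. */
    static List<String> archiveThenClean() {
        PostCommitOrderSketch table = new PostCommitOrderSketch();
        table.archive();
        table.clean();
        return table.dataFiles;
    }

    /** Clean first, then archive: the ordering proposed in the comment above. */
    static List<String> cleanThenArchive() {
        PostCommitOrderSketch table = new PostCommitOrderSketch();
        table.clean();
        table.archive();
        return table.dataFiles;
    }

    public static void main(String[] args) {
        System.out.println("archive -> clean leaves: " + archiveThenClean());
        System.out.println("clean -> archive leaves: " + cleanThenArchive());
    }
}
```

In this toy model, archiving first leaves `old-file-1` behind next to `new-file-1`, while cleaning first leaves only `new-file-1`. The real cleaner and archival logic in Hudi is more involved; the sketch only shows the ordering dependency.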


> after clustering with archive  meet data incorrect
> --------------------------------------------------
>
>                 Key: HUDI-2355
>                 URL: https://issues.apache.org/jira/browse/HUDI-2355
>             Project: Apache Hudi
>          Issue Type: Sub-task
>            Reporter: liwei
>            Assignee: liwei
>            Priority: Major
>              Labels: pull-request-available
>
> After [https://github.com/apache/hudi/pull/3310], replaced data files are cleaned up in clean. But if the replacecommit file has been deleted, clean cannot read the data file.


