You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@hudi.apache.org by "Ethan Guo (Jira)" <ji...@apache.org> on 2022/09/17 19:24:00 UTC

[jira] [Closed] (HUDI-4736) Fix inflight clean action preventing clean service to continue when multiple cleans are not allowed

     [ https://issues.apache.org/jira/browse/HUDI-4736?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ethan Guo closed HUDI-4736.
---------------------------
    Resolution: Fixed

> Fix inflight clean action preventing clean service to continue when multiple cleans are not allowed
> ---------------------------------------------------------------------------------------------------
>
>                 Key: HUDI-4736
>                 URL: https://issues.apache.org/jira/browse/HUDI-4736
>             Project: Apache Hudi
>          Issue Type: Bug
>          Components: cleaning
>    Affects Versions: 0.11.0, 0.11.1
>            Reporter: Ethan Guo
>            Assignee: Ethan Guo
>            Priority: Blocker
>              Labels: pull-request-available
>             Fix For: 0.12.1
>
>
> For Hudi Deltastreamer async cleaning, when the Spark job fails in the middle of the cleaning, leaving the clean instant inflight, the Spark job retried next time may not resume the inflight clean action if `hoodie.clean.allow.multiple` is `false`, i.e., multiple clean schedules are disabled.  This is due to a bug in the code below.
>  
> Relevant logic in BaseHoodieWriteClient:
> {code:java}
> public HoodieCleanMetadata clean(String cleanInstantTime, boolean scheduleInline, boolean skipLocking) throws HoodieIOException {
>   if (!tableServicesEnabled(config)) {
>     return null;
>   }
>   final Timer.Context timerContext = metrics.getCleanCtx();
>   CleanerUtils.rollbackFailedWrites(config.getFailedWritesCleanPolicy(),
>       HoodieTimeline.CLEAN_ACTION, () -> rollbackFailedWrites(skipLocking));
>   HoodieCleanMetadata metadata = null;
>   HoodieTable table = createTable(config, hadoopConf);
>   if (config.allowMultipleCleans() || !table.getActiveTimeline().getCleanerTimeline().filterInflightsAndRequested().firstInstant().isPresent()) {
>     LOG.info("Cleaner started");
>     // proceed only if multiple clean schedules are enabled or if there are no pending cleans.
>     if (scheduleInline) {
>       scheduleTableServiceInternal(cleanInstantTime, Option.empty(), TableServiceType.CLEAN);
>       table.getMetaClient().reloadActiveTimeline();
>     }
>     metadata = table.clean(context, cleanInstantTime, skipLocking);
>     if (timerContext != null && metadata != null) {
>       long durationMs = metrics.getDurationInMs(timerContext.stop());
>       metrics.updateCleanMetrics(durationMs, metadata.getTotalFilesDeleted());
>       LOG.info("Cleaned " + metadata.getTotalFilesDeleted() + " files"
>           + " Earliest Retained Instant :" + metadata.getEarliestCommitToRetain()
>           + " cleanerElapsedMs" + durationMs);
>     }
>   }
>   return metadata;
> } {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)