Posted to dev@oozie.apache.org by "Hadoop QA (Jira)" <ji...@apache.org> on 2022/11/10 17:09:00 UTC

[jira] [Commented] (OOZIE-3669) Fix purge process for bundles to prevent orphan coordinators

    [ https://issues.apache.org/jira/browse/OOZIE-3669?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17631796#comment-17631796 ] 

Hadoop QA commented on OOZIE-3669:
----------------------------------

PreCommit-OOZIE-Build started


> Fix purge process for bundles to prevent orphan coordinators
> ------------------------------------------------------------
>
>                 Key: OOZIE-3669
>                 URL: https://issues.apache.org/jira/browse/OOZIE-3669
>             Project: Oozie
>          Issue Type: Bug
>          Components: core
>    Affects Versions: 5.2.1
>            Reporter: Janos Makai
>            Assignee: Janos Makai
>            Priority: Major
>         Attachments: OOZIE-3669-001.patch
>
>
> The Oozie purge process for bundles can create orphan coordinators. When it purges bundle jobs and bundle actions, it does not always purge the child coordinator jobs as well. The result is orphaned coordinators: neither they nor their children will ever be purged, because of how the purge logic works.
>  
> ----
>  
> When purging bundles, the command first counts the child coordinators which are not ready to purge [0], checking each coordinator's status and coordOlderThan. If no child coordinator falls into that category, it adds them to the coordsToPurge list.
> Being added to the list does not guarantee that the coordinator will be purged, however. The processCoordinators method also checks whether the child workflows are older than wfOlderThan [1]. If a purge command is started where wfOlderThan is much higher than coordOlderThan (for example, the default values are 30 days for workflows and 7 days for coordinators), then the bundle will be purged, but the coordinator will not.
> Since the bundle is now purged, the child coordinator will never be purged: the purge only checks parentless coordinators on their own, and coordinators with a parent are purged only when their bundle is purged.
> [0]
> {code:java}
> // PurgeXCommand, lines 380-381
> long numChildrenNotReady = jpaService.execute(
>         new CoordJobsCountNotForPurgeFromParentIdJPAExecutor(coordOlderThan, bundleId));
>
> // CoordinatorJobBean, lines 192-194
> @NamedQuery(name = "GET_COORD_COUNT_WITH_PARENT_ID_NOT_READY_FOR_PURGE", query = "select count(w) from CoordinatorJobBean"
>         + " w where w.bundleId = :parentId and (w.statusStr NOT IN ('SUCCEEDED', 'FAILED', 'KILLED', 'DONEWITHERROR') "
>         + "OR w.lastModifiedTimestamp >= :lastModTime)"),
> {code}
>  
> [1]
> {code:java}
> // PurgeXCommand, lines 343-349
> List<String> workflowChildren = fetchTerminatedWorkflow(wfjBeanList);
>
> // if all workflow are ready to purge add them and add the coordinator and their actions
> if(workflowChildren.size() == wfjBeanList.size()) {
>     LOG.debug("Purging coordinator " + coordId);
>     wfsToPurge.addAll(workflowChildren);
>     coordsToPurge.add(coordId);
>
> // PurgeXCommand, isWorkflowPurgeable (body at lines 308-319)
> private boolean isWorkflowPurgeable(WorkflowJobBean wfjBean, long wfOlderThanMS) {
>     final Date wfEndTime = wfjBean.getEndTime();
>     final boolean isFinished = wfjBean.inTerminalState();
>     if (isFinished && wfEndTime != null && wfEndTime.getTime() < wfOlderThanMS) {
>         return true;
>     }
>     else {
>         final Date lastModificationTime = wfjBean.getLastModifiedTime();
>         if (isFinished && lastModificationTime != null && lastModificationTime.getTime() < wfOlderThanMS) {
>             return true;
>         }
>     }
>     return false;
> }
> {code}
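
The threshold mismatch described in the issue can be sketched in isolation. The following is a minimal, self-contained model and not Oozie's code: the class PurgeOrphanSketch and its methods are hypothetical, the bundle is assumed to already satisfy its own purge criteria, all jobs are assumed to be in a terminal state, and the workflow check is reduced to the end-time comparison (the quoted isWorkflowPurgeable also falls back to the last modification time).

```java
import java.time.Duration;
import java.time.Instant;
import java.util.List;

// Hypothetical, simplified model of the two purge checks; not Oozie's actual classes.
class PurgeOrphanSketch {

    // In the spirit of GET_COORD_COUNT_WITH_PARENT_ID_NOT_READY_FOR_PURGE:
    // a coordinator is ready once it is terminal and older than the coordinator cutoff.
    static boolean coordReady(boolean terminal, Instant lastModified, Instant coordCutoff) {
        return terminal && lastModified.isBefore(coordCutoff);
    }

    // In the spirit of isWorkflowPurgeable, reduced to the end-time comparison.
    static boolean workflowReady(boolean terminal, Instant endTime, Instant wfCutoff) {
        return terminal && endTime != null && endTime.isBefore(wfCutoff);
    }

    // Returns true when the bundle would be purged while its child coordinator is kept.
    static boolean leavesOrphan(Instant now, int coordOlderThanDays, int wfOlderThanDays,
                                Instant coordLastModified, List<Instant> wfEndTimes) {
        Instant coordCutoff = now.minus(Duration.ofDays(coordOlderThanDays));
        Instant wfCutoff = now.minus(Duration.ofDays(wfOlderThanDays));

        // Step 1: the bundle is purged when no child coordinator counts as "not ready".
        boolean bundlePurged = coordReady(true, coordLastModified, coordCutoff);

        // Step 2: the coordinator is purged only if every child workflow is purgeable.
        boolean coordPurged = bundlePurged
                && wfEndTimes.stream().allMatch(end -> workflowReady(true, end, wfCutoff));

        return bundlePurged && !coordPurged;
    }

    public static void main(String[] args) {
        Instant now = Instant.parse("2022-11-10T00:00:00Z");
        Instant tenDaysAgo = now.minus(Duration.ofDays(10));

        // coordOlderThan = 7 days, wfOlderThan = 30 days (the values quoted in the issue)
        System.out.println(leavesOrphan(now, 7, 30, tenDaysAgo, List.of(tenDaysAgo)));

        // Equal thresholds: the workflow passes too, so the coordinator goes with the bundle
        System.out.println(leavesOrphan(now, 7, 7, tenDaysAgo, List.of(tenDaysAgo)));
    }
}
```

With the quoted values, a coordinator and workflow that finished 10 days ago pass the 7-day coordinator check but fail the 30-day workflow check, so the first call prints true (an orphan is left behind). When both thresholds are 7 days, the second call prints false.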



--
This message was sent by Atlassian Jira
(v8.20.10#820010)