Posted to dev@oozie.apache.org by "Jaydeep Vishwakarma (JIRA)" <ji...@apache.org> on 2014/04/11 11:37:16 UTC

[jira] [Commented] (OOZIE-1401) PurgeCommand should purge the workflow jobs w/o end_time

    [ https://issues.apache.org/jira/browse/OOZIE-1401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13966365#comment-13966365 ] 

Jaydeep Vishwakarma commented on OOZIE-1401:
--------------------------------------------

[~chitnis],
I looked at the code for this. It first fetches all workflows eligible for deletion and then removes them one by one.
The current purge code may not cause problems when there are only a few workflows, but with more than a million workflows it will run very slowly and put extra load on the DB. I think all eligible workflows should be deleted by a single query.
Although I have a small patch ready for this bug, I still feel we should consider other approaches as well.
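
To make the cost difference concrete, here is a minimal sketch, not the actual Oozie code. PurgeCostSketch is a hypothetical in-memory stand-in for the workflow table that only counts how many statements each approach would send to the DB. BULK_DELETE_JPQL is an assumed shape for a single-query purge, modelled on the select quoted in the issue. Note that a JPQL bulk delete does not cascade to related entities, so a real patch would presumably have to purge dependent rows (for example workflow actions) separately.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: an in-memory stand-in for the workflow table, not Oozie code.
class PurgeCostSketch {
    // Assumed shape of a bulk purge in JPQL, mirroring the select quoted in the issue.
    static final String BULK_DELETE_JPQL =
        "delete from WorkflowJobBean w where w.endTimestamp < :endTime";

    final List<Long> endTimes = new ArrayList<>(); // one end time (epoch millis) per job
    int statements = 0;                            // statements "sent" to the store

    PurgeCostSketch(long... times) {
        for (long t : times) {
            endTimes.add(t);
        }
    }

    // Fetch the eligible jobs, then delete them one at a time: 1 select + N deletes.
    int purgeOneByOne(long cutoff) {
        statements++; // the select
        List<Long> eligible = new ArrayList<>();
        for (Long t : endTimes) {
            if (t < cutoff) {
                eligible.add(t);
            }
        }
        for (Long t : eligible) {
            statements++;       // one delete per job
            endTimes.remove(t); // remove(Object): drops one matching entry
        }
        return eligible.size();
    }

    // Delete every eligible job with a single statement.
    int purgeBulk(long cutoff) {
        statements++; // the single delete
        int before = endTimes.size();
        endTimes.removeIf(t -> t < cutoff);
        return before - endTimes.size();
    }
}
```

With N eligible jobs, purgeOneByOne issues N + 1 statements while purgeBulk issues one, which is where the extra DB load at a million workflows would come from.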

> PurgeCommand should purge the workflow jobs w/o end_time
> --------------------------------------------------------
>
>                 Key: OOZIE-1401
>                 URL: https://issues.apache.org/jira/browse/OOZIE-1401
>             Project: Oozie
>          Issue Type: Sub-task
>          Components: bundle, coordinator, workflow
>    Affects Versions: trunk
>            Reporter: Mona Chitnis
>             Fix For: trunk
>
>
> Currently, the purge logic does not handle workflow jobs with end_time=null. The command needs to take care of those jobs as well. This happens with jobs that are stuck for a long time after Hadoop restarts or DB failures. It could be done by checking created_time if end_time is not available.
> The current query:
> select w from WorkflowJobBean w where w.endTimestamp < :endTime
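
The fallback suggested in the description could look roughly like the sketch below. It is an illustration under assumptions, not the committed fix: the class PurgeEligibilitySketch, the method isPurgeable, and the field name createdTimestamp in the JPQL string are all hypothetical. A job with no end time may still be running, so a real purge would probably also check the job status before deleting.

```java
import java.time.Instant;

// Illustrative only: names here are assumptions, not Oozie's actual API.
class PurgeEligibilitySketch {
    // Assumed extension of the quoted query; createdTimestamp is a hypothetical field name.
    static final String FALLBACK_JPQL =
        "select w from WorkflowJobBean w where w.endTimestamp < :endTime"
        + " or (w.endTimestamp is null and w.createdTimestamp < :endTime)";

    // Same rule in plain Java: use the end time when present, else the created time.
    static boolean isPurgeable(Instant endTime, Instant createdTime, Instant cutoff) {
        Instant effective = (endTime != null) ? endTime : createdTime;
        // A job with neither timestamp is left alone rather than purged.
        return effective != null && effective.isBefore(cutoff);
    }
}
```

With this rule, a stuck job whose end_time is null but whose created_time is older than the cutoff becomes eligible, while the original query would never select it.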



--
This message was sent by Atlassian JIRA
(v6.2#6252)