You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@oozie.apache.org by "Attila Sasvari (JIRA)" <ji...@apache.org> on 2017/11/04 21:27:00 UTC

[jira] [Updated] (OOZIE-1401) PurgeCommand should purge the workflow jobs w/o end_time

     [ https://issues.apache.org/jira/browse/OOZIE-1401?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Attila Sasvari updated OOZIE-1401:
----------------------------------
    Attachment: OOZIE-1401-001.patch

[~andras.piros] can I take this over from you? Please take a look at the attached patch. It is a very simple changeset with a new purge test case where end_time is null.

> PurgeCommand should purge the workflow jobs w/o end_time
> --------------------------------------------------------
>
>                 Key: OOZIE-1401
>                 URL: https://issues.apache.org/jira/browse/OOZIE-1401
>             Project: Oozie
>          Issue Type: Sub-task
>          Components: bundle, coordinator, workflow
>    Affects Versions: trunk
>            Reporter: Mona Chitnis
>            Assignee: Andras Piros
>            Priority: Major
>         Attachments: OOZIE-1401-001.patch
>
>
> Currently, {{PurgeXCommand}} logic is not working with those workflow jobs with {{end_time=null}}. This command needs to take care of those jobs as well. This happens in the case of long stuck jobs after Hadoop restarts or DB failures. It could be done by checking {{last_modified_time}} instead, if {{end_time}} is not available.
> The current query:
> {code:sql}
> select w from WorkflowJobBean w where w.endTimestamp < :endTime
> {code}
> There is also an issue when:
> * there is a parent workflow that has its {{end_time}} set
> * is otherwise eligible for {{PurgeXCommand}}: {{end_time}} is older than configured number of days, and has {{status}} either {{KILLED}}, or {{FAILED}}, or {{SUCCEEDED}}
> * has a child workflow that has the {{parent_id}} set to the {{id}} of the parent workflow
> * child workflow has its {{end_time = NULL}}
> In this case, [*{{PurgeXCommand#fetchTerminatedWorkflow()}}*|https://github.com/apache/oozie/blob/master/core/src/main/java/org/apache/oozie/command/PurgeXCommand.java#L249] throws a {{NullPointerException}} like this:
> {noformat}
> 2017-09-29 07:59:46,365 DEBUG org.apache.oozie.command.PurgeXCommand: SERVER[host-10-17-101-90.coe.cloudera.com] USER[-] GROUP[-] TOKEN[-] APP[-] JOB[-] ACTION[-] Purging workflows of long running coordinators is turned on
> 2017-09-29 07:59:46,371 DEBUG org.apache.oozie.command.PurgeXCommand: SERVER[host-10-17-101-90.coe.cloudera.com] USER[-] GROUP[-] TOKEN[-] APP[-] JOB[-] ACTION[-] Execute command [purge] key [null]
> 2017-09-29 07:59:46,371 INFO org.apache.oozie.command.PurgeXCommand: SERVER[host-10-17-101-90.coe.cloudera.com] USER[-] GROUP[-] TOKEN[-] APP[-] JOB[-] ACTION[-] STARTED Purge to purge Workflow Jobs older than [1] days, Coordinator Jobs older than [1] days, and Bundlejobs older than [1] days.
> 2017-09-29 07:59:46,375 ERROR org.apache.oozie.command.PurgeXCommand: SERVER[host-10-17-101-90.coe.cloudera.com] USER[-] GROUP[-] TOKEN[-] APP[-] JOB[-] ACTION[-] Exception, 
> java.lang.NullPointerException
> 	at org.apache.oozie.command.PurgeXCommand.fetchTerminatedWorkflow(PurgeXCommand.java:249)
> 	at org.apache.oozie.command.PurgeXCommand.processWorkflowsHelper(PurgeXCommand.java:227)
> 	at org.apache.oozie.command.PurgeXCommand.processWorkflows(PurgeXCommand.java:199)
> 	at org.apache.oozie.command.PurgeXCommand.execute(PurgeXCommand.java:150)
> 	at org.apache.oozie.command.PurgeXCommand.execute(PurgeXCommand.java:53)
> 	at org.apache.oozie.command.XCommand.call(XCommand.java:286)
> 	at java.util.concurrent.FutureTask.run(FutureTask.java:262)
> 	at org.apache.oozie.service.CallableQueueService$CallableWrapper.run(CallableQueueService.java:178)
> 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> 	at java.lang.Thread.run(Thread.java:745)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)