Posted to dev@oozie.apache.org by "Sergey Svinarchuk (JIRA)" <ji...@apache.org> on 2017/12/29 13:47:00 UTC

[jira] [Commented] (OOZIE-1401) PurgeCommand should purge the workflow jobs w/o end_time

    [ https://issues.apache.org/jira/browse/OOZIE-1401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16306288#comment-16306288 ] 

Sergey Svinarchuk commented on OOZIE-1401:
------------------------------------------

This patch doesn't work because lastModificationTime is always NULL. lastModificationTime needs to be added to the GET_WORKFLOWS_BASIC_INFO_BY_PARENT_ID and GET_WORKFLOWS_BASIC_INFO_BY_COORD_PARENT_ID queries. I can create a new patch for this ticket or open a new issue.
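
To illustrate the failure mode, here is a minimal sketch of a fallback from end time to last-modified time. It is not the code from the patch: the class PurgeFallbackSketch, the holder WorkflowInfo, and both helper methods are hypothetical names made up for this example. It only shows that if a basic-info query never selects the last-modified field, the fallback has nothing to work with, so the child workflow is skipped (or, without a null check, an NPE is thrown).

```java
import java.util.Date;

public class PurgeFallbackSketch {

    /** Hypothetical stand-in for the fields a basic-info query would populate. */
    static final class WorkflowInfo {
        final String id;
        final Date endTime;          // may be null for stuck jobs
        final Date lastModifiedTime; // stays null if the query does not select it

        WorkflowInfo(String id, Date endTime, Date lastModifiedTime) {
            this.id = id;
            this.endTime = endTime;
            this.lastModifiedTime = lastModifiedTime;
        }
    }

    /** Returns endTime when it is set, otherwise lastModifiedTime; the result may still be null. */
    static Date effectiveEndTime(WorkflowInfo wf) {
        return wf.endTime != null ? wf.endTime : wf.lastModifiedTime;
    }

    /** True only when a usable timestamp exists and is strictly before the cutoff. */
    static boolean isOlderThan(WorkflowInfo wf, Date cutoff) {
        Date t = effectiveEndTime(wf);
        return t != null && t.before(cutoff);
    }

    public static void main(String[] args) {
        Date cutoff = new Date(1_000_000L);
        Date old = new Date(1_000L);

        // Query selected the last-modified field: the fallback works.
        System.out.println(isOlderThan(new WorkflowInfo("wf-1", null, old), cutoff));

        // Query did not select it: both timestamps are null, so the job is never considered old.
        System.out.println(isOlderThan(new WorkflowInfo("wf-2", null, null), cutoff));
    }
}
```

The second case corresponds to what I see with the current patch: the value is never loaded, so the check on the child workflow cannot succeed.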

> PurgeCommand should purge the workflow jobs w/o end_time
> --------------------------------------------------------
>
>                 Key: OOZIE-1401
>                 URL: https://issues.apache.org/jira/browse/OOZIE-1401
>             Project: Oozie
>          Issue Type: Sub-task
>          Components: bundle, coordinator, workflow
>    Affects Versions: trunk
>            Reporter: Mona Chitnis
>            Assignee: Attila Sasvari
>             Fix For: 5.0.0b1
>
>         Attachments: OOZIE-1401-001.patch
>
>
> Currently, the {{PurgeXCommand}} logic does not handle workflow jobs with {{end_time=null}}. The command needs to take care of those jobs as well. This happens with jobs that stay stuck for a long time after Hadoop restarts or DB failures. It could be done by checking {{last_modified_time}} instead when {{end_time}} is not available.
> The current query:
> {code:sql}
> select w from WorkflowJobBean w where w.endTimestamp < :endTime
> {code}
> There is also an issue when:
> * there is a parent workflow that has its {{end_time}} set
> * is otherwise eligible for {{PurgeXCommand}}: {{end_time}} is older than configured number of days, and has {{status}} either {{KILLED}}, or {{FAILED}}, or {{SUCCEEDED}}
> * has a child workflow that has the {{parent_id}} set to the {{id}} of the parent workflow
> * child workflow has its {{end_time = NULL}}
> In this case, [*{{PurgeXCommand#fetchTerminatedWorkflow()}}*|https://github.com/apache/oozie/blob/master/core/src/main/java/org/apache/oozie/command/PurgeXCommand.java#L249] throws a {{NullPointerException}} like this:
> {noformat}
> 2017-09-29 07:59:46,365 DEBUG org.apache.oozie.command.PurgeXCommand: SERVER[host-10-17-101-90.coe.cloudera.com] USER[-] GROUP[-] TOKEN[-] APP[-] JOB[-] ACTION[-] Purging workflows of long running coordinators is turned on
> 2017-09-29 07:59:46,371 DEBUG org.apache.oozie.command.PurgeXCommand: SERVER[host-10-17-101-90.coe.cloudera.com] USER[-] GROUP[-] TOKEN[-] APP[-] JOB[-] ACTION[-] Execute command [purge] key [null]
> 2017-09-29 07:59:46,371 INFO org.apache.oozie.command.PurgeXCommand: SERVER[host-10-17-101-90.coe.cloudera.com] USER[-] GROUP[-] TOKEN[-] APP[-] JOB[-] ACTION[-] STARTED Purge to purge Workflow Jobs older than [1] days, Coordinator Jobs older than [1] days, and Bundlejobs older than [1] days.
> 2017-09-29 07:59:46,375 ERROR org.apache.oozie.command.PurgeXCommand: SERVER[host-10-17-101-90.coe.cloudera.com] USER[-] GROUP[-] TOKEN[-] APP[-] JOB[-] ACTION[-] Exception, 
> java.lang.NullPointerException
> 	at org.apache.oozie.command.PurgeXCommand.fetchTerminatedWorkflow(PurgeXCommand.java:249)
> 	at org.apache.oozie.command.PurgeXCommand.processWorkflowsHelper(PurgeXCommand.java:227)
> 	at org.apache.oozie.command.PurgeXCommand.processWorkflows(PurgeXCommand.java:199)
> 	at org.apache.oozie.command.PurgeXCommand.execute(PurgeXCommand.java:150)
> 	at org.apache.oozie.command.PurgeXCommand.execute(PurgeXCommand.java:53)
> 	at org.apache.oozie.command.XCommand.call(XCommand.java:286)
> 	at java.util.concurrent.FutureTask.run(FutureTask.java:262)
> 	at org.apache.oozie.service.CallableQueueService$CallableWrapper.run(CallableQueueService.java:178)
> 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> 	at java.lang.Thread.run(Thread.java:745)
> {noformat}


