Posted to dev@edgent.apache.org by "Victor Dogaru (JIRA)" <ji...@apache.org> on 2016/03/23 01:47:25 UTC
[jira] [Commented] (QUARKS-66) Job monitoring application which restarts failed jobs
[ https://issues.apache.org/jira/browse/QUARKS-66?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15207632#comment-15207632 ]
Victor Dogaru commented on QUARKS-66:
-------------------------------------
I have prototyped an application with a stream generated by JobEvents.source(), a filter for closed jobs with an UNHEALTHY health value, and a sink which restarts the failed job.
* The given provider needs to register the created topologies and their submission configuration.
* The submitter must submit the topology, explicitly assigning the topology name to the job as its job name.
* Given a job name, the job monitoring sink needs to:
** retrieve the topology and the submission configuration as registered on the first submission
** resubmit the topology with the same configuration
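The flow described above can be sketched in plain Java, without the Quarks classes. Everything in this sketch is a hypothetical stand-in: JobMonitorSketch, JobEvent, Submission, State, Health, and the submitter callback are illustration names only, not the Quarks API, and the real prototype would plug the filter and sink into the stream returned by JobEvents.source().

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiConsumer;

// Hypothetical stand-in types; none of these are Quarks classes.
class JobMonitorSketch {
    enum State { RUNNING, CLOSED }
    enum Health { HEALTHY, UNHEALTHY }

    /** A job lifecycle event, as the monitoring stream might carry it. */
    static final class JobEvent {
        final String jobName;
        final State state;
        final Health health;

        JobEvent(String jobName, State state, Health health) {
            this.jobName = jobName;
            this.state = state;
            this.health = health;
        }
    }

    /** What the provider records when a topology is first submitted. */
    static final class Submission {
        final String topologyName;
        final Map<String, String> config;

        Submission(String topologyName, Map<String, String> config) {
            this.topologyName = topologyName;
            this.config = config;
        }
    }

    private final Map<String, Submission> registry = new HashMap<>();
    private final BiConsumer<String, Map<String, String>> submitter;

    /** The submitter callback stands in for the provider's submit call (assumption). */
    JobMonitorSketch(BiConsumer<String, Map<String, String>> submitter) {
        this.submitter = submitter;
    }

    /** Provider side: register the topology and config; job name == topology name. */
    void register(String topologyName, Map<String, String> config) {
        registry.put(topologyName, new Submission(topologyName, new HashMap<>(config)));
    }

    /** Filter step: only jobs that closed with an UNHEALTHY health value. */
    static boolean isFailed(JobEvent e) {
        return e.state == State.CLOSED && e.health == Health.UNHEALTHY;
    }

    /** Sink step: look up the original submission by job name and resubmit it. */
    boolean onEvent(JobEvent e) {
        if (!isFailed(e)) {
            return false;
        }
        Submission s = registry.get(e.jobName);
        if (s == null) {
            return false; // job was never registered, nothing to resubmit
        }
        submitter.accept(s.topologyName, s.config);
        return true;
    }

    public static void main(String[] args) {
        List<String> resubmitted = new ArrayList<>();
        JobMonitorSketch monitor =
                new JobMonitorSketch((name, config) -> resubmitted.add(name));
        monitor.register("sensorApp", Collections.singletonMap("jobName", "sensorApp"));

        monitor.onEvent(new JobEvent("sensorApp", State.CLOSED, Health.HEALTHY));   // ignored
        monitor.onEvent(new JobEvent("sensorApp", State.CLOSED, Health.UNHEALTHY)); // resubmitted
        System.out.println(resubmitted); // prints [sensorApp]
    }
}
```

Keying the registry by job name only works because the submitter assigns the topology name as the job name; without that convention the sink would have no way to map a failed job back to its topology.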
I checked the ApplicationService definition and how it is used in the IotProvider and IotAppServiceTest, and I see that there are already bits and pieces I could reuse to achieve the desired functionality. I still have a few questions:
* It seems that the provider could register a JsonControlService and an ApplicationService, then the job monitoring application could make a control request to resubmit the application?
* I need a registry which, given an application name, allows me to query for its topology and submission configuration. Would this come from extending the AppService implementation, creating a new service, or some other option?
Does it sound like I am on the right track here? I would appreciate any assistance.
> Job monitoring application which restarts failed jobs
> -----------------------------------------------------
>
> Key: QUARKS-66
> URL: https://issues.apache.org/jira/browse/QUARKS-66
> Project: Quarks
> Issue Type: Task
> Reporter: Victor Dogaru
> Assignee: Victor Dogaru
>
> An application which filters job events indicating jobs which closed with an unhealthy state and resubmits applications associated with those jobs.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)