Posted to dev@edgent.apache.org by "Victor Dogaru (JIRA)" <ji...@apache.org> on 2016/03/23 01:47:25 UTC

[jira] [Commented] (QUARKS-66) Job monitoring application which restarts failed jobs

    [ https://issues.apache.org/jira/browse/QUARKS-66?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15207632#comment-15207632 ] 

Victor Dogaru commented on QUARKS-66:
-------------------------------------

I have prototyped an application with a stream generated by JobEvents.source(), a filter for closed jobs with an UNHEALTHY health value, and a sink which restarts the failed job.
* The given provider needs to register the created topologies and their submission configuration.
* The submitter must submit the topology explicitly assigning the topology name to the job.
* Given a job name, the job monitoring sink needs to:
** retrieve the topology and the submission configuration as registered on the first submission
** resubmit the topology with the same configuration
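
The flow in the list above could be sketched roughly as follows. This is plain Java with hypothetical stand-in types (JobEvent, Registration, Submitter, Monitor) and is not the Quarks API: the topology is represented by a string, and the stream produced by JobEvents.source() is replaced by direct calls to onEvent(). It only illustrates the register / filter / resubmit steps under those assumptions.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Plain-Java sketch of the monitoring flow; none of these types are Quarks classes. */
class JobMonitorSketch {

    /** Hypothetical stand-ins for the state and health carried by a job event. */
    enum State { RUNNING, CLOSED }
    enum Health { HEALTHY, UNHEALTHY }

    /** Hypothetical job event: the job name plus the state and health it reports. */
    static final class JobEvent {
        final String jobName;
        final State state;
        final Health health;

        JobEvent(String jobName, State state, Health health) {
            this.jobName = jobName;
            this.state = state;
            this.health = health;
        }
    }

    /** What is recorded on first submission: the topology and its submission configuration. */
    static final class Registration {
        final String topology;            // placeholder for a real topology object
        final Map<String, String> config; // placeholder for the submission configuration

        Registration(String topology, Map<String, String> config) {
            this.topology = topology;
            this.config = config;
        }
    }

    /** Hypothetical submitter; a real one would hand the topology to the provider. */
    interface Submitter {
        void submit(String jobName, Registration registration);
    }

    static final class Monitor {
        private final Map<String, Registration> registry = new ConcurrentHashMap<>();
        private final Submitter submitter;

        Monitor(Submitter submitter) {
            this.submitter = submitter;
        }

        /** First submission: register under the job name, then submit. */
        void submit(String jobName, String topology, Map<String, String> config) {
            Registration registration = new Registration(topology, config);
            registry.put(jobName, registration);
            submitter.submit(jobName, registration);
        }

        /** The filter step: only closed jobs with an UNHEALTHY health value qualify. */
        static boolean isFailed(JobEvent event) {
            return event.state == State.CLOSED && event.health == Health.UNHEALTHY;
        }

        /** The sink step: resubmit a failed job with its registered configuration. */
        boolean onEvent(JobEvent event) {
            if (!isFailed(event)) {
                return false;
            }
            Registration registration = registry.get(event.jobName);
            if (registration == null) {
                return false; // job was never registered, so there is nothing to resubmit
            }
            submitter.submit(event.jobName, registration);
            return true;
        }
    }
}
```

Keying the registry by job name is why the submitter has to assign the topology name to the job explicitly: the name in the event is the only handle the sink has for looking up what to resubmit.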

I checked the ApplicationService definition and how it is used in the IotProvider and IotAppServiceTest, and I see that there are already bits and pieces I could reuse to achieve the desired functionality.  I still have a few questions:
* It seems that the provider could register a JsonControlService and an ApplicationService, and the job monitoring application could then make a control request to resubmit the application. Is that right?
* I need a registry which, given an application name, allows me to query for its topology and submission configuration.  Would this come from extending the AppService implementation, creating a new service, or some other option?

Does it sound like I am on the right track here?  I would appreciate any assistance.

> Job monitoring application which restarts failed jobs
> -----------------------------------------------------
>
>                 Key: QUARKS-66
>                 URL: https://issues.apache.org/jira/browse/QUARKS-66
>             Project: Quarks
>          Issue Type: Task
>            Reporter: Victor Dogaru
>            Assignee: Victor Dogaru
>
> An application which filters job events indicating jobs which closed with an unhealthy state and resubmits applications associated with those jobs.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)