Posted to dev@gobblin.apache.org by "Hung Tran (JIRA)" <ji...@apache.org> on 2019/01/14 19:05:00 UTC

[jira] [Resolved] (GOBBLIN-661) Prevent jobs resubmission after manager failure

     [ https://issues.apache.org/jira/browse/GOBBLIN-661?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hung Tran resolved GOBBLIN-661.
-------------------------------
       Resolution: Fixed
    Fix Version/s: 0.15.0

Issue resolved by pull request #2532
[https://github.com/apache/incubator-gobblin/pull/2532]

> Prevent jobs resubmission after manager failure
> -----------------------------------------------
>
>                 Key: GOBBLIN-661
>                 URL: https://issues.apache.org/jira/browse/GOBBLIN-661
>             Project: Apache Gobblin
>          Issue Type: Improvement
>            Reporter: Kuai Yu
>            Assignee: Kuai Yu
>            Priority: Major
>             Fix For: 0.15.0
>
>
> In a Gobblin cluster, if the manager fails and is relaunched, all the jobs persisted in the job catalog are relaunched. This can cause two issues:
> 1) Scalability: the unfinished jobs may originally have been submitted at different points in time; if all of them are now resubmitted at the same time, it can cause a performance problem.
> 2) Wasted effort: because each unfinished job now needs to be deleted, we have to kill the existing running job and then resubmit it.
>
> This change improves both 1) and 2):
> 1) In taskdriver mode, we delete the job spec once the job has been submitted to Helix. We believe Helix is durable and that submitted jobs won't be lost, so the job specs can be safely deleted. After the next reboot, the manager won't see the deleted job specs, so no resubmission is needed.
> 2) In taskdriver mode, we clean up running Helix jobs. If a job is a planning job, we don't delete it; instead, we let it run to the end.
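
As a rough illustration of the two steps described above, here is a minimal, self-contained sketch. It does not use the real Gobblin or Helix APIs: JobCatalog, DurableStore, submitAndForget and cleanUpOnRestart are hypothetical names made up for this example, and the in-memory maps only stand in for the persisted job catalog and for Helix. The cleanup rule (cancel non-planning jobs, leave planning jobs alone) is one reading of the description, not a copy of the pull request.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class ResubmissionSketch {

    /** Hypothetical in-memory stand-in for the persisted job catalog. */
    static final class JobCatalog {
        private final Set<String> specs = new LinkedHashSet<>();
        void add(String jobName) { specs.add(jobName); }
        void remove(String jobName) { specs.remove(jobName); }
        List<String> pending() { return new ArrayList<>(specs); }
    }

    /** Hypothetical stand-in for Helix: job name -> whether it is a planning job. */
    static final class DurableStore {
        private final Map<String, Boolean> running = new LinkedHashMap<>();
        void submit(String jobName, boolean planning) { running.put(jobName, planning); }
        void cancel(String jobName) { running.remove(jobName); }
        Map<String, Boolean> running() { return new LinkedHashMap<>(running); }
    }

    /** Step 1: hand the job to the durable store, then drop its spec from the catalog. */
    static void submitAndForget(JobCatalog catalog, DurableStore store,
                                String jobName, boolean planning) {
        store.submit(jobName, planning);
        // Reached only if submit() did not throw, so the spec is never lost before hand-off.
        catalog.remove(jobName);
    }

    /** Step 2: on manager restart, cancel non-planning jobs and leave planning jobs running. */
    static List<String> cleanUpOnRestart(DurableStore store) {
        List<String> cancelled = new ArrayList<>();
        // running() returns a copy, so cancelling while iterating is safe.
        for (Map.Entry<String, Boolean> entry : store.running().entrySet()) {
            if (!entry.getValue()) {
                store.cancel(entry.getKey());
                cancelled.add(entry.getKey());
            }
        }
        return cancelled;
    }

    public static void main(String[] args) {
        JobCatalog catalog = new JobCatalog();
        DurableStore store = new DurableStore();
        catalog.add("jobA");
        catalog.add("jobB");

        // jobA is handed off before the simulated manager failure; jobB is not.
        submitAndForget(catalog, store, "jobA", true);

        // After the restart, only specs still in the catalog are candidates for resubmission.
        System.out.println("Specs seen after restart: " + catalog.pending());
        System.out.println("Jobs cancelled on restart: " + cleanUpOnRestart(store));
    }
}
```

The ordering in submitAndForget is the point of the sketch: the spec is removed only after the hand-off has returned, so a failure between the two calls leaves the spec in the catalog and the job is resubmitted rather than lost.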



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)