Posted to dev@gobblin.apache.org by "Kuai Yu (JIRA)" <ji...@apache.org> on 2019/01/09 01:11:00 UTC

[jira] [Created] (GOBBLIN-661) Prevent jobs resubmission after manager failure

Kuai Yu created GOBBLIN-661:
-------------------------------

             Summary: Prevent jobs resubmission after manager failure
                 Key: GOBBLIN-661
                 URL: https://issues.apache.org/jira/browse/GOBBLIN-661
             Project: Apache Gobblin
          Issue Type: Improvement
            Reporter: Kuai Yu
            Assignee: Kuai Yu


In a Gobblin cluster, if the manager fails and is relaunched, all the jobs persisted in the job catalog are resubmitted. This can cause a few issues:

1) Scalability issue: the unfinished jobs may originally have been submitted at different points in time, but after a relaunch they are all submitted at once, which can cause a performance issue.

2) Wasted effort: because each unfinished job has to be deleted first, we have to kill the job that is already running and then resubmit it.

This change addresses both 1) and 2):

1) In taskdriver mode, we delete the job spec once the job has been submitted to Helix. We believe Helix is durable and that submitted jobs won't be lost, so the job specs can be deleted safely. After the next restart, the manager won't see the deleted job specs, so no resubmission is needed.

2) In taskdriver mode, we will clean up running Helix jobs. If a job is a planning job, we won't delete it; instead, we just let it run to completion.
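
The two changes can be illustrated with a minimal, self-contained sketch. This is not Gobblin or Helix code: JobSpec, JobCatalog, DurableQueue and Manager are hypothetical stand-ins, the "planning" flag is an assumed way to tell planning jobs apart, and the point at which the cleanup runs is not stated in this issue, so it is left as a separate method.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: none of these types are real Gobblin or Helix classes.
public class ResubmissionSketch {

    /** A job spec; the planning flag is an assumption made for this sketch. */
    static final class JobSpec {
        final String name;
        final boolean planning;
        JobSpec(String name, boolean planning) {
            this.name = name;
            this.planning = planning;
        }
    }

    /** Stand-in for the persisted job catalog. */
    static final class JobCatalog {
        private final Map<String, JobSpec> specs = new LinkedHashMap<>();
        void add(JobSpec spec) { specs.put(spec.name, spec); }
        void remove(String name) { specs.remove(name); }
        List<JobSpec> list() { return new ArrayList<>(specs.values()); }
    }

    /** Stand-in for Helix, assumed durable: its contents outlive any one manager. */
    static final class DurableQueue {
        private final Map<String, JobSpec> running = new LinkedHashMap<>();
        void submit(JobSpec spec) { running.put(spec.name, spec); }
        List<String> runningNames() { return new ArrayList<>(running.keySet()); }

        /** Change 2): delete running jobs, but leave planning jobs to finish. */
        void cleanUpRunningJobs() {
            running.values().removeIf(spec -> !spec.planning);
        }
    }

    /** Stand-in for the cluster manager. */
    static final class Manager {
        private final JobCatalog catalog;
        private final DurableQueue queue;
        int submissions = 0;

        Manager(JobCatalog catalog, DurableQueue queue) {
            this.catalog = catalog;
            this.queue = queue;
        }

        /** Change 1): submit each persisted spec, then delete it from the catalog. */
        void start() {
            for (JobSpec spec : catalog.list()) { // list() returns a copy, so removal is safe
                queue.submit(spec);
                submissions++;
                catalog.remove(spec.name); // safe only because the queue is assumed durable
            }
        }
    }

    public static void main(String[] args) {
        JobCatalog catalog = new JobCatalog();
        DurableQueue queue = new DurableQueue();
        catalog.add(new JobSpec("planning-job", true));
        catalog.add(new JobSpec("actual-job", false));

        Manager first = new Manager(catalog, queue);
        first.start();

        // Simulate a manager failure and relaunch against the same catalog and queue.
        Manager relaunched = new Manager(catalog, queue);
        relaunched.start();

        System.out.println("first submissions: " + first.submissions);
        System.out.println("relaunched submissions: " + relaunched.submissions);

        queue.cleanUpRunningJobs();
        System.out.println("still running after cleanup: " + queue.runningNames());
    }
}
```

In this sketch the first manager submits both jobs and empties the catalog, so the relaunched manager finds nothing to resubmit; the cleanup then removes only the non-planning job from the queue.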



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)