Posted to dev@gobblin.apache.org by "William Lo (Jira)" <ji...@apache.org> on 2022/03/18 21:07:00 UTC

[jira] [Created] (GOBBLIN-1624) Gobblin as a Service does not emit correct running job metrics and quotas in some edge cases

William Lo created GOBBLIN-1624:
-----------------------------------

             Summary: Gobblin as a Service does not emit correct running job metrics and quotas in some edge cases
                 Key: GOBBLIN-1624
                 URL: https://issues.apache.org/jira/browse/GOBBLIN-1624
             Project: Apache Gobblin
          Issue Type: Task
            Reporter: William Lo


With the DagManager class in GaaS, it is possible during a rollout or leader swap to emit an inaccurate count of running jobs, and inaccurate quotas for those jobs.

For example, if the leader is shut down while tracking 10 running jobs and 5 of those jobs complete during the restart, the leader emits that 0 jobs are currently running, because the job counters are not treated as idempotent. Additionally, we over-decrement because we do not differentiate between jobs that fail while running on the executor and jobs that fail on the GaaS side.

We should track currently running jobs more carefully, so that we only decrement counters/quotas for jobs that are actually running on the executor, and so that the tracking stays accurate across restarts.
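
One possible way to make the accounting idempotent is to track the set of running job IDs rather than a bare counter. The sketch below is illustrative only: the class RunningJobTracker and its methods are hypothetical names, not part of DagManager or any existing Gobblin API, and it assumes each job has a stable string ID that survives a leader restart.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

/**
 * Hypothetical sketch, not the actual DagManager code.
 * Tracks running jobs by ID so that replayed events cannot
 * double-count or over-decrement.
 */
final class RunningJobTracker {
    // IDs of jobs known to be running on the executor.
    private final Set<String> runningJobs = ConcurrentHashMap.newKeySet();

    /**
     * Records a job as running. Returns false if it was already tracked,
     * so replaying the same job after a restart has no further effect.
     */
    boolean markRunning(String jobId) {
        return runningJobs.add(jobId);
    }

    /**
     * Releases a job only if it was tracked as running. Returns false for
     * jobs that never reached the executor (for example, a failure on the
     * GaaS side), in which case the caller should not decrement any quota.
     */
    boolean markFinished(String jobId) {
        return runningJobs.remove(jobId);
    }

    /** Value to emit as the running-job metric. */
    int runningCount() {
        return runningJobs.size();
    }
}
```

With this shape, a restarted leader can re-register the 10 jobs it finds in its state store without inflating the count, and completing 5 of them leaves a count of 5. A failure reported for a job that was never marked as running leaves the count unchanged.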



--
This message was sent by Atlassian Jira
(v8.20.1#820001)