Posted to dev@gobblin.apache.org by "William Lo (Jira)" <ji...@apache.org> on 2022/03/18 21:07:00 UTC
[jira] [Created] (GOBBLIN-1624) Gobblin as a Service does not emit correct running job metrics and quotas in some edge cases
William Lo created GOBBLIN-1624:
-----------------------------------
Summary: Gobblin as a Service does not emit correct running job metrics and quotas in some edge cases
Key: GOBBLIN-1624
URL: https://issues.apache.org/jira/browse/GOBBLIN-1624
Project: Apache Gobblin
Issue Type: Task
Reporter: William Lo
In the DagManager class in GaaS, a rollout or leader swap can cause an inaccurate count of running jobs to be emitted, along with inaccurate quotas for those jobs.
For example, if the leader is shut down while tracking 10 running jobs and 5 of those jobs complete during the restart, the leader emits that 0 jobs are currently running, because updates to the job counters are not idempotent. Additionally, we over-decrement because we do not distinguish jobs that fail while running on the executor from jobs that fail on the GaaS side.
We should track currently running jobs more carefully, so that we only decrement counters/quotas for jobs that are actually running on the executor, and so that the tracked state stays accurate across restarts.
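
One possible shape for this, as a minimal sketch only: track the IDs of jobs known to be running on the executor in a set instead of a bare counter, so that repeated or spurious updates have no effect. The class RunningJobTracker and its methods are hypothetical names used for illustration and are not part of the existing DagManager code.

```java
import java.util.Collection;
import java.util.HashSet;
import java.util.Set;

/** Hypothetical sketch; not the actual DagManager implementation. */
final class RunningJobTracker {
  // IDs of jobs confirmed to be running on the executor.
  private final Set<String> runningJobs = new HashSet<>();

  /** Returns true only the first time a job is marked as running. */
  synchronized boolean markRunning(String jobId) {
    return runningJobs.add(jobId);
  }

  /**
   * Returns true only if the job was tracked as running. A repeated call,
   * or a call for a job that failed on the GaaS side before reaching the
   * executor, returns false, so the caller can skip the quota decrement.
   */
  synchronized boolean markFinished(String jobId) {
    return runningJobs.remove(jobId);
  }

  /** Rebuilds the state after a restart from the jobs that are still running. */
  synchronized void restore(Collection<String> stillRunning) {
    runningJobs.clear();
    runningJobs.addAll(stillRunning);
  }

  /** The value that would be emitted as the running-job metric. */
  synchronized int runningCount() {
    return runningJobs.size();
  }
}
```

In the scenario above, a leader that restores the 5 jobs still running after the restart would report 5 rather than 0, and a failure event for a job that never reached the executor would leave the count unchanged.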
--
This message was sent by Atlassian Jira
(v8.20.1#820001)