Posted to dev@oozie.apache.org by "Andras Piros (JIRA)" <ji...@apache.org> on 2017/11/17 10:33:00 UTC

[jira] [Created] (OOZIE-3132) Instrument SLAService and SLACalculatorMemory

Andras Piros created OOZIE-3132:
-----------------------------------

             Summary: Instrument SLAService and SLACalculatorMemory
                 Key: OOZIE-3132
                 URL: https://issues.apache.org/jira/browse/OOZIE-3132
             Project: Oozie
          Issue Type: Improvement
          Components: core
    Affects Versions: 4.3.0
            Reporter: Andras Piros
            Assignee: Andras Piros
             Fix For: 5.0.0b1


When there are many {{WorkflowJobBean}} and {{CoordinatorJobBean}} instances that have to be followed up by creating {{SLASummaryBean}} instances, the following can occur:
* we set {{oozie.sla.service.SLAService.capacity}} to a sane value like {{10000}} to limit heap consumption
* {{SLACalculatorMemory#addRegistration()}} and {{SLACalculatorMemory#updateRegistration()}} would:
** either emit {{TRACE}} level logs like {{SLA Registration Event - Job:}} showing the add / update of {{SLARegistrationBean}} was successful
** or emit {{ERROR}} level logs like {{SLACalculator memory capacity reached. Cannot add or update new SLA Registration entry for job}} showing the add / update of {{SLARegistrationBean}} was not successful

Since stale or already processed {{SLAEvent}} entries are sometimes removed from {{SLACalculatorMemory#slaMap}}, it is hard to tell what its actual size is - that is, whether the next add or update will succeed.

We need an {{Instrumentation.Counter}} instance that is incremented whenever {{SLACalculatorMemory#slaMap#put()}} adds a new entry, and decremented whenever {{SLACalculatorMemory#slaMap#remove()}} removes an existing entry. This counter will then be available automatically through the REST interface and the Oozie client.
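
The counting rule described above could be sketched roughly as follows. This is only an illustration in plain Java, not Oozie code: {{CountedSlaMap}}, its methods, and the use of {{AtomicLong}} in place of an {{Instrumentation.Counter}} are all hypothetical names chosen for the example. It also assumes that only new entries are rejected once capacity is reached, which may differ from what {{SLACalculatorMemory}} actually does.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical stand-in for SLACalculatorMemory#slaMap plus a size counter.
// None of these names are Oozie APIs; AtomicLong plays the role of the counter.
class CountedSlaMap<V> {
    private final Map<String, V> slaMap = new HashMap<>();
    private final AtomicLong entryCount = new AtomicLong();
    private final int capacity;

    CountedSlaMap(int capacity) {
        this.capacity = capacity;
    }

    // Adds or updates an entry. Returns false when a new entry would exceed
    // the capacity (the case that produces the ERROR log in the description).
    synchronized boolean put(String jobId, V value) {
        boolean isNew = !slaMap.containsKey(jobId);
        if (isNew && slaMap.size() >= capacity) {
            return false;
        }
        slaMap.put(jobId, value);
        if (isNew) {
            // Count only genuinely new entries, not updates of existing ones.
            entryCount.incrementAndGet();
        }
        return true;
    }

    // Removes an entry. The counter is decremented only if the entry existed.
    synchronized boolean remove(String jobId) {
        if (!slaMap.containsKey(jobId)) {
            return false;
        }
        slaMap.remove(jobId);
        entryCount.decrementAndGet();
        return true;
    }

    // Lock-free read of the current size, as a monitoring endpoint might do.
    long size() {
        return entryCount.get();
    }
}
```

With such a counter exposed, comparing its value against {{oozie.sla.service.SLAService.capacity}} would show how close the map is to rejecting new registrations.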



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)