Posted to dev@ambari.apache.org by "Andrew Onischuk (JIRA)" <ji...@apache.org> on 2015/10/15 16:40:05 UTC

[jira] [Created] (AMBARI-13434) Expose Alert Grace Period Setting in Agents

Andrew Onischuk created AMBARI-13434:
----------------------------------------

             Summary: Expose Alert Grace Period Setting in Agents
                 Key: AMBARI-13434
                 URL: https://issues.apache.org/jira/browse/AMBARI-13434
             Project: Ambari
          Issue Type: Bug
            Reporter: Andrew Onischuk
            Assignee: Andrew Onischuk
             Fix For: 2.1.3


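On some deployments, a host may have to run many alerts, depending on the
number of components installed on it. When that number is large, alert jobs
can miss their scheduled run times. The default misfire grace period in APS
(APScheduler, the scheduling library the agent uses for alert jobs) is 1
second, which is rather aggressive: a run that starts more than 1 second late
is skipped and logged as missed, as in this agent log excerpt:
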
    WARNING 2015-07-29 20:59:50,733 scheduler.py:496 - Run time of job "947770c6-424a-4ef8-9a46-19eca8fd080b (trigger: interval[0:01:00], next run at: 2015-07-29 20:59:49.309353)" was missed by 0:00:01.423766
    WARNING 2015-07-29 20:59:50,734 scheduler.py:496 - Run time of job "005b1d50-2aca-4af2-a3b4-bc39e6f65ede (trigger: interval[0:01:00], next run at: 2015-07-29 21:00:49.309646)" was missed by 0:00:01.424313
    WARNING 2015-07-29 20:59:50,734 scheduler.py:496 - Run time of job "6950ff19-c26c-46b7-8bac-1869773f1380 (trigger: interval[0:01:00], next run at: 2015-07-29 20:59:49.309840)" was missed by 0:00:01.424364
    WARNING 2015-07-29 20:59:50,735 scheduler.py:496 - Run time of job "d986b9eb-bfd4-400f-b107-5640495eeece (trigger: interval[0:01:00], next run at: 2015-07-29 20:59:49.310025)" was missed by 0:00:01.425144
    WARNING 2015-07-29 20:59:50,735 scheduler.py:496 - Run time of job "3589154e-a8e3-441d-b3cb-a93fd49e1dfe (trigger: interval[0:01:00], next run at: 2015-07-29 20:59:49.310204)" was missed by 0:00:01.425600
    WARNING 2015-07-29 20:59:50,736 scheduler.py:496 - Run time of job "04a7f393-800b-4728-95be-28c2ca091ade (trigger: interval[0:01:00], next run at: 2015-07-29 20:59:49.310380)" was missed by 0:00:01.425769
    WARNING 2015-07-29 20:59:50,737 scheduler.py:496 - Run time of job "f0e2a065-af36-476c-b6b9-b662471c3f22 (trigger: interval[0:01:00], next run at: 2015-07-29 20:59:49.310759)" was missed by 0:00:01.426607
    WARNING 2015-07-29 20:59:50,738 scheduler.py:496 - Run time of job "76accffd-e390-4aaa-8b35-8219ef4b3057 (trigger: interval[0:01:00], next run at: 2015-07-29 21:00:49.311118)" was missed by 0:00:01.427039
    WARNING 2015-07-29 20:59:50,738 scheduler.py:496 - Run time of job "e0ce4088-2f0c-4f6d-8642-26ba94b3c66a (trigger: interval[0:01:00], next run at: 2015-07-29 20:59:49.311297)" was missed by 0:00:01.426953
    WARNING 2015-07-29 20:59:50,739 scheduler.py:496 - Run time of job "9cb39eb2-8ce4-408e-8030-a36362d5b5af (trigger: interval[0:01:00], next run at: 2015-07-29 20:59:49.311501)" was missed by 0:00:01.427677
    WARNING 2015-07-29 20:59:50,740 scheduler.py:496 - Run time of job "c299b3ab-ced6-4423-8f39-e16427157d98 (trigger: interval[0:01:00], next run at: 2015-07-29 21:00:49.312033)" was missed by 0:00:01.427972
    WARNING 2015-07-29 20:59:50,740 scheduler.py:496 - Run time of job "cd444594-7859-482d-ae04-348ee7653da2 (trigger: interval[0:01:00], next run at: 2015-07-29 20:59:49.312208)" was missed by 0:00:01.428285
    WARNING 2015-07-29 20:59:50,741 scheduler.py:496 - Run time of job "9afd8b3e-8850-4f2d-9ce7-a130be6b933b (trigger: interval[0:01:00], next run at: 2015-07-29 20:59:49.312385)" was missed by 0:00:01.428689
    WARNING 2015-07-29 20:59:50,741 scheduler.py:496 - Run time of job "be140827-a21f-4782-a109-bde8bcbc35c2 (trigger: interval[0:01:00], next run at: 2015-07-29 20:59:49.312574)" was missed by 0:00:01.429298
    WARNING 2015-07-29 20:59:50,742 scheduler.py:496 - Run time of job "e009b685-717f-4552-8dfb-35a4d9d3d658 (trigger: interval[0:01:00], next run at: 2015-07-29 20:59:49.312751)" was missed by 0:00:01.429906
    WARNING 2015-07-29 20:59:50,743 scheduler.py:496 - Run time of job "f42e635f-ce2d-47b6-8da3-10c7bfef7c3c (trigger: interval[0:01:00], next run at: 2015-07-29 20:59:49.312927)" was missed by 0:00:01.430541
    WARNING 2015-07-29 20:59:50,744 scheduler.py:496 - Run time of job "ace91b40-28e2-472a-ac97-8b01dc3bd976 (trigger: interval[0:01:00], next run at: 2015-07-29 20:59:49.313280)" was missed by 0:00:01.430793
    WARNING 2015-07-29 20:59:50,744 scheduler.py:496 - Run time of job "77ea324a-a836-4f32-a751-1a596417bc11 (trigger: interval[0:01:00], next run at: 2015-07-29 21:00:49.313461)" was missed by 0:00:01.431357
    WARNING 2015-07-29 20:59:50,745 scheduler.py:496 - Run time of job "e74f63b0-4143-4ebb-9adc-8e124eae1f99 (trigger: interval[0:02:00], next run at: 2015-07-29 20:59:49.313642)" was missed by 0:00:01.431588
    WARNING 2015-07-29 20:59:50,746 scheduler.py:496 - Run time of job "3640c1eb-e7a2-4783-9480-e7f2129a4093 (trigger: interval[0:02:00], next run at: 2015-07-29 21:01:49.313817)" was missed by 0:00:01.432356
    WARNING 2015-07-29 20:59:50,746 scheduler.py:496 - Run time of job "5b1fb2e8-8488-429b-9310-ca882b775c25 (trigger: interval[0:01:00], next run at: 2015-07-29 21:00:49.314182)" was missed by 0:00:01.432292
    WARNING 2015-07-29 20:59:50,746 scheduler.py:496 - Run time of job "509bb649-e065-492a-a258-9a8e48e5d79c (trigger: interval[0:01:00], next run at: 2015-07-29 21:00:49.314359)" was missed by 0:00:01.432485
    WARNING 2015-07-29 20:59:50,747 scheduler.py:496 - Run time of job "211e7885-368e-415d-8875-a5abb66071c3 (trigger: interval[0:01:00], next run at: 2015-07-29 21:00:49.314546)" was missed by 0:00:01.432553
    WARNING 2015-07-29 20:59:50,747 scheduler.py:496 - Run time of job "239e8d13-1f31-4b2d-ac6f-b66294700814 (trigger: interval[0:01:00], next run at: 2015-07-29 21:00:49.314722)" was missed by 0:00:01.432682
    WARNING 2015-07-29 20:59:50,747 scheduler.py:496 - Run time of job "bc300bfc-7f4f-4015-84a6-4bfe761f4167 (trigger: interval[0:02:00], next run at: 2015-07-29 21:01:49.314897)" was missed by 0:00:01.432882
    WARNING 2015-07-29 20:59:50,748 scheduler.py:496 - Run time of job "0e800a78-48fa-4738-8bab-dc0b57ecc6fa (trigger: interval[0:01:00], next run at: 2015-07-29 21:00:49.315072)" was missed by 0:00:01.433000
    WARNING 2015-07-29 20:59:50,748 scheduler.py:496 - Run time of job "19190cfd-d9b4-4869-81ec-0bdce227540e (trigger: interval[0:01:00], next run at: 2015-07-29 21:00:49.315246)" was missed by 0:00:01.433040
    WARNING 2015-07-29 20:59:50,748 scheduler.py:496 - Run time of job "7f102c1d-3e4e-4b46-b89d-f6df4c231591 (trigger: interval[0:01:00], next run at: 2015-07-29 21:00:49.315782)" was missed by 0:00:01.432642
    WARNING 2015-07-29 20:59:50,749 scheduler.py:496 - Run time of job "8ef15a08-698b-429f-8925-4d6e5c49c01d (trigger: interval[0:01:00], next run at: 2015-07-29 21:00:49.315959)" was missed by 0:00:01.433006
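
As a rough check of how a longer grace period would treat these runs, the
sketch below parses the "was missed by" warnings and applies the skip rule
used by APScheduler 2.x, under which a run is skipped when it starts later
than its scheduled time by more than the misfire grace period. It assumes the
exact log format shown above; `parse_misses` and `would_run` are hypothetical
helpers, not part of Ambari or APScheduler.

```python
import re
from datetime import timedelta

# Matches the APScheduler warning lines quoted above, e.g.
# '... Run time of job "<id> (trigger: ...)" was missed by 0:00:01.423766'
MISSED_RE = re.compile(
    r'Run time of job "(?P<job>[0-9a-f-]+) .*" was missed by '
    r'(?P<h>\d+):(?P<m>\d{2}):(?P<s>\d{2}(?:\.\d+)?)'
)


def parse_misses(lines):
    """Return (job_id, lateness) pairs for every 'was missed by' warning."""
    misses = []
    for line in lines:
        match = MISSED_RE.search(line)
        if match:
            lateness = timedelta(
                hours=int(match.group("h")),
                minutes=int(match.group("m")),
                seconds=float(match.group("s")),
            )
            misses.append((match.group("job"), lateness))
    return misses


def would_run(lateness, grace_seconds):
    """A late run still executes only if it is no later than the grace period."""
    return lateness <= timedelta(seconds=grace_seconds)


# Two lines copied from the log excerpt above.
log = [
    'WARNING 2015-07-29 20:59:50,733 scheduler.py:496 - Run time of job '
    '"947770c6-424a-4ef8-9a46-19eca8fd080b (trigger: interval[0:01:00], '
    'next run at: 2015-07-29 20:59:49.309353)" was missed by 0:00:01.423766',
    'WARNING 2015-07-29 20:59:50,749 scheduler.py:496 - Run time of job '
    '"8ef15a08-698b-429f-8925-4d6e5c49c01d (trigger: interval[0:01:00], '
    'next run at: 2015-07-29 21:00:49.315959)" was missed by 0:00:01.433006',
]

misses = parse_misses(log)
worst = max(lateness for _, lateness in misses)
print(len(misses), worst.total_seconds())
# Compare the current 1-second default with the proposed 5-second one.
print(would_run(worst, 1), would_run(worst, 5))
```

With these two sample lines the worst lateness is about 1.43 seconds, so the
run is skipped under a 1-second grace period and would be allowed under a
5-second one.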
    

The setting can be exposed in
[AlertSchedulerHandler.py](https://github.com/apache/ambari/blob/trunk/ambari-agent/src/main/python/ambari_agent/AlertSchedulerHandler.py#L46)
by adding `misfire_grace_time` to the `APS_CONFIG` scheduler options:

      APS_CONFIG = {
        'threadpool.core_threads': 3,
        'coalesce': True,
        'standalone': False,
        # seconds a run may start late before APS skips it (APS default: 1)
        'misfire_grace_time': 5
      }

  * Expose the grace period as a setting in the agent's configuration file
  * Increase the default grace period from 1 second to 5 seconds
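
The first item above might look roughly like the following sketch, written
for Python 3. It assumes the agent's configuration file is an INI file parsed
with the standard `configparser` module; the `[agent]` section, the
`alert_grace_period` key, and both helper functions are hypothetical names
chosen for illustration. A missing, non-numeric, or non-positive value falls
back to the proposed 5-second default, and the result is placed in the
`misfire_grace_time` entry of the scheduler options quoted above.

```python
import configparser

# Proposed new default, in seconds (the APS default is 1 second).
DEFAULT_ALERT_GRACE_PERIOD = 5


def read_alert_grace_period(config):
    """Return the grace period in seconds from a parsed agent config.

    The [agent] section and alert_grace_period key are hypothetical names.
    """
    try:
        value = config.getint("agent", "alert_grace_period")
    except (configparser.NoSectionError, configparser.NoOptionError, ValueError):
        return DEFAULT_ALERT_GRACE_PERIOD
    # A non-positive grace period would treat any late start as a missed run.
    return value if value > 0 else DEFAULT_ALERT_GRACE_PERIOD


def build_aps_config(grace_period):
    """Scheduler options copied from this issue, with a configurable grace period."""
    return {
        'threadpool.core_threads': 3,
        'coalesce': True,
        'standalone': False,
        'misfire_grace_time': grace_period,
    }


config = configparser.ConfigParser()
config.read_string("[agent]\nalert_grace_period = 10\n")
print(build_aps_config(read_alert_grace_period(config)))

empty = configparser.ConfigParser()
print(read_alert_grace_period(empty))  # falls back to 5
```

Falling back to the default instead of raising keeps a malformed setting from
stopping the agent, while rejecting non-positive values avoids a grace period
so small that nearly every run is reported as missed.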





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)