Posted to dev@ambari.apache.org by Andrew Onischuk <ao...@hortonworks.com> on 2015/10/15 16:39:36 UTC

Review Request 39339: Expose Alert Grace Period Setting in Agents

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/39339/
-----------------------------------------------------------

Review request for Ambari and Nate Cole.


Bugs: AMBARI-13434
    https://issues.apache.org/jira/browse/AMBARI-13434


Repository: ambari


Description
-------

On some deployments, a host may need to run many alerts, depending on the
number of components installed on it. When that number is large, alert jobs
can miss their scheduled run times. The default misfire grace period set by
APScheduler (APS) is 1 second, which is rather aggressive, and each missed run
produces a warning like the following:

    WARNING 2015-07-29 20:59:50,733 scheduler.py:496 - Run time of job "947770c6-424a-4ef8-9a46-19eca8fd080b (trigger: interval[0:01:00], next run at: 2015-07-29 20:59:49.309353)" was missed by 0:00:01.423766
    WARNING 2015-07-29 20:59:50,734 scheduler.py:496 - Run time of job "005b1d50-2aca-4af2-a3b4-bc39e6f65ede (trigger: interval[0:01:00], next run at: 2015-07-29 21:00:49.309646)" was missed by 0:00:01.424313
    WARNING 2015-07-29 20:59:50,734 scheduler.py:496 - Run time of job "6950ff19-c26c-46b7-8bac-1869773f1380 (trigger: interval[0:01:00], next run at: 2015-07-29 20:59:49.309840)" was missed by 0:00:01.424364
    WARNING 2015-07-29 20:59:50,735 scheduler.py:496 - Run time of job "d986b9eb-bfd4-400f-b107-5640495eeece (trigger: interval[0:01:00], next run at: 2015-07-29 20:59:49.310025)" was missed by 0:00:01.425144
    WARNING 2015-07-29 20:59:50,735 scheduler.py:496 - Run time of job "3589154e-a8e3-441d-b3cb-a93fd49e1dfe (trigger: interval[0:01:00], next run at: 2015-07-29 20:59:49.310204)" was missed by 0:00:01.425600
    WARNING 2015-07-29 20:59:50,736 scheduler.py:496 - Run time of job "04a7f393-800b-4728-95be-28c2ca091ade (trigger: interval[0:01:00], next run at: 2015-07-29 20:59:49.310380)" was missed by 0:00:01.425769
    WARNING 2015-07-29 20:59:50,737 scheduler.py:496 - Run time of job "f0e2a065-af36-476c-b6b9-b662471c3f22 (trigger: interval[0:01:00], next run at: 2015-07-29 20:59:49.310759)" was missed by 0:00:01.426607
    WARNING 2015-07-29 20:59:50,738 scheduler.py:496 - Run time of job "76accffd-e390-4aaa-8b35-8219ef4b3057 (trigger: interval[0:01:00], next run at: 2015-07-29 21:00:49.311118)" was missed by 0:00:01.427039
    WARNING 2015-07-29 20:59:50,738 scheduler.py:496 - Run time of job "e0ce4088-2f0c-4f6d-8642-26ba94b3c66a (trigger: interval[0:01:00], next run at: 2015-07-29 20:59:49.311297)" was missed by 0:00:01.426953
    WARNING 2015-07-29 20:59:50,739 scheduler.py:496 - Run time of job "9cb39eb2-8ce4-408e-8030-a36362d5b5af (trigger: interval[0:01:00], next run at: 2015-07-29 20:59:49.311501)" was missed by 0:00:01.427677
    WARNING 2015-07-29 20:59:50,740 scheduler.py:496 - Run time of job "c299b3ab-ced6-4423-8f39-e16427157d98 (trigger: interval[0:01:00], next run at: 2015-07-29 21:00:49.312033)" was missed by 0:00:01.427972
    WARNING 2015-07-29 20:59:50,740 scheduler.py:496 - Run time of job "cd444594-7859-482d-ae04-348ee7653da2 (trigger: interval[0:01:00], next run at: 2015-07-29 20:59:49.312208)" was missed by 0:00:01.428285
    WARNING 2015-07-29 20:59:50,741 scheduler.py:496 - Run time of job "9afd8b3e-8850-4f2d-9ce7-a130be6b933b (trigger: interval[0:01:00], next run at: 2015-07-29 20:59:49.312385)" was missed by 0:00:01.428689
    WARNING 2015-07-29 20:59:50,741 scheduler.py:496 - Run time of job "be140827-a21f-4782-a109-bde8bcbc35c2 (trigger: interval[0:01:00], next run at: 2015-07-29 20:59:49.312574)" was missed by 0:00:01.429298
    WARNING 2015-07-29 20:59:50,742 scheduler.py:496 - Run time of job "e009b685-717f-4552-8dfb-35a4d9d3d658 (trigger: interval[0:01:00], next run at: 2015-07-29 20:59:49.312751)" was missed by 0:00:01.429906
    WARNING 2015-07-29 20:59:50,743 scheduler.py:496 - Run time of job "f42e635f-ce2d-47b6-8da3-10c7bfef7c3c (trigger: interval[0:01:00], next run at: 2015-07-29 20:59:49.312927)" was missed by 0:00:01.430541
    WARNING 2015-07-29 20:59:50,744 scheduler.py:496 - Run time of job "ace91b40-28e2-472a-ac97-8b01dc3bd976 (trigger: interval[0:01:00], next run at: 2015-07-29 20:59:49.313280)" was missed by 0:00:01.430793
    WARNING 2015-07-29 20:59:50,744 scheduler.py:496 - Run time of job "77ea324a-a836-4f32-a751-1a596417bc11 (trigger: interval[0:01:00], next run at: 2015-07-29 21:00:49.313461)" was missed by 0:00:01.431357
    WARNING 2015-07-29 20:59:50,745 scheduler.py:496 - Run time of job "e74f63b0-4143-4ebb-9adc-8e124eae1f99 (trigger: interval[0:02:00], next run at: 2015-07-29 20:59:49.313642)" was missed by 0:00:01.431588
    WARNING 2015-07-29 20:59:50,746 scheduler.py:496 - Run time of job "3640c1eb-e7a2-4783-9480-e7f2129a4093 (trigger: interval[0:02:00], next run at: 2015-07-29 21:01:49.313817)" was missed by 0:00:01.432356
    WARNING 2015-07-29 20:59:50,746 scheduler.py:496 - Run time of job "5b1fb2e8-8488-429b-9310-ca882b775c25 (trigger: interval[0:01:00], next run at: 2015-07-29 21:00:49.314182)" was missed by 0:00:01.432292
    WARNING 2015-07-29 20:59:50,746 scheduler.py:496 - Run time of job "509bb649-e065-492a-a258-9a8e48e5d79c (trigger: interval[0:01:00], next run at: 2015-07-29 21:00:49.314359)" was missed by 0:00:01.432485
    WARNING 2015-07-29 20:59:50,747 scheduler.py:496 - Run time of job "211e7885-368e-415d-8875-a5abb66071c3 (trigger: interval[0:01:00], next run at: 2015-07-29 21:00:49.314546)" was missed by 0:00:01.432553
    WARNING 2015-07-29 20:59:50,747 scheduler.py:496 - Run time of job "239e8d13-1f31-4b2d-ac6f-b66294700814 (trigger: interval[0:01:00], next run at: 2015-07-29 21:00:49.314722)" was missed by 0:00:01.432682
    WARNING 2015-07-29 20:59:50,747 scheduler.py:496 - Run time of job "bc300bfc-7f4f-4015-84a6-4bfe761f4167 (trigger: interval[0:02:00], next run at: 2015-07-29 21:01:49.314897)" was missed by 0:00:01.432882
    WARNING 2015-07-29 20:59:50,748 scheduler.py:496 - Run time of job "0e800a78-48fa-4738-8bab-dc0b57ecc6fa (trigger: interval[0:01:00], next run at: 2015-07-29 21:00:49.315072)" was missed by 0:00:01.433000
    WARNING 2015-07-29 20:59:50,748 scheduler.py:496 - Run time of job "19190cfd-d9b4-4869-81ec-0bdce227540e (trigger: interval[0:01:00], next run at: 2015-07-29 21:00:49.315246)" was missed by 0:00:01.433040
    WARNING 2015-07-29 20:59:50,748 scheduler.py:496 - Run time of job "7f102c1d-3e4e-4b46-b89d-f6df4c231591 (trigger: interval[0:01:00], next run at: 2015-07-29 21:00:49.315782)" was missed by 0:00:01.432642
    WARNING 2015-07-29 20:59:50,749 scheduler.py:496 - Run time of job "8ef15a08-698b-429f-8925-4d6e5c49c01d (trigger: interval[0:01:00], next run at: 2015-07-29 21:00:49.315959)" was missed by 0:00:01.433006

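Each warning above is logged when the scheduler reaches a run later than the grace period allows. The function below is a simplified model of that comparison. The name `run_is_missed` is hypothetical and is not APScheduler's API. It is applied to the first job in the log, which was reached about 1.42 seconds late:

```python
from datetime import datetime, timedelta


def run_is_missed(scheduled_time, now, grace_seconds):
    """Simplified model of APScheduler's misfire check (hypothetical helper).

    A scheduled run counts as missed, and is skipped with a warning, when the
    scheduler reaches it more than `grace_seconds` after its scheduled time.
    """
    lateness = now - scheduled_time
    return lateness > timedelta(seconds=grace_seconds)


# First job from the log: scheduled at 20:59:49.309353, reached 0:00:01.423766 late.
scheduled = datetime(2015, 7, 29, 20, 59, 49, 309353)
reached = scheduled + timedelta(seconds=1, microseconds=423766)

print(run_is_missed(scheduled, reached, grace_seconds=1))  # True: run is skipped
print(run_is_missed(scheduled, reached, grace_seconds=5))  # False: run executes
```

With the 1-second default, that run is dropped. A 5-second grace period would have let it execute.
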
The setting can be exposed in
[AlertSchedulerHandler.py](https://github.com/apache/ambari/blob/trunk/ambari-agent/src/main/python/ambari_agent/AlertSchedulerHandler.py#L46)
by adding `misfire_grace_time` to `APS_CONFIG`:

      APS_CONFIG = {
        'threadpool.core_threads': 3,
        'coalesce': True,
        'standalone': False,
        # seconds a job may start late before APScheduler skips that run (default 1)
        'misfire_grace_time': 5
      }

This change will:

  * Expose the grace period as a setting in the agent's configuration file (`ambari-agent.ini`)
  * Increase the default grace period from 1 second to 5 seconds
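Below is a minimal sketch of the configuration side. It assumes a hypothetical `alert_grace_period` option in the `[agent]` section of `ambari-agent.ini`; the real option name may differ. The sketch reads the value with Python's `configparser` and falls back to the proposed 5-second default. It then merges the value into a copy of the scheduler options shown above:

```python
import configparser

# Proposed new default, in seconds.
DEFAULT_ALERT_GRACE_PERIOD = 5

# Scheduler options from the snippet above, minus the grace period.
BASE_APS_CONFIG = {
    'threadpool.core_threads': 3,
    'coalesce': True,
    'standalone': False,
}


def build_aps_config(ini_text):
    """Return scheduler options with misfire_grace_time taken from the ini text.

    Assumes a hypothetical [agent] alert_grace_period option and falls back to
    the default when the section or option is absent.
    """
    parser = configparser.ConfigParser()
    parser.read_string(ini_text)
    grace = parser.getint('agent', 'alert_grace_period',
                          fallback=DEFAULT_ALERT_GRACE_PERIOD)
    if grace < 0:
        raise ValueError('alert_grace_period must be >= 0, got %d' % grace)
    config = dict(BASE_APS_CONFIG)  # copy so the shared defaults stay untouched
    config['misfire_grace_time'] = grace
    return config


print(build_aps_config('[agent]\nalert_grace_period=10\n')['misfire_grace_time'])  # 10
print(build_aps_config('')['misfire_grace_time'])  # 5
```

Copying the base dictionary keeps a per-agent override from leaking into the shared defaults. The non-negative check is an extra safeguard in this sketch, not something the request specifies.
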


Diffs
-----

  ambari-agent/conf/unix/ambari-agent.ini 3b7631c 
  ambari-agent/conf/windows/ambari-agent.ini 972e11e 
  ambari-agent/src/main/python/ambari_agent/AlertSchedulerHandler.py cddee57 
  ambari-agent/src/main/python/ambari_agent/Controller.py 74a8eac 
  ambari-agent/src/test/python/ambari_agent/TestAlertSchedulerHandler.py d15cd32 
  ambari-agent/src/test/python/ambari_agent/TestAlerts.py dab717d 

Diff: https://reviews.apache.org/r/39339/diff/


Testing
-------

mvn clean test


Thanks,

Andrew Onischuk