Posted to dev@ambari.apache.org by Andrew Onischuk <ao...@hortonworks.com> on 2015/10/15 16:39:36 UTC
Review Request 39339: Expose Alert Grace Period Setting in Agents
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/39339/
-----------------------------------------------------------
Review request for Ambari and Nate Cole.
Bugs: AMBARI-13434
https://issues.apache.org/jira/browse/AMBARI-13434
Repository: ambari
Description
-------
On some deployments, hosts may be required to run many alerts, depending on the
number of components installed. If the number of components is large, alert
jobs may miss their scheduled run times. The default grace period set by
APScheduler (APS) is 1 second, which is rather aggressive and produces warnings
like the following:
WARNING 2015-07-29 20:59:50,733 scheduler.py:496 - Run time of job "947770c6-424a-4ef8-9a46-19eca8fd080b (trigger: interval[0:01:00], next run at: 2015-07-29 20:59:49.309353)" was missed by 0:00:01.423766
WARNING 2015-07-29 20:59:50,734 scheduler.py:496 - Run time of job "005b1d50-2aca-4af2-a3b4-bc39e6f65ede (trigger: interval[0:01:00], next run at: 2015-07-29 21:00:49.309646)" was missed by 0:00:01.424313
WARNING 2015-07-29 20:59:50,734 scheduler.py:496 - Run time of job "6950ff19-c26c-46b7-8bac-1869773f1380 (trigger: interval[0:01:00], next run at: 2015-07-29 20:59:49.309840)" was missed by 0:00:01.424364
WARNING 2015-07-29 20:59:50,735 scheduler.py:496 - Run time of job "d986b9eb-bfd4-400f-b107-5640495eeece (trigger: interval[0:01:00], next run at: 2015-07-29 20:59:49.310025)" was missed by 0:00:01.425144
WARNING 2015-07-29 20:59:50,735 scheduler.py:496 - Run time of job "3589154e-a8e3-441d-b3cb-a93fd49e1dfe (trigger: interval[0:01:00], next run at: 2015-07-29 20:59:49.310204)" was missed by 0:00:01.425600
WARNING 2015-07-29 20:59:50,736 scheduler.py:496 - Run time of job "04a7f393-800b-4728-95be-28c2ca091ade (trigger: interval[0:01:00], next run at: 2015-07-29 20:59:49.310380)" was missed by 0:00:01.425769
WARNING 2015-07-29 20:59:50,737 scheduler.py:496 - Run time of job "f0e2a065-af36-476c-b6b9-b662471c3f22 (trigger: interval[0:01:00], next run at: 2015-07-29 20:59:49.310759)" was missed by 0:00:01.426607
WARNING 2015-07-29 20:59:50,738 scheduler.py:496 - Run time of job "76accffd-e390-4aaa-8b35-8219ef4b3057 (trigger: interval[0:01:00], next run at: 2015-07-29 21:00:49.311118)" was missed by 0:00:01.427039
WARNING 2015-07-29 20:59:50,738 scheduler.py:496 - Run time of job "e0ce4088-2f0c-4f6d-8642-26ba94b3c66a (trigger: interval[0:01:00], next run at: 2015-07-29 20:59:49.311297)" was missed by 0:00:01.426953
WARNING 2015-07-29 20:59:50,739 scheduler.py:496 - Run time of job "9cb39eb2-8ce4-408e-8030-a36362d5b5af (trigger: interval[0:01:00], next run at: 2015-07-29 20:59:49.311501)" was missed by 0:00:01.427677
WARNING 2015-07-29 20:59:50,740 scheduler.py:496 - Run time of job "c299b3ab-ced6-4423-8f39-e16427157d98 (trigger: interval[0:01:00], next run at: 2015-07-29 21:00:49.312033)" was missed by 0:00:01.427972
WARNING 2015-07-29 20:59:50,740 scheduler.py:496 - Run time of job "cd444594-7859-482d-ae04-348ee7653da2 (trigger: interval[0:01:00], next run at: 2015-07-29 20:59:49.312208)" was missed by 0:00:01.428285
WARNING 2015-07-29 20:59:50,741 scheduler.py:496 - Run time of job "9afd8b3e-8850-4f2d-9ce7-a130be6b933b (trigger: interval[0:01:00], next run at: 2015-07-29 20:59:49.312385)" was missed by 0:00:01.428689
WARNING 2015-07-29 20:59:50,741 scheduler.py:496 - Run time of job "be140827-a21f-4782-a109-bde8bcbc35c2 (trigger: interval[0:01:00], next run at: 2015-07-29 20:59:49.312574)" was missed by 0:00:01.429298
WARNING 2015-07-29 20:59:50,742 scheduler.py:496 - Run time of job "e009b685-717f-4552-8dfb-35a4d9d3d658 (trigger: interval[0:01:00], next run at: 2015-07-29 20:59:49.312751)" was missed by 0:00:01.429906
WARNING 2015-07-29 20:59:50,743 scheduler.py:496 - Run time of job "f42e635f-ce2d-47b6-8da3-10c7bfef7c3c (trigger: interval[0:01:00], next run at: 2015-07-29 20:59:49.312927)" was missed by 0:00:01.430541
WARNING 2015-07-29 20:59:50,744 scheduler.py:496 - Run time of job "ace91b40-28e2-472a-ac97-8b01dc3bd976 (trigger: interval[0:01:00], next run at: 2015-07-29 20:59:49.313280)" was missed by 0:00:01.430793
WARNING 2015-07-29 20:59:50,744 scheduler.py:496 - Run time of job "77ea324a-a836-4f32-a751-1a596417bc11 (trigger: interval[0:01:00], next run at: 2015-07-29 21:00:49.313461)" was missed by 0:00:01.431357
WARNING 2015-07-29 20:59:50,745 scheduler.py:496 - Run time of job "e74f63b0-4143-4ebb-9adc-8e124eae1f99 (trigger: interval[0:02:00], next run at: 2015-07-29 20:59:49.313642)" was missed by 0:00:01.431588
WARNING 2015-07-29 20:59:50,746 scheduler.py:496 - Run time of job "3640c1eb-e7a2-4783-9480-e7f2129a4093 (trigger: interval[0:02:00], next run at: 2015-07-29 21:01:49.313817)" was missed by 0:00:01.432356
WARNING 2015-07-29 20:59:50,746 scheduler.py:496 - Run time of job "5b1fb2e8-8488-429b-9310-ca882b775c25 (trigger: interval[0:01:00], next run at: 2015-07-29 21:00:49.314182)" was missed by 0:00:01.432292
WARNING 2015-07-29 20:59:50,746 scheduler.py:496 - Run time of job "509bb649-e065-492a-a258-9a8e48e5d79c (trigger: interval[0:01:00], next run at: 2015-07-29 21:00:49.314359)" was missed by 0:00:01.432485
WARNING 2015-07-29 20:59:50,747 scheduler.py:496 - Run time of job "211e7885-368e-415d-8875-a5abb66071c3 (trigger: interval[0:01:00], next run at: 2015-07-29 21:00:49.314546)" was missed by 0:00:01.432553
WARNING 2015-07-29 20:59:50,747 scheduler.py:496 - Run time of job "239e8d13-1f31-4b2d-ac6f-b66294700814 (trigger: interval[0:01:00], next run at: 2015-07-29 21:00:49.314722)" was missed by 0:00:01.432682
WARNING 2015-07-29 20:59:50,747 scheduler.py:496 - Run time of job "bc300bfc-7f4f-4015-84a6-4bfe761f4167 (trigger: interval[0:02:00], next run at: 2015-07-29 21:01:49.314897)" was missed by 0:00:01.432882
WARNING 2015-07-29 20:59:50,748 scheduler.py:496 - Run time of job "0e800a78-48fa-4738-8bab-dc0b57ecc6fa (trigger: interval[0:01:00], next run at: 2015-07-29 21:00:49.315072)" was missed by 0:00:01.433000
WARNING 2015-07-29 20:59:50,748 scheduler.py:496 - Run time of job "19190cfd-d9b4-4869-81ec-0bdce227540e (trigger: interval[0:01:00], next run at: 2015-07-29 21:00:49.315246)" was missed by 0:00:01.433040
WARNING 2015-07-29 20:59:50,748 scheduler.py:496 - Run time of job "7f102c1d-3e4e-4b46-b89d-f6df4c231591 (trigger: interval[0:01:00], next run at: 2015-07-29 21:00:49.315782)" was missed by 0:00:01.432642
WARNING 2015-07-29 20:59:50,749 scheduler.py:496 - Run time of job "8ef15a08-698b-429f-8925-4d6e5c49c01d (trigger: interval[0:01:00], next run at: 2015-07-29 21:00:49.315959)" was missed by 0:00:01.433006
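To make the effect of the grace period concrete, here is a minimal sketch of the check as I understand it: a late run is skipped only when its lateness exceeds the grace period. The helper names `parse_missed_by` and `would_run` are hypothetical and exist only for this illustration; this is not the scheduler's own code.

```python
import re
from datetime import timedelta

# Matches the "was missed by H:MM:SS.ffffff" suffix of the warnings above.
MISSED_RE = re.compile(r'was missed by (\d+):(\d{2}):(\d{2})(?:\.(\d+))?')


def parse_missed_by(line):
    """Return the lateness reported in a warning line, or None if absent."""
    match = MISSED_RE.search(line)
    if match is None:
        return None
    hours, minutes, seconds, fraction = match.groups()
    # Pad or trim the fractional part to microseconds.
    micros = int((fraction or '').ljust(6, '0')[:6])
    return timedelta(hours=int(hours), minutes=int(minutes),
                     seconds=int(seconds), microseconds=micros)


def would_run(lateness, grace_seconds):
    """Assumed rule: the job still runs if it is late by at most the grace period."""
    return lateness <= timedelta(seconds=grace_seconds)


line = ('WARNING 2015-07-29 20:59:50,733 scheduler.py:496 - Run time of job '
        '"947770c6-424a-4ef8-9a46-19eca8fd080b (trigger: interval[0:01:00], '
        'next run at: 2015-07-29 20:59:49.309353)" was missed by 0:00:01.423766')

lateness = parse_missed_by(line)
print(lateness, would_run(lateness, 1), would_run(lateness, 5))
```

Every miss in the log above is under 1.5 seconds, so under this rule a 5-second grace period would have let all of those jobs run.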
The setting can be exposed in
[AlertSchedulerHandler.py](https://github.com/apache/ambari/blob/trunk/ambari-agent/src/main/python/ambari_agent/AlertSchedulerHandler.py#L46)
by adding `misfire_grace_time`:
APS_CONFIG = {
  'threadpool.core_threads': 3,
  'coalesce': True,
  'standalone': False,
  'misfire_grace_time': 5
}
* Expose the ability to set this grace period via the agent's configuration file
* Increase the default amount from 1 second to 5 seconds
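As a rough sketch of how the two points above could fit together, the snippet below reads a grace period from INI text and falls back to 5 seconds when it is missing or invalid. The section name `agent`, the option name `alert_grace_period`, and the helper `build_aps_config` are assumptions made for illustration, not taken from the diff; it also uses Python 3's `configparser` for convenience, which may differ from what the agent itself uses.

```python
import configparser

# New default proposed in this request, in seconds.
DEFAULT_GRACE_PERIOD = 5


def build_aps_config(ini_text, section='agent', option='alert_grace_period'):
    """Build an APS-style config dict from INI text.

    The section and option names are hypothetical placeholders.
    """
    parser = configparser.ConfigParser()
    parser.read_string(ini_text)

    grace = DEFAULT_GRACE_PERIOD
    if parser.has_option(section, option):
        try:
            grace = parser.getint(section, option)
        except ValueError:
            # Non-numeric value: keep the default rather than fail.
            grace = DEFAULT_GRACE_PERIOD

    return {
        'threadpool.core_threads': 3,
        'coalesce': True,
        'standalone': False,
        'misfire_grace_time': grace,
    }


print(build_aps_config("[agent]\nalert_grace_period = 10\n")['misfire_grace_time'])
print(build_aps_config("[agent]\n")['misfire_grace_time'])
```

Falling back to the default on a missing or malformed value keeps an agent with an older configuration file working unchanged.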
Diffs
-----
ambari-agent/conf/unix/ambari-agent.ini 3b7631c
ambari-agent/conf/windows/ambari-agent.ini 972e11e
ambari-agent/src/main/python/ambari_agent/AlertSchedulerHandler.py cddee57
ambari-agent/src/main/python/ambari_agent/Controller.py 74a8eac
ambari-agent/src/test/python/ambari_agent/TestAlertSchedulerHandler.py d15cd32
ambari-agent/src/test/python/ambari_agent/TestAlerts.py dab717d
Diff: https://reviews.apache.org/r/39339/diff/
Testing
-------
mvn clean test
Thanks,
Andrew Onischuk