Posted to dev@cloudstack.apache.org by Koushik Das <ko...@citrix.com> on 2013/10/01 07:46:45 UTC

Re: [PROPOSAL] Service monitoring tool in virtual router

This is a very useful feature. Can it be extended to the other system VMs (SSVM and CPVM)?

Based on the discussion I see that there is an assumption that restarting services/rebooting should fix the issues. Is that always true? What if the service fails to restart after repeated attempts? What is the fallback?

-Koushik


On 01-Oct-2013, at 3:15 AM, Chiradeep Vittal <Ch...@citrix.com> wrote:

> Good idea. If x and y and z are borked, initiate shutdown?
> 
> More generically, it seems we need some form of in-VM automation that can
> co-ordinate with top-level orchestration
> 
> On 9/28/13 4:14 AM, "Daan Hoogland" <da...@gmail.com> wrote:
> 
>> Even when always restarting on every glitch, we need to monitor the inside
>> of the VR to know when to restart it or respin a new one. There is much
>> functionality present on the VR, and we cannot say for sure what is
>> important to a customer installation, so the admin should be able to
>> define the minimal requirements that will stop us from spinning up a new
>> VR. And there must be tools present for monitoring these requirements.
>> 
>> makes sense?
>> 
>> 
>> On Thu, Sep 26, 2013 at 10:01 PM, David Nalley <da...@gnsa.us> wrote:
>> 
>>> For what it's worth we created an ACS-specific MIB (beneath the
>>> org.apache MIB) so really this is just a matter of defining and
>>> publishing it.
>>> 
>>> But let's think about monit being used to restart services. With HA and
>>> redundant VR, are we sure that we want to inject yet another point of
>>> control into things? Is it better to just respawn an instance, since
>>> they are essentially stateless? I don't know, but with the management
>>> server, local daemons, and other SysVMs all making decisions, it seems
>>> like we are increasing complexity.
>>> 
>>> --David
>>> 
>>> On Thu, Sep 26, 2013 at 10:31 AM, Chiradeep Vittal
>>> <Ch...@citrix.com> wrote:
>>>> In this case you would have to invent another enterprise MIB. Not too
>>>> hard, but I'd argue that it needs to be proxied through some other
>>>> service anyway and it represents a different integration point with ACS.
>>>> Depends on whether you consider the system vm part of the ACS
>>>> deployment, or an entity like a host.
>>>> 
>>>> On 9/26/13 10:27 AM, "Alex Huang" <Al...@citrix.com> wrote:
>>>> 
>>>>> Using SNMP for alert notification is not a bad idea though.  I don't
>>>>> see why we can't do that instead of posting to the management server.
>>>>> This is specifically referring to the second part of the proposal.  Why
>>>>> reinvent that part of it?
>>>>> 
>>>>> --Alex
>>>>> 
>>>>>> -----Original Message-----
>>>>>> From: Chiradeep Vittal [mailto:Chiradeep.Vittal@citrix.com]
>>>>>> Sent: Wednesday, September 25, 2013 10:28 PM
>>>>>> To: dev@cloudstack.apache.org
>>>>>> Subject: Re: [PROPOSAL] Service monitoring tool in virtual router
>>>>>> 
>>>>>> SNMP wouldn't restart a failed process nor would it generate alerts. It
>>>>>> is simply too generic for the requirements outlined here. The proposal
>>>>>> does not talk about modifying monit, just using it. That wouldn't
>>>>>> trigger the AGPL.
>>>>>> I think the idea is to have a tight monitoring loop that scales, so
>>>>>> executing the monitoring loop in situ makes sense.
>>>>>> 
>>>>>> 
>>>>>> On 9/25/13 9:53 PM, "David Nalley" <da...@gnsa.us> wrote:
>>>>>> 
>>>>>>> On Wed, Sep 25, 2013 at 9:30 AM, Jayapal Reddy Uradi
>>>>>>> <ja...@citrix.com> wrote:
>>>>>>>> Hi,
>>>>>>>> 
>>>>>>>> Currently in the virtual router there is no way to recover and notify
>>>>>>>> if some service goes down unexpectedly.
>>>>>>>> 
>>>>>>>> This feature is about monitoring all the services rendered by the
>>>>>>>> virtual router and ensuring that the services are running throughout
>>>>>>>> the lifetime of the VR.
>>>>>>>> 
>>>>>>>> On service failure:
>>>>>>>> 1. Generate an alert and event indicating failure
>>>>>>>> 2. Restart the service
>>>>>>>> 
>>>>>>>> Services to be monitored:
>>>>>>>> DHCP, DNS, haproxy, password server etc.
>>>>>>>> 
>>>>>>>> As part of monitoring there are two activities:
>>>>>>>> 
>>>>>>>> 1. Monitor the services in the VR and log the events, using monit to
>>>>>>>>    monitor the services.
>>>>>>>> 2. Push alerts from the router to the MS. The current thinking is to
>>>>>>>>    POST the logs to a web server in the MS.
>>>>>>>> 
>>>>>>>> I will post more details and the FS in this thread.
>>>>>>>> 
>>>>>>>> I created an enhancement bug for this:
>>>>>>>> https://issues.apache.org/jira/browse/CLOUDSTACK-4736
>>>>>>>> 
>>>>>>>> Thanks,
>>>>>>>> Jayapal
>>>>>>> 
>>>>>>> So several things. First, why not do this via SNMP? It can query
>>>>>>> processes and many other things. This should be relatively simple, is
>>>>>>> well known, can be locked down (or could be monitored for many other
>>>>>>> things by external monitoring packages) and is the de facto standard
>>>>>>> for monitoring hosts.
>>>>>>> Second, monit is Affero GPL licensed, which is a cat-x license.
>>>>>>> While I expect that we would merely use this and not do any hacking on
>>>>>>> it, I think its inclusion might be a surprise (and forbidden in many
>>>>>>> environments) to our users.
>>>>>>> 
>>>>>>> --David
>>>>> 
>>>> 
>>> 
>
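
The two-part proposal quoted in this thread (check the services inside the VR, then push alerts to the management server over HTTP) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the design that was implemented: the service names, the router name, the JSON payload shape, and the use of pgrep are all hypothetical, and the thread itself proposes monit for the in-VR checks.

```python
import json
import subprocess
import urllib.request

# Hypothetical process names; the thread only lists "DHCP, DNS, haproxy,
# password server etc." without naming the actual daemons.
SERVICES = ["dnsmasq", "haproxy"]


def is_running(name):
    """Return True if a process with exactly this name exists (via pgrep)."""
    result = subprocess.run(["pgrep", "-x", name], stdout=subprocess.DEVNULL)
    return result.returncode == 0


def collect_alerts(services, check=is_running, router="r-1-VM"):
    """Part 1: check each service and build one alert record per failure."""
    return [
        {"router": router, "service": name, "state": "down"}
        for name in services
        if not check(name)
    ]


def encode_alerts(alerts):
    """Part 2a: serialize the alerts as a JSON body for an HTTP POST."""
    return json.dumps({"alerts": alerts}).encode("utf-8")


def post_alerts(url, body, timeout=5.0):
    """Part 2b: POST the body to a (hypothetical) management server URL."""
    request = urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status
```

The check function is passed in as a parameter so the alert-building logic can be exercised without touching real processes or the network.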

Re: [PROPOSAL] Service monitoring tool in virtual router

Posted by Jayapal Reddy Uradi <ja...@citrix.com>.

The current scope is limited to the VR.
If a service fails to restart after a certain number of cycles, monit will time out and log the event. In that case the admin has to intervene, fix the issue in the service, and add it to monit again.

Thanks,
Jayapal
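
The fallback described in this reply (retry the restart a bounded number of times, then stop, log, and wait for the admin) can be sketched as below. This is a minimal illustration of the behaviour, not monit's implementation: the function name `supervise`, the `max_restarts` parameter, and the log messages are hypothetical.

```python
def supervise(check, restart, max_restarts=3, log=print):
    """Restart a service until it is healthy or the retry budget is spent.

    check:   callable returning True when the service is running.
    restart: callable that attempts to restart the service.
    Returns True if the service is running, False if we gave up and an
    admin has to intervene (the "timeout" case described in the thread).
    """
    attempts = 0
    while not check():
        if attempts >= max_restarts:
            log(f"giving up after {attempts} restart attempts; "
                "admin intervention required")
            return False
        attempts += 1
        log(f"restart attempt {attempts}")
        restart()
    return True
```

Returning False rather than looping forever makes the failure visible to whatever sits above the monitor, which is where the questions raised in the thread about respawning the VR or alerting the management server would apply.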
