Posted to dev@cloudstack.apache.org by Jayapal Reddy Uradi <ja...@citrix.com> on 2013/09/25 18:30:33 UTC

[PROPOSAL] Service monitoring tool in virtual router

Hi,

Currently, the virtual router has no way to recover or send a notification if a service goes down unexpectedly.

This feature is about monitoring all the services provided by the virtual router and ensuring that they keep running throughout the lifetime of the VR.

On service failure:
1. Generate an alert and event indicating failure
2. Restart the service

Services to be monitored:
DHCP, DNS, haproxy, password server etc.

Monitoring involves two activities:

1. Monitoring the services in the VR and logging the events. The plan is to use monit for this.
2. Pushing alerts from the router to the management server (MS). The current idea is to POST the logs to a web server on the MS.

I will post more details and the FS in this thread.

I have created an enhancement bug for this:
https://issues.apache.org/jira/browse/CLOUDSTACK-4736

Thanks,
Jayapal
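
As an illustration of the second activity in the proposal above (POSTing log events from the router to a web server on the MS), here is a minimal Python sketch. The endpoint URL, the JSON field names and the function names are all assumptions made for the example; the proposal does not define them.

```python
import json
import urllib.request

# Hypothetical endpoint: the proposal does not name a URL on the management server.
MS_ALERT_URL = "http://ms.example.invalid/vr-alerts"


def build_alert_request(router, service, message, url=MS_ALERT_URL):
    """Build (but do not send) an HTTP POST carrying one service alert."""
    body = json.dumps(
        {"router": router, "service": service, "message": message}
    ).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_alert(request, timeout=5):
    """Send a prepared alert and return the HTTP status code."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status
```

Building the request separately from sending it keeps the payload easy to inspect and test without a reachable management server.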

Re: [PROPOSAL] Service monitoring tool in virtual router

Posted by Chip Childers <ch...@apache.org>.
On Wed, Sep 25, 2013 at 04:30:33PM +0000, Jayapal Reddy Uradi wrote:
> Hi,
> 
> Currently in virtual router there is no way to recover and notify if some service goes down unexpectedly.
> 
> This feature is about monitoring all the services rendered by the virtual router, ensure that the services are running through the life time of the VR.
> 
> On service failure:
> 1. Generate an alert and event indicating failure
> 2. Restart the service
> 
> Services to be monitored:
> DHCP, DNS, haproxy, password server etc.
> 
> As part of monitoring there are two activities
> 
> 1. One is monitoring the services in VR and log the events. Using monit for monitoring services
> 2. Second part is pushing alerts from router to  MS server. Thinking on POST the logs to web server in MS.
> 
> I will be updating more details and FS in this thread.
> 
> I created enhancement bug for this.
> https://issues.apache.org/jira/browse/CLOUDSTACK-4736
> 
> Thanks,
> Jayapal

Generally sounds like a very good idea Jayapal!  Looking forward to
seeing the FS.

Re: [PROPOSAL] Service monitoring tool in virtual router

Posted by Daan Hoogland <da...@gmail.com>.
This is not a counter-proposal to your initiative, Jayapal. It sounds great.
I just want to give cloud operators and domain admins more control over how
to react to mishaps on the system VMs.


On Fri, Oct 4, 2013 at 11:17 AM, Daan Hoogland <da...@gmail.com> wrote:

> Could we not use the native syslog to gather the info (process monitoring
> will still be needed) and present an admin interface on this on the ms?
>
>
> On Fri, Oct 4, 2013 at 10:59 AM, Jayapal Reddy Uradi <
> jayapalreddy.uradi@citrix.com> wrote:
>
>> Hi,
>>
>> I am planning to write script utility to monitor processes and restart on
>> the event of failure. It will also logs the events.
>>
>> Thanks,
>> Jayapal
>>
>> On 02-Oct-2013, at 3:25 AM, Simon Weller <sw...@ena.com> wrote:
>>
>> > supervisord maybe?
>> >
>> > ----- Original Message -----
>> >
>> > From: "Chiradeep Vittal" <Ch...@citrix.com>
>> > To: dev@cloudstack.apache.org
>> > Sent: Tuesday, October 1, 2013 4:45:56 PM
>> > Subject: Re: [PROPOSAL] Service monitoring tool in virtual router
>> >
>> > Got it. Any other OSS tool out there similar to monit?
>> >
>> > On 10/1/13 8:24 AM, "David Nalley" <da...@gnsa.us> wrote:
>> >
>> >> On Thu, Sep 26, 2013 at 1:27 AM, Chiradeep Vittal
>> >> <Ch...@citrix.com> wrote:
>> >>> SNMP wouldn't restart a failed process nor would it generate alerts. It is
>> >>> simply too generic for the requirements outlined here. The proposal does
>> >>> not talk about modifying monit, just using it. That wouldn't trigger the
>> >>> AGPL.
>> >>
>> >> Let me restate my objection to anything AGPL.
>> >> People are largely comfortable with GPLv2 software - Linux is
>> >> ubiquitous. Many legal departments routinely prohibit GPLv3 software
>> >> (we actually saw this when CS was GPLv3 licensed.) But the Affero GPL
>> >> license is anathema in many corporate environments, and by forcing it
>> >> on folks in the default System VM I fear it will hurt adoption of
>> >> CloudStack.
>> >>
>> >> --David
>> >
>> >
>>
>>
>

Re: [PROPOSAL] Service monitoring tool in virtual router

Posted by Daan Hoogland <da...@gmail.com>.
Could we not use the native syslog to gather the info (process monitoring
will still be needed) and present an admin interface for it on the MS?


On Fri, Oct 4, 2013 at 10:59 AM, Jayapal Reddy Uradi <
jayapalreddy.uradi@citrix.com> wrote:

> Hi,
>
> I am planning to write script utility to monitor processes and restart on
> the event of failure. It will also logs the events.
>
> Thanks,
> Jayapal
>
> On 02-Oct-2013, at 3:25 AM, Simon Weller <sw...@ena.com> wrote:
>
> > supervisord maybe?
> >
> > ----- Original Message -----
> >
> > From: "Chiradeep Vittal" <Ch...@citrix.com>
> > To: dev@cloudstack.apache.org
> > Sent: Tuesday, October 1, 2013 4:45:56 PM
> > Subject: Re: [PROPOSAL] Service monitoring tool in virtual router
> >
> > Got it. Any other OSS tool out there similar to monit?
> >
> > On 10/1/13 8:24 AM, "David Nalley" <da...@gnsa.us> wrote:
> >
> >> On Thu, Sep 26, 2013 at 1:27 AM, Chiradeep Vittal
> >> <Ch...@citrix.com> wrote:
> >>> SNMP wouldn't restart a failed process nor would it generate alerts. It is
> >>> simply too generic for the requirements outlined here. The proposal does
> >>> not talk about modifying monit, just using it. That wouldn't trigger the
> >>> AGPL.
> >>
> >> Let me restate my objection to anything AGPL.
> >> People are largely comfortable with GPLv2 software - Linux is
> >> ubiquitous. Many legal departments routinely prohibit GPLv3 software
> >> (we actually saw this when CS was GPLv3 licensed.) But the Affero GPL
> >> license is anathema in many corporate environments, and by forcing it
> >> on folks in the default System VM I fear it will hurt adoption of
> >> CloudStack.
> >>
> >> --David
> >
> >
>
>

Re: [PROPOSAL] Service monitoring tool in virtual router

Posted by Jayapal Reddy Uradi <ja...@citrix.com>.
Hi Sanjeev,

Thanks for your comments.
Please find my replies inline.
I have also updated the FS.

Thanks,
Jayapal

On 07-Nov-2013, at 11:55 AM, Sanjeev Neelarapu <sa...@citrix.com> wrote:

> Jayapal,
> 
> I have gone through the FS posted @ https://cwiki.apache.org/confluence/display/CLOUDSTACK/Monitoring+VR+services
> 
> Following are the few review comments:
> 
> 1.	First line in the Introduction section says "Virtual router has running services which needs to run always until cloudsack disable it." What is the meaning of disable by cloudstack ? If cloudstack disables few services how the monitoring tool differentiate whether the service is disabled by cloudstack admin or its due to some failure?
It means the services should run until CloudStack instructs them to stop.
Services are enabled or disabled through the network offering. On VR boot, the monitor configuration is updated with the new services. There are also default services.
> 2.	Is monitoring VR services is optional or will be monitored always? Any ways to set whether to enable this feature or not?
Currently it is not configurable. By default, the default services such as sshd and the web server are monitored.
> 3.	Is service monitoring frequency configurable? If yes how do we configure? FS says the default value is 5 secs.
No.
> 4.	FS says monitoring VR services has two tasks.
> 1.	monitoring services in VR
> 2.	sending alerts from router to external receivers
> What external receivers we will be supporting? Also please specify what all the ways the monitoring tool indicates the failure? Are we going to use exiting Cloudstack Alerts and Events framework to indicate the failure?
This item will be updated once the approach for sending alerts from the VR is finalised.
> 5.	If multiple instances of the same processes are running do we monitor all the instances of the same process?
It monitors the parent service, whose pid is recorded in the pid file.
> 6.	After how many restarts the monitoring service decides that something is wrong with the process in bringing it up?
Five.
> 7.	After N no.of restarts if the process is still not running are we going to remove it from the monitoring processes list? If yes how the tools informs the admin that it is not able to restart the process? Or it will be restarting the process forever?
Unmonitoring a process after N retries is not supported.
The monitor logs the service failure, so the admin can learn of it only from the logs.
For this release, sending alerts from the VR is not implemented.
> 8.	Is there way for the admin to specify the tool to monitor only particular services?
Currently the services are selected based on the network offering, plus the default services from the DB.
Configuring services from the API/UI is not supported.
> 9.	Apart from dnsmasq,haproxy,sshd,apache webserver services are we not monitoring the password service(socat)? Socat process is not mentioned in the Monitoring Services section in the FS
socat is not monitored because it is automatically restarted by the password server.
> 10.	Is this supported in RVR case as well?
No.
> 11.	Specify the hypervisors supported for this feature?
Xen, KVM and VMware.
> 12.	As per my understanding this tool will be part of systemvm.iso. After upgrade from pre 4.3 release to 4.3 iso will be pushed to the hypervisors. So stop, start VR is required for the exiting VRs to get this service. Please confirm.
yes
> 13.	Please specify the expected date for confirming the scope for failure notifications. Scope is not clear from "sending alerts from router" section in FS
> 
> Thanks,
> Sanjeev
> 
> -----Original Message-----
> From: John Kinsella [mailto:jlk@stratosec.co] 
> Sent: Thursday, November 07, 2013 6:26 AM
> To: <de...@cloudstack.apache.org>
> Cc: <us...@cloudstack.apache.org>
> Subject: Re: [PROPOSAL] Service monitoring tool in virtual router
> 
> Thx for putting this together, Jayapal. A few comments:
> 
> I'd really like to have a config flag to specify if things should be restarted automatically or not. Worst case, track the restarts - if a service is restarted more than X times in Y seconds, something's obviously wrong so stop tail-chasing[1]. Personally I'm much more interested in knowing there's a problem and then taking whatever happens to be the appropriate actions for our situation.
> 
> Regarding communicating with a monitoring system - what makes more sense to me is setting up a solid framework that provides folks flexibility to use various monitoring tools, from sending an email to contacting pager duty or whatever.
> 
> So, to me there's 3 parts to that:
> 1) At VR creation, ACS calls defined hook-script which knows how to contact monitoring system to tell it about system to monitor
> 2) At boot, VR sends API query to which the mgmt server responds with a URL for an install script - VR runs that to download/setup appropriate monitoring agent
> 3) VR has standardized scripts for agent to call to find out what should be running, and then agent can go check for itself.
> 
> With a setup like this, you can support SNMP, Opsview/Nagios, Monit, NSA, Zenoss, HPOV, Tivoli, etc etc etc. I'll happily write the Opsview/Nagios module (I'm thinking module is hosted outside ACS, but I guess it could be a plugin - see earlier licensing points).
> 
> Thoughts?
> 
> Just my 2c. Happy to tweak wiki if folks lean towards this.
> 
> John
> 1: Aside - this applies to SSVM creation currently - that hamster[2] keeps trying to spin that create SSVM wheel..
> 2: Apache CloudHamster, CloudMonkey's furry monitoring friend?
> 
> On Nov 6, 2013, at 7:58 AM, Jayapal Reddy Uradi <ja...@citrix.com> wrote:
> 
>> Please find below update FS
>> https://cwiki.apache.org/confluence/display/CLOUDSTACK/Monitoring+VR+services
>> 
>> Thanks,
>> Jayapal
>> 
>> On 05-Oct-2013, at 6:54 PM, Santhosh Edukulla <sa...@citrix.com> wrote:
>> 
>>> A shell script can be used. Few thoughts below:
>>> 
>>> 1. Collect the process IDs of all the daemons you want to monitor using the "pidof" command, then use the "kill" command to check whether each pid is valid: send signal 0 with kill and check the exit status with echo $?. To send a notification, use the Linux syslog call (man 3 syslog) or the "logger" command to write to syslog. If you want to send email, check that the firewall is not blocking the outbound SMTP port. The same holds for SNMP (I mean, if firewall rules block it). Using syslog may be a good choice, as its API exposes various log levels by default.
>>> 
>>> Now, to keep the monitor script always up and running: run it through cron or at, at regular scheduled intervals. This way, even if the monitor script goes down, it is up again at the next interval.
>>> 
>>> There is a catch, though: we may get multiple pids for a given daemon if multiple daemons are spawned by the same or by multiple applications. If this scenario is not common then it's OK; otherwise we may have to track it differently, maintaining the state of each spawned daemon and checking whether it exists. If multiple applications launch the same daemon, you may also want to report which application got killed. Example: A launched httpd and, during its exit logic, kills all the daemons it launched; you may then want to report that A is not available, rather than just that httpd is not available.
>>> 
>>> 
>>> 2. Using the netstat command: check for available, listening and active ports on localhost, provided all the daemons you want to monitor run on "standard" ports or we know their listening ports. Again, this script can be scheduled through cron or at to run every x units; if it gets killed, it is up again x units later.
>>> 
>>> Also, there could be many other approaches as well.
>>> 
>>> 
>>> Thanks!
>>> Santhosh
>>> ________________________________________
>>> From: Jayapal Reddy Uradi [jayapalreddy.uradi@citrix.com]
>>> Sent: Saturday, October 05, 2013 5:17 AM
>>> To: <de...@cloudstack.apache.org>
>>> Cc: <us...@cloudstack.apache.org>
>>> Subject: Re: [PROPOSAL] Service monitoring tool in virtual router
>>> 
>>> Hi,
>>> 
>>> +users list
>>> If any one is already using any tools for monitoring then please share your ideas.
>>> Also share the cases where you experienced service crashes.
>>> 
>>> Thanks,
>>> Jayapal
>>> 
>>> On 05-Oct-2013, at 4:12 AM, Chiradeep Vittal <Ch...@citrix.com> wrote:
>>> 
>>>> Well just make sure that your script is resilient to its own crashes 
>>>> as well.
>>>> 
>>>> On 10/4/13 1:59 AM, "Jayapal Reddy Uradi" 
>>>> <ja...@citrix.com>
>>>> wrote:
>>>> 
>>>>> Hi,
>>>>> 
>>>>> I am planning to write script utility to monitor processes and 
>>>>> restart on the event of failure. It will also logs the events.
>>>>> 
>>>>> Thanks,
>>>>> Jayapal
>>>>> 
>>>>> On 02-Oct-2013, at 3:25 AM, Simon Weller <sw...@ena.com> wrote:
>>>>> 
>>>>>> supervisord maybe?
>>>>>> 
>>>>>> ----- Original Message -----
>>>>>> 
>>>>>> From: "Chiradeep Vittal" <Ch...@citrix.com>
>>>>>> To: dev@cloudstack.apache.org
>>>>>> Sent: Tuesday, October 1, 2013 4:45:56 PM
>>>>>> Subject: Re: [PROPOSAL] Service monitoring tool in virtual router
>>>>>> 
>>>>>> Got it. Any other OSS tool out there similar to monit?
>>>>>> 
>>>>>> On 10/1/13 8:24 AM, "David Nalley" <da...@gnsa.us> wrote:
>>>>>> 
>>>>>>> On Thu, Sep 26, 2013 at 1:27 AM, Chiradeep Vittal 
>>>>>>> <Ch...@citrix.com> wrote:
>>>>>>>> SNMP wouldn't restart a failed process nor would it generate alerts.
>>>>>>>> It is
>>>>>>>> simply too generic for the requirements outlined here. The 
>>>>>>>> proposal does not talk about modifying monit, just using it. 
>>>>>>>> That wouldn't trigger the AGPL.
>>>>>>> 
>>>>>>> Let me restate my objection to anything AGPL.
>>>>>>> People are largely comfortable with GPLv2 software - Linux is 
>>>>>>> ubiquitous. Many legal departments routinely prohibit GPLv3 
>>>>>>> software (we actually saw this when CS was GPLv3 licensed.) But 
>>>>>>> the Affero GPL license is anathema in many corporate 
>>>>>>> environments, and by forcing it on folks in the default System VM 
>>>>>>> I fear it will hurt adoption of CloudStack.
>>>>>>> 
>>>>>>> --David
>>>>>> 
>>>>>> 
>>>>> 
>>>> 
>>> 
>> 
> 
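
The first approach in Santhosh's message quoted above (check a pid with signal 0, then write to syslog) can be sketched in Python roughly as follows. This is only an illustration: the function names are made up for the example, the pid is assumed to have been looked up already (for instance with pidof or from a pid file), and restarting the service is left out.

```python
import os
import syslog


def pid_is_alive(pid):
    """Return True if a process with this pid exists (like `kill -0 pid`)."""
    try:
        os.kill(pid, 0)  # signal 0 only checks existence/permission; nothing is delivered
    except ProcessLookupError:
        return False  # no such process
    except PermissionError:
        return True  # process exists but belongs to another user
    return True


def check_service(name, pid):
    """Log to syslog when the service's pid is gone; return whether it is alive."""
    alive = pid_is_alive(pid)
    if not alive:
        syslog.syslog(syslog.LOG_ERR, "service %s (pid %d) is not running" % (name, pid))
    return alive
```

Run from cron at a fixed interval, as the message suggests, a script like this does not need to stay resident, which also covers the concern about the monitor itself crashing.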


RE: [PROPOSAL] Service monitoring tool in virtual router

Posted by Sanjeev Neelarapu <sa...@citrix.com>.
Jayapal,

I have gone through the FS posted @ https://cwiki.apache.org/confluence/display/CLOUDSTACK/Monitoring+VR+services

Here are a few review comments:

1.	The first line of the Introduction says "Virtual router has running services which needs to run always until cloudsack disable it." What does "disabled by CloudStack" mean? If CloudStack disables a few services, how does the monitoring tool tell whether a service was disabled by the CloudStack admin or stopped because of a failure?
2.	Is monitoring of VR services optional, or are they always monitored? Is there a way to enable or disable this feature?
3.	Is the service monitoring frequency configurable? If yes, how do we configure it? The FS says the default value is 5 secs.
4.	The FS says monitoring VR services has two tasks:
	1.	monitoring services in VR
	2.	sending alerts from router to external receivers
	Which external receivers will we support? Please also specify all the ways in which the monitoring tool indicates a failure. Are we going to use the existing CloudStack Alerts and Events framework to indicate the failure?
5.	If multiple instances of the same process are running, do we monitor all of them?
6.	After how many restarts does the monitoring service decide that something is wrong with bringing the process up?
7.	If the process is still not running after N restarts, are we going to remove it from the list of monitored processes? If yes, how does the tool inform the admin that it is unable to restart the process? Or will it keep restarting the process forever?
8.	Is there a way for the admin to tell the tool to monitor only particular services?
9.	Apart from the dnsmasq, haproxy, sshd and Apache web server services, are we not monitoring the password service (socat)? The socat process is not mentioned in the Monitoring Services section of the FS.
10.	Is this supported in the RVR case as well?
11.	Which hypervisors are supported for this feature?
12.	As I understand it, this tool will be part of systemvm.iso. After an upgrade from a pre-4.3 release to 4.3, the ISO will be pushed to the hypervisors, so existing VRs must be stopped and started to get this service. Please confirm.
13.	Please specify the expected date for confirming the scope of failure notifications. The scope is not clear from the "sending alerts from router" section of the FS.

Thanks,
Sanjeev

-----Original Message-----
From: John Kinsella [mailto:jlk@stratosec.co] 
Sent: Thursday, November 07, 2013 6:26 AM
To: <de...@cloudstack.apache.org>
Cc: <us...@cloudstack.apache.org>
Subject: Re: [PROPOSAL] Service monitoring tool in virtual router

Thx for putting this together, Jayapal. A few comments:

I'd really like to have a config flag to specify if things should be restarted automatically or not. Worst case, track the restarts - if a service is restarted more than X times in Y seconds, something's obviously wrong so stop tail-chasing[1]. Personally I'm much more interested in knowing there's a problem and then taking whatever happens to be the appropriate actions for our situation.

Regarding communicating with a monitoring system - what makes more sense to me is setting up a solid framework that provides folks flexibility to use various monitoring tools, from sending an email to contacting pager duty or whatever.

So, to me there's 3 parts to that:
1) At VR creation, ACS calls defined hook-script which knows how to contact monitoring system to tell it about system to monitor
2) At boot, VR sends API query to which the mgmt server responds with a URL for an install script - VR runs that to download/setup appropriate monitoring agent
3) VR has standardized scripts for agent to call to find out what should be running, and then agent can go check for itself.

With a setup like this, you can support SNMP, Opsview/Nagios, Monit, NSA, Zenoss, HPOV, Tivoli, etc etc etc. I'll happily write the Opsview/Nagios module (I'm thinking module is hosted outside ACS, but I guess it could be a plugin - see earlier licensing points).

Thoughts?

Just my 2c. Happy to tweak wiki if folks lean towards this.

John
1: Aside - this applies to SSVM creation currently - that hamster[2] keeps trying to spin that create SSVM wheel..
2: Apache CloudHamster, CloudMonkey's furry monitoring friend?

On Nov 6, 2013, at 7:58 AM, Jayapal Reddy Uradi <ja...@citrix.com> wrote:

> Please find below update FS
> https://cwiki.apache.org/confluence/display/CLOUDSTACK/Monitoring+VR+services
> 
> Thanks,
> Jayapal
> 
> On 05-Oct-2013, at 6:54 PM, Santhosh Edukulla <sa...@citrix.com> wrote:
> 
>> A shell script can be used. Few thoughts below:
>> 
>> 1. Collect the process id of all daemons you wanted to monitor using "pidof" of command and then use "kill" command to check if the pid you got is valid. Using kill we can send a signal 0, then check the status using echo $? . For sending a notification use linux syslog call ( man 3 syslogd) or "logger" command to send to syslog. If wanted to send email then you may also have to look for firewall not allowing outbound smtp port communiation. Even for snmp this holds same( i mean if any blocking through firewall rules ).  Using syslog may be good as it by default exposes various debug log levels through its api call.
>> 
>> Now, to keep the monitor script up always up and runninig. Keep the monitor script run continuosly through cron or at at regular\scheduled intervals. This way even if monitor script goes down, the next xth interval, it is up again. 
>> 
>> With this there is a catch though, we may got multiple pids for a given daemon provided if there are multiple daemons spawned by same\multiple applications, if this scenario is not common then its ok, otherwise we may have to track it differently maintaining state of each spawned daemon and see if it exists. If multiple applications launch the same daemon, you may also wanted to say its application which got killed. EX: A launched httpd, and during its exit logic, it is killing all daemons it launched, then you may wanted to add  A is not available, rather than just http is not available. 
>> 
>> 
>> 2.  Using  netstat command : Check for available, listening and active ports on local host, provided all the daemons you wanted to monitor are running on "standard" ports or if we know the listening ports of those deamons to be monitored. Again, this script can be added through cron\at to be scheduled to run x units, if it gets killed the next x units after the monitor script is up again. 
>> 
>> Also, there could be many other approaches as well.
>> 
>> 
>> Thanks!
>> Santhosh
>> ________________________________________
>> From: Jayapal Reddy Uradi [jayapalreddy.uradi@citrix.com]
>> Sent: Saturday, October 05, 2013 5:17 AM
>> To: <de...@cloudstack.apache.org>
>> Cc: <us...@cloudstack.apache.org>
>> Subject: Re: [PROPOSAL] Service monitoring tool in virtual router
>> 
>> Hi,
>> 
>> +users list
>> If any one is already using any tools for monitoring then please share your ideas.
>> Also share the cases where you experienced service crashes.
>> 
>> Thanks,
>> Jayapal
>> 
>> On 05-Oct-2013, at 4:12 AM, Chiradeep Vittal <Ch...@citrix.com> wrote:
>> 
>>> Well just make sure that your script is resilient to its own crashes 
>>> as well.
>>> 
>>> On 10/4/13 1:59 AM, "Jayapal Reddy Uradi" 
>>> <ja...@citrix.com>
>>> wrote:
>>> 
>>>> Hi,
>>>> 
>>>> I am planning to write script utility to monitor processes and 
>>>> restart on the event of failure. It will also logs the events.
>>>> 
>>>> Thanks,
>>>> Jayapal
>>>> 
>>>> On 02-Oct-2013, at 3:25 AM, Simon Weller <sw...@ena.com> wrote:
>>>> 
>>>>> supervisord maybe?
>>>>> 
>>>>> ----- Original Message -----
>>>>> 
>>>>> From: "Chiradeep Vittal" <Ch...@citrix.com>
>>>>> To: dev@cloudstack.apache.org
>>>>> Sent: Tuesday, October 1, 2013 4:45:56 PM
>>>>> Subject: Re: [PROPOSAL] Service monitoring tool in virtual router
>>>>> 
>>>>> Got it. Any other OSS tool out there similar to monit?
>>>>> 
>>>>> On 10/1/13 8:24 AM, "David Nalley" <da...@gnsa.us> wrote:
>>>>> 
>>>>>> On Thu, Sep 26, 2013 at 1:27 AM, Chiradeep Vittal 
>>>>>> <Ch...@citrix.com> wrote:
>>>>>>> SNMP wouldn't restart a failed process nor would it generate alerts.
>>>>>>> It is
>>>>>>> simply too generic for the requirements outlined here. The 
>>>>>>> proposal does not talk about modifying monit, just using it. 
>>>>>>> That wouldn't trigger the AGPL.
>>>>>> 
>>>>>> Let me restate my objection to anything AGPL.
>>>>>> People are largely comfortable with GPLv2 software - Linux is 
>>>>>> ubiquitous. Many legal departments routinely prohibit GPLv3 
>>>>>> software (we actually saw this when CS was GPLv3 licensed.) But 
>>>>>> the Affero GPL license is anathema in many corporate 
>>>>>> environments, and by forcing it on folks in the default System VM 
>>>>>> I fear it will hurt adoption of CloudStack.
>>>>>> 
>>>>>> --David
>>>>> 
>>>>> 
>>>> 
>>> 
>> 
> 
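
The second approach in Santhosh's message quoted above checks listening ports with netstat. A rough Python equivalent, shown here only as a sketch, is to attempt a TCP connection to the port instead of parsing netstat output; that substitution, the function name and the defaults are assumptions of this example.

```python
import socket


def port_is_listening(port, host="127.0.0.1", timeout=1.0):
    """Return True if something accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        # connect_ex returns 0 on success and an errno value otherwise
        return sock.connect_ex((host, port)) == 0
```

A connection test like this only covers TCP services. DHCP runs over UDP, so a check of this kind would not tell you whether DHCP is being served.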


Re: [PROPOSAL] Service monitoring tool in virtual router

Posted by John Kinsella <jl...@stratosec.co>.
Thx for putting this together, Jayapal. A few comments:

I'd really like to have a config flag to specify if things should be restarted automatically or not. Worst case, track the restarts - if a service is restarted more than X times in Y seconds, something's obviously wrong so stop tail-chasing[1]. Personally I'm much more interested in knowing there's a problem and then taking whatever happens to be the appropriate actions for our situation.

Regarding communicating with a monitoring system - what makes more sense to me is setting up a solid framework that provides folks flexibility to use various monitoring tools, from sending an email to contacting pager duty or whatever.

So, to me there's 3 parts to that:
1) At VR creation, ACS calls defined hook-script which knows how to contact monitoring system to tell it about system to monitor
2) At boot, VR sends API query to which the mgmt server responds with a URL for an install script - VR runs that to download/setup appropriate monitoring agent
3) VR has standardized scripts for agent to call to find out what should be running, and then agent can go check for itself.

With a setup like this, you can support SNMP, Opsview/Nagios, Monit, NSA, Zenoss, HPOV, Tivoli, etc etc etc. I'll happily write the Opsview/Nagios module (I'm thinking module is hosted outside ACS, but I guess it could be a plugin - see earlier licensing points).

Thoughts?

Just my 2c. Happy to tweak wiki if folks lean towards this.

John
1: Aside - this applies to SSVM creation currently - that hamster[2] keeps trying to spin that create SSVM wheel..
2: Apache CloudHamster, CloudMonkey's furry monitoring friend?
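
The "more than X times in Y seconds" rule from the message above could be tracked with a small sliding-window counter. The following Python sketch is one possible way to do it; the class and parameter names are hypothetical and nothing here comes from the FS.

```python
import time
from collections import deque


class RestartLimiter:
    """Allow at most max_restarts restarts within any window_seconds span."""

    def __init__(self, max_restarts, window_seconds, clock=time.monotonic):
        self.max_restarts = max_restarts
        self.window_seconds = window_seconds
        self.clock = clock  # injectable so the behaviour can be tested without sleeping
        self._restarts = deque()  # timestamps of recent restarts, oldest first

    def allow_restart(self):
        """Record and permit a restart, or return False if the limit is reached."""
        now = self.clock()
        # Forget restarts that have fallen out of the window.
        while self._restarts and now - self._restarts[0] >= self.window_seconds:
            self._restarts.popleft()
        if len(self._restarts) >= self.max_restarts:
            return False  # too many recent restarts: stop and raise an alert instead
        self._restarts.append(now)
        return True
```

When allow_restart() returns False, the monitor would stop restarting the service and only report the failure, which matches the preference for being told about a problem rather than having it retried indefinitely.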

On Nov 6, 2013, at 7:58 AM, Jayapal Reddy Uradi <ja...@citrix.com> wrote:

> Please find below update FS
> https://cwiki.apache.org/confluence/display/CLOUDSTACK/Monitoring+VR+services
> 
> Thanks,
> Jayapal
> 
> On 05-Oct-2013, at 6:54 PM, Santhosh Edukulla <sa...@citrix.com> wrote:
> 
>> A shell script can be used. Few thoughts below:
>> 
>> 1. Collect the process id of all daemons you wanted to monitor using "pidof" of command and then use "kill" command to check if the pid you got is valid. Using kill we can send a signal 0, then check the status using echo $? . For sending a notification use linux syslog call ( man 3 syslogd) or "logger" command to send to syslog. If wanted to send email then you may also have to look for firewall not allowing outbound smtp port communiation. Even for snmp this holds same( i mean if any blocking through firewall rules ).  Using syslog may be good as it by default exposes various debug log levels through its api call.
>> 
>> Now, to keep the monitor script up always up and runninig. Keep the monitor script run continuosly through cron or at at regular\scheduled intervals. This way even if monitor script goes down, the next xth interval, it is up again. 
>> 
>> With this there is a catch though, we may got multiple pids for a given daemon provided if there are multiple daemons spawned by same\multiple applications, if this scenario is not common then its ok, otherwise we may have to track it differently maintaining state of each spawned daemon and see if it exists. If multiple applications launch the same daemon, you may also wanted to say its application which got killed. EX: A launched httpd, and during its exit logic, it is killing all daemons it launched, then you may wanted to add  A is not available, rather than just http is not available. 
>> 
>> 
>> 2.  Using  netstat command : Check for available, listening and active ports on local host, provided all the daemons you wanted to monitor are running on "standard" ports or if we know the listening ports of those deamons to be monitored. Again, this script can be added through cron\at to be scheduled to run x units, if it gets killed the next x units after the monitor script is up again. 
>> 
>> Also, there could be many other approaches as well.
>> 
>> 
>> Thanks!
>> Santhosh 
>> ________________________________________
>> From: Jayapal Reddy Uradi [jayapalreddy.uradi@citrix.com]
>> Sent: Saturday, October 05, 2013 5:17 AM
>> To: <de...@cloudstack.apache.org>
>> Cc: <us...@cloudstack.apache.org>
>> Subject: Re: [PROPOSAL] Service monitoring tool in virtual router
>> 
>> Hi,
>> 
>> +users list
>> If any one is already using any tools for monitoring then please share your ideas.
>> Also share the cases where you experienced service crashes.
>> 
>> Thanks,
>> Jayapal
>> 
>> On 05-Oct-2013, at 4:12 AM, Chiradeep Vittal <Ch...@citrix.com> wrote:
>> 
>>> Well just make sure that your script is resilient to its own crashes as
>>> well.
>>> 
>>> On 10/4/13 1:59 AM, "Jayapal Reddy Uradi" <ja...@citrix.com>
>>> wrote:
>>> 
>>>> Hi,
>>>> 
>>>> I am planning to write script utility to monitor processes and restart on
>>>> the event of failure. It will also logs the events.
>>>> 
>>>> Thanks,
>>>> Jayapal
>>>> 
>>>> On 02-Oct-2013, at 3:25 AM, Simon Weller <sw...@ena.com> wrote:
>>>> 
>>>>> supervisord maybe?
>>>>> 
>>>>> ----- Original Message -----
>>>>> 
>>>>> From: "Chiradeep Vittal" <Ch...@citrix.com>
>>>>> To: dev@cloudstack.apache.org
>>>>> Sent: Tuesday, October 1, 2013 4:45:56 PM
>>>>> Subject: Re: [PROPOSAL] Service monitoring tool in virtual router
>>>>> 
>>>>> Got it. Any other OSS tool out there similar to monit?
>>>>> 
>>>>> On 10/1/13 8:24 AM, "David Nalley" <da...@gnsa.us> wrote:
>>>>> 
>>>>>> On Thu, Sep 26, 2013 at 1:27 AM, Chiradeep Vittal
>>>>>> <Ch...@citrix.com> wrote:
>>>>>>> SNMP wouldn't restart a failed process nor would it generate alerts.
>>>>>>> It is simply too generic for the requirements outlined here. The proposal
>>>>>>> does not talk about modifying monit, just using it. That wouldn't trigger
>>>>>>> the AGPL.
>>>>>> 
>>>>>> Let me restate my objection to anything AGPL.
>>>>>> People are largely comfortable with GPLv2 software - Linux is
>>>>>> ubiquitous. Many legal departments routinely prohibit GPLv3 software
>>>>>> (we actually saw this when CS was GPLv3 licensed.) But the Affero GPL
>>>>>> license is anathema in many corporate environments, and by forcing it
>>>>>> on folks in the default System VM I fear it will hurt adoption of
>>>>>> CloudStack.
>>>>>> 
>>>>>> --David
>>>>> 
>>>>> 
>>>> 
>>> 
>> 
> 


Re: [PROPOSAL] Service monitoring tool in virtual router

Posted by John Kinsella <jl...@stratosec.co>.
Thx for putting this together, Jayapal. A few comments:

I'd really like to have a config flag to specify if things should be restarted automatically or not. Worst case, track the restarts - if a service is restarted more than X times in Y seconds, something's obviously wrong so stop tail-chasing[1]. Personally I'm much more interested in knowing there's a problem and then taking whatever happens to be the appropriate actions for our situation.
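
A minimal sketch of the restart cap described above (stop after X restarts in Y seconds), assuming a POSIX shell with awk and a `date` that supports `+%s`. The function name may_restart and the one-timestamp-per-line state file are hypothetical choices for illustration, not something taken from the FS:

```shell
#!/bin/sh
# may_restart STATE_FILE MAX_RESTARTS WINDOW_SECONDS [NOW]
# Hypothetical helper, not part of CloudStack.
# Returns 0 and records a restart if fewer than MAX_RESTARTS were recorded
# in the last WINDOW_SECONDS; returns 1 once the cap has been reached.
may_restart() {
    state=$1
    max=$2
    window=$3
    now=${4:-$(date +%s)}          # NOW can be passed in, which makes testing easy
    cutoff=$((now - window))

    # Create the state file on first use.
    [ -f "$state" ] || : > "$state"

    # Forget restarts that fall outside the window.
    awk -v c="$cutoff" '$1 > c' "$state" > "$state.tmp"
    mv "$state.tmp" "$state"

    # One timestamp per line, so the line count is the recent restart count.
    count=$(awk 'END { print NR }' "$state")
    if [ "$count" -ge "$max" ]; then
        return 1                   # too many recent restarts: alert instead of restarting
    fi
    echo "$now" >> "$state"
    return 0
}
```

A monitor loop could call this before each restart attempt and, when it returns non-zero, only raise the alert and leave the service alone. The config flag for disabling automatic restarts entirely would sit in front of this check.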

Regarding communicating with a monitoring system - what makes more sense to me is setting up a solid framework that provides folks flexibility to use various monitoring tools, from sending an email to contacting pager duty or whatever.

So, to me there are 3 parts to that:
1) At VR creation, ACS calls a defined hook script which knows how to contact the monitoring system to tell it about the system to monitor
2) At boot, the VR sends an API query, to which the mgmt server responds with a URL for an install script - the VR runs that to download/set up the appropriate monitoring agent
3) The VR has standardized scripts for the agent to call to find out what should be running, and then the agent can go check for itself.

With a setup like this, you can support SNMP, Opsview/Nagios, Monit, NSA, Zenoss, HPOV, Tivoli, etc etc etc. I'll happily write the Opsview/Nagios module (I'm thinking module is hosted outside ACS, but I guess it could be a plugin - see earlier licensing points).

Thoughts?

Just my 2c. Happy to tweak wiki if folks lean towards this.

John
1: Aside - this applies to SSVM creation currently - that hamster[2] keeps trying to spin that create SSVM wheel..
2: Apache CloudHamster, CloudMonkey's furry monitoring friend?

On Nov 6, 2013, at 7:58 AM, Jayapal Reddy Uradi <ja...@citrix.com> wrote:

> Please find the updated FS below:
> https://cwiki.apache.org/confluence/display/CLOUDSTACK/Monitoring+VR+services
> 
> Thanks,
> Jayapal
> 
> On 05-Oct-2013, at 6:54 PM, Santhosh Edukulla <sa...@citrix.com> wrote:
> 
>> A shell script can be used. A few thoughts below:
>> 
>> 1. Collect the process IDs of all the daemons you want to monitor using the "pidof" command, then use the "kill" command to check whether each PID is valid: send signal 0 with kill and check the exit status using echo $?. To send a notification, use the Linux syslog call (man 3 syslog) or the "logger" command to write to syslog. If you want to send email, you may also have to check that the firewall is not blocking the outbound SMTP port. The same holds for SNMP (I mean, if anything is blocked by firewall rules). Using syslog may be good, as its API call exposes various log levels by default.
>> 
>> Now, to keep the monitor script always up and running: run it through cron or at, at regular scheduled intervals. This way, even if the monitor script goes down, it is up again at the next interval.
>> 
>> There is a catch with this, though: we may get multiple PIDs for a given daemon if several instances are spawned by the same or by multiple applications. If this scenario is not common then it's OK; otherwise we may have to track it differently, maintaining the state of each spawned daemon and checking whether it still exists. If multiple applications launch the same daemon, you may also want to report which application got killed. Example: A launched httpd, and its exit logic kills all the daemons it launched; you may then want to report that A is not available, rather than just that http is not available.
>> 
>> 
>> 2. Using the netstat command: check for available, listening and active ports on the local host, provided all the daemons you want to monitor run on "standard" ports or we know the listening ports of the daemons to be monitored. Again, this script can be scheduled through cron or at to run every x units; if it gets killed, the monitor script is up again after the next x units.
>> 
>> There could be many other approaches as well.
>> 
>> 
>> Thanks!
>> Santhosh 
>> ________________________________________
>> From: Jayapal Reddy Uradi [jayapalreddy.uradi@citrix.com]
>> Sent: Saturday, October 05, 2013 5:17 AM
>> To: <de...@cloudstack.apache.org>
>> Cc: <us...@cloudstack.apache.org>
>> Subject: Re: [PROPOSAL] Service monitoring tool in virtual router
>> 
>> Hi,
>> 
>> +users list
>> If anyone is already using any tools for monitoring, please share your ideas.
>> Also share the cases where you experienced service crashes.
>> 
>> Thanks,
>> Jayapal
>> 
>> On 05-Oct-2013, at 4:12 AM, Chiradeep Vittal <Ch...@citrix.com> wrote:
>> 
>>> Well just make sure that your script is resilient to its own crashes as
>>> well.
>>> 
>>> On 10/4/13 1:59 AM, "Jayapal Reddy Uradi" <ja...@citrix.com>
>>> wrote:
>>> 
>>>> Hi,
>>>> 
>>>> I am planning to write a script utility to monitor processes and restart
>>>> them in the event of failure. It will also log the events.
>>>> 
>>>> Thanks,
>>>> Jayapal
>>>> 
>>>> On 02-Oct-2013, at 3:25 AM, Simon Weller <sw...@ena.com> wrote:
>>>> 
>>>>> supervisord maybe?
>>>>> 
>>>>> ----- Original Message -----
>>>>> 
>>>>> From: "Chiradeep Vittal" <Ch...@citrix.com>
>>>>> To: dev@cloudstack.apache.org
>>>>> Sent: Tuesday, October 1, 2013 4:45:56 PM
>>>>> Subject: Re: [PROPOSAL] Service monitoring tool in virtual router
>>>>> 
>>>>> Got it. Any other OSS tool out there similar to monit?
>>>>> 
>>>>> On 10/1/13 8:24 AM, "David Nalley" <da...@gnsa.us> wrote:
>>>>> 
>>>>>> On Thu, Sep 26, 2013 at 1:27 AM, Chiradeep Vittal
>>>>>> <Ch...@citrix.com> wrote:
>>>>>>> SNMP wouldn't restart a failed process nor would it generate alerts. It is
>>>>>>> simply too generic for the requirements outlined here. The proposal does
>>>>>>> not talk about modifying monit, just using it. That wouldn't trigger the
>>>>>>> AGPL.
>>>>>> 
>>>>>> Let me restate my objection to anything AGPL.
>>>>>> People are largely comfortable with GPLv2 software - Linux is
>>>>>> ubiquitous. Many legal departments routinely prohibit GPLv3 software
>>>>>> (we actually saw this when CS was GPLv3 licensed.) But the Affero GPL
>>>>>> license is anathema in many corporate environments, and by forcing it
>>>>>> on folks in the default System VM I fear it will hurt adoption of
>>>>>> CloudStack.
>>>>>> 
>>>>>> --David
>>>>> 
>>>>> 
>>>> 
>>> 
>> 
> 


Re: [PROPOSAL] Service monitoring tool in virtual router

Posted by Jayapal Reddy Uradi <ja...@citrix.com>.
Please find the updated FS below:
https://cwiki.apache.org/confluence/display/CLOUDSTACK/Monitoring+VR+services

Thanks,
Jayapal

On 05-Oct-2013, at 6:54 PM, Santhosh Edukulla <sa...@citrix.com> wrote:

> A shell script can be used. A few thoughts below:
> 
> 1. Collect the process IDs of all the daemons you want to monitor using the "pidof" command, then use the "kill" command to check whether each PID is valid: send signal 0 with kill and check the exit status using echo $?. To send a notification, use the Linux syslog call (man 3 syslog) or the "logger" command to write to syslog. If you want to send email, you may also have to check that the firewall is not blocking the outbound SMTP port. The same holds for SNMP (I mean, if anything is blocked by firewall rules). Using syslog may be good, as its API call exposes various log levels by default.
> 
> Now, to keep the monitor script always up and running: run it through cron or at, at regular scheduled intervals. This way, even if the monitor script goes down, it is up again at the next interval.
> 
> There is a catch with this, though: we may get multiple PIDs for a given daemon if several instances are spawned by the same or by multiple applications. If this scenario is not common then it's OK; otherwise we may have to track it differently, maintaining the state of each spawned daemon and checking whether it still exists. If multiple applications launch the same daemon, you may also want to report which application got killed. Example: A launched httpd, and its exit logic kills all the daemons it launched; you may then want to report that A is not available, rather than just that http is not available.
> 
> 
> 2. Using the netstat command: check for available, listening and active ports on the local host, provided all the daemons you want to monitor run on "standard" ports or we know the listening ports of the daemons to be monitored. Again, this script can be scheduled through cron or at to run every x units; if it gets killed, the monitor script is up again after the next x units.
> 
> There could be many other approaches as well.
> 
> 
> Thanks!
> Santhosh 
> ________________________________________
> From: Jayapal Reddy Uradi [jayapalreddy.uradi@citrix.com]
> Sent: Saturday, October 05, 2013 5:17 AM
> To: <de...@cloudstack.apache.org>
> Cc: <us...@cloudstack.apache.org>
> Subject: Re: [PROPOSAL] Service monitoring tool in virtual router
> 
> Hi,
> 
> +users list
> If anyone is already using any tools for monitoring, please share your ideas.
> Also share the cases where you experienced service crashes.
> 
> Thanks,
> Jayapal
> 
> On 05-Oct-2013, at 4:12 AM, Chiradeep Vittal <Ch...@citrix.com> wrote:
> 
>> Well just make sure that your script is resilient to its own crashes as
>> well.
>> 
>> On 10/4/13 1:59 AM, "Jayapal Reddy Uradi" <ja...@citrix.com>
>> wrote:
>> 
>>> Hi,
>>> 
>>> I am planning to write a script utility to monitor processes and restart
>>> them in the event of failure. It will also log the events.
>>> 
>>> Thanks,
>>> Jayapal
>>> 
>>> On 02-Oct-2013, at 3:25 AM, Simon Weller <sw...@ena.com> wrote:
>>> 
>>>> supervisord maybe?
>>>> 
>>>> ----- Original Message -----
>>>> 
>>>> From: "Chiradeep Vittal" <Ch...@citrix.com>
>>>> To: dev@cloudstack.apache.org
>>>> Sent: Tuesday, October 1, 2013 4:45:56 PM
>>>> Subject: Re: [PROPOSAL] Service monitoring tool in virtual router
>>>> 
>>>> Got it. Any other OSS tool out there similar to monit?
>>>> 
>>>> On 10/1/13 8:24 AM, "David Nalley" <da...@gnsa.us> wrote:
>>>> 
>>>>> On Thu, Sep 26, 2013 at 1:27 AM, Chiradeep Vittal
>>>>> <Ch...@citrix.com> wrote:
>>>>>> SNMP wouldn't restart a failed process nor would it generate alerts. It is
>>>>>> simply too generic for the requirements outlined here. The proposal does
>>>>>> not talk about modifying monit, just using it. That wouldn't trigger the
>>>>>> AGPL.
>>>>> 
>>>>> Let me restate my objection to anything AGPL.
>>>>> People are largely comfortable with GPLv2 software - Linux is
>>>>> ubiquitous. Many legal departments routinely prohibit GPLv3 software
>>>>> (we actually saw this when CS was GPLv3 licensed.) But the Affero GPL
>>>>> license is anathema in many corporate environments, and by forcing it
>>>>> on folks in the default System VM I fear it will hurt adoption of
>>>>> CloudStack.
>>>>> 
>>>>> --David
>>>> 
>>>> 
>>> 
>> 
> 


RE: [PROPOSAL] Service monitoring tool in virtual router

Posted by Santhosh Edukulla <sa...@citrix.com>.
A shell script can be used. A few thoughts below:

1. Collect the process IDs of all the daemons you want to monitor using the "pidof" command, then use the "kill" command to check whether each PID is valid: send signal 0 with kill and check the exit status using echo $?. To send a notification, use the Linux syslog call (man 3 syslog) or the "logger" command to write to syslog. If you want to send email, you may also have to check that the firewall is not blocking the outbound SMTP port. The same holds for SNMP (I mean, if anything is blocked by firewall rules). Using syslog may be good, as its API call exposes various log levels by default.

Now, to keep the monitor script always up and running: run it through cron or at, at regular scheduled intervals. This way, even if the monitor script goes down, it is up again at the next interval.

There is a catch with this, though: we may get multiple PIDs for a given daemon if several instances are spawned by the same or by multiple applications. If this scenario is not common then it's OK; otherwise we may have to track it differently, maintaining the state of each spawned daemon and checking whether it still exists. If multiple applications launch the same daemon, you may also want to report which application got killed. Example: A launched httpd, and its exit logic kills all the daemons it launched; you may then want to report that A is not available, rather than just that http is not available.


2. Using the netstat command: check for available, listening and active ports on the local host, provided all the daemons you want to monitor run on "standard" ports or we know the listening ports of the daemons to be monitored. Again, this script can be scheduled through cron or at to run every x units; if it gets killed, the monitor script is up again after the next x units.
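
A rough sketch of both checks above, assuming a POSIX shell running as root (a non-root `kill -0` fails for processes owned by other users). The helper names is_alive, check_daemon and port_listening, the "vr-monitor" syslog tag and the daemon.err priority are hypothetical choices for illustration. The port check reads `netstat -tln` style output on stdin, so it only covers TCP listeners:

```shell
#!/bin/sh
# Hypothetical helper: succeeds if the given PID refers to a live process.
# "kill -0" delivers no signal; it only performs the existence check.
is_alive() {
    kill -0 "$1" 2>/dev/null
}

# Hypothetical helper: a daemon counts as up if at least one of the PIDs
# reported by pidof is alive; otherwise a message is sent to syslog.
check_daemon() {
    name=$1
    for pid in $(pidof "$name" 2>/dev/null); do
        if is_alive "$pid"; then
            return 0
        fi
    done
    # Tag and priority are illustrative, not taken from the thread.
    logger -t vr-monitor -p daemon.err "$name is not running" 2>/dev/null
    return 1
}

# Hypothetical helper: reads "netstat -tln" output on stdin and succeeds
# if some socket in LISTEN state is bound to the given TCP port.
port_listening() {
    awk -v p="$1" '
        $NF == "LISTEN" {
            n = split($4, a, ":")   # column 4 is the local address; the port follows the last colon
            if (a[n] == p) found = 1
        }
        END { exit (found ? 0 : 1) }
    '
}
```

For example, `check_daemon dnsmasq` or `netstat -tln | port_listening 53` could be run from a cron entry. Services that only use UDP, such as DHCP, have no LISTEN state and would need a different filter.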

There could be many other approaches as well.


Thanks!
Santhosh 
________________________________________
From: Jayapal Reddy Uradi [jayapalreddy.uradi@citrix.com]
Sent: Saturday, October 05, 2013 5:17 AM
To: <de...@cloudstack.apache.org>
Cc: <us...@cloudstack.apache.org>
Subject: Re: [PROPOSAL] Service monitoring tool in virtual router

Hi,

+users list
If anyone is already using any tools for monitoring, please share your ideas.
Also share the cases where you experienced service crashes.

Thanks,
Jayapal

On 05-Oct-2013, at 4:12 AM, Chiradeep Vittal <Ch...@citrix.com> wrote:

> Well just make sure that your script is resilient to its own crashes as
> well.
>
> On 10/4/13 1:59 AM, "Jayapal Reddy Uradi" <ja...@citrix.com>
> wrote:
>
>> Hi,
>>
>> I am planning to write a script utility to monitor processes and restart
>> them in the event of failure. It will also log the events.
>>
>> Thanks,
>> Jayapal
>>
>> On 02-Oct-2013, at 3:25 AM, Simon Weller <sw...@ena.com> wrote:
>>
>>> supervisord maybe?
>>>
>>> ----- Original Message -----
>>>
>>> From: "Chiradeep Vittal" <Ch...@citrix.com>
>>> To: dev@cloudstack.apache.org
>>> Sent: Tuesday, October 1, 2013 4:45:56 PM
>>> Subject: Re: [PROPOSAL] Service monitoring tool in virtual router
>>>
>>> Got it. Any other OSS tool out there similar to monit?
>>>
>>> On 10/1/13 8:24 AM, "David Nalley" <da...@gnsa.us> wrote:
>>>
>>>> On Thu, Sep 26, 2013 at 1:27 AM, Chiradeep Vittal
>>>> <Ch...@citrix.com> wrote:
>>>>> SNMP wouldn't restart a failed process nor would it generate alerts. It is
>>>>> simply too generic for the requirements outlined here. The proposal does
>>>>> not talk about modifying monit, just using it. That wouldn't trigger the
>>>>> AGPL.
>>>>
>>>> Let me restate my objection to anything AGPL.
>>>> People are largely comfortable with GPLv2 software - Linux is
>>>> ubiquitous. Many legal departments routinely prohibit GPLv3 software
>>>> (we actually saw this when CS was GPLv3 licensed.) But the Affero GPL
>>>> license is anathema in many corporate environments, and by forcing it
>>>> on folks in the default System VM I fear it will hurt adoption of
>>>> CloudStack.
>>>>
>>>> --David
>>>
>>>
>>
>


Re: [PROPOSAL] Service monitoring tool in virtual router

Posted by Jayapal Reddy Uradi <ja...@citrix.com>.
Hi,

+users list
If anyone is already using any tools for monitoring, please share your ideas.
Also share the cases where you experienced service crashes.

Thanks,
Jayapal

On 05-Oct-2013, at 4:12 AM, Chiradeep Vittal <Ch...@citrix.com> wrote:

> Well just make sure that your script is resilient to its own crashes as
> well.
> 
> On 10/4/13 1:59 AM, "Jayapal Reddy Uradi" <ja...@citrix.com>
> wrote:
> 
>> Hi,
>> 
>> I am planning to write a script utility to monitor processes and restart
>> them in the event of failure. It will also log the events.
>> 
>> Thanks,
>> Jayapal
>> 
>> On 02-Oct-2013, at 3:25 AM, Simon Weller <sw...@ena.com> wrote:
>> 
>>> supervisord maybe?
>>> 
>>> ----- Original Message -----
>>> 
>>> From: "Chiradeep Vittal" <Ch...@citrix.com>
>>> To: dev@cloudstack.apache.org
>>> Sent: Tuesday, October 1, 2013 4:45:56 PM
>>> Subject: Re: [PROPOSAL] Service monitoring tool in virtual router
>>> 
>>> Got it. Any other OSS tool out there similar to monit?
>>> 
>>> On 10/1/13 8:24 AM, "David Nalley" <da...@gnsa.us> wrote:
>>> 
>>>> On Thu, Sep 26, 2013 at 1:27 AM, Chiradeep Vittal
>>>> <Ch...@citrix.com> wrote:
>>>>> SNMP wouldn't restart a failed process nor would it generate alerts. It is
>>>>> simply too generic for the requirements outlined here. The proposal does
>>>>> not talk about modifying monit, just using it. That wouldn't trigger the
>>>>> AGPL.
>>>> 
>>>> Let me restate my objection to anything AGPL.
>>>> People are largely comfortable with GPLv2 software - Linux is
>>>> ubiquitous. Many legal departments routinely prohibit GPLv3 software
>>>> (we actually saw this when CS was GPLv3 licensed.) But the Affero GPL
>>>> license is anathema in many corporate environments, and by forcing it
>>>> on folks in the default System VM I fear it will hurt adoption of
>>>> CloudStack. 
>>>> 
>>>> --David 
>>> 
>>> 
>> 
> 


Re: [PROPOSAL] Service monitoring tool in virtual router

Posted by Chiradeep Vittal <Ch...@citrix.com>.
Well just make sure that your script is resilient to its own crashes as
well.

On 10/4/13 1:59 AM, "Jayapal Reddy Uradi" <ja...@citrix.com>
wrote:

>Hi,
>
>I am planning to write a script utility to monitor processes and restart
>them in the event of failure. It will also log the events.
>
>Thanks,
>Jayapal
>
>On 02-Oct-2013, at 3:25 AM, Simon Weller <sw...@ena.com> wrote:
>
>> supervisord maybe?
>> 
>> ----- Original Message -----
>> 
>> From: "Chiradeep Vittal" <Ch...@citrix.com>
>> To: dev@cloudstack.apache.org
>> Sent: Tuesday, October 1, 2013 4:45:56 PM
>> Subject: Re: [PROPOSAL] Service monitoring tool in virtual router
>> 
>> Got it. Any other OSS tool out there similar to monit?
>> 
>> On 10/1/13 8:24 AM, "David Nalley" <da...@gnsa.us> wrote:
>> 
>>> On Thu, Sep 26, 2013 at 1:27 AM, Chiradeep Vittal
>>> <Ch...@citrix.com> wrote:
>>>> SNMP wouldn't restart a failed process nor would it generate alerts. It is
>>>> simply too generic for the requirements outlined here. The proposal does
>>>> not talk about modifying monit, just using it. That wouldn't trigger the
>>>> AGPL.
>>> 
>>> Let me restate my objection to anything AGPL.
>>> People are largely comfortable with GPLv2 software - Linux is
>>> ubiquitous. Many legal departments routinely prohibit GPLv3 software
>>> (we actually saw this when CS was GPLv3 licensed.) But the Affero GPL
>>> license is anathema in many corporate environments, and by forcing it
>>> on folks in the default System VM I fear it will hurt adoption of
>>> CloudStack. 
>>> 
>>> --David 
>> 
>> 
>


Re: [PROPOSAL] Service monitoring tool in virtual router

Posted by Jayapal Reddy Uradi <ja...@citrix.com>.
Hi,

I am planning to write a script utility to monitor processes and restart
them in the event of failure. It will also log the events.

Thanks,
Jayapal

On 02-Oct-2013, at 3:25 AM, Simon Weller <sw...@ena.com> wrote:

> supervisord maybe? 
> 
> ----- Original Message -----
> 
> From: "Chiradeep Vittal" <Ch...@citrix.com> 
> To: dev@cloudstack.apache.org 
> Sent: Tuesday, October 1, 2013 4:45:56 PM 
> Subject: Re: [PROPOSAL] Service monitoring tool in virtual router 
> 
> Got it. Any other OSS tool out there similar to monit? 
> 
> On 10/1/13 8:24 AM, "David Nalley" <da...@gnsa.us> wrote: 
> 
>> On Thu, Sep 26, 2013 at 1:27 AM, Chiradeep Vittal 
>> <Ch...@citrix.com> wrote: 
>>> SNMP wouldn't restart a failed process nor would it generate alerts. It is
>>> simply too generic for the requirements outlined here. The proposal does
>>> not talk about modifying monit, just using it. That wouldn't trigger the
>>> AGPL.
>> 
>> Let me restate my objection to anything AGPL. 
>> People are largely comfortable with GPLv2 software - Linux is 
>> ubiquitous. Many legal departments routinely prohibit GPLv3 software 
>> (we actually saw this when CS was GPLv3 licensed.) But the Affero GPL 
>> license is anathema in many corporate environments, and by forcing it 
>> on folks in the default System VM I fear it will hurt adoption of 
>> CloudStack. 
>> 
>> --David 
> 
> 


Re: [PROPOSAL] Service monitoring tool in virtual router

Posted by Simon Weller <sw...@ena.com>.
supervisord maybe? 

----- Original Message -----

From: "Chiradeep Vittal" <Ch...@citrix.com> 
To: dev@cloudstack.apache.org 
Sent: Tuesday, October 1, 2013 4:45:56 PM 
Subject: Re: [PROPOSAL] Service monitoring tool in virtual router 

Got it. Any other OSS tool out there similar to monit? 

On 10/1/13 8:24 AM, "David Nalley" <da...@gnsa.us> wrote: 

>On Thu, Sep 26, 2013 at 1:27 AM, Chiradeep Vittal 
><Ch...@citrix.com> wrote: 
>> SNMP wouldn't restart a failed process nor would it generate alerts. It is
>> simply too generic for the requirements outlined here. The proposal does
>> not talk about modifying monit, just using it. That wouldn't trigger the
>> AGPL.
> 
>Let me restate my objection to anything AGPL. 
>People are largely comfortable with GPLv2 software - Linux is 
>ubiquitous. Many legal departments routinely prohibit GPLv3 software 
>(we actually saw this when CS was GPLv3 licensed.) But the Affero GPL 
>license is anathema in many corporate environments, and by forcing it 
>on folks in the default System VM I fear it will hurt adoption of 
>CloudStack. 
> 
>--David 



Re: [PROPOSAL] Service monitoring tool in virtual router

Posted by Chiradeep Vittal <Ch...@citrix.com>.
Got it. Any other OSS tool out there similar to monit?

On 10/1/13 8:24 AM, "David Nalley" <da...@gnsa.us> wrote:

>On Thu, Sep 26, 2013 at 1:27 AM, Chiradeep Vittal
><Ch...@citrix.com> wrote:
>> SNMP wouldn't restart a failed process nor would it generate alerts. It
>>is
>> simply too generic for the requirements outlined here. The proposal does
>> not talk about modifying monit, just using it. That wouldn't trigger the
>> AGPL.
>
>Let me restate my objection to anything AGPL.
>People are largely comfortable with GPLv2 software - Linux is
>ubiquitous. Many legal departments routinely prohibit GPLv3 software
>(we actually saw this when CS was GPLv3 licensed.) But the Affero GPL
>license is anathema in many corporate environments, and by forcing it
>on folks in the default System VM I fear it will hurt adoption of
>CloudStack.
>
>--David


Re: [PROPOSAL] Service monitoring tool in virtual router

Posted by David Nalley <da...@gnsa.us>.
On Thu, Sep 26, 2013 at 1:27 AM, Chiradeep Vittal
<Ch...@citrix.com> wrote:
> SNMP wouldn't restart a failed process nor would it generate alerts. It is
> simply too generic for the requirements outlined here. The proposal does
> not talk about modifying monit, just using it. That wouldn't trigger the
> AGPL.

Let me restate my objection to anything AGPL.
People are largely comfortable with GPLv2 software - Linux is
ubiquitous. Many legal departments routinely prohibit GPLv3 software
(we actually saw this when CS was GPLv3 licensed.) But the Affero GPL
license is anathema in many corporate environments, and by forcing it
on folks in the default System VM I fear it will hurt adoption of
CloudStack.

--David

Re: [PROPOSAL] Service monitoring tool in virtual router

Posted by Jayapal Reddy Uradi <ja...@citrix.com>.
The current scope is limited to VR.
If a service fails to restart after a certain number of cycles, monit will time out and log the event. In that case the admin has to intervene, fix the issue in the service, and add it to monit again.
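
The behaviour described above can be sketched roughly as follows. This is only an illustration in Python, not monit's implementation or configuration syntax; the names supervise, is_running, restart and the limit max_restarts are hypothetical.

```python
from typing import Callable, List


def supervise(name: str,
              is_running: Callable[[], bool],
              restart: Callable[[], None],
              max_restarts: int = 3) -> List[str]:
    """Try to bring a service back up, giving up after max_restarts attempts.

    Returns the events that would be logged, in order.
    """
    events: List[str] = []
    if is_running():
        # Service is healthy, nothing to log.
        return events
    events.append(f"{name}: service is down")
    for attempt in range(1, max_restarts + 1):
        restart()
        if is_running():
            events.append(f"{name}: restarted on attempt {attempt}")
            return events
        events.append(f"{name}: restart attempt {attempt} failed")
    # Retry budget exhausted: stop trying and leave it to the admin.
    events.append(f"{name}: giving up after {max_restarts} attempts, admin action needed")
    return events
```

The last event is the point at which the admin would have to step in; the alert pushed to the MS could be generated from the same event list.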

Thanks,
Jayapal

On 01-Oct-2013, at 11:16 AM, Koushik Das <ko...@citrix.com> wrote:

> This is a very useful feature. Can this be extended to the other system VMs? SSVM and CPVM
> 
> Based on the discussion I see that there is an assumption that restarting services/rebooting should fix the issues. Is that always true? What if the service fails to restart after repeated attempts? What is the fallback?
> 
> -Koushik
> 
> 
> On 01-Oct-2013, at 3:15 AM, Chiradeep Vittal <Ch...@citrix.com> wrote:
> 
>> Good idea. If x and y and z are borked, initiate shutdown?
>> 
>> More generically, it seems we need some form of in-VM automation that can
>> co-ordinate with top-level orchestration
>> 
>> On 9/28/13 4:14 AM, "Daan Hoogland" <da...@gmail.com> wrote:
>> 
>>> Even when always restarting on every glitch we need to monitor the inside
>>> of the vr to know when to restart/respin a new vr. There is much
>>> functionality present on the vr an for us it is not possible to say for
>>> sure what is important to a customer installation so the admin should be
>>> able to define the minimal reqs that will stop us from spinning up a new
>>> vr. And there must be tools present for monitoring these reqs.
>>> 
>>> makes sense?
>>> 
>>> 
>>> On Thu, Sep 26, 2013 at 10:01 PM, David Nalley <da...@gnsa.us> wrote:
>>> 
>>>> For what it's worth we created an ACS-specific MIB (beneath the
>>>> org.apache MIB) so really this is just a matter of defining and
>>>> publishing it.
>>>> 
>>>> But lets think about monit being used to restart services - with HA,
>>>> Redundant VR, are we sure that we want to inject yet another point of
>>>> control into things? Is it better to just respawn an instance since
>>>> they are essentially stateless? I don't know, but management server,
>>>> local daemons, and other SysVMs making decisions seems like we are
>>>> increasing complexity.
>>>> 
>>>> --David
>>>> 
>>>> On Thu, Sep 26, 2013 at 10:31 AM, Chiradeep Vittal
>>>> <Ch...@citrix.com> wrote:
>>>>> In this case you would have to invent another enterprise MIB. Not too
>>>>> hard, but I'd argue that it needs to be proxied through some other
>>>> service
>>>>> anyway and it represents a different integration point with ACS.
>>>> Depends
>>>>> on whether you consider the system vm part of the ACS deployment, or
>>>> an
>>>>> entity like a host.
>>>>> 
>>>>> On 9/26/13 10:27 AM, "Alex Huang" <Al...@citrix.com> wrote:
>>>>> 
>>>>>> Using SNMP for alert notification is not a bad idea though.  I don't
>>>> see
>>>>>> why we can't do that instead of posting to the management server.
>>>> This
>>>>>> is specifically referring to the second part of the proposal.  Why
>>>>>> reinvent that part of it?
>>>>>> 
>>>>>> --Alex
>>>>>> 
>>>>>>> -----Original Message-----
>>>>>>> From: Chiradeep Vittal [mailto:Chiradeep.Vittal@citrix.com]
>>>>>>> Sent: Wednesday, September 25, 2013 10:28 PM
>>>>>>> To: dev@cloudstack.apache.org
>>>>>>> Subject: Re: [PROPOSAL] Service monitoring tool in virtual router
>>>>>>> 
>>>>>>> SNMP wouldn't restart a failed process nor would it generate
>>>> alerts. It
>>>>>>> is
>>>>>>> simply too generic for the requirements outlined here. The proposal
>>>> does
>>>>>>> not talk about modifying monit, just using it. That wouldn't trigger
>>>>>>> the AGPL.
>>>>>>> I think the idea is to have a tight monitoring loop that scales: so
>>>>>>> executing the
>>>>>>> monitoring loop in-situ makes sense.
>>>>>>> 
>>>>>>> 
>>>>>>> On 9/25/13 9:53 PM, "David Nalley" <da...@gnsa.us> wrote:
>>>>>>> 
>>>>>>>> On Wed, Sep 25, 2013 at 9:30 AM, Jayapal Reddy Uradi
>>>>>>>> <ja...@citrix.com> wrote:
>>>>>>>>> Hi,
>>>>>>>>> 
>>>>>>>>> Currently in virtual router there is no way to recover and
>>>> notify if
>>>>>>>>> some service goes down unexpectedly.
>>>>>>>>> 
>>>>>>>>> This feature is about monitoring all the services rendered by the
>>>>>>>>> virtual router, ensure that the services are running through the
>>>> life
>>>>>>>>> time of the VR.
>>>>>>>>> 
>>>>>>>>> On service failure:
>>>>>>>>> 1. Generate an alert and event indicating failure 2. Restart the
>>>>>>>>> service
>>>>>>>>> 
>>>>>>>>> Services to be monitored:
>>>>>>>>> DHCP, DNS, haproxy, password server etc.
>>>>>>>>> 
>>>>>>>>> As part of monitoring there are two activities
>>>>>>>>> 
>>>>>>>>> 1. One is monitoring the services in VR and log the events. Using
>>>>>>>>> monit for monitoring services  2. Second part is pushing alerts
>>>> from
>>>>>>>>> router to  MS server. Thinking on POST the logs to web server in
>>>> MS.
>>>>>>>>> 
>>>>>>>>> I will be updating more details and FS in this thread.
>>>>>>>>> 
>>>>>>>>> I created enhancement bug for this.
>>>>>>>>> https://issues.apache.org/jira/browse/CLOUDSTACK-4736
>>>>>>>>> 
>>>>>>>>> Thanks,
>>>>>>>>> Jayapal
>>>>>>>> 
>>>>>>>> So several things - why not make this via SNMP? Query processes,
>>>> and
>>>>>>>> many other things. This should be relatively simple, is well known,
>>>> can
>>>>>>>> be locked down (or could be monitored for many other things by
>>>> external
>>>>>>>> monitoring packages) and is the defacto standard for monitoring
>>>> hosts.
>>>>>>>> Second - monit is Affero GPL licensed - which is a cat-x license.
>>>>>>>> While I expect that we would merely use this and not do any
>>>> hacking on
>>>>>>>> it - I think its inclusion might be a surprise (and forbidden in
>>>> many
>>>>>>>> environments) to our users
>>>>>>>> 
>>>>>>>> --David
>>>>>> 
>>>>> 
>>>> 
>> 
> 


Re: [PROPOSAL] Service monitoring tool in virtual router

Posted by Koushik Das <ko...@citrix.com>.
This is a very useful feature. Can this be extended to the other system VMs (SSVM and CPVM)?

Based on the discussion I see that there is an assumption that restarting services/rebooting should fix the issues. Is that always true? What if the service fails to restart after repeated attempts? What is the fallback?

-Koushik


On 01-Oct-2013, at 3:15 AM, Chiradeep Vittal <Ch...@citrix.com> wrote:

> Good idea. If x and y and z are borked, initiate shutdown?
> 
> More generically, it seems we need some form of in-VM automation that can
> co-ordinate with top-level orchestration
> 
> On 9/28/13 4:14 AM, "Daan Hoogland" <da...@gmail.com> wrote:
> 
>> Even when always restarting on every glitch we need to monitor the inside
>> of the vr to know when to restart/respin a new vr. There is much
>> functionality present on the vr an for us it is not possible to say for
>> sure what is important to a customer installation so the admin should be
>> able to define the minimal reqs that will stop us from spinning up a new
>> vr. And there must be tools present for monitoring these reqs.
>> 
>> makes sense?
>> 
>> 
>> On Thu, Sep 26, 2013 at 10:01 PM, David Nalley <da...@gnsa.us> wrote:
>> 
>>> For what it's worth we created an ACS-specific MIB (beneath the
>>> org.apache MIB) so really this is just a matter of defining and
>>> publishing it.
>>> 
>>> But lets think about monit being used to restart services - with HA,
>>> Redundant VR, are we sure that we want to inject yet another point of
>>> control into things? Is it better to just respawn an instance since
>>> they are essentially stateless? I don't know, but management server,
>>> local daemons, and other SysVMs making decisions seems like we are
>>> increasing complexity.
>>> 
>>> --David
>>> 
>>> On Thu, Sep 26, 2013 at 10:31 AM, Chiradeep Vittal
>>> <Ch...@citrix.com> wrote:
>>>> In this case you would have to invent another enterprise MIB. Not too
>>>> hard, but I'd argue that it needs to be proxied through some other
>>> service
>>>> anyway and it represents a different integration point with ACS.
>>> Depends
>>>> on whether you consider the system vm part of the ACS deployment, or
>>> an
>>>> entity like a host.
>>>> 
>>>> On 9/26/13 10:27 AM, "Alex Huang" <Al...@citrix.com> wrote:
>>>> 
>>>>> Using SNMP for alert notification is not a bad idea though.  I don't
>>> see
>>>>> why we can't do that instead of posting to the management server.
>>> This
>>>>> is specifically referring to the second part of the proposal.  Why
>>>>> reinvent that part of it?
>>>>> 
>>>>> --Alex
>>>>> 
>>>>>> -----Original Message-----
>>>>>> From: Chiradeep Vittal [mailto:Chiradeep.Vittal@citrix.com]
>>>>>> Sent: Wednesday, September 25, 2013 10:28 PM
>>>>>> To: dev@cloudstack.apache.org
>>>>>> Subject: Re: [PROPOSAL] Service monitoring tool in virtual router
>>>>>> 
>>>>>> SNMP wouldn't restart a failed process nor would it generate
>>> alerts. It
>>>>>> is
>>>>>> simply too generic for the requirements outlined here. The proposal
>>> does
>>>>>> not talk about modifying monit, just using it. That wouldn't trigger
>>>>>> the AGPL.
>>>>>> I think the idea is to have a tight monitoring loop that scales: so
>>>>>> executing the
>>>>>> monitoring loop in-situ makes sense.
>>>>>> 
>>>>>> 
>>>>>> On 9/25/13 9:53 PM, "David Nalley" <da...@gnsa.us> wrote:
>>>>>> 
>>>>>>> On Wed, Sep 25, 2013 at 9:30 AM, Jayapal Reddy Uradi
>>>>>>> <ja...@citrix.com> wrote:
>>>>>>>> Hi,
>>>>>>>> 
>>>>>>>> Currently in virtual router there is no way to recover and
>>> notify if
>>>>>>>> some service goes down unexpectedly.
>>>>>>>> 
>>>>>>>> This feature is about monitoring all the services rendered by the
>>>>>>>> virtual router, ensure that the services are running through the
>>> life
>>>>>>>> time of the VR.
>>>>>>>> 
>>>>>>>> On service failure:
>>>>>>>> 1. Generate an alert and event indicating failure 2. Restart the
>>>>>>>> service
>>>>>>>> 
>>>>>>>> Services to be monitored:
>>>>>>>> DHCP, DNS, haproxy, password server etc.
>>>>>>>> 
>>>>>>>> As part of monitoring there are two activities
>>>>>>>> 
>>>>>>>> 1. One is monitoring the services in VR and log the events. Using
>>>>>>>> monit for monitoring services  2. Second part is pushing alerts
>>> from
>>>>>>>> router to  MS server. Thinking on POST the logs to web server in
>>> MS.
>>>>>>>> 
>>>>>>>> I will be updating more details and FS in this thread.
>>>>>>>> 
>>>>>>>> I created enhancement bug for this.
>>>>>>>> https://issues.apache.org/jira/browse/CLOUDSTACK-4736
>>>>>>>> 
>>>>>>>> Thanks,
>>>>>>>> Jayapal
>>>>>>> 
>>>>>>> So several things - why not make this via SNMP? Query processes,
>>> and
>>>>>>> many other things. This should be relatively simple, is well known,
>>> can
>>>>>>> be locked down (or could be monitored for many other things by
>>> external
>>>>>>> monitoring packages) and is the defacto standard for monitoring
>>> hosts.
>>>>>>> Second - monit is Affero GPL licensed - which is a cat-x license.
>>>>>>> While I expect that we would merely use this and not do any
>>> hacking on
>>>>>>> it - I think its inclusion might be a surprise (and forbidden in
>>> many
>>>>>>> environments) to our users
>>>>>>> 
>>>>>>> --David
>>>>> 
>>>> 
>>> 
> 


Re: [PROPOSAL] Service monitoring tool in virtual router

Posted by Chiradeep Vittal <Ch...@citrix.com>.
Good idea. If x and y and z are borked, initiate shutdown?

More generically, it seems we need some form of in-VM automation that can
co-ordinate with top-level orchestration.

On 9/28/13 4:14 AM, "Daan Hoogland" <da...@gmail.com> wrote:

>Even when always restarting on every glitch we need to monitor the inside
>of the vr to know when to restart/respin a new vr. There is much
>functionality present on the vr an for us it is not possible to say for
>sure what is important to a customer installation so the admin should be
>able to define the minimal reqs that will stop us from spinning up a new
>vr. And there must be tools present for monitoring these reqs.
>
>makes sense?
>
>
>On Thu, Sep 26, 2013 at 10:01 PM, David Nalley <da...@gnsa.us> wrote:
>
>> For what it's worth we created an ACS-specific MIB (beneath the
>> org.apache MIB) so really this is just a matter of defining and
>> publishing it.
>>
>> But lets think about monit being used to restart services - with HA,
>> Redundant VR, are we sure that we want to inject yet another point of
>> control into things? Is it better to just respawn an instance since
>> they are essentially stateless? I don't know, but management server,
>> local daemons, and other SysVMs making decisions seems like we are
>> increasing complexity.
>>
>> --David
>>
>> On Thu, Sep 26, 2013 at 10:31 AM, Chiradeep Vittal
>> <Ch...@citrix.com> wrote:
>> > In this case you would have to invent another enterprise MIB. Not too
>> > hard, but I'd argue that it needs to be proxied through some other
>> service
>> > anyway and it represents a different integration point with ACS.
>>Depends
>> > on whether you consider the system vm part of the ACS deployment, or
>>an
>> > entity like a host.
>> >
>> > On 9/26/13 10:27 AM, "Alex Huang" <Al...@citrix.com> wrote:
>> >
>> >>Using SNMP for alert notification is not a bad idea though.  I don't
>>see
>> >>why we can't do that instead of posting to the management server.
>>This
>> >>is specifically referring to the second part of the proposal.  Why
>> >>reinvent that part of it?
>> >>
>> >>--Alex
>> >>
>> >>> -----Original Message-----
>> >>> From: Chiradeep Vittal [mailto:Chiradeep.Vittal@citrix.com]
>> >>> Sent: Wednesday, September 25, 2013 10:28 PM
>> >>> To: dev@cloudstack.apache.org
>> >>> Subject: Re: [PROPOSAL] Service monitoring tool in virtual router
>> >>>
>> >>> SNMP wouldn't restart a failed process nor would it generate
>>alerts. It
>> >>>is
>> >>> simply too generic for the requirements outlined here. The proposal
>> does
>> >>> not talk about modifying monit, just using it. That wouldn't trigger
>> >>>the AGPL.
>> >>> I think the idea is to have a tight monitoring loop that scales: so
>> >>>executing the
>> >>> monitoring loop in-situ makes sense.
>> >>>
>> >>>
>> >>> On 9/25/13 9:53 PM, "David Nalley" <da...@gnsa.us> wrote:
>> >>>
>> >>> >On Wed, Sep 25, 2013 at 9:30 AM, Jayapal Reddy Uradi
>> >>> ><ja...@citrix.com> wrote:
>> >>> >> Hi,
>> >>> >>
>> >>> >> Currently in virtual router there is no way to recover and
>>notify if
>> >>> >>some service goes down unexpectedly.
>> >>> >>
>> >>> >> This feature is about monitoring all the services rendered by the
>> >>> >>virtual router, ensure that the services are running through the
>>life
>> >>> >>time of the VR.
>> >>> >>
>> >>> >> On service failure:
>> >>> >> 1. Generate an alert and event indicating failure 2. Restart the
>> >>> >> service
>> >>> >>
>> >>> >> Services to be monitored:
>> >>> >> DHCP, DNS, haproxy, password server etc.
>> >>> >>
>> >>> >> As part of monitoring there are two activities
>> >>> >>
>> >>> >> 1. One is monitoring the services in VR and log the events. Using
>> >>> >>monit for monitoring services  2. Second part is pushing alerts
>>from
>> >>> >>router to  MS server. Thinking on POST the logs to web server in
>>MS.
>> >>> >>
>> >>> >> I will be updating more details and FS in this thread.
>> >>> >>
>> >>> >> I created enhancement bug for this.
>> >>> >> https://issues.apache.org/jira/browse/CLOUDSTACK-4736
>> >>> >>
>> >>> >> Thanks,
>> >>> >> Jayapal
>> >>> >
>> >>> >So several things - why not make this via SNMP? Query processes,
>>and
>> >>> >many other things. This should be relatively simple, is well known,
>> can
>> >>> >be locked down (or could be monitored for many other things by
>> external
>> >>> >monitoring packages) and is the defacto standard for monitoring
>>hosts.
>> >>> >Second - monit is Affero GPL licensed - which is a cat-x license.
>> >>> >While I expect that we would merely use this and not do any
>>hacking on
>> >>> >it - I think its inclusion might be a surprise (and forbidden in
>>many
>> >>> >environments) to our users
>> >>> >
>> >>> >--David
>> >>
>> >
>>


Re: [PROPOSAL] Service monitoring tool in virtual router

Posted by Daan Hoogland <da...@gmail.com>.
Even when always restarting on every glitch we need to monitor the inside
of the vr to know when to restart/respin a new vr. There is much
functionality present on the vr and for us it is not possible to say for
sure what is important to a customer installation so the admin should be
able to define the minimal reqs that will stop us from spinning up a new
vr. And there must be tools present for monitoring these reqs.

Does that make sense?
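
A minimal sketch of the idea, assuming the admin supplies a list of required services and some in-VR monitor reports a per-service health status. The function name needs_respin and the service names are hypothetical and not part of CloudStack.

```python
from typing import Dict, Iterable


def needs_respin(required: Iterable[str], status: Dict[str, bool]) -> bool:
    """Return True if any service the admin marked as required is unhealthy.

    A required service that is missing from `status` counts as unhealthy.
    Services that are down but not required do not trigger a respin.
    """
    return any(not status.get(service, False) for service in required)


# Example: haproxy is down, but only dhcp and dns are required here,
# so this VR would be left alone.
status = {"dhcp": True, "dns": True, "haproxy": False}
print(needs_respin(["dhcp", "dns"], status))
```

Keeping the list of required services as admin-supplied data, rather than hard-coding it, is what lets each installation decide which failures justify a new vr.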


On Thu, Sep 26, 2013 at 10:01 PM, David Nalley <da...@gnsa.us> wrote:

> For what it's worth we created an ACS-specific MIB (beneath the
> org.apache MIB) so really this is just a matter of defining and
> publishing it.
>
> But lets think about monit being used to restart services - with HA,
> Redundant VR, are we sure that we want to inject yet another point of
> control into things? Is it better to just respawn an instance since
> they are essentially stateless? I don't know, but management server,
> local daemons, and other SysVMs making decisions seems like we are
> increasing complexity.
>
> --David
>
> On Thu, Sep 26, 2013 at 10:31 AM, Chiradeep Vittal
> <Ch...@citrix.com> wrote:
> > In this case you would have to invent another enterprise MIB. Not too
> > hard, but I'd argue that it needs to be proxied through some other
> service
> > anyway and it represents a different integration point with ACS. Depends
> > on whether you consider the system vm part of the ACS deployment, or an
> > entity like a host.
> >
> > On 9/26/13 10:27 AM, "Alex Huang" <Al...@citrix.com> wrote:
> >
> >>Using SNMP for alert notification is not a bad idea though.  I don't see
> >>why we can't do that instead of posting to the management server.  This
> >>is specifically referring to the second part of the proposal.  Why
> >>reinvent that part of it?
> >>
> >>--Alex
> >>
> >>> -----Original Message-----
> >>> From: Chiradeep Vittal [mailto:Chiradeep.Vittal@citrix.com]
> >>> Sent: Wednesday, September 25, 2013 10:28 PM
> >>> To: dev@cloudstack.apache.org
> >>> Subject: Re: [PROPOSAL] Service monitoring tool in virtual router
> >>>
> >>> SNMP wouldn't restart a failed process nor would it generate alerts. It
> >>>is
> >>> simply too generic for the requirements outlined here. The proposal
> does
> >>> not talk about modifying monit, just using it. That wouldn't trigger
> >>>the AGPL.
> >>> I think the idea is to have a tight monitoring loop that scales: so
> >>>executing the
> >>> monitoring loop in-situ makes sense.
> >>>
> >>>
> >>> On 9/25/13 9:53 PM, "David Nalley" <da...@gnsa.us> wrote:
> >>>
> >>> >On Wed, Sep 25, 2013 at 9:30 AM, Jayapal Reddy Uradi
> >>> ><ja...@citrix.com> wrote:
> >>> >> Hi,
> >>> >>
> >>> >> Currently in virtual router there is no way to recover and notify if
> >>> >>some service goes down unexpectedly.
> >>> >>
> >>> >> This feature is about monitoring all the services rendered by the
> >>> >>virtual router, ensure that the services are running through the life
> >>> >>time of the VR.
> >>> >>
> >>> >> On service failure:
> >>> >> 1. Generate an alert and event indicating failure 2. Restart the
> >>> >> service
> >>> >>
> >>> >> Services to be monitored:
> >>> >> DHCP, DNS, haproxy, password server etc.
> >>> >>
> >>> >> As part of monitoring there are two activities
> >>> >>
> >>> >> 1. One is monitoring the services in VR and log the events. Using
> >>> >>monit for monitoring services  2. Second part is pushing alerts from
> >>> >>router to  MS server. Thinking on POST the logs to web server in MS.
> >>> >>
> >>> >> I will be updating more details and FS in this thread.
> >>> >>
> >>> >> I created enhancement bug for this.
> >>> >> https://issues.apache.org/jira/browse/CLOUDSTACK-4736
> >>> >>
> >>> >> Thanks,
> >>> >> Jayapal
> >>> >
> >>> >So several things - why not make this via SNMP? Query processes, and
> >>> >many other things. This should be relatively simple, is well known,
> can
> >>> >be locked down (or could be monitored for many other things by
> external
> >>> >monitoring packages) and is the defacto standard for monitoring hosts.
> >>> >Second - monit is Affero GPL licensed - which is a cat-x license.
> >>> >While I expect that we would merely use this and not do any hacking on
> >>> >it - I think its inclusion might be a surprise (and forbidden in many
> >>> >environments) to our users
> >>> >
> >>> >--David
> >>
> >
>

Re: [PROPOSAL] Service monitoring tool in virtual router

Posted by David Nalley <da...@gnsa.us>.
For what it's worth we created an ACS-specific MIB (beneath the
org.apache MIB) so really this is just a matter of defining and
publishing it.

But let's think about monit being used to restart services - with HA,
Redundant VR, are we sure that we want to inject yet another point of
control into things? Is it better to just respawn an instance since
they are essentially stateless? I don't know, but management server,
local daemons, and other SysVMs making decisions seems like we are
increasing complexity.

--David

On Thu, Sep 26, 2013 at 10:31 AM, Chiradeep Vittal
<Ch...@citrix.com> wrote:
> In this case you would have to invent another enterprise MIB. Not too
> hard, but I'd argue that it needs to be proxied through some other service
> anyway and it represents a different integration point with ACS. Depends
> on whether you consider the system vm part of the ACS deployment, or an
> entity like a host.
>
> On 9/26/13 10:27 AM, "Alex Huang" <Al...@citrix.com> wrote:
>
>>Using SNMP for alert notification is not a bad idea though.  I don't see
>>why we can't do that instead of posting to the management server.  This
>>is specifically referring to the second part of the proposal.  Why
>>reinvent that part of it?
>>
>>--Alex
>>
>>> -----Original Message-----
>>> From: Chiradeep Vittal [mailto:Chiradeep.Vittal@citrix.com]
>>> Sent: Wednesday, September 25, 2013 10:28 PM
>>> To: dev@cloudstack.apache.org
>>> Subject: Re: [PROPOSAL] Service monitoring tool in virtual router
>>>
>>> SNMP wouldn't restart a failed process nor would it generate alerts. It
>>>is
>>> simply too generic for the requirements outlined here. The proposal does
>>> not talk about modifying monit, just using it. That wouldn't trigger
>>>the AGPL.
>>> I think the idea is to have a tight monitoring loop that scales: so
>>>executing the
>>> monitoring loop in-situ makes sense.
>>>
>>>
>>> On 9/25/13 9:53 PM, "David Nalley" <da...@gnsa.us> wrote:
>>>
>>> >On Wed, Sep 25, 2013 at 9:30 AM, Jayapal Reddy Uradi
>>> ><ja...@citrix.com> wrote:
>>> >> Hi,
>>> >>
>>> >> Currently in virtual router there is no way to recover and notify if
>>> >>some service goes down unexpectedly.
>>> >>
>>> >> This feature is about monitoring all the services rendered by the
>>> >>virtual router, ensure that the services are running through the life
>>> >>time of the VR.
>>> >>
>>> >> On service failure:
>>> >> 1. Generate an alert and event indicating failure 2. Restart the
>>> >> service
>>> >>
>>> >> Services to be monitored:
>>> >> DHCP, DNS, haproxy, password server etc.
>>> >>
>>> >> As part of monitoring there are two activities
>>> >>
>>> >> 1. One is monitoring the services in VR and log the events. Using
>>> >>monit for monitoring services  2. Second part is pushing alerts from
>>> >>router to  MS server. Thinking on POST the logs to web server in MS.
>>> >>
>>> >> I will be updating more details and FS in this thread.
>>> >>
>>> >> I created enhancement bug for this.
>>> >> https://issues.apache.org/jira/browse/CLOUDSTACK-4736
>>> >>
>>> >> Thanks,
>>> >> Jayapal
>>> >
>>> >So several things - why not make this via SNMP? Query processes, and
>>> >many other things. This should be relatively simple, is well known, can
>>> >be locked down (or could be monitored for many other things by external
>>> >monitoring packages) and is the defacto standard for monitoring hosts.
>>> >Second - monit is Affero GPL licensed - which is a cat-x license.
>>> >While I expect that we would merely use this and not do any hacking on
>>> >it - I think its inclusion might be a surprise (and forbidden in many
>>> >environments) to our users
>>> >
>>> >--David
>>
>

Re: [PROPOSAL] Service monitoring tool in virtual router

Posted by Chiradeep Vittal <Ch...@citrix.com>.
In this case you would have to invent another enterprise MIB. Not too
hard, but I'd argue that it needs to be proxied through some other service
anyway and it represents a different integration point with ACS. Depends
on whether you consider the system vm part of the ACS deployment, or an
entity like a host.

On 9/26/13 10:27 AM, "Alex Huang" <Al...@citrix.com> wrote:

>Using SNMP for alert notification is not a bad idea though.  I don't see
>why we can't do that instead of posting to the management server.  This
>is specifically referring to the second part of the proposal.  Why
>reinvent that part of it?
>
>--Alex
>
>> -----Original Message-----
>> From: Chiradeep Vittal [mailto:Chiradeep.Vittal@citrix.com]
>> Sent: Wednesday, September 25, 2013 10:28 PM
>> To: dev@cloudstack.apache.org
>> Subject: Re: [PROPOSAL] Service monitoring tool in virtual router
>> 
>> SNMP wouldn't restart a failed process nor would it generate alerts. It
>>is
>> simply too generic for the requirements outlined here. The proposal does
>> not talk about modifying monit, just using it. That wouldn't trigger
>>the AGPL.
>> I think the idea is to have a tight monitoring loop that scales: so
>>executing the
>> monitoring loop in-situ makes sense.
>> 
>> 
>> On 9/25/13 9:53 PM, "David Nalley" <da...@gnsa.us> wrote:
>> 
>> >On Wed, Sep 25, 2013 at 9:30 AM, Jayapal Reddy Uradi
>> ><ja...@citrix.com> wrote:
>> >> Hi,
>> >>
>> >> Currently in virtual router there is no way to recover and notify if
>> >>some service goes down unexpectedly.
>> >>
>> >> This feature is about monitoring all the services rendered by the
>> >>virtual router, ensure that the services are running through the life
>> >>time of the VR.
>> >>
>> >> On service failure:
>> >> 1. Generate an alert and event indicating failure 2. Restart the
>> >> service
>> >>
>> >> Services to be monitored:
>> >> DHCP, DNS, haproxy, password server etc.
>> >>
>> >> As part of monitoring there are two activities
>> >>
>> >> 1. One is monitoring the services in VR and log the events. Using
>> >>monit for monitoring services  2. Second part is pushing alerts from
>> >>router to  MS server. Thinking on POST the logs to web server in MS.
>> >>
>> >> I will be updating more details and FS in this thread.
>> >>
>> >> I created enhancement bug for this.
>> >> https://issues.apache.org/jira/browse/CLOUDSTACK-4736
>> >>
>> >> Thanks,
>> >> Jayapal
>> >
>> >So several things - why not make this via SNMP? Query processes, and
>> >many other things. This should be relatively simple, is well known, can
>> >be locked down (or could be monitored for many other things by external
>> >monitoring packages) and is the defacto standard for monitoring hosts.
>> >Second - monit is Affero GPL licensed - which is a cat-x license.
>> >While I expect that we would merely use this and not do any hacking on
>> >it - I think its inclusion might be a surprise (and forbidden in many
>> >environments) to our users
>> >
>> >--David
>


RE: [PROPOSAL] Service monitoring tool in virtual router

Posted by Alex Huang <Al...@citrix.com>.
Using SNMP for alert notification is not a bad idea though.  I don't see why we can't do that instead of posting to the management server.  This is specifically referring to the second part of the proposal.  Why reinvent that part of it?

--Alex

> -----Original Message-----
> From: Chiradeep Vittal [mailto:Chiradeep.Vittal@citrix.com]
> Sent: Wednesday, September 25, 2013 10:28 PM
> To: dev@cloudstack.apache.org
> Subject: Re: [PROPOSAL] Service monitoring tool in virtual router
> 
> SNMP wouldn't restart a failed process nor would it generate alerts. It is
> simply too generic for the requirements outlined here. The proposal does
> not talk about modifying monit, just using it. That wouldn't trigger the AGPL.
> I think the idea is to have a tight monitoring loop that scales: so executing the
> monitoring loop in-situ makes sense.
> 
> 
> On 9/25/13 9:53 PM, "David Nalley" <da...@gnsa.us> wrote:
> 
> >On Wed, Sep 25, 2013 at 9:30 AM, Jayapal Reddy Uradi
> ><ja...@citrix.com> wrote:
> >> Hi,
> >>
> >> Currently in virtual router there is no way to recover and notify if
> >>some service goes down unexpectedly.
> >>
> >> This feature is about monitoring all the services rendered by the
> >>virtual router, ensure that the services are running through the life
> >>time of the VR.
> >>
> >> On service failure:
> >> 1. Generate an alert and event indicating failure 2. Restart the
> >> service
> >>
> >> Services to be monitored:
> >> DHCP, DNS, haproxy, password server etc.
> >>
> >> As part of monitoring there are two activities
> >>
> >> 1. One is monitoring the services in VR and log the events. Using
> >>monit for monitoring services  2. Second part is pushing alerts from
> >>router to  MS server. Thinking on POST the logs to web server in MS.
> >>
> >> I will be updating more details and FS in this thread.
> >>
> >> I created enhancement bug for this.
> >> https://issues.apache.org/jira/browse/CLOUDSTACK-4736
> >>
> >> Thanks,
> >> Jayapal
> >
> >So several things - why not make this via SNMP? Query processes, and
> >many other things. This should be relatively simple, is well known, can
> >be locked down (or could be monitored for many other things by external
> >monitoring packages) and is the defacto standard for monitoring hosts.
> >Second - monit is Affero GPL licensed - which is a cat-x license.
> >While I expect that we would merely use this and not do any hacking on
> >it - I think its inclusion might be a surprise (and forbidden in many
> >environments) to our users
> >
> >--David


Re: [PROPOSAL] Service monitoring tool in virtual router

Posted by Chip Childers <ch...@sungard.com>.
On Thu, Sep 26, 2013 at 05:27:57AM +0000, Chiradeep Vittal wrote:
> The proposal does
> not talk about modifying monit, just using it. That wouldn't trigger the
> AGPL.

The proposal talks about using it, and that's enough to trigger the
AGPL.  This is a *very bad* thing IMO.  For example, $dayjob would
require that we work around this in our environment (i.e., not deploy
it).

Please please please don't bring in an AGPL package.  This isn't an ASF
"category X" issue, since we are talking about usage.  It's actually a
larger issue for users though.  Many legal departments would consider
the use of software with that license to trigger the requirement for
that organization to publish the source.

-chip

Re: [PROPOSAL] Service monitoring tool in virtual router

Posted by Darren Shepherd <da...@gmail.com>.
I think monit is already in the current system VM template. I know that at least the scripts in tools/appliance add it (maybe that was a post-4.2 change). +1 for monit.

Darren

> On Sep 25, 2013, at 10:27 PM, Chiradeep Vittal <Ch...@citrix.com> wrote:
> 
> SNMP wouldn't restart a failed process nor would it generate alerts. It is
> simply too generic for the requirements outlined here. The proposal does
> not talk about modifying monit, just using it. That wouldn't trigger the
> AGPL.
> I think the idea is to have a tight monitoring loop that scales: so
> executing the monitoring loop in-situ makes sense.
> 
> 
>> On 9/25/13 9:53 PM, "David Nalley" <da...@gnsa.us> wrote:
>> 
>> On Wed, Sep 25, 2013 at 9:30 AM, Jayapal Reddy Uradi
>> <ja...@citrix.com> wrote:
>>> Hi,
>>> 
>>> Currently in virtual router there is no way to recover and notify if
>>> some service goes down unexpectedly.
>>> 
>>> This feature is about monitoring all the services rendered by the
>>> virtual router, ensure that the services are running through the life
>>> time of the VR.
>>> 
>>> On service failure:
>>> 1. Generate an alert and event indicating failure
>>> 2. Restart the service
>>> 
>>> Services to be monitored:
>>> DHCP, DNS, haproxy, password server etc.
>>> 
>>> As part of monitoring there are two activities
>>> 
>>> 1. One is monitoring the services in VR and log the events. Using monit
>>> for monitoring services
>>> 2. Second part is pushing alerts from router to  MS server. Thinking on
>>> POST the logs to web server in MS.
>>> 
>>> I will be updating more details and FS in this thread.
>>> 
>>> I created enhancement bug for this.
>>> https://issues.apache.org/jira/browse/CLOUDSTACK-4736
>>> 
>>> Thanks,
>>> Jayapal
>> 
>> So several things - why not do this via SNMP? Query processes, and
>> many other things. This should be relatively simple, is well known,
>> can be locked down (or could be monitored for many other things by
>> external monitoring packages) and is the de facto standard for
>> monitoring hosts.
>> Second - monit is Affero GPL licensed - which is a cat-x license.
>> While I expect that we would merely use this and not do any hacking on
>> it - I think its inclusion might be a surprise (and forbidden in many
>> environments) to our users.
>> 
>> --David
> 

Re: [PROPOSAL] Service monitoring tool in virtual router

Posted by Chiradeep Vittal <Ch...@citrix.com>.
SNMP wouldn't restart a failed process nor would it generate alerts. It is
simply too generic for the requirements outlined here. The proposal does
not talk about modifying monit, just using it. That wouldn't trigger the
AGPL.
I think the idea is to have a tight monitoring loop that scales, so
executing the monitoring loop in situ makes sense.
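
A minimal sketch of such an in-situ loop, assuming each service can be
wrapped in a check and a restart callable. This is not monit and not how
monit is implemented; the Service type, the alert line format, and the
function names are hypothetical and only illustrate "alert, then restart"
running on the VR itself:

```python
import logging
import time
from typing import Callable, Dict, List, NamedTuple, Optional

log = logging.getLogger("vr-monitor")  # hypothetical logger name


class Service(NamedTuple):
    """A monitored service; both callables are hypothetical placeholders."""
    is_running: Callable[[], bool]  # e.g. wraps a pidfile or port check
    restart: Callable[[], None]     # e.g. wraps the service's init script


def check_once(services: Dict[str, Service]) -> List[str]:
    """Run one monitoring pass and return one alert line per failed service."""
    alerts = []
    for name, svc in services.items():
        if svc.is_running():
            continue
        # Record an alert/event for the failure (assumed line format).
        line = f"ALERT service={name} state=down action=restart"
        alerts.append(line)
        log.warning(line)
        # Then try to bring the service back.
        svc.restart()
    return alerts


def monitor(services: Dict[str, Service],
            interval_s: float = 30.0,
            passes: Optional[int] = None) -> None:
    """Repeat check_once every interval_s seconds; passes=None runs forever."""
    done = 0
    while passes is None or done < passes:
        check_once(services)
        done += 1
        time.sleep(interval_s)
```

Keeping check_once separate from the sleep loop means a single pass can be
exercised with fake services, without touching real daemons.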


On 9/25/13 9:53 PM, "David Nalley" <da...@gnsa.us> wrote:

>On Wed, Sep 25, 2013 at 9:30 AM, Jayapal Reddy Uradi
><ja...@citrix.com> wrote:
>> Hi,
>>
>> Currently in virtual router there is no way to recover and notify if
>>some service goes down unexpectedly.
>>
>> This feature is about monitoring all the services rendered by the
>>virtual router, ensure that the services are running through the life
>>time of the VR.
>>
>> On service failure:
>> 1. Generate an alert and event indicating failure
>> 2. Restart the service
>>
>> Services to be monitored:
>> DHCP, DNS, haproxy, password server etc.
>>
>> As part of monitoring there are two activities
>>
>> 1. One is monitoring the services in VR and log the events. Using monit
>>for monitoring services
>> 2. Second part is pushing alerts from router to  MS server. Thinking on
>>POST the logs to web server in MS.
>>
>> I will be updating more details and FS in this thread.
>>
>> I created enhancement bug for this.
>> https://issues.apache.org/jira/browse/CLOUDSTACK-4736
>>
>> Thanks,
>> Jayapal
>
>So several things - why not do this via SNMP? Query processes, and
>many other things. This should be relatively simple, is well known,
>can be locked down (or could be monitored for many other things by
>external monitoring packages) and is the de facto standard for
>monitoring hosts.
>Second - monit is Affero GPL licensed - which is a cat-x license.
>While I expect that we would merely use this and not do any hacking on
>it - I think its inclusion might be a surprise (and forbidden in many
>environments) to our users.
>
>--David


Re: [PROPOSAL] Service monitoring tool in virtual router

Posted by David Nalley <da...@gnsa.us>.
On Wed, Sep 25, 2013 at 9:30 AM, Jayapal Reddy Uradi
<ja...@citrix.com> wrote:
> Hi,
>
> Currently in virtual router there is no way to recover and notify if some service goes down unexpectedly.
>
> This feature is about monitoring all the services rendered by the virtual router, ensure that the services are running through the life time of the VR.
>
> On service failure:
> 1. Generate an alert and event indicating failure
> 2. Restart the service
>
> Services to be monitored:
> DHCP, DNS, haproxy, password server etc.
>
> As part of monitoring there are two activities
>
> 1. One is monitoring the services in VR and log the events. Using monit for monitoring services
> 2. Second part is pushing alerts from router to  MS server. Thinking on POST the logs to web server in MS.
>
> I will be updating more details and FS in this thread.
>
> I created enhancement bug for this.
> https://issues.apache.org/jira/browse/CLOUDSTACK-4736
>
> Thanks,
> Jayapal

So several things - why not do this via SNMP? Query processes, and
many other things. This should be relatively simple, is well known,
can be locked down (or could be monitored for many other things by
external monitoring packages) and is the de facto standard for
monitoring hosts.
Second - monit is Affero GPL licensed - which is a cat-x license.
While I expect that we would merely use this and not do any hacking on
it - I think its inclusion might be a surprise (and forbidden in many
environments) to our users.

--David
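
For the second activity in the quoted proposal (pushing alerts from the
router to the MS), a POST could be assembled roughly as follows. This is a
sketch only: the thread does not define an endpoint or payload, so the
URL, the router name, and the JSON shape below are all hypothetical.

```python
import json
import urllib.request
from typing import Iterable


def build_alert_request(ms_url: str, router_name: str,
                        alert_lines: Iterable[str]) -> urllib.request.Request:
    """Build (but do not send) a POST carrying alert lines to the MS.

    The payload layout is an assumption made for illustration.
    """
    body = json.dumps({
        "router": router_name,
        "alerts": list(alert_lines),
    }).encode("utf-8")
    return urllib.request.Request(
        ms_url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical usage; the host below is a placeholder, so the send is left
# commented out:
# req = build_alert_request("http://ms.example.invalid/vr-alerts",
#                           "r-1-VM", ["ALERT service=dnsmasq state=down"])
# urllib.request.urlopen(req, timeout=5)
```

Building the request separately from sending it keeps the payload easy to
inspect before anything goes over the wire.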