Posted to user@mesos.apache.org by Gary Malouf <ma...@gmail.com> on 2014/12/10 00:45:39 UTC

Monitoring Mesos slave/master processes

We did this in the past with Nagios, but I was wondering if there was a
recommended way from others using it in production.

Re: Monitoring Mesos slave/master processes

Posted by Tom Arnfeld <to...@duedil.com>.

We're monitoring the processes with M/Monit on each machine and pumping all the metrics to Graphite with https://github.com/rayrod2030/collectd-mesos.

--

Tom Arnfeld

Developer // DueDil

On Tue, Dec 9, 2014 at 11:46 PM, Gary Malouf <ma...@gmail.com>
wrote:

> We did this in the past with Nagios, but I was wondering if there was a
> recommended way from others using it in production.
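
For readers unfamiliar with how metrics reach Graphite, here is a minimal sketch of the Graphite plaintext format ("path value timestamp", one metric per line, normally sent to carbon on TCP port 2003). It is not how collectd-mesos itself works; the function name, the "mesos" prefix, and the sample metric name are assumptions made for illustration.

```python
import time


def to_graphite_lines(metrics, prefix="mesos", timestamp=None):
    """Render a {name: value} dict as Graphite plaintext-protocol lines.

    Hypothetical helper: `metrics` is assumed to be a flat dict such as
    the JSON a Mesos process exposes; nothing here talks to the network.
    """
    ts = int(time.time()) if timestamp is None else int(timestamp)
    out = []
    for name, value in sorted(metrics.items()):
        # Mesos-style names use '/', Graphite paths are dot-separated.
        path = prefix + "." + name.replace("/", ".")
        out.append("%s %s %d\n" % (path, value, ts))
    return "".join(out)
```

The resulting string can be written to a socket connected to the carbon plaintext listener; collectd normally handles that step for you.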

Re: Monitoring Mesos slave/master processes

Posted by Billy Bones <ga...@gmail.com>.

Well, as we build our business on the "Failure is a feature" credo, we don't
really have an advanced notification system, but we use this instead:
https://github.com/AcalephStorage/consul-alerts

You can run it in a Docker container. It has fine-grained configuration
settings and event handlers (built-in and custom), and it relies on the
Consul KV store for its configuration.
Here we just have a Docker container hosting a Consul agent in client mode,
with consul-alerts on top of that agent, so it uses the distributed KV store
and doesn't rely on the master one.

2014-12-10 15:47 GMT+01:00 Gary Malouf <ma...@gmail.com>:

> Billy, thanks for the link.  It was not easy to tell from the website, but
> do you get email/text alerts if something goes wrong overnight?

Re: Monitoring Mesos slave/master processes

Posted by Leigh Martell <le...@immun.io>.

Hey Gary,
  I just finished setting up Consul; you need to set up handlers. In my
case I used a project called consul-alerts. The advantage here is that it
holds the alert state, so if an alert is not cleared in x seconds it will
then alert your endpoint (e.g. PagerDuty).

Here is the link https://github.com/AcalephStorage/consul-alerts

Hope that helps!

-Leigh

On Wed, Dec 10, 2014 at 10:47 AM, Gary Malouf <ma...@gmail.com> wrote:

> Billy, thanks for the link.  It was not easy to tell from the website, but
> do you get email/text alerts if something goes wrong overnight?
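
The "hold the state, alert only if it is not cleared in x seconds" behaviour described above can be sketched roughly as follows. This is an illustrative debounce, not consul-alerts' actual implementation; the class name, method name, and check IDs are all hypothetical.

```python
class AlertDebouncer:
    """Notify only when a check has stayed unhealthy for hold_seconds."""

    def __init__(self, hold_seconds):
        self.hold_seconds = hold_seconds
        self.failing_since = {}  # check_id -> time the failure was first seen
        self.notified = set()    # check_ids already alerted for this failure

    def observe(self, check_id, healthy, now):
        """Record one check result; return True if an alert should fire."""
        if healthy:
            # The failure cleared: forget it so the next one starts fresh.
            self.failing_since.pop(check_id, None)
            self.notified.discard(check_id)
            return False
        started = self.failing_since.setdefault(check_id, now)
        if now - started >= self.hold_seconds and check_id not in self.notified:
            self.notified.add(check_id)
            return True
        return False
```

A short blip (a slave restart, say) never reaches the pager, while a failure that persists past the hold period fires exactly once until it clears.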

Re: Monitoring Mesos slave/master processes

Posted by Gary Malouf <ma...@gmail.com>.

Billy, thanks for the link.  It was not easy to tell from the website, but
do you get email/text alerts if something goes wrong overnight?

On Wed, Dec 10, 2014 at 3:54 AM, Billy Bones <ga...@gmail.com> wrote:

> Here we use the wonderful Consul tool as our monitoring and health-check
> dashboard, plus some other things.
>
> Check it out at consul.io; it's made by HashiCorp.
> I kind of like it because it's fast, reliable, and built with huge
> distributed systems in mind from the ground up.

Re: Monitoring Mesos slave/master processes

Posted by Billy Bones <ga...@gmail.com>.

Here we use the wonderful Consul tool as our monitoring and health-check
dashboard, plus some other things.

Check it out at consul.io; it's made by HashiCorp.
I kind of like it because it's fast, reliable, and built with huge
distributed systems in mind from the ground up.

2014-12-10 1:11 GMT+01:00 Steven Schlansker <ss...@opentable.com>:

>
> On Dec 9, 2014, at 3:45 PM, Gary Malouf <ma...@gmail.com> wrote:
>
> > We did this in the past with Nagios, but I was wondering if there was a
> recommended way from others using it in production.
>
> I wrote a Nagios plugin for it
>
> https://github.com/opentable/nagios-mesos

Re: Monitoring Mesos slave/master processes

Posted by Steven Schlansker <ss...@opentable.com>.

On Dec 9, 2014, at 3:45 PM, Gary Malouf <ma...@gmail.com> wrote:

> We did this in the past with Nagios, but I was wondering if there was a recommended way from others using it in production.

I wrote a Nagios plugin for it

https://github.com/opentable/nagios-mesos
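
For anyone who wants a feel for what a Nagios-style check of a Mesos master might look like, here is a minimal sketch. It is not the code from nagios-mesos. It assumes the metrics dict is the parsed JSON from the master's /metrics/snapshot endpoint, and the two metric names and the `min_active_slaves` threshold are assumptions to verify against your Mesos version. Only the exit codes follow the standard Nagios plugin convention.

```python
# Nagios plugin exit codes (standard plugin convention).
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def check_master(metrics, min_active_slaves=1):
    """Map a master metrics dict to an (exit_code, message) pair.

    Hypothetical helper: the key names below are assumptions about what
    the master reports, not a confirmed list.
    """
    elected = metrics.get("master/elected")
    active = metrics.get("master/slaves_active")
    if elected is None or active is None:
        return UNKNOWN, "UNKNOWN: expected metrics missing"
    if int(elected) != 1:
        # A non-leading master is normal in an HA setup, so only warn.
        return WARNING, "WARNING: this master is not the elected leader"
    if int(active) < min_active_slaves:
        return CRITICAL, "CRITICAL: %d active slaves (need %d)" % (
            int(active), min_active_slaves)
    return OK, "OK: %d active slaves" % int(active)
```

A real plugin would fetch the JSON over HTTP, print the message, and exit with the returned code so Nagios can act on it.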