Posted to user@mesos.apache.org by Paul Bell <ar...@gmail.com> on 2015/09/16 19:11:59 UTC

Detecting slave crashes event

Hi All,

I am led to believe that, unlike Marathon, Mesos doesn't (yet?) offer a
subscribable event bus.

So I am wondering if there's a best practices way of determining if a slave
node has crashed. By "crashed" I mean something like the power plug got
yanked, or anything that would cause Mesos to stop talking to the slave
node.
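One way to approach the question above without an event bus is to poll the master's cluster state and compare successive snapshots. This is a minimal sketch that assumes the master serves its state as JSON at /master/state.json on the default port 5050; the host name is a placeholder. As far as I know, an agent the master has removed drops out of the "slaves" list, while one that is merely disconnected may stay listed with "active": false until its health checks time out, so the sketch reports both cases. Only the Python standard library is used.

```python
import json
import time
import urllib.request

# Placeholder address (assumption): point this at your leading master.
MASTER_STATE_URL = "http://mesos-master.example:5050/master/state.json"


def fetch_agents(url=MASTER_STATE_URL, timeout=5.0):
    """Return {agent_id: (hostname, active)} for every agent the master lists."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        state = json.load(resp)
    return {
        s["id"]: (s.get("hostname", "?"), s.get("active", True))
        for s in state.get("slaves", [])
    }


def lost_agents(previous, current):
    """Agents listed in the previous snapshot but missing from the current one."""
    return {aid: info[0] for aid, info in previous.items() if aid not in current}


def inactive_agents(snapshot):
    """Agents still listed but flagged inactive (e.g. disconnected)."""
    return {aid: info[0] for aid, info in snapshot.items() if not info[1]}


def poll(interval=30.0, report=print):
    """Compare snapshots forever, calling report() for lost or newly inactive agents."""
    previous = fetch_agents()
    while True:
        time.sleep(interval)
        current = fetch_agents()
        for aid, host in lost_agents(previous, current).items():
            report(f"agent {aid} ({host}) is no longer registered")
        already_inactive = inactive_agents(previous)
        for aid, host in inactive_agents(current).items():
            if aid not in already_inactive:
                report(f"agent {aid} ({host}) became inactive")
        previous = current
```

Run poll() from a long-running process. A master failover, or a network problem between the poller and the master, surfaces as an exception from urlopen, which this sketch does not handle.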

I suppose such information would be recorded in /var/log/mesos.

Interested to learn how best to detect this.

Thank you.

-Paul

Re: Detecting slave crashes event

Posted by Paul Bell <ar...@gmail.com>.

Thank you all for your responses.

I look forward to event subscription. :)

-Paul

On Wed, Sep 23, 2015 at 2:23 PM, Joris Van Remoortere <jo...@mesosphere.io>
wrote:

> There is a plan for event subscription, but it is still in the early
> design phase.
>
> In 0.25 we are adding slave exit hooks: MESOS-3015
>
> This will allow you to generate whatever events you like based on removal
> of a slave. This is your best bet in terms of an immediate solution :-)
> @Kapil and @Niklas have worked on this hook.

Re: Detecting slave crashes event

Posted by Joris Van Remoortere <jo...@mesosphere.io>.

There is a plan for event subscription, but it is still in the early design
phase.

In 0.25 we are adding slave exit hooks: MESOS-3015

This will allow you to generate whatever events you like based on removal
of a slave. This is your best bet in terms of an immediate solution :-)
@Kapil and @Niklas have worked on this hook.

On Wed, Sep 23, 2015 at 1:29 PM, Benjamin Mahler <be...@gmail.com>
wrote:

> I believe some of the contributors from Mesosphere have been thinking
> about it, but not sure on the plans. I'll let them reply here.

Re: Detecting slave crashes event

Posted by Benjamin Mahler <be...@gmail.com>.

I believe some of the contributors from Mesosphere have been thinking about
it, but not sure on the plans. I'll let them reply here.

On Wed, Sep 16, 2015 at 11:11 AM, Paul Bell <ar...@gmail.com> wrote:

> Thank you, Benjamin.
>
> So, I could periodically request the metrics endpoint, or stream the logs
> (maybe via mesos.cli; or SSH)? What, roughly, does the "agent removed"
> message look like in the logs?
>
> Are there plans to offer a mechanism for event subscription?
>
> Cordially,
>
> Paul

Re: Detecting slave crashes event

Posted by Paul Bell <ar...@gmail.com>.

Thank you, Benjamin.

So, I could periodically request the metrics endpoint, or stream the logs
(maybe via mesos.cli; or SSH)? What, roughly, does the "agent removed"
message look like in the logs?
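For the log-streaming route, here is a sketch that filters a stream of master log lines. The regular expression is an assumption: I believe masters of this era log removals with text along the lines of "Removing slave <id> at <pid> (<hostname>): <reason>", but check it against your own logs (the files Mesos writes via glog under a --log_dir such as /var/log/mesos) and adjust REMOVAL_PATTERN to match.

```python
import re
import sys

# ASSUMPTION: removal lines contain "Removing slave <id>" (newer releases may
# say "agent"). Verify against real master logs before relying on this.
REMOVAL_PATTERN = re.compile(r"Removing (?:slave|agent) (\S+)")


def removed_agents(lines, pattern=REMOVAL_PATTERN):
    """Yield (agent_id, line) for each log line that matches the removal pattern."""
    for line in lines:
        match = pattern.search(line)
        if match:
            yield match.group(1), line.rstrip("\n")


def main(stream=sys.stdin):
    """Print one alert per matching line read from stream (e.g. piped from tail -F)."""
    for agent_id, line in removed_agents(stream):
        print(f"agent removed: {agent_id} | {line}", flush=True)
```

Piping something like "tail -F /var/log/mesos/mesos-master.INFO" (assuming glog's usual symlink name) into a script that calls main() would approximate a push notification. Log text is not a stable interface, so this stays a stop-gap.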

Are there plans to offer a mechanism for event subscription?

Cordially,

Paul



On Wed, Sep 16, 2015 at 1:30 PM, Benjamin Mahler <be...@gmail.com>
wrote:

> You can detect when we remove an agent due to health check failures via
> the metrics endpoint, but these are counters that are better used for
> alerting / dashboards for visibility. If you need to know which agents, you
> can also consume the logs as a stop-gap solution, until we offer a
> mechanism for subscribing to cluster events.

Re: Detecting slave crashes event

Posted by Benjamin Mahler <be...@gmail.com>.

You can detect when we remove an agent due to health check failures via the
metrics endpoint, but these are counters that are better used for alerting
/ dashboards for visibility. If you need to know which agents were removed,
you can also consume the logs as a stop-gap solution, until we offer a
mechanism for subscribing to cluster events.
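A sketch of the metrics route, assuming the master exposes its counters as JSON at /metrics/snapshot on port 5050 (placeholder host) and that the removal counters are named master/slave_removals and master/slave_removals/reason_unhealthy, as in the monitoring documentation of Mesos releases from this period; check the names your version actually reports. As noted above, this tells you that agents were removed, not which ones.

```python
import json
import time
import urllib.request

# Placeholder address (assumption): point this at your leading master.
METRICS_URL = "http://mesos-master.example:5050/metrics/snapshot"

# ASSUMPTION: counter names as documented for Mesos masters of this era.
REMOVAL_COUNTERS = (
    "master/slave_removals",
    "master/slave_removals/reason_unhealthy",
)


def fetch_metrics(url=METRICS_URL, timeout=5.0):
    """Return the master's metrics snapshot as a dict of name -> number."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def counter_deltas(before, after, keys=REMOVAL_COUNTERS):
    """Growth of each counter between two snapshots (missing keys count as 0)."""
    return {k: after.get(k, 0) - before.get(k, 0) for k in keys}


def watch(interval=60.0, alert=print):
    """Poll forever and call alert() whenever a removal counter has grown."""
    before = fetch_metrics()
    while True:
        time.sleep(interval)
        after = fetch_metrics()
        grown = {k: v for k, v in counter_deltas(before, after).items() if v > 0}
        if grown:
            alert(grown)
        before = after
```

The counters live in the master process, so they start again from zero after a master failover. Keeping only positive deltas avoids false alarms in that case but can miss removals that happen around the failover.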

On Wed, Sep 16, 2015 at 10:11 AM, Paul Bell <ar...@gmail.com> wrote:

> Hi All,
>
> I am led to believe that, unlike Marathon, Mesos doesn't (yet?) offer a
> subscribable event bus.
>
> So I am wondering if there's a best practices way of determining if a
> slave node has crashed. By "crashed" I mean something like the power plug
> got yanked, or anything that would cause Mesos to stop talking to the slave
> node.
>
> I suppose such information would be recorded in /var/log/mesos.
>
> Interested to learn how best to detect this.
>
> Thank you.
>
> -Paul
>