Posted to dev@ignite.apache.org by Andrey Gura <ag...@apache.org> on 2018/05/15 13:40:18 UTC

Re: Topology-wide notification on critical errors

Ilya,

adding a message that is sent to all other nodes still doesn't make
the mentioned task easier. You still have to figure out where to find
the problem description and what exactly to look for.

The only case where it helps is NoOpFailureHandler, because the node
can just hang while remaining in the topology, which makes any
diagnostics painful.

I'm not sure that we should send any cluster-wide message about
critical errors. But I don't have strong enough arguments against
such behaviour.

On Mon, Apr 23, 2018 at 6:14 PM, Ilya Kasnacheev
<il...@gmail.com> wrote:
> Hello Denis!
>
> In my opinion, the primary users of this improvement will be developers
> who, at the testing and pre-production stage, encounter errors only when
> trying production-size clusters.
>
> This means they end up with a dozen log files and have no idea where to
> start looking. Since it's non-production, DevOps expertise is often
> unavailable or limited at this point.
>
> This is the pattern that we repeatedly see on this mailing list, on SO,
> and elsewhere.
>
> Regards,
>
> --
> Ilya Kasnacheev
>
> 2018-04-21 1:20 GMT+03:00 Denis Magda <dm...@apache.org>:
>
>> It might be useful if it's supported out of the box; however, DevOps and
>> admins usually use tools like DynaTrace or Splunk to monitor all the logs,
>> arrange logs in a meaningful way and set up special hooks for particular
>> events. It means that if an event happens on only 1 node, the tool will
>> still detect it.
>>
>> Thus my question is who is a primary user of this improvement?
>>
>> --
>> Denis
>>
>> On Fri, Apr 20, 2018 at 5:12 AM, Yakov Zhdanov <yz...@apache.org>
>> wrote:
>>
>> > Of course, no guarantees, but at least an effort.
>> >
>> > --Yakov
>> >
>>