Posted to issues@ignite.apache.org by "Sergey Puchnin (JIRA)" <ji...@apache.org> on 2017/11/15 12:18:00 UTC
[jira] [Updated] (IGNITE-6587) Ignite watchdog service
[ https://issues.apache.org/jira/browse/IGNITE-6587?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Sergey Puchnin updated IGNITE-6587:
-----------------------------------
Description:
We need to come up with a 'watchdog service' that monitors the local health of an Ignite node and kills the process under certain critical conditions.
For example, if one of the mission-critical Ignite threads dies, the Ignite node must be stopped.
At first glance, the list of critical threads is:
disco-event-worker
tcp-disco-sock-reader
tcp-disco-srvr
tcp-disco-msg-worker
tcp-comm-worker
grid-nio-worker-tcp-comm
exchange-worker
sys-stripe
grid-timeout-worker
db-checkpoint-thread
wal-file-archiver
ttl-cleanup-worker
nio-acceptor
The mechanism should support pluggable components so that self-check can be extended via plugins.
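To make the idea concrete, here is a minimal sketch in Java of how such a watchdog could be shaped. Everything in it is an assumption made for illustration: WatchdogSketch, HealthCheck and CriticalThreadsCheck are hypothetical names, not existing Ignite classes, and the sketch assumes the critical threads are handed to the check explicitly after they have been started.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

/** Illustrative sketch only: none of these types exist in Ignite. */
public class WatchdogSketch {
    /** Hypothetical plugin contract: a single self-check of local node health. */
    public interface HealthCheck {
        /** Returns null when healthy, otherwise a description of the failure. */
        String check();
    }

    /** Fails as soon as any registered critical thread is no longer alive. */
    public static final class CriticalThreadsCheck implements HealthCheck {
        private final List<Thread> threads = new CopyOnWriteArrayList<>();

        /** Registers a thread to watch; it should already be started. */
        public void watch(Thread t) {
            threads.add(t);
        }

        @Override public String check() {
            for (Thread t : threads) {
                if (!t.isAlive())
                    return "critical thread died: " + t.getName();
            }
            return null;
        }
    }

    private final List<HealthCheck> checks = new CopyOnWriteArrayList<>();
    private final Runnable onFailure;

    /** The failure action is supplied by the caller, e.g. a node stop. */
    public WatchdogSketch(Runnable onFailure) {
        this.onFailure = onFailure;
    }

    /** Extension point: plugins contribute their own checks here. */
    public void register(HealthCheck check) {
        checks.add(check);
    }

    /** Runs every check once; triggers the failure action on the first failure. */
    public String runOnce() {
        for (HealthCheck c : checks) {
            String failure = c.check();
            if (failure != null) {
                onFailure.run();
                return failure;
            }
        }
        return null;
    }
}
```

In a real node, runOnce() would presumably be called periodically (for instance from a ScheduledExecutorService), and the failure action would stop the node or halt the JVM. Threads that may legitimately terminate, such as per-connection socket readers, would need to be unregistered or handled separately; the sketch does not cover that.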
was:
We need to come up with a 'watchdog service' that monitors the local health of an Ignite node and kills the process under certain critical conditions.
For example, if one of the mission-critical Ignite threads dies, the Ignite node must be stopped.
At first glance, the list of critical threads is:
All TCP discovery threads
All communication NIO threads (acceptor and workers)
Exchange worker
Striped pool threads
Timeout Worker
Checkpointer
WAL archiver
The mechanism should support pluggable components so that self-check can be extended via plugins.
> Ignite watchdog service
> -----------------------
>
> Key: IGNITE-6587
> URL: https://issues.apache.org/jira/browse/IGNITE-6587
> Project: Ignite
> Issue Type: Improvement
> Components: general
> Affects Versions: 2.2
> Reporter: Alexey Goncharuk
> Assignee: Dmitriy Pavlov
> Labels: IEP-5
> Fix For: 2.4
>
> Attachments: watchdog.sh