Posted to user@cassandra.apache.org by Oleksandr Shulgin <ol...@zalando.de> on 2018/01/19 08:42:41 UTC
Decommissioned nodes and FailureDetector
Hello,
Is there a better way to monitor for Cassandra nodes going down than
polling via JMX for a condition like FailureDetector.DownEndpointCount > 0?
The problem for us is that whenever a node is decommissioned, it keeps
counting towards DownEndpointCount for another ~3 days (the famous 72 hours
of gossip).
Is there a similar metric we could watch that doesn't include nodes which
are expected to be down?
Regards,
--
Oleksandr "Alex" Shulgin | Database Engineer | Zalando SE | Tel: +49 176
127-59-707
Re: Decommissioned nodes and FailureDetector
Posted by Oleksandr Shulgin <ol...@zalando.de>.
On Fri, Jan 19, 2018 at 11:17 AM, Nicolas Guyomar <nicolas.guyomar@gmail.com> wrote:
> Hi,
>
> Not sure if StorageService should be accessed, but you can check node
> movement here :
> 'org.apache.cassandra.db:type=StorageService/LeavingNodes',
> 'org.apache.cassandra.db:type=StorageService/LiveNodes',
> 'org.apache.cassandra.db:type=StorageService/UnreachableNodes',
>
Unfortunately, checking the list of UnreachableNodes doesn't help, since
it contains a mix of decommissioned nodes and nodes that are simply DOWN.
The number of addresses in this list is therefore equal to DownEndpointCount,
as seen from the node you query.
--
Alex
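One possible workaround for the mix described above, sketched here as an assumption and not something proposed in the thread, is to subtract a separately maintained set of "expected down" addresses from the unreachable list before alerting. The class name, the method name, and the idea of keeping your own record of recently decommissioned addresses are all hypothetical; the sample addresses are made up.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper, not part of Cassandra.
class UnexpectedDownNodes {

    /**
     * Returns the unreachable addresses that are NOT in the operator-maintained
     * "expected down" set, keeping the order in which the node reported them.
     */
    static Set<String> unexpectedlyDown(List<String> unreachable, Set<String> expectedDown) {
        Set<String> result = new LinkedHashSet<>(unreachable);
        result.removeAll(expectedDown);
        return result;
    }

    public static void main(String[] args) {
        // Sample data only: in practice this list would be the value of the
        // UnreachableNodes attribute read from one node.
        List<String> unreachable = Arrays.asList("10.0.0.4", "10.0.0.7");

        // Assumption: you record each decommissioned address yourself and
        // expire it after roughly 72 hours.
        Set<String> recentlyDecommissioned = new LinkedHashSet<>(Arrays.asList("10.0.0.7"));

        Set<String> alertOn = unexpectedlyDown(unreachable, recentlyDecommissioned);
        System.out.println("Unexpectedly down: " + alertOn);
    }
}
```

An alert would then fire only when the filtered set is non-empty. The cost of this approach is that the "expected down" record has to be kept up to date outside of Cassandra.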
Re: Decommissioned nodes and FailureDetector
Posted by Nicolas Guyomar <ni...@gmail.com>.
Hi,
I'm not sure whether StorageService is meant to be accessed directly, but
you can check node movement here:
'org.apache.cassandra.db:type=StorageService/LeavingNodes',
'org.apache.cassandra.db:type=StorageService/LiveNodes',
'org.apache.cassandra.db:type=StorageService/UnreachableNodes',
'org.apache.cassandra.db:type=StorageService/MovingNodes'
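For reference, here is a minimal sketch of reading those attributes with the standard javax.management client. It assumes the JMX endpoint is reachable without authentication on Cassandra's default JMX port 7199, and it treats each entry above as "<MBean name>/<attribute>". The class and method names are hypothetical, and the FailureDetector object name is an assumption that does not appear in the thread.

```java
import java.io.IOException;
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

// Hypothetical probe, not part of Cassandra.
class NodeStateProbe {

    static final String[] REFERENCES = {
        "org.apache.cassandra.db:type=StorageService/LeavingNodes",
        "org.apache.cassandra.db:type=StorageService/LiveNodes",
        "org.apache.cassandra.db:type=StorageService/UnreachableNodes",
        "org.apache.cassandra.db:type=StorageService/MovingNodes",
        // Assumed object name for the counter mentioned in the original question.
        "org.apache.cassandra.net:type=FailureDetector/DownEndpointCount",
    };

    /** Splits "<MBean name>/<attribute>" at the last slash. */
    static String[] splitReference(String reference) {
        int slash = reference.lastIndexOf('/');
        if (slash < 0) {
            throw new IllegalArgumentException("expected <mbean>/<attribute>: " + reference);
        }
        return new String[] { reference.substring(0, slash), reference.substring(slash + 1) };
    }

    /** Reads one attribute over an open JMX connection. */
    static Object read(MBeanServerConnection mbs, String reference) throws Exception {
        String[] parts = splitReference(reference);
        return mbs.getAttribute(new ObjectName(parts[0]), parts[1]);
    }

    public static void main(String[] args) throws Exception {
        String host = args.length > 0 ? args[0] : "localhost";
        String port = args.length > 1 ? args[1] : "7199";
        JMXServiceURL url = new JMXServiceURL(
                "service:jmx:rmi:///jndi/rmi://" + host + ":" + port + "/jmxrmi");

        try (JMXConnector connector = JMXConnectorFactory.connect(url)) {
            MBeanServerConnection mbs = connector.getMBeanServerConnection();
            for (String reference : REFERENCES) {
                System.out.println(reference + " = " + read(mbs, reference));
            }
        } catch (IOException e) {
            System.err.println("Could not reach JMX at " + url + ": " + e.getMessage());
        }
    }
}
```

Each node reports these values from its own point of view, so results may differ depending on which node you connect to.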
On 19 January 2018 at 09:42, Oleksandr Shulgin <oleksandr.shulgin@zalando.de> wrote:
> Hello,
>
> Is there a better way to monitor for Cassandra nodes going Down than
> querying via JMX for a condition like FailureDetector.DownEndpointCount >
> 0?
>
> The problem for us is when any node is decommissioned, it affects the
> DownEndpointCount for another ~3 days (the famous 72 hours of gossip).
>
> Is there a similar metric to be observed which doesn't include nodes which
> are expected to be down?
>
> Regards,
> --
> Oleksandr "Alex" Shulgin | Database Engineer | Zalando SE | Tel: +49 176
> 127-59-707