Posted to commits@cassandra.apache.org by "Ron Kuris (JIRA)" <ji...@apache.org> on 2015/06/01 21:38:17 UTC

[jira] [Updated] (CASSANDRA-9526) Provide a JMX hook to monitor phi values in the FailureDetector

     [ https://issues.apache.org/jira/browse/CASSANDRA-9526?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ron Kuris updated CASSANDRA-9526:
---------------------------------
    Attachment: PHI-Log-Debug-When-Close.patch.txt
                PHI-Race-Condition.patch.txt
                Monitor-Phi-JMX.patch.txt

Three patches are attached. The main fix is Monitor-Phi-JMX.patch.txt, which fully resolves the reported issue.
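
For readers without the patch at hand, here is a minimal sketch of what exposing phi values over JMX could look like. It is not the attached patch: PhiJmxSketch, PhiMonitorMBean, PhiMonitor, record, registerAndRead and the object name are all hypothetical names chosen for illustration. Only the javax.management calls are standard JDK API.

```java
import java.lang.management.ManagementFactory;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import javax.management.JMException;
import javax.management.MBeanServer;
import javax.management.ObjectName;

// Illustrative only: every name in this class is hypothetical, not taken from the patch.
class PhiJmxSketch {

    // Standard MBean convention: a public interface named <ImplClass>MBean.
    public interface PhiMonitorMBean {
        Map<String, Double> getPhiValues();
    }

    public static class PhiMonitor implements PhiMonitorMBean {
        private final Map<String, Double> phiByEndpoint = new ConcurrentHashMap<>();

        // Would be called by a (hypothetical) failure detector each time it recomputes phi.
        void record(String endpoint, double phi) {
            phiByEndpoint.put(endpoint, phi);
        }

        @Override
        public Map<String, Double> getPhiValues() {
            // Hand out a plain serializable snapshot rather than the live map.
            return new HashMap<>(phiByEndpoint);
        }
    }

    // Registers the monitor, then reads the "PhiValues" attribute back through the
    // MBean server, which is the same path a JMX client would take.
    static Object registerAndRead(PhiMonitor monitor, String objectName) {
        try {
            MBeanServer server = ManagementFactory.getPlatformMBeanServer();
            ObjectName name = new ObjectName(objectName);
            server.registerMBean(monitor, name);
            return server.getAttribute(name, "PhiValues");
        } catch (JMException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        PhiMonitor monitor = new PhiMonitor();
        monitor.record("10.0.0.2", 3.7);
        System.out.println(registerAndRead(monitor, "example.sketch:type=PhiMonitor"));
    }
}
```

The getter name determines the attribute name (getPhiValues becomes PhiValues), so a JMX client can poll that attribute and compare it against phi_convict_threshold.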

While inspecting this code, I noticed a small, unlikely race condition. If the first two phi values for a host arrive at the same time, one could be lost because of the way the values are added to the Hashtable. The second patch (PHI-Race-Condition.patch.txt) closes that window by switching to a ConcurrentHashMap and using putIfAbsent to atomically check for a prior value.

I doubt this could actually happen in the wild, but it's still good defensive coding. It also removes Hashtable, which synchronizes on every operation.
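
To make the race concrete, here is a minimal sketch of the putIfAbsent pattern described above. It is not the patch itself: ArrivalSamples, report and count are hypothetical names, and the per-host list of arrival times is an assumed stand-in for whatever the FailureDetector actually stores per host.

```java
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.CopyOnWriteArrayList;

// Illustrative only: a hypothetical stand-in for per-host sample storage.
class ArrivalSamples {
    private final ConcurrentMap<String, List<Long>> samplesByHost = new ConcurrentHashMap<>();

    void report(String host, long arrivalMillis) {
        List<Long> samples = samplesByHost.get(host);
        if (samples == null) {
            // A plain "if absent then put" here would let two threads each install
            // their own list, and the first sample of the losing thread would vanish.
            List<Long> fresh = new CopyOnWriteArrayList<>();
            List<Long> existing = samplesByHost.putIfAbsent(host, fresh);
            // putIfAbsent returns the list already in the map, or null if ours was installed.
            samples = (existing != null) ? existing : fresh;
        }
        samples.add(arrivalMillis);
    }

    int count(String host) {
        List<Long> samples = samplesByHost.get(host);
        return (samples == null) ? 0 : samples.size();
    }
}
```

Whichever thread loses the putIfAbsent call adds its sample to the winner's list, so both first values for a host are kept.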

The third patch generates debug log messages when phi gets close to phi_convict_threshold. It's a good way to see whether phi_convict_threshold might be too low. The messages are not logged at WARN or even INFO because they could produce a lot of output, though arguably they could be. Anyone having trouble with nodes going offline can turn up the log level and see whether phi_convict_threshold is the culprit.
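
As a rough illustration of the idea (not the patch's code), the check could look like the sketch below. What counts as "close" is not stated in this message, so the closeFraction parameter is an assumption, as are the class and method names. The sketch uses java.util.logging only to stay self-contained; it says nothing about which logging API the patch uses.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

// Illustrative only: class name, method names, and the "close" fraction are assumptions.
class PhiProximityLogger {
    private static final Logger logger = Logger.getLogger(PhiProximityLogger.class.getName());

    // True when phi has reached the given fraction of the conviction threshold.
    static boolean isClose(double phi, double convictThreshold, double closeFraction) {
        return phi >= convictThreshold * closeFraction;
    }

    static void maybeLog(String endpoint, double phi, double convictThreshold, double closeFraction) {
        // FINE is java.util.logging's rough counterpart to DEBUG and is off by default.
        if (isClose(phi, convictThreshold, closeFraction) && logger.isLoggable(Level.FINE)) {
            logger.log(Level.FINE, "phi for {0} is {1}, approaching threshold {2}",
                       new Object[] { endpoint, phi, convictThreshold });
        }
    }

    public static void main(String[] args) {
        // 8 is Cassandra's default phi_convict_threshold; 0.8 is an arbitrary cutoff.
        maybeLog("10.0.0.2", 7.1, 8.0, 0.8);
    }
}
```

Keeping the message below INFO means it costs nothing in normal operation and only appears once the log level is raised.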

The PHI-Log-Debug-When-Close.patch.txt patch also includes some unrelated code cleanup.

> Provide a JMX hook to monitor phi values in the FailureDetector
> ---------------------------------------------------------------
>
>                 Key: CASSANDRA-9526
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-9526
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Ron Kuris
>             Fix For: 2.0.x
>
>         Attachments: Monitor-Phi-JMX.patch.txt, PHI-Log-Debug-When-Close.patch.txt, PHI-Race-Condition.patch.txt
>
>
> phi_convict_threshold can be tuned, but there's currently no way to monitor the phi values to see if you're getting close.
> The attached patch adds the ability to get these values via JMX.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)