Posted to commits@cassandra.apache.org by "C. Scott Andreas (JIRA)" <ji...@apache.org> on 2018/11/18 18:04:02 UTC

[jira] [Updated] (CASSANDRA-10245) Provide after the fact visibility into the reliability of the environment C* operates in

     [ https://issues.apache.org/jira/browse/CASSANDRA-10245?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

C. Scott Andreas updated CASSANDRA-10245:
-----------------------------------------
    Component/s: Observability

> Provide after the fact visibility into the reliability of the environment C* operates in
> ----------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-10245
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-10245
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Observability
>            Reporter: Ariel Weisberg
>            Priority: Major
>             Fix For: 4.x
>
>
> I think that, by default, databases should not depend entirely on operator-provided tools for monitoring node and network health.
> The database should be able to detect and report on several dimensions of performance in its environment, and more specifically report on deviations from acceptable performance.
> * Node-wide pauses
> * JVM-wide pauses
> * Latency and round-trip time to all endpoints
> * Block device I/O latency
> If Flight Recorder (FR) were available for use in production, I would say that, as a start, we should just turn it on, add jHiccup (inside and outside the server process), and add a daemon inside the server to measure network performance between endpoints.
> FR is not available (it requires a license in production), so we should instead focus on adding instrumentation for the facets of Flight Recorder that are most useful in diagnosing performance issues. I think we can get pretty far, because what we need to do is not quite as undirected as the exploration that FR and Java Mission Control (JMC) facilitate.
> Until we dial in how to measure and how to signal without false positives, I would expect this kind of logging to stay in the background, for post-hoc analysis.
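
For illustration only, the pause measurements listed in the description could be approximated with a jHiccup-style loop: a thread asks to sleep for a fixed interval and records how much later than requested it actually wakes up. The sketch below is an assumption about what such instrumentation might look like, not code from Cassandra or jHiccup; the class name PauseDetector, its methods, and the interval and threshold parameters are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.LockSupport;

/** Hypothetical sketch of a jHiccup-style pause detector; not part of Cassandra. */
final class PauseDetector {

    /** One observed stall: the nanoTime reading at wakeup and the wakeup overshoot. */
    static final class Pause {
        final long observedAtNanos;
        final long overshootNanos;

        Pause(long observedAtNanos, long overshootNanos) {
            this.observedAtNanos = observedAtNanos;
            this.overshootNanos = overshootNanos;
        }
    }

    private final long intervalNanos;   // how long each cycle asks to sleep
    private final long thresholdNanos;  // overshoot at or above this is recorded

    PauseDetector(long intervalNanos, long thresholdNanos) {
        this.intervalNanos = intervalNanos;
        this.thresholdNanos = thresholdNanos;
    }

    /** How much later than requested the thread woke up; clamped to zero for early wakeups. */
    static long overshoot(long beforeNanos, long afterNanos, long intervalNanos) {
        return Math.max(0L, (afterNanos - beforeNanos) - intervalNanos);
    }

    /** Runs the given number of sleep cycles and returns the stalls that met the threshold. */
    List<Pause> sample(int iterations) {
        List<Pause> pauses = new ArrayList<>();
        for (int i = 0; i < iterations; i++) {
            long before = System.nanoTime();
            LockSupport.parkNanos(intervalNanos); // may return early; overshoot() clamps that case
            long after = System.nanoTime();
            long over = overshoot(before, after, intervalNanos);
            if (over >= thresholdNanos) {
                pauses.add(new Pause(after, over));
            }
        }
        return pauses;
    }
}
```

A loop like this, run inside the server, cannot by itself tell a JVM-wide pause from a node-wide one. Running the same loop in a separate process on the same host, in the spirit of the "inside and outside the server process" suggestion in the description, is one way the two cases might be told apart when the logs are compared afterwards.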



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
