Posted to commits@cassandra.apache.org by "Stephen Mallette (Jira)" <ji...@apache.org> on 2020/05/27 17:28:00 UTC

[jira] [Commented] (CASSANDRA-15821) Metrics Documentation Enhancements

    [ https://issues.apache.org/jira/browse/CASSANDRA-15821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17117939#comment-17117939 ] 

Stephen Mallette commented on CASSANDRA-15821:
----------------------------------------------

I've pushed another batch of changes to my branch which now cover all the documented items in my spreadsheet (i.e. all the known metrics that I could identify after a simple Cassandra initialization are now in {{metrics.rst}}).

A few odds and ends I noticed:

1. It seems a bit odd that {{DroppedMessageMetrics}} and {{MessagingMetrics}} aren't handled in a consistent fashion. The former uses the {{Verb}} as the scope (which is nice) but the latter appends the {{Verb}} to the metric name itself (which seems less nice). I'm not sure what decisions led to this situation but I'd be curious to hear if anyone thinks this is a concern at all.
2. ReadRepair.RepairedAsync does not appear to be in use? I could be missing something but it does not seem to be referenced in the code beyond its declaration. Could this be deleted?
3. {{DroppedMessageMetrics}} had PAGED_SLICE and RANGED_SLICE documented but they don't appear to be available. I couldn't quite isolate exactly when they were removed but I assume it's fine that they were removed?
4. A minor point, but I wonder what tolerance there is for making casing consistent throughout the metrics. For example, Client metrics has a mixture of casing: connectedNativeClients and AuthFailure. It would be nicer to my eyes to change the former to "ConnectedNativeClients".
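
The casing point in item 4 can be checked mechanically. Below is a minimal Python sketch, assuming "consistent" means PascalCase (an uppercase first letter followed by letters and digits); that convention and the helper names {{non_pascal_case}} and {{to_pascal_case}} are assumptions for illustration, not anything defined by Cassandra.

```python
import re

# Assumed convention for illustration: an uppercase first letter followed
# only by letters and digits. This is not a documented Cassandra rule.
PASCAL_CASE = re.compile(r"^[A-Z][A-Za-z0-9]*$")


def non_pascal_case(names):
    """Return the metric names that do not match PASCAL_CASE."""
    return [n for n in names if not PASCAL_CASE.match(n)]


def to_pascal_case(name):
    """Upper-case only the first character, leaving the rest untouched."""
    return name[:1].upper() + name[1:]


# The two Client metric names mentioned in item 4.
client_metrics = ["connectedNativeClients", "AuthFailure"]
offenders = non_pascal_case(client_metrics)
suggestions = {n: to_pascal_case(n) for n in offenders}
print(suggestions)
```

Run against a full list of metric names, the same check would show how many names a rename would touch, which might help in judging whether the change is worth the compatibility cost.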


> Metrics Documentation Enhancements
> ----------------------------------
>
>                 Key: CASSANDRA-15821
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-15821
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Documentation/Website
>            Reporter: Stephen Mallette
>            Assignee: Stephen Mallette
>            Priority: Normal
>
> CASSANDRA-15582 involves quality around metrics and it was mentioned that reviewing and [improving documentation|https://github.com/apache/cassandra/blob/trunk/doc/source/operating/metrics.rst] around metrics would fall into that scope. Please consider some of this analysis in determining what improvements to make here:
> Please see [this spreadsheet|https://docs.google.com/spreadsheets/d/1iPWfCMIG75CI6LbYuDtCTjEOvZw-5dyH-e08bc63QnI/edit?usp=sharing] that itemizes almost all of Cassandra's metrics and whether they are documented or not (and other notes). That spreadsheet is "almost all" because there are some metrics that don't seem to initialize as part of Cassandra startup (I was able to trigger some to initialize, but not all were immediately obvious). The missing metrics seem to be related to the following:
> * ThreadPool metrics - only some initialize at startup; those are listed below
> * Streaming Metrics
> * HintedHandoff Metrics
> * HintsService Metrics
> Here are the ThreadPool scopes that get listed:
> {code}
> AntiEntropyStage
> CacheCleanupExecutor
> CompactionExecutor
> GossipStage
> HintsDispatcher
> MemtableFlushWriter
> MemtablePostFlush
> MemtableReclaimMemory
> MigrationStage
> MutationStage
> Native-Transport-Requests
> PendingRangeCalculator
> PerDiskMemtableFlushWriter_0
> ReadStage
> Repair-Task
> RequestResponseStage
> Sampler
> SecondaryIndexManagement
> ValidationExecutor
> ViewBuildExecutor
> {code}
> I noticed that Keyspace Metrics have this note: "Most of these metrics are the same as the Table Metrics above, only they are aggregated at the Keyspace level." I think I've isolated the metrics that are on table but not on keyspace; specifically, they are:
> {code}
> BloomFilterFalsePositives
> BloomFilterFalseRatio
> BytesAnticompacted
> BytesFlushed
> BytesMutatedAnticompaction
> BytesPendingRepair
> BytesRepaired
> BytesUnrepaired
> CompactionBytesWritten
> CompressionRatio
> CoordinatorReadLatency
> CoordinatorScanLatency
> CoordinatorWriteLatency
> EstimatedColumnCountHistogram
> EstimatedPartitionCount
> EstimatedPartitionSizeHistogram
> KeyCacheHitRate
> LiveSSTableCount
> MaxPartitionSize
> MeanPartitionSize
> MinPartitionSize
> MutatedAnticompactionGauge
> PercentRepaired
> RowCacheHitOutOfRange
> RowCacheHit
> RowCacheMiss
> SpeculativeSampleLatencyNanos
> SyncTime
> WaitingOnFreeMemtableSpace
> DroppedMutations
> {code}
> Someone with greater knowledge of this area might consider it worth the effort to see if any of these metrics should be aggregated to the keyspace level in case they were inadvertently missed. In any case, perhaps the documentation could easily now reflect which metric names could be expected on Keyspace.
> The DroppedMessage metrics have a much larger body of scopes than what was documented:
> {code}
> ASYMMETRIC_SYNC_REQ
> BATCH_REMOVE_REQ
> BATCH_REMOVE_RSP
> BATCH_STORE_REQ
> BATCH_STORE_RSP
> CLEANUP_MSG
> COUNTER_MUTATION_REQ
> COUNTER_MUTATION_RSP
> ECHO_REQ
> ECHO_RSP
> FAILED_SESSION_MSG
> FAILURE_RSP
> FINALIZE_COMMIT_MSG
> FINALIZE_PROMISE_MSG
> FINALIZE_PROPOSE_MSG
> GOSSIP_DIGEST_ACK
> GOSSIP_DIGEST_ACK2
> GOSSIP_DIGEST_SYN
> GOSSIP_SHUTDOWN
> HINT_REQ
> HINT_RSP
> INTERNAL_RSP
> MUTATION_REQ
> MUTATION_RSP
> PAXOS_COMMIT_REQ
> PAXOS_COMMIT_RSP
> PAXOS_PREPARE_REQ
> PAXOS_PREPARE_RSP
> PAXOS_PROPOSE_REQ
> PAXOS_PROPOSE_RSP
> PING_REQ
> PING_RSP
> PREPARE_CONSISTENT_REQ
> PREPARE_CONSISTENT_RSP
> PREPARE_MSG
> RANGE_REQ
> RANGE_RSP
> READ_REPAIR_REQ
> READ_REPAIR_RSP
> READ_REQ
> READ_RSP
> REPAIR_RSP
> REPLICATION_DONE_REQ
> REPLICATION_DONE_RSP
> REQUEST_RSP
> SCHEMA_PULL_REQ
> SCHEMA_PULL_RSP
> SCHEMA_PUSH_REQ
> SCHEMA_PUSH_RSP
> SCHEMA_VERSION_REQ
> SCHEMA_VERSION_RSP
> SNAPSHOT_MSG
> SNAPSHOT_REQ
> SNAPSHOT_RSP
> STATUS_REQ
> STATUS_RSP
> SYNC_REQ
> SYNC_RSP
> TRUNCATE_REQ
> TRUNCATE_RSP
> VALIDATION_REQ
> VALIDATION_RSP
> _SAMPLE
> _TEST_1
> _TEST_2
> _TRACE
> {code}
> I suppose I may yet be missing some metrics, as my knowledge of what's available is limited to what I can get from JMX after Cassandra initialization (and some initial starting commands) and what's in the documentation. If something is present that is missing from both then I won't know it's there. Anyway, perhaps this issue can help build some discussion around the improvements that might be made given the analysis that has been provided so far.
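
A JMX-based inventory like the one described in the issue could be summarized along these lines. This is only a sketch: it assumes the MBean names have already been exported as plain strings in the {{domain:key=value,...}} form, the three sample names are illustrative rather than captured from a running node, and {{parse_object_name}} and {{scopes_by_type}} are hypothetical helpers.

```python
from collections import defaultdict


def parse_object_name(object_name):
    """Split 'domain:key=value,key=value' into (domain, dict of properties)."""
    domain, _, props = object_name.partition(":")
    pairs = (p.split("=", 1) for p in props.split(",") if p)
    return domain, {k: v for k, v in pairs}


def scopes_by_type(object_names):
    """Map each metric type to the sorted list of scopes seen for it."""
    grouped = defaultdict(set)
    for object_name in object_names:
        _, props = parse_object_name(object_name)
        if "type" in props and "scope" in props:
            grouped[props["type"]].add(props["scope"])
    return {t: sorted(s) for t, s in grouped.items()}


# Illustrative sample input; a real list would come from a JMX query.
sample = [
    "org.apache.cassandra.metrics:type=DroppedMessage,scope=MUTATION_REQ,name=Dropped",
    "org.apache.cassandra.metrics:type=DroppedMessage,scope=READ_REQ,name=Dropped",
    "org.apache.cassandra.metrics:type=ThreadPools,path=request,scope=ReadStage,name=ActiveTasks",
]
inventory = scopes_by_type(sample)
print(inventory)
```

The parser splits on commas, so it would mishandle quoted property values that contain commas; that seemed an acceptable simplification for names shaped like the ones listed above.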


