Posted to commits@cassandra.apache.org by "Stephen Mallette (Jira)" <ji...@apache.org> on 2020/05/26 19:19:00 UTC

[jira] [Assigned] (CASSANDRA-15821) Metrics Documentation Enhancements

     [ https://issues.apache.org/jira/browse/CASSANDRA-15821?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Stephen Mallette reassigned CASSANDRA-15821:
--------------------------------------------

    Assignee: Stephen Mallette

I've started with some initial changes here: 

https://github.com/apache/cassandra/compare/trunk...spmallette:CASSANDRA-15821

I mostly focused on Table/Keyspace metrics and ClientRequest metrics, adding those items noted as missing in the referenced spreadsheet. I've updated the spreadsheet accordingly to keep track of where things are.

At the risk of shifting a lot of things around, I'd very much like to alphabetize the metric listings in the various tables. If anyone objects to that for some reason, please let me know. I will save that particular change for the final steps of this issue.

Note that there appear to be some naming discrepancies among the Table/Keyspace metrics, where the Table and Keyspace names differ for what I believe is the same metric:

* Table.SyncTime == Keyspace.RepairSyncTime
* Table.RepairedDataTrackingOverreadRows == Keyspace.RepairedOverreadRows
* Table.RepairedDataTrackingOverreadTime == Keyspace.RepairedOverreadTime
* Table.AllMemtablesHeapSize == Keyspace.AllMemtablesOnHeapDataSize
* Table.AllMemtablesOffHeapSize == Keyspace.AllMemtablesOffHeapDataSize
* Table.MemtableOnHeapSize == Keyspace.MemtableOnHeapDataSize
* Table.MemtableOffHeapSize == Keyspace.MemtableOffHeapDataSize

For now I've taken the liberty of documenting these items under their separate names, though I think it would be preferable to make the naming consistent (I could create another ticket for that). Unless there are objections, I will proceed that way. I'm happy to hear any feedback. Thanks!
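For tooling that reads the same metric at both levels, the discrepancies listed above reduce to a small rename table until the names are made consistent. A minimal sketch in Java; the class and method names are hypothetical and not part of Cassandra, and it assumes that any Table metric not in the list carries the same name on Keyspace. It does not know which Table metrics are absent from Keyspace entirely.

```java
import java.util.Map;

/** Hypothetical helper: maps a Table metric name to its Keyspace-level name. */
public class MetricNameAliases {

    // Table name -> Keyspace name, taken from the discrepancies listed in this comment.
    private static final Map<String, String> TABLE_TO_KEYSPACE = Map.of(
            "SyncTime", "RepairSyncTime",
            "RepairedDataTrackingOverreadRows", "RepairedOverreadRows",
            "RepairedDataTrackingOverreadTime", "RepairedOverreadTime",
            "AllMemtablesHeapSize", "AllMemtablesOnHeapDataSize",
            "AllMemtablesOffHeapSize", "AllMemtablesOffHeapDataSize",
            "MemtableOnHeapSize", "MemtableOnHeapDataSize",
            "MemtableOffHeapSize", "MemtableOffHeapDataSize");

    /** Returns the Keyspace name; assumes unlisted metrics share the Table name. */
    static String keyspaceNameFor(String tableMetric) {
        return TABLE_TO_KEYSPACE.getOrDefault(tableMetric, tableMetric);
    }

    public static void main(String[] args) {
        System.out.println(keyspaceNameFor("SyncTime"));    // prints RepairSyncTime
        System.out.println(keyspaceNameFor("ReadLatency")); // prints ReadLatency (assumed unchanged)
    }
}
```

If the names are later made consistent, the map simply shrinks to empty and the default branch covers every metric.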


> Metrics Documentation Enhancements
> ----------------------------------
>
>                 Key: CASSANDRA-15821
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-15821
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Documentation/Website
>            Reporter: Stephen Mallette
>            Assignee: Stephen Mallette
>            Priority: Normal
>
> CASSANDRA-15582 involves the quality of metrics, and it was mentioned that reviewing and [improving documentation|https://github.com/apache/cassandra/blob/trunk/doc/source/operating/metrics.rst] for metrics would fall into that scope. Please consider some of this analysis in determining what improvements to make here:
> Please see [this spreadsheet|https://docs.google.com/spreadsheets/d/1iPWfCMIG75CI6LbYuDtCTjEOvZw-5dyH-e08bc63QnI/edit?usp=sharing], which itemizes almost all of Cassandra's metrics, whether or not each is documented, and other notes. The spreadsheet covers "almost all" because some metrics don't seem to initialize as part of Cassandra startup (I was able to trigger some of them to initialize, but not all of them were immediately obvious). The missing metrics seem to be related to the following:
> * ThreadPool metrics - only some initialize at startup; those are listed below
> * Streaming Metrics
> * HintedHandoff Metrics
> * HintsService Metrics
> Here are the ThreadPool scopes that get listed:
> {code}
> AntiEntropyStage
> CacheCleanupExecutor
> CompactionExecutor
> GossipStage
> HintsDispatcher
> MemtableFlushWriter
> MemtablePostFlush
> MemtableReclaimMemory
> MigrationStage
> MutationStage
> Native-Transport-Requests
> PendingRangeCalculator
> PerDiskMemtableFlushWriter_0
> ReadStage
> Repair-Task
> RequestResponseStage
> Sampler
> SecondaryIndexManagement
> ValidationExecutor
> ViewBuildExecutor
> {code}
> I noticed that the Keyspace Metrics have this note: "Most of these metrics are the same as the Table Metrics above, only they are aggregated at the Keyspace level." I believe the Table metrics that are not present on Keyspace are specifically:
> {code}
> BloomFilterFalsePositives
> BloomFilterFalseRatio
> BytesAnticompacted
> BytesFlushed
> BytesMutatedAnticompaction
> BytesPendingRepair
> BytesRepaired
> BytesUnrepaired
> CompactionBytesWritten
> CompressionRatio
> CoordinatorReadLatency
> CoordinatorScanLatency
> CoordinatorWriteLatency
> EstimatedColumnCountHistogram
> EstimatedPartitionCount
> EstimatedPartitionSizeHistogram
> KeyCacheHitRate
> LiveSSTableCount
> MaxPartitionSize
> MeanPartitionSize
> MinPartitionSize
> MutatedAnticompactionGauge
> PercentRepaired
> RowCacheHitOutOfRange
> RowCacheHit
> RowCacheMiss
> SpeculativeSampleLatencyNanos
> SyncTime
> WaitingOnFreeMemtableSpace
> DroppedMutations
> {code}
> Someone with greater knowledge of this area might check whether any of these metrics should be aggregated to the Keyspace level, in case they were inadvertently left out. In any case, the documentation could now easily state which metric names can be expected on Keyspace.
> The DroppedMessage metrics have a much larger set of scopes than what is documented:
> {code}
> ASYMMETRIC_SYNC_REQ
> BATCH_REMOVE_REQ
> BATCH_REMOVE_RSP
> BATCH_STORE_REQ
> BATCH_STORE_RSP
> CLEANUP_MSG
> COUNTER_MUTATION_REQ
> COUNTER_MUTATION_RSP
> ECHO_REQ
> ECHO_RSP
> FAILED_SESSION_MSG
> FAILURE_RSP
> FINALIZE_COMMIT_MSG
> FINALIZE_PROMISE_MSG
> FINALIZE_PROPOSE_MSG
> GOSSIP_DIGEST_ACK
> GOSSIP_DIGEST_ACK2
> GOSSIP_DIGEST_SYN
> GOSSIP_SHUTDOWN
> HINT_REQ
> HINT_RSP
> INTERNAL_RSP
> MUTATION_REQ
> MUTATION_RSP
> PAXOS_COMMIT_REQ
> PAXOS_COMMIT_RSP
> PAXOS_PREPARE_REQ
> PAXOS_PREPARE_RSP
> PAXOS_PROPOSE_REQ
> PAXOS_PROPOSE_RSP
> PING_REQ
> PING_RSP
> PREPARE_CONSISTENT_REQ
> PREPARE_CONSISTENT_RSP
> PREPARE_MSG
> RANGE_REQ
> RANGE_RSP
> READ_REPAIR_REQ
> READ_REPAIR_RSP
> READ_REQ
> READ_RSP
> REPAIR_RSP
> REPLICATION_DONE_REQ
> REPLICATION_DONE_RSP
> REQUEST_RSP
> SCHEMA_PULL_REQ
> SCHEMA_PULL_RSP
> SCHEMA_PUSH_REQ
> SCHEMA_PUSH_RSP
> SCHEMA_VERSION_REQ
> SCHEMA_VERSION_RSP
> SNAPSHOT_MSG
> SNAPSHOT_REQ
> SNAPSHOT_RSP
> STATUS_REQ
> STATUS_RSP
> SYNC_REQ
> SYNC_RSP
> TRUNCATE_REQ
> TRUNCATE_RSP
> VALIDATION_REQ
> VALIDATION_RSP
> _SAMPLE
> _TEST_1
> _TEST_2
> _TRACE
> {code}
> I may still be missing some metrics, as my knowledge of what's available is limited to what I can get from JMX after Cassandra initialization (plus a few initial commands) and what's in the documentation. If a metric is missing from both, I won't know it exists. Anyway, perhaps this issue can start some discussion about the improvements that might be made given the analysis so far.
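The inventories in this issue (ThreadPool scopes, Table metrics without a Keyspace counterpart, DroppedMessage scopes) can be gathered from JMX with a check along these lines. This is a minimal Java sketch, not Cassandra tooling: it assumes metric MBeans are registered under the org.apache.cassandra.metrics domain with type, scope and name key properties (for example type=ThreadPools,...,scope=ReadStage or type=DroppedMessage,scope=MUTATION_REQ), and that JMX is reachable without authentication on the default port 7199. MetricInventory and its helpers are hypothetical names. Like the analysis above, it can only see MBeans that have already been registered.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.SortedSet;
import java.util.TreeSet;
import javax.management.MBeanServerConnection;
import javax.management.MalformedObjectNameException;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

public class MetricInventory {

    // Assumed JMX domain under which Cassandra registers its metric MBeans.
    static final String METRICS_DOMAIN = "org.apache.cassandra.metrics";

    /** Parses ObjectName strings (e.g. saved from an earlier run), unchecked on error. */
    static List<ObjectName> parseAll(String... names) {
        List<ObjectName> out = new ArrayList<>();
        for (String n : names) {
            try {
                out.add(new ObjectName(n));
            } catch (MalformedObjectNameException e) {
                throw new IllegalArgumentException(n, e);
            }
        }
        return out;
    }

    /** Sorted, distinct values of {@code key} for MBeans whose "type" equals {@code type}. */
    static SortedSet<String> keyValues(Iterable<ObjectName> names, String type, String key) {
        SortedSet<String> out = new TreeSet<>();
        for (ObjectName on : names) {
            String value = on.getKeyProperty(key);
            if (type.equals(on.getKeyProperty("type")) && value != null) {
                out.add(value);
            }
        }
        return out;
    }

    /** Metric names registered under presentType but not under absentType. */
    static SortedSet<String> missingOn(Iterable<ObjectName> names, String presentType, String absentType) {
        SortedSet<String> diff = keyValues(names, presentType, "name");
        diff.removeAll(keyValues(names, absentType, "name"));
        return diff;
    }

    public static void main(String[] args) throws Exception {
        String host = args.length > 0 ? args[0] : "localhost";
        String port = args.length > 1 ? args[1] : "7199"; // assumed default JMX port
        JMXServiceURL url = new JMXServiceURL(
                "service:jmx:rmi:///jndi/rmi://" + host + ":" + port + "/jmxrmi");
        try (JMXConnector connector = JMXConnectorFactory.connect(url)) {
            MBeanServerConnection conn = connector.getMBeanServerConnection();
            Set<ObjectName> names = conn.queryNames(new ObjectName(METRICS_DOMAIN + ":*"), null);
            System.out.println("ThreadPools scopes: " + keyValues(names, "ThreadPools", "scope"));
            System.out.println("DroppedMessage scopes: " + keyValues(names, "DroppedMessage", "scope"));
            System.out.println("Table-only metrics: " + missingOn(names, "Table", "Keyspace"));
        } catch (IOException e) {
            System.err.println("Could not connect to " + url + ": " + e.getMessage());
        }
    }
}
```

With Java 11 or later it can be run directly, e.g. java MetricInventory.java 127.0.0.1 7199. Because the results are TreeSets, the listings come out alphabetized. Metrics that exist at both levels under different names (such as SyncTime) will also appear in the Table-only list, which matches the list above.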



--
This message was sent by Atlassian Jira
(v8.3.4#803005)