Posted to commits@cassandra.apache.org by "Stefan Miklosovic (Jira)" <ji...@apache.org> on 2021/09/17 16:08:00 UTC
[jira] [Comment Edited] (CASSANDRA-13460) Diag. Events: Add local persistency

    [ https://issues.apache.org/jira/browse/CASSANDRA-13460?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17416781#comment-17416781 ] 

Stefan Miklosovic edited comment on CASSANDRA-13460 at 9/17/21, 4:07 PM:
-------------------------------------------------------------------------

Hi [~mck], I have implemented diagnostic event logging into Chronicle queues in this branch (1). It is quite a big patch and it is not fully finished yet, but I think it is enough for a first evaluation, and I would like to discuss it early to avoid any communication or expectation issues.

The main "work" is done in DiagnosticEventService and DiagnosticEventPersistence. DiagnosticEventPersistence is based on "consumers" which are used for subscription. Implementation-wise, before this patch, there was already a consumer which put everything into memory. I implemented the diagnostic event logger on Chronicle queues as just another consumer: when it consumes events, it puts them into a Chronicle queue instead of into in-memory structures. Upon disabling the diagnostic logger, this consumer is simply unsubscribed.
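
To illustrate the consumer idea described above, here is a minimal sketch. All class names in it (EventService, InMemoryConsumer, QueueConsumer) are hypothetical stand-ins rather than the classes from the patch, and a plain list simulates the Chronicle queue, so treat it as an illustration of the subscribe/unsubscribe pattern only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.function.Consumer;

// Hypothetical sketch; none of these types are the actual Cassandra classes.
final class DiagnosticConsumerSketch
{
    /** Minimal stand-in for a diagnostic event. */
    static final class Event
    {
        final String type;
        Event(String type) { this.type = type; }
    }

    /** Stand-in for the event service: hands each event to every subscribed consumer. */
    static final class EventService
    {
        private final List<Consumer<Event>> consumers = new CopyOnWriteArrayList<>();

        void subscribe(Consumer<Event> consumer) { consumers.add(consumer); }
        void unsubscribe(Consumer<Event> consumer) { consumers.remove(consumer); }
        void publish(Event event) { consumers.forEach(c -> c.accept(event)); }
    }

    /** Consumer that keeps events in an in-memory buffer. */
    static final class InMemoryConsumer implements Consumer<Event>
    {
        final List<Event> buffer = new ArrayList<>();
        @Override public void accept(Event event) { buffer.add(event); }
    }

    /** Consumer standing in for the queue writer; the list simulates the on-disk queue. */
    static final class QueueConsumer implements Consumer<Event>
    {
        final List<String> written = new ArrayList<>();
        @Override public void accept(Event event) { written.add(event.type); }
    }

    public static void main(String[] args)
    {
        EventService service = new EventService();
        InMemoryConsumer memory = new InMemoryConsumer();
        QueueConsumer queue = new QueueConsumer();

        // Both consumers subscribed: the event reaches memory and the queue.
        service.subscribe(memory);
        service.subscribe(queue);
        service.publish(new Event("BootstrapEvent"));

        // "Disabling the logger" amounts to unsubscribing the queue consumer.
        service.unsubscribe(queue);
        service.publish(new Event("GossiperEvent"));

        System.out.println(memory.buffer.size() + " in memory, " + queue.written.size() + " in queue");
    }
}
```

In this sketch the in-memory consumer is unaffected by whether the queue consumer is subscribed, which mirrors the behaviour described in the comment.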

From the user's point of view, the diagnostic events functionality has to be enabled before diagnostic logging can be enabled; logging into Chronicle queues is not possible if the diagnostic framework is disabled. On the other hand, diagnostic logging into Chronicle queues can be enabled and disabled on demand, as is done for audit logging. However, regardless of whether diagnostic logging into Chronicle queues is enabled or disabled, events are always put into memory as before. There is a JMX method via which a user may read these events on demand, but events written to disk cannot be read on demand from an arbitrary position in the Chronicle queue. Hence the user can still inspect events on the fly from the in-memory buffer, as before, and they are additionally all persisted to disk if the user chooses so.
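
The enable/disable rules above can be summarised in a small sketch. The class and method names are hypothetical, and signalling a rejected request with an exception is an assumption; the comment only states that logging into Chronicle queues is not possible while the framework is disabled.

```java
// Hypothetical sketch of the enable/disable rules; not the patch's actual API.
final class DiagnosticLoggingGate
{
    private final boolean frameworkEnabled; // diagnostic events functionality switched on
    private boolean queueLoggingEnabled;    // toggled on demand at runtime

    DiagnosticLoggingGate(boolean frameworkEnabled)
    {
        this.frameworkEnabled = frameworkEnabled;
    }

    void enableQueueLogging()
    {
        // Assumption: a request made while the framework is disabled fails with an exception.
        if (!frameworkEnabled)
            throw new IllegalStateException("diagnostic events are disabled");
        queueLoggingEnabled = true;
    }

    void disableQueueLogging()
    {
        queueLoggingEnabled = false;
    }

    /** Events go to the in-memory buffer whenever the framework is enabled. */
    boolean storesInMemory()
    {
        return frameworkEnabled;
    }

    /** Events go to the queue only if the framework and queue logging are both enabled. */
    boolean writesToQueue()
    {
        return frameworkEnabled && queueLoggingEnabled;
    }

    public static void main(String[] args)
    {
        DiagnosticLoggingGate gate = new DiagnosticLoggingGate(true);
        gate.enableQueueLogging();
        System.out.println("memory=" + gate.storesInMemory() + ", queue=" + gate.writesToQueue());
        gate.disableQueueLogging();
        System.out.println("memory=" + gate.storesInMemory() + ", queue=" + gate.writesToQueue());
    }
}
```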

I have also extracted the common parts of BinLogger into a separate abstract class, located in a new org.apache.cassandra.log package. Audit logging and diagnostic logging are very similar, and I found myself repeating a lot of code in order to implement this, so I simplified it a lot. I have also extracted the common parts of the options.
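
As a rough picture of that extraction, the sketch below puts the shared write path into an abstract base class and leaves only the differing parts to the audit and diagnostic subclasses. The names (AbstractBinaryLogger, AuditLogger, DiagnosticLogger) and the choice of what varies between subclasses are assumptions for illustration; the real class in org.apache.cassandra.log may be organised differently.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of sharing logger code through an abstract base class.
final class SharedLoggerSketch
{
    /** Shared behaviour: stop handling and the write path. A list stands in for the queue. */
    static abstract class AbstractBinaryLogger
    {
        private final List<String> sink = new ArrayList<>();
        private volatile boolean stopped;

        /** Writes one record unless the logger has been stopped. */
        final void log(String message)
        {
            if (stopped)
                return;
            sink.add(recordType() + ":" + message);
        }

        final void stop() { stopped = true; }

        final List<String> written() { return sink; }

        /** The only part that differs between subclasses in this sketch. */
        protected abstract String recordType();
    }

    static final class AuditLogger extends AbstractBinaryLogger
    {
        @Override protected String recordType() { return "audit"; }
    }

    static final class DiagnosticLogger extends AbstractBinaryLogger
    {
        @Override protected String recordType() { return "diagnostic"; }
    }

    public static void main(String[] args)
    {
        AbstractBinaryLogger audit = new AuditLogger();
        AbstractBinaryLogger diagnostic = new DiagnosticLogger();

        audit.log("SELECT executed");
        diagnostic.log("BootstrapEvent");
        diagnostic.stop();
        diagnostic.log("ignored after stop");

        System.out.println(audit.written());
        System.out.println(diagnostic.written());
    }
}
```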

I have also implemented a diagnosticlogviewer tool, similar to auditlogviewer. My question here is whether we also want a "generic" tool which the audit and diagnostic viewers would extend, because right now they are basically the same except for a few changes which are mostly cosmetic. Hence I would like to know if you think it makes sense to try to extract the common parts.

I have also implemented nodetool commands for enabling and disabling diagnostic logging and for showing its status, similar to those for the audit log.

I would love to hear your feedback, especially about the overall high-level implementation, so that I am not doing something which might eventually be rejected because of different expectations.

(1) [https://github.com/instaclustr/cassandra/tree/CASSANDRA-13460-2]


> Diag. Events: Add local persistency
> -----------------------------------
>
>                 Key: CASSANDRA-13460
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-13460
>             Project: Cassandra
>          Issue Type: Sub-task
>          Components: Legacy/Observability
>            Reporter: Stefan Podkowinski
>            Assignee: Stefan Miklosovic
>            Priority: Normal
>             Fix For: 4.x
>
>         Attachments: 0001-Add-persistency-for-events-to-system-keyspace.patch
>
>
> Some generated events will be rather infrequent but very useful for retroactive troubleshooting. E.g. all events related to bootstrapping and gossip would probably be worth saving, as they might provide valuable insights and will consume very few resources in low quantities. Imagine if we could, e.g. in case of CASSANDRA-13348, just ask the user to -run a tool like {{./bin/diagdump BootstrapEvent}} on each host, to get us a detailed log of all relevant events- provide a dump of all events as described in the [documentation|https://github.com/spodkowinski/cassandra/blob/WIP-13460/doc/source/operating/diag_events.rst].
> This could be done by saving events white-listed in cassandra.yaml to a local table. Maybe using a TTL.


