Posted to commits@cassandra.apache.org by "Ariel Weisberg (JIRA)" <ji...@apache.org> on 2015/09/02 15:05:47 UTC
[jira] [Comment Edited] (CASSANDRA-10241) Keep a separate production debug log for troubleshooting

    [ https://issues.apache.org/jira/browse/CASSANDRA-10241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14726197#comment-14726197 ] 

Ariel Weisberg edited comment on CASSANDRA-10241 at 9/2/15 1:05 PM:
--------------------------------------------------------------------

My goal with this suggestion is to eliminate the conflicting priorities of user-visible, human-readable logging and more verbose, less readable debug/trace logging. We can't eliminate the overhead of trace logging, so we can't log everything. Statements that execute once per operation or per record and statements that dump large data structures aren't good candidates.

Once you can be chatty in the log you can start logging state transitions that you almost never care about, but are helpful when the unexpected happens. Over time you can start to accrue more useful logging as you work through real failures.

We also really need go/no-go measurements for what is logged via this mechanism so we know what it is costing us. Maybe a JMX metric that reports log throughput (statements/second, bytes/second) over some historical window, so we can diagnose issues related to poorly designed logging. Or we can just be careful.
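
To make the measurement idea concrete, here is a minimal sketch of the counting side only. `LogThroughputMeter` and its methods are hypothetical names, not existing Cassandra or logback classes, and exposing the rates through JMX is left out. It assumes something on the logging path calls `record` once per formatted statement and that a reporter calls `sample` periodically.

```java
import java.nio.charset.StandardCharsets;
import java.util.concurrent.atomic.LongAdder;

/** Hypothetical meter for debug-log throughput; not part of Cassandra. */
final class LogThroughputMeter {
    // LongAdder keeps contention low when many threads log at once.
    private final LongAdder statements = new LongAdder();
    private final LongAdder bytes = new LongAdder();

    // Totals and timestamp seen at the previous sample.
    private long lastStatements;
    private long lastBytes;
    private long lastNanos;

    LogThroughputMeter(long startNanos) {
        this.lastNanos = startNanos;
    }

    /** Call once per logged statement with the formatted message. */
    void record(String formattedMessage) {
        statements.increment();
        // Assumption: UTF-8 output. A real appender would already know the encoded size.
        bytes.add(formattedMessage.getBytes(StandardCharsets.UTF_8).length);
    }

    /** Returns {statements/second, bytes/second} since the previous sample. */
    synchronized double[] sample(long nowNanos) {
        long s = statements.sum();
        long b = bytes.sum();
        double seconds = (nowNanos - lastNanos) / 1e9;
        double[] rates = seconds <= 0
                ? new double[] {0.0, 0.0}
                : new double[] {(s - lastStatements) / seconds, (b - lastBytes) / seconds};
        lastStatements = s;
        lastBytes = b;
        lastNanos = nowNanos;
        return rates;
    }
}
```

Keeping the last few samples around would give the historical window; passing the clock in as a parameter just makes the sketch easy to test.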

bq. Related: is it time to switch to AsyncAppender?
One thing you can do with async appending is have error and warn log synchronously, so you get a stronger guarantee that errors are visible (such as before voluntarily terminating). That somewhat mitigates the risk of missing important messages.
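
As a rough illustration of that split, here is a sketch in plain Java rather than logback's actual AsyncAppender. `HybridLogSink`, its `Level` enum, and the `Consumer<String>` output are all made-up names for this example. WARN and ERROR are written on the calling thread; everything else is queued for a background writer.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.function.Consumer;

/** Hypothetical sink: WARN/ERROR written synchronously, lower levels asynchronously. */
final class HybridLogSink implements AutoCloseable {
    enum Level { DEBUG, INFO, WARN, ERROR }

    // Sentinel compared by identity to tell the writer thread to stop.
    private static final String POISON = new String("stop");

    // Unbounded for simplicity; a real appender would bound this and pick a drop policy.
    private final BlockingQueue<String> queue = new LinkedBlockingQueue<>();
    private final Consumer<String> out;
    private final Thread writer;

    HybridLogSink(Consumer<String> out) {
        this.out = out;
        this.writer = new Thread(this::drain, "log-writer");
        this.writer.setDaemon(true);
        this.writer.start();
    }

    void log(Level level, String message) {
        String line = level + " " + message;
        if (level.compareTo(Level.WARN) >= 0) {
            write(line);      // caller blocks until the line is handed to the output
        } else {
            queue.add(line);  // caller returns immediately
        }
    }

    // Serializes writes from callers and from the background thread.
    private synchronized void write(String line) {
        out.accept(line);
    }

    private void drain() {
        try {
            for (String line = queue.take(); line != POISON; line = queue.take()) {
                write(line);
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    /** Flushes queued lines and stops the writer thread. */
    @Override
    public void close() {
        queue.add(POISON);
        try {
            writer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

The cost of this design is ordering: a WARN line can reach the output before DEBUG lines that were logged earlier but are still sitting in the queue.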

I think that switching to async appending is less critical in C* because it doesn't have a thread-per-core design. If there are places we log while holding locks, well, that's a different story. As long as the thread doing the logging is the only one impacted, and we don't have a design where, for whatever reason, everyone may end up waiting on some special threads, it is fine.

Put another way, it's a 4th or 5th nine issue.


> Keep a separate production debug log for troubleshooting
> --------------------------------------------------------
>
>                 Key: CASSANDRA-10241
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-10241
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Config
>            Reporter: Jonathan Ellis
>            Assignee: Brandon Williams
>             Fix For: 2.1.x
>
>
> [~aweisberg] had the suggestion to keep a separate debug log for aid in troubleshooting, not intended for regular human consumption but where we can log things that might help if something goes wrong.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)