You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@cassandra.apache.org by "Kjetil Valstadsve (JIRA)" <ji...@apache.org> on 2011/02/20 16:42:38 UTC

[jira] Updated: (CASSANDRA-2202) Better error reporting and meaningful toStrings

     [ https://issues.apache.org/jira/browse/CASSANDRA-2202?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kjetil Valstadsve updated CASSANDRA-2202:
-----------------------------------------

    Attachment: errorreporting.txt

Various toStrings and exception rethrows.

> Better error reporting and meaningful toStrings
> -----------------------------------------------
>
>                 Key: CASSANDRA-2202
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2202
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>    Affects Versions: 0.7.2
>            Reporter: Kjetil Valstadsve
>            Priority: Minor
>              Labels: exception, logging, stacktrace
>         Attachments: errorreporting.txt
>
>
> Having been through some trouble with the upgrade to 0.7.2 via 0.7.1, I've learned the hard way that Cassandra's error reporting can be improved.  In particular, I had issues with deserialization of bloom filters, and with some corrupt SSTables.
> Chatting with driftx on the IRC channel, I complained about lack of error reporting, and showed him a stacktrace that indicated a corrupted SSTable. I then added various toString implementations and some new try/catch/rethrow points in critical control flow, which revealed the name of the SSTable.  (It turned out to be a hints table, allowing driftx to advise me on the next step - namely delete the local tables and rely on read repair.) 
> I'm submitting a patch reflecting this work.  It addresses these issues, and is perhaps best taken as a suggested guideline, maybe as an eye-opener. 
> I generally advocate two practices/good habits wrt. error reporting:
> 1) Provide a good toString, reflecting the important state of an instance.  (In this case, file names carry valuable information.)
> 2) Trapping and reporting errors is more useful at some points of the control flow than in others. For instance, at a point in the code where we have gathered all relevant parameters and are about to embark on major I/O (like reading a table from disk).  In these points, a try/catch/throw can be used to capture this information.  It's ok to catch Exception here, if you ask me, as long as you wrap it as the cause of a new exception.  This new exception should describe the most important parameters involved in the failed call.  To do this, it should rely on the toStrings from item 1.
> 2a) The same goes for log messages.  Notice that the utility of a toString multiplies, as it will emit useful information wherever it's used for logging and/or exception messages.
> By following these points, you will get stack traces partitioned up with a chain of "Caused by" exceptions, each providing some information from its relevant layer in the call stack.
> I have applied this approach in the attached patch.  As a result, we were able to get the file name from a stacktrace that originally just said "EOFException", and the problem could be easily fixed. 
> Caveats:  Yes, you can get insanely long exception messages, and/or log messages.  To keep the post-mortem sessions manageable, constant tuning and re-tuning of toStrings and log/exception messages is needed. However, even a terrifyingly verbose exception won't be _less_ meaningful to the user than a nameless EOFException, and it will convey more information to the developers.  (Even if that information boils down to "this toString is way more verbose than it needs to be".)
> In short: This patch helps with detecting corrupt SSTables and bogus bloom filers, and I hope it inspires someone to give out more informative error messages as well. 

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira