You are viewing a plain text version of this content. The canonical link for it is here.

Posted to commits@cassandra.apache.org by "Kjetil Valstadsve (JIRA)" <ji...@apache.org> on 2011/02/20 16:40:38 UTC

[jira] Created: (CASSANDRA-2202) Better error reporting and meaningful toStrings

Better error reporting and meaningful toStrings
-----------------------------------------------

Key: CASSANDRA-2202
URL: https://issues.apache.org/jira/browse/CASSANDRA-2202
Project: Cassandra
Issue Type: Improvement
Components: Core
Affects Versions: 0.7.2
Reporter: Kjetil Valstadsve
Priority: Minor

Having been through some trouble with the upgrade to 0.7.2 via 0.7.1, I've learned the hard way that Cassandra's error reporting can be improved. In particular, I had issues with deserialization of bloom filters, and with some corrupt SSTables.

Chatting with driftx on the IRC channel, I complained about lack of error reporting, and showed him a stacktrace that indicated a corrupted SSTable. I then added various toString implementations and some new try/catch/rethrow points in critical control flow, which revealed the name of the SSTable. (It turned out to be a hints table, allowing driftx to advise me on the next step - namely delete the local tables and rely on read repair.)

I'm submitting a patch reflecting this work. It addresses these issues, and is perhaps best taken as a suggested guideline, maybe as an eye-opener.

I generally advocate two practices/good habits wrt. error reporting:

1) Provide a good toString, reflecting the important state of an instance. (In this case, file names carry valuable information.)

2) Trapping and reporting errors is more useful at some points of the control flow than in others. For instance, at a point in the code where we have gathered all relevant parameters and are about to embark on major I/O (like reading a table from disk). In these points, a try/catch/throw can be used to capture this information. It's ok to catch Exception here, if you ask me, as long as you wrap it as the cause of a new exception. This new exception should describe the most important parameters involved in the failed call. To do this, it should rely on the toStrings from item 1.

2a) The same goes for log messages. Notice that the utility of a toString multiplies, as it will emit useful information wherever it's used for logging and/or exception messages.

By following these points, you will get stack traces partitioned up with a chain of "Caused by" exceptions, each providing some information from its relevant layer in the call stack.

I have applied this approach in the attached patch. As a result, we were able to get the file name from a stacktrace that originally just said "EOFException", and the problem could be easily fixed.

Caveats: Yes, you can get insanely long exception messages, and/or log messages. To keep the post-mortem sessions manageable, constant tuning and re-tuning of toStrings and log/exception messages is needed. However, even a terrifyingly verbose exception won't be _less_ meaningful to the user than a nameless EOFException, and it will convey more information to the developers. (Even if that information boils down to "this toString is way more verbose than it needs to be".)

In short: This patch helps with detecting corrupt SSTables and bogus bloom filers, and I hope it inspires someone to give out more informative error messages as well.

--
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] Commented: (CASSANDRA-2202) Better error reporting and meaningful toStrings

Posted by "Hudson (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/CASSANDRA-2202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12997235#comment-12997235 ] 

Hudson commented on CASSANDRA-2202:
-----------------------------------

Integrated in Cassandra-0.7 #300 (See [https://hudson.apache.org/hudson/job/Cassandra-0.7/300/])
    Better error reporting for sstable problems.
Patch by Kjetil Valstadsve, reviewed by brandonwilliams for
CASSANDRA-2202.


> Better error reporting and meaningful toStrings
> -----------------------------------------------
>
>                 Key: CASSANDRA-2202
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2202
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>    Affects Versions: 0.7.2
>            Reporter: Kjetil Valstadsve
>            Priority: Minor
>              Labels: exception, logging, stacktrace
>         Attachments: errorreporting.txt
>
>
> Having been through some trouble with the upgrade to 0.7.2 via 0.7.1, I've learned the hard way that Cassandra's error reporting can be improved.  In particular, I had issues with deserialization of bloom filters, and with some corrupt SSTables.
> Chatting with driftx on the IRC channel, I complained about lack of error reporting, and showed him a stacktrace that indicated a corrupted SSTable. I then added various toString implementations and some new try/catch/rethrow points in critical control flow, which revealed the name of the SSTable.  (It turned out to be a hints table, allowing driftx to advise me on the next step - namely delete the local tables and rely on read repair.) 
> I'm submitting a patch reflecting this work.  It addresses these issues, and is perhaps best taken as a suggested guideline, maybe as an eye-opener. 
> I generally advocate two practices/good habits wrt. error reporting:
> 1) Provide a good toString, reflecting the important state of an instance.  (In this case, file names carry valuable information.)
> 2) Trapping and reporting errors is more useful at some points of the control flow than in others. For instance, at a point in the code where we have gathered all relevant parameters and are about to embark on major I/O (like reading a table from disk).  In these points, a try/catch/throw can be used to capture this information.  It's ok to catch Exception here, if you ask me, as long as you wrap it as the cause of a new exception.  This new exception should describe the most important parameters involved in the failed call.  To do this, it should rely on the toStrings from item 1.
> 2a) The same goes for log messages.  Notice that the utility of a toString multiplies, as it will emit useful information wherever it's used for logging and/or exception messages.
> By following these points, you will get stack traces partitioned up with a chain of "Caused by" exceptions, each providing some information from its relevant layer in the call stack.
> I have applied this approach in the attached patch.  As a result, we were able to get the file name from a stacktrace that originally just said "EOFException", and the problem could be easily fixed. 
> Caveats:  Yes, you can get insanely long exception messages, and/or log messages.  To keep the post-mortem sessions manageable, constant tuning and re-tuning of toStrings and log/exception messages is needed. However, even a terrifyingly verbose exception won't be _less_ meaningful to the user than a nameless EOFException, and it will convey more information to the developers.  (Even if that information boils down to "this toString is way more verbose than it needs to be".)
> In short: This patch helps with detecting corrupt SSTables and bogus bloom filers, and I hope it inspires someone to give out more informative error messages as well. 

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] Updated: (CASSANDRA-2202) Better error reporting and meaningful toStrings

Posted by "Kjetil Valstadsve (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/CASSANDRA-2202?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kjetil Valstadsve updated CASSANDRA-2202:
-----------------------------------------

    Attachment: errorreporting.txt

Various toStrings and exception rethrows.

> Better error reporting and meaningful toStrings
> -----------------------------------------------
>
>                 Key: CASSANDRA-2202
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2202
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>    Affects Versions: 0.7.2
>            Reporter: Kjetil Valstadsve
>            Priority: Minor
>              Labels: exception, logging, stacktrace
>         Attachments: errorreporting.txt
>
>
> Having been through some trouble with the upgrade to 0.7.2 via 0.7.1, I've learned the hard way that Cassandra's error reporting can be improved.  In particular, I had issues with deserialization of bloom filters, and with some corrupt SSTables.
> Chatting with driftx on the IRC channel, I complained about lack of error reporting, and showed him a stacktrace that indicated a corrupted SSTable. I then added various toString implementations and some new try/catch/rethrow points in critical control flow, which revealed the name of the SSTable.  (It turned out to be a hints table, allowing driftx to advise me on the next step - namely delete the local tables and rely on read repair.) 
> I'm submitting a patch reflecting this work.  It addresses these issues, and is perhaps best taken as a suggested guideline, maybe as an eye-opener. 
> I generally advocate two practices/good habits wrt. error reporting:
> 1) Provide a good toString, reflecting the important state of an instance.  (In this case, file names carry valuable information.)
> 2) Trapping and reporting errors is more useful at some points of the control flow than in others. For instance, at a point in the code where we have gathered all relevant parameters and are about to embark on major I/O (like reading a table from disk).  In these points, a try/catch/throw can be used to capture this information.  It's ok to catch Exception here, if you ask me, as long as you wrap it as the cause of a new exception.  This new exception should describe the most important parameters involved in the failed call.  To do this, it should rely on the toStrings from item 1.
> 2a) The same goes for log messages.  Notice that the utility of a toString multiplies, as it will emit useful information wherever it's used for logging and/or exception messages.
> By following these points, you will get stack traces partitioned up with a chain of "Caused by" exceptions, each providing some information from its relevant layer in the call stack.
> I have applied this approach in the attached patch.  As a result, we were able to get the file name from a stacktrace that originally just said "EOFException", and the problem could be easily fixed. 
> Caveats:  Yes, you can get insanely long exception messages, and/or log messages.  To keep the post-mortem sessions manageable, constant tuning and re-tuning of toStrings and log/exception messages is needed. However, even a terrifyingly verbose exception won't be _less_ meaningful to the user than a nameless EOFException, and it will convey more information to the developers.  (Even if that information boils down to "this toString is way more verbose than it needs to be".)
> In short: This patch helps with detecting corrupt SSTables and bogus bloom filers, and I hope it inspires someone to give out more informative error messages as well. 

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira