Posted to commits@cassandra.apache.org by "Wade Simmons (JIRA)" <ji...@apache.org> on 2010/07/16 22:29:52 UTC

[jira] Created: (CASSANDRA-1289) GossipTimerTask stops running if an Exception occurs

GossipTimerTask stops running if an Exception occurs
----------------------------------------------------

                 Key: CASSANDRA-1289
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1289
             Project: Cassandra
          Issue Type: Bug
          Components: Core
    Affects Versions: 0.6.3, 0.6.2, 0.6.1, 0.6, 0.7
            Reporter: Wade Simmons


The GossipTimerTask run() method has a try/catch around its body, but it re-throws every Exception as a RuntimeException. Because java.util.Timer terminates its thread and cancels its remaining tasks when a task throws an uncaught exception, the GossipTimerTask never runs again, and the periodic gossip status checks stop.

Combine this problem with a bug like CASSANDRA-757 (not yet fixed in 0.6.x) and you get into a state where the server keeps running, but gossip is no longer occurring, preventing node addition / removal from happening.

I see two potential choices:
1) Log the error but don't re-throw it so that the GossipTimerTask will continue to run on its next interval.
2) Shut down the server, since continuing to run without gossip subtly breaks other functionality / knowledge of other nodes.
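
A minimal sketch of the failure mode and of choice 1, for illustration only. It is not the Cassandra code: the class GossipTimerSketch, the method countRuns, the 20 ms period, and the simulated failure are all assumptions made up for this example. It relies only on the documented java.util.Timer behavior that an uncaught exception from a task ends the timer thread.

```java
import java.util.Timer;
import java.util.TimerTask;
import java.util.concurrent.atomic.AtomicInteger;

public class GossipTimerSketch {

    /**
     * Schedules a periodic task that fails on every run and returns how many
     * times the task was started within roughly 300 ms.
     *
     * @param swallow true  = log the error and return (choice 1),
     *                false = re-throw as RuntimeException (the reported bug)
     */
    static int countRuns(final boolean swallow) {
        final AtomicInteger runs = new AtomicInteger();
        Timer timer = new Timer("gossip-sketch", true); // daemon timer thread

        timer.schedule(new TimerTask() {
            @Override
            public void run() {
                runs.incrementAndGet();
                try {
                    // Hypothetical stand-in for the real gossip round.
                    throw new IllegalStateException("simulated gossip failure");
                } catch (Exception e) {
                    if (!swallow) {
                        // Uncaught in the timer thread: the thread exits and
                        // no further runs of this task are ever executed.
                        throw new RuntimeException(e);
                    }
                    // Choice 1: report the error and let the next interval run.
                    System.err.println("Gossip error: " + e);
                }
            }
        }, 0, 20);

        try {
            Thread.sleep(300);
        } catch (InterruptedException ie) {
            Thread.currentThread().interrupt();
        }
        timer.cancel();
        return runs.get();
    }

    public static void main(String[] args) {
        System.out.println("runs when re-throwing: " + countRuns(false));
        System.out.println("runs when logging only: " + countRuns(true));
    }
}
```

With re-throwing, the task is started once and never again; with logging only, it keeps firing on every interval until the timer is cancelled.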

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (CASSANDRA-1289) GossipTimerTask stops running if an Exception occurs

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-1289?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12889554#action_12889554 ] 

Jonathan Ellis commented on CASSANDRA-1289:
-------------------------------------------

committed w/ changes since it was simple:

uses .error instead of .warn

uses .error(message, exception) so the entire stack trace will be logged
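
The difference between logging only a message and logging the message together with the exception can be sketched as follows. This is an illustration, not the committed patch: it swaps in java.util.logging from the JDK (where Level.SEVERE plays the role of .error) so that it runs without extra libraries, and the names GossipLoggingSketch, gossipRound, and runOnce are hypothetical.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

public class GossipLoggingSketch {

    static final Logger logger = Logger.getLogger("GossipLoggingSketch");

    // Hypothetical stand-in for one gossip round that fails.
    static void gossipRound() {
        throw new IllegalStateException("simulated gossip failure");
    }

    // Catches the failure and logs it instead of re-throwing, so a periodic
    // caller would still be alive for the next interval.
    static void runOnce() {
        try {
            gossipRound();
        } catch (Exception e) {
            // Passing the Throwable attaches it to the log record, so handlers
            // can print the full stack trace rather than only the message.
            logger.log(Level.SEVERE, "Gossip error", e);
        }
    }

    public static void main(String[] args) {
        runOnce();
    }
}
```

Logging at the error level rather than the warning level keeps a stalled or failing gossip round visible in deployments that filter out warnings.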

> GossipTimerTask stops running if an Exception occurs
> ----------------------------------------------------
>
>                 Key: CASSANDRA-1289
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1289
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>    Affects Versions: 0.6, 0.6.1, 0.6.2, 0.6.3
>            Reporter: Wade Simmons
>            Assignee: Brandon Williams
>             Fix For: 0.6.4
>
>         Attachments: 1289.txt
>
>



[jira] Updated: (CASSANDRA-1289) GossipTimerTask stops running if an Exception occurs

Posted by "Brandon Williams (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-1289?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Brandon Williams updated CASSANDRA-1289:
----------------------------------------

    Attachment: 1289.txt




[jira] Updated: (CASSANDRA-1289) GossipTimerTask stops running if an Exception occurs

Posted by "Brandon Williams (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-1289?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Brandon Williams updated CASSANDRA-1289:
----------------------------------------

    Attachment: 1289.txt

Patch to catch the exception and log it, as suggested in CASSANDRA-757

> GossipTimerTask stops running if an Exception occurs
> ----------------------------------------------------
>
>                 Key: CASSANDRA-1289
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1289
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>    Affects Versions: 0.6, 0.6.1, 0.6.2, 0.6.3, 0.7
>            Reporter: Wade Simmons
>         Attachments: 1289.txt
>
>



[jira] Updated: (CASSANDRA-1289) GossipTimerTask stops running if an Exception occurs

Posted by "Brandon Williams (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-1289?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Brandon Williams updated CASSANDRA-1289:
----------------------------------------

    Attachment:     (was: 1289.txt)

