Posted to commits@cassandra.apache.org by "Sylvain Lebresne (Resolved) (JIRA)" <ji...@apache.org> on 2011/11/03 15:15:32 UTC

[jira] [Resolved] (CASSANDRA-3316) Add a JMX call to force cleaning repair sessions (in case they are hung up)

     [ https://issues.apache.org/jira/browse/CASSANDRA-3316?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sylvain Lebresne resolved CASSANDRA-3316.
-----------------------------------------

    Resolution: Fixed
      Reviewer: slebresne

+1, committed.

I don't think it's worth adding a nodetool command (more precisely, I think it's a feature that this is not too easy to trigger), because hopefully people will never need to use it. It's more about having a solution available if it comes to that.
                
> Add a JMX call to force cleaning repair sessions (in case they are hung up)
> ---------------------------------------------------------------------------
>
>                 Key: CASSANDRA-3316
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3316
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>    Affects Versions: 0.8.6
>            Reporter: Sylvain Lebresne
>            Assignee: Yuki Morishita
>            Priority: Minor
>             Fix For: 1.0.2
>
>         Attachments: 3316-v1.txt
>
>
> A repair session contains many parts, most of which are not local to the node (meaning the node waits on those operations). You request Merkle trees, then you schedule streaming (and in 1.0.0, some of the streaming doesn't involve the local node itself). That is a lot of places where something can go wrong, and when it does, the repair is left hanging and, as a consequence, a repairSession task is left sitting active on the 'AntiEntropy Session' executor.
> Obviously, we should improve repair's detection of the things that can go wrong. CASSANDRA-2433 started on this and CASSANDRA-3112 is open to fill in as much of the remaining parts as possible, but my bet is that it will be hard to cover everything (and it may not be worth handling very improbable failure scenarios). Besides, CASSANDRA-3112 will involve a change in the wire protocol, so it may take some time to be committed. In the meantime, it would be nice to provide a JMX call to force termination of repairSessions, so that you don't end up with enough 'zombie' sessions on the executor that you can't submit new ones (you could restart the node, but that's ugly). Anyway, it's not a big issue, but it would be simple to add such a JMX call.
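
To illustrate the idea in the description above, here is a minimal, self-contained sketch of a JMX operation that cancels tasks stuck on an executor. It is not the committed patch: the names RepairSessionSketch, RepairControlMBean, forceTerminateAllSessions and the ObjectName "example.sketch:type=RepairControl" are all hypothetical, and the "hanging" sessions are simulated with a latch that is never released.

```java
import java.lang.management.ManagementFactory;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import javax.management.MBeanServer;
import javax.management.ObjectName;
import javax.management.StandardMBean;

public class RepairSessionSketch {

    // Hypothetical management interface (an assumption, not Cassandra's API).
    public interface RepairControlMBean {
        int getActiveSessions();
        int forceTerminateAllSessions();
    }

    public static class RepairControl implements RepairControlMBean {
        // Small pool so that a few stuck sessions are enough to block new work.
        private final ExecutorService executor = Executors.newFixedThreadPool(2);
        private final Set<Future<?>> sessions = ConcurrentHashMap.newKeySet();

        // Simulates a session that waits forever on a reply that never arrives.
        public void submitHangingSession() {
            CountDownLatch neverReleased = new CountDownLatch(1);
            sessions.add(executor.submit(() -> {
                try {
                    neverReleased.await();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt(); // cancelled: let the task end
                }
            }));
        }

        @Override
        public int getActiveSessions() {
            sessions.removeIf(Future::isDone);
            return sessions.size();
        }

        // Cancels every tracked session, interrupting the ones already running.
        @Override
        public int forceTerminateAllSessions() {
            int cancelled = 0;
            for (Future<?> session : sessions) {
                if (session.cancel(true)) {
                    cancelled++;
                }
            }
            sessions.clear();
            return cancelled;
        }

        public void shutdown() {
            executor.shutdownNow();
        }
    }

    public static void main(String[] args) throws Exception {
        RepairControl control = new RepairControl();
        MBeanServer server = ManagementFactory.getPlatformMBeanServer();
        ObjectName name = new ObjectName("example.sketch:type=RepairControl"); // hypothetical
        server.registerMBean(new StandardMBean(control, RepairControlMBean.class), name);

        // Three stuck sessions on a two-thread pool: two running, one queued.
        control.submitHangingSession();
        control.submitHangingSession();
        control.submitHangingSession();
        System.out.println("active before: " + server.getAttribute(name, "ActiveSessions"));

        // The same call a JMX client such as jconsole would make.
        Object cancelled = server.invoke(name, "forceTerminateAllSessions",
                                         new Object[0], new String[0]);
        System.out.println("cancelled: " + cancelled);
        System.out.println("active after: " + server.getAttribute(name, "ActiveSessions"));

        control.shutdown();
    }
}
```

Exposing the operation only through JMX, with no command-line wrapper, keeps it available as a last resort while making it hard to trigger by accident, which matches the reasoning given in the resolution comment.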
