
Posted to commits@cassandra.apache.org by "Tuukka Luolamo (Created) (JIRA)" <ji...@apache.org> on 2011/10/12 02:35:12 UTC

[jira] [Created] (CASSANDRA-3351) If node fails to join a ring it will stay in joining state indefinately

If node fails to join a ring it will stay in joining state indefinately
-----------------------------------------------------------------------

                 Key: CASSANDRA-3351
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3351
             Project: Cassandra
          Issue Type: Bug
          Components: Core
    Affects Versions: 0.8.6
         Environment: xlarge ec2, Ubuntu 11.04 Natty, JNA
            Reporter: Tuukka Luolamo
            Priority: Trivial


While attempting to add a new node to my ring, something went wrong and I had to terminate the node on EC2. After this, the node keeps appearing in the output of the ring command in the "joining" state and never goes away. Per driftx on the Cassandra channel, a whole-cluster restart should make it go away, but since this is a production system that is not really possible. Additionally, joining a node with the same IP again should clear it, but on EC2 this is not always easy. I am not sure whether this truly qualifies as a bug or is more of a feature request, but I feel there should be a way to remove a node in any state, if I wish, without joining a node with the same IP or restarting the whole cluster.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Issue Comment Edited] (CASSANDRA-3351) If node fails to join a ring it will stay in joining state indefinately

Posted by "Tuukka Luolamo (Issue Comment Edited) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/CASSANDRA-3351?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13125506#comment-13125506 ] 

Tuukka Luolamo edited comment on CASSANDRA-3351 at 10/12/11 12:48 AM:
----------------------------------------------------------------------

Also, per driftx, I ran a gms trace and grabbed about one minute's worth of log entries, which I have attached here. The problem node's IP is 10.82.211.8.
                
      was (Author: tuke):
    Also per driftx I ran gms trace and grabbed about 1 minute worth of log entries that I have attached here.
                  
> If node fails to join a ring it will stay in joining state indefinately
> -----------------------------------------------------------------------
>
>                 Key: CASSANDRA-3351
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3351
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>    Affects Versions: 0.8.6
>         Environment: xlarge ec2, Ubuntu 11.04 Natty, JNA
>            Reporter: Tuukka Luolamo
>            Assignee: Brandon Williams
>            Priority: Trivial
>              Labels: nodetool
>         Attachments: cassandra.log.gz
>
>
> While attempting to add a new node to my ring something went wrong and I had to terminate the node on ec2. After this the node keeps appearing in the ring command in "joining" state and never goes away. Per driftx on the Cassandra channel if I do a whole cluster restart it should go away, but since this is a production system this is not really possible. Additionally if I could join a node with same IP again this should go away, but being on ec2 this is not always easy. So not sure if this truly qualifies as a bug or more like a feature request, but I feel there should be a way to remove a node in any state if I wish without joining a node with same ip or doing a whole cluster restart.


[jira] [Updated] (CASSANDRA-3351) If node fails to join a ring it will stay in joining state indefinately

Posted by "Tuukka Luolamo (Updated) (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/CASSANDRA-3351?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tuukka Luolamo updated CASSANDRA-3351:
--------------------------------------

    Attachment: cassandra.log.gz

Also, per driftx, I ran a gms trace and grabbed about one minute's worth of log entries, which I have attached here.
                
> If node fails to join a ring it will stay in joining state indefinately
> -----------------------------------------------------------------------
>
>                 Key: CASSANDRA-3351
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3351
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>    Affects Versions: 0.8.6
>         Environment: xlarge ec2, Ubuntu 11.04 Natty, JNA
>            Reporter: Tuukka Luolamo
>            Priority: Trivial
>              Labels: nodetool
>         Attachments: cassandra.log.gz
>
>
> While attempting to add a new node to my ring something went wrong and I had to terminate the node on ec2. After this the node keeps appearing in the ring command in "joining" state and never goes away. Per driftx on the Cassandra channel if I do a whole cluster restart it should go away, but since this is a production system this is not really possible. Additionally if I could join a node with same IP again this should go away, but being on ec2 this is not always easy. So not sure if this truly qualifies as a bug or more like a feature request, but I feel there should be a way to remove a node in state if I wish without joining a node with same ip or doing a whole cluster restart.


[jira] [Updated] (CASSANDRA-3351) If node fails to join a ring it will stay in joining state indefinately

Posted by "Tuukka Luolamo (Updated) (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/CASSANDRA-3351?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tuukka Luolamo updated CASSANDRA-3351:
--------------------------------------

    Description: While attempting to add a new node to my ring something went wrong and I had to terminate the node on ec2. After this the node keeps appearing in the ring command in "joining" state and never goes away. Per driftx on the Cassandra channel if I do a whole cluster restart it should go away, but since this is a production system this is not really possible. Additionally if I could join a node with same IP again this should go away, but being on ec2 this is not always easy. So not sure if this truly qualifies as a bug or more like a feature request, but I feel there should be a way to remove a node in any state if I wish without joining a node with same ip or doing a whole cluster restart.  (was: While attempting to add a new node to my ring something went wrong and I had to terminate the node on ec2. After this the node keeps appearing in the ring command in "joining" state and never goes away. Per driftx on the Cassandra channel if I do a whole cluster restart it should go away, but since this is a production system this is not really possible. Additionally if I could join a node with same IP again this should go away, but being on ec2 this is not always easy. So not sure if this truly qualifies as a bug or more like a feature request, but I feel there should be a way to remove a node in state if I wish without joining a node with same ip or doing a whole cluster restart.)
    
> If node fails to join a ring it will stay in joining state indefinately
> -----------------------------------------------------------------------
>
>                 Key: CASSANDRA-3351
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3351
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>    Affects Versions: 0.8.6
>         Environment: xlarge ec2, Ubuntu 11.04 Natty, JNA
>            Reporter: Tuukka Luolamo
>            Priority: Trivial
>              Labels: nodetool
>         Attachments: cassandra.log.gz
>
>
> While attempting to add a new node to my ring something went wrong and I had to terminate the node on ec2. After this the node keeps appearing in the ring command in "joining" state and never goes away. Per driftx on the Cassandra channel if I do a whole cluster restart it should go away, but since this is a production system this is not really possible. Additionally if I could join a node with same IP again this should go away, but being on ec2 this is not always easy. So not sure if this truly qualifies as a bug or more like a feature request, but I feel there should be a way to remove a node in any state if I wish without joining a node with same ip or doing a whole cluster restart.


[jira] [Commented] (CASSANDRA-3351) If node fails to join a ring it will stay in joining state indefinately

Posted by "paul cannon (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/CASSANDRA-3351?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13133124#comment-13133124 ] 

paul cannon commented on CASSANDRA-3351:
----------------------------------------

+1
                
> If node fails to join a ring it will stay in joining state indefinately
> -----------------------------------------------------------------------
>
>                 Key: CASSANDRA-3351
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3351
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>    Affects Versions: 0.8.3
>         Environment: xlarge ec2, Ubuntu 11.04 Natty, JNA
>            Reporter: Tuukka Luolamo
>            Assignee: Brandon Williams
>            Priority: Trivial
>              Labels: gossip
>             Fix For: 0.8.8, 1.0.1
>
>         Attachments: 3351-trunk.txt, 3351.txt, cassandra.log.gz
>
>
> While attempting to add a new node to my ring something went wrong and I had to terminate the node on ec2. After this the node keeps appearing in the ring command in "joining" state and never goes away. Per driftx on the Cassandra channel if I do a whole cluster restart it should go away, but since this is a production system this is not really possible. Additionally if I could join a node with same IP again this should go away, but being on ec2 this is not always easy. So not sure if this truly qualifies as a bug or more like a feature request, but I feel there should be a way to remove a node in any state if I wish without joining a node with same ip or doing a whole cluster restart.


[jira] [Commented] (CASSANDRA-3351) If node fails to join a ring it will stay in joining state indefinately

Posted by "Hudson (Commented) (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/CASSANDRA-3351?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13133206#comment-13133206 ] 

Hudson commented on CASSANDRA-3351:
-----------------------------------

Integrated in Cassandra-0.8 #386 (See [https://builds.apache.org/job/Cassandra-0.8/386/])
    Prevent nodes that failed to join from being stuck in the joining state indefinitely.
Patch by brandonwilliams, reviewed by Paul Cannon for CASSANDRA-3351

brandonwilliams : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1187578
Files : 
* /cassandra/branches/cassandra-0.8/CHANGES.txt
* /cassandra/branches/cassandra-0.8/src/java/org/apache/cassandra/gms/Gossiper.java

                


[jira] [Updated] (CASSANDRA-3351) If node fails to join a ring it will stay in joining state indefinately

Posted by "Brandon Williams (Updated) (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/CASSANDRA-3351?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Brandon Williams updated CASSANDRA-3351:
----------------------------------------

    Attachment: 3351.txt

Two problems were introduced in CASSANDRA-2496: the fat client logic was changed a bit too much, and the isDeadState check was assuming fat clients had dead state, thus calling setHasToken, which effectively stopped them from being marked as fat clients.
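
For readers unfamiliar with the term, a rough illustration of the idea may help. The sketch below is a simplified model, not the actual Gossiper code and not the attached patch. It assumes that a "fat client" is an endpoint that gossips but owns no token, and that such endpoints are dropped once they have been silent for longer than a timeout. Every name in it (FatClientSketch, Endpoint, heardFrom, statusCheck, knows) and the 30-second timeout are hypothetical. On one reading of the comment above, a node that died while joining should be evicted this way, and wrongly flagging it as having a token is what kept it around.

```java
import java.util.HashMap;
import java.util.Map;

// Simplified, hypothetical model of fat-client eviction. Not Cassandra's Gossiper.
public class FatClientSketch {
    // Hypothetical per-endpoint record; the real endpoint state is richer.
    private static final class Endpoint {
        final boolean hasToken;
        final long lastHeardMillis;

        Endpoint(boolean hasToken, long lastHeardMillis) {
            this.hasToken = hasToken;
            this.lastHeardMillis = lastHeardMillis;
        }
    }

    private final Map<String, Endpoint> endpoints = new HashMap<>();
    private final long fatClientTimeoutMillis;

    public FatClientSketch(long fatClientTimeoutMillis) {
        this.fatClientTimeoutMillis = fatClientTimeoutMillis;
    }

    // Record that gossip was received from an endpoint at the given time.
    public void heardFrom(String address, boolean hasToken, long nowMillis) {
        endpoints.put(address, new Endpoint(hasToken, nowMillis));
    }

    // Evict endpoints that own no token and have been silent past the timeout.
    // If hasToken were wrongly set to true, the endpoint would never be evicted.
    public void statusCheck(long nowMillis) {
        endpoints.entrySet().removeIf(e ->
                !e.getValue().hasToken
                        && nowMillis - e.getValue().lastHeardMillis > fatClientTimeoutMillis);
    }

    public boolean knows(String address) {
        return endpoints.containsKey(address);
    }

    public static void main(String[] args) {
        FatClientSketch gossip = new FatClientSketch(30_000);
        gossip.heardFrom("10.0.0.1", true, 0);     // ring member, owns a token
        gossip.heardFrom("10.82.211.8", false, 0); // died while joining, no token
        gossip.statusCheck(60_000);
        System.out.println(gossip.knows("10.0.0.1"));    // true
        System.out.println(gossip.knows("10.82.211.8")); // false
    }
}
```

In this model the tokenless endpoint disappears on the first status check after the timeout, with no cluster restart and no replacement node on the same IP, which is the behaviour the report asks for.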
                
> If node fails to join a ring it will stay in joining state indefinately
> -----------------------------------------------------------------------
>
>                 Key: CASSANDRA-3351
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3351
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>    Affects Versions: 0.8.6
>         Environment: xlarge ec2, Ubuntu 11.04 Natty, JNA
>            Reporter: Tuukka Luolamo
>            Assignee: Brandon Williams
>            Priority: Trivial
>              Labels: nodetool
>         Attachments: 3351.txt, cassandra.log.gz
>
>
> While attempting to add a new node to my ring something went wrong and I had to terminate the node on ec2. After this the node keeps appearing in the ring command in "joining" state and never goes away. Per driftx on the Cassandra channel if I do a whole cluster restart it should go away, but since this is a production system this is not really possible. Additionally if I could join a node with same IP again this should go away, but being on ec2 this is not always easy. So not sure if this truly qualifies as a bug or more like a feature request, but I feel there should be a way to remove a node in any state if I wish without joining a node with same ip or doing a whole cluster restart.


[jira] [Updated] (CASSANDRA-3351) If node fails to join a ring it will stay in joining state indefinately

Posted by "Brandon Williams (Updated) (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/CASSANDRA-3351?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Brandon Williams updated CASSANDRA-3351:
----------------------------------------

             Reviewer: thepaul
    Affects Version/s:     (was: 0.8.6)
                       0.8.3
        Fix Version/s: 1.0.1
                       0.8.8
               Labels: gossip  (was: nodetool)
    
> If node fails to join a ring it will stay in joining state indefinately
> -----------------------------------------------------------------------
>
>                 Key: CASSANDRA-3351
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3351
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>    Affects Versions: 0.8.3
>         Environment: xlarge ec2, Ubuntu 11.04 Natty, JNA
>            Reporter: Tuukka Luolamo
>            Assignee: Brandon Williams
>            Priority: Trivial
>              Labels: gossip
>             Fix For: 0.8.8, 1.0.1
>
>         Attachments: 3351.txt, cassandra.log.gz
>
>
> While attempting to add a new node to my ring something went wrong and I had to terminate the node on ec2. After this the node keeps appearing in the ring command in "joining" state and never goes away. Per driftx on the Cassandra channel if I do a whole cluster restart it should go away, but since this is a production system this is not really possible. Additionally if I could join a node with same IP again this should go away, but being on ec2 this is not always easy. So not sure if this truly qualifies as a bug or more like a feature request, but I feel there should be a way to remove a node in any state if I wish without joining a node with same ip or doing a whole cluster restart.


[jira] [Assigned] (CASSANDRA-3351) If node fails to join a ring it will stay in joining state indefinately

Posted by "Brandon Williams (Assigned) (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/CASSANDRA-3351?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Brandon Williams reassigned CASSANDRA-3351:
-------------------------------------------

    Assignee: Brandon Williams
    


[jira] [Updated] (CASSANDRA-3351) If node fails to join a ring it will stay in joining state indefinately

Posted by "Brandon Williams (Updated) (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/CASSANDRA-3351?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Brandon Williams updated CASSANDRA-3351:
----------------------------------------

    Attachment: 3351-trunk.txt
    
