Posted to commits@cassandra.apache.org by "T Jake Luciani (JIRA)" <ji...@apache.org> on 2012/12/05 21:23:58 UTC

[jira] [Created] (CASSANDRA-5032) Downed node looses it's host-id

T Jake Luciani created CASSANDRA-5032:
-----------------------------------------

             Summary: Downed node looses it's host-id
                 Key: CASSANDRA-5032
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-5032
             Project: Cassandra
          Issue Type: Bug
          Components: Core
    Affects Versions: 1.2.0 beta 3
            Reporter: T Jake Luciani
            Assignee: Brandon Williams


We took down one of our nodes for maintenance and during that time it seems the other nodes haves lost the downed nodes node id

We also see lots of hint assertion exceptions "Missing host ID for 10.6.27.98"

{code}
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address           Load       Tokens  Owns (effective)  Host ID                               Rack
UN  10.6.27.96        129.37 GB  256     140.0%            59f3df94-e551-45ce-a3b0-51462f3ea868  27
UN  10.6.27.97        125.24 GB  256     133.7%            f5bb146c-db51-475c-a44f-9facf2f1ad6e  27
DN  10.6.27.98        ?          256     126.3%            null                                  27
{code}

We restarted c* on the two other nodes that are up, my guess is the host id was lost on restart of those.
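
A small sketch of how the symptom above could be spotted automatically. It parses the status table exactly as pasted in this report and lists rows whose Host ID column reads "null". The function name and the regular expression are illustrative assumptions, not part of nodetool or Cassandra, and the column layout is assumed to match the table in this message.

```python
import re

# Sample taken verbatim from the status table in this report.
STATUS_OUTPUT = """\
--  Address           Load       Tokens  Owns (effective)  Host ID                               Rack
UN  10.6.27.96        129.37 GB  256     140.0%            59f3df94-e551-45ce-a3b0-51462f3ea868  27
UN  10.6.27.97        125.24 GB  256     133.7%            f5bb146c-db51-475c-a44f-9facf2f1ad6e  27
DN  10.6.27.98        ?          256     126.3%            null                                  27
"""

# A node row starts with a status/state code (U or D, then N/L/J/M)
# followed by an IPv4 address. Header and legend lines do not match.
ROW_START = re.compile(r"^([UD][NLJM])\s+(\d+\.\d+\.\d+\.\d+)\s")


def nodes_missing_host_id(status_text):
    """Return (address, status) pairs for rows whose Host ID is 'null'.

    Hypothetical helper: assumes Host ID is the second-to-last
    whitespace-separated column and Rack is the last one.
    """
    missing = []
    for line in status_text.splitlines():
        match = ROW_START.match(line)
        if not match:
            continue  # skip header or legend lines
        fields = line.split()
        if fields[-2] == "null":
            missing.append((match.group(2), match.group(1)))
    return missing


if __name__ == "__main__":
    for address, status in nodes_missing_host_id(STATUS_OUTPUT):
        print(f"{address} ({status}) has no host ID")
```

Counting columns from the right avoids the problem that the Load column is one field for the downed node ("?") but two for live nodes ("129.37 GB").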


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (CASSANDRA-5032) Downed node loses its host-id

Posted by "Brandon Williams (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-5032?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Brandon Williams updated CASSANDRA-5032:
----------------------------------------

    Fix Version/s: 1.2.0 rc1
    
> Downed node loses its host-id
> -----------------------------
>
>                 Key: CASSANDRA-5032
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-5032
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>    Affects Versions: 1.2.0 beta 3
>            Reporter: T Jake Luciani
>            Assignee: Brandon Williams
>              Labels: vnodes
>             Fix For: 1.2.0 rc1
>
>
> We took down one of our nodes for maintenance and during that time it seems the other nodes have lost the downed nodes node id
> We also see lots of hint assertion exceptions "Missing host ID for 10.6.27.98"
> {code}
> Status=Up/Down
> |/ State=Normal/Leaving/Joining/Moving
> --  Address           Load       Tokens  Owns (effective)  Host ID                               Rack
> UN  10.6.27.96        129.37 GB  256     140.0%            59f3df94-e551-45ce-a3b0-51462f3ea868  27
> UN  10.6.27.97        125.24 GB  256     133.7%            f5bb146c-db51-475c-a44f-9facf2f1ad6e  27
> DN  10.6.27.98        ?          256     126.3%            null                                  27
> {code}
> We restarted c* on the two other nodes that are up, my guess is the host id was lost on restart of those.


[jira] [Commented] (CASSANDRA-5032) Downed node loses its host-id

Posted by "T Jake Luciani (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-5032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13510990#comment-13510990 ] 

T Jake Luciani commented on CASSANDRA-5032:
-------------------------------------------

Looks straightforward. I can test it, but it looks like I'll need to recreate the cluster/system table?
                


[jira] [Commented] (CASSANDRA-5032) Downed node loses its host-id

Posted by "Brandon Williams (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-5032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13510992#comment-13510992 ] 

Brandon Williams commented on CASSANDRA-5032:
---------------------------------------------

I tested it and it works, but yeah, you'll have to wipe out the system table, or I think starting once with -Dcassandra.load_ring=false will work.
                


[jira] [Updated] (CASSANDRA-5032) Downed node looses its host-id

Posted by "Brandon Williams (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-5032?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Brandon Williams updated CASSANDRA-5032:
----------------------------------------

    Summary: Downed node looses its host-id  (was: Downed node looses it's host-id)
    


[jira] [Updated] (CASSANDRA-5032) Downed node loses its host-id

Posted by "T Jake Luciani (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-5032?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

T Jake Luciani updated CASSANDRA-5032:
--------------------------------------

    Summary: Downed node loses its host-id  (was: Downed node looses its host-id)
    


[jira] [Updated] (CASSANDRA-5032) Downed node loses its host-id

Posted by "T Jake Luciani (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-5032?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

T Jake Luciani updated CASSANDRA-5032:
--------------------------------------

    Description: 
We took down one of our nodes for maintenance and during that time it seems the other nodes have lost the downed nodes node id

We also see lots of hint assertion exceptions "Missing host ID for 10.6.27.98"

{code}
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address           Load       Tokens  Owns (effective)  Host ID                               Rack
UN  10.6.27.96        129.37 GB  256     140.0%            59f3df94-e551-45ce-a3b0-51462f3ea868  27
UN  10.6.27.97        125.24 GB  256     133.7%            f5bb146c-db51-475c-a44f-9facf2f1ad6e  27
DN  10.6.27.98        ?          256     126.3%            null                                  27
{code}

We restarted c* on the two other nodes that are up, my guess is the host id was lost on restart of those.


  was:
We took down one of our nodes for maintenance and during that time it seems the other nodes haves lost the downed nodes node id

We also see lots of hint assertion exceptions "Missing host ID for 10.6.27.98"

{code}
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address           Load       Tokens  Owns (effective)  Host ID                               Rack
UN  10.6.27.96        129.37 GB  256     140.0%            59f3df94-e551-45ce-a3b0-51462f3ea868  27
UN  10.6.27.97        125.24 GB  256     133.7%            f5bb146c-db51-475c-a44f-9facf2f1ad6e  27
DN  10.6.27.98        ?          256     126.3%            null                                  27
{code}

We restarted c* on the two other nodes that are up, my guess is the host id was lost on restart of those.


    


[jira] [Updated] (CASSANDRA-5032) Downed node loses its host-id

Posted by "Brandon Williams (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-5032?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Brandon Williams updated CASSANDRA-5032:
----------------------------------------

    Attachment: 0002-Load-host_ids-when-adding-saved-endpoints.txt
                0001-rename-ring_id-to-host_id-for-clarity.txt

First patch renames 'ring_id' in peers to 'host_id' because that's what we call it everywhere else including nodetool, so we should be consistent.

Second patch updates the host ID in TokenMetadata (TMD) when it updates the tokens.
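
To illustrate the idea behind the second patch, here is a toy model in Python. It is not the attached patch (whose contents are not shown in this thread) and not Cassandra's code: the class, the function, the `load_host_ids` switch, and the placeholder UUID are all made up for illustration. It only shows why restoring saved endpoints with tokens alone leaves a downed peer without a host ID after a restart, and how loading the saved host ID at the same time avoids that.

```python
import uuid


class TokenMetadataSketch:
    """Toy stand-in for a node's in-memory ring view (hypothetical)."""

    def __init__(self):
        self.tokens = {}    # endpoint -> set of tokens
        self.host_ids = {}  # endpoint -> host ID

    def update_tokens(self, endpoint, tokens):
        self.tokens[endpoint] = set(tokens)

    def update_host_id(self, endpoint, host_id):
        self.host_ids[endpoint] = host_id


def load_saved_endpoints(saved_peers, tmd, load_host_ids=True):
    """Restore peers persisted before a restart into the in-memory view.

    With load_host_ids=False only tokens are restored, which models the
    reported behaviour: a peer that is down cannot re-announce its host ID,
    so it stays unknown. With load_host_ids=True the saved host ID is
    restored together with the tokens.
    """
    for endpoint, row in saved_peers.items():
        tmd.update_tokens(endpoint, row["tokens"])
        if load_host_ids and row.get("host_id") is not None:
            tmd.update_host_id(endpoint, row["host_id"])
    return tmd


# Placeholder value: the real host ID of 10.6.27.98 is not given in the report.
PLACEHOLDER_ID = uuid.UUID("00000000-0000-0000-0000-000000000098")

saved_peers = {
    "10.6.27.98": {"tokens": [1, 2, 3], "host_id": PLACEHOLDER_ID},
}

before = load_saved_endpoints(saved_peers, TokenMetadataSketch(), load_host_ids=False)
after = load_saved_endpoints(saved_peers, TokenMetadataSketch(), load_host_ids=True)

if __name__ == "__main__":
    print("tokens only:       ", before.host_ids.get("10.6.27.98"))
    print("tokens and host ID:", after.host_ids.get("10.6.27.98"))
```

In the tokens-only case the lookup for the downed peer returns nothing, which corresponds to the "null" Host ID column and the "Missing host ID" assertions described in the report.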
                
