Posted to commits@cassandra.apache.org by "Paul Querna (JIRA)" <ji...@apache.org> on 2011/02/19 22:08:38 UTC

[jira] Created: (CASSANDRA-2201) Gossip synchronization issues

Gossip synchronization issues
-----------------------------

                 Key: CASSANDRA-2201
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2201
             Project: Cassandra
          Issue Type: Bug
    Affects Versions: 0.6.12
         Environment: r1071793 (0.6.12)
Ubuntu 9.10
24 node cluster.
JNA enabled.

java -version
java version "1.6.0_21"
Java(TM) SE Runtime Environment (build 1.6.0_21-b06)
Java HotSpot(TM) 64-Bit Server VM (build 17.0-b16, mixed mode)

            Reporter: Paul Querna


After upgrading to 0.6.12ish, we noticed that whole rows were sometimes reported as missing from queries.

It seemed random, and at first we thought there might be a wider problem in 0.6.12 -- but we found that one node out of 24 had an incorrect gossip state.

Correct nodetool ring output:
{code}
pquerna@cass0:/data/cassandra$ /data/cassandra/bin/nodetool -h localhost ring
Address       Status     Load          Owns    Range                                      Ring
                                               163051967482949680409533666061055601315    
172.21.2.222  Up         224.03 GB     4.17%   0                                          |<--|
10.177.192.88 Up         219.28 GB     4.17%   7089215977519551322153637654828504405      |   ^
172.21.2.169  Up         225.93 GB     4.17%   14178431955039102644307275309657008810     v   |
10.177.192.89 Up         225.91 GB     4.17%   21267647932558653966460912964485513215     |   ^
172.21.3.116  Up         226.88 GB     4.17%   28356863910078205288614550619314017620     v   |
10.177.192.90 Up         219.2 GB      4.17%   35446079887597756610768188274142522025     |   ^
172.21.2.173  Up         227.44 GB     4.17%   42535295865117307932921825928971026430     v   |
10.177.192.91 Up         182.44 GB     4.17%   49624511842636859255075463583799530835     |   ^
172.21.2.223  Up         229.38 GB     4.17%   56713727820156410577229101238628035240     v   |
10.177.192.225Up         193.1 GB      4.17%   63802943797675961899382738893456539645     |   ^
172.21.3.115  Up         231.21 GB     4.17%   70892159775195513221536376548285044050     v   |
10.177.192.226Up         194.33 GB     4.17%   77981375752715064543690014203113548455     |   ^
172.21.1.32   Up         230.38 GB     4.17%   85070591730234615865843651857942052860     v   |
10.177.192.227Up         196.34 GB     4.17%   92159807707754167187997289512770557265     |   ^
172.21.2.224  Up         205.9 GB      4.17%   99249023685273718510150927167599061670     v   |
10.177.192.228Up         191.82 GB     4.17%   106338239662793269832304564822427566075    |   ^
172.21.3.117  Up         230.5 GB      4.17%   113427455640312821154458202477256070480    v   |
10.177.192.229Up         193.2 GB      4.17%   120516671617832372476611840132084574885    |   ^
172.21.0.26   Up         226.12 GB     4.17%   127605887595351923798765477786913079290    v   |
10.177.192.230Up         187.28 GB     4.17%   134695103572871475120919115441741583695    |   ^
172.21.2.225  Up         230.34 GB     4.17%   141784319550391026443072753096570088100    v   |
10.177.192.231Up         188.05 GB     4.17%   148873535527910577765226390751398592505    |   ^
172.21.3.119  Up         215.91 GB     4.17%   155962751505430129087380028406227096910    v   |
10.177.192.232Up         217.41 GB     4.17%   163051967482949680409533666061055601315    |-->|
{code}


On the node that had a different nodetool ring output:
{code}
pquerna@cass11:~$ /data/cassandra/bin/nodetool -h localhost ring
Address       Status     Load          Owns    Range                                      Ring
                                               163051967482949680409533666061055601315    
172.21.2.222  Up         224.03 GB     4.17%   0                                          |<--|
172.21.2.169  Up         225.93 GB     8.33%   14178431955039102644307275309657008810     |   ^
10.177.192.89 Up         225.91 GB     4.17%   21267647932558653966460912964485513215     v   |
172.21.3.116  Up         226.88 GB     4.17%   28356863910078205288614550619314017620     |   ^
10.177.192.90 Up         219.2 GB      4.17%   35446079887597756610768188274142522025     v   |
172.21.2.173  Up         227.44 GB     4.17%   42535295865117307932921825928971026430     |   ^
10.177.192.91 Up         182.44 GB     4.17%   49624511842636859255075463583799530835     v   |
172.21.3.115  Up         231.21 GB     12.50%  70892159775195513221536376548285044050     |   ^
172.21.1.32   Up         230.38 GB     8.33%   85070591730234615865843651857942052860     v   |
10.177.192.227Up         196.34 GB     4.17%   92159807707754167187997289512770557265     |   ^
10.177.192.228Up         191.82 GB     8.33%   106338239662793269832304564822427566075    v   |
172.21.3.117  Up         230.5 GB      4.17%   113427455640312821154458202477256070480    |   ^
10.177.192.229Up         193.2 GB      4.17%   120516671617832372476611840132084574885    v   |
172.21.0.26   Up         226 GB        4.17%   127605887595351923798765477786913079290    |   ^
10.177.192.230Up         187.28 GB     4.17%   134695103572871475120919115441741583695    v   |
172.21.2.225  Up         230.34 GB     4.17%   141784319550391026443072753096570088100    |   ^
10.177.192.231Up         188.05 GB     4.17%   148873535527910577765226390751398592505    v   |
172.21.3.119  Up         215.91 GB     4.17%   155962751505430129087380028406227096910    |   ^
10.177.192.232Up         217.41 GB     4.17%   163051967482949680409533666061055601315    |-->|
{code}

As you can see, 10.177.192.226 was missing from the ring.
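The doubled Owns percentages in the bad output line up with simple arithmetic: in a balanced 24-node ring each node owns 1/24 of the token space, and when a node is missing from a peer's gossip state, its successor appears to own two ranges. A quick illustrative sketch (not Cassandra code):

```java
import java.util.Locale;

public class OwnsCheck {
    public static void main(String[] args) {
        int nodes = 24;
        // each node's share of a balanced ring
        double perNode = 100.0 / nodes;
        // a node absent from gossip makes its successor's apparent
        // share cover two ranges
        System.out.printf(Locale.ROOT, "%.2f%% %.2f%%%n", perNode, 2 * perNode);
        // prints: 4.17% 8.33%
    }
}
```

This matches the 4.17% entries in the healthy output and the 8.33%/12.50% entries on cass11 (two and three consecutive missing neighbors, respectively).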

On cass11, everything else looked fine, including tpstats, which showed nothing pending or active.

However, we did notice an exception in the startup logs on cass11:
{code}
2011-02-19_19:45:43.26906 INFO - Starting up server gossip
2011-02-19_19:45:43.39742 ERROR - Uncaught exception in thread Thread[Thread-11,5,main]
2011-02-19_19:45:43.39746 java.io.IOError: java.io.EOFException
2011-02-19_19:45:43.39747       at org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:67)
2011-02-19_19:45:43.39748 Caused by: java.io.EOFException
2011-02-19_19:45:43.39749       at java.io.DataInputStream.readInt(DataInputStream.java:375)
2011-02-19_19:45:43.39750       at org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:57)
2011-02-19_19:45:43.41481 INFO - Binding thrift service to /172.21.0.26:9160
2011-02-19_19:45:43.42050 INFO - Cassandra starting up...
{code}

driftx said that it should be harmless, but it's the only thing I see that's different about that node.
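That EOFException is consistent with a peer opening a TCP connection and closing it before sending a complete message header: DataInputStream.readInt() needs four bytes and throws EOFException when the stream ends first. A minimal sketch of the mechanism (illustrative only, not Cassandra's networking code):

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;

public class EofSketch {
    public static void main(String[] args) throws IOException {
        // an empty stream stands in for a connection that was closed
        // before any payload arrived; readInt() wants 4 bytes, gets none
        DataInputStream in =
            new DataInputStream(new ByteArrayInputStream(new byte[0]));
        try {
            in.readInt();
        } catch (EOFException e) {
            System.out.println("EOFException, as in the cass11 startup log");
        }
    }
}
```

By itself this only tells you the remote side hung up early, which is why it can be harmless; it doesn't explain the missing ring entry.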


-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] Commented: (CASSANDRA-2201) Gossip synchronization issues

Posted by "Gary Dusbabek (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-2201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12997966#comment-12997966 ] 

Gary Dusbabek commented on CASSANDRA-2201:
------------------------------------------

With Brandon's comment in mind, a debug log taken from the node after issuing `nodetool ring` would be helpful; it lists the tokens the node knows about (in StorageService). I'm pretty sure this is happening because StorageService.tokenMetadata_ has incomplete information in it.

> Gossip synchronization issues
> -----------------------------
>
>                 Key: CASSANDRA-2201
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2201
>             Project: Cassandra
>          Issue Type: Bug
>    Affects Versions: 0.6.12
>         Environment: r1071793 (0.6.12)
> Ubuntu 9.10
> 24 node cluster.
> JNA enabled.
> java -version
> java version "1.6.0_21"
> Java(TM) SE Runtime Environment (build 1.6.0_21-b06)
> Java HotSpot(TM) 64-Bit Server VM (build 17.0-b16, mixed mode)
>            Reporter: Paul Querna
>            Assignee: Brandon Williams
>         Attachments: CASSANDRA-2201-jstack.txt, CASSANDRA-2201-startup-trace.txt
>
>
> After upgrading to 0.6.12ish, we noticed sometimes whole rows were being reported as missing from queries.
> It seemed random, and at first we thought there might be a wider problem in 0.6.12 -- but we found that one node of 24 had an incorrect gossip
> Correct nodetool ring output:
> {code}
> pquerna@cass0:/data/cassandra$ /data/cassandra/bin/nodetool -h localhost ring
> Address       Status     Load          Owns    Range                                      Ring
>                                                163051967482949680409533666061055601315    
> 172.21.2.222  Up         224.03 GB     4.17%   0                                          |<--|
> 10.177.192.88 Up         219.28 GB     4.17%   7089215977519551322153637654828504405      |   ^
> 172.21.2.169  Up         225.93 GB     4.17%   14178431955039102644307275309657008810     v   |
> 10.177.192.89 Up         225.91 GB     4.17%   21267647932558653966460912964485513215     |   ^
> 172.21.3.116  Up         226.88 GB     4.17%   28356863910078205288614550619314017620     v   |
> 10.177.192.90 Up         219.2 GB      4.17%   35446079887597756610768188274142522025     |   ^
> 172.21.2.173  Up         227.44 GB     4.17%   42535295865117307932921825928971026430     v   |
> 10.177.192.91 Up         182.44 GB     4.17%   49624511842636859255075463583799530835     |   ^
> 172.21.2.223  Up         229.38 GB     4.17%   56713727820156410577229101238628035240     v   |
> 10.177.192.225Up         193.1 GB      4.17%   63802943797675961899382738893456539645     |   ^
> 172.21.3.115  Up         231.21 GB     4.17%   70892159775195513221536376548285044050     v   |
> 10.177.192.226Up         194.33 GB     4.17%   77981375752715064543690014203113548455     |   ^
> 172.21.1.32   Up         230.38 GB     4.17%   85070591730234615865843651857942052860     v   |
> 10.177.192.227Up         196.34 GB     4.17%   92159807707754167187997289512770557265     |   ^
> 172.21.2.224  Up         205.9 GB      4.17%   99249023685273718510150927167599061670     v   |
> 10.177.192.228Up         191.82 GB     4.17%   106338239662793269832304564822427566075    |   ^
> 172.21.3.117  Up         230.5 GB      4.17%   113427455640312821154458202477256070480    v   |
> 10.177.192.229Up         193.2 GB      4.17%   120516671617832372476611840132084574885    |   ^
> 172.21.0.26   Up         226.12 GB     4.17%   127605887595351923798765477786913079290    v   |
> 10.177.192.230Up         187.28 GB     4.17%   134695103572871475120919115441741583695    |   ^
> 172.21.2.225  Up         230.34 GB     4.17%   141784319550391026443072753096570088100    v   |
> 10.177.192.231Up         188.05 GB     4.17%   148873535527910577765226390751398592505    |   ^
> 172.21.3.119  Up         215.91 GB     4.17%   155962751505430129087380028406227096910    v   |
> 10.177.192.232Up         217.41 GB     4.17%   163051967482949680409533666061055601315    |-->|
> {code}
> On the node that had a different nodetool ring output:
> {code}
> pquerna@cass11:~$ /data/cassandra/bin/nodetool -h localhost ring
> Address       Status     Load          Owns    Range                                      Ring
>                                                163051967482949680409533666061055601315    
> 172.21.2.222  Up         224.03 GB     4.17%   0                                          |<--|
> 172.21.2.169  Up         225.93 GB     8.33%   14178431955039102644307275309657008810     |   ^
> 10.177.192.89 Up         225.91 GB     4.17%   21267647932558653966460912964485513215     v   |
> 172.21.3.116  Up         226.88 GB     4.17%   28356863910078205288614550619314017620     |   ^
> 10.177.192.90 Up         219.2 GB      4.17%   35446079887597756610768188274142522025     v   |
> 172.21.2.173  Up         227.44 GB     4.17%   42535295865117307932921825928971026430     |   ^
> 10.177.192.91 Up         182.44 GB     4.17%   49624511842636859255075463583799530835     v   |
> 172.21.3.115  Up         231.21 GB     12.50%  70892159775195513221536376548285044050     |   ^
> 172.21.1.32   Up         230.38 GB     8.33%   85070591730234615865843651857942052860     v   |
> 10.177.192.227Up         196.34 GB     4.17%   92159807707754167187997289512770557265     |   ^
> 10.177.192.228Up         191.82 GB     8.33%   106338239662793269832304564822427566075    v   |
> 172.21.3.117  Up         230.5 GB      4.17%   113427455640312821154458202477256070480    |   ^
> 10.177.192.229Up         193.2 GB      4.17%   120516671617832372476611840132084574885    v   |
> 172.21.0.26   Up         226 GB        4.17%   127605887595351923798765477786913079290    |   ^
> 10.177.192.230Up         187.28 GB     4.17%   134695103572871475120919115441741583695    v   |
> 172.21.2.225  Up         230.34 GB     4.17%   141784319550391026443072753096570088100    |   ^
> 10.177.192.231Up         188.05 GB     4.17%   148873535527910577765226390751398592505    v   |
> 172.21.3.119  Up         215.91 GB     4.17%   155962751505430129087380028406227096910    |   ^
> 10.177.192.232Up         217.41 GB     4.17%   163051967482949680409533666061055601315    |-->|
> {code}
> As you can see, it was missing 10.177.192.226 from the ring.
> On cass11, everything else looked fine, including nothing in pending/active tpstats.
> However, we did notice an exception on startup in the logs, on cass11
> {code}
> 2011-02-19_19:45:43.26906 INFO - Starting up server gossip
> 2011-02-19_19:45:43.39742 ERROR - Uncaught exception in thread Thread[Thread-11,5,main]
> 2011-02-19_19:45:43.39746 java.io.IOError: java.io.EOFException
> 2011-02-19_19:45:43.39747       at org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:67)
> 2011-02-19_19:45:43.39748 Caused by: java.io.EOFException
> 2011-02-19_19:45:43.39749       at java.io.DataInputStream.readInt(DataInputStream.java:375)
> 2011-02-19_19:45:43.39750       at org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:57)
> 2011-02-19_19:45:43.41481 INFO - Binding thrift service to /172.21.0.26:9160
> 2011-02-19_19:45:43.42050 INFO - Cassandra starting up...
> {code}
> driftx said that it should be harmless, but its the only thing I see different about that node.


[jira] Assigned: (CASSANDRA-2201) Gossip synchronization issues

Posted by "Brandon Williams (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-2201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Brandon Williams reassigned CASSANDRA-2201:
-------------------------------------------

    Assignee: Brandon Williams



[jira] Updated: (CASSANDRA-2201) Gossip synchronization issues

Posted by "Paul Querna (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-2201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Paul Querna updated CASSANDRA-2201:
-----------------------------------

    Attachment: CASSANDRA-2201-startup-trace.txt

Log from a node with a bad ring, taken with TRACE logging enabled on log4j.logger.org.apache.cassandra.gms.
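For reference, enabling that trace level is a one-line log4j properties change (the conf/log4j.properties location is an assumption about this 0.6 deployment's layout):

```properties
# verbose gossip tracing for debugging ring-state divergence
log4j.logger.org.apache.cassandra.gms=TRACE
```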

After this startup, the node saw the ring as follows:
{code}
$ /data/cassandra/bin/nodetool -h localhost ring
Address       Status     Load          Owns    Range                                      Ring
                                               163051967482949680409533666061055601315    
172.21.2.222  Up         224.19 GB     4.17%   0                                          |<--|
172.21.2.169  Up         226.34 GB     8.33%   14178431955039102644307275309657008810     |   ^
10.177.192.89 Up         226.05 GB     4.17%   21267647932558653966460912964485513215     v   |
172.21.3.116  Up         227.23 GB     4.17%   28356863910078205288614550619314017620     |   ^
172.21.2.173  Up         227.64 GB     8.33%   42535295865117307932921825928971026430     v   |
172.21.2.223  Up         229.14 GB     8.33%   56713727820156410577229101238628035240     |   ^
172.21.3.115  Up         231.46 GB     8.33%   70892159775195513221536376548285044050     v   |
10.177.192.226Up         194.35 GB     4.17%   77981375752715064543690014203113548455     |   ^
172.21.1.32   Up         230.77 GB     4.17%   85070591730234615865843651857942052860     v   |
10.177.192.227Up         196.68 GB     4.17%   92159807707754167187997289512770557265     |   ^
10.177.192.228Up         192.05 GB     8.33%   106338239662793269832304564822427566075    v   |
172.21.3.117  Up         230.9 GB      4.17%   113427455640312821154458202477256070480    |   ^
172.21.0.26   Up         226.33 GB     8.33%   127605887595351923798765477786913079290    v   |
10.177.192.230Up         187.47 GB     4.17%   134695103572871475120919115441741583695    |   ^
172.21.2.225  Up         230.63 GB     4.17%   141784319550391026443072753096570088100    v   |
10.177.192.231Up         188.33 GB     4.17%   148873535527910577765226390751398592505    |   ^
172.21.3.119  Up         216.22 GB     4.17%   155962751505430129087380028406227096910    v   |
10.177.192.232Up         217.72 GB     4.17%   163051967482949680409533666061055601315    |-->|
{code}



[jira] [Resolved] (CASSANDRA-2201) Gossip synchronization issues

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-2201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Ellis resolved CASSANDRA-2201.
---------------------------------------

    Resolution: Cannot Reproduce
      Assignee:     (was: Brandon Williams)

> Gossip synchronization issues
> -----------------------------
>
>                 Key: CASSANDRA-2201
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2201
>             Project: Cassandra
>          Issue Type: Bug
>    Affects Versions: 0.6.12
>         Environment: r1071793 (0.6.12)
> Ubuntu 9.10
> 24 node cluster.
> JNA enabled.
> java -version
> java version "1.6.0_21"
> Java(TM) SE Runtime Environment (build 1.6.0_21-b06)
> Java HotSpot(TM) 64-Bit Server VM (build 17.0-b16, mixed mode)
>            Reporter: Paul Querna
>         Attachments: CASSANDRA-2201-jstack.txt, CASSANDRA-2201-startup-trace.txt
>
>
> After upgrading to 0.6.12ish, we noticed sometimes whole rows were being reported as missing from queries.
> It seemed random, and at first we thought there might be a wider problem in 0.6.12 -- but we found that one node of 24 had an incorrect view of gossip state.
> Correct nodetool ring output:
> {code}
> pquerna@cass0:/data/cassandra$ /data/cassandra/bin/nodetool -h localhost ring
> Address       Status     Load          Owns    Range                                      Ring
>                                                163051967482949680409533666061055601315    
> 172.21.2.222  Up         224.03 GB     4.17%   0                                          |<--|
> 10.177.192.88 Up         219.28 GB     4.17%   7089215977519551322153637654828504405      |   ^
> 172.21.2.169  Up         225.93 GB     4.17%   14178431955039102644307275309657008810     v   |
> 10.177.192.89 Up         225.91 GB     4.17%   21267647932558653966460912964485513215     |   ^
> 172.21.3.116  Up         226.88 GB     4.17%   28356863910078205288614550619314017620     v   |
> 10.177.192.90 Up         219.2 GB      4.17%   35446079887597756610768188274142522025     |   ^
> 172.21.2.173  Up         227.44 GB     4.17%   42535295865117307932921825928971026430     v   |
> 10.177.192.91 Up         182.44 GB     4.17%   49624511842636859255075463583799530835     |   ^
> 172.21.2.223  Up         229.38 GB     4.17%   56713727820156410577229101238628035240     v   |
> 10.177.192.225Up         193.1 GB      4.17%   63802943797675961899382738893456539645     |   ^
> 172.21.3.115  Up         231.21 GB     4.17%   70892159775195513221536376548285044050     v   |
> 10.177.192.226Up         194.33 GB     4.17%   77981375752715064543690014203113548455     |   ^
> 172.21.1.32   Up         230.38 GB     4.17%   85070591730234615865843651857942052860     v   |
> 10.177.192.227Up         196.34 GB     4.17%   92159807707754167187997289512770557265     |   ^
> 172.21.2.224  Up         205.9 GB      4.17%   99249023685273718510150927167599061670     v   |
> 10.177.192.228Up         191.82 GB     4.17%   106338239662793269832304564822427566075    |   ^
> 172.21.3.117  Up         230.5 GB      4.17%   113427455640312821154458202477256070480    v   |
> 10.177.192.229Up         193.2 GB      4.17%   120516671617832372476611840132084574885    |   ^
> 172.21.0.26   Up         226.12 GB     4.17%   127605887595351923798765477786913079290    v   |
> 10.177.192.230Up         187.28 GB     4.17%   134695103572871475120919115441741583695    |   ^
> 172.21.2.225  Up         230.34 GB     4.17%   141784319550391026443072753096570088100    v   |
> 10.177.192.231Up         188.05 GB     4.17%   148873535527910577765226390751398592505    |   ^
> 172.21.3.119  Up         215.91 GB     4.17%   155962751505430129087380028406227096910    v   |
> 10.177.192.232Up         217.41 GB     4.17%   163051967482949680409533666061055601315    |-->|
> {code}
> On the node that had a different nodetool ring output:
> {code}
> pquerna@cass11:~$ /data/cassandra/bin/nodetool -h localhost ring
> Address       Status     Load          Owns    Range                                      Ring
>                                                163051967482949680409533666061055601315    
> 172.21.2.222  Up         224.03 GB     4.17%   0                                          |<--|
> 172.21.2.169  Up         225.93 GB     8.33%   14178431955039102644307275309657008810     |   ^
> 10.177.192.89 Up         225.91 GB     4.17%   21267647932558653966460912964485513215     v   |
> 172.21.3.116  Up         226.88 GB     4.17%   28356863910078205288614550619314017620     |   ^
> 10.177.192.90 Up         219.2 GB      4.17%   35446079887597756610768188274142522025     v   |
> 172.21.2.173  Up         227.44 GB     4.17%   42535295865117307932921825928971026430     |   ^
> 10.177.192.91 Up         182.44 GB     4.17%   49624511842636859255075463583799530835     v   |
> 172.21.3.115  Up         231.21 GB     12.50%  70892159775195513221536376548285044050     |   ^
> 172.21.1.32   Up         230.38 GB     8.33%   85070591730234615865843651857942052860     v   |
> 10.177.192.227Up         196.34 GB     4.17%   92159807707754167187997289512770557265     |   ^
> 10.177.192.228Up         191.82 GB     8.33%   106338239662793269832304564822427566075    v   |
> 172.21.3.117  Up         230.5 GB      4.17%   113427455640312821154458202477256070480    |   ^
> 10.177.192.229Up         193.2 GB      4.17%   120516671617832372476611840132084574885    v   |
> 172.21.0.26   Up         226 GB        4.17%   127605887595351923798765477786913079290    |   ^
> 10.177.192.230Up         187.28 GB     4.17%   134695103572871475120919115441741583695    v   |
> 172.21.2.225  Up         230.34 GB     4.17%   141784319550391026443072753096570088100    |   ^
> 10.177.192.231Up         188.05 GB     4.17%   148873535527910577765226390751398592505    v   |
> 172.21.3.119  Up         215.91 GB     4.17%   155962751505430129087380028406227096910    |   ^
> 10.177.192.232Up         217.41 GB     4.17%   163051967482949680409533666061055601315    |-->|
> {code}
> As you can see, it was missing 10.177.192.226 from the ring.
> On cass11, everything else looked fine, including nothing in pending/active tpstats.
> However, we did notice an exception in the startup logs on cass11:
> {code}
> 2011-02-19_19:45:43.26906 INFO - Starting up server gossip
> 2011-02-19_19:45:43.39742 ERROR - Uncaught exception in thread Thread[Thread-11,5,main]
> 2011-02-19_19:45:43.39746 java.io.IOError: java.io.EOFException
> 2011-02-19_19:45:43.39747       at org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:67)
> 2011-02-19_19:45:43.39748 Caused by: java.io.EOFException
> 2011-02-19_19:45:43.39749       at java.io.DataInputStream.readInt(DataInputStream.java:375)
> 2011-02-19_19:45:43.39750       at org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:57)
> 2011-02-19_19:45:43.41481 INFO - Binding thrift service to /172.21.0.26:9160
> 2011-02-19_19:45:43.42050 INFO - Cassandra starting up...
> {code}
> driftx said that it should be harmless, but it's the only thing I see that is different about that node.


[jira] Issue Comment Edited: (CASSANDRA-2201) Gossip synchronization issues

Posted by "Paul Querna (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-2201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12996863#comment-12996863 ] 

Paul Querna edited comment on CASSANDRA-2201 at 2/19/11 9:33 PM:
-----------------------------------------------------------------

After more investigation, we found that 3 of 24 nodes had inconsistent rings.

-Restarting them each seems to have fixed this issue.-

Restarting sometimes fixed the issue.

      was (Author: pquerna):
    After more investigation, we found that 3 of 24 nodes had inconsistent rings.

Restarting them each seems to have fixed this issue.
  


[jira] Updated: (CASSANDRA-2201) Gossip synchronization issues

Posted by "Paul Querna (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-2201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Paul Querna updated CASSANDRA-2201:
-----------------------------------

    Attachment: CASSANDRA-2201-jstack.txt

jstack output from one of the other nodes we found in this state.



[jira] Issue Comment Edited: (CASSANDRA-2201) Gossip synchronization issues

Posted by "Brandon Williams (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-2201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12996931#comment-12996931 ] 

Brandon Williams edited comment on CASSANDRA-2201 at 2/19/11 11:33 PM:
-----------------------------------------------------------------------

Note: this log appears to be captured from stdout rather than the log file, since there are no class names or line numbers; having those would make things slightly easier.

The one thing this log does indicate is that gossip is not the problem.  There are 25 gossip digests being sent by this node, which is correct for a 24-node cluster before CASSANDRA-2092, since a node sends its own digest twice.  There are also 23 unique 'is now UP' messages, which is also correct, since a node wouldn't have one for itself.

This leads me to believe the problem must be in either the storage service or the replication strategy, where the range-to-endpoint map is constructed, but I don't see anything obvious there. Are you using RAS/PFEPS?
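The digest arithmetic above can be written down as a quick sanity check (the formulas encode the pre-CASSANDRA-2092 behavior as described in this comment, not anything read from Cassandra source):

```java
// Back-of-the-envelope check of the counts cited above, assuming the
// described pre-CASSANDRA-2092 behavior: a node's gossip SYN carries one
// digest per known endpoint plus a duplicate of its own, and a node logs
// 'is now UP' for every peer but never for itself.
public class GossipCountCheck {
    // Digests in a SYN: one per node, plus the sender's own digest repeated.
    static int expectedDigests(int clusterSize) {
        return clusterSize + 1;
    }

    // Unique 'is now UP' messages: every node except the local one.
    static int expectedUpMessages(int clusterSize) {
        return clusterSize - 1;
    }

    public static void main(String[] args) {
        int n = 24;  // cluster size from the report
        System.out.println(expectedDigests(n));    // 25, matching the log
        System.out.println(expectedUpMessages(n)); // 23, matching the log
    }
}
```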

      was (Author: brandon.williams):
    Note: this log appears to be a capture from stdout and not the file, since there are no class names or line numbers, which would make things slightly easier.

The one thing this log indicates is, gossip is not the problem.  There are 25 gossip digests being sent by this node, which is correct for a 24 node cluster before CASSANDRA-2092 since it sends its own digest twice.  There are also 23 unique 'is now UP' messages, which is also correct since it wouldn't have one for itself.

This leads me to believe the problem must be in either the storage service or the replication where the range to endpoint map is constructed, but I don't see anything obvious there. 
  


[jira] Commented: (CASSANDRA-2201) Gossip synchronization issues

Posted by "Paul Querna (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-2201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12996863#comment-12996863 ] 

Paul Querna commented on CASSANDRA-2201:
----------------------------------------

After more investigation, we found that 3 of 24 nodes had inconsistent rings.

Restarting them each seems to have fixed this issue.

> Gossip synchronization issues
> -----------------------------
>
>                 Key: CASSANDRA-2201
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2201
>             Project: Cassandra
>          Issue Type: Bug
>    Affects Versions: 0.6.12
>         Environment: r1071793 (0.6.12)
> Ubuntu 9.10
> 24 node cluster.
> JNA enabled.
> java -version
> java version "1.6.0_21"
> Java(TM) SE Runtime Environment (build 1.6.0_21-b06)
> Java HotSpot(TM) 64-Bit Server VM (build 17.0-b16, mixed mode)
>            Reporter: Paul Querna
>


[jira] Updated: (CASSANDRA-2201) Gossip synchronization issues

Posted by "Paul Querna (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-2201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Paul Querna updated CASSANDRA-2201:
-----------------------------------

    Description: 
After upgrading to 0.6.12ish, we noticed that whole rows were sometimes being reported as missing from queries.

It seemed random, and at first we thought there might be a wider problem in 0.6.12 -- but we found that one node of 24 had incorrect gossip state.

Correct nodetool ring output:
{code}
pquerna@cass0:/data/cassandra$ /data/cassandra/bin/nodetool -h localhost ring
Address       Status     Load          Owns    Range                                      Ring
                                               163051967482949680409533666061055601315    
172.21.2.222  Up         224.03 GB     4.17%   0                                          |<--|
10.177.192.88 Up         219.28 GB     4.17%   7089215977519551322153637654828504405      |   ^
172.21.2.169  Up         225.93 GB     4.17%   14178431955039102644307275309657008810     v   |
10.177.192.89 Up         225.91 GB     4.17%   21267647932558653966460912964485513215     |   ^
172.21.3.116  Up         226.88 GB     4.17%   28356863910078205288614550619314017620     v   |
10.177.192.90 Up         219.2 GB      4.17%   35446079887597756610768188274142522025     |   ^
172.21.2.173  Up         227.44 GB     4.17%   42535295865117307932921825928971026430     v   |
10.177.192.91 Up         182.44 GB     4.17%   49624511842636859255075463583799530835     |   ^
172.21.2.223  Up         229.38 GB     4.17%   56713727820156410577229101238628035240     v   |
10.177.192.225Up         193.1 GB      4.17%   63802943797675961899382738893456539645     |   ^
172.21.3.115  Up         231.21 GB     4.17%   70892159775195513221536376548285044050     v   |
10.177.192.226Up         194.33 GB     4.17%   77981375752715064543690014203113548455     |   ^
172.21.1.32   Up         230.38 GB     4.17%   85070591730234615865843651857942052860     v   |
10.177.192.227Up         196.34 GB     4.17%   92159807707754167187997289512770557265     |   ^
172.21.2.224  Up         205.9 GB      4.17%   99249023685273718510150927167599061670     v   |
10.177.192.228Up         191.82 GB     4.17%   106338239662793269832304564822427566075    |   ^
172.21.3.117  Up         230.5 GB      4.17%   113427455640312821154458202477256070480    v   |
10.177.192.229Up         193.2 GB      4.17%   120516671617832372476611840132084574885    |   ^
172.21.0.26   Up         226.12 GB     4.17%   127605887595351923798765477786913079290    v   |
10.177.192.230Up         187.28 GB     4.17%   134695103572871475120919115441741583695    |   ^
172.21.2.225  Up         230.34 GB     4.17%   141784319550391026443072753096570088100    v   |
10.177.192.231Up         188.05 GB     4.17%   148873535527910577765226390751398592505    |   ^
172.21.3.119  Up         215.91 GB     4.17%   155962751505430129087380028406227096910    v   |
10.177.192.232Up         217.41 GB     4.17%   163051967482949680409533666061055601315    |-->|
{code}

On the node that had a different nodetool ring output:
{code}
pquerna@cass11:~$ /data/cassandra/bin/nodetool -h localhost ring
Address       Status     Load          Owns    Range                                      Ring
                                               163051967482949680409533666061055601315    
172.21.2.222  Up         224.03 GB     4.17%   0                                          |<--|
172.21.2.169  Up         225.93 GB     8.33%   14178431955039102644307275309657008810     |   ^
10.177.192.89 Up         225.91 GB     4.17%   21267647932558653966460912964485513215     v   |
172.21.3.116  Up         226.88 GB     4.17%   28356863910078205288614550619314017620     |   ^
10.177.192.90 Up         219.2 GB      4.17%   35446079887597756610768188274142522025     v   |
172.21.2.173  Up         227.44 GB     4.17%   42535295865117307932921825928971026430     |   ^
10.177.192.91 Up         182.44 GB     4.17%   49624511842636859255075463583799530835     v   |
172.21.3.115  Up         231.21 GB     12.50%  70892159775195513221536376548285044050     |   ^
172.21.1.32   Up         230.38 GB     8.33%   85070591730234615865843651857942052860     v   |
10.177.192.227Up         196.34 GB     4.17%   92159807707754167187997289512770557265     |   ^
10.177.192.228Up         191.82 GB     8.33%   106338239662793269832304564822427566075    v   |
172.21.3.117  Up         230.5 GB      4.17%   113427455640312821154458202477256070480    |   ^
10.177.192.229Up         193.2 GB      4.17%   120516671617832372476611840132084574885    v   |
172.21.0.26   Up         226 GB        4.17%   127605887595351923798765477786913079290    |   ^
10.177.192.230Up         187.28 GB     4.17%   134695103572871475120919115441741583695    v   |
172.21.2.225  Up         230.34 GB     4.17%   141784319550391026443072753096570088100    |   ^
10.177.192.231Up         188.05 GB     4.17%   148873535527910577765226390751398592505    v   |
172.21.3.119  Up         215.91 GB     4.17%   155962751505430129087380028406227096910    |   ^
10.177.192.232Up         217.41 GB     4.17%   163051967482949680409533666061055601315    |-->|
{code}

As you can see, it was missing 10.177.192.226 from the ring.

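The Owns column makes the discrepancy easy to quantify. A rough sketch (not Cassandra source; the token arithmetic assumes the RandomPartitioner's 2^127 token space) of how ownership is derived from consecutive ring tokens:

```java
import java.math.BigInteger;

// Hypothetical sketch (not Cassandra code): each node owns the span from its
// predecessor's token (exclusive) to its own token (inclusive), as a fraction
// of the RandomPartitioner's 2^127 token space. With 24 evenly spaced tokens
// every node owns 1/24 ~= 4.17%; if one node is missing from a ring view, its
// successor appears to own two spans, i.e. 2/24 ~= 8.33%.
public class OwnsSketch {
    static final BigInteger RING = BigInteger.ONE.shiftLeft(127); // 2^127

    // Percentage of the ring between two consecutive tokens, to 3 decimals.
    static double owns(BigInteger predecessor, BigInteger token) {
        BigInteger span = token.subtract(predecessor).mod(RING);
        return span.multiply(BigInteger.valueOf(100_000)).divide(RING).doubleValue() / 1000.0;
    }

    public static void main(String[] args) {
        // The cluster's tokens are exact multiples of this step (see output above).
        BigInteger step = new BigInteger("7089215977519551322153637654828504405");
        BigInteger t115 = step.multiply(BigInteger.valueOf(10)); // 172.21.3.115
        BigInteger t226 = step.multiply(BigInteger.valueOf(11)); // 10.177.192.226
        BigInteger t32  = step.multiply(BigInteger.valueOf(12)); // 172.21.1.32

        System.out.printf("healthy view: 172.21.1.32 owns %.2f%%%n", owns(t226, t32));
        System.out.printf("cass11 view:  172.21.1.32 owns %.2f%%%n", owns(t115, t32));
    }
}
```

With 10.177.192.226 present, 172.21.1.32 owns one step (~4.17%); in the cass11 view its predecessor is 172.21.3.115, two steps back, so it appears to own ~8.33% -- exactly the anomaly in the output above.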
On cass11, everything else looked fine, including nothing pending or active in tpstats.

However, we did notice an exception on startup in the logs on cass11:
{code}
2011-02-19_19:45:43.26906 INFO - Starting up server gossip
2011-02-19_19:45:43.39742 ERROR - Uncaught exception in thread Thread[Thread-11,5,main]
2011-02-19_19:45:43.39746 java.io.IOError: java.io.EOFException
2011-02-19_19:45:43.39747       at org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:67)
2011-02-19_19:45:43.39748 Caused by: java.io.EOFException
2011-02-19_19:45:43.39749       at java.io.DataInputStream.readInt(DataInputStream.java:375)
2011-02-19_19:45:43.39750       at org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:57)
2011-02-19_19:45:43.41481 INFO - Binding thrift service to /172.21.0.26:9160
2011-02-19_19:45:43.42050 INFO - Cassandra starting up...
{code}

driftx said that it should be harmless, but it's the only thing I see different about that node.
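For what it's worth, that EOFException is exactly what DataInputStream.readInt() throws when the stream ends before all four bytes arrive -- i.e. the remote end closed the connection partway through the header read at IncomingTcpConnection.java:57. A standalone reproduction (not Cassandra code; the byte values are made up):

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;

// Minimal sketch of the failure mode in the stack trace above: readInt()
// needs 4 bytes, and a peer that disconnects after sending fewer than that
// produces EOFException, which the connection thread then wraps in IOError.
public class EofSketch {
    public static void main(String[] args) {
        byte[] truncated = new byte[] {0x00, 0x01}; // peer closed after 2 of 4 bytes
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(truncated));
        try {
            in.readInt(); // requires 4 bytes, only 2 are available
            System.out.println("read a full int (unexpected)");
        } catch (EOFException eof) {
            System.out.println("EOFException, as in the cass11 startup log");
        } catch (IOException ioe) {
            System.out.println("other I/O failure: " + ioe);
        }
    }
}
```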


  was:
After upgrading to 0.6.12ish, we noticed that whole rows were sometimes being reported as missing from queries.

It seemed random, and at first we thought there might be a wider problem in 0.6.12 -- but we found that one node of 24 had incorrect gossip state.

Correct nodetool ring output:
{{
pquerna@cass0:/data/cassandra$ /data/cassandra/bin/nodetool -h localhost ring
Address       Status     Load          Owns    Range                                      Ring
                                               163051967482949680409533666061055601315    
172.21.2.222  Up         224.03 GB     4.17%   0                                          |<--|
10.177.192.88 Up         219.28 GB     4.17%   7089215977519551322153637654828504405      |   ^
172.21.2.169  Up         225.93 GB     4.17%   14178431955039102644307275309657008810     v   |
10.177.192.89 Up         225.91 GB     4.17%   21267647932558653966460912964485513215     |   ^
172.21.3.116  Up         226.88 GB     4.17%   28356863910078205288614550619314017620     v   |
10.177.192.90 Up         219.2 GB      4.17%   35446079887597756610768188274142522025     |   ^
172.21.2.173  Up         227.44 GB     4.17%   42535295865117307932921825928971026430     v   |
10.177.192.91 Up         182.44 GB     4.17%   49624511842636859255075463583799530835     |   ^
172.21.2.223  Up         229.38 GB     4.17%   56713727820156410577229101238628035240     v   |
10.177.192.225Up         193.1 GB      4.17%   63802943797675961899382738893456539645     |   ^
172.21.3.115  Up         231.21 GB     4.17%   70892159775195513221536376548285044050     v   |
10.177.192.226Up         194.33 GB     4.17%   77981375752715064543690014203113548455     |   ^
172.21.1.32   Up         230.38 GB     4.17%   85070591730234615865843651857942052860     v   |
10.177.192.227Up         196.34 GB     4.17%   92159807707754167187997289512770557265     |   ^
172.21.2.224  Up         205.9 GB      4.17%   99249023685273718510150927167599061670     v   |
10.177.192.228Up         191.82 GB     4.17%   106338239662793269832304564822427566075    |   ^
172.21.3.117  Up         230.5 GB      4.17%   113427455640312821154458202477256070480    v   |
10.177.192.229Up         193.2 GB      4.17%   120516671617832372476611840132084574885    |   ^
172.21.0.26   Up         226.12 GB     4.17%   127605887595351923798765477786913079290    v   |
10.177.192.230Up         187.28 GB     4.17%   134695103572871475120919115441741583695    |   ^
172.21.2.225  Up         230.34 GB     4.17%   141784319550391026443072753096570088100    v   |
10.177.192.231Up         188.05 GB     4.17%   148873535527910577765226390751398592505    |   ^
172.21.3.119  Up         215.91 GB     4.17%   155962751505430129087380028406227096910    v   |
10.177.192.232Up         217.41 GB     4.17%   163051967482949680409533666061055601315    |-->|
}}


On the node that had a different nodetool ring output:
{{
pquerna@cass11:~$ /data/cassandra/bin/nodetool -h localhost ring
Address       Status     Load          Owns    Range                                      Ring
                                               163051967482949680409533666061055601315    
172.21.2.222  Up         224.03 GB     4.17%   0                                          |<--|
172.21.2.169  Up         225.93 GB     8.33%   14178431955039102644307275309657008810     |   ^
10.177.192.89 Up         225.91 GB     4.17%   21267647932558653966460912964485513215     v   |
172.21.3.116  Up         226.88 GB     4.17%   28356863910078205288614550619314017620     |   ^
10.177.192.90 Up         219.2 GB      4.17%   35446079887597756610768188274142522025     v   |
172.21.2.173  Up         227.44 GB     4.17%   42535295865117307932921825928971026430     |   ^
10.177.192.91 Up         182.44 GB     4.17%   49624511842636859255075463583799530835     v   |
172.21.3.115  Up         231.21 GB     12.50%  70892159775195513221536376548285044050     |   ^
172.21.1.32   Up         230.38 GB     8.33%   85070591730234615865843651857942052860     v   |
10.177.192.227Up         196.34 GB     4.17%   92159807707754167187997289512770557265     |   ^
10.177.192.228Up         191.82 GB     8.33%   106338239662793269832304564822427566075    v   |
172.21.3.117  Up         230.5 GB      4.17%   113427455640312821154458202477256070480    |   ^
10.177.192.229Up         193.2 GB      4.17%   120516671617832372476611840132084574885    v   |
172.21.0.26   Up         226 GB        4.17%   127605887595351923798765477786913079290    |   ^
10.177.192.230Up         187.28 GB     4.17%   134695103572871475120919115441741583695    v   |
172.21.2.225  Up         230.34 GB     4.17%   141784319550391026443072753096570088100    |   ^
10.177.192.231Up         188.05 GB     4.17%   148873535527910577765226390751398592505    v   |
172.21.3.119  Up         215.91 GB     4.17%   155962751505430129087380028406227096910    |   ^
10.177.192.232Up         217.41 GB     4.17%   163051967482949680409533666061055601315    |-->|
}}

As you can see, it was missing 10.177.192.226 from the ring.

On cass11, everything else looked fine, including nothing pending or active in tpstats.

However, we did notice an exception on startup in the logs on cass11:
{{
2011-02-19_19:45:43.26906 INFO - Starting up server gossip
2011-02-19_19:45:43.39742 ERROR - Uncaught exception in thread Thread[Thread-11,5,main]
2011-02-19_19:45:43.39746 java.io.IOError: java.io.EOFException
2011-02-19_19:45:43.39747       at org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:67)
2011-02-19_19:45:43.39748 Caused by: java.io.EOFException
2011-02-19_19:45:43.39749       at java.io.DataInputStream.readInt(DataInputStream.java:375)
2011-02-19_19:45:43.39750       at org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:57)
2011-02-19_19:45:43.41481 INFO - Binding thrift service to /172.21.0.26:9160
2011-02-19_19:45:43.42050 INFO - Cassandra starting up...
}}

driftx said that it should be harmless, but it's the only thing I see different about that node.



> Gossip synchronization issues
> -----------------------------
>
>                 Key: CASSANDRA-2201
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2201
>             Project: Cassandra
>          Issue Type: Bug
>    Affects Versions: 0.6.12
>         Environment: r1071793 (0.6.12)
> Ubuntu 9.10
> 24 node cluster.
> JNA enabled.
> java -version
> java version "1.6.0_21"
> Java(TM) SE Runtime Environment (build 1.6.0_21-b06)
> Java HotSpot(TM) 64-Bit Server VM (build 17.0-b16, mixed mode)
>            Reporter: Paul Querna
>


[jira] Commented: (CASSANDRA-2201) Gossip synchronization issues

Posted by "Brandon Williams (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-2201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12996931#comment-12996931 ] 

Brandon Williams commented on CASSANDRA-2201:
---------------------------------------------

Note: this log appears to be captured from stdout rather than from the log file, since there are no class names or line numbers; those would make things slightly easier.

The one thing this log does indicate is that gossip is not the problem.  There are 25 gossip digests being sent by this node, which is correct for a 24-node cluster before CASSANDRA-2092, since a node sends its own digest twice.  There are also 23 unique 'is now UP' messages, which is also correct, since a node never marks itself UP.

This leads me to believe the problem must be in either the storage service or the replication code where the range-to-endpoint map is constructed, but I don't see anything obvious there. 
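Those counts can be sanity-checked with trivial arithmetic (a sketch, assuming the pre-CASSANDRA-2092 behavior described above, where a node's own digest appears twice in its SYN):

```java
// Back-of-the-envelope check of the expected gossip message counts for a
// healthy cluster (assumption: pre-CASSANDRA-2092, own digest sent twice).
public class GossipCounts {
    static int expectedDigests(int clusterSize) {
        return clusterSize + 1; // one per node, plus the sender's own digest again
    }

    static int expectedUpMessages(int clusterSize) {
        return clusterSize - 1; // a node never logs 'is now UP' for itself
    }

    public static void main(String[] args) {
        System.out.println("digests for 24 nodes: " + expectedDigests(24));
        System.out.println("'is now UP' for 24 nodes: " + expectedUpMessages(24));
    }
}
```

Both match the log: 25 digests and 23 unique 'is now UP' messages.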

> Gossip synchronization issues
> -----------------------------
>
>                 Key: CASSANDRA-2201
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2201
>             Project: Cassandra
>          Issue Type: Bug
>    Affects Versions: 0.6.12
>         Environment: r1071793 (0.6.12)
> Ubuntu 9.10
> 24 node cluster.
> JNA enabled.
> java -version
> java version "1.6.0_21"
> Java(TM) SE Runtime Environment (build 1.6.0_21-b06)
> Java HotSpot(TM) 64-Bit Server VM (build 17.0-b16, mixed mode)
>            Reporter: Paul Querna
>            Assignee: Brandon Williams
>         Attachments: CASSANDRA-2201-jstack.txt, CASSANDRA-2201-startup-trace.txt
>
>
