Posted to commits@cassandra.apache.org by "Brandon Williams (JIRA)" <ji...@apache.org> on 2011/02/19 22:50:38 UTC
[jira] Assigned: (CASSANDRA-2201) Gossip synchronization issues
[ https://issues.apache.org/jira/browse/CASSANDRA-2201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Brandon Williams reassigned CASSANDRA-2201:
-------------------------------------------
Assignee: Brandon Williams
> Gossip synchronization issues
> -----------------------------
>
> Key: CASSANDRA-2201
> URL: https://issues.apache.org/jira/browse/CASSANDRA-2201
> Project: Cassandra
> Issue Type: Bug
> Affects Versions: 0.6.12
> Environment: r1071793 (0.6.12)
> Ubuntu 9.10
> 24 node cluster.
> JNA enabled.
> java -version
> java version "1.6.0_21"
> Java(TM) SE Runtime Environment (build 1.6.0_21-b06)
> Java HotSpot(TM) 64-Bit Server VM (build 17.0-b16, mixed mode)
> Reporter: Paul Querna
> Assignee: Brandon Williams
> Attachments: CASSANDRA-2201-jstack.txt
>
>
> After upgrading to 0.6.12ish, we noticed that whole rows were sometimes being reported as missing from queries.
> It seemed random, and at first we thought there might be a wider problem in 0.6.12 -- but we found that one node of 24 had an incorrect gossip view of the ring.
> Correct nodetool ring output:
> {code}
> pquerna@cass0:/data/cassandra$ /data/cassandra/bin/nodetool -h localhost ring
> Address Status Load Owns Range Ring
> 163051967482949680409533666061055601315
> 172.21.2.222 Up 224.03 GB 4.17% 0 |<--|
> 10.177.192.88 Up 219.28 GB 4.17% 7089215977519551322153637654828504405 | ^
> 172.21.2.169 Up 225.93 GB 4.17% 14178431955039102644307275309657008810 v |
> 10.177.192.89 Up 225.91 GB 4.17% 21267647932558653966460912964485513215 | ^
> 172.21.3.116 Up 226.88 GB 4.17% 28356863910078205288614550619314017620 v |
> 10.177.192.90 Up 219.2 GB 4.17% 35446079887597756610768188274142522025 | ^
> 172.21.2.173 Up 227.44 GB 4.17% 42535295865117307932921825928971026430 v |
> 10.177.192.91 Up 182.44 GB 4.17% 49624511842636859255075463583799530835 | ^
> 172.21.2.223 Up 229.38 GB 4.17% 56713727820156410577229101238628035240 v |
> 10.177.192.225 Up 193.1 GB 4.17% 63802943797675961899382738893456539645 | ^
> 172.21.3.115 Up 231.21 GB 4.17% 70892159775195513221536376548285044050 v |
> 10.177.192.226 Up 194.33 GB 4.17% 77981375752715064543690014203113548455 | ^
> 172.21.1.32 Up 230.38 GB 4.17% 85070591730234615865843651857942052860 v |
> 10.177.192.227 Up 196.34 GB 4.17% 92159807707754167187997289512770557265 | ^
> 172.21.2.224 Up 205.9 GB 4.17% 99249023685273718510150927167599061670 v |
> 10.177.192.228 Up 191.82 GB 4.17% 106338239662793269832304564822427566075 | ^
> 172.21.3.117 Up 230.5 GB 4.17% 113427455640312821154458202477256070480 v |
> 10.177.192.229 Up 193.2 GB 4.17% 120516671617832372476611840132084574885 | ^
> 172.21.0.26 Up 226.12 GB 4.17% 127605887595351923798765477786913079290 v |
> 10.177.192.230 Up 187.28 GB 4.17% 134695103572871475120919115441741583695 | ^
> 172.21.2.225 Up 230.34 GB 4.17% 141784319550391026443072753096570088100 v |
> 10.177.192.231 Up 188.05 GB 4.17% 148873535527910577765226390751398592505 | ^
> 172.21.3.119 Up 215.91 GB 4.17% 155962751505430129087380028406227096910 v |
> 10.177.192.232 Up 217.41 GB 4.17% 163051967482949680409533666061055601315 |-->|
> {code}
> On the node that had a different nodetool ring output:
> {code}
> pquerna@cass11:~$ /data/cassandra/bin/nodetool -h localhost ring
> Address Status Load Owns Range Ring
> 163051967482949680409533666061055601315
> 172.21.2.222 Up 224.03 GB 4.17% 0 |<--|
> 172.21.2.169 Up 225.93 GB 8.33% 14178431955039102644307275309657008810 | ^
> 10.177.192.89 Up 225.91 GB 4.17% 21267647932558653966460912964485513215 v |
> 172.21.3.116 Up 226.88 GB 4.17% 28356863910078205288614550619314017620 | ^
> 10.177.192.90 Up 219.2 GB 4.17% 35446079887597756610768188274142522025 v |
> 172.21.2.173 Up 227.44 GB 4.17% 42535295865117307932921825928971026430 | ^
> 10.177.192.91 Up 182.44 GB 4.17% 49624511842636859255075463583799530835 v |
> 172.21.3.115 Up 231.21 GB 12.50% 70892159775195513221536376548285044050 | ^
> 172.21.1.32 Up 230.38 GB 8.33% 85070591730234615865843651857942052860 v |
> 10.177.192.227 Up 196.34 GB 4.17% 92159807707754167187997289512770557265 | ^
> 10.177.192.228 Up 191.82 GB 8.33% 106338239662793269832304564822427566075 v |
> 172.21.3.117 Up 230.5 GB 4.17% 113427455640312821154458202477256070480 | ^
> 10.177.192.229 Up 193.2 GB 4.17% 120516671617832372476611840132084574885 v |
> 172.21.0.26 Up 226 GB 4.17% 127605887595351923798765477786913079290 | ^
> 10.177.192.230 Up 187.28 GB 4.17% 134695103572871475120919115441741583695 v |
> 172.21.2.225 Up 230.34 GB 4.17% 141784319550391026443072753096570088100 | ^
> 10.177.192.231 Up 188.05 GB 4.17% 148873535527910577765226390751398592505 v |
> 172.21.3.119 Up 215.91 GB 4.17% 155962751505430129087380028406227096910 | ^
> 10.177.192.232 Up 217.41 GB 4.17% 163051967482949680409533666061055601315 |-->|
> {code}
> As you can see, 10.177.192.226 was missing from that node's ring.
> On cass11 everything else looked fine, with nothing pending or active in tpstats.
> However, we did notice an exception on startup in the logs on cass11:
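The Owns column makes a gap like this visible arithmetically: with RandomPartitioner each node owns the token span from its predecessor's token up to its own, out of a 2^127 token space, so 24 evenly spaced nodes own 100/24 ≈ 4.17% each, and a node whose predecessor has dropped out of the gossip view shows a doubled 8.33%. A standalone sketch of that calculation (not Cassandra code; `RingOwnership` and `owns` are hypothetical names for illustration):

```java
import java.math.BigInteger;

public class RingOwnership {
    // RandomPartitioner's token space is [0, 2^127)
    static final BigInteger RANGE = BigInteger.ONE.shiftLeft(127);

    // Each node owns (its token - its predecessor's token) / 2^127,
    // wrapping around the ring for the first node.
    static double[] owns(BigInteger[] sortedTokens) {
        int n = sortedTokens.length;
        double[] pct = new double[n];
        for (int i = 0; i < n; i++) {
            BigInteger prev = sortedTokens[(i + n - 1) % n];
            BigInteger width = sortedTokens[i].subtract(prev).mod(RANGE);
            pct[i] = width.doubleValue() / RANGE.doubleValue() * 100.0;
        }
        return pct;
    }

    public static void main(String[] args) {
        // 24 evenly spaced tokens, as in the healthy ring
        int n = 24;
        BigInteger[] tokens = new BigInteger[n];
        for (int i = 0; i < n; i++)
            tokens[i] = RANGE.multiply(BigInteger.valueOf(i)).divide(BigInteger.valueOf(n));
        System.out.printf("each node owns %.2f%%%n", owns(tokens)[1]);      // ~4.17

        // Drop one token, as gossip did on cass11: the successor absorbs it
        BigInteger[] degraded = new BigInteger[n - 1];
        for (int i = 0, j = 0; i < n; i++)
            if (i != 11) degraded[j++] = tokens[i];
        System.out.printf("successor now owns %.2f%%%n", owns(degraded)[11]); // ~8.33
    }
}
```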
> {code}
> 2011-02-19_19:45:43.26906 INFO - Starting up server gossip
> 2011-02-19_19:45:43.39742 ERROR - Uncaught exception in thread Thread[Thread-11,5,main]
> 2011-02-19_19:45:43.39746 java.io.IOError: java.io.EOFException
> 2011-02-19_19:45:43.39747 at org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:67)
> 2011-02-19_19:45:43.39748 Caused by: java.io.EOFException
> 2011-02-19_19:45:43.39749 at java.io.DataInputStream.readInt(DataInputStream.java:375)
> 2011-02-19_19:45:43.39750 at org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:57)
> 2011-02-19_19:45:43.41481 INFO - Binding thrift service to /172.21.0.26:9160
> 2011-02-19_19:45:43.42050 INFO - Cassandra starting up...
> {code}
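For reference, the EOFException above is what `DataInputStream.readInt` throws when the socket reaches end-of-stream before the 4-byte value arrives, e.g. when a peer connects and then drops the connection while this node is starting up. A minimal standalone reproduction (hypothetical `EofDemo` class, not Cassandra code):

```java
import java.io.DataInputStream;
import java.io.EOFException;
import java.net.ServerSocket;
import java.net.Socket;

public class EofDemo {
    public static void main(String[] args) throws Exception {
        ServerSocket server = new ServerSocket(0); // bind any free port

        // A peer that connects and immediately drops the connection,
        // as a restarting gossip peer might.
        Socket client = new Socket("127.0.0.1", server.getLocalPort());
        client.close();

        Socket accepted = server.accept();
        DataInputStream in = new DataInputStream(accepted.getInputStream());
        try {
            in.readInt(); // expects a 4-byte header; stream is already at EOF
        } catch (EOFException e) {
            System.out.println("caught EOFException, as in the log above");
        } finally {
            accepted.close();
            server.close();
        }
    }
}
```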
> driftx said that it should be harmless, but it's the only thing I see that's different about that node.
--
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira