Posted to user@cassandra.apache.org by Keith Thornhill <ke...@raptr.com> on 2010/05/20 07:05:17 UTC
Ring out of sync, cassandra_UnavailableException being thrown
in a 5 node cluster, i noticed in our client error log that one of the
nodes was consistently throwing cassandra_UnavailableException during
a read operation.
looking into jmx, it was obvious that one node's view of the ring was
out of sync: 192.168.20.158 lists only itself, while 192.168.20.150
sees all five nodes.
$ nodetool -host 192.168.20.150 ring
Address         Status  Load     Range                                    Ring
                                 139508497374977076191526400448759597506
192.168.20.156  Up      5.73 GB  733665530305941485083898696792520436     |<--|
192.168.20.158  Up      3.41 GB  9629533262984150011756238989685472219    |   ^
192.168.20.154  Up      2.44 GB  31048334058970902242412812423471654868   v   |
192.168.20.150  Up      4.89 GB  105769574715070648260922426249777160699  |   ^
192.168.20.152  Up      5.24 GB  139508497374977076191526400448759597506  |-->|
$ nodetool -host 192.168.20.158 ring
Address         Status  Load     Range                                    Ring
192.168.20.158  Up      3.41 GB  9629533262984150011756238989685472219    |<--|
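A comparison like the one above can be scripted once the ring listing from each host has been saved as text. The following is only a rough sketch, not anything from Cassandra itself: ring_members and missing_peers are hypothetical helper names, and the parsing assumes the listing looks like the output quoted in this thread, with each node's address at the start of its line.

```python
import re

# Assumption: each node line starts with a dotted-quad IPv4 address, as in
# the listings quoted in this thread (e.g. "192.168.20.156Up 5.73 GB").
ADDRESS_AT_LINE_START = re.compile(r"^\s*(\d{1,3}(?:\.\d{1,3}){3})")


def ring_members(ring_output):
    """Return the set of node addresses listed in one ring listing."""
    members = set()
    for line in ring_output.splitlines():
        match = ADDRESS_AT_LINE_START.match(line)
        if match:
            members.add(match.group(1))
    return members


def missing_peers(views):
    """Map each queried host to the peers that its listing leaves out.

    `views` maps the host that was queried to the text it returned.
    The union of all listings is treated as the full membership, so a
    node that no host can see will not be reported.
    """
    membership = {host: ring_members(text) for host, text in views.items()}
    everyone = set().union(*membership.values())
    return {
        host: sorted(everyone - seen)
        for host, seen in membership.items()
        if everyone - seen
    }
```

Fed the two listings above, keyed by the host each was run against, missing_peers would flag only 192.168.20.158, reporting the four peers it does not list. Hosts whose view matches the union are left out of the result.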
looking at the CF stats on that node, it is obvious that reads and
writes are happening, but i have to assume that those are coming from
proxy connections via the other nodes.
when restarting that node, the logs on the other cluster nodes show
that they detect the server going away and then coming back into the
ring.
INFO [WRITE-/192.168.20.158] 2010-05-19 21:27:39,448 OutboundTcpConnection.java (line 102) error writing to /192.168.20.158
INFO [WRITE-/192.168.20.158] 2010-05-19 21:27:55,475 OutboundTcpConnection.java (line 102) error writing to /192.168.20.158
INFO [GMFD:1] 2010-05-19 21:27:56,481 Gossiper.java (line 582) Node /192.168.20.158 has restarted, now UP again
INFO [GMFD:1] 2010-05-19 21:27:56,482 StorageService.java (line 538) Node /192.168.20.158 state jump to normal
any ideas on how to kick that node and remind it of its buddies?
thanks!
-keith
Re: Ring out of sync, cassandra_UnavailableException being thrown
Posted by Jonathan Ellis <jb...@gmail.com>.
Were you bootstrapping or otherwise moving nodes around?
I don't think anyone's tracked this bug down farther than "if you
restart the entire cluster, it goes away."
--
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of Riptano, the source for professional Cassandra support
http://riptano.com