Posted to commits@cassandra.apache.org by "Jonathan Ellis (JIRA)" <ji...@apache.org> on 2010/02/16 20:55:27 UTC

[jira] Commented: (CASSANDRA-800) Spurious Gossip Up/Down and IO Errors

    [ https://issues.apache.org/jira/browse/CASSANDRA-800?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12834418#action_12834418 ] 

Jonathan Ellis commented on CASSANDRA-800:
------------------------------------------

The IOException is the same as #657 and is harmless (fixed in trunk; it will not be fixed in 0.5).

The ConcurrentModificationException may be causing the deadness problem. (It might also be related to CASSANDRA-757, but the stack trace is different.)
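For context, the trace below fails inside `new HashSet(...)` called from `Gossiper.getLiveMembers`. `HashSet`'s iterators are fail-fast, so copying a plain `HashSet` while it is being modified can throw ConcurrentModificationException. In this failure the modification would come from the gossip timer thread marking nodes dead. The following is a minimal, hypothetical sketch; it is not Cassandra's code or its actual fix. `LiveMembersSketch` and its methods are invented for illustration. It reproduces the failure deterministically by removing an endpoint during the copy, instead of relying on a second thread. It also shows one common remedy, a concurrent key set whose iterators are weakly consistent.

```java
import java.util.ConcurrentModificationException;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical illustration only; not Cassandra's Gossiper.
public class LiveMembersSketch {
    // Endpoint addresses borrowed from the logs in this issue.
    static final List<String> ENDPOINTS =
        List.of("10.209.21.223", "10.209.23.82", "10.209.23.106");

    // Plain HashSet: fail-fast iterators, unsafe to copy while another party mutates it.
    static Set<String> newUnsafeSet() {
        return new HashSet<>(ENDPOINTS);
    }

    // Concurrent key set: weakly consistent iterators that never throw CME.
    static Set<String> newSafeSet() {
        Set<String> s = ConcurrentHashMap.newKeySet();
        s.addAll(ENDPOINTS);
        return s;
    }

    // Copies the set by iterate-and-add (what HashSet(Collection) does internally),
    // removing one endpoint mid-copy to mimic an "is now dead" update arriving.
    // Returns true if the copy completed, false if the iterator threw CME.
    static boolean copyWhileMutating(Set<String> live) {
        Set<String> snapshot = new HashSet<>();
        try {
            for (String ep : live) {
                snapshot.add(ep);
                live.remove("10.209.23.82");
            }
            return true;
        } catch (ConcurrentModificationException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("HashSet copy completed: " + copyWhileMutating(newUnsafeSet()));
        System.out.println("concurrent set copy completed: " + copyWhileMutating(newSafeSet()));
    }
}
```

In the reported failure, the mutation comes from a different thread (`Timer-1`) than the copy (`pool-1-thread-4751`), so the exception is intermittent rather than guaranteed. Switching to a concurrent set trades a consistent snapshot for one that may or may not reflect a concurrent update. Guarding every read and write with a single lock is the alternative when an exact snapshot matters.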

> Spurious Gossip Up/Down and IO Errors
> -------------------------------------
>
>                 Key: CASSANDRA-800
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-800
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>    Affects Versions: 0.5, 0.6, 0.7
>            Reporter: Ryan King
>             Fix For: 0.5
>
>
> We're seeing a lot of nodes flapping. It may be a race condition in Gossip.
> On 10.209.23.110:
> WARN [MESSAGING-SERVICE-POOL:2] 2010-02-13 01:18:22,976 TcpConnection.java (line 484) Problem reading from socket connected to : java.nio.channels.SocketChannel[connected local=/10.209.23.110:7000 remote=/10.209.23.80:52720]
> WARN [MESSAGING-SERVICE-POOL:1] 2010-02-13 01:18:22,976 TcpConnection.java (line 484) Problem reading from socket connected to : java.nio.channels.SocketChannel[connected local=/10.209.23.110:7000 remote=/10.209.23.80:36128]
>  WARN [MESSAGING-SERVICE-POOL:2] 2010-02-13 01:18:22,977 TcpConnection.java (line 485) Exception was generated at : 02/13/2010 01:18:22 on thread MESSAGING-SERVICE-POOL:2
> Reached an EOL or something bizzare occured. Reading from: /10.209.23.80 BufferSizeRemaining: 16
> java.io.IOException: Reached an EOL or something bizzare occured. Reading from: /10.209.23.80 BufferSizeRemaining: 16
>     at org.apache.cassandra.net.io.StartState.doRead(StartState.java:44)
>     at org.apache.cassandra.net.io.ProtocolState.read(ProtocolState.java:39)
>     at org.apache.cassandra.net.io.TcpReader.read(TcpReader.java:95)
>     at org.apache.cassandra.net.TcpConnection$ReadWorkItem.run(TcpConnection.java:445)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>     at java.lang.Thread.run(Thread.java:619)
> On 10.209.23.80, at about the same time:
> ERROR [pool-1-thread-4751] 2010-02-13 01:17:12,261 Cassandra.java (line 1096) Internal error processing batch_insert
> java.util.ConcurrentModificationException
>     at java.util.HashMap$HashIterator.nextEntry(HashMap.java:848)
>     at java.util.HashMap$KeyIterator.next(HashMap.java:883)
>     at java.util.AbstractCollection.addAll(AbstractCollection.java:305)
>     at java.util.HashSet.<init>(HashSet.java:100)
>     at org.apache.cassandra.gms.Gossiper.getLiveMembers(Gossiper.java:173)
>     at org.apache.cassandra.locator.AbstractReplicationStrategy.getHintedMapForEndpoints(AbstractReplicationStrategy.java:120)
>     at org.apache.cassandra.locator.AbstractReplicationStrategy.getHintedEndpoints(AbstractReplicationStrategy.java:78)
>     at org.apache.cassandra.service.StorageService.getHintedEndpointMap(StorageService.java:1186)
>     at org.apache.cassandra.service.StorageProxy.insertBlocking(StorageProxy.java:169)
>     at org.apache.cassandra.service.CassandraServer.doInsert(CassandraServer.java:466)
>     at org.apache.cassandra.service.CassandraServer.batch_insert(CassandraServer.java:445)
>     at org.apache.cassandra.service.Cassandra$Processor$batch_insert.process(Cassandra.java:1088)
>     at org.apache.cassandra.service.Cassandra$Processor.process(Cassandra.java:817)
>     at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:253)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>     at java.lang.Thread.run(Thread.java:619)
> Just before that:
> INFO [Timer-1] 2010-02-13 01:17:12,070 Gossiper.java (line 194) InetAddress /10.209.21.223 is now dead.
> INFO [Timer-1] 2010-02-13 01:17:12,257 Gossiper.java (line 194) InetAddress /10.209.21.217 is now dead.
> INFO [Timer-1] 2010-02-13 01:17:12,257 Gossiper.java (line 194) InetAddress /10.209.21.216 is now dead.
> INFO [Timer-1] 2010-02-13 01:17:12,258 Gossiper.java (line 194) InetAddress /10.209.21.215 is now dead.
> INFO [Timer-1] 2010-02-13 01:17:12,258 Gossiper.java (line 194) InetAddress /10.209.23.82 is now dead.
> And just after that:
> INFO [Timer-1] 2010-02-13 01:17:12,261 Gossiper.java (line 194) InetAddress /10.209.23.81 is now dead.
> INFO [Timer-1] 2010-02-13 01:17:12,293 Gossiper.java (line 194) InetAddress /10.209.23.79 is now dead.
> INFO [Timer-1] 2010-02-13 01:17:12,304 Gossiper.java (line 194) InetAddress /10.209.21.204 is now dead.
> INFO [Timer-1] 2010-02-13 01:17:12,307 Gossiper.java (line 194) InetAddress /10.209.21.197 is now dead.
> INFO [Timer-1] 2010-02-13 01:17:12,308 Gossiper.java (line 194) InetAddress /10.209.21.245 is now dead.
> INFO [Timer-1] 2010-02-13 01:17:12,309 Gossiper.java (line 194) InetAddress /10.209.21.242 is now dead.
> INFO [Timer-1] 2010-02-13 01:17:12,310 Gossiper.java (line 194) InetAddress /10.209.23.106 is now dead.
> INFO [GMFD:1] 2010-02-13 01:17:26,780 Log4jLogger.java (line 41) 02/13/2010 01:17:26 - Remaining bytes zero. Stopping deserialization in EndPointState.
> INFO [GMFD:1] 2010-02-13 01:17:26,784 Gossiper.java (line 543) InetAddress /10.209.21.204 is now UP
> INFO [GMFD:1] 2010-02-13 01:17:26,785 Gossiper.java (line 543) InetAddress /10.209.23.106 is now UP
> INFO [GMFD:1] 2010-02-13 01:17:26,786 Gossiper.java (line 543) InetAddress /10.209.21.197 is now UP
> INFO [GMFD:1] 2010-02-13 01:17:26,800 Gossiper.java (line 543) InetAddress /10.209.21.216 is now UP
> INFO [GMFD:1] 2010-02-13 01:17:41,808 Gossiper.java (line 543) InetAddress /10.209.21.217 is now UP
> INFO [GMFD:1] 2010-02-13 01:17:41,823 Gossiper.java (line 543) InetAddress /10.209.21.223 is now UP
> INFO [GMFD:1] 2010-02-13 01:17:41,823 Gossiper.java (line 543) InetAddress /10.209.21.215 is now UP
> We're on 298a0e66ba66c5d2a1e5d4a70f2f619ae3fbf72a from git.apache.org, which claims to be:
> git-svn-id: https://svn.apache.org/repos/asf/incubator/cassandra/branches/cassandra-0.5@9035
