You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@cassandra.apache.org by "Ryan King (JIRA)" <ji...@apache.org> on 2010/02/16 20:49:27 UTC

[jira] Created: (CASSANDRA-800) Spurious Gossip Up/Down and IO Errors

Spurious Gossip Up/Down and IO Errors
-------------------------------------

                 Key: CASSANDRA-800
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-800
             Project: Cassandra
          Issue Type: Bug
          Components: Core
    Affects Versions: 0.5
            Reporter: Ryan King
             Fix For: 0.5, 0.6, 0.7


We're seeing a lot of nodes flapping. It appears to possibly be a race condition in Gossip.

on 10.209.23.110

WARN [MESSAGING-SERVICE-POOL:2] 2010-02-13 01:18:22,976 TcpConnection.java (line 484) Problem reading from socket connected to : java.nio.channels.SocketChannel[connected local=/10.209.23.110:7000 remote=/10.209.23.80:52720]
WARN [MESSAGING-SERVICE-POOL:1] 2010-02-13 01:18:22,976 TcpConnection.java (line 484) Problem reading from socket connected to : java.nio.channels.SocketChannel[connected local=/10.209.23.110:7000 remote=/10.209.23.80:36128]
 WARN [MESSAGING-SERVICE-POOL:2] 2010-02-13 01:18:22,977 TcpConnection.java (line 485) Exception was generated at : 02/13/2010 01:18:22 on thread MESSAGING-SERVICE-POOL:2
Reached an EOL or something bizzare occured. Reading from: /10.209.23.80 BufferSizeRemaining: 16
java.io.IOException: Reached an EOL or something bizzare occured. Reading from: /10.209.23.80 BufferSizeRemaining: 16
    at org.apache.cassandra.net.io.StartState.doRead(StartState.java:44)
    at org.apache.cassandra.net.io.ProtocolState.read(ProtocolState.java:39)
    at org.apache.cassandra.net.io.TcpReader.read(TcpReader.java:95)
    at org.apache.cassandra.net.TcpConnection$ReadWorkItem.run(TcpConnection.java:445)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java.lang.Thread.run(Thread.java:619)


on 10.209.23.80 about the same time


ERROR [pool-1-thread-4751] 2010-02-13 01:17:12,261 Cassandra.java (line 1096) Internal error processing batch_insert
java.util.ConcurrentModificationException
    at java.util.HashMap$HashIterator.nextEntry(HashMap.java:848)
    at java.util.HashMap$KeyIterator.next(HashMap.java:883)
    at java.util.AbstractCollection.addAll(AbstractCollection.java:305)
    at java.util.HashSet.<init>(HashSet.java:100)
    at org.apache.cassandra.gms.Gossiper.getLiveMembers(Gossiper.java:173)
    at org.apache.cassandra.locator.AbstractReplicationStrategy.getHintedMapForEndpoints(AbstractReplicationStrategy.java:120)
    at org.apache.cassandra.locator.AbstractReplicationStrategy.getHintedEndpoints(AbstractReplicationStrategy.java:78)
    at org.apache.cassandra.service.StorageService.getHintedEndpointMap(StorageService.java:1186)
    at org.apache.cassandra.service.StorageProxy.insertBlocking(StorageProxy.java:169)
    at org.apache.cassandra.service.CassandraServer.doInsert(CassandraServer.java:466)
    at org.apache.cassandra.service.CassandraServer.batch_insert(CassandraServer.java:445)
    at org.apache.cassandra.service.Cassandra$Processor$batch_insert.process(Cassandra.java:1088)
    at org.apache.cassandra.service.Cassandra$Processor.process(Cassandra.java:817)
    at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:253)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java.lang.Thread.run(Thread.java:619)


just before that:

INFO [Timer-1] 2010-02-13 01:17:12,070 Gossiper.java (line 194) InetAddress /10.209.21.223 is now dead.
INFO [Timer-1] 2010-02-13 01:17:12,257 Gossiper.java (line 194) InetAddress /10.209.21.217 is now dead.
INFO [Timer-1] 2010-02-13 01:17:12,257 Gossiper.java (line 194) InetAddress /10.209.21.216 is now dead.
INFO [Timer-1] 2010-02-13 01:17:12,258 Gossiper.java (line 194) InetAddress /10.209.21.215 is now dead.
INFO [Timer-1] 2010-02-13 01:17:12,258 Gossiper.java (line 194) InetAddress /10.209.23.82 is now dead.


and just after that:

INFO [Timer-1] 2010-02-13 01:17:12,261 Gossiper.java (line 194) InetAddress /10.209.23.81 is now dead.
INFO [Timer-1] 2010-02-13 01:17:12,293 Gossiper.java (line 194) InetAddress /10.209.23.79 is now dead.
INFO [Timer-1] 2010-02-13 01:17:12,304 Gossiper.java (line 194) InetAddress /10.209.21.204 is now dead.
INFO [Timer-1] 2010-02-13 01:17:12,307 Gossiper.java (line 194) InetAddress /10.209.21.197 is now dead.
INFO [Timer-1] 2010-02-13 01:17:12,308 Gossiper.java (line 194) InetAddress /10.209.21.245 is now dead.
INFO [Timer-1] 2010-02-13 01:17:12,309 Gossiper.java (line 194) InetAddress /10.209.21.242 is now dead.
INFO [Timer-1] 2010-02-13 01:17:12,310 Gossiper.java (line 194) InetAddress /10.209.23.106 is now dead.
INFO [GMFD:1] 2010-02-13 01:17:26,780 Log4jLogger.java (line 41) 02/13/2010 01:17:26 - Remaining bytes zero. Stopping deserialization in EndPointState.
INFO [GMFD:1] 2010-02-13 01:17:26,784 Gossiper.java (line 543) InetAddress /10.209.21.204 is now UP
INFO [GMFD:1] 2010-02-13 01:17:26,785 Gossiper.java (line 543) InetAddress /10.209.23.106 is now UP
INFO [GMFD:1] 2010-02-13 01:17:26,786 Gossiper.java (line 543) InetAddress /10.209.21.197 is now UP
INFO [GMFD:1] 2010-02-13 01:17:26,800 Gossiper.java (line 543) InetAddress /10.209.21.216 is now UP
INFO [GMFD:1] 2010-02-13 01:17:41,808 Gossiper.java (line 543) InetAddress /10.209.21.217 is now UP
INFO [GMFD:1] 2010-02-13 01:17:41,823 Gossiper.java (line 543) InetAddress /10.209.21.223 is now UP
INFO [GMFD:1] 2010-02-13 01:17:41,823 Gossiper.java (line 543) InetAddress /10.209.21.215 is now UP


We're on 298a0e66ba66c5d2a1e5d4a70f2f619ae3fbf72a from git.apache.org, which claims to be:

git-svn-id: https://svn.apache.org/repos/asf/incubator/cassandra/branches/cassandra-0.5@9035

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (CASSANDRA-800) Spurious Gossip Up/Down and IO Errors

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-800?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12834989#action_12834989 ] 

Jonathan Ellis commented on CASSANDRA-800:
------------------------------------------

Gary: CME is a correctness issue though, priority shouldn't affect that.

Ryan: OOME is indeed configured to kill the server by default in CassandraDaemon:

            public void uncaughtException(Thread t, Throwable e)
            {
                logger.error("Fatal exception in thread " + t, e);
                if (e instanceof OutOfMemoryError)
                {
                    System.exit(100);
                }

if we have a catch-all statement somewhere that is neutering that, the stacktrace should make that obvious.

> Spurious Gossip Up/Down and IO Errors
> -------------------------------------
>
>                 Key: CASSANDRA-800
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-800
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>    Affects Versions: 0.5, 0.6, 0.7
>            Reporter: Ryan King
>            Assignee: Jaakko Laine
>             Fix For: 0.5
>
>
> We're seeing a lot of nodes flapping. It appears to possibly be a race condition in Gossip.
> on 10.209.23.110
> WARN [MESSAGING-SERVICE-POOL:2] 2010-02-13 01:18:22,976 TcpConnection.java (line 484) Problem reading from socket connected to : java.nio.channels.SocketChannel[connected local=/10.209.23.110:7000 remote=/10.209.23.80:52720]
> WARN [MESSAGING-SERVICE-POOL:1] 2010-02-13 01:18:22,976 TcpConnection.java (line 484) Problem reading from socket connected to : java.nio.channels.SocketChannel[connected local=/10.209.23.110:7000 remote=/10.209.23.80:36128]
>  WARN [MESSAGING-SERVICE-POOL:2] 2010-02-13 01:18:22,977 TcpConnection.java (line 485) Exception was generated at : 02/13/2010 01:18:22 on thread MESSAGING-SERVICE-POOL:2
> Reached an EOL or something bizzare occured. Reading from: /10.209.23.80 BufferSizeRemaining: 16
> java.io.IOException: Reached an EOL or something bizzare occured. Reading from: /10.209.23.80 BufferSizeRemaining: 16
>     at org.apache.cassandra.net.io.StartState.doRead(StartState.java:44)
>     at org.apache.cassandra.net.io.ProtocolState.read(ProtocolState.java:39)
>     at org.apache.cassandra.net.io.TcpReader.read(TcpReader.java:95)
>     at org.apache.cassandra.net.TcpConnection$ReadWorkItem.run(TcpConnection.java:445)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>     at java.lang.Thread.run(Thread.java:619)
> on 10.209.23.80 about the same time
> ERROR [pool-1-thread-4751] 2010-02-13 01:17:12,261 Cassandra.java (line 1096) Internal error processing batch_insert
> java.util.ConcurrentModificationException
>     at java.util.HashMap$HashIterator.nextEntry(HashMap.java:848)
>     at java.util.HashMap$KeyIterator.next(HashMap.java:883)
>     at java.util.AbstractCollection.addAll(AbstractCollection.java:305)
>     at java.util.HashSet.<init>(HashSet.java:100)
>     at org.apache.cassandra.gms.Gossiper.getLiveMembers(Gossiper.java:173)
>     at org.apache.cassandra.locator.AbstractReplicationStrategy.getHintedMapForEndpoints(AbstractReplicationStrategy.java:120)
>     at org.apache.cassandra.locator.AbstractReplicationStrategy.getHintedEndpoints(AbstractReplicationStrategy.java:78)
>     at org.apache.cassandra.service.StorageService.getHintedEndpointMap(StorageService.java:1186)
>     at org.apache.cassandra.service.StorageProxy.insertBlocking(StorageProxy.java:169)
>     at org.apache.cassandra.service.CassandraServer.doInsert(CassandraServer.java:466)
>     at org.apache.cassandra.service.CassandraServer.batch_insert(CassandraServer.java:445)
>     at org.apache.cassandra.service.Cassandra$Processor$batch_insert.process(Cassandra.java:1088)
>     at org.apache.cassandra.service.Cassandra$Processor.process(Cassandra.java:817)
>     at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:253)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>     at java.lang.Thread.run(Thread.java:619)
> just before that:
> INFO [Timer-1] 2010-02-13 01:17:12,070 Gossiper.java (line 194) InetAddress /10.209.21.223 is now dead.
> INFO [Timer-1] 2010-02-13 01:17:12,257 Gossiper.java (line 194) InetAddress /10.209.21.217 is now dead.
> INFO [Timer-1] 2010-02-13 01:17:12,257 Gossiper.java (line 194) InetAddress /10.209.21.216 is now dead.
> INFO [Timer-1] 2010-02-13 01:17:12,258 Gossiper.java (line 194) InetAddress /10.209.21.215 is now dead.
> INFO [Timer-1] 2010-02-13 01:17:12,258 Gossiper.java (line 194) InetAddress /10.209.23.82 is now dead.
> and just after that:
> INFO [Timer-1] 2010-02-13 01:17:12,261 Gossiper.java (line 194) InetAddress /10.209.23.81 is now dead.
> INFO [Timer-1] 2010-02-13 01:17:12,293 Gossiper.java (line 194) InetAddress /10.209.23.79 is now dead.
> INFO [Timer-1] 2010-02-13 01:17:12,304 Gossiper.java (line 194) InetAddress /10.209.21.204 is now dead.
> INFO [Timer-1] 2010-02-13 01:17:12,307 Gossiper.java (line 194) InetAddress /10.209.21.197 is now dead.
> INFO [Timer-1] 2010-02-13 01:17:12,308 Gossiper.java (line 194) InetAddress /10.209.21.245 is now dead.
> INFO [Timer-1] 2010-02-13 01:17:12,309 Gossiper.java (line 194) InetAddress /10.209.21.242 is now dead.
> INFO [Timer-1] 2010-02-13 01:17:12,310 Gossiper.java (line 194) InetAddress /10.209.23.106 is now dead.
> INFO [GMFD:1] 2010-02-13 01:17:26,780 Log4jLogger.java (line 41) 02/13/2010 01:17:26 - Remaining bytes zero. Stopping deserialization in EndPointState.
> INFO [GMFD:1] 2010-02-13 01:17:26,784 Gossiper.java (line 543) InetAddress /10.209.21.204 is now UP
> INFO [GMFD:1] 2010-02-13 01:17:26,785 Gossiper.java (line 543) InetAddress /10.209.23.106 is now UP
> INFO [GMFD:1] 2010-02-13 01:17:26,786 Gossiper.java (line 543) InetAddress /10.209.21.197 is now UP
> INFO [GMFD:1] 2010-02-13 01:17:26,800 Gossiper.java (line 543) InetAddress /10.209.21.216 is now UP
> INFO [GMFD:1] 2010-02-13 01:17:41,808 Gossiper.java (line 543) InetAddress /10.209.21.217 is now UP
> INFO [GMFD:1] 2010-02-13 01:17:41,823 Gossiper.java (line 543) InetAddress /10.209.21.223 is now UP
> INFO [GMFD:1] 2010-02-13 01:17:41,823 Gossiper.java (line 543) InetAddress /10.209.21.215 is now UP
> We're on 298a0e66ba66c5d2a1e5d4a70f2f619ae3fbf72a from git.apache.org, which claims to be:
> git-svn-id: https://svn.apache.org/repos/asf/incubator/cassandra/branches/cassandra-0.5@9035

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (CASSANDRA-800) Spurious Gossip Up/Down and IO Errors

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-800?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Ellis updated CASSANDRA-800:
-------------------------------------

    Attachment: 800.txt

Fix for OOME not killing the server attached

> Spurious Gossip Up/Down and IO Errors
> -------------------------------------
>
>                 Key: CASSANDRA-800
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-800
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>    Affects Versions: 0.5, 0.6, 0.7
>            Reporter: Ryan King
>            Assignee: Jaakko Laine
>             Fix For: 0.5
>
>         Attachments: 800.txt
>
>
> We're seeing a lot of nodes flapping. It appears to possibly be a race condition in Gossip.
> on 10.209.23.110
> WARN [MESSAGING-SERVICE-POOL:2] 2010-02-13 01:18:22,976 TcpConnection.java (line 484) Problem reading from socket connected to : java.nio.channels.SocketChannel[connected local=/10.209.23.110:7000 remote=/10.209.23.80:52720]
> WARN [MESSAGING-SERVICE-POOL:1] 2010-02-13 01:18:22,976 TcpConnection.java (line 484) Problem reading from socket connected to : java.nio.channels.SocketChannel[connected local=/10.209.23.110:7000 remote=/10.209.23.80:36128]
>  WARN [MESSAGING-SERVICE-POOL:2] 2010-02-13 01:18:22,977 TcpConnection.java (line 485) Exception was generated at : 02/13/2010 01:18:22 on thread MESSAGING-SERVICE-POOL:2
> Reached an EOL or something bizzare occured. Reading from: /10.209.23.80 BufferSizeRemaining: 16
> java.io.IOException: Reached an EOL or something bizzare occured. Reading from: /10.209.23.80 BufferSizeRemaining: 16
>     at org.apache.cassandra.net.io.StartState.doRead(StartState.java:44)
>     at org.apache.cassandra.net.io.ProtocolState.read(ProtocolState.java:39)
>     at org.apache.cassandra.net.io.TcpReader.read(TcpReader.java:95)
>     at org.apache.cassandra.net.TcpConnection$ReadWorkItem.run(TcpConnection.java:445)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>     at java.lang.Thread.run(Thread.java:619)
> on 10.209.23.80 about the same time
> ERROR [pool-1-thread-4751] 2010-02-13 01:17:12,261 Cassandra.java (line 1096) Internal error processing batch_insert
> java.util.ConcurrentModificationException
>     at java.util.HashMap$HashIterator.nextEntry(HashMap.java:848)
>     at java.util.HashMap$KeyIterator.next(HashMap.java:883)
>     at java.util.AbstractCollection.addAll(AbstractCollection.java:305)
>     at java.util.HashSet.<init>(HashSet.java:100)
>     at org.apache.cassandra.gms.Gossiper.getLiveMembers(Gossiper.java:173)
>     at org.apache.cassandra.locator.AbstractReplicationStrategy.getHintedMapForEndpoints(AbstractReplicationStrategy.java:120)
>     at org.apache.cassandra.locator.AbstractReplicationStrategy.getHintedEndpoints(AbstractReplicationStrategy.java:78)
>     at org.apache.cassandra.service.StorageService.getHintedEndpointMap(StorageService.java:1186)
>     at org.apache.cassandra.service.StorageProxy.insertBlocking(StorageProxy.java:169)
>     at org.apache.cassandra.service.CassandraServer.doInsert(CassandraServer.java:466)
>     at org.apache.cassandra.service.CassandraServer.batch_insert(CassandraServer.java:445)
>     at org.apache.cassandra.service.Cassandra$Processor$batch_insert.process(Cassandra.java:1088)
>     at org.apache.cassandra.service.Cassandra$Processor.process(Cassandra.java:817)
>     at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:253)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>     at java.lang.Thread.run(Thread.java:619)
> just before that:
> INFO [Timer-1] 2010-02-13 01:17:12,070 Gossiper.java (line 194) InetAddress /10.209.21.223 is now dead.
> INFO [Timer-1] 2010-02-13 01:17:12,257 Gossiper.java (line 194) InetAddress /10.209.21.217 is now dead.
> INFO [Timer-1] 2010-02-13 01:17:12,257 Gossiper.java (line 194) InetAddress /10.209.21.216 is now dead.
> INFO [Timer-1] 2010-02-13 01:17:12,258 Gossiper.java (line 194) InetAddress /10.209.21.215 is now dead.
> INFO [Timer-1] 2010-02-13 01:17:12,258 Gossiper.java (line 194) InetAddress /10.209.23.82 is now dead.
> and just after that:
> INFO [Timer-1] 2010-02-13 01:17:12,261 Gossiper.java (line 194) InetAddress /10.209.23.81 is now dead.
> INFO [Timer-1] 2010-02-13 01:17:12,293 Gossiper.java (line 194) InetAddress /10.209.23.79 is now dead.
> INFO [Timer-1] 2010-02-13 01:17:12,304 Gossiper.java (line 194) InetAddress /10.209.21.204 is now dead.
> INFO [Timer-1] 2010-02-13 01:17:12,307 Gossiper.java (line 194) InetAddress /10.209.21.197 is now dead.
> INFO [Timer-1] 2010-02-13 01:17:12,308 Gossiper.java (line 194) InetAddress /10.209.21.245 is now dead.
> INFO [Timer-1] 2010-02-13 01:17:12,309 Gossiper.java (line 194) InetAddress /10.209.21.242 is now dead.
> INFO [Timer-1] 2010-02-13 01:17:12,310 Gossiper.java (line 194) InetAddress /10.209.23.106 is now dead.
> INFO [GMFD:1] 2010-02-13 01:17:26,780 Log4jLogger.java (line 41) 02/13/2010 01:17:26 - Remaining bytes zero. Stopping deserialization in EndPointState.
> INFO [GMFD:1] 2010-02-13 01:17:26,784 Gossiper.java (line 543) InetAddress /10.209.21.204 is now UP
> INFO [GMFD:1] 2010-02-13 01:17:26,785 Gossiper.java (line 543) InetAddress /10.209.23.106 is now UP
> INFO [GMFD:1] 2010-02-13 01:17:26,786 Gossiper.java (line 543) InetAddress /10.209.21.197 is now UP
> INFO [GMFD:1] 2010-02-13 01:17:26,800 Gossiper.java (line 543) InetAddress /10.209.21.216 is now UP
> INFO [GMFD:1] 2010-02-13 01:17:41,808 Gossiper.java (line 543) InetAddress /10.209.21.217 is now UP
> INFO [GMFD:1] 2010-02-13 01:17:41,823 Gossiper.java (line 543) InetAddress /10.209.21.223 is now UP
> INFO [GMFD:1] 2010-02-13 01:17:41,823 Gossiper.java (line 543) InetAddress /10.209.21.215 is now UP
> We're on 298a0e66ba66c5d2a1e5d4a70f2f619ae3fbf72a from git.apache.org, which claims to be:
> git-svn-id: https://svn.apache.org/repos/asf/incubator/cassandra/branches/cassandra-0.5@9035

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (CASSANDRA-800) Spurious Gossip Up/Down and IO Errors

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-800?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12834418#action_12834418 ] 

Jonathan Ellis commented on CASSANDRA-800:
------------------------------------------

the IOException is the same as #657 and is harmless (fixed in trunk, not going to be fixed in 0.5).

the ConcurrentModificationException  may be causing the deadness problem.  (it also might be related to CASSANDRA-757 but the stacktrace is different.)

> Spurious Gossip Up/Down and IO Errors
> -------------------------------------
>
>                 Key: CASSANDRA-800
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-800
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>    Affects Versions: 0.5, 0.6, 0.7
>            Reporter: Ryan King
>             Fix For: 0.5
>
>
> We're seeing a lot of nodes flapping. It appears to possibly be a race condition in Gossip.
> on 10.209.23.110
> WARN [MESSAGING-SERVICE-POOL:2] 2010-02-13 01:18:22,976 TcpConnection.java (line 484) Problem reading from socket connected to : java.nio.channels.SocketChannel[connected local=/10.209.23.110:7000 remote=/10.209.23.80:52720]
> WARN [MESSAGING-SERVICE-POOL:1] 2010-02-13 01:18:22,976 TcpConnection.java (line 484) Problem reading from socket connected to : java.nio.channels.SocketChannel[connected local=/10.209.23.110:7000 remote=/10.209.23.80:36128]
>  WARN [MESSAGING-SERVICE-POOL:2] 2010-02-13 01:18:22,977 TcpConnection.java (line 485) Exception was generated at : 02/13/2010 01:18:22 on thread MESSAGING-SERVICE-POOL:2
> Reached an EOL or something bizzare occured. Reading from: /10.209.23.80 BufferSizeRemaining: 16
> java.io.IOException: Reached an EOL or something bizzare occured. Reading from: /10.209.23.80 BufferSizeRemaining: 16
>     at org.apache.cassandra.net.io.StartState.doRead(StartState.java:44)
>     at org.apache.cassandra.net.io.ProtocolState.read(ProtocolState.java:39)
>     at org.apache.cassandra.net.io.TcpReader.read(TcpReader.java:95)
>     at org.apache.cassandra.net.TcpConnection$ReadWorkItem.run(TcpConnection.java:445)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>     at java.lang.Thread.run(Thread.java:619)
> on 10.209.23.80 about the same time
> ERROR [pool-1-thread-4751] 2010-02-13 01:17:12,261 Cassandra.java (line 1096) Internal error processing batch_insert
> java.util.ConcurrentModificationException
>     at java.util.HashMap$HashIterator.nextEntry(HashMap.java:848)
>     at java.util.HashMap$KeyIterator.next(HashMap.java:883)
>     at java.util.AbstractCollection.addAll(AbstractCollection.java:305)
>     at java.util.HashSet.<init>(HashSet.java:100)
>     at org.apache.cassandra.gms.Gossiper.getLiveMembers(Gossiper.java:173)
>     at org.apache.cassandra.locator.AbstractReplicationStrategy.getHintedMapForEndpoints(AbstractReplicationStrategy.java:120)
>     at org.apache.cassandra.locator.AbstractReplicationStrategy.getHintedEndpoints(AbstractReplicationStrategy.java:78)
>     at org.apache.cassandra.service.StorageService.getHintedEndpointMap(StorageService.java:1186)
>     at org.apache.cassandra.service.StorageProxy.insertBlocking(StorageProxy.java:169)
>     at org.apache.cassandra.service.CassandraServer.doInsert(CassandraServer.java:466)
>     at org.apache.cassandra.service.CassandraServer.batch_insert(CassandraServer.java:445)
>     at org.apache.cassandra.service.Cassandra$Processor$batch_insert.process(Cassandra.java:1088)
>     at org.apache.cassandra.service.Cassandra$Processor.process(Cassandra.java:817)
>     at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:253)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>     at java.lang.Thread.run(Thread.java:619)
> just before that:
> INFO [Timer-1] 2010-02-13 01:17:12,070 Gossiper.java (line 194) InetAddress /10.209.21.223 is now dead.
> INFO [Timer-1] 2010-02-13 01:17:12,257 Gossiper.java (line 194) InetAddress /10.209.21.217 is now dead.
> INFO [Timer-1] 2010-02-13 01:17:12,257 Gossiper.java (line 194) InetAddress /10.209.21.216 is now dead.
> INFO [Timer-1] 2010-02-13 01:17:12,258 Gossiper.java (line 194) InetAddress /10.209.21.215 is now dead.
> INFO [Timer-1] 2010-02-13 01:17:12,258 Gossiper.java (line 194) InetAddress /10.209.23.82 is now dead.
> and just after that:
> INFO [Timer-1] 2010-02-13 01:17:12,261 Gossiper.java (line 194) InetAddress /10.209.23.81 is now dead.
> INFO [Timer-1] 2010-02-13 01:17:12,293 Gossiper.java (line 194) InetAddress /10.209.23.79 is now dead.
> INFO [Timer-1] 2010-02-13 01:17:12,304 Gossiper.java (line 194) InetAddress /10.209.21.204 is now dead.
> INFO [Timer-1] 2010-02-13 01:17:12,307 Gossiper.java (line 194) InetAddress /10.209.21.197 is now dead.
> INFO [Timer-1] 2010-02-13 01:17:12,308 Gossiper.java (line 194) InetAddress /10.209.21.245 is now dead.
> INFO [Timer-1] 2010-02-13 01:17:12,309 Gossiper.java (line 194) InetAddress /10.209.21.242 is now dead.
> INFO [Timer-1] 2010-02-13 01:17:12,310 Gossiper.java (line 194) InetAddress /10.209.23.106 is now dead.
> INFO [GMFD:1] 2010-02-13 01:17:26,780 Log4jLogger.java (line 41) 02/13/2010 01:17:26 - Remaining bytes zero. Stopping deserialization in EndPointState.
> INFO [GMFD:1] 2010-02-13 01:17:26,784 Gossiper.java (line 543) InetAddress /10.209.21.204 is now UP
> INFO [GMFD:1] 2010-02-13 01:17:26,785 Gossiper.java (line 543) InetAddress /10.209.23.106 is now UP
> INFO [GMFD:1] 2010-02-13 01:17:26,786 Gossiper.java (line 543) InetAddress /10.209.21.197 is now UP
> INFO [GMFD:1] 2010-02-13 01:17:26,800 Gossiper.java (line 543) InetAddress /10.209.21.216 is now UP
> INFO [GMFD:1] 2010-02-13 01:17:41,808 Gossiper.java (line 543) InetAddress /10.209.21.217 is now UP
> INFO [GMFD:1] 2010-02-13 01:17:41,823 Gossiper.java (line 543) InetAddress /10.209.21.223 is now UP
> INFO [GMFD:1] 2010-02-13 01:17:41,823 Gossiper.java (line 543) InetAddress /10.209.21.215 is now UP
> We're on 298a0e66ba66c5d2a1e5d4a70f2f619ae3fbf72a from git.apache.org, which claims to be:
> git-svn-id: https://svn.apache.org/repos/asf/incubator/cassandra/branches/cassandra-0.5@9035

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Assigned: (CASSANDRA-800) Spurious Gossip Up/Down and IO Errors

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-800?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Ellis reassigned CASSANDRA-800:
----------------------------------------

    Assignee: Jaakko Laine

> Spurious Gossip Up/Down and IO Errors
> -------------------------------------
>
>                 Key: CASSANDRA-800
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-800
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>    Affects Versions: 0.5, 0.6, 0.7
>            Reporter: Ryan King
>            Assignee: Jaakko Laine
>             Fix For: 0.5
>
>
> We're seeing a lot of nodes flapping. It appears to possibly be a race condition in Gossip.
> on 10.209.23.110
> WARN [MESSAGING-SERVICE-POOL:2] 2010-02-13 01:18:22,976 TcpConnection.java (line 484) Problem reading from socket connected to : java.nio.channels.SocketChannel[connected local=/10.209.23.110:7000 remote=/10.209.23.80:52720]
> WARN [MESSAGING-SERVICE-POOL:1] 2010-02-13 01:18:22,976 TcpConnection.java (line 484) Problem reading from socket connected to : java.nio.channels.SocketChannel[connected local=/10.209.23.110:7000 remote=/10.209.23.80:36128]
>  WARN [MESSAGING-SERVICE-POOL:2] 2010-02-13 01:18:22,977 TcpConnection.java (line 485) Exception was generated at : 02/13/2010 01:18:22 on thread MESSAGING-SERVICE-POOL:2
> Reached an EOL or something bizzare occured. Reading from: /10.209.23.80 BufferSizeRemaining: 16
> java.io.IOException: Reached an EOL or something bizzare occured. Reading from: /10.209.23.80 BufferSizeRemaining: 16
>     at org.apache.cassandra.net.io.StartState.doRead(StartState.java:44)
>     at org.apache.cassandra.net.io.ProtocolState.read(ProtocolState.java:39)
>     at org.apache.cassandra.net.io.TcpReader.read(TcpReader.java:95)
>     at org.apache.cassandra.net.TcpConnection$ReadWorkItem.run(TcpConnection.java:445)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>     at java.lang.Thread.run(Thread.java:619)
> on 10.209.23.80 about the same time
> ERROR [pool-1-thread-4751] 2010-02-13 01:17:12,261 Cassandra.java (line 1096) Internal error processing batch_insert
> java.util.ConcurrentModificationException
>     at java.util.HashMap$HashIterator.nextEntry(HashMap.java:848)
>     at java.util.HashMap$KeyIterator.next(HashMap.java:883)
>     at java.util.AbstractCollection.addAll(AbstractCollection.java:305)
>     at java.util.HashSet.<init>(HashSet.java:100)
>     at org.apache.cassandra.gms.Gossiper.getLiveMembers(Gossiper.java:173)
>     at org.apache.cassandra.locator.AbstractReplicationStrategy.getHintedMapForEndpoints(AbstractReplicationStrategy.java:120)
>     at org.apache.cassandra.locator.AbstractReplicationStrategy.getHintedEndpoints(AbstractReplicationStrategy.java:78)
>     at org.apache.cassandra.service.StorageService.getHintedEndpointMap(StorageService.java:1186)
>     at org.apache.cassandra.service.StorageProxy.insertBlocking(StorageProxy.java:169)
>     at org.apache.cassandra.service.CassandraServer.doInsert(CassandraServer.java:466)
>     at org.apache.cassandra.service.CassandraServer.batch_insert(CassandraServer.java:445)
>     at org.apache.cassandra.service.Cassandra$Processor$batch_insert.process(Cassandra.java:1088)
>     at org.apache.cassandra.service.Cassandra$Processor.process(Cassandra.java:817)
>     at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:253)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>     at java.lang.Thread.run(Thread.java:619)
> just before that:
> INFO [Timer-1] 2010-02-13 01:17:12,070 Gossiper.java (line 194) InetAddress /10.209.21.223 is now dead.
> INFO [Timer-1] 2010-02-13 01:17:12,257 Gossiper.java (line 194) InetAddress /10.209.21.217 is now dead.
> INFO [Timer-1] 2010-02-13 01:17:12,257 Gossiper.java (line 194) InetAddress /10.209.21.216 is now dead.
> INFO [Timer-1] 2010-02-13 01:17:12,258 Gossiper.java (line 194) InetAddress /10.209.21.215 is now dead.
> INFO [Timer-1] 2010-02-13 01:17:12,258 Gossiper.java (line 194) InetAddress /10.209.23.82 is now dead.
> and just after that:
> INFO [Timer-1] 2010-02-13 01:17:12,261 Gossiper.java (line 194) InetAddress /10.209.23.81 is now dead.
> INFO [Timer-1] 2010-02-13 01:17:12,293 Gossiper.java (line 194) InetAddress /10.209.23.79 is now dead.
> INFO [Timer-1] 2010-02-13 01:17:12,304 Gossiper.java (line 194) InetAddress /10.209.21.204 is now dead.
> INFO [Timer-1] 2010-02-13 01:17:12,307 Gossiper.java (line 194) InetAddress /10.209.21.197 is now dead.
> INFO [Timer-1] 2010-02-13 01:17:12,308 Gossiper.java (line 194) InetAddress /10.209.21.245 is now dead.
> INFO [Timer-1] 2010-02-13 01:17:12,309 Gossiper.java (line 194) InetAddress /10.209.21.242 is now dead.
> INFO [Timer-1] 2010-02-13 01:17:12,310 Gossiper.java (line 194) InetAddress /10.209.23.106 is now dead.
> INFO [GMFD:1] 2010-02-13 01:17:26,780 Log4jLogger.java (line 41) 02/13/2010 01:17:26 - Remaining bytes zero. Stopping deserialization in EndPointState.
> INFO [GMFD:1] 2010-02-13 01:17:26,784 Gossiper.java (line 543) InetAddress /10.209.21.204 is now UP
> INFO [GMFD:1] 2010-02-13 01:17:26,785 Gossiper.java (line 543) InetAddress /10.209.23.106 is now UP
> INFO [GMFD:1] 2010-02-13 01:17:26,786 Gossiper.java (line 543) InetAddress /10.209.21.197 is now UP
> INFO [GMFD:1] 2010-02-13 01:17:26,800 Gossiper.java (line 543) InetAddress /10.209.21.216 is now UP
> INFO [GMFD:1] 2010-02-13 01:17:41,808 Gossiper.java (line 543) InetAddress /10.209.21.217 is now UP
> INFO [GMFD:1] 2010-02-13 01:17:41,823 Gossiper.java (line 543) InetAddress /10.209.21.223 is now UP
> INFO [GMFD:1] 2010-02-13 01:17:41,823 Gossiper.java (line 543) InetAddress /10.209.21.215 is now UP
> We're on 298a0e66ba66c5d2a1e5d4a70f2f619ae3fbf72a from git.apache.org, which claims to be:
> git-svn-id: https://svn.apache.org/repos/asf/incubator/cassandra/branches/cassandra-0.5@9035

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Resolved: (CASSANDRA-800) Spurious Gossip Up/Down and IO Errors

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-800?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Ellis resolved CASSANDRA-800.
--------------------------------------

    Resolution: Fixed
      Assignee:     (was: Jaakko Laine)

i'm going to close this as a dupe of CASSANDRA-757 even though they are different errors, since the right fix for 757 will be using a concurrent structure, which will fix any other CMEs too.

> Spurious Gossip Up/Down and IO Errors
> -------------------------------------
>
>                 Key: CASSANDRA-800
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-800
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>    Affects Versions: 0.5, 0.6, 0.7
>            Reporter: Ryan King
>             Fix For: 0.5
>
>         Attachments: 800.txt
>
>
> We're seeing a lot of nodes flapping. It appears to possibly be a race condition in Gossip.
> on 10.209.23.110
> WARN [MESSAGING-SERVICE-POOL:2] 2010-02-13 01:18:22,976 TcpConnection.java (line 484) Problem reading from socket connected to : java.nio.channels.SocketChannel[connected local=/10.209.23.110:7000 remote=/10.209.23.80:52720]
> WARN [MESSAGING-SERVICE-POOL:1] 2010-02-13 01:18:22,976 TcpConnection.java (line 484) Problem reading from socket connected to : java.nio.channels.SocketChannel[connected local=/10.209.23.110:7000 remote=/10.209.23.80:36128]
>  WARN [MESSAGING-SERVICE-POOL:2] 2010-02-13 01:18:22,977 TcpConnection.java (line 485) Exception was generated at : 02/13/2010 01:18:22 on thread MESSAGING-SERVICE-POOL:2
> Reached an EOL or something bizzare occured. Reading from: /10.209.23.80 BufferSizeRemaining: 16
> java.io.IOException: Reached an EOL or something bizzare occured. Reading from: /10.209.23.80 BufferSizeRemaining: 16
>     at org.apache.cassandra.net.io.StartState.doRead(StartState.java:44)
>     at org.apache.cassandra.net.io.ProtocolState.read(ProtocolState.java:39)
>     at org.apache.cassandra.net.io.TcpReader.read(TcpReader.java:95)
>     at org.apache.cassandra.net.TcpConnection$ReadWorkItem.run(TcpConnection.java:445)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>     at java.lang.Thread.run(Thread.java:619)
> on 10.209.23.80 about the same time
> ERROR [pool-1-thread-4751] 2010-02-13 01:17:12,261 Cassandra.java (line 1096) Internal error processing batch_insert
> java.util.ConcurrentModificationException
>     at java.util.HashMap$HashIterator.nextEntry(HashMap.java:848)
>     at java.util.HashMap$KeyIterator.next(HashMap.java:883)
>     at java.util.AbstractCollection.addAll(AbstractCollection.java:305)
>     at java.util.HashSet.<init>(HashSet.java:100)
>     at org.apache.cassandra.gms.Gossiper.getLiveMembers(Gossiper.java:173)
>     at org.apache.cassandra.locator.AbstractReplicationStrategy.getHintedMapForEndpoints(AbstractReplicationStrategy.java:120)
>     at org.apache.cassandra.locator.AbstractReplicationStrategy.getHintedEndpoints(AbstractReplicationStrategy.java:78)
>     at org.apache.cassandra.service.StorageService.getHintedEndpointMap(StorageService.java:1186)
>     at org.apache.cassandra.service.StorageProxy.insertBlocking(StorageProxy.java:169)
>     at org.apache.cassandra.service.CassandraServer.doInsert(CassandraServer.java:466)
>     at org.apache.cassandra.service.CassandraServer.batch_insert(CassandraServer.java:445)
>     at org.apache.cassandra.service.Cassandra$Processor$batch_insert.process(Cassandra.java:1088)
>     at org.apache.cassandra.service.Cassandra$Processor.process(Cassandra.java:817)
>     at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:253)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>     at java.lang.Thread.run(Thread.java:619)
> just before that:
> INFO [Timer-1] 2010-02-13 01:17:12,070 Gossiper.java (line 194) InetAddress /10.209.21.223 is now dead.
> INFO [Timer-1] 2010-02-13 01:17:12,257 Gossiper.java (line 194) InetAddress /10.209.21.217 is now dead.
> INFO [Timer-1] 2010-02-13 01:17:12,257 Gossiper.java (line 194) InetAddress /10.209.21.216 is now dead.
> INFO [Timer-1] 2010-02-13 01:17:12,258 Gossiper.java (line 194) InetAddress /10.209.21.215 is now dead.
> INFO [Timer-1] 2010-02-13 01:17:12,258 Gossiper.java (line 194) InetAddress /10.209.23.82 is now dead.
> and just after that:
> INFO [Timer-1] 2010-02-13 01:17:12,261 Gossiper.java (line 194) InetAddress /10.209.23.81 is now dead.
> INFO [Timer-1] 2010-02-13 01:17:12,293 Gossiper.java (line 194) InetAddress /10.209.23.79 is now dead.
> INFO [Timer-1] 2010-02-13 01:17:12,304 Gossiper.java (line 194) InetAddress /10.209.21.204 is now dead.
> INFO [Timer-1] 2010-02-13 01:17:12,307 Gossiper.java (line 194) InetAddress /10.209.21.197 is now dead.
> INFO [Timer-1] 2010-02-13 01:17:12,308 Gossiper.java (line 194) InetAddress /10.209.21.245 is now dead.
> INFO [Timer-1] 2010-02-13 01:17:12,309 Gossiper.java (line 194) InetAddress /10.209.21.242 is now dead.
> INFO [Timer-1] 2010-02-13 01:17:12,310 Gossiper.java (line 194) InetAddress /10.209.23.106 is now dead.
> INFO [GMFD:1] 2010-02-13 01:17:26,780 Log4jLogger.java (line 41) 02/13/2010 01:17:26 - Remaining bytes zero. Stopping deserialization in EndPointState.
> INFO [GMFD:1] 2010-02-13 01:17:26,784 Gossiper.java (line 543) InetAddress /10.209.21.204 is now UP
> INFO [GMFD:1] 2010-02-13 01:17:26,785 Gossiper.java (line 543) InetAddress /10.209.23.106 is now UP
> INFO [GMFD:1] 2010-02-13 01:17:26,786 Gossiper.java (line 543) InetAddress /10.209.21.197 is now UP
> INFO [GMFD:1] 2010-02-13 01:17:26,800 Gossiper.java (line 543) InetAddress /10.209.21.216 is now UP
> INFO [GMFD:1] 2010-02-13 01:17:41,808 Gossiper.java (line 543) InetAddress /10.209.21.217 is now UP
> INFO [GMFD:1] 2010-02-13 01:17:41,823 Gossiper.java (line 543) InetAddress /10.209.21.223 is now UP
> INFO [GMFD:1] 2010-02-13 01:17:41,823 Gossiper.java (line 543) InetAddress /10.209.21.215 is now UP
> We're on 298a0e66ba66c5d2a1e5d4a70f2f619ae3fbf72a from git.apache.org, which claims to be:
> git-svn-id: https://svn.apache.org/repos/asf/incubator/cassandra/branches/cassandra-0.5@9035

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (CASSANDRA-800) Spurious Gossip Up/Down and IO Errors

Posted by "Ryan King (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-800?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12834628#action_12834628 ] 

Ryan King commented on CASSANDRA-800:
-------------------------------------

I'm skeptical about it too, but I've seen stranger effects from OOME's. We've made some config changes to (hopefully) reduce heap size pressure. I'll let you know if that improves the situation or now.

> Spurious Gossip Up/Down and IO Errors
> -------------------------------------
>
>                 Key: CASSANDRA-800
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-800
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>    Affects Versions: 0.5, 0.6, 0.7
>            Reporter: Ryan King
>            Assignee: Jaakko Laine
>             Fix For: 0.5
>
>
> We're seeing a lot of nodes flapping. It appears to possibly be a race condition in Gossip.
> on 10.209.23.110
> WARN [MESSAGING-SERVICE-POOL:2] 2010-02-13 01:18:22,976 TcpConnection.java (line 484) Problem reading from socket connected to : java.nio.channels.SocketChannel[connected local=/10.209.23.110:7000 remote=/10.209.23.80:52720]
> WARN [MESSAGING-SERVICE-POOL:1] 2010-02-13 01:18:22,976 TcpConnection.java (line 484) Problem reading from socket connected to : java.nio.channels.SocketChannel[connected local=/10.209.23.110:7000 remote=/10.209.23.80:36128]
>  WARN [MESSAGING-SERVICE-POOL:2] 2010-02-13 01:18:22,977 TcpConnection.java (line 485) Exception was generated at : 02/13/2010 01:18:22 on thread MESSAGING-SERVICE-POOL:2
> Reached an EOL or something bizzare occured. Reading from: /10.209.23.80 BufferSizeRemaining: 16
> java.io.IOException: Reached an EOL or something bizzare occured. Reading from: /10.209.23.80 BufferSizeRemaining: 16
>     at org.apache.cassandra.net.io.StartState.doRead(StartState.java:44)
>     at org.apache.cassandra.net.io.ProtocolState.read(ProtocolState.java:39)
>     at org.apache.cassandra.net.io.TcpReader.read(TcpReader.java:95)
>     at org.apache.cassandra.net.TcpConnection$ReadWorkItem.run(TcpConnection.java:445)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>     at java.lang.Thread.run(Thread.java:619)
> on 10.209.23.80 about the same time
> ERROR [pool-1-thread-4751] 2010-02-13 01:17:12,261 Cassandra.java (line 1096) Internal error processing batch_insert
> java.util.ConcurrentModificationException
>     at java.util.HashMap$HashIterator.nextEntry(HashMap.java:848)
>     at java.util.HashMap$KeyIterator.next(HashMap.java:883)
>     at java.util.AbstractCollection.addAll(AbstractCollection.java:305)
>     at java.util.HashSet.<init>(HashSet.java:100)
>     at org.apache.cassandra.gms.Gossiper.getLiveMembers(Gossiper.java:173)
>     at org.apache.cassandra.locator.AbstractReplicationStrategy.getHintedMapForEndpoints(AbstractReplicationStrategy.java:120)
>     at org.apache.cassandra.locator.AbstractReplicationStrategy.getHintedEndpoints(AbstractReplicationStrategy.java:78)
>     at org.apache.cassandra.service.StorageService.getHintedEndpointMap(StorageService.java:1186)
>     at org.apache.cassandra.service.StorageProxy.insertBlocking(StorageProxy.java:169)
>     at org.apache.cassandra.service.CassandraServer.doInsert(CassandraServer.java:466)
>     at org.apache.cassandra.service.CassandraServer.batch_insert(CassandraServer.java:445)
>     at org.apache.cassandra.service.Cassandra$Processor$batch_insert.process(Cassandra.java:1088)
>     at org.apache.cassandra.service.Cassandra$Processor.process(Cassandra.java:817)
>     at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:253)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>     at java.lang.Thread.run(Thread.java:619)
> just before that:
> INFO [Timer-1] 2010-02-13 01:17:12,070 Gossiper.java (line 194) InetAddress /10.209.21.223 is now dead.
> INFO [Timer-1] 2010-02-13 01:17:12,257 Gossiper.java (line 194) InetAddress /10.209.21.217 is now dead.
> INFO [Timer-1] 2010-02-13 01:17:12,257 Gossiper.java (line 194) InetAddress /10.209.21.216 is now dead.
> INFO [Timer-1] 2010-02-13 01:17:12,258 Gossiper.java (line 194) InetAddress /10.209.21.215 is now dead.
> INFO [Timer-1] 2010-02-13 01:17:12,258 Gossiper.java (line 194) InetAddress /10.209.23.82 is now dead.
> and just after that:
> INFO [Timer-1] 2010-02-13 01:17:12,261 Gossiper.java (line 194) InetAddress /10.209.23.81 is now dead.
> INFO [Timer-1] 2010-02-13 01:17:12,293 Gossiper.java (line 194) InetAddress /10.209.23.79 is now dead.
> INFO [Timer-1] 2010-02-13 01:17:12,304 Gossiper.java (line 194) InetAddress /10.209.21.204 is now dead.
> INFO [Timer-1] 2010-02-13 01:17:12,307 Gossiper.java (line 194) InetAddress /10.209.21.197 is now dead.
> INFO [Timer-1] 2010-02-13 01:17:12,308 Gossiper.java (line 194) InetAddress /10.209.21.245 is now dead.
> INFO [Timer-1] 2010-02-13 01:17:12,309 Gossiper.java (line 194) InetAddress /10.209.21.242 is now dead.
> INFO [Timer-1] 2010-02-13 01:17:12,310 Gossiper.java (line 194) InetAddress /10.209.23.106 is now dead.
> INFO [GMFD:1] 2010-02-13 01:17:26,780 Log4jLogger.java (line 41) 02/13/2010 01:17:26 - Remaining bytes zero. Stopping deserialization in EndPointState.
> INFO [GMFD:1] 2010-02-13 01:17:26,784 Gossiper.java (line 543) InetAddress /10.209.21.204 is now UP
> INFO [GMFD:1] 2010-02-13 01:17:26,785 Gossiper.java (line 543) InetAddress /10.209.23.106 is now UP
> INFO [GMFD:1] 2010-02-13 01:17:26,786 Gossiper.java (line 543) InetAddress /10.209.21.197 is now UP
> INFO [GMFD:1] 2010-02-13 01:17:26,800 Gossiper.java (line 543) InetAddress /10.209.21.216 is now UP
> INFO [GMFD:1] 2010-02-13 01:17:41,808 Gossiper.java (line 543) InetAddress /10.209.21.217 is now UP
> INFO [GMFD:1] 2010-02-13 01:17:41,823 Gossiper.java (line 543) InetAddress /10.209.21.223 is now UP
> INFO [GMFD:1] 2010-02-13 01:17:41,823 Gossiper.java (line 543) InetAddress /10.209.21.215 is now UP
> We're on 298a0e66ba66c5d2a1e5d4a70f2f619ae3fbf72a from git.apache.org, which claims to be:
> git-svn-id: https://svn.apache.org/repos/asf/incubator/cassandra/branches/cassandra-0.5@9035

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (CASSANDRA-800) Spurious Gossip Up/Down and IO Errors

Posted by "Gary Dusbabek (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-800?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12834982#action_12834982 ] 

Gary Dusbabek commented on CASSANDRA-800:
-----------------------------------------

OOME would appear in the logs.  I think the JVM is loaded and isn't making the right decisions about which threads to service.  I've been able to duplicate these exact errors on my dev machine when I spin up 4 cassandra instances, set RF=3 and send it a heavy write load.

The gossip thread is rather important in that if it gets stalled when the node isn't really down and the cluster is under a heavy write load, it could lead to writes getting sent to other (already loaded) nodes causing a cascade of failures.  

We might want to see about putting it in it's own thread group and giving it a higher priority.

> Spurious Gossip Up/Down and IO Errors
> -------------------------------------
>
>                 Key: CASSANDRA-800
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-800
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>    Affects Versions: 0.5, 0.6, 0.7
>            Reporter: Ryan King
>            Assignee: Jaakko Laine
>             Fix For: 0.5
>
>
> We're seeing a lot of nodes flapping. It appears to possibly be a race condition in Gossip.
> on 10.209.23.110
> WARN [MESSAGING-SERVICE-POOL:2] 2010-02-13 01:18:22,976 TcpConnection.java (line 484) Problem reading from socket connected to : java.nio.channels.SocketChannel[connected local=/10.209.23.110:7000 remote=/10.209.23.80:52720]
> WARN [MESSAGING-SERVICE-POOL:1] 2010-02-13 01:18:22,976 TcpConnection.java (line 484) Problem reading from socket connected to : java.nio.channels.SocketChannel[connected local=/10.209.23.110:7000 remote=/10.209.23.80:36128]
>  WARN [MESSAGING-SERVICE-POOL:2] 2010-02-13 01:18:22,977 TcpConnection.java (line 485) Exception was generated at : 02/13/2010 01:18:22 on thread MESSAGING-SERVICE-POOL:2
> Reached an EOL or something bizzare occured. Reading from: /10.209.23.80 BufferSizeRemaining: 16
> java.io.IOException: Reached an EOL or something bizzare occured. Reading from: /10.209.23.80 BufferSizeRemaining: 16
>     at org.apache.cassandra.net.io.StartState.doRead(StartState.java:44)
>     at org.apache.cassandra.net.io.ProtocolState.read(ProtocolState.java:39)
>     at org.apache.cassandra.net.io.TcpReader.read(TcpReader.java:95)
>     at org.apache.cassandra.net.TcpConnection$ReadWorkItem.run(TcpConnection.java:445)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>     at java.lang.Thread.run(Thread.java:619)
> on 10.209.23.80 about the same time
> ERROR [pool-1-thread-4751] 2010-02-13 01:17:12,261 Cassandra.java (line 1096) Internal error processing batch_insert
> java.util.ConcurrentModificationException
>     at java.util.HashMap$HashIterator.nextEntry(HashMap.java:848)
>     at java.util.HashMap$KeyIterator.next(HashMap.java:883)
>     at java.util.AbstractCollection.addAll(AbstractCollection.java:305)
>     at java.util.HashSet.<init>(HashSet.java:100)
>     at org.apache.cassandra.gms.Gossiper.getLiveMembers(Gossiper.java:173)
>     at org.apache.cassandra.locator.AbstractReplicationStrategy.getHintedMapForEndpoints(AbstractReplicationStrategy.java:120)
>     at org.apache.cassandra.locator.AbstractReplicationStrategy.getHintedEndpoints(AbstractReplicationStrategy.java:78)
>     at org.apache.cassandra.service.StorageService.getHintedEndpointMap(StorageService.java:1186)
>     at org.apache.cassandra.service.StorageProxy.insertBlocking(StorageProxy.java:169)
>     at org.apache.cassandra.service.CassandraServer.doInsert(CassandraServer.java:466)
>     at org.apache.cassandra.service.CassandraServer.batch_insert(CassandraServer.java:445)
>     at org.apache.cassandra.service.Cassandra$Processor$batch_insert.process(Cassandra.java:1088)
>     at org.apache.cassandra.service.Cassandra$Processor.process(Cassandra.java:817)
>     at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:253)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>     at java.lang.Thread.run(Thread.java:619)
> just before that:
> INFO [Timer-1] 2010-02-13 01:17:12,070 Gossiper.java (line 194) InetAddress /10.209.21.223 is now dead.
> INFO [Timer-1] 2010-02-13 01:17:12,257 Gossiper.java (line 194) InetAddress /10.209.21.217 is now dead.
> INFO [Timer-1] 2010-02-13 01:17:12,257 Gossiper.java (line 194) InetAddress /10.209.21.216 is now dead.
> INFO [Timer-1] 2010-02-13 01:17:12,258 Gossiper.java (line 194) InetAddress /10.209.21.215 is now dead.
> INFO [Timer-1] 2010-02-13 01:17:12,258 Gossiper.java (line 194) InetAddress /10.209.23.82 is now dead.
> and just after that:
> INFO [Timer-1] 2010-02-13 01:17:12,261 Gossiper.java (line 194) InetAddress /10.209.23.81 is now dead.
> INFO [Timer-1] 2010-02-13 01:17:12,293 Gossiper.java (line 194) InetAddress /10.209.23.79 is now dead.
> INFO [Timer-1] 2010-02-13 01:17:12,304 Gossiper.java (line 194) InetAddress /10.209.21.204 is now dead.
> INFO [Timer-1] 2010-02-13 01:17:12,307 Gossiper.java (line 194) InetAddress /10.209.21.197 is now dead.
> INFO [Timer-1] 2010-02-13 01:17:12,308 Gossiper.java (line 194) InetAddress /10.209.21.245 is now dead.
> INFO [Timer-1] 2010-02-13 01:17:12,309 Gossiper.java (line 194) InetAddress /10.209.21.242 is now dead.
> INFO [Timer-1] 2010-02-13 01:17:12,310 Gossiper.java (line 194) InetAddress /10.209.23.106 is now dead.
> INFO [GMFD:1] 2010-02-13 01:17:26,780 Log4jLogger.java (line 41) 02/13/2010 01:17:26 - Remaining bytes zero. Stopping deserialization in EndPointState.
> INFO [GMFD:1] 2010-02-13 01:17:26,784 Gossiper.java (line 543) InetAddress /10.209.21.204 is now UP
> INFO [GMFD:1] 2010-02-13 01:17:26,785 Gossiper.java (line 543) InetAddress /10.209.23.106 is now UP
> INFO [GMFD:1] 2010-02-13 01:17:26,786 Gossiper.java (line 543) InetAddress /10.209.21.197 is now UP
> INFO [GMFD:1] 2010-02-13 01:17:26,800 Gossiper.java (line 543) InetAddress /10.209.21.216 is now UP
> INFO [GMFD:1] 2010-02-13 01:17:41,808 Gossiper.java (line 543) InetAddress /10.209.21.217 is now UP
> INFO [GMFD:1] 2010-02-13 01:17:41,823 Gossiper.java (line 543) InetAddress /10.209.21.223 is now UP
> INFO [GMFD:1] 2010-02-13 01:17:41,823 Gossiper.java (line 543) InetAddress /10.209.21.215 is now UP
> We're on 298a0e66ba66c5d2a1e5d4a70f2f619ae3fbf72a from git.apache.org, which claims to be:
> git-svn-id: https://svn.apache.org/repos/asf/incubator/cassandra/branches/cassandra-0.5@9035

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (CASSANDRA-800) Spurious Gossip Up/Down and IO Errors

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-800?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12834612#action_12834612 ] 

Jonathan Ellis commented on CASSANDRA-800:
------------------------------------------

Ryan added in IRC:

"this may be the root of the problems I was describing above -- some threads may be dying due to OOM"

I'm skeptical that OOM could cause CME though.

> Spurious Gossip Up/Down and IO Errors
> -------------------------------------
>
>                 Key: CASSANDRA-800
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-800
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>    Affects Versions: 0.5, 0.6, 0.7
>            Reporter: Ryan King
>            Assignee: Jaakko Laine
>             Fix For: 0.5
>
>
> We're seeing a lot of nodes flapping. It appears to possibly be a race condition in Gossip.
> on 10.209.23.110
> WARN [MESSAGING-SERVICE-POOL:2] 2010-02-13 01:18:22,976 TcpConnection.java (line 484) Problem reading from socket connected to : java.nio.channels.SocketChannel[connected local=/10.209.23.110:7000 remote=/10.209.23.80:52720]
> WARN [MESSAGING-SERVICE-POOL:1] 2010-02-13 01:18:22,976 TcpConnection.java (line 484) Problem reading from socket connected to : java.nio.channels.SocketChannel[connected local=/10.209.23.110:7000 remote=/10.209.23.80:36128]
>  WARN [MESSAGING-SERVICE-POOL:2] 2010-02-13 01:18:22,977 TcpConnection.java (line 485) Exception was generated at : 02/13/2010 01:18:22 on thread MESSAGING-SERVICE-POOL:2
> Reached an EOL or something bizzare occured. Reading from: /10.209.23.80 BufferSizeRemaining: 16
> java.io.IOException: Reached an EOL or something bizzare occured. Reading from: /10.209.23.80 BufferSizeRemaining: 16
>     at org.apache.cassandra.net.io.StartState.doRead(StartState.java:44)
>     at org.apache.cassandra.net.io.ProtocolState.read(ProtocolState.java:39)
>     at org.apache.cassandra.net.io.TcpReader.read(TcpReader.java:95)
>     at org.apache.cassandra.net.TcpConnection$ReadWorkItem.run(TcpConnection.java:445)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>     at java.lang.Thread.run(Thread.java:619)
> on 10.209.23.80 about the same time
> ERROR [pool-1-thread-4751] 2010-02-13 01:17:12,261 Cassandra.java (line 1096) Internal error processing batch_insert
> java.util.ConcurrentModificationException
>     at java.util.HashMap$HashIterator.nextEntry(HashMap.java:848)
>     at java.util.HashMap$KeyIterator.next(HashMap.java:883)
>     at java.util.AbstractCollection.addAll(AbstractCollection.java:305)
>     at java.util.HashSet.<init>(HashSet.java:100)
>     at org.apache.cassandra.gms.Gossiper.getLiveMembers(Gossiper.java:173)
>     at org.apache.cassandra.locator.AbstractReplicationStrategy.getHintedMapForEndpoints(AbstractReplicationStrategy.java:120)
>     at org.apache.cassandra.locator.AbstractReplicationStrategy.getHintedEndpoints(AbstractReplicationStrategy.java:78)
>     at org.apache.cassandra.service.StorageService.getHintedEndpointMap(StorageService.java:1186)
>     at org.apache.cassandra.service.StorageProxy.insertBlocking(StorageProxy.java:169)
>     at org.apache.cassandra.service.CassandraServer.doInsert(CassandraServer.java:466)
>     at org.apache.cassandra.service.CassandraServer.batch_insert(CassandraServer.java:445)
>     at org.apache.cassandra.service.Cassandra$Processor$batch_insert.process(Cassandra.java:1088)
>     at org.apache.cassandra.service.Cassandra$Processor.process(Cassandra.java:817)
>     at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:253)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>     at java.lang.Thread.run(Thread.java:619)
> just before that:
> INFO [Timer-1] 2010-02-13 01:17:12,070 Gossiper.java (line 194) InetAddress /10.209.21.223 is now dead.
> INFO [Timer-1] 2010-02-13 01:17:12,257 Gossiper.java (line 194) InetAddress /10.209.21.217 is now dead.
> INFO [Timer-1] 2010-02-13 01:17:12,257 Gossiper.java (line 194) InetAddress /10.209.21.216 is now dead.
> INFO [Timer-1] 2010-02-13 01:17:12,258 Gossiper.java (line 194) InetAddress /10.209.21.215 is now dead.
> INFO [Timer-1] 2010-02-13 01:17:12,258 Gossiper.java (line 194) InetAddress /10.209.23.82 is now dead.
> and just after that:
> INFO [Timer-1] 2010-02-13 01:17:12,261 Gossiper.java (line 194) InetAddress /10.209.23.81 is now dead.
> INFO [Timer-1] 2010-02-13 01:17:12,293 Gossiper.java (line 194) InetAddress /10.209.23.79 is now dead.
> INFO [Timer-1] 2010-02-13 01:17:12,304 Gossiper.java (line 194) InetAddress /10.209.21.204 is now dead.
> INFO [Timer-1] 2010-02-13 01:17:12,307 Gossiper.java (line 194) InetAddress /10.209.21.197 is now dead.
> INFO [Timer-1] 2010-02-13 01:17:12,308 Gossiper.java (line 194) InetAddress /10.209.21.245 is now dead.
> INFO [Timer-1] 2010-02-13 01:17:12,309 Gossiper.java (line 194) InetAddress /10.209.21.242 is now dead.
> INFO [Timer-1] 2010-02-13 01:17:12,310 Gossiper.java (line 194) InetAddress /10.209.23.106 is now dead.
> INFO [GMFD:1] 2010-02-13 01:17:26,780 Log4jLogger.java (line 41) 02/13/2010 01:17:26 - Remaining bytes zero. Stopping deserialization in EndPointState.
> INFO [GMFD:1] 2010-02-13 01:17:26,784 Gossiper.java (line 543) InetAddress /10.209.21.204 is now UP
> INFO [GMFD:1] 2010-02-13 01:17:26,785 Gossiper.java (line 543) InetAddress /10.209.23.106 is now UP
> INFO [GMFD:1] 2010-02-13 01:17:26,786 Gossiper.java (line 543) InetAddress /10.209.21.197 is now UP
> INFO [GMFD:1] 2010-02-13 01:17:26,800 Gossiper.java (line 543) InetAddress /10.209.21.216 is now UP
> INFO [GMFD:1] 2010-02-13 01:17:41,808 Gossiper.java (line 543) InetAddress /10.209.21.217 is now UP
> INFO [GMFD:1] 2010-02-13 01:17:41,823 Gossiper.java (line 543) InetAddress /10.209.21.223 is now UP
> INFO [GMFD:1] 2010-02-13 01:17:41,823 Gossiper.java (line 543) InetAddress /10.209.21.215 is now UP
> We're on 298a0e66ba66c5d2a1e5d4a70f2f619ae3fbf72a from git.apache.org, which claims to be:
> git-svn-id: https://svn.apache.org/repos/asf/incubator/cassandra/branches/cassandra-0.5@9035

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (CASSANDRA-800) Spurious Gossip Up/Down and IO Errors

Posted by "Ryan King (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-800?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ryan King updated CASSANDRA-800:
--------------------------------

        Fix Version/s:     (was: 0.7)
                           (was: 0.6)
                           (was: 0.5)
    Affects Version/s: 0.7
                       0.6

> Spurious Gossip Up/Down and IO Errors
> -------------------------------------
>
>                 Key: CASSANDRA-800
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-800
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>    Affects Versions: 0.5, 0.6, 0.7
>            Reporter: Ryan King
>
> We're seeing a lot of nodes flapping. It appears to possibly be a race condition in Gossip.
> on 10.209.23.110
> WARN [MESSAGING-SERVICE-POOL:2] 2010-02-13 01:18:22,976 TcpConnection.java (line 484) Problem reading from socket connected to : java.nio.channels.SocketChannel[connected local=/10.209.23.110:7000 remote=/10.209.23.80:52720]
> WARN [MESSAGING-SERVICE-POOL:1] 2010-02-13 01:18:22,976 TcpConnection.java (line 484) Problem reading from socket connected to : java.nio.channels.SocketChannel[connected local=/10.209.23.110:7000 remote=/10.209.23.80:36128]
>  WARN [MESSAGING-SERVICE-POOL:2] 2010-02-13 01:18:22,977 TcpConnection.java (line 485) Exception was generated at : 02/13/2010 01:18:22 on thread MESSAGING-SERVICE-POOL:2
> Reached an EOL or something bizzare occured. Reading from: /10.209.23.80 BufferSizeRemaining: 16
> java.io.IOException: Reached an EOL or something bizzare occured. Reading from: /10.209.23.80 BufferSizeRemaining: 16
>     at org.apache.cassandra.net.io.StartState.doRead(StartState.java:44)
>     at org.apache.cassandra.net.io.ProtocolState.read(ProtocolState.java:39)
>     at org.apache.cassandra.net.io.TcpReader.read(TcpReader.java:95)
>     at org.apache.cassandra.net.TcpConnection$ReadWorkItem.run(TcpConnection.java:445)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>     at java.lang.Thread.run(Thread.java:619)
> on 10.209.23.80 about the same time
> ERROR [pool-1-thread-4751] 2010-02-13 01:17:12,261 Cassandra.java (line 1096) Internal error processing batch_insert
> java.util.ConcurrentModificationException
>     at java.util.HashMap$HashIterator.nextEntry(HashMap.java:848)
>     at java.util.HashMap$KeyIterator.next(HashMap.java:883)
>     at java.util.AbstractCollection.addAll(AbstractCollection.java:305)
>     at java.util.HashSet.<init>(HashSet.java:100)
>     at org.apache.cassandra.gms.Gossiper.getLiveMembers(Gossiper.java:173)
>     at org.apache.cassandra.locator.AbstractReplicationStrategy.getHintedMapForEndpoints(AbstractReplicationStrategy.java:120)
>     at org.apache.cassandra.locator.AbstractReplicationStrategy.getHintedEndpoints(AbstractReplicationStrategy.java:78)
>     at org.apache.cassandra.service.StorageService.getHintedEndpointMap(StorageService.java:1186)
>     at org.apache.cassandra.service.StorageProxy.insertBlocking(StorageProxy.java:169)
>     at org.apache.cassandra.service.CassandraServer.doInsert(CassandraServer.java:466)
>     at org.apache.cassandra.service.CassandraServer.batch_insert(CassandraServer.java:445)
>     at org.apache.cassandra.service.Cassandra$Processor$batch_insert.process(Cassandra.java:1088)
>     at org.apache.cassandra.service.Cassandra$Processor.process(Cassandra.java:817)
>     at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:253)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>     at java.lang.Thread.run(Thread.java:619)
> just before that:
> INFO [Timer-1] 2010-02-13 01:17:12,070 Gossiper.java (line 194) InetAddress /10.209.21.223 is now dead.
> INFO [Timer-1] 2010-02-13 01:17:12,257 Gossiper.java (line 194) InetAddress /10.209.21.217 is now dead.
> INFO [Timer-1] 2010-02-13 01:17:12,257 Gossiper.java (line 194) InetAddress /10.209.21.216 is now dead.
> INFO [Timer-1] 2010-02-13 01:17:12,258 Gossiper.java (line 194) InetAddress /10.209.21.215 is now dead.
> INFO [Timer-1] 2010-02-13 01:17:12,258 Gossiper.java (line 194) InetAddress /10.209.23.82 is now dead.
> and just after that:
> INFO [Timer-1] 2010-02-13 01:17:12,261 Gossiper.java (line 194) InetAddress /10.209.23.81 is now dead.
> INFO [Timer-1] 2010-02-13 01:17:12,293 Gossiper.java (line 194) InetAddress /10.209.23.79 is now dead.
> INFO [Timer-1] 2010-02-13 01:17:12,304 Gossiper.java (line 194) InetAddress /10.209.21.204 is now dead.
> INFO [Timer-1] 2010-02-13 01:17:12,307 Gossiper.java (line 194) InetAddress /10.209.21.197 is now dead.
> INFO [Timer-1] 2010-02-13 01:17:12,308 Gossiper.java (line 194) InetAddress /10.209.21.245 is now dead.
> INFO [Timer-1] 2010-02-13 01:17:12,309 Gossiper.java (line 194) InetAddress /10.209.21.242 is now dead.
> INFO [Timer-1] 2010-02-13 01:17:12,310 Gossiper.java (line 194) InetAddress /10.209.23.106 is now dead.
> INFO [GMFD:1] 2010-02-13 01:17:26,780 Log4jLogger.java (line 41) 02/13/2010 01:17:26 - Remaining bytes zero. Stopping deserialization in EndPointState.
> INFO [GMFD:1] 2010-02-13 01:17:26,784 Gossiper.java (line 543) InetAddress /10.209.21.204 is now UP
> INFO [GMFD:1] 2010-02-13 01:17:26,785 Gossiper.java (line 543) InetAddress /10.209.23.106 is now UP
> INFO [GMFD:1] 2010-02-13 01:17:26,786 Gossiper.java (line 543) InetAddress /10.209.21.197 is now UP
> INFO [GMFD:1] 2010-02-13 01:17:26,800 Gossiper.java (line 543) InetAddress /10.209.21.216 is now UP
> INFO [GMFD:1] 2010-02-13 01:17:41,808 Gossiper.java (line 543) InetAddress /10.209.21.217 is now UP
> INFO [GMFD:1] 2010-02-13 01:17:41,823 Gossiper.java (line 543) InetAddress /10.209.21.223 is now UP
> INFO [GMFD:1] 2010-02-13 01:17:41,823 Gossiper.java (line 543) InetAddress /10.209.21.215 is now UP
> We're on 298a0e66ba66c5d2a1e5d4a70f2f619ae3fbf72a from git.apache.org, which claims to be:
> git-svn-id: https://svn.apache.org/repos/asf/incubator/cassandra/branches/cassandra-0.5@9035

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (CASSANDRA-800) Spurious Gossip Up/Down and IO Errors

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-800?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Ellis updated CASSANDRA-800:
-------------------------------------

    Fix Version/s: 0.5

> Spurious Gossip Up/Down and IO Errors
> -------------------------------------
>
>                 Key: CASSANDRA-800
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-800
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>    Affects Versions: 0.5, 0.6, 0.7
>            Reporter: Ryan King
>            Assignee: Jaakko Laine
>             Fix For: 0.5
>
>
> We're seeing a lot of nodes flapping. It appears to possibly be a race condition in Gossip.
> on 10.209.23.110
> WARN [MESSAGING-SERVICE-POOL:2] 2010-02-13 01:18:22,976 TcpConnection.java (line 484) Problem reading from socket connected to : java.nio.channels.SocketChannel[connected local=/10.209.23.110:7000 remote=/10.209.23.80:52720]
> WARN [MESSAGING-SERVICE-POOL:1] 2010-02-13 01:18:22,976 TcpConnection.java (line 484) Problem reading from socket connected to : java.nio.channels.SocketChannel[connected local=/10.209.23.110:7000 remote=/10.209.23.80:36128]
>  WARN [MESSAGING-SERVICE-POOL:2] 2010-02-13 01:18:22,977 TcpConnection.java (line 485) Exception was generated at : 02/13/2010 01:18:22 on thread MESSAGING-SERVICE-POOL:2
> Reached an EOL or something bizzare occured. Reading from: /10.209.23.80 BufferSizeRemaining: 16
> java.io.IOException: Reached an EOL or something bizzare occured. Reading from: /10.209.23.80 BufferSizeRemaining: 16
>     at org.apache.cassandra.net.io.StartState.doRead(StartState.java:44)
>     at org.apache.cassandra.net.io.ProtocolState.read(ProtocolState.java:39)
>     at org.apache.cassandra.net.io.TcpReader.read(TcpReader.java:95)
>     at org.apache.cassandra.net.TcpConnection$ReadWorkItem.run(TcpConnection.java:445)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>     at java.lang.Thread.run(Thread.java:619)
> on 10.209.23.80 about the same time
> ERROR [pool-1-thread-4751] 2010-02-13 01:17:12,261 Cassandra.java (line 1096) Internal error processing batch_insert
> java.util.ConcurrentModificationException
>     at java.util.HashMap$HashIterator.nextEntry(HashMap.java:848)
>     at java.util.HashMap$KeyIterator.next(HashMap.java:883)
>     at java.util.AbstractCollection.addAll(AbstractCollection.java:305)
>     at java.util.HashSet.<init>(HashSet.java:100)
>     at org.apache.cassandra.gms.Gossiper.getLiveMembers(Gossiper.java:173)
>     at org.apache.cassandra.locator.AbstractReplicationStrategy.getHintedMapForEndpoints(AbstractReplicationStrategy.java:120)
>     at org.apache.cassandra.locator.AbstractReplicationStrategy.getHintedEndpoints(AbstractReplicationStrategy.java:78)
>     at org.apache.cassandra.service.StorageService.getHintedEndpointMap(StorageService.java:1186)
>     at org.apache.cassandra.service.StorageProxy.insertBlocking(StorageProxy.java:169)
>     at org.apache.cassandra.service.CassandraServer.doInsert(CassandraServer.java:466)
>     at org.apache.cassandra.service.CassandraServer.batch_insert(CassandraServer.java:445)
>     at org.apache.cassandra.service.Cassandra$Processor$batch_insert.process(Cassandra.java:1088)
>     at org.apache.cassandra.service.Cassandra$Processor.process(Cassandra.java:817)
>     at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:253)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>     at java.lang.Thread.run(Thread.java:619)
> just before that:
> INFO [Timer-1] 2010-02-13 01:17:12,070 Gossiper.java (line 194) InetAddress /10.209.21.223 is now dead.
> INFO [Timer-1] 2010-02-13 01:17:12,257 Gossiper.java (line 194) InetAddress /10.209.21.217 is now dead.
> INFO [Timer-1] 2010-02-13 01:17:12,257 Gossiper.java (line 194) InetAddress /10.209.21.216 is now dead.
> INFO [Timer-1] 2010-02-13 01:17:12,258 Gossiper.java (line 194) InetAddress /10.209.21.215 is now dead.
> INFO [Timer-1] 2010-02-13 01:17:12,258 Gossiper.java (line 194) InetAddress /10.209.23.82 is now dead.
> and just after that:
> INFO [Timer-1] 2010-02-13 01:17:12,261 Gossiper.java (line 194) InetAddress /10.209.23.81 is now dead.
> INFO [Timer-1] 2010-02-13 01:17:12,293 Gossiper.java (line 194) InetAddress /10.209.23.79 is now dead.
> INFO [Timer-1] 2010-02-13 01:17:12,304 Gossiper.java (line 194) InetAddress /10.209.21.204 is now dead.
> INFO [Timer-1] 2010-02-13 01:17:12,307 Gossiper.java (line 194) InetAddress /10.209.21.197 is now dead.
> INFO [Timer-1] 2010-02-13 01:17:12,308 Gossiper.java (line 194) InetAddress /10.209.21.245 is now dead.
> INFO [Timer-1] 2010-02-13 01:17:12,309 Gossiper.java (line 194) InetAddress /10.209.21.242 is now dead.
> INFO [Timer-1] 2010-02-13 01:17:12,310 Gossiper.java (line 194) InetAddress /10.209.23.106 is now dead.
> INFO [GMFD:1] 2010-02-13 01:17:26,780 Log4jLogger.java (line 41) 02/13/2010 01:17:26 - Remaining bytes zero. Stopping deserialization in EndPointState.
> INFO [GMFD:1] 2010-02-13 01:17:26,784 Gossiper.java (line 543) InetAddress /10.209.21.204 is now UP
> INFO [GMFD:1] 2010-02-13 01:17:26,785 Gossiper.java (line 543) InetAddress /10.209.23.106 is now UP
> INFO [GMFD:1] 2010-02-13 01:17:26,786 Gossiper.java (line 543) InetAddress /10.209.21.197 is now UP
> INFO [GMFD:1] 2010-02-13 01:17:26,800 Gossiper.java (line 543) InetAddress /10.209.21.216 is now UP
> INFO [GMFD:1] 2010-02-13 01:17:41,808 Gossiper.java (line 543) InetAddress /10.209.21.217 is now UP
> INFO [GMFD:1] 2010-02-13 01:17:41,823 Gossiper.java (line 543) InetAddress /10.209.21.223 is now UP
> INFO [GMFD:1] 2010-02-13 01:17:41,823 Gossiper.java (line 543) InetAddress /10.209.21.215 is now UP
> We're on 298a0e66ba66c5d2a1e5d4a70f2f619ae3fbf72a from git.apache.org, which claims to be:
> git-svn-id: https://svn.apache.org/repos/asf/incubator/cassandra/branches/cassandra-0.5@9035

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (CASSANDRA-800) Spurious Gossip Up/Down and IO Errors

Posted by "Ryan King (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-800?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12834973#action_12834973 ] 

Ryan King commented on CASSANDRA-800:
-------------------------------------

After reducing the heap pressure these errors appear to have gone away. I think it would be reasonable to attribute this behavior to hitting OOME's, which killed some threads, but not all of them.

I think it would be best to kill the server when we hit an OOME.

> Spurious Gossip Up/Down and IO Errors
> -------------------------------------
>
>                 Key: CASSANDRA-800
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-800
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>    Affects Versions: 0.5, 0.6, 0.7
>            Reporter: Ryan King
>            Assignee: Jaakko Laine
>             Fix For: 0.5
>
>
> We're seeing a lot of nodes flapping. It appears to possibly be a race condition in Gossip.
> on 10.209.23.110
> WARN [MESSAGING-SERVICE-POOL:2] 2010-02-13 01:18:22,976 TcpConnection.java (line 484) Problem reading from socket connected to : java.nio.channels.SocketChannel[connected local=/10.209.23.110:7000 remote=/10.209.23.80:52720]
> WARN [MESSAGING-SERVICE-POOL:1] 2010-02-13 01:18:22,976 TcpConnection.java (line 484) Problem reading from socket connected to : java.nio.channels.SocketChannel[connected local=/10.209.23.110:7000 remote=/10.209.23.80:36128]
>  WARN [MESSAGING-SERVICE-POOL:2] 2010-02-13 01:18:22,977 TcpConnection.java (line 485) Exception was generated at : 02/13/2010 01:18:22 on thread MESSAGING-SERVICE-POOL:2
> Reached an EOL or something bizzare occured. Reading from: /10.209.23.80 BufferSizeRemaining: 16
> java.io.IOException: Reached an EOL or something bizzare occured. Reading from: /10.209.23.80 BufferSizeRemaining: 16
>     at org.apache.cassandra.net.io.StartState.doRead(StartState.java:44)
>     at org.apache.cassandra.net.io.ProtocolState.read(ProtocolState.java:39)
>     at org.apache.cassandra.net.io.TcpReader.read(TcpReader.java:95)
>     at org.apache.cassandra.net.TcpConnection$ReadWorkItem.run(TcpConnection.java:445)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>     at java.lang.Thread.run(Thread.java:619)
> on 10.209.23.80 about the same time
> ERROR [pool-1-thread-4751] 2010-02-13 01:17:12,261 Cassandra.java (line 1096) Internal error processing batch_insert
> java.util.ConcurrentModificationException
>     at java.util.HashMap$HashIterator.nextEntry(HashMap.java:848)
>     at java.util.HashMap$KeyIterator.next(HashMap.java:883)
>     at java.util.AbstractCollection.addAll(AbstractCollection.java:305)
>     at java.util.HashSet.<init>(HashSet.java:100)
>     at org.apache.cassandra.gms.Gossiper.getLiveMembers(Gossiper.java:173)
>     at org.apache.cassandra.locator.AbstractReplicationStrategy.getHintedMapForEndpoints(AbstractReplicationStrategy.java:120)
>     at org.apache.cassandra.locator.AbstractReplicationStrategy.getHintedEndpoints(AbstractReplicationStrategy.java:78)
>     at org.apache.cassandra.service.StorageService.getHintedEndpointMap(StorageService.java:1186)
>     at org.apache.cassandra.service.StorageProxy.insertBlocking(StorageProxy.java:169)
>     at org.apache.cassandra.service.CassandraServer.doInsert(CassandraServer.java:466)
>     at org.apache.cassandra.service.CassandraServer.batch_insert(CassandraServer.java:445)
>     at org.apache.cassandra.service.Cassandra$Processor$batch_insert.process(Cassandra.java:1088)
>     at org.apache.cassandra.service.Cassandra$Processor.process(Cassandra.java:817)
>     at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:253)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>     at java.lang.Thread.run(Thread.java:619)
> just before that:
> INFO [Timer-1] 2010-02-13 01:17:12,070 Gossiper.java (line 194) InetAddress /10.209.21.223 is now dead.
> INFO [Timer-1] 2010-02-13 01:17:12,257 Gossiper.java (line 194) InetAddress /10.209.21.217 is now dead.
> INFO [Timer-1] 2010-02-13 01:17:12,257 Gossiper.java (line 194) InetAddress /10.209.21.216 is now dead.
> INFO [Timer-1] 2010-02-13 01:17:12,258 Gossiper.java (line 194) InetAddress /10.209.21.215 is now dead.
> INFO [Timer-1] 2010-02-13 01:17:12,258 Gossiper.java (line 194) InetAddress /10.209.23.82 is now dead.
> and just after that:
> INFO [Timer-1] 2010-02-13 01:17:12,261 Gossiper.java (line 194) InetAddress /10.209.23.81 is now dead.
> INFO [Timer-1] 2010-02-13 01:17:12,293 Gossiper.java (line 194) InetAddress /10.209.23.79 is now dead.
> INFO [Timer-1] 2010-02-13 01:17:12,304 Gossiper.java (line 194) InetAddress /10.209.21.204 is now dead.
> INFO [Timer-1] 2010-02-13 01:17:12,307 Gossiper.java (line 194) InetAddress /10.209.21.197 is now dead.
> INFO [Timer-1] 2010-02-13 01:17:12,308 Gossiper.java (line 194) InetAddress /10.209.21.245 is now dead.
> INFO [Timer-1] 2010-02-13 01:17:12,309 Gossiper.java (line 194) InetAddress /10.209.21.242 is now dead.
> INFO [Timer-1] 2010-02-13 01:17:12,310 Gossiper.java (line 194) InetAddress /10.209.23.106 is now dead.
> INFO [GMFD:1] 2010-02-13 01:17:26,780 Log4jLogger.java (line 41) 02/13/2010 01:17:26 - Remaining bytes zero. Stopping deserialization in EndPointState.
> INFO [GMFD:1] 2010-02-13 01:17:26,784 Gossiper.java (line 543) InetAddress /10.209.21.204 is now UP
> INFO [GMFD:1] 2010-02-13 01:17:26,785 Gossiper.java (line 543) InetAddress /10.209.23.106 is now UP
> INFO [GMFD:1] 2010-02-13 01:17:26,786 Gossiper.java (line 543) InetAddress /10.209.21.197 is now UP
> INFO [GMFD:1] 2010-02-13 01:17:26,800 Gossiper.java (line 543) InetAddress /10.209.21.216 is now UP
> INFO [GMFD:1] 2010-02-13 01:17:41,808 Gossiper.java (line 543) InetAddress /10.209.21.217 is now UP
> INFO [GMFD:1] 2010-02-13 01:17:41,823 Gossiper.java (line 543) InetAddress /10.209.21.223 is now UP
> INFO [GMFD:1] 2010-02-13 01:17:41,823 Gossiper.java (line 543) InetAddress /10.209.21.215 is now UP
> We're on 298a0e66ba66c5d2a1e5d4a70f2f619ae3fbf72a from git.apache.org, which claims to be:
> git-svn-id: https://svn.apache.org/repos/asf/incubator/cassandra/branches/cassandra-0.5@9035

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (CASSANDRA-800) Spurious Gossip Up/Down and IO Errors

Posted by "Ryan King (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-800?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12835004#action_12835004 ] 

Ryan King commented on CASSANDRA-800:
-------------------------------------

gary-

we did see OOME's in the logs:

ERROR [pool-1-thread-5016] 2010-02-13 02:50:05,872 CassandraDaemon.java (line 71) Fatal exception in thread Thread[pool-1-thread-5016,5,main]
java.lang.OutOfMemoryError: Java heap space
 WARN [MESSAGING-SERVICE-POOL:2] 2010-02-13 02:46:24,194 TcpConnection.java (line 484) Problem reading from socket connected to : java.nio.channels.SocketChannel[connected local=/10.209.23.111:7000 remote=/10.209.23.84:37322]
ERROR [pool-1-thread-4994] 2010-02-13 02:45:29,807 CassandraDaemon.java (line 71) Fatal exception in thread Thread[pool-1-thread-4994,5,main]
java.lang.OutOfMemoryError: Java heap space
ERROR [main] 2010-02-13 02:45:17,044 CassandraDaemon.java (line 184) Exception encountered during startup.
ERROR [MESSAGE-DESERIALIZER-POOL:1] 2010-02-13 02:45:17,044 DebuggableThreadPoolExecutor.java (line 162) Error in executor futuretask
java.util.concurrent.ExecutionException: java.lang.OutOfMemoryError: Java heap space
    at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:222)
    at java.util.concurrent.FutureTask.get(FutureTask.java:83)
    at org.apache.cassandra.concurrent.DebuggableThreadPoolExecutor.afterExecute(DebuggableThreadPoolExecutor.java:154)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:888)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java.lang.Thread.run(Thread.java:619)
Caused by: java.lang.OutOfMemoryError: Java heap space

> Spurious Gossip Up/Down and IO Errors
> -------------------------------------
>
>                 Key: CASSANDRA-800
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-800
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>    Affects Versions: 0.5, 0.6, 0.7
>            Reporter: Ryan King
>            Assignee: Jaakko Laine
>             Fix For: 0.5
>
>
> We're seeing a lot of nodes flapping. It appears to possibly be a race condition in Gossip.
> on 10.209.23.110
> WARN [MESSAGING-SERVICE-POOL:2] 2010-02-13 01:18:22,976 TcpConnection.java (line 484) Problem reading from socket connected to : java.nio.channels.SocketChannel[connected local=/10.209.23.110:7000 remote=/10.209.23.80:52720]
> WARN [MESSAGING-SERVICE-POOL:1] 2010-02-13 01:18:22,976 TcpConnection.java (line 484) Problem reading from socket connected to : java.nio.channels.SocketChannel[connected local=/10.209.23.110:7000 remote=/10.209.23.80:36128]
>  WARN [MESSAGING-SERVICE-POOL:2] 2010-02-13 01:18:22,977 TcpConnection.java (line 485) Exception was generated at : 02/13/2010 01:18:22 on thread MESSAGING-SERVICE-POOL:2
> Reached an EOL or something bizzare occured. Reading from: /10.209.23.80 BufferSizeRemaining: 16
> java.io.IOException: Reached an EOL or something bizzare occured. Reading from: /10.209.23.80 BufferSizeRemaining: 16
>     at org.apache.cassandra.net.io.StartState.doRead(StartState.java:44)
>     at org.apache.cassandra.net.io.ProtocolState.read(ProtocolState.java:39)
>     at org.apache.cassandra.net.io.TcpReader.read(TcpReader.java:95)
>     at org.apache.cassandra.net.TcpConnection$ReadWorkItem.run(TcpConnection.java:445)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>     at java.lang.Thread.run(Thread.java:619)
> on 10.209.23.80 about the same time
> ERROR [pool-1-thread-4751] 2010-02-13 01:17:12,261 Cassandra.java (line 1096) Internal error processing batch_insert
> java.util.ConcurrentModificationException
>     at java.util.HashMap$HashIterator.nextEntry(HashMap.java:848)
>     at java.util.HashMap$KeyIterator.next(HashMap.java:883)
>     at java.util.AbstractCollection.addAll(AbstractCollection.java:305)
>     at java.util.HashSet.<init>(HashSet.java:100)
>     at org.apache.cassandra.gms.Gossiper.getLiveMembers(Gossiper.java:173)
>     at org.apache.cassandra.locator.AbstractReplicationStrategy.getHintedMapForEndpoints(AbstractReplicationStrategy.java:120)
>     at org.apache.cassandra.locator.AbstractReplicationStrategy.getHintedEndpoints(AbstractReplicationStrategy.java:78)
>     at org.apache.cassandra.service.StorageService.getHintedEndpointMap(StorageService.java:1186)
>     at org.apache.cassandra.service.StorageProxy.insertBlocking(StorageProxy.java:169)
>     at org.apache.cassandra.service.CassandraServer.doInsert(CassandraServer.java:466)
>     at org.apache.cassandra.service.CassandraServer.batch_insert(CassandraServer.java:445)
>     at org.apache.cassandra.service.Cassandra$Processor$batch_insert.process(Cassandra.java:1088)
>     at org.apache.cassandra.service.Cassandra$Processor.process(Cassandra.java:817)
>     at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:253)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>     at java.lang.Thread.run(Thread.java:619)
> just before that:
> INFO [Timer-1] 2010-02-13 01:17:12,070 Gossiper.java (line 194) InetAddress /10.209.21.223 is now dead.
> INFO [Timer-1] 2010-02-13 01:17:12,257 Gossiper.java (line 194) InetAddress /10.209.21.217 is now dead.
> INFO [Timer-1] 2010-02-13 01:17:12,257 Gossiper.java (line 194) InetAddress /10.209.21.216 is now dead.
> INFO [Timer-1] 2010-02-13 01:17:12,258 Gossiper.java (line 194) InetAddress /10.209.21.215 is now dead.
> INFO [Timer-1] 2010-02-13 01:17:12,258 Gossiper.java (line 194) InetAddress /10.209.23.82 is now dead.
> and just after that:
> INFO [Timer-1] 2010-02-13 01:17:12,261 Gossiper.java (line 194) InetAddress /10.209.23.81 is now dead.
> INFO [Timer-1] 2010-02-13 01:17:12,293 Gossiper.java (line 194) InetAddress /10.209.23.79 is now dead.
> INFO [Timer-1] 2010-02-13 01:17:12,304 Gossiper.java (line 194) InetAddress /10.209.21.204 is now dead.
> INFO [Timer-1] 2010-02-13 01:17:12,307 Gossiper.java (line 194) InetAddress /10.209.21.197 is now dead.
> INFO [Timer-1] 2010-02-13 01:17:12,308 Gossiper.java (line 194) InetAddress /10.209.21.245 is now dead.
> INFO [Timer-1] 2010-02-13 01:17:12,309 Gossiper.java (line 194) InetAddress /10.209.21.242 is now dead.
> INFO [Timer-1] 2010-02-13 01:17:12,310 Gossiper.java (line 194) InetAddress /10.209.23.106 is now dead.
> INFO [GMFD:1] 2010-02-13 01:17:26,780 Log4jLogger.java (line 41) 02/13/2010 01:17:26 - Remaining bytes zero. Stopping deserialization in EndPointState.
> INFO [GMFD:1] 2010-02-13 01:17:26,784 Gossiper.java (line 543) InetAddress /10.209.21.204 is now UP
> INFO [GMFD:1] 2010-02-13 01:17:26,785 Gossiper.java (line 543) InetAddress /10.209.23.106 is now UP
> INFO [GMFD:1] 2010-02-13 01:17:26,786 Gossiper.java (line 543) InetAddress /10.209.21.197 is now UP
> INFO [GMFD:1] 2010-02-13 01:17:26,800 Gossiper.java (line 543) InetAddress /10.209.21.216 is now UP
> INFO [GMFD:1] 2010-02-13 01:17:41,808 Gossiper.java (line 543) InetAddress /10.209.21.217 is now UP
> INFO [GMFD:1] 2010-02-13 01:17:41,823 Gossiper.java (line 543) InetAddress /10.209.21.223 is now UP
> INFO [GMFD:1] 2010-02-13 01:17:41,823 Gossiper.java (line 543) InetAddress /10.209.21.215 is now UP
> We're on 298a0e66ba66c5d2a1e5d4a70f2f619ae3fbf72a from git.apache.org, which claims to be:
> git-svn-id: https://svn.apache.org/repos/asf/incubator/cassandra/branches/cassandra-0.5@9035

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.