You are viewing a plain text version of this content. The canonical link for it is here.

Posted to user@cassandra.apache.org by Michael Moores <mm...@real.com> on 2010/10/12 22:51:46 UTC

0.7 beta2 - Fatal exception from RowMutationVerbHandler

I have a cluster of 8 nodes with a replication factor of 3 and consistency of QUORUM.
When I stop one node in the cluster I end up with socket read timeouts to other nodes:
org.apache.thrift.transport.TTransportException: java.net.SocketTimeoutException: Read timed out
The timeout is set to 4 seconds.

The recent write latency on the other nodes becomes elevated, reaching 5-6 seconds.
The load on the cluster is minimal.

I looked at the system log on an adjacent node in the ring and found this output:

 INFO [WRITE-kv2-app02.dev.real.com/172.27.109.32] 2010-10-12 12:13:36,712 OutboundTcpConnection.java (line 115) error writing to kv2-app02.dev.real.com/172.27.109.32
 INFO [WRITE-kv2-app02.dev.real.com/172.27.109.32] 2010-10-12 12:13:50,336 OutboundTcpConnection.java (line 115) error writing to kv2-app02.dev.real.com/172.27.109.32
 INFO [Timer-0] 2010-10-12 12:14:22,792 Gossiper.java (line 196) InetAddress /172.27.109.32 is now dead.
ERROR [MUTATION_STAGE:1315] 2010-10-12 12:14:24,917 DebuggableThreadPoolExecutor.java (line 103) Error in ThreadPoolExecutor
java.nio.BufferUnderflowException
        at java.nio.HeapByteBuffer.get(HeapByteBuffer.java:127)
        at java.nio.ByteBuffer.get(ByteBuffer.java:675)
        at org.apache.cassandra.db.RowMutationVerbHandler.doVerb(RowMutationVerbHandler.java:62)
        at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:50)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
        at java.lang.Thread.run(Thread.java:619)
ERROR [MUTATION_STAGE:1315] 2010-10-12 12:14:24,918 AbstractCassandraDaemon.java (line 88) Fatal exception in thread Thread[MUTATION_STAGE:1315,5,main]
java.nio.BufferUnderflowException
        at java.nio.HeapByteBuffer.get(HeapByteBuffer.java:127)
        at java.nio.ByteBuffer.get(ByteBuffer.java:675)
        at org.apache.cassandra.db.RowMutationVerbHandler.doVerb(RowMutationVerbHandler.java:62)
        at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:50)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
        at java.lang.Thread.run(Thread.java:619)
ERROR [MUTATION_STAGE:1605] 2010-10-12 12:14:28,919 DebuggableThreadPoolExecutor.java (line 103) Error in ThreadPoolExecutor
java.nio.BufferUnderflowException
        at java.nio.HeapByteBuffer.get(HeapByteBuffer.java:127)
        at java.nio.ByteBuffer.get(ByteBuffer.java:675)
        at org.apache.cassandra.db.RowMutationVerbHandler.doVerb(RowMutationVerbHandler.java:62)
        at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:50)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
        at java.lang.Thread.run(Thread.java:619)
....
....

I restarted the previously stopped node, and the system recovers, but with a few more underlflow exceptions:

 INFO [GOSSIP_STAGE:1] 2010-10-12 12:15:44,537 Gossiper.java (line 594) Node /172.27.109.32 has restarted, now UP again
 INFO [HINTED-HANDOFF-POOL:1] 2010-10-12 12:15:44,537 HintedHandOffManager.java (line 196) Started hinted handoff for endpoint /172.27.109.32
 INFO [GOSSIP_STAGE:1] 2010-10-12 12:15:44,537 StorageService.java (line 643) Node /172.27.109.32 state jump to normal
 INFO [HINTED-HANDOFF-POOL:1] 2010-10-12 12:15:44,538 HintedHandOffManager.java (line 252) Finished hinted handoff of 0 rows to endpoint /172.27.109.32
 INFO [GOSSIP_STAGE:1] 2010-10-12 12:15:44,538 StorageService.java (line 650) Will not change my token ownership to /172.27.109.32
ERROR [MUTATION_STAGE:1635] 2010-10-12 12:15:45,083 DebuggableThreadPoolExecutor.java (line 103) Error in ThreadPoolExecutor
java.nio.BufferUnderflowException
        at java.nio.HeapByteBuffer.get(HeapByteBuffer.java:127)
        at java.nio.ByteBuffer.get(ByteBuffer.java:675)
        at org.apache.cassandra.db.RowMutationVerbHandler.doVerb(RowMutationVerbHandler.java:62)
        at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:50)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
        at java.lang.Thread.run(Thread.java:619)

Re: 0.7 beta2 - Fatal exception from RowMutationVerbHandler

Posted by Michael Moores <mm...@real.com>.

I created CASSANDRA-1617

On Oct 12, 2010, at 1:51 PM, Michael Moores wrote:

> I have a cluster of 8 nodes with a replication factor of 3 and consistency of QUORUM.
> When I stop one node in the cluster I end up with socket read timeouts to other nodes:
> org.apache.thrift.transport.TTransportException: java.net.SocketTimeoutException: Read timed out
> The timeout is set to 4 seconds.
> 
> The recent write latency on the other nodes becomes elevated, reaching 5-6 seconds.
> The load on the cluster is minimal.
> 
> I looked at the system log on an adjacent node in the ring and found this output:
> 
> INFO [WRITE-kv2-app02.dev.real.com/172.27.109.32] 2010-10-12 12:13:36,712 OutboundTcpConnection.java (line 115) error writing to kv2-app02.dev.real.com/172.27.109.32
> INFO [WRITE-kv2-app02.dev.real.com/172.27.109.32] 2010-10-12 12:13:50,336 OutboundTcpConnection.java (line 115) error writing to kv2-app02.dev.real.com/172.27.109.32
> INFO [Timer-0] 2010-10-12 12:14:22,792 Gossiper.java (line 196) InetAddress /172.27.109.32 is now dead.
> ERROR [MUTATION_STAGE:1315] 2010-10-12 12:14:24,917 DebuggableThreadPoolExecutor.java (line 103) Error in ThreadPoolExecutor
> java.nio.BufferUnderflowException
>        at java.nio.HeapByteBuffer.get(HeapByteBuffer.java:127)
>        at java.nio.ByteBuffer.get(ByteBuffer.java:675)
>        at org.apache.cassandra.db.RowMutationVerbHandler.doVerb(RowMutationVerbHandler.java:62)
>        at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:50)
>        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>        at java.lang.Thread.run(Thread.java:619)
> ERROR [MUTATION_STAGE:1315] 2010-10-12 12:14:24,918 AbstractCassandraDaemon.java (line 88) Fatal exception in thread Thread[MUTATION_STAGE:1315,5,main]
> java.nio.BufferUnderflowException
>        at java.nio.HeapByteBuffer.get(HeapByteBuffer.java:127)
>        at java.nio.ByteBuffer.get(ByteBuffer.java:675)
>        at org.apache.cassandra.db.RowMutationVerbHandler.doVerb(RowMutationVerbHandler.java:62)
>        at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:50)
>        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>        at java.lang.Thread.run(Thread.java:619)
> ERROR [MUTATION_STAGE:1605] 2010-10-12 12:14:28,919 DebuggableThreadPoolExecutor.java (line 103) Error in ThreadPoolExecutor
> java.nio.BufferUnderflowException
>        at java.nio.HeapByteBuffer.get(HeapByteBuffer.java:127)
>        at java.nio.ByteBuffer.get(ByteBuffer.java:675)
>        at org.apache.cassandra.db.RowMutationVerbHandler.doVerb(RowMutationVerbHandler.java:62)
>        at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:50)
>        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>        at java.lang.Thread.run(Thread.java:619)
> ....
> ....
> 
> I restarted the previously stopped node, and the system recovers, but with a few more underlflow exceptions:
> 
> INFO [GOSSIP_STAGE:1] 2010-10-12 12:15:44,537 Gossiper.java (line 594) Node /172.27.109.32 has restarted, now UP again
> INFO [HINTED-HANDOFF-POOL:1] 2010-10-12 12:15:44,537 HintedHandOffManager.java (line 196) Started hinted handoff for endpoint /172.27.109.32
> INFO [GOSSIP_STAGE:1] 2010-10-12 12:15:44,537 StorageService.java (line 643) Node /172.27.109.32 state jump to normal
> INFO [HINTED-HANDOFF-POOL:1] 2010-10-12 12:15:44,538 HintedHandOffManager.java (line 252) Finished hinted handoff of 0 rows to endpoint /172.27.109.32
> INFO [GOSSIP_STAGE:1] 2010-10-12 12:15:44,538 StorageService.java (line 650) Will not change my token ownership to /172.27.109.32
> ERROR [MUTATION_STAGE:1635] 2010-10-12 12:15:45,083 DebuggableThreadPoolExecutor.java (line 103) Error in ThreadPoolExecutor
> java.nio.BufferUnderflowException
>        at java.nio.HeapByteBuffer.get(HeapByteBuffer.java:127)
>        at java.nio.ByteBuffer.get(ByteBuffer.java:675)
>        at org.apache.cassandra.db.RowMutationVerbHandler.doVerb(RowMutationVerbHandler.java:62)
>        at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:50)
>        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>        at java.lang.Thread.run(Thread.java:619)
> 
> 
>

Re: 0.7 beta2 - Fatal exception from RowMutationVerbHandler

Posted by Gary Dusbabek <gd...@gmail.com>.

Looks like a bug with hinted-handoff.  Will you file a ticket?

Gary

On Tue, Oct 12, 2010 at 15:51, Michael Moores <mm...@real.com> wrote:
> I have a cluster of 8 nodes with a replication factor of 3 and consistency of QUORUM.
> When I stop one node in the cluster I end up with socket read timeouts to other nodes:
> org.apache.thrift.transport.TTransportException: java.net.SocketTimeoutException: Read timed out
> The timeout is set to 4 seconds.
>
> The recent write latency on the other nodes becomes elevated, reaching 5-6 seconds.
> The load on the cluster is minimal.
>
> I looked at the system log on an adjacent node in the ring and found this output:
>
>  INFO [WRITE-kv2-app02.dev.real.com/172.27.109.32] 2010-10-12 12:13:36,712 OutboundTcpConnection.java (line 115) error writing to kv2-app02.dev.real.com/172.27.109.32
>  INFO [WRITE-kv2-app02.dev.real.com/172.27.109.32] 2010-10-12 12:13:50,336 OutboundTcpConnection.java (line 115) error writing to kv2-app02.dev.real.com/172.27.109.32
>  INFO [Timer-0] 2010-10-12 12:14:22,792 Gossiper.java (line 196) InetAddress /172.27.109.32 is now dead.
> ERROR [MUTATION_STAGE:1315] 2010-10-12 12:14:24,917 DebuggableThreadPoolExecutor.java (line 103) Error in ThreadPoolExecutor
> java.nio.BufferUnderflowException
>        at java.nio.HeapByteBuffer.get(HeapByteBuffer.java:127)
>        at java.nio.ByteBuffer.get(ByteBuffer.java:675)
>        at org.apache.cassandra.db.RowMutationVerbHandler.doVerb(RowMutationVerbHandler.java:62)
>        at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:50)
>        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>        at java.lang.Thread.run(Thread.java:619)
> ERROR [MUTATION_STAGE:1315] 2010-10-12 12:14:24,918 AbstractCassandraDaemon.java (line 88) Fatal exception in thread Thread[MUTATION_STAGE:1315,5,main]
> java.nio.BufferUnderflowException
>        at java.nio.HeapByteBuffer.get(HeapByteBuffer.java:127)
>        at java.nio.ByteBuffer.get(ByteBuffer.java:675)
>        at org.apache.cassandra.db.RowMutationVerbHandler.doVerb(RowMutationVerbHandler.java:62)
>        at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:50)
>        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>        at java.lang.Thread.run(Thread.java:619)
> ERROR [MUTATION_STAGE:1605] 2010-10-12 12:14:28,919 DebuggableThreadPoolExecutor.java (line 103) Error in ThreadPoolExecutor
> java.nio.BufferUnderflowException
>        at java.nio.HeapByteBuffer.get(HeapByteBuffer.java:127)
>        at java.nio.ByteBuffer.get(ByteBuffer.java:675)
>        at org.apache.cassandra.db.RowMutationVerbHandler.doVerb(RowMutationVerbHandler.java:62)
>        at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:50)
>        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>        at java.lang.Thread.run(Thread.java:619)
> ....
> ....
>
> I restarted the previously stopped node, and the system recovers, but with a few more underlflow exceptions:
>
>  INFO [GOSSIP_STAGE:1] 2010-10-12 12:15:44,537 Gossiper.java (line 594) Node /172.27.109.32 has restarted, now UP again
>  INFO [HINTED-HANDOFF-POOL:1] 2010-10-12 12:15:44,537 HintedHandOffManager.java (line 196) Started hinted handoff for endpoint /172.27.109.32
>  INFO [GOSSIP_STAGE:1] 2010-10-12 12:15:44,537 StorageService.java (line 643) Node /172.27.109.32 state jump to normal
>  INFO [HINTED-HANDOFF-POOL:1] 2010-10-12 12:15:44,538 HintedHandOffManager.java (line 252) Finished hinted handoff of 0 rows to endpoint /172.27.109.32
>  INFO [GOSSIP_STAGE:1] 2010-10-12 12:15:44,538 StorageService.java (line 650) Will not change my token ownership to /172.27.109.32
> ERROR [MUTATION_STAGE:1635] 2010-10-12 12:15:45,083 DebuggableThreadPoolExecutor.java (line 103) Error in ThreadPoolExecutor
> java.nio.BufferUnderflowException
>        at java.nio.HeapByteBuffer.get(HeapByteBuffer.java:127)
>        at java.nio.ByteBuffer.get(ByteBuffer.java:675)
>        at org.apache.cassandra.db.RowMutationVerbHandler.doVerb(RowMutationVerbHandler.java:62)
>        at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:50)
>        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>        at java.lang.Thread.run(Thread.java:619)
>
>
>
>

Re: 0.7 beta2 - Fatal exception from RowMutationVerbHandler

Posted by Michael Moores <mm...@real.com>.

I created CASSANDRA-1617

On Oct 12, 2010, at 1:51 PM, Michael Moores wrote:

> I have a cluster of 8 nodes with a replication factor of 3 and consistency of QUORUM.
> When I stop one node in the cluster I end up with socket read timeouts to other nodes:
> org.apache.thrift.transport.TTransportException: java.net.SocketTimeoutException: Read timed out
> The timeout is set to 4 seconds.
> 
> The recent write latency on the other nodes becomes elevated, reaching 5-6 seconds.
> The load on the cluster is minimal.
> 
> I looked at the system log on an adjacent node in the ring and found this output:
> 
> INFO [WRITE-kv2-app02.dev.real.com/172.27.109.32] 2010-10-12 12:13:36,712 OutboundTcpConnection.java (line 115) error writing to kv2-app02.dev.real.com/172.27.109.32
> INFO [WRITE-kv2-app02.dev.real.com/172.27.109.32] 2010-10-12 12:13:50,336 OutboundTcpConnection.java (line 115) error writing to kv2-app02.dev.real.com/172.27.109.32
> INFO [Timer-0] 2010-10-12 12:14:22,792 Gossiper.java (line 196) InetAddress /172.27.109.32 is now dead.
> ERROR [MUTATION_STAGE:1315] 2010-10-12 12:14:24,917 DebuggableThreadPoolExecutor.java (line 103) Error in ThreadPoolExecutor
> java.nio.BufferUnderflowException
>        at java.nio.HeapByteBuffer.get(HeapByteBuffer.java:127)
>        at java.nio.ByteBuffer.get(ByteBuffer.java:675)
>        at org.apache.cassandra.db.RowMutationVerbHandler.doVerb(RowMutationVerbHandler.java:62)
>        at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:50)
>        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>        at java.lang.Thread.run(Thread.java:619)
> ERROR [MUTATION_STAGE:1315] 2010-10-12 12:14:24,918 AbstractCassandraDaemon.java (line 88) Fatal exception in thread Thread[MUTATION_STAGE:1315,5,main]
> java.nio.BufferUnderflowException
>        at java.nio.HeapByteBuffer.get(HeapByteBuffer.java:127)
>        at java.nio.ByteBuffer.get(ByteBuffer.java:675)
>        at org.apache.cassandra.db.RowMutationVerbHandler.doVerb(RowMutationVerbHandler.java:62)
>        at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:50)
>        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>        at java.lang.Thread.run(Thread.java:619)
> ERROR [MUTATION_STAGE:1605] 2010-10-12 12:14:28,919 DebuggableThreadPoolExecutor.java (line 103) Error in ThreadPoolExecutor
> java.nio.BufferUnderflowException
>        at java.nio.HeapByteBuffer.get(HeapByteBuffer.java:127)
>        at java.nio.ByteBuffer.get(ByteBuffer.java:675)
>        at org.apache.cassandra.db.RowMutationVerbHandler.doVerb(RowMutationVerbHandler.java:62)
>        at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:50)
>        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>        at java.lang.Thread.run(Thread.java:619)
> ....
> ....
> 
> I restarted the previously stopped node, and the system recovers, but with a few more underlflow exceptions:
> 
> INFO [GOSSIP_STAGE:1] 2010-10-12 12:15:44,537 Gossiper.java (line 594) Node /172.27.109.32 has restarted, now UP again
> INFO [HINTED-HANDOFF-POOL:1] 2010-10-12 12:15:44,537 HintedHandOffManager.java (line 196) Started hinted handoff for endpoint /172.27.109.32
> INFO [GOSSIP_STAGE:1] 2010-10-12 12:15:44,537 StorageService.java (line 643) Node /172.27.109.32 state jump to normal
> INFO [HINTED-HANDOFF-POOL:1] 2010-10-12 12:15:44,538 HintedHandOffManager.java (line 252) Finished hinted handoff of 0 rows to endpoint /172.27.109.32
> INFO [GOSSIP_STAGE:1] 2010-10-12 12:15:44,538 StorageService.java (line 650) Will not change my token ownership to /172.27.109.32
> ERROR [MUTATION_STAGE:1635] 2010-10-12 12:15:45,083 DebuggableThreadPoolExecutor.java (line 103) Error in ThreadPoolExecutor
> java.nio.BufferUnderflowException
>        at java.nio.HeapByteBuffer.get(HeapByteBuffer.java:127)
>        at java.nio.ByteBuffer.get(ByteBuffer.java:675)
>        at org.apache.cassandra.db.RowMutationVerbHandler.doVerb(RowMutationVerbHandler.java:62)
>        at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:50)
>        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>        at java.lang.Thread.run(Thread.java:619)
> 
> 
>