Posted to user@cassandra.apache.org by Alexis Lê-Quôc <al...@datadoghq.com> on 2011/04/27 22:39:07 UTC

0.7.4: Replication assertion error after removetoken, removetoken force and a restart

Hi,

I've been getting the following lately, every few seconds.

2011-04-27T20:21:18.299885+00:00 10.202.61.193 [MiscStage: 97] Error in ThreadPoolExecutor
2011-04-27T20:21:18.299885+00:00 10.202.61.193 java.lang.AssertionError
2011-04-27T20:21:18.300038+00:00 10.202.61.193 10.202.61.193   at org.apache.cassandra.service.StorageService.confirmReplication(StorageService.java:1872)
2011-04-27T20:21:18.300038+00:00 10.202.61.193 10.202.61.193   at org.apache.cassandra.streaming.ReplicationFinishedVerbHandler.doVerb(ReplicationFinishedVerbHandler.java:38)
2011-04-27T20:21:18.300047+00:00 10.202.61.193 10.202.61.193   at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:72)
2011-04-27T20:21:18.300047+00:00 10.202.61.193 10.202.61.193   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
2011-04-27T20:21:18.300055+00:00 10.202.61.193 10.202.61.193   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
2011-04-27T20:21:18.300055+00:00 10.202.61.193 10.202.61.193   at java.lang.Thread.run(Thread.java:636)
2011-04-27T20:21:18.300555+00:00 10.202.61.193 [MiscStage: 97] Fatal exception in thread Thread[MiscStage:97,5,main]

I see it coming from
32 public class ReplicationFinishedVerbHandler implements IVerbHandler
33 {
34     private static Logger logger = LoggerFactory.getLogger(ReplicationFinishedVerbHandler.class);
35
36     public void doVerb(Message msg, String id)
37     {
38         StorageService.instance.confirmReplication(msg.getFrom());
39         Message response = msg.getInternalReply(ArrayUtils.EMPTY_BYTE_ARRAY);
40         if (logger.isDebugEnabled())
41             logger.debug("Replying to " + id + "@" + msg.getFrom());
42         MessagingService.instance().sendReply(response, id, msg.getFrom());
43     }
44 }

Before I dig deeper in the code, has anybody dealt with this before?

Thanks,

-- 
Alexis Lê-Quôc

Re: 0.7.4: Replication assertion error after removetoken, removetoken force and a restart

Posted by aaron morton <aa...@thelastpickle.com>.

There is some confusion in the ring about nodes leaving. Run nodetool ring against every node and check that they all agree. Also check the logs for any indication of which node is sending the wrong message.

Without knowing more, you could try a rolling restart, but if the ring state differs between nodes you may need a full restart; see http://www.datastax.com/docs/0.7/troubleshooting/index#view-of-ring-differs-between-some-nodes
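
To make the "do all nodes agree" check a bit more concrete, here is a minimal sketch in Java. It assumes you have already collected, for each node, the set of endpoints that node lists in its ring view (for example by running nodetool ring against each host). RingViewCheck and its method are hypothetical names used only for illustration, and parsing the nodetool output is left out on purpose.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical helper, not part of Cassandra: compares ring views gathered
// from each node and reports which endpoints each observer does not list.
final class RingViewCheck {
    // views: observer node -> endpoints that observer currently sees in the ring.
    // Returns observer -> endpoints seen by some other node but missing from
    // this observer's view. An empty result means all views agree.
    static Map<String, Set<String>> missingPerObserver(Map<String, Set<String>> views) {
        // Union of every endpoint that any node believes is in the ring.
        Set<String> union = new TreeSet<>();
        for (Set<String> view : views.values()) {
            union.addAll(view);
        }
        Map<String, Set<String>> result = new TreeMap<>();
        for (Map.Entry<String, Set<String>> e : views.entrySet()) {
            Set<String> missing = new TreeSet<>(union);
            missing.removeAll(e.getValue());
            if (!missing.isEmpty()) {
                result.put(e.getKey(), missing);
            }
        }
        return result;
    }
}
```

A non-empty result only tells you that the views differ, not which node is right. For instance, a node that still lists a removed endpoint will make every other node show up as "missing" that endpoint.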

Hope that helps. 
 
-----------------
Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com

On 21/08/2011, at 5:38 AM, Anand Somani wrote:

> 0.7.4 / 3-node cluster / RF 3 / QUORUM reads and writes
> 
> I re-introduced a corrupted node, following the process listed on the operations wiki for handling failures (thanks to the folks on the mailing list for helping me).
> I am still running a cleanup on one node at this point, but I noticed that this same exception appears 10 to 12 times a minute on an existing node (not the new one). I think it started around the removetoken.
> 
> How do I solve this? Should I just restart this node? Are there any other cleanups/resets I need to do?
> 
> Thanks
> 
> 
> On Thu, Apr 28, 2011 at 2:26 AM, aaron morton <aa...@thelastpickle.com> wrote:
> I *think* that code is used when one node tells the others via gossip that it is removing a token that is not its own. A node that receives this information in gossip does some work and then replies to the first node with a REPLICATION_FINISHED message; the first node is where I assume the error is happening.
> 
> Have you been doing any moves, removes or additions of tokens/nodes?
> 
> Thanks
> Aaron
> 

Re: 0.7.4: Replication assertion error after removetoken, removetoken force and a restart

Posted by Anand Somani <me...@gmail.com>.

0.7.4 / 3-node cluster / RF 3 / QUORUM reads and writes

I re-introduced a corrupted node, following the process listed on the
operations wiki for handling failures (thanks to the folks on the mailing
list for helping me).
I am still running a cleanup on one node at this point, but I noticed that
this same exception appears 10 to 12 times a minute on an existing node
(not the new one). I think it started around the removetoken.

How do I solve this? Should I just restart this node? Are there any other
cleanups/resets I need to do?

Thanks


On Thu, Apr 28, 2011 at 2:26 AM, aaron morton <aa...@thelastpickle.com> wrote:

> I *think* that code is used when one node tells the others via gossip that
> it is removing a token that is not its own. A node that receives this
> information in gossip does some work and then replies to the first node
> with a REPLICATION_FINISHED message; the first node is where I assume the
> error is happening.
>
> Have you been doing any moves, removes or additions of tokens/nodes?
>
> Thanks
> Aaron
>

Re: 0.7.4: Replication assertion error after removetoken, removetoken force and a restart

Posted by aaron morton <aa...@thelastpickle.com>.

I *think* that code is used when one node tells the others via gossip that it is removing a token that is not its own. A node that receives this information in gossip does some work and then replies to the first node with a REPLICATION_FINISHED message; the first node is where I assume the error is happening.

Have you been doing any moves, removes or additions of tokens/nodes?
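
To illustrate the handshake described above, here is a rough Java model of the bookkeeping the first node might do. This is a guess written for illustration, not taken from the 0.7.4 source: RemovalTracker, its methods, and the idea that a confirmation is rejected when no removal is being tracked are all assumptions. It uses an explicit exception rather than a Java assert so the failure is visible without the -ea flag.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical model of the coordinating node's state; not Cassandra code.
final class RemovalTracker {
    // Nodes that still owe a REPLICATION_FINISHED-style confirmation.
    private final Set<String> pending = new HashSet<>();

    // Called on the coordinating node when it starts removing a token:
    // remember which nodes have to finish their work and reply.
    void startRemoval(Set<String> nodesExpectedToReply) {
        pending.addAll(nodesExpectedToReply);
    }

    // Called when a confirmation arrives. If nothing is being tracked
    // (assumed here to be an error), the confirmation is rejected.
    void confirmReplication(String from) {
        if (pending.isEmpty()) {
            throw new IllegalStateException(
                "confirmation from " + from + " but no removal in progress");
        }
        pending.remove(from);
    }

    // True once every expected node has confirmed.
    boolean isComplete() {
        return pending.isEmpty();
    }
}
```

Under these assumptions, a node that lost its in-memory state (for example after a restart) while other nodes keep sending confirmations would hit the error path on every message. That would be consistent with an error repeating every few seconds, but it remains a hypothesis to check against the actual code.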

Thanks
Aaron
