Posted to user@cassandra.apache.org by Jonathan Colby <jo...@gmail.com> on 2011/04/08 16:19:04 UTC

Is the repair still going on or did it fail because of exceptions?

It seems there are a few unserializable rows on my cluster. I'm trying to run a repair on the nodes, but the replica nodes also seem to have unreadable or unserializable rows. The problem is that I cannot determine whether the repair is still going on or whether it was interrupted because of these errors. It is unclear because nothing else related to the repair shows up in the logs. It's been about 5 hours, and I also don't see anything happening when I run "nodetool netstats" on the nodes. The nodetool repair command is still blocking at the console.

On the node I'm trying to repair, I see this after launching a "repair":

...
 INFO [manual-repair-6160b400-2c82-4ccb-9451-79caafd7d3cc] 2011-04-08 11:41:55,520 AntiEntropyService.java (line 770) Waiting for repair requests: [#<TreeRequest manual-repair-6160b400-2c82-4ccb-9451-79caafd7d3cc, /10.46.108.102, (DFS,main)>, #<TreeRequest manual-repair-6160b400-2c82-4ccb-9451-79caafd7d3cc, /10.46.108.101, (DFS,main)>, #<TreeRequest manual-repair-6160b400-2c82-4ccb-9451-79caafd7d3cc, /10.46.108.100, (DFS,main)>, #<TreeRequest manual-repair-6160b400-2c82-4ccb-9451-79caafd7d3cc, /10.47.108.101, (DFS,main)>]
...

In the log of node 10.46.108.102, where the repair tries to compare the replica data, I see a couple of the exceptions below a few minutes later. Are these exceptions bad enough to cause the repair to fail?


ERROR [CompactionExecutor:1] 2011-04-08 11:43:01,177 PrecompactedRow.java (line 82) Skipping row DecoratedKey(1782314446006375058060694305099335169, 4d657373616765456e726963686d656e743a31343236) in /var/lib/cassandra/data/DFS/main-f-177-Data.db
java.io.EOFException
        at java.io.RandomAccessFile.readFully(RandomAccessFile.java:383)
        at java.io.RandomAccessFile.readFully(RandomAccessFile.java:361)
        at org.apache.cassandra.io.util.BufferedRandomAccessFile.readBytes(BufferedRandomAccessFile.java:268)
        at org.apache.cassandra.utils.ByteBufferUtil.read(ByteBufferUtil.java:310)
        at org.apache.cassandra.utils.ByteBufferUtil.readWithLength(ByteBufferUtil.java:267)
        at org.apache.cassandra.db.ColumnSerializer.deserialize(ColumnSerializer.java:94)
        at org.apache.cassandra.db.ColumnSerializer.deserialize(ColumnSerializer.java:35)
        at org.apache.cassandra.db.ColumnFamilySerializer.deserializeColumns(ColumnFamilySerializer.java:129)
        at org.apache.cassandra.io.sstable.SSTableIdentityIterator.getColumnFamilyWithColumns(SSTableIdentityIterator.java:176)
        at org.apache.cassandra.io.PrecompactedRow.<init>(PrecompactedRow.java:78)
        at org.apache.cassandra.io.CompactionIterator.getCompactedRow(CompactionIterator.java:139)
        at org.apache.cassandra.io.CompactionIterator.getReduced(CompactionIterator.java:108)
        at org.apache.cassandra.io.CompactionIterator.getReduced(CompactionIterator.java:43)
        at org.apache.cassandra.utils.ReducingIterator.computeNext(ReducingIterator.java:73)
        at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:136)
        at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:131)
        at org.apache.commons.collections.iterators.FilterIterator.setNextObject(FilterIterator.java:183)
        at org.apache.commons.collections.iterators.FilterIterator.hasNext(FilterIterator.java:94)
        at org.apache.cassandra.db.CompactionManager.doValidationCompaction(CompactionManager.java:803)
        at org.apache.cassandra.db.CompactionManager.access$800(CompactionManager.java:56)
        at org.apache.cassandra.db.CompactionManager$6.call(CompactionManager.java:358)
        at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
        at java.util.concurrent.FutureTask.run(FutureTask.java:138)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
        at java.lang.Thread.run(Thread.java:662)
ERROR [CompactionExecutor:1] 2011-04-08 11:43:53,762 PrecompactedRow.java (line 82) Skipping row DecoratedKey(8073554114801607394928746621229606383, 34393734663734382d316330302d346164372d613333372d316234303866613933333832) in /var/lib/cassandra/data/DFS/main-f-177-Data.db
java.io.EOFException
        at java.io.RandomAccessFile.readFully(RandomAccessFile.java:383)
        at java.io.RandomAccessFile.readFully(RandomAccessFile.java:361)
        at org.apache.cassandra.io.util.BufferedRandomAccessFile.readBytes(BufferedRandomAccessFile.java:268)
        at org.apache.cassandra.utils.ByteBufferUtil.read(ByteBufferUtil.java:310)
        at org.apache.cassandra.utils.ByteBufferUtil.readWithLength(ByteBufferUtil.java:267)
:
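
For anyone triaging similar output, here is a minimal Python sketch (not part of the original message) that pulls the affected rows out of "Skipping row DecoratedKey(...)" lines and decodes the hex row key. The names SKIP_RE and skipped_rows are hypothetical, and the sketch assumes each log entry sits on one unwrapped line, as it would in the node's own log file rather than in a wrapped e-mail.

```python
import re

# Matches "Skipping row DecoratedKey(<token>, <hex key>) in <sstable path>".
# Assumption: the log entry is on a single line (no mail-client wrapping).
SKIP_RE = re.compile(r"Skipping row DecoratedKey\((\d+), ([0-9a-fA-F]+)\) in (\S+)")

def skipped_rows(log_text):
    """Yield (token, decoded_key, sstable_path) for each skipped-row entry."""
    for token, hex_key, path in SKIP_RE.findall(log_text):
        raw = bytes.fromhex(hex_key)
        # Row keys are arbitrary bytes; keep the hex form if they are not UTF-8 text.
        try:
            key = raw.decode("utf-8")
        except UnicodeDecodeError:
            key = hex_key
        yield int(token), key, path

sample = (
    "ERROR [CompactionExecutor:1] 2011-04-08 11:43:01,177 PrecompactedRow.java (line 82) "
    "Skipping row DecoratedKey(1782314446006375058060694305099335169, "
    "4d657373616765456e726963686d656e743a31343236) in "
    "/var/lib/cassandra/data/DFS/main-f-177-Data.db"
)
rows = list(skipped_rows(sample))
print(rows[0][1])  # prints MessageEnrichment:1426
```

Decoding the key shows which application row was skipped, and the path shows which SSTable holds it.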

nodetool netstats reports:

Mode: Normal
Not sending any streams.
Not receiving any streams.
Pool Name                    Active   Pending      Completed
Commands                        n/a         0         526207
Responses                       n/a         0        1747991 

Re: Is the repair still going on or did it fail because of exceptions?

Posted by Sylvain Lebresne <sy...@datastax.com>.
Sadly, repair isn't very resilient to errors, and it has failed. There
are a few tickets open to improve this and repair in general, but right
now, if any problem occurs during a repair, it will fail (and nodetool
repair won't return, so you can just Ctrl-C it).

Provided you're on a recent enough Cassandra, I suggest you run scrub
on the node giving the error.
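
As an illustration of the suggestion above (an editorial sketch, not part of the original reply), the Python below derives the keyspace and column family from the SSTable path in the error and assembles a scrub invocation. The helper name scrub_command is hypothetical. The sketch assumes the data layout <data_dir>/<keyspace>/<columnfamily>-<version>-<generation>-Data.db seen in the logs, and assumes the installed nodetool accepts "scrub <keyspace> <columnfamily>"; check nodetool's own help output for your version before running anything.

```python
import os

def scrub_command(host, sstable_path):
    """Build a nodetool scrub argument list for the column family owning an SSTable.

    Assumes the path looks like <data_dir>/<keyspace>/<cf>-<version>-<gen>-Data.db.
    """
    keyspace = os.path.basename(os.path.dirname(sstable_path))
    column_family = os.path.basename(sstable_path).split("-")[0]
    return ["nodetool", "-h", host, "scrub", keyspace, column_family]

# Host and path taken from the error messages in this thread.
cmd = scrub_command("10.46.108.102", "/var/lib/cassandra/data/DFS/main-f-177-Data.db")
print(" ".join(cmd))  # prints nodetool -h 10.46.108.102 scrub DFS main
```

The function only builds the command; it does not execute it, so the result can be reviewed first.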

--
Sylvain
