Posted to user@cassandra.apache.org by Héctor Izquierdo Seliva <iz...@strands.com> on 2011/07/08 18:38:54 UTC

Corrupted data

Hi everyone,

I'm having thousands of these errors:

 WARN [CompactionExecutor:1] 2011-07-08 16:36:45,705
CompactionManager.java (line 737) Non-fatal error reading row
(stacktrace follows)
java.io.IOError: java.io.IOException: Impossible row size
6292724931198053
	at
org.apache.cassandra.db.compaction.CompactionManager.scrubOne(CompactionManager.java:719)
	at
org.apache.cassandra.db.compaction.CompactionManager.doScrub(CompactionManager.java:633)
	at org.apache.cassandra.db.compaction.CompactionManager.access
$600(CompactionManager.java:65)
	at org.apache.cassandra.db.compaction.CompactionManager
$3.call(CompactionManager.java:250)
	at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
	at java.util.concurrent.FutureTask.run(FutureTask.java:138)
	at java.util.concurrent.ThreadPoolExecutor
$Worker.runTask(ThreadPoolExecutor.java:886)
	at java.util.concurrent.ThreadPoolExecutor
$Worker.run(ThreadPoolExecutor.java:908)
	at java.lang.Thread.run(Thread.java:662)
Caused by: java.io.IOException: Impossible row size 6292724931198053
	... 9 more
 INFO [CompactionExecutor:1] 2011-07-08 16:36:45,705
CompactionManager.java (line 743) Retrying from row index; data is -8
bytes starting at 4735525245
 WARN [CompactionExecutor:1] 2011-07-08 16:36:45,705
CompactionManager.java (line 767) Retry failed too.  Skipping to next
row (retry's stacktrace follows)
java.io.IOError: java.io.EOFException: bloom filter claims to be
863794556 bytes, longer than entire row size -8


This is during scrub, which I ran because I saw similar errors during normal
operation. Is there anything I can do? It looks like I'm going to lose a ton
of data.


Re: Corrupted data

Posted by Jonathan Ellis <jb...@gmail.com>.
Sounds like your non-repair workload is using too much of the heap.

Alternatively, you could have a very large supercolumn that causes the
OOM when it is read.
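
One rough way to check for an oversized row is nodetool cfstats, e.g. (the
host here is a placeholder for one of your nodes):

    nodetool -h localhost cfstats

and look at the compacted row size figures (min/max/mean) it reports per
column family; a huge maximum would point at a very large row or supercolumn.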

2011/7/9 Héctor Izquierdo Seliva <iz...@strands.com>:
> Hi Peter.
>
>  I have a problem with repair, and it's that it always brings the node
> doing the repairs down. I've tried setting index_interval to 5000, and
> it still dies with OutOfMemory errors, or even worse, it generates
> thousands of tiny sstables before dying.
>
> I've tried like 20 repairs during this week. None of them finished. This
> is on a 16GB machine using 12GB heap so it doesn't crash (too early).
>
>
> On Sat, 09-07-2011 at 16:16 +0200, Peter Schuller wrote:
>> >> - Have you been running repair consistently ?
>> >
>> > Nop, only when something breaks
>>
>> This is unrelated to the problem you were asking about, but if you
>> never run delete, make sure you are aware of:
>>
>> http://wiki.apache.org/cassandra/Operations#Frequency_of_nodetool_repair
>> http://wiki.apache.org/cassandra/DistributedDeletes
>>
>>
>
>
>



-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com

Re: Corrupted data

Posted by Héctor Izquierdo Seliva <iz...@strands.com>.
All the important stuff is using QUORUM. Normal operation uses around
3-4 GB of heap out of 6. I've also tried running repair on a per CF
basis, and still no luck. I've found it's faster to bootstrap a node again
than to repair it.

Once I have the cluster in a sane state I'll try running a repair as part
of normal operation and see if it manages to finish.

Btw, we are not using super columns.

Thanks for the tips

On Sat, 09-07-2011 at 17:57 -0700, aaron morton wrote:
> > Nop, only when something breaks
> Unless you've been working at QUORUM, life is about to get trickier. Repair is an essential part of running a Cassandra cluster; without it you risk data loss and dead data coming back to life.
> 
> If you have been writing at QUORUM, and so have a reasonable expectation of data replication, the normal approach is to let scrub skip the rows; after scrub has completed, a repair will restore the data from one of the other replicas. That has probably already happened, as the scrub process skipped the rows when writing them out to the new files.
> 
> Try to run repair. Try running it on a single CF to start with.
> 
> 
> Good luck
> 
> -----------------
> Aaron Morton
> Freelance Cassandra Developer
> @aaronmorton
> http://www.thelastpickle.com
> 
> On 9 Jul 2011, at 16:45, Héctor Izquierdo Seliva wrote:
> 
> > Hi Peter.
> > 
> > I have a problem with repair, and it's that it always brings the node
> > doing the repairs down. I've tried setting index_interval to 5000, and
> > it still dies with OutOfMemory errors, or even worse, it generates
> > thousands of tiny sstables before dying.
> > 
> > I've tried like 20 repairs during this week. None of them finished. This
> > is on a 16GB machine using 12GB heap so it doesn't crash (too early).
> > 
> > 
> > On Sat, 09-07-2011 at 16:16 +0200, Peter Schuller wrote:
> >>>> - Have you been running repair consistently ?
> >>> 
> >>> Nop, only when something breaks
> >> 
> >> This is unrelated to the problem you were asking about, but if you
> >> never run delete, make sure you are aware of:
> >> 
> >> http://wiki.apache.org/cassandra/Operations#Frequency_of_nodetool_repair
> >> http://wiki.apache.org/cassandra/DistributedDeletes
> >> 
> >> 
> > 
> > 
> 



Re: Corrupted data

Posted by aaron morton <aa...@thelastpickle.com>.
> Nop, only when something breaks
Unless you've been working at QUORUM, life is about to get trickier. Repair is an essential part of running a Cassandra cluster; without it you risk data loss and dead data coming back to life.

If you have been writing at QUORUM, and so have a reasonable expectation of data replication, the normal approach is to let scrub skip the rows; after scrub has completed, a repair will restore the data from one of the other replicas. That has probably already happened, as the scrub process skipped the rows when writing them out to the new files.

Try to run repair. Try running it on a single CF to start with.
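
Something along these lines, substituting your keyspace and column family
names:

    nodetool -h localhost repair MyKeyspace MyColumnFamily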


Good luck

-----------------
Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com

On 9 Jul 2011, at 16:45, Héctor Izquierdo Seliva wrote:

> Hi Peter.
> 
> I have a problem with repair, and it's that it always brings the node
> doing the repairs down. I've tried setting index_interval to 5000, and
> it still dies with OutOfMemory errors, or even worse, it generates
> thousands of tiny sstables before dying.
> 
> I've tried like 20 repairs during this week. None of them finished. This
> is on a 16GB machine using 12GB heap so it doesn't crash (too early).
> 
> 
> On Sat, 09-07-2011 at 16:16 +0200, Peter Schuller wrote:
>>>> - Have you been running repair consistently ?
>>> 
>>> Nop, only when something breaks
>> 
>> This is unrelated to the problem you were asking about, but if you
>> never run delete, make sure you are aware of:
>> 
>> http://wiki.apache.org/cassandra/Operations#Frequency_of_nodetool_repair
>> http://wiki.apache.org/cassandra/DistributedDeletes
>> 
>> 
> 
> 


Re: Corrupted data

Posted by Héctor Izquierdo Seliva <iz...@strands.com>.
Hi Peter.

I have a problem with repair: it always brings down the node doing the
repair. I've tried setting index_interval to 5000, and
it still dies with OutOfMemory errors, or even worse, it generates
thousands of tiny sstables before dying.
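
That's the index_interval setting in cassandra.yaml; I believe the default is
128, and I raised it to cut down index memory:

    # cassandra.yaml (excerpt)
    index_interval: 5000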

I've tried around 20 repairs this week. None of them finished. This is on a
16GB machine using a 12GB heap so it doesn't crash (too early).


On Sat, 09-07-2011 at 16:16 +0200, Peter Schuller wrote:
> >> - Have you been running repair consistently ?
> >
> > Nop, only when something breaks
> 
> This is unrelated to the problem you were asking about, but if you
> never run delete, make sure you are aware of:
> 
> http://wiki.apache.org/cassandra/Operations#Frequency_of_nodetool_repair
> http://wiki.apache.org/cassandra/DistributedDeletes
> 
> 



Re: Corrupted data

Posted by Yan Chunlu <sp...@gmail.com>.
Oh, the error seems to come from JMX.


Sorry, but it seems I don't have any more error messages; the node repair
just never ends... and stracing the process turns up nothing, it is not
doing anything.

Is there any way to get more information about this? Do I need to do a major
compaction on every column family? Thanks!
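
By "major compaction" I mean something like the following, run per keyspace
(the keyspace name here is just a placeholder):

    nodetool -h localhost compact MyKeyspace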

On Mon, Jul 11, 2011 at 1:36 AM, aaron morton <aa...@thelastpickle.com>wrote:

> 1) do I need to treat every node as failure and do a rolling replacement?
>  since there might be some inconsistent in the cluster even I have no way to
> find out.
>
> see
> http://wiki.apache.org/cassandra/Operations#Dealing_with_the_consequences_of_nodetool_repair_not_running_within_GCGraceSeconds
>
>
> 2) is that the reason that caused the node repair hung? the log message
> says:
> Jul 10, 2011 4:40:35 AM ClientCommunicatorAdmin Checker-run
> WARNING: Failed to check the connection: java.net.SocketTimeoutException:
> Read timed out
>
> I cannot find that anywhere in the code base, can you provide some more
> information ?
>
> Cheers
>
>  -----------------
> Aaron Morton
> Freelance Cassandra Developer
> @aaronmorton
> http://www.thelastpickle.com
>
> On 10 Jul 2011, at 03:26, Yan Chunlu wrote:
>
> I am running RF=2(I have changed it from 2->3 and back to 2) and 3 nodes
> and didn't running node repair more than 10 days, did not aware of this is
> critical.  I run node repair recently and one of the node always hung...
> from log it seems doing nothing related to the repair.
>
> so I got two problems:
>
> 1) do I need to treat every node as failure and do a rolling replacement?
>  since there might be some inconsistent in the cluster even I have no way to
> find out.
> 2) is that the reason that caused the node repair hung? the log message
> says:
> Jul 10, 2011 4:40:35 AM ClientCommunicatorAdmin Checker-run
> WARNING: Failed to check the connection: java.net.SocketTimeoutException:
> Read timed out
>
> then nothing.
>
> thanks!
>
> On Sat, Jul 9, 2011 at 10:16 PM, Peter Schuller <
> peter.schuller@infidyne.com> wrote:
>
>> >> - Have you been running repair consistently ?
>> >
>> > Nop, only when something breaks
>>
>> This is unrelated to the problem you were asking about, but if you
>> never run delete, make sure you are aware of:
>>
>> http://wiki.apache.org/cassandra/Operations#Frequency_of_nodetool_repair
>> http://wiki.apache.org/cassandra/DistributedDeletes
>>
>>
>> --
>> / Peter Schuller
>>
>
>
>
> --
> 闫春路
>
>
>


-- 
Charles

Re: Corrupted data

Posted by Yan Chunlu <sp...@gmail.com>.
It has already been running for about 20 hours...

On Mon, Jul 11, 2011 at 1:36 AM, aaron morton <aa...@thelastpickle.com>wrote:

> 1) do I need to treat every node as failure and do a rolling replacement?
>  since there might be some inconsistent in the cluster even I have no way to
> find out.
>
> see
> http://wiki.apache.org/cassandra/Operations#Dealing_with_the_consequences_of_nodetool_repair_not_running_within_GCGraceSeconds
>
>
> 2) is that the reason that caused the node repair hung? the log message
> says:
> Jul 10, 2011 4:40:35 AM ClientCommunicatorAdmin Checker-run
> WARNING: Failed to check the connection: java.net.SocketTimeoutException:
> Read timed out
>
> I cannot find that anywhere in the code base, can you provide some more
> information ?
>
> Cheers
>
>  -----------------
> Aaron Morton
> Freelance Cassandra Developer
> @aaronmorton
> http://www.thelastpickle.com
>
> On 10 Jul 2011, at 03:26, Yan Chunlu wrote:
>
> I am running RF=2(I have changed it from 2->3 and back to 2) and 3 nodes
> and didn't running node repair more than 10 days, did not aware of this is
> critical.  I run node repair recently and one of the node always hung...
> from log it seems doing nothing related to the repair.
>
> so I got two problems:
>
> 1) do I need to treat every node as failure and do a rolling replacement?
>  since there might be some inconsistent in the cluster even I have no way to
> find out.
> 2) is that the reason that caused the node repair hung? the log message
> says:
> Jul 10, 2011 4:40:35 AM ClientCommunicatorAdmin Checker-run
> WARNING: Failed to check the connection: java.net.SocketTimeoutException:
> Read timed out
>
> then nothing.
>
> thanks!
>
> On Sat, Jul 9, 2011 at 10:16 PM, Peter Schuller <
> peter.schuller@infidyne.com> wrote:
>
>> >> - Have you been running repair consistently ?
>> >
>> > Nop, only when something breaks
>>
>> This is unrelated to the problem you were asking about, but if you
>> never run delete, make sure you are aware of:
>>
>> http://wiki.apache.org/cassandra/Operations#Frequency_of_nodetool_repair
>> http://wiki.apache.org/cassandra/DistributedDeletes
>>
>>
>> --
>> / Peter Schuller
>>
>
>
>
> --
> 闫春路
>
>
>


-- 
Charles

Re: Corrupted data

Posted by aaron morton <aa...@thelastpickle.com>.
> 1) do I need to treat every node as failure and do a rolling replacement?  since there might be some inconsistent in the cluster even I have no way to find out.
see http://wiki.apache.org/cassandra/Operations#Dealing_with_the_consequences_of_nodetool_repair_not_running_within_GCGraceSeconds

> 2) is that the reason that caused the node repair hung? the log message says:
> Jul 10, 2011 4:40:35 AM ClientCommunicatorAdmin Checker-run
> WARNING: Failed to check the connection: java.net.SocketTimeoutException: Read timed out
I cannot find that anywhere in the code base; can you provide some more information?
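
In the meantime, nodetool compactionstats and nodetool netstats should show
whether the node is actually doing repair work (validation compactions and
streaming), e.g. with the host as a placeholder:

    nodetool -h localhost compactionstats
    nodetool -h localhost netstats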

Cheers

-----------------
Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com

On 10 Jul 2011, at 03:26, Yan Chunlu wrote:

> I am running RF=2(I have changed it from 2->3 and back to 2) and 3 nodes and didn't running node repair more than 10 days, did not aware of this is critical.  I run node repair recently and one of the node always hung... from log it seems doing nothing related to the repair.
> 
> so I got two problems:
> 
> 1) do I need to treat every node as failure and do a rolling replacement?  since there might be some inconsistent in the cluster even I have no way to find out.
> 2) is that the reason that caused the node repair hung? the log message says:
> Jul 10, 2011 4:40:35 AM ClientCommunicatorAdmin Checker-run
> WARNING: Failed to check the connection: java.net.SocketTimeoutException: Read timed out
> 
> then nothing.
> 
> thanks!
> 
> On Sat, Jul 9, 2011 at 10:16 PM, Peter Schuller <pe...@infidyne.com> wrote:
> >> - Have you been running repair consistently ?
> >
> > Nop, only when something breaks
> 
> This is unrelated to the problem you were asking about, but if you
> never run delete, make sure you are aware of:
> 
> http://wiki.apache.org/cassandra/Operations#Frequency_of_nodetool_repair
> http://wiki.apache.org/cassandra/DistributedDeletes
> 
> 
> --
> / Peter Schuller
> 
> 
> 
> -- 
> 闫春路


Re: Corrupted data

Posted by Yan Chunlu <sp...@gmail.com>.
I am running RF=2 (I changed it from 2 to 3 and back to 2) on 3 nodes, and
haven't run node repair for more than 10 days; I was not aware this is
critical. I ran node repair recently, and one of the nodes always hangs...
from the log it seems to be doing nothing related to the repair.

So I have two problems:

1) Do I need to treat every node as failed and do a rolling replacement,
since there might be some inconsistency in the cluster that I have no way to
find out about?
2) Is that the reason the node repair hangs? The log message says:
Jul 10, 2011 4:40:35 AM ClientCommunicatorAdmin Checker-run
WARNING: Failed to check the connection: java.net.SocketTimeoutException:
Read timed out

Then nothing.

Thanks!

On Sat, Jul 9, 2011 at 10:16 PM, Peter Schuller <peter.schuller@infidyne.com
> wrote:

> >> - Have you been running repair consistently ?
> >
> > Nop, only when something breaks
>
> This is unrelated to the problem you were asking about, but if you
> never run delete, make sure you are aware of:
>
> http://wiki.apache.org/cassandra/Operations#Frequency_of_nodetool_repair
> http://wiki.apache.org/cassandra/DistributedDeletes
>
>
> --
> / Peter Schuller
>



-- 
闫春路

Re: Corrupted data

Posted by Peter Schuller <pe...@infidyne.com>.
>> - Have you been running repair consistently ?
>
> Nop, only when something breaks

This is unrelated to the problem you were asking about, but if you
never run delete, make sure you are aware of:

http://wiki.apache.org/cassandra/Operations#Frequency_of_nodetool_repair
http://wiki.apache.org/cassandra/DistributedDeletes
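
In practice that boils down to making sure every node is repaired within GC
grace seconds; a common approach is a scheduled nodetool repair, e.g. a
weekly cron entry roughly like this (host and keyspace are placeholders):

    0 3 * * 0  nodetool -h localhost repair MyKeyspace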


-- 
/ Peter Schuller

Re: Corrupted data

Posted by Héctor Izquierdo Seliva <iz...@strands.com>.
Hi Aaron,

On Fri, 08-07-2011 at 14:47 -0700, aaron morton wrote:
> You may not lose data. 
> 
> - What version, and what's the upgrade history?

All versions from 0.7.1 to 0.8.1. All CFs were in the 0.8.1 format, though.

> - What RF / node count / CL  ?

RF=3, node count = 6
> - Have you been running repair consistently ?

Nop, only when something breaks

> - Is this on a single node or all nodes ?

A couple of nodes. Scrub reported there were a few thousand columns it
could not restore.
> 
> Cheers
> 
> -----------------
> Aaron Morton
> Freelance Cassandra Developer
> @aaronmorton
> http://www.thelastpickle.com
> 
> On 8 Jul 2011, at 09:38, Héctor Izquierdo Seliva wrote:
> 
> > Hi everyone,
> > 
> > I'm having thousands of these errors:
> > 
> > WARN [CompactionExecutor:1] 2011-07-08 16:36:45,705
> > CompactionManager.java (line 737) Non-fatal error reading row
> > (stacktrace follows)
> > java.io.IOError: java.io.IOException: Impossible row size
> > 6292724931198053
> > 	at
> > org.apache.cassandra.db.compaction.CompactionManager.scrubOne(CompactionManager.java:719)
> > 	at
> > org.apache.cassandra.db.compaction.CompactionManager.doScrub(CompactionManager.java:633)
> > 	at org.apache.cassandra.db.compaction.CompactionManager.access
> > $600(CompactionManager.java:65)
> > 	at org.apache.cassandra.db.compaction.CompactionManager
> > $3.call(CompactionManager.java:250)
> > 	at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
> > 	at java.util.concurrent.FutureTask.run(FutureTask.java:138)
> > 	at java.util.concurrent.ThreadPoolExecutor
> > $Worker.runTask(ThreadPoolExecutor.java:886)
> > 	at java.util.concurrent.ThreadPoolExecutor
> > $Worker.run(ThreadPoolExecutor.java:908)
> > 	at java.lang.Thread.run(Thread.java:662)
> > Caused by: java.io.IOException: Impossible row size 6292724931198053
> > 	... 9 more
> > INFO [CompactionExecutor:1] 2011-07-08 16:36:45,705
> > CompactionManager.java (line 743) Retrying from row index; data is -8
> > bytes starting at 4735525245
> > WARN [CompactionExecutor:1] 2011-07-08 16:36:45,705
> > CompactionManager.java (line 767) Retry failed too.  Skipping to next
> > row (retry's stacktrace follows)
> > java.io.IOError: java.io.EOFException: bloom filter claims to be
> > 863794556 bytes, longer than entire row size -8
> > 
> > 
> > This is during scrub, which I ran because I saw similar errors during
> > normal operation. Is there anything I can do? It looks like I'm going to
> > lose a ton of data.
> > 
> 



Re: Corrupted data

Posted by aaron morton <aa...@thelastpickle.com>.
You may not lose data. 

- What version, and what's the upgrade history?
- What RF / node count / CL?
- Have you been running repair consistently?
- Is this on a single node or all nodes?

Cheers

-----------------
Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com

On 8 Jul 2011, at 09:38, Héctor Izquierdo Seliva wrote:

> Hi everyone,
> 
> I'm having thousands of these errors:
> 
> WARN [CompactionExecutor:1] 2011-07-08 16:36:45,705
> CompactionManager.java (line 737) Non-fatal error reading row
> (stacktrace follows)
> java.io.IOError: java.io.IOException: Impossible row size
> 6292724931198053
> 	at
> org.apache.cassandra.db.compaction.CompactionManager.scrubOne(CompactionManager.java:719)
> 	at
> org.apache.cassandra.db.compaction.CompactionManager.doScrub(CompactionManager.java:633)
> 	at org.apache.cassandra.db.compaction.CompactionManager.access
> $600(CompactionManager.java:65)
> 	at org.apache.cassandra.db.compaction.CompactionManager
> $3.call(CompactionManager.java:250)
> 	at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
> 	at java.util.concurrent.FutureTask.run(FutureTask.java:138)
> 	at java.util.concurrent.ThreadPoolExecutor
> $Worker.runTask(ThreadPoolExecutor.java:886)
> 	at java.util.concurrent.ThreadPoolExecutor
> $Worker.run(ThreadPoolExecutor.java:908)
> 	at java.lang.Thread.run(Thread.java:662)
> Caused by: java.io.IOException: Impossible row size 6292724931198053
> 	... 9 more
> INFO [CompactionExecutor:1] 2011-07-08 16:36:45,705
> CompactionManager.java (line 743) Retrying from row index; data is -8
> bytes starting at 4735525245
> WARN [CompactionExecutor:1] 2011-07-08 16:36:45,705
> CompactionManager.java (line 767) Retry failed too.  Skipping to next
> row (retry's stacktrace follows)
> java.io.IOError: java.io.EOFException: bloom filter claims to be
> 863794556 bytes, longer than entire row size -8
> 
> 
> This is during scrub, which I ran because I saw similar errors during
> normal operation. Is there anything I can do? It looks like I'm going to
> lose a ton of data.
> 


Re: Corrupted data

Posted by Jonathan Ellis <jb...@gmail.com>.
That looks a lot like what I've seen from machines with bad ram.

2011/7/8 Héctor Izquierdo Seliva <iz...@strands.com>:
> Hi everyone,
>
> I'm having thousands of these errors:
>
>  WARN [CompactionExecutor:1] 2011-07-08 16:36:45,705
> CompactionManager.java (line 737) Non-fatal error reading row
> (stacktrace follows)
> java.io.IOError: java.io.IOException: Impossible row size
> 6292724931198053
>        at
> org.apache.cassandra.db.compaction.CompactionManager.scrubOne(CompactionManager.java:719)
>        at
> org.apache.cassandra.db.compaction.CompactionManager.doScrub(CompactionManager.java:633)
>        at org.apache.cassandra.db.compaction.CompactionManager.access
> $600(CompactionManager.java:65)
>        at org.apache.cassandra.db.compaction.CompactionManager
> $3.call(CompactionManager.java:250)
>        at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
>        at java.util.concurrent.FutureTask.run(FutureTask.java:138)
>        at java.util.concurrent.ThreadPoolExecutor
> $Worker.runTask(ThreadPoolExecutor.java:886)
>        at java.util.concurrent.ThreadPoolExecutor
> $Worker.run(ThreadPoolExecutor.java:908)
>        at java.lang.Thread.run(Thread.java:662)
> Caused by: java.io.IOException: Impossible row size 6292724931198053
>        ... 9 more
>  INFO [CompactionExecutor:1] 2011-07-08 16:36:45,705
> CompactionManager.java (line 743) Retrying from row index; data is -8
> bytes starting at 4735525245
>  WARN [CompactionExecutor:1] 2011-07-08 16:36:45,705
> CompactionManager.java (line 767) Retry failed too.  Skipping to next
> row (retry's stacktrace follows)
> java.io.IOError: java.io.EOFException: bloom filter claims to be
> 863794556 bytes, longer than entire row size -8
>
>
> This is during scrub, which I ran because I saw similar errors during
> normal operation. Is there anything I can do? It looks like I'm going to
> lose a ton of data.
>
>



-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com