Posted to user@cassandra.apache.org by Dan Hendry <da...@gmail.com> on 2011/01/23 21:38:43 UTC

Errors During Compaction

I have run into a strange problem and was hoping for suggestions on how to
fix it (0.7.0). When compaction runs on one node for what appears to be one
specific column family, the following error shows up in the Cassandra log.
Compaction apparently fails and its temp files don't get cleaned up. After
what seems to be multiple failed compactions of the CF, the node runs out of
disk space and crashes. I'm not sure whether this is a related problem or just
a consequence of this being a heavily used column family, but after failing,
compaction restarts on the same CF, exacerbating the issue.

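The failure loop described above, where a compaction fails partway through, leaves its temporary output on disk, and is then retried, can be sketched as follows. This is a minimal illustration under stated assumptions, not Cassandra 0.7's code: a hypothetical `RowSource` yields merged rows (and throws when it hits a corrupt input), output goes to a hypothetical `<target>-tmp` file, and that file is renamed into place only on success. The `finally` block deletes the partial file on failure, so repeated failures do not pile up temp files.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Illustrative sketch only; names and file layout are hypothetical, not Cassandra's.
final class TempFileCompaction {

    // Hypothetical source of merged rows; a corrupt input SSTable would make it throw.
    interface RowSource {
        // Returns the next serialized row, or null when the input is exhausted.
        byte[] nextRowOrNull() throws IOException;
    }

    // Streams all rows into "<target>-tmp", then renames it to target.
    // If anything throws, the partial temp file is deleted before the
    // exception propagates, so retries do not accumulate orphaned temp files.
    static void compact(RowSource rows, Path target) throws IOException {
        Path tmp = target.resolveSibling(target.getFileName() + "-tmp");
        boolean success = false;
        try {
            try (OutputStream out = Files.newOutputStream(tmp)) {
                byte[] row;
                while ((row = rows.nextRowOrNull()) != null) {
                    out.write(row);
                }
            }
            // Rename into place only after all output was written.
            Files.move(tmp, target, StandardCopyOption.ATOMIC_MOVE);
            success = true;
        } finally {
            if (!success) {
                Files.deleteIfExists(tmp); // do not leak disk space on failure
            }
        }
    }
}
```

Cleaning up the partial output only keeps the disk from filling; a compaction that fails deterministically on a corrupt input would still be retried, as described above.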
 

Problems with this specific node started earlier this weekend when it
crashed with an OOM error. This is quite surprising, since my memtable
thresholds and GC settings have been tuned to leave quite a bit of headroom
during normal operation (max heap usage is usually <= 10 GB on a 12 GB heap,
with average usage of 6-8 GB). I could not find anything abnormal in the logs
that would explain an OOM.

I will look things over tomorrow and try to provide more information on the
problem. As a fix, I was going to wipe out all SSTables for this CF on this
node and then run a repair. It is far from ideal, but is this a reasonable
solution?

ERROR [CompactionExecutor:1] 2011-01-23 14:10:29,855 AbstractCassandraDaemon.java (line 91) Fatal exception in thread Thread[CompactionExecutor:1,1,RMI Runtime]
java.io.IOError: java.io.EOFException: attempted to skip -1983579368 bytes but only skipped 0
        at org.apache.cassandra.io.sstable.SSTableIdentityIterator.<init>(SSTableIdentityIterator.java:78)
        at org.apache.cassandra.io.sstable.SSTableScanner$KeyScanningIterator.next(SSTableScanner.java:178)
        at org.apache.cassandra.io.sstable.SSTableScanner$KeyScanningIterator.next(SSTableScanner.java:143)
        at org.apache.cassandra.io.sstable.SSTableScanner.next(SSTableScanner.java:135)
        at org.apache.cassandra.io.sstable.SSTableScanner.next(SSTableScanner.java:38)
        at org.apache.commons.collections.iterators.CollatingIterator.set(CollatingIterator.java:284)
        at org.apache.commons.collections.iterators.CollatingIterator.least(CollatingIterator.java:326)
        at org.apache.commons.collections.iterators.CollatingIterator.next(CollatingIterator.java:230)
        at org.apache.cassandra.utils.ReducingIterator.computeNext(ReducingIterator.java:68)
        at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:136)
        at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:131)
        at org.apache.commons.collections.iterators.FilterIterator.setNextObject(FilterIterator.java:183)
        at org.apache.commons.collections.iterators.FilterIterator.hasNext(FilterIterator.java:94)
        at org.apache.cassandra.db.CompactionManager.doCompaction(CompactionManager.java:323)
        at org.apache.cassandra.db.CompactionManager$1.call(CompactionManager.java:122)
        at org.apache.cassandra.db.CompactionManager$1.call(CompactionManager.java:92)
        at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
        at java.util.concurrent.FutureTask.run(FutureTask.java:138)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
        at java.lang.Thread.run(Thread.java:662)
Caused by: java.io.EOFException: attempted to skip -1983579368 bytes but only skipped 0
        at org.apache.cassandra.io.sstable.IndexHelper.skipBloomFilter(IndexHelper.java:52)
        at org.apache.cassandra.io.sstable.SSTableIdentityIterator.<init>(SSTableIdentityIterator.java:69)
        ... 20 more

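The `Caused by` frame points at `IndexHelper.skipBloomFilter`, and a negative byte count there is consistent with a corrupt length field: the reader appears to have taken four bytes of the row header as the size of the row's bloom filter, got a negative number, and tried to skip that many bytes. A minimal sketch, assuming a simplified layout where a block is a big-endian `int` length followed by that many bytes (an assumption for illustration, not Cassandra 0.7's actual on-disk format), reproduces the same message and shows how validating the length first would give a clearer error. `LengthPrefixedSkip`, `skipNaive`, `skipValidated` and `run` are hypothetical names.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;

// Illustrative sketch: a block stored as a big-endian int length followed by
// that many bytes. This is an assumed, simplified layout, not Cassandra's format.
final class LengthPrefixedSkip {

    // Trusts the length field. DataInputStream.skipBytes returns 0 for a
    // negative count, so a corrupt negative length yields an EOFException
    // shaped like the one in the log.
    static void skipNaive(DataInputStream in) throws IOException {
        int size = in.readInt();
        int skipped = in.skipBytes(size);
        if (skipped != size) {
            throw new EOFException("attempted to skip " + size
                    + " bytes but only skipped " + skipped);
        }
    }

    // Rejects impossible lengths up front so the error names the real problem.
    // bytesAfterHeader is how many bytes are known to follow the length field.
    static void skipValidated(DataInputStream in, long bytesAfterHeader) throws IOException {
        int size = in.readInt();
        if (size < 0 || size > bytesAfterHeader) {
            throw new IOException("corrupt length field: " + size
                    + " (at most " + bytesAfterHeader + " bytes follow)");
        }
        in.readFully(new byte[size]); // consume exactly size bytes
    }

    // Runs one of the skips over data and returns "ok" or the exception message.
    static String run(byte[] data, boolean validate) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            if (validate) {
                skipValidated(in, data.length - 4L);
            } else {
                skipNaive(in);
            }
            return "ok";
        } catch (IOException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        // 0x89C4FB18 read as a signed big-endian int is -1983579368,
        // the value reported in the log.
        byte[] corrupt = {(byte) 0x89, (byte) 0xC4, (byte) 0xFB, (byte) 0x18, 1, 2, 3};
        System.out.println(run(corrupt, false));
        System.out.println(run(corrupt, true));
    }
}
```

Running `main` prints `attempted to skip -1983579368 bytes but only skipped 0` for the naive version and `corrupt length field: -1983579368 (at most 3 bytes follow)` for the validated one. The four bytes are chosen only because they decode to the logged value; they are not the actual bytes from the damaged SSTable.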
 

Dan Hendry

(403) 660-2297

RE: Errors During Compaction

Posted by Dan Hendry <da...@gmail.com>.

Limited joy, I would say :) No long-term damage, at least.

I ended up deleting (actually moving to another disk) all the SSTables, which fixed the problem. I ran into even more problems during the repair (detailed in another recent email), but it seems to have worked regardless. Just to be safe, I am in the process of starting a ‘manual repair’ (copying SSTables for this particular CF from other nodes, then restarting and running a cleanup + major compaction).

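The workaround described here (moving this CF's SSTables off the node's data directory, then repairing) can be sketched as below. It is a minimal sketch that assumes every component file of the CF's SSTables, including leftover temp files, sits in the keyspace's data directory with a name beginning with `<CF>-`, and that Cassandra is stopped while the files are moved. `SSTableQuarantine` and `quarantine` are hypothetical names.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper; assumes SSTable files for a CF are named "<CF>-...".
final class SSTableQuarantine {

    // Moves every regular file in keyspaceDir whose name starts with
    // "<columnFamily>-" into quarantineDir and returns the moved names, sorted.
    // Assumes the node is stopped, so no file is written concurrently.
    static List<String> quarantine(Path keyspaceDir, String columnFamily, Path quarantineDir)
            throws IOException {
        Files.createDirectories(quarantineDir);
        // Collect matches first so the directory is not modified while iterating it.
        List<Path> matches = new ArrayList<>();
        try (DirectoryStream<Path> files =
                     Files.newDirectoryStream(keyspaceDir, columnFamily + "-*")) {
            for (Path p : files) {
                if (Files.isRegularFile(p)) {
                    matches.add(p);
                }
            }
        }
        List<String> moved = new ArrayList<>();
        for (Path p : matches) {
            // Move (not delete) so the suspect files can still be inspected later.
            Files.move(p, quarantineDir.resolve(p.getFileName()));
            moved.add(p.getFileName().toString());
        }
        moved.sort(null); // natural (lexicographic) order
        return moved;
    }
}
```

The trailing `-` in the glob keeps a CF named, say, `Users` from also matching a different CF whose name merely starts with `Users`. Keeping the moved files rather than deleting them matters when, as here, the root cause is still unknown.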
 

Any thoughts on what the root cause of this problem could be? It is somewhat worrying that a CF can randomly become corrupt and bring down the whole node. Cassandra's handling of a corrupt CF (however rare that may be) is less than elegant.

Dan

 

From: Aaron Morton [mailto:aaron@thelastpickle.com] 
Sent: January-25-11 16:03
To: user@cassandra.apache.org
Subject: Re: Errors During Compaction

 

Dan how did you go with this? More joy, less joy or a continuation of the current level of joy?

 

Aaron

 


Re: Errors During Compaction

Posted by Aaron Morton <aa...@thelastpickle.com>.

Dan how did you go with this? More joy, less joy or a continuation of the current level of joy?

Aaron

