Posted to commits@cassandra.apache.org by "Sylvain Lebresne (JIRA)" <ji...@apache.org> on 2011/06/20 12:46:47 UTC

[jira] [Issue Comment Edited] (CASSANDRA-2793) SSTable "Corrupt (negative) value length encountered" exception blocks compaction.

    [ https://issues.apache.org/jira/browse/CASSANDRA-2793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13051915#comment-13051915 ] 

Sylvain Lebresne edited comment on CASSANDRA-2793 at 6/20/11 10:46 AM:
-----------------------------------------------------------------------

bq. Hi the issue reported was that the sstable corruption is blocking compaction with the consequence the bucket of sstables Cassandra wants to compact just grows and you get huge cpu load (from repeated attempts at compaction and increasing read inefficiency).

This is a dupe of CASSANDRA-2261.

bq. the trace also shows that it has just skipped the corrupted row so in fact it hasn't solved the problem at all.

In most cases of corruption, there is not much more we can do than skip the row. As long as the corruption is local and you don't use RF=1, this is usually not a big deal (which does not mean corruption is something we should be happy with).
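
To illustrate what "skip the row" means, here is a minimal sketch in Java. It is not Cassandra's actual scrub code or on-disk format: the class ScrubSketch, the scrub method, and the rowOffsets array (standing in for an index of row start positions) are assumptions made up for this example. It only shows the general idea of rejecting an impossible length prefix and moving on to the next known row start.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: each "row" is a 4-byte length followed by that many bytes.
final class ScrubSketch {

    /** Reads one length-prefixed value and rejects lengths that cannot be valid. */
    static byte[] readWithLength(ByteBuffer in) throws IOException {
        if (in.remaining() < 4) {
            throw new IOException("Truncated length prefix");
        }
        int length = in.getInt();
        if (length < 0) {
            throw new IOException("Corrupt (negative) value length encountered");
        }
        if (length > in.remaining()) {
            throw new IOException("Value length exceeds remaining data");
        }
        byte[] value = new byte[length];
        in.get(value);
        return value;
    }

    /**
     * Keeps every readable row and records the offset of each unreadable one.
     * rowOffsets is an assumed, separately stored list of row start positions,
     * which is what makes it possible to resume after a damaged row.
     */
    static List<byte[]> scrub(ByteBuffer data, int[] rowOffsets, List<Integer> skipped) {
        List<byte[]> good = new ArrayList<>();
        for (int offset : rowOffsets) {
            ByteBuffer row = data.duplicate(); // independent position, shared content
            row.position(offset);
            try {
                good.add(readWithLength(row));
            } catch (IOException e) {
                skipped.add(offset); // nothing more to do locally: skip this row
            }
        }
        return good;
    }
}
```

The skipped rows are lost on this node only; with RF greater than 1 they can be streamed back from a healthy replica, as the scrub log message about nodetool repair suggests.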

bq. The corruption itself is also an issue

Corruption can take two forms: either we have a bug, or the corruption is external (a bad hard drive, for instance). Hard drive corruption does happen and there is not much we can do about it (well, actually we should use checksums to at least detect it better: CASSANDRA-1717). On the bug front, since I see this happens on a Super column family, it could be due to a race fixed by CASSANDRA-2675.
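
As a rough illustration of how a checksum helps detection, the sketch below appends a CRC32 of a block to the block itself and verifies it on read. This is not the design proposed in CASSANDRA-1717 and not Cassandra's file format; the class ChecksumSketch and the methods frame and isIntact are names invented for this example. Only java.util.zip.CRC32 and java.nio.ByteBuffer from the JDK are used.

```java
import java.nio.ByteBuffer;
import java.util.zip.CRC32;

// Hypothetical sketch: a block is stored as payload followed by an 8-byte CRC32 value.
final class ChecksumSketch {

    /** Returns the payload with its CRC32 appended as a trailing long. */
    static byte[] frame(byte[] payload) {
        CRC32 crc = new CRC32();
        crc.update(payload, 0, payload.length);
        return ByteBuffer.allocate(payload.length + 8)
                .put(payload)
                .putLong(crc.getValue())
                .array();
    }

    /** Recomputes the CRC32 over the payload and compares it with the stored one. */
    static boolean isIntact(byte[] framed) {
        if (framed.length < 8) {
            return false; // too short to even hold a checksum
        }
        int payloadLength = framed.length - 8;
        CRC32 crc = new CRC32();
        crc.update(framed, 0, payloadLength);
        long stored = ByteBuffer.wrap(framed, payloadLength, 8).getLong();
        return stored == crc.getValue();
    }
}
```

A checksum like this only detects damage; it does not say whether a bug or the disk caused it, and recovering the data still depends on another replica.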



      was (Author: slebresne):
    bq. Hi the issue reported was that the sstable corruption is blocking compaction with the consequence the bucket of sstables Cassandra wants to compact just grows and you get huge cpu load (from repeated attempts at compaction and increasing read inefficiency).

This is a dupe of https://issues.apache.org/jira/browse/CASSANDRA-2261.

bq. the trace also shows that it has just skipped the corrupted row so in fact it hasn't solved the problem at all.

In most cases of corruption, there is not much more we can do than skip the row. As long as the corruption is local and you don't use RF=1, this is usually not a big deal (which does not mean corruption is something we should be happy with).

bq. The corruption itself is also an issue

Corruption can be of two forms: either we have a bug or the corruption is external (bad hard drive for instance). Hard drive corruptions do happen and there is not much we can do about it (well, actually we should use checksums to at least better detect them: CASSANDRA-1717). On the front of a bug, since I see this happens on a Super column family, it could be due to a race fixed by CASSANDRA-2675.


  
> SSTable "Corrupt (negative) value length encountered" exception blocks compaction.
> ----------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-2793
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2793
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>    Affects Versions: 0.7.6
>         Environment: Ubuntu
>            Reporter: Dominic Williams
>
> A node was consistently experiencing high CPU load. Examination of the logs showed that compaction of an sstable was failing with an error:
>  INFO [CompactionExecutor:1] 2011-06-17 00:18:51,676 CompactionManager.java (line 395) Compacting [SSTableReader(path='/var/opt/cassandra/data/FightMyMonster/UserMonsters-f-6993-Data.db'),SSTableReader(path='/var/opt/cassandra/data/FightMyMonster/UserMonsters-f-6994-Data.db'),SSTableReader(path='/var/opt/cassandra/data/FightMyMonster/UserMonsters-f-6995-Data.db'),SSTableReader(path='/var/opt/cassandra/data/FightMyMonster/UserMonsters-f-6996-Data.db'),SSTableReader(path='/var/opt/cassandra/data/FightMyMonster/UserMonsters-f-6998-Data.db'),SSTableReader(path='/var/opt/cassandra/data/FightMyMonster/UserMonsters-f-7000-Data.db'),SSTableReader(path='/var/opt/cassandra/data/FightMyMonster/UserMonsters-f-7002-Data.db'),SSTableReader(path='/var/opt/cassandra/data/FightMyMonster/UserMonsters-f-7004-Data.db'),SSTableReader(path='/var/opt/cassandra/data/FightMyMonster/UserMonsters-f-7006-Data.db'),SSTableReader(path='/var/opt/cassandra/data/FightMyMonster/UserMonsters-f-7008-Data.db'),SSTableReader(path='/var/opt/cassandra/data/FightMyMonster/UserMonsters-f-7010-Data.db'),SSTableReader(path='/var/opt/cassandra/data/FightMyMonster/UserMonsters-f-7012-Data.db'),SSTableReader(path='/var/opt/cassandra/data/FightMyMonster/UserMonsters-f-7014-Data.db'),SSTableReader(path='/var/opt/cassandra/data/FightMyMonster/UserMonsters-f-7016-Data.db'),SSTableReader(path='/var/opt/cassandra/data/FightMyMonster/UserMonsters-f-7018-Data.db'),SSTableReader(path='/var/opt/cassandra/data/FightMyMonster/UserMonsters-f-7020-Data.db'),SSTableReader(path='/var/opt/cassandra/data/FightMyMonster/UserMonsters-f-7022-Data.db'),SSTableReader(path='/var/opt/cassandra/data/FightMyMonster/UserMonsters-f-7024-Data.db'),SSTableReader(path='/var/opt/cassandra/data/FightMyMonster/UserMonsters-f-7026-Data.db'),SSTableReader(path='/var/opt/cassandra/data/FightMyMonster/UserMonsters-f-7028-Data.db'),SSTableReader(path='/var/opt/cassandra/data/FightMyMonster/UserMonsters-f-7030-Data.db'),SSTableReader(path='/var/opt/cassandra/data/FightMyMonster/UserMonsters-f-7032-Data.db'),SSTableReader(path='/var/opt/cassandra/data/FightMyMonster/UserMonsters-f-7034-Data.db'),SSTableReader(path='/var/opt/cassandra/data/FightMyMonster/UserMonsters-f-7036-Data.db'),SSTableReader(path='/var/opt/cassandra/data/FightMyMonster/UserMonsters-f-7038-Data.db'),SSTableReader(path='/var/opt/cassandra/data/FightMyMonster/UserMonsters-f-7040-Data.db'),SSTableReader(path='/var/opt/cassandra/data/FightMyMonster/UserMonsters-f-7042-Data.db'),SSTableReader(path='/var/opt/cassandra/data/FightMyMonster/UserMonsters-f-7044-Data.db'),SSTableReader(path='/var/opt/cassandra/data/FightMyMonster/UserMonsters-f-7046-Data.db'),SSTableReader(path='/var/opt/cassandra/data/FightMyMonster/UserMonsters-f-7048-Data.db'),SSTableReader(path='/var/opt/cassandra/data/FightMyMonster/UserMonsters-f-7050-Data.db'),SSTableReader(path='/var/opt/cassandra/data/FightMyMonster/UserMonsters-f-7052-Data.db')]
> ERROR [CompactionExecutor:1] 2011-06-17 00:19:21,446 AbstractCassandraDaemon.java (line 114) Fatal exception in thread Thread[CompactionExecutor:1,1,main]
> java.io.IOError: java.io.IOException: Corrupt (negative) value length encountered
>         at org.apache.cassandra.io.util.ColumnIterator.deserializeNext(ColumnSortedMap.java:252)
>         at org.apache.cassandra.io.util.ColumnIterator.next(ColumnSortedMap.java:268)
>         at org.apache.cassandra.io.util.ColumnIterator.next(ColumnSortedMap.java:227)
>         at java.util.concurrent.ConcurrentSkipListMap.buildFromSorted(ConcurrentSkipListMap.java:1493)
>         at java.util.concurrent.ConcurrentSkipListMap.<init>(ConcurrentSkipListMap.java:1443)
>         at org.apache.cassandra.db.SuperColumnSerializer.deserialize(SuperColumn.java:379)
>         at org.apache.cassandra.db.SuperColumnSerializer.deserialize(SuperColumn.java:362)
>         at org.apache.cassandra.db.SuperColumnSerializer.deserialize(SuperColumn.java:322)
>         at org.apache.cassandra.db.ColumnFamilySerializer.deserializeColumns(ColumnFamilySerializer.java:129)
>         at org.apache.cassandra.io.sstable.SSTableIdentityIterator.getColumnFamilyWithColumns(SSTableIdentityIterator.java:201)
>         at org.apache.cassandra.io.PrecompactedRow.<init>(PrecompactedRow.java:78)
>         at org.apache.cassandra.io.CompactionIterator.getCompactedRow(CompactionIterator.java:154)
>         at org.apache.cassandra.io.CompactionIterator.getReduced(CompactionIterator.java:110)
>         at org.apache.cassandra.io.CompactionIterator.getReduced(CompactionIterator.java:45)
>         at org.apache.cassandra.utils.ReducingIterator.computeNext(ReducingIterator.java:74)
>         at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:136)
>         at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:131)
>         at org.apache.commons.collections.iterators.FilterIterator.setNextObject(FilterIterator.java:183)
>         at org.apache.commons.collections.iterators.FilterIterator.hasNext(FilterIterator.java:94)
>         at org.apache.cassandra.db.CompactionManager.doCompaction(CompactionManager.java:448)
>         at org.apache.cassandra.db.CompactionManager$1.call(CompactionManager.java:124)
>         at org.apache.cassandra.db.CompactionManager$1.call(CompactionManager.java:94)
>         at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
>         at java.util.concurrent.FutureTask.run(FutureTask.java:138)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>         at java.lang.Thread.run(Thread.java:619)
> Caused by: java.io.IOException: Corrupt (negative) value length encountered
>         at org.apache.cassandra.utils.ByteBufferUtil.readWithLength(ByteBufferUtil.java:315)
>         at org.apache.cassandra.db.ColumnSerializer.deserialize(ColumnSerializer.java:99)
>         at org.apache.cassandra.io.util.ColumnIterator.deserializeNext(ColumnSortedMap.java:248)
>         ... 26 more
> Scrub was run on the keyspace (as a last-ditch measure), but this did not work:
>  INFO [CompactionExecutor:1] 2011-06-17 00:43:42,023 CompactionManager.java (line 511) Scrubbing SSTableReader(path='/var/opt/cassandra/data/FightMyMonster/UserMonsters-f-7494-Data.db')
>  INFO [CompactionExecutor:1] 2011-06-17 00:43:43,317 CompactionManager.java (line 652) Scrub of SSTableReader(path='/var/opt/cassandra/data/FightMyMonster/UserMonsters-f-7494-Data.db') complete: 379 rows in new sstable and 0 empty (tombstoned) rows dropped
>  INFO [CompactionExecutor:1] 2011-06-17 00:43:43,317 CompactionManager.java (line 511) Scrubbing SSTableReader(path='/var/opt/cassandra/data/FightMyMonster/UserMonsters-f-6994-Data.db')
>  WARN [CompactionExecutor:1] 2011-06-17 00:43:44,516 CompactionManager.java (line 606) Non-fatal error reading row (stacktrace follows)
> java.io.IOError: java.io.IOException: Corrupt (negative) value length encountered
>         at org.apache.cassandra.io.util.ColumnIterator.deserializeNext(ColumnSortedMap.java:252)
>         at org.apache.cassandra.io.util.ColumnIterator.next(ColumnSortedMap.java:268)
>         at org.apache.cassandra.io.util.ColumnIterator.next(ColumnSortedMap.java:227)
>         at java.util.concurrent.ConcurrentSkipListMap.buildFromSorted(ConcurrentSkipListMap.java:1493)
>         at java.util.concurrent.ConcurrentSkipListMap.<init>(ConcurrentSkipListMap.java:1443)
>         at org.apache.cassandra.db.SuperColumnSerializer.deserialize(SuperColumn.java:379)
>         at org.apache.cassandra.db.SuperColumnSerializer.deserialize(SuperColumn.java:362)
>         at org.apache.cassandra.db.SuperColumnSerializer.deserialize(SuperColumn.java:322)
>         at org.apache.cassandra.db.ColumnFamilySerializer.deserializeColumns(ColumnFamilySerializer.java:129)
>         at org.apache.cassandra.io.sstable.SSTableIdentityIterator.getColumnFamilyWithColumns(SSTableIdentityIterator.java:201)
>         at org.apache.cassandra.io.PrecompactedRow.<init>(PrecompactedRow.java:78)
>         at org.apache.cassandra.db.CompactionManager.getCompactedRow(CompactionManager.java:783)
>         at org.apache.cassandra.db.CompactionManager.doScrub(CompactionManager.java:590)
>         at org.apache.cassandra.db.CompactionManager.access$600(CompactionManager.java:56)
>         at org.apache.cassandra.db.CompactionManager$3.call(CompactionManager.java:195)
>         at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
>         at java.util.concurrent.FutureTask.run(FutureTask.java:138)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>         at java.lang.Thread.run(Thread.java:619)
> Caused by: java.io.IOException: Corrupt (negative) value length encountered
>         at org.apache.cassandra.utils.ByteBufferUtil.readWithLength(ByteBufferUtil.java:315)
>         at org.apache.cassandra.db.ColumnSerializer.deserialize(ColumnSerializer.java:99)
>         at org.apache.cassandra.io.util.ColumnIterator.deserializeNext(ColumnSortedMap.java:248)
>         ... 19 more
>  WARN [CompactionExecutor:1] 2011-06-17 00:43:44,517 CompactionManager.java (line 640) Row at 9517800 is unreadable; skipping to next
>  INFO [CompactionExecutor:1] 2011-06-17 00:43:45,073 CompactionManager.java (line 652) Scrub of SSTableReader(path='/var/opt/cassandra/data/FightMyMonster/UserMonsters-f-6994-Data.db') complete: 1029 rows in new sstable and 0 empty (tombstoned) rows dropped
>  WARN [CompactionExecutor:1] 2011-06-17 00:43:45,073 CompactionManager.java (line 654) Unable to recover 1 rows that were skipped.  You can attempt manual recovery from the pre-scrub snapshot.  You can also run nodetool repair to transfer the data from a healthy replica, if any

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira