Posted to user@cassandra.apache.org by Andre Sprenger <an...@getanet.de> on 2013/02/11 18:13:47 UTC

RuntimeException during leveled compaction

Hi,

I'm running a 6-node Cassandra 1.1.5 cluster on EC2. We switched to
leveled compaction a couple of weeks ago, and that went smoothly. A few
days ago, 3 of the nodes started to log the following exception during
compaction of a particular column family:

ERROR [CompactionExecutor:726] 2013-02-11 13:02:26,582
AbstractCassandraDaemon.java (line 135) Exception in thread
Thread[CompactionExecutor:726,1,main]
java.lang.RuntimeException: Last written key
DecoratedKey(84590743047470232854915142878708713938,
31333535333333383530323237303130313030303232313537303030303132393832)
>= current key DecoratedKey(28357704665244162161305918843747894551,
31333430313336313830333831303130313030303230313632303030303036363338)
writing into
/var/cassandra/data/AdServer/EventHistory/Adserver-EventHistory-tmp-he-68638-Data.db
        at
org.apache.cassandra.io.sstable.SSTableWriter.beforeAppend(SSTableWriter.java:134)
        at
org.apache.cassandra.io.sstable.SSTableWriter.append(SSTableWriter.java:153)
        at
org.apache.cassandra.db.compaction.CompactionTask.execute(CompactionTask.java:159)
        at
org.apache.cassandra.db.compaction.LeveledCompactionTask.execute(LeveledCompactionTask.java:50)
        at
org.apache.cassandra.db.compaction.CompactionManager$1.runMayThrow(CompactionManager.java:154)
        at
org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:30)
        at
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
        at
java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
        at java.util.concurrent.FutureTask.run(FutureTask.java:138)
        at
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
        at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
        at java.lang.Thread.run(Thread.java:662)

Compaction no longer happens for the column family, and read performance
is getting worse because of the growing number of data files accessed
during reads. It looks like one or more of the data files are corrupt and
contain keys that are stored out of order.
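The second component of each DecoratedKey in the error is the hex-encoded
row key. Decoding the two payloads (a quick sketch; the hex strings are
copied verbatim from the exception above) shows they are plain ASCII digit
strings, which can help identify the affected rows:

```python
# Decode the hex-encoded row key payloads from the compaction error message.
last_written = bytes.fromhex(
    "31333535333333383530323237303130313030303232313537303030303132393832"
).decode("ascii")
current = bytes.fromhex(
    "31333430313336313830333831303130313030303230313632303030303036363338"
).decode("ascii")

# Both decode to numeric key strings with a Unix-epoch-like prefix.
print(last_written)
print(current)
```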

Any help to resolve this situation would be greatly appreciated.

Thanks
Andre

Re: RuntimeException during leveled compaction

Posted by aaron morton <aa...@thelastpickle.com>.
That sounds like something wrong with the way the rows are merged during
compaction, then.

Can you run the compaction with DEBUG logging and raise a ticket? You may
want to do this with the node not in the ring. Five minutes after the node
starts it will run pending compactions, so if compactions are not running
they should start again.
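For Cassandra 1.1, a sketch of how the compaction DEBUG logging could be
enabled via conf/log4j-server.properties (the logger name is assumed from
the package in the stack trace; adjust to your install):

```properties
# Leave the root logger at INFO; enable DEBUG for the compaction code only.
log4j.logger.org.apache.cassandra.db.compaction=DEBUG
```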

Cheers
 
-----------------
Aaron Morton
Freelance Cassandra Developer
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 13/02/2013, at 8:11 PM, Andre Sprenger <an...@getanet.de> wrote:

> 
> Aaron,
> 
> thanks for your help. 
> 
> I ran 'nodetool scrub' and it finished after a couple of hours. But there
> is no info about out-of-order rows in the logs, and compaction on the
> column family still raises the same exception.
> 
> Using the row key I could identify some of the errant SSTables and removed
> them during a node restart. On some nodes compaction is working for the
> moment, but there are likely more corrupt data files, and then I would be
> in the same situation as before.
> 
> So I still need some help to resolve this issue!
> 
> Cheers
> Andre
> 


Re: RuntimeException during leveled compaction

Posted by Andre Sprenger <an...@getanet.de>.
Aaron,

thanks for your help.

I ran 'nodetool scrub' and it finished after a couple of hours. But there
is no info about out-of-order rows in the logs, and compaction on the
column family still raises the same exception.

Using the row key I could identify some of the errant SSTables and removed
them during a node restart. On some nodes compaction is working for the
moment, but there are likely more corrupt data files, and then I would be
in the same situation as before.
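The check that trips the exception is an ordering invariant on the
decorated keys' tokens. A toy sketch (a hypothetical helper, not Cassandra
code) of scanning a dump of tokens, e.g. derived from sstablekeys output,
for the first ordering violation:

```python
def first_out_of_order(tokens):
    """Return the index of the first token that sorts before its
    predecessor, or None if the sequence is properly ordered."""
    for i in range(1, len(tokens)):
        if tokens[i] < tokens[i - 1]:
            return i
    return None

# The two tokens from the exception, in the order compaction saw them:
tokens = [84590743047470232854915142878708713938,
          28357704665244162161305918843747894551]
print(first_out_of_order(tokens))  # -> 1: the second key violates the order
```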

So I still need some help to resolve this issue!

Cheers
Andre


2013/2/12 aaron morton <aa...@thelastpickle.com>

> snapshot all nodes so you have a backup: nodetool snapshot -t corrupt
>
> run nodetool scrub on the errant CF.
>
> Look for messages such as:
>
> "Out of order row detected…"
> "%d out of order rows found while scrubbing %s; Those have been written
> (in order) to a new sstable (%s)"
>
> In the logs.
>
> Cheers

Re: RuntimeException during leveled compaction

Posted by aaron morton <aa...@thelastpickle.com>.
snapshot all nodes so you have a backup: nodetool snapshot -t corrupt

run nodetool scrub on the errant CF. 

Look for messages such as:

"Out of order row detected…"
"%d out of order rows found while scrubbing %s; Those have been written (in order) to a new sstable (%s)"

In the logs. 
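What that log message describes can be sketched roughly (a simplification,
not Cassandra's actual implementation): rows that arrive out of order are
set aside and written, sorted, to a separate new sstable:

```python
def scrub(rows):
    """Partition (key, value) rows into the in-order stream and the
    out-of-order stragglers, which scrub would write (in order) to a
    new sstable. Rough sketch of the behaviour in the log message."""
    in_order, out_of_order = [], []
    last = None
    for key, value in rows:
        if last is not None and key < last:
            out_of_order.append((key, value))  # destined for the new sstable
        else:
            in_order.append((key, value))
            last = key
    return in_order, sorted(out_of_order)

print(scrub([(1, "a"), (3, "b"), (2, "c")]))
```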

Cheers
  
-----------------
Aaron Morton
Freelance Cassandra Developer
New Zealand

@aaronmorton
http://www.thelastpickle.com
