Posted to user@cassandra.apache.org by Rudolf van der Leeden <ru...@scoreloop.com> on 2012/09/10 18:37:09 UTC

Assertions running Cleanup on a 3-node cluster with Cassandra 1.1.4 and LCS

Hi,

I'm getting 5 identical assertions while running 'nodetool cleanup' on a
Cassandra 1.1.4 node with Load=104G and 80m keys.
From system.log:

ERROR [CompactionExecutor:576] 2012-09-10 11:25:50,265 AbstractCassandraDaemon.java (line 134) Exception in thread Thread[CompactionExecutor:576,1,main]
java.lang.AssertionError
        at org.apache.cassandra.db.compaction.LeveledManifest.promote(LeveledManifest.java:214)
        at org.apache.cassandra.db.compaction.LeveledCompactionStrategy.handleNotification(LeveledCompactionStrategy.java:158)
        at org.apache.cassandra.db.DataTracker.notifySSTablesChanged(DataTracker.java:531)
        at org.apache.cassandra.db.DataTracker.replaceCompactedSSTables(DataTracker.java:254)
        at org.apache.cassandra.db.ColumnFamilyStore.replaceCompactedSSTables(ColumnFamilyStore.java:992)
        at org.apache.cassandra.db.compaction.CompactionTask.execute(CompactionTask.java:200)
        at org.apache.cassandra.db.compaction.LeveledCompactionTask.execute(LeveledCompactionTask.java:50)
        at org.apache.cassandra.db.compaction.CompactionManager$1.runMayThrow(CompactionManager.java:154)
        at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:30)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
        at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
        at java.util.concurrent.FutureTask.run(FutureTask.java:138)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
        at java.lang.Thread.run(Thread.java:662)

After 3 hours the job is done and there are 11390 compaction tasks pending.
My question: Can these assertions be ignored? Or do I need to worry about
it?

Thanks for your help and best regards,
-Rudolf.

Re: Assertions running Cleanup on a 3-node cluster with Cassandra 1.1.4 and LCS

Posted by aaron morton <aa...@thelastpickle.com>.

> My question: Can these assertions be ignored? Or do I need to worry about it?
That looks like a problem.

Can you raise a ticket on https://issues.apache.org/jira/browse/CASSANDRA ?

It may be good to include information on:
* How long you've been using Levelled Compaction.
* Is this all CFs or just one?
* If you can identify the CF, can you include the .json file that is kept on disk? It contains information about levelled compaction.
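
For anyone who wants to look at that .json file before attaching it, here is a minimal sketch that counts sstables per level. It assumes the file has a top-level "generations" array whose entries carry a "generation" (level) number and a "members" list of sstable generation numbers. That layout, the function name and the sample contents are assumptions for illustration, so check them against the actual file.

```python
import json

def sstables_per_level(manifest_text):
    """Return {level: number of sstables} from a leveled-compaction manifest.

    Assumed layout (not confirmed in this thread):
    {"generations": [{"generation": 0, "members": [12, 13]}, ...]}
    """
    manifest = json.loads(manifest_text)
    counts = {}
    for entry in manifest.get("generations", []):
        counts[entry["generation"]] = len(entry.get("members", []))
    return counts

# Hypothetical manifest contents, for illustration only.
sample = '{"generations": [{"generation": 0, "members": [12, 13]}, {"generation": 1, "members": [7]}]}'
print(sstables_per_level(sample))
```

A level 0 count that is very large compared with the higher levels would be worth mentioning in the ticket.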

Cheers
  
-----------------
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com


Re: Assertions running Cleanup on a 3-node cluster with Cassandra 1.1.4 and LCS

Posted by Rudolf van der Leeden <ru...@scoreloop.com>.

> Could you, as Aaron suggested, open a ticket?
>

Done:  https://issues.apache.org/jira/browse/CASSANDRA-4644

Re: Assertions running Cleanup on a 3-node cluster with Cassandra 1.1.4 and LCS

Posted by Omid Aladini <om...@gmail.com>.

Could you, as Aaron suggested, open a ticket?

-- Omid

On Tue, Sep 11, 2012 at 2:35 PM, Rudolf van der Leeden
<ru...@scoreloop.com> wrote:
>> Which version of Cassandra was your data initially created with?
>> A bug in Cassandra 1.1.2 and earlier could cause out-of-order sstables
>> and inter-level overlaps in CFs with Leveled Compaction. Your sstables
>> generated with 1.1.3 and later should not have this issue [1] [2].
>> In case you have old Leveled-compacted sstables (generated with 1.1.2
>> or earlier, including 1.0.x) you need to run offline scrub using
>> Cassandra 1.1.4 or later via /bin/sstablescrub command so it'll fix
>> out-of-order sstables and inter-level overlaps caused by previous
>> versions of LCS. You need to take nodes down in order to run offline
>> scrub.
>
>
> The data was originally created on a 1.1.2 cluster with STCS (i.e. NOT
> leveled compaction).
> After the upgrade to 1.1.4 we changed from STCS to LCS w/o problems.
> Then we ran more tests and created more and very big keys with millions of
> columns.
> The assertion only shows up with one particular CF containing these big
> keys.
> So, from your explanation, I don't think an offline scrub will help.
>
> Thanks,
> -Rudolf.
>

Re: Assertions running Cleanup on a 3-node cluster with Cassandra 1.1.4 and LCS

Posted by Rudolf van der Leeden <ru...@scoreloop.com>.

>
> Which version of Cassandra has your data been created initially with?
> A bug in Cassandra 1.1.2 and earlier could cause out-of-order sstables
> and inter-level overlaps in CFs with Leveled Compaction. Your sstables
> generated with 1.1.3 and later should not have this issue [1] [2].
> In case you have old Leveled-compacted sstables (generated with 1.1.2
> or earlier. including 1.0.x) you need to run offline scrub using
> Cassandra 1.1.4 or later via /bin/sstablescrub command so it'll fix
> out-of-order sstables and inter-level overlaps caused by previous
> versions of LCS. You need to take nodes down in order to run offline
> scrub.
>

The data was originally created on a 1.1.2 cluster with STCS (i.e. NOT
leveled compaction).
After the upgrade to 1.1.4 we changed from STCS to LCS w/o problems.
Then we ran more tests and created more and very big keys with millions of
columns.
The assertion only shows up with one particular CF containing these big
keys.
So, from your explanation, I don't think an offline scrub will help.

Thanks,
-Rudolf.

Re: Assertions running Cleanup on a 3-node cluster with Cassandra 1.1.4 and LCS

Posted by Mikhail Panchenko <m...@mihasya.com>.

Based on the steps outlined here:
https://issues.apache.org/jira/browse/CASSANDRA-4644?focusedCommentId=13453156&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13453156
it seems that LCS was not used until after the upgrade to 1.1.4, and they
were able to do a full repair, cleanup, compact cycle on 1.1.4 before
running into problems.

I don't see any major bugfixes for LCS in 1.1.5 either, so this appears to
be a legitimate bug if the timeline is correct.

On Tue, Sep 11, 2012 at 2:50 PM, Omid Aladini <om...@gmail.com> wrote:

> On Tue, Sep 11, 2012 at 8:33 PM, Janne Jalkanen
> <ja...@ecyrd.com> wrote:
> >
> >> A bug in Cassandra 1.1.2 and earlier could cause out-of-order sstables
> >> and inter-level overlaps in CFs with Leveled Compaction. Your sstables
> >> generated with 1.1.3 and later should not have this issue [1] [2].
> >
> > Does this mean that LCS on 1.0.x should be considered unsafe to
> > use? I'm using them for semi-wide frequently-updated CounterColumns
> > and they're performing much better on LCS than on STCS.
>
> That's true. "Unsafe" in the sense that your data might not be in the
> right shape with respect to order of keys in sstables and LCS's
> properties and you might need to offline-scrub when you upgrade to the
> latest 1.1.x.
>
> >> In case you have old Leveled-compacted sstables (generated with 1.1.2
> >> or earlier. including 1.0.x) you need to run offline scrub using
> >> Cassandra 1.1.4 or later via /bin/sstablescrub command so it'll fix
> >> out-of-order sstables and inter-level overlaps caused by previous
> >> versions of LCS. You need to take nodes down in order to run offline
> >> scrub.
> >
> > The  1.1.5 README does not mention this. Should it?
>
> The fix was released on 1.1.3 (LCS fix) and 1.1.4 (offline scrub) and
> I agree it would be helpful to have it on NEWS.txt.
>
> Cheers,
> Omid
>
> > /Janne
> >
>

Re: Assertions running Cleanup on a 3-node cluster with Cassandra 1.1.4 and LCS

Posted by Omid Aladini <om...@gmail.com>.

On Wed, Sep 12, 2012 at 9:38 AM, Janne Jalkanen
<ja...@ecyrd.com> wrote:
> OK, so what's the worst case here? Data loss? Bad performance?

Lower performance is certainly a side effect. I can't comment on data
loss (I'm curious about that as well) because it depends on how data
from an out-of-order sstable was being indexed and served prior to
Cassandra 1.1.1 (when the bug became apparent), which is essential for
counter repairs, for example.

-- Omid

>> The fix was released on 1.1.3 (LCS fix) and 1.1.4 (offline scrub) and
>> I agree it would be helpful to have it on NEWS.txt.
>
> I'll file a bug on this, unless someone can get to it first :)
>
> /Janne

Re: Assertions running Cleanup on a 3-node cluster with Cassandra 1.1.4 and LCS

Posted by Janne Jalkanen <ja...@ecyrd.com>.

On 12 Sep 2012, at 00:50, Omid Aladini wrote:

> On Tue, Sep 11, 2012 at 8:33 PM, Janne Jalkanen
> <ja...@ecyrd.com> wrote:
>> 
>> Does this mean that LCS on 1.0.x should be considered unsafe to
>> use? I'm using them for semi-wide frequently-updated CounterColumns
>> and they're performing much better on LCS than on STCS.
> 
> That's true. "Unsafe" in the sense that your data might not be in the
> right shape with respect to order of keys in sstables and LCS's
> properties and you might need to offline-scrub when you upgrade to the
> latest 1.1.x.

OK, so what's the worst case here? Data loss? Bad performance?

> The fix was released on 1.1.3 (LCS fix) and 1.1.4 (offline scrub) and
> I agree it would be helpful to have it on NEWS.txt.

I'll file a bug on this, unless someone can get to it first :)

/Janne

Re: Assertions running Cleanup on a 3-node cluster with Cassandra 1.1.4 and LCS

Posted by Omid Aladini <om...@gmail.com>.

On Tue, Sep 11, 2012 at 8:33 PM, Janne Jalkanen
<ja...@ecyrd.com> wrote:
>
>> A bug in Cassandra 1.1.2 and earlier could cause out-of-order sstables
>> and inter-level overlaps in CFs with Leveled Compaction. Your sstables
>> generated with 1.1.3 and later should not have this issue [1] [2].
>
> Does this mean that LCS on 1.0.x should be considered unsafe to
> use? I'm using them for semi-wide frequently-updated CounterColumns
> and they're performing much better on LCS than on STCS.

That's true. "Unsafe" in the sense that your data might not be in the
right shape with respect to order of keys in sstables and LCS's
properties and you might need to offline-scrub when you upgrade to the
latest 1.1.x.

>> In case you have old Leveled-compacted sstables (generated with 1.1.2
>> or earlier. including 1.0.x) you need to run offline scrub using
>> Cassandra 1.1.4 or later via /bin/sstablescrub command so it'll fix
>> out-of-order sstables and inter-level overlaps caused by previous
>> versions of LCS. You need to take nodes down in order to run offline
>> scrub.
>
> The  1.1.5 README does not mention this. Should it?

The fix was released on 1.1.3 (LCS fix) and 1.1.4 (offline scrub) and
I agree it would be helpful to have it on NEWS.txt.

Cheers,
Omid

> /Janne
>

Re: Assertions running Cleanup on a 3-node cluster with Cassandra 1.1.4 and LCS

Posted by Janne Jalkanen <ja...@ecyrd.com>.

> A bug in Cassandra 1.1.2 and earlier could cause out-of-order sstables
> and inter-level overlaps in CFs with Leveled Compaction. Your sstables
> generated with 1.1.3 and later should not have this issue [1] [2].

Does this mean that LCS on 1.0.x should be considered unsafe to use? I'm using them for semi-wide frequently-updated CounterColumns and they're performing much better on LCS than on STCS.

> In case you have old Leveled-compacted sstables (generated with 1.1.2
> or earlier. including 1.0.x) you need to run offline scrub using
> Cassandra 1.1.4 or later via /bin/sstablescrub command so it'll fix
> out-of-order sstables and inter-level overlaps caused by previous
> versions of LCS. You need to take nodes down in order to run offline
> scrub.

The 1.1.5 README does not mention this. Should it?

/Janne

Re: Assertions running Cleanup on a 3-node cluster with Cassandra 1.1.4 and LCS

Posted by Omid Aladini <om...@gmail.com>.

Which version of Cassandra was your data initially created with?

A bug in Cassandra 1.1.2 and earlier could cause out-of-order sstables
and inter-level overlaps in CFs with Leveled Compaction. Your sstables
generated with 1.1.3 and later should not have this issue [1] [2].

In case you have old Leveled-compacted sstables (generated with 1.1.2
or earlier, including 1.0.x), you need to run an offline scrub using
Cassandra 1.1.4 or later via the /bin/sstablescrub command so it'll fix
out-of-order sstables and inter-level overlaps caused by previous
versions of LCS. You need to take nodes down in order to run an offline
scrub.
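
To illustrate what an overlap means here, a minimal sketch follows. It reads "overlaps" as sstables in a single level above L0 covering intersecting key ranges, which LCS is designed to prevent. The tuple layout and function name are hypothetical, and keys are compared as plain integers, whereas Cassandra compares partitioner tokens. This is not how sstablescrub is implemented.

```python
def find_overlaps(level):
    """Flag sstables that start at or before the end of an earlier sstable.

    `level` is a list of (name, first_key, last_key) tuples (hypothetical
    layout). Returns (earlier, later) name pairs, one witness pair per
    offending sstable; an empty list means the level is non-overlapping.
    """
    ordered = sorted(level, key=lambda s: s[1])
    overlaps = []
    widest = None  # sstable with the largest last_key seen so far
    for cur in ordered:
        if widest is not None and cur[1] <= widest[2]:
            overlaps.append((widest[0], cur[0]))
        if widest is None or cur[2] > widest[2]:
            widest = cur
    return overlaps

# Illustrative data only.
print(find_overlaps([("a", 1, 10), ("b", 11, 20)]))  # []
print(find_overlaps([("a", 1, 10), ("b", 8, 20)]))   # [('a', 'b')]
```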

> After 3 hours the job is done and there are 11390 compaction tasks pending.
> My question: Can these assertions be ignored? Or do I need to worry about
> it?

They can't be ignored, since pending compactions elevate the upper
bound on the number of disk seeks you need to make to read a row, and
you don't get the nice guarantees of leveled compaction.
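
As a rough sketch of that upper bound: a read may have to consult every sstable in L0, because those can overlap, plus at most one sstable in each higher level, because those do not. The function name and the per-level counts below are made up for illustration, and the sketch ignores bloom filters and caches, which reduce the real number of seeks.

```python
def max_sstables_per_read(sstables_per_level):
    """Upper bound on sstables consulted to read one row under LCS.

    `sstables_per_level` maps level number -> sstable count (hypothetical input).
    L0 sstables may overlap, so all of them count; each non-empty higher
    level contributes at most one sstable.
    """
    l0 = sstables_per_level.get(0, 0)
    higher = sum(1 for level, n in sstables_per_level.items() if level > 0 and n > 0)
    return l0 + higher

print(max_sstables_per_read({0: 4, 1: 10, 2: 100}))    # 6
print(max_sstables_per_read({0: 400, 1: 10, 2: 100}))  # 402
```

With a large backlog sitting in L0, the L0 term dominates the bound.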

Cheers,
Omid

[1] https://issues.apache.org/jira/browse/CASSANDRA-4411
[2] https://issues.apache.org/jira/browse/CASSANDRA-4321
