
Posted to user@cassandra.apache.org by horschi <ho...@gmail.com> on 2014/06/14 22:02:36 UTC

Cassandra 2.0.8 MemoryMeter goes crazy

Hi everyone,

This week we upgraded one of our systems from Cassandra 1.2.16 to 2.0.8.
All 3 nodes were upgraded, and the SSTables were upgraded too.

Unfortunately, Cassandra now starts to hang every 10 hours or so.

Every time it hangs, we can see the MemoryMeter being very active, both in
tpstats and in system.log:

 INFO [MemoryMeter:1] 2014-06-14 19:24:09,488 Memtable.java (line 481)
CFS(Keyspace='MDS', ColumnFamily='ResponsePortal') liveRatio is 64.0
(just-counted was 64.0).  calculation took 0ms for 0 cells

This line is logged hundreds of times per second (!) while Cassandra is
hanging. The CPU is 100% busy.
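
To put a number on "hundreds of times per second", the messages can be
counted per second and per column family. The following is only a minimal
sketch: the regular expression is derived from the single log line quoted
above, it assumes that each entry occupies one physical line in system.log
(the quote above is wrapped by the mail client), and the function name
live_ratio_rate is made up for illustration.

```python
import re
import sys
from collections import Counter

# Pattern derived from the one MemoryMeter line quoted in this thread.
# Assumption: each log entry sits on a single line in system.log.
LINE_RE = re.compile(
    r"\[MemoryMeter:\d+\]\s+"
    r"(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}),\d+.*?"
    r"ColumnFamily='([^']+)'\)\s+liveRatio is"
)


def live_ratio_rate(lines):
    """Count liveRatio messages per (second, column family) pair."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.search(line)
        if match:
            # Key: timestamp truncated to the second, plus the CF name.
            counts[(match.group(1), match.group(2))] += 1
    return counts


if __name__ == "__main__":
    # Usage: python live_ratio_rate.py /path/to/system.log
    with open(sys.argv[1], encoding="utf-8", errors="replace") as log:
        for (second, cf), n in live_ratio_rate(log).most_common(10):
            print(f"{second}  {cf}  {n} messages")
```

Run against system.log, this prints the ten busiest (second, column family)
pairs, which should show whether a single column family dominates.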

Interestingly, this is only logged for this particular column family. This
CF is used as a queue, which only contains a few entries (data files are
about 4 KB, only ~100 keys, usually 1-2 active, 98-99 tombstones).

            Table: ResponsePortal
            SSTable count: 1
            Space used (live), bytes: 4863
            Space used (total), bytes: 4863
            SSTable Compression Ratio: 0.9545454545454546
            Number of keys (estimate): 128
            Memtable cell count: 0
            Memtable data size, bytes: 0
            Memtable switch count: 1
            Local read count: 0
            Local read latency: 0.000 ms
            Local write count: 5
            Local write latency: 0.000 ms
            Pending tasks: 0
            Bloom filter false positives: 0
            Bloom filter false ratio: 0.00000
            Bloom filter space used, bytes: 176
            Compacted partition minimum bytes: 43
            Compacted partition maximum bytes: 50
            Compacted partition mean bytes: 50
            Average live cells per slice (last five minutes): 0.0
            Average tombstones per slice (last five minutes): 0.0


        Table: ResponsePortal
        SSTable count: 1
        Space used (live), bytes: 4765
        Space used (total), bytes: 5777
        SSTable Compression Ratio: 0.75
        Number of keys (estimate): 128
        Memtable cell count: 0
        Memtable data size, bytes: 0
        Memtable switch count: 12
        Local read count: 0
        Local read latency: 0.000 ms
        Local write count: 1096
        Local write latency: 0.000 ms
        Pending tasks: 0
        Bloom filter false positives: 0
        Bloom filter false ratio: 0.00000
        Bloom filter space used, bytes: 16
        Compacted partition minimum bytes: 43
        Compacted partition maximum bytes: 50
        Compacted partition mean bytes: 50
        Average live cells per slice (last five minutes): 0.0
        Average tombstones per slice (last five minutes): 0.0


Has anyone seen this before, or does anyone have an idea what could be
wrong? It seems that 2.0 does not handle this column family as well as 1.2
did.

Any hints on what could be wrong are greatly appreciated :-)

Cheers,
Christian

Re: Cassandra 2.0.8 MemoryMeter goes crazy

Posted by horschi <ho...@gmail.com>.

Hi again,

Before people start replying here: I just filed a Jira ticket:
https://issues.apache.org/jira/browse/CASSANDRA-7401

I think Memtable.maybeUpdateLiveRatio() needs some love.

kind regards,
Christian




Re: Cassandra 2.0.8 MemoryMeter goes crazy

Posted by Robert Coli <rc...@eventbrite.com>.

On Mon, Jun 16, 2014 at 11:03 AM, horschi <ho...@gmail.com> wrote:

> About running mixed versions:
> I thought running mixed versions is ok. Running repair with mixed versions
> is not though. Right?
>

Running with split major versions for longer than it takes to do a rolling
restart is not supported.

=Rob

Re: Cassandra 2.0.8 MemoryMeter goes crazy

Posted by horschi <ho...@gmail.com>.

Hi Robert,

sorry, I am using our own internal terminology :-)

The entire cluster was upgraded. All 3 nodes of that cluster are on 2.0.8
now.

About the issue:
To me it looks like there is something wrong in the Memtable class, some
very special edge case for CFs that are rarely updated. I can't say whether
it is new in 2.0 or whether it already existed in 1.2.

About running mixed versions:
I thought running mixed versions is ok. Running repair with mixed versions
is not though. Right?

kind regards,
Christian

On Mon, Jun 16, 2014 at 7:50 PM, Robert Coli <rc...@eventbrite.com> wrote:

> On Sat, Jun 14, 2014 at 1:02 PM, horschi <ho...@gmail.com> wrote:
>
>> this week we upgraded one of our Systems from Cassandra 1.2.16 to 2.0.8.
>> All 3 nodes were upgraded. SStables are upgraded.
>>
>
> One of your *clusters* or one of your *systems*?
>
> Running with split major versions is not supported.
>
> =Rob
>

Re: Cassandra 2.0.8 MemoryMeter goes crazy

Posted by Robert Coli <rc...@eventbrite.com>.

On Sat, Jun 14, 2014 at 1:02 PM, horschi <ho...@gmail.com> wrote:

> this week we upgraded one of our Systems from Cassandra 1.2.16 to 2.0.8.
> All 3 nodes were upgraded. SStables are upgraded.
>

One of your *clusters* or one of your *systems*?

Running with split major versions is not supported.

=Rob