Posted to user@cassandra.apache.org by Dan Hendry <da...@gmail.com> on 2011/10/20 18:53:09 UTC

Cassandra 1.0.0 - Node Load Bug

I have been playing around with Cassandra 1.0.0 in our test environment and
it seems pretty sweet so far. I have, however, come across what appears to
be a bug in how node load is tracked. I have enabled compression and
levelled compaction on all CFs (scrub + snapshot deletion) and the nodes
have been operating normally for a day or two. I started getting concerned
when the load reported by nodetool ring kept increasing (seemingly
monotonically) despite a compression ratio of ~2.5x (as a side note, I find
it strange that Cassandra does not report the compression ratio via JMX or
in the logs). I initially thought there might be a bug in cleaning up
obsolete SSTables, but I then noticed the following discrepancy:

Nodetool ring reports:

                10.112.27.65    datacenter1 rack1       Up     Normal  8.64 GB         50.00%  170141183460469231731687303715884105727

Yet du -h . reports only 2.4G in the data directory.

After restarting the node, nodetool ring reports a more accurate:

10.112.27.65    datacenter1 rack1       Up     Normal  2.35 GB         50.00%  170141183460469231731687303715884105727

Again, both compression and levelled compaction have been enabled on all
CFs. Is this a known issue or has anybody else observed a similar pattern?
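
For reference, here is a minimal sketch of reading the same Load figure over
JMX that nodetool ring prints, without restarting anything. The
StorageService MBean and its LoadString/LoadMap attributes are assumptions
based on what nodetool appears to use, and 7199 is the default JMX port:

import java.util.Map;
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

// Reads a node's reported load over JMX, the figure nodetool ring/info
// print.
public class LoadCheck {
    public static void main(String[] args) throws Exception {
        String host = args.length > 0 ? args[0] : "localhost";
        JMXServiceURL url = new JMXServiceURL(
                "service:jmx:rmi:///jndi/rmi://" + host + ":7199/jmxrmi");
        JMXConnector jmxc = JMXConnectorFactory.connect(url);
        try {
            MBeanServerConnection mbs = jmxc.getMBeanServerConnection();
            // MBean name assumed from Cassandra's StorageService.
            ObjectName ss = new ObjectName("org.apache.cassandra.db:type=StorageService");
            // Human-readable load of this node, e.g. "8.64 GB".
            System.out.println("LoadString: " + mbs.getAttribute(ss, "LoadString"));
            // Per-endpoint load for the whole ring, as nodetool ring shows it.
            @SuppressWarnings("unchecked")
            Map<String, String> loadMap =
                    (Map<String, String>) mbs.getAttribute(ss, "LoadMap");
            System.out.println("LoadMap:    " + loadMap);
        } finally {
            jmxc.close();
        }
    }
}

Running that periodically and comparing it against du of the data directory
would show whether only the reported number is drifting.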

 

Dan Hendry

(403) 660-2297

 


Re: Cassandra 1.0.0 - Node Load Bug

Posted by Henrik Schröder <sk...@gmail.com>.
We're also seeing something similar since upgrading to 1.0.0.

We have a 6-node cluster with a replication factor of 3. Three of the nodes
are older, running 32-bit Windows Server 2008 with 32-bit Java, and three
are newer, running 64-bit Windows Server 2008 R2 with 64-bit Java. We are
*not* using compression and we are *not* using leveled compaction, and we
also see nodetool ring and info reporting the wrong load: it grows faster
than actual disk usage. Restarting a node restores the reported load to the
correct number.

However, this only happens on the newer nodes running 64-bit Java, not on
the older nodes running 32-bit.

Nodetool ring reports:
10.0.0.57       datacenter1 rack1       Up     Normal  25.7 GB         16.67%
10.0.0.50       datacenter1 rack1       Up     Normal  12.34 GB        16.67%
10.0.0.58       datacenter1 rack1       Up     Normal  11.74 GB        16.67%
10.0.0.51       datacenter1 rack1       Up     Normal  12.25 GB        16.67%
10.0.0.56       datacenter1 rack1       Up     Normal  17.94 GB        16.67%
10.0.0.52       datacenter1 rack1       Up     Normal  12.56 GB        16.67%

.56, .57, and .58 are the newer nodes. I restarted .58, and it now reports
the correct size, while .57 and .56 still report the wrong size. This is
after about a week of uptime for all nodes, and the bug makes the newer
nodes report about twice the actual data size.

Running compaction does not correct the reported load number; only
restarting Cassandra fixes it.

I hope this helps a little bit at least.


/Henrik Schröder

On Thu, Oct 20, 2011 at 18:53, Dan Hendry <da...@gmail.com> wrote:

> I have been playing around with Cassandra 1.0.0 in our test environment
> and it seems pretty sweet so far. I have, however, come across what
> appears to be a bug in how node load is tracked. [...]

Re: Cassandra 1.0.0 - Node Load Bug

Posted by Edward Capriolo <ed...@gmail.com>.
On Thu, Oct 20, 2011 at 12:53 PM, Dan Hendry <da...@gmail.com> wrote:

> I have been playing around with Cassandra 1.0.0 in our test environment
> and it seems pretty sweet so far. I have, however, come across what
> appears to be a bug in how node load is tracked. [...]


This was a known bug in 0.7.8: scrub would 'orphan' files that way, and the
counts would be off like this. If restarting the node makes the counts line
up again, it might be some type of regression.

Re: Cassandra 1.0.0 - Node Load Bug

Posted by Jonathan Ellis <jb...@gmail.com>.
You're right, that is in 1.0.0.  Don't know what the OP is seeing, then.

On Fri, Oct 21, 2011 at 6:32 AM, Jeremiah Jordan
<JE...@morningstar.com> wrote:
> I thought this patch made it into the 1.0 release?  I remember it being referenced in one of the re-rolls.
>
>
> On Oct 20, 2011, at 9:56 PM, "Jonathan Ellis" <jb...@gmail.com> wrote:
>
>> That looks to me like it's reporting uncompressed size as the load.
>> Should be fixed in the 1.0 branch for 1.0.1.
>> (https://issues.apache.org/jira/browse/CASSANDRA-3338)
>>
>> On Thu, Oct 20, 2011 at 11:53 AM, Dan Hendry <da...@gmail.com> wrote:
>>> I have been playing around with Cassandra 1.0.0 in our test environment
>>> and it seems pretty sweet so far. [...]



-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com

Re: Cassandra 1.0.0 - Node Load Bug

Posted by Jeremiah Jordan <JE...@morningstar.com>.
I thought this patch made it into the 1.0 release?  I remember it being referenced in one of the re-rolls.


On Oct 20, 2011, at 9:56 PM, "Jonathan Ellis" <jb...@gmail.com> wrote:

> That looks to me like it's reporting uncompressed size as the load.
> Should be fixed in the 1.0 branch for 1.0.1.
> (https://issues.apache.org/jira/browse/CASSANDRA-3338)
> 
> On Thu, Oct 20, 2011 at 11:53 AM, Dan Hendry <da...@gmail.com> wrote:
>> I have been playing around with Cassandra 1.0.0 in our test environment
>> and it seems pretty sweet so far. [...]

Re: Cassandra 1.0.0 - Node Load Bug

Posted by Jonathan Ellis <jb...@gmail.com>.
That looks to me like it's reporting uncompressed size as the load.
Should be fixed in the 1.0 branch for 1.0.1.
(https://issues.apache.org/jira/browse/CASSANDRA-3338)
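
As a rough sanity check with the numbers above: 8.64 GB reported against
the 2.4G that du sees is a factor of about 3.6x, somewhat more than the
~2.5x compression ratio alone would explain.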

On Thu, Oct 20, 2011 at 11:53 AM, Dan Hendry <da...@gmail.com> wrote:
> I have been playing around with Cassandra 1.0.0 in our test environment
> and it seems pretty sweet so far. [...]



-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com