Posted to user@cassandra.apache.org by Reik Schatz <re...@gmail.com> on 2013/01/15 18:23:53 UTC

write count increase after 1.2 update

Hi, we are running a 1.1.6 (DataStax) test cluster with 6 nodes. After the
recent 1.2 release we have set up a second cluster, also with 6 nodes,
running 1.2 (DataStax).

They are now running in parallel. We noticed an increase in the number of
writes in our monitoring tool (Datadog). The tool uses the write count
statistic from nodetool cfstats. So we ran nodetool cfstats on one node in
each cluster to get an initial write count, then ran it again after 60
seconds. It looks like the 1.2 cluster received about twice the number of
writes.
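Roughly what the measurement looks like, as a sketch (the cfstats lines and
counter values below are made up; in practice the two samples come from
running nodetool 60 seconds apart):

```shell
# extract_writes pulls the cumulative "Write Count" value out of
# `nodetool cfstats` output (line format as in the 1.x releases).
extract_writes() {
  awk -F': *' '/Write Count/ {print $2; exit}'
}

# Stand-in samples; in practice:
#   before=$(nodetool cfstats | extract_writes); sleep 60
#   after=$(nodetool cfstats | extract_writes)
before=$(printf 'Keyspace: demo\n\tWrite Count: 120000\n' | extract_writes)
after=$(printf 'Keyspace: demo\n\tWrite Count: 126000\n' | extract_writes)
echo "writes in 60s: $((after - before))"
```

The counter is cumulative since node start, so diffing two samples is what
gives a rate.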

Our application is designed so that writes are idempotent, which is why we
don't see a size increase. Were there any changes between 1.1.6 and 1.2
that could explain this behavior?

I know that 1.2 introduces virtual nodes to spread the data out more
evenly. So if the "write count" value were actually the sum of all writes
to all nodes in the cluster, this increase would make sense.
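As a back-of-envelope version of that reasoning (all numbers and the
replication factor below are assumptions, not taken from our clusters): if
cfstats counts local replica writes on each node, then W client writes at
replication factor RF over N nodes should show up as roughly W*RF/N per
node.

```shell
# Rough expectation for per-node write count (hypothetical numbers).
W=60000   # client-side writes in the interval
RF=3      # replication factor (assumed)
N=6       # nodes in the cluster
echo "expected per-node writes: $(( W * RF / N ))"
```

A cluster-wide sum would instead be on the order of W*RF, which is the kind
of jump we are seeing.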

Reik

ps. the clusters are not 100% identical; e.g. since bloom filters are now
off-heap, we changed settings for heap size and memtables. Cluster 1.1.6:
heap 8G, memtables 1/3 of heap. Cluster 1.2.0: heap 4G, memtables 2G. Not
sure whether that can have an impact on the problem.

Re: write count increase after 1.2 update

Posted by aaron morton <aa...@thelastpickle.com>.
> Semi-lame post to this mailing list, I guess :( I should have checked that earlier
No problems. 

Cheers

-----------------
Aaron Morton
Freelance Cassandra Developer
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 17/01/2013, at 11:50 PM, Reik Schatz <re...@gmail.com> wrote:

> [quoted text trimmed]


Re: write count increase after 1.2 update

Posted by Reik Schatz <re...@gmail.com>.
Cool feature, didn't know it existed. It turned out, however, that
everything works fine! There was a configuration error that duplicated an
AWS SNS->SQS subscription, so we got twice the amount of data delivered to
our application. Semi-lame post to this mailing list, I guess :( I should
have checked that earlier, but major thanks for looking into this and
replying.
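In case it helps anyone hitting the same thing: the duplicate can be
spotted by listing a topic's subscription endpoints and flagging repeats.
The pipeline below is a sketch with a made-up endpoint list; in practice
the list would come from something like
`aws sns list-subscriptions-by-topic --topic-arn <arn> --query 'Subscriptions[].Endpoint' --output text`.

```shell
# Flag SQS endpoints subscribed to the same SNS topic more than once.
# The list below is a placeholder for real `aws sns` output.
endpoints='arn:aws:sqs:eu-west-1:111122223333:app-queue
arn:aws:sqs:eu-west-1:111122223333:other-queue
arn:aws:sqs:eu-west-1:111122223333:app-queue'
dupes=$(printf '%s\n' "$endpoints" | sort | uniq -d)
echo "duplicated endpoints: $dupes"
```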

Cheers,
Reik


On Thu, Jan 17, 2013 at 4:28 AM, aaron morton <aa...@thelastpickle.com> wrote:

> [quoted text trimmed]

Re: write count increase after 1.2 update

Posted by aaron morton <aa...@thelastpickle.com>.
You *may* be seeing this https://issues.apache.org/jira/browse/CASSANDRA-2503

It was implemented in 1.1.0, but perhaps the data in the original cluster is more compacted than in the new one. 

Are the increases for all CFs or just a few?
Do you have a workload of infrequent writes to rows followed by wide reads?

Cheers

-----------------
Aaron Morton
Freelance Cassandra Developer
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 16/01/2013, at 6:23 AM, Reik Schatz <re...@gmail.com> wrote:

> [quoted text trimmed]