Posted to user@cassandra.apache.org by Tamar Fraenkel <ta...@tok-media.com> on 2012/10/08 22:08:12 UTC

READ messages dropped

Hi!
For the last 3 days I have been seeing many "READ messages dropped in last
5000ms" messages on one node of my 3-node cluster.
I see no errors in the log.
There are also "Finished hinted handoff of 0 rows to endpoint" messages,
but I have had those for a while now, so I don't know if they are related.
I am running Cassandra 1.0.8 on a 3-node cluster on EC2 m1.large instances,
with replication factor 3 (QUORUM reads and writes).

Does anyone have a clue what I should be looking for, or how to solve it?
Thanks,

*Tamar Fraenkel *
Senior Software Engineer, TOK Media

tamar@tok-media.com
Tel:   +972 2 6409736
Mob:  +972 54 8356490
Fax:   +972 2 5612956

Re: READ messages dropped

Posted by Tyler Hobbs <ty...@datastax.com>.

On Fri, Oct 12, 2012 at 2:24 AM, Tamar Fraenkel <ta...@tok-media.com> wrote:

>
> Thanks for the response. My cluster has been in a bad state for the past
> few days.
>
> I have 29 CFs, and my disk is 5% full, so I guess the VMs still have
> room to grow, and I am not sure this counts as many CFs.
>

That's not too many CFs.  I don't know how much 5% of your disk space is in
absolute numbers, which is more important.  The most important measure for
whether you are approaching limits is really disk utilization (as in how
busy the disk is, not how much data it's holding).  OpsCenter exposes
metrics for this that you should check.
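As a rough illustration of "how busy the disk is", here is a minimal Python sketch that derives a utilization percentage from two samples of /proc/diskstats on Linux, similar in spirit to the %util column of iostat. The function names, the 5-second interval, and the device name in the usage note are assumptions made for the example; nothing here comes from OpsCenter or Cassandra.

```python
import time

# Index of the "milliseconds spent doing I/O" counter in a whitespace-split
# /proc/diskstats line (major, minor, device name, then the counters).
IO_TICKS_INDEX = 12


def parse_io_ticks_ms(lines, device):
    """Return the cumulative ms the given device has spent doing I/O."""
    for line in lines:
        fields = line.split()
        if len(fields) > IO_TICKS_INDEX and fields[2] == device:
            return int(fields[IO_TICKS_INDEX])
    raise ValueError("device %r not found" % device)


def utilization_percent(ticks_before_ms, ticks_after_ms, interval_ms):
    """Share of the interval during which the device had I/O in flight."""
    if interval_ms <= 0:
        raise ValueError("interval_ms must be positive")
    busy_ms = ticks_after_ms - ticks_before_ms
    # Clamp to 100 because the two clocks are not sampled atomically.
    return min(100.0, 100.0 * busy_ms / interval_ms)


def sample_utilization(device, interval_s=5.0, path="/proc/diskstats"):
    """Sample the counter twice, interval_s apart, and return % busy."""
    with open(path) as f:
        before = parse_io_ticks_ms(f, device)
    start = time.monotonic()
    time.sleep(interval_s)
    with open(path) as f:
        after = parse_io_ticks_ms(f, device)
    elapsed_ms = (time.monotonic() - start) * 1000.0
    return utilization_percent(before, after, elapsed_ms)
```

Calling sample_utilization("xvdb") would report how busy that device was over five seconds; "xvdb" is only a guess at a device name, so substitute whichever device holds the Cassandra data directory. A value that stays near 100 suggests the disk is saturated no matter how little data it holds.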

>
> But maybe I have memory issues. I enlarged Cassandra's memory from ~2G
> to ~4G (out of ~8G). This was done because at that stage I had lots of key
> caches. I then reduced them to almost 0 on all CFs. I guess I can now
> reduce the memory back to ~2G or ~3G. Will that help?

I would leave your heap at 4G.  You really do want key caching enabled in
almost all circumstances; it can save you a lot of disk activity on reads.
If you need to bump your heap up to 4.5G to accommodate key caches, it's
worth it.

-- 
Tyler Hobbs
DataStax <http://datastax.com/>

Re: READ messages dropped

Posted by Tamar Fraenkel <ta...@tok-media.com>.

Hi!
Thanks for the response. My cluster has been in a bad state for the past
few days.

I have 29 CFs, and my disk is 5% full, so I guess the VMs still have
room to grow, and I am not sure this counts as many CFs.

But maybe I have memory issues. I enlarged Cassandra's memory from ~2G
to ~4G (out of ~8G). This was done because at that stage I had lots of key
caches. I then reduced them to almost 0 on all CFs. I guess I can now
reduce the memory back to ~2G or ~3G. Will that help?
Thanks
*Tamar Fraenkel *
Senior Software Engineer, TOK Media

On Thu, Oct 11, 2012 at 10:46 PM, Tyler Hobbs <ty...@datastax.com> wrote:

> On Wed, Oct 10, 2012 at 3:10 PM, Tamar Fraenkel <ta...@tok-media.com> wrote:
>
>>
>> What I did notice while looking at the logs (the cluster is also running
>> OpsCenter) is that the dropped reads correlate somewhat with flushes of
>> OpsCenter column families to disk and/or compactions. What are the
>> rollups CFs? Why is there so much traffic in them?
>
>
> The rollups CFs hold the performance metric data that OpsCenter stores
> about your cluster.  Typically these aren't actually very high traffic
> column families, but that depends on how many column families you have
> (more CFs require more metrics to be stored).  If you have a lot of column
> families, you have a couple of options for reducing the amount of metric
> data that's stored:
> http://www.datastax.com/docs/opscenter/trouble_shooting_opsc#limiting-the-metrics-collected-by-opscenter
>
> Assuming you don't have a large number of CFs, your nodes may legitimately
> be nearing capacity.
>
> --
> Tyler Hobbs
> DataStax <http://datastax.com/>
>
>

Re: READ messages dropped

Posted by Tyler Hobbs <ty...@datastax.com>.

On Wed, Oct 10, 2012 at 3:10 PM, Tamar Fraenkel <ta...@tok-media.com> wrote:

>
> What I did notice while looking at the logs (the cluster is also running
> OpsCenter) is that the dropped reads correlate somewhat with flushes of
> OpsCenter column families to disk and/or compactions. What are the
> rollups CFs? Why is there so much traffic in them?

The rollups CFs hold the performance metric data that OpsCenter stores
about your cluster.  Typically these aren't actually very high traffic
column families, but that depends on how many column families you have
(more CFs require more metrics to be stored).  If you have a lot of column
families, you have a couple of options for reducing the amount of metric
data that's stored:
http://www.datastax.com/docs/opscenter/trouble_shooting_opsc#limiting-the-metrics-collected-by-opscenter

Assuming you don't have a large number of CFs, your nodes may legitimately
be nearing capacity.

-- 
Tyler Hobbs
DataStax <http://datastax.com/>

Re: READ messages dropped

Posted by Tamar Fraenkel <ta...@tok-media.com>.

Hi!
Thanks for the answer.
I don't see much change in the load this Cassandra cluster is under, so why
the sudden surge of such messages?
What I did notice while looking at the logs (the cluster is also running
OpsCenter) is that the dropped reads correlate somewhat with flushes of
OpsCenter column families to disk and/or compactions. What are the
rollups CFs? Why is there so much traffic in them?
Thanks,
*Tamar Fraenkel *
Senior Software Engineer, TOK Media

On Wed, Oct 10, 2012 at 1:00 AM, aaron morton <aa...@thelastpickle.com> wrote:

> or how to solve it?
>
> The simple solution is to move to m1.xlarge :)
>
> For the last 3 days I have been seeing many "READ messages dropped in last
> 5000ms" messages on one node of my 3-node cluster.
>
> The node is not able to keep up with the load.
>
> Possible causes include excessive GC, aggressive compaction, or simply too
> many requests.
>
> It is also a good idea to take a look at iostat to see if the disk is
> keeping up.
>
> Hope that helps
>
>   -----------------
> Aaron Morton
> Freelance Developer
> @aaronmorton
> http://www.thelastpickle.com

Re: READ messages dropped

Posted by aaron morton <aa...@thelastpickle.com>.

> or how to solve it?
The simple solution is to move to m1.xlarge :)

> For the last 3 days I have been seeing many "READ messages dropped in last 5000ms" messages on one node of my 3-node cluster.
The node is not able to keep up with the load.

Possible causes include excessive GC, aggressive compaction, or simply too many requests.

It is also a good idea to take a look at iostat to see if the disk is keeping up.
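
To get a feel for how bad the drops are before digging into GC or compaction, the log lines can be tallied per message type. The Python sketch below is only an illustration: it assumes each warning carries a count directly before the message type (for example "12 READ messages dropped in last 5000ms"), which is a guess at the full log format because the thread quotes only the tail of the message. The function name and the sample lines in the usage note are hypothetical.

```python
import re
from collections import Counter

# Assumed shape of the warning: "<count> <TYPE> messages dropped in last <N>ms".
DROPPED_RE = re.compile(r"(\d+) ([A-Z_]+) messages dropped in last (\d+)ms")


def tally_dropped(lines):
    """Sum the dropped-message counts per message type across log lines."""
    totals = Counter()
    for line in lines:
        match = DROPPED_RE.search(line)
        if match:
            count, message_type = int(match.group(1)), match.group(2)
            totals[message_type] += count
    return totals
```

Passing an open log file (for instance the node's system.log) to tally_dropped returns a Counter keyed by message type. Comparing the READ total across the three nodes, or running it over different time windows of the same log, can show whether the drops cluster around flushes and compactions.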

Hope that helps 
 
-----------------
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com
