Posted to user@cassandra.apache.org by Andrew Bialecki <an...@gmail.com> on 2013/07/03 18:59:31 UTC

Lots of replicate on write tasks pending, want to investigate

In one of our load tests, we're incrementing a single counter column as
well as appending columns to a single row (essentially a timeline). You can
think of it as counting the instances of an event and then keeping a
timeline of those events. The ratio of increments to "appends" is 1:1.

When we run this on a test cluster with RF = 3, one node gets backed up
with a lot of replicate on write tasks pending, eventually maxing out at
4128. We think it's a disk I/O issue that's causing the slowdown (lots of
reads), but we're still investigating. A few questions that might speed up
understanding the issue:

1. Is there any way to see metadata about the replicate on write tasks
pending? We're splitting apart the load test to pinpoint which of those
operations is causing an issue, but if there's a way to see that queue,
that might save us some work.

2. I'm assuming in our case the cause is incrementing counters because disk
reads are part of the write path for counters and are not for appending
columns to a row. Does that logic make sense?

Thanks in advance,
Andrew
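
[Editorial sketch, not part of the original message.] On question 1, one
way to at least watch the depth of that queue is to parse the output of
`nodetool tpstats`, which lists active and pending counts per thread pool,
including ReplicateOnWriteStage on Cassandra versions of this era. It only
reports counts, so it does not show what each pending task is. The helper
names below are hypothetical, and every number in SAMPLE is a placeholder
except the 4128 quoted above.

```python
import subprocess


def parse_tpstats(text):
    """Map pool name -> {"active": int, "pending": int} from tpstats text."""
    pools = {}
    for line in text.splitlines():
        parts = line.split()
        # Pool rows look like: <name> <active> <pending> <completed> ...
        # The header row and the dropped-message section fail this check.
        if len(parts) >= 4 and parts[1].isdigit() and parts[2].isdigit():
            pools[parts[0]] = {"active": int(parts[1]), "pending": int(parts[2])}
    return pools


def pending_tasks(text, pool="ReplicateOnWriteStage"):
    """Return the pending count for one pool, or None if the pool is absent."""
    entry = parse_tpstats(text).get(pool)
    return None if entry is None else entry["pending"]


def read_tpstats():
    """Run nodetool on the local node (assumes nodetool is on PATH)."""
    return subprocess.run(
        ["nodetool", "tpstats"], capture_output=True, text=True, check=True
    ).stdout


# Illustrative input only: placeholder values apart from 4128.
SAMPLE = """\
Pool Name                    Active   Pending      Completed   Blocked  All time blocked
ReadStage                         0         0              0         0                 0
MutationStage                     0         0              0         0                 0
ReplicateOnWriteStage            32      4128              0         0                 0
"""

if __name__ == "__main__":
    print(pending_tasks(SAMPLE))
```

Polling `pending_tasks(read_tpstats())` on each node while running the
counter-only and append-only halves of the load test separately should show
which operation makes the stage back up.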

Re: Lots of replicate on write tasks pending, want to investigate

Posted by Sylvain Lebresne <sy...@datastax.com>.
> The write path (not replicate on write) for counters involves a read,
>

I'm afraid you got it wrong. The read done during counter writes *is* done
by the replicate on write tasks. Though really, the replicate on write tasks
are just one part of the counter write path (they are not "not the write
path").

--
Sylvain
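
[Editorial sketch, not part of the original message.] The point above can
be illustrated with a toy model. This is a deliberate simplification, not
Cassandra's actual code: the class and method names are hypothetical, and
"disk" is just an in-memory list. It assumes the replica that accepts the
increment appends the delta without reading, and that a separate replicate
on write step then reads the current local total and ships it to the other
replicas.

```python
class Replica:
    """Toy replica: tracks how many reads the counter path triggers."""

    def __init__(self, name):
        self.name = name
        self.local_deltas = []   # increments this node accepted itself
        self.remote_totals = {}  # other node's name -> last total it sent us
        self.reads = 0           # stand-in for disk reads

    def write_increment(self, delta):
        # Mutation step: append only, no read needed.
        self.local_deltas.append(delta)

    def replicate_on_write(self, peers):
        # Replicate-on-write step: read the current local total, then send it.
        self.reads += 1
        total = sum(self.local_deltas)
        for peer in peers:
            peer.remote_totals[self.name] = total
        return total

    def value(self):
        # Counter value = own deltas plus the totals received from others.
        return sum(self.local_deltas) + sum(self.remote_totals.values())


def simulate(increments):
    """Send every increment to replica 'a' in an RF = 3 toy cluster."""
    a, b, c = Replica("a"), Replica("b"), Replica("c")
    for delta in increments:
        a.write_increment(delta)
        a.replicate_on_write([b, c])
    return a, b, c


if __name__ == "__main__":
    a, b, c = simulate([1, 1, 1, 1, 1])
    print(a.reads, b.reads, c.reads, a.value(), b.value(), c.value())
```

In this model every increment costs the accepting replica one read, and that
read happens in the replicate on write step, which is why a read-bound disk
would show up as a backlog in that stage rather than in the mutation stage.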


>
> On Wed, Jul 3, 2013 at 1:03 PM, Robert Coli <rc...@eventbrite.com> wrote:
>
>> On Wed, Jul 3, 2013 at 9:59 AM, Andrew Bialecki <
>> andrew.bialecki@gmail.com> wrote:
>>
>>> 2. I'm assuming in our case the cause is incrementing counters because
>>> disk reads are part of the write path for counters and are not for
>>> appending columns to a row. Does that logic make sense?
>>>
>>
>> That's a pretty reasonable assumption if you are not doing any other
>> reads and you see your disk busy doing non-compaction related reads. :)
>>
>> =Rob
>>
>
>

Re: Lots of replicate on write tasks pending, want to investigate

Posted by Andrew Bialecki <an...@gmail.com>.
Can someone remind me why replicate on write tasks might be related to the
high disk I/O? My understanding is the replicate on write involves sending
the update to other nodes, so it shouldn't involve any disk activity --
disk activity would be during the mutation/write phase.

The write path (not replicate on write) for counters involves a read, so
that explains the high disk I/O, but for that I'd expect to see many write
requests pending (which we see a bit), but not replicate on writes backing
up. What am I missing?

Andrew


On Wed, Jul 3, 2013 at 1:03 PM, Robert Coli <rc...@eventbrite.com> wrote:

> On Wed, Jul 3, 2013 at 9:59 AM, Andrew Bialecki <andrew.bialecki@gmail.com>
> wrote:
>
>> 2. I'm assuming in our case the cause is incrementing counters because
>> disk reads are part of the write path for counters and are not for
>> appending columns to a row. Does that logic make sense?
>>
>
> That's a pretty reasonable assumption if you are not doing any other reads
> and you see your disk busy doing non-compaction related reads. :)
>
> =Rob
>

Re: Lots of replicate on write tasks pending, want to investigate

Posted by Robert Coli <rc...@eventbrite.com>.
On Wed, Jul 3, 2013 at 9:59 AM, Andrew Bialecki
<an...@gmail.com> wrote:

> 2. I'm assuming in our case the cause is incrementing counters because
> disk reads are part of the write path for counters and are not for
> appending columns to a row. Does that logic make sense?
>

That's a pretty reasonable assumption if you are not doing any other reads
and you see your disk busy doing non-compaction related reads. :)

=Rob