Posted to dev@cassandra.apache.org by Anthony Molinaro <an...@alumni.caltech.edu> on 2010/04/20 23:57:24 UTC

Cassandra 0.5.1 restarts slow

Hi,

  I have a Cassandra cluster where a couple of things are happening.  Every
once in a while a node starts to get backed up.  Checking tpstats, I see
a very large value for ROW-MUTATION-STAGE.  Sometimes the node is able to
clear the backlog if I give it enough time; other times the JVM runs out
of memory (OOM).  On some nodes I also see this happen during restarts: I
restart and then have to wait 6-12 hours before the node is no longer
marked as 'Down'.
I've read
http://wiki.apache.org/cassandra/FAQ#slows_down_after_lotso_inserts
and ended up with the following settings.

KeysCachedFraction            : 0.01
MemtableSizeInMB              : 100
MemtableObjectCountInMillions : 0.5
Heap                          : -Xmx5G

I only have 2 column families (CFs) in this instance and entries are small,
so in most cases I hit MemtableObjectCountInMillions first; the total
memtable size is about 60MB-120MB for the 2 CFs combined.
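A minimal sketch of why the object-count limit trips first with small
entries, assuming a memtable flushes as soon as either threshold is crossed
and that its size is simply entry count x average entry size (real size
accounting is more involved).  The limits are the settings above;
`first_threshold` and `size_at_count_limit_mb` are hypothetical helpers,
not Cassandra code.

```python
# Hypothetical helpers: a memtable flushes when EITHER limit is reached.
# Simplifying assumption: size = entry count x average entry size.

SIZE_LIMIT_MB = 100          # MemtableSizeInMB
COUNT_LIMIT_MILLIONS = 0.5   # MemtableObjectCountInMillions
BYTES_PER_MB = 1024 * 1024


def first_threshold(avg_entry_bytes: float,
                    size_limit_mb: float = SIZE_LIMIT_MB,
                    count_limit_millions: float = COUNT_LIMIT_MILLIONS) -> str:
    """Return "count" or "size": whichever limit one memtable hits first."""
    count_limit = count_limit_millions * 1_000_000
    # Number of entries needed before the size limit would be reached.
    entries_to_fill_size = size_limit_mb * BYTES_PER_MB / avg_entry_bytes
    return "count" if count_limit <= entries_to_fill_size else "size"


def size_at_count_limit_mb(avg_entry_bytes: float,
                           count_limit_millions: float = COUNT_LIMIT_MILLIONS) -> float:
    """Memtable size (MB) at the moment the object-count limit is hit."""
    return count_limit_millions * 1_000_000 * avg_entry_bytes / BYTES_PER_MB


if __name__ == "__main__":
    for avg in (100, 500):
        print(f"{avg}-byte entries: {first_threshold(avg)} limit reached first")
```

With these settings the break-even point is 100 MB / 500,000 entries, or
about 210 bytes per entry; below that, the count limit is reached first.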

Does anyone have pointers on where to look next?  These are m1.large EC2
instances (I want to move to xlarge to get more memory, but haven't yet
gotten clarification on the best process for node replacement; see my
other thread).

Thanks,

-Anthony

-- 
------------------------------------------------------------------------
Anthony Molinaro                           <an...@alumni.caltech.edu>

Re: Cassandra 0.5.1 restarts slow

Posted by Anthony Molinaro <an...@alumni.caltech.edu>.
On Wed, Apr 21, 2010 at 01:24:45PM -0500, Jonathan Ellis wrote:
> On Wed, Apr 21, 2010 at 1:11 PM, Anthony Molinaro
> <an...@alumni.caltech.edu> wrote:
> > Interesting, in the config I see
> >
> >  <!-- Time to wait for a reply from other nodes before failing the command -->
> >  <RpcTimeoutInMillis>5000</RpcTimeoutInMillis>
> >
> > So I thought that timeout was for inter-node communication, not the
> > Thrift API, but I can see how you might consider both inter-node traffic
> > and Thrift traffic to be clients.  Does this RPC timeout apply to both?
> 
> RpcTimeout applies to internal messages, but if an operation times out
> at that level, a Thrift exception is passed back to the client.

Ahh, I see: it basically percolates back up the call chain.

> > Somewhat off-topic, but relating to timeouts: are there any plans to tune
> > the timeouts for Gossip nodes?  The EC2 network is horribly flaky, and I
> > often see a node go Dead, then come back a few seconds later, so I'm
> > wondering whether there's a way to tune the check to occur less frequently.
> 
> Increase failuredetector.phiConvictThreshold.

Is that a system property (i.e., do I set it with
-Dfailuredetector.phiConvictThreshold)?  What is the unit?  Are there other
super-secret properties that might be useful for tuning?

Thanks,

-Anthony

Re: Cassandra 0.5.1 restarts slow

Posted by Jonathan Ellis <jb...@gmail.com>.
On Wed, Apr 21, 2010 at 1:11 PM, Anthony Molinaro
<an...@alumni.caltech.edu> wrote:
> Interesting, in the config I see
>
>  <!-- Time to wait for a reply from other nodes before failing the command -->
>  <RpcTimeoutInMillis>5000</RpcTimeoutInMillis>
>
> So I thought that timeout was for inter-node communication, not the
> Thrift API, but I can see how you might consider both inter-node traffic
> and Thrift traffic to be clients.  Does this RPC timeout apply to both?

RpcTimeout applies to internal messages, but if an operation times out
at that level, a Thrift exception is passed back to the client.
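A rough sketch of that propagation, assuming a coordinator that forwards a
write to its replicas and waits at most RpcTimeoutInMillis for the
acknowledgements it needs; if they don't arrive in time, the failure
surfaces to the client as an exception.  `ServerTimedOut`,
`coordinate_write`, and the replica callables are hypothetical stand-ins,
not Cassandra's internals.

```python
# Hypothetical sketch: a coordinator waits up to rpc_timeout_ms for replica
# acknowledgements and raises a client-visible error if too few arrive.
import concurrent.futures as cf
import time


class ServerTimedOut(Exception):
    """Stand-in for the timeout exception the client receives."""


def coordinate_write(replica_calls, rpc_timeout_ms, required_acks):
    """Run replica_calls concurrently; return the ack count or raise ServerTimedOut."""
    pool = cf.ThreadPoolExecutor(max_workers=len(replica_calls))
    acks = 0
    try:
        futures = [pool.submit(call) for call in replica_calls]
        try:
            for fut in cf.as_completed(futures, timeout=rpc_timeout_ms / 1000):
                fut.result()          # re-raise a replica's own error, if any
                acks += 1
                if acks >= required_acks:
                    return acks
        except cf.TimeoutError:
            pass                      # internal timeout: turn it into a client error
        raise ServerTimedOut(f"{acks}/{required_acks} acks within {rpc_timeout_ms} ms")
    finally:
        # Don't wait for slow replicas; the caller already has its answer.
        pool.shutdown(wait=False, cancel_futures=True)


if __name__ == "__main__":
    fast = lambda: True
    slow = lambda: time.sleep(0.6) or True
    print(coordinate_write([fast, fast, slow], 200, 2))   # enough acks in time
    try:
        coordinate_write([fast, slow, slow], 200, 2)
    except ServerTimedOut as exc:
        print("client sees:", exc)
```

The internal wait and the client-facing error are the same event here,
which is the "percolates back up the call chain" behaviour in miniature.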

> Somewhat off-topic, but relating to timeouts: are there any plans to tune
> the timeouts for Gossip nodes?  The EC2 network is horribly flaky, and I
> often see a node go Dead, then come back a few seconds later, so I'm
> wondering whether there's a way to tune the check to occur less frequently.

Increase failuredetector.phiConvictThreshold.
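For the unit question: in a phi accrual failure detector, phi is
dimensionless, phi = -log10(P(no heartbeat yet after t ms)).  The sketch
below assumes exponentially distributed heartbeat intervals, which gives
phi = t / (mean x ln 10), and also assumes a 1-second gossip interval and a
threshold of 8.  Treat those numbers, and the class itself, as illustrative
rather than as Cassandra's exact implementation.

```python
# Sketch of a phi accrual failure detector with an exponential model of
# heartbeat inter-arrival times (an assumption):
#   P(next heartbeat later than t) = exp(-t / mean)
#   phi = -log10(P) = t / (mean * ln 10)      (dimensionless)
import math
from collections import deque


class PhiAccrualDetector:
    def __init__(self, window: int = 1000):
        self.intervals = deque(maxlen=window)  # recent inter-arrival times (ms)
        self.last_ms = None

    def heartbeat(self, now_ms: float) -> None:
        if self.last_ms is not None:
            self.intervals.append(now_ms - self.last_ms)
        self.last_ms = now_ms

    def phi(self, now_ms: float) -> float:
        if not self.intervals:
            return 0.0  # no history yet: never convict
        mean = sum(self.intervals) / len(self.intervals)
        return (now_ms - self.last_ms) / (mean * math.log(10))

    def is_down(self, now_ms: float, threshold: float) -> bool:
        return self.phi(now_ms) > threshold


def silence_before_conviction_ms(threshold: float, mean_interval_ms: float) -> float:
    """How long a node can stay silent before phi exceeds the threshold."""
    return threshold * mean_interval_ms * math.log(10)


if __name__ == "__main__":
    for threshold in (8, 12):
        secs = silence_before_conviction_ms(threshold, 1000) / 1000
        print(f"threshold {threshold}: convicted after ~{secs:.1f} s of silence")
```

Under these assumptions a threshold of 8 convicts a node after roughly
18.4 seconds of silence, and 12 stretches that to about 27.6 seconds: a
higher threshold rides out brief network hiccups at the cost of slower
detection of real failures.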

Re: Cassandra 0.5.1 restarts slow

Posted by Anthony Molinaro <an...@alumni.caltech.edu>.
On Wed, Apr 21, 2010 at 12:52:32PM -0500, Jonathan Ellis wrote:
> On Wed, Apr 21, 2010 at 12:45 PM, Anthony Molinaro
> <an...@alumni.caltech.edu> wrote:
> >> As for why it backs up in the first place, before the restart, you can
> >> either (a) throttle writes [set your timeout lower, make your clients
> >> back off temporarily when they get a TimedOutException]
> >
> > What timeout is this?  Something in the Thrift API, or a Cassandra
> > configuration setting?
> 
> The latter.  IIRC it is "RPCTimeout".

Interesting.  In the config I see:

 <!-- Time to wait for a reply from other nodes before failing the command -->
 <RpcTimeoutInMillis>5000</RpcTimeoutInMillis>

So I thought that timeout was for inter-node communication, not the
Thrift API, but I can see how you might consider both inter-node traffic
and Thrift traffic to be clients.  Does this RPC timeout apply to both?

Somewhat off-topic, but relating to timeouts: are there any plans to tune
the timeouts for Gossip nodes?  The EC2 network is horribly flaky, and I
often see a node go Dead, then come back a few seconds later, so I'm
wondering whether there's a way to tune the check to occur less frequently.

-Anthony

Re: Cassandra 0.5.1 restarts slow

Posted by Jonathan Ellis <jb...@gmail.com>.
On Wed, Apr 21, 2010 at 12:45 PM, Anthony Molinaro
<an...@alumni.caltech.edu> wrote:
>> As for why it backs up in the first place, before the restart, you can
>> either (a) throttle writes [set your timeout lower, make your clients
>> back off temporarily when they get a TimedOutException]
>
> What timeout is this?  Something in the Thrift API, or a Cassandra
> configuration setting?

The latter.  IIRC it is "RPCTimeout".

Re: Cassandra 0.5.1 restarts slow

Posted by Anthony Molinaro <an...@alumni.caltech.edu>.
On Wed, Apr 21, 2010 at 12:21:31PM -0500, Jonathan Ellis wrote:
> [moving to user@]
> 
> 0.6 fixes commit log replay running faster than the node can flush.

Yeah, I noticed some of those fixes, and will probably take the leap to
0.6 if I can keep my cluster running.  It's not doing too badly (about
400K reads and 250K writes per minute spread over 23 nodes), but some of
the m1.large instances get into this backed-up state frequently, so I need
to keep the cluster running first.
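Back-of-the-envelope per-node load from those figures, assuming an even
spread across the 23 nodes.  The replication factor isn't stated in the
thread, so `replication_factor` is a hypothetical parameter: each client
write is assumed to be applied on that many replicas, while reads are
counted once per client request.

```python
# Back-of-the-envelope per-node load, assuming an even spread.
# replication_factor is hypothetical: the thread doesn't state it.

def per_node_rates(reads_per_min: float, writes_per_min: float, nodes: int,
                   replication_factor: int = 1) -> tuple[float, float]:
    """Return (reads/s, writes/s) handled by one node."""
    reads_per_s = reads_per_min / 60 / nodes          # once per client read
    # Each client write is applied on replication_factor replicas.
    writes_per_s = writes_per_min / 60 / nodes * replication_factor
    return reads_per_s, writes_per_s


if __name__ == "__main__":
    for rf in (1, 3):
        reads, writes = per_node_rates(400_000, 250_000, 23, rf)
        print(f"RF={rf}: {reads:.0f} reads/s, {writes:.0f} writes/s per node")
```

That works out to roughly 290 reads/s and 181 writes/s per node at a
replication factor of 1, or about 543 writes/s per node if it were 3.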

> As for why it backs up in the first place, before the restart, you can
> either (a) throttle writes [set your timeout lower, make your clients
> back off temporarily when they get a TimedOutException]

What timeout is this?  Something in the Thrift API, or a Cassandra
configuration setting?

> or (b) add capacity.  (b) is recommended.

Yeah, I've been doing that, adding xlarge instances with RAID0 disks,
which work better, but I keep running into issues with the old instances
that hold up this work.  I'll keep chugging along and hopefully get things
sorted.

-Anthony

> 
> https://issues.apache.org/jira/browse/CASSANDRA-685 will mitigate this,
> but there is still no substitute for adding capacity to match demand.

Re: Cassandra 0.5.1 restarts slow

Posted by Jonathan Ellis <jb...@gmail.com>.
[moving to user@]

0.6 fixes commit log replay running faster than the node can flush.
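A generic producer/consumer illustration (not how 0.6 actually implements
the fix) of why replay that outpaces flushing can end in an OOM: if work is
produced faster than it is drained, pending work grows every tick, whereas
bounding pending work (backpressure) caps memory by slowing the producer.
The rates are made-up units and `simulate` is a hypothetical helper.

```python
# Generic producer/consumer illustration (made-up units): "produce" stands
# for replayed mutations entering memory, "drain" for what flushing removes.
from typing import Optional, Tuple


def simulate(produce_per_tick: int, drain_per_tick: int, ticks: int,
             max_pending: Optional[int] = None) -> Tuple[int, int]:
    """Return (peak pending items, total items admitted)."""
    pending = peak = admitted = 0
    for _ in range(ticks):
        if max_pending is None:
            room = produce_per_tick                       # unbounded: admit everything
        else:
            # Backpressure: the producer may only add what still fits.
            room = max(0, min(produce_per_tick, max_pending - pending))
        pending += room
        admitted += room
        peak = max(peak, pending)
        pending -= min(pending, drain_per_tick)           # drain (flush) step
    return peak, admitted


if __name__ == "__main__":
    print("unbounded:", simulate(100, 60, 10))
    print("bounded:  ", simulate(100, 60, 10, max_pending=150))
```

With 100 produced and 60 drained per tick for 10 ticks, the unbounded
version peaks at 460 pending items and keeps climbing; bounding pending
work at 150 caps the peak at 150 and admits only 690 of the 1000 items in
the same time.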

As for why it backs up in the first place, before the restart: you can
either (a) throttle writes [set your timeout lower, and make your clients
back off temporarily when they get a TimedOutException] or (b) add
capacity.  (b) is recommended.
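A minimal sketch of option (a) on the client side: retry a write with
capped exponential backoff and jitter whenever the cluster reports a
timeout.  `ServerTimedOut` is a hypothetical stand-in for whatever timeout
exception your client library raises, and the delay values are arbitrary.

```python
# Hypothetical client-side throttling: retry on server timeouts with capped
# exponential backoff plus jitter, so an overloaded node gets time to drain.
import random
import time


class ServerTimedOut(Exception):
    """Stand-in for your client library's timeout exception."""


def write_with_backoff(do_write, max_attempts=5, base_delay_s=0.1,
                       max_delay_s=5.0, sleep=time.sleep, rng=random.random):
    """Call do_write(); on ServerTimedOut, wait and retry up to max_attempts."""
    for attempt in range(max_attempts):
        try:
            return do_write()
        except ServerTimedOut:
            if attempt == max_attempts - 1:
                raise                                  # give up: surface the error
            delay = min(max_delay_s, base_delay_s * 2 ** attempt)
            sleep(delay * (0.5 + rng() / 2))           # jitter: 50-100% of delay
```

Passing `sleep` and `rng` in as parameters keeps the sketch testable; the
jitter stops many clients that timed out together from retrying in
lockstep and hitting the same node at once.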

https://issues.apache.org/jira/browse/CASSANDRA-685 will mitigate this,
but there is still no substitute for adding capacity to match demand.
