Posted to user@cassandra.apache.org by Jonathan Ellis <jb...@gmail.com> on 2010/04/21 19:21:31 UTC

Re: Cassandra 0.5.1 restarts slow

[moving to user@]

0.6 fixes the commit log replaying faster than it can flush on restart.

as for why it backs up in the first place before the restart, you can
either (a) throttle writes [set your timeout lower, make your clients
back off temporarily when they get a TimedOutException] or (b) add
capacity.  (b) is recommended.

https://issues.apache.org/jira/browse/CASSANDRA-685 will mitigate this
but there is still no substitute for adding capacity to match demand.
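
Option (a) above, having clients back off temporarily after a timeout, might
look roughly like the sketch below. This is an illustration only, not code
from any Cassandra client: the TimedOut class stands in for whatever timeout
exception your client library raises, and write_with_backoff, base_delay and
max_delay are hypothetical names and values.

```python
import random
import time


class TimedOut(Exception):
    """Stand-in for the timeout error a client library raises (hypothetical)."""


def write_with_backoff(do_write, max_attempts=5, base_delay=0.1, max_delay=5.0,
                       sleep=time.sleep, rng=random.random):
    """Call do_write(); on TimedOut, wait with exponential backoff, then retry.

    Returns the number of attempts used. Re-raises TimedOut once
    max_attempts have all timed out.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            do_write()
            return attempt
        except TimedOut:
            if attempt == max_attempts:
                raise
            # Double the ceiling on each retry, capped at max_delay.
            ceiling = min(max_delay, base_delay * 2 ** (attempt - 1))
            # Random jitter keeps many clients from retrying in lockstep.
            sleep(ceiling * rng())
```

Backing off only slows the rate at which an overloaded node receives writes;
it does not add capacity, which is why (b) remains the recommended fix.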

On Tue, Apr 20, 2010 at 4:57 PM, Anthony Molinaro
<an...@alumni.caltech.edu> wrote:
> Hi,
>
>  I have a cassandra cluster where a couple things are happening.  Every
> once in a while a node will start to get backed up.  Checking tpstats I
> see a very large value for ROW-MUTATION-STAGE.  Sometimes it will be able
> to clear it if I give it enough time, other times the vm OOMs.  With some
> nodes I also see this happen during restarts, I'll restart and have to
> wait 6-12 hours for the node to not be marked as 'Down'.
> I've seen
> http://wiki.apache.org/cassandra/FAQ#slows_down_after_lotso_inserts
> and ended up with the following settings.
>
> KeysCachedFraction            : 0.01
> MemtableSizeInMB              : 100
> MemtableObjectCountInMillions : 0.5
> Heap                          : -Xmx5G
>
> I only have 2 CFs in this instance and entries are small so in most cases
> I hit MemtableObjectCountInMillions first and total MemtableSizeInMB is
> about 60MB-120MB for the 2 CFs combined.
>
> Anyone have any pointers on where to look next?  These are m1.large EC2
> instances (I want to move to xlarge to get more memory, but haven't yet
> gotten clarification on the best process for node replacement, per my
> other thread).
>
> Thanks,
>
> -Anthony
>
> --
> ------------------------------------------------------------------------
> Anthony Molinaro                           <an...@alumni.caltech.edu>
>

Re: Cassandra 0.5.1 restarts slow

Posted by Anthony Molinaro <an...@alumni.caltech.edu>.
On Wed, Apr 21, 2010 at 01:24:45PM -0500, Jonathan Ellis wrote:
> On Wed, Apr 21, 2010 at 1:11 PM, Anthony Molinaro
> <an...@alumni.caltech.edu> wrote:
> > Interesting, in the config I see
> >
> >  <!-- Time to wait for a reply from other nodes before failing the command -->
> >  <RpcTimeoutInMillis>5000</RpcTimeoutInMillis>
> >
> > So I thought that timeout was for inter-node communication not the thrift
> > API, but I see how you probably consider both inter-node traffic and
> > thrift traffic as clients.  Does this RPC Timeout apply to both?
> 
> rpctimeout applies to internal messages but if an operation times out
> at that level a Thrift exception will be passed to the client.

Ahh, I see, it basically percolates back up the call chain.

> > Somewhat off-topic but relating to timeouts, are there any plans to tune
> > the timeouts for Gossip nodes?  The EC2 network is horribly flaky, and I
> > often see nodes go Dead, then come back a few seconds later, so I'm just
> > wondering if there's a way to tune the check to occur less frequently?
> 
> increase failuredetector.phiConvictThreshold.

Is that a property? (i.e., do I set it with -Dfailuredetector.phiConvictThreshold?)
What is the unit?  Are there other super secret properties that might
be useful for tuning?

Thanks,

-Anthony

-- 
------------------------------------------------------------------------
Anthony Molinaro                           <an...@alumni.caltech.edu>

Re: Cassandra 0.5.1 restarts slow

Posted by Jonathan Ellis <jb...@gmail.com>.
On Wed, Apr 21, 2010 at 1:11 PM, Anthony Molinaro
<an...@alumni.caltech.edu> wrote:
> Interesting, in the config I see
>
>  <!-- Time to wait for a reply from other nodes before failing the command -->
>  <RpcTimeoutInMillis>5000</RpcTimeoutInMillis>
>
> So I thought that timeout was for inter-node communication not the thrift
> API, but I see how you probably consider both inter-node traffic and
> thrift traffic as clients.  Does this RPC Timeout apply to both?

rpctimeout applies to internal messages but if an operation times out
at that level a Thrift exception will be passed to the client.

> Somewhat off-topic but relating to timeouts, are there any plans to tune
> the timeouts for Gossip nodes?  The EC2 network is horribly flaky, and I
> often see nodes go Dead, then come back a few seconds later, so I'm just
> wondering if there's a way to tune the check to occur less frequently?

increase failuredetector.phiConvictThreshold.
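
To see why a higher threshold makes the detector more tolerant of a flaky
network, here is a minimal sketch of a phi accrual calculation. It assumes
heartbeat gaps follow an exponential distribution, which is a simplification
and not necessarily what this Cassandra version does internally. The
function names, the 1-second mean interval and the thresholds 5, 8 and 12
are all illustrative assumptions. Under this model phi is a dimensionless
suspicion level, not a time.

```python
import math


def phi(seconds_since_last_heartbeat, mean_interval_seconds):
    """Suspicion level under an exponential inter-arrival model (an assumption).

    P(heartbeat still to come after t) = exp(-t / mean); phi = -log10(P).
    """
    return seconds_since_last_heartbeat / (mean_interval_seconds * math.log(10))


def silence_before_conviction(threshold, mean_interval_seconds):
    """Seconds of silence after which phi reaches the given threshold."""
    return threshold * mean_interval_seconds * math.log(10)


# Illustrative thresholds only; a larger threshold tolerates a longer silence.
for threshold in (5, 8, 12):
    seconds = silence_before_conviction(threshold, mean_interval_seconds=1.0)
    print(f"threshold {threshold}: convict after {seconds:.1f}s of silence")
```

In this model the tolerated silence grows linearly with the threshold, so
raising it trades slower detection of real failures for fewer false "Dead"
marks during short network hiccups.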

Re: Cassandra 0.5.1 restarts slow

Posted by Anthony Molinaro <an...@alumni.caltech.edu>.
On Wed, Apr 21, 2010 at 12:52:32PM -0500, Jonathan Ellis wrote:
> On Wed, Apr 21, 2010 at 12:45 PM, Anthony Molinaro
> <an...@alumni.caltech.edu> wrote:
> >> as for why it backs up in the first place before the restart, you can
> >> either (a) throttle writes [set your timeout lower, make your clients
> >> back off temporarily when it gets a timeoutexception]
> >
> > What timeout is this?  Something in the thrift API or a cassandra
> > configuration?
> 
> the latter.  iirc it is "RPCTimeout"

Interesting, in the config I see

 <!-- Time to wait for a reply from other nodes before failing the command -->
 <RpcTimeoutInMillis>5000</RpcTimeoutInMillis>

So I thought that timeout was for inter-node communication not the thrift
API, but I see how you probably consider both inter-node traffic and
thrift traffic as clients.  Does this RPC Timeout apply to both?

Somewhat off-topic but relating to timeouts, are there any plans to tune
the timeouts for Gossip nodes?  The EC2 network is horribly flaky, and I
often see nodes go Dead, then come back a few seconds later, so I'm just
wondering if there's a way to tune the check to occur less frequently?

-Anthony

-- 
------------------------------------------------------------------------
Anthony Molinaro                           <an...@alumni.caltech.edu>

Re: Cassandra 0.5.1 restarts slow

Posted by Jonathan Ellis <jb...@gmail.com>.
On Wed, Apr 21, 2010 at 12:45 PM, Anthony Molinaro
<an...@alumni.caltech.edu> wrote:
>> as for why it backs up in the first place before the restart, you can
>> either (a) throttle writes [set your timeout lower, make your clients
>> back off temporarily when it gets a timeoutexception]
>
> What timeout is this?  Something in the thrift API or a cassandra
> configuration?

the latter.  iirc it is "RPCTimeout"

Re: Cassandra 0.5.1 restarts slow

Posted by Anthony Molinaro <an...@alumni.caltech.edu>.
On Wed, Apr 21, 2010 at 12:21:31PM -0500, Jonathan Ellis wrote:
> [moving to user@]
> 
> 0.6 fixes replaying faster than it can flush.

Yeah, I noticed some of those fixes, and will probably take the leap into
0.6 if I can keep my cluster running.  It's not doing too badly (about
400K reads and 250K writes per minute spread over 23 nodes), but some
of the m1.large instances frequently get into this backed-up state,
so I need to keep the cluster running first.
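
For scale, the cluster-wide figures above can be turned into a rough
per-node rate. This is back-of-the-envelope arithmetic that assumes load is
spread evenly and counts only client-level operations; the replication
factor is not stated in the thread, so the number of replica writes each
node actually applies is not included. The function name is hypothetical.

```python
reads_per_minute = 400_000
writes_per_minute = 250_000
nodes = 23


def per_node_per_second(ops_per_minute, node_count):
    """Average client-level operations per second per node, assuming even spread."""
    return ops_per_minute / 60 / node_count


print(f"reads/s per node:  {per_node_per_second(reads_per_minute, nodes):.0f}")
print(f"writes/s per node: {per_node_per_second(writes_per_minute, nodes):.0f}")
```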

> as for why it backs up in the first place before the restart, you can
> either (a) throttle writes [set your timeout lower, make your clients
> back off temporarily when it gets a timeoutexception]

What timeout is this?  Something in the thrift API or a cassandra
configuration?

> or (b) add capacity.  (b) is recommended.

Yeah, I've been doing that, adding xlarge instances with RAID 0 disks,
which work better, but I keep running into issues with the old instances,
which holds up this work.  I'll keep chugging along and hopefully get
things sorted.

-Anthony

-- 
------------------------------------------------------------------------
Anthony Molinaro                           <an...@alumni.caltech.edu>