
Posted to user@cassandra.apache.org by "B. Todd Burruss" <bb...@real.com> on 2010/01/20 22:59:09 UTC

load balancing

feedback on trunk code ... i'm using code from trunk on a 4 node cluster,
RF=3, W=Q, R=Q, and ran a nodeprobe loadbalance on the hot node.  the
cluster ran through the night and this morning i noticed a lot of
pending tasks in hinted-handoff-pool via tpstats.  looking through the
logs i notice that nodes are very frequently reported as "InetAddress
/192.168.132.102 is now dead" and then come back UP .. the nodes are flapping.
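
One way to put a number on the flapping described above is to count dead-then-UP transitions per address in the log. The following is only a rough sketch: the "is now dead" wording is quoted from the post, but the matching "is now UP" wording, the sample lines, and the function name count_flaps are assumptions made for illustration, not taken from the Cassandra source.

```python
import re
from collections import Counter

# The "is now dead" wording comes from the post; the "is now UP" wording
# is an assumed counterpart and may differ in a real log.
STATE_RE = re.compile(r"InetAddress /(?P<ip>\S+) is now (?P<state>dead|UP)")


def count_flaps(lines):
    """Count dead -> UP recoveries per node address (hypothetical helper)."""
    last_state = {}
    flaps = Counter()
    for line in lines:
        match = STATE_RE.search(line)
        if match is None:
            continue
        ip, state = match.group("ip"), match.group("state")
        # A flap is counted when a node seen as dead is reported UP again.
        if state == "UP" and last_state.get(ip) == "dead":
            flaps[ip] += 1
        last_state[ip] = state
    return flaps


# Made-up sample lines, for illustration only.
sample = [
    "INFO InetAddress /192.168.132.102 is now dead",
    "INFO InetAddress /192.168.132.102 is now UP",
    "INFO InetAddress /192.168.132.102 is now dead",
    "INFO InetAddress /192.168.132.102 is now UP",
    "INFO InetAddress /192.168.132.103 is now dead",
]
flaps = count_flaps(sample)
print(dict(flaps))
```

Feeding it the lines of a real log file (for example `count_flaps(open(path))`, with a path of your choosing) would show which addresses flap most often.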

is this triggering the hinted-handoff and is there a way to alleviate
this (other than lowering the load)?  it seems that the hinted handoffs
have slowed or stopped the loadbalance.

i've been trying to load up the cluster with data, so i'm trying to do
2000+ puts a second with little or no reads.  it seems like the node
that is loadbalancing is bringing the performance of the cluster way
down.
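
For readers unfamiliar with the RF=3, W=Q, R=Q shorthand in the post above, the quorum arithmetic can be sketched as follows. This is plain arithmetic, not Cassandra code, and the function names are made up for illustration. It shows why quorum writes keep succeeding while one replica is marked dead, which is the situation in which, assuming the usual hinted-handoff design, hints would accumulate for the missing replica.

```python
def quorum(replication_factor: int) -> int:
    """Smallest majority of the replicas: floor(RF / 2) + 1."""
    return replication_factor // 2 + 1


def tolerated_down_replicas(replication_factor: int) -> int:
    """Replicas that can be unavailable while a quorum is still reachable."""
    return replication_factor - quorum(replication_factor)


rf = 3
w = r = quorum(rf)

# With RF=3 a quorum is 2, so one replica may be down, and because
# W + R > RF every quorum read overlaps every quorum write.
print(w, tolerated_down_replicas(rf), w + r > rf)
```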

RE: load balancing

Posted by Todd Burruss <bb...@real.com>.

:)

thx

Re: load balancing

Posted by Jonathan Ellis <jb...@gmail.com>.

On Sun, Jan 24, 2010 at 3:21 PM, Todd Burruss <bb...@real.com> wrote:
> one more note (and to reiterate) ...
>
> i believe a must-have for cassandra is some better reporting (via JMX preferably) on the state of a loadbalance / cleanup / removetoken etc.

I do too. :)  https://issues.apache.org/jira/browse/CASSANDRA-709

RE: load balancing

Posted by Todd Burruss <bb...@real.com>.

one more note (and to reiterate) ...

i believe a must-have for cassandra is better reporting (preferably via JMX) on the state of a loadbalance / cleanup / removetoken etc.  i have let my cluster run for more than a day and the loadbalance doesn't appear to be finished.  the only indication i have that it is still running is that performance is way off even though the load from my test clients hasn't changed, and i still haven't seen the log message indicating it has finished.

i've also tried removetoken and cleanup, with the same problem ... i don't know where the cluster is in the process, or what state it is in.  all i can see is that performance has fallen way off.

something for 0.6 i hope?

thx

Re: load balancing

Posted by "B. Todd Burruss" <bb...@real.com>.

i figured ;)

capacity planning is a bit difficult to figure out w/o a fair amount of
testing .. and i'm still not confident i know all the variables.




Re: load balancing

Posted by Jonathan Ellis <jb...@gmail.com>.

Then it sounds like the answer is "you need to leave some headroom for
the i/o caused by loadbalance in your capacity planning," which you
probably already knew. :)

-Jonathan
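
As a rough illustration of the headroom point above, the calculation might look like the sketch below. Only the 2000 puts a second target comes from the thread; the peak throughput, the reserved fraction, and the function name are hypothetical placeholders that would have to be replaced with measurements from your own cluster.

```python
def usable_throughput(peak_ops_per_sec: float, headroom_fraction: float) -> float:
    """Throughput left for clients after reserving a share for background i/o."""
    if not 0.0 <= headroom_fraction < 1.0:
        raise ValueError("headroom_fraction must be in [0, 1)")
    return peak_ops_per_sec * (1.0 - headroom_fraction)


peak = 4000.0     # hypothetical: measured peak writes/sec with no loadbalance running
reserve = 0.5     # hypothetical: share of capacity set aside for loadbalance i/o
target = 2000.0   # from the thread: roughly 2000 puts a second

budget = usable_throughput(peak, reserve)
print(budget, budget >= target)
```

If the budget comes out below the target, either the client load or the amount of concurrent maintenance work has to give.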


Re: load balancing

Posted by "B. Todd Burruss" <bb...@real.com>.

i do have the fix.  just checked the code.



Re: load balancing

Posted by Jonathan Ellis <jb...@gmail.com>.

are you on a recent enough trunk to have the fix for
https://issues.apache.org/jira/browse/CASSANDRA-715 ?  We had a
regression for a while that caused HH (hinted handoff) delivery to loop
infinitely, consuming a ton of CPU, which sounds like what you're seeing.
