You are viewing a plain text version of this content. The canonical link for it is here.

Posted to dev@qpid.apache.org by Alan Conway <ac...@redhat.com> on 2006/09/08 17:30:56 UTC

Dynamic balancing of queues.

Just got thinking about this - how to ensure work is distributed
efficiently.

Simple case: 1 q many consumers. Automatically balances itself because
consumers request messages at exactly the rate they can consume them.

To scale up we need more qs. The demo aims for balance by having an
exchange that tries to keep the queues to the same length.

When work is coming in faster than it's being processed the exchange can
balance the queues by adding messages to the short queues.

But what happens during slow periods when work is processed faster than
it comes in? The fast queues are emptying out and there's work trapped
on the slow queues. The exchange can't balance things by adding messages
because there are no new messages to add.

So let the exchange act as a consumer and *remove* messages from
over-full queues to re-queue them on under-full/empty ones. Now we have
a full dynamic balance in both growing and shrinking phases

This gives us the best balance but it incurs overhead when the broker
has to move message between queues. So perhaps a final optimization is
to have the exchange track not only the fullness of the queues but the
rate at which they empty. Then instead of aiming for every q having the
same length, it can (over time) aim for queues to be filled in
proportion to their throughput. I need to think about that some more
optimization strategies don't always perform the way you expect!

Thoughts?
Alan.

Re: Dynamic balancing of queues.

Posted by Hiram Chirino <hi...@hiramchirino.com>.

Also ActiveMQ :)

On 9/13/06, Gordon Sim <gs...@redhat.com> wrote:
> Carl Trieloff wrote:
>
> > I believe that the simple way is to look at clustering for scalability
> > and for fault tolerance as separate user decisions.
> >
> > Simplistically
> > a.) if every broker can be configured to have a passive companion for FT
> > b.) then a scalability cluster can be made up of single brokers
> > (Scalable not FT) or active-passive pairs ( FT + Scalable)
>
> Thats the approach sonic take also AFAIK.
>


-- 
Regards,
Hiram

Blog: http://hiramchirino.com

Re: Dynamic balancing of queues.

Posted by Gordon Sim <gs...@redhat.com>.

Carl Trieloff wrote:

> I believe that the simple way is to look at clustering for scalability 
> and for fault tolerance as separate user decisions.
> 
> Simplistically
> a.) if every broker can be configured to have a passive companion for FT
> b.) then a scalability cluster can be made up of single brokers 
> (Scalable not FT) or active-passive pairs ( FT + Scalable)

Thats the approach sonic take also AFAIK.

Re: Dynamic balancing of queues.

Posted by Carl Trieloff <cc...@redhat.com>.

> [As an aside, if we are talking here of clustering for scalability 
> rather than fault tolerance, I am not sure that active-passive logic 
> has a role. All the brokers would be active or there wouldn't be a 
> gain in scalability.]
>
>
I believe that the simple way is to look at clustering for scalability 
and for fault tolerance as separate user decisions.

Simplistically
a.) if every broker can be configured to have a passive companion for FT
b.) then a scalability cluster can be made up of single brokers 
(Scalable not FT) or active-passive pairs ( FT + Scalable)

Carl.

Re: Dynamic balancing of queues.

Posted by Gordon Sim <gs...@redhat.com>.

Carl Trieloff wrote:
> For load distributing ordering is (may I say) never a requirement. - 
> priority however should be honored. why can't load distribution
> be a special case of an exchange, allowing the clustering to be 
> simplified to just the active-passive logic?

Strict ordering may not be, but you wouldn't want to have too much 
variability(?). Doing load-balancing in the exchange by just routing 
messages to one of several queues I have no problem with. But extending 
that role to dequeue messages from longer queues and enqueue them to 
shorter ones didn't seem like the right design to me. A distributed 
queue seemed to more accurately reflect what is required and also be 
more generally applicable.

However, as Alan rightly points out, the success of any approach needs 
to be measured in how well it realises the desired increase in 
scalability. Getting a solution that works also involves a lot more than 
writing a sentence or two on what 'feels right' from a design 
perspective. On top of that I think I came in mid-way through the 
conversation, began by merely asking a question and got sucked in to 
thinking aloud. I therefore withdraw my opinions as being premature, 
speculative and based on an incomplete understanding of the problem space.

[As an aside, if we are talking here of clustering for scalability 
rather than fault tolerance, I am not sure that active-passive logic has 
a role. All the brokers would be active or there wouldn't be a gain in 
scalability.]

Re: Dynamic balancing of queues.

Posted by Carl Trieloff <cc...@redhat.com>.

Gordon Sim wrote:
> Alan Conway wrote:
>> On Mon, 2006-09-11 at 14:37 +0100, Gordon Sim wrote:
>>> Why do we need extra queues to scale up? 
>>
>> For a single broker you don't, the broker should be optimized to take
>> full advantage of local CPUs etc. with a single queue.
>>
>> I'm thinking about the large scale deployment where you need to
>> distribute load across multiple hosts (possibly on different networks)
>> because either the CPUs on the exchange host or it's network
>> connectivity are a bottleneck. 
>
> Ok, understood.
>
>>>> So let the exchange act as a consumer and *remove* messages from
>>>> over-full queues to re-queue them on under-full/empty ones. Now we 
>>>> have
>>>> a full dynamic balance in both growing and shrinking phases
>>> This complicates the exchange though as it now presumably needs to 
>>> periodically monitor the queue lengths (i.e. it becomes an active 
>>> entity rather than just reacting to publications routed through it). 
>>> Maybe an entirely separate re-balancing component would be cleaner?
>>
>> Maybe, need to think about that. My intuition is that the broker can do
>> a better job of this because it's a single place to keep the statistics,
>> but a distributed cleanup component might have advantages if network
>> bandwidth at the broker is the bottleneck.
>
> I'm not saying that component shouldn't be in the broker(s) just that 
> it isn't necessarily part of an exchange.
>
>>> I'm not really sure I understand the root problem here. i.e. why do 
>>> we want multiple queues of the same (or similar) length?
>
>> Trying to balance multiple queues only makes sense if there's a resource
>> problem with a everyone talking to a single queue - not enough memory,
>> not enough open file descriptors, performance degrades due to memory
>> requirements, network topology/firewalls etc. You can imagine situations
>> where a single broker with 1,000,000 consumers might not perform as well
>> as a federation of 1001 brokers each with 1000 consumers.
>
> My confusion here stemmed from not understanding that you were talking 
> about a group of co-operating brokers. (This is actually the use case 
> that the java clustering code currently in svn was designed for though 
> any actual improvements to scalability have not been confirmed through 
> testing).
>
> That being the case I would argue even more strongly against the 
> exchange removing messages from queues it has delivered them to and 
> redelivering them to shorter queues to load balance. For one thing 
> that would have implications for ordering. 
For load distributing ordering is (may I say) never a requirement. - 
priority however should be honored. why can't load distribution
be a special case of an exchange, allowing the clustering to be 
simplified to just the active-passive logic?
> A single logical queue that is in implementation distributed would 
> seem like a better fit from the design point of view (the clustering 
> code mentioned in the previous paragraph does something similar).
>
>> That said need to work hard optimizing the broker so that the single
>> queue solution can scale as far as possible before we get into more
>> complicated federations and the like. We should also look at some real
>> data before assuming that such federation will solve a real-world
>> problem, this stuff doesn't always work out the way you think it will!
>
> Agreed! The justification for the earlier work on java was purely to 
> get feature parity with an alternative implementation.

Re: Dynamic balancing of queues.

Posted by Gordon Sim <gs...@redhat.com>.

Alan Conway wrote:
> On Mon, 2006-09-11 at 14:37 +0100, Gordon Sim wrote:
>> Why do we need extra queues to scale up? 
> 
> For a single broker you don't, the broker should be optimized to take
> full advantage of local CPUs etc. with a single queue.
> 
> I'm thinking about the large scale deployment where you need to
> distribute load across multiple hosts (possibly on different networks)
> because either the CPUs on the exchange host or it's network
> connectivity are a bottleneck. 

Ok, understood.

>>> So let the exchange act as a consumer and *remove* messages from
>>> over-full queues to re-queue them on under-full/empty ones. Now we have
>>> a full dynamic balance in both growing and shrinking phases
>> This complicates the exchange though as it now presumably needs to 
>> periodically monitor the queue lengths (i.e. it becomes an active entity 
>> rather than just reacting to publications routed through it). Maybe an 
>> entirely separate re-balancing component would be cleaner?
> 
> Maybe, need to think about that. My intuition is that the broker can do
> a better job of this because it's a single place to keep the statistics,
> but a distributed cleanup component might have advantages if network
> bandwidth at the broker is the bottleneck.

I'm not saying that component shouldn't be in the broker(s) just that it 
isn't necessarily part of an exchange.

>> I'm not really sure I understand the root problem here. i.e. why do we 
>> want multiple queues of the same (or similar) length?

> Trying to balance multiple queues only makes sense if there's a resource
> problem with a everyone talking to a single queue - not enough memory,
> not enough open file descriptors, performance degrades due to memory
> requirements, network topology/firewalls etc. You can imagine situations
> where a single broker with 1,000,000 consumers might not perform as well
> as a federation of 1001 brokers each with 1000 consumers.

My confusion here stemmed from not understanding that you were talking 
about a group of co-operating brokers. (This is actually the use case 
that the java clustering code currently in svn was designed for though 
any actual improvements to scalability have not been confirmed through 
testing).

That being the case I would argue even more strongly against the 
exchange removing messages from queues it has delivered them to and 
redelivering them to shorter queues to load balance. For one thing that 
would have implications for ordering. A single logical queue that is in 
implementation distributed would seem like a better fit from the design 
point of view (the clustering code mentioned in the previous paragraph 
does something similar).

> That said need to work hard optimizing the broker so that the single
> queue solution can scale as far as possible before we get into more
> complicated federations and the like. We should also look at some real
> data before assuming that such federation will solve a real-world
> problem, this stuff doesn't always work out the way you think it will!

Agreed! The justification for the earlier work on java was purely to get 
feature parity with an alternative implementation.

Re: Dynamic balancing of queues.

Posted by Alan Conway <ac...@redhat.com>.

On Mon, 2006-09-11 at 14:37 +0100, Gordon Sim wrote:
> Why do we need extra queues to scale up? 

For a single broker you don't, the broker should be optimized to take
full advantage of local CPUs etc. with a single queue.

I'm thinking about the large scale deployment where you need to
distribute load across multiple hosts (possibly on different networks)
because either the CPUs on the exchange host or it's network
connectivity are a bottleneck. 

> > So let the exchange act as a consumer and *remove* messages from
> > over-full queues to re-queue them on under-full/empty ones. Now we have
> > a full dynamic balance in both growing and shrinking phases
> 
> This complicates the exchange though as it now presumably needs to 
> periodically monitor the queue lengths (i.e. it becomes an active entity 
> rather than just reacting to publications routed through it). Maybe an 
> entirely separate re-balancing component would be cleaner?

Maybe, need to think about that. My intuition is that the broker can do
a better job of this because it's a single place to keep the statistics,
but a distributed cleanup component might have advantages if network
bandwidth at the broker is the bottlneck.

> I'm not really sure I understand the root problem here. i.e. why do we 
> want multiple queues of the same (or similar) length?

Because you want to keep all the the consumers busy all the time. If you
have 10 queues and 5 of them empty out because they have fast consumers,
and you don't have any new work to put on them, then you don't want to
leave them idle while the other 5 slow consumers chew through the
remaining jobs.

The ideal solution to this is of course to have only 1 queue and let
each consumer take items at its own pace - that naturally balances
itself. 

Trying to balance multiple queues only makes sense if there's a resource
problem with a everyone talking to a single queue - not enough memory,
not enough open file descriptors, performance degrades due to memory
requirements, network topology/firewalls etc. You can imagine situations
where a single broker with 1,000,000 consumers might not perform as well
as a federation of 1001 brokers each with 1000 consumers.

That said need to work hard optimizing the broker so that the single
queue solution can scale as far as possible before we get into more
complicated federations and the like. We should also look at some real
data before assuming that such federation will solve a real-world
problem, this stuff doesn't always work out the way you think it will!

Re: Dynamic balancing of queues.

Posted by Gordon Sim <gs...@redhat.com>.

Alan Conway wrote:
> Just got thinking about this - how to ensure work is distributed
> efficiently.
> 
> Simple case: 1 q many consumers. Automatically balances itself because
> consumers request messages at exactly the rate they can consume them.
> 
> To scale up we need more qs. 

Why do we need extra queues to scale up? Perhaps separate queues could 
result in slightly less lock contention but other than that it doesn't 
seem to improve scalability unless you moved the separate queues to 
different broker processes. Queue length could also be kept down for 
each individual queue but the overall resources required wouldn't be any 
less. Or am I missing something obvious?

> The demo aims for balance by having an
> exchange that tries to keep the queues to the same length.
> 
> When work is coming in faster than it's being processed the exchange can
> balance the queues by adding messages to the short queues.
> 
> But what happens during slow periods when work is processed faster than
> it comes in? The fast queues are emptying out and there's work trapped
> on the slow queues. The exchange can't balance things by adding messages
> because there are no new messages to add.
> 
> So let the exchange act as a consumer and *remove* messages from
> over-full queues to re-queue them on under-full/empty ones. Now we have
> a full dynamic balance in both growing and shrinking phases

This complicates the exchange though as it now presumably needs to 
periodically monitor the queue lengths (i.e. it becomes an active entity 
rather than just reacting to publications routed through it). Maybe an 
entirely separate re-balancing component would be cleaner?

> This gives us the best balance but it incurs overhead when the broker
> has to move message between queues. So perhaps a final optimization is
> to have the exchange track not only the fullness of the queues but the
> rate at which they empty. Then instead of aiming for every q having the
> same length, it can (over time) aim for queues to be filled in
> proportion to their throughput. I need to think about that some more
> optimization strategies don't always perform the way you expect!

I'm not really sure I understand the root problem here. i.e. why do we 
want multiple queues of the same (or similar) length?