Posted to dev@activemq.apache.org by Colin MacNaughton <cm...@progress.com> on 2009/06/10 18:29:51 UTC

ActiveMQ 6.0 Broker Core Prototype -- Flow Control / Memory Management

Hi Everyone,

As a follow-on to my e-mail last week introducing the core broker
prototype that Hiram and I have been working on, I wanted to spin up a
thread on the flow control model that we're using. 

I'd be interested to hear your thoughts on current shortcomings
associated with flow control / memory management in 5.3 so we can make
sure that the use cases are covered. Beyond that, any additional input on
the design or implementation would be great ... are we on the right
track?

Cheers,
Colin


The text below is taken straight from the webgen in the project, my
apologies if it's a little verbose!

As a reminder the bits can be found at:
https://svn.apache.org/repos/asf/activemq/sandbox/activemq-flow

The activemq-flow package is meant to be a standalone module that deals
generically with Resources and Flows of elements that flow through and
between them. The current implementation is designed with the following
goals in mind:

    * SIMPLE: We want a fairly simple and consistent model for controlling
flow of messages and other data in the system to control memory and disk
space. The module must be able to handle fan-in/fan-out as well as
simpler 1 to 1 cases.
    * PERFORMANT: The flow control mechanism must be performant and
should not introduce much overhead in cases where downstream resources
are able to keep up. 
    * MODULARIZED: The module should be independent, generic, and
reusable.
    * FAIRNESS: We should be able to provide better fairness. If I've
got several producers putting messages on a queue, the flow controller
should not prefer one source over another (unless configured to do so).
    * VISIBILITY: With a unified model in place we can instrument it to
provide visibility in the product (e.g. a visual graph of flows in the
system). For example: a customer says that they are not using PERSISTENT
messages, yet we see 1000 msgs/sec flowing through the recovery log...
    * ADMINISTRATION: We can explore the possibility of administratively
limiting message flows. E.g. I've done my production stress testing and
can successfully handle my anticipated load of 4000 msgs/sec on topic1
... I'd prefer to avoid the case where publishers go berserk and
overload my backend with messages.
    * POLICIES: We should be able to better instrument general flow
control policies. E.g. I want to tune for latency or throughput. If a
subscriber gets behind, I'd like the policy for messages on topic1 to be
that I drop the oldest messages instead of initiating flow control.

The Basics:

Each resource creates a FlowController for each of its Flows; each
controller is assigned a corresponding FlowLimiter. As elements (e.g.
messages) pass
from one resource to another they are passed through the downstream
resource's FlowController which updates its Limiter. If propagation of
an element from one resource to another causes the downstream limiter to
become throttled the associated FlowController will block the source of
the element. The flow module is used heavily by the rest of the core for
memory and disk management.
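
To make the terminology concrete, here is a rough sketch of what the
limiter/controller contract described above could look like. The names and
signatures below are illustrative only -- they mirror this description, not
the actual classes in the sandbox:

    // Illustrative sketch only: names and signatures are inferred from the
    // description above, not taken from the activemq-flow sources.
    import java.util.LinkedHashSet;
    import java.util.Set;

    interface FlowLimiter<E> {
        /** Account for an element; true if the limiter is now over its limit. */
        boolean add(E element);

        /** Release the space previously reserved for an element. */
        void remove(E element);

        boolean isThrottled();
    }

    /** A source of elements that a downstream controller can block and resume. */
    interface FlowSource {
        void block();
        void resume();
    }

    /** Wraps a FlowLimiter; blocks sources on overflow, resumes them on release. */
    class FlowController<E> {
        private final FlowLimiter<E> limiter;
        private final Set<FlowSource> blockedSources = new LinkedHashSet<>();

        FlowController(FlowLimiter<E> limiter) {
            this.limiter = limiter;
        }

        /** Called as an element passes into this resource; may block the source. */
        synchronized void add(E element, FlowSource source) {
            if (limiter.add(element)) {
                // The element is still accepted (overflow is allowed), but the
                // source stays blocked until enough space is released here.
                blockedSources.add(source);
                source.block();
            }
        }

        /** Called when this resource is done with the element (dispatched, persisted, ...). */
        synchronized void release(E element) {
            limiter.remove(element);
            if (!limiter.isThrottled()) {
                for (FlowSource source : blockedSources) {
                    source.resume();
                }
                blockedSources.clear();
            }
        }
    }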

    * Memory Management: Memory is managed based on the resources in
play -- the usage is computed by summing the space allocated to each of
the resources' limiters. This strategy intentionally avoids a centralized
memory limit, which would require complicated logic to track when a
centralized limiter needs to be decremented, would create contention
between multiple resources/threads accessing the limiter, and would
increase the potential for memory-limiter-related deadlocks. However, it
should be noted that this approach doesn't preclude implementing
centralized limiters in the future.
    * Flow Control: As messages propagate from one resource A to another
B, if A overflows B's limit, B will block A, and A can't release its
limiter space until B unblocks it. This allowance for overflow into
downstream resources is a key concept in flow control performance and
ease of use: provided that the upstream resource has already accounted
for the message's memory, it can freely overflow any downstream limiter
as long as it reserves space for the elements that caused the overflow
(a rough sketch follows this list).
    * Threading Model: Note that as a message propagates from A to B,
the general contract is that A won't release its memory if B
blocks it during the course of dispatch. This means that it is not safe
to perform a thread handoff during dispatch between two resources since
the thread dispatching A relies on the message making it to B (so that B
can block it) prior to A completing dispatch.
    * Management/Visibility: Another intended use of the activemq-flow
module is to assist with visibility, e.g. to provide an underlying map of
resources that can be exposed via tooling to see the relationships
between sources and sinks of messages and to find bottlenecks. This
aspect has been downplayed for now as we have been focusing more on the
queueing/memory management model in the prototype, but eventually the
flow package itself will provide a handy way of providing visibility into
the system, particularly in terms of finding performance bottlenecks.
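
Here is the sketch referred to above, continuing the illustrative types from
the earlier snippet (SizeLimiter and the demo are again assumptions, not code
from the sandbox). It shows resource A overflowing resource B's limiter: A is
blocked, keeps its own reservation, and is only resumed once B releases the
space:

    // Continues the illustrative sketch above; not actual sandbox code.
    class SizeLimiter implements FlowLimiter<String> {
        private final long capacity;
        private long size;

        SizeLimiter(long capacity) {
            this.capacity = capacity;
        }

        public boolean add(String element) {
            size += element.length();
            return size > capacity;   // overflow is allowed, but report throttled
        }

        public void remove(String element) {
            size -= element.length();
        }

        public boolean isThrottled() {
            return size > capacity;
        }
    }

    public class FlowDemo {
        public static void main(String[] args) {
            // Downstream resource B with a deliberately tiny limit. Total memory
            // in use is simply the sum of what each resource's limiter holds.
            FlowController<String> b = new FlowController<>(new SizeLimiter(10));

            // Upstream source A -- in the prototype this would be e.g. a connection.
            FlowSource a = new FlowSource() {
                public void block()  { System.out.println("A blocked by B"); }
                public void resume() { System.out.println("A resumed by B"); }
            };

            // A has already accounted for this message in its own limiter, so it
            // may freely overflow B; B keeps A blocked until the space is released.
            b.add("hello world", a);   // 11 bytes > 10 -> A blocked
            b.release("hello world");  // B is done with the message -> A resumed
        }
    }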

FlowResource (FlowSink and FlowSource): A container for FlowControllers
providing some lifecycle-related logic. The base resource class handles
interaction/registration with the FlowManager (below).

FlowManager: Registry for Flows and FlowResources. The manager will
provide some hooks into system visibility. As mentioned above this
aspect has been downplayed somewhat for the present time.

FlowController: Wraps a FlowLimiter and implements the common
block/resume logic between FlowControllers.

FlowLimiter: Defines the limits enforced by a FlowController. Currently
the package has size based limiter implementations, but eventually
should also support other common limiter types such as rate based
limiters. The limiters are also extended at other points in the broker
(for example to implement a protocol-based WindowLimiter). It is also
likely that we would want to introduce CompositeLimiters to combine
various limiter types.
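
For illustration, a CompositeLimiter along these lines could simply delegate
to a set of other limiters and report throttled if any one of them is over
its limit. This reuses the hypothetical FlowLimiter interface from the sketch
earlier in this mail and is not code from the sandbox:

    // Hypothetical composite limiter: the flow is throttled if any member is.
    import java.util.List;

    class CompositeLimiter<E> implements FlowLimiter<E> {
        private final List<FlowLimiter<E>> limiters;

        CompositeLimiter(List<FlowLimiter<E>> limiters) {
            this.limiters = limiters;
        }

        public boolean add(E element) {
            boolean throttled = false;
            for (FlowLimiter<E> limiter : limiters) {
                throttled |= limiter.add(element);   // every member accounts for it
            }
            return throttled;
        }

        public void remove(E element) {
            for (FlowLimiter<E> limiter : limiters) {
                limiter.remove(element);
            }
        }

        public boolean isThrottled() {
            for (FlowLimiter<E> limiter : limiters) {
                if (limiter.isThrottled()) {
                    return true;
                }
            }
            return false;
        }
    }

A size limiter and a rate limiter could then be combined with something like
new CompositeLimiter<>(Arrays.asList(sizeLimiter, rateLimiter)).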

Flow: The concept of a flow is not used very heavily right now, but a
Flow defines the stream of elements that can be blocked. In general the
prototype creates a single flow per resource, but in the future a source
may break its elements down into more granular flows on which
downstream sinks may block it. One case where this is anticipated as
being useful is in networks of brokers, wherein it may be desirable to
partition messages into more granular flows (e.g. based on producer or
destination) to avoid blocking the broker-broker connection
unnecessarily.
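
As a purely hypothetical illustration of that last point, a broker bridge
could key its flows by destination so that a downstream sink blocks only one
destination's traffic rather than the whole connection (the names here are
made up for the example):

    // Hypothetical per-destination flow partitioning for a broker-to-broker bridge.
    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;
    import java.util.concurrent.atomic.AtomicBoolean;

    class Flow {
        final String name;
        final AtomicBoolean blocked = new AtomicBoolean(false);

        Flow(String name) {
            this.name = name;
        }
    }

    class PartitionedBridge {
        private final Map<String, Flow> flowsByDestination = new ConcurrentHashMap<>();

        Flow flowFor(String destination) {
            return flowsByDestination.computeIfAbsent(destination, Flow::new);
        }

        /** Forward a message unless the flow for its destination is blocked. */
        boolean tryForward(String destination, String message) {
            Flow flow = flowFor(destination);
            if (flow.blocked.get()) {
                return false;          // only this destination is held back
            }
            System.out.println("forwarding to " + destination + ": " + message);
            return true;
        }
    }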



Re: ActiveMQ 6.0 Broker Core Prototype -- Flow Control / Memory Management

Posted by Hiram Chirino <ch...@gmail.com>.
On Wed, Jun 10, 2009 at 12:29 PM, Colin
MacNaughton<cm...@progress.com> wrote:
> The Basics:
>
> Each resource creates a FlowController for each of it's Flows which is

Might be worth defining what are considered 'resources' and give a few
examples of them.

> assigned a corresponding FlowLimiter. As elements (e.g. messages) pass
> from one resource to another they are passed through the downstream

Ditto with the 'downstream' term.  Although this one may be a little
more self-explanatory once folks understand resources and how there is
a direction to their handling of messages.

>    * Flow Control: As messages propagate from one resource A to another
> B, then if A overflows B's limit, B will block A and A can't release
> it's limiter space until B unblocks it. This allowance for overflow into
> downstream resources is a key concept in flow control performance and
> ease of use. Provided that the upstream resource has already accounted
> for the message's memory it can freely overflow any downstream limiter
> providing it reserves space from elements that caused overflow.

Additional note: Overflowing makes the most sense in situations where the
message source is NOT trying to load balance across many 'sinks',
like in pub/sub or exclusive consumer cases.  A FlowController can also be
accessed in a way that it does not overflow, which would make the most
sense in a shared queue scenario.
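
To illustrate the difference, a shared queue dispatcher in the non-overflowing
mode would offer a message to each consumer in turn and only hold the message
(or block the producer) if every consumer is throttled. This is just a rough
sketch with made-up names, not the FlowController API itself:

    // Hypothetical non-overflowing dispatch for a shared queue.
    import java.util.List;

    class ConsumerSink {
        private final String name;
        private final int capacity;
        private int pending;

        ConsumerSink(String name, int capacity) {
            this.name = name;
            this.capacity = capacity;
        }

        /** Non-overflowing offer: refuse the message instead of exceeding the limit. */
        synchronized boolean offer(String message) {
            if (pending >= capacity) {
                return false;
            }
            pending++;
            System.out.println(name + " accepted: " + message);
            return true;
        }
    }

    public class SharedQueueDispatch {
        public static void main(String[] args) {
            List<ConsumerSink> consumers = List.of(
                    new ConsumerSink("consumer-1", 1),
                    new ConsumerSink("consumer-2", 1));

            for (String message : List.of("m1", "m2", "m3")) {
                boolean dispatched = false;
                for (ConsumerSink consumer : consumers) {
                    if (consumer.offer(message)) {   // skip throttled consumers
                        dispatched = true;
                        break;
                    }
                }
                if (!dispatched) {
                    // The queue would hold the message (or block the producer) here.
                    System.out.println("all consumers throttled, holding " + message);
                }
            }
        }
    }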



-- 
Regards,
Hiram

Blog: http://hiramchirino.com

Open Source SOA
http://fusesource.com/

Re: ActiveMQ 6.0 Broker Core Prototype -- Flow Control / Memory Management

Posted by Rob Davies <ra...@gmail.com>.
On 11 Jun 2009, at 17:06, Colin MacNaughton wrote:

> Hi Rob,
>
> In terms of configurable maximums with respect to destinations:
> Maximum memory allocation per PTP queue is supported, but for topics the
> limits are actually tied to the Subscriptions receiving the message. This is
> because these are the objects that actually map to the underlying cursored
> queues that hold the messages and do the paging/limiting. Does that make
> sense?
Makes a lot of sense from an implementation point of view - but  
doesn't translate too well for users to understand.
>
>
> In terms of overall disk/memory limits: the approach we're taking would not
> explicitly define a single overall limit (at least not initially) -- rather
> the total maximum is based on the resources you create. E.g. as a user you
> need to know how many queues and subscriptions you create and plan for
> memory/disk accordingly. This doesn't preclude trying to enforce global
> limits later, but in my opinion doing so complicates the implementation a
> fair amount, in terms of trying to intelligently balance the available
> space across the queues, and also leads to additional contention on the
> shared limiter -- and, worse, can lead to resource-related deadlocks if we
> get it wrong. We can still do things like limit the maximum number/size of
> subscriptions/queues/connections etc.
Again - from an implementation point of view it makes a lot of sense -
but it makes things difficult for users, who will have to pre-dimension
the broker.
>
>
> Colin
>


RE: ActiveMQ 6.0 Broker Core Prototype -- Flow Control / Memory Management

Posted by Colin MacNaughton <co...@gmail.com>.
Hi Rob, 

In terms of configurable maximums with respect to destinations:
Maximum memory allocation per PTP queue is supported, but for topics the
limits are actually tied to the Subscriptions receiving the message. This is
because these are the objects that actually map to the underlying cursored
queues that hold the messages and do the paging/limiting. Does that make
sense?

In terms of overall disk/memory limits: the approach we're taking would not
explicitly define a single overall limit (at least not initially) -- rather
the total maximum is based on the resources you create. E.g. as a user you
need to know how many queues and subscriptions you create and plan for
memory/disk accordingly. This doesn't preclude trying to enforce global
limits later, but in my opinion doing so complicates the implementation a
fair amount, in terms of trying to intelligently balance the available
space across the queues, and also leads to additional contention on the
shared limiter -- and, worse, can lead to resource-related deadlocks if we get
it wrong. We can still do things like limit the maximum number/size of
subscriptions/queues/connections etc.

Colin 



RE: ActiveMQ 6.0 Broker Core Prototype -- Flow Control / Memory Management

Posted by Colin MacNaughton <co...@gmail.com>.
Hi Gary, 

Putting maximums on these blocks may not be possible in cases where we are
not willing to discard messages on a queue. Ultimately the publisher rate is
tied to the disk sync rate. I think the best we can do in this case is to
try to smooth out the publisher's profile. The point of the store queue is
to allow the store writing thread to batch up several writes into a single
FileDescriptor.sync() which increases throughput. Choosing too small a store
queue size will ultimately result in low throughput, though a potentially
smoother publish rate. Choosing too large a queue can make the publishers
bursty, but will likely give closer to optimal throughput (e.g. accept a large
burst of messages but pay long disk sync times).

In the end I lean towards a larger store queue since ultimately a publish
rate higher than a consumer rate isn't sustainable anyway, and we should
really be optimizing for a bursty publisher case with highest possible disk
throughput. In cases where smoothing out the publisher rate is desired
(perhaps prolonged but finite bursts), we can introduce rate-based limiters
for the queues/publishers in question.
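
For what it's worth, here is a rough sketch of the kind of store-queue
batching described above: a single writer thread drains whatever has
accumulated, writes it, and pays for one sync per batch (FileChannel.force()
standing in for the FileDescriptor.sync() mentioned). Queue size, file layout
and names are just assumptions for the example:

    // Rough sketch of batching store writes behind a bounded queue.
    import java.io.IOException;
    import java.nio.ByteBuffer;
    import java.nio.channels.FileChannel;
    import java.nio.charset.StandardCharsets;
    import java.nio.file.Path;
    import java.nio.file.StandardOpenOption;
    import java.util.ArrayList;
    import java.util.List;
    import java.util.concurrent.BlockingQueue;
    import java.util.concurrent.LinkedBlockingQueue;

    public class StoreWriter implements Runnable {
        // Bounding this queue is the knob discussed above: small = smoother
        // producers, large = better throughput but burstier publishers.
        private final BlockingQueue<String> storeQueue = new LinkedBlockingQueue<>(1000);
        private final FileChannel journal;

        public StoreWriter(Path journalFile) throws IOException {
            this.journal = FileChannel.open(journalFile, StandardOpenOption.CREATE,
                    StandardOpenOption.WRITE, StandardOpenOption.APPEND);
        }

        /** Producers block here once the store queue is full. */
        public void enqueue(String record) throws InterruptedException {
            storeQueue.put(record);
        }

        @Override
        public void run() {
            List<String> batch = new ArrayList<>();
            try {
                while (!Thread.currentThread().isInterrupted()) {
                    batch.clear();
                    batch.add(storeQueue.take());   // wait for at least one record
                    storeQueue.drainTo(batch);      // then grab whatever else is queued
                    for (String record : batch) {
                        journal.write(ByteBuffer.wrap(
                                (record + "\n").getBytes(StandardCharsets.UTF_8)));
                    }
                    journal.force(false);           // one sync covers the whole batch
                    // Here the broker would release limiter space / ack the
                    // producers for every record in the batch.
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } catch (IOException e) {
                throw new RuntimeException(e);
            }
        }
    }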




Re: ActiveMQ 6.0 Broker Core Prototype -- Flow Control / Memory Management

Posted by Gary Tully <ga...@gmail.com>.
On the 'temporary blocking', we need to think of ways to place maximums on
these blocks.

A 5.x trait is that flushing buffers to disk on reaching a memory limit is a
single op, so a producer or consumer could be blocked for N writes (where N
can be quite large).
The flow model can be more linear in this regard.
The difficulty may be in combining flow limiters that work with single entry
queues on one side and limiters that want to batch writes on the other.
When blocking is needed, it may be sufficient to delay the enqueue for
only half of a batch write, or at most 0.5 of a batch.
The same would hold for paging out: a flow could resume when some portion of
the limit is paged out on demand.

-- 
http://blog.garytully.com

Open Source Integration
http://fusesource.com

Re: ActiveMQ 6.0 Broker Core Prototype -- Flow Control / Memory Management

Posted by Hiram Chirino <ch...@gmail.com>.
So Colin was focusing on just the flow controller package.  Most of
the logic of moving messages 'offline' is in the activemq-queue
module.

That package itself needs a good write-up.  But in general the goal
is to have queues which support the following options, which can be
enabled/disabled:

* Paging: will the queue even attempt to spool to disk?  In some high
performance scenarios you may never want to spool messages to disk and
instead prefer the producers to block when memory limits are reached.
* Page Out Place Holders: If you don't page them out, then the queue
will keep a list of pointers to the location of the message on disk in
memory at all times.  Keeping this list in memory will speed up
accessing paged out messages as their location on disk is already
known.  Should be enabled for large queues so that even the message
order and locations are determined by cursoring the persistence store.
* Throttle Sources To Memory Limit: When disabled, the queue will
behave very much like disabling flow control in 5.x.  Fast producers'
messages will be spooled to disk to avoid blocking the source.

 BTW the above is implemented in the CursoredQueue class if anyone is
interested.
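
Purely as an illustration of how those three options combine (the real
CursoredQueue may configure this quite differently), the profiles might look
something like:

    // Hypothetical illustration of the queue options described above.
    class QueueOptions {
        // Spool messages to disk under memory pressure?
        boolean paging = true;
        // Also page out the in-memory pointers to paged-out messages?
        boolean pageOutPlaceHolders = false;
        // Block producers at the memory limit instead of spooling?
        boolean throttleSourcesToMemoryLimit = false;
    }

    class QueueProfiles {
        /** High performance: never spool, block producers at the memory limit. */
        static QueueOptions highPerformance() {
            QueueOptions options = new QueueOptions();
            options.paging = false;
            options.throttleSourcesToMemoryLimit = true;
            return options;
        }

        /** Very large queue: order and locations come from cursoring the store. */
        static QueueOptions largeQueue() {
            QueueOptions options = new QueueOptions();
            options.paging = true;
            options.pageOutPlaceHolders = true;
            return options;
        }
    }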

We will need to review how best to implement that 'combined policy',
but in the new architecture, I do think you're going to see more
'temporary' producer blocking.  For example, even with the 'Throttle
Sources To Memory Limit' option disabled, the producer's flow
controller may block as the queue is trying to persist messages to the
message store.  This is because unlike 5.x, the message store is also a
flow-controlled resource that is accessed asynchronously.



Re: ActiveMQ 6.0 Broker Core Prototype -- Flow Control / Memory Management

Posted by Rob Davies <ra...@gmail.com>.
Hi Colin,

In 5.x flow control behaves as if it's binary - off or on. When it's off
- messages can be offlined (for non-persistent messages this means
being dumped to temporary storage) - but when it's on - the producers
slow and stop.
Also - there can be cases when you get a temporary slow consumer (the  
consuming app may be doing a big gc) - which means with flow control  
off - messages get dumped to disk - and then the producers may never  
slow down enough again for the consumer to catch up. Flow control is  
difficult to implement for all cases - but we should allow for  
configuration of the following:

* maximum overall broker memory
* maximum memory allocation per destination
* maximum storage allocation
* maximum storage allocation per destination
* maximum temporary storage allocation
* maximum temporary storage allocation per destination

when we start to hit a resource limit - we should aggressively gc
messages that have expired, then either offline (and flow control when
that limit is hit) or flow control.
It would be great to have a combined policy where we can block a
producer for a short time (seconds) then offline.
For non-persistent messages - we still need a policy where we can
remove messages based on a selector (which would be in addition to
expiring messages).
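
To make the combined policy concrete, the escalation could look roughly like
this (purely illustrative - the classes and names below don't exist anywhere,
they just sketch the order of expire / block / offline):

    // Hypothetical sketch of per-destination limits and the escalation above.
    class DestinationLimits {
        long maxMemory;            // maximum memory allocation per destination
        long maxStorage;           // maximum storage allocation per destination
        long maxTempStorage;       // maximum temporary storage allocation per destination
        long blockProducerMillis;  // combined policy: block briefly, then offline
    }

    class LimitPolicy {
        enum Action { EXPIRE_AND_RETRY, FLOW_CONTROL, OFFLINE }

        /** Escalation when a destination hits its memory limit. */
        Action onMemoryLimit(boolean hasExpiredMessages, boolean offliningAllowed,
                             long blockedMillis, DestinationLimits limits) {
            if (hasExpiredMessages) {
                return Action.EXPIRE_AND_RETRY;        // gc expired messages first
            }
            if (blockedMillis < limits.blockProducerMillis) {
                return Action.FLOW_CONTROL;            // block the producer briefly...
            }
            return offliningAllowed ? Action.OFFLINE   // ...then offline to temp storage
                                    : Action.FLOW_CONTROL;
        }
    }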

cheers,

Rob
