Posted to users@qpid.apache.org by fadams <fr...@blueyonder.co.uk> on 2011/07/01 13:22:46 UTC

Opinions sought on handling some edge cases.

If one starts to play with qpid in any significant way there seem to be a
number of edge cases that commonly crop up. I've got a few scenarios and
it'd be really interesting to hear opinions and how others have
solved/worked around them.

Some of the scenarios are somewhat interrelated I guess. Sorry, it's ended
up getting quite long.......

1. "rogue" (possibly just slow) consumers killing all data flows.
In this scenario we may have a producer publishing to topic or headers
exchange and a number of consumers. If one of these consumers fails or slows
down and stops consuming eventually its queue will fill up and an exception
will be thrown. This will either directly hit the producer, or in a
federated environment the link will "blow". In either case data stops
flowing to ALL consumers, which is clearly undesirable.

I think that the "standard" solution to this is to use circular/ring queues
however a) the default policy is reject and b) ring queues need to fit into
memory. I'll cover these below in other scenarios.

Is the current "perceived wisdom" that ring queues are the best/only way to
prevent slow consumers killing data flow for all?
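
To make the difference between the two policies concrete, here is a minimal
in-memory sketch. It is not qpid code: BoundedQueue, QueueFull and publish
are hypothetical names invented for illustration, and the limit is a simple
message count rather than whatever limits a real broker applies.

```python
from collections import deque


class QueueFull(Exception):
    """Raised by the 'reject' policy when the queue is at capacity."""


class BoundedQueue:
    """Hypothetical stand-in for a broker queue with a message-count limit."""

    def __init__(self, max_count, policy="reject"):
        if policy not in ("reject", "ring"):
            raise ValueError("policy must be 'reject' or 'ring'")
        self.max_count = max_count
        self.policy = policy
        self.messages = deque()

    def enqueue(self, msg):
        if len(self.messages) >= self.max_count:
            if self.policy == "reject":
                # The error propagates back to whoever published.
                raise QueueFull(msg)
            # Ring behaviour: silently discard the oldest message.
            self.messages.popleft()
        self.messages.append(msg)


def publish(queues, msg):
    """Fan one message out to every bound queue, in binding order."""
    for q in queues:
        q.enqueue(msg)
```

With a full "reject" queue among the bindings, publish() raises and the
queues after it never see the message, which is the "one slow consumer stops
everyone" effect. With "ring" the publisher carries on and only the slow
consumer loses its oldest data.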

I have been thinking around this. Given that qpid 0.10 supports a
queueThresholdExceeded it should be possible to write a QMF client to unbind
queues that are filling up and potentially do other useful things.

2. default policy is reject
Given scenario 1 it's perhaps a pity that the default policy is reject.
Indeed I don't believe that it's possible to change the default policy on
the broker which means that in an operational environment one has to rely on
subscribers to explicitly set the policy to ring. This seems risky to me!!!

I believe that it's possible to enforce this using ACL, but this then
requires authentication to be enabled (in our environment we were hoping to
go with a self service, trust and verify - e.g. audit based approach).
Possibly we'll need to rethink that. Has anyone else had experience here?

Again, I guess using queueThresholdExceeded to unbind queues filling up
might help - I wouldn't then need to enforce a particular policy I could
simply implement what amounts to a fuse.
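
A rough sketch of the fuse logic, independent of any real QMF client
library. The event is modelled as a plain dict and the "type" and "queue"
keys are assumptions rather than the actual QMF event schema. The unbind
callback is also hypothetical; in practice it would have to issue whatever
management call removes the queue's bindings.

```python
class Fuse:
    """Unbind a queue the first time a threshold event is seen for it."""

    def __init__(self, unbind):
        # Hypothetical callback taking a queue name: unbind(queue_name).
        self._unbind = unbind
        self.tripped = set()

    def on_event(self, event):
        """Handle one event; return True if the fuse tripped for a queue."""
        # The dict keys below are assumed for illustration only.
        if event.get("type") != "queueThresholdExceeded":
            return False
        name = event["queue"]
        if name in self.tripped:
            # Already unbound; ignore repeated events for the same queue.
            return False
        self._unbind(name)
        self.tripped.add(name)
        return True
```

Keeping the set of tripped queues means repeated threshold events for the
same queue do not trigger repeated unbind calls, and it gives an audit trail
of which subscribers were cut off.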

I'm interested to hear debate on what people think is the best strategy.

3. ring queues need to fit into memory.
So the position that I'm taking "architecturally" in our system is that it's
not the role of the data distribution system to buffer to protect against
poorly designed end consumers (a bit of elastic to cope with burstiness is
OK) so I'm expecting them to be adequately scaled and provide
clustering/failover such that end consumers really ought not to be slow
consumers unless things get really bad. So for the most part either ring
queues or the fuse I described above ought to be adequate.

However I've got a federated topology and in some cases the WAN link between
the source broker to the destination broker might not be exactly reliable.

Here's where scenario 3 causes pain. If the WAN goes down the queue on the
source broker starts to fill and eventually old data gets overwritten by
new.

If I use a circular buffer the maximum buffering capacity is very much
dependent on available memory. With a persistent queue things are about as
bad: the maximum size, I believe, is 128GB, but I'll eventually fill it and I
can't make it behave in a ring manner. I guess if I were willing to throw
cash at the problem I could buy a box with lots of memory and have more than
128GB stored, but either way there's a bit of a problem.

I was wondering about using queueThresholdExceeded again. I guess that it
may be possible to detect when the queue hits its limit and automatically
start a client (clearly on the appropriate side of the WAN) to pull off the
messages that are backing up and write them out to disc. If I record the
queue name I think that once I detect the WAN connection reestablished I
could have my protection client write back to the queue via the direct
exchange.

Does this sound do-able? Can anyone suggest a better solution to this type
of problem?
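
The spill-and-replay part of that idea could look roughly like the sketch
below. It only covers the file handling: messages are assumed to be small
JSON-serialisable dicts, spill() and replay() are hypothetical helper names,
and the send callback stands in for whatever publishes back to the original
queue via the direct exchange.

```python
import json
import os


def spill(messages, path):
    """Append drained messages to a newline-delimited JSON file, in order."""
    with open(path, "a", encoding="utf-8") as f:
        for msg in messages:
            f.write(json.dumps(msg) + "\n")


def replay(path, send):
    """Re-send spilled messages oldest first, then delete the spill file.

    Returns the number of messages replayed. If send() raises part way
    through, the file is left in place, so a retry will resend messages
    that were already replayed (duplicates are possible).
    """
    count = 0
    with open(path, encoding="utf-8") as f:
        for line in f:
            send(json.loads(line))
            count += 1
    os.remove(path)
    return count
```

Appending rather than overwriting lets the protection client drain the queue
in several batches during one outage while preserving the original order.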

It's a bit of a shame to be having to write clients to do this sort of thing
though. The qpid persistence mechanism is pretty cool and very efficient,
but it does seem quite limited. Perhaps this isn't as common an edge case as
I imagine?

4. Exceptions with asynchronous producers.
So asynchronous producers are cool, but if an exception is thrown e.g. on a
resource limit exceeded how do I work out exactly what has been sent to the
broker. What I mean is on the client side I call send() and that returns
when the data reaches the client runtime NOT when it has successfully hit
the broker.

Now I can make things synchronous, but that hoses performance. I could also
use transactions, since if commit returns I know my data has hit the broker,
but if the tx size is small performance gets hit much like synchronous, and
if the tx size is too high throughput can get "lumpy".

Is it possible to find out how many messages are pending beyond "send" in
the client runtime, so I know from whence I need to resend my messages? I'm
particularly interested in the Java JMS API as this doesn't have some of the
subtle nuances/control of the C++ messaging API (but I'm interested in how
to do it from C++ too).


---------------------------------------------------------------------
Apache Qpid - AMQP Messaging Implementation
Project: http://qpid.apache.org
Use/Interact: mailto:users-subscribe@qpid.apache.org

Re: Opinions sought on handling some edge cases.

Posted by Gordon Sim <gs...@redhat.com>.

On 07/01/2011 12:22 PM, fadams wrote:
> If one starts to play with qpid in any significant way there seems to be a
> number of edge cases that commonly crop up. I've got a few scenarios and
> it'd be really interesting to hear opinions and how others have
> solved/worked around them.
>
> Some of the scenarios are somewhat interrelated I guess. Sorry, it's ended
> up getting quite long.......
>
> 1. "rogue" (possibly just slow) consumers killing all data flows.
> In this scenario we may have a producer publishing to topic or headers
> exchange and a number of consumers. If one of these consumers fails or slows
> down and stops consuming eventually its queue will fill up and an exception
> will be thrown. This will either directly hit the producer, or in a
> federated environment the link will "blow". In either case data stops
> flowing to ALL consumers, which is clearly undesirable.
>
> I think that the "standard" solution to this is to use circular/ring queues
> however a) the default policy is reject and b) ring queues need to fit into
> memory. I'll cover these below in other scenarios.
>
> Is the current "perceived wisdom" that ring queues are the best/only way to
> prevent slow consumers killing data flow for all?

Given the current implementation, I think it probably is.

However a very good suggestion was for a policy that deleted the queue 
when it hit the configured limit 
(https://issues.apache.org/jira/browse/QPID-3247). That would allow the 
subscribers to be notified of the fact that they had failed to keep up.

I'll try and get that in for 0.14 (unless anyone else wants to 
volunteer). It shouldn't be too much work and would be a valuable addition.

> I have been thinking around this. Given that qpid 0.10 supports a
> queueThresholdExceeded it should be possible to write a QMF client to unbind
> queues that are filling up and potentially do other useful things.
>
> 2. default policy is reject
> Given scenario 1 it's perhaps a pity that the default policy is reject.
> Indeed I don't believe that it's possible to change the default policy on
> the broker which means that in an operational environment one has to rely on
> subscribers to explicitly set the policy to ring. This seems risky to me!!!
>
> I believe that it's possible to enforce this using ACL, but this then
> requires authentication to be enabled (in our environment we were hoping to
> go with a self service, trust and verify - e.g. audit based approach).
> Possibly we'll need to rethink that. Has anyone else had experience here?
>
> Again, I guess using queueThresholdExceeded to unbind queues filling up
> might help - I wouldn't then need to enforce a particular policy I could
> simply implement what amounts to a fuse.
>
> I'm interested to hear debate on what people think is the best strategy.

Having a means to set the default policies for the broker does seem 
reasonable. Also I think having default policies associated with 
particular exchanges could be useful. The exchange is the entity on 
which the topic concept is based. Having a way to indicate how slow 
consumers should be handled at that level seems logical. Of course the 
way AMQP works at present would mean that clients would then have to 
query the exchange for the policy and set it on the queues they bind in 
(the exchange could also potentially refuse to bind queues that did not 
match the specified policy).

> 3. ring queues need to fit into memory.
> So the position that I'm taking "architecturally" in our system is that it's
> not the role of the data distribution system to buffer to protect against
> poorly designed end consumers (a bit of elastic to cope with burstiness is
> OK) so I'm expecting them to be adequately scaled and provide
> clustering/failover such that end consumers really ought not to be slow
> consumers unless things get really bad. So for the most part either ring
> queues or the fuse I described above ought to be adequate.
>
> However I've got a federated topology and in some cases the WAN link between
> the source broker to the destination broker might not be exactly reliable.
>
> Here's where scenario 3 causes pain. If the WAN goes down the queue on the
> source broker starts to fill and eventually old data gets overwritten by
> new.
>
> If I use a circular buffer the maximum buffering capacity is very much
> dependent on available memory. With a persistent queue things are about as
> bad as the maximum size I believe is 128GB but I'll eventually fill it and I
> can't make it behave in a ring manner. I guess if I was willing to throw
> cash at the problem I could buy a box with lots of memory and have more than
> 128GB stored, either way there's a bit of a problem.
>
> I was wondering about using queueThresholdExceeded again. I guess that it
> may be possible to detect when the queue hits its limit and automatically
> start a client (clearly on the appropriate side of the WAN) to pull off the
> messages that are backing up and write them out to disc. If I record the
> queue name I think that once I detect the WAN connection reestablished I
> could have my protection client write back to the queue via the direct
> exchange.
>
> Does this sound do-able? Can anyone suggest a better solution to this type
> of problem.
>
> It's a bit of a shame to be having to write clients to do this sort of thing
> though. The qpid persistence mechanism is pretty cool and very efficient,
> but it does seem quite limited. Perhaps this isn't as common an edge case as
> I imagine?

I think it is reasonably common. We really need an efficient paging 
solution that allows for queues larger than available memory, 
regardless of whether they contain lots of small messages or a few large 
messages, and that doesn't reduce throughput to the point of exacerbating 
the problem it is trying to fix.

The 'flow to disk' policy was a cheap hack that took a swipe at this 
problem but frankly failed to do anything very useful.

> 4. Exceptions with asynchronous producers.
> So asynchronous producers are cool, but if an exception is thrown e.g. on a
> resource limit exceeded how do I work out exactly what has been sent to the
> broker. What I mean is on the client side I call send() and that returns
> when the data reaches the client runtime NOT when it has successfully hit
> the broker.
>
> Now I can make things synchronous, but that hoses performance and I could
> use transactions, as if commit returns I know my data has hit the broker,
> but if tx size is small performance gets hit much like synchronous and if tx
> size is too high throughput can get "lumpy"
>
> Is it possible to find out how many messages are pending beyond "send" in
> the client runtime so I know from whence I need to resend my messages. I'm
> particularly interested in the Java JMS API as this doesn't have some of the
> subtle nuances/control of the C++ messaging API (but I'm interested in how
> to do it from C++ too).

In C++ you can track the count of unsettled messages for each sender. 
Messages settle in order, meaning that an unsettled count of 5 indicates 
that only the last 5 messages are unconfirmed.
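
As an illustration of how that count maps back onto what needs resending,
here is a small language-neutral sketch in Python. It assumes the
application keeps its own ordered list of everything passed to send() and
that the unsettled count has already been read from the sender; the function
name is hypothetical.

```python
def unconfirmed(sent, unsettled):
    """Return the tail of 'sent' that has not yet been confirmed.

    'sent' is the application's own ordered record of published messages.
    'unsettled' is the sender's current unsettled count. Because messages
    settle in order, exactly the last 'unsettled' entries are outstanding.
    """
    if unsettled < 0 or unsettled > len(sent):
        raise ValueError("unsettled count out of range")
    # Slice from an explicit index; sent[-unsettled:] would wrongly return
    # the whole list when unsettled is 0.
    return sent[len(sent) - unsettled:]
```

After a failure the application would reconnect and resend the returned
messages in the same order.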

The JMS API doesn't allow for tracking of acknowledged, asynchronous 
publications. The only way to signal completion is through blocking. One 
way to work around this is to publish occasional 'tracer' messages that 
come back to the sender and therefore act as synchronisation points. I 
could also envisage acknowledgements/confirmations from the broker being 
exposed as pseudo messages on some special consumer. The question there 
would however be whether that non-standard mechanism would negate the 
benefit of sticking with JMS and if so whether an alternative or 
augmented API with a little more control would be useful.
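
The bookkeeping for the tracer approach might look something like the
sketch below. It is a pure in-memory model rather than JMS code:
TracerTracker and its methods are hypothetical, and it assumes each tracer
follows the same ordered path as the data, so that seeing a tracer come back
implies every message published before it reached the broker.

```python
class TracerTracker:
    """Track which published messages are covered by a returned tracer."""

    def __init__(self, interval):
        if interval < 1:
            raise ValueError("interval must be at least 1")
        self.interval = interval
        self._sent = []        # (sequence number, message), not yet confirmed
        self._seq = 0
        self._tracers = {}     # tracer id -> sequence number it covers
        self._next_tracer = 0

    def record_send(self, message):
        """Record a publish; return a tracer id if one is due, else None."""
        self._seq += 1
        self._sent.append((self._seq, message))
        if self._seq % self.interval == 0:
            tracer_id = self._next_tracer
            self._next_tracer += 1
            self._tracers[tracer_id] = self._seq
            return tracer_id
        return None

    def tracer_returned(self, tracer_id):
        """Drop every message published before the given tracer."""
        upto = self._tracers.pop(tracer_id)
        self._sent = [(s, m) for s, m in self._sent if s > upto]

    def unconfirmed(self):
        """Messages that would need resending after a failure."""
        return [m for _, m in self._sent]
```

The caller publishes an actual tracer message whenever record_send() returns
an id, and calls tracer_returned() when its own consumer receives that
tracer. A smaller interval bounds the resend window more tightly at the cost
of more tracer traffic.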
