Posted to users@qpid.apache.org by Richard Peter <ri...@raytheon.com> on 2011/06/08 14:08:23 UTC

C Broker Availability Problem

Hi,

The issue I'm having occurs when a client producer sends a message in
response to user interaction.  The message causes a screen to pop up on
another workstation.  Usually the pop-up is instantaneous, but sometimes
it takes up to 2 minutes for the message to reach the other workstation.
The message is a JMS text message containing 9 characters, so it is
fairly small.  We have tried tuning worker-threads, thinking it was an
availability issue.  This single message is more important than all the
other traffic our Qpid broker is handling.  Is there a way to give one
queue priority over another?  The broker handles a large amount of
traffic, and I'm not sure how it is designed to cope when there are many
more sessions/queues than worker threads.  Does a thread send all
messages to a consumer before moving on to the next queue?  Or is the
only way to ensure availability to increase worker-threads further?
I've had the threads as high as 100, but the load on the system made the
problem worse.  Our setup is below.

We are using version 0.8 of the C broker and the Java client.  The broker
has roughly 100 queues.  Each queue has at least two consumers, one from
each of two separate servers in a cluster.  We also have 20 clients
listening to 4 topics and 5 clients listening to 1 queue (the important
one mentioned above).  So in general our broker has roughly 300 sessions
open at any given time.  Almost all of the queues are durable.  The
topics are not durable, and neither are the subscribers.  All but one of
the clients in this scenario are Java clients; the remaining one is a C
client.  The servers also use the Java client.  The following is the
connection URL used by most of the clients (it's embedded in Spring XML,
hence the escaped &):

amqp://guest:guest@/program?brokerlist='tcp://${broker.addr}?retries='0'&amp;tcp_nodelay='true'&amp;connecttimeout='5000''&amp;maxprefetch='0'&amp;sync_publish='all'&amp;failover='nofailover' 
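
For readers less familiar with the Qpid Java client URL syntax, the sketch below rebuilds the URL above from its parts. It assumes the client's format in which options inside the quoted brokerlist value (here retries, tcp_nodelay, connecttimeout) apply to that broker, while options after the closing quote (maxprefetch, sync_publish, failover) apply to the connection; that is why connecttimeout='5000'' ends with two quotes. QpidUrlSketch, buildUrl and xmlEscape are hypothetical helpers for illustration, not part of the Qpid API.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

public class QpidUrlSketch {
    // Hypothetical helper: assembles a Qpid Java client connection URL.
    // Broker options go inside the quoted brokerlist value; connection
    // options are appended after the quote that closes brokerlist.
    static String buildUrl(String user, String pass, String vhost, String brokerAddr,
                           Map<String, String> brokerOpts, Map<String, String> connOpts) {
        StringJoiner broker = new StringJoiner("&");
        brokerOpts.forEach((k, v) -> broker.add(k + "='" + v + "'"));
        StringBuilder url = new StringBuilder()
            .append("amqp://").append(user).append(':').append(pass)
            .append("@/").append(vhost)
            .append("?brokerlist='tcp://").append(brokerAddr)
            .append('?').append(broker).append('\'');   // closes brokerlist
        connOpts.forEach((k, v) -> url.append("&").append(k).append("='").append(v).append('\''));
        return url.toString();
    }

    // Spring XML requires '&' to be written as "&amp;".
    static String xmlEscape(String s) {
        return s.replace("&", "&amp;");
    }

    // The URL from the post, with ${broker.addr} left for Spring to resolve.
    static String exampleUrl() {
        Map<String, String> brokerOpts = new LinkedHashMap<>();
        brokerOpts.put("retries", "0");
        brokerOpts.put("tcp_nodelay", "true");
        brokerOpts.put("connecttimeout", "5000");
        Map<String, String> connOpts = new LinkedHashMap<>();
        connOpts.put("maxprefetch", "0");
        connOpts.put("sync_publish", "all");
        connOpts.put("failover", "nofailover");
        return buildUrl("guest", "guest", "program", "${broker.addr}", brokerOpts, connOpts);
    }

    public static void main(String[] args) {
        System.out.println(xmlEscape(exampleUrl()));
    }
}
```

Printed through xmlEscape, the result should match the escaped string used in the Spring XML.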


I only recently turned on tcp_nodelay and sync_publish, thinking that
perhaps the message was occasionally getting stuck.  These are the
settings from the broker's conf file:

auth=no
worker-threads=50
data-dir=/somepath/qpid/data
store-dir=/somepath/qpid/messageStore
pid-dir=/somepath/qpid/var/lock
num-jfiles=16
jfile-size-pgs=24
tcp-nodelay=true

Many of the queues are created larger than the default by a queue
creator script, with sizes ranging up to a file count of 32 and a file
size of 48 pages.  The server running Qpid is an 8-CPU system with 2 GB
of memory; some of the offices have a 16-CPU system with 8 GB of memory.
Server size makes no difference to the errors.
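
The journal settings above translate into an on-disk footprint for each durable queue. The sketch below does that arithmetic, assuming one journal page is 64 KiB (the unit behind jfile-size-pgs) and that each durable queue gets its own set of journal files; other store overhead is ignored. Applying the defaults to all 100 queues is only a rough illustrative figure, since some queues are sized larger.

```java
public class JournalSize {
    // Assumption: one journal page is 64 KiB.
    static final long PAGE_KIB = 64;

    // Journal footprint of one durable queue, in KiB.
    static long journalKiB(int numJfiles, int jfileSizePgs) {
        return (long) numJfiles * jfileSizePgs * PAGE_KIB;
    }

    public static void main(String[] args) {
        long dflt = journalKiB(16, 24);  // num-jfiles / jfile-size-pgs from the conf file
        long max = journalKiB(32, 48);   // largest queues from the creator script
        System.out.println("default queue: " + dflt / 1024 + " MiB");
        System.out.println("largest queue: " + max / 1024 + " MiB");
        System.out.println("100 queues at default: " + 100 * dflt / 1024 + " MiB");
    }
}
```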

Part of the reason we suspected availability was that the clients kept
timing out on heartbeats, so we disabled heartbeats.  We also
occasionally see:
INFO  2011-06-06 17:47:42,501 [IoReceiver - somemachine/someip:5672] JmsPooledSession: EDEX: DEFAULT - Failed to close session
org.apache.qpid.transport.SessionException: timed out waiting for sync: complete = 30115, point = 30116
     at org.apache.qpid.transport.Session.sync(Session.java:744)
     at org.apache.qpid.transport.Session.sync(Session.java:713)
     at org.apache.qpid.client.AMQSession_0_10.sendClose(AMQSession_0_10.java:427)
     at org.apache.qpid.client.AMQSession.close(AMQSession.java:700)
     at org.apache.qpid.client.AMQSession.close(AMQSession.java:666)
     at org.apache.qpid.client.AMQSession.close(AMQSession.java:525)
     at somepackage.jms.JmsPooledSession.closeInternal(JmsPooledSession.java:164)
     at somepackage.jms.JmsPooledConnection.disconnect(JmsPooledConnection.java:152)
     at somepackage.jms.JmsPooledConnection.onException(JmsPooledConnection.java:127)
     at org.apache.qpid.client.AMQConnectionDelegate_0_10.closed(AMQConnectionDelegate_0_10.java:270)
     at org.apache.qpid.transport.Connection.closed(Connection.java:529)
     at org.apache.qpid.transport.network.Assembler.closed(Assembler.java:113)
     at org.apache.qpid.transport.network.InputHandler.closed(InputHandler.java:202)
     at org.apache.qpid.transport.network.io.IoReceiver.run(IoReceiver.java:150)
     at java.lang.Thread.run(Thread.java:619)
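
Read literally, the exception says the client was waiting for the broker to report command 30116 (point) as complete, but the highest command confirmed when the timeout expired was 30115 (complete), so a single command was outstanding. The sketch below is a simplified, hypothetical model of that kind of wait, written only to show what the two numbers mean. It is not Qpid's Session.sync code; the class name, the single cumulative completion counter and the timeout handling are assumptions.

```java
public class SyncModel {
    private final Object lock = new Object();
    private int nextCommandId = 0;   // id the next sent command will get
    private int maxComplete = -1;    // highest id the peer has reported complete

    // Records an outgoing command and returns its id.
    public int send() {
        synchronized (lock) {
            return nextCommandId++;
        }
    }

    // Called when the peer reports that all commands up to `id` are complete.
    public void complete(int id) {
        synchronized (lock) {
            if (id > maxComplete) {
                maxComplete = id;
                lock.notifyAll();
            }
        }
    }

    // Blocks until every command sent so far is complete, or throws after timeoutMillis.
    public void sync(long timeoutMillis) {
        synchronized (lock) {
            int point = nextCommandId - 1;
            long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
            while (maxComplete < point) {
                long remainingNanos = deadline - System.nanoTime();
                if (remainingNanos <= 0) {
                    throw new IllegalStateException("timed out waiting for sync: complete = "
                        + maxComplete + ", point = " + point);
                }
                try {
                    // wait(0) would block forever, so wait at least 1 ms
                    lock.wait(Math.max(1, remainingNanos / 1_000_000L));
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException("interrupted while waiting for sync", e);
                }
            }
        }
    }

    public static void main(String[] args) throws InterruptedException {
        SyncModel session = new SyncModel();
        session.send();          // command 0
        session.send();          // command 1
        session.complete(0);     // peer confirms only command 0
        try {
            session.sync(100);
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
        // Once the peer completes command 1, sync() returns normally.
        Thread peer = new Thread(() -> session.complete(1));
        peer.start();
        session.sync(1000);
        peer.join();
        System.out.println("synced");
    }
}
```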

The gap between complete and point used to be much larger before we
added the sync_publish setting.  There are no errors in the Qpid broker
log; the only entries are along the lines of the following 2 messages:

  qpidd[19149]: 2011-06-08 11:50:03 warning ManagementAgent::periodicProcessing task overran 1 times by 6ms (taking 5098421ns) on average.
  qpidd[19149]: 2011-06-08 11:50:16 warning  task overran 3 times by 2ms (taking 27955ns) on average.

Thanks,
Richard Peter

Re: C Broker Availability Problem

Posted by Richard Peter <ri...@raytheon.com>.

On 06/09/2011 08:17 AM, Gordon Sim wrote:
> On 06/08/2011 01:08 PM, Richard Peter wrote:
>> Hi,
>>
>> The issue I'm having is when a client producer sends message based on
>> user interaction. The message causes a screen to pop up on another
>> workstation. Usually the pop up is instantaneous, sometimes though it
>> takes up to 2 minutes for the message to get to the other workstation.
>
> Have you noticed latencies that large for any other messages in the 
> system? What's the max queue depth on the queue that message travels 
> through? Is it usually empty? 
We have noticed it on other queues; this one is just the easiest to
track, as it is the only one driven by human interaction.  This queue is
almost always empty.


On 06/09/2011 08:17 AM, Gordon Sim wrote:
>> The message is a JMS text message containing 9 characters, so fairly
>> small message. We have tried tuning the worker-threads thinking it was
>> an availability issue. This single message is more important than all
>> the other traffic our qpid is handling. Is there a way to give priority
>> to one queue over another? There is a large amount of traffic being
>> handled by the broker,
>
> What is your estimated peak total throughput? 
Roughly 9 million messages a day go through the broker.  About 500k of
those go through during a 1-hour period, 4 times a day; the rest is
fairly constant throughout the day.  Most hours see roughly 290k
messages, probably 3/4 of them between 15 minutes before and 15 minutes
after the hour.  Most of these messages are around 10 KB, though some
get as large as 1 MB.
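
As a rough answer to the peak-throughput question, the sketch below converts the figures above into per-second rates. It assumes the ~10k message size means about 10 KB (taken here as 10,240 bytes) and, for an ordinary hour, that 3/4 of the ~290k messages arrive in the 30-minute window around the top of the hour.

```java
public class ThroughputEstimate {
    // Average rate in messages per second.
    static double perSecond(double messages, double seconds) {
        return messages / seconds;
    }

    public static void main(String[] args) {
        double dailyAverage = perSecond(9_000_000, 24 * 3600);     // all traffic, whole day
        double busyHour = perSecond(500_000, 3600);                // one of the 4 busy hours
        double normalBurst = perSecond(290_000 * 0.75, 30 * 60);   // :45 to :15 of an ordinary hour
        // Assumes ~10 KB (10,240 bytes) per message; some messages reach ~1 MB.
        double busyMiBPerSec = busyHour * 10 * 1024 / (1024.0 * 1024.0);

        System.out.printf("daily average:   %.1f msg/s%n", dailyAverage);
        System.out.printf("busy hour:       %.1f msg/s%n", busyHour);
        System.out.printf("normal burst:    %.1f msg/s%n", normalBurst);
        System.out.printf("busy-hour bytes: %.2f MiB/s%n", busyMiBPerSec);
    }
}
```

The byte rate is only an average; the occasional 1 MB message would push it up.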

On 06/09/2011 08:17 AM, Gordon Sim wrote:
>
>> but not sure how the design is setup to handle
>> when they are many more sessions/queues than worker-threads. Does a
>> thread send all messages to a consumer before moving on to the next
>> queue? Or is the only way to ensure availability to further increase
>> worker-threads? I've had the threads as high as 100, but the load on the
>> system made the problem worse. Our setup is below.
>>
>> We are using version 0.8 of the C broker and java client. The broker has
>> roughly 100 queues. Each queue has at least two consumers, 1 each from
>> separate servers in a cluster. We then also have 20 clients listens to 4
>> topics and 5 clients listening to 1 queue (the important one mentioned
>> above). So in general out broker has roughly 300 sessions open at any
>> given time.
>
> Is each session on its own connection? Or are connections shared? If 
> shared, how many connections are there? 
On the servers there is a connection/session pooling mechanism.  It
ranges from a dedicated connection per session up to no more than 5
sessions on a given connection.  In general a given consumer always
forwards to a specific producer, so they share the same session.  On the
client there is only one connection, shared by all 4 topic sessions and
by the producer publishing to the queue that needs to be processed
immediately.


At this point I'm considering standing up a second Qpid instance that
would handle only high-priority traffic, so that it is not overloaded by
all the other traffic.
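
A minimal sketch of how producers might choose between the two brokers, assuming a hypothetical second broker address (${priority.broker.addr}) and a hypothetical queue name (popup.alerts); the real queue name is not given in the thread. With this split the high-priority producer would also get its own connection instead of sharing one with the four topic sessions.

```java
import java.util.Set;

public class BrokerRouter {
    private final Set<String> priorityDestinations; // hypothetical high-priority queue names
    private final String mainUrl;                   // existing, heavily loaded broker
    private final String priorityUrl;               // hypothetical dedicated broker

    public BrokerRouter(Set<String> priorityDestinations, String mainUrl, String priorityUrl) {
        this.priorityDestinations = Set.copyOf(priorityDestinations);
        this.mainUrl = mainUrl;
        this.priorityUrl = priorityUrl;
    }

    // Connection URL a producer for `destination` should use.
    public String urlFor(String destination) {
        return priorityDestinations.contains(destination) ? priorityUrl : mainUrl;
    }

    public static void main(String[] args) {
        BrokerRouter router = new BrokerRouter(
            Set.of("popup.alerts"),
            "amqp://guest:guest@/program?brokerlist='tcp://${broker.addr}'",
            "amqp://guest:guest@/program?brokerlist='tcp://${priority.broker.addr}'");
        System.out.println(router.urlFor("popup.alerts"));      // dedicated broker
        System.out.println(router.urlFor("some.other.queue"));  // main broker
    }
}
```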





---------------------------------------------------------------------
Apache Qpid - AMQP Messaging Implementation
Project:      http://qpid.apache.org
Use/Interact: mailto:users-subscribe@qpid.apache.org

Re: C Broker Availability Problem

Posted by Gordon Sim <gs...@redhat.com>.

On 06/08/2011 01:08 PM, Richard Peter wrote:
> Hi,
>
> The issue I'm having is when a client producer sends message based on
> user interaction. The message causes a screen to pop up on another
> workstation. Usually the pop up is instantaneous, sometimes though it
> takes up to 2 minutes for the message to get to the other workstation.

Have you noticed latencies that large for any other messages in the 
system? What's the max queue depth on the queue that message travels 
through? Is it usually empty?

> The message is a JMS text message containing 9 characters, so fairly
> small message. We have tried tuning the worker-threads thinking it was
> an availability issue. This single message is more important than all
> the other traffic our qpid is handling. Is there a way to give priority
> to one queue over another? There is a large amount of traffic being
> handled by the broker,

What is your estimated peak total throughput?

> but not sure how the design is setup to handle
> when they are many more sessions/queues than worker-threads. Does a
> thread send all messages to a consumer before moving on to the next
> queue? Or is the only way to ensure availability to further increase
> worker-threads? I've had the threads as high as 100, but the load on the
> system made the problem worse. Our setup is below.
>
> We are using version 0.8 of the C broker and java client. The broker has
> roughly 100 queues. Each queue has at least two consumers, 1 each from
> separate servers in a cluster. We then also have 20 clients listens to 4
> topics and 5 clients listening to 1 queue (the important one mentioned
> above). So in general out broker has roughly 300 sessions open at any
> given time.

Is each session on its own connection? Or are connections shared? If 
shared, how many connections are there?

> Almost all of the queues are durable. The topics are not
> durable, nor are subscribers durable. All but one clients in the
> scenario are java clients, with 1 c client. The servers also use the
> java client. The following is connection url used by most of the clients
> (its embedded in spring xml, thus the escaped &.
>
> amqp://guest:guest@/program?brokerlist='tcp://${broker.addr}?retries='0'&amp;tcp_nodelay='true'&amp;connecttimeout='5000''&amp;maxprefetch='0'&amp;sync_publish='all'&amp;failover='nofailover'
>
>
> I only recently turned on tcp_nodelay and sync_publish, thinking that
> perhaps the message was occasionally getting stuck. These are the
> setting from our conf file for the broker:
>
> auth=no
> worker-threads=50
> data-dir=/somepath/qpid/data
> store-dir=/somepath/qpid/messageStore
> pid-dir=/somepath/qpid/var/lock
> num-jfiles=16
> jfile-size-pgs=24
> tcp-nodelay=true
>
> Many of the queues are sized larger than the default through a queue
> creator script. The sizes range up to a max file count of 32 and file
> size of 48. The server running qpid is a 8 cpu system with 2g of memory,
> some of the offices have a 16 cpu system with 8g of memory. The server
> size does not make a difference in the errors.
>
> Part of the theory for availability being the issue was that the clients
> kept timing out on heartbeat. So we disabled the heartbeat. We also
> occasionally see
> INFO 2011-06-06 17:47:42,501 [IoReceiver - somemachine/someip:5672]
> JmsPooledSession: EDEX: DEFAULT - Failed to close session
> org.apache.qpid.transport.SessionException: timed out waiting for sync:
> complete = 30115, point = 30116
> at org.apache.qpid.transport.Session.sync(Session.java:744)
> at org.apache.qpid.transport.Session.sync(Session.java:713)
> at org.apache.qpid.client.AMQSession_0_10.sendClose(AMQSession_0_10.java:427)
> at org.apache.qpid.client.AMQSession.close(AMQSession.java:700)
> at org.apache.qpid.client.AMQSession.close(AMQSession.java:666)
> at org.apache.qpid.client.AMQSession.close(AMQSession.java:525)
> at somepackage.jms.JmsPooledSession.closeInternal(JmsPooledSession.java:164)
> at somepackage.jms.JmsPooledConnection.disconnect(JmsPooledConnection.java:152)
> at somepackage.jms.JmsPooledConnection.onException(JmsPooledConnection.java:127)
> at org.apache.qpid.client.AMQConnectionDelegate_0_10.closed(AMQConnectionDelegate_0_10.java:270)
> at org.apache.qpid.transport.Connection.closed(Connection.java:529)
> at org.apache.qpid.transport.network.Assembler.closed(Assembler.java:113)
> at org.apache.qpid.transport.network.InputHandler.closed(InputHandler.java:202)
> at org.apache.qpid.transport.network.io.IoReceiver.run(IoReceiver.java:150)
> at java.lang.Thread.run(Thread.java:619)
>
> The gap between complete and point used to be much larger before adding
> the sync_publish setting. There are no errors in the qpid broker log.

That looks like it might be 
https://issues.apache.org/jira/browse/QPID-3259, though I would expect 
some error in the broker log as well.


