Posted to users@qpid.apache.org by Sam Hendley <sh...@greenenergycorp.com> on 2011/01/10 16:59:19 UTC

Qpid 0.8 c++ broker memory usage

Does this list have a no-attachments policy? I sent two emails, both with
attachments, and it looks like they never went out to the list (according to
the archive). I got a "rejected post" for the second attempt today (three days
after mailing). The weird thing is that, as far as my inbox is concerned, the
message was sent to the list, so I assumed I hadn't given enough details in
the first email and sent another one (the rejected one). It's odd that the
"rejected post" came so long after I sent it; is it a human-driven process? A
quick rejection would be better; I've spent the last two weeks wondering
whether anyone would respond :).

Anyway, my original issue is below. I have removed all attachments in the hope
that this will get it past the filter; I can pastebin them or send them
individually if they would be helpful.

Thanks

Sam

---------- Forwarded message ----------
From: Sam Hendley <sh...@greenenergycorp.com>
Date: Fri, Jan 7, 2011 at 1:04 PM
Subject: Re: Qpid 0.8 c++ broker memory usage
To: users@qpid.apache.org


I'm sorry to repost this, but since I originally sent it in the midst of the
holidays, I imagine a lot of people may not have noticed it. I have continued
to look at the Qpid memory issue and still cannot come up with a decent
explanation of what I am doing wrong to cause this increase in memory usage.

I have tried using every qpid-* tool at my disposal and can't find anything
unusual. I am attaching two files collected one hour apart on one of our
running systems. The last line shows the "ps aux" output, and you can see a
20 MB increase in 'dirty' memory over that one hour. This bloat continues at
that rate fairly steadily (well past the starting VM size) and eventually
chokes the system, requiring a restart of qpidd.
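To put a number on the growth, here is a quick back-of-the-envelope
calculation (the sample values are illustrative, not my actual RSS figures)
converting two ps RSS samples into a bytes-per-second rate:

```python
# Convert two RSS samples from ps (reported in KB) into a growth rate.
# The sample numbers below are illustrative, not taken from my logs.

def growth_rate(rss_kb_before, rss_kb_after, seconds):
    """Memory growth in bytes per second between two RSS samples."""
    return (rss_kb_after - rss_kb_before) * 1024.0 / seconds

# A 20 MB (20480 KB) increase over one hour:
print("%.0f bytes/sec" % growth_rate(100000, 120480, 3600))  # -> 5825 bytes/sec
```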

I am at a loss as to where to proceed from here; can anyone recommend steps I
might take to diagnose what I am doing wrong? The growth appears to be due to
having a single listener bound with the routing key '#': if I disable that
listener, the memory growth doesn't happen. Clearly this can't be a widespread
problem, or everyone would be restarting qpidd every few days.

Thanks again,
Sam

BTW:
Script used to generate each file:
(qpid-stat -q -S msgIn && qpid-stat -e -S msgIn && qpid-stat -c -S msgIn && \
 qpid-stat -u -S delivered && sudo ps aux | grep qpid | grep -v grep) > later.log
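For anyone less familiar with topic exchanges: a '#' binding matches every
routing key, so that one listener sees all traffic on the exchange. A toy
matcher (my own sketch of the AMQP wildcard rules, not Qpid code) showing the
semantics:

```python
def topic_match(binding, key):
    """True if an AMQP-style topic binding matches a routing key.
    '*' matches exactly one dot-separated word; '#' matches zero or more."""
    return _match(binding.split('.'), key.split('.'))

def _match(pattern, words):
    if not pattern:
        return not words
    if pattern[0] == '#':
        # '#' may absorb zero or more words, so try every split point.
        return any(_match(pattern[1:], words[i:]) for i in range(len(words) + 1))
    if not words:
        return False
    return pattern[0] in ('*', words[0]) and _match(pattern[1:], words[1:])

assert topic_match('#', 'any.routing.key')
assert topic_match('measurement.*', 'measurement.substation1')
assert not topic_match('measurement.*', 'measurement.sub.extra')
```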


On Wed, Dec 29, 2010 at 3:58 PM, Sam Hendley
<sh...@greenenergycorp.com>wrote:

> We have been doing soak testing on our systems for the last few weeks and
> keep having issues where qpidd is continuously increasing its memory usage,
> eventually paging out the OS and crippling the system. The growth is very
> slow, somewhere between 4 and 20 bytes a second. The broker is not heavily
> loaded, perhaps 100 msgs a second, none of them larger than a hundred bytes.
> As far as I can tell (using the qpid-* tools), the messages are being
> consumed as fast as possible; the queued-messages counter is always 0.
>
> The main message flow is many messages published to a topic exchange with
> different routing keys. I have a single listener bound with the match-all key "#":
>
> sam@reef-deploy:~/qpidc-0.8$ qpid-stat -q -S msgIn -L 4 && qpid-stat -e -S msgIn -L 4
> Queues
>   queue                                   dur  autoDel  excl  msg  msgIn  msgOut  bytes  bytesIn  bytesOut  cons  bind
>   =====================================================================================================================
>   "0fd2fc12-5d12-4722-b352-74e557b5af88"       Y        Y     0    3.12k  3.12k   0      137k     137k      1     2
>   "f0ca48a8-13cc-49fe-9f94-8b2f592bef70"       Y        Y     0    3.12k  3.12k   0      335k     335k      1     2
>   "a4491db9-a8c6-4acb-91d4-5c792ac961fe"       Y        Y     0    3.12k  3.12k   0      182k     182k      1     2
>   reply-reef-deploy.3606.1                     Y        Y     0    71     71      0      64.9k    64.9k     1     2
> Exchanges
>   exchange           type    dur  bind  msgIn  msgOut  msgDrop  byteIn  byteOut  byteDrop
>   =========================================================================================
>   amq.direct         direct  Y    22    3.50k  3.50k   0        500k    500k     0
>   measurement_batch  topic        2     3.20k  3.20k   2        344k    344k     167
>   measurement        topic        1     3.20k  3.20k   0        140k    140k     0
>   qpid.management    topic        2     401    0       401      351k    0        351k
>
>
>
> When I watch the memory usage, it seems to go up at around 4 bytes a second;
> after a few days of running, it will have pushed the rest of the system into
> swap. This publishing step is something I can disable, and I have found that
> the "leak" goes away if I stop the publishing or disable the subscriber, so I
> am pretty convinced it's this part of the system that is causing the problem.
> I am using the same bind-and-listen code for both the "measurement" listener
> and the "measurement_batch" handlers, so if the binding code were bad (e.g.
> not acknowledging), the problem should occur regardless of whether I am
> publishing the "measurements" or not.
>
> I am at a bit of a loss as to what to look for next; as far as I can tell,
> everything is working as it should be, but the memory keeps going up. I have
> included a snippet of the logs from running the broker with '-t' logging
> turned on, but there's nothing untoward that I can see in this file. Clearly
> a memory leak of this magnitude would have been noticed by now, so I am
> assuming I am doing something wrong, but I am out of ideas on what it could
> be. The only difference I can find between the "measurement_batch" service
> and the "measurement" service is the routing key being "#"; why that should
> cause a slow gain in memory in the broker is where I get stuck.
>
> Thanks for any advice you can give, happy holidays.
> Sam
>
> Details:
>
> The broker is Qpid 0.8 compiled against Boost 1.41 on Ubuntu Lucid. The
> client is the Java 0.6 client. The command line is:
>
> sudo /usr/local/sbin/qpidd --port 5672 --auth no \
>   --data-dir /var/db/qpidd.5672 --pid-dir /var/db/qpidd.5672 \
>   --log-to-file /var/log/qpidd.5672.log
>
>
> I have also tried using the pmap utility to see if I can determine where the
> memory is being allocated. I am including two pmap -x outputs, separated by
> about 100 seconds. Doing a diff on the two files produces:
> 6,10c6,10
> < 0000000001b78000    3888       -       -       - rw---    [ anon ]
> < 00007f7934000000    1640       -       -       - rw---    [ anon ]
> < 00007f793419a000   63896       -       -       - -----    [ anon ]
> < 00007f793c000000    2472       -       -       - rw---    [ anon ]
> < 00007f793c26a000   63064       -       -       - -----    [ anon ]
> ---
> > 0000000001b78000    4148       -       -       - rw---    [ anon ]
> > 00007f7934000000    1908       -       -       - rw---    [ anon ]
> > 00007f79341dd000   63628       -       -       - -----    [ anon ]
> > 00007f793c000000    2800       -       -       - rw---    [ anon ]
> > 00007f793c2bc000   62736       -       -       - -----    [ anon ]
> 109c109
> < total kB          208852       -       -       -
> ---
> > total kB          209112       -       -       -
>
> I don't really know what I am looking at here, but it looks to me like two
> of those memory chunks are getting "used". The other two large chunks
> (~63 MB) appear to have moved (probably meaning they were grown or shrunk).
> I have not collected these files with the listeners removed (when the memory
> usage is stable); if someone thinks that would be valuable, I can do that.
>
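
In case it helps anyone reproduce the comparison, here is a rough script (my
own sketch, keyed only to the address and size columns of pmap -x output) that
diffs two snapshots and reports which mappings changed size:

```python
# Diff the size column of two `pmap -x` snapshots.
# Rough sketch: it matches mappings by start address only, so a mapping that
# moved (like the two large ~63 MB chunks above) simply drops out of the diff.

def parse_pmap(text):
    """Return {start_address: size_kb} for each data line of `pmap -x`."""
    sizes = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) >= 2 and parts[1].isdigit():
            try:
                int(parts[0], 16)  # data lines start with a hex address
            except ValueError:
                continue           # skips header and "total kB" lines
            sizes[parts[0]] = int(parts[1])
    return sizes

def diff_pmap(before, after):
    """Return {address: delta_kb} for mappings whose size changed."""
    a, b = parse_pmap(before), parse_pmap(after)
    return {addr: b[addr] - a[addr] for addr in a if addr in b and b[addr] != a[addr]}

# Two of the lines from the snapshots above:
before = """0000000001b78000    3888       -       -       - rw---    [ anon ]
00007f7934000000    1640       -       -       - rw---    [ anon ]"""
after = """0000000001b78000    4148       -       -       - rw---    [ anon ]
00007f7934000000    1908       -       -       - rw---    [ anon ]"""
print(diff_pmap(before, after))  # -> {'0000000001b78000': 260, '00007f7934000000': 268}
```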