
Posted to users@activemq.apache.org by Aaron Mulder <am...@alumni.princeton.edu> on 2009/04/28 05:04:47 UTC

Performance of high number of topics or selectors

Let's say I have 100,000 stock tickers (or whatever) I want to track,
each with a separate client.

There seem to be two approaches: putting them all on one topic and
having each client use a selector (with the ticker, or its equivalent,
in a header property), or putting the messages for each ticker on their
own topic and subscribing each consumer to one dedicated topic.
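A toy model makes the difference between the two approaches concrete. It is not ActiveMQ's dispatch code (the broker may index or cache selectors in ways this ignores); it only assumes that on a shared topic each subscription's selector is tested against each message, while with a topic per ticker the destination name picks out the subscribers directly. The class name `DispatchModel` and the `T<n>` ticker names are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

// Toy model only: not ActiveMQ's dispatch code.
final class DispatchModel {

    // One subscriber per hypothetical ticker "T0".."T<n-1>"; each predicate stands in
    // for a JMS selector such as "ticker = 'T42'".
    static List<Predicate<Map<String, String>>> buildSelectors(int n) {
        List<Predicate<Map<String, String>>> subs = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            String ticker = "T" + i;
            subs.add(props -> ticker.equals(props.get("ticker")));
        }
        return subs;
    }

    // Topic-per-ticker index: destination name -> subscriber ids.
    static Map<String, List<Integer>> buildIndex(int n) {
        Map<String, List<Integer>> byTopic = new HashMap<>();
        for (int i = 0; i < n; i++) {
            byTopic.computeIfAbsent("T" + i, k -> new ArrayList<>()).add(i);
        }
        return byTopic;
    }

    // Shared topic: every selector is tested against the message. Returns the number
    // of selector checks and appends matching subscriber ids to 'matched'.
    static int deliverWithSelectors(List<Predicate<Map<String, String>>> subs,
                                    Map<String, String> props, List<Integer> matched) {
        int checks = 0;
        for (int i = 0; i < subs.size(); i++) {
            checks++;
            if (subs.get(i).test(props)) {
                matched.add(i);
            }
        }
        return checks;
    }

    // Topic per ticker: a single map lookup finds the subscribers.
    static List<Integer> deliverByTopic(Map<String, List<Integer>> byTopic, String topic) {
        return byTopic.getOrDefault(topic, List.of());
    }

    public static void main(String[] args) {
        int n = 100_000;
        List<Integer> matched = new ArrayList<>();
        int checks = deliverWithSelectors(buildSelectors(n), Map.of("ticker", "T42"), matched);
        System.out.println("shared topic: " + checks + " selector checks, matched " + matched);
        System.out.println("topic per ticker: matched " + deliverByTopic(buildIndex(n), "T42"));
    }
}
```

On the shared topic the number of checks per message grows with the number of subscribers; with a topic per ticker the lookup cost stays flat, but the broker has to hold one destination per ticker instead, which moves the cost from per-message CPU to per-destination memory.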

When I try this, the performance stinks.  With the selector approach
it takes as long as 10 minutes for 100,000 clients to connect (using
the VM transport), I need more than 256 MB of heap (512 MB was fine,
and 384 MB might have been OK), and the message send/receive time
seems to go up more or less linearly with the number of clients
(roughly 0.5 s for 1,000 clients, 5 s for 10,000 clients, and 75 s for
100,000 clients, each for 1,000 messages).  My actual selector used two
header properties, but you get the idea.
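Assuming the broker tests every subscription's selector against every message (an assumption about the dispatch path, not something measured in the thread), the reported timings reduce to a cost per message/subscriber pair. `SelectorScaling` is a hypothetical helper for that arithmetic.

```java
// Back-of-envelope arithmetic on the reported selector timings.
final class SelectorScaling {

    // Cost per (message, subscriber) pair in microseconds, assuming each of the
    // 'messages' messages is matched against each of the 'clients' subscriptions.
    static double microsPerPair(double seconds, long messages, long clients) {
        return seconds * 1_000_000.0 / ((double) messages * clients);
    }

    public static void main(String[] args) {
        long messages = 1_000;
        long[] clients = {1_000, 10_000, 100_000};
        double[] seconds = {0.5, 5.0, 75.0};
        for (int i = 0; i < clients.length; i++) {
            System.out.printf("%d clients: %.2f us per pair%n",
                    clients[i], microsPerPair(seconds[i], messages, clients[i]));
        }
    }
}
```

This gives about 0.5 µs per pair at 1,000 and 10,000 clients and about 0.75 µs at 100,000, so the growth is roughly linear in the number of clients, slightly worse at the top end.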

With the topic approach, which I assumed would perform better because
it eliminates the selectors, the persistent store ran out of file
handles and crashed, so I disabled persistence for the test.  (Is there
a disk-based store that doesn't keep an open file for each topic?)
With that out of the way, it seemed OK up to 10,000 clients (and up to
3x as fast for message send/receive), but it can't really connect
100,000 clients each to a different topic.  The first 40K took 20
minutes, then it pretty much hit the 512 MB heap limit, and the next
2K took 15 minutes, of which at least 12 were spent in GC.  I'm not
sure how high I'm willing to push the heap, but over 1 GB seems out of
line when I haven't sent any messages yet.
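A rough extrapolation from these numbers, assuming heap use grows linearly with the number of topic/consumer pairs and attributing the whole 512 MB heap to the ~40K pairs connected before GC took over (which overstates the per-pair cost somewhat, since the broker itself also uses heap). `HeapEstimate` is a hypothetical helper.

```java
// Linear extrapolation of heap use from the figures reported in the post.
final class HeapEstimate {
    static final long MIB = 1024L * 1024L;

    // Rough per-pair heap cost: total heap divided by pairs connected when it ran out.
    static long bytesPerClient(long heapMiB, long clientsAtLimit) {
        return heapMiB * MIB / clientsAtLimit;
    }

    // Heap needed for targetClients pairs, assuming cost scales linearly.
    static long projectedHeapBytes(long heapMiB, long clientsAtLimit, long targetClients) {
        return heapMiB * MIB * targetClients / clientsAtLimit;
    }

    public static void main(String[] args) {
        long perClient = bytesPerClient(512, 40_000);
        long projected = projectedHeapBytes(512, 40_000, 100_000);
        System.out.printf("~%d bytes per pair, ~%.2f GiB for 100K pairs%n",
                perClient, projected / (double) (1024 * MIB));
    }
}
```

That is roughly 13 KB per pair and about 1.25 GiB for 100,000 pairs, which is in line with the "over 1 GB" figure in the post.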

Anyway, this wasn't really what I had hoped for.  Is there some good
way to customize the configuration to handle either ~100K selectors or
~100K topics efficiently?

Or would it just be better to have one topic where every client gets
every message and manually discards the ones it doesn't care about?  I
haven't tried that yet.

Thanks,
       Aaron

Re: Performance of high number of topics or selectors

Posted by Gary Tully <ga...@gmail.com>.

There are some hard limits in terms of file descriptors (fds) and memory
utilisation, both for destinations and for connections. The simplest
approach to achieve scalability at these levels (100k topics) is to divide
and conquer using a network of brokers
<http://activemq.apache.org/networks-of-brokers.html> and a random broker
connection strategy <http://activemq.apache.org/failover-transport-reference.html>.
In this way the topics will be dispersed across the brokers in the network,
with cross talk and duplication happening where the producers and
consumers are on separate brokers.
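A sketch of the client side of this setup, with hypothetical host names: each client gets a failover URI listing every broker in the network, and the failover transport's `randomize=true` option makes it pick one at random. The second method is a back-of-envelope estimate, not from the thread: if a producer and a consumer each pick independently and uniformly among k brokers, they land on different brokers (so the network has to forward the message) with probability 1 - 1/k. `BrokerNetworkSketch` is a hypothetical name.

```java
import java.util.List;

final class BrokerNetworkSketch {

    // Failover URI of the form failover:(uri1,...,uriN)?randomize=true.
    static String failoverUri(List<String> brokers) {
        return "failover:(" + String.join(",", brokers) + ")?randomize=true";
    }

    // Probability that two independent, uniformly random broker choices differ.
    static double crossTalkProbability(int brokers) {
        return 1.0 - 1.0 / brokers;
    }

    public static void main(String[] args) {
        List<String> brokers = List.of("tcp://broker1:61616", "tcp://broker2:61616",
                "tcp://broker3:61616", "tcp://broker4:61616");
        System.out.println(failoverUri(brokers));
        System.out.println("expected cross-broker share: " + crossTalkProbability(brokers.size()));
    }
}
```

With four brokers that is 0.75, so most messages would cross the network at least once under random placement.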

To avoid the cross talk, you could further partition the network of brokers
using exclusion filters and provide a location mechanism that can identify
the broker for a particular topic.
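One possible location mechanism, sketched under assumptions not in the thread: every producer and consumer computes the same hash of the topic name and connects to the broker it selects, so a ticker's traffic stays on one broker. `TopicLocator` and the broker URLs are hypothetical; the exclusion filters would then keep each broker from forwarding topics assigned elsewhere.

```java
import java.util.List;

final class TopicLocator {
    private final List<String> brokers;

    TopicLocator(List<String> brokers) {
        if (brokers.isEmpty()) {
            throw new IllegalArgumentException("need at least one broker");
        }
        this.brokers = List.copyOf(brokers);
    }

    // String.hashCode is defined by the Java spec, so every client using the same
    // broker list maps a topic to the same broker. floorMod handles negative hashes.
    String brokerFor(String topic) {
        return brokers.get(Math.floorMod(topic.hashCode(), brokers.size()));
    }

    public static void main(String[] args) {
        TopicLocator locator = new TopicLocator(List.of(
                "tcp://broker1:61616", "tcp://broker2:61616", "tcp://broker3:61616"));
        for (String topic : List.of("STOCKS.IBM", "STOCKS.MSFT", "STOCKS.ORCL")) {
            System.out.println(topic + " -> " + locator.brokerFor(topic));
        }
    }
}
```

A plain modulo hash reassigns most topics when a broker is added or removed; consistent hashing or an explicit lookup table would limit that churn.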



2009/4/28 Aaron Mulder <am...@alumni.princeton.edu>

> Well, sending to everyone and filtering manually turned out to be a
> really bad idea; I guess the message gets copied too many times or
> something.  It fails with an OOM error on a 512 MB heap with 10,000 clients.
> So I'm still looking for a better way, whether that's a different general
> approach or tuning to handle a high number of selectors or topics.
>
> Thanks,
>     Aaron



-- 
http://blog.garytully.com

Open Source SOA
http://FUSESource.com

Re: Performance of high number of topics or selectors

Posted by Aaron Mulder <am...@alumni.princeton.edu>.

Well, sending to everyone and filtering manually turned out to be a
really bad idea; I guess the message gets copied too many times or
something.  It fails with an OOM error on a 512 MB heap with 10,000 clients.
So I'm still looking for a better way, whether that's a different general
approach or tuning to handle a high number of selectors or topics.
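The follow-up's result can be put in a simple model: with no filtering on the broker, each consumer receives every message and throws away all but the ones for its own ticker. The sketch counts deliveries assuming one consumer per ticker and messages cycling through tickers; whether the broker actually copies each message per consumer is not established in the thread, and `ClientSideFilterModel` is a hypothetical name.

```java
// Counts deliveries when every consumer gets every message and filters locally.
final class ClientSideFilterModel {

    // Returns {delivered, kept}.
    static long[] simulate(int messages, int consumers) {
        long delivered = 0;
        long kept = 0;
        for (int m = 0; m < messages; m++) {
            int ticker = m % consumers;      // hypothetical: messages cycle through tickers
            for (int c = 0; c < consumers; c++) {
                delivered++;                 // broker fans the message out to consumer c
                if (c == ticker) {
                    kept++;                  // consumer c only keeps its own ticker
                }
            }
        }
        return new long[] {delivered, kept};
    }

    public static void main(String[] args) {
        long[] r = simulate(1_000, 10_000);
        System.out.printf("delivered=%d kept=%d discarded=%d%n", r[0], r[1], r[0] - r[1]);
    }
}
```

For 1,000 messages and 10,000 consumers that is 10,000,000 deliveries to keep 1,000 messages, versus 1,000 deliveries when the broker routes by topic or selector.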

Thanks,
    Aaron
