Posted to dev@qpid.apache.org by Alexis Richardson <al...@cohesiveft.com> on 2008/01/11 15:59:49 UTC

interoperability

Hello Qpid folks,

I've noticed some traffic recently on interoperability and testing,
and relatedly on performance.  There appears to be a lot of confusion
on this point.

1.  We very much want to see seamless interoperability for all brokers
implementing any given version of the spec.  This is good for AMQP.
2.  This should include all the Python tests if possible, unless clearly
marked otherwise for a good reason.
3.  People will then be able to make like-for-like performance
comparisons, because the tests will work with any broker on a 'swap in
and out' basis.  This is also good for AMQP.  (A sketch of what that
swapping could look like follows below.)
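
To make point (3) concrete, here is a minimal sketch of a broker-agnostic
round-trip check.  It is purely illustrative (it is not the Qpid Python test
suite, and it assumes the pika AMQP 0-9-1 client); the point is only that
the broker address comes from the environment, so another broker can be
swapped in without touching the test:

    import os
    import pika

    def round_trip(body=b"interop-check"):
        # Broker location comes from the environment, so any AMQP broker
        # can be swapped in without changing the test itself.
        host = os.environ.get("AMQP_HOST", "localhost")
        port = int(os.environ.get("AMQP_PORT", "5672"))

        conn = pika.BlockingConnection(
            pika.ConnectionParameters(host=host, port=port))
        ch = conn.channel()
        ch.confirm_delivery()  # wait for the broker to confirm the publish
        # Server-named, exclusive queue used only for this check.
        q = ch.queue_declare(queue="", exclusive=True).method.queue
        ch.basic_publish(exchange="", routing_key=q, body=body)
        _method, _props, received = ch.basic_get(queue=q, auto_ack=True)
        conn.close()
        assert received == body, "broker did not round-trip the message"

    if __name__ == "__main__":
        round_trip()
        print("round trip OK")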

On this last point (3) it is worth noting that writing
business-relevant tests is hard.  It's one thing to pipe millions of
one byte messages through a socket, and another to implement a real
use case.  We reported an OPRA case in December which has the merit of
being 'real world'.  Make of this what you will.  The case was written
up by Intel in a press release - they did the tests - you can see the
link below.

Also included below are the full details of what these numbers mean,
as posted to the RabbitMQ mailing list verbatim.  Feel free to read
and comment here or on the RabbitMQ list.

Make no mistake, brokers (like CPUs) will get faster and faster,
especially once this stuff is implemented in hardware.  For software
that is well written and can scale, the limiting factor is how many
messages you can process per core.  Provided AMQP implementations
interoperate to the standard, the only thing that customers will need
to do is to pick and choose which implementation suits their use case
the best, e.g. for reliability, scalability, ease of use, or raw
speed.

Best wishes

alexis

RabbitMQ



---------- Forwarded message ----------
From: Alexis Richardson <al...@cohesiveft.com>
Date: Dec 1, 2007 10:26 AM
Subject: rabbitmq performance
To: RabbitMQ Discuss <ra...@lists.rabbitmq.com>


Hi everyone,

Recently we did some load testing of RabbitMQ, working with Intel.
Their press release is reported here:

http://www.intelfasterfs.com/trading/articles/071128-intellowlatency.aspx

The use case was a simulated OPRA data feed using a combination of:

- Pantor FAST client (essentially a codec) combined with an AMQP
client written in C (yes, we are hoping to get this into the community)
- RabbitMQ AMQP broker version 1.2 on the server

Please don't read too much into the latency numbers here: the timed
path included two network hops as well as message processing at the
broker; also, somewhat annoyingly, the numbers are averaged over
multiple scenarios.  We wanted to look at throughput because OPRA
feeds are heading to 1 million ticks per second and it's a good load
testing case.

We shall publish more info soon but the numbers are as follows:

1. Ingress of about 1.3 million OPRA messages per second
2. Replicated out to four clients at once (unicast pub/sub not multicast)
3. So simultaneously, egress of about 5 million OPRA messages per second

The broker cluster ran on one multicore box with 16 cores.  The network
path used a full TCP/IP stack over a standard 1GigE network (the bottleneck).

The set up was:

1 Client Box --> 1 Server Box --> 1 Client Box

We used Intel's 16 core Caneland box for the server and the FAST/AMQP
client was delivered by Pantor, working with us.
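
For anyone curious what the 'unicast pub/sub' replication looks like in AMQP
terms, here is a rough sketch.  It is not the C client Pantor built; it just
illustrates the topology using the pika Python client, and the exchange name
'opra.feed' and the zero-byte payload are made up for the example:

    import pika

    EXCHANGE = "opra.feed"  # illustrative name, not from the actual test

    def publish_datagram(payload=b"\x00" * 256):
        # One 256-byte AMQP message, standing in for a FAST datagram of
        # 16 compressed OPRA messages.
        conn = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
        ch = conn.channel()
        ch.exchange_declare(exchange=EXCHANGE, exchange_type="fanout")
        ch.basic_publish(exchange=EXCHANGE, routing_key="", body=payload)
        conn.close()

    def consume_feed():
        # Run once per subscriber.  Each subscriber gets its own private
        # queue bound to the fanout exchange, so the broker replicates
        # every message to every subscriber over its own TCP connection
        # (unicast pub/sub, no multicast).
        conn = pika.BlockingConnection(pika.ConnectionParameters("localhost"))
        ch = conn.channel()
        ch.exchange_declare(exchange=EXCHANGE, exchange_type="fanout")
        q = ch.queue_declare(queue="", exclusive=True).method.queue
        ch.queue_bind(exchange=EXCHANGE, queue=q)
        ch.basic_consume(queue=q,
                         on_message_callback=lambda c, m, p, body: None,
                         auto_ack=True)
        ch.start_consuming()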

How come the numbers are so high?  Well, one reason is that we used
FAST, which is a codec.  Each OPRA message was FAST-compressed
and batched into a block or 'datagram' of 16 compressed OPRA messages.
This is gradually becoming normal practice in the world of market data
feeds because the loads are high and people do not have enough
bandwidth to cope.

So in our test, each datagram contained 16 OPRA messages, and was
sent as one 256-byte AMQP message.

So the throughput can also be seen as:

1. Ingress of 80,000 AMQP messages per second (256 bytes per message)
2. Replicated out to four clients at once (unicast pub/sub)
3. So simultaneously, egress of 320,000 AMQP messages per second (256
bytes per message)

I.e., the real load on the broker is about 400,000 AMQP messages per second.
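
Purely as a sanity check on that arithmetic (these are the figures from
this post, rounded, not new measurements):

    # OPRA rates -> AMQP rates, given 16 FAST-compressed OPRA messages
    # per 256-byte AMQP message and a fanout of four consumers.
    OPRA_PER_DATAGRAM = 16
    OPRA_INGRESS = 1_300_000   # OPRA messages/second into the broker
    FANOUT = 4                 # consumers receiving each message

    amqp_ingress = OPRA_INGRESS / OPRA_PER_DATAGRAM  # ~81k, quoted as ~80k
    amqp_egress = amqp_ingress * FANOUT              # ~325k, quoted as ~320k
    total_load = amqp_ingress + amqp_egress          # ~406k, i.e. ~400,000/s

    print(f"in {amqp_ingress:,.0f}/s, out {amqp_egress:,.0f}/s, "
          f"total {total_load:,.0f} AMQP msg/s")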

There are several ways to get these numbers higher:

- tune RabbitMQ for speed
- use multicast
- use Infiniband
- use faster cores

We just did some more tests using Intel's 45nm cores which look
promising in this regard.

The point is: for most use cases you can get good performance using
COTS hardware.  This means you can spend your valuable project
investment dollars on making the user experience better instead of
messing about with deep tech.

We think scalability, stability and ease of use are more important than
raw speed.  If you try to run RabbitMQ and do not see what you
expect along any of these metrics, please let us know and we'll help you.

alexis

-- 
Alexis Richardson
+44 20 7617 7339 (UK)
+44 77 9865 2911 (cell)
+1 650 206 2517 (US)

Re: interoperability

Posted by Alexis Richardson <al...@cohesiveft.com>.
Rupert

On Jan 11, 2008 3:21 PM, Rupert Smith <ru...@googlemail.com> wrote:
> Thanks Alexis,
>
> The numbers now make sense to me with all the details.

No worries, happy to help.


> One thing I am curious about... How well did RabbitMQ scale across the 16
> cores?  In theory the Erlang HIPE scheduler should scale well (and at no
> extra effort to the programmer); did it live up to that?

It seems to scale pretty well, and certainly without any effort on our part.

From my fading memory, we did about 45k AMQP messages (300 bytes/msg)
per second per core for the first couple of cores; the rate then dropped
a little, adding roughly 30k per core thereafter.  But one month later we
ran the tests on faster cores, which got better numbers and scaled a bit
better.
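
As a back-of-the-envelope model of that scaling (the 45k and 30k figures
above are from memory, so treat any output as illustrative only, and
remember the real test was ultimately bottlenecked by the 1GigE network):

    def estimated_throughput(cores, first_two_rate=45_000, later_rate=30_000):
        # Rough model of the per-core scaling described above: ~45k AMQP
        # msg/s for each of the first two cores, ~30k per additional core.
        # These are remembered figures, not a published benchmark.
        fast_cores = min(cores, 2)
        return fast_cores * first_two_rate + max(cores - 2, 0) * later_rate

    # e.g. estimated_throughput(16) gives a rough aggregate ceiling, not a
    # measurement of the actual test.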

Bear in mind that this was only a first round of testing with Intel.
We had done similar testing on our own back in June.  In both cases we
did not have quite enough time to do everything we wanted to.  Perhaps
this is the way of all tests.

In other words - YMMV.

alexis




Re: interoperability

Posted by Rupert Smith <ru...@googlemail.com>.
Thanks Alexis,

The numbers now make sense to me with all the details.

One thing I am curious about... How well did RabbitMQ scale across the 16
cores?  In theory the Erlang HIPE scheduler should scale well (and at no
extra effort to the programmer); did it live up to that?

Rupert
