Posted to users@kafka.apache.org by Piotr Kozikowski <pi...@liveramp.com> on 2013/04/09 01:42:32 UTC

Analysis of producer performance

Hi,

At LiveRamp we are considering replacing Scribe with Kafka, and as a first
step we ran some tests to evaluate producer performance. You can find our
preliminary results here:
https://blog.liveramp.com/2013/04/08/kafka-0-8-producer-performance-2/. We
hope this will be useful for some folks, and if anyone has comments or
suggestions about what to do differently to obtain better results, your
feedback will be very welcome.

Thanks,

Piotr

Re: Analysis of producer performance

Posted by Jun Rao <ju...@gmail.com>.
Piotr,

Not sure where the updated numbers are, but what you described may make
sense. In no-ack mode, if the broker is saturated, it will put back
pressure on the producer. Eventually, the producer will slow down because
the socket buffer is full. One big difference between 0.8 and 0.7 is that
the 0.8 broker has the overhead of recompressing compressed messages. If
you only have one partition in your test, all producers have to synchronize
on the log when recompressing, which could limit the throughput. To improve
throughput, you can try using more partitions, turning off compression, or
using a cheaper compression codec like snappy.
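
As a rough illustration only (assuming the 0.8 Java producer API and its
standard config keys; the broker addresses and topic name are placeholders),
those knobs map to producer properties along these lines:

    import java.util.Properties;
    import kafka.javaapi.producer.Producer;
    import kafka.producer.KeyedMessage;
    import kafka.producer.ProducerConfig;

    public class TunedProducerSketch {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("metadata.broker.list", "broker1:9092,broker2:9092"); // placeholder brokers
            props.put("serializer.class", "kafka.serializer.StringEncoder");
            // A cheaper codec than gzip; "none" avoids broker-side recompression entirely.
            props.put("compression.codec", "snappy");
            // 0 = no acks, 1 = wait for the leader, -1 = wait for all in-sync replicas.
            props.put("request.required.acks", "1");

            Producer<String, String> producer =
                new Producer<String, String>(new ProducerConfig(props));
            // With more partitions on the topic, recompression is spread across
            // several logs instead of serializing on a single one.
            producer.send(new KeyedMessage<String, String>("test-topic", "a log line"));
            producer.close();
        }
    }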

Thanks,

Jun


On Fri, Apr 12, 2013 at 4:08 PM, Piotr Kozikowski <pi...@liveramp.com> wrote:

> Hi all,
>
> I posted an update on the post (
> https://blog.liveramp.com/2013/04/08/kafka-0-8-producer-performance-2/) to
> test the effect of disabling ack messages from brokers. It appears this
> only makes a big difference (~2x improvement ) when using synthetic log
> messages, but only a modest 12% improvement when using real production
> messages. This is using GZIP compression. The way I interpret this is that
> just turning acks off is not enough to mimic the 0.7 behavior because GZIP
> consumes significant CPU time and since the brokers now need to decompress
> data, there is a hit on throughput even without acks. Does this sound
> reasonable?
>
> Thanks,
>
> Piotr
>
> On Mon, Apr 8, 2013 at 4:42 PM, Piotr Kozikowski <pi...@liveramp.com>
> wrote:
>
> > Hi,
> >
> > At LiveRamp we are considering replacing Scribe with Kafka, and as a
> first
> > step we run some tests to evaluate producer performance. You can find our
> > preliminary results here:
> > https://blog.liveramp.com/2013/04/08/kafka-0-8-producer-performance-2/.
> > We hope this will be useful for some folks, and If anyone has comments or
> > suggestions about what to do differently to obtain better results your
> > feedback will be very welcome.
> >
> > Thanks,
> >
> > Piotr
> >
>

Re: Analysis of producer performance

Posted by Piotr Kozikowski <pi...@liveramp.com>.
Hi all,

I posted an update on the post (
https://blog.liveramp.com/2013/04/08/kafka-0-8-producer-performance-2/) to
test the effect of disabling ack messages from brokers. It appears this
makes a big difference (~2x improvement) when using synthetic log messages,
but only a modest 12% improvement when using real production messages. This
is using GZIP compression. The way I interpret this is that just turning
acks off is not enough to mimic the 0.7 behavior: because GZIP consumes
significant CPU time and the brokers now need to decompress the data, there
is a hit on throughput even without acks. Does this sound reasonable?
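
For reference, a minimal sketch of the two configurations being compared,
assuming the 0.8 producer config keys (the broker address is a placeholder,
and this is not the actual benchmark code):

    import java.util.Properties;

    public class AckComparisonSketch {
        // Shared settings for both runs; both runs compress with GZIP.
        static Properties baseConfig() {
            Properties props = new Properties();
            props.put("metadata.broker.list", "broker1:9092");
            props.put("serializer.class", "kafka.serializer.StringEncoder");
            props.put("compression.codec", "gzip");
            return props;
        }

        public static void main(String[] args) {
            Properties ackedRun = baseConfig();
            ackedRun.put("request.required.acks", "1");   // wait for the leader's acknowledgement

            Properties noAckRun = baseConfig();
            noAckRun.put("request.required.acks", "0");   // fire and forget, no broker ack
        }
    }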

Thanks,

Piotr

On Mon, Apr 8, 2013 at 4:42 PM, Piotr Kozikowski <pi...@liveramp.com> wrote:

> Hi,
>
> At LiveRamp we are considering replacing Scribe with Kafka, and as a first
> step we run some tests to evaluate producer performance. You can find our
> preliminary results here:
> https://blog.liveramp.com/2013/04/08/kafka-0-8-producer-performance-2/.
> We hope this will be useful for some folks, and If anyone has comments or
> suggestions about what to do differently to obtain better results your
> feedback will be very welcome.
>
> Thanks,
>
> Piotr
>

Re: Analysis of producer performance

Posted by Guy Doulberg <gu...@conduit.com>.
Hi Jun

Great presentation, great feature.


On 04/09/2013 07:48 AM, Jun Rao wrote:
> Piotr,
>
> Thanks for sharing this. Very interesting and useful study. A few comments:
>
> 1. For existing 0.7 users, we have a migration tool that mirrors data from
> an 0.7 cluster to an 0.8 cluster. Applications can upgrade to 0.8 by
> upgrading consumers first, followed by producers.
>
> 2. Have you looked at the Kafka ApacheCon slides (
> http://www.slideshare.net/junrao/kafka-replication-apachecon2013)? Towards
> the end, there are some performance numbers too. The figure for throughput
> vs #producer is different from what you have. Not sure if this is because
> that you have turned on compression.
>
> 3. Not sure that I understand the difference btw the first 2 graphs in the
> latency section. What's different btw the 2 tests?
>
> 4. Post 0.8, we plan to improve the producer side throughput by
> implementing non-blocking socket on the client side.
>
> Jun
>
>
> On Mon, Apr 8, 2013 at 4:42 PM, Piotr Kozikowski <pi...@liveramp.com> wrote:
>
>> Hi,
>>
>> At LiveRamp we are considering replacing Scribe with Kafka, and as a first
>> step we run some tests to evaluate producer performance. You can find our
>> preliminary results here:
>> https://blog.liveramp.com/2013/04/08/kafka-0-8-producer-performance-2/. We
>> hope this will be useful for some folks, and If anyone has comments or
>> suggestions about what to do differently to obtain better results your
>> feedback will be very welcome.
>>
>> Thanks,
>>
>> Piotr
>>


Re: Analysis of producer performance

Posted by Jun Rao <ju...@gmail.com>.
Another way to handle this is to provision enough client and broker servers
so that the peak load can be handled without spooling.

Thanks,

Jun


On Thu, Apr 11, 2013 at 5:45 PM, Piotr Kozikowski <pi...@liveramp.com> wrote:

> Jun,
>
> When talking about "catastrophic consequences" I was actually only
> referring to the producer side. in our use case (logging requests from
> webapp servers), a spike in traffic would force us to either tolerate a
> dramatic increase in the response time, or drop messages, both of which are
> really undesirable. Hence the need to absorb spikes with some system on top
> of Kafka, unless the spooling feature mentioned by Wing (
> https://issues.apache.org/jira/browse/KAFKA-156) is implemented. This is
> assuming there are a lot more producer machines than broker nodes, so each
> producer would absorb a small part of the extra load from the spike.
>
> Piotr
>
> On Wed, Apr 10, 2013 at 10:17 PM, Jun Rao <ju...@gmail.com> wrote:
>
> > Piotr,
> >
> > Actually, could you clarify what "catastrophic consequences" did you see
> on
> > the broker side? Do clients timeout due to longer serving time or
> something
> > else?
> >
> > Going forward, we plan to add per client quotas (KAFKA-656) to prevent
> the
> > brokers from being overwhelmed by a runaway client.
> >
> > Thanks,
> >
> > Jun
> >
> >
> > On Wed, Apr 10, 2013 at 12:04 PM, Otis Gospodnetic <
> > otis_gospodnetic@yahoo.com> wrote:
> >
> > > Hi,
> > >
> > > Is there anything one can do to "defend" from:
> > >
> > > "Trying to push more data than the brokers can handle for any sustained
> > > period of time has catastrophic consequences, regardless of what
> timeout
> > > settings are used. In our use case this means that we need to either
> > ensure
> > > we have spare capacity for spikes, or use something on top of Kafka to
> > > absorb spikes."
> > >
> > > ?
> > > Thanks,
> > > Otis
> > > ----
> > > Performance Monitoring for Solr / ElasticSearch / HBase -
> > > http://sematext.com/spm
> > >
> > >
> > >
> > >
> > >
> > > >________________________________
> > > > From: Piotr Kozikowski <pi...@liveramp.com>
> > > >To: users@kafka.apache.org
> > > >Sent: Tuesday, April 9, 2013 1:23 PM
> > > >Subject: Re: Analysis of producer performance
> > > >
> > > >Jun,
> > > >
> > > >Thank you for your comments. I'll reply point by point for clarity.
> > > >
> > > >1. We were aware of the migration tool but since we haven't used Kafka
> > for
> > > >production yet we just started using the 0.8 version directly.
> > > >
> > > >2. I hadn't seen those particular slides, very interesting. I'm not
> sure
> > > >we're testing the same thing though. In our case we vary the number of
> > > >physical machines, but each one has 10 threads accessing a pool of
> Kafka
> > > >producer objects and in theory a single machine is enough to saturate
> > the
> > > >brokers (which our test mostly confirms). Also, assuming that the
> slides
> > > >are based on the built-in producer performance tool, I know that we
> > > started
> > > >getting very different numbers once we switched to use "real" (actual
> > > >production log) messages. Compression may also be a factor in case it
> > > >wasn't configured the same way in those tests.
> > > >
> > > >3. In the latency section, there are two tests, one for average and
> > > another
> > > >for maximum latency. Each one has two graphs presenting the exact same
> > > data
> > > >but at different levels of zoom. The first one is to observe small
> > > >variations of latency when target throughput <= actual throughput. The
> > > >second is to observe the overall shape of the graph once latency
> starts
> > > >growing when target throughput > actual throughput. I hope that makes
> > > sense.
> > > >
> > > >4. That sounds great, looking forward to it.
> > > >
> > > >Piotr
> > > >
> > > >On Mon, Apr 8, 2013 at 9:48 PM, Jun Rao <ju...@gmail.com> wrote:
> > > >
> > > >> Piotr,
> > > >>
> > > >> Thanks for sharing this. Very interesting and useful study. A few
> > > comments:
> > > >>
> > > >> 1. For existing 0.7 users, we have a migration tool that mirrors
> data
> > > from
> > > >> an 0.7 cluster to an 0.8 cluster. Applications can upgrade to 0.8 by
> > > >> upgrading consumers first, followed by producers.
> > > >>
> > > >> 2. Have you looked at the Kafka ApacheCon slides (
> > > >> http://www.slideshare.net/junrao/kafka-replication-apachecon2013)?
> > > Towards
> > > >> the end, there are some performance numbers too. The figure for
> > > throughput
> > > >> vs #producer is different from what you have. Not sure if this is
> > > because
> > > >> that you have turned on compression.
> > > >>
> > > >> 3. Not sure that I understand the difference btw the first 2 graphs
> in
> > > the
> > > >> latency section. What's different btw the 2 tests?
> > > >>
> > > >> 4. Post 0.8, we plan to improve the producer side throughput by
> > > >> implementing non-blocking socket on the client side.
> > > >>
> > > >> Jun
> > > >>
> > > >>
> > > >> On Mon, Apr 8, 2013 at 4:42 PM, Piotr Kozikowski <
> piotr@liveramp.com>
> > > >> wrote:
> > > >>
> > > >> > Hi,
> > > >> >
> > > >> > At LiveRamp we are considering replacing Scribe with Kafka, and
> as a
> > > >> first
> > > >> > step we run some tests to evaluate producer performance. You can
> > find
> > > our
> > > >> > preliminary results here:
> > > >> >
> > > https://blog.liveramp.com/2013/04/08/kafka-0-8-producer-performance-2/
> .
> > > >> We
> > > >> > hope this will be useful for some folks, and If anyone has
> comments
> > or
> > > >> > suggestions about what to do differently to obtain better results
> > your
> > > >> > feedback will be very welcome.
> > > >> >
> > > >> > Thanks,
> > > >> >
> > > >> > Piotr
> > > >> >
> > > >>
> > > >
> > > >
> > > >
> > >
> >
>

Re: Analysis of producer performance

Posted by Piotr Kozikowski <pi...@liveramp.com>.
Jun,

When talking about "catastrophic consequences" I was actually only
referring to the producer side. in our use case (logging requests from
webapp servers), a spike in traffic would force us to either tolerate a
dramatic increase in the response time, or drop messages, both of which are
really undesirable. Hence the need to absorb spikes with some system on top
of Kafka, unless the spooling feature mentioned by Wing (
https://issues.apache.org/jira/browse/KAFKA-156) is implemented. This is
assuming there are a lot more producer machines than broker nodes, so each
producer would absorb a small part of the extra load from the spike.

Piotr

On Wed, Apr 10, 2013 at 10:17 PM, Jun Rao <ju...@gmail.com> wrote:

> Piotr,
>
> Actually, could you clarify what "catastrophic consequences" did you see on
> the broker side? Do clients timeout due to longer serving time or something
> else?
>
> Going forward, we plan to add per client quotas (KAFKA-656) to prevent the
> brokers from being overwhelmed by a runaway client.
>
> Thanks,
>
> Jun
>
>
> On Wed, Apr 10, 2013 at 12:04 PM, Otis Gospodnetic <
> otis_gospodnetic@yahoo.com> wrote:
>
> > Hi,
> >
> > Is there anything one can do to "defend" from:
> >
> > "Trying to push more data than the brokers can handle for any sustained
> > period of time has catastrophic consequences, regardless of what timeout
> > settings are used. In our use case this means that we need to either
> ensure
> > we have spare capacity for spikes, or use something on top of Kafka to
> > absorb spikes."
> >
> > ?
> > Thanks,
> > Otis
> > ----
> > Performance Monitoring for Solr / ElasticSearch / HBase -
> > http://sematext.com/spm
> >
> >
> >
> >
> >
> > >________________________________
> > > From: Piotr Kozikowski <pi...@liveramp.com>
> > >To: users@kafka.apache.org
> > >Sent: Tuesday, April 9, 2013 1:23 PM
> > >Subject: Re: Analysis of producer performance
> > >
> > >Jun,
> > >
> > >Thank you for your comments. I'll reply point by point for clarity.
> > >
> > >1. We were aware of the migration tool but since we haven't used Kafka
> for
> > >production yet we just started using the 0.8 version directly.
> > >
> > >2. I hadn't seen those particular slides, very interesting. I'm not sure
> > >we're testing the same thing though. In our case we vary the number of
> > >physical machines, but each one has 10 threads accessing a pool of Kafka
> > >producer objects and in theory a single machine is enough to saturate
> the
> > >brokers (which our test mostly confirms). Also, assuming that the slides
> > >are based on the built-in producer performance tool, I know that we
> > started
> > >getting very different numbers once we switched to use "real" (actual
> > >production log) messages. Compression may also be a factor in case it
> > >wasn't configured the same way in those tests.
> > >
> > >3. In the latency section, there are two tests, one for average and
> > another
> > >for maximum latency. Each one has two graphs presenting the exact same
> > data
> > >but at different levels of zoom. The first one is to observe small
> > >variations of latency when target throughput <= actual throughput. The
> > >second is to observe the overall shape of the graph once latency starts
> > >growing when target throughput > actual throughput. I hope that makes
> > sense.
> > >
> > >4. That sounds great, looking forward to it.
> > >
> > >Piotr
> > >
> > >On Mon, Apr 8, 2013 at 9:48 PM, Jun Rao <ju...@gmail.com> wrote:
> > >
> > >> Piotr,
> > >>
> > >> Thanks for sharing this. Very interesting and useful study. A few
> > comments:
> > >>
> > >> 1. For existing 0.7 users, we have a migration tool that mirrors data
> > from
> > >> an 0.7 cluster to an 0.8 cluster. Applications can upgrade to 0.8 by
> > >> upgrading consumers first, followed by producers.
> > >>
> > >> 2. Have you looked at the Kafka ApacheCon slides (
> > >> http://www.slideshare.net/junrao/kafka-replication-apachecon2013)?
> > Towards
> > >> the end, there are some performance numbers too. The figure for
> > throughput
> > >> vs #producer is different from what you have. Not sure if this is
> > because
> > >> that you have turned on compression.
> > >>
> > >> 3. Not sure that I understand the difference btw the first 2 graphs in
> > the
> > >> latency section. What's different btw the 2 tests?
> > >>
> > >> 4. Post 0.8, we plan to improve the producer side throughput by
> > >> implementing non-blocking socket on the client side.
> > >>
> > >> Jun
> > >>
> > >>
> > >> On Mon, Apr 8, 2013 at 4:42 PM, Piotr Kozikowski <pi...@liveramp.com>
> > >> wrote:
> > >>
> > >> > Hi,
> > >> >
> > >> > At LiveRamp we are considering replacing Scribe with Kafka, and as a
> > >> first
> > >> > step we run some tests to evaluate producer performance. You can
> find
> > our
> > >> > preliminary results here:
> > >> >
> > https://blog.liveramp.com/2013/04/08/kafka-0-8-producer-performance-2/.
> > >> We
> > >> > hope this will be useful for some folks, and If anyone has comments
> or
> > >> > suggestions about what to do differently to obtain better results
> your
> > >> > feedback will be very welcome.
> > >> >
> > >> > Thanks,
> > >> >
> > >> > Piotr
> > >> >
> > >>
> > >
> > >
> > >
> >
>

Re: Analysis of producer performance

Posted by Jun Rao <ju...@gmail.com>.
Piotr,

Actually, could you clarify what "catastrophic consequences" you saw on the
broker side? Do clients time out due to longer serving time, or something
else?

Going forward, we plan to add per-client quotas (KAFKA-656) to prevent the
brokers from being overwhelmed by a runaway client.

Thanks,

Jun


On Wed, Apr 10, 2013 at 12:04 PM, Otis Gospodnetic <
otis_gospodnetic@yahoo.com> wrote:

> Hi,
>
> Is there anything one can do to "defend" from:
>
> "Trying to push more data than the brokers can handle for any sustained
> period of time has catastrophic consequences, regardless of what timeout
> settings are used. In our use case this means that we need to either ensure
> we have spare capacity for spikes, or use something on top of Kafka to
> absorb spikes."
>
> ?
> Thanks,
> Otis
> ----
> Performance Monitoring for Solr / ElasticSearch / HBase -
> http://sematext.com/spm
>
>
>
>
>
> >________________________________
> > From: Piotr Kozikowski <pi...@liveramp.com>
> >To: users@kafka.apache.org
> >Sent: Tuesday, April 9, 2013 1:23 PM
> >Subject: Re: Analysis of producer performance
> >
> >Jun,
> >
> >Thank you for your comments. I'll reply point by point for clarity.
> >
> >1. We were aware of the migration tool but since we haven't used Kafka for
> >production yet we just started using the 0.8 version directly.
> >
> >2. I hadn't seen those particular slides, very interesting. I'm not sure
> >we're testing the same thing though. In our case we vary the number of
> >physical machines, but each one has 10 threads accessing a pool of Kafka
> >producer objects and in theory a single machine is enough to saturate the
> >brokers (which our test mostly confirms). Also, assuming that the slides
> >are based on the built-in producer performance tool, I know that we
> started
> >getting very different numbers once we switched to use "real" (actual
> >production log) messages. Compression may also be a factor in case it
> >wasn't configured the same way in those tests.
> >
> >3. In the latency section, there are two tests, one for average and
> another
> >for maximum latency. Each one has two graphs presenting the exact same
> data
> >but at different levels of zoom. The first one is to observe small
> >variations of latency when target throughput <= actual throughput. The
> >second is to observe the overall shape of the graph once latency starts
> >growing when target throughput > actual throughput. I hope that makes
> sense.
> >
> >4. That sounds great, looking forward to it.
> >
> >Piotr
> >
> >On Mon, Apr 8, 2013 at 9:48 PM, Jun Rao <ju...@gmail.com> wrote:
> >
> >> Piotr,
> >>
> >> Thanks for sharing this. Very interesting and useful study. A few
> comments:
> >>
> >> 1. For existing 0.7 users, we have a migration tool that mirrors data
> from
> >> an 0.7 cluster to an 0.8 cluster. Applications can upgrade to 0.8 by
> >> upgrading consumers first, followed by producers.
> >>
> >> 2. Have you looked at the Kafka ApacheCon slides (
> >> http://www.slideshare.net/junrao/kafka-replication-apachecon2013)?
> Towards
> >> the end, there are some performance numbers too. The figure for
> throughput
> >> vs #producer is different from what you have. Not sure if this is
> because
> >> that you have turned on compression.
> >>
> >> 3. Not sure that I understand the difference btw the first 2 graphs in
> the
> >> latency section. What's different btw the 2 tests?
> >>
> >> 4. Post 0.8, we plan to improve the producer side throughput by
> >> implementing non-blocking socket on the client side.
> >>
> >> Jun
> >>
> >>
> >> On Mon, Apr 8, 2013 at 4:42 PM, Piotr Kozikowski <pi...@liveramp.com>
> >> wrote:
> >>
> >> > Hi,
> >> >
> >> > At LiveRamp we are considering replacing Scribe with Kafka, and as a
> >> first
> >> > step we run some tests to evaluate producer performance. You can find
> our
> >> > preliminary results here:
> >> >
> https://blog.liveramp.com/2013/04/08/kafka-0-8-producer-performance-2/.
> >> We
> >> > hope this will be useful for some folks, and If anyone has comments or
> >> > suggestions about what to do differently to obtain better results your
> >> > feedback will be very welcome.
> >> >
> >> > Thanks,
> >> >
> >> > Piotr
> >> >
> >>
> >
> >
> >
>

Re: Analysis of producer performance

Posted by Yiu Wing TSANG <yw...@gmail.com>.
Piotr,

Thanks for your posts. After your comments about the need for "spooling",
I found this link:

http://grokbase.com/t/kafka/dev/133939nbvg/jira-commented-kafka-156-messages-should-not-be-dropped-when-brokers-are-unavailable

You are right that we need to build the spooling system ourselves until the
above issue is fixed.

I wonder if everybody is building this spooling system themselves?

I just pushed Kafka to our production system for "non-critical" purposes,
but eventually I want to make sure Kafka will not lose messages. Does anyone
have ideas to share on how to build such a spooling system?

Wing


On Thu, Apr 11, 2013 at 4:10 AM, Piotr Kozikowski <pi...@liveramp.com> wrote:

> Otis,
>
> That's actually a question we are trying to answer. In our current
> production system, Scribe does spooling to local disk, so each producer
> node becomes a local broker until the actual brokers are able to receive
> all messages again. It looks like unless a similar feature is added to
> Kafka we will have to come up with our own spooling system.
>
> -Piotr
>
> On Wed, Apr 10, 2013 at 12:04 PM, Otis Gospodnetic <
> otis_gospodnetic@yahoo.com> wrote:
>
> > Hi,
> >
> > Is there anything one can do to "defend" from:
> >
> > "Trying to push more data than the brokers can handle for any sustained
> > period of time has catastrophic consequences, regardless of what timeout
> > settings are used. In our use case this means that we need to either
> ensure
> > we have spare capacity for spikes, or use something on top of Kafka to
> > absorb spikes."
> >
> > ?
> > Thanks,
> > Otis
> > ----
> > Performance Monitoring for Solr / ElasticSearch / HBase -
> > http://sematext.com/spm
> >
> >
> >
> >
> >
> > >________________________________
> > > From: Piotr Kozikowski <pi...@liveramp.com>
> > >To: users@kafka.apache.org
> > >Sent: Tuesday, April 9, 2013 1:23 PM
> > >Subject: Re: Analysis of producer performance
> > >
> > >Jun,
> > >
> > >Thank you for your comments. I'll reply point by point for clarity.
> > >
> > >1. We were aware of the migration tool but since we haven't used Kafka
> for
> > >production yet we just started using the 0.8 version directly.
> > >
> > >2. I hadn't seen those particular slides, very interesting. I'm not sure
> > >we're testing the same thing though. In our case we vary the number of
> > >physical machines, but each one has 10 threads accessing a pool of Kafka
> > >producer objects and in theory a single machine is enough to saturate
> the
> > >brokers (which our test mostly confirms). Also, assuming that the slides
> > >are based on the built-in producer performance tool, I know that we
> > started
> > >getting very different numbers once we switched to use "real" (actual
> > >production log) messages. Compression may also be a factor in case it
> > >wasn't configured the same way in those tests.
> > >
> > >3. In the latency section, there are two tests, one for average and
> > another
> > >for maximum latency. Each one has two graphs presenting the exact same
> > data
> > >but at different levels of zoom. The first one is to observe small
> > >variations of latency when target throughput <= actual throughput. The
> > >second is to observe the overall shape of the graph once latency starts
> > >growing when target throughput > actual throughput. I hope that makes
> > sense.
> > >
> > >4. That sounds great, looking forward to it.
> > >
> > >Piotr
> > >
> > >On Mon, Apr 8, 2013 at 9:48 PM, Jun Rao <ju...@gmail.com> wrote:
> > >
> > >> Piotr,
> > >>
> > >> Thanks for sharing this. Very interesting and useful study. A few
> > comments:
> > >>
> > >> 1. For existing 0.7 users, we have a migration tool that mirrors data
> > from
> > >> an 0.7 cluster to an 0.8 cluster. Applications can upgrade to 0.8 by
> > >> upgrading consumers first, followed by producers.
> > >>
> > >> 2. Have you looked at the Kafka ApacheCon slides (
> > >> http://www.slideshare.net/junrao/kafka-replication-apachecon2013)?
> > Towards
> > >> the end, there are some performance numbers too. The figure for
> > throughput
> > >> vs #producer is different from what you have. Not sure if this is
> > because
> > >> that you have turned on compression.
> > >>
> > >> 3. Not sure that I understand the difference btw the first 2 graphs in
> > the
> > >> latency section. What's different btw the 2 tests?
> > >>
> > >> 4. Post 0.8, we plan to improve the producer side throughput by
> > >> implementing non-blocking socket on the client side.
> > >>
> > >> Jun
> > >>
> > >>
> > >> On Mon, Apr 8, 2013 at 4:42 PM, Piotr Kozikowski <pi...@liveramp.com>
> > >> wrote:
> > >>
> > >> > Hi,
> > >> >
> > >> > At LiveRamp we are considering replacing Scribe with Kafka, and as a
> > >> first
> > >> > step we run some tests to evaluate producer performance. You can
> find
> > our
> > >> > preliminary results here:
> > >> >
> > https://blog.liveramp.com/2013/04/08/kafka-0-8-producer-performance-2/.
> > >> We
> > >> > hope this will be useful for some folks, and If anyone has comments
> or
> > >> > suggestions about what to do differently to obtain better results
> your
> > >> > feedback will be very welcome.
> > >> >
> > >> > Thanks,
> > >> >
> > >> > Piotr
> > >> >
> > >>
> > >
> > >
> > >
> >
>

Re: Analysis of producer performance

Posted by Piotr Kozikowski <pi...@liveramp.com>.
Otis,

That's actually a question we are trying to answer. In our current
production system, Scribe does spooling to local disk, so each producer
node becomes a local broker until the actual brokers are able to receive
all messages again. It looks like, unless a similar feature is added to
Kafka, we will have to come up with our own spooling system.
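
A very rough sketch of that idea, assuming the 0.8 producer API (the class
name, spool file format, and error handling are placeholders, not a tested
implementation): fall back to appending the message to a local file whenever
a send fails, and have a separate replay job drain the spool once the
brokers recover.

    import java.io.BufferedWriter;
    import java.io.FileWriter;
    import java.io.IOException;
    import kafka.javaapi.producer.Producer;
    import kafka.producer.KeyedMessage;

    public class SpoolingSender {
        private final Producer<String, String> producer;
        private final String spoolPath;

        public SpoolingSender(Producer<String, String> producer, String spoolPath) {
            this.producer = producer;
            this.spoolPath = spoolPath;
        }

        public void send(String topic, String message) {
            try {
                producer.send(new KeyedMessage<String, String>(topic, message));
            } catch (Exception sendFailure) {
                // Brokers unavailable or overloaded: spool locally instead of dropping.
                spoolToDisk(topic, message);
            }
        }

        private void spoolToDisk(String topic, String message) {
            try {
                BufferedWriter out = new BufferedWriter(new FileWriter(spoolPath, true));
                out.write(topic + "\t" + message);
                out.newLine();
                out.close();
            } catch (IOException diskFailure) {
                // Last resort: both Kafka and the local disk failed; surface the error.
                throw new RuntimeException(diskFailure);
            }
        }
    }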

-Piotr

On Wed, Apr 10, 2013 at 12:04 PM, Otis Gospodnetic <
otis_gospodnetic@yahoo.com> wrote:

> Hi,
>
> Is there anything one can do to "defend" from:
>
> "Trying to push more data than the brokers can handle for any sustained
> period of time has catastrophic consequences, regardless of what timeout
> settings are used. In our use case this means that we need to either ensure
> we have spare capacity for spikes, or use something on top of Kafka to
> absorb spikes."
>
> ?
> Thanks,
> Otis
> ----
> Performance Monitoring for Solr / ElasticSearch / HBase -
> http://sematext.com/spm
>
>
>
>
>
> >________________________________
> > From: Piotr Kozikowski <pi...@liveramp.com>
> >To: users@kafka.apache.org
> >Sent: Tuesday, April 9, 2013 1:23 PM
> >Subject: Re: Analysis of producer performance
> >
> >Jun,
> >
> >Thank you for your comments. I'll reply point by point for clarity.
> >
> >1. We were aware of the migration tool but since we haven't used Kafka for
> >production yet we just started using the 0.8 version directly.
> >
> >2. I hadn't seen those particular slides, very interesting. I'm not sure
> >we're testing the same thing though. In our case we vary the number of
> >physical machines, but each one has 10 threads accessing a pool of Kafka
> >producer objects and in theory a single machine is enough to saturate the
> >brokers (which our test mostly confirms). Also, assuming that the slides
> >are based on the built-in producer performance tool, I know that we
> started
> >getting very different numbers once we switched to use "real" (actual
> >production log) messages. Compression may also be a factor in case it
> >wasn't configured the same way in those tests.
> >
> >3. In the latency section, there are two tests, one for average and
> another
> >for maximum latency. Each one has two graphs presenting the exact same
> data
> >but at different levels of zoom. The first one is to observe small
> >variations of latency when target throughput <= actual throughput. The
> >second is to observe the overall shape of the graph once latency starts
> >growing when target throughput > actual throughput. I hope that makes
> sense.
> >
> >4. That sounds great, looking forward to it.
> >
> >Piotr
> >
> >On Mon, Apr 8, 2013 at 9:48 PM, Jun Rao <ju...@gmail.com> wrote:
> >
> >> Piotr,
> >>
> >> Thanks for sharing this. Very interesting and useful study. A few
> comments:
> >>
> >> 1. For existing 0.7 users, we have a migration tool that mirrors data
> from
> >> an 0.7 cluster to an 0.8 cluster. Applications can upgrade to 0.8 by
> >> upgrading consumers first, followed by producers.
> >>
> >> 2. Have you looked at the Kafka ApacheCon slides (
> >> http://www.slideshare.net/junrao/kafka-replication-apachecon2013)?
> Towards
> >> the end, there are some performance numbers too. The figure for
> throughput
> >> vs #producer is different from what you have. Not sure if this is
> because
> >> that you have turned on compression.
> >>
> >> 3. Not sure that I understand the difference btw the first 2 graphs in
> the
> >> latency section. What's different btw the 2 tests?
> >>
> >> 4. Post 0.8, we plan to improve the producer side throughput by
> >> implementing non-blocking socket on the client side.
> >>
> >> Jun
> >>
> >>
> >> On Mon, Apr 8, 2013 at 4:42 PM, Piotr Kozikowski <pi...@liveramp.com>
> >> wrote:
> >>
> >> > Hi,
> >> >
> >> > At LiveRamp we are considering replacing Scribe with Kafka, and as a
> >> first
> >> > step we run some tests to evaluate producer performance. You can find
> our
> >> > preliminary results here:
> >> >
> https://blog.liveramp.com/2013/04/08/kafka-0-8-producer-performance-2/.
> >> We
> >> > hope this will be useful for some folks, and If anyone has comments or
> >> > suggestions about what to do differently to obtain better results your
> >> > feedback will be very welcome.
> >> >
> >> > Thanks,
> >> >
> >> > Piotr
> >> >
> >>
> >
> >
> >
>

Re: Analysis of producer performance

Posted by Otis Gospodnetic <ot...@yahoo.com>.
Hi,

Is there anything one can do to "defend" from:

"Trying to push more data than the brokers can handle for any sustained period of time has catastrophic consequences, regardless of what timeout settings are used. In our use case this means that we need to either ensure we have spare capacity for spikes, or use something on top of Kafka to absorb spikes."

?
Thanks,
Otis
----
Performance Monitoring for Solr / ElasticSearch / HBase - http://sematext.com/spm





>________________________________
> From: Piotr Kozikowski <pi...@liveramp.com>
>To: users@kafka.apache.org 
>Sent: Tuesday, April 9, 2013 1:23 PM
>Subject: Re: Analysis of producer performance
> 
>Jun,
>
>Thank you for your comments. I'll reply point by point for clarity.
>
>1. We were aware of the migration tool but since we haven't used Kafka for
>production yet we just started using the 0.8 version directly.
>
>2. I hadn't seen those particular slides, very interesting. I'm not sure
>we're testing the same thing though. In our case we vary the number of
>physical machines, but each one has 10 threads accessing a pool of Kafka
>producer objects and in theory a single machine is enough to saturate the
>brokers (which our test mostly confirms). Also, assuming that the slides
>are based on the built-in producer performance tool, I know that we started
>getting very different numbers once we switched to use "real" (actual
>production log) messages. Compression may also be a factor in case it
>wasn't configured the same way in those tests.
>
>3. In the latency section, there are two tests, one for average and another
>for maximum latency. Each one has two graphs presenting the exact same data
>but at different levels of zoom. The first one is to observe small
>variations of latency when target throughput <= actual throughput. The
>second is to observe the overall shape of the graph once latency starts
>growing when target throughput > actual throughput. I hope that makes sense.
>
>4. That sounds great, looking forward to it.
>
>Piotr
>
>On Mon, Apr 8, 2013 at 9:48 PM, Jun Rao <ju...@gmail.com> wrote:
>
>> Piotr,
>>
>> Thanks for sharing this. Very interesting and useful study. A few comments:
>>
>> 1. For existing 0.7 users, we have a migration tool that mirrors data from
>> an 0.7 cluster to an 0.8 cluster. Applications can upgrade to 0.8 by
>> upgrading consumers first, followed by producers.
>>
>> 2. Have you looked at the Kafka ApacheCon slides (
>> http://www.slideshare.net/junrao/kafka-replication-apachecon2013)? Towards
>> the end, there are some performance numbers too. The figure for throughput
>> vs #producer is different from what you have. Not sure if this is because
>> that you have turned on compression.
>>
>> 3. Not sure that I understand the difference btw the first 2 graphs in the
>> latency section. What's different btw the 2 tests?
>>
>> 4. Post 0.8, we plan to improve the producer side throughput by
>> implementing non-blocking socket on the client side.
>>
>> Jun
>>
>>
>> On Mon, Apr 8, 2013 at 4:42 PM, Piotr Kozikowski <pi...@liveramp.com>
>> wrote:
>>
>> > Hi,
>> >
>> > At LiveRamp we are considering replacing Scribe with Kafka, and as a
>> first
>> > step we run some tests to evaluate producer performance. You can find our
>> > preliminary results here:
>> > https://blog.liveramp.com/2013/04/08/kafka-0-8-producer-performance-2/.
>> We
>> > hope this will be useful for some folks, and If anyone has comments or
>> > suggestions about what to do differently to obtain better results your
>> > feedback will be very welcome.
>> >
>> > Thanks,
>> >
>> > Piotr
>> >
>>
>
>
>

Re: Analysis of producer performance

Posted by Piotr Kozikowski <pi...@liveramp.com>.
Jun,

Thank you for your comments. I'll reply point by point for clarity.

1. We were aware of the migration tool, but since we haven't used Kafka in
production yet we just started using the 0.8 version directly.

2. I hadn't seen those particular slides, very interesting. I'm not sure
we're testing the same thing though. In our case we vary the number of
physical machines, but each one has 10 threads accessing a pool of Kafka
producer objects and in theory a single machine is enough to saturate the
brokers (which our test mostly confirms). Also, assuming that the slides
are based on the built-in producer performance tool, I know that we started
getting very different numbers once we switched to using "real" (actual
production log) messages. Compression may also be a factor if it wasn't
configured the same way in those tests. (A rough sketch of our per-machine
setup appears after this list of points.)

3. In the latency section, there are two tests, one for average and another
for maximum latency. Each one has two graphs presenting the exact same data
but at different levels of zoom. The first one is to observe small
variations of latency when target throughput <= actual throughput. The
second is to observe the overall shape of the graph once latency starts
growing when target throughput > actual throughput. I hope that makes sense.

4. That sounds great, looking forward to it.
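
For what it's worth, below is a very rough sketch of the per-machine setup
described in point 2, assuming the 0.8 Java producer API; the pool size,
thread count, broker address, and payload are placeholders rather than our
actual benchmark code.

    import java.util.Properties;
    import java.util.concurrent.ArrayBlockingQueue;
    import java.util.concurrent.BlockingQueue;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import kafka.javaapi.producer.Producer;
    import kafka.producer.KeyedMessage;
    import kafka.producer.ProducerConfig;

    public class ProducerPoolSketch {
        public static void main(String[] args) throws InterruptedException {
            Properties props = new Properties();
            props.put("metadata.broker.list", "broker1:9092");
            props.put("serializer.class", "kafka.serializer.StringEncoder");

            // A small pool of producer objects shared by the worker threads.
            final BlockingQueue<Producer<String, String>> pool =
                new ArrayBlockingQueue<Producer<String, String>>(4);
            for (int i = 0; i < 4; i++) {
                pool.put(new Producer<String, String>(new ProducerConfig(props)));
            }

            // 10 threads per machine, each borrowing a producer, sending, and returning it.
            ExecutorService workers = Executors.newFixedThreadPool(10);
            for (int t = 0; t < 10; t++) {
                workers.submit(new Runnable() {
                    public void run() {
                        try {
                            Producer<String, String> producer = pool.take();
                            producer.send(new KeyedMessage<String, String>("test-topic", "sample message"));
                            pool.put(producer);
                        } catch (InterruptedException e) {
                            Thread.currentThread().interrupt();
                        }
                    }
                });
            }
            workers.shutdown();
        }
    }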

Piotr

On Mon, Apr 8, 2013 at 9:48 PM, Jun Rao <ju...@gmail.com> wrote:

> Piotr,
>
> Thanks for sharing this. Very interesting and useful study. A few comments:
>
> 1. For existing 0.7 users, we have a migration tool that mirrors data from
> an 0.7 cluster to an 0.8 cluster. Applications can upgrade to 0.8 by
> upgrading consumers first, followed by producers.
>
> 2. Have you looked at the Kafka ApacheCon slides (
> http://www.slideshare.net/junrao/kafka-replication-apachecon2013)? Towards
> the end, there are some performance numbers too. The figure for throughput
> vs #producer is different from what you have. Not sure if this is because
> that you have turned on compression.
>
> 3. Not sure that I understand the difference btw the first 2 graphs in the
> latency section. What's different btw the 2 tests?
>
> 4. Post 0.8, we plan to improve the producer side throughput by
> implementing non-blocking socket on the client side.
>
> Jun
>
>
> On Mon, Apr 8, 2013 at 4:42 PM, Piotr Kozikowski <pi...@liveramp.com>
> wrote:
>
> > Hi,
> >
> > At LiveRamp we are considering replacing Scribe with Kafka, and as a
> first
> > step we run some tests to evaluate producer performance. You can find our
> > preliminary results here:
> > https://blog.liveramp.com/2013/04/08/kafka-0-8-producer-performance-2/.
> We
> > hope this will be useful for some folks, and If anyone has comments or
> > suggestions about what to do differently to obtain better results your
> > feedback will be very welcome.
> >
> > Thanks,
> >
> > Piotr
> >
>

Re: Analysis of producer performance

Posted by Jun Rao <ju...@gmail.com>.
Piotr,

Thanks for sharing this. Very interesting and useful study. A few comments:

1. For existing 0.7 users, we have a migration tool that mirrors data from
an 0.7 cluster to an 0.8 cluster. Applications can upgrade to 0.8 by
upgrading consumers first, followed by producers.

2. Have you looked at the Kafka ApacheCon slides (
http://www.slideshare.net/junrao/kafka-replication-apachecon2013)? Towards
the end, there are some performance numbers too. The figure for throughput
vs. number of producers is different from what you have. Not sure if this
is because you have turned on compression.

3. Not sure that I understand the difference between the first 2 graphs in
the latency section. What's different between the 2 tests?

4. Post 0.8, we plan to improve producer-side throughput by implementing
non-blocking sockets on the client side.

Jun


On Mon, Apr 8, 2013 at 4:42 PM, Piotr Kozikowski <pi...@liveramp.com> wrote:

> Hi,
>
> At LiveRamp we are considering replacing Scribe with Kafka, and as a first
> step we run some tests to evaluate producer performance. You can find our
> preliminary results here:
> https://blog.liveramp.com/2013/04/08/kafka-0-8-producer-performance-2/. We
> hope this will be useful for some folks, and If anyone has comments or
> suggestions about what to do differently to obtain better results your
> feedback will be very welcome.
>
> Thanks,
>
> Piotr
>