Posted to users@kafka.apache.org by Philip O'Toole <ph...@loggly.com> on 2013/04/12 17:21:50 UTC

Re: Analysis of producer performance -- and Producer-Kafka reliability

This is just my opinion of course (who else's could it be? :-)) but I think
from an engineering point of view, one must spend one's time making the
Producer-Kafka connection solid, if it is mission-critical.

Kafka is all about getting messages to disk, and assuming your disks are
solid (and 0.8 has replication) those messages are safe. To then try to
build a system to cope with the Kafka brokers being unavailable seems like
you're setting yourself up for infinite regress. And to write code in the
Producer to spool to disk seems even more pointless. If you're that
worried, why not run a dedicated Kafka broker on the same node as the
Producer, and connect over localhost? To turn around and write code to
spool to disk, because the primary system that *spools to disk* is down
seems to be missing the point.

That said, even when going over localhost, I guess the network connection
could go down. In that case, Producers should buffer in RAM, and start
sending some major alerts to the Operations team. But this should almost
*never happen*. If it is happening regularly *something is fundamentally
wrong with your system design*. Those Producers should also refuse any more
incoming traffic and await intervention. Even bringing up "netcat -l" and
letting it suck in the data and write it to disk would work then.
Alternatives include having Producers connect to a load-balancer with
multiple Kafka brokers behind it, which helps you deal with any one Kafka
broker failing. Or just have your Producers connect directly to multiple
Kafka brokers, and switch over as needed if any one broker goes down.

I don't know if the standard Kafka producer that ships with Kafka supports
buffering in RAM in an emergency. We wrote our own that does, with a focus
on speed and simplicity, but I expect it will very rarely, if ever, buffer
in RAM.
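
For what it's worth, a rough sketch of that kind of producer wrapper might look
like the following. This is not the stock Kafka client and not our actual code;
send_to_broker() and page_operations() are made-up placeholders. The idea is
simply: try each broker in turn, fall back to a bounded in-memory buffer, alert
Operations, and refuse new traffic once the buffer fills.

from collections import deque

def send_to_broker(broker, message):
    # Placeholder for the real network send; assume it raises ConnectionError
    # when the broker is unreachable.
    raise ConnectionError(broker)

def page_operations(reason):
    # Placeholder for whatever alerting hook the Operations team watches.
    print("ALERT:", reason)

class BufferingProducer:
    def __init__(self, brokers, max_buffered=100000):
        self.brokers = list(brokers)   # e.g. ["kafka1:9092", "kafka2:9092"]
        self.buffer = deque()          # RAM fallback, strictly a last resort
        self.max_buffered = max_buffered

    def _try_brokers(self, message):
        # Switch over to another broker if any one broker is down.
        for broker in self.brokers:
            try:
                send_to_broker(broker, message)
                return True
            except ConnectionError:
                continue
        return False

    def send(self, message):
        if len(self.buffer) >= self.max_buffered:
            # Refuse further incoming traffic and await intervention.
            raise RuntimeError("producer buffer full; rejecting new messages")
        if self.buffer or not self._try_brokers(message):
            if not self.buffer:
                # First fall-back to RAM: raise the alarm loudly.
                page_operations("all Kafka brokers unreachable")
            self.buffer.append(message)

    def drain(self):
        # Called periodically; replays the RAM buffer in order once a broker
        # is reachable again.
        while self.buffer:
            if not self._try_brokers(self.buffer[0]):
                return
            self.buffer.popleft()

The point is that the RAM buffer has a hard cap and exists only to ride out a
brief blip, not to act as a second storage system.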

Building and using one semi-reliable system after another, chaining them all
together and hoping to be more tolerant of failure, is not necessarily a good
approach. Instead, identifying the one system that is critical, and ensuring
that it remains up (redundant installations, redundant disks, redundant
network connections, etc.) is a better approach IMHO.

Philip


On Fri, Apr 12, 2013 at 7:54 AM, Jun Rao <ju...@gmail.com> wrote:

> Another way to handle this is to provision enough client and broker servers
> so that the peak load can be handled without spooling.
>
> Thanks,
>
> Jun
>
>
> On Thu, Apr 11, 2013 at 5:45 PM, Piotr Kozikowski <piotr@liveramp.com
> >wrote:
>
> > Jun,
> >
> > When talking about "catastrophic consequences" I was actually only
> > referring to the producer side. In our use case (logging requests from
> > webapp servers), a spike in traffic would force us to either tolerate a
> > dramatic increase in response time, or drop messages, both of which are
> > really undesirable. Hence the need to absorb spikes with some system on top
> > of Kafka, unless the spooling feature mentioned by Wing (
> > https://issues.apache.org/jira/browse/KAFKA-156) is implemented. This is
> > assuming there are a lot more producer machines than broker nodes, so each
> > producer would absorb a small part of the extra load from the spike.
> >
> > Piotr
> >
> > On Wed, Apr 10, 2013 at 10:17 PM, Jun Rao <ju...@gmail.com> wrote:
> >
> > > Piotr,
> > >
> > > Actually, could you clarify what "catastrophic consequences" you saw on
> > > the broker side? Do clients time out due to longer serving time, or
> > > something else?
> > >
> > > Going forward, we plan to add per-client quotas (KAFKA-656) to prevent the
> > > brokers from being overwhelmed by a runaway client.
> > >
> > > Thanks,
> > >
> > > Jun
> > >
> > >
> > > On Wed, Apr 10, 2013 at 12:04 PM, Otis Gospodnetic <
> > > otis_gospodnetic@yahoo.com> wrote:
> > >
> > > > Hi,
> > > >
> > > > Is there anything one can do to "defend" from:
> > > >
> > > > "Trying to push more data than the brokers can handle for any
> sustained
> > > > period of time has catastrophic consequences, regardless of what
> > timeout
> > > > settings are used. In our use case this means that we need to either
> > > ensure
> > > > we have spare capacity for spikes, or use something on top of Kafka
> to
> > > > absorb spikes."
> > > >
> > > > ?
> > > > Thanks,
> > > > Otis
> > > > ----
> > > > Performance Monitoring for Solr / ElasticSearch / HBase -
> > > > http://sematext.com/spm
> > > >
> > > >
> > > >
> > > >
> > > >
> > > > >________________________________
> > > > > From: Piotr Kozikowski <pi...@liveramp.com>
> > > > >To: users@kafka.apache.org
> > > > >Sent: Tuesday, April 9, 2013 1:23 PM
> > > > >Subject: Re: Analysis of producer performance
> > > > >
> > > > >Jun,
> > > > >
> > > > >Thank you for your comments. I'll reply point by point for clarity.
> > > > >
> > > > >1. We were aware of the migration tool but since we haven't used Kafka
> > > > >for production yet we just started using the 0.8 version directly.
> > > > >
> > > > >2. I hadn't seen those particular slides, very interesting. I'm not sure
> > > > >we're testing the same thing though. In our case we vary the number of
> > > > >physical machines, but each one has 10 threads accessing a pool of Kafka
> > > > >producer objects and in theory a single machine is enough to saturate the
> > > > >brokers (which our test mostly confirms). Also, assuming that the slides
> > > > >are based on the built-in producer performance tool, I know that we
> > > > >started getting very different numbers once we switched to use "real"
> > > > >(actual production log) messages. Compression may also be a factor in
> > > > >case it wasn't configured the same way in those tests.
> > > > >
> > > > >3. In the latency section, there are two tests, one for average and
> > > > >another for maximum latency. Each one has two graphs presenting the exact
> > > > >same data but at different levels of zoom. The first one is to observe
> > > > >small variations of latency when target throughput <= actual throughput.
> > > > >The second is to observe the overall shape of the graph once latency
> > > > >starts growing when target throughput > actual throughput. I hope that
> > > > >makes sense.
> > > > >
> > > > >4. That sounds great, looking forward to it.
> > > > >
> > > > >Piotr
> > > > >
> > > > >On Mon, Apr 8, 2013 at 9:48 PM, Jun Rao <ju...@gmail.com> wrote:
> > > > >
> > > > >> Piotr,
> > > > >>
> > > > >> Thanks for sharing this. Very interesting and useful study. A few
> > > > >> comments:
> > > > >>
> > > > >> 1. For existing 0.7 users, we have a migration tool that mirrors data
> > > > >> from an 0.7 cluster to an 0.8 cluster. Applications can upgrade to 0.8
> > > > >> by upgrading consumers first, followed by producers.
> > > > >>
> > > > >> 2. Have you looked at the Kafka ApacheCon slides (
> > > > >> http://www.slideshare.net/junrao/kafka-replication-apachecon2013 )?
> > > > >> Towards the end, there are some performance numbers too. The figure for
> > > > >> throughput vs #producer is different from what you have. Not sure if
> > > > >> this is because you have turned on compression.
> > > > >>
> > > > >> 3. Not sure that I understand the difference btw the first 2 graphs in
> > > > >> the latency section. What's different btw the 2 tests?
> > > > >>
> > > > >> 4. Post 0.8, we plan to improve the producer side throughput by
> > > > >> implementing non-blocking socket on the client side.
> > > > >>
> > > > >> Jun
> > > > >>
> > > > >>
> > > > >> On Mon, Apr 8, 2013 at 4:42 PM, Piotr Kozikowski <
> > piotr@liveramp.com>
> > > > >> wrote:
> > > > >>
> > > > >> > Hi,
> > > > >> >
> > > > >> > At LiveRamp we are considering replacing Scribe with Kafka, and as a
> > > > >> > first step we ran some tests to evaluate producer performance. You
> > > > >> > can find our preliminary results here:
> > > > >> > https://blog.liveramp.com/2013/04/08/kafka-0-8-producer-performance-2/ .
> > > > >> > We hope this will be useful for some folks, and if anyone has
> > > > >> > comments or suggestions about what to do differently to obtain better
> > > > >> > results your feedback will be very welcome.
> > > > >> >
> > > > >> > Thanks,
> > > > >> >
> > > > >> > Piotr
> > > > >> >
> > > > >>
> > > > >
> > > > >
> > > > >
> > > >
> > >
> >
>

Re: Analysis of producer performance -- and Producer-Kafka reliability

Posted by David Arthur <mu...@gmail.com>.
It seems there are two underlying things here: storing messages to 
stable storage, and making messages available to consumers (i.e., 
storing messages on the broker). One can be achieved simply and reliably 
by spooling to local disk; the other requires the network and is inherently
less reliable. Buffering messages in memory does not help with the first 
one since they are in volatile storage, but it does help with the second 
one in the event of a network partition.

I could imagine a producer running in "ultra-reliability" mode where it 
uses a local log file as a buffer that all messages are written to and read
from. One issue with this, though, is that now you have to worry about
the performance and capacity of the disks on your producers (which can 
be numerous compared to brokers). As for performance, the data being 
written by producers is already in active memory, so writing it to a 
disk then doing a zero-copy transfer to the network should be pretty 
fast (maybe?).
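
To make the idea concrete, here is a rough sketch of that write-ahead spool,
under the assumption that a separate loop tails the file and forwards it.
forward_to_broker() is a made-up placeholder rather than a real Kafka API, and
a real implementation would also need file rotation and cleanup.

import os

SPOOL = "producer.spool"     # append-only buffer on the producer's local disk
OFFSET = "producer.offset"   # how far the forwarder has gotten

def forward_to_broker(message):
    # Placeholder for the real producer call; assume it raises ConnectionError
    # when the broker is unreachable.
    raise ConnectionError(message)

def append(message):
    # The hot path only ever appends, so it stays fast and survives restarts.
    with open(SPOOL, "a") as f:
        f.write(message.replace("\n", " ") + "\n")
        f.flush()
        os.fsync(f.fileno())

def forward_once():
    # Read from the last acknowledged offset and push anything new to the broker.
    if not os.path.exists(SPOOL):
        return
    pos = int(open(OFFSET).read()) if os.path.exists(OFFSET) else 0
    with open(SPOOL) as f:
        f.seek(pos)
        while True:
            line = f.readline()
            if not line:
                break
            try:
                forward_to_broker(line.rstrip("\n"))
            except ConnectionError:
                break            # broker unavailable; retry from this offset later
            pos = f.tell()       # advance only past messages the broker accepted
    with open(OFFSET, "w") as f:
        f.write(str(pos))

This keeps the write path append-only (the cheap part) and pushes all the retry
logic into the forwarder, which is roughly the trade-off described above.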

Or, Kafka can remain more "protocol-ish" and less "application-y" and 
just give you errors when brokers are unavailable and let your 
application deal with it. This is basically what TCP/HTTP/etc do. HTTP 
servers don't say "hold on, there's a problem, let me try that request 
again in a second.."

Interesting discussion, btw :)
-David

On 4/15/13 2:18 PM, Piotr Kozikowski wrote:
> Philip,
>
> We would not use spooling to local disk on the producer to deal with
> problems with the connection to the brokers, but rather to absorb temporary
> spikes in traffic that would overwhelm the brokers. This is assuming that
> 1) those spikes are relatively short, but when they come they require much
> higher throughput than normal (otherwise we'd just have a capacity problem
> and would need more brokers), and 2) the spikes are long enough for just a
> RAM buffer to be dangerous. If the brokers did go down, spooling to disk
> would give us more time to react, but that's not the primary reason for
> wanting the feature.
>
> -Piotr
>


Re: Analysis of producer performance -- and Producer-Kafka reliability

Posted by Piotr Kozikowski <pi...@liveramp.com>.
Philip,

We would not use spooling to local disk on the producer to deal with
problems with the connection to the brokers, but rather to absorb temporary
spikes in traffic that would overwhelm the brokers. This is assuming that
1) those spikes are relatively short, but when they come they require much
higher throughput than normal (otherwise we'd just have a capacity problem
and would need more brokers), and 2) the spikes are long enough for just a
RAM buffer to be dangerous. If the brokers did go down, spooling to disk
would give us more time to react, but that's not the primary reason for
wanting the feature.
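
As a back-of-the-envelope illustration of point 2 (the numbers below are
invented, not measurements from our tests), even a modest sustained overflow
adds up to far more than we would want to hold in memory:

excess_mb_per_s = 50      # traffic above what the brokers can absorb, MB/s
spike_seconds = 15 * 60   # a 15-minute spike
backlog_gb = excess_mb_per_s * spike_seconds / 1024.0
print(round(backlog_gb, 1), "GB to buffer somewhere")   # ~44 GB in aggregate

Spread across many producer machines that is easy to keep on local disk, but it
is an uncomfortable amount of data to trust to a RAM buffer.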

-Piotr


Re: Analysis of producer performance -- and Producer-Kafka reliability

Posted by Philip O'Toole <ph...@loggly.com>.
>>But it shouldn't almost never happen.

Obviously I mean it should almost never happen. Not shouldn't.

Philip

Re: Analysis of producer performance -- and Producer-Kafka reliability

Posted by Philip O'Toole <ph...@loggly.com>.
On Fri, Apr 12, 2013 at 8:27 AM, S Ahmed <sa...@gmail.com> wrote:

> Interesting topic.
>
> How would buffering in RAM help in reality though (just trying to work
> through the scenario in my head):
>
> The producer tries to connect to a broker and fails, so it appends the message
> to an in-memory store. If the broker is down for, say, 20 minutes and then
> comes back online, won't this create problems now that the producer is creating
> new messages while it still has 20 minutes of backlog, and the broker is
> handling more load (assuming you are sending those in-memory messages on
> a different thread)?
>

This is why you over-provision the capacity of your Producers and Kafka
cluster. Engineer it that way, taking these requirements into account. If
done correctly, your Producer system should stream that backlog to your
Kafka cluster in much, much less time than 20 minutes. So your system now
has the characteristic that it rarely fails, but if it does, latency may
increase for a little while. But no data is lost and, just as importantly,
the system is *easy to understand and maintain*. Try doing that where the
Producers talk to Kafka, but talk to this other system if Kafka is down,
and do something else if the network is down, and do something else if the
disk is down etc etc.
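
To put rough numbers on that (purely illustrative, not from any real
deployment), with plenty of headroom a 20-minute backlog clears quickly:

normal_rate_mb_s = 100       # steady-state produce rate
cluster_capacity_mb_s = 300  # what the over-provisioned cluster can ingest
outage_minutes = 20

backlog_mb = normal_rate_mb_s * outage_minutes * 60
drain_minutes = backlog_mb / (cluster_capacity_mb_s - normal_rate_mb_s) / 60
print(round(drain_minutes, 1), "minutes to clear the backlog")   # 10.0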

If your system is incapable of catching up, it hasn't been correctly
designed. Yes, it'll cost more to overprovision, but that's what
engineering is about -- making trade-offs. Only really high-end systems
need more fault-tolerance IMHO. And they are expensive.

Just to be clear, I am not saying buffering in RAM is desirable. It isn't.
But it shouldn't almost never happen. If it is happening even more than
rarely, something is fundamentally wrong with the system.

Philip



Re: Analysis of producer performance -- and Producer-Kafka reliability

Posted by S Ahmed <sa...@gmail.com>.
Interesting topic.

How would buffering in RAM help in reality though (just trying to work
through the scenario in my head):

The producer tries to connect to a broker and fails, so it appends the message
to an in-memory store. If the broker is down for, say, 20 minutes and then
comes back online, won't this create problems now that the producer is creating
new messages while it still has 20 minutes of backlog, and the broker is
handling more load (assuming you are sending those in-memory messages on
a different thread)?



