Posted to users@kafka.apache.org by Daniel Compton <de...@danielcompton.net> on 2014/06/30 08:24:07 UTC

Kafka producer performance test sending 0x0 byte messages

Hi folks

I was doing some performance testing using the built-in Kafka performance tester, and it seems like it sends messages of size n bytes but with every byte set to 0x0. Is that correct? Reading the source seemed to indicate that too, but I'm not a Scala developer so I could be wrong.

Would this affect performance compared to a real-world scenario? Obviously you will get very efficient compression ratios, but apart from that, are there likely to be optimisations anywhere between the JVM and the network card that won't hold for messages with non-zero entropy?
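For context, here is a quick sketch (not from the Kafka source; names are illustrative) of how much an all-zero payload skews compression, using plain java.util.zip:

```scala
import java.io.ByteArrayOutputStream
import java.util.zip.GZIPOutputStream
import scala.util.Random

object CompressionSkew {
  // GZIP-compress a payload and return the compressed size in bytes.
  def gzipSize(payload: Array[Byte]): Int = {
    val bos = new ByteArrayOutputStream()
    val gz = new GZIPOutputStream(bos)
    gz.write(payload)
    gz.close()
    bos.size()
  }

  def main(args: Array[String]): Unit = {
    val zeros  = new Array[Byte](100000)   // what the perf tester appears to send
    val random = new Array[Byte](100000)
    new Random(42).nextBytes(random)       // roughly incompressible payload

    // All-zero data compresses by orders of magnitude; random data barely at all,
    // so throughput numbers measured with compression enabled won't transfer.
    println(s"zeros: ${gzipSize(zeros)} B, random: ${gzipSize(random)} B")
  }
}
```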

We're going to test against our production workload, so it's not a big deal for us, but I wondered if this could give others skewed results?

---
Daniel

Re: Kafka producer performance test sending 0x0 byte messages

Posted by Jun Rao <ju...@gmail.com>.
Yes, this is a problem and will indeed affect producer performance when
compression is turned on. Perhaps we should fill in the values with some
randomized bytes. Could you file a JIRA for this?

Thanks,

Jun



Re: Kafka producer performance test sending 0x0 byte messages

Posted by Bert Corderman <be...@gmail.com>.
Jun,

Let me see if I can fix it first, and then I'll submit a patch back.

Daniel,

I was looking at the code some more and was thinking this might work:

https://github.com/apache/kafka/blob/0.8.1/perf/src/main/scala/kafka/perf/ProducerPerformance.scala

On line 246, instead of looping to create messages, I could open a sample
file and add the rows as messages line by line until I hit the configured
message cap. If I hit the end of the file, I would start again at the top.
I think I can figure this out.
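A minimal sketch of that loop (hypothetical names and signature, not the actual ProducerPerformance code):

```scala
import scala.io.Source

object FileBackedPayloads {
  // Read the sample rows once, then hand them out line by line,
  // wrapping back to the top of the file until the message cap is reached.
  def messages(path: String, messageCap: Int): Iterator[String] = {
    val rows = Source.fromFile(path).getLines().toArray
    require(rows.nonEmpty, s"sample file $path is empty")
    Iterator.from(0).map(i => rows(i % rows.length)).take(messageCap)
  }
}
```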

Bert



Re: Kafka producer performance test sending 0x0 byte messages

Posted by Daniel Compton <de...@danielcompton.net>.
Hi Bert

What you are describing could be done partially with the console producer. It will read from a file and send each line to the Kafka broker. You could make a really big file, or alter that code to repeat a certain number of times. The source is pretty readable; I think that might be an easier route to take.
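As a sketch of that route (0.8-era script and flag names, assuming a broker on localhost:9092 and a sample.txt in the current directory):

```shell
# Send each line of sample.txt as one message; repeat the file 100 times
# to approximate a sustained load without building one huge file.
for i in $(seq 1 100); do cat sample.txt; done | \
  bin/kafka-console-producer.sh --broker-list localhost:9092 --topic perf-test
```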

Daniel.

Re: Kafka producer performance test sending 0x0 byte messages

Posted by Bert Corderman <be...@gmail.com>.
Daniel,



We have the same question.  We noticed that the compression tests we ran
using the built-in performance tester were not realistic.  I think on-disk
compression was 200:1 (yes, that is two hundred to one).  I had planned to
try editing the producer performance tester source to do the following:



1.      Add an option to read sample data from a provided text file (the
thought would be to add a file with 1-5000 rows, whatever I thought my
batch size might be)

2.      Load the sample file into an array

3.      Change the code that creates each message to pull a random row
from the array
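Steps 2 and 3 above could look something like this (illustrative names, not the real perf-tester code):

```scala
import scala.io.Source
import scala.util.Random

object RandomRowPayloads {
  // Step 2: load the sample file into an array once, up front.
  def load(path: String): Array[String] =
    Source.fromFile(path).getLines().toArray

  // Step 3: build each message payload from a randomly chosen row.
  def nextPayload(rows: Array[String], rng: Random): Array[Byte] =
    rows(rng.nextInt(rows.length)).getBytes("UTF-8")
}
```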



I am also not a Scala developer, so it would take me a little while to
figure this out.  This is on hold right now, as I am looking at options
for compressing the message before sending it to Kafka.  We had
originally not wanted to do this because we assumed we would not get
efficient compression ratios when compressing a single message; however,
we are also talking about sending multiple messages from our application
as a single Kafka message.  Our concern with using Kafka compression is
the overhead of decompression on the broker to assign IDs.  Here is a
good article that describes this:
http://geekmantra.wordpress.com/2013/03/28/compression-in-kafka-gzip-or-snappy/
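For reference, enabling producer-side compression in the 0.8-era API is just a property; the property keys and codec names below are as I understand that API generation, so treat this as a sketch:

```scala
import java.util.Properties

object CompressedProducerProps {
  // Build 0.8-style producer properties with compression enabled.
  def build(brokers: String): Properties = {
    val p = new Properties()
    p.put("metadata.broker.list", brokers)                      // e.g. "localhost:9092"
    p.put("serializer.class", "kafka.serializer.StringEncoder")
    p.put("compression.codec", "snappy")                        // or "gzip"; "none" disables
    p
  }
}
```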



But again, we haven't decided just yet; we would like to test and evaluate first.



Bert

