Posted to users@kafka.apache.org by yosi botzer <yo...@gmail.com> on 2014/01/01 13:17:57 UTC

only one ProducerSendThread thread when running with multiple brokers (kafka 0.8)

Hi,

I am using Kafka 0.8. I have 3 machines, each running a Kafka broker.

I am using my producer in async mode. I expected to see 3 different threads
with names starting with ProducerSendThread- (according to this article:
http://engineering.gnip.com/kafka-async-producer/).

However, I can see only one thread with the name *ProducerSendThread-*

This is my producer configuration:

server=1
topic=dat7
metadata.broker.list=ec2-54-245-111-112.us-west-2.compute.amazonaws.com:9092,ec2-54-245-111-69.us-west-2.compute.amazonaws.com:9092,ec2-54-218-183-14.us-west-2.compute.amazonaws.com:9092
serializer.class=kafka.serializer.DefaultEncoder
request.required.acks=1
compression.codec=snappy
producer.type=async
queue.buffering.max.ms=2000
queue.buffering.max.messages=1000
batch.num.messages=500


*What am I missing here?*


BTW, I have also experienced very strange behavior regarding my producer
performance (which may or may not be related to the issue above).

When I defined a topic with 1 partition I got much better throughput
compared to a topic with 3 partitions. A producer sending messages to a
topic with 3 partitions had much better throughput compared to a topic
with 12 partitions.

I would expect the best performance from the topic with 12 partitions,
since I have 3 machines each running a broker with 4 disks (the broker
is configured to use all 4 disks).

*Is there any logical explanation for this behavior?*

Re: only one ProducerSendThread thread when running with multiple brokers (kafka 0.8)

Posted by Jun Rao <ju...@gmail.com>.
This is weird. Writes to different partitions should be independent.
Could you enable the request log on the broker and check whether the
requests are what you expect?

Thanks,

Jun



Re: only one ProducerSendThread thread when running with multiple brokers (kafka 0.8)

Posted by yosi botzer <yo...@gmail.com>.
Thanks Jun,

I thought about your answer and came up with the following solution:

Create a pool of async producers (the size of the pool is the same as the
number of partitions) and define my own partitioner.

Whenever I get a message I decide which producer to use with exactly the
same logic my partitioner uses.

This way all the messages targeted at a specific partition are handled by
the same producer instance, and when the time to send messages arrives all
the messages should be sent to the same partition (and hence the same
broker). This way multiple connections are not needed.

There is only one problem with this solution: *it does not work!*

No matter what I try, I only get reasonable performance when using a topic
that has 1 partition, and as the number of partitions goes up the
performance gets worse and worse.

This behavior makes no sense whatsoever. It means that I have to
choose between consumer scale (adding brokers and increasing partitions) and
producer scale (reducing the number of partitions).

There must be some kind of workaround for this (increasing the buffer
size does not really help).

Yosi
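
The pool described in this message can be sketched roughly as follows. This
is a plain-Python model, not the Kafka 0.8 client API: partition_for,
StubProducer and ProducerPool are hypothetical names, and CRC32 merely stands
in for whatever hash the custom partitioner actually uses. The only point is
that the producer index and the partition index come from the same function,
so each producer instance only ever sees messages for one partition.

```python
import zlib


def partition_for(key: bytes, num_partitions: int) -> int:
    """Hypothetical partitioner: a stable hash of the key modulo the partition count."""
    return zlib.crc32(key) % num_partitions


class StubProducer:
    """Stand-in for one async producer instance; it only records what it was given."""

    def __init__(self) -> None:
        self.sent = []

    def send(self, key: bytes, value: bytes) -> None:
        self.sent.append((key, value))


class ProducerPool:
    """One producer per partition, selected with the partitioner's own logic."""

    def __init__(self, num_partitions: int) -> None:
        self.num_partitions = num_partitions
        self.producers = [StubProducer() for _ in range(num_partitions)]

    def send(self, key: bytes, value: bytes) -> int:
        # Same function the partitioner uses, so producer i only gets partition i.
        index = partition_for(key, self.num_partitions)
        self.producers[index].send(key, value)
        return index


pool = ProducerPool(num_partitions=12)
for i in range(1000):
    pool.send(f"user-{i % 50}".encode(), b"payload")
```

As the message itself reports, this arrangement did not improve throughput in
practice, so the sketch illustrates the idea rather than a fix.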




Re: only one ProducerSendThread thread when running with multiple brokers (kafka 0.8)

Posted by Jun Rao <ju...@gmail.com>.
When the producer send thread sends a batch of messages, it first
determines which partition each message should go to. It then groups
messages by broker (based on the leader of the partition of each
message) and sends a produce request per broker (each request may include
multiple partitions). Those produce requests are sent serially. So, if
there is only one partition, only 1 produce request needs to be sent per
batch of messages. If there are 3 partitions, chances are 3 produce
requests are needed. Because those produce requests are sent serially, the
more partitions you have, the more produce requests and the longer the
latency for sending a batch of messages. However, having 12 partitions
shouldn't be significantly worse than 3 partitions since there are only 3
brokers. One way to improve performance is to use a larger batch size. Try
making the batch size 3 times larger with 3 partitions.

Thanks,

Jun
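
A rough way to picture the batching described in this reply is the toy model
below. It is plain Python, not the Kafka producer code; produce_requests,
leaders and summarize are hypothetical names, and the layout "partition p is
led by broker p % num_brokers" is an assumption made only for illustration.
Each top-level entry in the result stands for one produce request, and those
requests would be sent one after another.

```python
from collections import defaultdict


def produce_requests(batch, leader_of):
    """Group (partition, message) pairs into one request per leader broker.

    Returns {broker: {partition: [messages]}}; each top-level entry models
    one produce request.
    """
    requests = defaultdict(lambda: defaultdict(list))
    for partition, message in batch:
        requests[leader_of[partition]][partition].append(message)
    return {broker: dict(parts) for broker, parts in requests.items()}


def leaders(num_partitions, num_brokers):
    """Assumed layout: partition p is led by broker p % num_brokers."""
    return {p: p % num_brokers for p in range(num_partitions)}


def summarize(num_partitions, num_brokers=3, batch_size=500):
    """Return (number of requests, sorted messages-per-request) for one batch."""
    # Spread the batch evenly over partitions, roughly what a keyed workload does.
    batch = [(i % num_partitions, f"m{i}") for i in range(batch_size)]
    reqs = produce_requests(batch, leaders(num_partitions, num_brokers))
    sizes = sorted(sum(len(msgs) for msgs in parts.values()) for parts in reqs.values())
    return len(reqs), sizes


one = summarize(1)                       # single partition
three = summarize(3)                     # one partition per broker
twelve = summarize(12)                   # four partitions per broker
tripled = summarize(3, batch_size=1500)  # batch size made 3 times larger
```

Under these assumptions a 500-message batch becomes one request of 500
messages with 1 partition, but three requests of about 167 messages each with
3 partitions. With 12 partitions on 3 brokers it is still three requests, and
tripling the batch size brings each request back to 500 messages.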



Re: only one ProducerSendThread thread when running with multiple brokers (kafka 0.8)

Posted by yosi botzer <yo...@gmail.com>.
Yes, I am specifying a key for each message.

The same producer code runs much slower when sending messages to a topic
with multiple partitions than to a topic with a single partition. This
doesn't make any sense to me at all.

If I understand correctly, I need multiple partitions in order to scale the
consumers.

Could it be because the async producer creates a connection per broker
(or per partition), and this is done serially once the producer needs
to send the messages? Maybe when using a single partition the producer
does it in one batch.

BTW, I have tried using multiple Producer instances, but I still get poor
performance when using a topic with multiple partitions (by multiple
partitions I mean 12, which is exactly the number of broker machines
multiplied by the number of disks I have on each machine, which sounds
reasonable to me).

Is there any solution anyone can think of?


Yosi




Re: only one ProducerSendThread thread when running with multiple brokers (kafka 0.8)

Posted by Jun Rao <ju...@gmail.com>.
In 0.7, we had 1 producer send thread per broker. This changed in 0.8,
where there is only 1 producer send thread per producer. If a producer
needs to send messages to multiple brokers, the send thread will do that
serially, which will reduce the throughput. We plan to improve that in 0.9
through client rewrites. For now, you can improve the throughput by either
using a larger batch size or using more producer instances.

As for degraded performance with more partitions, are you specifying a key
for each message?

Thanks,

Jun
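
The "more producer instances" suggestion can be pictured with the toy model
below. It is plain Python, not the Kafka client: ModelProducer is a
hypothetical class, and the numeric suffix on the thread name is an
assumption (the thread only mentions the ProducerSendThread- prefix). Each
instance owns one queue and exactly one send thread, and the caller spreads
messages over the instances round robin.

```python
import itertools
import queue
import threading


class ModelProducer:
    """Toy async producer: one queue drained by exactly one send thread."""

    def __init__(self, name):
        self.queue = queue.Queue()
        self.sent_by = []  # name of the thread that "sent" each message
        self.thread = threading.Thread(
            target=self._drain, name=f"ProducerSendThread-{name}", daemon=True
        )
        self.thread.start()

    def send(self, message):
        # Callers only enqueue; the single send thread does the actual work.
        self.queue.put(message)

    def _drain(self):
        while True:
            message = self.queue.get()
            if message is None:  # sentinel: stop the send thread
                break
            self.sent_by.append(threading.current_thread().name)

    def close(self):
        self.queue.put(None)
        self.thread.join()


producers = [ModelProducer(i) for i in range(3)]
round_robin = itertools.cycle(producers)
for n in range(300):
    next(round_robin).send(f"message-{n}")
for p in producers:
    p.close()
```

In this model three instances give three send threads, each handling a third
of the messages. Whether that translates into higher throughput against real
brokers depends on the workload, as the rest of the thread shows.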


Re: only one ProducerSendThread thread when running with multiple brokers (kafka 0.8)

Posted by Chris Hogue <cs...@gmail.com>.
I see. If I'm following correctly you're talking about trying more
application threads passing messages to the producer, which then serializes
them down to a single threaded send.

That sounds right, multiple producer instances in the application are the
only real way to use the stock producer to go wider.

-Chris




On Wed, Jan 1, 2014 at 10:23 AM, Gerrit Jansen van Vuuren <
gerritjvv@gmail.com> wrote:

> The network is 10 Gbit so it seems unlikely. The 5 brokers were running
> without much load or problems. The bottleneck is that no matter how many
> threads I use for sending, the sync block in the send method will never go
> faster, and it's always limited to a single thread.
>
> I also use snappy outside and no compression in the producer. A single
> producer gives me max 6-10k tps; with 10 producers I can get max 60k tps.
> This is on my servers and with my payload.
>
> My end conclusion was that it's impossible to scale a single producer
> instance, and more threads make no difference on the sending side.
> On 1 Jan 2014 17:31, "Chris Hogue" <cs...@gmail.com> wrote:
>
> > Have you found what the actual bottleneck is? Is it the network send? Of
> > course this would be highly influenced by the brokers' performance. After
> > removing all compression work from the brokers we were able to get enough
> > throughput from them that it's not really a concern.
> >
> > Another rough side-effect of the single synchronous send thread is that a
> > single degrading or otherwise slow broker can back up the producing for
> > the whole app. I haven't heard a great solution to this but would love to
> > if someone's come up with it.
> >
> > -Chris
> >
> >
> >
> > On Wed, Jan 1, 2014 at 9:10 AM, Gerrit Jansen van Vuuren <
> > gerritjvv@gmail.com> wrote:
> >
> > > I've seen this bottleneck regardless of using compression or not; both
> > > situations give me poor performance on sending to Kafka via the Scala
> > > producer API.
> > > On 1 Jan 2014 16:42, "Chris Hogue" <cs...@gmail.com> wrote:
> > >
> > > > Hi.
> > > >
> > > > When writing that blog we were using Kafka 0.7 as well. Understanding
> > > > that it probably wasn't the primary design goal, the separate send
> > > > threads per broker that offered a separation of compression were a
> > > > convenient side-effect of that design.
> > > >
> > > > We've since built new systems on 0.8 that have concentrated high
> > > > throughput on a small number of producers and had this discovery early
> > > > on as well.
> > > >
> > > > Instead we've taken responsibility for the compression before the
> > > > producer and done that on separate threads as appropriate. While helpful
> > > > for compression on the producer application, the main reason for this is
> > > > to prevent the broker from uncompressing and re-compressing each message
> > > > as it assigns offsets. There's a significant throughput advantage in
> > > > doing this.
> > > >
> > > > Truthfully, since switching to snappy the compression throughput on the
> > > > producer is much less of a concern in the overall context of the
> > > > application.
> > > >
> > > > There was some discussion of these issues in the 'Client Improvement
> > > > Discussion' thread a while ago where Jay provided some insight and
> > > > discussion on future directions.
> > > >
> > > > -Chris
> > > >
> > > >
> > > >
> > > >
> > > > On Wed, Jan 1, 2014 at 5:42 AM, yosi botzer <yo...@gmail.com> wrote:
> > > >
> > > > > This is very interesting, this is what I see as well. I wish someone
> > > > > could explain why it is not as explained here:
> > > > > http://engineering.gnip.com/kafka-async-producer/
> > > > >
> > > > >
> > > > > On Wed, Jan 1, 2014 at 2:39 PM, Gerrit Jansen van Vuuren <
> > > > > gerritjvv@gmail.com> wrote:
> > > > >
> > > > > > I don't know the code well enough to comment on that (maybe someone
> > > > > > else on the user list can do that), but from what I've seen doing
> > > > > > some heavy profiling I only see one thread per producer instance. It
> > > > > > doesn't matter how many brokers or topics you have; the number of
> > > > > > threads is always 1 per producer. If you create 2 producers, 2
> > > > > > threads, and so on.
> > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > > On Wed, Jan 1, 2014 at 1:27 PM, yosi botzer <yosi.botzer@gmail.com>
> > > > > > wrote:
> > > > > >
> > > > > > > But shouldn't I see a separate thread per broker (I am using the
> > > > > > > async mode)? Why do I get better performance sending messages to
> > > > > > > a topic that has fewer partitions?
> > > > > > >
> > > > > > >
> > > > > > > On Wed, Jan 1, 2014 at 2:22 PM, Gerrit Jansen van Vuuren <
> > > > > > > gerritjvv@gmail.com> wrote:
> > > > > > >
> > > > > > > > The producer is heavily synchronized (i.e. all the code in the
> > > > > > > > send method is encapsulated in one huge synchronized block).
> > > > > > > > Try creating multiple producers and round robin send over them.
> > > > > > > >
> > > > > > > > e.g.
> > > > > > > >
> > > > > > > > p = producers[ n++ % producers.length ]
> > > > > > > >
> > > > > > > > p.send msg
> > > > > > > >
> > > > > > > > This will give you one thread per producer instance.
> > > > > > > >
> > > > > > > > I'm working on an async multi-threaded producer for Kafka, but
> > > > > > > > it's nowhere near complete yet.
> > > > > > > > https://github.com/gerritjvv/kafka-fast
> > > > > > > >
> > > > > > > >
> > > > > > > > Regards,
> > > > > > > >  Gerrit
> > > > > > > >
> > > > > > > >

Re: only one ProducerSendThread thread when running with multiple brokers (kafka 0.8)

Posted by Gerrit Jansen van Vuuren <ge...@gmail.com>.
The network is 10 Gbit, so a network bottleneck seems unlikely. The 5
brokers were running without much load or problems. The bottleneck is that
no matter how many threads I use for sending, the synchronized block in the
send method never goes any faster, so sending is always limited to a single
thread.

I also compress with snappy outside the producer and use no compression in
the producer itself. A single producer gives me at most 6-10k tps; with 10
producers I can get at most 60k tps. This is on my servers and with my
payload.

My conclusion was that it's impossible to scale a single producer instance,
and more threads make no difference on the sending side.
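
A minimal sketch of the multiple-producer workaround, assuming each Kafka
producer is hidden behind a small interface. MessageSender, RoundRobinSenders
and RoundRobinDemo are hypothetical names used for illustration only and are
not part of the Kafka API; in a real application each MessageSender would wrap
its own producer instance, so that each one gets its own send thread.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicLong;

final class RoundRobinDemo {

    // Hypothetical stand-in for one producer instance (not a Kafka type).
    interface MessageSender {
        void send(String message);
    }

    // Hands each message to the next sender in turn.
    static final class RoundRobinSenders implements MessageSender {
        private final List<MessageSender> senders;
        private final AtomicLong next = new AtomicLong();

        RoundRobinSenders(List<MessageSender> senders) {
            if (senders.isEmpty()) {
                throw new IllegalArgumentException("at least one sender is required");
            }
            this.senders = new ArrayList<>(senders);
        }

        @Override
        public void send(String message) {
            // The atomic counter lets several application threads call send()
            // concurrently; floorMod keeps the index valid even after overflow.
            long ticket = next.getAndIncrement();
            int index = (int) Math.floorMod(ticket, (long) senders.size());
            senders.get(index).send(message);
        }
    }

    public static void main(String[] args) {
        int[] handled = new int[3];
        List<MessageSender> senders = new ArrayList<>();
        for (int i = 0; i < handled.length; i++) {
            final int slot = i;
            // Assumption: a real sender would call its own producer's send here.
            senders.add(message -> handled[slot]++);
        }
        MessageSender pool = new RoundRobinSenders(senders);
        for (int i = 0; i < 9; i++) {
            pool.send("msg-" + i);
        }
        for (int i = 0; i < handled.length; i++) {
            System.out.println("sender " + i + " handled " + handled[i] + " messages");
        }
    }
}
```

In this demo each of the three stand-in senders receives three of the nine
messages. Note that spreading messages over several producers gives up any
ordering guarantee between messages that land on different producers.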
On 1 Jan 2014 17:31, "Chris Hogue" <cs...@gmail.com> wrote:

> Have you found what the actual bottleneck is? Is it the network send? Of
> course this would be highly influenced by the brokers' performance. After
> removing all compression work from the brokers we were able to get enough
> throughput from them that it's not really a concern.
>
> Another rough side-effect of the single synchronous send thread is that a
> single degrading or otherwise slow broker can back up the producing for the
> whole app. I haven't heard a great solution to this but would love to if
> someone's come up with it.
>
> -Chris
>
>
>
> On Wed, Jan 1, 2014 at 9:10 AM, Gerrit Jansen van Vuuren <
> gerritjvv@gmail.com> wrote:
>
> > I've seen this bottle neck regardless of using compression or not, bpth
> > situations give me poor performance on sending to kafka via the scala
> > producer api.
> > On 1 Jan 2014 16:42, "Chris Hogue" <cs...@gmail.com> wrote:
> >
> > > Hi.
> > >
> > > When writing that blog we were using Kafka 0.7 as well. Understanding
> > that
> > > it probably wasn't the primary design goal, the separate send threads
> per
> > > broker that offered a separation of compression were a convenient
> > > side-effect of that design.
> > >
> > > We've since built new systems on 0.8 that have concentrated high
> > throughput
> > > on a small number of producers and had this discovery early on as well.
> > >
> > > Instead we've taken responsibility for the compression before the
> > producer
> > > and done that on separate threads as appropriate. While helpful for
> > > compression on the producer application the main reason for this is to
> > > prevent the broker from uncompressing and re-compressing each message
> as
> > it
> > > assigns offsets. There's a significant throughput advantage in doing
> > this.
> > >
> > > Truthfully since switching to snappy the compression throughput on the
> > > producer is much less of a concern in the overall context of the
> > > application.
> > >
> > > There was some discussion of these issues in the 'Client Improvement
> > > Discussion' thread a while ago where Jay provided some insight and
> > > discussion on future directions.
> > >
> > > -Chris
> > >
> > >
> > >
> > >
> > > On Wed, Jan 1, 2014 at 5:42 AM, yosi botzer <yo...@gmail.com>
> > wrote:
> > >
> > > > This is very interesting, this is what I see as well. I wish someone
> > > could
> > > > explain why it is not as explained here:
> > > > http://engineering.gnip.com/kafka-async-producer/
> > > >
> > > >
> > > > On Wed, Jan 1, 2014 at 2:39 PM, Gerrit Jansen van Vuuren <
> > > > gerritjvv@gmail.com> wrote:
> > > >
> > > > > I don't know the code enough to comment on that (maybe someone else
> > on
> > > > the
> > > > > user list can do that), but from what I've seen doing some heavy
> > > > profiling
> > > > > I only see one thread per producer instance, it doesn't matter how
> > many
> > > > > brokers or topics you have the number of threads is always 1 per
> > > > producer.
> > > > > If you create 2 producers 2 threads and so on.
> > > > >
> > > > >
> > > > >
> > > > >
> > > > >
> > > > > On Wed, Jan 1, 2014 at 1:27 PM, yosi botzer <yosi.botzer@gmail.com
> >
> > > > wrote:
> > > > >
> > > > > > But shouldn't I see a separate thread per broker (I am using the
> > > async
> > > > > > mode)?  Why do I get a better performance sending a message that
> > has
> > > > > fewer
> > > > > > partitions?
> > > > > >
> > > > > >
> > > > > > On Wed, Jan 1, 2014 at 2:22 PM, Gerrit Jansen van Vuuren <
> > > > > > gerritjvv@gmail.com> wrote:
> > > > > >
> > > > > > > The producer is heavily synchronized (i.e. all the code in the
> > send
> > > > > > method
> > > > > > > is encapsulated in one huge synchronized block).
> > > > > > > Try creating multiple producers and round robin send over them.
> > > > > > >
> > > > > > > e.g.
> > > > > > >
> > > > > > > p = producers[ n++ % producers.length ]
> > > > > > >
> > > > > > > p.send msg
> > > > > > > This will give you one thread per producer instance.
> > > > > > >
> > > > > > > I'm working on an async multi threaded producer for kafka, but
> > its
> > > > > > nothing
> > > > > > > near complete yet.
> > > > > > > https://github.com/gerritjvv/kafka-fast
> > > > > > >
> > > > > > >
> > > > > > > Regards,
> > > > > > >  Gerrit
> > > > > > >
> > > > > > >

Re: only one ProducerSendThread thread when running with multiple brokers (kafka 0.8)

Posted by Chris Hogue <cs...@gmail.com>.
Have you found what the actual bottleneck is? Is it the network send? Of
course this would be highly influenced by the brokers' performance. After
removing all compression work from the brokers we were able to get enough
throughput from them that it's not really a concern.

Another rough side-effect of the single synchronous send thread is that a
single degrading or otherwise slow broker can back up the producing for the
whole app. I haven't heard a great solution to this but would love to if
someone's come up with it.

-Chris



On Wed, Jan 1, 2014 at 9:10 AM, Gerrit Jansen van Vuuren <
gerritjvv@gmail.com> wrote:

> I've seen this bottle neck regardless of using compression or not, bpth
> situations give me poor performance on sending to kafka via the scala
> producer api.
> On 1 Jan 2014 16:42, "Chris Hogue" <cs...@gmail.com> wrote:
>
> > Hi.
> >
> > When writing that blog we were using Kafka 0.7 as well. Understanding
> that
> > it probably wasn't the primary design goal, the separate send threads per
> > broker that offered a separation of compression were a convenient
> > side-effect of that design.
> >
> > We've since built new systems on 0.8 that have concentrated high
> throughput
> > on a small number of producers and had this discovery early on as well.
> >
> > Instead we've taken responsibility for the compression before the
> producer
> > and done that on separate threads as appropriate. While helpful for
> > compression on the producer application the main reason for this is to
> > prevent the broker from uncompressing and re-compressing each message as
> it
> > assigns offsets. There's a significant throughput advantage in doing
> this.
> >
> > Truthfully since switching to snappy the compression throughput on the
> > producer is much less of a concern in the overall context of the
> > application.
> >
> > There was some discussion of these issues in the 'Client Improvement
> > Discussion' thread a while ago where Jay provided some insight and
> > discussion on future directions.
> >
> > -Chris
> >
> >
> >
> >
> > On Wed, Jan 1, 2014 at 5:42 AM, yosi botzer <yo...@gmail.com>
> wrote:
> >
> > > This is very interesting, this is what I see as well. I wish someone
> > could
> > > explain why it is not as explained here:
> > > http://engineering.gnip.com/kafka-async-producer/
> > >
> > >
> > > On Wed, Jan 1, 2014 at 2:39 PM, Gerrit Jansen van Vuuren <
> > > gerritjvv@gmail.com> wrote:
> > >
> > > > I don't know the code enough to comment on that (maybe someone else
> on
> > > the
> > > > user list can do that), but from what I've seen doing some heavy
> > > profiling
> > > > I only see one thread per producer instance, it doesn't matter how
> many
> > > > brokers or topics you have the number of threads is always 1 per
> > > producer.
> > > > If you create 2 producers 2 threads and so on.
> > > >
> > > >
> > > >
> > > >
> > > >
> > > > On Wed, Jan 1, 2014 at 1:27 PM, yosi botzer <yo...@gmail.com>
> > > wrote:
> > > >
> > > > > But shouldn't I see a separate thread per broker (I am using the
> > async
> > > > > mode)?  Why do I get a better performance sending a message that
> has
> > > > fewer
> > > > > partitions?
> > > > >
> > > > >
> > > > > On Wed, Jan 1, 2014 at 2:22 PM, Gerrit Jansen van Vuuren <
> > > > > gerritjvv@gmail.com> wrote:
> > > > >
> > > > > > The producer is heavily synchronized (i.e. all the code in the
> send
> > > > > method
> > > > > > is encapsulated in one huge synchronized block).
> > > > > > Try creating multiple producers and round robin send over them.
> > > > > >
> > > > > > e.g.
> > > > > >
> > > > > > p = producers[ n++ % producers.length ]
> > > > > >
> > > > > > p.send msg
> > > > > > This will give you one thread per producer instance.
> > > > > >
> > > > > > I'm working on an async multi threaded producer for kafka, but
> its
> > > > > nothing
> > > > > > near complete yet.
> > > > > > https://github.com/gerritjvv/kafka-fast
> > > > > >
> > > > > >
> > > > > > Regards,
> > > > > >  Gerrit
> > > > > >
> > > > > >

Re: only one ProducerSendThread thread when running with multiple brokers (kafka 0.8)

Posted by Gerrit Jansen van Vuuren <ge...@gmail.com>.
I've seen this bottleneck whether or not compression is used; both
situations give me poor performance when sending to Kafka via the Scala
producer API.
On 1 Jan 2014 16:42, "Chris Hogue" <cs...@gmail.com> wrote:

> Hi.
>
> When writing that blog we were using Kafka 0.7 as well. Understanding that
> it probably wasn't the primary design goal, the separate send threads per
> broker that offered a separation of compression were a convenient
> side-effect of that design.
>
> We've since built new systems on 0.8 that have concentrated high throughput
> on a small number of producers and had this discovery early on as well.
>
> Instead we've taken responsibility for the compression before the producer
> and done that on separate threads as appropriate. While helpful for
> compression on the producer application the main reason for this is to
> prevent the broker from uncompressing and re-compressing each message as it
> assigns offsets. There's a significant throughput advantage in doing this.
>
> Truthfully since switching to snappy the compression throughput on the
> producer is much less of a concern in the overall context of the
> application.
>
> There was some discussion of these issues in the 'Client Improvement
> Discussion' thread a while ago where Jay provided some insight and
> discussion on future directions.
>
> -Chris
>
>
>
>
> On Wed, Jan 1, 2014 at 5:42 AM, yosi botzer <yo...@gmail.com> wrote:
>
> > This is very interesting, this is what I see as well. I wish someone
> could
> > explain why it is not as explained here:
> > http://engineering.gnip.com/kafka-async-producer/
> >
> >
> > On Wed, Jan 1, 2014 at 2:39 PM, Gerrit Jansen van Vuuren <
> > gerritjvv@gmail.com> wrote:
> >
> > > I don't know the code enough to comment on that (maybe someone else on
> > the
> > > user list can do that), but from what I've seen doing some heavy
> > profiling
> > > I only see one thread per producer instance, it doesn't matter how many
> > > brokers or topics you have the number of threads is always 1 per
> > producer.
> > > If you create 2 producers 2 threads and so on.
> > >
> > >
> > >
> > >
> > >
> > > On Wed, Jan 1, 2014 at 1:27 PM, yosi botzer <yo...@gmail.com>
> > wrote:
> > >
> > > > But shouldn't I see a separate thread per broker (I am using the
> async
> > > > mode)?  Why do I get a better performance sending a message that has
> > > fewer
> > > > partitions?
> > > >
> > > >
> > > > On Wed, Jan 1, 2014 at 2:22 PM, Gerrit Jansen van Vuuren <
> > > > gerritjvv@gmail.com> wrote:
> > > >
> > > > > The producer is heavily synchronized (i.e. all the code in the send
> > > > method
> > > > > is encapsulated in one huge synchronized block).
> > > > > Try creating multiple producers and round robin send over them.
> > > > >
> > > > > e.g.
> > > > >
> > > > > p = producers[ n++ % producers.length ]
> > > > >
> > > > > p.send msg
> > > > > This will give you one thread per producer instance.
> > > > >
> > > > > I'm working on an async multi threaded producer for kafka, but its
> > > > nothing
> > > > > near complete yet.
> > > > > https://github.com/gerritjvv/kafka-fast
> > > > >
> > > > >
> > > > > Regards,
> > > > >  Gerrit
> > > > >
> > > > >

Re: only one ProducerSendThread thread when running with multiple brokers (kafka 0.8)

Posted by Chris Hogue <cs...@gmail.com>.
Hi.

When writing that blog we were using Kafka 0.7 as well. Understanding that
it probably wasn't the primary design goal, the separate send threads per
broker that offered a separation of compression were a convenient
side-effect of that design.

We've since built new systems on 0.8 that have concentrated high throughput
on a small number of producers and had this discovery early on as well.

Instead, we've taken responsibility for the compression before the producer
and do it on separate threads as appropriate. While this helps with
compression in the producer application, the main reason for it is to
prevent the broker from uncompressing and re-compressing each message as it
assigns offsets. There's a significant throughput advantage in doing this.
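
As a rough sketch of compressing ahead of the producer on separate threads,
the following uses a fixed thread pool from the JDK. It is an illustration
under stated assumptions, not the code described above: GZIP from
java.util.zip stands in for snappy (which is not in the JDK), and
PreCompressDemo, compressAll, compress and decompress are hypothetical names.
The compressed byte arrays would then be handed to a producer that is
configured without a compression codec.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

final class PreCompressDemo {

    // Compresses one payload; GZIP is only a stand-in for snappy here.
    static byte[] compress(byte[] raw) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPOutputStream gzip = new GZIPOutputStream(out)) {
            gzip.write(raw);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    // Reverses compress(); included so the round trip can be checked.
    static byte[] decompress(byte[] packed) {
        try (GZIPInputStream gzip = new GZIPInputStream(new ByteArrayInputStream(packed))) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buffer = new byte[4096];
            int read;
            while ((read = gzip.read(buffer)) != -1) {
                out.write(buffer, 0, read);
            }
            return out.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Compresses all payloads on a pool and returns them in input order.
    static List<byte[]> compressAll(List<byte[]> payloads, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<byte[]>> pending = new ArrayList<>();
            for (byte[] payload : payloads) {
                Callable<byte[]> task = () -> compress(payload);
                pending.add(pool.submit(task));
            }
            List<byte[]> result = new ArrayList<>();
            for (Future<byte[]> future : pending) {
                result.add(future.get()); // waits for that task to finish
            }
            return result;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while compressing", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("compression task failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        List<byte[]> payloads = new ArrayList<>();
        for (int i = 0; i < 4; i++) {
            payloads.add(("payload-" + i).getBytes(StandardCharsets.UTF_8));
        }
        List<byte[]> packed = compressAll(payloads, 2);
        // Assumption: each packed array would now be sent by the producer.
        System.out.println("compressed " + packed.size() + " payloads");
    }
}
```

Collecting the futures in submission order keeps the compressed payloads in
the same order as the inputs, even though the pool may finish them out of
order.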

Truthfully since switching to snappy the compression throughput on the
producer is much less of a concern in the overall context of the
application.

There was some discussion of these issues in the 'Client Improvement
Discussion' thread a while ago where Jay provided some insight and
discussion on future directions.

-Chris




On Wed, Jan 1, 2014 at 5:42 AM, yosi botzer <yo...@gmail.com> wrote:

> This is very interesting, this is what I see as well. I wish someone could
> explain why it is not as explained here:
> http://engineering.gnip.com/kafka-async-producer/
>
>
> On Wed, Jan 1, 2014 at 2:39 PM, Gerrit Jansen van Vuuren <
> gerritjvv@gmail.com> wrote:
>
> > I don't know the code enough to comment on that (maybe someone else on
> the
> > user list can do that), but from what I've seen doing some heavy
> profiling
> > I only see one thread per producer instance, it doesn't matter how many
> > brokers or topics you have the number of threads is always 1 per
> producer.
> > If you create 2 producers 2 threads and so on.
> >
> >
> >
> >
> >
> > On Wed, Jan 1, 2014 at 1:27 PM, yosi botzer <yo...@gmail.com>
> wrote:
> >
> > > But shouldn't I see a separate thread per broker (I am using the async
> > > mode)?  Why do I get a better performance sending a message that has
> > fewer
> > > partitions?
> > >
> > >
> > > On Wed, Jan 1, 2014 at 2:22 PM, Gerrit Jansen van Vuuren <
> > > gerritjvv@gmail.com> wrote:
> > >
> > > > The producer is heavily synchronized (i.e. all the code in the send
> > > method
> > > > is encapsulated in one huge synchronized block).
> > > > Try creating multiple producers and round robin send over them.
> > > >
> > > > e.g.
> > > >
> > > > p = producers[ n++ % producers.length ]
> > > >
> > > > p.send msg
> > > > This will give you one thread per producer instance.
> > > >
> > > > I'm working on an async multi threaded producer for kafka, but its
> > > nothing
> > > > near complete yet.
> > > > https://github.com/gerritjvv/kafka-fast
> > > >
> > > >
> > > > Regards,
> > > >  Gerrit
> > > >
> > > >

Re: only one ProducerSendThread thread when running with multiple brokers (kafka 0.8)

Posted by yosi botzer <yo...@gmail.com>.
This is very interesting; it is what I see as well. I wish someone could
explain why the behavior differs from what is described here:
http://engineering.gnip.com/kafka-async-producer/


On Wed, Jan 1, 2014 at 2:39 PM, Gerrit Jansen van Vuuren <
gerritjvv@gmail.com> wrote:

> I don't know the code enough to comment on that (maybe someone else on the
> user list can do that), but from what I've seen doing some heavy profiling
> I only see one thread per producer instance, it doesn't matter how many
> brokers or topics you have the number of threads is always 1 per producer.
> If you create 2 producers 2 threads and so on.
>
>
>
>
>
> On Wed, Jan 1, 2014 at 1:27 PM, yosi botzer <yo...@gmail.com> wrote:
>
> > But shouldn't I see a separate thread per broker (I am using the async
> > mode)?  Why do I get a better performance sending a message that has
> fewer
> > partitions?
> >
> >
> > On Wed, Jan 1, 2014 at 2:22 PM, Gerrit Jansen van Vuuren <
> > gerritjvv@gmail.com> wrote:
> >
> > > The producer is heavily synchronized (i.e. all the code in the send
> > method
> > > is encapsulated in one huge synchronized block).
> > > Try creating multiple producers and round robin send over them.
> > >
> > > e.g.
> > >
> > > p = producers[ n++ % producers.length ]
> > >
> > > p.send msg
> > > This will give you one thread per producer instance.
> > >
> > > I'm working on an async multi threaded producer for kafka, but its
> > nothing
> > > near complete yet.
> > > https://github.com/gerritjvv/kafka-fast
> > >
> > >
> > > Regards,
> > >  Gerrit
> > >
> > >

Re: only one ProducerSendThread thread when running with multiple brokers (kafka 0.8)

Posted by Gerrit Jansen van Vuuren <ge...@gmail.com>.
I don't know the code well enough to comment on that (maybe someone else on
the user list can), but from what I've seen while doing some heavy
profiling, there is only one thread per producer instance. It doesn't matter
how many brokers or topics you have; the number of threads is always 1 per
producer. If you create 2 producers you get 2 threads, and so on.
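
One way to check this from inside the JVM, without a profiler, is to count
live threads by name prefix. The sketch below uses only the JDK;
SendThreadCounter and countThreadsWithPrefix are hypothetical names, and the
two demo threads are started only so that the example has something to count.
In a real application you would call the helper after creating the producers.

```java
import java.util.concurrent.CountDownLatch;

final class SendThreadCounter {

    // Counts live threads in this JVM whose name starts with the prefix.
    static int countThreadsWithPrefix(String prefix) {
        int count = 0;
        for (Thread thread : Thread.getAllStackTraces().keySet()) {
            if (thread.getName().startsWith(prefix)) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        CountDownLatch release = new CountDownLatch(1);
        Runnable waitForRelease = () -> {
            try {
                release.await(); // keep the demo thread alive while counting
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        };
        // Demo threads only; real send threads are created by the producer.
        new Thread(waitForRelease, "ProducerSendThread-demo-0").start();
        new Thread(waitForRelease, "ProducerSendThread-demo-1").start();

        int found = countThreadsWithPrefix("ProducerSendThread-");
        release.countDown();
        System.out.println("threads named ProducerSendThread-*: " + found);
    }
}
```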





On Wed, Jan 1, 2014 at 1:27 PM, yosi botzer <yo...@gmail.com> wrote:

> But shouldn't I see a separate thread per broker (I am using the async
> mode)?  Why do I get a better performance sending a message that has fewer
> partitions?
>
>
> On Wed, Jan 1, 2014 at 2:22 PM, Gerrit Jansen van Vuuren <
> gerritjvv@gmail.com> wrote:
>
> > The producer is heavily synchronized (i.e. all the code in the send
> method
> > is encapsulated in one huge synchronized block).
> > Try creating multiple producers and round robin send over them.
> >
> > e.g.
> >
> > p = producers[ n++ % producers.length ]
> >
> > p.send msg
> > This will give you one thread per producer instance.
> >
> > I'm working on an async multi threaded producer for kafka, but its
> nothing
> > near complete yet.
> > https://github.com/gerritjvv/kafka-fast
> >
> >
> > Regards,
> >  Gerrit
> >
> >

Re: only one ProducerSendThread thread when running with multiple brokers (kafka 0.8)

Posted by yosi botzer <yo...@gmail.com>.
But shouldn't I see a separate thread per broker (I am using async mode)?
And why do I get better performance when sending messages to a topic that
has fewer partitions?


On Wed, Jan 1, 2014 at 2:22 PM, Gerrit Jansen van Vuuren <
gerritjvv@gmail.com> wrote:

> The producer is heavily synchronized (i.e. all the code in the send method
> is encapsulated in one huge synchronized block).
> Try creating multiple producers and round robin send over them.
>
> e.g.
>
> p = producers[ n++ % producers.length ]
>
> p.send msg
> This will give you one thread per producer instance.
>
> I'm working on an async multi threaded producer for kafka, but its nothing
> near complete yet.
> https://github.com/gerritjvv/kafka-fast
>
>
> Regards,
>  Gerrit
>
>
> On Wed, Jan 1, 2014 at 1:17 PM, yosi botzer <yo...@gmail.com> wrote:
>
> > Hi,
> >
> > I am using kafka 0.8. I have 3 machines each running kafka broker.
> >
> > I am using async mode of my Producer. I expected to see 3 different threads
> > with names starting with ProducerSendThread- (according to this article:
> > http://engineering.gnip.com/kafka-async-producer/)
> >
> > However I can see only one thread with the name *ProducerSendThread-*
> >
> > This is my producer configuration:
> >
> > server=1
> > topic=dat7
> > metadata.broker.list=
> > ec2-54-245-111-112.us-west-2.compute.amazonaws.com:9092
> > ,ec2-54-245-111-69.us-west-2.compute.amazonaws.com:9092,
> > ec2-54-218-183-14.us-west-2.compute.amazonaws.com:9092
> > serializer.class=kafka.serializer.DefaultEncoder
> > request.required.acks=1
> > compression.codec=snappy
> > producer.type=async
> > queue.buffering.max.ms=2000
> > queue.buffering.max.messages=1000
> > batch.num.messages=500
> >
> >
> > *What am I missing here?*
> >
> >
> > BTW, I have also experienced very strange behavior regrading my producer
> > performance (which may or may not be related to the issue above).
> >
> > When I have defined a topic with 1 partition I got much better throughput
> > comparing to a topic with 3 partitions. A producer sending messages to a
> > topic with 3 partitions had much better throughput comparing to a topic
> > with 12 partitions.
> >
> > I would expect to have best performance for the topic with 12 partitions
> > since I have 3 machines running a broker each of with 4 disks (the broker
> > is configured to use all 4 disks)
> >
> > *Is there any logical explanation for this behavior?*
> >
>

Re: only one ProducerSendThread thread when running with multiple brokers (kafka 0.8)

Posted by Gerrit Jansen van Vuuren <ge...@gmail.com>.
The producer is heavily synchronized (i.e. all the code in the send method
is wrapped in one huge synchronized block).
Try creating multiple producers and sending to them in round-robin order.

e.g.

p = producers[ n++ % producers.length ]

p.send(msg)

This will give you one thread per producer instance.
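
A minimal sketch of the round-robin idea above, written in Python rather than against the Kafka 0.8 client. RoundRobinSender and FakeProducer are hypothetical names used only for illustration and are not part of any Kafka API. With real async producers, each instance would be expected to bring its own send thread, as described above. The lock is an assumption added so that several application threads can share the sender, which a bare n++ would not guarantee.

```python
import itertools
import threading


class RoundRobinSender:
    """Spreads send() calls evenly across several producer-like objects."""

    def __init__(self, producers):
        if not producers:
            raise ValueError("need at least one producer")
        self._producers = list(producers)
        self._counter = itertools.count()
        self._lock = threading.Lock()

    def send(self, msg):
        # Pick the next index under a lock so concurrent callers
        # never skip or repeat a slot.
        with self._lock:
            n = next(self._counter)
        producer = self._producers[n % len(self._producers)]
        producer.send(msg)
        return producer


class FakeProducer:
    """Hypothetical stand-in for a real producer; it only records messages."""

    def __init__(self, name):
        self.name = name
        self.sent = []

    def send(self, msg):
        self.sent.append(msg)


# Three producer instances, seven messages handed out in turn.
producers = [FakeProducer("p%d" % i) for i in range(3)]
sender = RoundRobinSender(producers)
for i in range(7):
    sender.send("msg-%d" % i)
```

In a real setup the FakeProducer objects would be replaced by actual producer instances built from the same configuration. Messages are spread by call order here, not by key, so any per-key ordering would need a different selection rule.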

I'm working on an async multi-threaded producer for Kafka, but it's nowhere
near complete yet.
https://github.com/gerritjvv/kafka-fast


Regards,
 Gerrit


On Wed, Jan 1, 2014 at 1:17 PM, yosi botzer <yo...@gmail.com> wrote:

> Hi,
>
> I am using kafka 0.8. I have 3 machines each running kafka broker.
>
> I am using async mode of my Producer. I expected to see 3 different threads
> with names starting with ProducerSendThread- (according to this article:
> http://engineering.gnip.com/kafka-async-producer/)
>
> However I can see only one thread with the name *ProducerSendThread-*
>
> This is my producer configuration:
>
> server=1
> topic=dat7
> metadata.broker.list=
> ec2-54-245-111-112.us-west-2.compute.amazonaws.com:9092
> ,ec2-54-245-111-69.us-west-2.compute.amazonaws.com:9092,
> ec2-54-218-183-14.us-west-2.compute.amazonaws.com:9092
> serializer.class=kafka.serializer.DefaultEncoder
> request.required.acks=1
> compression.codec=snappy
> producer.type=async
> queue.buffering.max.ms=2000
> queue.buffering.max.messages=1000
> batch.num.messages=500
>
>
> *What am I missing here?*
>
>
> BTW, I have also experienced very strange behavior regrading my producer
> performance (which may or may not be related to the issue above).
>
> When I have defined a topic with 1 partition I got much better throughput
> comparing to a topic with 3 partitions. A producer sending messages to a
> topic with 3 partitions had much better throughput comparing to a topic
> with 12 partitions.
>
> I would expect to have best performance for the topic with 12 partitions
> since I have 3 machines running a broker each of with 4 disks (the broker
> is configured to use all 4 disks)
>
> *Is there any logical explanation for this behavior?*
>