Posted to users@kafka.apache.org by Andrew Neilson <ar...@gmail.com> on 2013/04/21 10:59:39 UTC

seeing poor consumer performance in 0.7.2

I am currently running a deployment with 3 brokers, 3 ZK nodes, 3 producers, 2
consumers, and 15 topics. I should first point out that this is my first
project using Kafka ;). The issue I'm seeing is that the consumers are only
processing about 15 messages per second from what should be the largest topic
they consume (we're sending 200-400 ~300-byte messages per second to this
topic). I should note that I'm using the high-level ZK consumer and ZK 3.4.3.

I have a strong feeling I have not configured things properly so I could
definitely use some guidance. Here is my broker configuration:

brokerid=1
port=9092
socket.send.buffer=1048576
socket.receive.buffer=1048576
max.socket.request.bytes=104857600
log.dir=/home/kafka/data
num.partitions=1
log.flush.interval=10000
log.default.flush.interval.ms=1000
log.default.flush.scheduler.interval.ms=1000
log.retention.hours=168
log.file.size=536870912
enable.zookeeper=true
zk.connect=XXX
zk.connectiontimeout.ms=1000000

Here is my producer config:

zk.connect=XXX
producer.type=async
compression.codec=0

Here is my consumer config:

zk.connect=XXX
zk.connectiontimeout.ms=100000
groupid=XXX
autooffset.reset=smallest
socket.buffersize=1048576
fetch.size=10485760
queuedchunks.max=10000
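
For reference, the consumer itself follows the 0.7 high-level quickstart
pattern. A rough sketch from memory (the stream and iterator class names
shifted a little between 0.7.x releases, and "mytopic" plus the 8-thread count
are placeholders):

import java.nio.ByteBuffer;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Properties;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

import kafka.consumer.Consumer;
import kafka.consumer.ConsumerConfig;
import kafka.consumer.KafkaStream;
import kafka.javaapi.consumer.ConsumerConnector;
import kafka.message.Message;
import kafka.message.MessageAndMetadata;

public class TopicConsumer {
    public static void main(String[] args) {
        // Same properties as the consumer config above.
        Properties props = new Properties();
        props.put("zk.connect", "XXX");
        props.put("groupid", "XXX");
        props.put("autooffset.reset", "smallest");
        props.put("fetch.size", "10485760");
        ConsumerConnector connector =
            Consumer.createJavaConsumerConnector(new ConsumerConfig(props));

        // Ask for 8 streams so 8 threads can consume in parallel.
        Map<String, Integer> topicCount = new HashMap<String, Integer>();
        topicCount.put("mytopic", 8);
        Map<String, List<KafkaStream<Message>>> streams =
            connector.createMessageStreams(topicCount);

        ExecutorService pool = Executors.newFixedThreadPool(8);
        for (final KafkaStream<Message> stream : streams.get("mytopic")) {
            pool.submit(new Runnable() {
                public void run() {
                    for (MessageAndMetadata<Message> mm : stream) {
                        ByteBuffer payload = mm.message().payload();
                        // handle the ~300-byte message here
                    }
                }
            });
        }
    }
}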

Thanks for any assistance you can provide,

Andrew

Re: seeing poor consumer performance in 0.7.2

Posted by Andrew Neilson <ar...@gmail.com>.
Oh... and at this point I'm talking about consumers that do no processing and
don't even produce any output. They simply send UDP packets to graphite.
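
(For the record, the graphite write is just the plaintext protocol over UDP. A
minimal sketch, with the graphite host and carbon's default plaintext port as
placeholders:)

import java.net.DatagramPacket;
import java.net.DatagramSocket;
import java.net.InetAddress;

public class GraphiteUdp {
    // Graphite plaintext protocol: "<metric path> <value> <epoch seconds>\n"
    public static void send(String metric, double value) throws Exception {
        String line = metric + " " + value + " "
            + (System.currentTimeMillis() / 1000L) + "\n";
        byte[] bytes = line.getBytes("UTF-8");
        DatagramSocket socket = new DatagramSocket();
        try {
            socket.send(new DatagramPacket(bytes, bytes.length,
                InetAddress.getByName("graphite.example.com"), 2003));
        } finally {
            socket.close();
        }
    }
}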


Re: seeing poor consumer performance in 0.7.2

Posted by Andrew Neilson <ar...@gmail.com>.
The only other thing being written to these disks is log4j output (kafka.out),
so technically they are not dedicated to the data logs. The disks are 250GB
SATA.



Re: seeing poor consumer performance in 0.7.2

Posted by Neha Narkhede <ne...@gmail.com>.
>    - Decreased num.partitions and log.flush.interval on the brokers from
>    64/10k to 32/100 in order to lower the average flush time (we were
>    previously always hitting the default flush interval since no partitions
>    ever accumulated 10k messages)

Hmm, that is a pretty low value for the flush interval and will lead to higher
disk usage. Do you use dedicated disks for the kafka data logs? Also, what sort
of disks do you use?
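
(Back-of-the-envelope: if log.flush.interval is a per-partition message count,
then flush.interval=100 with ~300-byte messages forces a flush roughly every
30 KB per partition, and across 32 partitions that is a lot of small writes for
a single SATA disk to absorb.)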

Thanks,
Neha


Re: seeing poor consumer performance in 0.7.2

Posted by Andrew Neilson <ar...@gmail.com>.
Thanks Jun, your suggestion helped me quite a bit.

Since earlier this week I've been able to work out the issues (at least it
seems that way for now). My consumer is now processing messages at roughly the
rate they are being produced, with an acceptable amount of end-to-end lag. Here
is an overview of the issues I had. Let me know if the way I resolved things
makes sense:

   - many serialization errors in the producers. Fixing these eliminated
   what were previously perceived as lost or delayed messages.
   - one of the producers was not accessible through the VIP we were sending
   messages to. There was also a bug in the healthcheck that caused the
   NetScaler to drop one of the producers. Both of these contributed to
   sending too many messages to one producer, which filled up its blocking
   queues.
   - I had to increase queue.size on the producers several times (currently
   at 320k). This may now be unnecessarily high given my next point.
   - Increased batch.size on the producers several times. The last increase
   (batch.size=1600) is what finally got things going at the rate I am happy
   with (the resulting producer config is sketched after this list).
   - Decreased num.partitions and log.flush.interval on the brokers from
   64/10k to 32/100 in order to lower the average flush time (we were
   previously always hitting the default flush interval since no partitions
   ever accumulated 10k messages). The flush times are currently < 100ms (not
   sure if this is too low but everything seems to be working). The avg flush
   time was previously 1 second.
   - Increased fetch.size and queuedchunks.max on the consumers several
   times and ended at 80MB/100k. This was before I made a bunch of the changes
   on the producer side so these may be unnecessarily high as well.
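
For reference, the async producer config that is working now looks roughly like
this (zk.connect elided as before, and queue.size written out from the 320k
above):

zk.connect=XXX
producer.type=async
compression.codec=0
queue.size=320000
batch.size=1600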

Once again, thanks for all of the help. I'm curious to know which if any of
the changes I made were unnecessary.

Andrew


Re: seeing poor consumer performance in 0.7.2

Posted by Jun Rao <ju...@gmail.com>.
You can run kafka.tools.ConsumerOffsetChecker to check the consumer lag. If
the consumer is lagging, this indicates a problem on the consumer side.
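
For example (invocation from memory; check the tool's --help output for the
exact option names in 0.7):

bin/kafka-run-class.sh kafka.tools.ConsumerOffsetChecker --zkconnect XXX --group XXX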

Thanks,

Jun


Re: seeing poor consumer performance in 0.7.2

Posted by Andrew Neilson <ar...@gmail.com>.
Hmm it is highly unlikely that that is the culprit... There is lots of
bandwidth available for me to use. I will definitely keep that in mind
though. I was working on this today and have some tidbits of additional
information and thoughts that you might be able to shed some light on:

   - I mentioned I have 2 consumers, but each consumer is running with 8
   threads for this topic (and each consumer has 8 cores available).
   - When I initially asked for help the brokers were configured with
   num.partitions=1; I've since tried higher numbers (3, 64) and haven't seen
   much of an improvement, aside from forcing both consumer apps to handle
   messages (with the overall performance not changing much); see the
   partition arithmetic after this list.
   - I ran into this article
   http://riccomini.name/posts/kafka/2012-10-05-kafka-consumer-memory-tuning/
   and tried a variety of values for queuedchunks.max and fetch.size with no
   significant results (meaning it did not achieve the goal of me consistently
   processing hundreds or thousands of messages per second, which is similar
   to the rate of input). I would not be surprised if I'm wrong, but this made
   me start to think that the problem may lie outside of the consumers.
   - Would the combination of a high number of partitions (64) and a high
   log.flush.interval (10k) prevent logs from flushing as often as they need
   to for my desired rate of consumption (even with
   log.default.flush.interval.ms=1000?)
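
(The partition arithmetic mentioned above, assuming I understand the high-level
consumer correctly: a partition is owned by at most one consumer thread in a
group, so with num.partitions=1 across 3 brokers the topic had only 3
partitions in total and at most 3 of our 16 threads could ever receive
messages; at num.partitions=64 there are 192 partitions, enough to keep all 16
threads busy.)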

Despite the changes I mentioned, the behaviour is the same: the consumers
receive large spikes of messages mixed with periods of complete inactivity, and
overall there is a long delay (about 2 minutes) between messages being written
and messages being read. Anyway... as always, I greatly appreciate any help.


Re: seeing poor consumer performance in 0.7.2

Posted by Jun Rao <ju...@gmail.com>.
Is your network shared? If so, another possibility is that some other apps are
consuming the bandwidth.

Thanks,

Jun


Re: seeing poor consumer performance in 0.7.2

Posted by Andrew Neilson <ar...@gmail.com>.
Thanks very much for the reply Neha! So I swapped out the consumer that
processes the messages with one that just prints them. It does indeed achieve a
much better rate at peaks but can still nearly zero out (if not completely). I
plotted the messages printed in graphite to show the behaviour I'm seeing (this
is messages printed per second):

https://www.dropbox.com/s/7u7uyrefw6inetu/Screen%20Shot%202013-04-21%20at%2011.44.38%20AM.png

The peaks are over ten thousand per second and the troughs can go below 10
per second just prior to another peak. I know that there are plenty of
messages available because the ones currently being processed are still
from Friday afternoon, so this may or may not have something to do with
this pattern.

Is there anything I can do to avoid the periods of lower performance?
Ideally I would be processing messages as soon as they are written.
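
(The "just prints them" consumer is essentially a per-second counter. A minimal
sketch of the measurement, assuming each consumer thread calls
RateLogger.count.incrementAndGet() for every message it pulls:)

import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

public class RateLogger {
    static final AtomicLong count = new AtomicLong();

    public static void start() {
        ScheduledExecutorService timer =
            Executors.newSingleThreadScheduledExecutor();
        // Once a second, print and reset the number of messages seen;
        // this per-second number is what gets graphed.
        timer.scheduleAtFixedRate(new Runnable() {
            public void run() {
                System.out.println("msgs/sec: " + count.getAndSet(0));
            }
        }, 1, 1, TimeUnit.SECONDS);
    }
}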


Re: seeing poor consumer performance in 0.7.2

Posted by Neha Narkhede <ne...@gmail.com>.
Some of the reasons a consumer is slow are -
1. Small fetch size
2. Expensive message processing

Are you processing the received messages in the consumer? Have you tried
running the console consumer for this topic to see how it performs?
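
For example, with the stock tool (topic name is a placeholder):

bin/kafka-console-consumer.sh --zookeeper XXX --topic mytopic --from-beginning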

Thanks,
Neha
