Posted to users@kafka.apache.org by Rajiv Kurian <ra...@signalfx.com> on 2015/09/22 20:46:49 UTC

Debugging high log flush latency on a broker.

I have a particular broker (version 0.8.2.1) in a cluster receiving about
15000 messages/second of around 100 bytes each (bytes-in / messages-in).
This broker has bursts of really high log flush latency p95s: the latency
sometimes goes above 1.5 seconds from a steady state of < 20 ms.

Running top on the box shows bursts of wa (I/O wait) of over 10% during
these high-latency periods. Running iostat at the same time shows more than
16000 block writes/sec during these periods, versus around 3000 - 7000 block
writes/sec during a normal run. So it does seem like during the latency
spikes we are simply trying to do too much I/O.
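
Back-of-the-envelope, assuming iostat's block counts are 512-byte sectors
(its default unit), the write rates compare like this:

    # Rough comparison of produce traffic vs. observed disk writes.
    # Assumes iostat's Blk_wrtn/s is in 512-byte sectors (its default unit).
    msgs_per_sec = 15000
    msg_bytes = 100
    produce_mb_per_sec = msgs_per_sec * msg_bytes / 1e6      # ~1.5 MB/s coming in

    normal_blocks_per_sec = 5000                             # midpoint of 3000 - 7000
    burst_blocks_per_sec = 16000
    normal_mb_per_sec = normal_blocks_per_sec * 512 / 1e6    # ~2.6 MB/s written
    burst_mb_per_sec = burst_blocks_per_sec * 512 / 1e6      # ~8.2 MB/s written

    print(produce_mb_per_sec, normal_mb_per_sec, burst_mb_per_sec)

So the bursts write several times what the produce traffic alone would
account for, even though bytes-in stays flat.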

I have tried replacing the broker several times to see if that would help,
but it hasn't so far, so my guess is there is something peculiar about one
or more of the partitions assigned to that broker. The other observation is
that messages-in/sec and bytes-in/sec DO NOT go up when the broker is
misbehaving. I have also noticed that every time I replace the broker, the
new broker takes much longer to bootstrap than other brokers I have
replaced: it takes up to 20 minutes for the log flush latency on the newly
bootstrapped broker (with the same ID as the one it replaced) to return to
normal levels.

Any hints on what else I can try (replacing the broker does not seem to
help)? 15000 messages/second of around 100 bytes each on a dual-SSD, 30 GB
RAM, 16-core box seems very doable, and the same load doesn't bother the
other brokers in the cluster.

Also, any hints on how I can find the exact topics/partitions assigned to
this broker? I know in ZK we can see the partition -> broker mapping, but I
am looking for a broker -> partition mapping. I can't be sure whether the
load that is causing this problem comes from leader traffic or follower
traffic. What is weird is that I rarely, if ever, see other brokers in the
cluster have the same problem. With 3-way replication (leader + 2 replicas)
I'd imagine that the same workload would cause problems on other brokers
too.
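
(Inverting what's in ZK by hand is doable; a rough sketch of that, assuming
the kazoo Python client and the 0.8.x znode layout, is below, but maybe
there's a cleaner way.)

    # Rough sketch: list the partitions assigned to one broker and whether it
    # currently leads them. Assumes the kazoo client and Kafka 0.8.x's ZK
    # layout: /brokers/topics/<t> holds the replica assignment, and
    # /brokers/topics/<t>/partitions/<p>/state holds the current leader.
    import json
    from kazoo.client import KazooClient

    BROKER_ID = 5                          # hypothetical broker id to inspect
    zk = KazooClient(hosts="zk1:2181")     # hypothetical ZK connect string
    zk.start()

    for topic in zk.get_children("/brokers/topics"):
        data, _ = zk.get("/brokers/topics/%s" % topic)
        assignment = json.loads(data.decode("utf-8"))["partitions"]
        for partition, replicas in assignment.items():
            if BROKER_ID not in replicas:
                continue
            state, _ = zk.get("/brokers/topics/%s/partitions/%s/state"
                              % (topic, partition))
            leader = json.loads(state.decode("utf-8"))["leader"]
            role = "leader" if leader == BROKER_ID else "follower"
            print("%s-%s: %s (replicas=%s)" % (topic, partition, role, replicas))

    zk.stop()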

Thanks,
Rajiv

Re: Debugging high log flush latency on a broker.

Posted by Rajiv Kurian <ra...@signalfx.com>.
Thanks Steve. I followed your suggestion to get the topics.

What is weird is that the bad broker does not receive any more traffic
(messages or bytes) when this happens. Also, I have more than 2 GB (out of
28 GB) of free memory according to collectd and to running vmstat on the
box, so I'd hope things don't get paged out. The incoming traffic is paltry
(15k messages/sec on such a big box), so I can't understand how this is
happening.

All my config is the same across all brokers. They are running on the same
EC2 instance type with the same OS configuration. I don't see any network
hiccup either; network-in is pretty steady. When the symptom starts, the CPU
spikes from 20% to 45% and then stays at around 22-24% for the rest of the
episode.
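
Something like the rough sketch below (nothing Kafka-specific) should at
least show whether Dirty/Writeback in /proc/meminfo balloon right when the
flush latency spikes:

    # Rough sketch: sample page-cache / writeback counters from /proc/meminfo
    # every few seconds so they can be lined up against the log flush latency
    # spikes.
    import time

    FIELDS = ("MemFree", "Cached", "Dirty", "Writeback")

    def meminfo():
        out = {}
        with open("/proc/meminfo") as f:
            for line in f:
                key, rest = line.split(":", 1)
                if key in FIELDS:
                    out[key] = int(rest.strip().split()[0])   # value is in kB
        return out

    while True:
        snap = meminfo()
        print(time.strftime("%H:%M:%S"),
              " ".join("%s=%dkB" % (k, snap[k]) for k in FIELDS))
        time.sleep(5)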

On Tue, Sep 22, 2015 at 12:34 PM, Steve Miller <st...@idrathernotsay.com>
wrote:

>    There may be more elegant ways to do this, but I'd think that you could
> just ls all the directories specified in log.dirs in your server.properties
> file for Kafka.  You should see directories for each
> topicname-partitionnumber there.
>
>    Offhand it sounds to me like maybe something's evicting pages from the
> buffer cache from time to time, causing Kafka to do a lot more I/O all of a
> sudden than usual.  Why that happens, I don't know, but that'd be my guess:
> either something needs more pages for applications all of a sudden, or like
> you said, there's some characteristic of the traffic for the partitions on
> this broker that isn't the same as it is for all the other brokers.
>
>    Filesystem type and creation parameters are the same as on the other
> hosts?  sysctl stuff all tuned the same way (assuming this is Linux, that
> is)?
>
>    Any chance there's some sort of network hiccup that makes some follower
> get a little behind, and then the act of it trying to catch back up pushes
> the I/O past what it can sustain steady-state?  (If something gets
> significantly behind, depending on the size of your buffer cache relative
> to the retention in your topics, you could have something, say, start
> reading from the first offset in that topic and partition, which might well
> require going to disk rather than being satisfied from the buffer cache.  I
> could see that slowing I/O enough, if it's on the edge otherwise, that now
> you can't keep up with the write rate until that consumer gets caught up.)
>
>    The other idea would be that, I dunno, maybe there's a topic where the
> segment size is different, and so when it goes to delete a segment it's
> spending a lot more time putting blocks from that file back onto the
> filesystem free list (or whatever data structure it is these days (-: ).
>
>         -Steve
>
> On Tue, Sep 22, 2015 at 11:46:49AM -0700, Rajiv Kurian wrote:
> > Also any hints on how I can find the exact topic/partitions assigned to
> > this broker? I know in ZK we can see the partition -> broker mapping,
> but I
> > am looking for a broker -> partition mapping. I can't be sure if the load
> > that is causing this problem is because of leader traffic or follower
> > traffic. What is weird is that I rarely if ever see other brokers in the
> > cluster have the same problem. With 3 way replication (leader + 2
> replicas)
> > I'd imagine that the same work load would cause problems on other brokers
> > too.
>

Re: Debugging high log flush latency on a broker.

Posted by Steve Miller <st...@idrathernotsay.com>.
   There may be more elegant ways to do this, but I'd think that you could just ls all the directories specified in log.dirs in your server.properties file for Kafka.  You should see directories for each topicname-partitionnumber there.
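
   Something quick and dirty along these lines would do it (the path is a placeholder for whatever log.dirs points at), and totalling the bytes per partition also makes any unusually large partition stand out:

    # Rough sketch of the "ls log.dirs" idea: list topic-partition directories
    # under the broker's log dir and total their sizes. LOG_DIR is a
    # placeholder for the broker's actual log.dirs setting.
    import os

    LOG_DIR = "/var/kafka/data"            # hypothetical log.dirs value

    sizes = {}
    for entry in os.listdir(LOG_DIR):
        path = os.path.join(LOG_DIR, entry)
        if not os.path.isdir(path):        # skip checkpoint files etc.
            continue
        sizes[entry] = sum(os.path.getsize(os.path.join(path, f))
                           for f in os.listdir(path))

    for name, total in sorted(sizes.items(), key=lambda kv: -kv[1]):
        print("%-50s %10.1f MB" % (name, total / 1e6))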

   Offhand it sounds to me like maybe something's evicting pages from the buffer cache from time to time, causing Kafka to do a lot more I/O all of a sudden than usual.  Why that happens, I don't know, but that'd be my guess: either something needs more pages for applications all of a sudden, or like you said, there's some characteristic of the traffic for the partitions on this broker that isn't the same as it is for all the other brokers.

   Filesystem type and creation parameters are the same as on the other hosts?  sysctl stuff all tuned the same way (assuming this is Linux, that is)?
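
   If it is Linux, the vm.dirty_* sysctls are the ones that most directly shape how writeback gets batched up, so dumping them on each broker for a side-by-side comparison is cheap; a quick sketch:

    # Quick sketch: print the writeback-related sysctls so brokers can be
    # compared side by side. These control when dirty pages are flushed,
    # which is what log flush latency ultimately sits on top of.
    SYSCTLS = (
        "vm.dirty_background_ratio",
        "vm.dirty_ratio",
        "vm.dirty_expire_centisecs",
        "vm.dirty_writeback_centisecs",
        "vm.swappiness",
    )

    for name in SYSCTLS:
        path = "/proc/sys/" + name.replace(".", "/")
        with open(path) as f:
            print("%s = %s" % (name, f.read().strip()))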

   Any chance there's some sort of network hiccup that makes some follower get a little behind, and then the act of it trying to catch back up pushes the I/O past what it can sustain steady-state?  (If something gets significantly behind, depending on the size of your buffer cache relative to the retention in your topics, you could have something, say, start reading from the first offset in that topic and partition, which might well require going to disk rather than being satisfied from the buffer cache.  I could see that slowing I/O enough, if it's on the edge otherwise, that now you can't keep up with the write rate until that consumer gets caught up.)

   The other idea would be that, I dunno, maybe there's a topic where the segment size is different, and so when it goes to delete a segment it's spending a lot more time putting blocks from that file back onto the filesystem free list (or whatever data structure it is these days (-: ).
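
   On 0.8.x the per-topic overrides (segment.bytes, retention settings, and so on) live under /config/topics in ZK, so a quick pass over those would show whether any topic is configured differently from the rest; a sketch, again assuming the kazoo client:

    # Sketch: print per-topic config overrides from ZooKeeper to spot a topic
    # whose segment size (or retention) differs from the others. Assumes
    # kazoo and the 0.8.x layout (/config/topics/<topic>).
    import json
    from kazoo.client import KazooClient

    zk = KazooClient(hosts="zk1:2181")     # hypothetical ZK connect string
    zk.start()

    for topic in zk.get_children("/config/topics"):
        data, _ = zk.get("/config/topics/%s" % topic)
        overrides = json.loads(data.decode("utf-8")).get("config", {})
        if overrides:                      # topics with no overrides show {}
            print(topic, overrides)

    zk.stop()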

	-Steve

On Tue, Sep 22, 2015 at 11:46:49AM -0700, Rajiv Kurian wrote:
> Also any hints on how I can find the exact topic/partitions assigned to
> this broker? I know in ZK we can see the partition -> broker mapping, but I
> am looking for a broker -> partition mapping. I can't be sure if the load
> that is causing this problem is because of leader traffic or follower
> traffic. What is weird is that I rarely if ever see other brokers in the
> cluster have the same problem. With 3 way replication (leader + 2 replicas)
> I'd imagine that the same work load would cause problems on other brokers
> too.