Posted to users@kafka.apache.org by Paul van der Linden <pa...@sportr.co.uk> on 2017/03/23 10:25:17 UTC

Re: Kafka high cpu usage and disconnects

Hi,

I deployed Kafka about a week ago, but there are a few problems with how
Kafka behaves.
The first is the surprisingly high resource usage. One is the memory
(1.5-2 GB for each broker, 3 brokers), although this might be normal. The
other is the CPU usage, which starts at a minimum of 20% on each broker,
which I find strange given the current throughput (< 1 msg/s).

This might have something to do with something else I find strange:
Kafka disconnects clients about every 10-20 minutes per broker. It might
be related to the configuration: deployed in Kubernetes, bootstrapping
with a single DNS name (which is backed by all live Kafka brokers), after
which every broker is reached via its own separate DNS address. This
means that a client is connected twice to one of the brokers. The reason
for the bootstrap DNS name is to make sure I don't have to update all
clients to include other brokers.
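The double-connection effect described above can be sketched as a simplified model. All DNS names below are hypothetical, and this is not the actual client code; real clients also eventually recycle the bootstrap socket:

```python
# Sketch: why a shared bootstrap DNS name plus per-broker DNS names can leave
# one broker with two connections from the same client. The client first dials
# the shared name (which resolves to *some* live broker), then opens one
# connection per broker using each broker's own advertised name.

BOOTSTRAP_DNS = "kafka.example.internal"          # backed by all live brokers
BROKER_DNS = ["kafka-0.example.internal",
              "kafka-1.example.internal",
              "kafka-2.example.internal"]

def connect(cluster_dns_to_broker, bootstrap=BOOTSTRAP_DNS):
    """Return the list of (dns_name, broker_id) connections a client holds."""
    connections = []
    # 1. Bootstrap: the shared name resolves to some broker.
    bootstrap_broker = cluster_dns_to_broker[bootstrap]
    connections.append((bootstrap, bootstrap_broker))
    # 2. After fetching metadata: connect to every broker by its own name.
    for name in BROKER_DNS:
        connections.append((name, cluster_dns_to_broker[name]))
    return connections

# Suppose the bootstrap name currently resolves to broker 0:
resolution = {BOOTSTRAP_DNS: 0,
              BROKER_DNS[0]: 0,
              BROKER_DNS[1]: 1,
              BROKER_DNS[2]: 2}
conns = connect(resolution)
broker_ids = [broker for _, broker in conns]
# Broker 0 now holds two client connections; brokers 1 and 2 hold one each.
```

Until the bootstrap connection is closed (for example by the idle reaper), the broker behind the bootstrap name sees the client twice.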

Any advice on how to solve these 2 problems?

Thanks,
Paul
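One broker setting that matches the observed 10-20 minute disconnect cadence is the idle-connection reaper, which closes connections idle longer than `connections.max.idle.ms`. A hedged sketch of the relevant `server.properties` entry (the value shown is the default, not a recommendation):

```properties
# Brokers close connections that have been idle longer than this.
# The default of 600000 ms (10 min) lines up with disconnects every
# 10-20 minutes on a near-idle cluster.
connections.max.idle.ms=600000
```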


Re: Kafka high cpu usage and disconnects

Posted by Paul van der Linden <pa...@sportr.co.uk>.
If I run 3 brokers in a cluster on localhost, the CPU usage is virtually
zero. I'm not sure why, in other environments, the minimum usage of each
broker is at least 13% (with zero producers/consumers); that doesn't sound
normal.

On Thu, Mar 23, 2017 at 4:48 PM, Paul van der Linden <pa...@sportr.co.uk>
wrote:

> Indeed, it doesn't seem to be the clients. Maybe it already spends 13% of
> CPU just maintaining the cluster, with no connections at all except
> ZooKeeper and the other 2 brokers. This is the CPU usage:
>
> CPU SAMPLES BEGIN (total = 86359) Thu Mar 23 16:47:26 2017
> rank   self  accum   count trace method
>    1 87.23% 87.23%   75327 300920 sun.nio.ch.EPollArrayWrapper.epollWait
>    2 12.48% 99.71%   10780 300518 java.net.PlainSocketImpl.socketAccept
>    3  0.06% 99.77%      51 300940 sun.nio.ch.FileDispatcherImpl.write0
>    4  0.02% 99.79%      20 301559 sun.nio.ch.FileDispatcherImpl.read0
>    5  0.01% 99.80%      10 301567 org.apache.log4j.Category.getEffectiveLevel
> CPU SAMPLES END
>
> On Thu, Mar 23, 2017 at 1:02 PM, Jaikiran Pai <ja...@gmail.com>
> wrote:
>
>> One thing that you might want to check is the number of consumers that
>> are connected/consuming against this Kafka setup. We have consistently
>> noticed that the CPU usage of the broker is very high even with very few
>> consumers (around 10 Java consumers). There's even a JIRA for it. From what
>> I remember, it had to do with the constant heartbeat and other such network
>> activities that happen between these consumers and the broker. We had this
>> issue from the 0.8.x days until 0.10.0.1. We just migrated to 0.10.2.0 and we
>> will have to see if it is still reproducible there.
>>
>> I don't mean to say you are running into the same issue, but you can
>> check that aspect as well (maybe shut down all consumers and see how the
>> broker CPU behaves).
>>
>> -Jaikiran
>>
>>
>> On Thursday 23 March 2017 06:15 PM, Paul van der Linden wrote:
>>
>>> Thanks. I managed to get a cpu dump from staging.
>>>
>>> The output:
>>> THREAD START (obj=50000427, id = 200004, name="RMI TCP Accept-0",
>>> group="system")
>>> THREAD START (obj=50000427, id = 200001, name="main", group="main")
>>> THREAD START (obj=50000427, id = 200005, name="SensorExpiryThread",
>>> group="main")
>>> THREAD START (obj=500008e6, id = 200006,
>>> name="ThrottledRequestReaper-Fetch", group="main")
>>> THREAD START (obj=500008e6, id = 200007,
>>> name="ThrottledRequestReaper-Produce", group="main")
>>> THREAD START (obj=50000914, id = 200008,
>>> name="ZkClient-EventThread-18-zookeeper:2181", group="main")
>>> THREAD START (obj=500008e6, id = 200009, name="main-SendThread()",
>>> group="main")
>>> THREAD START (obj=50000950, id = 200010, name="main-EventThread",
>>> group="main")
>>> THREAD START (obj=50000427, id = 200011, name="pool-3-thread-1",
>>> group="main")
>>> THREAD END (id = 200011)
>>> THREAD START (obj=50000427, id = 200012,
>>> name="metrics-meter-tick-thread-1", group="main")
>>> THREAD START (obj=50000427, id = 200014, name="kafka-scheduler-0",
>>> group="main")
>>> THREAD START (obj=50000427, id = 200013, name="kafka-scheduler-1",
>>> group="main")
>>> THREAD START (obj=50000427, id = 200015, name="kafka-scheduler-2",
>>> group="main")
>>> THREAD START (obj=50000c33, id = 200016, name="kafka-log-cleaner-thread-0",
>>> group="main")
>>> THREAD START (obj=50000427, id = 200017,
>>> name="kafka-network-thread-2-PLAINTEXT-0", group="main")
>>> THREAD START (obj=50000427, id = 200018,
>>> name="kafka-network-thread-2-PLAINTEXT-1", group="main")
>>> THREAD START (obj=50000427, id = 200019,
>>> name="kafka-network-thread-2-PLAINTEXT-2", group="main")
>>> THREAD START (obj=50000427, id = 200020,
>>> name="kafka-socket-acceptor-PLAINTEXT-9092", group="main")
>>> THREAD START (obj=500008e6, id = 200021, name="ExpirationReaper-2",
>>> group="main")
>>> THREAD START (obj=500008e6, id = 200022, name="ExpirationReaper-2",
>>> group="main")
>>> THREAD START (obj=50000427, id = 200023,
>>> name="metrics-meter-tick-thread-2", group="main")
>>> THREAD START (obj=50000427, id = 200024, name="kafka-scheduler-3",
>>> group="main")
>>> THREAD START (obj=50000427, id = 200025, name="kafka-scheduler-4",
>>> group="main")
>>> THREAD START (obj=50000427, id = 200026, name="kafka-scheduler-5",
>>> group="main")
>>> THREAD START (obj=50000427, id = 200027, name="kafka-scheduler-6",
>>> group="main")
>>> THREAD START (obj=500008e6, id = 200028, name="ExpirationReaper-2",
>>> group="main")
>>> THREAD START (obj=500008e6, id = 200029, name="ExpirationReaper-2",
>>> group="main")
>>> THREAD START (obj=500008e6, id = 200030, name="ExpirationReaper-2",
>>> group="main")
>>> THREAD START (obj=50000427, id = 200031, name="group-metadata-manager-0",
>>> group="main")
>>> THREAD START (obj=50000427, id = 200032, name="kafka-request-handler-0",
>>> group="main")
>>> THREAD START (obj=50000427, id = 200037, name="kafka-request-handler-5",
>>> group="main")
>>> THREAD START (obj=50000427, id = 200036, name="kafka-request-handler-4",
>>> group="main")
>>> THREAD START (obj=50000427, id = 200035, name="kafka-request-handler-3",
>>> group="main")
>>> THREAD START (obj=50000427, id = 200034, name="kafka-request-handler-2",
>>> group="main")
>>> THREAD START (obj=50000427, id = 200033, name="kafka-request-handler-1",
>>> group="main")
>>> THREAD START (obj=50000427, id = 200038, name="kafka-request-handler-6",
>>> group="main")
>>> THREAD START (obj=50000427, id = 200039, name="kafka-request-handler-7",
>>> group="main")
>>> THREAD START (obj=50000427, id = 200040, name="kafka-scheduler-7",
>>> group="main")
>>> THREAD START (obj=50000427, id = 200041, name="kafka-scheduler-8",
>>> group="main")
>>> THREAD START (obj=50000ee2, id = 200042, name="ReplicaFetcherThread-0-0",
>>> group="main")
>>> THREAD START (obj=50000ee2, id = 200043, name="ReplicaFetcherThread-0-1",
>>> group="main")
>>> THREAD START (obj=50000427, id = 200044, name="kafka-scheduler-9",
>>> group="main")
>>> THREAD START (obj=50000427, id = 200045, name="executor-Fetch",
>>> group="main")
>>> TRACE 300920:
>>> sun.nio.ch.EPollArrayWrapper.epollWait(EPollArrayWrapper.java:Unknown line)
>>> sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:269)
>>> sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:93)
>>> sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:86)
>>> TRACE 300518:
>>> java.net.PlainSocketImpl.socketAccept(PlainSocketImpl.java:Unknown line)
>>> java.net.AbstractPlainSocketImpl.accept(AbstractPlainSocketImpl.java:409)
>>> java.net.ServerSocket.implAccept(ServerSocket.java:545)
>>> java.net.ServerSocket.accept(ServerSocket.java:513)
>>> TRACE 300940:
>>> sun.nio.ch.FileDispatcherImpl.write0(FileDispatcherImpl.java:Unknown line)
>>> sun.nio.ch.SocketDispatcher.write(SocketDispatcher.java:47)
>>> sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:93)
>>> sun.nio.ch.IOUtil.write(IOUtil.java:65)
>>> TRACE 301003:
>>> org.xerial.snappy.SnappyNative.rawUncompress(SnappyNative.java:Unknown line)
>>> org.xerial.snappy.Snappy.rawUncompress(Snappy.java:474)
>>> org.xerial.snappy.Snappy.uncompress(Snappy.java:513)
>>> org.xerial.snappy.SnappyInputStream.readFully(SnappyInputStream.java:147)
>>> TRACE 300979:
>>> sun.nio.ch.FileDispatcherImpl.pread0(FileDispatcherImpl.java:Unknown line)
>>> sun.nio.ch.FileDispatcherImpl.pread(FileDispatcherImpl.java:52)
>>> sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:220)
>>> sun.nio.ch.IOUtil.read(IOUtil.java:197)
>>> TRACE 301630:
>>> sun.nio.ch.EPollArrayWrapper.epollCtl(EPollArrayWrapper.java:Unknown line)
>>> sun.nio.ch.EPollArrayWrapper.updateRegistrations(EPollArrayWrapper.java:299)
>>> sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:268)
>>> sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:93)
>>> TRACE 301259:
>>> sun.misc.Unsafe.unpark(Unsafe.java:Unknown line)
>>> java.util.concurrent.locks.LockSupport.unpark(LockSupport.java:141)
>>> java.util.concurrent.locks.AbstractQueuedSynchronizer.unparkSuccessor(AbstractQueuedSynchronizer.java:662)
>>> java.util.concurrent.locks.AbstractQueuedSynchronizer.release(AbstractQueuedSynchronizer.java:1264)
>>> TRACE 301559:
>>> sun.nio.ch.FileDispatcherImpl.read0(FileDispatcherImpl.java:Unknown line)
>>> sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
>>> sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
>>> sun.nio.ch.IOUtil.read(IOUtil.java:197)
>>> TRACE 300590:
>>> java.lang.ClassLoader.defineClass1(ClassLoader.java:Unknown line)
>>> java.lang.ClassLoader.defineClass(ClassLoader.java:763)
>>> java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
>>> java.net.URLClassLoader.defineClass(URLClassLoader.java:467)
>>> TRACE 301643:
>>> scala.Tuple2.equals(Tuple2.scala:20)
>>> java.util.concurrent.ConcurrentHashMap.get(ConcurrentHashMap.java:940)
>>> kafka.utils.Pool.get(Pool.scala:69)
>>> kafka.server.ReplicaManager.getPartition(ReplicaManager.scala:280)
>>> TRACE 300592:
>>> java.util.zip.ZipFile.read(ZipFile.java:Unknown line)
>>> java.util.zip.ZipFile.access$1400(ZipFile.java:60)
>>> java.util.zip.ZipFile$ZipFileInputStream.read(ZipFile.java:717)
>>> java.util.zip.ZipFile$ZipFileInflaterInputStream.fill(ZipFile.java:419)
>>> TRACE 301018:
>>> kafka.utils.CoreUtils$.crc32(CoreUtils.scala:148)
>>> kafka.message.Message.computeChecksum(Message.scala:216)
>>> kafka.message.Message.isValid(Message.scala:226)
>>> kafka.message.Message.ensureValid(Message.scala:232)
>>> TRACE 301561:
>>> java.io.FileDescriptor.sync(FileDescriptor.java:Unknown line)
>>> kafka.server.OffsetCheckpoint.liftedTree1$1(OffsetCheckpoint.scala:62)
>>> kafka.server.OffsetCheckpoint.write(OffsetCheckpoint.scala:49)
>>> kafka.server.ReplicaManager$$anonfun$checkpointHighWatermarks$2.apply(ReplicaManager.scala:945)
>>> TRACE 301422:
>>> java.util.Arrays.copyOf(Arrays.java:3332)
>>> java.lang.AbstractStringBuilder.expandCapacity(AbstractStringBuilder.java:137)
>>> java.lang.AbstractStringBuilder.ensureCapacityInternal(AbstractStringBuilder.java:121)
>>> java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:421)
>>> CPU SAMPLES BEGIN (total = 699214) Thu Mar 23 12:41:17 2017
>>> rank   self  accum   count trace method
>>>     1 86.46% 86.46%  604544 300920 sun.nio.ch.EPollArrayWrapper.epollWait
>>>     2 12.62% 99.08%   88254 300518 java.net.PlainSocketImpl.socketAccept
>>>     3  0.11% 99.19%     759 300940 sun.nio.ch.FileDispatcherImpl.write0
>>>     4  0.04% 99.23%     253 301003 org.xerial.snappy.SnappyNative.rawUncompress
>>>     5  0.03% 99.26%     231 300979 sun.nio.ch.FileDispatcherImpl.pread0
>>>     6  0.03% 99.29%     220 301630 sun.nio.ch.EPollArrayWrapper.epollCtl
>>>     7  0.03% 99.32%     219 301259 sun.misc.Unsafe.unpark
>>>     8  0.02% 99.34%     145 301559 sun.nio.ch.FileDispatcherImpl.read0
>>>     9  0.01% 99.36%      89 300590 java.lang.ClassLoader.defineClass1
>>>    10  0.01% 99.37%      87 301643 scala.Tuple2.equals
>>>    11  0.01% 99.38%      79 300592 java.util.zip.ZipFile.read
>>>    12  0.01% 99.39%      79 301018 kafka.utils.CoreUtils$.crc32
>>>    13  0.01% 99.40%      78 301561 java.io.FileDescriptor.sync
>>>    14  0.01% 99.41%      72 301422 java.util.Arrays.copyOf
>>> CPU SAMPLES END
>>>
>>> The constant disconnects seem to happen far more often than the 10-minute
>>> default would explain. I suspect this has something to do with the double
>>> connections, which I'm not sure how to get around.
>>>
>>> On Thu, Mar 23, 2017 at 11:46 AM, Manikumar <ma...@gmail.com>
>>> wrote:
>>>
>>>> 1. Maybe you can monitor thread-wise CPU usage and correlate it with a
>>>> thread dump to identify the bottleneck.
>>>> 2. The broker config property connections.max.idle.ms is used to close
>>>> idle connections; the default is 10 min.
>>>>
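The hprof `CPU SAMPLES` tables quoted in this thread can be summarized mechanically. Below is a small sketch with my own helper names (not part of Kafka or hprof). One caveat worth knowing when reading these dumps: hprof samples threads blocked in native `epollWait` as if they were running, so the ~87% entry largely reflects idle selector wait rather than real CPU burn:

```python
# Parse hprof "CPU SAMPLES" rank lines of the form:
#   1 87.23% 87.23%   75327 300920 sun.nio.ch.EPollArrayWrapper.epollWait
# and collect (method, self_percent, sample_count) tuples.
import re

SAMPLE_LINE = re.compile(
    r"^\s*(\d+)\s+([\d.]+)%\s+([\d.]+)%\s+(\d+)\s+(\d+)\s+(\S+)")

def parse_cpu_samples(text):
    """Return [(method, self_percent, count)] for every rank line in text."""
    rows = []
    for line in text.splitlines():
        m = SAMPLE_LINE.match(line)
        if m:
            rows.append((m.group(6), float(m.group(2)), int(m.group(4))))
    return rows

# The first sample table from the thread, verbatim:
dump = """\
CPU SAMPLES BEGIN (total = 86359) Thu Mar 23 16:47:26 2017
rank   self  accum   count trace method
   1 87.23% 87.23%   75327 300920 sun.nio.ch.EPollArrayWrapper.epollWait
   2 12.48% 99.71%   10780 300518 java.net.PlainSocketImpl.socketAccept
CPU SAMPLES END
"""
rows = parse_cpu_samples(dump)
# epollWait and socketAccept dominate: both are blocking waits, not CPU work.
```

Seen through this lens, the dump points away from Kafka's request handling and toward idle network threads.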


Re: Kafka high cpu usage and disconnects

Posted by Jaikiran Pai <ja...@gmail.com>.
One thing that you might want to check is the number of consumers that 
are connected/consuming against this Kafka setup. We have consistently 
noticed that the CPU usage of the broker is very high even with very few 
consumers (around 10 Java consumers). There's even a JIRA for it. From
what I remember, it had to do with the constant heartbeat and other such
network activities that happen between these consumers and the broker.
We had this issue from the 0.8.x days until 0.10.0.1. We just migrated to
0.10.2.0 and we will have to see if it is still reproducible there.

I don't mean to say you are running into the same issue, but you can
check that aspect as well (maybe shut down all consumers and see how the
broker CPU behaves).

-Jaikiran
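The point about heartbeat traffic can be put in rough, illustrative numbers. This is my own arithmetic, not from the thread; the variable mirrors the consumer `heartbeat.interval.ms` setting, whose default was 3000 ms in clients of that era:

```python
# Back-of-the-envelope sketch: even idle consumers generate steady broker
# traffic via group heartbeats, before any fetch polling is counted.
heartbeat_interval_ms = 3000   # heartbeat.interval.ms (illustrative default)
consumers = 10

# Each consumer sends one heartbeat per interval to its group coordinator.
requests_per_second = consumers * (1000 / heartbeat_interval_ms)
# ~3.3 heartbeat requests/s for 10 idle consumers.
```

That is small in absolute terms, but each request wakes network and request-handler threads, which can show up as nontrivial CPU on an otherwise idle broker.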

> group="main")
> THREAD START (obj=50000427, id = 200025, name="kafka-scheduler-4",
> group="main")
> THREAD START (obj=50000427, id = 200026, name="kafka-scheduler-5",
> group="main")
> THREAD START (obj=50000427, id = 200027, name="kafka-scheduler-6",
> group="main")
> THREAD START (obj=500008e6, id = 200028, name="ExpirationReaper-2",
> group="main")
> THREAD START (obj=500008e6, id = 200029, name="ExpirationReaper-2",
> group="main")
> THREAD START (obj=500008e6, id = 200030, name="ExpirationReaper-2",
> group="main")
> THREAD START (obj=50000427, id = 200031, name="group-metadata-manager-0",
> group="main")
> THREAD START (obj=50000427, id = 200032, name="kafka-request-handler-0",
> group="main")
> THREAD START (obj=50000427, id = 200037, name="kafka-request-handler-5",
> group="main")
> THREAD START (obj=50000427, id = 200036, name="kafka-request-handler-4",
> group="main")
> THREAD START (obj=50000427, id = 200035, name="kafka-request-handler-3",
> group="main")
> THREAD START (obj=50000427, id = 200034, name="kafka-request-handler-2",
> group="main")
> THREAD START (obj=50000427, id = 200033, name="kafka-request-handler-1",
> group="main")
> THREAD START (obj=50000427, id = 200038, name="kafka-request-handler-6",
> group="main")
> THREAD START (obj=50000427, id = 200039, name="kafka-request-handler-7",
> group="main")
> THREAD START (obj=50000427, id = 200040, name="kafka-scheduler-7",
> group="main")
> THREAD START (obj=50000427, id = 200041, name="kafka-scheduler-8",
> group="main")
> THREAD START (obj=50000ee2, id = 200042, name="ReplicaFetcherThread-0-0",
> group="main")
> THREAD START (obj=50000ee2, id = 200043, name="ReplicaFetcherThread-0-1",
> group="main")
> THREAD START (obj=50000427, id = 200044, name="kafka-scheduler-9",
> group="main")
> THREAD START (obj=50000427, id = 200045, name="executor-Fetch",
> group="main")
> TRACE 300920:
> sun.nio.ch.EPollArrayWrapper.epollWait(EPollArrayWrapper.java:Unknown line)
> sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:269)
> sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:93)
> sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:86)
> TRACE 300518:
> java.net.PlainSocketImpl.socketAccept(PlainSocketImpl.java:Unknown line)
> java.net.AbstractPlainSocketImpl.accept(AbstractPlainSocketImpl.java:409)
> java.net.ServerSocket.implAccept(ServerSocket.java:545)
> java.net.ServerSocket.accept(ServerSocket.java:513)
> TRACE 300940:
> sun.nio.ch.FileDispatcherImpl.write0(FileDispatcherImpl.java:Unknown line)
> sun.nio.ch.SocketDispatcher.write(SocketDispatcher.java:47)
> sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:93)
> sun.nio.ch.IOUtil.write(IOUtil.java:65)
> TRACE 301003:
> org.xerial.snappy.SnappyNative.rawUncompress(SnappyNative.java:Unknown line)
> org.xerial.snappy.Snappy.rawUncompress(Snappy.java:474)
> org.xerial.snappy.Snappy.uncompress(Snappy.java:513)
> org.xerial.snappy.SnappyInputStream.readFully(SnappyInputStream.java:147)
> TRACE 300979:
> sun.nio.ch.FileDispatcherImpl.pread0(FileDispatcherImpl.java:Unknown line)
> sun.nio.ch.FileDispatcherImpl.pread(FileDispatcherImpl.java:52)
> sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:220)
> sun.nio.ch.IOUtil.read(IOUtil.java:197)
> TRACE 301630:
> sun.nio.ch.EPollArrayWrapper.epollCtl(EPollArrayWrapper.java:Unknown line)
> sun.nio.ch.EPollArrayWrapper.updateRegistrations(EPollArrayWrapper.java:299)
> sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:268)
> sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:93)
> TRACE 301259:
> sun.misc.Unsafe.unpark(Unsafe.java:Unknown line)
> java.util.concurrent.locks.LockSupport.unpark(LockSupport.java:141)
> java.util.concurrent.locks.AbstractQueuedSynchronizer.unparkSuccessor(AbstractQueuedSynchronizer.java:662)
> java.util.concurrent.locks.AbstractQueuedSynchronizer.release(AbstractQueuedSynchronizer.java:1264)
> TRACE 301559:
> sun.nio.ch.FileDispatcherImpl.read0(FileDispatcherImpl.java:Unknown line)
> sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
> sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
> sun.nio.ch.IOUtil.read(IOUtil.java:197)
> TRACE 300590:
> java.lang.ClassLoader.defineClass1(ClassLoader.java:Unknown line)
> java.lang.ClassLoader.defineClass(ClassLoader.java:763)
> java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
> java.net.URLClassLoader.defineClass(URLClassLoader.java:467)
> TRACE 301643:
> scala.Tuple2.equals(Tuple2.scala:20)
> java.util.concurrent.ConcurrentHashMap.get(ConcurrentHashMap.java:940)
> kafka.utils.Pool.get(Pool.scala:69)
> kafka.server.ReplicaManager.getPartition(ReplicaManager.scala:280)
> TRACE 300592:
> java.util.zip.ZipFile.read(ZipFile.java:Unknown line)
> java.util.zip.ZipFile.access$1400(ZipFile.java:60)
> java.util.zip.ZipFile$ZipFileInputStream.read(ZipFile.java:717)
> java.util.zip.ZipFile$ZipFileInflaterInputStream.fill(ZipFile.java:419)
> TRACE 301018:
> kafka.utils.CoreUtils$.crc32(CoreUtils.scala:148)
> kafka.message.Message.computeChecksum(Message.scala:216)
> kafka.message.Message.isValid(Message.scala:226)
> kafka.message.Message.ensureValid(Message.scala:232)
> TRACE 301561:
> java.io.FileDescriptor.sync(FileDescriptor.java:Unknown line)
> kafka.server.OffsetCheckpoint.liftedTree1$1(OffsetCheckpoint.scala:62)
> kafka.server.OffsetCheckpoint.write(OffsetCheckpoint.scala:49)
> kafka.server.ReplicaManager$$anonfun$checkpointHighWatermarks$2.apply(ReplicaManager.scala:945)
> TRACE 301422:
> java.util.Arrays.copyOf(Arrays.java:3332)
> java.lang.AbstractStringBuilder.expandCapacity(AbstractStringBuilder.java:137)
> java.lang.AbstractStringBuilder.ensureCapacityInternal(AbstractStringBuilder.java:121)
> java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:421)
> CPU SAMPLES BEGIN (total = 699214) Thu Mar 23 12:41:17 2017
> rank   self  accum   count trace method
>     1 86.46% 86.46%  604544 300920 sun.nio.ch.EPollArrayWrapper.epollWait
>     2 12.62% 99.08%   88254 300518 java.net.PlainSocketImpl.socketAccept
>     3  0.11% 99.19%     759 300940 sun.nio.ch.FileDispatcherImpl.write0
>     4  0.04% 99.23%     253 301003 org.xerial.snappy.SnappyNative.rawUncompress
>     5  0.03% 99.26%     231 300979 sun.nio.ch.FileDispatcherImpl.pread0
>     6  0.03% 99.29%     220 301630 sun.nio.ch.EPollArrayWrapper.epollCtl
>     7  0.03% 99.32%     219 301259 sun.misc.Unsafe.unpark
>     8  0.02% 99.34%     145 301559 sun.nio.ch.FileDispatcherImpl.read0
>     9  0.01% 99.36%      89 300590 java.lang.ClassLoader.defineClass1
>    10  0.01% 99.37%      87 301643 scala.Tuple2.equals
>    11  0.01% 99.38%      79 300592 java.util.zip.ZipFile.read
>    12  0.01% 99.39%      79 301018 kafka.utils.CoreUtils$.crc32
>    13  0.01% 99.40%      78 301561 java.io.FileDescriptor.sync
>    14  0.01% 99.41%      72 301422 java.util.Arrays.copyOf
> CPU SAMPLES END
>
> It seems like the interval of the constant disconnects is far bigger than
> the 10-minute default. I suspect this has something to do with the double
> connects, which I'm not sure how to get around.
>
> On Thu, Mar 23, 2017 at 11:46 AM, Manikumar <ma...@gmail.com>
> wrote:
>
>> 1. Maybe you can monitor thread-wise CPU usage and correlate it with a
>> thread dump to identify the bottleneck.
>> 2. The broker config property connections.max.idle.ms is used to close
>> idle connections. The default is 10 min.
>>
>> On Thu, Mar 23, 2017 at 3:55 PM, Paul van der Linden <pa...@sportr.co.uk>
>> wrote:
>>
>>> Hi,
>>>
>>> I deployed Kafka about a week ago, but there are a few problems with how
>>> Kafka behaves.
>>> The first is the surprisingly high resource usage, one is the memory
>>> (1.5-2 GB for each broker, 3 brokers) although this might be normal. The
>>> other one is the cpu usage, which starts with 20% minimum on each broker,
>>> which I find strange with the current throughput (which is < 1 msg/s).
>>>
>>> This might have something to do with something else which I find strange,
>>> Kafka disconnects clients about every 10-20 minutes per broker. It might
>>> have something to do with the configuration: Deployed in kubernetes,
>>> bootstrapping with a single dns name (which is backed by all alive kafka
>>> brokers), and then every broker has a separate dns address which is used
>>> after the bootstrap. This means that a client is connected twice to one
>>> of the brokers. The reason for the bootstrap dns name is to make sure I
>>> don't have to update all clients to include other brokers.
>>>
>>> Any advice on how to solve these 2 problems?
>>>
>>> Thanks,
>>> Paul
>>>
>>> On Tue, Mar 21, 2017 at 10:30 AM, Paul van der Linden <paul@sportr.co.uk>
>>> wrote:
>>>
>>>> Hi,
>>>>
>>>> I deployed Kafka about a week ago, but there are a few problems with how
>>>> Kafka behaves.
>>>> The first is the surprisingly high resource usage, one is the memory
>>>> (1.5-2 GB for each broker, 3 brokers) although this might be normal. The
>>>> other one is the cpu usage, which starts with 20% minimum on each broker,
>>>> which I find strange with the current throughput (which is < 1 msg/s).
>>>>
>>>> This might have something to do with something else which I find strange,
>>>> Kafka disconnects clients about every 10-20 minutes per broker. It might
>>>> have something to do with the configuration: Deployed in kubernetes,
>>>> bootstrapping with a single dns name (which is backed by all alive kafka
>>>> brokers), and then every broker has a separate dns address which is used
>>>> after the bootstrap. This means that a client is connected twice to one
>>>> of the brokers. The reason for the bootstrap dns name is to make sure I
>>>> don't have to update all clients to include other brokers.
>>>>
>>>> Any advice on how to solve these 2 problems?
>>>>
>>>> Thanks,
>>>> Paul
>>>>


Re: Kafka high cpu usage and disconnects

Posted by Paul van der Linden <pa...@sportr.co.uk>.
Thanks. I managed to get a cpu dump from staging.

The output:
THREAD START (obj=50000427, id = 200004, name="RMI TCP Accept-0",
group="system")
THREAD START (obj=50000427, id = 200001, name="main", group="main")
THREAD START (obj=50000427, id = 200005, name="SensorExpiryThread",
group="main")
THREAD START (obj=500008e6, id = 200006,
name="ThrottledRequestReaper-Fetch", group="main")
THREAD START (obj=500008e6, id = 200007,
name="ThrottledRequestReaper-Produce", group="main")
THREAD START (obj=50000914, id = 200008,
name="ZkClient-EventThread-18-zookeeper:2181", group="main")
THREAD START (obj=500008e6, id = 200009, name="main-SendThread()",
group="main")
THREAD START (obj=50000950, id = 200010, name="main-EventThread",
group="main")
THREAD START (obj=50000427, id = 200011, name="pool-3-thread-1",
group="main")
THREAD END (id = 200011)
THREAD START (obj=50000427, id = 200012,
name="metrics-meter-tick-thread-1", group="main")
THREAD START (obj=50000427, id = 200014, name="kafka-scheduler-0",
group="main")
THREAD START (obj=50000427, id = 200013, name="kafka-scheduler-1",
group="main")
THREAD START (obj=50000427, id = 200015, name="kafka-scheduler-2",
group="main")
THREAD START (obj=50000c33, id = 200016, name="kafka-log-cleaner-thread-0",
group="main")
THREAD START (obj=50000427, id = 200017,
name="kafka-network-thread-2-PLAINTEXT-0", group="main")
THREAD START (obj=50000427, id = 200018,
name="kafka-network-thread-2-PLAINTEXT-1", group="main")
THREAD START (obj=50000427, id = 200019,
name="kafka-network-thread-2-PLAINTEXT-2", group="main")
THREAD START (obj=50000427, id = 200020,
name="kafka-socket-acceptor-PLAINTEXT-9092", group="main")
THREAD START (obj=500008e6, id = 200021, name="ExpirationReaper-2",
group="main")
THREAD START (obj=500008e6, id = 200022, name="ExpirationReaper-2",
group="main")
THREAD START (obj=50000427, id = 200023,
name="metrics-meter-tick-thread-2", group="main")
THREAD START (obj=50000427, id = 200024, name="kafka-scheduler-3",
group="main")
THREAD START (obj=50000427, id = 200025, name="kafka-scheduler-4",
group="main")
THREAD START (obj=50000427, id = 200026, name="kafka-scheduler-5",
group="main")
THREAD START (obj=50000427, id = 200027, name="kafka-scheduler-6",
group="main")
THREAD START (obj=500008e6, id = 200028, name="ExpirationReaper-2",
group="main")
THREAD START (obj=500008e6, id = 200029, name="ExpirationReaper-2",
group="main")
THREAD START (obj=500008e6, id = 200030, name="ExpirationReaper-2",
group="main")
THREAD START (obj=50000427, id = 200031, name="group-metadata-manager-0",
group="main")
THREAD START (obj=50000427, id = 200032, name="kafka-request-handler-0",
group="main")
THREAD START (obj=50000427, id = 200037, name="kafka-request-handler-5",
group="main")
THREAD START (obj=50000427, id = 200036, name="kafka-request-handler-4",
group="main")
THREAD START (obj=50000427, id = 200035, name="kafka-request-handler-3",
group="main")
THREAD START (obj=50000427, id = 200034, name="kafka-request-handler-2",
group="main")
THREAD START (obj=50000427, id = 200033, name="kafka-request-handler-1",
group="main")
THREAD START (obj=50000427, id = 200038, name="kafka-request-handler-6",
group="main")
THREAD START (obj=50000427, id = 200039, name="kafka-request-handler-7",
group="main")
THREAD START (obj=50000427, id = 200040, name="kafka-scheduler-7",
group="main")
THREAD START (obj=50000427, id = 200041, name="kafka-scheduler-8",
group="main")
THREAD START (obj=50000ee2, id = 200042, name="ReplicaFetcherThread-0-0",
group="main")
THREAD START (obj=50000ee2, id = 200043, name="ReplicaFetcherThread-0-1",
group="main")
THREAD START (obj=50000427, id = 200044, name="kafka-scheduler-9",
group="main")
THREAD START (obj=50000427, id = 200045, name="executor-Fetch",
group="main")
TRACE 300920:
sun.nio.ch.EPollArrayWrapper.epollWait(EPollArrayWrapper.java:Unknown line)
sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:269)
sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:93)
sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:86)
TRACE 300518:
java.net.PlainSocketImpl.socketAccept(PlainSocketImpl.java:Unknown line)
java.net.AbstractPlainSocketImpl.accept(AbstractPlainSocketImpl.java:409)
java.net.ServerSocket.implAccept(ServerSocket.java:545)
java.net.ServerSocket.accept(ServerSocket.java:513)
TRACE 300940:
sun.nio.ch.FileDispatcherImpl.write0(FileDispatcherImpl.java:Unknown line)
sun.nio.ch.SocketDispatcher.write(SocketDispatcher.java:47)
sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:93)
sun.nio.ch.IOUtil.write(IOUtil.java:65)
TRACE 301003:
org.xerial.snappy.SnappyNative.rawUncompress(SnappyNative.java:Unknown line)
org.xerial.snappy.Snappy.rawUncompress(Snappy.java:474)
org.xerial.snappy.Snappy.uncompress(Snappy.java:513)
org.xerial.snappy.SnappyInputStream.readFully(SnappyInputStream.java:147)
TRACE 300979:
sun.nio.ch.FileDispatcherImpl.pread0(FileDispatcherImpl.java:Unknown line)
sun.nio.ch.FileDispatcherImpl.pread(FileDispatcherImpl.java:52)
sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:220)
sun.nio.ch.IOUtil.read(IOUtil.java:197)
TRACE 301630:
sun.nio.ch.EPollArrayWrapper.epollCtl(EPollArrayWrapper.java:Unknown line)
sun.nio.ch.EPollArrayWrapper.updateRegistrations(EPollArrayWrapper.java:299)
sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:268)
sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:93)
TRACE 301259:
sun.misc.Unsafe.unpark(Unsafe.java:Unknown line)
java.util.concurrent.locks.LockSupport.unpark(LockSupport.java:141)
java.util.concurrent.locks.AbstractQueuedSynchronizer.unparkSuccessor(AbstractQueuedSynchronizer.java:662)
java.util.concurrent.locks.AbstractQueuedSynchronizer.release(AbstractQueuedSynchronizer.java:1264)
TRACE 301559:
sun.nio.ch.FileDispatcherImpl.read0(FileDispatcherImpl.java:Unknown line)
sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
sun.nio.ch.IOUtil.read(IOUtil.java:197)
TRACE 300590:
java.lang.ClassLoader.defineClass1(ClassLoader.java:Unknown line)
java.lang.ClassLoader.defineClass(ClassLoader.java:763)
java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
java.net.URLClassLoader.defineClass(URLClassLoader.java:467)
TRACE 301643:
scala.Tuple2.equals(Tuple2.scala:20)
java.util.concurrent.ConcurrentHashMap.get(ConcurrentHashMap.java:940)
kafka.utils.Pool.get(Pool.scala:69)
kafka.server.ReplicaManager.getPartition(ReplicaManager.scala:280)
TRACE 300592:
java.util.zip.ZipFile.read(ZipFile.java:Unknown line)
java.util.zip.ZipFile.access$1400(ZipFile.java:60)
java.util.zip.ZipFile$ZipFileInputStream.read(ZipFile.java:717)
java.util.zip.ZipFile$ZipFileInflaterInputStream.fill(ZipFile.java:419)
TRACE 301018:
kafka.utils.CoreUtils$.crc32(CoreUtils.scala:148)
kafka.message.Message.computeChecksum(Message.scala:216)
kafka.message.Message.isValid(Message.scala:226)
kafka.message.Message.ensureValid(Message.scala:232)
TRACE 301561:
java.io.FileDescriptor.sync(FileDescriptor.java:Unknown line)
kafka.server.OffsetCheckpoint.liftedTree1$1(OffsetCheckpoint.scala:62)
kafka.server.OffsetCheckpoint.write(OffsetCheckpoint.scala:49)
kafka.server.ReplicaManager$$anonfun$checkpointHighWatermarks$2.apply(ReplicaManager.scala:945)
TRACE 301422:
java.util.Arrays.copyOf(Arrays.java:3332)
java.lang.AbstractStringBuilder.expandCapacity(AbstractStringBuilder.java:137)
java.lang.AbstractStringBuilder.ensureCapacityInternal(AbstractStringBuilder.java:121)
java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:421)
CPU SAMPLES BEGIN (total = 699214) Thu Mar 23 12:41:17 2017
rank   self  accum   count trace method
   1 86.46% 86.46%  604544 300920 sun.nio.ch.EPollArrayWrapper.epollWait
   2 12.62% 99.08%   88254 300518 java.net.PlainSocketImpl.socketAccept
   3  0.11% 99.19%     759 300940 sun.nio.ch.FileDispatcherImpl.write0
   4  0.04% 99.23%     253 301003 org.xerial.snappy.SnappyNative.rawUncompress
   5  0.03% 99.26%     231 300979 sun.nio.ch.FileDispatcherImpl.pread0
   6  0.03% 99.29%     220 301630 sun.nio.ch.EPollArrayWrapper.epollCtl
   7  0.03% 99.32%     219 301259 sun.misc.Unsafe.unpark
   8  0.02% 99.34%     145 301559 sun.nio.ch.FileDispatcherImpl.read0
   9  0.01% 99.36%      89 300590 java.lang.ClassLoader.defineClass1
  10  0.01% 99.37%      87 301643 scala.Tuple2.equals
  11  0.01% 99.38%      79 300592 java.util.zip.ZipFile.read
  12  0.01% 99.39%      79 301018 kafka.utils.CoreUtils$.crc32
  13  0.01% 99.40%      78 301561 java.io.FileDescriptor.sync
  14  0.01% 99.41%      72 301422 java.util.Arrays.copyOf
CPU SAMPLES END

It seems like the interval of the constant disconnects is far bigger than
the 10-minute default. I suspect this has something to do with the double
connects, which I'm not sure how to get around.
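For reference, the 10-minute figure is the broker's connections.max.idle.ms; the Java clients have a matching setting of the same name (the client-side default is 540000 ms, i.e. 9 minutes, slightly below the broker's 10). A minimal, hedged sketch of raising it on the producer side — the config keys are the standard Kafka client keys, but the bootstrap address and the values here are purely illustrative:

```java
import java.util.Properties;

public class ClientIdleConfig {
    // Producer properties with a longer idle timeout, so connections are
    // not torn down between messages at < 1 msg/s. Values are illustrative.
    public static Properties producerProps(String bootstrap) {
        Properties props = new Properties();
        props.put("bootstrap.servers", bootstrap);
        // Client-side idle timeout; client default is 540000 ms (9 min).
        props.put("connections.max.idle.ms", "1800000"); // 30 minutes
        return props;
    }

    public static void main(String[] args) {
        Properties p = producerProps("kafka:9092");
        System.out.println(p.getProperty("connections.max.idle.ms"));
    }
}
```

Raising only the client value does not help on its own: whichever side has the lower timeout closes the connection first, so the broker's connections.max.idle.ms would need to be raised in step.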

On Thu, Mar 23, 2017 at 11:46 AM, Manikumar <ma...@gmail.com>
wrote:

> 1. Maybe you can monitor thread-wise CPU usage and correlate it with a
> thread dump to identify the bottleneck.
> 2. The broker config property connections.max.idle.ms is used to close
> idle connections. The default is 10 min.
>
> On Thu, Mar 23, 2017 at 3:55 PM, Paul van der Linden <pa...@sportr.co.uk>
> wrote:
>
> > Hi,
> >
> > I deployed Kafka about a week ago, but there are a few problems with how
> > Kafka behaves.
> > The first is the surprisingly high resource usage, one is the memory
> > (1.5-2 GB for each broker, 3 brokers) although this might be normal. The
> > other one is the cpu usage, which starts with 20% minimum on each broker,
> > which I find strange with the current throughput (which is < 1 msg/s).
> >
> > This might have something to do with something else which I find strange,
> > Kafka disconnects clients about every 10-20 minutes per broker. It might
> > have something to do with the configuration: Deployed in kubernetes,
> > bootstrapping with a single dns name (which is backed by all alive kafka
> > brokers), and then every broker has a separate dns address which is used
> > after the bootstrap. This means that a client is connected twice to one
> > of the brokers. The reason for the bootstrap dns name is to make sure I
> > don't have to update all clients to include other brokers.
> >
> > Any advice on how to solve these 2 problems?
> >
> > Thanks,
> > Paul
> >
> > On Tue, Mar 21, 2017 at 10:30 AM, Paul van der Linden <paul@sportr.co.uk>
> > wrote:
> >
> > > Hi,
> > >
> > > I deployed Kafka about a week ago, but there are a few problems with how
> > > Kafka behaves.
> > > The first is the surprisingly high resource usage, one is the memory
> > > (1.5-2 GB for each broker, 3 brokers) although this might be normal. The
> > > other one is the cpu usage, which starts with 20% minimum on each broker,
> > > which I find strange with the current throughput (which is < 1 msg/s).
> > >
> > > This might have something to do with something else which I find strange,
> > > Kafka disconnects clients about every 10-20 minutes per broker. It might
> > > have something to do with the configuration: Deployed in kubernetes,
> > > bootstrapping with a single dns name (which is backed by all alive kafka
> > > brokers), and then every broker has a separate dns address which is used
> > > after the bootstrap. This means that a client is connected twice to one
> > > of the brokers. The reason for the bootstrap dns name is to make sure I
> > > don't have to update all clients to include other brokers.
> > >
> > > Any advice on how to solve these 2 problems?
> > >
> > > Thanks,
> > > Paul
> > >
> >
>

Re: Kafka high cpu usage and disconnects

Posted by Manikumar <ma...@gmail.com>.
1. Maybe you can monitor thread-wise CPU usage and correlate it with a
thread dump to identify the bottleneck.
2. The broker config property connections.max.idle.ms is used to close
idle connections. The default is 10 min.
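Point 1 is typically done externally with `top -H -p <pid>` plus `jstack <pid>`, matching top's thread id (converted to hex) against the `nid=0x...` field in the dump. The same idea in-process, as a self-contained hedged sketch with no Kafka dependency, using the standard JMX ThreadMXBean API:

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.HashMap;
import java.util.Map;

public class ThreadCpu {
    // Snapshot per-thread CPU time (nanoseconds) for the current JVM.
    public static Map<String, Long> snapshot() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        Map<String, Long> cpu = new HashMap<>();
        if (!mx.isThreadCpuTimeSupported()) {
            return cpu; // platform without per-thread CPU accounting
        }
        for (long id : mx.getAllThreadIds()) {
            ThreadInfo info = mx.getThreadInfo(id);
            long ns = mx.getThreadCpuTime(id); // -1 if the thread has died
            if (info != null && ns >= 0) {
                cpu.put(info.getThreadName(), ns);
            }
        }
        return cpu;
    }

    public static void main(String[] args) {
        // Print threads sorted by CPU time, busiest first.
        snapshot().entrySet().stream()
                .sorted((a, b) -> Long.compare(b.getValue(), a.getValue()))
                .forEach(e -> System.out.printf("%-45s %12d ns%n",
                        e.getKey(), e.getValue()));
    }
}
```

Taking two snapshots a few seconds apart and diffing the values shows which threads are actually burning CPU, which can then be matched by name against a stack dump like the one above.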

On Thu, Mar 23, 2017 at 3:55 PM, Paul van der Linden <pa...@sportr.co.uk>
wrote:

> Hi,
>
> I deployed Kafka about a week ago, but there are a few problems with how
> Kafka behaves.
> The first is the surprisingly high resource usage, one is the memory
> (1.5-2 GB for each broker, 3 brokers) although this might be normal. The
> other one is the cpu usage, which starts with 20% minimum on each broker,
> which I find strange with the current throughput (which is < 1 msg/s).
>
> This might have something to do with something else which I find strange,
> Kafka disconnects clients about every 10-20 minutes per broker. It might
> have something to do with the configuration: Deployed in kubernetes,
> bootstrapping with a single dns name (which is backed by all alive kafka
> brokers), and then every broker has a separate dns address which is used
> after the bootstrap. This means that a client is connected twice to one of
> the brokers. The reason for the bootstrap dns name is to make sure I don't
> have to update all clients to include other brokers.
>
> Any advice on how to solve these 2 problems?
>
> Thanks,
> Paul
>
> On Tue, Mar 21, 2017 at 10:30 AM, Paul van der Linden <pa...@sportr.co.uk>
> wrote:
>
> > Hi,
> >
> > I deployed Kafka about a week ago, but there are a few problems with how
> > Kafka behaves.
> > The first is the surprisingly high resource usage, one is the memory
> > (1.5-2 GB for each broker, 3 brokers) although this might be normal. The
> > other one is the cpu usage, which starts with 20% minimum on each broker,
> > which I find strange with the current throughput (which is < 1 msg/s).
> >
> > This might have something to do with something else which I find strange,
> > Kafka disconnects clients about every 10-20 minutes per broker. It might
> > have something to do with the configuration: Deployed in kubernetes,
> > bootstrapping with a single dns name (which is backed by all alive kafka
> > brokers), and then every broker has a separate dns address which is used
> > after the bootstrap. This means that a client is connected twice to one
> > of the brokers. The reason for the bootstrap dns name is to make sure I
> > don't have to update all clients to include other brokers.
> >
> > Any advice on how to solve these 2 problems?
> >
> > Thanks,
> > Paul
> >
>