Posted to users@kafka.apache.org by Lukas Lalinsky <lu...@exponea.com> on 2017/09/15 14:30:03 UTC
Kafka 0.11 broker running out of file descriptors
Hello,
I'm dealing with a strange issue in production and I'm running out of
ideas for what to do about it.
It's a 3-node cluster running Kafka 0.11.0.1, with most topics having a
replication factor of 2. At some point, the broker that is about to die
shrinks the ISR for a few partitions down to just itself:
[2017-09-15 11:25:29,104] INFO Partition [...,12] on broker 3: Shrinking ISR from 3,2 to 3 (kafka.cluster.Partition)
[2017-09-15 11:25:29,107] INFO Partition [...,8] on broker 3: Shrinking ISR from 3,1 to 3 (kafka.cluster.Partition)
[2017-09-15 11:25:29,108] INFO Partition [...,38] on broker 3: Shrinking ISR from 3,2 to 3 (kafka.cluster.Partition)
Shortly after that, another broker starts writing errors like this to its
log:
[2017-09-15 11:25:45,536] WARN [ReplicaFetcherThread-0-3]: Error in fetch to broker 3, request (type=FetchRequest, replicaId=2, maxWait=500, minBytes=1, maxBytes=10485760, fetchData={...}) (kafka.server.ReplicaFetcherThread)
java.io.IOException: Connection to 3 was disconnected before the response was read
    at org.apache.kafka.clients.NetworkClientUtils.sendAndReceive(NetworkClientUtils.java:93)
    at kafka.server.ReplicaFetcherBlockingSend.sendRequest(ReplicaFetcherBlockingSend.scala:93)
    at kafka.server.ReplicaFetcherThread.fetch(ReplicaFetcherThread.scala:207)
    at kafka.server.ReplicaFetcherThread.fetch(ReplicaFetcherThread.scala:42)
    at kafka.server.AbstractFetcherThread.processFetchRequest(AbstractFetcherThread.scala:151)
    at kafka.server.AbstractFetcherThread.doWork(AbstractFetcherThread.scala:112)
    at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:64)
There are many such messages. At that point, I see the number of open
file descriptors on the other broker growing, and eventually it crashes
with thousands of messages like this:
[2017-09-15 11:31:23,273] ERROR Error while accepting connection (kafka.network.Acceptor)
java.io.IOException: Too many open files
    at sun.nio.ch.ServerSocketChannelImpl.accept0(Native Method)
    at sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:422)
    at sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:250)
    at kafka.network.Acceptor.accept(SocketServer.scala:337)
    at kafka.network.Acceptor.run(SocketServer.scala:280)
    at java.lang.Thread.run(Thread.java:745)
The file descriptor limit is set to 128k, and the number of open file
descriptors during normal operation is about 8k, so there is a lot of
headroom.
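For what it's worth, this is roughly how I'm watching the count on Linux, straight from /proc (BROKER_PID is a placeholder; I use the shell's own PID here just so the snippet runs as-is):

```shell
# Count the open file descriptors of a process via /proc (Linux only).
# BROKER_PID is a placeholder; substitute the actual Kafka broker PID.
BROKER_PID=$$
ls /proc/"$BROKER_PID"/fd | wc -l

# Compare against the effective limit of that same process,
# which can differ from the shell's ulimit:
grep 'Max open files' /proc/"$BROKER_PID"/limits
```

Checking /proc/<pid>/limits rather than `ulimit -n` rules out the case where the broker was started under a different limit than the login shell.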
I'm not sure whether it's the other brokers trying to replicate that
kill it, or clients trying to publish messages.
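One rough way I can think of to narrow that down is to classify the broker's descriptors by type (sockets vs. log segment files) from /proc; this is only a sketch, and again BROKER_PID is a placeholder:

```shell
# Group a process's fd targets by kind: /proc/<pid>/fd entries are symlinks,
# where sockets resolve to "socket:[inode]", pipes to "pipe:[inode]", and
# regular files (e.g. log segments) to their path. Collapsing paths to "file"
# and counting shows what is actually piling up.
BROKER_PID=$$   # placeholder; substitute the Kafka broker's PID
for fd in /proc/"$BROKER_PID"/fd/*; do
  readlink "$fd" 2>/dev/null
done | sed -e 's/:\[.*\]$//' -e 's#^/.*#file#' | sort | uniq -c | sort -rn
```

If the "socket" count is what explodes, that would point at connections (replica fetchers or producers) rather than segment files.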
Has anyone seen a behavior like this? I'd appreciate any pointers.
Thanks,
Lukas