Posted to users@kafka.apache.org by Joseph Francis <Jo...@skyscanner.net> on 2016/07/06 09:31:35 UTC

Kafka OOME: Direct buffer memory

We are running Kafka 0.9.0.1 in production and saw these exceptions:


[2016-06-23 22:55:10,239] INFO [KafkaApi-3] Closing connection due to error during produce request with correlation id 6 from client id kafka-python with ack=0
Topic and partition to exceptions: [xyx,8] -> kafka.common.MessageSizeTooLargeException (kafka.server.KafkaApis)
[2016-06-23 22:55:41,917] INFO Scheduling log segment 95455988 for log abc_json-7 for deletion. (kafka.log.Log)
[2016-06-23 22:55:41,921] INFO Scheduling log segment 2036034857 for log xyz_json-3 for deletion. (kafka.log.Log)
[2016-06-23 22:55:51,112] INFO Rolled new log segment for 'abc_json-7' in 1 ms. (kafka.log.Log)
[2016-06-23 22:55:59,411] ERROR Processor got uncaught exception. (kafka.network.Processor)
java.lang.OutOfMemoryError: Direct buffer memory
        at java.nio.Bits.reserveMemory(Bits.java:658)
        at java.nio.DirectByteBuffer.<init>(DirectByteBuffer.java:123)
        at java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:306)
        at sun.nio.ch.Util.getTemporaryDirectBuffer(Util.java:174)
        at sun.nio.ch.IOUtil.read(IOUtil.java:195)
        at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:379)
        at org.apache.kafka.common.network.PlaintextTransportLayer.read(PlaintextTransportLayer.java:108)
        at org.apache.kafka.common.network.NetworkReceive.readFromReadableChannel(NetworkReceive.java:97)
        at org.apache.kafka.common.network.NetworkReceive.readFrom(NetworkReceive.java:71)
        at org.apache.kafka.common.network.KafkaChannel.receive(KafkaChannel.java:153)
        at org.apache.kafka.common.network.KafkaChannel.read(KafkaChannel.java:134)
        at org.apache.kafka.common.network.Selector.poll(Selector.java:286)
        at kafka.network.Processor.run(SocketServer.scala:413)
        at java.lang.Thread.run(Thread.java:745)
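
From the stack trace, the allocation that fails is NIO's temporary direct buffer: when the Processor reads a request off the socket into a heap ByteBuffer, sun.nio.ch.IOUtil backs that read with a direct buffer at least as large as the destination buffer, and the destination buffer is sized from the 4-byte length prefix the client sends. Below is a minimal, simplified sketch of that size-prefixed read pattern (this is not Kafka's actual code; the class and method names are ours) just to illustrate how one oversized produce request can turn into one very large direct allocation:

import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.ReadableByteChannel;

// Illustration only: a size-prefixed channel read, loosely modelled on how a
// broker-style network layer might consume a request. Not Kafka's implementation.
public class SizePrefixedReceive {

    // Reads a 4-byte length prefix, then allocates a buffer of that size for the payload.
    public static ByteBuffer readRequest(ReadableByteChannel channel) throws IOException {
        ByteBuffer sizeBuf = ByteBuffer.allocate(4);
        while (sizeBuf.hasRemaining()) {
            if (channel.read(sizeBuf) < 0) {
                throw new IOException("connection closed while reading size");
            }
        }
        sizeBuf.flip();
        int requestSize = sizeBuf.getInt();   // value comes straight off the wire

        // The payload buffer is sized by the client-supplied length. When the channel
        // reads into it, NIO grabs a temporary *direct* buffer of the same size --
        // the allocation that fails with "Direct buffer memory" in our trace.
        ByteBuffer payload = ByteBuffer.allocate(requestSize);
        while (payload.hasRemaining()) {
            if (channel.read(payload) < 0) {
                throw new IOException("connection closed while reading payload");
            }
        }
        payload.flip();
        return payload;
    }
}

If that is roughly the mechanism, then socket.request.max.bytes on the brokers should cap what the Processor will accept, so we plan to double-check that setting alongside message.max.bytes / the topic-level max.message.bytes.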


Two of our brokers were affected, and this caused them to slow down.

Prior to seeing these exceptions, we had performed partition reassignments on the cluster and observed a steep decrease in cache usage on the nodes.


We're not quite sure of the exact cause of the "Direct buffer memory" exceptions. Is it simply the case that Kafka is receiving messages too large for it to hold in memory?
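
In case it helps with debugging, we're planning to watch the JVM's direct buffer pool on the affected brokers. A quick sketch of the kind of check we have in mind (class name is ours), using the standard BufferPoolMXBean:

import java.lang.management.BufferPoolMXBean;
import java.lang.management.ManagementFactory;

// Prints the JVM's buffer pool stats; the "direct" pool is the one that is
// exhausted when "OutOfMemoryError: Direct buffer memory" is thrown.
public class DirectMemoryCheck {
    public static void main(String[] args) {
        for (BufferPoolMXBean pool : ManagementFactory.getPlatformMXBeans(BufferPoolMXBean.class)) {
            System.out.printf("%s: count=%d used=%d bytes capacity=%d bytes%n",
                    pool.getName(), pool.getCount(), pool.getMemoryUsed(), pool.getTotalCapacity());
        }
    }
}

Our understanding is that the OOME fires once this pool hits -XX:MaxDirectMemorySize (which, if unset, defaults to roughly the max heap size on HotSpot), so the open question is whether that limit is just too low for legitimate traffic on these brokers or whether a single bad request blows through it.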


Thanks,

Joseph