You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@pulsar.apache.org by GitBox <gi...@apache.org> on 2019/12/19 12:03:11 UTC

[GitHub] [pulsar] andrekramer1 opened a new issue #5896: Out of Memory with many consumers

andrekramer1 opened a new issue #5896: Out of Memory with many consumers
URL: https://github.com/apache/pulsar/issues/5896
 
 
   Load testing on one standalone Pulsar with 4 producers and many consumers (1000s) we are getting out of memory errors - failing to allocate direct memory. Increasing memory has not helped with enough producers, Pulsar is failing with exceptions like the following when there are many consumers:
   -------------------------------------------------------------------------------------------------------
   [pulsar-client-io-1-1] WARN org.apache.pulsar.client.impl.ClientCnx - [minnow.apama.com/10.42.100.81:6650] Got exception OutOfMemoryError : GC overhead limit exceeded
   java.lang.OutOfMemoryError: GC overhead limit exceeded
           at org.apache.pulsar.shade.io.netty.util.Recycler$Stack.newHandle(Recycler.java:658)
           at org.apache.pulsar.shade.io.netty.util.Recycler.get(Recycler.java:163)
           at org.apache.pulsar.common.api.proto.PulsarApi$MessageIdData$Builder.buildPartial(PulsarApi.java:1339)
           at org.apache.pulsar.common.api.proto.PulsarApi$CommandMessage$Builder.mergeFrom(PulsarApi.java:15892)
           at org.apache.pulsar.common.api.proto.PulsarApi$CommandMessage$Builder.mergeFrom(PulsarApi.java:15745)
           at org.apache.pulsar.common.util.protobuf.ByteBufCodedInputStream.readMessage(ByteBufCodedInputStream.java:124)
           at org.apache.pulsar.common.api.proto.PulsarApi$BaseCommand$Builder.mergeFrom(PulsarApi.java:28839)
           at org.apache.pulsar.common.protocol.PulsarDecoder.channelRead(PulsarDecoder.java:83)
           at org.apache.pulsar.shade.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362)
           at org.apache.pulsar.shade.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348)
           at org.apache.pulsar.shade.io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340)
           at org.apache.pulsar.shade.io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:323)
           at org.apache.pulsar.shade.io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:310)
           at org.apache.pulsar.shade.io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:426)
           at org.apache.pulsar.shade.io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:278)
           at org.apache.pulsar.shade.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362)
           at org.apache.pulsar.shade.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348)
           at org.apache.pulsar.shade.io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340)
           at org.apache.pulsar.shade.io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1434)
           at org.apache.pulsar.shade.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362)
           at org.apache.pulsar.shade.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348)
           at org.apache.pulsar.shade.io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:965)
           at org.apache.pulsar.shade.io.netty.channel.epoll.AbstractEpollStreamChannel$EpollStreamUnsafe.epollInReady(AbstractEpollStreamChannel.java:799)
           at org.apache.pulsar.shade.io.netty.channel.epoll.AbstractEpollChannel$AbstractEpollUnsafe$1.run(AbstractEpollChannel.java:382)
           at org.apache.pulsar.shade.io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:163)
           at org.apache.pulsar.shade.io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:404)
           at org.apache.pulsar.shade.io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:335)
           at org.apache.pulsar.shade.io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:909)
           at org.apache.pulsar.shade.io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
           at java.lang.Thread.run(Thread.java:745)
   -------------------------------------------------------------------------------------------------------
   Looking into a memory dump showed many buffers allocated to nio/netty.
   
   This issue could be related to 5751, 5720.
   
   We were load testing on a single Linux host with plenty of physical memory.
   
   
   Possibly related to issues 5513, 4196, 4632, we've also seen one crash due to Direct memory error but that seemed related to Bookkeeper processing:
   -------------------------------------------------------------------------------------------------------
   
   9:50:46.759 [BookieShutdownTrigger] ERROR org.apache.bookkeeper.bookie.BookieThread - Uncaught exception in thread BookieShutdownTrigger
   io.netty.util.internal.OutOfDirectMemoryError: failed to allocate 16777216 byte(s) of direct memory (used: 134217728, max: 134217728)
           at io.netty.util.internal.PlatformDependent.incrementMemoryCounter(PlatformDependent.java:655) ~[io.netty-netty-all-4.1.32.Final.jar:4.1.32.Final]
           at io.netty.util.internal.PlatformDependent.allocateDirectNoCleaner(PlatformDependent.java:610) ~[io.netty-netty-all-4.1.32.Final.jar:4.1.32.Final]
           at io.netty.buffer.PoolArena$DirectArena.allocateDirect(PoolArena.java:769) ~[io.netty-netty-all-4.1.32.Final.jar:4.1.32.Final]
           at io.netty.buffer.PoolArena$DirectArena.newChunk(PoolArena.java:745) ~[io.netty-netty-all-4.1.32.Final.jar:4.1.32.Final]
           at io.netty.buffer.PoolArena.allocateNormal(PoolArena.java:244) ~[io.netty-netty-all-4.1.32.Final.jar:4.1.32.Final]
           at io.netty.buffer.PoolArena.allocate(PoolArena.java:226) ~[io.netty-netty-all-4.1.32.Final.jar:4.1.32.Final]
           at io.netty.buffer.PoolArena.allocate(PoolArena.java:146) ~[io.netty-netty-all-4.1.32.Final.jar:4.1.32.Final]
           at io.netty.buffer.PooledByteBufAllocator.newDirectBuffer(PooledByteBufAllocator.java:324) ~[io.netty-netty-all-4.1.32.Final.jar:4.1.32.Final]
           at io.netty.buffer.AbstractByteBufAllocator.directBuffer(AbstractByteBufAllocator.java:185) ~[io.netty-netty-all-4.1.32.Final.jar:4.1.32.Final]
           at io.netty.buffer.AbstractByteBufAllocator.directBuffer(AbstractByteBufAllocator.java:176) ~[io.netty-netty-all-4.1.32.Final.jar:4.1.32.Final]
           at io.netty.buffer.AbstractByteBufAllocator.buffer(AbstractByteBufAllocator.java:113) ~[io.netty-netty-all-4.1.32.Final.jar:4.1.32.Final]
           at org.apache.bookkeeper.bookie.EntryLogger$BufferedLogChannel.appendLedgersMap(EntryLogger.java:145) ~[org.apache.bookkeeper-bookkeeper-server-4.9.2.jar:4.9.2]
           at org.apache.bookkeeper.bookie.EntryLogManagerBase.createNewLog(EntryLogManagerBase.java:159) ~[org.apache.bookkeeper-bookkeeper-server-4.9.2.jar:4.9.2]
           at org.apache.bookkeeper.bookie.EntryLogManagerForSingleEntryLog.getCurrentLogForLedgerForAddEntry(EntryLogManagerForSingleEntryLog.java:106) ~[org.apache.bookkeeper-bookkeeper-server-4.9.2.jar:4.9.2]
           at org.apache.bookkeeper.bookie.EntryLogManagerBase.addEntry(EntryLogManagerBase.java:72) ~[org.apache.bookkeeper-bookkeeper-server-4.9.2.jar:4.9.2]
           at org.apache.bookkeeper.bookie.EntryLogManagerForSingleEntryLog.addEntry(EntryLogManagerForSingleEntryLog.java:87) ~[org.apache.bookkeeper-bookkeeper-server-4.9.2.jar:4.9.2]
           at org.apache.bookkeeper.bookie.EntryLogger.addEntry(EntryLogger.java:619) ~[org.apache.bookkeeper-bookkeeper-server-4.9.2.jar:4.9.2]
           at org.apache.bookkeeper.bookie.storage.ldb.SingleDirectoryDbLedgerStorage.lambda$checkpoint$6(SingleDirectoryDbLedgerStorage.java:597) ~[org.apache.bookkeeper-bookkeeper-server-4.9.2.jar:4.9.2]
           at org.apache.bookkeeper.bookie.storage.ldb.WriteCache.forEach(WriteCache.java:268) ~[org.apache.bookkeeper-bookkeeper-server-4.9.2.jar:4.9.2]
           at org.apache.bookkeeper.bookie.storage.ldb.SingleDirectoryDbLedgerStorage.checkpoint(SingleDirectoryDbLedgerStorage.java:595) ~[org.apache.bookkeeper-bookkeeper-server-4.9.2.jar:4.9.2]
           at org.apache.bookkeeper.bookie.storage.ldb.SingleDirectoryDbLedgerStorage.flush(SingleDirectoryDbLedgerStorage.java:686) ~[org.apache.bookkeeper-bookkeeper-server-4.9.2.jar:4.9.2]
           at org.apache.bookkeeper.bookie.storage.ldb.SingleDirectoryDbLedgerStorage.shutdown(SingleDirectoryDbLedgerStorage.java:221) ~[org.apache.bookkeeper-bookkeeper-server-4.9.2.jar:4.9.2]
           at org.apache.bookkeeper.bookie.storage.ldb.DbLedgerStorage.shutdown(DbLedgerStorage.java:161) ~[org.apache.bookkeeper-bookkeeper-server-4.9.2.jar:4.9.2]
           at org.apache.bookkeeper.bookie.Bookie.shutdown(Bookie.java:1172) ~[org.apache.bookkeeper-bookkeeper-server-4.9.2.jar:4.9.2]
           at org.apache.bookkeeper.bookie.Bookie$6.run(Bookie.java:1132) ~[org.apache.bookkeeper-bookkeeper-server-4.9.2.jar:4.9.2]
   250.253: Total time for which application threads were stopped: 0.0002466 seconds, Stopping threads took: 0.0000837 seconds
   09:50:46.839 [BookieDeathWatcher-3181] INFO  org.apache.bookkeeper.proto.BookieServer - BookieDeathWatcher noticed the bookie is not running any more, exiting the watch loop!
   250.325: Total time for which application threads were stopped: 0.0013974 seconds, Stopping threads took: 0.0011630 seconds
   09:50:46.842 [component-shutdown-thread] INFO  org.apache.bookkeeper.common.component.ComponentStarter - Closing component bookie-server in shutdown hook.
   09:50:46.846 [component-shutdown-thread] INFO  org.apache.bookkeeper.proto.BookieServer - Shutting down BookieServer

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [pulsar] andrekramer1 commented on issue #5896: Out of Memory with many consumers

Posted by GitBox <gi...@apache.org>.
andrekramer1 commented on issue #5896: Out of Memory with many consumers
URL: https://github.com/apache/pulsar/issues/5896#issuecomment-576363300
 
 
   The OOM appear to be in reading from Bookkeeper and possibly the broker is issuing too many read requests at a time. We've had some limited success with the dispatcher throttling introduced in 2.5. Using both:
   ./bin/pulsar-admin namespaces set-dispatch-rate public/default \
     --msg-dispatch-rate 500 \
     --dispatch-rate-period 1
   ./bin/pulsar-admin namespaces set-subscription-dispatch-rate public/default \
     --msg-dispatch-rate 50 \
     --dispatch-rate-period 1
   allows runs to not OOM most but not all of the time.
   
   A rate limit is difficult to use as the number of topics and subscriptions will not be known and we would have to plan for the worst case scenario. 
   
   One or more new limits on bookie parallel reads seem to be required? 

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services