Posted to issues@activemq.apache.org by "Francesco Nigro (Jira)" <ji...@apache.org> on 2020/06/11 09:53:00 UTC

[jira] [Comment Edited] (ARTEMIS-2696) Netty DirectBufferLeak noticed in the log and server not processing messages with out of direct memory errors after running for a day

    [ https://issues.apache.org/jira/browse/ARTEMIS-2696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17133092#comment-17133092 ] 

Francesco Nigro edited comment on ARTEMIS-2696 at 6/11/20, 9:52 AM:
--------------------------------------------------------------------

Thanks [~baselzinga], that seems important indeed. I'm looking into your change to understand how `handler.pushBytes` is supposed to handle the ownership of `buffer` and what we expect it to do. No idea yet which change could have caused this to start happening only recently.

My concern is that if you release it before it gets written by Netty, it risks causing other issues...
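To make the concern concrete, here is a minimal sketch of the general Netty ownership rule in question; this is illustrative only (hypothetical class and method names, not the Artemis code). Once a ByteBuf has been handed to write()/writeAndFlush(), the pipeline owns it and releases it after the write completes, so releasing it yourself before that point can drop the reference count while Netty still holds the buffer.

    // Illustrative sketch only (hypothetical class, not Artemis code):
    // shows why releasing a ByteBuf before Netty has written it is risky.
    import io.netty.buffer.ByteBuf;
    import io.netty.channel.Channel;

    public final class OwnershipSketch {

        static void handOff(Channel channel, byte[] payload) {
            ByteBuf buf = channel.alloc().directBuffer(payload.length);
            buf.writeBytes(payload);
            // Ownership transfers here; Netty releases buf once the write completes.
            channel.writeAndFlush(buf);
        }

        static void releaseTooEarly(Channel channel, byte[] payload) {
            ByteBuf buf = channel.alloc().directBuffer(payload.length);
            buf.writeBytes(payload);
            channel.writeAndFlush(buf);
            // Extra release after handing the buffer off: the pipeline still holds it,
            // so this risks an IllegalReferenceCountException / use-after-free.
            buf.release();
        }
    }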


was (Author: nigro.fra@gmail.com):
Thanks [~baselzinga], that seems important indeed. I'm looking into your change to understand how `handler.pushBytes` is supposed to handle the ownership of `buffer` and what we expect it to do. No idea yet which change could have caused this to start happening only recently.

> Netty DirectBufferLeak noticed in the log and server not processing messages with out of direct memory errors after running for a day
> -------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: ARTEMIS-2696
>                 URL: https://issues.apache.org/jira/browse/ARTEMIS-2696
>             Project: ActiveMQ Artemis
>          Issue Type: Bug
>          Components: Broker
>    Affects Versions: 2.11.0, 2.13.0
>         Environment: Ubuntu 18.04.1 LTS
> Netty 4.1.45
> Qpid 0.48
>  
>  
>            Reporter: Bas
>            Priority: Major
>
> Our servers started to crash daily once we were on the 2.11.0 version.
> The crashes did not kill the entire Artemis server but caused endless log statements like the following:
> Transport failed: io.netty.util.internal.OutOfDirectMemoryError: failed to allocate 16777216 byte(s) of direct memory (used: 637534215, max: 652738560)
> We launched an investigation and activated Netty's advanced leak detection: -Dio.netty.leakDetectionLevel=advanced
> After that we indeed noticed leak-detection log entries with the following information:
> #1:
> io.netty.buffer.AdvancedLeakAwareByteBuf.readBytes(AdvancedLeakAwareByteBuf.java:484)
> org.apache.activemq.artemis.core.server.protocol.websocket.WebSocketFrameEncoder.writeContinuationFrame(WebSocketFrameEncoder.java:56)
> org.apache.activemq.artemis.core.server.protocol.websocket.WebSocketFrameEncoder.write(WebSocketFrameEncoder.java:45)
> io.netty.channel.AbstractChannelHandlerContext.invokeWrite0(AbstractChannelHandlerContext.java:715)
> io.netty.channel.AbstractChannelHandlerContext.invokeWriteAndFlush(AbstractChannelHandlerContext.java:762)
> io.netty.channel.AbstractChannelHandlerContext.write(AbstractChannelHandlerContext.java:788)
> io.netty.channel.AbstractChannelHandlerContext.writeAndFlush(AbstractChannelHandlerContext.java:756)
> io.netty.channel.DefaultChannelPipeline.writeAndFlush(DefaultChannelPipeline.java:1020)
> io.netty.channel.AbstractChannel.writeAndFlush(AbstractChannel.java:299)
> org.apache.activemq.artemis.core.remoting.impl.netty.NettyConnection.write(NettyConnection.java:382)
> org.apache.activemq.artemis.core.remoting.impl.netty.NettyConnection.write(NettyConnection.java:286)
> org.apache.activemq.artemis.core.remoting.impl.netty.NettyConnection.write(NettyConnection.java:281)
> org.apache.activemq.artemis.protocol.amqp.broker.AMQPConnectionCallback.onTransport(AMQPConnectionCallback.java:203)
> org.apache.activemq.artemis.protocol.amqp.proton.AMQPConnectionContext.pushBytes(AMQPConnectionContext.java:384)
> org.apache.activemq.artemis.protocol.amqp.proton.handler.ProtonHandler.actualFlush(ProtonHandler.java:210)
> io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:164)
> io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:472)
> io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:500)
> io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989)
> io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
> org.apache.activemq.artemis.utils.ActiveMQThreadFactory$1.run(ActiveMQThreadFactory.java:118)
>  
> #2: Hint: 'websocket-frame-encoder' will handle the message from this point.
> io.netty.channel.DefaultChannelPipeline.touch(DefaultChannelPipeline.java:116)
> io.netty.channel.AbstractChannelHandlerContext.write(AbstractChannelHandlerContext.java:784)
> io.netty.channel.AbstractChannelHandlerContext.writeAndFlush(AbstractChannelHandlerContext.java:756)
> io.netty.channel.DefaultChannelPipeline.writeAndFlush(DefaultChannelPipeline.java:1020)
> io.netty.channel.AbstractChannel.writeAndFlush(AbstractChannel.java:299)
> org.apache.activemq.artemis.core.remoting.impl.netty.NettyConnection.write(NettyConnection.java:382)
> org.apache.activemq.artemis.core.remoting.impl.netty.NettyConnection.write(NettyConnection.java:286)
> org.apache.activemq.artemis.core.remoting.impl.netty.NettyConnection.write(NettyConnection.java:281)
> org.apache.activemq.artemis.protocol.amqp.broker.AMQPConnectionCallback.onTransport(AMQPConnectionCallback.java:203)
> org.apache.activemq.artemis.protocol.amqp.proton.AMQPConnectionContext.pushBytes(AMQPConnectionContext.java:384)
> org.apache.activemq.artemis.protocol.amqp.proton.handler.ProtonHandler.actualFlush(ProtonHandler.java:210)
> io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:164)
> io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:472)
> io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:500)
> io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989)
> io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
> org.apache.activemq.artemis.utils.ActiveMQThreadFactory$1.run(ActiveMQThreadFactory.java:118)
>  
> #3:
> io.netty.buffer.AdvancedLeakAwareByteBuf.writeBytes(AdvancedLeakAwareByteBuf.java:622)
> org.apache.activemq.artemis.protocol.amqp.proton.handler.ProtonHandler.actualFlush(ProtonHandler.java:207)
> io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:164)
> io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:472)
> io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:500)
> io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989)
> io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
> org.apache.activemq.artemis.utils.ActiveMQThreadFactory$1.run(ActiveMQThreadFactory.java:118)
>  
> Created at:
> io.netty.buffer.PooledByteBufAllocator.newDirectBuffer(PooledByteBufAllocator.java:349)
> io.netty.buffer.AbstractByteBufAllocator.directBuffer(AbstractByteBufAllocator.java:187)
> io.netty.buffer.AbstractByteBufAllocator.directBuffer(AbstractByteBufAllocator.java:178)
> org.apache.activemq.artemis.protocol.amqp.proton.handler.ProtonHandler.actualFlush(ProtonHandler.java:206)
> io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:164)
> io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:472)
> io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:500)
> io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989)
> io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
> org.apache.activemq.artemis.utils.ActiveMQThreadFactory$1.run(ActiveMQThreadFactory.java:118)
>  
> We looked into the source code of some of the classes in the above stack traces for changes where a directBuffer.release() might be missing, but did not find any. It is also tough to grasp what is happening in these classes, because I would really need to study the underlying concepts to understand what is happening and what should happen, which would take a lot of time. But one thing caught my attention, although I might be totally wrong in estimating that it is related to the issue.
>  
> The only thing I have on this is that the following class has a direct relation to a class in the leak report from Netty: [https://github.com/apache/activemq-artemis/blob/master/artemis-server/src/main/java/org/apache/activemq/artemis/core/server/protocol/websocket/WebSocketFrameEncoder.java]
> This class was added in the following commit, which deals with websocket continuation frames (see the buffer-ownership sketch after this quoted description):
> [https://github.com/apache/activemq-artemis/commit/9fac4b866cf9cefbb6f7c13b820e07455b6649f5]
>  
> This commit is not in 2.10:
> git tag --contains 9fac4b8
> 2.11.0
> So, we wanted to test whether rolling back to 2.10 would fix the issue. The 2.10 version has now been running for 3 days without issues, while on 2.11 it would already have crashed with out-of-memory errors 3 times in that period.
>  
> The issue could be in a different commit or location than the one analysed above, but 2.10 seems to work without issues.
> We also wiped our entire journal before going to 2.10, so I want to test 2.11 again with a clean journal to see whether wiping the data store could be the reason 2.10 functions better. More information will follow.
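For anyone following leak record #1 above (the readBytes call inside WebSocketFrameEncoder.writeContinuationFrame): below is a hypothetical sketch of the buffer pattern involved, not the actual Artemis encoder; names such as FrameSplitSketch and writeInChunks are made up for illustration. The key point is that ByteBuf.readBytes(length) allocates a new buffer, so each chunk needs an owner that eventually releases it (writing it to the channel transfers that ownership), and the source buffer itself still needs a release of its own or the leak detector will flag it.

    // Hypothetical illustration of splitting a payload into continuation frames.
    import io.netty.buffer.ByteBuf;
    import io.netty.channel.ChannelHandlerContext;
    import io.netty.handler.codec.http.websocketx.ContinuationWebSocketFrame;

    final class FrameSplitSketch {

        static void writeInChunks(ChannelHandlerContext ctx, ByteBuf source, int maxFrameSize) {
            try {
                while (source.isReadable()) {
                    int chunk = Math.min(maxFrameSize, source.readableBytes());
                    // readBytes(chunk) allocates a NEW ByteBuf with its own refCnt;
                    // write() hands ownership of that chunk to the pipeline.
                    ByteBuf payload = source.readBytes(chunk);
                    boolean finalFragment = !source.isReadable();
                    ctx.write(new ContinuationWebSocketFrame(finalFragment, 0, payload));
                }
                ctx.flush();
            } finally {
                // The source buffer is not consumed by write(); if whoever owns it
                // never releases it, the leak detector reports exactly this kind of record.
                source.release();
            }
        }
    }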



--
This message was sent by Atlassian Jira
(v8.3.4#803005)