Posted to issues@hbase.apache.org by "stack (JIRA)" <ji...@apache.org> on 2018/06/27 18:47:00 UTC

[jira] [Commented] (HBASE-20793) Master can't RPC: OutOfDirectMemoryError: failed to allocate 16777216 byte(s) of direct memory (used: 4143972700, max: 4151836672)

    [ https://issues.apache.org/jira/browse/HBASE-20793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16525453#comment-16525453 ] 

stack commented on HBASE-20793:
-------------------------------

Via [~mdrob@cloudera.com], this might be a netty-only issue (OutOfDirectMemoryError comes from netty).

Other stuff from drob:

* Setting DEBUG on org.apache.hadoop.hbase.shaded.io.netty.util.internal.PlatformDependent dumps netty startup configs.
* When the io.netty.maxDirectMemory system property is not set (or is explicitly set to < 0), netty gives itself a max direct memory pool equal in size to the JDK max direct memory, but accounted for independently of the rest of the JDK's direct memory.

        // Here is how the system property is used:
        //
        // * <  0  - Don't use cleaner, and inherit max direct memory from java. In this case the
        //           "practical max direct memory" would be 2 * max memory as defined by the JDK.
        // * == 0  - Use cleaner, Netty will not enforce max memory, and instead will defer to JDK.
        // * >  0  - Don't use cleaner. This will limit Netty's total direct memory
        //           (note: that JDK's direct memory limit is independent of this).
        long maxDirectMemory = SystemPropertyUtil.getLong("io.netty.maxDirectMemory", -1);
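
To make the numbers in the error concrete, here is a minimal sketch of a "reserve before allocate" counter of the kind the stack trace points at (PlatformDependent.incrementMemoryCounter). It is an illustration only, not netty's code: the DirectMemoryBudget class and its methods are hypothetical names, and the limit and usage figures are copied from the exception message in the issue.

```java
import java.util.concurrent.atomic.AtomicLong;

/**
 * Hypothetical illustration of a direct-memory budget: every allocation must
 * first reserve its size against a fixed limit. This is NOT netty's code.
 */
final class DirectMemoryBudget {
    private final long limit;
    private final AtomicLong used;

    DirectMemoryBudget(long limit, long alreadyUsed) {
        this.limit = limit;
        this.used = new AtomicLong(alreadyUsed);
    }

    /** Reserves the given number of bytes, or returns false if that would exceed the limit. */
    boolean tryReserve(long bytes) {
        while (true) {
            long current = used.get();
            long next = current + bytes;
            if (next > limit) {
                return false; // the caller would surface this as an out-of-direct-memory error
            }
            if (used.compareAndSet(current, next)) {
                return true;
            }
        }
    }

    /** Returns previously reserved bytes to the budget. */
    void release(long bytes) {
        used.addAndGet(-bytes);
    }

    /** Bytes still available under the limit. */
    long headroom() {
        return limit - used.get();
    }

    public static void main(String[] args) {
        // Figures from the exception: used: 4143972700, max: 4151836672
        DirectMemoryBudget budget = new DirectMemoryBudget(4151836672L, 4143972700L);
        long chunk = 16777216L; // the 16 MiB allocation that failed

        System.out.println("headroom = " + budget.headroom());          // 7863972 bytes
        System.out.println("reserve ok? " + budget.tryReserve(chunk));  // false
    }
}
```

With roughly 7.5 MiB of headroom left, the 16 MiB request cannot fit, which matches the failure reported. Whether the pool filled up because of a leak or because of real demand is a separate question that the counter alone cannot answer.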



> Master can't RPC: OutOfDirectMemoryError: failed to allocate 16777216 byte(s) of direct memory (used: 4143972700, max: 4151836672)
> ----------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-20793
>                 URL: https://issues.apache.org/jira/browse/HBASE-20793
>             Project: HBase
>          Issue Type: Bug
>          Components: master
>    Affects Versions: 2.0.0
>            Reporter: stack
>            Priority: Major
>
> Master is hung up unable to RPC out to the cluster. It is failing with the below:
> {code}
> Caused by: java.net.SocketTimeoutException: callTimeout=60000, callDuration=69047: Call to ve0801.XYZ.com/10.10.10.10:22101 failed on local exception: java.io.IOException: org.apache.hbase.thirdparty.io.netty.util.internal.OutOfDirectMemoryError: failed to allocate 16777216 byte(s) of direct memory (used: 4143972700, max: 4151836672) row '' on table 'hbase:meta' at region=hbase:meta,,1.1588230740, hostname=ve0801.halxg.cloudera.com,22101,1529611440163, seqNum=-1
>         at org.apache.hadoop.hbase.client.RpcRetryingCallerImpl.callWithRetries(RpcRetryingCallerImpl.java:159)
>         at org.apache.hadoop.hbase.client.ResultBoundedCompletionService$QueueingFuture.run(ResultBoundedCompletionService.java:80)
>         ... 3 more
> Caused by: java.io.IOException: Call to ve0801.XYZ.com/10.10.10.10:22101 failed on local exception: java.io.IOException: org.apache.hbase.thirdparty.io.netty.util.internal.OutOfDirectMemoryError: failed to allocate 16777216 byte(s) of direct memory (used: 4143972700, max: 4151836672)
>         at org.apache.hadoop.hbase.ipc.IPCUtil.wrapException(IPCUtil.java:180)
>         at org.apache.hadoop.hbase.ipc.AbstractRpcClient.onCallFinished(AbstractRpcClient.java:390)
>         at org.apache.hadoop.hbase.ipc.AbstractRpcClient.access$100(AbstractRpcClient.java:95)
>         at org.apache.hadoop.hbase.ipc.AbstractRpcClient$3.run(AbstractRpcClient.java:410)
>         at org.apache.hadoop.hbase.ipc.AbstractRpcClient$3.run(AbstractRpcClient.java:406)
>         at org.apache.hadoop.hbase.ipc.Call.callComplete(Call.java:103)
>         at org.apache.hadoop.hbase.ipc.Call.setException(Call.java:118)
>         at org.apache.hadoop.hbase.ipc.NettyRpcDuplexHandler.cleanupCalls(NettyRpcDuplexHandler.java:202)
>         at org.apache.hadoop.hbase.ipc.NettyRpcDuplexHandler.exceptionCaught(NettyRpcDuplexHandler.java:219)
>         at org.apache.hbase.thirdparty.io.netty.channel.AbstractChannelHandlerContext.invokeExceptionCaught(AbstractChannelHandlerContext.java:285)
>         at org.apache.hbase.thirdparty.io.netty.channel.AbstractChannelHandlerContext.invokeExceptionCaught(AbstractChannelHandlerContext.java:264)
>         at org.apache.hbase.thirdparty.io.netty.channel.AbstractChannelHandlerContext.fireExceptionCaught(AbstractChannelHandlerContext.java:256)
>         at org.apache.hbase.thirdparty.io.netty.channel.ChannelInboundHandlerAdapter.exceptionCaught(ChannelInboundHandlerAdapter.java:131)
>         at org.apache.hbase.thirdparty.io.netty.channel.AbstractChannelHandlerContext.invokeExceptionCaught(AbstractChannelHandlerContext.java:285)
>         at org.apache.hbase.thirdparty.io.netty.channel.AbstractChannelHandlerContext.notifyHandlerException(AbstractChannelHandlerContext.java:850)
>         at org.apache.hbase.thirdparty.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:364)
>         at org.apache.hbase.thirdparty.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348)
>         at org.apache.hbase.thirdparty.io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340)
>         at org.apache.hbase.thirdparty.io.netty.handler.timeout.IdleStateHandler.channelRead(IdleStateHandler.java:286)
>         at org.apache.hbase.thirdparty.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362)
>         at org.apache.hbase.thirdparty.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348)
>         at org.apache.hbase.thirdparty.io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340)
>         at org.apache.hbase.thirdparty.io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1359)
>         at org.apache.hbase.thirdparty.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362)
>         at org.apache.hbase.thirdparty.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348)
>         at org.apache.hbase.thirdparty.io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:935)
>         at org.apache.hbase.thirdparty.io.netty.channel.epoll.AbstractEpollStreamChannel$EpollStreamUnsafe.epollInReady(AbstractEpollStreamChannel.java:801)
>         at org.apache.hbase.thirdparty.io.netty.channel.epoll.AbstractEpollChannel$AbstractEpollUnsafe$1.run(AbstractEpollChannel.java:412)
>         at org.apache.hbase.thirdparty.io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:163)
>         at org.apache.hbase.thirdparty.io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:403)
>         at org.apache.hbase.thirdparty.io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:309)
>         at org.apache.hbase.thirdparty.io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:858)
>         at org.apache.hbase.thirdparty.io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:138)
>         ... 1 more
> Caused by: java.io.IOException: org.apache.hbase.thirdparty.io.netty.util.internal.OutOfDirectMemoryError: failed to allocate 16777216 byte(s) of direct memory (used: 4143972700, max: 4151836672)
>         at org.apache.hadoop.hbase.ipc.IPCUtil.toIOE(IPCUtil.java:148)
>         ... 26 more
> Caused by: org.apache.hbase.thirdparty.io.netty.util.internal.OutOfDirectMemoryError: failed to allocate 16777216 byte(s) of direct memory (used: 4143972700, max: 4151836672)
>         at org.apache.hbase.thirdparty.io.netty.util.internal.PlatformDependent.incrementMemoryCounter(PlatformDependent.java:640)
>         at org.apache.hbase.thirdparty.io.netty.util.internal.PlatformDependent.allocateDirectNoCleaner(PlatformDependent.java:594)
>         at org.apache.hbase.thirdparty.io.netty.buffer.PoolArena$DirectArena.allocateDirect(PoolArena.java:764)
>         at org.apache.hbase.thirdparty.io.netty.buffer.PoolArena$DirectArena.newChunk(PoolArena.java:740)
>         at org.apache.hbase.thirdparty.io.netty.buffer.PoolArena.allocateNormal(PoolArena.java:244)
>         at org.apache.hbase.thirdparty.io.netty.buffer.PoolArena.allocate(PoolArena.java:226)
>         at org.apache.hbase.thirdparty.io.netty.buffer.PoolArena.reallocate(PoolArena.java:397)
>         at org.apache.hbase.thirdparty.io.netty.buffer.PooledByteBuf.capacity(PooledByteBuf.java:118)
>         at org.apache.hbase.thirdparty.io.netty.buffer.AbstractByteBuf.ensureWritable0(AbstractByteBuf.java:285)
>         at org.apache.hbase.thirdparty.io.netty.buffer.AbstractByteBuf.ensureWritable(AbstractByteBuf.java:265)
>         at org.apache.hbase.thirdparty.io.netty.buffer.AbstractByteBuf.writeBytes(AbstractByteBuf.java:1077)
>         at org.apache.hbase.thirdparty.io.netty.buffer.AbstractByteBuf.writeBytes(AbstractByteBuf.java:1070)
>         at org.apache.hbase.thirdparty.io.netty.buffer.AbstractByteBuf.writeBytes(AbstractByteBuf.java:1060)
>         at org.apache.hbase.thirdparty.io.netty.handler.codec.ByteToMessageDecoder$1.cumulate(ByteToMessageDecoder.java:92)
>         at org.apache.hbase.thirdparty.io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:263)
>         at org.apache.hbase.thirdparty.io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362)
> {code}
> Perhaps we are leaking (fun, fun; see https://github.com/jeffgriffith/native-jvm-leaks), or perhaps we actually need 4G of offheap to run a Master for 650 nodes and 300k regions.
> Filing this issue as our first foray into the wonderful world of offheap accounting and tracking.


