Posted to user@hbase.apache.org by Daniel Jeliński <dj...@gmail.com> on 2017/11/06 14:33:31 UTC

Re: OutOfMemoryError: Direct buffer memory on PUT

For others who run into a similar issue: it turned out that the
OutOfMemoryError was thrown (and subsequently hidden) on the client side.
The error was caused by excessive direct memory usage in Java NIO's
bytebuffer caching (described here:
http://www.evanjones.ca/java-bytebuffer-leak.html), and setting
-Djdk.nio.maxCachedBufferSize=262144
allowed the application to complete.
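
In case it helps with debugging: the direct pool can be watched from inside
the client process via the standard BufferPoolMXBean (JDK 7+). A minimal
sketch (the class name here is just an illustration):

import java.lang.management.BufferPoolMXBean;
import java.lang.management.ManagementFactory;

public class DirectPoolWatch {
    public static void main(String[] args) throws Exception {
        while (true) {
            // The "direct" pool covers ByteBuffer.allocateDirect usage,
            // including the per-thread temporary buffers cached by NIO.
            for (BufferPoolMXBean pool :
                    ManagementFactory.getPlatformMXBeans(BufferPoolMXBean.class)) {
                System.out.printf("%s: count=%d used=%d capacity=%d%n",
                        pool.getName(), pool.getCount(),
                        pool.getMemoryUsed(), pool.getTotalCapacity());
            }
            Thread.sleep(5000);
        }
    }
}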

Yet another proof that correct handling of OOME is hard.
Thanks,
Daniel

2017-10-11 11:33 GMT+02:00 Daniel Jeliński <dj...@gmail.com>:

> Thanks for the hints. I'll see if we can explicitly set
> MaxDirectMemorySize to a safe number.
> Thanks,
> Daniel
>
> 2017-10-10 21:10 GMT+02:00 Esteban Gutierrez <es...@cloudera.com>:
>
> > http://hg.openjdk.java.net/jdk8/jdk8/jdk/file/tip/src/share/classes/sun/misc/VM.java#l184
> >
> >     // The initial value of this field is arbitrary; during JRE initialization
> >     // it will be reset to the value specified on the command line, if any,
> >     // otherwise to Runtime.getRuntime().maxMemory().
> >
> > which goes all the way down to memory/heap.cpp, to whatever was left of
> > the reserved memory depending on the flags and the platform used, as
> > Vladimir says.
> >
> > Also, depending on which distribution and features are used, there are
> > specific guidelines about setting that parameter, so mileage may vary.
> >
> > thanks,
> > esteban.
> >
> > --
> > Cloudera, Inc.
> >
> > On Tue, Oct 10, 2017 at 1:35 PM, Vladimir Rodionov <vladrodionov@gmail.com> wrote:
> >
> > > >> The default value is zero, which means the maximum direct memory is
> > > >> unbounded.
> > >
> > > That is not correct. If you do not specify MaxDirectMemorySize, the
> > > default is platform specific.
> > >
> > > The link above is for the JRockit JVM, I presume?
> > >
> > > On Tue, Oct 10, 2017 at 11:19 AM, Esteban Gutierrez <esteban@cloudera.com> wrote:
> > >
> > > > I don't think it is truly unbounded; IIRC it's limited to the
> > > > maximum allocated heap.
> > > >
> > > > thanks,
> > > > esteban.
> > > >
> > > > --
> > > > Cloudera, Inc.
> > > >
> > > > On Tue, Oct 10, 2017 at 1:11 PM, Ted Yu <yu...@gmail.com> wrote:
> > > >
> > > > > From https://docs.oracle.com/cd/E15289_01/doc.40/e15062/optionxx.htm :
> > > > >
> > > > > java -XX:MaxDirectMemorySize=2g myApp
> > > > >
> > > > > Default Value
> > > > >
> > > > > The default value is zero, which means the maximum direct memory
> > > > > is unbounded.
> > > > >
> > > > > On Tue, Oct 10, 2017 at 11:04 AM, Vladimir Rodionov <vladrodionov@gmail.com> wrote:
> > > > >
> > > > > > >> -XX:MaxDirectMemorySize is set to the default 0, which means
> > > > > > >> unlimited as far as I can tell.
> > > > > >
> > > > > > Not sure if this is true. The only confirming link I found was
> > > > > > for the JRockit JVM.
> > > > > >
> > > > > > On Mon, Oct 9, 2017 at 11:29 PM, Daniel Jeliński <djelinski1@gmail.com> wrote:
> > > > > >
> > > > > > > Vladimir,
> > > > > > > -XX:MaxDirectMemorySize is set to the default 0, which means
> > > > > > > unlimited as far as I can tell.
> > > > > > > Thanks,
> > > > > > > Daniel
> > > > > > >
> > > > > > > 2017-10-09 19:30 GMT+02:00 Vladimir Rodionov <vladrodionov@gmail.com>:
> > > > > > >
> > > > > > > > Have you tried increasing the direct memory size for the
> > > > > > > > server process? -XX:MaxDirectMemorySize=?
> > > > > > > >
> > > > > > > > On Mon, Oct 9, 2017 at 2:12 AM, Daniel Jeliński <djelinski1@gmail.com> wrote:
> > > > > > > >
> > > > > > > > > Hello,
> > > > > > > > > I'm running an application doing a lot of Puts (size
> > > > > > > > > anywhere between 0 and 10MB, one cell at a time);
> > > > > > > > > occasionally I'm getting an error like the below:
> > > > > > > > > 2017-10-09 04:29:29,811 WARN  [AsyncProcess] - #13368,
> > > > > > > > > table=researchplatform:repo_stripe, attempt=1/1 failed=1ops, last
> > > > > > > > > exception: java.io.IOException: com.google.protobuf.ServiceException:
> > > > > > > > > java.lang.OutOfMemoryError: Direct buffer memory on
> > > > > > > > > c169dzv.int.westgroup.com,60020,1506476748534, tracking started
> > > > > > > > > Mon Oct 09 04:29:29 EDT 2017; not retrying 1 - final failure
> > > > > > > > >
> > > > > > > > > After that the connection to the RegionServer becomes
> > > > > > > > > unusable. Every subsequent attempt to execute a Put on that
> > > > > > > > > connection results in CallTimeoutException. I only found the
> > > > > > > > > OutOfMemoryError by reducing the number of tries to 1.
> > > > > > > > >
> > > > > > > > > The host running HBase appears to have at least a few GB of
> > > > > > > > > free memory available. Server logs do not mention anything
> > > > > > > > > about this error. The cluster is running HBase 1.2.0-cdh5.10.2.
> > > > > > > > >
> > > > > > > > > Is this a known problem? Are there workarounds available?
> > > > > > > > > Thanks,
> > > > > > > > > Daniel
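
(Re the default being debated in the quoted thread above: on a JDK 8 HotSpot
JVM you can print the effective limit yourself. sun.misc.VM is internal API,
so the class below is only an illustrative debugging sketch, not something
to ship:)

// Debugging sketch only: sun.misc.VM is JDK-internal (present in JDK 8
// HotSpot, gone in later JDKs), so compile and run this on the same JVM
// you are investigating.
public class PrintDirectLimit {
    public static void main(String[] args) {
        System.out.println("maxDirectMemory   = " + sun.misc.VM.maxDirectMemory());
        System.out.println("Runtime.maxMemory = " + Runtime.getRuntime().maxMemory());
    }
}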

Re: OutOfMemoryError: Direct buffer memory on PUT

Posted by Huaxiang Sun <hs...@cloudera.com>.
We ran into a similar case with replication at the DR cluster. It turned out
that I had filed HBASE-19320 <https://issues.apache.org/jira/browse/HBASE-19320>
without knowing about the work done here. The way I detected the direct memory
leak was to use the metrics to find the direct memory usage, and then a heap
dump to analyze the places that hold direct memory.
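
(For anyone reproducing this: besides jmap -dump:live,format=b,file=heap.hprof <pid>,
the dump can be taken programmatically on HotSpot. A minimal sketch; the
class and file name are arbitrary:)

import java.lang.management.ManagementFactory;
import com.sun.management.HotSpotDiagnosticMXBean;

public class DumpHeap {
    public static void main(String[] args) throws Exception {
        HotSpotDiagnosticMXBean diag =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        // live=true keeps only reachable objects, which is what you want
        // when hunting for whoever is holding the DirectByteBuffers.
        diag.dumpHeap("heap.hprof", true);
    }
}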

Another alternative to avoid the OOME for direct memory, as mentioned in the
discussion on the JIRA, is to switch to the async RPC client, just FYI.

Thanks,
Huaxiang




Re: OutOfMemoryError: Direct buffer memory on PUT

Posted by Stack <st...@duboce.net>.
On Wed, Nov 8, 2017 at 3:31 AM, Abhishek Singh Chouhan <abhishekchouhan121@gmail.com> wrote:

> I faced the same issue and have been debugging this for some time now (the
> logging is not very helpful, as Daniel mentions :)).
> Looking deeper into this, I realized that the side effects also include
> large, incorrect byte buffer allocations on the server side, apart from the
> call timeouts on the client side.
> Have filed HBASE-19215 <https://issues.apache.org/jira/browse/HBASE-19215>
> for this.


Thank you, lads, for the info. Let's carry on over in HBASE-19215. Good one.
S




Re: OutOfMemoryError: Direct buffer memory on PUT

Posted by Abhishek Singh Chouhan <ab...@gmail.com>.
I faced the same issue and have been debugging this for some time now (the
logging is not very helpful, as Daniel mentions :)).
Looking deeper into this, I realized that the side effects also include
large, incorrect byte buffer allocations on the server side, apart from the
call timeouts on the client side.
Have filed HBASE-19215 <https://issues.apache.org/jira/browse/HBASE-19215>
for this.


Re: OutOfMemoryError: Direct buffer memory on PUT

Posted by Daniel Jeliński <dj...@gmail.com>.
2017-11-07 18:22 GMT+01:00 Stack <st...@duboce.net>:

> On Mon, Nov 6, 2017 at 6:33 AM, Daniel Jeliński <dj...@gmail.com>
> wrote:
>
> > For others who run into a similar issue: it turned out that the
> > OutOfMemoryError was thrown (and subsequently hidden) on the client side.
> > The error was caused by excessive direct memory usage in Java NIO's
> > bytebuffer caching (described here:
> > http://www.evanjones.ca/java-bytebuffer-leak.html), and setting
> > -Djdk.nio.maxCachedBufferSize=262144
> > allowed the application to complete.
> >
> >
> Suggestions for how to expose the client-side OOME, Daniel? We should add a
> note to the thrown exception about "-Djdk.nio.maxCachedBufferSize" (and
> make sure the exception makes it out!)
>

Well, I found the problem by adding a printStackTrace call to the
AsyncProcess.createLog function, which was responsible for logging the
original OOME. This is not very elegant, and I wouldn't recommend adding it
to the official codebase, but the stack trace offers some hints:

java.io.IOException: com.google.protobuf.ServiceException: java.lang.OutOfMemoryError: Direct buffer memory
        at org.apache.hadoop.hbase.protobuf.ProtobufUtil.getRemoteException(ProtobufUtil.java:329)
        at org.apache.hadoop.hbase.client.MultiServerCallable.call(MultiServerCallable.java:130)
        at org.apache.hadoop.hbase.client.MultiServerCallable.call(MultiServerCallable.java:53)
        at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithoutRetries(RpcRetryingCaller.java:200)
        at org.apache.hadoop.hbase.client.AsyncProcess$AsyncRequestFutureImpl$SingleServerRequestRunnable.run(AsyncProcess.java:727)
        at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
        at java.util.concurrent.FutureTask.run(Unknown Source)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
        at java.lang.Thread.run(Unknown Source)
Caused by: com.google.protobuf.ServiceException: java.lang.OutOfMemoryError: Direct buffer memory
        at org.apache.hadoop.hbase.ipc.AbstractRpcClient.callBlockingMethod(AbstractRpcClient.java:240)
        at org.apache.hadoop.hbase.ipc.AbstractRpcClient$BlockingRpcChannelImplementation.callBlockingMethod(AbstractRpcClient.java:336)
        at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$BlockingStub.multi(ClientProtos.java:34142)
        at org.apache.hadoop.hbase.client.MultiServerCallable.call(MultiServerCallable.java:128)
        ... 8 more
Caused by: java.lang.OutOfMemoryError: Direct buffer memory
        at java.nio.Bits.reserveMemory(Unknown Source)
        at java.nio.DirectByteBuffer.<init>(Unknown Source)
        at java.nio.ByteBuffer.allocateDirect(Unknown Source)
        at sun.nio.ch.Util.getTemporaryDirectBuffer(Unknown Source)
        at sun.nio.ch.IOUtil.write(Unknown Source)
        at sun.nio.ch.SocketChannelImpl.write(Unknown Source)
        at org.apache.hadoop.net.SocketOutputStream$Writer.performIO(SocketOutputStream.java:63)
        at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:142)
        at org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:159)
        at org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:117)
        at org.apache.hadoop.security.SaslOutputStream.write(SaslOutputStream.java:169)
        at java.io.BufferedOutputStream.write(Unknown Source)
        at java.io.DataOutputStream.write(Unknown Source)
        at org.apache.hadoop.hbase.ipc.IPCUtil.write(IPCUtil.java:277)
        at org.apache.hadoop.hbase.ipc.IPCUtil.write(IPCUtil.java:266)
        at org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection.writeRequest(RpcClientImpl.java:921)
        at org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection.tracedWriteRequest(RpcClientImpl.java:874)
        at org.apache.hadoop.hbase.ipc.RpcClientImpl.call(RpcClientImpl.java:1243)
        at org.apache.hadoop.hbase.ipc.AbstractRpcClient.callBlockingMethod(AbstractRpcClient.java:227)
        ... 11 more
This stack trace comes from the cdh5.10.2 version, but the master branch is
sufficiently similar. So, depending on what we want to achieve, we could:
- just replace catch(Throwable e) in AbstractRpcClient.callBlockingMethod
with something more fine-grained and fail the application;
- or forward the OOME in callBlockingMethod, but add information about
maxCachedBufferSize, also failing the application but suggesting a possible
corrective action to the user (see the sketch below);
- or pass the error to the user, allowing the application to intercept it.
Not sure yet how to do that, but we would need to do something about the
connection becoming unusable after OOME, in case the user decides to keep
going.
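
As a rough illustration of the second option (this is not actual HBase code;
the helper and message are just a sketch, and it needs protobuf-java on the
classpath):

import com.google.protobuf.ServiceException;

public class OomeWrapSketch {
    // Hypothetical stand-in for the RPC write path that can throw
    // "OutOfMemoryError: Direct buffer memory".
    static void writeRequest() {
        throw new OutOfMemoryError("Direct buffer memory");
    }

    public static void main(String[] args) {
        try {
            writeRequest();
        } catch (OutOfMemoryError oome) {
            // Keep failing the call, but carry a hint for the user instead
            // of letting the root cause get swallowed by the retry layers.
            ServiceException se = new ServiceException(
                    "Direct buffer OOME; consider -Djdk.nio.maxCachedBufferSize"
                    + " or a larger -XX:MaxDirectMemorySize", oome);
            se.printStackTrace();
        }
    }
}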
What's your take?





Re: OutOfMemoryError: Direct buffer memory on PUT

Posted by Stack <st...@duboce.net>.
On Mon, Nov 6, 2017 at 6:33 AM, Daniel Jeliński <dj...@gmail.com>
wrote:

> For others who run into a similar issue: it turned out that the
> OutOfMemoryError was thrown (and subsequently hidden) on the client side.
> The error was caused by excessive direct memory usage in Java NIO's
> bytebuffer caching (described here:
> http://www.evanjones.ca/java-bytebuffer-leak.html), and setting
> -Djdk.nio.maxCachedBufferSize=262144
> allowed the application to complete.
>
>
Suggestions for how to expose the client-side OOME, Daniel? We should add a
note to the thrown exception about "-Djdk.nio.maxCachedBufferSize" (and
make sure the exception makes it out!)
Thanks for updating the list,
S


