Posted to user@hbase.apache.org by Geovanie Marquez <ge...@gmail.com> on 2014/05/08 23:32:55 UTC

RPC Client OutOfMemoryError Java Heap Space

Hey group,

There is one job that scans HBase contents and is really resource intensive, using all of the resources available to YARN (under the Resource Manager); in my case, that is 8 GB. My expectation is that a properly configured cluster would kill the application or degrade its performance, but never take a region server down. This is intended to be a multi-tenant environment where developers may submit jobs at will, so I want a configuration where cluster services do not exit this way because of memory.

The simple solution here is to change the way the job consumes resources so that it is not so greedy when it runs. But I want to understand how I can mitigate this situation in general.

**It FAILS with the following config:**
RPC client handlers: 30
Write buffer: 2 MiB
RegionServer heap: 4 GiB
Max size of all memstores: 0.40 of total heap
HFile block cache size: 0.40
Low watermark for memstore flush: 0.38
HBase memstore size: 128 MiB

**Job still FAILS with the following config:**
Everything else the same, except:
RPC client handlers: 10

**Job still FAILS with the following config:**
Everything else the same, except:
HFile block cache size: 0.10
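
For reference, a rough sketch of the configuration keys I believe these settings map to; the property names are my assumption from the HBase 0.96 / CDH5-era documentation (Cloudera Manager labels differ), and the values just mirror the first failing configuration above:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;

public class FailingConfigSketch {
    public static Configuration sketch() {
        Configuration conf = HBaseConfiguration.create();
        conf.setInt("hbase.regionserver.handler.count", 30);                   // "RPC handlers" (assuming this is the RegionServer handler count)
        conf.setLong("hbase.client.write.buffer", 2L * 1024 * 1024);           // 2 MiB client write buffer
        conf.setFloat("hbase.regionserver.global.memstore.upperLimit", 0.40f); // max size of all memstores
        conf.setFloat("hbase.regionserver.global.memstore.lowerLimit", 0.38f); // low watermark for memstore flush
        conf.setFloat("hfile.block.cache.size", 0.40f);                        // HFile block cache fraction of heap
        conf.setLong("hbase.hregion.memstore.flush.size", 128L * 1024 * 1024); // 128 MiB memstore (flush) size
        // The 4 GiB RegionServer heap is a JVM -Xmx setting, not an hbase-site.xml key.
        return conf;
    }
}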


When this runs I get the following error stack trace. How do I avoid this via configuration?

java.lang.OutOfMemoryError: Java heap space
	at org.apache.hadoop.hbase.ipc.RpcClient$Connection.readResponse(RpcClient.java:1100)
	at org.apache.hadoop.hbase.ipc.RpcClient$Connection.run(RpcClient.java:721)
2014-05-08 16:23:54,705 WARN [IPC Client (1242056950) connection to c1d001.in.wellcentive.com/10.2.4.21:60020 from hbase] org.apache.hadoop.ipc.RpcClient: IPC Client (1242056950) connection to c1d001.in.wellcentive.com/10.2.4.21:60020 from hbase: unexpected exception receiving call responses

Yes, there was an RPC timeout; that is what is killing the server, because the timeout is eventually reached about a minute later.

java.lang.OutOfMemoryError: Java heap space
	at org.apache.hadoop.hbase.ipc.RpcClient$Connection.readResponse(RpcClient.java:1100)
	at org.apache.hadoop.hbase.ipc.RpcClient$Connection.run(RpcClient.java:721)
2014-05-08 16:23:55,319 INFO [main] org.apache.hadoop.hbase.mapreduce.TableRecordReaderImpl: recovered from org.apache.hadoop.hbase.DoNotRetryIOException: Failed after retry of OutOfOrderScannerNextException: was there a rpc timeout?
	at org.apache.hadoop.hbase.client.ClientScanner.next(ClientScanner.java:384)
	at org.apache.hadoop.hbase.mapreduce.TableRecordReaderImpl.nextKeyValue(TableRecordReaderImpl.java:194)
	at org.apache.hadoop.hbase.mapreduce.TableRecordReader.nextKeyValue(TableRecordReader.java:138)
	at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:533)
	at org.apache.hadoop.mapreduce.task.MapContextImpl.nextKeyValue(MapContextImpl.java:80)
	at org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.nextKeyValue(WrappedMapper.java:91)
	at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
	at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:340)
	at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:415)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
	at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)

Probably caused by the OOME above:

Caused by: org.apache.hadoop.hbase.exceptions.OutOfOrderScannerNextException: org.apache.hadoop.hbase.exceptions.OutOfOrderScannerNextException: Expected nextCallSeq: 1 But the nextCallSeq got from client: 0; request=scanner_id: 5612205039322936440 number_of_rows: 10000 close_scanner: false next_call_seq: 0
	at org.apache.hadoop.hbase.regionserver.HRegionServer.scan(HRegionServer.java:3018)
	at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:26929)
	at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2175)
	at org.apache.hadoop.hbase.ipc.RpcServer$Handler.run(RpcServer.java:1879)
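
(I notice the trace above shows number_of_rows: 10000, which I assume means the scan ships 10,000 rows per scanner RPC; if rows are wide, a single response could blow the map task heap on its own. If that is the issue, a minimal sketch of what I could try in the job setup is the following; the table name and mapper class are placeholders, not our real job.)

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.mapreduce.Job;

public class ScanJobSketch {
    // Placeholder mapper; the real job's mapper goes here.
    static class MyMapper extends TableMapper<ImmutableBytesWritable, Result> {}

    public static Job build() throws Exception {
        Configuration conf = HBaseConfiguration.create();
        Job job = Job.getInstance(conf, "hbase-scan-sketch");
        Scan scan = new Scan();
        scan.setCaching(500);        // far fewer rows per scanner RPC than the 10000 in the trace
        scan.setCacheBlocks(false);  // don't churn the block cache on a full scan
        TableMapReduceUtil.initTableMapperJob(
            "my_table",                   // placeholder table name
            scan,
            MyMapper.class,
            ImmutableBytesWritable.class, // map output key
            Result.class,                 // map output value
            job);
        return job;
    }
}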

Re: RPC Client OutOfMemoryError Java Heap Space

Posted by Geovanie Marquez <ge...@gmail.com>.
Is this an expectation problem or a legitimate concern? I have been studying the memory configurations in Cloudera Manager and I don't see where I can improve my situation.





Re: RPC Client OutOfMemoryError Java Heap Space

Posted by Geovanie Marquez <ge...@gmail.com>.
@Stack I found it a bit tedious to attempt an update to the HBase book, and I am starting to wonder whether I was going down the right path. DocBook is used to produce the page, so should I just update the HTML directly in a text editor and resubmit, following previous examples of the book's sections?

Is there an example I should follow, other than the existing mapreduce section, for updating the book?



Re: RPC Client OutOfMemoryError Java Heap Space

Posted by Geovanie Marquez <ge...@gmail.com>.
Thanks for the suggestion - I'll try to get that out sometime this weekend.



Re: RPC Client OutOfMemoryError Java Heap Space

Posted by Stack <st...@duboce.net>.
A patch for the refguide would be great, perhaps in the troubleshooting mapreduce section here: http://hbase.apache.org/book.html#trouble.mapreduce
St.Ack



Re: RPC Client OutOfMemoryError Java Heap Space

Posted by Geovanie Marquez <ge...@gmail.com>.
The following property does exactly what I wanted our environment to do. With a 4 GiB heap I ran the job and no jobs failed; then I dropped our cluster heap to 1 GiB and reran the same resource-intensive task.

This property must be added to the "HBase Service Advanced Configuration Snippet (Safety Valve) for hbase-site.xml":

<property>
  <name>hbase.client.scanner.max.result.size</name>
  <value>67108864</value>
</property>

We noted that 64 MiB was enough, but we also experimented with 128 MiB. I may do a write-up and elaborate on this some more.
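
If you would rather not touch the service-wide safety valve, the same cap should also work per job from the client side, since it is a client property; a sketch (64 MiB = 67108864 bytes):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;

public class ScannerResultSizeSketch {
    public static Configuration capScannerResults() {
        Configuration conf = HBaseConfiguration.create();
        // Cap how many bytes a single scanner next() call may return to the client.
        conf.setLong("hbase.client.scanner.max.result.size", 64L * 1024 * 1024);
        return conf;
    }
}

Passing that Configuration to Job.getInstance(conf) should limit only that job's scanner responses.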


On Mon, May 12, 2014 at 1:38 PM, Vladimir Rodionov
<vr...@carrieriq.com>wrote:

> All your OOME are on the client side (map task). Your map tasks need more
> heap.
> Reduce # of map tasks and increase max heap size per map task.
>
> Best regards,
> Vladimir Rodionov
> Principal Platform Engineer
> Carrier IQ, www.carrieriq.com
> e-mail: vrodionov@carrieriq.com
>
> ________________________________________
> From: Geovanie Marquez [geovanie.marquez@gmail.com]
> Sent: Thursday, May 08, 2014 2:35 PM
> To: user@hbase.apache.org
> Subject: Re: RPC Client OutOfMemoryError Java Heap Space
>
> sorry didn't include version
>
> CDH5 version - CDH-5.0.0-1.cdh5.0.0.p0.47
>
>
> On Thu, May 8, 2014 at 5:32 PM, Geovanie Marquez <
> geovanie.marquez@gmail.com
> > wrote:
>
> > Hey group,
> >
> > There is one job that scans HBase contents and is really resource
> > intensive using all resources available to yarn (under Resource Manager).
> > In my case, that is 8GB. My expectation here is that a properly
> configured
> > cluster would kill the application or degrade the application performance
> > but never ever take a region server down. This is intended to be a
> > multi-tenant environment where developers may submit jobs at will and I
> > would want a configuration where the cluster services are not exited in
> > this way because of memory.
> >
> > The simple solution here, is to change the way the job consumes resources
> > so that when run it is not so resource greedy. I want to understand how I
> > can mitigate this situation in general.
> >
> > **It FAILS with the following config:**
> > The RPC client has 30 handlers
> > write buffer of 2MiB
> > The RegionServer heap is 4GiB
> > Max Size of all memstores is 0.40 of total heap
> > HFile Block Cache Size is 0.40
> > Low watermark for memstore flush is 0.38
> > HBase Memstore size is 128MiB
> >
> > **Job still FAILS with the following config:**
> > Everything else the same except
> > The RPC client has 10 handlers
> >
> > **Job still FAILS with the following config:**
> > Everything else the same except
> > HFile Block Cache Size is 0.10
> >
> >
> > When this runs I get the following error stacktrace:
> > #
> > #How do I avoid this via configuration.
> > #
> >
> > java.lang.OutOfMemoryError: Java heap space
> >       at
> org.apache.hadoop.hbase.ipc.RpcClient$Connection.readResponse(RpcClient.java:1100)
> >       at
> org.apache.hadoop.hbase.ipc.RpcClient$Connection.run(RpcClient.java:721)
> > 2014-05-08 16:23:54,705 WARN [IPC Client (1242056950) connection to
> c1d001.in.wellcentive.com/10.2.4.21:60020 from hbase]
> org.apache.hadoop.ipc.RpcClient: IPC Client (1242056950) connection to
> c1d001.in.wellcentive.com/10.2.4.21:60020 from hbase: unexpected
> exception receiving call responses
> > #
> >
> > ###Yes, there was an RPC timeout this is what is killing the server
> because the timeout is eventually (1minute later) reached.
> >
> > #
> >
> > java.lang.OutOfMemoryError: Java heap space
> >       at
> org.apache.hadoop.hbase.ipc.RpcClient$Connection.readResponse(RpcClient.java:1100)
> >       at
> org.apache.hadoop.hbase.ipc.RpcClient$Connection.run(RpcClient.java:721)
> > 2014-05-08 16:23:55,319 INFO [main]
> org.apache.hadoop.hbase.mapreduce.TableRecordReaderImpl: recovered from
> org.apache.hadoop.hbase.DoNotRetryIOException: Failed after retry of
> OutOfOrderScannerNextException: was there a rpc timeout?
> >       at
> org.apache.hadoop.hbase.client.ClientScanner.next(ClientScanner.java:384)
> >       at
> org.apache.hadoop.hbase.mapreduce.TableRecordReaderImpl.nextKeyValue(TableRecordReaderImpl.java:194)
> >       at
> org.apache.hadoop.hbase.mapreduce.TableRecordReader.nextKeyValue(TableRecordReader.java:138)
> >       at
> org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:533)
> >       at
> org.apache.hadoop.mapreduce.task.MapContextImpl.nextKeyValue(MapContextImpl.java:80)
> >       at
> org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.nextKeyValue(WrappedMapper.java:91)
> >       at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
> >       at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
> >       at org.apache.hadoop.mapred.MapTask.run(MapTask.java:340)
> >       at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
> >       at java.security.AccessController.doPrivileged(Native Method)
> >       at javax.security.auth.Subject.doAs(Subject.java:415)
> >       at
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
> >       at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)
> >
> > #
> >
> > ## Probably caused by the OOME above
> >
> > #
> >
> > Caused by:
> org.apache.hadoop.hbase.exceptions.OutOfOrderScannerNextException:
> org.apache.hadoop.hbase.exceptions.OutOfOrderScannerNextException: Expected
> nextCallSeq: 1 But the nextCallSeq got from client: 0; request=scanner_id:
> 5612205039322936440 number_of_rows: 10000 close_scanner: false
> next_call_seq: 0
> >       at
> org.apache.hadoop.hbase.regionserver.HRegionServer.scan(HRegionServer.java:3018)
> >       at
> org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:26929)
> >       at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2175)
> >       at
> org.apache.hadoop.hbase.ipc.RpcServer$Handler.run(RpcServer.java:1879)
> >
> >
>
> Confidentiality Notice:  The information contained in this message,
> including any attachments hereto, may be confidential and is intended to be
> read only by the individual or entity to whom this message is addressed. If
> the reader of this message is not the intended recipient or an agent or
> designee of the intended recipient, please note that any review, use,
> disclosure or distribution of this message or its attachments, in any form,
> is strictly prohibited.  If you have received this message in error, please
> immediately notify the sender and/or Notifications@carrieriq.com and
> delete or destroy any copy of this message and its attachments.
>

RE: RPC Client OutOfMemoryError Java Heap Space

Posted by Vladimir Rodionov <vr...@carrieriq.com>.
All your OOMEs are on the client side (the map task). Your map tasks need more heap: reduce the number of map tasks and increase the max heap size per map task.
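
For example, a sketch only (these are the standard MRv2 property names; the values are illustrative):

import org.apache.hadoop.conf.Configuration;

public class MapTaskHeapSketch {
    public static Configuration moreMapHeap() {
        Configuration conf = new Configuration();
        conf.setInt("mapreduce.map.memory.mb", 3072);     // YARN container size per map task
        conf.set("mapreduce.map.java.opts", "-Xmx2560m"); // map JVM heap, kept below the container size
        return conf;                                      // pass to Job.getInstance(conf) when building the scan job
    }
}

With larger containers, fewer map tasks run concurrently inside the same 8GB of YARN resources, which is one way to get the effect of running fewer maps at once.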

Best regards,
Vladimir Rodionov
Principal Platform Engineer
Carrier IQ, www.carrieriq.com
e-mail: vrodionov@carrieriq.com


Re: RPC Client OutOfMemoryError Java Heap Space

Posted by Geovanie Marquez <ge...@gmail.com>.
Sorry, I didn't include the version:

CDH5 version - CDH-5.0.0-1.cdh5.0.0.p0.47

