Posted to user@hbase.apache.org by Bryan Keller <br...@gmail.com> on 2012/12/14 23:59:08 UTC

HBaseClient.call() hang

I have encountered a problem with HBaseClient.call() hanging. This occurs when one of my regionservers goes down while performing a table scan.

What exacerbates this problem is that the scan I am performing uses filters, and the region size of the table is large (4 GB). Because of this, it can take several minutes for a row to be returned when calling scanner.next(). Apparently there is no keep-alive message sent back to the scanner while the region server is busy, so I had to increase the hbase.rpc.timeout value to a large number (60 min); otherwise the next() call will time out waiting for the regionserver to send something back.
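
(For reference, a minimal sketch of the client-side setup described above; the table name, filter, and caching value are placeholders rather than the actual job configuration:)

    // Classes are from org.apache.hadoop.conf and org.apache.hadoop.hbase.client.
    Configuration conf = HBaseConfiguration.create();
    // Raise the RPC timeout (in milliseconds) so a slow, filter-heavy next()
    // call is not killed while the region server is still scanning.
    conf.setLong("hbase.rpc.timeout", 60L * 60L * 1000L);   // 60 minutes

    HTable table = new HTable(conf, "mytable");      // placeholder table name
    Scan scan = new Scan();
    scan.setCaching(100);                            // rows returned per next() RPC
    scan.setFilter(myFilter);                        // hypothetical server-side filter
    ResultScanner scanner = table.getScanner(scan);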

The result is that this HBaseClient.call() hang is made much worse, because it won't time out for 60 minutes.

I have a couple of questions:

1. Any thoughts on why HBaseClient.call() is getting stuck? I noticed that call.wait() is not using any timeout, so it will wait indefinitely until interrupted externally.

2. Is there a solution where I do not need to set hbase.rpc.timeout to a very large number? My only thought would be to forgo using filters and do the filtering client-side, which seems pretty inefficient (sketched below).
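
(For comparison, a rough sketch of that client-side alternative: scan with no server-side filter and drop non-matching rows in the client. The column family, qualifier, and value are placeholders, and the HTable is the one from the sketch above.)

    Scan clientSideScan = new Scan();
    clientSideScan.setCaching(1000);                 // fetch rows in larger batches
    ResultScanner scanner = table.getScanner(clientSideScan);
    try {
        for (Result r : scanner) {
            byte[] v = r.getValue(Bytes.toBytes("cf"), Bytes.toBytes("col"));
            if (v != null && Bytes.equals(v, Bytes.toBytes("wanted"))) {
                // process the matching row
            }
            // Every non-matching row still crosses the network, which is the
            // inefficiency mentioned above.
        }
    } finally {
        scanner.close();
    }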

Here is a stack dump of the thread that was hung:

Thread 10609: (state = BLOCKED)
 - java.lang.Object.wait(long) @bci=0 (Interpreted frame)
 - java.lang.Object.wait() @bci=2, line=485 (Interpreted frame)
 - org.apache.hadoop.hbase.ipc.HBaseClient.call(org.apache.hadoop.io.Writable, java.net.InetSocketAddress, java.lang.Class, org.apache.hadoop.hbase.security.User, int) @bci=51, line=904 (Interpreted frame)
 - org.apache.hadoop.hbase.ipc.WritableRpcEngine$Invoker.invoke(java.lang.Object, java.lang.reflect.Method, java.lang.Object[]) @bci=52, line=150 (Interpreted frame)
 - $Proxy12.next(long, int) @bci=26 (Interpreted frame)
 - org.apache.hadoop.hbase.client.ScannerCallable.call() @bci=72, line=92 (Interpreted frame)
 - org.apache.hadoop.hbase.client.ScannerCallable.call() @bci=1, line=42 (Interpreted frame)
 - org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getRegionServerWithRetries(org.apache.hadoop.hbase.client.ServerCallable) @bci=36, line=1325 (Interpreted frame)
 - org.apache.hadoop.hbase.client.HTable$ClientScanner.next() @bci=117, line=1299 (Compiled frame)
 - org.apache.hadoop.hbase.mapreduce.TableRecordReaderImpl.nextKeyValue() @bci=41, line=150 (Interpreted frame)
 - org.apache.hadoop.hbase.mapreduce.TableRecordReader.nextKeyValue() @bci=4, line=142 (Interpreted frame)
 - org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue() @bci=4, line=458 (Interpreted frame)
 - org.apache.hadoop.mapreduce.task.MapContextImpl.nextKeyValue() @bci=4, line=76 (Interpreted frame)
 - org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.nextKeyValue() @bci=4, line=85 (Interpreted frame)
 - org.apache.hadoop.mapreduce.Mapper.run(org.apache.hadoop.mapreduce.Mapper$Context) @bci=6, line=139 (Interpreted frame)
 - org.apache.hadoop.mapred.MapTask.runNewMapper(org.apache.hadoop.mapred.JobConf, org.apache.hadoop.mapreduce.split.JobSplit$TaskSplitIndex, org.apache.hadoop.mapred.TaskUmbilicalProtocol, org.apache.hadoop.mapred.Task$TaskReporter) @bci=201, line=645 (Interpreted frame)
 - org.apache.hadoop.mapred.MapTask.run(org.apache.hadoop.mapred.JobConf, org.apache.hadoop.mapred.TaskUmbilicalProtocol) @bci=100, line=325 (Interpreted frame)
 - org.apache.hadoop.mapred.Child$4.run() @bci=29, line=268 (Interpreted frame)
 - java.security.AccessController.doPrivileged(java.security.PrivilegedExceptionAction, java.security.AccessControlContext) @bci=0 (Interpreted frame)
 - javax.security.auth.Subject.doAs(javax.security.auth.Subject, java.security.PrivilegedExceptionAction) @bci=42, line=396 (Interpreted frame)
 - org.apache.hadoop.security.UserGroupInformation.doAs(java.security.PrivilegedExceptionAction) @bci=14, line=1332 (Interpreted frame)
 - org.apache.hadoop.mapred.Child.main(java.lang.String[]) @bci=776, line=262 (Interpreted frame)


Re: HBaseClient.call() hang

Posted by Bryan Keller <br...@gmail.com>.
Forgot to mention that. It's version 0.92.1 (Cloudera CDH4.1.1), running on 64-bit CentOS 6, Java 1.6.0_31.

On Dec 14, 2012, at 5:31 PM, lars hofhansl <lh...@yahoo.com> wrote:

> Hey Bryan, 
> 
> 
> Which version of HBase is this?
> 
> -- Lars
> 


Re: HBaseClient.call() hang

Posted by lars hofhansl <lh...@yahoo.com>.
Hey Bryan, 


Which version of HBase is this?

-- Lars

Re: HBaseClient.call() hang

Posted by Azuryy Yu <az...@gmail.com>.
Don't increase the RS timeout to avoid this issue. What is your block size? And can you paste your JVM options here?

I also met a long GC problem, but after tuning the JVM options it now works very well.
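
(As an illustration of the kind of JVM tuning referred to here, a commonly used set of CMS collector options for a region server, placed in hbase-env.sh; the heap sizes are placeholders and these are not anyone's actual settings:)

    # hbase-env.sh (illustrative values only)
    export HBASE_REGIONSERVER_OPTS="-Xmx8g -Xmn512m \
        -XX:+UseParNewGC -XX:+UseConcMarkSweepGC \
        -XX:CMSInitiatingOccupancyFraction=70 -XX:+CMSParallelRemarkEnabled \
        -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps"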


On Tue, Dec 18, 2012 at 1:18 AM, Bryan Keller <br...@gmail.com> wrote:

> It seems there was a cascading effect. The regionservers were busy with
> scanning a table, which resulted in some long GC's. The GC's were long
> enough to trigger the Zookeeper timeout on at least one regionserver, which
> resulted in the regionserver shutting itself down. This then caused the
> Object.wait() call which got stuck, and only exited after the very long RPC
> timeout.
>
> I have done a fair amount of work optimizing the GCs, and I increased the
> regionserver timeouts, which should help with the regionserver shutdowns.
> But if a regionserver does shut down for some other reason, this will still
> result in the Object.wait() hang.
>
> One approach might be to have the regionservers send back a keep-alive, or
> progress, message during a scan, and that message would reset the RPC
> timer. The regionserver could do this every x number of rows processed
> server-side. Then the RPC timeout could be something more sensible rather
> than being set to the longest time it takes to scan a region.
>
> HBASE-5416 looks useful, it will make scans faster, but the problem I'm
> encountering will still be present, but perhaps I could set the RPC timeout
> a bit lower. HBASE-6313 might fix the hang, in which case I could live with
> the longer RPC timeout setting.
>
>
> On Dec 14, 2012, at 9:49 PM, Ted Yu <yu...@gmail.com> wrote:
>
> > Bryan:
> >
> > bq. My only thought would be to forego using filters
> > Please keep using filters.
> >
> > I and Sergey are working on HBASE-5416: Improve performance of scans with
> > some kind of filters
> > This feature allows you to specify one column family as being essential.
> > The other column family is only returned to client when essential column
> > family matches. I wonder if this may be of help to you.
> >
> > You mentioned regionserver going down or being busy. I assume it was not
> > often that regionserver(s) went down. For busy region server, did you try
> > jstack'ing regionserver process ?
> >
> > Thanks

Re: HBaseClient.call() hang

Posted by Bryan Keller <br...@gmail.com>.
Yes, that is what I would expect, but the client is stuck in the Object.wait() call and doesn't get notified until the RPC timeout passes.
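
(For anyone following along, a simplified illustration of the untimed wait in question; this is a sketch, not the actual HBaseClient code:)

    // Sketch only: an untimed wait on a pending RPC call. The waiting thread
    // does not return until another thread sets done = true and calls
    // notify(); in the scenario above that only happens once the 60-minute
    // RPC timeout finally fires.
    static final class Call {
        boolean done;
    }

    static void waitForResponse(Call call) throws InterruptedException {
        synchronized (call) {
            while (!call.done) {
                call.wait();   // no timeout argument, hence the indefinite hang
            }
        }
    }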

On Dec 18, 2012, at 12:27 PM, "Mesika, Asaf" <as...@gmail.com> wrote:

> One thing I don't get:
> If the RS went down, then the RPC connection should have been reset, thus causing the client to interrupt, right? It shouldn't be a matter of timeout at all.
> 
> On Dec 17, 2012, at 7:18 PM, Bryan Keller wrote:
> 
>> It seems there was a cascading effect. The regionservers were busy with scanning a table, which resulted in some long GC's. The GC's were long enough to trigger the Zookeeper timeout on at least one regionserver, which resulted in the regionserver shutting itself down. This then caused the Object.wait() call which got stuck, and only exited after the very long RPC timeout.
>> 
>> I have done a fair amount of work optimizing the GCs, and I increased the regionserver timeouts, which should help with the regionserver shutdowns. But if a regionserver does shut down for some other reason, this will still result in the Object.wait() hang.
>> 
>> One approach might be to have the regionservers send back a keep-alive, or progress, message during a scan, and that message would reset the RPC timer. The regionserver could do this every x number of rows processed server-side. Then the RPC timeout could be something more sensible rather than being set to the longest time it takes to scan a region.
>> 
>> HBASE-5416 looks useful, it will make scans faster, but the problem I'm encountering will still be present, but perhaps I could set the RPC timeout a bit lower. HBASE-6313 might fix the hang, in which case I could live with the longer RPC timeout setting.


Re: HBaseClient.call() hang

Posted by "Mesika, Asaf" <as...@gmail.com>.
One thing I don't get:
If the RS went down, then the RPC connection should have been reset, thus causing the client to interrupt, right? It shouldn't be a matter of timeout at all.

On Dec 17, 2012, at 7:18 PM, Bryan Keller wrote:

> It seems there was a cascading effect. The regionservers were busy with scanning a table, which resulted in some long GC's. The GC's were long enough to trigger the Zookeeper timeout on at least one regionserver, which resulted in the regionserver shutting itself down. This then caused the Object.wait() call which got stuck, and only exited after the very long RPC timeout.
> 
> I have done a fair amount of work optimizing the GCs, and I increased the regionserver timeouts, which should help with the regionserver shutdowns. But if a regionserver does shut down for some other reason, this will still result in the Object.wait() hang.
> 
> One approach might be to have the regionservers send back a keep-alive, or progress, message during a scan, and that message would reset the RPC timer. The regionserver could do this every x number of rows processed server-side. Then the RPC timeout could be something more sensible rather than being set to the longest time it takes to scan a region.
> 
> HBASE-5416 looks useful, it will make scans faster, but the problem I'm encountering will still be present, but perhaps I could set the RPC timeout a bit lower. HBASE-6313 might fix the hang, in which case I could live with the longer RPC timeout setting.


Re: HBaseClient.call() hang

Posted by Bryan Keller <br...@gmail.com>.
The regionserver shutdowns I have pretty much under control. My main problem is the Object.wait() hang and the RPC timeout with very selective filters. My region sizes are 4 GB for this particular table.
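
(To illustrate what "very selective filters" means for scan latency, a hypothetical example follows; with a predicate like this the region server may read a large part of a 4 GB region before next() returns a single matching row. Family, qualifier, and value are placeholders.)

    SingleColumnValueFilter filter = new SingleColumnValueFilter(
        Bytes.toBytes("cf"), Bytes.toBytes("status"),
        CompareFilter.CompareOp.EQUAL, Bytes.toBytes("rare-value"));
    filter.setFilterIfMissing(true);   // also drop rows that lack the column
    Scan scan = new Scan();
    scan.setFilter(filter);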


On Dec 17, 2012, at 4:50 PM, Azuryy Yu <az...@gmail.com> wrote:

> Don't increase RS timeout to avoid this issue. what size of your block
> size? and can you paste your JVM options here?
> 
> I also met a long GC problem, but I tuned jvm options, it works very well
> now.


Re: HBaseClient.call() hang

Posted by Bryan Keller <br...@gmail.com>.
It seems there was a cascading effect. The regionservers were busy scanning a table, which resulted in some long GCs. The GCs were long enough to trigger the ZooKeeper timeout on at least one regionserver, which resulted in the regionserver shutting itself down. This then caused the Object.wait() call to get stuck, and it only exited after the very long RPC timeout.

I have done a fair amount of work optimizing the GCs, and I increased the regionserver timeouts, which should help with the regionserver shutdowns. But if a regionserver does shut down for some other reason, this will still result in the Object.wait() hang.
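
(Presumably the "regionserver timeouts" above means zookeeper.session.timeout, the setting that controls how long a GC pause a region server can survive before its ZooKeeper session expires and it aborts. Purely as an illustration, with an arbitrary value in milliseconds, it goes in hbase-site.xml:)

    <property>
      <name>zookeeper.session.timeout</name>
      <value>120000</value>
    </property>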

One approach might be to have the regionservers send back a keep-alive, or progress, message during a scan, and that message would reset the RPC timer. The regionserver could do this every x number of rows processed server-side. Then the RPC timeout could be something more sensible rather than being set to the longest time it takes to scan a region.

HBASE-5416 looks useful and will make scans faster, but the problem I'm encountering will still be present, though perhaps I could set the RPC timeout a bit lower. HBASE-6313 might fix the hang, in which case I could live with the longer RPC timeout setting.


On Dec 14, 2012, at 9:49 PM, Ted Yu <yu...@gmail.com> wrote:

> Bryan:
> 
> bq. My only thought would be to forego using filters
> Please keep using filters.
> 
> I and Sergey are working on HBASE-5416: Improve performance of scans with
> some kind of filters
> This feature allows you to specify one column family as being essential.
> The other column family is only returned to client when essential column
> family matches. I wonder if this may be of help to you.
> 
> You mentioned regionserver going down or being busy. I assume it was not
> often that regionserver(s) went down. For busy region server, did you try
> jstack'ing regionserver process ?
> 
> Thanks


RE: HBaseClient.call() hang

Posted by Bijieshan <bi...@huawei.com>.
Bryan:

>> I have encountered a problem with HBaseClient.call() hanging. This occurs
>> when one of my regionservers goes down while performing a table scan.

Have you checked the issue of HBASE-6313?

Jieshan

-----Original Message-----
From: Ted Yu [mailto:yuzhihong@gmail.com] 
Sent: Saturday, December 15, 2012 2:00 PM
To: user@hbase.apache.org
Subject: Re: HBaseClient.call() hang

I should have mentioned that original patches for HBASE-5416 were
contributed by Max Lapan.

On Fri, Dec 14, 2012 at 9:49 PM, Ted Yu <yu...@gmail.com> wrote:

> Bryan:
>
> bq. My only thought would be to forego using filters
> Please keep using filters.
>
> I and Sergey are working on HBASE-5416: Improve performance of scans with
> some kind of filters
> This feature allows you to specify one column family as being essential.
> The other column family is only returned to client when essential column
> family matches. I wonder if this may be of help to you.
>
> You mentioned regionserver going down or being busy. I assume it was not
> often that regionserver(s) went down. For busy region server, did you try
> jstack'ing regionserver process ?
>
> Thanks

Re: HBaseClient.call() hang

Posted by Ted Yu <yu...@gmail.com>.
I should have mentioned that the original patches for HBASE-5416 were
contributed by Max Lapan.

On Fri, Dec 14, 2012 at 9:49 PM, Ted Yu <yu...@gmail.com> wrote:

> Bryan:
>
> bq. My only thought would be to forego using filters
> Please keep using filters.
>
> Sergey and I are working on HBASE-5416: Improve performance of scans with
> some kind of filters.
> This feature allows you to specify one column family as essential; the
> other column families are only returned to the client when the essential
> column family matches. I wonder if this may be of help to you.
>
> You mentioned the regionserver going down or being busy. I assume it was not
> often that regionserver(s) went down. For a busy regionserver, did you try
> jstack'ing the regionserver process?
>
> Thanks
>
>
> On Fri, Dec 14, 2012 at 2:59 PM, Bryan Keller <br...@gmail.com> wrote:
>
>> I have encountered a problem with HBaseClient.call() hanging. This occurs
>> when one of my regionservers goes down while performing a table scan.
>>
>> What exacerbates this problem is that the scan I am performing uses
>> filters, and the region size of the table is large (4gb). Because of this,
>> it can take several minutes for a row to be returned when calling
>> scanner.next(). Apparently there is no keep alive message being sent back
>> to the scanner while the region server is busy, so I had to increase the
>> hbase.rpc.timeout value to a large number (60 min), otherwise the next()
>> call will timeout waiting for the regionserver to send something back.
>>
>> The result is that this HBaseClient.call() hang is made much worse,
>> because it won't time out for 60 minutes.
>>
>> I have a couple of questions:
>>
>> 1. Any thoughts on why the HBaseClient.call() is getting stuck? I noticed
>> that call.wait() is not using any timeout so it will wait indefinitely
>> until interrupted externally
>>
>> 2. Is there a solution where I do not need to set hbase.rpc.timeout to a
>> very large number? My only thought would be to forego using filters and do
>> the filtering client side, which seems pretty inefficient
>>
>> Here is a stack dump of the thread that was hung:
>>
>> [stack trace snipped; identical to the trace in the original message]
>>
>>
>

Re: HBaseClient.call() hang

Posted by Ted Yu <yu...@gmail.com>.
Bryan:

bq. My only thought would be to forego using filters
Please keep using filters.

Sergey and I are working on HBASE-5416: Improve performance of scans with
some kind of filters.
This feature allows you to specify one column family as essential; the
other column families are only returned to the client when the essential
column family matches. I wonder if this may be of help to you.
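
For illustration, here is a rough, hypothetical sketch of how a scan might
use the essential-family behaviour once HBASE-5416 lands. The table name,
the family/qualifier names, and the setLoadColumnFamiliesOnDemand() switch
are assumptions for the example, not a confirmed final API:

import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.filter.CompareFilter.CompareOp;
import org.apache.hadoop.hbase.filter.SingleColumnValueFilter;
import org.apache.hadoop.hbase.util.Bytes;

public class EssentialFamilyScan {
  public static void main(String[] args) throws Exception {
    HTable table = new HTable(HBaseConfiguration.create(), "mytable");
    try {
      // The filter only evaluates the small "meta" family.
      SingleColumnValueFilter filter = new SingleColumnValueFilter(
          Bytes.toBytes("meta"), Bytes.toBytes("status"),
          CompareOp.EQUAL, Bytes.toBytes("ACTIVE"));
      filter.setFilterIfMissing(true);

      Scan scan = new Scan();
      scan.setFilter(filter);
      // Assumed HBASE-5416 switch: load the large, non-essential families
      // only for rows where the essential family matched the filter.
      scan.setLoadColumnFamiliesOnDemand(true);

      ResultScanner scanner = table.getScanner(scan);
      try {
        for (Result row : scanner) {
          // process matching rows
        }
      } finally {
        scanner.close();
      }
    } finally {
      table.close();
    }
  }
}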

You mentioned the regionserver going down or being busy. I assume it was not
often that regionserver(s) went down. For a busy regionserver, did you try
jstack'ing the regionserver process?
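
If it helps, something like the following on the regionserver host captures
a thread dump you could attach here (the pid is a placeholder for your
actual HRegionServer process id):

jps | grep HRegionServer                                  # find the pid
jstack -l <regionserver_pid> > /tmp/regionserver-jstack.txt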

Thanks

On Fri, Dec 14, 2012 at 2:59 PM, Bryan Keller <br...@gmail.com> wrote:

> I have encountered a problem with HBaseClient.call() hanging. This occurs
> when one of my regionservers goes down while performing a table scan.
>
> What exacerbates this problem is that the scan I am performing uses
> filters, and the region size of the table is large (4gb). Because of this,
> it can take several minutes for a row to be returned when calling
> scanner.next(). Apparently there is no keep alive message being sent back
> to the scanner while the region server is busy, so I had to increase the
> hbase.rpc.timeout value to a large number (60 min), otherwise the next()
> call will timeout waiting for the regionserver to send something back.
>
> The result is that this HBaseClient.call() hang is made much worse,
> because it won't time out for 60 minutes.
>
> I have a couple of questions:
>
> 1. Any thoughts on why the HBaseClient.call() is getting stuck? I noticed
> that call.wait() is not using any timeout so it will wait indefinitely
> until interrupted externally
>
> 2. Is there a solution where I do not need to set hbase.rpc.timeout to a
> very large number? My only thought would be to forego using filters and do
> the filtering client side, which seems pretty inefficient
>
> Here is a stack dump of the thread that was hung:
>
> [stack trace snipped; identical to the trace in the original message]
>
>