Posted to user@hbase.apache.org by Ted Tuttle <te...@mentacapital.com> on 2014/12/03 16:13:48 UTC

client timeout

Hello-

We are seeing recurring timeouts in communications with one of our RSs.  The error we see in our logs is:

Caused by: java.net.SocketTimeoutException: Call to <rs host>./<rs ip>:<port> failed on socket timeout exception: java.net.SocketTimeoutException: 120000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/<client ip>:<port> remote=<rs host>./<rs ip>:<port>]
        at org.apache.hadoop.hbase.ipc.HBaseClient.wrapException(HBaseClient.java:1043)
        at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:1016)
        at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:87)
        at com.sun.proxy.$Proxy9.multi(Unknown Source)
        at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3$1.call(HConnectionManager.java:1537)
        at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3$1.call(HConnectionManager.java:1535)
        at org.apache.hadoop.hbase.client.ServerCallable.withoutRetries(ServerCallable.java:229)
        at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3.call(HConnectionManager.java:1544)
        at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3.call(HConnectionManager.java:1532)
        at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
        at java.util.concurrent.FutureTask.run(FutureTask.java:166)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:724)

Any ideas on what could be wrong w/ this RS?  The RS is not unusually busy.

Thanks,
Ted


Re: client timeout

Posted by Nicolas Liochon <nk...@gmail.com>.
fwiw "CallerDisconnectedException: Aborting call multi" means that:
- the query was under execution on the server
- the client reached its timeout and disconnected
- the server saw that and stopped the execution of the query.

So it's the consequence of a slow execution, not the cause.
It would be worth checking how much time these queries usually take, plus all
the usual causes of slowness (disk issues? data locality? maybe the multis
are too big? a lot of HFiles to read? cache misses? and so on).
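
If the multis do turn out to be too big, one way to test that theory is to send the keys in smaller batches and time each one. The following is only a minimal sketch, not HBase code: `BatchedFetch`, `BatchFetcher`, `batchSize` and `slowMillis` are hypothetical names, and the fetcher stands in for whatever actually talks to the cluster (for example a wrapper around `HTable.get(List<Get>)` that handles its IOException).

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for the call that actually hits the region server. */
interface BatchFetcher<K, V> {
    List<V> fetch(List<K> batch);
}

final class BatchedFetch {
    /** Splits keys into consecutive chunks of at most batchSize elements. */
    static <K> List<List<K>> partition(List<K> keys, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<K>> batches = new ArrayList<>();
        for (int i = 0; i < keys.size(); i += batchSize) {
            int end = Math.min(i + batchSize, keys.size());
            batches.add(new ArrayList<>(keys.subList(i, end)));
        }
        return batches;
    }

    /**
     * Fetches all keys batch by batch, in order, and reports on stderr
     * any batch that takes at least slowMillis to complete.
     */
    static <K, V> List<V> fetchInBatches(List<K> keys, int batchSize,
                                         long slowMillis,
                                         BatchFetcher<K, V> fetcher) {
        List<V> results = new ArrayList<>();
        for (List<K> batch : partition(keys, batchSize)) {
            long start = System.nanoTime();
            results.addAll(fetcher.fetch(batch));
            long elapsedMs = (System.nanoTime() - start) / 1_000_000L;
            if (elapsedMs >= slowMillis) {
                System.err.println("slow batch: " + batch.size()
                        + " keys took " + elapsedMs + " ms");
            }
        }
        return results;
    }
}
```

Smaller batches keep each individual call well under the client timeout, and the per-batch timing shows whether the slowness is spread evenly or tied to particular keys (and so to particular regions).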




RE: client timeout

Posted by Ted Tuttle <te...@mentacapital.com>.
Sort of :(

The server-side errors below started showing up on node5.  We took node5 down and then they started on node7.  Then we brought back node5  and were still seeing the errors on node7.  So we bounced node7.  The errors went back to node5.  We bounced node5 a second time and that cleared the problem.

We are back in working order now.

And working hard on upgrading to v0.98


Re: client timeout

Posted by lars hofhansl <la...@apache.org>.
Only on that one region server? Weird. Does this persist when you bounce it?


RE: client timeout

Posted by Ted Tuttle <te...@mentacapital.com>.
Still on v0.94.16

We are seeing loads of these:

2014-12-03 12:28:32,696 ERROR org.apache.hadoop.hbase.regionserver.HRegionServer:
org.apache.hadoop.hbase.ipc.CallerDisconnectedException: Aborting call multi(org.apache.hadoop.hbase.client.MultiAction@55428f05), rpc version=1, client version=29, methodsFingerPrint=-540141542 from <ip>:<port>after 131914 ms, since caller disconnected
        at org.apache.hadoop.hbase.ipc.HBaseServer$Call.throwExceptionIfCallerDisconnected(HBaseServer.java:436)
        at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextInternal(HRegion.java:3944)
        at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextRaw(HRegion.java:3854)
        at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.next(HRegion.java:3835)
        at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.next(HRegion.java:3878)
        at org.apache.hadoop.hbase.regionserver.HRegion.get(HRegion.java:4804)
        at org.apache.hadoop.hbase.regionserver.HRegion.get(HRegion.java:4777)
        at org.apache.hadoop.hbase.regionserver.HRegionServer.get(HRegionServer.java:2194)
        at org.apache.hadoop.hbase.regionserver.HRegionServer.multi(HRegionServer.java:3754)
        at sun.reflect.GeneratedMethodAccessor36.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:320)
        at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1426)
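
For comparing the two logs, here is a rough sketch that pulls the elapsed time out of a server-side line like the one above and the timeout out of the client-side SocketTimeoutException message. `CallDuration` and its methods are hypothetical helpers, and the patterns assume only the message wording seen in this thread.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class CallDuration {
    // Matches "...after 131914 ms, since caller disconnected" (server log).
    private static final Pattern SERVER_ELAPSED =
            Pattern.compile("after\\s+(\\d+) ms, since caller disconnected");
    // Matches "120000 millis timeout while waiting..." (client log).
    private static final Pattern CLIENT_TIMEOUT =
            Pattern.compile("(\\d+) millis timeout");

    /** Returns how long the server ran the call, or -1 if not found. */
    static long serverElapsedMillis(String logLine) {
        return firstGroupAsLong(SERVER_ELAPSED, logLine);
    }

    /** Returns the client socket timeout, or -1 if not found. */
    static long clientTimeoutMillis(String logLine) {
        return firstGroupAsLong(CLIENT_TIMEOUT, logLine);
    }

    private static long firstGroupAsLong(Pattern pattern, String text) {
        Matcher m = pattern.matcher(text);
        return m.find() ? Long.parseLong(m.group(1)) : -1L;
    }
}
```

With the lines in this thread, the server-side value is 131914 ms and the client-side timeout is 120000 ms, so the server only aborted the call after the client had already given up.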


Re: client timeout

Posted by lars hofhansl <la...@apache.org>.
Bad disk or network?
Anything in the logs (HBase, HDFS, and System logs)?
HBase 0.94, still?
The easiest way is to just kill the region servers; the others will pick up the regions.

-- Lars