Posted to user@hbase.apache.org by Tom Brown <to...@gmail.com> on 2014/06/18 19:19:59 UTC

Master timeout during list tables

Hello all,

I'm trying to view the master status of a 6 node (0.94.10; hadoop 1.1.2)
cluster but I keep getting a timeout exception.

The rest of the cluster is operating quite normally. From the exception, it
seems like the "list tables" function (required to display the web UI) is
timing out for some reason.

From the shell, I'm able to scan the entire .META. table, so the table
information is conceivably available. I don't understand the rest of the
architecture well enough to know what might be causing this timeout during
"list".

Any suggestions?

java.net.SocketTimeoutException: Call to
hdpmgr001.pse.movenetworks.com/10.100.101.221:60000 failed on socket
timeout exception: java.net.SocketTimeoutException: 60000 millis
timeout while waiting for channel to be ready for read. ch :
java.nio.channels.SocketChannel[connected local=/10.100.101.221:36722
remote=hdpmgr001.pse.movenetworks.com/10.100.101.221:60000]
	at org.apache.hadoop.hbase.ipc.HBaseClient.wrapException(HBaseClient.java:1026)
	at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:999)
	at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:86)
	at $Proxy11.getHTableDescriptors(Unknown Source)
	at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.listTables(HConnectionManager.java:1870)
	at org.apache.hadoop.hbase.client.HBaseAdmin.listTables(HBaseAdmin.java:279)
	at org.apache.hadoop.hbase.tmpl.master.MasterStatusTmplImpl.__jamon_innerUnit__userTables(MasterStatusTmplImpl.java:504)
	at org.apache.hadoop.hbase.tmpl.master.MasterStatusTmplImpl.renderNoFlush(MasterStatusTmplImpl.java:297)
	at org.apache.hadoop.hbase.tmpl.master.MasterStatusTmpl.renderNoFlush(MasterStatusTmpl.java:399)
	at org.apache.hadoop.hbase.tmpl.master.MasterStatusTmpl.render(MasterStatusTmpl.java:389)
	at org.apache.hadoop.hbase.master.MasterStatusServlet.doGet(MasterStatusServlet.java:82)
	at javax.servlet.http.HttpServlet.service(HttpServlet.java:707)
	at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
	at org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:511)
	at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1221)
	at org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter.doFilter(StaticUserWebFilter.java:101)
	at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
	at org.apache.hadoop.http.HttpServer$QuotingInputFilter.doFilter(HttpServer.java:835)
	at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
	at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)
	at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
	at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
	at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
	at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450)
	at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
	at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
	at org.mortbay.jetty.Server.handle(Server.java:326)
	at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
	at org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:928)
	at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:549)
	at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212)
	at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)
	at org.mortbay.io.nio.SelectChannelEndPoint.run(SelectChannelEndPoint.java:410)
	at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582)
Caused by: java.net.SocketTimeoutException: 60000 millis timeout while
waiting for channel to be ready for read. ch :
java.nio.channels.SocketChannel[connected local=/10.100.101.221:36722
remote=hdpmgr001.pse.movenetworks.com/10.100.101.221:60000]
	at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:164)
	at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:155)
	at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:128)
	at java.io.FilterInputStream.read(Unknown Source)
	at org.apache.hadoop.hbase.ipc.HBaseClient$Connection$PingInputStream.read(HBaseClient.java:373)
	at java.io.BufferedInputStream.fill(Unknown Source)
	at java.io.BufferedInputStream.read(Unknown Source)
	at java.io.DataInputStream.readInt(Unknown Source)
	at org.apache.hadoop.hbase.ipc.HBaseClient$Connection.receiveResponse(HBaseClient.java:646)
	at org.apache.hadoop.hbase.ipc.HBaseClient$Connection.run(HBaseClient.java:580)



The master log file is unhelpful. Almost all of it is notices about
skipping load balancing, but at least the exception appears in the
log:


2014-06-18 16:18:50,359 INFO org.apache.hadoop.hbase.master.LoadBalancer:
Skipping load balancing because balanced cluster; servers=6 regions=20
average=3.3333333 mostloaded=4 leastloaded=3
2014-06-18 16:18:50,359 INFO org.apache.hadoop.hbase.master.LoadBalancer:
Skipping load balancing because balanced cluster; servers=6 regions=1
average=0.16666667 mostloaded=1 leastloaded=0
2014-06-18 16:18:50,360 INFO org.apache.hadoop.hbase.master.LoadBalancer:
Skipping load balancing because balanced cluster; servers=6 regions=17
average=2.8333333 mostloaded=3 leastloaded=2
2014-06-18 16:18:50,360 INFO org.apache.hadoop.hbase.master.LoadBalancer:
Skipping load balancing because balanced cluster; servers=6 regions=12
average=2.0 mostloaded=2 leastloaded=2
2014-06-18 16:18:50,360 INFO org.apache.hadoop.hbase.master.LoadBalancer:
Skipping load balancing because balanced cluster; servers=6 regions=68
average=11.333333 mostloaded=12 leastloaded=11
2014-06-18 16:20:36,118 WARN org.mortbay.log: /master-status:
java.net.SocketTimeoutException: Call to
hdpmgr001.pse.movenetworks.com/10.100.101.221:60000 failed on socket
timeout exception: java.net.SocketTimeoutException: 60000 millis
timeout while waiting for channel to be ready for read. ch :
java.nio.channels.SocketChannel[connected local=/10.100.101.221:36674
remote=hdpmgr001.pse.movenetworks.com/10.100.101.221:60000]




--Tom

Re: Master timeout during list tables

Posted by Tom Brown <to...@gmail.com>.
We have determined that this issue is being caused by our HDFS (running on
an Isilon cluster). For some reason, its HDFS daemon seems to get
overwhelmed and just locks up. When we tell it to use more threads to
process HDFS connections, everything unblocks. Aside from the fact that
this behavior is apparently buggy on their part, I would like to know a
little more about how HBase interacts with the HDFS cluster so we can make
sure we're appropriately configuring our stuff.

How many connections will a single regionserver make to HDFS? Is it one
shared connection for the whole RS, one connection for each region served,
multiple connections per region, or some other formula?

--Tom


On Wed, Jun 25, 2014 at 1:48 PM, Ted Yu <yu...@gmail.com> wrote:

> Can you look at the tail of master log to see which WAL takes long time to
> split ?
>
> Checking Namenode log if needed.
>
> Cheers
>
> On Wed, Jun 25, 2014 at 12:09 PM, Tom Brown <to...@gmail.com> wrote:
>
> > Yes, that stack is still there:
> >
> >    java.lang.Thread.State: TIMED_WAITING (on object monitor)
> > at java.lang.Object.wait(Native Method)
> > at
> >
> >
> org.apache.hadoop.hbase.master.SplitLogManager.waitForSplittingCompletion(SplitLogManager.java:372)
> > - locked <0x00000000bfa0a068> (a
> > org.apache.hadoop.hbase.master.SplitLogManager$TaskBatch)
> > at
> >
> >
> org.apache.hadoop.hbase.master.SplitLogManager.splitLogDistributed(SplitLogManager.java:290)
> > at
> >
> >
> org.apache.hadoop.hbase.master.MasterFileSystem.splitLog(MasterFileSystem.java:371)
> > at
> >
> >
> org.apache.hadoop.hbase.master.MasterFileSystem.splitLog(MasterFileSystem.java:337)
> >
> > What does that mean? That HDFS is behaving badly, or something else
> > entirely?
> >
> > --Tom
> >
> >
> >
> > On Wed, Jun 25, 2014 at 11:45 AM, Ted Yu <yu...@gmail.com> wrote:
> >
> > > Looks like master was stuck in FileSystem.listStatus() call.
> > > I noticed the following - did this show up if you take jstack one more
> > time
> > > ?
> > >
> > >    1.         at
> > >
> > >
> >
>  org.apache.hadoop.hbase.master.SplitLogManager.waitForSplittingCompletion(SplitLogManager.java:372)
> > >    2.         - locked <0x00000000bfa0a068> (a
> > >    org.apache.hadoop.hbase.master.SplitLogManager$TaskBatch)
> > >    3.         at
> > >
> > >
> >
>  org.apache.hadoop.hbase.master.SplitLogManager.splitLogDistributed(SplitLogManager.java:290)
> > >    4.         at
> > >
> > >
> >
>  org.apache.hadoop.hbase.master.MasterFileSystem.splitLog(MasterFileSystem.java:371)
> > >    5.         at
> > >
> > >
> >
>  org.apache.hadoop.hbase.master.MasterFileSystem.splitLog(MasterFileSystem.java:337)
> > >    6.         at
> > >
> > >
> >
>  org.apache.hadoop.hbase.master.MasterFileSystem.splitLog(MasterFileSystem.java:282)
> > >    7.         at
> > >
> > >
> >
>  org.apache.hadoop.hbase.master.handler.ServerShutdownHandler.process(ServerShutdownHandler.java:127)
> > >
> > >
> > >
> > > On Wed, Jun 25, 2014 at 10:09 AM, Tom Brown <to...@gmail.com>
> > wrote:
> > >
> > > > Before I was able to acquire a stack trace, I restarted the master.
> > > > However, the issue has just happened again and I was able to get a
> > stack
> > > > trace:
> > > >
> > > > http://pastebin.com/Mz5c6AML
> > > >
> > > > (The pastebin is set to never expire, so anyone viewing an archived
> > > version
> > > > of this message should still be able to see the stack)
> > > >
> > > > The version of hbase is 0.94.10.
> > > >
> > > > Thanks!
> > > >
> > > > --Tom
> > > >
> > > >
> > > > On Wed, Jun 18, 2014 at 8:55 PM, Qiang Tian <ti...@gmail.com>
> wrote:
> > > >
> > > > > Hi Tom,
> > > > > Can you collect your master jvm stacktrace when problem happens and
> > put
> > > > it
> > > > > to pastbin?
> > > > > what is your hbase version?
> > > > >
> > > > >
> > > > > On Thu, Jun 19, 2014 at 1:34 AM, Tom Brown <to...@gmail.com>
> > > wrote:
> > > > >
> > > > > > Could this happen if the master is running too many RPC tasks and
> > > can't
> > > > > > keep up? What about if there's too many connections to the
> server?
> > > > > >
> > > > > > --Tom
> > > > > >
> > > > > >
> > > > > > On Wed, Jun 18, 2014 at 11:33 AM, Tom Brown <
> tombrown52@gmail.com>
> > > > > wrote:
> > > > > >
> > > > > > > That server is the master and is not a regionserver.
> > > > > > >
> > > > > > > --Tom
> > > > > > >
> > > > > > >
> > > > > > > On Wed, Jun 18, 2014 at 11:29 AM, Ted Yu <yu...@gmail.com>
> > > > wrote:
> > > > > > >
> > > > > > >> Have you checked region server log on 10.100.101.221
> > > > > > >> <http://hdpmgr001.pse.movenetworks.com/10.100.101.221:60000>
> ?
> > > > > > >>
> > > > > > >> Cheers
> > > > > > >>
> > > > > > >>
> > > > > > >> On Wed, Jun 18, 2014 at 10:19 AM, Tom Brown <
> > tombrown52@gmail.com
> > > >
> > > > > > wrote:
> > > > > > >>
> > > > > > >> > Hello all,
> > > > > > >> >
> > > > > > >> > I'm trying to view the master status of a 6 node (0.94.10;
> > > hadoop
> > > > > > 1.1.2)
> > > > > > >> > cluster but I keep getting a timeout exception.
> > > > > > >> >
> > > > > > >> > The rest of the cluster is operating quite normally. From
> the
> > > > > > >> exception, it
> > > > > > >> > seems like the "list tables" function (required to display
> the
> > > web
> > > > > UI)
> > > > > > >> is
> > > > > > >> > timing out for some reason.
> > > > > > >> >
> > > > > > >> > From the shell, I'm able to scan the entire .META. table, so
> > the
> > > > > table
> > > > > > >> > information is conceivably available. I don't understand the
> > > rest
> > > > of
> > > > > > the
> > > > > > >> > architecture well enough to know what might be causing this
> > > > timeout
> > > > > > >> during
> > > > > > >> > "list".
> > > > > > >> >
> > > > > > >> > Any suggestions?
> > > > > > >> >
> > > > > > >> > java.net.SocketTimeoutException: Call to
> > > > > > >> > hdpmgr001.pse.movenetworks.com/10.100.101.221:60000 failed
> on
> > > > > socket
> > > > > > >> > timeout exception: java.net.SocketTimeoutException: 60000
> > millis
> > > > > > >> > timeout while waiting for channel to be ready for read. ch :
> > > > > > >> > java.nio.channels.SocketChannel[connected local=/
> > > > > 10.100.101.221:36722
> > > > > > >> > remote=hdpmgr001.pse.movenetworks.com/10.100.101.221:60000]
> > > > > > >> >         at
> > > > > > >> >
> > > > > > >>
> > > > > >
> > > > >
> > > >
> > >
> >
> org.apache.hadoop.hbase.ipc.HBaseClient.wrapException(HBaseClient.java:1026)
> > > > > > >> >         at
> > > > > > >> >
> > > org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:999)
> > > > > > >> >         at
> > > > > > >> >
> > > > > > >>
> > > > > >
> > > > >
> > > >
> > >
> >
> org.apache.hadoop.hbase.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:86)
> > > > > > >> >         at $Proxy11.getHTableDescriptors(Unknown Source)
> > > > > > >> >         at
> > > > > > >> >
> > > > > > >>
> > > > > >
> > > > >
> > > >
> > >
> >
> org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.listTables(HConnectionManager.java:1870)
> > > > > > >> >         at
> > > > > > >> >
> > > > > > >>
> > > > > >
> > > >
> > org.apache.hadoop.hbase.client.HBaseAdmin.listTables(HBaseAdmin.java:279)
> > > > > > >> >         at
> > > > > > >> >
> > > > > > >>
> > > > > >
> > > > >
> > > >
> > >
> >
> org.apache.hadoop.hbase.tmpl.master.MasterStatusTmplImpl.__jamon_innerUnit__userTables(MasterStatusTmplImpl.java:504)
> > > > > > >> >         at
> > > > > > >> >
> > > > > > >>
> > > > > >
> > > > >
> > > >
> > >
> >
> org.apache.hadoop.hbase.tmpl.master.MasterStatusTmplImpl.renderNoFlush(MasterStatusTmplImpl.java:297)
> > > > > > >> >         at
> > > > > > >> >
> > > > > > >>
> > > > > >
> > > > >
> > > >
> > >
> >
> org.apache.hadoop.hbase.tmpl.master.MasterStatusTmpl.renderNoFlush(MasterStatusTmpl.java:399)
> > > > > > >> >         at
> > > > > > >> >
> > > > > > >>
> > > > > >
> > > > >
> > > >
> > >
> >
> org.apache.hadoop.hbase.tmpl.master.MasterStatusTmpl.render(MasterStatusTmpl.java:389)
> > > > > > >> >         at
> > > > > > >> >
> > > > > > >>
> > > > > >
> > > > >
> > > >
> > >
> >
> org.apache.hadoop.hbase.master.MasterStatusServlet.doGet(MasterStatusServlet.java:82)
> > > > > > >> >         at
> > > > > > javax.servlet.http.HttpServlet.service(HttpServlet.java:707)
> > > > > > >> >         at
> > > > > > javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
> > > > > > >> >         at
> > > > > > >> >
> > > > >
> > org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:511)
> > > > > > >> >         at
> > > > > > >> >
> > > > > > >>
> > > > > >
> > > > >
> > > >
> > >
> >
> org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1221)
> > > > > > >> >         at
> > > > > > >> >
> > > > > > >>
> > > > > >
> > > > >
> > > >
> > >
> >
> org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter.doFilter(StaticUserWebFilter.java:101)
> > > > > > >> >         at
> > > > > > >> >
> > > > > > >>
> > > > > >
> > > > >
> > > >
> > >
> >
> org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
> > > > > > >> >         at
> > > > > > >> >
> > > > > > >>
> > > > > >
> > > > >
> > > >
> > >
> >
> org.apache.hadoop.http.HttpServer$QuotingInputFilter.doFilter(HttpServer.java:835)
> > > > > > >> >         at
> > > > > > >> >
> > > > > > >>
> > > > > >
> > > > >
> > > >
> > >
> >
> org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
> > > > > > >> >         at
> > > > > > >> >
> > > > > >
> > > >
> > org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)
> > > > > > >> >         at
> > > > > > >> >
> > > > > > >>
> > > > > >
> > > > >
> > > >
> > >
> >
> org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
> > > > > > >> >         at
> > > > > > >> >
> > > > > >
> > > >
> > org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
> > > > > > >> >         at
> > > > > > >> >
> > > > > >
> > > >
> > org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
> > > > > > >> >         at
> > > > > > >> >
> > > > >
> org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450)
> > > > > > >> >         at
> > > > > > >> >
> > > > > > >>
> > > > > >
> > > > >
> > > >
> > >
> >
> org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
> > > > > > >> >         at
> > > > > > >> >
> > > > > >
> > > >
> > org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
> > > > > > >> >         at org.mortbay.jetty.Server.handle(Server.java:326)
> > > > > > >> >         at
> > > > > > >> >
> > > > > >
> > > org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
> > > > > > >> >         at
> > > > > > >> >
> > > > > > >>
> > > > > >
> > > > >
> > > >
> > >
> >
> org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:928)
> > > > > > >> >         at
> > > > > org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:549)
> > > > > > >> >         at
> > > > > > >>
> org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212)
> > > > > > >> >         at
> > > > > > >>
> org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)
> > > > > > >> >         at
> > > > > > >> >
> > > > > > >>
> > > > > >
> > > > >
> > > >
> > >
> >
> org.mortbay.io.nio.SelectChannelEndPoint.run(SelectChannelEndPoint.java:410)
> > > > > > >> >         at
> > > > > > >> >
> > > > > > >>
> > > > > >
> > > > >
> > > >
> > >
> >
> org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582)
> > > > > > >> > Caused by: java.net.SocketTimeoutException: 60000 millis
> > timeout
> > > > > while
> > > > > > >> > waiting for channel to be ready for read. ch :
> > > > > > >> > java.nio.channels.SocketChannel[connected local=/
> > > > > 10.100.101.221:36722
> > > > > > >> > remote=hdpmgr001.pse.movenetworks.com/10.100.101.221:60000]
> > > > > > >> >         at
> > > > > > >> >
> > > > > > >>
> > > > > >
> > > > >
> > > >
> > >
> >
> org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:164)
> > > > > > >> >         at
> > > > > > >> >
> > > > > >
> > > >
> > org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:155)
> > > > > > >> >         at
> > > > > > >> >
> > > > > >
> > > >
> > org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:128)
> > > > > > >> >         at java.io.FilterInputStream.read(Unknown Source)
> > > > > > >> >         at
> > > > > > >> >
> > > > > > >>
> > > > > >
> > > > >
> > > >
> > >
> >
> org.apache.hadoop.hbase.ipc.HBaseClient$Connection$PingInputStream.read(HBaseClient.java:373)
> > > > > > >> >         at java.io.BufferedInputStream.fill(Unknown Source)
> > > > > > >> >         at java.io.BufferedInputStream.read(Unknown Source)
> > > > > > >> >         at java.io.DataInputStream.readInt(Unknown Source)
> > > > > > >> >         at
> > > > > > >> >
> > > > > > >>
> > > > > >
> > > > >
> > > >
> > >
> >
> org.apache.hadoop.hbase.ipc.HBaseClient$Connection.receiveResponse(HBaseClient.java:646)
> > > > > > >> >         at
> > > > > > >> >
> > > > > > >>
> > > > > >
> > > > >
> > > >
> > >
> >
> org.apache.hadoop.hbase.ipc.HBaseClient$Connection.run(HBaseClient.java:580)
> > > > > > >> >
> > > > > > >> >
> > > > > > >> >
> > > > > > >> > The master log file is unhelpful. Almost all of it is
> notices
> > > > about
> > > > > > >> > skipping load balancing, but at least the exception appears
> in
> > > the
> > > > > > >> > log:
> > > > > > >> >
> > > > > > >> >
> > > > > > >> > 2014-06-18 16:18:50,359 INFO
> > > > > > >> > org.apache.hadoop.hbase.master.LoadBalancer: Skipping load
> > > > balancing
> > > > > > >> > because balanced cluster; servers=6 regions=20
> > average=3.3333333
> > > > mos
> > > > > > >> > tloaded=4 leastloaded=3
> > > > > > >> > 2014-06-18 16:18:50,359 INFO
> > > > > > >> > org.apache.hadoop.hbase.master.LoadBalancer: Skipping load
> > > > balancing
> > > > > > >> > because balanced cluster; servers=6 regions=1
> > average=0.16666667
> > > > mos
> > > > > > >> > tloaded=1 leastloaded=0
> > > > > > >> > 2014-06-18 16:18:50,360 INFO
> > > > > > >> > org.apache.hadoop.hbase.master.LoadBalancer: Skipping load
> > > > balancing
> > > > > > >> > because balanced cluster; servers=6 regions=17
> > average=2.8333333
> > > > mos
> > > > > > >> > tloaded=3 leastloaded=2
> > > > > > >> > 2014-06-18 16:18:50,360 INFO
> > > > > > >> > org.apache.hadoop.hbase.master.LoadBalancer: Skipping load
> > > > balancing
> > > > > > >> > because balanced cluster; servers=6 regions=12 average=2.0
> > > > mostloade
> > > > > > >> > d=2 leastloaded=2
> > > > > > >> > 2014-06-18 16:18:50,360 INFO
> > > > > > >> > org.apache.hadoop.hbase.master.LoadBalancer: Skipping load
> > > > balancing
> > > > > > >> > because balanced cluster; servers=6 regions=68
> > average=11.333333
> > > > mos
> > > > > > >> > tloaded=12 leastloaded=11
> > > > > > >> > 2014-06-18 16:20:36,118 WARN org.mortbay.log:
> /master-status:
> > > > > > >> > java.net.SocketTimeoutException: Call to
> > > > > > >> > hdpmgr001.pse.movenetworks.com/10.100.101.221:60000 failed
> > on s
> > > > > > >> > ocket timeout exception: java.net.SocketTimeoutException:
> > 60000
> > > > > millis
> > > > > > >> > timeout while waiting for channel to be ready for read. ch :
> > > > > > >> > java.nio.channels.SocketChannel[co
> > > > > > >> > nnected local=/10.100.101.221:36674
> > > > > > >> > remote=hdpmgr001.pse.movenetworks.com/10.100.101.221:60000]
> > > > > > >> >
> > > > > > >> >
> > > > > > >> >
> > > > > > >> >
> > > > > > >> > --Tom
> > > > > > >> >
> > > > > > >>
> > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>

Re: Master timeout during list tables

Posted by Ted Yu <yu...@gmail.com>.
Can you look at the tail of the master log to see which WAL is taking a long
time to split?

Check the Namenode log as well if needed.
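
If the log alone doesn't make it obvious, a rough way to see how much WAL data is still
pending split is to list the per-server log directories straight from HDFS. A minimal
sketch, assuming the default 0.94 layout where WALs sit under <hbase.rootdir>/.logs
(directories being split get a -splitting suffix); /hbase below is a placeholder for
your actual rootdir:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class PendingWalReport {
        public static void main(String[] args) throws Exception {
            // Uses the Hadoop config on the classpath (core-site.xml / hdfs-site.xml).
            FileSystem fs = FileSystem.get(new Configuration());
            // Placeholder: substitute the value of hbase.rootdir for your cluster.
            Path logsDir = new Path("/hbase/.logs");
            for (FileStatus serverDir : fs.listStatus(logsDir)) {
                FileStatus[] wals = fs.listStatus(serverDir.getPath());
                long totalBytes = 0;
                for (FileStatus wal : wals) {
                    totalBytes += wal.getLen();
                }
                System.out.println(serverDir.getPath().getName() + ": "
                        + wals.length + " WAL file(s), " + totalBytes + " bytes");
            }
        }
    }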

Cheers

On Wed, Jun 25, 2014 at 12:09 PM, Tom Brown <to...@gmail.com> wrote:

> Yes, that stack is still there:
>
>    java.lang.Thread.State: TIMED_WAITING (on object monitor)
> at java.lang.Object.wait(Native Method)
> at
>
> org.apache.hadoop.hbase.master.SplitLogManager.waitForSplittingCompletion(SplitLogManager.java:372)
> - locked <0x00000000bfa0a068> (a
> org.apache.hadoop.hbase.master.SplitLogManager$TaskBatch)
> at
>
> org.apache.hadoop.hbase.master.SplitLogManager.splitLogDistributed(SplitLogManager.java:290)
> at
>
> org.apache.hadoop.hbase.master.MasterFileSystem.splitLog(MasterFileSystem.java:371)
> at
>
> org.apache.hadoop.hbase.master.MasterFileSystem.splitLog(MasterFileSystem.java:337)
>
> What does that mean? That HDFS is behaving badly, or something else
> entirely?
>
> --Tom
>
>
>
> On Wed, Jun 25, 2014 at 11:45 AM, Ted Yu <yu...@gmail.com> wrote:
>
> > Looks like master was stuck in FileSystem.listStatus() call.
> > I noticed the following - did this show up if you take jstack one more
> time
> > ?
> >
> >    1.         at
> >
> >
>  org.apache.hadoop.hbase.master.SplitLogManager.waitForSplittingCompletion(SplitLogManager.java:372)
> >    2.         - locked <0x00000000bfa0a068> (a
> >    org.apache.hadoop.hbase.master.SplitLogManager$TaskBatch)
> >    3.         at
> >
> >
>  org.apache.hadoop.hbase.master.SplitLogManager.splitLogDistributed(SplitLogManager.java:290)
> >    4.         at
> >
> >
>  org.apache.hadoop.hbase.master.MasterFileSystem.splitLog(MasterFileSystem.java:371)
> >    5.         at
> >
> >
>  org.apache.hadoop.hbase.master.MasterFileSystem.splitLog(MasterFileSystem.java:337)
> >    6.         at
> >
> >
>  org.apache.hadoop.hbase.master.MasterFileSystem.splitLog(MasterFileSystem.java:282)
> >    7.         at
> >
> >
>  org.apache.hadoop.hbase.master.handler.ServerShutdownHandler.process(ServerShutdownHandler.java:127)
> >
> >
> >
> > On Wed, Jun 25, 2014 at 10:09 AM, Tom Brown <to...@gmail.com>
> wrote:
> >
> > > Before I was able to acquire a stack trace, I restarted the master.
> > > However, the issue has just happened again and I was able to get a
> stack
> > > trace:
> > >
> > > http://pastebin.com/Mz5c6AML
> > >
> > > (The pastebin is set to never expire, so anyone viewing an archived
> > version
> > > of this message should still be able to see the stack)
> > >
> > > The version of hbase is 0.94.10.
> > >
> > > Thanks!
> > >
> > > --Tom
> > >
> > >
> > > On Wed, Jun 18, 2014 at 8:55 PM, Qiang Tian <ti...@gmail.com> wrote:
> > >
> > > > Hi Tom,
> > > > Can you collect your master jvm stacktrace when problem happens and
> put
> > > it
> > > > to pastbin?
> > > > what is your hbase version?
> > > >
> > > >
> > > > On Thu, Jun 19, 2014 at 1:34 AM, Tom Brown <to...@gmail.com>
> > wrote:
> > > >
> > > > > Could this happen if the master is running too many RPC tasks and
> > can't
> > > > > keep up? What about if there's too many connections to the server?
> > > > >
> > > > > --Tom
> > > > >
> > > > >
> > > > > On Wed, Jun 18, 2014 at 11:33 AM, Tom Brown <to...@gmail.com>
> > > > wrote:
> > > > >
> > > > > > That server is the master and is not a regionserver.
> > > > > >
> > > > > > --Tom
> > > > > >
> > > > > >
> > > > > > On Wed, Jun 18, 2014 at 11:29 AM, Ted Yu <yu...@gmail.com>
> > > wrote:
> > > > > >
> > > > > >> Have you checked region server log on 10.100.101.221
> > > > > >> <http://hdpmgr001.pse.movenetworks.com/10.100.101.221:60000> ?
> > > > > >>
> > > > > >> Cheers
> > > > > >>
> > > > > >>
> > > > > >> On Wed, Jun 18, 2014 at 10:19 AM, Tom Brown <
> tombrown52@gmail.com
> > >
> > > > > wrote:
> > > > > >>
> > > > > >> > Hello all,
> > > > > >> >
> > > > > >> > I'm trying to view the master status of a 6 node (0.94.10;
> > hadoop
> > > > > 1.1.2)
> > > > > >> > cluster but I keep getting a timeout exception.
> > > > > >> >
> > > > > >> > The rest of the cluster is operating quite normally. From the
> > > > > >> exception, it
> > > > > >> > seems like the "list tables" function (required to display the
> > web
> > > > UI)
> > > > > >> is
> > > > > >> > timing out for some reason.
> > > > > >> >
> > > > > >> > From the shell, I'm able to scan the entire .META. table, so
> the
> > > > table
> > > > > >> > information is conceivably available. I don't understand the
> > rest
> > > of
> > > > > the
> > > > > >> > architecture well enough to know what might be causing this
> > > timeout
> > > > > >> during
> > > > > >> > "list".
> > > > > >> >
> > > > > >> > Any suggestions?
> > > > > >> >
> > > > > >> > java.net.SocketTimeoutException: Call to
> > > > > >> > hdpmgr001.pse.movenetworks.com/10.100.101.221:60000 failed on
> > > > socket
> > > > > >> > timeout exception: java.net.SocketTimeoutException: 60000
> millis
> > > > > >> > timeout while waiting for channel to be ready for read. ch :
> > > > > >> > java.nio.channels.SocketChannel[connected local=/
> > > > 10.100.101.221:36722
> > > > > >> > remote=hdpmgr001.pse.movenetworks.com/10.100.101.221:60000]
> > > > > >> >         at
> > > > > >> >
> > > > > >>
> > > > >
> > > >
> > >
> >
> org.apache.hadoop.hbase.ipc.HBaseClient.wrapException(HBaseClient.java:1026)
> > > > > >> >         at
> > > > > >> >
> > org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:999)
> > > > > >> >         at
> > > > > >> >
> > > > > >>
> > > > >
> > > >
> > >
> >
> org.apache.hadoop.hbase.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:86)
> > > > > >> >         at $Proxy11.getHTableDescriptors(Unknown Source)
> > > > > >> >         at
> > > > > >> >
> > > > > >>
> > > > >
> > > >
> > >
> >
> org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.listTables(HConnectionManager.java:1870)
> > > > > >> >         at
> > > > > >> >
> > > > > >>
> > > > >
> > >
> org.apache.hadoop.hbase.client.HBaseAdmin.listTables(HBaseAdmin.java:279)
> > > > > >> >         at
> > > > > >> >
> > > > > >>
> > > > >
> > > >
> > >
> >
> org.apache.hadoop.hbase.tmpl.master.MasterStatusTmplImpl.__jamon_innerUnit__userTables(MasterStatusTmplImpl.java:504)
> > > > > >> >         at
> > > > > >> >
> > > > > >>
> > > > >
> > > >
> > >
> >
> org.apache.hadoop.hbase.tmpl.master.MasterStatusTmplImpl.renderNoFlush(MasterStatusTmplImpl.java:297)
> > > > > >> >         at
> > > > > >> >
> > > > > >>
> > > > >
> > > >
> > >
> >
> org.apache.hadoop.hbase.tmpl.master.MasterStatusTmpl.renderNoFlush(MasterStatusTmpl.java:399)
> > > > > >> >         at
> > > > > >> >
> > > > > >>
> > > > >
> > > >
> > >
> >
> org.apache.hadoop.hbase.tmpl.master.MasterStatusTmpl.render(MasterStatusTmpl.java:389)
> > > > > >> >         at
> > > > > >> >
> > > > > >>
> > > > >
> > > >
> > >
> >
> org.apache.hadoop.hbase.master.MasterStatusServlet.doGet(MasterStatusServlet.java:82)
> > > > > >> >         at
> > > > > javax.servlet.http.HttpServlet.service(HttpServlet.java:707)
> > > > > >> >         at
> > > > > javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
> > > > > >> >         at
> > > > > >> >
> > > >
> org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:511)
> > > > > >> >         at
> > > > > >> >
> > > > > >>
> > > > >
> > > >
> > >
> >
> org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1221)
> > > > > >> >         at
> > > > > >> >
> > > > > >>
> > > > >
> > > >
> > >
> >
> org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter.doFilter(StaticUserWebFilter.java:101)
> > > > > >> >         at
> > > > > >> >
> > > > > >>
> > > > >
> > > >
> > >
> >
> org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
> > > > > >> >         at
> > > > > >> >
> > > > > >>
> > > > >
> > > >
> > >
> >
> org.apache.hadoop.http.HttpServer$QuotingInputFilter.doFilter(HttpServer.java:835)
> > > > > >> >         at
> > > > > >> >
> > > > > >>
> > > > >
> > > >
> > >
> >
> org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
> > > > > >> >         at
> > > > > >> >
> > > > >
> > >
> org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)
> > > > > >> >         at
> > > > > >> >
> > > > > >>
> > > > >
> > > >
> > >
> >
> org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
> > > > > >> >         at
> > > > > >> >
> > > > >
> > >
> org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
> > > > > >> >         at
> > > > > >> >
> > > > >
> > >
> org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
> > > > > >> >         at
> > > > > >> >
> > > > org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450)
> > > > > >> >         at
> > > > > >> >
> > > > > >>
> > > > >
> > > >
> > >
> >
> org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
> > > > > >> >         at
> > > > > >> >
> > > > >
> > >
> org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
> > > > > >> >         at org.mortbay.jetty.Server.handle(Server.java:326)
> > > > > >> >         at
> > > > > >> >
> > > > >
> > org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
> > > > > >> >         at
> > > > > >> >
> > > > > >>
> > > > >
> > > >
> > >
> >
> org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:928)
> > > > > >> >         at
> > > > org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:549)
> > > > > >> >         at
> > > > > >> org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212)
> > > > > >> >         at
> > > > > >> org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)
> > > > > >> >         at
> > > > > >> >
> > > > > >>
> > > > >
> > > >
> > >
> >
> org.mortbay.io.nio.SelectChannelEndPoint.run(SelectChannelEndPoint.java:410)
> > > > > >> >         at
> > > > > >> >
> > > > > >>
> > > > >
> > > >
> > >
> >
> org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582)
> > > > > >> > Caused by: java.net.SocketTimeoutException: 60000 millis
> timeout
> > > > while
> > > > > >> > waiting for channel to be ready for read. ch :
> > > > > >> > java.nio.channels.SocketChannel[connected local=/
> > > > 10.100.101.221:36722
> > > > > >> > remote=hdpmgr001.pse.movenetworks.com/10.100.101.221:60000]
> > > > > >> >         at
> > > > > >> >
> > > > > >>
> > > > >
> > > >
> > >
> >
> org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:164)
> > > > > >> >         at
> > > > > >> >
> > > > >
> > >
> org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:155)
> > > > > >> >         at
> > > > > >> >
> > > > >
> > >
> org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:128)
> > > > > >> >         at java.io.FilterInputStream.read(Unknown Source)
> > > > > >> >         at
> > > > > >> >
> > > > > >>
> > > > >
> > > >
> > >
> >
> org.apache.hadoop.hbase.ipc.HBaseClient$Connection$PingInputStream.read(HBaseClient.java:373)
> > > > > >> >         at java.io.BufferedInputStream.fill(Unknown Source)
> > > > > >> >         at java.io.BufferedInputStream.read(Unknown Source)
> > > > > >> >         at java.io.DataInputStream.readInt(Unknown Source)
> > > > > >> >         at
> > > > > >> >
> > > > > >>
> > > > >
> > > >
> > >
> >
> org.apache.hadoop.hbase.ipc.HBaseClient$Connection.receiveResponse(HBaseClient.java:646)
> > > > > >> >         at
> > > > > >> >
> > > > > >>
> > > > >
> > > >
> > >
> >
> org.apache.hadoop.hbase.ipc.HBaseClient$Connection.run(HBaseClient.java:580)
> > > > > >> >
> > > > > >> >
> > > > > >> >
> > > > > >> > The master log file is unhelpful. Almost all of it is notices
> > > about
> > > > > >> > skipping load balancing, but at least the exception appears in
> > the
> > > > > >> > log:
> > > > > >> >
> > > > > >> >
> > > > > >> > 2014-06-18 16:18:50,359 INFO
> > > > > >> > org.apache.hadoop.hbase.master.LoadBalancer: Skipping load
> > > balancing
> > > > > >> > because balanced cluster; servers=6 regions=20
> average=3.3333333
> > > mos
> > > > > >> > tloaded=4 leastloaded=3
> > > > > >> > 2014-06-18 16:18:50,359 INFO
> > > > > >> > org.apache.hadoop.hbase.master.LoadBalancer: Skipping load
> > > balancing
> > > > > >> > because balanced cluster; servers=6 regions=1
> average=0.16666667
> > > mos
> > > > > >> > tloaded=1 leastloaded=0
> > > > > >> > 2014-06-18 16:18:50,360 INFO
> > > > > >> > org.apache.hadoop.hbase.master.LoadBalancer: Skipping load
> > > balancing
> > > > > >> > because balanced cluster; servers=6 regions=17
> average=2.8333333
> > > mos
> > > > > >> > tloaded=3 leastloaded=2
> > > > > >> > 2014-06-18 16:18:50,360 INFO
> > > > > >> > org.apache.hadoop.hbase.master.LoadBalancer: Skipping load
> > > balancing
> > > > > >> > because balanced cluster; servers=6 regions=12 average=2.0
> > > mostloade
> > > > > >> > d=2 leastloaded=2
> > > > > >> > 2014-06-18 16:18:50,360 INFO
> > > > > >> > org.apache.hadoop.hbase.master.LoadBalancer: Skipping load
> > > balancing
> > > > > >> > because balanced cluster; servers=6 regions=68
> average=11.333333
> > > mos
> > > > > >> > tloaded=12 leastloaded=11
> > > > > >> > 2014-06-18 16:20:36,118 WARN org.mortbay.log: /master-status:
> > > > > >> > java.net.SocketTimeoutException: Call to
> > > > > >> > hdpmgr001.pse.movenetworks.com/10.100.101.221:60000 failed
> on s
> > > > > >> > ocket timeout exception: java.net.SocketTimeoutException:
> 60000
> > > > millis
> > > > > >> > timeout while waiting for channel to be ready for read. ch :
> > > > > >> > java.nio.channels.SocketChannel[co
> > > > > >> > nnected local=/10.100.101.221:36674
> > > > > >> > remote=hdpmgr001.pse.movenetworks.com/10.100.101.221:60000]
> > > > > >> >
> > > > > >> >
> > > > > >> >
> > > > > >> >
> > > > > >> > --Tom
> > > > > >> >
> > > > > >>
> > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>

Re: Master timeout during list tables

Posted by Tom Brown <to...@gmail.com>.
Yes, that stack is still there:

   java.lang.Thread.State: TIMED_WAITING (on object monitor)
        at java.lang.Object.wait(Native Method)
        at org.apache.hadoop.hbase.master.SplitLogManager.waitForSplittingCompletion(SplitLogManager.java:372)
        - locked <0x00000000bfa0a068> (a org.apache.hadoop.hbase.master.SplitLogManager$TaskBatch)
        at org.apache.hadoop.hbase.master.SplitLogManager.splitLogDistributed(SplitLogManager.java:290)
        at org.apache.hadoop.hbase.master.MasterFileSystem.splitLog(MasterFileSystem.java:371)
        at org.apache.hadoop.hbase.master.MasterFileSystem.splitLog(MasterFileSystem.java:337)

What does that mean? That HDFS is behaving badly, or something else
entirely?

--Tom



On Wed, Jun 25, 2014 at 11:45 AM, Ted Yu <yu...@gmail.com> wrote:

> Looks like master was stuck in FileSystem.listStatus() call.
> I noticed the following - did this show up if you take jstack one more time
> ?
>
>    1.         at
>
>  org.apache.hadoop.hbase.master.SplitLogManager.waitForSplittingCompletion(SplitLogManager.java:372)
>    2.         - locked <0x00000000bfa0a068> (a
>    org.apache.hadoop.hbase.master.SplitLogManager$TaskBatch)
>    3.         at
>
>  org.apache.hadoop.hbase.master.SplitLogManager.splitLogDistributed(SplitLogManager.java:290)
>    4.         at
>
>  org.apache.hadoop.hbase.master.MasterFileSystem.splitLog(MasterFileSystem.java:371)
>    5.         at
>
>  org.apache.hadoop.hbase.master.MasterFileSystem.splitLog(MasterFileSystem.java:337)
>    6.         at
>
>  org.apache.hadoop.hbase.master.MasterFileSystem.splitLog(MasterFileSystem.java:282)
>    7.         at
>
>  org.apache.hadoop.hbase.master.handler.ServerShutdownHandler.process(ServerShutdownHandler.java:127)
>
>
>
> On Wed, Jun 25, 2014 at 10:09 AM, Tom Brown <to...@gmail.com> wrote:
>
> > Before I was able to acquire a stack trace, I restarted the master.
> > However, the issue has just happened again and I was able to get a stack
> > trace:
> >
> > http://pastebin.com/Mz5c6AML
> >
> > (The pastebin is set to never expire, so anyone viewing an archived
> version
> > of this message should still be able to see the stack)
> >
> > The version of hbase is 0.94.10.
> >
> > Thanks!
> >
> > --Tom
> >
> >
> > On Wed, Jun 18, 2014 at 8:55 PM, Qiang Tian <ti...@gmail.com> wrote:
> >
> > > Hi Tom,
> > > Can you collect your master jvm stacktrace when problem happens and put
> > it
> > > to pastbin?
> > > what is your hbase version?
> > >
> > >
> > > On Thu, Jun 19, 2014 at 1:34 AM, Tom Brown <to...@gmail.com>
> wrote:
> > >
> > > > Could this happen if the master is running too many RPC tasks and
> can't
> > > > keep up? What about if there's too many connections to the server?
> > > >
> > > > --Tom
> > > >
> > > >
> > > > On Wed, Jun 18, 2014 at 11:33 AM, Tom Brown <to...@gmail.com>
> > > wrote:
> > > >
> > > > > That server is the master and is not a regionserver.
> > > > >
> > > > > --Tom
> > > > >
> > > > >
> > > > > On Wed, Jun 18, 2014 at 11:29 AM, Ted Yu <yu...@gmail.com>
> > wrote:
> > > > >
> > > > >> Have you checked region server log on 10.100.101.221
> > > > >> <http://hdpmgr001.pse.movenetworks.com/10.100.101.221:60000> ?
> > > > >>
> > > > >> Cheers
> > > > >>
> > > > >>
> > > > >> On Wed, Jun 18, 2014 at 10:19 AM, Tom Brown <tombrown52@gmail.com
> >
> > > > wrote:
> > > > >>
> > > > >> > Hello all,
> > > > >> >
> > > > >> > I'm trying to view the master status of a 6 node (0.94.10;
> hadoop
> > > > 1.1.2)
> > > > >> > cluster but I keep getting a timeout exception.
> > > > >> >
> > > > >> > The rest of the cluster is operating quite normally. From the
> > > > >> exception, it
> > > > >> > seems like the "list tables" function (required to display the
> web
> > > UI)
> > > > >> is
> > > > >> > timing out for some reason.
> > > > >> >
> > > > >> > From the shell, I'm able to scan the entire .META. table, so the
> > > table
> > > > >> > information is conceivably available. I don't understand the
> rest
> > of
> > > > the
> > > > >> > architecture well enough to know what might be causing this
> > timeout
> > > > >> during
> > > > >> > "list".
> > > > >> >
> > > > >> > Any suggestions?
> > > > >> >
> > > > >> > java.net.SocketTimeoutException: Call to
> > > > >> > hdpmgr001.pse.movenetworks.com/10.100.101.221:60000 failed on
> > > socket
> > > > >> > timeout exception: java.net.SocketTimeoutException: 60000 millis
> > > > >> > timeout while waiting for channel to be ready for read. ch :
> > > > >> > java.nio.channels.SocketChannel[connected local=/
> > > 10.100.101.221:36722
> > > > >> > remote=hdpmgr001.pse.movenetworks.com/10.100.101.221:60000]
> > > > >> >         at
> > > > >> >
> > > > >>
> > > >
> > >
> >
> org.apache.hadoop.hbase.ipc.HBaseClient.wrapException(HBaseClient.java:1026)
> > > > >> >         at
> > > > >> >
> org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:999)
> > > > >> >         at
> > > > >> >
> > > > >>
> > > >
> > >
> >
> org.apache.hadoop.hbase.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:86)
> > > > >> >         at $Proxy11.getHTableDescriptors(Unknown Source)
> > > > >> >         at
> > > > >> >
> > > > >>
> > > >
> > >
> >
> org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.listTables(HConnectionManager.java:1870)
> > > > >> >         at
> > > > >> >
> > > > >>
> > > >
> > org.apache.hadoop.hbase.client.HBaseAdmin.listTables(HBaseAdmin.java:279)
> > > > >> >         at
> > > > >> >
> > > > >>
> > > >
> > >
> >
> org.apache.hadoop.hbase.tmpl.master.MasterStatusTmplImpl.__jamon_innerUnit__userTables(MasterStatusTmplImpl.java:504)
> > > > >> >         at
> > > > >> >
> > > > >>
> > > >
> > >
> >
> org.apache.hadoop.hbase.tmpl.master.MasterStatusTmplImpl.renderNoFlush(MasterStatusTmplImpl.java:297)
> > > > >> >         at
> > > > >> >
> > > > >>
> > > >
> > >
> >
> org.apache.hadoop.hbase.tmpl.master.MasterStatusTmpl.renderNoFlush(MasterStatusTmpl.java:399)
> > > > >> >         at
> > > > >> >
> > > > >>
> > > >
> > >
> >
> org.apache.hadoop.hbase.tmpl.master.MasterStatusTmpl.render(MasterStatusTmpl.java:389)
> > > > >> >         at
> > > > >> >
> > > > >>
> > > >
> > >
> >
> org.apache.hadoop.hbase.master.MasterStatusServlet.doGet(MasterStatusServlet.java:82)
> > > > >> >         at
> > > > javax.servlet.http.HttpServlet.service(HttpServlet.java:707)
> > > > >> >         at
> > > > javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
> > > > >> >         at
> > > > >> >
> > > org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:511)
> > > > >> >         at
> > > > >> >
> > > > >>
> > > >
> > >
> >
> org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1221)
> > > > >> >         at
> > > > >> >
> > > > >>
> > > >
> > >
> >
> org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter.doFilter(StaticUserWebFilter.java:101)
> > > > >> >         at
> > > > >> >
> > > > >>
> > > >
> > >
> >
> org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
> > > > >> >         at
> > > > >> >
> > > > >>
> > > >
> > >
> >
> org.apache.hadoop.http.HttpServer$QuotingInputFilter.doFilter(HttpServer.java:835)
> > > > >> >         at
> > > > >> >
> > > > >>
> > > >
> > >
> >
> org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
> > > > >> >         at
> > > > >> >
> > > >
> > org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)
> > > > >> >         at
> > > > >> >
> > > > >>
> > > >
> > >
> >
> org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
> > > > >> >         at
> > > > >> >
> > > >
> > org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
> > > > >> >         at
> > > > >> >
> > > >
> > org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
> > > > >> >         at
> > > > >> >
> > > org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450)
> > > > >> >         at
> > > > >> >
> > > > >>
> > > >
> > >
> >
> org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
> > > > >> >         at
> > > > >> >
> > > >
> > org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
> > > > >> >         at org.mortbay.jetty.Server.handle(Server.java:326)
> > > > >> >         at
> > > > >> >
> > > >
> org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
> > > > >> >         at
> > > > >> >
> > > > >>
> > > >
> > >
> >
> org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:928)
> > > > >> >         at
> > > org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:549)
> > > > >> >         at
> > > > >> org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212)
> > > > >> >         at
> > > > >> org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)
> > > > >> >         at
> > > > >> >
> > > > >>
> > > >
> > >
> >
> org.mortbay.io.nio.SelectChannelEndPoint.run(SelectChannelEndPoint.java:410)
> > > > >> >         at
> > > > >> >
> > > > >>
> > > >
> > >
> >
> org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582)
> > > > >> > Caused by: java.net.SocketTimeoutException: 60000 millis timeout
> > > while
> > > > >> > waiting for channel to be ready for read. ch :
> > > > >> > java.nio.channels.SocketChannel[connected local=/
> > > 10.100.101.221:36722
> > > > >> > remote=hdpmgr001.pse.movenetworks.com/10.100.101.221:60000]
> > > > >> >         at
> > > > >> >
> > > > >>
> > > >
> > >
> >
> org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:164)
> > > > >> >         at
> > > > >> >
> > > >
> > org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:155)
> > > > >> >         at
> > > > >> >
> > > >
> > org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:128)
> > > > >> >         at java.io.FilterInputStream.read(Unknown Source)
> > > > >> >         at
> > > > >> >
> > > > >>
> > > >
> > >
> >
> org.apache.hadoop.hbase.ipc.HBaseClient$Connection$PingInputStream.read(HBaseClient.java:373)
> > > > >> >         at java.io.BufferedInputStream.fill(Unknown Source)
> > > > >> >         at java.io.BufferedInputStream.read(Unknown Source)
> > > > >> >         at java.io.DataInputStream.readInt(Unknown Source)
> > > > >> >         at
> > > > >> >
> > > > >>
> > > >
> > >
> >
> org.apache.hadoop.hbase.ipc.HBaseClient$Connection.receiveResponse(HBaseClient.java:646)
> > > > >> >         at
> > > > >> >
> > > > >>
> > > >
> > >
> >
> org.apache.hadoop.hbase.ipc.HBaseClient$Connection.run(HBaseClient.java:580)
> > > > >> >
> > > > >> >
> > > > >> >
> > > > >> > The master log file is unhelpful. Almost all of it is notices
> > about
> > > > >> > skipping load balancing, but at least the exception appears in
> the
> > > > >> > log:
> > > > >> >
> > > > >> >
> > > > >> > 2014-06-18 16:18:50,359 INFO
> > > > >> > org.apache.hadoop.hbase.master.LoadBalancer: Skipping load
> > balancing
> > > > >> > because balanced cluster; servers=6 regions=20 average=3.3333333
> > mos
> > > > >> > tloaded=4 leastloaded=3
> > > > >> > 2014-06-18 16:18:50,359 INFO
> > > > >> > org.apache.hadoop.hbase.master.LoadBalancer: Skipping load
> > balancing
> > > > >> > because balanced cluster; servers=6 regions=1 average=0.16666667
> > mos
> > > > >> > tloaded=1 leastloaded=0
> > > > >> > 2014-06-18 16:18:50,360 INFO
> > > > >> > org.apache.hadoop.hbase.master.LoadBalancer: Skipping load
> > balancing
> > > > >> > because balanced cluster; servers=6 regions=17 average=2.8333333
> > mos
> > > > >> > tloaded=3 leastloaded=2
> > > > >> > 2014-06-18 16:18:50,360 INFO
> > > > >> > org.apache.hadoop.hbase.master.LoadBalancer: Skipping load
> > balancing
> > > > >> > because balanced cluster; servers=6 regions=12 average=2.0
> > mostloade
> > > > >> > d=2 leastloaded=2
> > > > >> > 2014-06-18 16:18:50,360 INFO
> > > > >> > org.apache.hadoop.hbase.master.LoadBalancer: Skipping load
> > balancing
> > > > >> > because balanced cluster; servers=6 regions=68 average=11.333333
> > mos
> > > > >> > tloaded=12 leastloaded=11
> > > > >> > 2014-06-18 16:20:36,118 WARN org.mortbay.log: /master-status:
> > > > >> > java.net.SocketTimeoutException: Call to
> > > > >> > hdpmgr001.pse.movenetworks.com/10.100.101.221:60000 failed on s
> > > > >> > ocket timeout exception: java.net.SocketTimeoutException: 60000
> > > millis
> > > > >> > timeout while waiting for channel to be ready for read. ch :
> > > > >> > java.nio.channels.SocketChannel[co
> > > > >> > nnected local=/10.100.101.221:36674
> > > > >> > remote=hdpmgr001.pse.movenetworks.com/10.100.101.221:60000]
> > > > >> >
> > > > >> >
> > > > >> >
> > > > >> >
> > > > >> > --Tom
> > > > >> >
> > > > >>
> > > > >
> > > > >
> > > >
> > >
> >
>

Re: Master timeout during list tables

Posted by Ted Yu <yu...@gmail.com>.
Looks like the master was stuck in a FileSystem.listStatus() call.
I noticed the following - does it still show up if you take a jstack one more
time?

        at org.apache.hadoop.hbase.master.SplitLogManager.waitForSplittingCompletion(SplitLogManager.java:372)
        - locked <0x00000000bfa0a068> (a org.apache.hadoop.hbase.master.SplitLogManager$TaskBatch)
        at org.apache.hadoop.hbase.master.SplitLogManager.splitLogDistributed(SplitLogManager.java:290)
        at org.apache.hadoop.hbase.master.MasterFileSystem.splitLog(MasterFileSystem.java:371)
        at org.apache.hadoop.hbase.master.MasterFileSystem.splitLog(MasterFileSystem.java:337)
        at org.apache.hadoop.hbase.master.MasterFileSystem.splitLog(MasterFileSystem.java:282)
        at org.apache.hadoop.hbase.master.handler.ServerShutdownHandler.process(ServerShutdownHandler.java:127)
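
To rule HDFS itself in or out, you can also time a bare listStatus() against the same
directory from a separate JVM. A minimal sketch; /hbase is a placeholder for whatever
path the master is scanning:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class ListStatusTimer {
        public static void main(String[] args) throws Exception {
            // Placeholder path; pass the directory you care about as the first argument.
            Path dir = new Path(args.length > 0 ? args[0] : "/hbase");
            FileSystem fs = FileSystem.get(new Configuration());
            long start = System.currentTimeMillis();
            FileStatus[] entries = fs.listStatus(dir);
            long elapsedMs = System.currentTimeMillis() - start;
            System.out.println(entries.length + " entries under " + dir + " in " + elapsedMs + " ms");
        }
    }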



On Wed, Jun 25, 2014 at 10:09 AM, Tom Brown <to...@gmail.com> wrote:

> Before I was able to acquire a stack trace, I restarted the master.
> However, the issue has just happened again and I was able to get a stack
> trace:
>
> http://pastebin.com/Mz5c6AML
>
> (The pastebin is set to never expire, so anyone viewing an archived version
> of this message should still be able to see the stack)
>
> The version of hbase is 0.94.10.
>
> Thanks!
>
> --Tom
>
>
> On Wed, Jun 18, 2014 at 8:55 PM, Qiang Tian <ti...@gmail.com> wrote:
>
> > Hi Tom,
> > Can you collect your master jvm stacktrace when problem happens and put
> it
> > to pastbin?
> > what is your hbase version?
> >
> >
> > On Thu, Jun 19, 2014 at 1:34 AM, Tom Brown <to...@gmail.com> wrote:
> >
> > > Could this happen if the master is running too many RPC tasks and can't
> > > keep up? What about if there's too many connections to the server?
> > >
> > > --Tom
> > >
> > >
> > > On Wed, Jun 18, 2014 at 11:33 AM, Tom Brown <to...@gmail.com>
> > wrote:
> > >
> > > > That server is the master and is not a regionserver.
> > > >
> > > > --Tom
> > > >
> > > >
> > > > On Wed, Jun 18, 2014 at 11:29 AM, Ted Yu <yu...@gmail.com>
> wrote:
> > > >
> > > >> Have you checked region server log on 10.100.101.221
> > > >> <http://hdpmgr001.pse.movenetworks.com/10.100.101.221:60000> ?
> > > >>
> > > >> Cheers
> > > >>
> > > >>
> > > >> On Wed, Jun 18, 2014 at 10:19 AM, Tom Brown <to...@gmail.com>
> > > wrote:
> > > >>
> > > >> > Hello all,
> > > >> >
> > > >> > I'm trying to view the master status of a 6 node (0.94.10; hadoop
> > > 1.1.2)
> > > >> > cluster but I keep getting a timeout exception.
> > > >> >
> > > >> > The rest of the cluster is operating quite normally. From the
> > > >> exception, it
> > > >> > seems like the "list tables" function (required to display the web
> > UI)
> > > >> is
> > > >> > timing out for some reason.
> > > >> >
> > > >> > From the shell, I'm able to scan the entire .META. table, so the
> > table
> > > >> > information is conceivably available. I don't understand the rest
> of
> > > the
> > > >> > architecture well enough to know what might be causing this
> timeout
> > > >> during
> > > >> > "list".
> > > >> >
> > > >> > Any suggestions?
> > > >> >
> > > >> > java.net.SocketTimeoutException: Call to
> > > >> > hdpmgr001.pse.movenetworks.com/10.100.101.221:60000 failed on
> > socket
> > > >> > timeout exception: java.net.SocketTimeoutException: 60000 millis
> > > >> > timeout while waiting for channel to be ready for read. ch :
> > > >> > java.nio.channels.SocketChannel[connected local=/
> > 10.100.101.221:36722
> > > >> > remote=hdpmgr001.pse.movenetworks.com/10.100.101.221:60000]
> > > >> >         at
> > > >> >
> > > >>
> > >
> >
> org.apache.hadoop.hbase.ipc.HBaseClient.wrapException(HBaseClient.java:1026)
> > > >> >         at
> > > >> > org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:999)
> > > >> >         at
> > > >> >
> > > >>
> > >
> >
> org.apache.hadoop.hbase.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:86)
> > > >> >         at $Proxy11.getHTableDescriptors(Unknown Source)
> > > >> >         at
> > > >> >
> > > >>
> > >
> >
> [remainder of quoted stack trace and master log excerpt snipped]

Re: Master timeout during list tables

Posted by Tom Brown <to...@gmail.com>.
Before I was able to acquire a stack trace, I restarted the master.
However, the issue has just happened again and I was able to get a stack
trace:

http://pastebin.com/Mz5c6AML

(The pastebin is set to never expire, so anyone viewing an archived version
of this message should still be able to see the stack trace.)

The HBase version is 0.94.10.

Thanks!

--Tom


On Wed, Jun 18, 2014 at 8:55 PM, Qiang Tian <ti...@gmail.com> wrote:

> Hi Tom,
> Can you collect your master jvm stacktrace when problem happens and put it
> to pastbin?
> what is your hbase version?
>
>
> On Thu, Jun 19, 2014 at 1:34 AM, Tom Brown <to...@gmail.com> wrote:
>
> > Could this happen if the master is running too many RPC tasks and can't
> > keep up? What about if there's too many connections to the server?
> >
> > --Tom
> >
> >
> > On Wed, Jun 18, 2014 at 11:33 AM, Tom Brown <to...@gmail.com>
> wrote:
> >
> > > That server is the master and is not a regionserver.
> > >
> > > --Tom
> > >
> > >
> > > On Wed, Jun 18, 2014 at 11:29 AM, Ted Yu <yu...@gmail.com> wrote:
> > >
> > >> Have you checked region server log on 10.100.101.221
> > >> <http://hdpmgr001.pse.movenetworks.com/10.100.101.221:60000> ?
> > >>
> > >> Cheers
> > >>
> > >>
> > >> On Wed, Jun 18, 2014 at 10:19 AM, Tom Brown <to...@gmail.com>
> > wrote:
> > >>
> > >> > Hello all,
> > >> >
> > >> > I'm trying to view the master status of a 6 node (0.94.10; hadoop
> > 1.1.2)
> > >> > cluster but I keep getting a timeout exception.
> > >> >
> > >> > The rest of the cluster is operating quite normally. From the
> > >> exception, it
> > >> > seems like the "list tables" function (required to display the web
> UI)
> > >> is
> > >> > timing out for some reason.
> > >> >
> > >> > From the shell, I'm able to scan the entire .META. table, so the
> table
> > >> > information is conceivably available. I don't understand the rest of
> > the
> > >> > architecture well enough to know what might be causing this timeout
> > >> during
> > >> > "list".
> > >> >
> > >> > Any suggestions?
> > >> >
> > >> > [quoted stack trace and master log excerpt snipped]
> > >> >
> > >> >
> > >> >
> > >> >
> > >> > --Tom
> > >> >
> > >>
> > >
> > >
> >
>

Re: Master timeout during list tables

Posted by Qiang Tian <ti...@gmail.com>.
Hi Tom,
Can you collect your master JVM stack trace when the problem happens and put
it on pastebin?
What is your HBase version?
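
(A master thread dump can usually be captured with jstack <master-pid>, or by
sending the master process a QUIT signal with kill -QUIT <master-pid>, which
makes the JVM write the dump into the master's .out file under the HBase log
directory.)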


On Thu, Jun 19, 2014 at 1:34 AM, Tom Brown <to...@gmail.com> wrote:

> Could this happen if the master is running too many RPC tasks and can't
> keep up? What about if there's too many connections to the server?
>
> --Tom
>
>
> On Wed, Jun 18, 2014 at 11:33 AM, Tom Brown <to...@gmail.com> wrote:
>
> > That server is the master and is not a regionserver.
> >
> > --Tom
> >
> >
> > On Wed, Jun 18, 2014 at 11:29 AM, Ted Yu <yu...@gmail.com> wrote:
> >
> >> Have you checked region server log on 10.100.101.221
> >> <http://hdpmgr001.pse.movenetworks.com/10.100.101.221:60000> ?
> >>
> >> Cheers
> >>
> >>
> >> On Wed, Jun 18, 2014 at 10:19 AM, Tom Brown <to...@gmail.com>
> wrote:
> >>
> >> > Hello all,
> >> >
> >> > I'm trying to view the master status of a 6 node (0.94.10; hadoop
> 1.1.2)
> >> > cluster but I keep getting a timeout exception.
> >> >
> >> > The rest of the cluster is operating quite normally. From the
> >> exception, it
> >> > seems like the "list tables" function (required to display the web UI)
> >> is
> >> > timing out for some reason.
> >> >
> >> > From the shell, I'm able to scan the entire .META. table, so the table
> >> > information is conceivably available. I don't understand the rest of
> the
> >> > architecture well enough to know what might be causing this timeout
> >> during
> >> > "list".
> >> >
> >> > Any suggestions?
> >> >
> >> > [quoted stack trace and master log excerpt snipped]
> >> >
> >> >
> >> >
> >> >
> >> > --Tom
> >> >
> >>
> >
> >
>

Re: Master timeout during list tables

Posted by Tom Brown <to...@gmail.com>.
Could this happen if the master is running too many RPC tasks and can't
keep up? What if there are too many connections to the server?
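
One quick way to separate the web UI from the RPC question is to make the
same call the status page makes, HBaseAdmin.listTables() (which goes through
the master's getHTableDescriptors RPC per the stack above), from a small
standalone client with a short timeout. A minimal sketch against the 0.94
client API; the class name and the 10-second timeout are just illustrative:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.client.HBaseAdmin;

public class ListTablesProbe {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    // Fail fast instead of waiting out the default 60000 ms hbase.rpc.timeout.
    conf.setInt("hbase.rpc.timeout", 10000);
    HBaseAdmin admin = new HBaseAdmin(conf);
    try {
      long start = System.currentTimeMillis();
      // Same call the master status page makes (getHTableDescriptors RPC).
      HTableDescriptor[] tables = admin.listTables();
      System.out.println(tables.length + " tables listed in "
          + (System.currentTimeMillis() - start) + " ms");
    } finally {
      admin.close();
    }
  }
}

If that call also hangs while a plain .META. scan still works, the slowness
is in the master's getHTableDescriptors handling rather than in the status
page itself.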

--Tom


On Wed, Jun 18, 2014 at 11:33 AM, Tom Brown <to...@gmail.com> wrote:

> That server is the master and is not a regionserver.
>
> --Tom
>
>
> On Wed, Jun 18, 2014 at 11:29 AM, Ted Yu <yu...@gmail.com> wrote:
>
>> Have you checked region server log on 10.100.101.221
>> <http://hdpmgr001.pse.movenetworks.com/10.100.101.221:60000> ?
>>
>> Cheers
>>
>>
>> On Wed, Jun 18, 2014 at 10:19 AM, Tom Brown <to...@gmail.com> wrote:
>>
>> > Hello all,
>> >
>> > I'm trying to view the master status of a 6 node (0.94.10; hadoop 1.1.2)
>> > cluster but I keep getting a timeout exception.
>> >
>> > The rest of the cluster is operating quite normally. From the
>> exception, it
>> > seems like the "list tables" function (required to display the web UI)
>> is
>> > timing out for some reason.
>> >
>> > From the shell, I'm able to scan the entire .META. table, so the table
>> > information is conceivably available. I don't understand the rest of the
>> > architecture well enough to know what might be causing this timeout
>> during
>> > "list".
>> >
>> > Any suggestions?
>> >
>> > [quoted stack trace and master log excerpt snipped]
>> >
>> >
>> >
>> >
>> > --Tom
>> >
>>
>
>

Re: Master timeout during list tables

Posted by Tom Brown <to...@gmail.com>.
That server is the master, not a regionserver.
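
(Port 60000 in the exception is the default master RPC port, hbase.master.port;
regionservers listen on 60020 by default, so the timed-out call really is going
to the master.)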

--Tom


On Wed, Jun 18, 2014 at 11:29 AM, Ted Yu <yu...@gmail.com> wrote:

> Have you checked region server log on 10.100.101.221
> <http://hdpmgr001.pse.movenetworks.com/10.100.101.221:60000> ?
>
> Cheers
>
>
> On Wed, Jun 18, 2014 at 10:19 AM, Tom Brown <to...@gmail.com> wrote:
>
> > Hello all,
> >
> > I'm trying to view the master status of a 6 node (0.94.10; hadoop 1.1.2)
> > cluster but I keep getting a timeout exception.
> >
> > The rest of the cluster is operating quite normally. From the exception,
> it
> > seems like the "list tables" function (required to display the web UI) is
> > timing out for some reason.
> >
> > From the shell, I'm able to scan the entire .META. table, so the table
> > information is conceivably available. I don't understand the rest of the
> > architecture well enough to know what might be causing this timeout
> during
> > "list".
> >
> > Any suggestions?
> >
> > [quoted stack trace and master log excerpt snipped]
> >
> >
> >
> >
> > --Tom
> >
>

Re: Master timeout during list tables

Posted by Ted Yu <yu...@gmail.com>.
Have you checked the region server log on 10.100.101.221
<http://hdpmgr001.pse.movenetworks.com/10.100.101.221:60000>?

Cheers


On Wed, Jun 18, 2014 at 10:19 AM, Tom Brown <to...@gmail.com> wrote:

> Hello all,
>
> I'm trying to view the master status of a 6 node (0.94.10; hadoop 1.1.2)
> cluster but I keep getting a timeout exception.
>
> The rest of the cluster is operating quite normally. From the exception, it
> seems like the "list tables" function (required to display the web UI) is
> timing out for some reason.
>
> From the shell, I'm able to scan the entire .META. table, so the table
> information is conceivably available. I don't understand the rest of the
> architecture well enough to know what might be causing this timeout during
> "list".
>
> Any suggestions?
>
> java.net.SocketTimeoutException: Call to
> hdpmgr001.pse.movenetworks.com/10.100.101.221:60000 failed on socket
> timeout exception: java.net.SocketTimeoutException: 60000 millis
> timeout while waiting for channel to be ready for read. ch :
> java.nio.channels.SocketChannel[connected local=/10.100.101.221:36722
> remote=hdpmgr001.pse.movenetworks.com/10.100.101.221:60000]
>         at
> org.apache.hadoop.hbase.ipc.HBaseClient.wrapException(HBaseClient.java:1026)
>         at
> org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:999)
>         at
> org.apache.hadoop.hbase.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:86)
>         at $Proxy11.getHTableDescriptors(Unknown Source)
>         at
> org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.listTables(HConnectionManager.java:1870)
>         at
> org.apache.hadoop.hbase.client.HBaseAdmin.listTables(HBaseAdmin.java:279)
>         at
> org.apache.hadoop.hbase.tmpl.master.MasterStatusTmplImpl.__jamon_innerUnit__userTables(MasterStatusTmplImpl.java:504)
>         at
> org.apache.hadoop.hbase.tmpl.master.MasterStatusTmplImpl.renderNoFlush(MasterStatusTmplImpl.java:297)
>         at
> org.apache.hadoop.hbase.tmpl.master.MasterStatusTmpl.renderNoFlush(MasterStatusTmpl.java:399)
>         at
> org.apache.hadoop.hbase.tmpl.master.MasterStatusTmpl.render(MasterStatusTmpl.java:389)
>         at
> org.apache.hadoop.hbase.master.MasterStatusServlet.doGet(MasterStatusServlet.java:82)
>         at javax.servlet.http.HttpServlet.service(HttpServlet.java:707)
>         at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
>         at
> org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:511)
>         at
> org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1221)
>         at
> org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter.doFilter(StaticUserWebFilter.java:101)
>         at
> org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
>         at
> org.apache.hadoop.http.HttpServer$QuotingInputFilter.doFilter(HttpServer.java:835)
>         at
> org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
>         at
> org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)
>         at
> org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
>         at
> org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
>         at
> org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
>         at
> org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450)
>         at
> org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
>         at
> org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
>         at org.mortbay.jetty.Server.handle(Server.java:326)
>         at
> org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
>         at
> org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:928)
>         at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:549)
>         at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212)
>         at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)
>         at
> org.mortbay.io.nio.SelectChannelEndPoint.run(SelectChannelEndPoint.java:410)
>         at
> org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582)
> Caused by: java.net.SocketTimeoutException: 60000 millis timeout while
> waiting for channel to be ready for read. ch :
> java.nio.channels.SocketChannel[connected local=/10.100.101.221:36722
> remote=hdpmgr001.pse.movenetworks.com/10.100.101.221:60000]
>         at
> org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:164)
>         at
> org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:155)
>         at
> org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:128)
>         at java.io.FilterInputStream.read(Unknown Source)
>         at
> org.apache.hadoop.hbase.ipc.HBaseClient$Connection$PingInputStream.read(HBaseClient.java:373)
>         at java.io.BufferedInputStream.fill(Unknown Source)
>         at java.io.BufferedInputStream.read(Unknown Source)
>         at java.io.DataInputStream.readInt(Unknown Source)
>         at
> org.apache.hadoop.hbase.ipc.HBaseClient$Connection.receiveResponse(HBaseClient.java:646)
>         at
> org.apache.hadoop.hbase.ipc.HBaseClient$Connection.run(HBaseClient.java:580)
>
>
>
> The master log file is unhelpful. Almost all of it is notices about
> skipping load balancing, but at least the exception appears in the
> log:
>
>
> 2014-06-18 16:18:50,359 INFO
> org.apache.hadoop.hbase.master.LoadBalancer: Skipping load balancing
> because balanced cluster; servers=6 regions=20 average=3.3333333 mos
> tloaded=4 leastloaded=3
> 2014-06-18 16:18:50,359 INFO
> org.apache.hadoop.hbase.master.LoadBalancer: Skipping load balancing
> because balanced cluster; servers=6 regions=1 average=0.16666667 mos
> tloaded=1 leastloaded=0
> 2014-06-18 16:18:50,360 INFO
> org.apache.hadoop.hbase.master.LoadBalancer: Skipping load balancing
> because balanced cluster; servers=6 regions=17 average=2.8333333 mos
> tloaded=3 leastloaded=2
> 2014-06-18 16:18:50,360 INFO
> org.apache.hadoop.hbase.master.LoadBalancer: Skipping load balancing
> because balanced cluster; servers=6 regions=12 average=2.0 mostloade
> d=2 leastloaded=2
> 2014-06-18 16:18:50,360 INFO
> org.apache.hadoop.hbase.master.LoadBalancer: Skipping load balancing
> because balanced cluster; servers=6 regions=68 average=11.333333 mos
> tloaded=12 leastloaded=11
> 2014-06-18 16:20:36,118 WARN org.mortbay.log: /master-status:
> java.net.SocketTimeoutException: Call to
> hdpmgr001.pse.movenetworks.com/10.100.101.221:60000 failed on s
> ocket timeout exception: java.net.SocketTimeoutException: 60000 millis
> timeout while waiting for channel to be ready for read. ch :
> java.nio.channels.SocketChannel[co
> nnected local=/10.100.101.221:36674
> remote=hdpmgr001.pse.movenetworks.com/10.100.101.221:60000]
>
>
>
>
> --Tom
>