Posted to user@hbase.apache.org by Louis Hust <lo...@gmail.com> on 2015/05/16 05:22:03 UTC

HMaster restart with error

Hi all,

I am running HBase 0.96.0 with Hadoop 2.2.0. A customer reported that they
could not write to the HBase cluster, so I stopped the HMaster and restarted
it shortly after.

But it seems the HMaster does not respond to requests. The following is the
HMaster log:

{logs}
2015-05-15 12:13:33,136 INFO  [AM.ZK.Worker-pool2-t16] master.RegionStates: Transitioned {9036a3befee90eeffb9082f90a4a9afa state=OPENING, ts=1431663212637, server=l-hbase26.data.cn8.qunar.com,60020,1431462615651} to {9036a3befee90eeffb9082f90a4a9afa state=OPEN, ts=1431663213136, server=l-hbase26.data.cn8.qunar.com,60020,1431462615651}
2015-05-15 12:13:33,139 INFO  [AM.ZK.Worker-pool2-t4] master.RegionStates: Onlined 9036a3befee90eeffb9082f90a4a9afa on l-hbase26.data.cn8.qunar.com,60020,1431462615651
2015-05-15 12:14:40,699 INFO  [main-EventThread] zookeeper.RegionServerTracker: RegionServer ephemeral node deleted, processing expiration [l-hbase27.data.cn8.qunar.com,60020,1431663208899]
2015-05-15 12:15:04,899 INFO  [main-EventThread] zookeeper.RegionServerTracker: RegionServer ephemeral node deleted, processing expiration [l-hbase25.data.cn8.qunar.com,60020,1431663193865]
2015-05-15 12:15:24,465 WARN  [249240421@qtp-591022857-33] client.HConnectionManager$HConnectionImplementation: Checking master connection
com.google.protobuf.ServiceException: java.net.SocketTimeoutException: Call to l-namenode2.dba.cn8.qunar.com/192.168.39.22:60000 failed because java.net.SocketTimeoutException: 60000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/192.168.39.22:47700 remote=l-namenode2.dba.cn8.qunar.com/192.168.39.22:60000]
    at org.apache.hadoop.hbase.ipc.RpcClient.callBlockingMethod(RpcClient.java:1667)
    at org.apache.hadoop.hbase.ipc.RpcClient$BlockingRpcChannelImplementation.callBlockingMethod(RpcClient.java:1708)
    at org.apache.hadoop.hbase.protobuf.generated.MasterProtos$MasterService$BlockingStub.isMasterRunning(MasterProtos.java:40216)
    at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$MasterServiceState.isMasterRunning(HConnectionManager.java:1484)
    at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.isKeepAliveMasterConnectedAndRunning(HConnectionManager.java:2110)
    at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getKeepAliveMasterService(HConnectionManager.java:1836)
    at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.listTables(HConnectionManager.java:2531)
    at org.apache.hadoop.hbase.client.HBaseAdmin.listTables(HBaseAdmin.java:298)
    at org.apache.hadoop.hbase.tmpl.master.MasterStatusTmplImpl.__jamon_innerUnit__userTables(MasterStatusTmplImpl.java:530)
    at org.apache.hadoop.hbase.tmpl.master.MasterStatusTmplImpl.renderNoFlush(MasterStatusTmplImpl.java:255)
    at org.apache.hadoop.hbase.tmpl.master.MasterStatusTmpl.renderNoFlush(MasterStatusTmpl.java:382)
    at org.apache.hadoop.hbase.tmpl.master.MasterStatusTmpl.render(MasterStatusTmpl.java:372)
    at org.apache.hadoop.hbase.master.MasterStatusServlet.doGet(MasterStatusServlet.java:95)
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:734)
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:847)
    at org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:511)
    at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1221)
    at org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter.doFilter(StaticUserWebFilter.java:109)
    at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
    at org.apache.hadoop.http.HttpServer$QuotingInputFilter.doFilter(HttpServer.java:1081)
    at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
    at org.apache.hadoop.http.NoCacheFilter.doFilter(NoCacheFilter.java:45)
    at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
    at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)
    at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
    at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
    at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
    at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450)
    at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
    at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
    at org.mortbay.jetty.Server.handle(Server.java:326)
    at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
    at org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:928)
    at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:549)
    at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212)
    at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)
    at org.mortbay.io.nio.SelectChannelEndPoint.run(SelectChannelEndPoint.java:410)
    at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582)
Caused by: java.net.SocketTimeoutException: Call to l-namenode2.dba.cn8.qunar.com/192.168.39.22:60000 failed because java.net.SocketTimeoutException: 60000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/192.168.39.22:47700 remote=l-namenode2.dba.cn8.qunar.com/192.168.39.22:60000]
    at org.apache.hadoop.hbase.ipc.RpcClient.wrapException(RpcClient.java:1475)
    at org.apache.hadoop.hbase.ipc.RpcClient.call(RpcClient.java:1450)
    at org.apache.hadoop.hbase.ipc.RpcClient.callBlockingMethod(RpcClient.java:1650)
    ... 37 more
Caused by: java.net.SocketTimeoutException: 60000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/192.168.39.22:47700 remote=l-namenode2.dba.cn8.qunar.com/192.168.39.22:60000]
    at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:164)
    at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:161)
    at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:131)
    at java.io.FilterInputStream.read(FilterInputStream.java:133)
    at java.io.FilterInputStream.read(FilterInputStream.java:133)
    at org.apache.hadoop.hbase.ipc.RpcClient$Connection$PingInputStream.read(RpcClient.java:553)
    at java.io.BufferedInputStream.fill(BufferedInputStream.java:235)
    at java.io.BufferedInputStream.read(BufferedInputStream.java:254)
    at java.io.DataInputStream.readInt(DataInputStream.java:387)
    at org.apache.hadoop.hbase.ipc.RpcClient$Connection.readResponse(RpcClient.java:1057)
    at org.apache.hadoop.hbase.ipc.RpcClient$Connection.run(RpcClient.java:719)
Fri May 15 12:15:58 CST 2015 Terminating master
{/logs}
What does this exception mean, why does it happen, and how can I solve the problem?
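For what it's worth, the "60000 millis timeout" in the trace matches HBase's default RPC timeout (the `hbase.rpc.timeout` setting, 60 seconds): the master web UI's internal client gave up waiting on the master RPC port (60000). If the master is merely busy finishing startup, raising the timeout only hides the symptom, but for reference this is a sketch of the setting in hbase-site.xml (the value shown is illustrative, not a recommendation):

```xml
<!-- hbase-site.xml (sketch; 120000 is an illustrative value, default is 60000 ms) -->
<property>
  <name>hbase.rpc.timeout</name>
  <value>120000</value>
</property>
```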

Re: HMaster restart with error

Posted by Louis Hust <lo...@gmail.com>.
But from the log:

2015-05-15 12:16:40,522 INFO  [MASTER_SERVER_OPERATIONS-l-namenode1:60000-0] handler.ServerShutdownHandler: Finished processing of shutdown of l-hbase31.data.cn8.qunar.com,60020,1427789773001
2015-05-15 12:17:11,301 WARN  [686544788@qtp-660252776-212] client.HConnectionManager$HConnectionImplementation: Checking master connection

What was the HMaster doing between 12:16:40 and 12:17:11? That is a gap of about 30 seconds.
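One way to answer this yourself is to slice the master log down to just that window and see which handlers were active (log splitting, region assignment, and so on). A minimal sketch, using fabricated sample lines in place of the real master log file:

```shell
# Sketch: filter HMaster log lines to the 12:16:40-12:17:11 window.
# The heredoc below stands in for the real master log; the middle line is a
# fabricated example of the kind of activity one might find there.
cat > /tmp/hmaster-sample.log <<'EOF'
2015-05-15 12:16:40,522 INFO handler.ServerShutdownHandler: Finished processing of shutdown of l-hbase31.data.cn8.qunar.com,60020,1427789773001
2015-05-15 12:16:55,101 INFO master.AssignmentManager: (example) assignment activity
2015-05-15 12:17:11,301 WARN client.HConnectionManager$HConnectionImplementation: Checking master connection
EOF
# Field 2 is the HH:MM:SS,mmm timestamp; plain string comparison is enough here.
awk '$2 >= "12:16:40" && $2 < "12:17:12"' /tmp/hmaster-sample.log
```

All three sample lines fall inside the window, so all three are printed.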

2015-05-18 22:23 GMT+08:00 Ted Yu <yu...@gmail.com>:

> The exception originated from Web UI corresponding
> to HBaseAdmin.listTables().
> At that moment, master was unable to process the request - it needed some
> more time.
>
> Cheers
>
> On Sun, May 17, 2015 at 11:03 PM, Louis Hust <lo...@gmail.com> wrote:
>
> > Yes, Ted, can you tell me what the following exception means in
> > l-namenode1.log?
> >
> > 2015-05-15 12:16:40,522 INFO
> >  [MASTER_SERVER_OPERATIONS-l-namenode1:60000-0]
> > handler.ServerShutdownHandler: Finished processing of shutdown of
> > l-hbase31.data.cn8.qunar.com,60020,1427789773001
> > 2015-05-15 12:17:11,301 WARN  [686544788@qtp-660252776-212]
> > client.HConnectionManager$HConnectionImplementation: Checking master
> > connection
> >
> > Does this mean the cluster was not operational?
> >
> >
> > 2015-05-18 11:45 GMT+08:00 Ted Yu <yu...@gmail.com>:
> >
> > > After l-namenode1 became active master , it assigned regions:
> > >
> > > 2015-05-15 12:16:40,432 INFO  [master:l-namenode1:60000]
> > > master.RegionStates: Transitioned {6f806bb62b347c992cd243fc909276ff
> > > state=OFFLINE, ts=1431663400432, server=null} to
> > > {6f806bb62b347c992cd243fc909276ff state=OPEN, ts=1431663400432, server=
> > > l-hbase31.data.cn8.qunar.com,60020,1431462584879}
> > >
> > > However, l-hbase31 went down:
> > >
> > > 2015-05-15 12:16:40,508 INFO
> > >  [MASTER_SERVER_OPERATIONS-l-namenode1:60000-0]
> > > handler.ServerShutdownHandler: Splitting logs for
> > > l-hbase31.data.cn8.qunar.com,60020,1427789773001   before assignment.
> > >
> > > l-namenode1 was restarted :
> > >
> > > 2015-05-15 12:20:25,322 INFO  [main] util.VersionInfo: HBase
> > 0.96.0-hadoop2
> > > 2015-05-15 12:20:25,323 INFO  [main] util.VersionInfo: Subversion
> > > https://svn.apache.org/repos/asf/hbase/branches/0.96 -r 1531434
> > >
> > > However, it went down due to zookeeper session expiration:
> > >
> > > 2015-05-15 12:20:25,580 WARN  [main] zookeeper.ZooKeeperNodeTracker:
> > Can't
> > > get or delete the master znode
> > > org.apache.zookeeper.KeeperException$SessionExpiredException:
> > > KeeperErrorCode = Session expired for /hbase/master
> > >
> > > It started again after that and AssignmentManager did a lot of
> > assignments.
> > >
> > > Looks like the cluster was operational this time.
> > >
> > > Cheers
> > >
> > > On Sun, May 17, 2015 at 8:24 AM, Ted Yu <yu...@gmail.com> wrote:
> > >
> > > > bq. the backup master take over at 2015-05-15 12:16:40,024 ?
> > > >
> > > > The switch of active master should be earlier than 12:16:40,024 -
> > shortly
> > > > after 12:15:58
> > > >
> > > > l-namenode1 would do some initialization (such as waiting for region
> > > > servers count to settle) after it became active master.
> > > >
> > > > I tried to download from http://pan.baidu.com/s/1eQlKXj0 (at home)
> but
> > > > the download progress was very slow.
> > > >
> > > > Will try downloading later in the day.
> > > >
> > > > Do you have access to pastebin ?
> > > >
> > > > Cheers
> > > >
> > > > On Sun, May 17, 2015 at 2:07 AM, Louis Hust <lo...@gmail.com>
> > > wrote:
> > > >
> > > >> Hi, ted,
> > > >>
> > > >> Thanks for your reply!!
> > > >>
> > > >> I found this log entry on l-namenode2.dba.cn8 during the restart:
> > > >> 2015-05-15 12:11:36,540 INFO  [master:l-namenode2:60000]
> > > >> master.ServerManager: Finished waiting for region servers count to
> > > settle;
> > > >> checked in 5, slept for 4511 ms, expecting minimum of 1, maximum of
> > > >> 2147483647, master is running.
> > > >>
> > > >> So this means the HMaster was ready to handle requests at 12:11:36?
> > > >>
> > > >> The backup master is l-namenode1.dba.cn8 and you can get the log at
> :
> > > >>
> > > >> http://pan.baidu.com/s/1eQlKXj0
> > > >>
> > > >> After I stopped l-namenode2.dba.cn8 at 12:15:58, the backup master
> > > >> l-namenode1 took over, and I found this log:
> > > >>
> > > >> 2015-05-15 12:16:40,024 INFO  [master:l-namenode1:60000]
> > > >> master.ServerManager: Finished waiting for region servers count to
> > > settle;
> > > >> checked in 4, slept for 5663 ms, expecting minimum of 1, maximum of
> > > >> 2147483647, master is running.
> > > >>
> > > >> So the backup master took over at 2015-05-15 12:16:40,024?
> > > >>
> > > >> But it seems l-namenode2 was not working normally, given this
> > > >> exception in the log:
> > > >>
> > > >> 2015-05-15 12:16:40,522 INFO
> > > >>  [MASTER_SERVER_OPERATIONS-l-namenode1:60000-0]
> > > >> handler.ServerShutdownHandler: Finished processing of shutdown of
> > > >> l-hbase31.data.cn8.qunar.com,60020,1427789773001
> > > >> 2015-05-15 12:17:11,301 WARN  [686544788@qtp-660252776-212]
> > > >> client.HConnectionManager$HConnectionImplementation: Checking master
> > > >> connection
> > > >> com.google.protobuf.ServiceException: java.net.ConnectException: Connection refused
> > > >> at org.apache.hadoop.hbase.ipc.RpcClient.callBlockingMethod(RpcClient.java:1667)
> > > >> at org.apache.hadoop.hbase.ipc.RpcClient$BlockingRpcChannelImplementation.callBlockingMethod(RpcClient.java:1708)
> > > >> at org.apache.hadoop.hbase.protobuf.generated.MasterProtos$MasterService$BlockingStub.isMasterRunning(MasterProtos.java:40216)
> > > >> at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$MasterServiceState.isMasterRunning(HConnectionManager.java:1484)
> > > >> at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.isKeepAliveMasterConnectedAndRunning(HConnectionManager.java:2110)
> > > >> at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getKeepAliveMasterService(HConnectionManager.java:1836)
> > > >>
> > > >> Does this exception mean the HMaster was not working normally?
> > > >>
> > > >>
> > > >>
> > > >> 2015-05-17 11:06 GMT+08:00 Ted Yu <yu...@gmail.com>:
> > > >>
> > > >> > bq. the HMaster is handling two region server down, and not ready
> to
> > > >> handle
> > > >> > client request?
> > > >> >
> > > >> > I didn't mean that - for a functioning master, handling region
> > server
> > > >> > shutdown is part of the master's job.
> > > >> >
> > > >> > You should see something similar to the following in (functioning)
> > > >> master
> > > >> > log:
> > > >> >
> > > >> > 2015-05-13 04:06:36,266 INFO  [master:c6401:60000]
> > > master.ServerManager:
> > > >> > Finished waiting for region servers count to settle; checked in 1,
> > > slept
> > > >> > for 71582 ms, expecting minimum of 1, maximum of 2147483647,
> master
> > > is
> > > >> > running.
> > > >> >
> > > >> > bq. wait the backend HMaster to take over
> > > >> >
> > > >> > Was there exception in backup master log after it took over ?
> > > >> >
> > > >> > On Sat, May 16, 2015 at 6:44 PM, Louis Hust <louis.hust@gmail.com
> >
> > > >> wrote:
> > > >> >
> > > >> > > Hi Ted,
> > > >> > >
> > > >> > > Thanks very much!
> > > >> > >
> > > >> > > The Namenode process was not running on l-namenode2.dba.cn8
> > > >> > > (192.168.39.22); only the HMaster runs on it.
> > > >> > > So you mean that at 2015-05-15 12:15:04 the HMaster was handling
> > > >> > > two region servers going down, and was not ready to handle client
> > > >> > > requests? And how can I tell from the logs when the HMaster is
> > > >> > > ready to handle client requests?
> > > >> > >
> > > >> > > I stopped the HMaster at 12:15:58 because it could not handle
> > > >> > > requests, so I wanted to stop it and wait for the backup HMaster
> > > >> > > to take over.
> > > >> > >
> > > >> > >
> > > >> > >
> > > >> > >
> > > >> > > 2015-05-17 0:29 GMT+08:00 Ted Yu <yu...@gmail.com>:
> > > >> > >
> > > >> > > > In the period you identified, master was assigning regions.
> > > >> > > > e.g.
> > > >> > > >
> > > >> > > > 2015-05-15 12:13:09,683 INFO
> > > >> > > > [l-namenode2.dba.cn8.qunar.com
> > > >> > > ,60000,1431663090427-GeneralBulkAssigner-0]
> > > >> > > > master.RegionStates: Transitioned
> > > {c634280ce287b2d6cebd88b61accf685
> > > >> > > > state=OFFLINE, ts=1431663189621, server=null} to
> > > >> > > > {c634280ce287b2d6cebd88b61accf685 state=PENDING_OPEN,
> > > >> ts=1431663189683,
> > > >> > > > server=l-hbase26.data.cn8.qunar.com,60020,1431462615651}
> > > >> > > > 2015-05-15 12:13:09,683 INFO
> > > >> > > > [l-namenode2.dba.cn8.qunar.com
> > > >> > > ,60000,1431663090427-GeneralBulkAssigner-2]
> > > >> > > > master.RegionStates: Transitioned
> > > {2f60b1b4e51d32ef98ad19690f13a565
> > > >> > > > state=OFFLINE, ts=1431663189621, server=null} to
> > > >> > > > {2f60b1b4e51d32ef98ad19690f13a565 state=PENDING_OPEN,
> > > >> ts=1431663189683,
> > > >> > > > server=l-hbase30.data.cn8.qunar.com,60020,1431462562233}
> > > >> > > >
> > > >> > > > Then two region servers went down:
> > > >> > > >
> > > >> > > > 2015-05-15 12:14:40,699 INFO  [main-EventThread]
> > > >> > > > zookeeper.RegionServerTracker: RegionServer ephemeral node
> > > deleted,
> > > >> > > > processing expiration [l-hbase27.data.cn8.qunar.com,60020,
> > > >> > > >  1431663208899]
> > > >> > > > 2015-05-15 12:15:04,899 INFO  [main-EventThread]
> > > >> > > > zookeeper.RegionServerTracker: RegionServer ephemeral node
> > > deleted,
> > > >> > > > processing expiration [l-hbase25.data.cn8.qunar.com,60020,
> > > >> > > >  1431663193865]
> > > >> > > >
> > > >> > > > Master was stopped afterwards:
> > > >> > > >
> > > >> > > > Fri May 15 12:15:58 CST 2015 Terminating master
> > > >> > > >
> > > >> > > > Namenode process was running on l-namenode2.dba.cn8, right ?
> > > >> > > >
> > > >> > > > Cheers
> > > >> > > >
> > > >> > > > On Sat, May 16, 2015 at 7:50 AM, Louis Hust <
> > louis.hust@gmail.com
> > > >
> > > >> > > wrote:
> > > >> > > >
> > > >> > > > > Hi Ted,
> > > >> > > > > Any idea?
> > > >> > > > > When the HMaster restarts, how can I know when it can really
> > > >> > > > > handle requests from the application? Is there any marker in
> > > >> > > > > the logs?
> > > >> > > > >
> > > >> > > > > 2015-05-16 14:05 GMT+08:00 Louis Hust <louis.hust@gmail.com
> >:
> > > >> > > > >
> > > >> > > > > > @Ted,
> > > >> > > > > > Please see the log from 12:11:29 to 12:15:28. During this
> > > >> > > > > > time range the HMaster was restarting but could not handle
> > > >> > > > > > client requests. Was the HMaster recovering, or doing
> > > >> > > > > > something else?
> > > >> > > > > >
> > > >> > > > > > 2015-05-16 13:59 GMT+08:00 Louis Hust <
> louis.hust@gmail.com
> > >:
> > > >> > > > > >
> > > >> > > > > >> OK, you can get the log from
> > > >> > > > > >> http://pan.baidu.com/s/1pqS6E
> > > >> > > > > >>
> > > >> > > > > >>
> > > >> > > > > >> 2015-05-16 13:26 GMT+08:00 Ted Yu <yu...@gmail.com>:
> > > >> > > > > >>
> > > >> > > > > >>> Can you check server log on 192.168.39.22
> > > >> > > > > >>> <
> http://l-namenode2.dba.cn8.qunar.com/192.168.39.22:60000
> > >
> > > ?
> > > >> > > > > >>>
> > > >> > > > > >>> That should give you some clue.
> > > >> > > > > >>>
> > > >> > > > > >>> Cheers
> > > >> > > > > >>>
> > > >> > > > > >>> On Fri, May 15, 2015 at 8:22 PM, Louis Hust <
> > > >> > louis.hust@gmail.com>
> > > >> > > > > >>> wrote:
>

Re: HMaster restart with error

Posted by Ted Yu <yu...@gmail.com>.
The exception originated from the Web UI, which calls
HBaseAdmin.listTables().
At that moment, the master was unable to process the request - it needed some
more time to finish initializing.

Cheers
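
Side note for later readers: the thread shows two distinct client-side failures - a SocketTimeoutException (master listening but unresponsive) and, later, a ConnectException: Connection refused (master process not listening at all). A plain TCP probe can tell these apart from a client box. This is only an illustrative sketch, not an HBase tool; the host and port below are placeholders (60000 is the default master RPC port):

```shell
#!/bin/sh
# Illustrative probe - distinguishes "nothing accepting connections"
# (connection refused / timeout) from "port open". Note that an open
# port only means the process is up; the master may still be initializing.
probe_master() {
    host=$1
    port=$2
    # /dev/tcp is a bash feature; timeout bounds the hang case at 5s.
    if timeout 5 bash -c "echo > /dev/tcp/$host/$port" 2>/dev/null; then
        echo "listening"
    else
        echo "unreachable"
    fi
}

# Placeholder values - substitute your active master's host.
probe_master 127.0.0.1 60000
```

With only POSIX sh available, `nc -z -w 5 <host> <port>` is a common substitute for the `/dev/tcp` redirection.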

On Sun, May 17, 2015 at 11:03 PM, Louis Hust <lo...@gmail.com> wrote:

> Yes, ted, can you tell me what the following exception means in
> l-namenode1.log?
>
> 2015-05-15 12:16:40,522 INFO
>  [MASTER_SERVER_OPERATIONS-l-namenode1:60000-0]
> handler.ServerShutdownHandler: Finished processing of shutdown of
> l-hbase31.data.cn8.qunar.com,60020,1427789773001
> 2015-05-15 12:17:11,301 WARN  [686544788@qtp-660252776-212]
> client.HConnectionManager$HConnectionImplementation: Checking master
> connection
>
> Does this mean the cluster was not operational?
>
>
> 2015-05-18 11:45 GMT+08:00 Ted Yu <yu...@gmail.com>:
>
> > After l-namenode1 became the active master, it assigned regions:
> >
> > 2015-05-15 12:16:40,432 INFO  [master:l-namenode1:60000]
> > master.RegionStates: Transitioned {6f806bb62b347c992cd243fc909276ff
> > state=OFFLINE, ts=1431663400432, server=null} to
> > {6f806bb62b347c992cd243fc909276ff state=OPEN, ts=1431663400432, server=
> > l-hbase31.data.cn8.qunar.com,60020,1431462584879}
> >
> > However, l-hbase31 went down:
> >
> > 2015-05-15 12:16:40,508 INFO
> >  [MASTER_SERVER_OPERATIONS-l-namenode1:60000-0]
> > handler.ServerShutdownHandler: Splitting logs for
> > l-hbase31.data.cn8.qunar.com,60020,1427789773001   before assignment.
> >
> > l-namenode1 was restarted :
> >
> > 2015-05-15 12:20:25,322 INFO  [main] util.VersionInfo: HBase
> 0.96.0-hadoop2
> > 2015-05-15 12:20:25,323 INFO  [main] util.VersionInfo: Subversion
> > https://svn.apache.org/repos/asf/hbase/branches/0.96 -r 1531434
> >
> > However, it went down due to zookeeper session expiration:
> >
> > 2015-05-15 12:20:25,580 WARN  [main] zookeeper.ZooKeeperNodeTracker:
> Can't
> > get or delete the master znode
> > org.apache.zookeeper.KeeperException$SessionExpiredException:
> > KeeperErrorCode = Session expired for /hbase/master
> >
> > It started again after that and AssignmentManager did a lot of
> assignments.
> >
> > Looks like the cluster was operational this time.
> >
> > Cheers
> >
> > On Sun, May 17, 2015 at 8:24 AM, Ted Yu <yu...@gmail.com> wrote:
> >
> > > bq. the backup master take over at 2015-05-15 12:16:40,024 ?
> > >
> > > The switch of active master should be earlier than 12:16:40,024 -
> shortly
> > > after 12:15:58
> > >
> > > l-namenode1 would do some initialization (such as waiting for region
> > > servers count to settle) after it became active master.
> > >
> > > I tried to download from http://pan.baidu.com/s/1eQlKXj0 (at home) but
> > > the download progress was very slow.
> > >
> > > Will try downloading later in the day.
> > >
> > > Do you have access to pastebin ?
> > >
> > > Cheers
> > >
> > > On Sun, May 17, 2015 at 2:07 AM, Louis Hust <lo...@gmail.com>
> > wrote:
> > >
> > >> Hi, ted,
> > >>
> > >> Thanks for your reply!!
> > >>
> > >> I found the log in l-namenode2.dba.cn8 during the restarting progress:
> > >> 2015-05-15 12:11:36,540 INFO  [master:l-namenode2:60000]
> > >> master.ServerManager: Finished waiting for region servers count to
> > settle;
> > >> checked in 5, slept for 4511 ms, expecting minimum of 1, maximum of
> > >> 2147483647, master is running.
> > >>
> > >> So this means the HMaster was ready to handle requests at 12:11:36?
> > >>
> > >> The backup master is l-namenode1.dba.cn8 and you can get the log at :
> > >>
> > >> http://pan.baidu.com/s/1eQlKXj0
> > >>
> > >> After the l-namenode2.dba.cn8 is stopped by me at 12:15:58,
> > >> the backup master l-namenode1 take over, and i found log:
> > >>
> > >> 2015-05-15 12:16:40,024 INFO  [master:l-namenode1:60000]
> > >> master.ServerManager: Finished waiting for region servers count to
> > settle;
> > >> checked in 4, slept for 5663 ms, expecting minimum of 1, maximum of
> > >> 2147483647, master is running.
> > >>
> > >> So the backup master took over at 2015-05-15 12:16:40,024?
> > >>
> > >> But it seems l-namenode2 was not working normally, given the exception
> > >> in the log:
> > >>
> > >> 2015-05-15 12:16:40,522 INFO
> > >>  [MASTER_SERVER_OPERATIONS-l-namenode1:60000-0]
> > >> handler.ServerShutdownHandler: Finished processing of shutdown of
> > >> l-hbase31.data.cn8.qunar.com,60020,1427789773001
> > >> 2015-05-15 12:17:11,301 WARN  [686544788@qtp-660252776-212]
> > >> client.HConnectionManager$HConnectionImplementation: Checking master
> > >> connection
> > >> com.google.protobuf.ServiceException: java.net.ConnectException: Connection refused
> > >> at org.apache.hadoop.hbase.ipc.RpcClient.callBlockingMethod(RpcClient.java:1667)
> > >> at org.apache.hadoop.hbase.ipc.RpcClient$BlockingRpcChannelImplementation.callBlockingMethod(RpcClient.java:1708)
> > >> at org.apache.hadoop.hbase.protobuf.generated.MasterProtos$MasterService$BlockingStub.isMasterRunning(MasterProtos.java:40216)
> > >> at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$MasterServiceState.isMasterRunning(HConnectionManager.java:1484)
> > >> at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.isKeepAliveMasterConnectedAndRunning(HConnectionManager.java:2110)
> > >> at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getKeepAliveMasterService(HConnectionManager.java:1836)
> > >>
> > >> Does the exception mean the HMaster was not working normally, or something else?
> > >>
> > >>
> > >>
> > >> 2015-05-17 11:06 GMT+08:00 Ted Yu <yu...@gmail.com>:
> > >>
> > >> > bq. the HMaster is handling two region server down, and not ready to
> > >> handle
> > >> > client request?
> > >> >
> > >> > I didn't mean that - for a functioning master, handling region
> server
> > >> > shutdown is part of the master's job.
> > >> >
> > >> > You should see something similar to the following in (functioning)
> > >> master
> > >> > log:
> > >> >
> > >> > 2015-05-13 04:06:36,266 INFO  [master:c6401:60000]
> > master.ServerManager:
> > >> > Finished waiting for region servers count to settle; checked in 1,
> > slept
> > >> > for 71582 ms, expecting minimum of 1, maximum of 2147483647, master
> > is
> > >> > running.
> > >> >
> > >> > bq. wait the backend HMaster to take over
> > >> >
> > >> > Was there exception in backup master log after it took over ?
> > >> >
> > >> > On Sat, May 16, 2015 at 6:44 PM, Louis Hust <lo...@gmail.com>
> > >> wrote:
> > >> >
> > >> > > hi,ted,
> > >> > >
> > >> > > Thanks very much!
> > >> > >
> > >> > > Namenode process was not running on
> > >> l-namenode2.dba.cn8(192.168.39.22),
> > >> > > just HMaster run on it。
> > >> > > So you means that at 2015-05-15 12:15:04,  the HMaster is handling
> > two
> > >> > > region server down, and not
> > >> > > ready to handle client request? And how can i tell when the
> HMaster
> > is
> > >> > > ready to handle client request
> > >> > > from the logs?
> > >> > >
> > >> > > I stop the Hmaster at 12:15:58 cause the HMaster can not handling
> > >> > request,
> > >> > > so i want to stop it and wait
> > >> > > the backend HMaster to take over.
> > >> > >
> > >> > >
> > >> > >
> > >> > >
> > >> > > 2015-05-17 0:29 GMT+08:00 Ted Yu <yu...@gmail.com>:
> > >> > >
> > >> > > > In the period you identified, master was assigning regions.
> > >> > > > e.g.
> > >> > > >
> > >> > > > 2015-05-15 12:13:09,683 INFO
> > >> > > > [l-namenode2.dba.cn8.qunar.com
> > >> > > ,60000,1431663090427-GeneralBulkAssigner-0]
> > >> > > > master.RegionStates: Transitioned
> > {c634280ce287b2d6cebd88b61accf685
> > >> > > > state=OFFLINE, ts=1431663189621, server=null} to
> > >> > > > {c634280ce287b2d6cebd88b61accf685 state=PENDING_OPEN,
> > >> ts=1431663189683,
> > >> > > > server=l-hbase26.data.cn8.qunar.com,60020,1431462615651}
> > >> > > > 2015-05-15 12:13:09,683 INFO
> > >> > > > [l-namenode2.dba.cn8.qunar.com
> > >> > > ,60000,1431663090427-GeneralBulkAssigner-2]
> > >> > > > master.RegionStates: Transitioned
> > {2f60b1b4e51d32ef98ad19690f13a565
> > >> > > > state=OFFLINE, ts=1431663189621, server=null} to
> > >> > > > {2f60b1b4e51d32ef98ad19690f13a565 state=PENDING_OPEN,
> > >> ts=1431663189683,
> > >> > > > server=l-hbase30.data.cn8.qunar.com,60020,1431462562233}
> > >> > > >
> > >> > > > Then two region servers went down:
> > >> > > >
> > >> > > > 2015-05-15 12:14:40,699 INFO  [main-EventThread]
> > >> > > > zookeeper.RegionServerTracker: RegionServer ephemeral node
> > deleted,
> > >> > > > processing expiration [l-hbase27.data.cn8.qunar.com,60020,
> > >> > > >  1431663208899]
> > >> > > > 2015-05-15 12:15:04,899 INFO  [main-EventThread]
> > >> > > > zookeeper.RegionServerTracker: RegionServer ephemeral node
> > deleted,
> > >> > > > processing expiration [l-hbase25.data.cn8.qunar.com,60020,
> > >> > > >  1431663193865]
> > >> > > >
> > >> > > > Master was stopped afterwards:
> > >> > > >
> > >> > > > Fri May 15 12:15:58 CST 2015 Terminating master
> > >> > > >
> > >> > > > Namenode process was running on l-namenode2.dba.cn8, right ?
> > >> > > >
> > >> > > > Cheers
> > >> > > >
> > >> > > > On Sat, May 16, 2015 at 7:50 AM, Louis Hust <
> louis.hust@gmail.com
> > >
> > >> > > wrote:
> > >> > > >
> > >> > > > > hi, TED,
> > >> > > > > Any idea?
> > >> > > > > When the HMaster restart, how can i know when it is really can
> > >> handle
> > >> > > > > request from application? is there any mark in logs?
> > >> > > > >
> > >> > > > > 2015-05-16 14:05 GMT+08:00 Louis Hust <lo...@gmail.com>:
> > >> > > > >
> > >> > > > > > @Ted,
> > >> > > > > > plz see the log from 12:11:29 to 12:15:28, this timerange
> the
> > >> > HMaster
> > >> > > > is
> > >> > > > > > in restarting stage, but can not handle request from client?
> > Is
> > >> the
> > >> > > > > HMaster
> > >> > > > > > recovering or do something else?
> > >> > > > > >
> > >> > > > > > 2015-05-16 13:59 GMT+08:00 Louis Hust <louis.hust@gmail.com
> >:
> > >> > > > > >
> > >> > > > > >> OK, you can get the log from
> > >> > > > > >> http://pan.baidu.com/s/1pqS6E
> > >> > > > > >>
> > >> > > > > >>
> > >> > > > > >> 2015-05-16 13:26 GMT+08:00 Ted Yu <yu...@gmail.com>:
> > >> > > > > >>
> > >> > > > > >>> Can you check server log on 192.168.39.22
> > >> > > > > >>> <http://l-namenode2.dba.cn8.qunar.com/192.168.39.22:60000
> >
> > ?
> > >> > > > > >>>
> > >> > > > > >>> That should give you some clue.
> > >> > > > > >>>
> > >> > > > > >>> Cheers
> > >> > > > > >>>
> > >> > > > > >>> On Fri, May 15, 2015 at 8:22 PM, Louis Hust <
> > >> > louis.hust@gmail.com>
> > >> > > > > >>> wrote:
> > >> > > > > >>>
> > >> > > > > >>> > Hi all,
> > >> > > > > >>> >
> > >> > > > > >>> > I use hbase0.96.0 with hadoop 2.2.0,
> > >> > > > > >>> > and the custom said they can not write into hbase
> cluster,
> > >> > > > > >>> > So i stop the HMaster and start it soon,
> > >> > > > > >>> >
> > >> > > > > >>> > But it seems that the HMaster not response to request,
> > >> > following
> > >> > > is
> > >> > > > > the
> > >> > > > > >>> > HMaster log:
> > >> > > > > >>> >
> > >> > > > > >>> > {logs}
> > >> > > > > >>> > 2015-05-15 12:13:33,136 INFO  [AM.ZK.Worker-pool2-t16]
> > >> > > > > >>> master.RegionStates:
> > >> > > > > >>> > Transitioned {9036a3befee90eeffb9082f90a4a9afa
> > >> state=OPENING,
> > >> > > > > >>> > ts=1431663212637, server=l-hbase26.data.cn8.qunar.com
> > >> > > > > >>> ,60020,1431462615651}
> > >> > > > > >>> > to {9036a3befee90eeffb9082f90a4a9afa state=OPEN,
> > >> > > ts=1431663213136,
> > >> > > > > >>> server=
> > >> > > > > >>> > l-hbase26.data.cn8.qunar.com,60020,1431462615651}
> > >> > > > > >>> > 2015-05-15 12:13:33,139 INFO  [AM.ZK.Worker-pool2-t4]
> > >> > > > > >>> master.RegionStates:
> > >> > > > > >>> > Onlined 9036a3befee90eeffb9082f90a4a9afa on
> > >> > > > > >>> l-hbase26.data.cn8.qunar.com
> > >> > > > > >>> > ,60020,1431462615651
> > >> > > > > >>> > 2015-05-15 12:14:40,699 INFO  [main-EventThread]
> > >> > > > > >>> > zookeeper.RegionServerTracker: RegionServer ephemeral
> node
> > >> > > deleted,
> > >> > > > > >>> > processing expiration [l-hbase27.data.cn8.qunar.com
> > >> > > > > >>> ,60020,1431663208899]
> > >> > > > > >>> > 2015-05-15 12:15:04,899 INFO  [main-EventThread]
> > >> > > > > >>> > zookeeper.RegionServerTracker: RegionServer ephemeral
> node
> > >> > > deleted,
> > >> > > > > >>> > processing expiration [l-hbase25.data.cn8.qunar.com
> > >> > > > > >>> ,60020,1431663193865]
> > >> > > > > >>> > 2015-05-15 12:15:24,465 WARN  [249240421@qtp-591022857-33]
> > >> > > > > >>> > client.HConnectionManager$HConnectionImplementation: Checking master connection
> > >> > > > > >>> > com.google.protobuf.ServiceException: java.net.SocketTimeoutException: Call to l-namenode2.dba.cn8.qunar.com/192.168.39.22:60000 failed because java.net.SocketTimeoutException: 60000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/192.168.39.22:47700 remote=l-namenode2.dba.cn8.qunar.com/192.168.39.22:60000]
> > >> > > > > >>> > at org.apache.hadoop.hbase.ipc.RpcClient.callBlockingMethod(RpcClient.java:1667)
> > >> > > > > >>> > at org.apache.hadoop.hbase.ipc.RpcClient$BlockingRpcChannelImplementation.callBlockingMethod(RpcClient.java:1708)
> > >> > > > > >>> > at org.apache.hadoop.hbase.protobuf.generated.MasterProtos$MasterService$BlockingStub.isMasterRunning(MasterProtos.java:40216)
> > >> > > > > >>> > at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$MasterServiceState.isMasterRunning(HConnectionManager.java:1484)
> > >> > > > > >>> > at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.isKeepAliveMasterConnectedAndRunning(HConnectionManager.java:2110)
> > >> > > > > >>> > at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getKeepAliveMasterService(HConnectionManager.java:1836)
> > >> > > > > >>> > at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.listTables(HConnectionManager.java:2531)
> > >> > > > > >>> > at org.apache.hadoop.hbase.client.HBaseAdmin.listTables(HBaseAdmin.java:298)
> > >> > > > > >>> > at org.apache.hadoop.hbase.tmpl.master.MasterStatusTmplImpl.__jamon_innerUnit__userTables(MasterStatusTmplImpl.java:530)
> > >> > > > > >>> > at org.apache.hadoop.hbase.tmpl.master.MasterStatusTmplImpl.renderNoFlush(MasterStatusTmplImpl.java:255)
> > >> > > > > >>> > at org.apache.hadoop.hbase.tmpl.master.MasterStatusTmpl.renderNoFlush(MasterStatusTmpl.java:382)
> > >> > > > > >>> > at org.apache.hadoop.hbase.tmpl.master.MasterStatusTmpl.render(MasterStatusTmpl.java:372)
> > >> > > > > >>> > at org.apache.hadoop.hbase.master.MasterStatusServlet.doGet(MasterStatusServlet.java:95)
> > >> > > > > >>> > at javax.servlet.http.HttpServlet.service(HttpServlet.java:734)
> > >> > > > > >>> > at javax.servlet.http.HttpServlet.service(HttpServlet.java:847)
> > >> > > > > >>> > at org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:511)
> > >> > > > > >>> > at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1221)
> > >> > > > > >>> > at org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter.doFilter(StaticUserWebFilter.java:109)
> > >> > > > > >>> > at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
> > >> > > > > >>> > at org.apache.hadoop.http.HttpServer$QuotingInputFilter.doFilter(HttpServer.java:1081)
> > >> > > > > >>> > at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
> > >> > > > > >>> > at org.apache.hadoop.http.NoCacheFilter.doFilter(NoCacheFilter.java:45)
> > >> > > > > >>> > at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
> > >> > > > > >>> > at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)
> > >> > > > > >>> > at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
> > >> > > > > >>> > at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
> > >> > > > > >>> > at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
> > >> > > > > >>> > at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450)
> > >> > > > > >>> > at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
> > >> > > > > >>> > at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
> > >> > > > > >>> > at org.mortbay.jetty.Server.handle(Server.java:326)
> > >> > > > > >>> > at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
> > >> > > > > >>> > at org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:928)
> > >> > > > > >>> > at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:549)
> > >> > > > > >>> > at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212)
> > >> > > > > >>> > at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)
> > >> > > > > >>> > at org.mortbay.io.nio.SelectChannelEndPoint.run(SelectChannelEndPoint.java:410)
> > >> > > > > >>> > at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582)
> > >> > > > > >>> > Caused by: java.net.SocketTimeoutException: Call to l-namenode2.dba.cn8.qunar.com/192.168.39.22:60000 failed because java.net.SocketTimeoutException: 60000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/192.168.39.22:47700 remote=l-namenode2.dba.cn8.qunar.com/192.168.39.22:60000]
> > >> > > > > >>> > at org.apache.hadoop.hbase.ipc.RpcClient.wrapException(RpcClient.java:1475)
> > >> > > > > >>> > at org.apache.hadoop.hbase.ipc.RpcClient.call(RpcClient.java:1450)
> > >> > > > > >>> > at org.apache.hadoop.hbase.ipc.RpcClient.callBlockingMethod(RpcClient.java:1650)
> > >> > > > > >>> > ... 37 more
> > >> > > > > >>> > Caused by: java.net.SocketTimeoutException: 60000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/192.168.39.22:47700 remote=l-namenode2.dba.cn8.qunar.com/192.168.39.22:60000]
> > >> > > > > >>> > at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:164)
> > >> > > > > >>> > at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:161)
> > >> > > > > >>> > at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:131)
> > >> > > > > >>> > at java.io.FilterInputStream.read(FilterInputStream.java:133)
> > >> > > > > >>> > at java.io.FilterInputStream.read(FilterInputStream.java:133)
> > >> > > > > >>> > at org.apache.hadoop.hbase.ipc.RpcClient$Connection$PingInputStream.read(RpcClient.java:553)
> > >> > > > > >>> > at java.io.BufferedInputStream.fill(BufferedInputStream.java:235)
> > >> > > > > >>> > at java.io.BufferedInputStream.read(BufferedInputStream.java:254)
> > >> > > > > >>> > at java.io.DataInputStream.readInt(DataInputStream.java:387)
> > >> > > > > >>> > at org.apache.hadoop.hbase.ipc.RpcClient$Connection.readResponse(RpcClient.java:1057)
> > >> > > > > >>> > at org.apache.hadoop.hbase.ipc.RpcClient$Connection.run(RpcClient.java:719)
> > >> > > > > >>> > Fri May 15 12:15:58 CST 2015 Terminating master
> > >> > > > > >>> > {/logs}
> > >> > > > > >>> > So what does the exception mean? Why did it happen, and how can I solve the problem?

Re: HMaster restart with error

Posted by Louis Hust <lo...@gmail.com>.
Yes, ted, can you tell me what the following exception means in
l-namenode1.log?

2015-05-15 12:16:40,522 INFO
 [MASTER_SERVER_OPERATIONS-l-namenode1:60000-0]
handler.ServerShutdownHandler: Finished processing of shutdown of
l-hbase31.data.cn8.qunar.com,60020,1427789773001
2015-05-15 12:17:11,301 WARN  [686544788@qtp-660252776-212]
client.HConnectionManager$HConnectionImplementation: Checking master
connection

Does this mean the cluster was not operational?
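
For anyone finding this thread later: per Ted's reply above, the marker that a restarted master has finished initialization is the "Finished waiting for region servers count to settle ... master is running" line in the master log. A minimal sketch of scanning a log for it follows; the sample log line is copied from this thread, and the temp-file setup only stands in for the real master log path, which varies per install:

```shell
#!/bin/sh
# Sketch: detect the master-initialization marker in an HMaster log.
# In practice, point LOG at the actual master log file instead of
# the sample created here.
LOG=$(mktemp)
cat > "$LOG" <<'EOF'
2015-05-15 12:16:40,024 INFO  [master:l-namenode1:60000] master.ServerManager: Finished waiting for region servers count to settle; checked in 4, slept for 5663 ms, expecting minimum of 1, maximum of 2147483647, master is running.
EOF

marker='Finished waiting for region servers count to settle'
if grep -q "$marker" "$LOG"; then
    echo "master finished initialization"
else
    echo "no readiness marker yet"
fi
```

Since grep exits 0 on a match, the same check can gate a restart script before pointing clients back at a freshly started master.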


2015-05-18 11:45 GMT+08:00 Ted Yu <yu...@gmail.com>:

> After l-namenode1 became the active master, it assigned regions:
>
> 2015-05-15 12:16:40,432 INFO  [master:l-namenode1:60000]
> master.RegionStates: Transitioned {6f806bb62b347c992cd243fc909276ff
> state=OFFLINE, ts=1431663400432, server=null} to
> {6f806bb62b347c992cd243fc909276ff state=OPEN, ts=1431663400432, server=
> l-hbase31.data.cn8.qunar.com,60020,1431462584879}
>
> However, l-hbase31 went down:
>
> 2015-05-15 12:16:40,508 INFO
>  [MASTER_SERVER_OPERATIONS-l-namenode1:60000-0]
> handler.ServerShutdownHandler: Splitting logs for
> l-hbase31.data.cn8.qunar.com,60020,1427789773001   before assignment.
>
> l-namenode1 was restarted :
>
> 2015-05-15 12:20:25,322 INFO  [main] util.VersionInfo: HBase 0.96.0-hadoop2
> 2015-05-15 12:20:25,323 INFO  [main] util.VersionInfo: Subversion
> https://svn.apache.org/repos/asf/hbase/branches/0.96 -r 1531434
>
> However, it went down due to zookeeper session expiration:
>
> 2015-05-15 12:20:25,580 WARN  [main] zookeeper.ZooKeeperNodeTracker: Can't
> get or delete the master znode
> org.apache.zookeeper.KeeperException$SessionExpiredException:
> KeeperErrorCode = Session expired for /hbase/master
>
> It started again after that and AssignmentManager did a lot of assignments.
>
> Looks like the cluster was operational this time.
>
> Cheers
>
> On Sun, May 17, 2015 at 8:24 AM, Ted Yu <yu...@gmail.com> wrote:
>
> > bq. the backup master take over at 2015-05-15 12:16:40,024 ?
> >
> > The switch of active master should be earlier than 12:16:40,024 - shortly
> > after 12:15:58
> >
> > l-namenode1 would do some initialization (such as waiting for region
> > servers count to settle) after it became active master.
> >
> > I tried to download from http://pan.baidu.com/s/1eQlKXj0 (at home) but
> > the download progress was very slow.
> >
> > Will try downloading later in the day.
> >
> > Do you have access to pastebin ?
> >
> > Cheers
> >
> > On Sun, May 17, 2015 at 2:07 AM, Louis Hust <lo...@gmail.com>
> wrote:
> >
> >> Hi, ted,
> >>
> >> Thanks for your reply!!
> >>
> >> I found the log in l-namenode2.dba.cn8 during the restarting progress:
> >> 2015-05-15 12:11:36,540 INFO  [master:l-namenode2:60000]
> >> master.ServerManager: Finished waiting for region servers count to
> settle;
> >> checked in 5, slept for 4511 ms, expecting minimum of 1, maximum of
> >> 2147483647, master is running.
> >>
> >> So this means the HMaster was ready to handle requests at 12:11:36?
> >>
> >> The backup master is l-namenode1.dba.cn8 and you can get the log at :
> >>
> >> http://pan.baidu.com/s/1eQlKXj0
> >>
> >> After the l-namenode2.dba.cn8 is stopped by me at 12:15:58,
> >> the backup master l-namenode1 take over, and i found log:
> >>
> >> 2015-05-15 12:16:40,024 INFO  [master:l-namenode1:60000]
> >> master.ServerManager: Finished waiting for region servers count to
> settle;
> >> checked in 4, slept for 5663 ms, expecting minimum of 1, maximum of
> >> 2147483647, master is running.
> >>
> >> So the backup master took over at 2015-05-15 12:16:40,024?
> >>
> >> But it seems l-namenode2 was not working normally, given the exception in
> >> the log:
> >>
> >> 2015-05-15 12:16:40,522 INFO
> >>  [MASTER_SERVER_OPERATIONS-l-namenode1:60000-0]
> >> handler.ServerShutdownHandler: Finished processing of shutdown of
> >> l-hbase31.data.cn8.qunar.com,60020,1427789773001
> >> 2015-05-15 12:17:11,301 WARN  [686544788@qtp-660252776-212]
> >> client.HConnectionManager$HConnectionImplementation: Checking master
> >> connection
> >> com.google.protobuf.ServiceException: java.net.ConnectException: Connection refused
> >> at org.apache.hadoop.hbase.ipc.RpcClient.callBlockingMethod(RpcClient.java:1667)
> >> at org.apache.hadoop.hbase.ipc.RpcClient$BlockingRpcChannelImplementation.callBlockingMethod(RpcClient.java:1708)
> >> at org.apache.hadoop.hbase.protobuf.generated.MasterProtos$MasterService$BlockingStub.isMasterRunning(MasterProtos.java:40216)
> >> at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$MasterServiceState.isMasterRunning(HConnectionManager.java:1484)
> >> at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.isKeepAliveMasterConnectedAndRunning(HConnectionManager.java:2110)
> >> at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getKeepAliveMasterService(HConnectionManager.java:1836)
> >>
> >> Does the exception mean the HMaster was not working normally, or something else?
> >>
> >>
> >>
> >> 2015-05-17 11:06 GMT+08:00 Ted Yu <yu...@gmail.com>:
> >>
> >> > bq. the HMaster is handling two region server down, and not ready to
> >> handle
> >> > client request?
> >> >
> >> > I didn't mean that - for a functioning master, handling region server
> >> > shutdown is part of the master's job.
> >> >
> >> > You should see something similar to the following in (functioning)
> >> master
> >> > log:
> >> >
> >> > 2015-05-13 04:06:36,266 INFO  [master:c6401:60000]
> master.ServerManager:
> >> > Finished waiting for region servers count to settle; checked in 1,
> slept
> >> > for 71582 ms, expecting minimum of 1, maximum of 2147483647, master
> is
> >> > running.
> >> >
> >> > bq. wait the backend HMaster to take over
> >> >
> >> > Was there exception in backup master log after it took over ?
> >> >
> >> > On Sat, May 16, 2015 at 6:44 PM, Louis Hust <lo...@gmail.com>
> >> wrote:
> >> >
> >> > > hi,ted,
> >> > >
> >> > > Thanks very much!
> >> > >
> >> > > The Namenode process was not running on l-namenode2.dba.cn8 (192.168.39.22); only the HMaster runs on it.
> >> > > So you mean that at 2015-05-15 12:15:04 the HMaster was handling two region servers going down, and was not yet ready to handle client requests? And how can I tell from the logs when the HMaster is ready to handle client requests?
> >> > >
> >> > > I stopped the HMaster at 12:15:58 because it could not handle requests, so I wanted to stop it and wait for the backup HMaster to take over.
> >> > >
> >> > >
> >> > >
> >> > >
> >> > > 2015-05-17 0:29 GMT+08:00 Ted Yu <yu...@gmail.com>:
> >> > >
> >> > > > In the period you identified, master was assigning regions.
> >> > > > e.g.
> >> > > >
> >> > > > 2015-05-15 12:13:09,683 INFO  [l-namenode2.dba.cn8.qunar.com,60000,1431663090427-GeneralBulkAssigner-0] master.RegionStates: Transitioned {c634280ce287b2d6cebd88b61accf685 state=OFFLINE, ts=1431663189621, server=null} to {c634280ce287b2d6cebd88b61accf685 state=PENDING_OPEN, ts=1431663189683, server=l-hbase26.data.cn8.qunar.com,60020,1431462615651}
> >> > > > 2015-05-15 12:13:09,683 INFO  [l-namenode2.dba.cn8.qunar.com,60000,1431663090427-GeneralBulkAssigner-2] master.RegionStates: Transitioned {2f60b1b4e51d32ef98ad19690f13a565 state=OFFLINE, ts=1431663189621, server=null} to {2f60b1b4e51d32ef98ad19690f13a565 state=PENDING_OPEN, ts=1431663189683, server=l-hbase30.data.cn8.qunar.com,60020,1431462562233}
> >> > > >
> >> > > > Then two region servers went down:
> >> > > >
> >> > > > 2015-05-15 12:14:40,699 INFO  [main-EventThread] zookeeper.RegionServerTracker: RegionServer ephemeral node deleted, processing expiration [l-hbase27.data.cn8.qunar.com,60020,1431663208899]
> >> > > > 2015-05-15 12:15:04,899 INFO  [main-EventThread] zookeeper.RegionServerTracker: RegionServer ephemeral node deleted, processing expiration [l-hbase25.data.cn8.qunar.com,60020,1431663193865]
> >> > > >
> >> > > > Master was stopped afterwards:
> >> > > >
> >> > > > Fri May 15 12:15:58 CST 2015 Terminating master
> >> > > >
> >> > > > Namenode process was running on l-namenode2.dba.cn8, right ?
> >> > > >
> >> > > > Cheers
> >> > > >
> >> > > > On Sat, May 16, 2015 at 7:50 AM, Louis Hust <louis.hust@gmail.com
> >
> >> > > wrote:
> >> > > >
> >> > > > > hi, TED,
> >> > > > > Any idea?
> >> > > > > When the HMaster restarts, how can I know when it can really handle requests from the application? Is there any marker in the logs?
> >> > > > >
> >> > > > > 2015-05-16 14:05 GMT+08:00 Louis Hust <lo...@gmail.com>:
> >> > > > >
> >> > > > > > @Ted,
> >> > > > > > Please see the log from 12:11:29 to 12:15:28. During this time range the HMaster was in the restarting stage but could not handle client requests. Is the HMaster recovering or doing something else?
> >> > > > > >
> >> > > > > > 2015-05-16 13:59 GMT+08:00 Louis Hust <lo...@gmail.com>:
> >> > > > > >
> >> > > > > >> OK, you can get the log from
> >> > > > > >> http://pan.baidu.com/s/1pqS6E
> >> > > > > >>
> >> > > > > >>
> >> > > > > >> 2015-05-16 13:26 GMT+08:00 Ted Yu <yu...@gmail.com>:
> >> > > > > >>
> >> > > > > >>> Can you check server log on 192.168.39.22
> >> > > > > >>> <http://l-namenode2.dba.cn8.qunar.com/192.168.39.22:60000>
> ?
> >> > > > > >>>
> >> > > > > >>> That should give you some clue.
> >> > > > > >>>
> >> > > > > >>> Cheers
> >> > > > > >>>
> >> > > > > >>> On Fri, May 15, 2015 at 8:22 PM, Louis Hust <
> >> > louis.hust@gmail.com>
> >> > > > > >>> wrote:
> >> > > > > >>>
> >> > > > > >>> > Hi all,
> >> > > > > >>> >
> >> > > > > >>> > I use hbase0.96.0 with hadoop 2.2.0,
> >> > > > > >>> > and the custom said they can not write into hbase cluster,
> >> > > > > >>> > So i stop the HMaster and start it soon,
> >> > > > > >>> >
> >> > > > > >>> > But it seems that the HMaster not response to request,
> >> > following
> >> > > is
> >> > > > > the
> >> > > > > >>> > HMaster log:
> >> > > > > >>> >
> >> > > > > >>> > {logs}
> >> > > > > >>> > 2015-05-15 12:13:33,136 INFO  [AM.ZK.Worker-pool2-t16] master.RegionStates: Transitioned {9036a3befee90eeffb9082f90a4a9afa state=OPENING, ts=1431663212637, server=l-hbase26.data.cn8.qunar.com,60020,1431462615651} to {9036a3befee90eeffb9082f90a4a9afa state=OPEN, ts=1431663213136, server=l-hbase26.data.cn8.qunar.com,60020,1431462615651}
> >> > > > > >>> > 2015-05-15 12:13:33,139 INFO  [AM.ZK.Worker-pool2-t4] master.RegionStates: Onlined 9036a3befee90eeffb9082f90a4a9afa on l-hbase26.data.cn8.qunar.com,60020,1431462615651
> >> > > > > >>> > 2015-05-15 12:14:40,699 INFO  [main-EventThread] zookeeper.RegionServerTracker: RegionServer ephemeral node deleted, processing expiration [l-hbase27.data.cn8.qunar.com,60020,1431663208899]
> >> > > > > >>> > 2015-05-15 12:15:04,899 INFO  [main-EventThread] zookeeper.RegionServerTracker: RegionServer ephemeral node deleted, processing expiration [l-hbase25.data.cn8.qunar.com,60020,1431663193865]
> >> > > > > >>> > 2015-05-15 12:15:24,465 WARN  [249240421@qtp-591022857-33] client.HConnectionManager$HConnectionImplementation: Checking master connection
> >> > > > > >>> > com.google.protobuf.ServiceException: java.net.SocketTimeoutException: Call to l-namenode2.dba.cn8.qunar.com/192.168.39.22:60000 failed because java.net.SocketTimeoutException: 60000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/192.168.39.22:47700 remote=l-namenode2.dba.cn8.qunar.com/192.168.39.22:60000]
> >> > > > > >>> > at org.apache.hadoop.hbase.ipc.RpcClient.callBlockingMethod(RpcClient.java:1667)
> >> > > > > >>> > at org.apache.hadoop.hbase.ipc.RpcClient$BlockingRpcChannelImplementation.callBlockingMethod(RpcClient.java:1708)
> >> > > > > >>> > at org.apache.hadoop.hbase.protobuf.generated.MasterProtos$MasterService$BlockingStub.isMasterRunning(MasterProtos.java:40216)
> >> > > > > >>> > at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$MasterServiceState.isMasterRunning(HConnectionManager.java:1484)
> >> > > > > >>> > at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.isKeepAliveMasterConnectedAndRunning(HConnectionManager.java:2110)
> >> > > > > >>> > at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getKeepAliveMasterService(HConnectionManager.java:1836)
> >> > > > > >>> > at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.listTables(HConnectionManager.java:2531)
> >> > > > > >>> > at org.apache.hadoop.hbase.client.HBaseAdmin.listTables(HBaseAdmin.java:298)
> >> > > > > >>> > at org.apache.hadoop.hbase.tmpl.master.MasterStatusTmplImpl.__jamon_innerUnit__userTables(MasterStatusTmplImpl.java:530)
> >> > > > > >>> > at org.apache.hadoop.hbase.tmpl.master.MasterStatusTmplImpl.renderNoFlush(MasterStatusTmplImpl.java:255)
> >> > > > > >>> > at org.apache.hadoop.hbase.tmpl.master.MasterStatusTmpl.renderNoFlush(MasterStatusTmpl.java:382)
> >> > > > > >>> > at org.apache.hadoop.hbase.tmpl.master.MasterStatusTmpl.render(MasterStatusTmpl.java:372)
> >> > > > > >>> > at org.apache.hadoop.hbase.master.MasterStatusServlet.doGet(MasterStatusServlet.java:95)
> >> > > > > >>> > at javax.servlet.http.HttpServlet.service(HttpServlet.java:734)
> >> > > > > >>> > at javax.servlet.http.HttpServlet.service(HttpServlet.java:847)
> >> > > > > >>> > at org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:511)
> >> > > > > >>> > at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1221)
> >> > > > > >>> > at org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter.doFilter(StaticUserWebFilter.java:109)
> >> > > > > >>> > at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
> >> > > > > >>> > at org.apache.hadoop.http.HttpServer$QuotingInputFilter.doFilter(HttpServer.java:1081)
> >> > > > > >>> > at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
> >> > > > > >>> > at org.apache.hadoop.http.NoCacheFilter.doFilter(NoCacheFilter.java:45)
> >> > > > > >>> > at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
> >> > > > > >>> > at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)
> >> > > > > >>> > at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
> >> > > > > >>> > at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
> >> > > > > >>> > at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
> >> > > > > >>> > at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450)
> >> > > > > >>> > at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
> >> > > > > >>> > at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
> >> > > > > >>> > at org.mortbay.jetty.Server.handle(Server.java:326)
> >> > > > > >>> > at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
> >> > > > > >>> > at org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:928)
> >> > > > > >>> > at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:549)
> >> > > > > >>> > at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212)
> >> > > > > >>> > at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)
> >> > > > > >>> > at org.mortbay.io.nio.SelectChannelEndPoint.run(SelectChannelEndPoint.java:410)
> >> > > > > >>> > at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582)
> >> > > > > >>> > Caused by: java.net.SocketTimeoutException: Call to l-namenode2.dba.cn8.qunar.com/192.168.39.22:60000 failed because java.net.SocketTimeoutException: 60000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/192.168.39.22:47700 remote=l-namenode2.dba.cn8.qunar.com/192.168.39.22:60000]
> >> > > > > >>> > at org.apache.hadoop.hbase.ipc.RpcClient.wrapException(RpcClient.java:1475)
> >> > > > > >>> > at org.apache.hadoop.hbase.ipc.RpcClient.call(RpcClient.java:1450)
> >> > > > > >>> > at org.apache.hadoop.hbase.ipc.RpcClient.callBlockingMethod(RpcClient.java:1650)
> >> > > > > >>> > ... 37 more
> >> > > > > >>> > Caused by: java.net.SocketTimeoutException: 60000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/192.168.39.22:47700 remote=l-namenode2.dba.cn8.qunar.com/192.168.39.22:60000]
> >> > > > > >>> > at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:164)
> >> > > > > >>> > at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:161)
> >> > > > > >>> > at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:131)
> >> > > > > >>> > at java.io.FilterInputStream.read(FilterInputStream.java:133)
> >> > > > > >>> > at java.io.FilterInputStream.read(FilterInputStream.java:133)
> >> > > > > >>> > at org.apache.hadoop.hbase.ipc.RpcClient$Connection$PingInputStream.read(RpcClient.java:553)
> >> > > > > >>> > at java.io.BufferedInputStream.fill(BufferedInputStream.java:235)
> >> > > > > >>> > at java.io.BufferedInputStream.read(BufferedInputStream.java:254)
> >> > > > > >>> > at java.io.DataInputStream.readInt(DataInputStream.java:387)
> >> > > > > >>> > at org.apache.hadoop.hbase.ipc.RpcClient$Connection.readResponse(RpcClient.java:1057)
> >> > > > > >>> > at org.apache.hadoop.hbase.ipc.RpcClient$Connection.run(RpcClient.java:719)
> >> > > > > >>> > Fri May 15 12:15:58 CST 2015 Terminating master
> >> > > > > >>> > {/logs}
> >> > > > > >>> > So what does the exception mean? Why did it happen, and how can we solve the problem?
> >> > > > > >>> >
> >> > > > > >>>
> >> > > > > >>
> >> > > > > >>
> >> > > > > >
> >> > > > >
> >> > > >
> >> > >
> >> >
> >>
> >
> >
>

Re: HMaster restart with error

Posted by Ted Yu <yu...@gmail.com>.
After l-namenode1 became active master , it assigned regions:

2015-05-15 12:16:40,432 INFO  [master:l-namenode1:60000]
master.RegionStates: Transitioned {6f806bb62b347c992cd243fc909276ff
state=OFFLINE, ts=1431663400432, server=null} to
{6f806bb62b347c992cd243fc909276ff state=OPEN, ts=1431663400432, server=
l-hbase31.data.cn8.qunar.com,60020,1431462584879}

However, l-hbase31 went down:

2015-05-15 12:16:40,508 INFO
 [MASTER_SERVER_OPERATIONS-l-namenode1:60000-0]
handler.ServerShutdownHandler: Splitting logs for
l-hbase31.data.cn8.qunar.com,60020,1427789773001   before assignment.

l-namenode1 was restarted :

2015-05-15 12:20:25,322 INFO  [main] util.VersionInfo: HBase 0.96.0-hadoop2
2015-05-15 12:20:25,323 INFO  [main] util.VersionInfo: Subversion
https://svn.apache.org/repos/asf/hbase/branches/0.96 -r 1531434

However, it went down due to zookeeper session expiration:

2015-05-15 12:20:25,580 WARN  [main] zookeeper.ZooKeeperNodeTracker: Can't
get or delete the master znode
org.apache.zookeeper.KeeperException$SessionExpiredException:
KeeperErrorCode = Session expired for /hbase/master
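
Session expiry like this is commonly a symptom of a long GC pause on the master host or a session timeout that is tight for the workload. As a sketch only (the 90000 value is an illustrative assumption, not a recommendation from this thread), the timeout is set in hbase-site.xml; note the ZooKeeper ensemble's own maxSessionTimeout caps the effective value:

```xml
<!-- hbase-site.xml: illustrative value only; tune for your cluster.
     The ZK server's maxSessionTimeout must be at least this large
     for the setting to take effect. -->
<property>
  <name>zookeeper.session.timeout</name>
  <value>90000</value>
</property>
```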

It started again after that and AssignmentManager did a lot of assignments.

Looks like the cluster was operational this time.
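
The "Finished waiting for region servers count to settle" line quoted earlier in the thread is the practical readiness marker: once master.ServerManager prints it after a restart, the master has finished settling and is serving requests. A minimal sketch of checking for it follows; the log file name is an assumption (a sample file is built inline here), so substitute your real master log path:

```shell
# Sketch: detect the HMaster readiness marker in its log.
# LOG is a sample file built here for illustration; point it at
# your actual master log in practice.
LOG=master.log.sample
cat > "$LOG" <<'EOF'
2015-05-15 12:11:36,540 INFO  [master:l-namenode2:60000] master.ServerManager: Finished waiting for region servers count to settle; checked in 5, slept for 4511 ms, expecting minimum of 1, maximum of 2147483647, master is running.
EOF
# Print the most recent readiness marker; grep exit status 0 means
# the master reached the "running" state at least once in this log.
grep "Finished waiting for region servers count to settle" "$LOG" | tail -n 1
```

Tailing the live log for this line after a restart gives a concrete "master is up" signal instead of guessing from client timeouts.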

Cheers

On Sun, May 17, 2015 at 8:24 AM, Ted Yu <yu...@gmail.com> wrote:

> bq. the backup master take over at 2015-05-15 12:16:40,024 ?
>
> The switch of active master should be earlier than 12:16:40,024 - shortly
> after 12:15:58
>
> l-namenode1 would do some initialization (such as waiting for region
> servers count to settle) after it became active master.
>
> I tried to download from http://pan.baidu.com/s/1eQlKXj0 (at home) but
> the download progress was very slow.
>
> Will try downloading later in the day.
>
> Do you have access to pastebin ?
>
> Cheers
>
> On Sun, May 17, 2015 at 2:07 AM, Louis Hust <lo...@gmail.com> wrote:
>
>> Hi, ted,
>>
>> Thanks for your reply!!
>>
>> I found this log line on l-namenode2.dba.cn8 during the restart process:
>>
>> 2015-05-15 12:11:36,540 INFO  [master:l-namenode2:60000] master.ServerManager: Finished waiting for region servers count to settle; checked in 5, slept for 4511 ms, expecting minimum of 1, maximum of 2147483647, master is running.
>>
>> So this means the HMaster was ready to handle requests at 12:11:36?
>>
>> The backup master is l-namenode1.dba.cn8 and you can get the log at :
>>
>> http://pan.baidu.com/s/1eQlKXj0
>>
>> After I stopped l-namenode2.dba.cn8 at 12:15:58,
>> the backup master l-namenode1 took over, and I found this log:
>>
>> 2015-05-15 12:16:40,024 INFO  [master:l-namenode1:60000] master.ServerManager: Finished waiting for region servers count to settle; checked in 4, slept for 5663 ms, expecting minimum of 1, maximum of 2147483647, master is running.
>>
>> So the backup master took over at 2015-05-15 12:16:40,024?
>>
>> But it seems l-namenode2 was not working normally, given this exception in the log:
>>
>> 2015-05-15 12:16:40,522 INFO  [MASTER_SERVER_OPERATIONS-l-namenode1:60000-0] handler.ServerShutdownHandler: Finished processing of shutdown of l-hbase31.data.cn8.qunar.com,60020,1427789773001
>> 2015-05-15 12:17:11,301 WARN  [686544788@qtp-660252776-212] client.HConnectionManager$HConnectionImplementation: Checking master connection
>> com.google.protobuf.ServiceException: java.net.ConnectException: Connection refused
>> at org.apache.hadoop.hbase.ipc.RpcClient.callBlockingMethod(RpcClient.java:1667)
>> at org.apache.hadoop.hbase.ipc.RpcClient$BlockingRpcChannelImplementation.callBlockingMethod(RpcClient.java:1708)
>> at org.apache.hadoop.hbase.protobuf.generated.MasterProtos$MasterService$BlockingStub.isMasterRunning(MasterProtos.java:40216)
>> at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$MasterServiceState.isMasterRunning(HConnectionManager.java:1484)
>> at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.isKeepAliveMasterConnectedAndRunning(HConnectionManager.java:2110)
>> at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getKeepAliveMasterService(HConnectionManager.java:1836)
>>
>> Does the exception mean the HMaster is not working normally?
>>
>>
>>
>> 2015-05-17 11:06 GMT+08:00 Ted Yu <yu...@gmail.com>:
>>
>> > bq. the HMaster is handling two region server down, and not ready to
>> handle
>> > client request?
>> >
>> > I didn't mean that - for a functioning master, handling region server
>> > shutdown is part of the master's job.
>> >
>> > You should see something similar to the following in (functioning)
>> master
>> > log:
>> >
>> > 2015-05-13 04:06:36,266 INFO  [master:c6401:60000] master.ServerManager: Finished waiting for region servers count to settle; checked in 1, slept for 71582 ms, expecting minimum of 1, maximum of 2147483647, master is running.
>> >
>> > bq. wait the backend HMaster to take over
>> >
>> > Was there an exception in the backup master log after it took over?
>> >
>> > On Sat, May 16, 2015 at 6:44 PM, Louis Hust <lo...@gmail.com>
>> wrote:
>> >
>> > > hi,ted,
>> > >
>> > > Thanks very much!
>> > >
>> > > The Namenode process was not running on l-namenode2.dba.cn8 (192.168.39.22); only the HMaster runs on it.
>> > > So you mean that at 2015-05-15 12:15:04 the HMaster was handling two region servers going down, and was not yet ready to handle client requests? And how can I tell from the logs when the HMaster is ready to handle client requests?
>> > >
>> > > I stopped the HMaster at 12:15:58 because it could not handle requests, so I wanted to stop it and wait for the backup HMaster to take over.
>> > >
>> > >
>> > >
>> > >
>> > > 2015-05-17 0:29 GMT+08:00 Ted Yu <yu...@gmail.com>:
>> > >
>> > > > In the period you identified, master was assigning regions.
>> > > > e.g.
>> > > >
>> > > > 2015-05-15 12:13:09,683 INFO  [l-namenode2.dba.cn8.qunar.com,60000,1431663090427-GeneralBulkAssigner-0] master.RegionStates: Transitioned {c634280ce287b2d6cebd88b61accf685 state=OFFLINE, ts=1431663189621, server=null} to {c634280ce287b2d6cebd88b61accf685 state=PENDING_OPEN, ts=1431663189683, server=l-hbase26.data.cn8.qunar.com,60020,1431462615651}
>> > > > 2015-05-15 12:13:09,683 INFO  [l-namenode2.dba.cn8.qunar.com,60000,1431663090427-GeneralBulkAssigner-2] master.RegionStates: Transitioned {2f60b1b4e51d32ef98ad19690f13a565 state=OFFLINE, ts=1431663189621, server=null} to {2f60b1b4e51d32ef98ad19690f13a565 state=PENDING_OPEN, ts=1431663189683, server=l-hbase30.data.cn8.qunar.com,60020,1431462562233}
>> > > >
>> > > > Then two region servers went down:
>> > > >
>> > > > 2015-05-15 12:14:40,699 INFO  [main-EventThread] zookeeper.RegionServerTracker: RegionServer ephemeral node deleted, processing expiration [l-hbase27.data.cn8.qunar.com,60020,1431663208899]
>> > > > 2015-05-15 12:15:04,899 INFO  [main-EventThread] zookeeper.RegionServerTracker: RegionServer ephemeral node deleted, processing expiration [l-hbase25.data.cn8.qunar.com,60020,1431663193865]
>> > > >
>> > > > Master was stopped afterwards:
>> > > >
>> > > > Fri May 15 12:15:58 CST 2015 Terminating master
>> > > >
>> > > > Namenode process was running on l-namenode2.dba.cn8, right ?
>> > > >
>> > > > Cheers
>> > > >
>> > > > On Sat, May 16, 2015 at 7:50 AM, Louis Hust <lo...@gmail.com>
>> > > wrote:
>> > > >
>> > > > > hi, TED,
>> > > > > Any idea?
>> > > > > When the HMaster restarts, how can I know when it can really handle requests from the application? Is there any marker in the logs?
>> > > > >
>> > > > > 2015-05-16 14:05 GMT+08:00 Louis Hust <lo...@gmail.com>:
>> > > > >
>> > > > > > @Ted,
>> > > > > > Please see the log from 12:11:29 to 12:15:28. During this time range the HMaster was in the restarting stage but could not handle client requests. Is the HMaster recovering or doing something else?
>> > > > > >
>> > > > > > 2015-05-16 13:59 GMT+08:00 Louis Hust <lo...@gmail.com>:
>> > > > > >
>> > > > > >> OK, you can get the log from
>> > > > > >> http://pan.baidu.com/s/1pqS6E
>> > > > > >>
>> > > > > >>
>> > > > > >> 2015-05-16 13:26 GMT+08:00 Ted Yu <yu...@gmail.com>:
>> > > > > >>
>> > > > > >>> Can you check server log on 192.168.39.22
>> > > > > >>> <http://l-namenode2.dba.cn8.qunar.com/192.168.39.22:60000> ?
>> > > > > >>>
>> > > > > >>> That should give you some clue.
>> > > > > >>>
>> > > > > >>> Cheers
>> > > > > >>>
>> > > > > >>> On Fri, May 15, 2015 at 8:22 PM, Louis Hust <
>> > louis.hust@gmail.com>
>> > > > > >>> wrote:
>> > > > > >>>
>> > > > > >>> > Hi all,
>> > > > > >>> >
>> > > > > >>> > I use hbase0.96.0 with hadoop 2.2.0,
>> > > > > >>> > and the custom said they can not write into hbase cluster,
>> > > > > >>> > So i stop the HMaster and start it soon,
>> > > > > >>> >
>> > > > > >>> > But it seems that the HMaster not response to request,
>> > following
>> > > is
>> > > > > the
>> > > > > >>> > HMaster log:
>> > > > > >>> >
>> > > > > >>> > {logs}
>> > > > > >>> > 2015-05-15 12:13:33,136 INFO  [AM.ZK.Worker-pool2-t16] master.RegionStates: Transitioned {9036a3befee90eeffb9082f90a4a9afa state=OPENING, ts=1431663212637, server=l-hbase26.data.cn8.qunar.com,60020,1431462615651} to {9036a3befee90eeffb9082f90a4a9afa state=OPEN, ts=1431663213136, server=l-hbase26.data.cn8.qunar.com,60020,1431462615651}
>> > > > > >>> > 2015-05-15 12:13:33,139 INFO  [AM.ZK.Worker-pool2-t4] master.RegionStates: Onlined 9036a3befee90eeffb9082f90a4a9afa on l-hbase26.data.cn8.qunar.com,60020,1431462615651
>> > > > > >>> > 2015-05-15 12:14:40,699 INFO  [main-EventThread] zookeeper.RegionServerTracker: RegionServer ephemeral node deleted, processing expiration [l-hbase27.data.cn8.qunar.com,60020,1431663208899]
>> > > > > >>> > 2015-05-15 12:15:04,899 INFO  [main-EventThread] zookeeper.RegionServerTracker: RegionServer ephemeral node deleted, processing expiration [l-hbase25.data.cn8.qunar.com,60020,1431663193865]
>> > > > > >>> > 2015-05-15 12:15:24,465 WARN  [249240421@qtp-591022857-33] client.HConnectionManager$HConnectionImplementation: Checking master connection
>> > > > > >>> > com.google.protobuf.ServiceException: java.net.SocketTimeoutException: Call to l-namenode2.dba.cn8.qunar.com/192.168.39.22:60000 failed because java.net.SocketTimeoutException: 60000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/192.168.39.22:47700 remote=l-namenode2.dba.cn8.qunar.com/192.168.39.22:60000]
>> > > > > >>> > at org.apache.hadoop.hbase.ipc.RpcClient.callBlockingMethod(RpcClient.java:1667)
>> > > > > >>> > at org.apache.hadoop.hbase.ipc.RpcClient$BlockingRpcChannelImplementation.callBlockingMethod(RpcClient.java:1708)
>> > > > > >>> > at org.apache.hadoop.hbase.protobuf.generated.MasterProtos$MasterService$BlockingStub.isMasterRunning(MasterProtos.java:40216)
>> > > > > >>> > at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$MasterServiceState.isMasterRunning(HConnectionManager.java:1484)
>> > > > > >>> > at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.isKeepAliveMasterConnectedAndRunning(HConnectionManager.java:2110)
>> > > > > >>> > at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getKeepAliveMasterService(HConnectionManager.java:1836)
>> > > > > >>> > at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.listTables(HConnectionManager.java:2531)
>> > > > > >>> > at org.apache.hadoop.hbase.client.HBaseAdmin.listTables(HBaseAdmin.java:298)
>> > > > > >>> > at org.apache.hadoop.hbase.tmpl.master.MasterStatusTmplImpl.__jamon_innerUnit__userTables(MasterStatusTmplImpl.java:530)
>> > > > > >>> > at org.apache.hadoop.hbase.tmpl.master.MasterStatusTmplImpl.renderNoFlush(MasterStatusTmplImpl.java:255)
>> > > > > >>> > at org.apache.hadoop.hbase.tmpl.master.MasterStatusTmpl.renderNoFlush(MasterStatusTmpl.java:382)
>> > > > > >>> > at org.apache.hadoop.hbase.tmpl.master.MasterStatusTmpl.render(MasterStatusTmpl.java:372)
>> > > > > >>> > at org.apache.hadoop.hbase.master.MasterStatusServlet.doGet(MasterStatusServlet.java:95)
>> > > > > >>> > at javax.servlet.http.HttpServlet.service(HttpServlet.java:734)
>> > > > > >>> > at javax.servlet.http.HttpServlet.service(HttpServlet.java:847)
>> > > > > >>> > at org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:511)
>> > > > > >>> > at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1221)
>> > > > > >>> > at org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter.doFilter(StaticUserWebFilter.java:109)
>> > > > > >>> > at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
>> > > > > >>> > at org.apache.hadoop.http.HttpServer$QuotingInputFilter.doFilter(HttpServer.java:1081)
>> > > > > >>> > at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
>> > > > > >>> > at org.apache.hadoop.http.NoCacheFilter.doFilter(NoCacheFilter.java:45)
>> > > > > >>> > at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
>> > > > > >>> > at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)
>> > > > > >>> > at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
>> > > > > >>> > at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
>> > > > > >>> > at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
>> > > > > >>> > at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450)
>> > > > > >>> > at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
>> > > > > >>> > at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
>> > > > > >>> > at org.mortbay.jetty.Server.handle(Server.java:326)
>> > > > > >>> > at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
>> > > > > >>> > at org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:928)
>> > > > > >>> > at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:549)
>> > > > > >>> > at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212)
>> > > > > >>> > at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)
>> > > > > >>> > at org.mortbay.io.nio.SelectChannelEndPoint.run(SelectChannelEndPoint.java:410)
>> > > > > >>> > at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582)
>> > > > > >>> > Caused by: java.net.SocketTimeoutException: Call to
>> > > > > >>> > l-namenode2.dba.cn8.qunar.com/192.168.39.22:60000 failed
>> > because
>> > > > > >>> > java.net.SocketTimeoutException: 60000 millis timeout while
>> > > waiting
>> > > > > for
>> > > > > >>> > channel to be ready for read. ch :
>> > > > > >>> > java.nio.channels.SocketChannel[connected local=/
>> > > > 192.168.39.22:47700
>> > > > > >>> > remote=
>> > > > > >>> > l-namenode2.dba.cn8.qunar.com/192.168.39.22:60000]
>> > > > > >>> > at
>> > > > > >>>
>> > > > >
>> > >
>> org.apache.hadoop.hbase.ipc.RpcClient.wrapException(RpcClient.java:1475)
>> > > > > >>> > at
>> > > org.apache.hadoop.hbase.ipc.RpcClient.call(RpcClient.java:1450)
>> > > > > >>> > at
>> > > > > >>> >
>> > > > > >>> >
>> > > > > >>>
>> > > > >
>> > > >
>> > >
>> >
>> org.apache.hadoop.hbase.ipc.RpcClient.callBlockingMethod(RpcClient.java:1650)
>> > > > > >>> > ... 37 more
>> > > > > >>> > Caused by: java.net.SocketTimeoutException: 60000 millis
>> > timeout
>> > > > > while
>> > > > > >>> > waiting for channel to be ready for read. ch :
>> > > > > >>> > java.nio.channels.SocketChannel[connected local=/
>> > > > 192.168.39.22:47700
>> > > > > >>> > remote=
>> > > > > >>> > l-namenode2.dba.cn8.qunar.com/192.168.39.22:60000]
>> > > > > >>> > at
>> > > > > >>> >
>> > > > > >>> >
>> > > > > >>>
>> > > > >
>> > > >
>> > >
>> >
>> org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:164)
>> > > > > >>> > at
>> > > > > >>>
>> > > > >
>> > >
>> org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:161)
>> > > > > >>> > at
>> > > > > >>>
>> > > > >
>> > >
>> org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:131)
>> > > > > >>> > at
>> java.io.FilterInputStream.read(FilterInputStream.java:133)
>> > > > > >>> > at
>> java.io.FilterInputStream.read(FilterInputStream.java:133)
>> > > > > >>> > at
>> > > > > >>> >
>> > > > > >>> >
>> > > > > >>>
>> > > > >
>> > > >
>> > >
>> >
>> org.apache.hadoop.hbase.ipc.RpcClient$Connection$PingInputStream.read(RpcClient.java:553)
>> > > > > >>> > at
>> > java.io.BufferedInputStream.fill(BufferedInputStream.java:235)
>> > > > > >>> > at
>> > java.io.BufferedInputStream.read(BufferedInputStream.java:254)
>> > > > > >>> > at java.io.DataInputStream.readInt(DataInputStream.java:387)
>> > > > > >>> > at
>> > > > > >>> >
>> > > > > >>> >
>> > > > > >>>
>> > > > >
>> > > >
>> > >
>> >
>> org.apache.hadoop.hbase.ipc.RpcClient$Connection.readResponse(RpcClient.java:1057)
>> > > > > >>> > at
>> > > > > >>>
>> > > > >
>> > >
>> org.apache.hadoop.hbase.ipc.RpcClient$Connection.run(RpcClient.java:719)
>> > > > > >>> > Fri May 15 12:15:58 CST 2015 Terminating master
>> > > > > >>> > {/logs}
>> > > > > >>> > So what the exception means? Why? and how to solve the
>> problem?
>> > > > > >>> >
>> > > > > >>>
>> > > > > >>
>> > > > > >>
>> > > > > >
>> > > > >
>> > > >
>> > >
>> >
>>
>
>

Re: HMaster restart with error

Posted by Ted Yu <yu...@gmail.com>.
bq. the backup master took over at 2015-05-15 12:16:40,024?

The switch of the active master should have happened earlier than
12:16:40,024, shortly after 12:15:58.

l-namenode1 would do some initialization (such as waiting for the region
server count to settle) after it became the active master.
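
One way to watch for that readiness marker is to grep the new active master's log. A sketch (the log file path is only an example; point it at wherever your master actually logs):

```shell
# Simulate a master log containing the readiness marker, then grep for it.
# In a real cluster you would grep the active master's actual log file
# instead of this sample one.
cat > /tmp/hmaster-sample.log <<'EOF'
2015-05-15 12:16:40,024 INFO  [master:l-namenode1:60000] master.ServerManager: Finished waiting for region servers count to settle; checked in 4, slept for 5663 ms, expecting minimum of 1, maximum of 2147483647, master is running.
EOF
grep -c "Finished waiting for region servers count to settle" /tmp/hmaster-sample.log
# prints 1 when the marker is present
```

Until that line shows up, the master is still initializing and client requests can fail or time out.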

I tried to download from http://pan.baidu.com/s/1eQlKXj0 (at home), but the
download was very slow.

Will try downloading later in the day.

Do you have access to pastebin?

Cheers

On Sun, May 17, 2015 at 2:07 AM, Louis Hust <lo...@gmail.com> wrote:

> Hi, ted,
>
> Thanks for your reply!!
>
> I found this log entry on l-namenode2.dba.cn8 during the restart:
> 2015-05-15 12:11:36,540 INFO  [master:l-namenode2:60000]
> master.ServerManager: Finished waiting for region servers count to settle;
> checked in 5, slept for 4511 ms, expecting minimum of 1, maximum of
> 2147483647, master is running.
>
> So does this mean the HMaster was ready to handle requests at 12:11:36?
>
> The backup master is l-namenode1.dba.cn8 and you can get its log at:
>
> http://pan.baidu.com/s/1eQlKXj0
>
> After I stopped l-namenode2.dba.cn8 at 12:15:58,
> the backup master l-namenode1 took over, and I found this log:
>
> 2015-05-15 12:16:40,024 INFO  [master:l-namenode1:60000]
> master.ServerManager: Finished waiting for region servers count to settle;
> checked in 4, slept for 5663 ms, expecting minimum of 1, maximum of
> 2147483647, master is running.
>
> So the backup master took over at 2015-05-15 12:16:40,024?
>
> But it seems l-namenode2 was not working normally, given this exception in
> the log:
>
> 2015-05-15 12:16:40,522 INFO
>  [MASTER_SERVER_OPERATIONS-l-namenode1:60000-0]
> handler.ServerShutdownHandler: Finished processing of shutdown of
> l-hbase31.data.cn8.qunar.com,60020,1427789773001
> 2015-05-15 12:17:11,301 WARN  [686544788@qtp-660252776-212]
> client.HConnectionManager$HConnectionImplementation: Checking master
> connection
> com.google.protobuf.ServiceException: java.net.ConnectException: Connection
> refused
> at
>
> org.apache.hadoop.hbase.ipc.RpcClient.callBlockingMethod(RpcClient.java:1667)
> at
>
> org.apache.hadoop.hbase.ipc.RpcClient$BlockingRpcChannelImplementation.callBlockingMethod(RpcClient.java:1708)
> at
>
> org.apache.hadoop.hbase.protobuf.generated.MasterProtos$MasterService$BlockingStub.isMasterRunning(MasterProtos.java:40216)
> at
>
> org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$MasterServiceState.isMasterRunning(HConnectionManager.java:1484)
> at
>
> org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.isKeepAliveMasterConnectedAndRunning(HConnectionManager.java:2110)
> at
>
> org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getKeepAliveMasterService(HConnectionManager.java:1836)
>
> Does the exception mean the HMaster is not working normally, or something else?
>
>
>
> 2015-05-17 11:06 GMT+08:00 Ted Yu <yu...@gmail.com>:
>
> > bq. the HMaster is handling two region server down, and not ready to
> handle
> > client request?
> >
> > I didn't mean that - for a functioning master, handling region server
> > shutdown is part of the master's job.
> >
> > You should see something similar to the following in (functioning) master
> > log:
> >
> > 2015-05-13 04:06:36,266 INFO  [master:c6401:60000] master.ServerManager:
> > Finished waiting for region servers count to settle; checked in 1, slept
> > for 71582 ms, expecting minimum of 1, maximum of 2147483647, master is
> > running.
> >
> > bq. wait the backend HMaster to take over
> >
> > Was there exception in backup master log after it took over ?
> >
> > On Sat, May 16, 2015 at 6:44 PM, Louis Hust <lo...@gmail.com>
> wrote:
> >
> > > hi,ted,
> > >
> > > Thanks very much!
> > >
> > > The Namenode process was not running on l-namenode2.dba.cn8
> > > (192.168.39.22); just the HMaster ran on it.
> > > So you means that at 2015-05-15 12:15:04,  the HMaster is handling two
> > > region server down, and not
> > > ready to handle client request? And how can i tell when the HMaster is
> > > ready to handle client request
> > > from the logs?
> > >
> > > I stop the Hmaster at 12:15:58 cause the HMaster can not handling
> > request,
> > > so i want to stop it and wait
> > > the backend HMaster to take over.
> > >
> > >
> > >
> > >
> > > 2015-05-17 0:29 GMT+08:00 Ted Yu <yu...@gmail.com>:
> > >
> > > > In the period you identified, master was assigning regions.
> > > > e.g.
> > > >
> > > > 2015-05-15 12:13:09,683 INFO
> > > > [l-namenode2.dba.cn8.qunar.com
> > > ,60000,1431663090427-GeneralBulkAssigner-0]
> > > > master.RegionStates: Transitioned {c634280ce287b2d6cebd88b61accf685
> > > > state=OFFLINE, ts=1431663189621, server=null} to
> > > > {c634280ce287b2d6cebd88b61accf685 state=PENDING_OPEN,
> ts=1431663189683,
> > > > server=l-hbase26.data.cn8.qunar.com,60020,1431462615651}
> > > > 2015-05-15 12:13:09,683 INFO
> > > > [l-namenode2.dba.cn8.qunar.com
> > > ,60000,1431663090427-GeneralBulkAssigner-2]
> > > > master.RegionStates: Transitioned {2f60b1b4e51d32ef98ad19690f13a565
> > > > state=OFFLINE, ts=1431663189621, server=null} to
> > > > {2f60b1b4e51d32ef98ad19690f13a565 state=PENDING_OPEN,
> ts=1431663189683,
> > > > server=l-hbase30.data.cn8.qunar.com,60020,1431462562233}
> > > >
> > > > Then two region servers went down:
> > > >
> > > > 2015-05-15 12:14:40,699 INFO  [main-EventThread]
> > > > zookeeper.RegionServerTracker: RegionServer ephemeral node deleted,
> > > > processing expiration [l-hbase27.data.cn8.qunar.com,60020,
> > > >  1431663208899]
> > > > 2015-05-15 12:15:04,899 INFO  [main-EventThread]
> > > > zookeeper.RegionServerTracker: RegionServer ephemeral node deleted,
> > > > processing expiration [l-hbase25.data.cn8.qunar.com,60020,
> > > >  1431663193865]
> > > >
> > > > Master was stopped afterwards:
> > > >
> > > > Fri May 15 12:15:58 CST 2015 Terminating master
> > > >
> > > > Namenode process was running on l-namenode2.dba.cn8, right ?
> > > >
> > > > Cheers
> > > >
> > > > On Sat, May 16, 2015 at 7:50 AM, Louis Hust <lo...@gmail.com>
> > > wrote:
> > > >
> > > > > hi, TED,
> > > > > Any idea?
> > > > > When the HMaster restart, how can i know when it is really can
> handle
> > > > > request from application? is there any mark in logs?
> > > > >
> > > > > 2015-05-16 14:05 GMT+08:00 Louis Hust <lo...@gmail.com>:
> > > > >
> > > > > > @Ted,
> > > > > > plz see the log from 12:11:29 to 12:15:28, this timerange the
> > HMaster
> > > > is
> > > > > > in restarting stage, but can not handle request from client? Is
> the
> > > > > HMaster
> > > > > > recovering or do something else?
> > > > > >
> > > > > > 2015-05-16 13:59 GMT+08:00 Louis Hust <lo...@gmail.com>:
> > > > > >
> > > > > >> OK, you can get the log from
> > > > > >> http://pan.baidu.com/s/1pqS6E
> > > > > >>
> > > > > >>
> > > > > >> 2015-05-16 13:26 GMT+08:00 Ted Yu <yu...@gmail.com>:
> > > > > >>
> > > > > >>> Can you check the server log on 192.168.39.22?
> > > > > >>>
> > > > > >>> That should give you some clue.
> > > > > >>>
> > > > > >>> Cheers
> > > > > >>>
> > > > > >>> On Fri, May 15, 2015 at 8:22 PM, Louis Hust <louis.hust@gmail.com> wrote:
> > > > > >>>
> > > > > >>> > Hi all,
> > > > > >>> >
> > > > > >>> > I use hbase0.96.0 with hadoop 2.2.0,
> > > > > >>> > and the custom said they can not write into hbase cluster,
> > > > > >>> > So i stop the HMaster and start it soon,
> > > > > >>> >
> > > > > >>> > But it seems that the HMaster not response to request, following is the
> > > > > >>> > HMaster log:
> > > > > >>> >
> > > > > >>> > {logs}
> > > > > >>> > 2015-05-15 12:13:33,136 INFO  [AM.ZK.Worker-pool2-t16] master.RegionStates: Transitioned {9036a3befee90eeffb9082f90a4a9afa state=OPENING, ts=1431663212637, server=l-hbase26.data.cn8.qunar.com,60020,1431462615651} to {9036a3befee90eeffb9082f90a4a9afa state=OPEN, ts=1431663213136, server=l-hbase26.data.cn8.qunar.com,60020,1431462615651}
> > > > > >>> > 2015-05-15 12:13:33,139 INFO  [AM.ZK.Worker-pool2-t4] master.RegionStates: Onlined 9036a3befee90eeffb9082f90a4a9afa on l-hbase26.data.cn8.qunar.com,60020,1431462615651
> > > > > >>> > 2015-05-15 12:14:40,699 INFO  [main-EventThread] zookeeper.RegionServerTracker: RegionServer ephemeral node deleted, processing expiration [l-hbase27.data.cn8.qunar.com,60020,1431663208899]
> > > > > >>> > 2015-05-15 12:15:04,899 INFO  [main-EventThread] zookeeper.RegionServerTracker: RegionServer ephemeral node deleted, processing expiration [l-hbase25.data.cn8.qunar.com,60020,1431663193865]
> > > > > >>> > 2015-05-15 12:15:24,465 WARN  [249240421@qtp-591022857-33] client.HConnectionManager$HConnectionImplementation: Checking master connection
> > > > > >>> > com.google.protobuf.ServiceException: java.net.SocketTimeoutException: Call to l-namenode2.dba.cn8.qunar.com/192.168.39.22:60000 failed because java.net.SocketTimeoutException: 60000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/192.168.39.22:47700 remote=l-namenode2.dba.cn8.qunar.com/192.168.39.22:60000]
> > > > > >>> > at org.apache.hadoop.hbase.ipc.RpcClient.callBlockingMethod(RpcClient.java:1667)
> > > > > >>> > at org.apache.hadoop.hbase.ipc.RpcClient$BlockingRpcChannelImplementation.callBlockingMethod(RpcClient.java:1708)
> > > > > >>> > at org.apache.hadoop.hbase.protobuf.generated.MasterProtos$MasterService$BlockingStub.isMasterRunning(MasterProtos.java:40216)
> > > > > >>> > at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$MasterServiceState.isMasterRunning(HConnectionManager.java:1484)
> > > > > >>> > at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.isKeepAliveMasterConnectedAndRunning(HConnectionManager.java:2110)
> > > > > >>> > at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getKeepAliveMasterService(HConnectionManager.java:1836)
> > > > > >>> > at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.listTables(HConnectionManager.java:2531)
> > > > > >>> > at org.apache.hadoop.hbase.client.HBaseAdmin.listTables(HBaseAdmin.java:298)
> > > > > >>> > at org.apache.hadoop.hbase.tmpl.master.MasterStatusTmplImpl.__jamon_innerUnit__userTables(MasterStatusTmplImpl.java:530)
> > > > > >>> > at org.apache.hadoop.hbase.tmpl.master.MasterStatusTmplImpl.renderNoFlush(MasterStatusTmplImpl.java:255)
> > > > > >>> > at org.apache.hadoop.hbase.tmpl.master.MasterStatusTmpl.renderNoFlush(MasterStatusTmpl.java:382)
> > > > > >>> > at org.apache.hadoop.hbase.tmpl.master.MasterStatusTmpl.render(MasterStatusTmpl.java:372)
> > > > > >>> > at org.apache.hadoop.hbase.master.MasterStatusServlet.doGet(MasterStatusServlet.java:95)
> > > > > >>> > at javax.servlet.http.HttpServlet.service(HttpServlet.java:734)
> > > > > >>> > at javax.servlet.http.HttpServlet.service(HttpServlet.java:847)
> > > > > >>> > at org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:511)
> > > > > >>> > at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1221)
> > > > > >>> > at org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter.doFilter(StaticUserWebFilter.java:109)
> > > > > >>> > at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
> > > > > >>> > at org.apache.hadoop.http.HttpServer$QuotingInputFilter.doFilter(HttpServer.java:1081)
> > > > > >>> > at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
> > > > > >>> > at org.apache.hadoop.http.NoCacheFilter.doFilter(NoCacheFilter.java:45)
> > > > > >>> > at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
> > > > > >>> > at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)
> > > > > >>> > at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
> > > > > >>> > at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
> > > > > >>> > at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
> > > > > >>> > at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450)
> > > > > >>> > at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
> > > > > >>> > at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
> > > > > >>> > at org.mortbay.jetty.Server.handle(Server.java:326)
> > > > > >>> > at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
> > > > > >>> > at org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:928)
> > > > > >>> > at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:549)
> > > > > >>> > at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212)
> > > > > >>> > at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)
> > > > > >>> > at org.mortbay.io.nio.SelectChannelEndPoint.run(SelectChannelEndPoint.java:410)
> > > > > >>> > at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582)
> > > > > >>> > Caused by: java.net.SocketTimeoutException: Call to l-namenode2.dba.cn8.qunar.com/192.168.39.22:60000 failed because java.net.SocketTimeoutException: 60000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/192.168.39.22:47700 remote=l-namenode2.dba.cn8.qunar.com/192.168.39.22:60000]
> > > > > >>> > at org.apache.hadoop.hbase.ipc.RpcClient.wrapException(RpcClient.java:1475)
> > > > > >>> > at org.apache.hadoop.hbase.ipc.RpcClient.call(RpcClient.java:1450)
> > > > > >>> > at org.apache.hadoop.hbase.ipc.RpcClient.callBlockingMethod(RpcClient.java:1650)
> > > > > >>> > ... 37 more
> > > > > >>> > Caused by: java.net.SocketTimeoutException: 60000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/192.168.39.22:47700 remote=l-namenode2.dba.cn8.qunar.com/192.168.39.22:60000]
> > > > > >>> > at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:164)
> > > > > >>> > at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:161)
> > > > > >>> > at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:131)
> > > > > >>> > at java.io.FilterInputStream.read(FilterInputStream.java:133)
> > > > > >>> > at java.io.FilterInputStream.read(FilterInputStream.java:133)
> > > > > >>> > at org.apache.hadoop.hbase.ipc.RpcClient$Connection$PingInputStream.read(RpcClient.java:553)
> > > > > >>> > at java.io.BufferedInputStream.fill(BufferedInputStream.java:235)
> > > > > >>> > at java.io.BufferedInputStream.read(BufferedInputStream.java:254)
> > > > > >>> > at java.io.DataInputStream.readInt(DataInputStream.java:387)
> > > > > >>> > at org.apache.hadoop.hbase.ipc.RpcClient$Connection.readResponse(RpcClient.java:1057)
> > > > > >>> > at org.apache.hadoop.hbase.ipc.RpcClient$Connection.run(RpcClient.java:719)
> > > > > >>> > Fri May 15 12:15:58 CST 2015 Terminating master
> > > > > >>> > {/logs}
> > > > > >>> > So what does the exception mean? Why did it happen, and how can I solve the problem?

Re: HMaster restart with error

Posted by Louis Hust <lo...@gmail.com>.
Hi, ted,

Thanks for your reply!!

I found this log entry on l-namenode2.dba.cn8 during the restart:
2015-05-15 12:11:36,540 INFO  [master:l-namenode2:60000]
master.ServerManager: Finished waiting for region servers count to settle;
checked in 5, slept for 4511 ms, expecting minimum of 1, maximum of
2147483647, master is running.

So does this mean the HMaster was ready to handle requests at 12:11:36?

The backup master is l-namenode1.dba.cn8 and you can get its log at:

http://pan.baidu.com/s/1eQlKXj0

After I stopped l-namenode2.dba.cn8 at 12:15:58,
the backup master l-namenode1 took over, and I found this log:

2015-05-15 12:16:40,024 INFO  [master:l-namenode1:60000]
master.ServerManager: Finished waiting for region servers count to settle;
checked in 4, slept for 5663 ms, expecting minimum of 1, maximum of
2147483647, master is running.

So the backup master took over at 2015-05-15 12:16:40,024?
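
If those two timestamps are right, the window with no active master can be computed directly from them. A rough sketch (GNU date assumed; the timestamps come from the two master logs above):

```shell
# Old master terminated at 12:15:58; the new master finished waiting for
# region servers to settle at 12:16:40 (timestamps from the master logs).
old_stop=$(date -d '2015-05-15 12:15:58' +%s)
new_ready=$(date -d '2015-05-15 12:16:40' +%s)
echo "$(( new_ready - old_stop )) seconds with no active master"
# prints: 42 seconds with no active master
```

Any client request that hit the master inside that window would have failed or timed out.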

But it seems l-namenode2 was not working normally, given this exception in the log:

2015-05-15 12:16:40,522 INFO
 [MASTER_SERVER_OPERATIONS-l-namenode1:60000-0]
handler.ServerShutdownHandler: Finished processing of shutdown of
l-hbase31.data.cn8.qunar.com,60020,1427789773001
2015-05-15 12:17:11,301 WARN  [686544788@qtp-660252776-212]
client.HConnectionManager$HConnectionImplementation: Checking master
connection
com.google.protobuf.ServiceException: java.net.ConnectException: Connection
refused
at
org.apache.hadoop.hbase.ipc.RpcClient.callBlockingMethod(RpcClient.java:1667)
at
org.apache.hadoop.hbase.ipc.RpcClient$BlockingRpcChannelImplementation.callBlockingMethod(RpcClient.java:1708)
at
org.apache.hadoop.hbase.protobuf.generated.MasterProtos$MasterService$BlockingStub.isMasterRunning(MasterProtos.java:40216)
at
org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$MasterServiceState.isMasterRunning(HConnectionManager.java:1484)
at
org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.isKeepAliveMasterConnectedAndRunning(HConnectionManager.java:2110)
at
org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getKeepAliveMasterService(HConnectionManager.java:1836)

Does the exception mean the HMaster is not working normally, or something else?



2015-05-17 11:06 GMT+08:00 Ted Yu <yu...@gmail.com>:

> bq. the HMaster is handling two region server down, and not ready to handle
> client request?
>
> I didn't mean that; for a functioning master, handling region server
> shutdown is part of the master's job.
>
> You should see something similar to the following in (functioning) master
> log:
>
> 2015-05-13 04:06:36,266 INFO  [master:c6401:60000] master.ServerManager:
> Finished waiting for region servers count to settle; checked in 1, slept
> for 71582 ms, expecting minimum of 1, maximum of 2147483647, master is
> running.
>
> bq. wait the backend HMaster to take over
>
> Was there an exception in the backup master log after it took over?
>
> On Sat, May 16, 2015 at 6:44 PM, Louis Hust <lo...@gmail.com> wrote:
>
> > hi,ted,
> >
> > Thanks very much!
> >
> > The Namenode process was not running on l-namenode2.dba.cn8
> > (192.168.39.22); just the HMaster ran on it.
> > So you means that at 2015-05-15 12:15:04,  the HMaster is handling two
> > region server down, and not
> > ready to handle client request? And how can i tell when the HMaster is
> > ready to handle client request
> > from the logs?
> >
> > I stopped the HMaster at 12:15:58 because it could not handle requests,
> > so I wanted to stop it and wait for the backend HMaster to take over.
> >
> >
> >
> >
> > 2015-05-17 0:29 GMT+08:00 Ted Yu <yu...@gmail.com>:
> >
> > > In the period you identified, master was assigning regions.
> > > e.g.
> > >
> > > 2015-05-15 12:13:09,683 INFO
> > > [l-namenode2.dba.cn8.qunar.com
> > ,60000,1431663090427-GeneralBulkAssigner-0]
> > > master.RegionStates: Transitioned {c634280ce287b2d6cebd88b61accf685
> > > state=OFFLINE, ts=1431663189621, server=null} to
> > > {c634280ce287b2d6cebd88b61accf685 state=PENDING_OPEN, ts=1431663189683,
> > > server=l-hbase26.data.cn8.qunar.com,60020,1431462615651}
> > > 2015-05-15 12:13:09,683 INFO
> > > [l-namenode2.dba.cn8.qunar.com
> > ,60000,1431663090427-GeneralBulkAssigner-2]
> > > master.RegionStates: Transitioned {2f60b1b4e51d32ef98ad19690f13a565
> > > state=OFFLINE, ts=1431663189621, server=null} to
> > > {2f60b1b4e51d32ef98ad19690f13a565 state=PENDING_OPEN, ts=1431663189683,
> > > server=l-hbase30.data.cn8.qunar.com,60020,1431462562233}
> > >
> > > Then two region servers went down:
> > >
> > > 2015-05-15 12:14:40,699 INFO  [main-EventThread]
> > > zookeeper.RegionServerTracker: RegionServer ephemeral node deleted,
> > > processing expiration [l-hbase27.data.cn8.qunar.com,60020,
> > >  1431663208899]
> > > 2015-05-15 12:15:04,899 INFO  [main-EventThread]
> > > zookeeper.RegionServerTracker: RegionServer ephemeral node deleted,
> > > processing expiration [l-hbase25.data.cn8.qunar.com,60020,
> > >  1431663193865]
> > >
> > > Master was stopped afterwards:
> > >
> > > Fri May 15 12:15:58 CST 2015 Terminating master
> > >
> > > Namenode process was running on l-namenode2.dba.cn8, right ?
> > >
> > > Cheers
> > >
> > > On Sat, May 16, 2015 at 7:50 AM, Louis Hust <lo...@gmail.com>
> > wrote:
> > >
> > > > hi, TED,
> > > > Any idea?
> > > > When the HMaster restarts, how can I know when it can really handle
> > > > requests from applications? Is there any marker in the logs?
> > > >
> > > > 2015-05-16 14:05 GMT+08:00 Louis Hust <lo...@gmail.com>:
> > > >
> > > > > @Ted,
> > > > > plz see the log from 12:11:29 to 12:15:28, this timerange the
> HMaster
> > > is
> > > > > in restarting stage, but can not handle request from client? Is the
> > > > HMaster
> > > > > recovering or do something else?
> > > > >
> > > > > 2015-05-16 13:59 GMT+08:00 Louis Hust <lo...@gmail.com>:
> > > > >
> > > > >> OK, you can get the log from
> > > > >> http://pan.baidu.com/s/1pqS6E
> > > > >>
> > > > >>
> > > > >> 2015-05-16 13:26 GMT+08:00 Ted Yu <yu...@gmail.com>:
> > > > >>
> > > > >>> Can you check the server log on 192.168.39.22?
> > > > >>>
> > > > >>> That should give you some clue.
> > > > >>>
> > > > >>> Cheers
> > > > >>>
> > > > >>> On Fri, May 15, 2015 at 8:22 PM, Louis Hust <louis.hust@gmail.com> wrote:
> > > > >>>
> > > > >>> > Hi all,
> > > > >>> >
> > > > >>> > I use hbase0.96.0 with hadoop 2.2.0,
> > > > >>> > and the custom said they can not write into hbase cluster,
> > > > >>> > So i stop the HMaster and start it soon,
> > > > >>> >
> > > > >>> > But it seems that the HMaster not response to request, following is the
> > > > >>> > HMaster log:
> > > > >>> >
> > > > >>> > {logs}
> > > > >>> > 2015-05-15 12:13:33,136 INFO  [AM.ZK.Worker-pool2-t16] master.RegionStates: Transitioned {9036a3befee90eeffb9082f90a4a9afa state=OPENING, ts=1431663212637, server=l-hbase26.data.cn8.qunar.com,60020,1431462615651} to {9036a3befee90eeffb9082f90a4a9afa state=OPEN, ts=1431663213136, server=l-hbase26.data.cn8.qunar.com,60020,1431462615651}
> > > > >>> > 2015-05-15 12:13:33,139 INFO  [AM.ZK.Worker-pool2-t4] master.RegionStates: Onlined 9036a3befee90eeffb9082f90a4a9afa on l-hbase26.data.cn8.qunar.com,60020,1431462615651
> > > > >>> > 2015-05-15 12:14:40,699 INFO  [main-EventThread] zookeeper.RegionServerTracker: RegionServer ephemeral node deleted, processing expiration [l-hbase27.data.cn8.qunar.com,60020,1431663208899]
> > > > >>> > 2015-05-15 12:15:04,899 INFO  [main-EventThread] zookeeper.RegionServerTracker: RegionServer ephemeral node deleted, processing expiration [l-hbase25.data.cn8.qunar.com,60020,1431663193865]
> > > > >>> > 2015-05-15 12:15:24,465 WARN  [249240421@qtp-591022857-33] client.HConnectionManager$HConnectionImplementation: Checking master connection
> > > > >>> > com.google.protobuf.ServiceException: java.net.SocketTimeoutException: Call to l-namenode2.dba.cn8.qunar.com/192.168.39.22:60000 failed because java.net.SocketTimeoutException: 60000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/192.168.39.22:47700 remote=l-namenode2.dba.cn8.qunar.com/192.168.39.22:60000]
> > > > >>> >
> > > > >>> >
> > > > >>>
> > > >
> > >
> >
> org.apache.hadoop.hbase.tmpl.master.MasterStatusTmpl.render(MasterStatusTmpl.java:372)
> > > > >>> > at
> > > > >>> >
> > > > >>> >
> > > > >>>
> > > >
> > >
> >
> org.apache.hadoop.hbase.master.MasterStatusServlet.doGet(MasterStatusServlet.java:95)
> > > > >>> > at javax.servlet.http.HttpServlet.service(HttpServlet.java:734)
> > > > >>> > at javax.servlet.http.HttpServlet.service(HttpServlet.java:847)
> > > > >>> > at
> > > > >>>
> > > org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:511)
> > > > >>> > at
> > > > >>> >
> > > > >>> >
> > > > >>>
> > > >
> > >
> >
> org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1221)
> > > > >>> > at
> > > > >>> >
> > > > >>> >
> > > > >>>
> > > >
> > >
> >
> org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter.doFilter(StaticUserWebFilter.java:109)
> > > > >>> > at
> > > > >>> >
> > > > >>> >
> > > > >>>
> > > >
> > >
> >
> org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
> > > > >>> > at
> > > > >>> >
> > > > >>> >
> > > > >>>
> > > >
> > >
> >
> org.apache.hadoop.http.HttpServer$QuotingInputFilter.doFilter(HttpServer.java:1081)
> > > > >>> > at
> > > > >>> >
> > > > >>> >
> > > > >>>
> > > >
> > >
> >
> org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
> > > > >>> > at
> > > > org.apache.hadoop.http.NoCacheFilter.doFilter(NoCacheFilter.java:45)
> > > > >>> > at
> > > > >>> >
> > > > >>> >
> > > > >>>
> > > >
> > >
> >
> org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
> > > > >>> > at
> > > > >>>
> > > >
> > org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)
> > > > >>> > at
> > > > >>> >
> > > > >>>
> > > >
> > >
> >
> org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
> > > > >>> > at
> > > > >>>
> > > >
> > org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
> > > > >>> > at
> > > > >>>
> > > >
> > org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
> > > > >>> > at
> > > > >>>
> > org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450)
> > > > >>> > at
> > > > >>> >
> > > > >>> >
> > > > >>>
> > > >
> > >
> >
> org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
> > > > >>> > at
> > > > >>>
> > > >
> > org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
> > > > >>> > at org.mortbay.jetty.Server.handle(Server.java:326)
> > > > >>> > at
> > > > >>>
> > > org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
> > > > >>> > at
> > > > >>> >
> > > > >>> >
> > > > >>>
> > > >
> > >
> >
> org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:928)
> > > > >>> > at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:549)
> > > > >>> > at
> > org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212)
> > > > >>> > at
> > org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)
> > > > >>> > at
> > > > >>> >
> > > > >>> >
> > > > >>>
> > > >
> > >
> >
> org.mortbay.io.nio.SelectChannelEndPoint.run(SelectChannelEndPoint.java:410)
> > > > >>> > at
> > > > >>> >
> > > > >>> >
> > > > >>>
> > > >
> > >
> >
> org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582)
> > > > >>> > Caused by: java.net.SocketTimeoutException: Call to
> > > > >>> > l-namenode2.dba.cn8.qunar.com/192.168.39.22:60000 failed
> because
> > > > >>> > java.net.SocketTimeoutException: 60000 millis timeout while
> > waiting
> > > > for
> > > > >>> > channel to be ready for read. ch :
> > > > >>> > java.nio.channels.SocketChannel[connected local=/
> > > 192.168.39.22:47700
> > > > >>> > remote=
> > > > >>> > l-namenode2.dba.cn8.qunar.com/192.168.39.22:60000]
> > > > >>> > at
> > > > >>>
> > > >
> > org.apache.hadoop.hbase.ipc.RpcClient.wrapException(RpcClient.java:1475)
> > > > >>> > at
> > org.apache.hadoop.hbase.ipc.RpcClient.call(RpcClient.java:1450)
> > > > >>> > at
> > > > >>> >
> > > > >>> >
> > > > >>>
> > > >
> > >
> >
> org.apache.hadoop.hbase.ipc.RpcClient.callBlockingMethod(RpcClient.java:1650)
> > > > >>> > ... 37 more
> > > > >>> > Caused by: java.net.SocketTimeoutException: 60000 millis
> timeout
> > > > while
> > > > >>> > waiting for channel to be ready for read. ch :
> > > > >>> > java.nio.channels.SocketChannel[connected local=/
> > > 192.168.39.22:47700
> > > > >>> > remote=
> > > > >>> > l-namenode2.dba.cn8.qunar.com/192.168.39.22:60000]
> > > > >>> > at
> > > > >>> >
> > > > >>> >
> > > > >>>
> > > >
> > >
> >
> org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:164)
> > > > >>> > at
> > > > >>>
> > > >
> > org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:161)
> > > > >>> > at
> > > > >>>
> > > >
> > org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:131)
> > > > >>> > at java.io.FilterInputStream.read(FilterInputStream.java:133)
> > > > >>> > at java.io.FilterInputStream.read(FilterInputStream.java:133)
> > > > >>> > at
> > > > >>> >
> > > > >>> >
> > > > >>>
> > > >
> > >
> >
> org.apache.hadoop.hbase.ipc.RpcClient$Connection$PingInputStream.read(RpcClient.java:553)
> > > > >>> > at
> java.io.BufferedInputStream.fill(BufferedInputStream.java:235)
> > > > >>> > at
> java.io.BufferedInputStream.read(BufferedInputStream.java:254)
> > > > >>> > at java.io.DataInputStream.readInt(DataInputStream.java:387)
> > > > >>> > at
> > > > >>> >
> > > > >>> >
> > > > >>>
> > > >
> > >
> >
> org.apache.hadoop.hbase.ipc.RpcClient$Connection.readResponse(RpcClient.java:1057)
> > > > >>> > at
> > > > >>>
> > > >
> > org.apache.hadoop.hbase.ipc.RpcClient$Connection.run(RpcClient.java:719)
> > > > >>> > Fri May 15 12:15:58 CST 2015 Terminating master
> > > > >>> > {/logs}
> > > > >>> > So what the exception means? Why? and how to solve the problem?
> > > > >>> >
> > > > >>>
> > > > >>
> > > > >>
> > > > >
> > > >
> > >
> >
>

Re: HMaster restart with error

Posted by Ted Yu <yu...@gmail.com>.
bq. the HMaster is handling two region server down, and not ready to handle
client request?

I didn't mean that - for a functioning master, handling region server
shutdown is part of the master's job.

You should see something similar to the following in (functioning) master
log:

2015-05-13 04:06:36,266 INFO  [master:c6401:60000] master.ServerManager:
Finished waiting for region servers count to settle; checked in 1, slept
for 71582 ms, expecting minimum of 1, maximum of 2147483647, master is
running.
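If you want to script that check rather than eyeball the log, here is a minimal sketch; the log path in the `__main__` block is a hypothetical default, so point it at your deployment's actual master log:

```python
# Decide from the master log whether HMaster startup has finished, by
# looking for the "Finished waiting for region servers count to settle"
# line quoted above.
from pathlib import Path

READY_MARKER = "Finished waiting for region servers count to settle"

def master_looks_ready(log_path):
    """Return True if the master log contains the readiness marker line."""
    try:
        return READY_MARKER in Path(log_path).read_text(errors="replace")
    except OSError:
        # Missing/unreadable log: treat as "not ready yet".
        return False

if __name__ == "__main__":
    # Hypothetical log location; substitute your deployment's path.
    print(master_looks_ready("/var/log/hbase/hbase-master.log"))
```

Note this only detects the log line; the master can still be busy with region assignment for a while afterwards.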

bq. wait the backend HMaster to take over

Was there an exception in the backup master log after it took over?
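As a side note on the SocketTimeoutException itself: it only says the client waited 60 s without the master RPC port answering. A rough first-level probe (host and port taken from this thread; this checks TCP reachability only, not that the master can actually serve RPCs) might look like:

```python
import socket

def master_port_reachable(host, port, timeout_s=5.0):
    """Return True if a TCP connection to the master RPC port succeeds
    within timeout_s. An open port does not prove the master can answer
    RPCs, but a refused/unreachable one rules it out."""
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS failures.
        return False

if __name__ == "__main__":
    # Address from this thread; substitute your own master host:port.
    print(master_port_reachable("l-namenode2.dba.cn8.qunar.com", 60000, 2.0))
```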

On Sat, May 16, 2015 at 6:44 PM, Louis Hust <lo...@gmail.com> wrote:

> Hi Ted,
>
> Thanks very much!
>
> The Namenode process was not running on l-namenode2.dba.cn8 (192.168.39.22);
> only the HMaster ran on it.
> So you mean that at 2015-05-15 12:15:04 the HMaster is handling two
> region server down, and not ready to handle client request? And how can I
> tell from the logs when the HMaster is ready to handle client requests?
>
> I stopped the HMaster at 12:15:58 because it could not handle requests,
> so I wanted to stop it and wait the backend HMaster to take over.
>
>
>
>
> 2015-05-17 0:29 GMT+08:00 Ted Yu <yu...@gmail.com>:
>
> > In the period you identified, master was assigning regions.
> > e.g.
> >
> > 2015-05-15 12:13:09,683 INFO
> > [l-namenode2.dba.cn8.qunar.com
> ,60000,1431663090427-GeneralBulkAssigner-0]
> > master.RegionStates: Transitioned {c634280ce287b2d6cebd88b61accf685
> > state=OFFLINE, ts=1431663189621, server=null} to
> > {c634280ce287b2d6cebd88b61accf685 state=PENDING_OPEN, ts=1431663189683,
> > server=l-hbase26.data.cn8.qunar.com,60020,1431462615651}
> > 2015-05-15 12:13:09,683 INFO
> > [l-namenode2.dba.cn8.qunar.com
> ,60000,1431663090427-GeneralBulkAssigner-2]
> > master.RegionStates: Transitioned {2f60b1b4e51d32ef98ad19690f13a565
> > state=OFFLINE, ts=1431663189621, server=null} to
> > {2f60b1b4e51d32ef98ad19690f13a565 state=PENDING_OPEN, ts=1431663189683,
> > server=l-hbase30.data.cn8.qunar.com,60020,1431462562233}
> >
> > Then two region servers went down:
> >
> > 2015-05-15 12:14:40,699 INFO  [main-EventThread]
> > zookeeper.RegionServerTracker: RegionServer ephemeral node deleted,
> > processing expiration [l-hbase27.data.cn8.qunar.com,60020,
> >  1431663208899]
> > 2015-05-15 12:15:04,899 INFO  [main-EventThread]
> > zookeeper.RegionServerTracker: RegionServer ephemeral node deleted,
> > processing expiration [l-hbase25.data.cn8.qunar.com,60020,
> >  1431663193865]
> >
> > Master was stopped afterwards:
> >
> > Fri May 15 12:15:58 CST 2015 Terminating master
> >
> > The Namenode process was running on l-namenode2.dba.cn8, right?
> >
> > Cheers
> >
> > On Sat, May 16, 2015 at 7:50 AM, Louis Hust <lo...@gmail.com>
> wrote:
> >
> > > Hi Ted,
> > > Any idea?
> > > When the HMaster restarts, how can I know when it can really handle
> > > requests from the application? Is there any marker in the logs?
> > >
> > > 2015-05-16 14:05 GMT+08:00 Louis Hust <lo...@gmail.com>:
> > >
> > > > @Ted,
> > > > Please see the log from 12:11:29 to 12:15:28; in this time range the
> > > > HMaster is in its restarting stage but cannot handle client requests.
> > > > Is the HMaster recovering or doing something else?
> > > >
> > > > 2015-05-16 13:59 GMT+08:00 Louis Hust <lo...@gmail.com>:
> > > >
> > > >> OK, you can get the log from
> > > >> http://pan.baidu.com/s/1pqS6E
> > > >>
> > > >>
> > > >> 2015-05-16 13:26 GMT+08:00 Ted Yu <yu...@gmail.com>:
> > > >>
> > > >>> Can you check server log on 192.168.39.22
> > > >>> <http://l-namenode2.dba.cn8.qunar.com/192.168.39.22:60000> ?
> > > >>>
> > > >>> That should give you some clue.
> > > >>>
> > > >>> Cheers
> > > >>>
> > > >>> On Fri, May 15, 2015 at 8:22 PM, Louis Hust <lo...@gmail.com>
> > > >>> wrote:
> > > >>>
> > > >>> > Hi all,
> > > >>> >
> > > >>> > I use hbase0.96.0 with hadoop 2.2.0,
> > > >>> > and the custom said they can not write into hbase cluster,
> > > >>> > So i stop the HMaster and start it soon,
> > > >>> >
> > > >>> > But it seems that the HMaster not response to request, following
> is
> > > the
> > > >>> > HMaster log:
> > > >>> >
> > > >>> > {logs}
> > > >>> > 2015-05-15 12:13:33,136 INFO  [AM.ZK.Worker-pool2-t16]
> > > >>> master.RegionStates:
> > > >>> > Transitioned {9036a3befee90eeffb9082f90a4a9afa state=OPENING,
> > > >>> > ts=1431663212637, server=l-hbase26.data.cn8.qunar.com
> > > >>> ,60020,1431462615651}
> > > >>> > to {9036a3befee90eeffb9082f90a4a9afa state=OPEN,
> ts=1431663213136,
> > > >>> server=
> > > >>> > l-hbase26.data.cn8.qunar.com,60020,1431462615651}
> > > >>> > 2015-05-15 12:13:33,139 INFO  [AM.ZK.Worker-pool2-t4]
> > > >>> master.RegionStates:
> > > >>> > Onlined 9036a3befee90eeffb9082f90a4a9afa on
> > > >>> l-hbase26.data.cn8.qunar.com
> > > >>> > ,60020,1431462615651
> > > >>> > 2015-05-15 12:14:40,699 INFO  [main-EventThread]
> > > >>> > zookeeper.RegionServerTracker: RegionServer ephemeral node
> deleted,
> > > >>> > processing expiration [l-hbase27.data.cn8.qunar.com
> > > >>> ,60020,1431663208899]
> > > >>> > 2015-05-15 12:15:04,899 INFO  [main-EventThread]
> > > >>> > zookeeper.RegionServerTracker: RegionServer ephemeral node
> deleted,
> > > >>> > processing expiration [l-hbase25.data.cn8.qunar.com
> > > >>> ,60020,1431663193865]
> > > >>> > 2015-05-15 12:15:24,465 WARN  [249240421@qtp-591022857-33]
> > > >>> > client.HConnectionManager$HConnectionImplementation: Checking
> > master
> > > >>> > connection
> > > >>> > com.google.protobuf.ServiceException:
> > > java.net.SocketTimeoutException:
> > > >>> Call
> > > >>> > to l-namenode2.dba.cn8.qunar.com/192.168.39.22:60000 failed
> > because
> > > >>> > java.net.SocketTimeoutException: 60000 millis timeout while
> waiting
> > > for
> > > >>> > channel to be ready for read. ch :
> > > >>> > java.nio.channels.SocketChannel[connected local=/
> > 192.168.39.22:47700
> > > >>> > remote=
> > > >>> > l-namenode2.dba.cn8.qunar.com/192.168.39.22:60000]
> > > >>> > at
> > > >>> >
> > > >>> >
> > > >>>
> > >
> >
> org.apache.hadoop.hbase.ipc.RpcClient.callBlockingMethod(RpcClient.java:1667)
> > > >>> > at
> > > >>> >
> > > >>> >
> > > >>>
> > >
> >
> org.apache.hadoop.hbase.ipc.RpcClient$BlockingRpcChannelImplementation.callBlockingMethod(RpcClient.java:1708)
> > > >>> > at
> > > >>> >
> > > >>> >
> > > >>>
> > >
> >
> org.apache.hadoop.hbase.protobuf.generated.MasterProtos$MasterService$BlockingStub.isMasterRunning(MasterProtos.java:40216)
> > > >>> > at
> > > >>> >
> > > >>> >
> > > >>>
> > >
> >
> org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$MasterServiceState.isMasterRunning(HConnectionManager.java:1484)
> > > >>> > at
> > > >>> >
> > > >>> >
> > > >>>
> > >
> >
> org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.isKeepAliveMasterConnectedAndRunning(HConnectionManager.java:2110)
> > > >>> > at
> > > >>> >
> > > >>> >
> > > >>>
> > >
> >
> org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getKeepAliveMasterService(HConnectionManager.java:1836)
> > > >>> > at
> > > >>> >
> > > >>> >
> > > >>>
> > >
> >
> org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.listTables(HConnectionManager.java:2531)
> > > >>> > at
> > > >>> >
> > > >>>
> > >
> org.apache.hadoop.hbase.client.HBaseAdmin.listTables(HBaseAdmin.java:298)
> > > >>> > at
> > > >>> >
> > > >>> >
> > > >>>
> > >
> >
> org.apache.hadoop.hbase.tmpl.master.MasterStatusTmplImpl.__jamon_innerUnit__userTables(MasterStatusTmplImpl.java:530)
> > > >>> > at
> > > >>> >
> > > >>> >
> > > >>>
> > >
> >
> org.apache.hadoop.hbase.tmpl.master.MasterStatusTmplImpl.renderNoFlush(MasterStatusTmplImpl.java:255)
> > > >>> > at
> > > >>> >
> > > >>> >
> > > >>>
> > >
> >
> org.apache.hadoop.hbase.tmpl.master.MasterStatusTmpl.renderNoFlush(MasterStatusTmpl.java:382)
> > > >>> > at
> > > >>> >
> > > >>> >
> > > >>>
> > >
> >
> org.apache.hadoop.hbase.tmpl.master.MasterStatusTmpl.render(MasterStatusTmpl.java:372)
> > > >>> > at
> > > >>> >
> > > >>> >
> > > >>>
> > >
> >
> org.apache.hadoop.hbase.master.MasterStatusServlet.doGet(MasterStatusServlet.java:95)
> > > >>> > at javax.servlet.http.HttpServlet.service(HttpServlet.java:734)
> > > >>> > at javax.servlet.http.HttpServlet.service(HttpServlet.java:847)
> > > >>> > at
> > > >>>
> > org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:511)
> > > >>> > at
> > > >>> >
> > > >>> >
> > > >>>
> > >
> >
> org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1221)
> > > >>> > at
> > > >>> >
> > > >>> >
> > > >>>
> > >
> >
> org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter.doFilter(StaticUserWebFilter.java:109)
> > > >>> > at
> > > >>> >
> > > >>> >
> > > >>>
> > >
> >
> org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
> > > >>> > at
> > > >>> >
> > > >>> >
> > > >>>
> > >
> >
> org.apache.hadoop.http.HttpServer$QuotingInputFilter.doFilter(HttpServer.java:1081)
> > > >>> > at
> > > >>> >
> > > >>> >
> > > >>>
> > >
> >
> org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
> > > >>> > at
> > > org.apache.hadoop.http.NoCacheFilter.doFilter(NoCacheFilter.java:45)
> > > >>> > at
> > > >>> >
> > > >>> >
> > > >>>
> > >
> >
> org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
> > > >>> > at
> > > >>>
> > >
> org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)
> > > >>> > at
> > > >>> >
> > > >>>
> > >
> >
> org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
> > > >>> > at
> > > >>>
> > >
> org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
> > > >>> > at
> > > >>>
> > >
> org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
> > > >>> > at
> > > >>>
> org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450)
> > > >>> > at
> > > >>> >
> > > >>> >
> > > >>>
> > >
> >
> org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
> > > >>> > at
> > > >>>
> > >
> org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
> > > >>> > at org.mortbay.jetty.Server.handle(Server.java:326)
> > > >>> > at
> > > >>>
> > org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
> > > >>> > at
> > > >>> >
> > > >>> >
> > > >>>
> > >
> >
> org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:928)
> > > >>> > at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:549)
> > > >>> > at
> org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212)
> > > >>> > at
> org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)
> > > >>> > at
> > > >>> >
> > > >>> >
> > > >>>
> > >
> >
> org.mortbay.io.nio.SelectChannelEndPoint.run(SelectChannelEndPoint.java:410)
> > > >>> > at
> > > >>> >
> > > >>> >
> > > >>>
> > >
> >
> org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582)
> > > >>> > Caused by: java.net.SocketTimeoutException: Call to
> > > >>> > l-namenode2.dba.cn8.qunar.com/192.168.39.22:60000 failed because
> > > >>> > java.net.SocketTimeoutException: 60000 millis timeout while
> waiting
> > > for
> > > >>> > channel to be ready for read. ch :
> > > >>> > java.nio.channels.SocketChannel[connected local=/
> > 192.168.39.22:47700
> > > >>> > remote=
> > > >>> > l-namenode2.dba.cn8.qunar.com/192.168.39.22:60000]
> > > >>> > at
> > > >>>
> > >
> org.apache.hadoop.hbase.ipc.RpcClient.wrapException(RpcClient.java:1475)
> > > >>> > at
> org.apache.hadoop.hbase.ipc.RpcClient.call(RpcClient.java:1450)
> > > >>> > at
> > > >>> >
> > > >>> >
> > > >>>
> > >
> >
> org.apache.hadoop.hbase.ipc.RpcClient.callBlockingMethod(RpcClient.java:1650)
> > > >>> > ... 37 more
> > > >>> > Caused by: java.net.SocketTimeoutException: 60000 millis timeout
> > > while
> > > >>> > waiting for channel to be ready for read. ch :
> > > >>> > java.nio.channels.SocketChannel[connected local=/
> > 192.168.39.22:47700
> > > >>> > remote=
> > > >>> > l-namenode2.dba.cn8.qunar.com/192.168.39.22:60000]
> > > >>> > at
> > > >>> >
> > > >>> >
> > > >>>
> > >
> >
> org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:164)
> > > >>> > at
> > > >>>
> > >
> org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:161)
> > > >>> > at
> > > >>>
> > >
> org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:131)
> > > >>> > at java.io.FilterInputStream.read(FilterInputStream.java:133)
> > > >>> > at java.io.FilterInputStream.read(FilterInputStream.java:133)
> > > >>> > at
> > > >>> >
> > > >>> >
> > > >>>
> > >
> >
> org.apache.hadoop.hbase.ipc.RpcClient$Connection$PingInputStream.read(RpcClient.java:553)
> > > >>> > at java.io.BufferedInputStream.fill(BufferedInputStream.java:235)
> > > >>> > at java.io.BufferedInputStream.read(BufferedInputStream.java:254)
> > > >>> > at java.io.DataInputStream.readInt(DataInputStream.java:387)
> > > >>> > at
> > > >>> >
> > > >>> >
> > > >>>
> > >
> >
> org.apache.hadoop.hbase.ipc.RpcClient$Connection.readResponse(RpcClient.java:1057)
> > > >>> > at
> > > >>>
> > >
> org.apache.hadoop.hbase.ipc.RpcClient$Connection.run(RpcClient.java:719)
> > > >>> > Fri May 15 12:15:58 CST 2015 Terminating master
> > > >>> > {/logs}
> > > >>> > So what the exception means? Why? and how to solve the problem?
> > > >>> >
> > > >>>
> > > >>
> > > >>
> > > >
> > >
> >
>

Re: HMaster restart with error

Posted by Louis Hust <lo...@gmail.com>.
Hi Ted,

Thanks very much!

The Namenode process was not running on l-namenode2.dba.cn8 (192.168.39.22);
only the HMaster ran on it.
So you mean that at 2015-05-15 12:15:04 the HMaster is handling two
region server down, and not ready to handle client request? And how can I
tell from the logs when the HMaster is ready to handle client requests?

I stopped the HMaster at 12:15:58 because it could not handle requests,
so I wanted to stop it and wait the backend HMaster to take over.





Re: HMaster restart with error

Posted by Ted Yu <yu...@gmail.com>.
In the period you identified, the master was assigning regions, e.g.:

2015-05-15 12:13:09,683 INFO
[l-namenode2.dba.cn8.qunar.com,60000,1431663090427-GeneralBulkAssigner-0]
master.RegionStates: Transitioned {c634280ce287b2d6cebd88b61accf685
state=OFFLINE, ts=1431663189621, server=null} to
{c634280ce287b2d6cebd88b61accf685 state=PENDING_OPEN, ts=1431663189683,
server=l-hbase26.data.cn8.qunar.com,60020,1431462615651}
2015-05-15 12:13:09,683 INFO
[l-namenode2.dba.cn8.qunar.com,60000,1431663090427-GeneralBulkAssigner-2]
master.RegionStates: Transitioned {2f60b1b4e51d32ef98ad19690f13a565
state=OFFLINE, ts=1431663189621, server=null} to
{2f60b1b4e51d32ef98ad19690f13a565 state=PENDING_OPEN, ts=1431663189683,
server=l-hbase30.data.cn8.qunar.com,60020,1431462562233}

Then two region servers went down:

2015-05-15 12:14:40,699 INFO  [main-EventThread]
zookeeper.RegionServerTracker: RegionServer ephemeral node deleted,
processing expiration [l-hbase27.data.cn8.qunar.com,60020,1431663208899]
2015-05-15 12:15:04,899 INFO  [main-EventThread]
zookeeper.RegionServerTracker: RegionServer ephemeral node deleted,
processing expiration [l-hbase25.data.cn8.qunar.com,60020,1431663193865]

Master was stopped afterwards:

Fri May 15 12:15:58 CST 2015 Terminating master

The namenode process was running on l-namenode2.dba.cn8, right?

Cheers
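(Aside: the ephemeral-node deletions above mean those region servers' ZooKeeper sessions expired. If the servers were only pausing, e.g. in a long GC, rather than actually dying, zookeeper.session.timeout in hbase-site.xml is the setting involved; the value below is purely illustrative, not a recommendation.)

```xml
<!-- hbase-site.xml: illustrative only -->
<property>
  <name>zookeeper.session.timeout</name>
  <!-- milliseconds before the master treats a silent region server as dead -->
  <value>90000</value>
</property>
```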

On Sat, May 16, 2015 at 7:50 AM, Louis Hust <lo...@gmail.com> wrote:

> Hi Ted,
> Any idea?
> When the HMaster restarts, how can I tell when it can really handle
> requests from applications? Is there any marker in the logs?
>
> 2015-05-16 14:05 GMT+08:00 Louis Hust <lo...@gmail.com>:
>
> > @Ted,
> > Please see the log from 12:11:29 to 12:15:28. In this time range the
> > HMaster is in the restarting stage but cannot handle requests from
> > clients. Is the HMaster recovering, or doing something else?
> >
> > 2015-05-16 13:59 GMT+08:00 Louis Hust <lo...@gmail.com>:
> >
> >> OK, you can get the log from
> >> http://pan.baidu.com/s/1pqS6E
> >>
> >>
> >> 2015-05-16 13:26 GMT+08:00 Ted Yu <yu...@gmail.com>:
> >>
> >>> Can you check server log on 192.168.39.22
> >>> <http://l-namenode2.dba.cn8.qunar.com/192.168.39.22:60000> ?
> >>>
> >>> That should give you some clue.
> >>>
> >>> Cheers
> >>>
> >>> On Fri, May 15, 2015 at 8:22 PM, Louis Hust <lo...@gmail.com>
> >>> wrote:
> >>>
> >>> > [original post with full stack trace snipped; see above]
>

Re: HMaster restart with error

Posted by Louis Hust <lo...@gmail.com>.
Hi Ted,
Any idea?
When the HMaster restarts, how can I tell when it can really handle
requests from applications? Is there any marker in the logs?
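Not part of the thread, but for what it's worth: 0.96-era masters log a line like "Master has completed initialization" once startup region assignment is done (exact wording may vary by version). A quick check against a copy of the master log, with the file path and log contents below as stand-ins:

```shell
# Build a tiny stand-in for the real master log (path and contents are samples).
cat > /tmp/master.log <<'EOF'
2015-05-15 12:13:33,139 INFO  [AM.ZK.Worker-pool2-t4] master.RegionStates: Onlined 9036a3befee90eeffb9082f90a4a9afa
2015-05-15 12:16:02,480 INFO  [master:60000] master.HMaster: Master has completed initialization
EOF

# Once this line shows up, the master should be past startup assignment
# and able to serve client/admin requests.
grep 'Master has completed initialization' /tmp/master.log
```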

2015-05-16 14:05 GMT+08:00 Louis Hust <lo...@gmail.com>:

> @Ted,
> Please see the log from 12:11:29 to 12:15:28. In this time range the HMaster
> is in the restarting stage but cannot handle requests from clients. Is the
> HMaster recovering, or doing something else?
>
> 2015-05-16 13:59 GMT+08:00 Louis Hust <lo...@gmail.com>:
>
>> OK, you can get the log from
>> http://pan.baidu.com/s/1pqS6E
>>
>>
>> 2015-05-16 13:26 GMT+08:00 Ted Yu <yu...@gmail.com>:
>>
>>> Can you check server log on 192.168.39.22
>>> <http://l-namenode2.dba.cn8.qunar.com/192.168.39.22:60000> ?
>>>
>>> That should give you some clue.
>>>
>>> Cheers
>>>
>>> On Fri, May 15, 2015 at 8:22 PM, Louis Hust <lo...@gmail.com>
>>> wrote:
>>>
>>> > [original post with full stack trace snipped; see above]
>>>
>>
>>
>

Re: HMaster restart with error

Posted by Louis Hust <lo...@gmail.com>.
@Ted,
Please see the log from 12:11:29 to 12:15:28. In this time range the HMaster is
in the restarting stage but cannot handle requests from clients. Is the HMaster
recovering, or doing something else?
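One way to isolate that window is a small timestamp filter over the master log. This is only a sketch: it assumes the standard "yyyy-MM-dd HH:mm:ss,SSS" log prefix, and the sample lines are made up.

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S"

def in_window(line, start="2015-05-15 12:11:29", end="2015-05-15 12:15:28"):
    """True if the line's leading 'yyyy-MM-dd HH:mm:ss' timestamp is in [start, end]."""
    try:
        ts = datetime.strptime(line[:19], FMT)
    except ValueError:
        # Continuation lines (e.g. stack-trace frames) carry no timestamp.
        return False
    return datetime.strptime(start, FMT) <= ts <= datetime.strptime(end, FMT)

log = [
    "2015-05-15 12:13:33,139 INFO  [AM.ZK.Worker-pool2-t4] master.RegionStates: Onlined 9036a3befee90eeffb9082f90a4a9afa",
    "2015-05-15 12:20:00,000 INFO  [main] some event outside the window",
]
for line in log:
    if in_window(line):
        print(line)
```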

2015-05-16 13:59 GMT+08:00 Louis Hust <lo...@gmail.com>:

> OK, you can get the log from
> http://pan.baidu.com/s/1pqS6E
>
>
> 2015-05-16 13:26 GMT+08:00 Ted Yu <yu...@gmail.com>:
>
>> Can you check server log on 192.168.39.22
>> <http://l-namenode2.dba.cn8.qunar.com/192.168.39.22:60000> ?
>>
>> That should give you some clue.
>>
>> Cheers
>>
>> On Fri, May 15, 2015 at 8:22 PM, Louis Hust <lo...@gmail.com> wrote:
>>
>> > [original post with full stack trace snipped; see above]
>>
>
>

Re: HMaster restart with error

Posted by Louis Hust <lo...@gmail.com>.
OK, you can get the log from
http://pan.baidu.com/s/1pqS6E


2015-05-16 13:26 GMT+08:00 Ted Yu <yu...@gmail.com>:

> Can you check server log on 192.168.39.22
> <http://l-namenode2.dba.cn8.qunar.com/192.168.39.22:60000> ?
>
> That should give you some clue.
>
> Cheers
>
> On Fri, May 15, 2015 at 8:22 PM, Louis Hust <lo...@gmail.com> wrote:
>
> > Hi all,
> >
> > I use hbase0.96.0 with hadoop 2.2.0,
> > and the custom said they can not write into hbase cluster,
> > So i stop the HMaster and start it soon,
> >
> > But it seems that the HMaster not response to request, following is the
> > HMaster log:
> >
> > {logs}
> > 2015-05-15 12:13:33,136 INFO  [AM.ZK.Worker-pool2-t16] master.RegionStates:
> > Transitioned {9036a3befee90eeffb9082f90a4a9afa state=OPENING,
> > ts=1431663212637, server=l-hbase26.data.cn8.qunar.com,60020,1431462615651}
> > to {9036a3befee90eeffb9082f90a4a9afa state=OPEN, ts=1431663213136,
> > server=l-hbase26.data.cn8.qunar.com,60020,1431462615651}
> > 2015-05-15 12:13:33,139 INFO  [AM.ZK.Worker-pool2-t4] master.RegionStates:
> > Onlined 9036a3befee90eeffb9082f90a4a9afa on l-hbase26.data.cn8.qunar.com,60020,1431462615651
> > 2015-05-15 12:14:40,699 INFO  [main-EventThread] zookeeper.RegionServerTracker:
> > RegionServer ephemeral node deleted, processing expiration
> > [l-hbase27.data.cn8.qunar.com,60020,1431663208899]
> > 2015-05-15 12:15:04,899 INFO  [main-EventThread] zookeeper.RegionServerTracker:
> > RegionServer ephemeral node deleted, processing expiration
> > [l-hbase25.data.cn8.qunar.com,60020,1431663193865]
> > 2015-05-15 12:15:24,465 WARN  [249240421@qtp-591022857-33]
> > client.HConnectionManager$HConnectionImplementation: Checking master connection
> > com.google.protobuf.ServiceException: java.net.SocketTimeoutException:
> > Call to l-namenode2.dba.cn8.qunar.com/192.168.39.22:60000 failed because
> > java.net.SocketTimeoutException: 60000 millis timeout while waiting for
> > channel to be ready for read. ch : java.nio.channels.SocketChannel[connected
> > local=/192.168.39.22:47700 remote=l-namenode2.dba.cn8.qunar.com/192.168.39.22:60000]
> > at org.apache.hadoop.hbase.ipc.RpcClient.callBlockingMethod(RpcClient.java:1667)
> > at org.apache.hadoop.hbase.ipc.RpcClient$BlockingRpcChannelImplementation.callBlockingMethod(RpcClient.java:1708)
> > at org.apache.hadoop.hbase.protobuf.generated.MasterProtos$MasterService$BlockingStub.isMasterRunning(MasterProtos.java:40216)
> > at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$MasterServiceState.isMasterRunning(HConnectionManager.java:1484)
> > at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.isKeepAliveMasterConnectedAndRunning(HConnectionManager.java:2110)
> > at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getKeepAliveMasterService(HConnectionManager.java:1836)
> > at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.listTables(HConnectionManager.java:2531)
> > at org.apache.hadoop.hbase.client.HBaseAdmin.listTables(HBaseAdmin.java:298)
> > at org.apache.hadoop.hbase.tmpl.master.MasterStatusTmplImpl.__jamon_innerUnit__userTables(MasterStatusTmplImpl.java:530)
> > at org.apache.hadoop.hbase.tmpl.master.MasterStatusTmplImpl.renderNoFlush(MasterStatusTmplImpl.java:255)
> > at org.apache.hadoop.hbase.tmpl.master.MasterStatusTmpl.renderNoFlush(MasterStatusTmpl.java:382)
> > at org.apache.hadoop.hbase.tmpl.master.MasterStatusTmpl.render(MasterStatusTmpl.java:372)
> > at org.apache.hadoop.hbase.master.MasterStatusServlet.doGet(MasterStatusServlet.java:95)
> > at javax.servlet.http.HttpServlet.service(HttpServlet.java:734)
> > at javax.servlet.http.HttpServlet.service(HttpServlet.java:847)
> > at org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:511)
> > at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1221)
> > at org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter.doFilter(StaticUserWebFilter.java:109)
> > at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
> > at org.apache.hadoop.http.HttpServer$QuotingInputFilter.doFilter(HttpServer.java:1081)
> > at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
> > at org.apache.hadoop.http.NoCacheFilter.doFilter(NoCacheFilter.java:45)
> > at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
> > at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)
> > at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
> > at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
> > at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
> > at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450)
> > at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
> > at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
> > at org.mortbay.jetty.Server.handle(Server.java:326)
> > at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
> > at org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:928)
> > at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:549)
> > at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212)
> > at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)
> > at org.mortbay.io.nio.SelectChannelEndPoint.run(SelectChannelEndPoint.java:410)
> > at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582)
> > Caused by: java.net.SocketTimeoutException: Call to
> > l-namenode2.dba.cn8.qunar.com/192.168.39.22:60000 failed because
> > java.net.SocketTimeoutException: 60000 millis timeout while waiting for
> > channel to be ready for read. ch : java.nio.channels.SocketChannel[connected
> > local=/192.168.39.22:47700 remote=l-namenode2.dba.cn8.qunar.com/192.168.39.22:60000]
> > at org.apache.hadoop.hbase.ipc.RpcClient.wrapException(RpcClient.java:1475)
> > at org.apache.hadoop.hbase.ipc.RpcClient.call(RpcClient.java:1450)
> > at org.apache.hadoop.hbase.ipc.RpcClient.callBlockingMethod(RpcClient.java:1650)
> > ... 37 more
> > Caused by: java.net.SocketTimeoutException: 60000 millis timeout while
> > waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected
> > local=/192.168.39.22:47700 remote=l-namenode2.dba.cn8.qunar.com/192.168.39.22:60000]
> > at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:164)
> > at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:161)
> > at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:131)
> > at java.io.FilterInputStream.read(FilterInputStream.java:133)
> > at java.io.FilterInputStream.read(FilterInputStream.java:133)
> > at org.apache.hadoop.hbase.ipc.RpcClient$Connection$PingInputStream.read(RpcClient.java:553)
> > at java.io.BufferedInputStream.fill(BufferedInputStream.java:235)
> > at java.io.BufferedInputStream.read(BufferedInputStream.java:254)
> > at java.io.DataInputStream.readInt(DataInputStream.java:387)
> > at org.apache.hadoop.hbase.ipc.RpcClient$Connection.readResponse(RpcClient.java:1057)
> > at org.apache.hadoop.hbase.ipc.RpcClient$Connection.run(RpcClient.java:719)
> > Fri May 15 12:15:58 CST 2015 Terminating master
> > {/logs}
> > So what does this exception mean, why does it happen, and how can I solve it?
> >
>
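
A note on the timeout itself: the "60000 millis timeout" in the trace above matches the HBase client RPC timeout, `hbase.rpc.timeout`, whose default is 60 s in the 0.96 line. If the master is merely slow (e.g. busy replaying logs after a restart) rather than dead, raising this on the client side is one way to rule the timeout in or out as the problem; the value below is illustrative only:

```xml
<!-- hbase-site.xml on the client side: raise the RPC timeout past the 60 s default -->
<property>
  <name>hbase.rpc.timeout</name>
  <value>120000</value> <!-- 120 s; illustrative value, tune to your environment -->
</property>
```

This only papers over a slow master; if the master RPC port is unreachable or the process has died, no timeout value will help.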

Re: HMaster restart with error

Posted by Ted Yu <yu...@gmail.com>.
Can you check the server log on 192.168.39.22
<http://l-namenode2.dba.cn8.qunar.com/192.168.39.22:60000> ?

That should give you some clue.

Cheers

On Fri, May 15, 2015 at 8:22 PM, Louis Hust <lo...@gmail.com> wrote:

> Hi all,
>
> I am using HBase 0.96.0 with Hadoop 2.2.0,
> and a customer reported that they could not write into the HBase cluster,
> so I stopped the HMaster and restarted it shortly afterwards.
>
> But it seems the HMaster does not respond to requests; the following is the
> HMaster log:
>
> {logs}
> [log snipped; identical to the log quoted earlier in this thread, ending with
> "Fri May 15 12:15:58 CST 2015 Terminating master"]
> {/logs}
> So what does this exception mean, why does it happen, and how can I solve it?
>
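
For readers who land here with the same "60000 millis timeout while waiting for channel to be ready for read" message: this is a plain socket read timeout surfacing through HBase's RPC client. The TCP connection to the master's port 60000 succeeded, but no response bytes arrived within `hbase.rpc.timeout`, so the blocked read gave up. A minimal, self-contained sketch (plain `java.net`, deliberately not HBase code) reproduces the mechanism: a server accepts the connection but never writes, so the client's read fails with the same exception class seen in the master log.

```java
import java.io.InputStream;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

// Demo of the failure mode in the trace: the TCP connection succeeds, but the
// peer never sends a response, so a blocking read() gives up after SO_TIMEOUT
// with a SocketTimeoutException.
public class TimeoutDemo {

    // Connects to a local server socket that accepts but never writes; returns
    // true if the read timed out within the given SO_TIMEOUT (milliseconds).
    static boolean readTimesOut(int soTimeoutMillis) throws Exception {
        try (ServerSocket silentServer = new ServerSocket(0);   // ephemeral port, never writes
             Socket client = new Socket("127.0.0.1", silentServer.getLocalPort())) {
            client.setSoTimeout(soTimeoutMillis);               // HBase's equivalent is 60 000 ms
            InputStream in = client.getInputStream();
            try {
                in.read();      // blocks: no data will ever arrive
                return false;   // unreachable in practice
            } catch (SocketTimeoutException expected) {
                return true;    // same exception class as in the master log
            }
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("read timed out: " + readTimesOut(500));
    }
}
```

In the thread's case the read was against the master's RPC port, so the question is why the master stopped answering; the demo only shows what the client-side symptom means.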