Posted to user@hbase.apache.org by Gaojinchao <ga...@huawei.com> on 2011/04/07 05:10:16 UTC

Master can't exit when open port failed

When the HMaster crashes and is restarted, the new HMaster hangs.

    // Start up all service threads.
    startServiceThreads();    // <-- opening the port failed here

    // Wait for region servers to report in.  Returns count of regions.
    int regionCount = this.serverManager.waitForRegionServers();

    // TODO: Should do this in background rather than block master startup
    this.fileSystemManager.
      splitLogAfterStartup(this.serverManager.getOnlineServers());

    // Make sure root and meta assigned before proceeding.
    assignRootAndMeta();      // <-- hangs in this function because root can't be assigned

    // From assignRootAndMeta():
    if (!catalogTracker.verifyRootRegionLocation(timeout)) {
      this.assignmentManager.assignRoot();
      this.catalogTracker.waitForRoot();    // <-- this call hangs
      assigned++;
    }

The log is as follows:

2011-04-07 16:38:22,850 INFO org.mortbay.log: Logging to org.slf4j.impl.Log4jLoggerAdapter(org.mortbay.log) via org.mortbay.log.Slf4jLog
2011-04-07 16:38:22,908 INFO org.apache.hadoop.http.HttpServer: Port returned by webServer.getConnectors()[0].getLocalPort() before open() is -1. Opening the listener on 60010
2011-04-07 16:38:22,909 FATAL org.apache.hadoop.hbase.master.HMaster: Failed startup
java.net.BindException: Address already in use
         at sun.nio.ch.Net.bind(Native Method)
         at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:119)
         at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:59)
         at org.mortbay.jetty.nio.SelectChannelConnector.open(SelectChannelConnector.java:216)
         at org.apache.hadoop.http.HttpServer.start(HttpServer.java:445)
         at org.apache.hadoop.hbase.master.HMaster.startServiceThreads(HMaster.java:542)
         at org.apache.hadoop.hbase.master.HMaster.finishInitialization(HMaster.java:373)
         at org.apache.hadoop.hbase.master.HMaster.run(HMaster.java:278)
2011-04-07 16:38:22,910 INFO org.apache.hadoop.hbase.master.HMaster: Aborting
2011-04-07 16:38:22,911 INFO org.apache.hadoop.hbase.master.ServerManager: Exiting wait on regionserver(s) to checkin; count=0, stopped=true, count of regions out on cluster=0
2011-04-07 16:38:22,914 DEBUG org.apache.hadoop.hbase.master.MasterFileSystem: No log files to split, proceeding...
2011-04-07 16:38:22,930 INFO org.apache.hadoop.ipc.HbaseRPC: Server at 167-6-1-12/167.6.1.12:60020 could not be reached after 1 tries, giving up.
2011-04-07 16:38:22,930 INFO org.apache.hadoop.hbase.catalog.RootLocationEditor: Unsetting ROOT region location in ZooKeeper
2011-04-07 16:38:22,941 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: master:60000-0x22f2c49d2590021 Creating (or updating) unassigned node for 70236052 with OFFLINE state
2011-04-07 16:38:22,956 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Server stopped; skipping assign of -ROOT-,,0.70236052 state=OFFLINE, ts=1302165502941
2011-04-07 16:38:32,746 INFO org.apache.hadoop.hbase.master.AssignmentManager$TimeoutMonitor: 167-6-1-11:60000.timeoutMonitor exiting
2011-04-07 16:39:22,770 INFO org.apache.hadoop.hbase.master.LogCleaner: master-167-6-1-11:60000.oldLogCleaner exiting
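
One way to read the sequence and log above: the master was already marked as stopped when the root assignment was skipped ("Server stopped; skipping assign of -ROOT-"), so a wait that only returns once root is assigned can never finish. Below is a minimal Java sketch of a wait that also gives up when the server has been stopped. The names StoppableWaitSketch, markAssigned and stop are hypothetical, made up for illustration; this is not the HBase CatalogTracker API.

```java
final class StoppableWaitSketch {
    private final Object lock = new Object();
    private boolean rootAssigned = false;  // guarded by lock
    private boolean stopped = false;       // guarded by lock

    /** Called once the region has been assigned; wakes up any waiter. */
    void markAssigned() {
        synchronized (lock) {
            rootAssigned = true;
            lock.notifyAll();
        }
    }

    /** Called on abort; wakes up any waiter so it can give up. */
    void stop() {
        synchronized (lock) {
            stopped = true;
            lock.notifyAll();
        }
    }

    /**
     * Returns true if root was assigned, false if the server was stopped
     * (or the waiting thread was interrupted) first.
     */
    boolean waitForRoot() {
        synchronized (lock) {
            while (!rootAssigned && !stopped) {
                try {
                    lock.wait(100L);  // re-check both flags periodically
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return false;
                }
            }
            return rootAssigned;
        }
    }

    public static void main(String[] args) {
        StoppableWaitSketch tracker = new StoppableWaitSketch();
        tracker.stop();  // simulate the abort after the failed port bind
        System.out.println("assigned=" + tracker.waitForRoot());
    }
}
```

With a check like this the startup thread would fall out of the wait after an abort instead of blocking; whether the real fix belongs in the wait or earlier in the startup path is a separate question.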

Re: Master can't exit when open port failed

Posted by Gaojinchao <ga...@huawei.com>.
Thanks.

Yes, a monitor process keeps the master running: if the master crashes, it waits 1 to 2 minutes and then starts it again.

This bug is triggered because the HMaster crashed and its port had not been released yet; releasing it may take a few minutes.

I think the exception should be caught for finishInitialization, and startServiceThreads should throw the exception.
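
A rough Java sketch of that suggestion, as I understand it: the service-start step lets the failure propagate, and the caller catches it and stops, so none of the later startup steps run. The class below only borrows the method names from the snippet for readability; it is a hypothetical stand-in, not the real HMaster code, and the portFree flag is a made-up way to simulate the bind failure.

```java
import java.io.IOException;
import java.net.BindException;

final class StartupAbortSketch {
    private volatile boolean stopped = false;
    private int stepsRun = 0;

    /** Stand-in for starting the services: throws instead of swallowing the error. */
    void startServiceThreads(boolean portFree) throws IOException {
        if (!portFree) {
            throw new BindException("Address already in use");
        }
    }

    /** Later steps are only reached if the services started. */
    void finishInitialization(boolean portFree) throws IOException {
        startServiceThreads(portFree);
        stepsRun++;  // stands in for waiting on region servers, assigning root, etc.
    }

    /** Returns true if startup completed, false if it was aborted. */
    boolean run(boolean portFree) {
        try {
            finishInitialization(portFree);
            return true;
        } catch (IOException e) {
            stopped = true;  // abort: nothing after the failed step has run
            return false;
        }
    }

    boolean isStopped() { return stopped; }

    int stepsRun() { return stepsRun; }

    public static void main(String[] args) {
        StartupAbortSketch master = new StartupAbortSketch();
        boolean started = master.run(false);  // simulate the port being in use
        System.out.println("started=" + started + ", stepsRun=" + master.stepsRun());
    }
}
```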

About port use: can HBase use the port reuse feature?
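
For reference, this is roughly what port reuse looks like with a plain java.net.ServerSocket: SO_REUSEADDR is set on an unbound socket before bind(). This is only a sketch of the standard Java socket API; the class and method names are hypothetical, and whether and where HBase or its embedded web server exposes such a setting is not something I have checked.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.ServerSocket;

final class ReuseAddressSketch {
    /** Opens a listening socket with SO_REUSEADDR set before bind(). */
    static ServerSocket openReusable(int port) throws IOException {
        ServerSocket socket = new ServerSocket();  // created unbound
        try {
            socket.setReuseAddress(true);          // must be set before bind()
            socket.bind(new InetSocketAddress(port));
            return socket;
        } catch (IOException e) {
            socket.close();                        // do not leak the socket on failure
            throw e;
        }
    }

    public static void main(String[] args) throws IOException {
        // Port 0 asks the OS for any free port, so the example runs anywhere.
        try (ServerSocket socket = openReusable(0)) {
            System.out.println("listening on port " + socket.getLocalPort()
                + ", reuseAddress=" + socket.getReuseAddress());
        }
    }
}
```

Note that SO_REUSEADDR generally helps when old connections on the port are lingering in TIME_WAIT; it does not help if the previous process is still alive and listening on the port.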


-----Original Message-----
From: saint.ack@gmail.com [mailto:saint.ack@gmail.com] On Behalf Of Stack
Sent: April 7, 2011 11:48
To: user@hbase.apache.org
Cc: Gaojinchao; Chenjian
Subject: Re: Master can't exit when open port failed

Can you easily reproduce?  It looks like the previous incarnation of
the Master had not shutdown before the new one started up.  Do you
have some kind of trigger-happy process babysitter running keeping an
eye over the master process?

St.Ack

2011/4/6 Gaojinchao <ga...@huawei.com>:
> When Hmaster crashed  and restart , The Hmaster is hung up.
>
>    // start up all service threads.
>    startServiceThreads();                                                                 ----this open port failed!
>
>    // Wait for region servers to report in.  Returns count of regions.
>    int regionCount = this.serverManager.waitForRegionServers();
>
>    // TODO: Should do this in background rather than block master startup
>    this.fileSystemManager.
>      splitLogAfterStartup(this.serverManager.getOnlineServers());
>
>    // Make sure root and meta assigned before proceeding.
> assignRootAndMeta();                                                               --- hung up this function, because of root can't be assigned.
>
>  if (!catalogTracker.verifyRootRegionLocation(timeout)) {
>      this.assignmentManager.assignRoot();
>      this.catalogTracker.waitForRoot();                                           --- This statement code is hung up.
>      assigned++;
> }
>
> Log is as:
>
> 2011-04-07 16:38:22,850 INFO org.mortbay.log: Logging to org.slf4j.impl.Log4jLoggerAdapter(org.mortbay.log) via org.mortbay.log.Slf4jLog
> 2011-04-07 16:38:22,908 INFO org.apache.hadoop.http.HttpServer: Port returned by webServer.getConnectors()[0].getLocalPort() before open() is -1. Opening the listener on 60010
> 2011-04-07 16:38:22,909 FATAL org.apache.hadoop.hbase.master.HMaster: Failed startup
> java.net.BindException: Address already in use
>         at sun.nio.ch.Net.bind(Native Method)
>         at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:119)
>         at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:59)
>         at org.mortbay.jetty.nio.SelectChannelConnector.open(SelectChannelConnector.java:216)
>         at org.apache.hadoop.http.HttpServer.start(HttpServer.java:445)
>         at org.apache.hadoop.hbase.master.HMaster.startServiceThreads(HMaster.java:542)
>         at org.apache.hadoop.hbase.master.HMaster.finishInitialization(HMaster.java:373)
>         at org.apache.hadoop.hbase.master.HMaster.run(HMaster.java:278)
> 2011-04-07 16:38:22,910 INFO org.apache.hadoop.hbase.master.HMaster: Aborting
> 2011-04-07 16:38:22,911 INFO org.apache.hadoop.hbase.master.ServerManager: Exiting wait on regionserver(s) to checkin; count=0, stopped=true, count of regions out on cluster=0
> 2011-04-07 16:38:22,914 DEBUG org.apache.hadoop.hbase.master.MasterFileSystem: No log files to split, proceeding...
> 2011-04-07 16:38:22,930 INFO org.apache.hadoop.ipc.HbaseRPC: Server at 167-6-1-12/167.6.1.12:60020 could not be reached after 1 tries, giving up.
> 2011-04-07 16:38:22,930 INFO org.apache.hadoop.hbase.catalog.RootLocationEditor: Unsetting ROOT region location in ZooKeeper
> 2011-04-07 16:38:22,941 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: master:60000-0x22f2c49d2590021 Creating (or updating) unassigned node for 70236052 with OFFLINE state
> 2011-04-07 16:38:22,956 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Server stopped; skipping assign of -ROOT-,,0.70236052 state=OFFLINE, ts=1302165502941
> 2011-04-07 16:38:32,746 INFO org.apache.hadoop.hbase.master.AssignmentManager$TimeoutMonitor: 167-6-1-11:60000.timeoutMonitor exiting
> 2011-04-07 16:39:22,770 INFO org.apache.hadoop.hbase.master.LogCleaner: master-167-6-1-11:60000.oldLogCleaner exiting
>

Re: Master can't exit when open port failed

Posted by Stack <st...@duboce.net>.
Can you easily reproduce?  It looks like the previous incarnation of
the Master had not shutdown before the new one started up.  Do you
have some kind of trigger-happy process babysitter running keeping an
eye over the master process?

St.Ack

2011/4/6 Gaojinchao <ga...@huawei.com>:
> When Hmaster crashed  and restart , The Hmaster is hung up.
>
>    // start up all service threads.
>    startServiceThreads();                                                                 ----this open port failed!
>
>    // Wait for region servers to report in.  Returns count of regions.
>    int regionCount = this.serverManager.waitForRegionServers();
>
>    // TODO: Should do this in background rather than block master startup
>    this.fileSystemManager.
>      splitLogAfterStartup(this.serverManager.getOnlineServers());
>
>    // Make sure root and meta assigned before proceeding.
> assignRootAndMeta();                                                               --- hung up this function, because of root can't be assigned.
>
>  if (!catalogTracker.verifyRootRegionLocation(timeout)) {
>      this.assignmentManager.assignRoot();
>      this.catalogTracker.waitForRoot();                                           --- This statement code is hung up.
>      assigned++;
> }
>
> Log is as:
>
> 2011-04-07 16:38:22,850 INFO org.mortbay.log: Logging to org.slf4j.impl.Log4jLoggerAdapter(org.mortbay.log) via org.mortbay.log.Slf4jLog
> 2011-04-07 16:38:22,908 INFO org.apache.hadoop.http.HttpServer: Port returned by webServer.getConnectors()[0].getLocalPort() before open() is -1. Opening the listener on 60010
> 2011-04-07 16:38:22,909 FATAL org.apache.hadoop.hbase.master.HMaster: Failed startup
> java.net.BindException: Address already in use
>         at sun.nio.ch.Net.bind(Native Method)
>         at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:119)
>         at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:59)
>         at org.mortbay.jetty.nio.SelectChannelConnector.open(SelectChannelConnector.java:216)
>         at org.apache.hadoop.http.HttpServer.start(HttpServer.java:445)
>         at org.apache.hadoop.hbase.master.HMaster.startServiceThreads(HMaster.java:542)
>         at org.apache.hadoop.hbase.master.HMaster.finishInitialization(HMaster.java:373)
>         at org.apache.hadoop.hbase.master.HMaster.run(HMaster.java:278)
> 2011-04-07 16:38:22,910 INFO org.apache.hadoop.hbase.master.HMaster: Aborting
> 2011-04-07 16:38:22,911 INFO org.apache.hadoop.hbase.master.ServerManager: Exiting wait on regionserver(s) to checkin; count=0, stopped=true, count of regions out on cluster=0
> 2011-04-07 16:38:22,914 DEBUG org.apache.hadoop.hbase.master.MasterFileSystem: No log files to split, proceeding...
> 2011-04-07 16:38:22,930 INFO org.apache.hadoop.ipc.HbaseRPC: Server at 167-6-1-12/167.6.1.12:60020 could not be reached after 1 tries, giving up.
> 2011-04-07 16:38:22,930 INFO org.apache.hadoop.hbase.catalog.RootLocationEditor: Unsetting ROOT region location in ZooKeeper
> 2011-04-07 16:38:22,941 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: master:60000-0x22f2c49d2590021 Creating (or updating) unassigned node for 70236052 with OFFLINE state
> 2011-04-07 16:38:22,956 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Server stopped; skipping assign of -ROOT-,,0.70236052 state=OFFLINE, ts=1302165502941
> 2011-04-07 16:38:32,746 INFO org.apache.hadoop.hbase.master.AssignmentManager$TimeoutMonitor: 167-6-1-11:60000.timeoutMonitor exiting
> 2011-04-07 16:39:22,770 INFO org.apache.hadoop.hbase.master.LogCleaner: master-167-6-1-11:60000.oldLogCleaner exiting
>