
Posted to user@hbase.apache.org by "shanmuganathan.r" <sh...@zohocorp.com> on 2011/08/08 13:41:34 UTC

How can I test the multi-master environment?

Hi All,

      I have a problem with my four-node HBase cluster running in fully distributed mode. I am using two masters in my configuration: one is the active master and the other is the backup master.




i) If I stop HBase using the stop-hbase.sh command, the following is printed at the end of my master log:



2011-08-08 16:05:04,897 INFO org.apache.hadoop.hbase.master.AssignmentManager$TimeoutMonitor: rohinis.zohocorpin.com:60000.timeoutMonitor exiting
2011-08-08 16:05:04,897 INFO org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation: Closed zookeeper sessionid=0x231a8f181f60000
2011-08-08 16:05:04,907 INFO org.apache.zookeeper.ClientCnxn: EventThread shut down
2011-08-08 16:05:04,907 INFO org.apache.zookeeper.ZooKeeper: Session: 0x231a8f181f60000 closed
2011-08-08 16:05:04,914 INFO org.apache.zookeeper.ClientCnxn: EventThread shut down
2011-08-08 16:05:04,915 INFO org.apache.zookeeper.ZooKeeper: Session: 0x131a8f11a570000 closed
2011-08-08 16:05:04,915 INFO org.apache.hadoop.hbase.master.HMaster: HMaster main thread exiting


-------------------------
 
ii) If I kill the master process using kill 15914 or kill -9 15914, nothing is printed in my master log.


-------------------------


iii) If I stop the master using the ./bin/hbase-daemon.sh stop master command, the following is printed at the end of my master log:


2011-08-08 16:46:03,035 INFO org.apache.hadoop.hbase.master.AssignmentManager: Bulk assigning done
2011-08-08 16:46:03,037 INFO org.apache.hadoop.hbase.master.HMaster: Master has completed initialization
2011-08-08 16:46:03,045 DEBUG org.apache.hadoop.hbase.master.CatalogJanitor: Scanned 1 catalog row(s) and gc'd 0 unreferenced parent region(s)
Mon Aug  8 16:49:54 IST 2011 Killing master



--------------------------


In case (i), the whole HBase cluster is stopped.


In case (ii), only the master is killed, but the region servers do not switch over to the backup master, and the backup master keeps waiting for the ZNode to be written.


In case (iii), again only the master is killed, but the region servers do not switch over to the backup master, and the backup master keeps waiting for the ZNode to be written.


In cases (ii) and (iii), is the master properly killed?
If the master is properly killed, then why are the region servers unable to connect to the backup master?
If the master is not properly killed, then how should I kill the master process to test this environment?






-----------------------------


My region server log after kill -9 of the master process:


2011-08-08 16:48:20,987 INFO org.apache.hadoop.ipc.HBaseServer: PRI IPC Server handler 9 on 60020: starting
2011-08-08 16:48:20,987 INFO org.apache.hadoop.hbase.regionserver.StoreFile: Allocating LruBlockCache with maximum size 199.4m
2011-08-08 16:48:23,901 INFO org.apache.hadoop.hbase.zookeeper.MetaNodeTracker: Detected completed assignment of META, notifying catalog tracker
2011-08-08 16:48:23,934 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Received request to open 0 region(s)
2011-08-08 16:53:18,263 WARN org.apache.hadoop.hbase.regionserver.HRegionServer: Unable to connect to master. Retrying. Error was:
java.net.ConnectException: Connection refused
        at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
        at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:592)
        at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
        at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:404)
        at org.apache.hadoop.hbase.ipc.HBaseClient$Connection.setupIOstreams(HBaseClient.java:328)
        at org.apache.hadoop.hbase.ipc.HBaseClient.getConnection(HBaseClient.java:883)
        at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:750)
        at org.apache.hadoop.hbase.ipc.HBaseRPC$Invoker.invoke(HBaseRPC.java:257)
        at $Proxy5.getProtocolVersion(Unknown Source)
        at org.apache.hadoop.hbase.ipc.HBaseRPC.getProxy(HBaseRPC.java:419)
        at org.apache.hadoop.hbase.ipc.HBaseRPC.getProxy(HBaseRPC.java:393)
        at org.apache.hadoop.hbase.ipc.HBaseRPC.getProxy(HBaseRPC.java:444)
        at org.apache.hadoop.hbase.ipc.HBaseRPC.waitForProxy(HBaseRPC.java:349)
        at org.apache.hadoop.hbase.regionserver.HRegionServer.getMaster(HRegionServer.java:1445)
        at org.apache.hadoop.hbase.regionserver.HRegionServer.tryRegionServerReport(HRegionServer.java:737)
        at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:586)
        at java.lang.Thread.run(Thread.java:636)
2011-08-08 16:53:20,992 DEBUG org.apache.hadoop.hbase.io.hfile.LruBlockCache: LRU Stats: total=957.86 KB, free=198.43 MB, max=199.36 MB, blocks=0, accesses=0, hits=0, hitRatio=�%, cachingAccesses=0, cachingHits=0, cachingHitsRatio=�%, evictions=0, evicted=0, evictedPerRun=NaN
2011-08-08 16:54:21,349 WARN org.apache.hadoop.hbase.regionserver.HRegionServer: Unable to connect to master. Retrying. Error was:
java.net.ConnectException: Connection refused




---------------------------


My backup master log, the whole time:


2011-08-08 16:48:25,697 INFO org.apache.hadoop.hbase.metrics: MetricsString added: url
2011-08-08 16:48:25,697 INFO org.apache.hadoop.hbase.metrics: MetricsString added: version
2011-08-08 16:48:25,697 INFO org.apache.hadoop.hbase.metrics: new MBeanInfo
2011-08-08 16:48:25,697 INFO org.apache.hadoop.hbase.metrics: new MBeanInfo
2011-08-08 16:48:25,697 INFO org.apache.hadoop.hbase.master.metrics.MasterMetrics: Initialized
2011-08-08 16:48:25,698 DEBUG org.apache.hadoop.hbase.master.HMaster: HMaster started in backup mode.  Stalling until master znode is written.
2011-08-08 16:48:25,698 DEBUG org.apache.hadoop.hbase.master.HMaster: Waiting for master address ZNode to be written (Also watching cluster state node)
2011-08-08 16:51:25,698 DEBUG org.apache.hadoop.hbase.master.HMaster: Waiting for master address ZNode to be written (Also watching cluster state node)
2011-08-08 16:54:25,698 DEBUG org.apache.hadoop.hbase.master.HMaster: Waiting for master address ZNode to be written (Also watching cluster state node)
2011-08-08 16:57:25,698 DEBUG org.apache.hadoop.hbase.master.HMaster: Waiting for master address ZNode to be written (Also watching cluster state node)





Thanks in advance for your valuable suggestions.




Regards,

Shanmuganathan

Re: How can I test the multi-master environment?

Posted by "shanmuganathan.r" <sh...@zohocorp.com>.

Hi Stack,

      Thank you for your reply.



---- On Tue, 09 Aug 2011 03:57:30 +0530 Stack <stack@duboce.net> wrote ----


On Mon, Aug 8, 2011 at 4:41 AM, shanmuganathan.r
<shanmuganathan.r@zohocorp.com> wrote:
> In the (i) case the whole hbase cluster is stopped.
>

Yes.

>
> In the (ii) case the master only killed but the Regionservers are not assign to the backup master and the backup master is waiting for ZNode to be written
>
 
This is not right. What's supposed to happen is that the first
master's znode is meant to expire in zk and then the second master
assumes the original's role. Did the first master come up fine? Did you
wait long enough for the first master's znode to expire?
 
What version of hbase? 
I am using HBase 0.90.1. How long does it take for the first master's ZNode to expire?
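A rough way to reason about the timing question, assuming a standard ZooKeeper server: HBase asks ZooKeeper for the session timeout configured in zookeeper.session.timeout, and the ZooKeeper server clamps that request into the range [minSessionTimeout, maxSessionTimeout], which default to 2 and 20 times tickTime. The sketch below only does that arithmetic. The 180000 ms request and 2000 ms tickTime in main are assumed values, not read from this cluster, and the class and method names are hypothetical. After a kill -9 the master stops heartbeating, so its ephemeral znode should go away roughly one granted timeout after its last heartbeat.

```java
// Sketch: estimate the session timeout a ZooKeeper server will actually grant.
// Assumes minSessionTimeout/maxSessionTimeout are left at their defaults
// (2 x tickTime and 20 x tickTime). Names here are hypothetical.
final class SessionTimeoutEstimate {

    static long negotiatedTimeoutMs(long requestedMs, long tickTimeMs) {
        long minMs = 2 * tickTimeMs;   // default minSessionTimeout
        long maxMs = 20 * tickTimeMs;  // default maxSessionTimeout
        // The server clamps the client's request into [minMs, maxMs].
        return Math.max(minMs, Math.min(maxMs, requestedMs));
    }

    public static void main(String[] args) {
        long requestedMs = 180_000; // assumed zookeeper.session.timeout value
        long tickTimeMs = 2_000;    // assumed ZooKeeper tickTime
        long grantedMs = negotiatedTimeoutMs(requestedMs, tickTimeMs);
        // After kill -9 the master's ephemeral znode should be removed
        // roughly grantedMs after its last heartbeat.
        System.out.println("requested=" + requestedMs + " ms, granted=" + grantedMs + " ms");
        // prints: requested=180000 ms, granted=40000 ms
    }
}
```

If the server's tickTime or its min/max session timeouts are configured explicitly, the clamp bounds change accordingly.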
 
 
>
> In the (iii)  case also the master only killed but the Regionservers are not assign to the backup master and the backup master is waiting for the ZNode to be written
>
>
> In the (ii) and (iii) cases, Is the master properly killed?


Yes. I'd think so.


> If the master is properly killed, than why the region servers are unable to connect to the backup master ?
 
They are trying to connect to the first master and will continue to do so
until the master's address is updated in zk when the secondary assumes
the master role.
 
St.Ack 
Regards


Shanmuganathan

Re: How can I test the multi-master environment?

Posted by Stack <st...@duboce.net>.

On Mon, Aug 8, 2011 at 4:41 AM, shanmuganathan.r
<sh...@zohocorp.com> wrote:
> In the (i) case the whole hbase cluster is stopped.
>

Yes.

>
> In the (ii) case the master only killed but the Regionservers are not assign to the backup master and the backup master is waiting for ZNode to be written
>

This is not right.  What's supposed to happen is that the first
master's znode is meant to expire in zk and then the second master
assumes the original's role.  Did the first master come up fine?  Did you
wait long enough for the first master's znode to expire?
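The failover described above can be modeled with a toy, in-memory stand-in for ZooKeeper. ToyZk below is hypothetical and is not the ZooKeeper client API: the active master holds an ephemeral master znode tied to its session, a kill -9 stops its heartbeats, and only after that session times out is the znode deleted so the backup can create its own. The znode path /hbase/master, the host names, and the timeout values are assumptions for illustration. A real backup master is notified through a ZooKeeper watch rather than re-checking as this sketch does.

```java
import java.util.HashMap;
import java.util.Map;

final class FailoverDemo {
    static final String MASTER_ZNODE = "/hbase/master"; // assumed path, for illustration only

    // Returns the master address visible atMs after the active master's last heartbeat (at 0).
    static String masterAfter(long atMs, long timeoutMs) {
        ToyZk zk = new ToyZk(timeoutMs);
        long active = 1, backup = 2; // hypothetical session ids
        zk.heartbeat(active, 0);
        zk.heartbeat(backup, 0);
        zk.createEphemeral(MASTER_ZNODE, "master-a:60000", active);
        // kill -9: master-a stops heartbeating, master-b keeps heartbeating.
        zk.heartbeat(backup, atMs);
        zk.expireSessions(atMs);
        // The backup's attempt only succeeds once the old ephemeral node is gone.
        zk.createEphemeral(MASTER_ZNODE, "master-b:60000", backup);
        return zk.get(MASTER_ZNODE);
    }

    public static void main(String[] args) {
        System.out.println(masterAfter(10_000, 40_000)); // master-a:60000 (session not expired yet)
        System.out.println(masterAfter(45_000, 40_000)); // master-b:60000 (backup took over)
    }
}

// Toy stand-in for ZooKeeper sessions and ephemeral nodes (hypothetical, not the real API).
final class ToyZk {
    private final long sessionTimeoutMs;
    private final Map<Long, Long> lastHeartbeat = new HashMap<>();   // session -> last heartbeat time
    private final Map<String, Long> ephemeralOwner = new HashMap<>(); // path -> owning session
    private final Map<String, String> data = new HashMap<>();        // path -> stored value

    ToyZk(long sessionTimeoutMs) { this.sessionTimeoutMs = sessionTimeoutMs; }

    void heartbeat(long sessionId, long nowMs) { lastHeartbeat.put(sessionId, nowMs); }

    // Creates the node if absent; returns false if another session already owns it.
    boolean createEphemeral(String path, String value, long sessionId) {
        if (ephemeralOwner.containsKey(path)) return false;
        ephemeralOwner.put(path, sessionId);
        data.put(path, value);
        return true;
    }

    String get(String path) { return data.get(path); }

    // Expires sessions silent for longer than the timeout and deletes their ephemeral nodes.
    void expireSessions(long nowMs) {
        lastHeartbeat.entrySet().removeIf(e -> nowMs - e.getValue() > sessionTimeoutMs);
        ephemeralOwner.entrySet().removeIf(e -> {
            boolean dead = !lastHeartbeat.containsKey(e.getValue());
            if (dead) data.remove(e.getKey());
            return dead;
        });
    }
}
```

The key point the sketch encodes is that killing the process does not delete the znode by itself; the session has to time out first.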

What version of hbase?

>
> In the (iii)  case also the master only killed but the Regionservers are not assign to the backup master and the backup master is waiting for the ZNode to be written
>
>
> In the (ii) and (iii) cases, Is the master properly killed?

Yes. I'd think so.

> If the master is properly killed, than why the region servers are unable to connect to the backup master ?

They are trying to connect to the first master and will continue to do so
until the master's address is updated in zk when the secondary assumes
the master role.
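The region server behavior described above can be sketched as a simple retry loop. This is a hypothetical model, not HRegionServer's actual code: each attempt uses whatever master address is currently published, and the connection only succeeds once that address belongs to a live master. Each failed attempt corresponds to an "Unable to connect to master. Retrying." line like the ones in the region server log earlier in the thread. The host names and the per-attempt address list are assumptions for illustration.

```java
import java.util.List;
import java.util.Set;

final class ReportLoopDemo {
    // Hypothetical model: publishedAddressPerAttempt.get(i) is the master address
    // the region server sees on attempt i; liveMasters are the masters actually running.
    static int attemptsUntilConnected(List<String> publishedAddressPerAttempt, Set<String> liveMasters) {
        for (int attempt = 0; attempt < publishedAddressPerAttempt.size(); attempt++) {
            String addr = publishedAddressPerAttempt.get(attempt);
            if (liveMasters.contains(addr)) {
                return attempt + 1; // connected on this attempt
            }
            // Otherwise: connection refused, log a warning and retry.
        }
        return -1; // never connected within the given attempts
    }

    public static void main(String[] args) {
        Set<String> live = Set.of("master-b:60000"); // master-a was killed
        // The published address still names master-a until the backup writes its own.
        List<String> published = List.of("master-a:60000", "master-a:60000", "master-a:60000", "master-b:60000");
        System.out.println(attemptsUntilConnected(published, live)); // 4
    }
}
```

As long as the published address never changes (the old znode has not expired), the loop in this model never succeeds, which matches the repeated warnings in the log.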

St.Ack