Posted to issues@hbase.apache.org by "Zheng Hu (JIRA)" <ji...@apache.org> on 2018/03/15 03:59:00 UTC
[jira] [Commented] (HBASE-20166) Make sure the RS/Master can works fine when using table based replication storage layer

    [ https://issues.apache.org/jira/browse/HBASE-20166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16399886#comment-16399886 ] 

Zheng Hu commented on HBASE-20166:
----------------------------------

There is a serious problem if we start an HBase cluster with table-based replication. The master startup sequence is:
1.  Start active master initialization.
2.  The master waits for the region servers to report in.
3.  The master assigns the meta region to one of the region servers.
4.  The master creates the hbase:replication table if it does not exist.

But an RS must finish initializing the replication source & sink before it finishes startup (this initialization must complete before any region is opened, because we need to listen for WAL events; otherwise replication may lose data). Initializing the source & sink requires reading the hbase:replication table, which is not yet available: the master is waiting for the RS to be ready, and the RS is waiting for hbase:replication to be ready. So we have a deadlock again.
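To make the circular wait explicit, here is a minimal sketch that models the startup steps above as a "waits for" graph and searches it for a cycle. The class name, step labels, and graph edges are hypothetical and chosen only for illustration; they summarize the description in this comment and are not HBase identifiers or APIs.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Illustrative model only: step names are labels, not HBase classes or methods.
class StartupDeadlockSketch {
    // Each key cannot complete until the step in its value list has completed.
    static final Map<String, List<String>> WAITS_FOR = new LinkedHashMap<>();
    static {
        WAITS_FOR.put("master: create hbase:replication",
                Collections.singletonList("master: assign meta region"));
        WAITS_FOR.put("master: assign meta region",
                Collections.singletonList("master: region servers report in"));
        WAITS_FOR.put("master: region servers report in",
                Collections.singletonList("rs: finish startup"));
        WAITS_FOR.put("rs: finish startup",
                Collections.singletonList("rs: init replication source & sink"));
        WAITS_FOR.put("rs: init replication source & sink",
                Collections.singletonList("rs: read hbase:replication"));
        WAITS_FOR.put("rs: read hbase:replication",
                Collections.singletonList("master: create hbase:replication"));
    }

    // Returns the first cycle found (first and last element are equal),
    // or an empty list if the graph has no circular wait.
    static List<String> findCycle(Map<String, List<String>> graph) {
        Set<String> done = new HashSet<>();
        for (String start : graph.keySet()) {
            List<String> cycle = dfs(start, graph, new ArrayList<>(), done);
            if (cycle != null) {
                return cycle;
            }
        }
        return Collections.emptyList();
    }

    // Depth-first search; 'path' holds the steps on the current chain of waits.
    private static List<String> dfs(String node, Map<String, List<String>> graph,
                                    List<String> path, Set<String> done) {
        int idx = path.indexOf(node);
        if (idx >= 0) {
            // The node is already on the current chain, so the chain loops back.
            List<String> cycle = new ArrayList<>(path.subList(idx, path.size()));
            cycle.add(node);
            return cycle;
        }
        if (done.contains(node)) {
            return null; // Fully explored earlier without finding a cycle.
        }
        path.add(node);
        for (String dep : graph.getOrDefault(node, Collections.<String>emptyList())) {
            List<String> cycle = dfs(dep, graph, path, done);
            if (cycle != null) {
                return cycle;
            }
        }
        path.remove(path.size() - 1);
        done.add(node);
        return null;
    }

    public static void main(String[] args) {
        List<String> cycle = findCycle(WAITS_FOR);
        System.out.println(cycle.isEmpty()
                ? "no circular wait"
                : "circular wait: " + String.join(" -> ", cycle));
    }
}
```

In this model, removing any one edge (for example, letting some RS finish startup without reading hbase:replication) breaks the cycle, which is the idea behind the proposal that follows.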

After discussing with [~zghaobac] offline, I'm considering assigning all system tables to an RS that accepts only system-table regions (that RS would skip initializing the replication source and sink).

I've tried starting a mini cluster with hbase.balancer.tablesOnMaster.systemTablesOnly=true & hbase.balancer.tablesOnMaster=true, but it does not seem to work, because the HMaster process currently initializes the master logic first and the region logic afterwards.
 

> Make sure the RS/Master can works fine when using table based replication storage layer
> ---------------------------------------------------------------------------------------
>
>                 Key: HBASE-20166
>                 URL: https://issues.apache.org/jira/browse/HBASE-20166
>             Project: HBase
>          Issue Type: Sub-task
>            Reporter: Zheng Hu
>            Assignee: Zheng Hu
>            Priority: Major
>
> Currently, we cannot set up the HBase cluster because the master lists peers before it finishes initialization. If the master cannot finish initialization, meta cannot come online; on the other hand, if meta cannot come online, listing peers will never succeed when using table-based replication. The result is a deadlock.
> {code}
> 2018-03-09 15:03:50,531 ERROR [M:0;huzheng-xiaomi:46549] helpers.MarkerIgnoringBase(159): ***** ABORTING master huzheng-xiaomi,46549,1520579026550: Unhandled exception. Starting shutdown. *****
> java.io.UncheckedIOException: org.apache.hadoop.hbase.client.RetriesExhaustedException: Can't get the location for replica 0
> 	at org.apache.hadoop.hbase.client.ResultScanner$1.hasNext(ResultScanner.java:55)
> 	at org.apache.hadoop.hbase.replication.TableReplicationPeerStorage.listPeerIds(TableReplicationPeerStorage.java:124)
> 	at org.apache.hadoop.hbase.master.replication.ReplicationPeerManager.create(ReplicationPeerManager.java:335)
> 	at org.apache.hadoop.hbase.master.HMaster.initializeZKBasedSystemTrackers(HMaster.java:737)
> 	at org.apache.hadoop.hbase.master.HMaster.finishActiveMasterInitialization(HMaster.java:830)
> 	at org.apache.hadoop.hbase.master.HMaster.startActiveMasterManager(HMaster.java:2014)
> 	at org.apache.hadoop.hbase.master.HMaster.run(HMaster.java:557)
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)