Posted to common-commits@hadoop.apache.org by Apache Wiki <wi...@apache.org> on 2009/11/18 00:37:12 UTC

[Hadoop Wiki] Update of "Hbase/MasterRewrite" by stack

The "Hbase/MasterRewrite" page has been changed by stack.
http://wiki.apache.org/hadoop/Hbase/MasterRewrite?action=diff&rev1=14&rev2=15

--------------------------------------------------

  Current thinking is to keep region lifecycle all up in zookeeper but that won't scale.  Postulate 100k regions -- 100TB at 1G regions -- each with two or three possible states each with watchers for state change.  My guess is that this is too much to put in zk (Mahadev+Patrick say no if data is small).  TODO: how to manage transition from zk to .META.?  Also, can't do getClosest up in zk, only in .META.
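
The sizing above can be checked with a quick back-of-envelope calculation. This is only a sketch: it uses decimal units (1 TB = 1000 GB) and assumes, purely as an upper bound, that every possible state of every region is tracked as its own watched znode. The page does not say that is how states would be stored.

```python
# Back-of-envelope sizing for the figures quoted above.
REGIONS = 100_000            # postulated region count
REGION_SIZE_GB = 1           # "1G regions"
STATES_PER_REGION = (2, 3)   # "two or three possible states"

# Decimal units: 1 TB = 1000 GB.
total_tb = REGIONS * REGION_SIZE_GB / 1000

# Assumption (upper bound): one watched znode per region per state.
state_znodes = tuple(REGIONS * s for s in STATES_PER_REGION)

print(f"data: {total_tb:.0f} TB")
print(f"watched state znodes: {state_znodes[0]:,} to {state_znodes[1]:,}")
```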
  
  ===== Design =====
- Here is [[http://wiki.apache.org/hadoop/ZooKeeper/HBaseUseCases#case2|Patrick's suggestion]].  We already keep a znode per regionserver, though it's named for the regionserver's startcode.  On evaporation of the regionserver ephemeral node, master would run a reconciliation (or on assumption of the master role, a new master would check state in zk, making sure there is a regionserver per region), adding unassigned regions back to the unassigned pool.
+ Here is [[http://wiki.apache.org/hadoop/ZooKeeper/HBaseUseCases#case2|Patrick's suggestion]].  We already keep a znode per regionserver, though it's named for the regionserver's startcode -- see the 'rs' directory in 0.20.x zookeepers.  On evaporation of the regionserver ephemeral node, master would run a reconciliation (or on assumption of the master role, a new master would check state in zk, making sure there is a regionserver per region, etc.).
  
- All regions would be listed in .META. table always.  Whether they are online, splitting or closing, etc., would be up in zk.
+ All regions would be listed in .META. table always.  Whether they are online, splitting or closing, etc., would be up in zk.  So, figuring out whether something is unassigned would be a case of a .META. table scan.  Anything not managed by zk needs to be added in there (assigned).
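
One way to picture the "scan .META., assign whatever zk does not manage" idea is as a set difference. The sketch below is illustrative only: `find_unassigned` and `reconcile_after_rs_loss` are hypothetical helper names, the region list stands in for a real .META. scan, and the dict stands in for what a master would read from zk. No HBase or ZooKeeper API is used.

```python
def find_unassigned(meta_regions, zk_assignments):
    """Return regions listed in .META. that no regionserver in zk holds.

    meta_regions:   iterable of region names (stand-in for a .META. scan).
    zk_assignments: dict of regionserver id -> set of region names
                    (stand-in for the per-regionserver znodes in zk).
    """
    managed = set()
    for regions in zk_assignments.values():
        managed.update(regions)
    return sorted(set(meta_regions) - managed)


def reconcile_after_rs_loss(meta_regions, zk_assignments, dead_rs):
    """Drop a vanished regionserver, then recompute what is unassigned."""
    remaining = {rs: regions for rs, regions in zk_assignments.items()
                 if rs != dead_rs}
    return find_unassigned(meta_regions, remaining)


# Hypothetical data: three regions, two regionservers.
meta = ["regionA", "regionB", "regionC"]
zk = {
    "host1:60020:111": {"regionA"},
    "host2:60020:222": {"regionB"},
}
print(find_unassigned(meta, zk))
print(reconcile_after_rs_loss(meta, zk, "host2:60020:222"))
```

With 100k regions the scan itself is the expensive part; the difference is cheap once both sides are in memory.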
+ 
+ ====== zk layout ======
+ Here is some cleanup of [[http://wiki.apache.org/hadoop/ZooKeeper/HBaseUseCases#case2|Patrick's suggestion]]
+ 
+ {{{
+ # First, redo the current 'rs' directory slightly:
+ /hbase/regionservers # master watches /regionservers for any child changes
+ /hbase/regionserver/<host:port:startcode> = <status> # As each region server becomes available to do work (or track state if up but not avail) it creates an ephemeral node; writes state (up/down).
+ # Master watches all /regionserver/<host:port:startcode> and cleans up if RS goes away or changes status
+ 
+ # Now, for regions
+ /hbase/regions/<regionserver by host:port:startcode> # Gets created when master notices new region server
+ # RS host:port watches this node for any child changes 
+ 
+ /hbase/regions/<regionserver by host:port:startcode>/<regionXYZ> # znode for each region assigned to RS host:port.
+ # RS host:port watches this node in case reassigned by master, or region changes state 
+ 
+ #
+ /tables/<regionserver by host:port:startcode>/<regionXYZ>/<state>-<seq#> # znode created by master
+ # seq ensures order seen by RS
+ # RS deletes old state znodes as it transitions out; the oldest remaining entry is the current state, so there is always 1 or more znode here
+ }}}
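
To make the layout above concrete, here is a small sketch that builds the znode paths and picks the current state from a list of `<state>-<seq#>` children. The helper names and sample values are hypothetical. The zero-padded 10-digit suffix matches how ZooKeeper names sequential znodes, but whether the master would create these as sequential znodes is an assumption. The paths, including the `/tables` prefix on the state znode, are copied from the layout as written.

```python
def rs_id(host, port, startcode):
    """Regionserver identifier in the host:port:startcode form."""
    return f"{host}:{port}:{startcode}"


def region_znode(rs, region):
    """znode for a region assigned to a regionserver."""
    return f"/hbase/regions/{rs}/{region}"


def state_znode(rs, region, state, seq):
    """State znode created by the master; seq orders the transitions."""
    return f"/tables/{rs}/{region}/{state}-{seq:010d}"


def current_state(children):
    """Return the state of the oldest (lowest seq) child znode.

    children: names of the form '<state>-<seq#>'. The layout says there
    is always at least one, so an empty list is treated as an error.
    """
    if not children:
        raise ValueError("expected at least one state znode")
    # Split on the last '-' so state names containing '-' still parse.
    parsed = [name.rsplit("-", 1) for name in children]
    state, _ = min(parsed, key=lambda pair: int(pair[1]))
    return state


rs = rs_id("host1", 60020, 1258504632)
print(region_znode(rs, "regionXYZ"))
print(state_znode(rs, "regionXYZ", "open", 7))
print(current_state(["closing-0000000009", "open-0000000007"]))
```

Comparing the numeric suffix rather than sorting names matters here, because the state prefix differs between children and a plain string sort would order by state name first.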
+ 
+ ====== Questions ======
+ 
+ Should the region znode have state?  E.g. no flush, no compaction so we could do a backup by copying a region at a time?
  
  <<Anchor(clean)>>