Posted to common-commits@hadoop.apache.org by Apache Wiki <wi...@apache.org> on 2009/10/20 23:58:57 UTC
[Hadoop Wiki] Update of "Hbase/MasterRewrite" by stack

The "Hbase/MasterRewrite" page has been changed by stack.
http://wiki.apache.org/hadoop/Hbase/MasterRewrite?action=diff&rev1=8&rev2=9

--------------------------------------------------

  = Design Notes for Master Rewrite =
  
- Initial Master Rewrite design came of conversations had at the hbase hackathon held at StumbleUpon, August 5-7, 2009 ([[https://issues.apache.org/jira/secure/attachment/12418561/HBase+Hackathon+Notes+-+Sunday.pdf|Jon Gray kept notes]]).  The umbrella issue for the master rewrite is [[https://issues.apache.org/jira/browse/HBASE-1816|HBASE-1816]].  Timeline is hbase 0.20.1.
+ Initial Master Rewrite design came of conversations had at the hbase hackathon held at StumbleUpon, August 5-7, 2009 ([[https://issues.apache.org/jira/secure/attachment/12418561/HBase+Hackathon+Notes+-+Sunday.pdf|Jon Gray kept notes]]).  The umbrella issue for the master rewrite is [[https://issues.apache.org/jira/browse/HBASE-1816|HBASE-1816]].  Timeline is hbase 0.21.0.
  
  == Table of Contents ==
   * [[#now|What does the Master do now?]]
   * [[#problems|Problems with current Master]]
+  * [[#scope|Design Scope]]
   * [[#design|Design]]
    * [[#all|Move all region state transitions to zookeeper]]
    * [[#distinct|In Zookeeper, a State and a Schema section]]
+   * [[#clean|State changes are clean, minimal, and comprehensive]]
+   * [[#schema|Schema Edits]]
+   * [[#balancer|Region Assignment/Balancer]]
+   * [[#root|Remove -ROOT-]]
+   * [[#heartbeat|Remove Heartbeat]]
+   * [[#safemode|Remove Safe Mode]]
+   * [[#intermediary|Further remove Master as necessary intermediary]]
  
  <<Anchor(now)>>
  == What does the Master do now? ==
@@ -28, +36 @@

  <<Anchor(problems)>>
  == Problems with current Master ==
  There is a good list in the [[https://issues.apache.org/jira/secure/ManageLinks.jspa?id=12434794|Issue Links]] section of HBASE-1816.
+ 
+ <<Anchor(scope)>>
+ == Design Scope ==
+  1. Rewrite of Master is for HBase 0.21
+  1. Design for:
+    1. Regionserver loading (TODO: These numbers don't make sense -- jgray do you remember what they were about?)
+      1. 200 regionservers
+      1. 32 outstanding WAL logs per regionserver
+      1. 200 regions per regionserver being written to
+      1. 2GB or 30 hour full log roll
+      1. 10MB/sec write speed
+      1. 1.2M edits per 2G
+      1. 7k writes/second across cluster (?) -- what's this?  Wrong.
+      1. 1.2M edits per 30 hours?
+      1. 100 writes/sec across cluster (?) -- what's this?  Wrong?
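
One way to sanity-check the figures above is to derive the rates directly from the stated log size, write speed, and edit count. The sketch below is only a back-of-envelope reading, not the calculation the hackathon notes had in mind: the class name `LogRollMath` is hypothetical, decimal units (1GB = 1000MB) are an assumption, and the notes do not say whether the rates are per regionserver or cluster-wide.

```java
// Back-of-envelope check of the loading figures listed above.
// Assumption: decimal units (1GB = 1000MB); the notes do not specify.
final class LogRollMath {
    static final double LOG_ROLL_MB = 2_000.0;         // "2GB ... full log roll"
    static final double WRITE_MB_PER_SEC = 10.0;       // "10MB/sec write speed"
    static final double EDITS_PER_ROLL = 1_200_000.0;  // "1.2M edits per 2G"
    static final double ROLL_HOURS = 30.0;             // "30 hour full log roll"

    /** Seconds needed to write 2GB of log at the stated write speed. */
    static double secondsToFillLog() {
        return LOG_ROLL_MB / WRITE_MB_PER_SEC;
    }

    /** Edits per second if 1.2M edits fill the log at full write speed. */
    static double editsPerSecondAtFullSpeed() {
        return EDITS_PER_ROLL / secondsToFillLog();
    }

    /** Edits per second if the same 1.2M edits are spread over 30 hours. */
    static double editsPerSecondOverThirtyHours() {
        return EDITS_PER_ROLL / (ROLL_HOURS * 3600.0);
    }

    /** Average edit size in bytes implied by "1.2M edits per 2G". */
    static double bytesPerEdit() {
        return LOG_ROLL_MB * 1_000_000.0 / EDITS_PER_ROLL;
    }

    public static void main(String[] args) {
        System.out.printf("seconds to fill 2GB: %.0f%n", secondsToFillLog());
        System.out.printf("edits/sec at full speed: %.0f%n", editsPerSecondAtFullSpeed());
        System.out.printf("edits/sec over 30 hours: %.1f%n", editsPerSecondOverThirtyHours());
        System.out.printf("bytes per edit: %.0f%n", bytesPerEdit());
    }
}
```

Under these assumptions the derived rates come out near 6k edits/sec and 11 edits/sec, which match neither the 7k nor the 100 writes/sec quoted in the list, so the questions raised there still stand.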
  	  
  <<Anchor(design)>>
  == Design ==
  
  <<Anchor(all)>>
  === Move all region state transitions to zookeeper ===
- Run state transitions by changing state in zookeeper rather than inside the Master
+ Run state transitions by changing state in zookeeper rather than inside the Master.
+ 
+ Keep a region transition trail: regions move through states from unassigned to opening to open, etc.  A region can't skip states, for example by going straight from unassigned to open.
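
The no-skipping rule can be sketched as a small transition table. This is a minimal illustration, not the rewrite's actual design: only unassigned, opening, and open come from the notes, while `CLOSING`, `CLOSED`, the fallback from opening to unassigned, and the class name `RegionTransitions` are assumptions made for the example.

```java
import java.util.EnumMap;
import java.util.EnumSet;
import java.util.Map;
import java.util.Set;

final class RegionTransitions {
    // Hypothetical state names; only UNASSIGNED, OPENING and OPEN appear in the notes.
    enum State { UNASSIGNED, OPENING, OPEN, CLOSING, CLOSED }

    // Each state lists the only states a region may move to next.
    private static final Map<State, Set<State>> ALLOWED = new EnumMap<>(State.class);
    static {
        ALLOWED.put(State.UNASSIGNED, EnumSet.of(State.OPENING));
        // Assumption: a failed open falls back to UNASSIGNED.
        ALLOWED.put(State.OPENING, EnumSet.of(State.OPEN, State.UNASSIGNED));
        ALLOWED.put(State.OPEN, EnumSet.of(State.CLOSING));
        ALLOWED.put(State.CLOSING, EnumSet.of(State.CLOSED));
        // Closing the loop so the states form a circle.
        ALLOWED.put(State.CLOSED, EnumSet.of(State.UNASSIGNED));
    }

    /** True if a region may move directly from one state to the other. */
    static boolean canMove(State from, State to) {
        return ALLOWED.get(from).contains(to);
    }

    /** Returns the new state, or throws if the move would skip a state. */
    static State move(State from, State to) {
        if (!canMove(from, to)) {
            throw new IllegalStateException("illegal transition " + from + " -> " + to);
        }
        return to;
    }

    public static void main(String[] args) {
        State s = State.UNASSIGNED;
        s = move(s, State.OPENING);
        s = move(s, State.OPEN);
        System.out.println("region is now " + s);
    }
}
```

Keeping the table in one place means every writer of region state, wherever that state lives, can be checked against the same set of legal moves.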
+ 
+ A problem with the current master is that states do not form a circle.  Once a region is open, the master stops keeping state: region state moves to the .META. table once the region is assigned, and its condition is checked periodically by a .META. table scan.  This makes for confusion and evil such as region double assignment, because there are race-condition potholes as we move from one system (internal state maps in the master) to the other during the update to state in .META.  Current thinking is to keep the whole region lifecycle up in zookeeper, but that won't scale.  Postulate 100k regions (100TB at 1G regions), each with two or three possible states and each with watchers for state change: that is too much to put in a zk cluster.  TODO: how to manage the transition from zk to .META.?
  
  <<Anchor(distinct)>>
  === In Zookeeper, a State and a Schema section ===
@@ -46, +73 @@

  /hbase/regionserver/to_open/{list of regions....}
  /hbase/regionservers/to_close/{list of regions...}
  
+ <<Anchor(clean)>>
  === State changes are clean, minimal, and comprehensive ===
  Currently, moving a region from opening to open may involve a region compaction -- i.e. a change to content in filesystem.  It would be better if modification of filesystem content were done when there is no question of ownership.
  
@@ -57, +85 @@

      private volatile boolean pendingClose = false;
      private volatile boolean closed = false;
      private volatile boolean offlined = false;}}}
+ It's incomplete.
  
+ <<Anchor(schema)>>
+ === Schema Edits ===
+ Move Table Schema from .META.
  
+ <<Anchor(balancer)>>
+ === Region Assignment/Balancer ===
+ Make it so we don't need to put up a cluster to test the balancing algorithm.
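
One way to make balancing testable without a cluster is to write the decision as a pure function over plain data. The sketch below is an assumption-laden illustration, not the planned balancer: the class name `BalancerSketch`, the use of region counts as the load measure, and the greedy least-loaded policy are all hypothetical choices made for the example.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class BalancerSketch {
    /**
     * Assigns every region in one pass, each to the server that currently
     * holds the fewest regions. Takes and returns plain collections, so it
     * can be unit tested with no regionservers or zookeeper running.
     */
    static Map<String, String> assign(Map<String, Integer> regionCountByServer,
                                      List<String> regionsToAssign) {
        if (regionCountByServer.isEmpty()) {
            throw new IllegalArgumentException("no regionservers to assign to");
        }
        // Sorted copy: leaves the caller's map untouched and makes ties deterministic.
        Map<String, Integer> load = new TreeMap<>(regionCountByServer);
        Map<String, String> plan = new LinkedHashMap<>();
        for (String region : regionsToAssign) {
            String target = null;
            for (Map.Entry<String, Integer> e : load.entrySet()) {
                if (target == null || e.getValue() < load.get(target)) {
                    target = e.getKey();
                }
            }
            plan.put(region, target);
            load.merge(target, 1, Integer::sum); // account for the new region
        }
        return plan;
    }

    public static void main(String[] args) {
        Map<String, Integer> load = new TreeMap<>();
        load.put("rs1", 2);
        load.put("rs2", 0);
        System.out.println(assign(load, List.of("r1", "r2", "r3")));
    }
}
```

Because the function sees only a map of loads and a list of regions, alternative policies can be swapped in and compared against the same test inputs.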
- == More Notes ==
- 
- For RS Worst case loading calculation
-  * 200 regionservers
-  * 32 logs per regionserver
-  * 200 regions written to
-  * 2GB or 30 hour full log roll
-  * 10MB/sec write speed
-  * 1.2M edits per 2G
-   * 7k writes/second across cluster (?) -- whats this?  Wrong.
-  * 1.2M edits per 30 hours?
-   * 100 writes/sec across cluster (?) -- Whats this?  Wrong?
  
  Assignment / balancing
   * RS publish load into ZK
@@ -90, +113 @@

   * To Open Queue
    * Regionservers watch their own to open queues /hbase/rsopen/region(extra_info, which hlogs to replay or it’s a split, etc)
  
+ <<Anchor(root)>>
+ === Remove -ROOT- ===
+ Remove -ROOT- from the filesystem; have it live only up in zk (possible now that the Region Historian feature has been removed).
+ 
+ <<Anchor(heartbeat)>>
+ === Remove Heartbeat ===
+ We don't need RegionServers pinging the master every 3 seconds if zookeeper is the intermediary.
+ 
+ <<Anchor(safemode)>>
+ === Remove Safe Mode ===
+ Safe mode is broken.  It doesn't do anything.  Remove it.
+ 
+ <<Anchor(intermediary)>>
+ === Further remove Master as necessary intermediary ===
+ Clients do not need to go via the master when administering tables, changing schema, or sending flush/compaction commands, etc.  Clients should be able to write directly to a regionserver or to zk.
+ 
+ 
+ 
+ <<Anchor(misc)>>
+ == Miscellaneous ==
+ 
+  * At the meetup we talked of moving .META. to zk and adding a getClosest to the zk code base.  That's been punted for now.
+ 
+ 
  Administrative functions
   * Hadoop RPC listeners on Master and Regionservers -- master can now push messages
   * Clients and Master can talk to RS
@@ -102, +149 @@

    * Look at all regions to be assigned
    * Make a single decision for the assignment of all of these
  
- META to ZK 
-  * New feature in ZK
-   * Sorted tree or list node
-   * Has a getClosestBefore / getRegionForRow(row)
-   * META location stored in ZK
-   * META table now only contains historian information
-   * All other in ZK
  
  No more ROOT
  
@@ -127, +167 @@

    * If complex, punt to 0.22
    * Rather than storing with each region, stored once in ZK
  
- Uptime on UI
- 
- Admin stuff
-   * Straight from client ->regionserver
-   * No more heartbeat piggyback
- 
- Scaling documentation
-  * More conf settings for block cache
-  * How to adjust knobs
-