Posted to common-commits@hadoop.apache.org by Apache Wiki <wi...@apache.org> on 2007/03/13 19:02:27 UTC

[Lucene-hadoop Wiki] Update of "Hbase/HbaseArchitecture" by JimKellerman

The following page has been changed by JimKellerman:
http://wiki.apache.org/lucene-hadoop/Hbase/HbaseArchitecture

------------------------------------------------------------------------------
  
   * [#intro Introduction]
   * [#datamodel Data Model]
+   * [#conceptual Conceptual View]
+   * [#physical Physical Storage View]
   * [#hregion HRegion (Tablet) Server]
   * [#master HBase Master Server]
   * [#metadata META Table]
@@ -57, +59 @@

  can get data by asking for the "most recent value as of a certain
  time". Or, clients can fetch all available versions at once.
  
+ [[Anchor(conceptual)]]
+ == Conceptual View ==
+ 
+ Conceptually, a table may be thought of as a collection of rows that
+ are located by a row key (and optional time stamp), where a given column
+ need not have a value for every row key (the table is sparse). The following example is a slightly modified form of the one on page 2 of the [http://labs.google.com/papers/bigtable.html Bigtable Paper].
+ 
+ [[Anchor(datamodelexample)]]
+ ||<:> '''Row Key''' ||<:> '''Time Stamp''' ||<:> '''Column''' ''"contents:"'' ||||<:> '''Column''' ''"anchor:"'' ||<:> '''Column''' ''"mime:"'' ||
+ ||<^|5> "com.cnn.www" ||<:> t9 || ||<)> "anchor:cnnsi.com" ||<:> "CNN" || ||
+ ||<:> t8 || ||<)> "anchor:my.look.ca" ||<:> "CNN.com" || ||
+ ||<:> t6 ||<:> `"<html>..."` || || ||<:> "text/html" ||
+ ||<:> t5 ||<:> `"<html>..."` || || || ||
+ ||<:> t3 ||<:> `"<html>..."` || || || ||
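
The example table can be modeled as a sparse map keyed by (row key, column, time stamp), in which empty cells are simply absent. This is a minimal Python sketch under stated assumptions, not HBase code: the time stamps t3 through t9 are represented by the integers 3 through 9, and `conceptual_table`, `row_time_stamps`, and `columns_at` are hypothetical names chosen for illustration.

```python
# Conceptual view (sketch): a sparse map from (row key, column, time stamp)
# to value. Cells that are empty in the table are simply not present.
# Assumption: the integers 3..9 stand in for the time stamps t3..t9.

Cell = tuple[str, str, int]  # (row key, column, time stamp)

conceptual_table: dict[Cell, str] = {
    ("com.cnn.www", "anchor:cnnsi.com", 9): "CNN",
    ("com.cnn.www", "anchor:my.look.ca", 8): "CNN.com",
    ("com.cnn.www", "contents:", 6): "<html>...",
    ("com.cnn.www", "mime:", 6): "text/html",
    ("com.cnn.www", "contents:", 5): "<html>...",
    ("com.cnn.www", "contents:", 3): "<html>...",
}


def row_time_stamps(table: dict[Cell, str], row_key: str) -> list[int]:
    """Distinct time stamps with at least one value in the row, newest first."""
    return sorted({ts for (row, _, ts) in table if row == row_key}, reverse=True)


def columns_at(table: dict[Cell, str], row_key: str, ts: int) -> list[str]:
    """Columns that actually hold a value for this row at exactly this time stamp."""
    return sorted(col for (row, col, t) in table if row == row_key and t == ts)


if __name__ == "__main__":
    print(row_time_stamps(conceptual_table, "com.cnn.www"))  # [9, 8, 6, 5, 3]
    print(columns_at(conceptual_table, "com.cnn.www", 6))    # ['contents:', 'mime:']
```

The map holds only six values, even though the table shows five time stamps across several columns; the missing entries are exactly what "sparse" means here.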
+ 
+ [[Anchor(physical)]]
+ == Physical Storage View ==
+ 
+ Although at a conceptual level tables may be viewed as a sparse set
+ of rows, physically they are stored on a per-column-family basis, so
+ the values of different column families in the same row are kept
+ apart. This is an important consideration for schema and application
+ designers to keep in mind.
+ 
+ Pictorially, the table shown in the [#datamodelexample conceptual view] above would be stored as
+ follows:
+ 
+ ||<:> '''Row Key''' ||<:> '''Time Stamp''' ||<:> '''Column''' ''"contents:"'' ||
+ ||<^|3> "com.cnn.www" ||<:> t6 ||<:> `"<html>..."` ||
+ ||<:> t5 ||<:> `"<html>..."` ||
+ ||<:> t3 ||<:> `"<html>..."` ||
+ 
+ [[BR]]
+ 
+ ||<:> '''Row Key''' ||<:> '''Time Stamp''' ||||<:> '''Column''' ''"anchor:"'' ||
+ ||<^|2> "com.cnn.www" ||<:> t9 ||<)> "anchor:cnnsi.com" ||<:> "CNN" ||
+ ||<:> t8 ||<)> "anchor:my.look.ca" ||<:> "CNN.com" ||
+ 
+ [[BR]]
+ 
+ ||<:> '''Row Key''' ||<:> '''Time Stamp''' ||<:> '''Column''' ''"mime:"'' ||
+ || "com.cnn.www" ||<:> t6 ||<:> "text/html" ||
+ 
+ [[BR]]
+ 
+ It is important to note from the tables above that the empty cells
+ shown in the conceptual view are not stored at all. Thus a request for
+ the value of the ''"contents:"'' column at exactly time stamp ''t8''
+ would return a null value. Similarly, a request for the ''"anchor:"''
+ value for "my.look.ca" at time stamp ''t9'' would return a null value.
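
The per-family layout and the null results can be sketched in Python, again using the integers 3 through 9 as stand-ins for t3 through t9. The cells are split into one store per column family (`anchor:`, `contents:`, `mime:`), and a read at an exact time stamp that finds no stored cell returns `None`. The helpers `family_of`, `build_stores`, and `get_exact` are hypothetical, and the sort order within a store (row key ascending, then column, then time stamp descending) is an assumption chosen to match the tables above.

```python
from collections import defaultdict

# Physical view (sketch): the same cells grouped into one store per column
# family. Empty cells of the conceptual view never appear in any store.
cells = [
    ("com.cnn.www", "anchor:cnnsi.com", 9, "CNN"),
    ("com.cnn.www", "anchor:my.look.ca", 8, "CNN.com"),
    ("com.cnn.www", "contents:", 6, "<html>..."),
    ("com.cnn.www", "mime:", 6, "text/html"),
    ("com.cnn.www", "contents:", 5, "<html>..."),
    ("com.cnn.www", "contents:", 3, "<html>..."),
]


def family_of(column: str) -> str:
    """Map 'anchor:cnnsi.com' to 'anchor:' and 'contents:' to 'contents:'."""
    return column.split(":", 1)[0] + ":"


def build_stores(cells):
    """Split cells into per-family stores, each sorted with newest versions first."""
    stores = defaultdict(list)
    for row, column, ts, value in cells:
        stores[family_of(column)].append((row, column, ts, value))
    for entries in stores.values():
        # Row key ascending, column ascending, time stamp descending.
        entries.sort(key=lambda e: (e[0], e[1], -e[2]))
    return dict(stores)


def get_exact(stores, row, column, ts):
    """Return the value stored at exactly this time stamp, or None if none is stored."""
    for r, c, t, value in stores.get(family_of(column), []):
        if (r, c, t) == (row, column, ts):
            return value
    return None


if __name__ == "__main__":
    stores = build_stores(cells)
    print(sorted(stores))                                            # ['anchor:', 'contents:', 'mime:']
    print(get_exact(stores, "com.cnn.www", "contents:", 8))          # None
    print(get_exact(stores, "com.cnn.www", "anchor:my.look.ca", 9))  # None
```

Grouping by family means a read of ''"contents:"'' only scans the `contents:` store and never touches the anchor or mime data.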
+ 
+ However, if no time stamp is supplied, the most recent value for a
+ particular column is returned, and it is also the first one found,
+ because time stamps are stored in descending order. Consequently, if
+ no time stamp is supplied, a request for ''"contents:"'' returns the
+ value at time stamp ''t6'', and a request for the ''"anchor:"'' value
+ for "my.look.ca" returns the value at time stamp ''t8''.
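
The lookup rule can be sketched in the same way. In this hypothetical Python model (`versions` and `read_cell` are illustrative names, not HBase API, and the integers 3 through 9 stand in for t3 through t9), the versions of each column are kept newest-first. A read without a time stamp therefore returns the first entry it scans, while a read with a time stamp returns only an exact match.

```python
from typing import Optional

# Hypothetical model: versions of each (row key, column) pair, listed in
# descending time-stamp order (newest first).
versions: dict[tuple[str, str], list[tuple[int, str]]] = {
    ("com.cnn.www", "contents:"): [(6, "<html>..."), (5, "<html>..."), (3, "<html>...")],
    ("com.cnn.www", "anchor:cnnsi.com"): [(9, "CNN")],
    ("com.cnn.www", "anchor:my.look.ca"): [(8, "CNN.com")],
    ("com.cnn.www", "mime:"): [(6, "text/html")],
}


def read_cell(row: str, column: str, ts: Optional[int] = None) -> Optional[tuple[int, str]]:
    """Return (time stamp, value) for a cell, or None if nothing is stored.

    With no time stamp, the first version scanned is returned; because the
    list is newest-first, that is the most recent value. With a time stamp,
    only an exact match is returned.
    """
    for version_ts, value in versions.get((row, column), []):
        if ts is None or version_ts == ts:
            return version_ts, value
    return None


if __name__ == "__main__":
    print(read_cell("com.cnn.www", "contents:"))          # (6, '<html>...')
    print(read_cell("com.cnn.www", "anchor:my.look.ca"))  # (8, 'CNN.com')
    print(read_cell("com.cnn.www", "contents:", 8))       # None
```

Keeping versions in descending order makes the common "latest value" read a constant-time look at the head of the list, with no scan of older versions.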
+  
  [[Anchor(hregion)]]
  = HRegion (Tablet) Server =