Posted to common-commits@hadoop.apache.org by Apache Wiki <wi...@apache.org> on 2007/03/13 19:02:27 UTC
[Lucene-hadoop Wiki] Update of "Hbase/HbaseArchitecture" by JimKellerman
Dear Wiki user,
You have subscribed to a wiki page or wiki category on "Lucene-hadoop Wiki" for change notification.
The following page has been changed by JimKellerman:
http://wiki.apache.org/lucene-hadoop/Hbase/HbaseArchitecture
------------------------------------------------------------------------------
* [#intro Introduction]
* [#datamodel Data Model]
+ * [#conceptual Conceptual View]
+ * [#physical Physical Storage View]
* [#hregion HRegion (Tablet) Server]
* [#master HBase Master Server]
* [#metadata META Table]
@@ -57, +59 @@
can get data by asking for the "most recent value as of a certain
time". Or, clients can fetch all available versions at once.
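A minimal sketch of these two read modes follows. It assumes that one cell's versions are kept as a list of `(time stamp, value)` pairs sorted newest first. The integer time stamps and version-labeled values are made up for illustration, and `get_as_of` and `all_versions` are hypothetical helpers, not HBase API calls.

```python
from typing import List, Optional, Tuple

# Hypothetical representation of one cell's versions, newest first.
Versions = List[Tuple[int, str]]

def get_as_of(versions: Versions, ts: int) -> Optional[str]:
    """Return the most recent value whose time stamp is <= ts, or None."""
    for version_ts, value in versions:  # list is ordered newest first
        if version_ts <= ts:
            return value
    return None

def all_versions(versions: Versions) -> Versions:
    """Return every stored version, newest first."""
    return list(versions)

contents = [(6, "<html>v6"), (5, "<html>v5"), (3, "<html>v3")]
print(get_as_of(contents, 5))        # <html>v5
print(get_as_of(contents, 4))        # <html>v3
print(get_as_of(contents, 2))        # None
print(len(all_versions(contents)))   # 3
```

An "as of" read walks the newest-first list and returns the first version that is not newer than the requested time.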
+ [[Anchor(conceptual)]]
+ == Conceptual View ==
+
+ Conceptually, a table may be thought of as a collection of rows, each
+ located by a row key (and an optional time stamp), in which any column
+ may lack a value for a particular row key; that is, the table is sparse.
+ The following example is a slightly modified form of the one on page 2 of the [http://labs.google.com/papers/bigtable.html Bigtable Paper].
+
+ [[Anchor(datamodelexample)]]
+ ||<:> '''Row Key''' ||<:> '''Time Stamp''' ||<:> '''Column''' ''"contents:"'' ||||<:> '''Column''' ''"anchor:"'' ||<:> '''Column''' ''"mime:"'' ||
+ ||<^|5> "com.cnn.www" ||<:> t9 || ||<)> "anchor:cnnsi.com" ||<:> "CNN" || ||
+ ||<:> t8 || ||<)> "anchor:my.look.ca" ||<:> "CNN.com" || ||
+ ||<:> t6 ||<:> `"<html>..."` || || ||<:> "text/html" ||
+ ||<:> t5 ||<:> `"<html>..."` || || || ||
+ ||<:> t3 ||<:> `"<html>..."` || || || ||
+
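The conceptual view can be pictured in code as a nested map from row key to column to time stamp to value. In this map, a column with no value for a row, or for a given time stamp, simply has no entry. The sketch below encodes the example table this way. It assumes integer time stamps for t3 through t9, and `columns_at` and `stored_cells` are hypothetical helpers. The sketch illustrates the data model only and does not show how HBase represents tables internally.

```python
from typing import Dict, List

# Conceptual view as a nested map: row key -> column -> {time stamp: value}.
# Columns with no value for a row (or time stamp) simply have no entry.
Table = Dict[str, Dict[str, Dict[int, str]]]

table: Table = {
    "com.cnn.www": {
        "contents:": {6: "<html>...", 5: "<html>...", 3: "<html>..."},
        "anchor:cnnsi.com": {9: "CNN"},
        "anchor:my.look.ca": {8: "CNN.com"},
        "mime:": {6: "text/html"},
    },
}

def columns_at(table: Table, row: str, ts: int) -> List[str]:
    """List the columns that hold a value for `row` at exactly `ts`."""
    return sorted(col for col, versions in table.get(row, {}).items() if ts in versions)

def stored_cells(table: Table) -> int:
    """Count the (row, column, time stamp) cells that actually hold a value."""
    return sum(len(v) for cols in table.values() for v in cols.values())

print(columns_at(table, "com.cnn.www", 6))  # ['contents:', 'mime:']
print(columns_at(table, "com.cnn.www", 9))  # ['anchor:cnnsi.com']
print(stored_cells(table))                  # 6
```

The table above has 5 time stamps and 4 columns, which gives 20 positions, but only 6 of them hold values.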
+ [[Anchor(physical)]]
+ == Physical Storage View ==
+
+ Although, at a conceptual level, tables may be viewed as a sparse set
+ of rows, physically they are stored on a per-column-family basis: all
+ values in one column family are stored together, apart from the other
+ families. Schema and application designers should keep this in mind.
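The split can be sketched by grouping the cells of the conceptual table by column family, meaning the part of the column name up to and including the colon. Each family's cells are then sorted by row key, then column, then descending time stamp. The `split_by_family` and `family_of` helpers and the integer time stamps are assumptions made for illustration. This sketch does not reproduce HBase's actual on-disk format.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# A cell of the conceptual table: (row key, column, time stamp, value).
Cell = Tuple[str, str, int, str]

def family_of(column: str) -> str:
    # "anchor:cnnsi.com" -> "anchor:", "contents:" -> "contents:"
    return column.split(":", 1)[0] + ":"

def split_by_family(cells: List[Cell]) -> Dict[str, List[Cell]]:
    """Group cells into one store per column family.

    Within each store, cells are ordered by row key, then column,
    then time stamp descending (newest first).
    """
    stores: Dict[str, List[Cell]] = defaultdict(list)
    for cell in cells:
        stores[family_of(cell[1])].append(cell)
    for family_cells in stores.values():
        family_cells.sort(key=lambda c: (c[0], c[1], -c[2]))
    return dict(stores)

cells = [
    ("com.cnn.www", "contents:", 3, "<html>..."),
    ("com.cnn.www", "anchor:cnnsi.com", 9, "CNN"),
    ("com.cnn.www", "contents:", 6, "<html>..."),
    ("com.cnn.www", "mime:", 6, "text/html"),
    ("com.cnn.www", "anchor:my.look.ca", 8, "CNN.com"),
    ("com.cnn.www", "contents:", 5, "<html>..."),
]

stores = split_by_family(cells)
print(sorted(stores))                        # ['anchor:', 'contents:', 'mime:']
print([c[2] for c in stores["contents:"]])   # [6, 5, 3]
```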
+
+ Pictorially, the table shown in the [#datamodelexample conceptual view] above would be stored as
+ follows:
+
+ ||<:> '''Row Key''' ||<:> '''Time Stamp''' ||<:> '''Column''' ''"contents:"'' ||
+ ||<^|3> "com.cnn.www" ||<:> t6 ||<:> `"<html>..."` ||
+ ||<:> t5 ||<:> `"<html>..."` ||
+ ||<:> t3 ||<:> `"<html>..."` ||
+
+ [[BR]]
+
+ ||<:> '''Row Key''' ||<:> '''Time Stamp''' ||||<:> '''Column''' ''"anchor:"'' ||
+ ||<^|2> "com.cnn.www" ||<:> t9 ||<)> "anchor:cnnsi.com" ||<:> "CNN" ||
+ ||<:> t8 ||<)> "anchor:my.look.ca" ||<:> "CNN.com" ||
+
+ [[BR]]
+
+ ||<:> '''Row Key''' ||<:> '''Time Stamp''' ||<:> '''Column''' ''"mime:"'' ||
+ || "com.cnn.www" ||<:> t6 ||<:> "text/html" ||
+
+ [[BR]]
+
+ Note that the empty cells shown in the conceptual view are not stored
+ at all, as the diagrams above show. Thus a request for the value of the
+ ''"contents:"'' column at time stamp ''t8'' would return a null value.
+ Similarly, a request for the ''"anchor:my.look.ca"'' value at time
+ stamp ''t9'' would return a null value.
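A minimal sketch of this behavior follows. It assumes that each family's store is modeled as a dictionary keyed by `(row key, column, time stamp)` with integer time stamps. Empty cells were never written, so an exact-time-stamp lookup for them finds nothing and returns `None`, which stands in for the null value described above. `get_exact` is a hypothetical helper, not an HBase method.

```python
from typing import Dict, Optional, Tuple

Key = Tuple[str, str, int]  # (row key, column, time stamp)

# Per-family stores from the physical storage view; empty cells are absent.
contents_store: Dict[Key, str] = {
    ("com.cnn.www", "contents:", 6): "<html>...",
    ("com.cnn.www", "contents:", 5): "<html>...",
    ("com.cnn.www", "contents:", 3): "<html>...",
}
anchor_store: Dict[Key, str] = {
    ("com.cnn.www", "anchor:cnnsi.com", 9): "CNN",
    ("com.cnn.www", "anchor:my.look.ca", 8): "CNN.com",
}

def get_exact(store: Dict[Key, str], row: str, column: str, ts: int) -> Optional[str]:
    """Return the value stored at exactly this time stamp, or None."""
    return store.get((row, column, ts))

print(get_exact(contents_store, "com.cnn.www", "contents:", 8))         # None
print(get_exact(anchor_store, "com.cnn.www", "anchor:my.look.ca", 9))  # None
print(get_exact(anchor_store, "com.cnn.www", "anchor:my.look.ca", 8))  # CNN.com
```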
+
+ However, if no time stamp is supplied, the most recent value for a
+ particular column is returned. Because time stamps are stored in
+ descending order, the most recent value is also the first one found.
+ Consequently, if no time stamp is supplied, a request for
+ ''"contents:"'' returns the value at time stamp ''t6'', and a request
+ for ''"anchor:my.look.ca"'' returns the value at time stamp ''t8''.
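A sketch of the no-time-stamp read path follows. It assumes that each family's store is a list of `(row key, column, time stamp, value)` tuples already sorted by row key, column, and descending time stamp, as in the physical view. Because of this ordering, the scan can stop at the first matching cell, which is necessarily the newest. The page contents are labeled by time stamp so the versions can be told apart. `get_latest` is a hypothetical helper, not an HBase method.

```python
from typing import List, Optional, Tuple

Cell = Tuple[str, str, int, str]  # (row key, column, time stamp, value)

# Family stores sorted by row key, column, then time stamp descending.
contents_store: List[Cell] = [
    ("com.cnn.www", "contents:", 6, "<html>t6"),
    ("com.cnn.www", "contents:", 5, "<html>t5"),
    ("com.cnn.www", "contents:", 3, "<html>t3"),
]
anchor_store: List[Cell] = [
    ("com.cnn.www", "anchor:cnnsi.com", 9, "CNN"),
    ("com.cnn.www", "anchor:my.look.ca", 8, "CNN.com"),
]

def get_latest(store: List[Cell], row: str, column: str) -> Optional[Tuple[int, str]]:
    """Return (time stamp, value) of the newest version, or None if absent.

    Versions are sorted newest first, so the first match is the latest.
    """
    for cell_row, cell_col, ts, value in store:
        if cell_row == row and cell_col == column:
            return ts, value
    return None

print(get_latest(contents_store, "com.cnn.www", "contents:"))         # (6, '<html>t6')
print(get_latest(anchor_store, "com.cnn.www", "anchor:my.look.ca"))  # (8, 'CNN.com')
```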
+
[[Anchor(hregion)]]
= HRegion (Tablet) Server =