Posted to commits@hbase.apache.org by dm...@apache.org on 2011/10/10 19:41:54 UTC
svn commit: r1181091 - in /hbase/trunk/src/docbkx: book.xml ops_mgt.xml
Author: dmeil
Date: Mon Oct 10 17:41:53 2011
New Revision: 1181091
URL: http://svn.apache.org/viewvc?rev=1181091&view=rev
Log:
HBASE-4566 book.xml,ops_mgt.xml - KeyValue documentation
Modified:
hbase/trunk/src/docbkx/book.xml
hbase/trunk/src/docbkx/ops_mgt.xml
Modified: hbase/trunk/src/docbkx/book.xml
URL: http://svn.apache.org/viewvc/hbase/trunk/src/docbkx/book.xml?rev=1181091&r1=1181090&r2=1181091&view=diff
==============================================================================
--- hbase/trunk/src/docbkx/book.xml (original)
+++ hbase/trunk/src/docbkx/book.xml Mon Oct 10 17:41:53 2011
@@ -312,7 +312,7 @@ public static class MyReducer extends Ta
<para>A good general introduction on the strength and weaknesses modelling on
the various non-rdbms datastores is Ian Varleys' Master thesis,
<link xlink:href="http://ianvarley.com/UT/MR/Varley_MastersReport_Full_2009-08-07.pdf">No Relation: The Mixed Blessings of Non-Relational Databases</link>.
- Recommended.
+ Recommended. Also, read <xref linkend="keyvalue"/> for how HBase stores data internally.
</para>
<section xml:id="schema.creation">
<title>
@@ -400,7 +400,7 @@ admin.enableTable(table);
</para>
<para>Most of the time small inefficiencies don't matter all that much. Unfortunately,
this is a case where they do. Whatever patterns are selected for ColumnFamilies, attributes, and rowkeys they could be repeated
- several billion times in your data</para>
+ several billion times in your data. See <xref linkend="keyvalue"/> for more information on how HBase stores data internally.</para>
<section xml:id="keysize.cf"><title>Column Families</title>
<para>Try to keep the ColumnFamily names as small as possible, preferably one character (e.g. "d" for data/default).
</para>
@@ -1615,6 +1615,8 @@ scan.setFilter(filter);
Schubert Zhang's blog post on <link xlink:ref="http://cloudepr.blogspot.com/2009/09/hfile-block-indexed-file-format-to.html">HFile: A Block-Indexed File Format to Store Sorted Key-Value Pairs</link> makes for a thorough introduction to HBase's hfile. Matteo Bertozzi has also put up a
helpful description, <link xlink:href="http://th30z.blogspot.com/2011/02/hbase-io-hfile.html?spref=tw">HBase I/O: HFile</link>.
</para>
+ <para>For more information, see the <link xlink:href="http://hbase.apache.org/xref/org/apache/hadoop/hbase/io/hfile/HFile.html">HFile source code</link>.
+ </para>
</section>
<section xml:id="hfile_tool">
@@ -1631,6 +1633,40 @@ scan.setFilter(filter);
tool.</para>
</section>
</section>
+ <section xml:id="hfile.blocks">
+ <title>Blocks</title>
+ <para>StoreFiles are composed of blocks. The blocksize is configured on a per-ColumnFamily basis.
+ </para>
+ <para>For more information, see the <link xlink:href="http://hbase.apache.org/xref/org/apache/hadoop/hbase/io/hfile/HFileBlock.html">HFileBlock source code</link>.
+ </para>
+ </section>
+ <section xml:id="keyvalue">
+ <title>KeyValue</title>
+ <para>The KeyValue class is the heart of data storage in HBase. KeyValue wraps a byte array and takes an offset and a length into the passed array
+ that mark where to start interpreting the content as a KeyValue.
+ </para>
+ <para>The KeyValue format inside a byte array is:
+ <itemizedlist>
+ <listitem>keylength</listitem>
+ <listitem>valuelength</listitem>
+ <listitem>key</listitem>
+ <listitem>value</listitem>
+ </itemizedlist>
+ </para>
+ <para>The Key is further decomposed as:
+ <itemizedlist>
+ <listitem>rowlength</listitem>
+ <listitem>row (i.e., the rowkey)</listitem>
+ <listitem>columnfamilylength</listitem>
+ <listitem>columnfamily</listitem>
+ <listitem>columnqualifier</listitem>
+ <listitem>timestamp</listitem>
+ <listitem>keytype (e.g., Put, Delete)</listitem>
+ </itemizedlist>
+ </para>
+ <para>For more information, see the <link xlink:href="http://hbase.apache.org/xref/org/apache/hadoop/hbase/KeyValue.html">KeyValue source code</link>.
+ </para>
+ </section>
<section xml:id="compaction">
<title>Compaction</title>
<para>There are two types of compactions: minor and major. Minor compactions will usually pick up a couple of the smaller adjacent
Modified: hbase/trunk/src/docbkx/ops_mgt.xml
URL: http://svn.apache.org/viewvc/hbase/trunk/src/docbkx/ops_mgt.xml?rev=1181091&r1=1181090&r2=1181091&view=diff
==============================================================================
--- hbase/trunk/src/docbkx/ops_mgt.xml (original)
+++ hbase/trunk/src/docbkx/ops_mgt.xml Mon Oct 10 17:41:53 2011
@@ -301,6 +301,32 @@ false
<para>Since the cluster is up, there is a risk that edits could be missed in the export process.
</para>
</section>
+ </section> <!-- backup -->
+ <section xml:id="ops.capacity"><title>Capacity Planning</title>
+ <section xml:id="ops.capacity.storage"><title>Storage</title>
+ <para>A common task for HBase administrators is estimating how much storage will be required for an HBase cluster.
+ There are several aspects to consider, the most important of which is what data is loaded into the cluster. Start
+ with a solid understanding of how HBase handles data internally (KeyValue).
+ </para>
+ <section xml:id="ops.capacity.storage.kv"><title>KeyValue</title>
+ <para>HBase storage will be dominated by KeyValues. See <xref linkend="keyvalue" /> and <xref linkend="keysize" /> for
+ how HBase stores data internally.
+ </para>
+ <para>It is critical to understand that there is a KeyValue instance for every attribute stored in a row, and the
+ rowkey-length, ColumnFamily name-length and attribute lengths will drive the size of the database more than any other
+ factor.
+ </para>
+ </section>
+ <section xml:id="ops.capacity.storage.sf"><title>StoreFiles and Blocks</title>
+ <para>KeyValue instances are aggregated into blocks, and the blocksize is configurable on a per-ColumnFamily basis.
+ Blocks are aggregated into StoreFiles. See <xref linkend="regions.arch" />.
+ </para>
+ </section>
+ <section xml:id="ops.capacity.storage.hdfs"><title>HDFS Block Replication</title>
+ <para>Because HBase runs on top of HDFS, factor HDFS block replication into storage calculations.
+ </para>
+ </section>
+ </section>
</section>
</chapter>
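
Illustration of the KeyValue layout documented in the book.xml change above, as a minimal Java sketch rather than the actual HBase implementation. It writes the fields in the documented order (keylength, valuelength, key, value, with the key split into rowlength, row, columnfamilylength, columnfamily, columnqualifier, timestamp, keytype) and then makes a rough storage estimate of the kind the new Capacity Planning section describes. The class name, method names, field widths, and the key type code used here are assumptions made for illustration and are not stated in the commit; check them against the KeyValue source before relying on them. The estimate ignores compression, block and index overhead, and anything else beyond raw KeyValue bytes and HDFS replication.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch class; not part of HBase.
final class KeyValueLayoutSketch {
    // Assumed field widths in bytes (illustrative, not taken from the commit).
    static final int KEY_LENGTH_SIZE = 4;
    static final int VALUE_LENGTH_SIZE = 4;
    static final int ROW_LENGTH_SIZE = 2;
    static final int FAMILY_LENGTH_SIZE = 1;
    static final int TIMESTAMP_SIZE = 8;
    static final int KEY_TYPE_SIZE = 1;

    // Lays out one cell in the documented field order.
    static byte[] encode(byte[] row, byte[] family, byte[] qualifier,
                         long timestamp, byte keyType, byte[] value) {
        int keyLength = ROW_LENGTH_SIZE + row.length
                + FAMILY_LENGTH_SIZE + family.length
                + qualifier.length + TIMESTAMP_SIZE + KEY_TYPE_SIZE;
        ByteBuffer buf = ByteBuffer.allocate(
                KEY_LENGTH_SIZE + VALUE_LENGTH_SIZE + keyLength + value.length);
        buf.putInt(keyLength);            // keylength
        buf.putInt(value.length);         // valuelength
        buf.putShort((short) row.length); // rowlength
        buf.put(row);                     // row (the rowkey)
        buf.put((byte) family.length);    // columnfamilylength
        buf.put(family);                  // columnfamily
        buf.put(qualifier);               // columnqualifier
        buf.putLong(timestamp);           // timestamp
        buf.put(keyType);                 // keytype (e.g., Put, Delete)
        buf.put(value);                   // value
        return buf.array();
    }

    // Raw bytes for a number of equally sized cells, multiplied by the
    // HDFS replication factor. Deliberately ignores every other overhead.
    static long estimateRawBytes(long cells, int bytesPerCell, int replication) {
        return cells * bytesPerCell * replication;
    }

    public static void main(String[] args) {
        byte assumedPutCode = 4; // assumed type code for a Put
        byte[] kv = encode(
                "row1".getBytes(StandardCharsets.UTF_8),
                "d".getBytes(StandardCharsets.UTF_8),
                "q".getBytes(StandardCharsets.UTF_8),
                1L, assumedPutCode,
                "v".getBytes(StandardCharsets.UTF_8));
        System.out.println("bytes per cell: " + kv.length);
        System.out.println("raw bytes for 1e9 cells at replication 3: "
                + estimateRawBytes(1_000_000_000L, kv.length, 3));
    }
}
```

Under these assumed widths, a cell with a 4-byte rowkey, a 1-byte ColumnFamily name, a 1-byte qualifier and a 1-byte value occupies 27 bytes, of which only one byte is the value. This is why the book recommends keeping rowkeys, ColumnFamily names and attribute names short: the key is repeated in full for every attribute stored in a row.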