Posted to commits@hbase.apache.org by dm...@apache.org on 2011/12/02 21:57:36 UTC
svn commit: r1209688 - in /hbase/trunk/src/docbkx: book.xml
troubleshooting.xml
Author: dmeil
Date: Fri Dec 2 20:57:35 2011
New Revision: 1209688
URL: http://svn.apache.org/viewvc?rev=1209688&view=rev
Log:
hbase-4939 book.xml (architecture/faq), troubleshooting.xml (created resources section)
Modified:
hbase/trunk/src/docbkx/book.xml
hbase/trunk/src/docbkx/troubleshooting.xml
Modified: hbase/trunk/src/docbkx/book.xml
URL: http://svn.apache.org/viewvc/hbase/trunk/src/docbkx/book.xml?rev=1209688&r1=1209687&r2=1209688&view=diff
==============================================================================
--- hbase/trunk/src/docbkx/book.xml (original)
+++ hbase/trunk/src/docbkx/book.xml Fri Dec 2 20:57:35 2011
@@ -1200,6 +1200,63 @@ if (!b) {
<chapter xml:id="architecture">
<title>Architecture</title>
+ <section xml:id="arch.overview">
+ <title>Overview</title>
+ <section xml:id="arch.overview.nosql">
+ <title>NoSQL?</title>
+ <para>HBase is a type of "NoSQL" database. "NoSQL" is a general term meaning that the database isn't an RDBMS which
+ supports SQL as its primary access language, but there are many types of NoSQL databases: BerkeleyDB is an
+ example of a local NoSQL database, whereas HBase is very much a distributed database. Technically speaking,
+ HBase is really more a "Data Store" than a "Data Base" because it lacks many of the features you find in an RDBMS,
+ such as typed columns, secondary indexes, triggers, and advanced query languages.
+ </para>
+ <para>However, HBase has many features which support both linear and modular scaling. HBase clusters expand
+ by adding RegionServers that are hosted on commodity-class servers. If a cluster expands from 10 to 20
+ RegionServers, for example, it doubles in both storage and processing capacity.
+ An RDBMS can scale well, but only up to a point - specifically, the size of a single database server - and for the best
+ performance it requires specialized hardware and storage devices. Notable HBase features are:
+ <itemizedlist>
+ <listitem>Strongly consistent reads/writes: HBase is not an "eventually consistent" DataStore. This
+ makes it very suitable for tasks such as high-speed counter aggregation. </listitem>
+ <listitem>Automatic sharding: HBase tables are distributed on the cluster via regions, and regions are
+ automatically split and re-distributed as your data grows.</listitem>
+ <listitem>Automatic RegionServer failover</listitem>
+ <listitem>Hadoop/HDFS Integration: HBase supports HDFS out of the box as its distributed file system.</listitem>
+ <listitem>MapReduce: HBase supports massively parallelized processing via MapReduce, using HBase as both
+ source and sink.</listitem>
+ <listitem>Java Client API: HBase supports an easy-to-use Java API for programmatic access.</listitem>
+ <listitem>Thrift/REST API: HBase also supports Thrift and REST for non-Java front-ends.</listitem>
+ <listitem>Block Cache and Bloom Filters: HBase supports a Block Cache and Bloom Filters for high-volume query optimization.</listitem>
+ <listitem>Operational Management: HBase provides built-in web pages for operational insight as well as JMX metrics.</listitem>
+ </itemizedlist>
+ </para>
+ </section>
+
+ <section xml:id="arch.overview.when">
+ <title>When Should I Use HBase?</title>
+ <para>First, make sure you have enough data. HBase isn't suitable for every problem. If you have
+ hundreds of millions or billions of rows, then HBase is a good candidate. If you only have a few
+ thousand or a few million rows, then a traditional RDBMS might be a better choice, because all of
+ your data might wind up on a single node (or two) while the rest of the cluster sits idle.
+ </para>
+ <para>Second, make sure you have enough hardware. Even HDFS doesn't do well with anything less than
+ 5 DataNodes (due to things such as HDFS block replication, which has a default of 3), plus a NameNode.
+ </para>
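As a reminder of where that replication figure comes from, the number of copies HDFS keeps of each block is governed by the standard `dfs.replication` property in hdfs-site.xml. A minimal sketch (the property and its default of 3 are standard HDFS; shown here only to illustrate the sizing argument above):

```xml
<!-- hdfs-site.xml: HDFS keeps this many copies of each block (default 3).
     With fewer DataNodes than replicas plus some headroom, writes and
     re-replication after a node failure become problematic - hence the
     suggested floor of 5 DataNodes. -->
<property>
  <name>dfs.replication</name>
  <value>3</value>
</property>
```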
+ <para>HBase can run quite well stand-alone on a laptop, but this should be considered a development
+ configuration only.
+ </para>
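For that laptop/development case, a stand-alone setup only needs to point HBase at a local directory via `hbase.rootdir` in hbase-site.xml. A minimal sketch (the property is real HBase configuration; the path is illustrative):

```xml
<!-- hbase-site.xml: stand-alone mode stores data on the local
     filesystem instead of HDFS. The path below is illustrative. -->
<property>
  <name>hbase.rootdir</name>
  <value>file:///home/testuser/hbase</value>
</property>
```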
+ </section>
+ <section xml:id="arch.overview.hbasehdfs">
+ <title>What Is The Difference Between HBase and Hadoop/HDFS?</title>
+ <para><link xlink:href="http://hadoop.apache.org/hdfs/">HDFS</link> is a distributed file system that is well suited for the storage of large files.
+ Its documentation states that it is not, however, a general-purpose file system and does not provide fast individual record lookups in files.
+ HBase, on the other hand, is built on top of HDFS and provides fast record lookups (and updates) for large tables.
+ This can sometimes be a point of conceptual confusion. HBase internally puts your data in indexed "StoreFiles" that exist
+ on HDFS for high-speed lookups. See the <xref linkend="datamodel" /> and the rest of this chapter for more information on how HBase achieves its goals.
+ </para>
+ </section>
+ </section>
<section xml:id="arch.catalog">
<title>Catalog Tables</title>
@@ -2000,17 +2057,7 @@ hbase> describe 't1'</programlisting>
<qandaentry>
<question><para>When should I use HBase?</para></question>
<answer>
- <para>
- Anybody can download and give HBase a spin, even on a laptop. The scope of this answer is when
- would it be best to use HBase in a <emphasis>real</emphasis> deployment.
- </para>
- <para>First, make sure you have enough hardware. Even HDFS doesn't do well with anything less than
- 5 DataNodes (due to things such as HDFS block replication which has a default of 3), plus a NameNode.
- Second, make sure you have enough data. HBase isn't suitable for every problem. If you have
- hundreds of millions or billions of rows, then HBase is a good candidate. If you only have a few
- thousand/million rows, then using a traditional RDBMS might be a better choice due to the
- fact that all of your data might wind up on a single node (or two) and the rest of the cluster may
- be sitting idle.
+ <para>See the <xref linkend="arch.overview" /> in the Architecture chapter.
</para>
</answer>
</qandaentry>
@@ -2031,17 +2078,6 @@ hbase> describe 't1'</programlisting>
</para>
</answer>
</qandaentry>
- <qandaentry xml:id="faq.hdfs.hbase">
- <question><para>How does HBase work on top of HDFS?</para></question>
- <answer>
- <para>
- <link xlink:href="http://hadoop.apache.org/hdfs/">HDFS</link> is a distributed file system that is well suited for the storage of large files. It's documentation
- states that it is not, however, a general purpose file system, and does not provide fast individual record lookups in files.
- HBase, on the other hand, is built on top of HDFS and provides fast record lookups (and updates) for large tables. This can sometimes be a point of conceptual confusion.
- See the <xref linkend="datamodel" /> and <xref linkend="architecture" /> sections for more information on how HBase achieves its goals.
- </para>
- </answer>
- </qandaentry>
</qandadiv>
<qandadiv xml:id="faq.config"><title>Configuration</title>
<qandaentry xml:id="faq.config.started">
@@ -2109,6 +2145,16 @@ hbase> describe 't1'</programlisting>
</answer>
</qandaentry>
</qandadiv>
+ <qandadiv xml:id="faq.mapreduce"><title>MapReduce</title>
+ <qandaentry xml:id="faq.mapreduce.use">
+ <question><para>How can I use MapReduce with HBase?</para></question>
+ <answer>
+ <para>
+ See <xref linkend="mapreduce" />
+ </para>
+ </answer>
+ </qandaentry>
+ </qandadiv>
<qandadiv><title>Performance and Troubleshooting</title>
<qandaentry>
<question><para>
Modified: hbase/trunk/src/docbkx/troubleshooting.xml
URL: http://svn.apache.org/viewvc/hbase/trunk/src/docbkx/troubleshooting.xml?rev=1209688&r1=1209687&r2=1209688&view=diff
==============================================================================
--- hbase/trunk/src/docbkx/troubleshooting.xml (original)
+++ hbase/trunk/src/docbkx/troubleshooting.xml Fri Dec 2 20:57:35 2011
@@ -196,6 +196,28 @@ export HBASE_OPTS="-XX:NewSize=64m -XX:M
</para>
</section>
</section>
+ <section xml:id="trouble.resources">
+ <title>Resources</title>
+ <section xml:id="trouble.resources.lists">
+ <title>Dist-Lists</title>
+ <para>Sign up for the <link xlink:href="http://hbase.apache.org/mail-lists.html">HBase Dist-Lists</link> and post a question. 'Dev' is aimed at the
+ community of developers actually building HBase and at features currently under development, while 'User' is generally for questions on released
+ versions of HBase.
+ </para>
+ </para>
+ </section>
+ <section xml:id="trouble.resources.searchhadoop">
+ <title>search-hadoop.com</title>
+ <para>
+ <link xlink:href="http://search-hadoop.com">search-hadoop.com</link> indexes all the mailing lists and is great for historical searches.
+ </para>
+ </section>
+ <section xml:id="trouble.resources.jira">
+ <title>JIRA</title>
+ <para>
+ <link xlink:href="https://issues.apache.org/jira/browse/HBASE">JIRA</link> is also really helpful when looking for Hadoop/HBase-specific issues.
+ </para>
+ </section>
+ </section>
<section xml:id="trouble.tools">
<title>Tools</title>
<section xml:id="trouble.tools.builtin">
@@ -221,12 +243,6 @@ export HBASE_OPTS="-XX:NewSize=64m -XX:M
</section>
<section xml:id="trouble.tools.external">
<title>External Tools</title>
- <section xml:id="trouble.tools.searchhadoop">
- <title>search-hadoop.com</title>
- <para>
- <link xlink:href="http://search-hadoop.com">search-hadoop.com</link> indexes all the mailing lists and <link xlink:href="https://issues.apache.org/jira/browse/HBASE">JIRA</link>, it's really helpful when looking for Hadoop/HBase-specific issues.
- </para>
- </section>
<section xml:id="trouble.tools.tail">
<title>tail</title>
<para>