Posted to commits@hbase.apache.org by st...@apache.org on 2013/02/04 19:25:48 UTC
svn commit: r1442290 - /hbase/trunk/src/docbkx/ops_mgt.xml
Author: stack
Date: Mon Feb 4 18:25:47 2013
New Revision: 1442290
URL: http://svn.apache.org/viewvc?rev=1442290&view=rev
Log:
HBASE-7758 Update book to include documentation of CellCounter utility
Modified:
hbase/trunk/src/docbkx/ops_mgt.xml
Modified: hbase/trunk/src/docbkx/ops_mgt.xml
URL: http://svn.apache.org/viewvc/hbase/trunk/src/docbkx/ops_mgt.xml?rev=1442290&r1=1442289&r2=1442290&view=diff
==============================================================================
--- hbase/trunk/src/docbkx/ops_mgt.xml (original)
+++ hbase/trunk/src/docbkx/ops_mgt.xml Mon Feb 4 18:25:47 2013
@@ -265,16 +265,35 @@ row10 c1 c2
</para>
</section>
<section xml:id="rowcounter">
- <title>RowCounter</title>
- <para>RowCounter is a mapreduce job to count all the rows of a table. This is a good utility to use
- as a sanity check to ensure that HBase can read all the blocks of a table if there are any concerns of metadata inconsistency.
- It will run the mapreduce all in a single process but it will run faster if you have a MapReduce cluster in place for it to
- exploit.
+ <title>RowCounter and CellCounter</title>
+ <para><ulink url="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/mapreduce/RowCounter.html">RowCounter</ulink> is a
+ MapReduce job that counts all the rows of a table. It is a good sanity check to ensure that HBase can read
+ all the blocks of a table if there are concerns about metadata inconsistency. Without a MapReduce cluster it runs the whole
+ job in a single process; with a cluster in place, it can exploit the cluster and runs faster.
<programlisting>$ bin/hbase org.apache.hadoop.hbase.mapreduce.RowCounter <tablename> [<column1> <column2>...]
</programlisting>
</para>
- <para>Note: caching for the input Scan is configured via <code>hbase.client.scanner.caching</code> in the job configuration.
+ <para>Note: caching for the input Scan is configured via <code>hbase.client.scanner.caching</code> in the job configuration.
</para>
+ <para>HBase ships another diagnostic MapReduce job called
+ <ulink url="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/mapreduce/CellCounter.html">CellCounter</ulink>. Like
+ RowCounter, it gathers statistics about your table, but the statistics gathered by CellCounter are more fine-grained
+ and include:
+ <itemizedlist>
+ <listitem><para>Total number of rows in the table.</para></listitem>
+ <listitem><para>Total number of column families (CFs) across all rows.</para></listitem>
+ <listitem><para>Total number of qualifiers across all rows.</para></listitem>
+ <listitem><para>Total occurrences of each CF.</para></listitem>
+ <listitem><para>Total occurrences of each qualifier.</para></listitem>
+ <listitem><para>Total number of versions of each qualifier.</para></listitem>
+ </itemizedlist>
+ </para>
+ <para>You can limit the scope of the run. Provide a row regex or prefix as the optional last argument to limit the rows
+ analyzed, and set <code>hbase.mapreduce.scan.column.family</code> to scan a single column family.
+ <programlisting>$ bin/hbase org.apache.hadoop.hbase.mapreduce.CellCounter <tablename> <outputDir> [regex or prefix]</programlisting>
+ </para>
+ <para>Note: as with RowCounter, caching for the input Scan is configured via <code>hbase.client.scanner.caching</code> in the
+ job configuration.</para>
</section>
</section> <!-- tools -->
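The CellCounter statistics listed in the new section can be illustrated without a cluster. The following is a minimal, hypothetical Java sketch (Java 16 or later, for records). It does not use the HBase client API. The `Cell` record, the `cellStats` method and the stat key names are invented for illustration and do not match CellCounter's real output format. The sketch assumes cells arrive sorted by row, family and qualifier, as a scan returns them, and counts every cell as one version of its row and column.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch, not HBase code: computes CellCounter-style statistics
// over an in-memory list of cells.
public class CellStatsSketch {

    // Hypothetical stand-in for one stored cell version (not an HBase type).
    record Cell(String row, String family, String qualifier, long timestamp) {}

    // Assumes cells are sorted by row, then family, then qualifier.
    static Map<String, Long> cellStats(List<Cell> cells) {
        Map<String, Long> stats = new TreeMap<>(); // sorted keys for readable output
        String lastRow = null, lastFamily = null, lastQualifier = null;
        for (Cell c : cells) {
            boolean newRow = !c.row().equals(lastRow);
            boolean newFamily = newRow || !c.family().equals(lastFamily);
            boolean newQualifier = newFamily || !c.qualifier().equals(lastQualifier);
            String column = c.family() + ":" + c.qualifier();
            if (newRow) {
                stats.merge("rows", 1L, Long::sum);               // total rows
            }
            if (newFamily) {
                stats.merge("families", 1L, Long::sum);           // CFs across all rows
                stats.merge("cf/" + c.family(), 1L, Long::sum);   // rows containing this CF
            }
            if (newQualifier) {
                stats.merge("qualifiers", 1L, Long::sum);         // qualifiers across all rows
                stats.merge("col/" + column, 1L, Long::sum);      // rows containing this qualifier
            }
            // Every cell is one version of its row/column.
            stats.merge("versions/" + c.row() + "/" + column, 1L, Long::sum);
            lastRow = c.row();
            lastFamily = c.family();
            lastQualifier = c.qualifier();
        }
        return stats;
    }

    public static void main(String[] args) {
        List<Cell> cells = List.of(
            new Cell("row1", "cf1", "a", 2L),
            new Cell("row1", "cf1", "a", 1L),   // older version of row1/cf1:a
            new Cell("row1", "cf2", "b", 1L),
            new Cell("row2", "cf1", "a", 1L));
        cellStats(cells).forEach((k, v) -> System.out.println(k + "\t" + v));
    }
}
```

Running `main` prints one tab-separated line per statistic. For the sample data it reports 2 rows, 3 families across all rows, and 2 versions of `row1/cf1:a`.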
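CellCounter's optional `[regex or prefix]` argument can be sketched in the same way. This hypothetical Java sketch assumes that a value starting with `^` is treated as a regular expression and that any other value is a literal row-key prefix. Verify that convention against the usage message of the CellCounter in your HBase version. `rowFilter` and `selectRows` are illustrative names, not HBase API. The sketch keeps the `^`, so the pattern is anchored at the start of the row key, and uses `Matcher.find()`. HBase's own regex matching may behave differently.

```java
import java.util.List;
import java.util.function.Predicate;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Hypothetical sketch, not HBase code: turns a "[regex or prefix]" argument
// into a predicate over row keys.
public class RowScopeSketch {

    // Assumed convention: a leading '^' marks a regular expression,
    // anything else is a literal row-key prefix, and null/empty means all rows.
    static Predicate<String> rowFilter(String criteria) {
        if (criteria == null || criteria.isEmpty()) {
            return row -> true;
        }
        if (criteria.startsWith("^")) {
            Pattern p = Pattern.compile(criteria); // '^' kept: match anchored at row start
            return row -> p.matcher(row).find();
        }
        return row -> row.startsWith(criteria);
    }

    // Returns the rows that would be analyzed, in their original order.
    static List<String> selectRows(List<String> rows, String criteria) {
        Predicate<String> keep = rowFilter(criteria);
        return rows.stream().filter(keep).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> rows = List.of("user100", "user200", "order100", "xuser1");
        System.out.println(selectRows(rows, "user"));            // [user100, user200]
        System.out.println(selectRows(rows, "^(user|order)1"));  // [user100, order100]
        System.out.println(selectRows(rows, null));              // all four rows
    }
}
```

A prefix is the cheaper choice when row keys share a known leading component. A regex covers patterns that a single prefix cannot express, such as the two alternatives above.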