You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@hbase.apache.org by st...@apache.org on 2013/02/04 19:25:48 UTC
svn commit: r1442290 - /hbase/trunk/src/docbkx/ops_mgt.xml

Author: stack
Date: Mon Feb  4 18:25:47 2013
New Revision: 1442290

URL: http://svn.apache.org/viewvc?rev=1442290&view=rev
Log:
HBASE-7758 Update book to include documentation of CellCounter utility

Modified:
    hbase/trunk/src/docbkx/ops_mgt.xml

Modified: hbase/trunk/src/docbkx/ops_mgt.xml
URL: http://svn.apache.org/viewvc/hbase/trunk/src/docbkx/ops_mgt.xml?rev=1442290&r1=1442289&r2=1442290&view=diff
==============================================================================
--- hbase/trunk/src/docbkx/ops_mgt.xml (original)
+++ hbase/trunk/src/docbkx/ops_mgt.xml Mon Feb  4 18:25:47 2013
@@ -265,16 +265,35 @@ row10	c1	c2
        </para>
     </section>
     <section xml:id="rowcounter">
-       <title>RowCounter</title>
-       <para>RowCounter is a mapreduce job to count all the rows of a table.  This is a good utility to use
-           as a sanity check to ensure that HBase can read all the blocks of a table if there are any concerns of metadata inconsistency.
-           It will run the mapreduce all in a single process but it will run faster if you have a MapReduce cluster in place for it to
-           exploit.
+       <title>RowCounter and CellCounter</title>
+       <para><ulink url="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/mapreduce/RowCounter.html">RowCounter</ulink> is a
+       mapreduce job to count all the rows of a table.  This is a good utility to use as a sanity check to ensure that HBase can read
+       all the blocks of a table if there are any concerns of metadata inconsistency. It will run the mapreduce all in a single
+       process but it will run faster if you have a MapReduce cluster in place for it to exploit.
 <programlisting>$ bin/hbase org.apache.hadoop.hbase.mapreduce.RowCounter &lt;tablename&gt; [&lt;column1&gt; &lt;column2&gt;...]
 </programlisting>
        </para>
-       <para>Note:  caching for the input Scan is configured via <code>hbase.client.scanner.caching</code> in the job configuration.
+       <para>Note: caching for the input Scan is configured via <code>hbase.client.scanner.caching</code> in the job configuration.
        </para>
+       <para>HBase ships another diagnostic mapreduce job called
+         <ulink url="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/mapreduce/CellCounter.html">CellCounter</ulink>. Like
+         RowCounter, it gathers more fine-grained statistics about your table. The statistics gathered by RowCounter are more fine-grained
+         and include:
+         <itemizedlist>
+           <listitem>Total number of rows in the table.</listitem>
+           <listitem>Total number of CFs across all rows.</listitem>
+           <listitem>Total qualifiers across all rows.</listitem>
+           <listitem>Total occurrence of each CF.</listitem>
+           <listitem>Total occurrence of each qualifier.</listitem>
+           <listitem>Total number of versions of each qualifier.</listitem>
+         </itemizedlist>
+       </para>
+       <para>The program allows you to limit the scope of the run. Provide a row regex or prefix to limit the rows to analyze. Use
+         <code>hbase.mapreduce.scan.column.family</code> to specify scanning a single column family.
+         <programlisting>$ bin/hbase org.apache.hadoop.hbase.mapreduce.CellCounter &lt;tablename&gt; &lt;outputDir&gt; [regex or prefix]</programlisting>
+       </para>
+       <para>Note: just like RowCounter, caching for the input Scan is configured via <code>hbase.client.scanner.caching</code> in the
+       job configuration. </para>
     </section>
 
     </section>  <!--  tools -->