Posted to commits@accumulo.apache.org by bi...@apache.org on 2011/11/15 22:08:06 UTC

svn commit: r798728 - in /websites/production/accumulo: ./ content/accumulo/user_manual_1.3-incubating/ content/accumulo/user_manual_1.4-incubating/

Author: billie
Date: Tue Nov 15 21:08:06 2011
New Revision: 798728

Log:
updated manuals

Added:
    websites/production/accumulo/content/accumulo/user_manual_1.4-incubating/img6.png
      - copied unchanged from r798727, websites/staging/accumulo/trunk/content/accumulo/user_manual_1.4-incubating/img6.png
    websites/production/accumulo/content/accumulo/user_manual_1.4-incubating/img7.png
      - copied unchanged from r798727, websites/staging/accumulo/trunk/content/accumulo/user_manual_1.4-incubating/img7.png
Modified:
    websites/production/accumulo/   (props changed)
    websites/production/accumulo/content/accumulo/user_manual_1.3-incubating/Contents.html
    websites/production/accumulo/content/accumulo/user_manual_1.3-incubating/Shell_Commands.html
    websites/production/accumulo/content/accumulo/user_manual_1.3-incubating/Table_Configuration.html
    websites/production/accumulo/content/accumulo/user_manual_1.4-incubating/Contents.html
    websites/production/accumulo/content/accumulo/user_manual_1.4-incubating/Shell_Commands.html
    websites/production/accumulo/content/accumulo/user_manual_1.4-incubating/Table_Configuration.html
    websites/production/accumulo/content/accumulo/user_manual_1.4-incubating/Table_Design.html
    websites/production/accumulo/content/accumulo/user_manual_1.4-incubating/img2.png
    websites/production/accumulo/content/accumulo/user_manual_1.4-incubating/img3.png
    websites/production/accumulo/content/accumulo/user_manual_1.4-incubating/img4.png
    websites/production/accumulo/content/accumulo/user_manual_1.4-incubating/img5.png

Propchange: websites/production/accumulo/
------------------------------------------------------------------------------
--- svn:mergeinfo (original)
+++ svn:mergeinfo Tue Nov 15 21:08:06 2011
@@ -1 +1 @@
-/websites/staging/accumulo/trunk:797863-798662
+/websites/staging/accumulo/trunk:797863-798727

Modified: websites/production/accumulo/content/accumulo/user_manual_1.3-incubating/Contents.html
==============================================================================
--- websites/production/accumulo/content/accumulo/user_manual_1.3-incubating/Contents.html (original)
+++ websites/production/accumulo/content/accumulo/user_manual_1.3-incubating/Contents.html Tue Nov 15 21:08:06 2011
@@ -165,19 +165,14 @@
 <ul>
 <li><a href="Table_Configuration.html#Setting_Iterators_via_the_Shell">Setting Iterators via the Shell</a></li>
 <li><a href="Table_Configuration.html#Setting_Iterators_Programmatically">Setting Iterators Programmatically</a></li>
+<li><a href="Table_Configuration.html#Versioning_Iterators_and_Timestamps">Versioning Iterators and Timestamps</a></li>
+<li><a href="Table_Configuration.html#Filtering_Iterators">Filtering Iterators</a></li>
 </ul>
 </li>
 <li>
-<p><a href="Table_Configuration.html#Versioning_Iterators_and_Timestamps">Versioning Iterators and Timestamps</a></p>
-<ul>
-<li><a href="Table_Configuration.html#Logical_Time">Logical Time</a></li>
-<li><a href="Table_Configuration.html#Deletes">Deletes</a></li>
-</ul>
-</li>
-<li>
-<p><a href="Table_Configuration.html#Filtering_Iterators">Filtering Iterators</a></p>
+<p><a href="Table_Configuration.html#Aggregating_Iterators">Aggregating Iterators</a></p>
 </li>
-<li><a href="Table_Configuration.html#Aggregating_Iterators">Aggregating Iterators</a></li>
+<li><a href="Table_Configuration.html#Block_Cache">Block Cache</a></li>
 </ul>
 </li>
 <li>

Modified: websites/production/accumulo/content/accumulo/user_manual_1.3-incubating/Shell_Commands.html
==============================================================================
--- websites/production/accumulo/content/accumulo/user_manual_1.3-incubating/Shell_Commands.html (original)
+++ websites/production/accumulo/content/accumulo/user_manual_1.3-incubating/Shell_Commands.html Tue Nov 15 21:08:06 2011
@@ -479,7 +479,7 @@
 </p>
 <div class="codehilite"><pre><span class="err">usage:</span> <span class="err">listscans</span> <span class="err">[-?]</span> <span class="err">[-np]</span> <span class="err">[-ts</span> <span class="err">&lt;tablet</span> <span class="err">server&gt;]</span>   
 <span class="err">description:</span> <span class="err">list</span> <span class="err">what</span> <span class="err">scans</span> <span class="err">are</span> <span class="err">currently</span> <span class="err">running</span> <span class="err">in</span> <span class="err">accumulo.</span> <span class="err">See</span> <span class="err">the</span>   
-       <span class="err">accumulo.core.client.admin.ActiveScan</span> <span class="err">javadoc</span> <span class="err">for</span> <span class="err">more</span> <span class="err">information</span>   
+       <span class="err">org.apache.accumulo.core.client.admin.ActiveScan</span> <span class="err">javadoc</span> <span class="err">for</span> <span class="err">more</span> <span class="err">information</span>   
        <span class="err">about</span> <span class="err">columns.</span>   
   <span class="err">-?,-help</span>  <span class="err">display</span> <span class="err">this</span> <span class="err">help</span>   
   <span class="err">-np,-no-pagination</span>  <span class="err">disables</span> <span class="err">pagination</span> <span class="err">of</span> <span class="err">output</span>   

Modified: websites/production/accumulo/content/accumulo/user_manual_1.3-incubating/Table_Configuration.html
==============================================================================
--- websites/production/accumulo/content/accumulo/user_manual_1.3-incubating/Table_Configuration.html (original)
+++ websites/production/accumulo/content/accumulo/user_manual_1.3-incubating/Table_Configuration.html Tue Nov 15 21:08:06 2011
@@ -100,9 +100,8 @@
 <li><a href="Table_Configuration.html#Constraints">Constraints</a></li>
 <li><a href="Table_Configuration.html#Bloom_Filters">Bloom Filters</a></li>
 <li><a href="Table_Configuration.html#Iterators">Iterators</a></li>
-<li><a href="Table_Configuration.html#Versioning_Iterators_and_Timestamps">Versioning Iterators and Timestamps</a></li>
-<li><a href="Table_Configuration.html#Filtering_Iterators">Filtering Iterators</a></li>
 <li><a href="Table_Configuration.html#Aggregating_Iterators">Aggregating Iterators</a></li>
+<li><a href="Table_Configuration.html#Block_Cache">Block Cache</a></li>
 </ul>
 <hr />
 <h2 id="a_idtable_configurationa_table_configuration"><a id=Table_Configuration></a> Table Configuration</h2>
@@ -208,7 +207,7 @@ accumulo/docs/examples/README.bloom . </
 
 
 <p>Tables support separate Iterator settings to be applied at scan time, upon minor compaction and upon major compaction. For most uses, tables will have identical iterator settings for all three to avoid inconsistent results. </p>
-<h2 id="a_idversioning_iterators_and_timestampsa_versioning_iterators_and_timestamps"><a id=Versioning_Iterators_and_Timestamps></a> Versioning Iterators and Timestamps</h2>
+<h3 id="a_idversioning_iterators_and_timestampsa_versioning_iterators_and_timestamps"><a id=Versioning_Iterators_and_Timestamps></a> Versioning Iterators and Timestamps</h3>
 <p>Accumulo provides the capability to manage versioned data through the use of timestamps within the Key. If a timestamp is not specified in the key created by the client then the system will set the timestamp to the current time. Two keys with identical rowIDs and columns but different timestamps are considered two versions of the same key. If two inserts are made into accumulo with the same rowID, column, and timestamp, then the behavior is non-deterministic. </p>
 <p>Timestamps are sorted in descending order, so the most recent data comes first. Accumulo can be configured to return the top k versions, or versions later than a given date. The default is to return the one most recent version. </p>
 <p>The version policy can be changed by changing the VersioningIterator options for a table as follows: </p>
@@ -223,16 +222,16 @@ accumulo/docs/examples/README.bloom . </
 </pre></div>
 
 
-<h3 id="a_idlogical_timea_logical_time"><a id=Logical_Time></a> Logical Time</h3>
+<h4 id="a_idlogical_timea_logical_time"><a id=Logical_Time></a> Logical Time</h4>
 <p>Accumulo 1.2 introduces the concept of logical time. This ensures that timestamps set by accumulo always move forward. This helps avoid problems caused by TabletServers that have different time settings. The per tablet counter gives unique one up time stamps on a per mutation basis. When using time in milliseconds, if two things arrive within the same millisecond then both receive the same timestamp. </p>
 <p>A table can be configured to use logical timestamps at creation time as follows: </p>
 <div class="codehilite"><pre><span class="n">user</span><span class="nv">@myinstance</span><span class="o">&gt;</span> <span class="n">createtable</span> <span class="o">-</span><span class="n">tl</span> <span class="n">logical</span>
 </pre></div>
 
 
-<h3 id="a_iddeletesa_deletes"><a id=Deletes></a> Deletes</h3>
+<h4 id="a_iddeletesa_deletes"><a id=Deletes></a> Deletes</h4>
 <p>Deletes are special keys in accumulo that get sorted along with all the other data. When a delete key is inserted, accumulo will not show anything that has a timestamp less than or equal to the delete key. During major compaction, any keys older than a delete key are omitted from the new file created, and the omitted keys are removed from disk as part of the regular garbage collection process. </p>
-<h2 id="a_idfiltering_iteratorsa_filtering_iterators"><a id=Filtering_Iterators></a> Filtering Iterators</h2>
+<h3 id="a_idfiltering_iteratorsa_filtering_iterators"><a id=Filtering_Iterators></a> Filtering Iterators</h3>
 <p>When scanning over a set of key-value pairs it is possible to apply an arbitrary filtering policy through the use of a FilteringIterator. These types of iterators return only key-value pairs that satisfy the filter logic. Accumulo has two built-in filtering iterators that can be configured on any table: AgeOff and RegEx. More can be added by writing a Java class that implements the <br />
 org.apache.accumulo.core.iterators.filter.Filter interface. </p>
 <p>The AgeOff filter can be configured to remove data older than a certain date or a fixed amount of time from the present. The following example sets a table to delete everything inserted over 30 seconds ago: </p>
@@ -338,7 +337,22 @@ org.apache.accumulo.core.iterators.filte
 <p>Additional Aggregators can be added by creating a Java class that implements <br />
 <strong>org.apache.accumulo.core.iterators.aggregation.Aggregator</strong> and adding a jar containing that class to Accumulo's lib directory. </p>
 <p>An example of an aggregator can be found under <br />
-accumulo/src/examples/main/java/accumulo/examples/aggregation/SortedSetAggregator.java </p>
+accumulo/src/examples/main/java/org/apache/accumulo/examples/aggregation/SortedSetAggregator.java </p>
+<h2 id="a_idblock_cachea_block_cache"><a id=Block_Cache></a> Block Cache</h2>
+<p>In order to increase throughput of commonly accessed entries, Accumulo employs a block cache. This block cache buffers data in memory so that it doesn't have to be read off of disk. The RFile format that Accumulo prefers is a mix of index blocks and data blocks, where the index blocks are used to find the appropriate data blocks. Typical queries to Accumulo result in a binary search over several index blocks followed by a linear scan of one or more data blocks. </p>
+<p>The block cache can be configured on a per-table basis, and all tablets hosted on a tablet server share a single resource pool. To configure the size of the tablet server's block cache, set the following properties: </p>
+<div class="codehilite"><pre><span class="n">tserver</span><span class="o">.</span><span class="n">cache</span><span class="o">.</span><span class="n">data</span><span class="o">.</span><span class="n">size:</span> <span class="n">Specifies</span> <span class="n">the</span> <span class="n">size</span> <span class="n">of</span> <span class="n">the</span> <span class="n">cache</span> <span class="k">for</span> <span class="n">file</span> <span class="n">data</span> <span class="n">blocks</span><span class="o">.</span>
+<span class="n">tserver</span><span class="o">.</span><span class="n">cache</span><span class="o">.</span><span class="nb">index</span><span class="o">.</span><span class="n">size:</span> <span class="n">Specifies</span> <span class="n">the</span> <span class="n">size</span> <span class="n">of</span> <span class="n">the</span> <span class="n">cache</span> <span class="k">for</span> <span class="n">file</span> <span class="n">indices</span><span class="o">.</span>
+</pre></div>
+
+
+<p>To enable the block cache for your table, set the following properties: </p>
+<div class="codehilite"><pre><span class="n">table</span><span class="o">.</span><span class="n">cache</span><span class="o">.</span><span class="n">block</span><span class="o">.</span><span class="n">enable:</span> <span class="n">Determines</span> <span class="n">whether</span> <span class="n">file</span> <span class="p">(</span><span class="n">data</span><span class="p">)</span> <span class="n">block</span> <span class="n">cache</span> <span class="n">is</span> <span class="n">enabled</span><span class="o">.</span>
+<span class="n">table</span><span class="o">.</span><span class="n">cache</span><span class="o">.</span><span class="nb">index</span><span class="o">.</span><span class="n">enable:</span> <span class="n">Determines</span> <span class="n">whether</span> <span class="nb">index</span> <span class="n">cache</span> <span class="n">is</span> <span class="n">enabled</span><span class="o">.</span>
+</pre></div>
+
+
+<p>The block cache can have a significant effect on alleviating hot spots, as well as reducing query latency. It is enabled by default for the !METADATA table. </p>
 <hr />
 <p><strong> Next:</strong> <a href="Table_Design.html">Table Design</a> <strong> Up:</strong> <a href="accumulo_user_manual.html">Accumulo User Manual Version 1.3</a> <strong> Previous:</strong> <a href="Writing_Accumulo_Clients.html">Writing Accumulo Clients</a>   <strong> <a href="Contents.html">Contents</a></strong></p>
   </div>

Modified: websites/production/accumulo/content/accumulo/user_manual_1.4-incubating/Contents.html
==============================================================================
--- websites/production/accumulo/content/accumulo/user_manual_1.4-incubating/Contents.html (original)
+++ websites/production/accumulo/content/accumulo/user_manual_1.4-incubating/Contents.html Tue Nov 15 21:08:06 2011
@@ -166,19 +166,15 @@
 <ul>
 <li><a href="Table_Configuration.html#Setting_Iterators_via_the_Shell">Setting Iterators via the Shell</a></li>
 <li><a href="Table_Configuration.html#Setting_Iterators_Programmatically">Setting Iterators Programmatically</a></li>
+<li><a href="Table_Configuration.html#Versioning_Iterators_and_Timestamps">Versioning Iterators and Timestamps</a></li>
+<li><a href="Table_Configuration.html#Filters">Filters</a></li>
+<li><a href="Table_Configuration.html#Aggregating_Iterators">Aggregating Iterators</a></li>
 </ul>
 </li>
 <li>
-<p><a href="Table_Configuration.html#Versioning_Iterators_and_Timestamps">Versioning Iterators and Timestamps</a></p>
-<ul>
-<li><a href="Table_Configuration.html#Logical_Time">Logical Time</a></li>
-<li><a href="Table_Configuration.html#Deletes">Deletes</a></li>
-</ul>
-</li>
-<li>
-<p><a href="Table_Configuration.html#Filters">Filters</a></p>
+<p><a href="Table_Configuration.html#Block_Cache">Block Cache</a></p>
 </li>
-<li><a href="Table_Configuration.html#Aggregating_Iterators">Aggregating Iterators</a></li>
+<li><a href="Table_Configuration.html#Compaction">Compaction</a></li>
 <li><a href="Table_Configuration.html#Pre-splitting_tables">Pre-splitting tables</a></li>
 <li><a href="Table_Configuration.html#Merging_tablets">Merging tablets</a></li>
 <li><a href="Table_Configuration.html#Delete_Range">Delete Range</a></li>

Modified: websites/production/accumulo/content/accumulo/user_manual_1.4-incubating/Shell_Commands.html
==============================================================================
--- websites/production/accumulo/content/accumulo/user_manual_1.4-incubating/Shell_Commands.html (original)
+++ websites/production/accumulo/content/accumulo/user_manual_1.4-incubating/Shell_Commands.html Tue Nov 15 21:08:06 2011
@@ -545,7 +545,7 @@
 </p>
 <div class="codehilite"><pre><span class="err">usage:</span> <span class="err">listscans</span> <span class="err">[-?]</span> <span class="err">[-np]</span> <span class="err">[-ts</span> <span class="err">&lt;tablet</span> <span class="err">server&gt;]</span>   
 <span class="err">description:</span> <span class="err">list</span> <span class="err">what</span> <span class="err">scans</span> <span class="err">are</span> <span class="err">currently</span> <span class="err">running</span> <span class="err">in</span> <span class="err">accumulo.</span> <span class="err">See</span> <span class="err">the</span>   
-       <span class="err">accumulo.core.client.admin.ActiveScan</span> <span class="err">javadoc</span> <span class="err">for</span> <span class="err">more</span> <span class="err">information</span>   
+       <span class="err">org.apache.accumulo.core.client.admin.ActiveScan</span> <span class="err">javadoc</span> <span class="err">for</span> <span class="err">more</span> <span class="err">information</span>   
        <span class="err">about</span> <span class="err">columns.</span>   
   <span class="err">-?,-help</span>  <span class="err">display</span> <span class="err">this</span> <span class="err">help</span>   
   <span class="err">-np,-no-pagination</span>  <span class="err">disables</span> <span class="err">pagination</span> <span class="err">of</span> <span class="err">output</span>   

Modified: websites/production/accumulo/content/accumulo/user_manual_1.4-incubating/Table_Configuration.html
==============================================================================
--- websites/production/accumulo/content/accumulo/user_manual_1.4-incubating/Table_Configuration.html (original)
+++ websites/production/accumulo/content/accumulo/user_manual_1.4-incubating/Table_Configuration.html Tue Nov 15 21:08:06 2011
@@ -100,9 +100,8 @@
 <li><a href="Table_Configuration.html#Constraints">Constraints</a></li>
 <li><a href="Table_Configuration.html#Bloom_Filters">Bloom Filters</a></li>
 <li><a href="Table_Configuration.html#Iterators">Iterators</a></li>
-<li><a href="Table_Configuration.html#Versioning_Iterators_and_Timestamps">Versioning Iterators and Timestamps</a></li>
-<li><a href="Table_Configuration.html#Filters">Filters</a></li>
-<li><a href="Table_Configuration.html#Aggregating_Iterators">Aggregating Iterators</a></li>
+<li><a href="Table_Configuration.html#Block_Cache">Block Cache</a></li>
+<li><a href="Table_Configuration.html#Compaction">Compaction</a></li>
 <li><a href="Table_Configuration.html#Pre-splitting_tables">Pre-splitting tables</a></li>
 <li><a href="Table_Configuration.html#Merging_tablets">Merging tablets</a></li>
 <li><a href="Table_Configuration.html#Delete_Range">Delete Range</a></li>
@@ -110,7 +109,7 @@
 </ul>
 <hr />
 <h2 id="a_idtable_configurationa_table_configuration"><a id=Table_Configuration></a> Table Configuration</h2>
-<p>Accumulo tables have a few options that can be configured to alter the default behavior of Accumulo as well as improve performance based on the data stored. These include locality groups, constraints, and iterators. </p>
+<p>Accumulo tables have a few options that can be configured to alter the default behavior of Accumulo as well as improve performance based on the data stored. These include locality groups, constraints, bloom filters, iterators, and block cache. </p>
 <h2 id="a_idlocality_groupsa_locality_groups"><a id=Locality_Groups></a> Locality Groups</h2>
 <p>Accumulo supports storing of sets of column families separately on disk to allow clients to scan over columns that are frequently used together efficiently and to avoid scanning over column families that are not requested. After a locality group is set, Scanner and BatchScanner operations will automatically take advantage of it whenever the fetchColumnFamilies() method is used. </p>
 <p>By default tables place all column families into the same ``default" locality group. Additional locality groups can be configured anytime via the shell or programmatically as follows: </p>
@@ -212,7 +211,7 @@ accumulo/docs/examples/README.bloom . </
 
 
 <p>Tables support separate Iterator settings to be applied at scan time, upon minor compaction and upon major compaction. For most uses, tables will have identical iterator settings for all three to avoid inconsistent results. </p>
-<h2 id="a_idversioning_iterators_and_timestampsa_versioning_iterators_and_timestamps"><a id=Versioning_Iterators_and_Timestamps></a> Versioning Iterators and Timestamps</h2>
+<h3 id="a_idversioning_iterators_and_timestampsa_versioning_iterators_and_timestamps"><a id=Versioning_Iterators_and_Timestamps></a> Versioning Iterators and Timestamps</h3>
 <p>Accumulo provides the capability to manage versioned data through the use of timestamps within the Key. If a timestamp is not specified in the key created by the client then the system will set the timestamp to the current time. Two keys with identical rowIDs and columns but different timestamps are considered two versions of the same key. If two inserts are made into accumulo with the same rowID, column, and timestamp, then the behavior is non-deterministic. </p>
 <p>Timestamps are sorted in descending order, so the most recent data comes first. Accumulo can be configured to return the top k versions, or versions later than a given date. The default is to return the one most recent version. </p>
 <p>The version policy can be changed by changing the VersioningIterator options for a table as follows: </p>
@@ -227,16 +226,16 @@ accumulo/docs/examples/README.bloom . </
 </pre></div>
 
 
-<h3 id="a_idlogical_timea_logical_time"><a id=Logical_Time></a> Logical Time</h3>
+<h4 id="a_idlogical_timea_logical_time"><a id=Logical_Time></a> Logical Time</h4>
 <p>Accumulo 1.2 introduces the concept of logical time. This ensures that timestamps set by accumulo always move forward. This helps avoid problems caused by TabletServers that have different time settings. The per tablet counter gives unique one up time stamps on a per mutation basis. When using time in milliseconds, if two things arrive within the same millisecond then both receive the same timestamp. When using time in milliseconds, accumulo set times will still always move forward and never backwards. </p>
 <p>A table can be configured to use logical timestamps at creation time as follows: </p>
 <div class="codehilite"><pre><span class="n">user</span><span class="nv">@myinstance</span><span class="o">&gt;</span> <span class="n">createtable</span> <span class="o">-</span><span class="n">tl</span> <span class="n">logical</span>
 </pre></div>
 
 
-<h3 id="a_iddeletesa_deletes"><a id=Deletes></a> Deletes</h3>
+<h4 id="a_iddeletesa_deletes"><a id=Deletes></a> Deletes</h4>
 <p>Deletes are special keys in accumulo that get sorted along with all the other data. When a delete key is inserted, accumulo will not show anything that has a timestamp less than or equal to the delete key. During major compaction, any keys older than a delete key are omitted from the new file created, and the omitted keys are removed from disk as part of the regular garbage collection process. </p>
-<h2 id="a_idfiltersa_filters"><a id=Filters></a> Filters</h2>
+<h3 id="a_idfiltersa_filters"><a id=Filters></a> Filters</h3>
 <p>When scanning over a set of key-value pairs it is possible to apply an arbitrary filtering policy through the use of a Filter. Filters are types of iterators that return only key-value pairs that satisfy the filter logic. Accumulo has a few built-in filters that can be configured on any table: AgeOff, ColumnAgeOff, Timestamp, NoVis, and RegEx. More can be added by writing a Java class that extends the <br />
 org.apache.accumulo.core.iterators.Filter class. </p>
 <p>The AgeOff filter can be configured to remove data older than a certain date or a fixed amount of time from the present. The following example sets a table to delete everything inserted over 30 seconds ago: </p>
@@ -278,7 +277,7 @@ org.apache.accumulo.core.iterators.Filte
 </pre></div>
 
 
-<h2 id="a_idaggregating_iteratorsa_aggregating_iterators"><a id=Aggregating_Iterators></a> Aggregating Iterators</h2>
+<h3 id="a_idaggregating_iteratorsa_aggregating_iterators"><a id=Aggregating_Iterators></a> Aggregating Iterators</h3>
 <p>Accumulo allows aggregating iterators to be configured on tables and column families. When an aggregating iterator is set, the iterator is applied across the values associated with any keys that share rowID, column family, and column qualifier. This is similar to the reduce step in MapReduce, which applied some function to all the values associated with a particular key. </p>
 <p>For example, if an aggregating iterator were configured on a table and the following mutations were inserted: </p>
 <div class="codehilite"><pre><span class="n">Row</span>     <span class="n">Family</span> <span class="n">Qualifier</span> <span class="n">Timestamp</span>  <span class="n">Value</span>
@@ -319,7 +318,49 @@ org.apache.accumulo.core.iterators.Filte
 <p>Additional Aggregators can be added by creating a Java class that implements <br />
 <strong>org.apache.accumulo.core.iterators.aggregation.Aggregator</strong> and adding a jar containing that class to Accumulo's lib directory. </p>
 <p>An example of an aggregator can be found under <br />
-accumulo/src/examples/main/java/accumulo/examples/aggregation/SortedSetAggregator.java </p>
+accumulo/src/examples/main/java/org/apache/accumulo/examples/aggregation/SortedSetAggregator.java </p>
+<h2 id="a_idblock_cachea_block_cache"><a id=Block_Cache></a> Block Cache</h2>
+<p>In order to increase throughput of commonly accessed entries, Accumulo employs a block cache. This block cache buffers data in memory so that it doesn't have to be read off of disk. The RFile format that Accumulo prefers is a mix of index blocks and data blocks, where the index blocks are used to find the appropriate data blocks. Typical queries to Accumulo result in a binary search over several index blocks followed by a linear scan of one or more data blocks. </p>
+<p>The block cache can be configured on a per-table basis, and all tablets hosted on a tablet server share a single resource pool. To configure the size of the tablet server's block cache, set the following properties: </p>
+<div class="codehilite"><pre><span class="n">tserver</span><span class="o">.</span><span class="n">cache</span><span class="o">.</span><span class="n">data</span><span class="o">.</span><span class="n">size:</span> <span class="n">Specifies</span> <span class="n">the</span> <span class="n">size</span> <span class="n">of</span> <span class="n">the</span> <span class="n">cache</span> <span class="k">for</span> <span class="n">file</span> <span class="n">data</span> <span class="n">blocks</span><span class="o">.</span>
+<span class="n">tserver</span><span class="o">.</span><span class="n">cache</span><span class="o">.</span><span class="nb">index</span><span class="o">.</span><span class="n">size:</span> <span class="n">Specifies</span> <span class="n">the</span> <span class="n">size</span> <span class="n">of</span> <span class="n">the</span> <span class="n">cache</span> <span class="k">for</span> <span class="n">file</span> <span class="n">indices</span><span class="o">.</span>
+</pre></div>
+
+
+<p>To enable the block cache for your table, set the following properties: </p>
+<div class="codehilite"><pre><span class="n">table</span><span class="o">.</span><span class="n">cache</span><span class="o">.</span><span class="n">block</span><span class="o">.</span><span class="n">enable:</span> <span class="n">Determines</span> <span class="n">whether</span> <span class="n">file</span> <span class="p">(</span><span class="n">data</span><span class="p">)</span> <span class="n">block</span> <span class="n">cache</span> <span class="n">is</span> <span class="n">enabled</span><span class="o">.</span>
+<span class="n">table</span><span class="o">.</span><span class="n">cache</span><span class="o">.</span><span class="nb">index</span><span class="o">.</span><span class="n">enable:</span> <span class="n">Determines</span> <span class="n">whether</span> <span class="nb">index</span> <span class="n">cache</span> <span class="n">is</span> <span class="n">enabled</span><span class="o">.</span>
+</pre></div>
+
+
+<p>The block cache can have a significant effect on alleviating hot spots, as well as reducing query latency. It is enabled by default for the !METADATA table. </p>
+<h2 id="a_idcompactiona_compaction"><a id=Compaction></a> Compaction</h2>
+<p>As data is written to Accumulo it is buffered in memory. The data buffered in memory is eventually written to HDFS on a per tablet basis. Files can also be added to tablets directly by bulk import. In the background tablet servers run major compactions to merge multiple files into one. The tablet server has to decide which tablets to compact and which files within a tablet to compact. This decision is made using the compaction ratio, which is configurable on a per table basis. To configure this ratio modify the following property: </p>
+<div class="codehilite"><pre><span class="n">table</span><span class="o">.</span><span class="n">compaction</span><span class="o">.</span><span class="n">major</span><span class="o">.</span><span class="n">ratio</span>
+</pre></div>
+
+
+<p>Increasing this ratio will result in more files per tablet and less compaction work. More files per tablet means higher query latency, so adjusting this ratio is a trade-off between ingest and query performance. The ratio defaults to 3. </p>
+<p>The way the ratio works is that a set of files is compacted into one file if the sum of the sizes of the files in the set is larger than the ratio multiplied by the size of the largest file in the set. If this is not true for the set of all files in a tablet, the largest file is removed from consideration, and the remaining files are considered for compaction. This is repeated until a compaction is triggered or there are no files left to consider. </p>
+<p>The number of background threads tablet servers use to run major compactions is configurable. To configure this modify the following property: </p>
+<div class="codehilite"><pre><span class="n">tserver</span><span class="o">.</span><span class="n">compaction</span><span class="o">.</span><span class="n">major</span><span class="o">.</span><span class="n">concurrent</span><span class="o">.</span><span class="n">max</span>
+</pre></div>
+
+
+<p>Also, the number of threads tablet servers use for minor compactions is configurable. To configure this modify the following property: </p>
+<div class="codehilite"><pre><span class="n">tserver</span><span class="o">.</span><span class="n">compaction</span><span class="o">.</span><span class="n">minor</span><span class="o">.</span><span class="n">concurrent</span><span class="o">.</span><span class="n">max</span>
+</pre></div>
+
+
+<p>The numbers of minor and major compactions running and queued are visible on the Accumulo monitor page. This allows you to see whether compactions are backing up and whether adjustments to the above settings are needed. When adjusting the number of threads available for compactions, consider the number of cores and other tasks running on the nodes such as maps and reduces. </p>
+<p>If major compactions are not keeping up, then the number of files per tablet will grow to a point such that query performance starts to suffer. One way to handle this situation is to increase the compaction ratio. For example, if the compaction ratio were set to 1, then every new file added to a tablet by minor compaction would immediately queue the tablet for major compaction. So if a tablet has a 200M file and minor compaction writes a 1M file, then the major compaction will attempt to merge the 200M and 1M files. If the tablet server has lots of tablets trying to do this sort of thing, then major compactions will back up and the number of files per tablet will start to grow, assuming data is being continuously written. Increasing the compaction ratio will alleviate backups by lowering the amount of major compaction work that needs to be done. </p>
+<p>Another option to deal with the files per tablet growing too large is to adjust the following property: </p>
+<div class="codehilite"><pre><span class="n">table</span><span class="o">.</span><span class="n">file</span><span class="o">.</span><span class="n">max</span>
+</pre></div>
+
+
+<p>When a tablet reaches this number of files and needs to flush its in-memory data to disk, it will choose to do a merging minor compaction. A merging minor compaction will merge the tablet's smallest file with the data in memory at minor compaction time. Therefore the number of files will not grow beyond this limit. This will make minor compactions take longer, which will cause ingest performance to decrease. This can cause ingest to slow down until major compactions have enough time to catch up. When adjusting this property, also consider adjusting the compaction ratio. Ideally, merging minor compactions never need to occur and major compactions will keep up. It is possible to configure the file max and compaction ratio such that only merging minor compactions occur and major compactions never occur. This should be avoided because doing only merging minor compactions causes <img alt="$O(N^2)$" src="img2.png" /> work to be done. The amount of work done by major compactions
  is  <img alt="$O(N*\log_R(N))$" src="img3.png" /> where <em>R</em> is the compaction ratio. </p>
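+<p>A rough way to see where these two bounds come from is sketched below. It is a simplified back-of-the-envelope argument, not taken from the Accumulo source: it assumes every minor compaction flushes a fixed amount of data <em>m</em>, that <em>N</em> is the total amount of data ingested into the tablet, and, for the merging case, that the tablet is held to a single file. </p>

```latex
% Merging minor compactions only (single file, assumed flush size m):
% the k-th flush rewrites the existing file plus the new data, about k*m bytes.
\[
  W_{\text{merging}} \;\approx\; \sum_{k=1}^{N/m} k\,m
  \;=\; \frac{N^2}{2m} + \frac{N}{2}
  \;=\; O(N^2) \quad \text{for fixed } m .
\]

% Major compactions with ratio R: each time a byte is rewritten, it lands in a
% file roughly R times larger, so it is rewritten about \log_R(N/m) times.
\[
  W_{\text{major}} \;\approx\; N \cdot \log_R\!\left(\frac{N}{m}\right)
  \;=\; O\bigl(N \log_R N\bigr).
\]
```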
+<p>Compactions can be initiated manually for a table. To initiate a minor compaction, use the flush command in the shell. To initiate a major compaction, use the compact command in the shell. The compact command will compact all tablets in a table to one file. Even tablets with one file are compacted. This is useful for the case where a major compaction filter is configured for a table. In 1.4 the ability to compact a range of a table was added. To use this feature specify start and stop rows for the compact command. This will only compact tablets that overlap the given row range. </p>
 <h2 id="a_idpre-splitting_tablesa_pre-splitting_tables"><a id=Pre-splitting_tables></a> Pre-splitting tables</h2>
 <p>Accumulo will balance and distribute tables across servers. Before a table gets large, it will be maintained as a single tablet on a single server. This limits the speed at which data can be added or queried to the speed of a single node. To improve performance when a table is new, or small, you can add split points and generate new tablets. </p>
 <p>In the shell: </p>

Modified: websites/production/accumulo/content/accumulo/user_manual_1.4-incubating/Table_Design.html
==============================================================================
--- websites/production/accumulo/content/accumulo/user_manual_1.4-incubating/Table_Design.html (original)
+++ websites/production/accumulo/content/accumulo/user_manual_1.4-incubating/Table_Design.html Tue Nov 15 21:08:06 2011
@@ -164,7 +164,7 @@
 <p>Appending dates provides the additional capability of restricting a scan to a given date range. </p>
 <h2 id="a_idindexinga_indexing"><a id=Indexing></a> Indexing</h2>
 <p>In order to support lookups via more than one attribute of an entity, additional indexes can be built. However, because Accumulo tables can support any number of columns without specifying them beforehand, a single additional index will often suffice for supporting lookups of records in the main table. Here, the index has, as the rowID, the Value or Term from the main table, the column families are the same, and the column qualifier of the index table contains the rowID from the main table. </p>
-<p><img alt="converted table" src="img2.png" /></p>
+<p><img alt="converted table" src="img4.png" /></p>
 <p>Note: We store rowIDs in the column qualifier rather than the Value so that we can have more than one rowID associated with a particular term within the index. If we stored this in the Value we would only see one of the rows in which the value appears since Accumulo is configured by default to return the one most recent value associated with a key. </p>
 <p>Lookups can then be done by scanning the Index Table first for occurrences of the desired values in the columns specified, which returns a list of row IDs from the main table. These can then be used to retrieve each matching record, in its entirety or as a subset of its columns, from the Main Table. </p>
 <p>To support efficient lookups of multiple rowIDs from the same table, the Accumulo client library provides a BatchScanner. Users specify a set of Ranges to the BatchScanner, which performs the lookups in multiple threads to multiple servers and returns an Iterator over all the rows retrieved. The rows returned are NOT in sorted order, unlike with the basic Scanner interface, which returns rows in sorted order. </p>
@@ -197,9 +197,9 @@
 <p>Accumulo is ideal for storing entities and their attributes, especially if the attributes are sparse. It is often useful to join several datasets together on common entities within the same table. This can allow for the representation of graphs, including nodes, their attributes, and connections to other nodes. </p>
 <p>Rather than storing individual events, Entity-Attribute or Graph tables store aggregate information about the entities involved in the events and the relationships between entities. This is often preferable when single events aren't very useful and when a continuously updated summarization is desired. </p>
 <p>The physical schema for an entity-attribute or graph table is as follows: </p>
-<p><img alt="converted table" src="img3.png" /></p>
+<p><img alt="converted table" src="img5.png" /></p>
 <p>For example, to keep track of employees, managers and products the following entity-attribute table could be used. Note that the weights are not always necessary and are set to 0 when not used. </p>
-<p><img alt="converted table" src="img4.png" /> <br />
+<p><img alt="converted table" src="img6.png" /> <br />
 </p>
 <p>To allow efficient updating of edge weights, an aggregating iterator can be configured to add the value of all mutations applied with the same key. These types of tables can easily be created from raw events by simply extracting the entities, attributes, and relationships from individual events and inserting the keys into Accumulo each with a count of 1. The aggregating iterator will take care of maintaining the edge weights. </p>
 <h2 id="a_iddocument-partitioned_indexinga_document-partitioned_indexing"><a id=Document-Partitioned_Indexing></a> Document-Partitioned Indexing</h2>
@@ -207,7 +207,7 @@
 <p>First is that the set of all records matching any one of the search terms must be sent to the client, which incurs a lot of network traffic. The second problem is that the client is responsible for performing set intersection on the sets of records returned to eliminate all but the records matching all search terms. The memory of the client may easily be overwhelmed during this operation. </p>
 <p>For these reasons Accumulo includes support for a scheme known as sharded indexing, in which these set operations can be performed at the TabletServers and decisions about which records to include in the result set can be made without incurring network traffic. </p>
 <p>This is accomplished via partitioning records into bins that each reside on at most one TabletServer, and then creating an index of terms per record within each bin as follows: </p>
-<p><img alt="converted table" src="img5.png" /></p>
+<p><img alt="converted table" src="img7.png" /></p>
 <p>Documents or records are mapped into bins by a user-defined ingest application. By storing the BinID as the RowID we ensure that all the information for a particular bin is contained in a single tablet and hosted on a single TabletServer since Accumulo never splits rows across tablets. Storing the Terms as column families serves to enable fast lookups of all the documents within this bin that contain the given term. </p>
 <p>Finally, we perform set intersection operations on the TabletServer via a special iterator called the Intersecting Iterator. Since documents are partitioned into many bins, a search of all documents must search every bin. We can use the BatchScanner to scan all bins in parallel. The Intersecting Iterator should be enabled on a BatchScanner within user query code as follows: </p>
 <div class="codehilite"><pre><span class="n">Text</span><span class="o">[]</span> <span class="n">terms</span> <span class="o">=</span> <span class="p">{</span><span class="k">new</span> <span class="n">Text</span><span class="p">(</span><span class="s">&quot;the&quot;</span><span class="p">),</span> <span class="k">new</span> <span class="n">Text</span><span class="p">(</span><span class="s">&quot;white&quot;</span><span class="p">),</span> <span class="k">new</span> <span class="n">Text</span><span class="p">(</span><span class="s">&quot;house&quot;</span><span class="p">)};</span>

Modified: websites/production/accumulo/content/accumulo/user_manual_1.4-incubating/img2.png
==============================================================================
Binary files - no diff available.

Modified: websites/production/accumulo/content/accumulo/user_manual_1.4-incubating/img3.png
==============================================================================
Binary files - no diff available.

Modified: websites/production/accumulo/content/accumulo/user_manual_1.4-incubating/img4.png
==============================================================================
Binary files - no diff available.

Modified: websites/production/accumulo/content/accumulo/user_manual_1.4-incubating/img5.png
==============================================================================
Binary files - no diff available.