Posted to commits@accumulo.apache.org by bi...@apache.org on 2011/11/01 15:49:31 UTC

svn commit: r1196033 [2/2] - in /incubator/accumulo: branches/1.3/docs/examples/ site/trunk/content/accumulo/user_manual_1.3-incubating/ site/trunk/content/accumulo/user_manual_1.4-incubating/

Modified: incubator/accumulo/site/trunk/content/accumulo/user_manual_1.4-incubating/Analytics.mdtext
URL: http://svn.apache.org/viewvc/incubator/accumulo/site/trunk/content/accumulo/user_manual_1.4-incubating/Analytics.mdtext?rev=1196033&r1=1196032&r2=1196033&view=diff
==============================================================================
--- incubator/accumulo/site/trunk/content/accumulo/user_manual_1.4-incubating/Analytics.mdtext (original)
+++ incubator/accumulo/site/trunk/content/accumulo/user_manual_1.4-incubating/Analytics.mdtext Tue Nov  1 14:49:30 2011
@@ -24,11 +24,11 @@ Notice:    Licensed to the Apache Softwa
 
 * * *
 
-## <a id=SECTION00900000000000000000></a> Analytics
+## <a id=Analytics></a> Analytics
 
 Accumulo supports more advanced data processing than simply keeping keys sorted and performing efficient lookups. Analytics can be developed by using MapReduce and Iterators in conjunction with Accumulo tables. 
 
-## <a id=SECTION00910000000000000000></a> MapReduce
+## <a id=MapReduce></a> MapReduce
 
 Accumulo tables can be used as the source and destination of MapReduce jobs. To use an Accumulo table with a MapReduce job (specifically with the new Hadoop API as of version 0.20), configure the job parameters to use the AccumuloInputFormat and AccumuloOutputFormat. Accumulo-specific parameters can be set via these two format classes to do the following: 
 
@@ -36,7 +36,7 @@ Accumulo tables can be used as the sourc
 * Restrict the scan to a range of rows 
 * Restrict the input to a subset of available columns 
 
-### <a id=SECTION00911000000000000000></a> Mapper and Reducer classes
+### <a id=Mapper_and_Reducer_classes></a> Mapper and Reducer classes
 
 To read from an Accumulo table, create a Mapper with the following class parameterization and be sure to configure the AccumuloInputFormat. 
     
@@ -66,7 +66,7 @@ To write to a Accumulo table, create a R
 
 The Text object passed as the output should contain the name of the table to which this mutation should be applied. The Text can be null, in which case the mutation will be applied to the default table name specified in the AccumuloOutputFormat options. 
 
-### <a id=SECTION00912000000000000000></a> AccumuloInputFormat options
+### <a id=AccumuloInputFormat_options></a> AccumuloInputFormat options
     
     
     Job job = new Job(getConf());
@@ -104,7 +104,7 @@ To use a regular expression to match row
     AccumuloInputFormat.setRegex(job, RegexType.ROW, "^.*");
     
 
-### <a id=SECTION00913000000000000000></a> AccumuloOutputFormat options
+### <a id=AccumuloOutputFormat_options></a> AccumuloOutputFormat options
     
     
     boolean createTables = true;
@@ -130,7 +130,7 @@ To use a regular expression to match row
 An example of using MapReduce with Accumulo can be found at   
 accumulo/docs/examples/README.mapred 
 
-## <a id=SECTION00920000000000000000></a> Aggregating Iterators
+## <a id=Aggregating_Iterators></a> Aggregating Iterators
 
 Many applications can benefit from the ability to aggregate values across common keys. This can be done via aggregating iterators and is similar to the Reduce step in MapReduce. This provides the ability to define online, incrementally updated analytics without the overhead or latency associated with batch-oriented MapReduce jobs. 
 
@@ -138,11 +138,11 @@ All that is needed to aggregate values o
 
 The only restriction on an aggregating iterator is that the aggregator developer should not assume that all values for a given key have been seen, since new mutations can be inserted at any time. This precludes aggregations that depend on the total number of values, such as calculating an average. 
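The restriction above can be made concrete with a small sketch: to make an average aggregatable, each partial result must carry both a sum and a count so that partials remain combinable in any order. This is plain Java for illustration only, not Accumulo's Aggregator interface; the class and method names are hypothetical.

```java
// Why an average cannot be aggregated directly: the iterator may combine any
// subset of the values for a key at any time, so each partial result must
// carry enough state (here a sum and a count) to stay combinable in any
// order. Plain Java for illustration; not Accumulo's Aggregator interface.
public class AverageState {
    final long sum;
    final long count;

    AverageState(long sum, long count) {
        this.sum = sum;
        this.count = count;
    }

    static AverageState of(long value) {
        return new AverageState(value, 1);
    }

    // Associative and commutative, so partial aggregates merge in any order,
    // matching how mutations may arrive and be compacted at any time.
    AverageState combine(AverageState other) {
        return new AverageState(sum + other.sum, count + other.count);
    }

    double average() {
        return (double) sum / count;
    }

    public static void main(String[] args) {
        AverageState total = AverageState.of(10).combine(AverageState.of(20)).combine(AverageState.of(30));
        System.out.println(total.average()); // 20.0
    }
}
```

Storing the count alongside the sum is what restores associativity; a value that stored only the running average could not be merged correctly.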
 
-### <a id=SECTION00921000000000000000></a> Feature Vectors
+### <a id=Feature_Vectors></a> Feature Vectors
 
 An interesting use of aggregating iterators within an Accumulo table is to store feature vectors for use in machine learning algorithms. For example, many algorithms such as k-means clustering, support vector machines, and anomaly detection use the concept of a feature vector and the calculation of distance metrics to learn a particular model. The columns in an Accumulo table can be used to efficiently store sparse features and their weights, to be incrementally updated via the use of an aggregating iterator. 
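A minimal sketch of that layout, assuming one column per feature with its weight as the value; the class below is illustrative plain Java, not Accumulo API, with merge() standing in for the summing iterator.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of the layout described above: one row per entity, one column per
// feature, the value holding the weight. An aggregating iterator that sums
// values lets many writers update the vector incrementally; merge() below
// plays the iterator's role. Illustrative plain Java, not Accumulo API.
public class SparseFeatureVector {
    private final Map<String, Long> weights = new HashMap<>();

    // Analogous to writing a mutation whose value the iterator sums.
    void add(String feature, long delta) {
        weights.merge(feature, delta, Long::sum);
    }

    long weight(String feature) {
        return weights.getOrDefault(feature, 0L);
    }

    public static void main(String[] args) {
        SparseFeatureVector v = new SparseFeatureVector();
        v.add("term:accumulo", 1);
        v.add("term:accumulo", 2);
        System.out.println(v.weight("term:accumulo")); // 3
    }
}
```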
 
-## <a id=SECTION00930000000000000000></a> Statistical Modeling
+## <a id=Statistical_Modeling></a> Statistical Modeling
 
 Statistical models that need to be updated by many machines in parallel could be similarly stored within an Accumulo table. For example, a MapReduce job that is iteratively updating a global statistical model could have each map or reduce worker reference the parts of the model to be read and updated through an embedded Accumulo client. 
 
@@ -153,7 +153,7 @@ Using Accumulo this way enables efficien
 
 * * *
 
-   [1]: Analytics.html#SECTION00910000000000000000
-   [2]: Analytics.html#SECTION00920000000000000000
-   [3]: Analytics.html#SECTION00930000000000000000
+   [1]: Analytics.html#MapReduce
+   [2]: Analytics.html#Aggregating_Iterators
+   [3]: Analytics.html#Statistical_Modeling
 

Modified: incubator/accumulo/site/trunk/content/accumulo/user_manual_1.4-incubating/Contents.mdtext
URL: http://svn.apache.org/viewvc/incubator/accumulo/site/trunk/content/accumulo/user_manual_1.4-incubating/Contents.mdtext?rev=1196033&r1=1196032&r2=1196033&view=diff
==============================================================================
--- incubator/accumulo/site/trunk/content/accumulo/user_manual_1.4-incubating/Contents.mdtext (original)
+++ incubator/accumulo/site/trunk/content/accumulo/user_manual_1.4-incubating/Contents.mdtext Tue Nov  1 14:49:30 2011
@@ -19,7 +19,7 @@ Notice:    Licensed to the Apache Softwa
   
 
 
-### <a id=SECTION00100000000000000000></a> Contents
+### <a id=Contents></a> Contents
 
 * [Introduction][1]
 * [Accumulo Design][2]
@@ -58,109 +58,112 @@ Notice:    Licensed to the Apache Softwa
     * [Reading Data][22]
 
         * [Scanner][23]
-        * [BatchScanner][24]
+        * [Isolated Scanner][24]
+        * [BatchScanner][25]
 
   
 
-* [Table Configuration][25]
+* [Table Configuration][26]
 
-    * [Locality Groups][26]
+    * [Locality Groups][27]
 
-        * [Managing Locality Groups via the Shell][27]
-        * [Managing Locality Groups via the Client API][28]
+        * [Managing Locality Groups via the Shell][28]
+        * [Managing Locality Groups via the Client API][29]
 
-    * [Constraints][29]
-    * [Bloom Filters][30]
-    * [Iterators][31]
+    * [Constraints][30]
+    * [Bloom Filters][31]
+    * [Iterators][32]
 
-        * [Setting Iterators via the Shell][32]
-        * [Setting Iterators Programmatically][33]
+        * [Setting Iterators via the Shell][33]
+        * [Setting Iterators Programmatically][34]
 
-    * [Versioning Iterators and Timestamps][34]
+    * [Versioning Iterators and Timestamps][35]
 
-        * [Logical Time][35]
-        * [Deletes][36]
+        * [Logical Time][36]
+        * [Deletes][37]
 
-    * [Filters][37]
-    * [Aggregating Iterators][38]
-    * [Pre-splitting tables][39]
-    * [Merging tables][40]
-    * [Delete Range][41]
+    * [Filters][38]
+    * [Aggregating Iterators][39]
+    * [Pre-splitting tables][40]
+    * [Merging tablets][41]
+    * [Delete Range][42]
+    * [Cloning Tables][43]
 
   
 
-* [Table Design][42]
+* [Table Design][44]
 
-    * [Basic Table][43]
-    * [RowID Design][44]
-    * [Indexing][45]
-    * [Entity-Attribute and Graph Tables][46]
-    * [Document-Partitioned Indexing][47]
+    * [Basic Table][45]
+    * [RowID Design][46]
+    * [Indexing][47]
+    * [Entity-Attribute and Graph Tables][48]
+    * [Document-Partitioned Indexing][49]
 
   
 
-* [High-Speed Ingest][48]
+* [High-Speed Ingest][50]
 
-    * [Pre-Splitting New Tables][49]
-    * [Multiple Ingester Clients][50]
-    * [Bulk Ingest][51]
-    * [MapReduce Ingest][52]
+    * [Pre-Splitting New Tables][51]
+    * [Multiple Ingester Clients][52]
+    * [Bulk Ingest][53]
+    * [Logical Time for Bulk Ingest][54]
+    * [MapReduce Ingest][55]
 
   
 
-* [Analytics][53]
+* [Analytics][56]
 
-    * [MapReduce][54]
+    * [MapReduce][57]
 
-        * [Mapper and Reducer classes][55]
-        * [AccumuloInputFormat options][56]
-        * [AccumuloOutputFormat options][57]
+        * [Mapper and Reducer classes][58]
+        * [AccumuloInputFormat options][59]
+        * [AccumuloOutputFormat options][60]
 
-    * [Aggregating Iterators][58]
+    * [Aggregating Iterators][61]
 
-        * [Feature Vectors][59]
+        * [Feature Vectors][62]
 
-    * [Statistical Modeling][60]
+    * [Statistical Modeling][63]
 
   
 
-* [Security][61]
+* [Security][64]
 
-    * [Security Label Expressions][62]
-    * [Security Label Expression Syntax][63]
-    * [Authorization][64]
-    * [User Authorizations][65]
-    * [Secure Authorizations Handling][66]
-    * [Query Services Layer][67]
+    * [Security Label Expressions][65]
+    * [Security Label Expression Syntax][66]
+    * [Authorization][67]
+    * [User Authorizations][68]
+    * [Secure Authorizations Handling][69]
+    * [Query Services Layer][70]
 
   
 
-* [Administration][68]
+* [Administration][71]
 
-    * [Hardware][69]
-    * [Network][70]
-    * [Installation][71]
-    * [Dependencies][72]
-    * [Configuration][73]
+    * [Hardware][72]
+    * [Network][73]
+    * [Installation][74]
+    * [Dependencies][75]
+    * [Configuration][76]
 
-        * [Edit conf/accumulo-env.sh][74]
-        * [Cluster Specification][75]
-        * [Accumulo Settings][76]
-        * [Deploy Configuration][77]
+        * [Edit conf/accumulo-env.sh][77]
+        * [Cluster Specification][78]
+        * [Accumulo Settings][79]
+        * [Deploy Configuration][80]
 
-    * [Initialization][78]
-    * [Running][79]
+    * [Initialization][81]
+    * [Running][82]
 
-        * [Starting Accumulo][80]
-        * [Stopping Accumulo][81]
+        * [Starting Accumulo][83]
+        * [Stopping Accumulo][84]
 
-    * [Monitoring][82]
-    * [Logging][83]
-    * [Recovery][84]
+    * [Monitoring][85]
+    * [Logging][86]
+    * [Recovery][87]
 
   
 
-* [Shell Commands][85]
+* [Shell Commands][88]
 
   
 
@@ -169,87 +172,90 @@ Notice:    Licensed to the Apache Softwa
 
    [1]: Introduction.html
    [2]: Accumulo_Design.html
-   [3]: Accumulo_Design.html#SECTION00310000000000000000
-   [4]: Accumulo_Design.html#SECTION00320000000000000000
-   [5]: Accumulo_Design.html#SECTION00330000000000000000
-   [6]: Accumulo_Design.html#SECTION00331000000000000000
-   [7]: Accumulo_Design.html#SECTION00332000000000000000
-   [8]: Accumulo_Design.html#SECTION00333000000000000000
-   [9]: Accumulo_Design.html#SECTION00334000000000000000
-   [10]: Accumulo_Design.html#SECTION00335000000000000000
-   [11]: Accumulo_Design.html#SECTION00340000000000000000
-   [12]: Accumulo_Design.html#SECTION00350000000000000000
-   [13]: Accumulo_Design.html#SECTION00360000000000000000
-   [14]: Accumulo_Design.html#SECTION00370000000000000000
+   [3]: Accumulo_Design.html#Data_Model
+   [4]: Accumulo_Design.html#Architecture
+   [5]: Accumulo_Design.html#Components
+   [6]: Accumulo_Design.html#Tablet_Server
+   [7]: Accumulo_Design.html#Loggers
+   [8]: Accumulo_Design.html#Garbage_Collector
+   [9]: Accumulo_Design.html#Master
+   [10]: Accumulo_Design.html#Client
+   [11]: Accumulo_Design.html#Data_Management
+   [12]: Accumulo_Design.html#Tablet_Service
+   [13]: Accumulo_Design.html#Compactions
+   [14]: Accumulo_Design.html#Fault-Tolerance
    [15]: Accumulo_Shell.html
-   [16]: Accumulo_Shell.html#SECTION00410000000000000000
-   [17]: Accumulo_Shell.html#SECTION00420000000000000000
-   [18]: Accumulo_Shell.html#SECTION00430000000000000000
+   [16]: Accumulo_Shell.html#Basic_Administration
+   [17]: Accumulo_Shell.html#Table_Maintenance
+   [18]: Accumulo_Shell.html#User_Administration
    [19]: Writing_Accumulo_Clients.html
-   [20]: Writing_Accumulo_Clients.html#SECTION00510000000000000000
-   [21]: Writing_Accumulo_Clients.html#SECTION00511000000000000000
-   [22]: Writing_Accumulo_Clients.html#SECTION00520000000000000000
-   [23]: Writing_Accumulo_Clients.html#SECTION00521000000000000000
-   [24]: Writing_Accumulo_Clients.html#SECTION00522000000000000000
-   [25]: Table_Configuration.html
-   [26]: Table_Configuration.html#SECTION00610000000000000000
-   [27]: Table_Configuration.html#SECTION00611000000000000000
-   [28]: Table_Configuration.html#SECTION00612000000000000000
-   [29]: Table_Configuration.html#SECTION00620000000000000000
-   [30]: Table_Configuration.html#SECTION00630000000000000000
-   [31]: Table_Configuration.html#SECTION00640000000000000000
-   [32]: Table_Configuration.html#SECTION00641000000000000000
-   [33]: Table_Configuration.html#SECTION00642000000000000000
-   [34]: Table_Configuration.html#SECTION00650000000000000000
-   [35]: Table_Configuration.html#SECTION00651000000000000000
-   [36]: Table_Configuration.html#SECTION00652000000000000000
-   [37]: Table_Configuration.html#SECTION00660000000000000000
-   [38]: Table_Configuration.html#SECTION00670000000000000000
-   [39]: Table_Configuration.html#SECTION00680000000000000000
-   [40]: Table_Configuration.html#SECTION00690000000000000000
-   [41]: Table_Configuration.html#SECTION006100000000000000000
-   [42]: Table_Design.html
-   [43]: Table_Design.html#SECTION00710000000000000000
-   [44]: Table_Design.html#SECTION00720000000000000000
-   [45]: Table_Design.html#SECTION00730000000000000000
-   [46]: Table_Design.html#SECTION00740000000000000000
-   [47]: Table_Design.html#SECTION00750000000000000000
-   [48]: High_Speed_Ingest.html
-   [49]: High_Speed_Ingest.html#SECTION00810000000000000000
-   [50]: High_Speed_Ingest.html#SECTION00820000000000000000
-   [51]: High_Speed_Ingest.html#SECTION00830000000000000000
-   [52]: High_Speed_Ingest.html#SECTION00840000000000000000
-   [53]: Analytics.html
-   [54]: Analytics.html#SECTION00910000000000000000
-   [55]: Analytics.html#SECTION00911000000000000000
-   [56]: Analytics.html#SECTION00912000000000000000
-   [57]: Analytics.html#SECTION00913000000000000000
-   [58]: Analytics.html#SECTION00920000000000000000
-   [59]: Analytics.html#SECTION00921000000000000000
-   [60]: Analytics.html#SECTION00930000000000000000
-   [61]: Security.html
-   [62]: Security.html#SECTION001010000000000000000
-   [63]: Security.html#SECTION001020000000000000000
-   [64]: Security.html#SECTION001030000000000000000
-   [65]: Security.html#SECTION001040000000000000000
-   [66]: Security.html#SECTION001050000000000000000
-   [67]: Security.html#SECTION001060000000000000000
-   [68]: Administration.html
-   [69]: Administration.html#SECTION001110000000000000000
-   [70]: Administration.html#SECTION001120000000000000000
-   [71]: Administration.html#SECTION001130000000000000000
-   [72]: Administration.html#SECTION001140000000000000000
-   [73]: Administration.html#SECTION001150000000000000000
-   [74]: Administration.html#SECTION001151000000000000000
-   [75]: Administration.html#SECTION001152000000000000000
-   [76]: Administration.html#SECTION001153000000000000000
-   [77]: Administration.html#SECTION001154000000000000000
-   [78]: Administration.html#SECTION001160000000000000000
-   [79]: Administration.html#SECTION001170000000000000000
-   [80]: Administration.html#SECTION001171000000000000000
-   [81]: Administration.html#SECTION001172000000000000000
-   [82]: Administration.html#SECTION001180000000000000000
-   [83]: Administration.html#SECTION001190000000000000000
-   [84]: Administration.html#SECTION0011100000000000000000
-   [85]: Shell_Commands.html
+   [20]: Writing_Accumulo_Clients.html#Writing_Data
+   [21]: Writing_Accumulo_Clients.html#BatchWriter
+   [22]: Writing_Accumulo_Clients.html#Reading_Data
+   [23]: Writing_Accumulo_Clients.html#Scanner
+   [24]: Writing_Accumulo_Clients.html#Isolated_Scanner
+   [25]: Writing_Accumulo_Clients.html#BatchScanner
+   [26]: Table_Configuration.html
+   [27]: Table_Configuration.html#Locality_Groups
+   [28]: Table_Configuration.html#Managing_Locality_Groups_via_the_Shell
+   [29]: Table_Configuration.html#Managing_Locality_Groups_via_the_Client_API
+   [30]: Table_Configuration.html#Constraints
+   [31]: Table_Configuration.html#Bloom_Filters
+   [32]: Table_Configuration.html#Iterators
+   [33]: Table_Configuration.html#Setting_Iterators_via_the_Shell
+   [34]: Table_Configuration.html#Setting_Iterators_Programmatically
+   [35]: Table_Configuration.html#Versioning_Iterators_and_Timestamps
+   [36]: Table_Configuration.html#Logical_Time
+   [37]: Table_Configuration.html#Deletes
+   [38]: Table_Configuration.html#Filters
+   [39]: Table_Configuration.html#Aggregating_Iterators
+   [40]: Table_Configuration.html#Pre-splitting_tables
+   [41]: Table_Configuration.html#Merging_tablets
+   [42]: Table_Configuration.html#Delete_Range
+   [43]: Table_Configuration.html#Cloning_Tables
+   [44]: Table_Design.html
+   [45]: Table_Design.html#Basic_Table
+   [46]: Table_Design.html#RowID_Design
+   [47]: Table_Design.html#Indexing
+   [48]: Table_Design.html#Entity-Attribute_and_Graph_Tables
+   [49]: Table_Design.html#Document-Partitioned_Indexing
+   [50]: High_Speed_Ingest.html
+   [51]: High_Speed_Ingest.html#Pre-Splitting_New_Tables
+   [52]: High_Speed_Ingest.html#Multiple_Ingester_Clients
+   [53]: High_Speed_Ingest.html#Bulk_Ingest
+   [54]: High_Speed_Ingest.html#Logical_Time_for_Bulk_Ingest
+   [55]: High_Speed_Ingest.html#MapReduce_Ingest
+   [56]: Analytics.html
+   [57]: Analytics.html#MapReduce
+   [58]: Analytics.html#Mapper_and_Reducer_classes
+   [59]: Analytics.html#AccumuloInputFormat_options
+   [60]: Analytics.html#AccumuloOutputFormat_options
+   [61]: Analytics.html#Aggregating_Iterators
+   [62]: Analytics.html#Feature_Vectors
+   [63]: Analytics.html#Statistical_Modeling
+   [64]: Security.html
+   [65]: Security.html#Security_Label_Expressions
+   [66]: Security.html#Security_Label_Expression_Syntax
+   [67]: Security.html#Authorization
+   [68]: Security.html#User_Authorizations
+   [69]: Security.html#Secure_Authorizations_Handling
+   [70]: Security.html#Query_Services_Layer
+   [71]: Administration.html
+   [72]: Administration.html#Hardware
+   [73]: Administration.html#Network
+   [74]: Administration.html#Installation
+   [75]: Administration.html#Dependencies
+   [76]: Administration.html#Configuration
+   [77]: Administration.html#Edit_conf/accumulo-env.sh
+   [78]: Administration.html#Cluster_Specification
+   [79]: Administration.html#Accumulo_Settings
+   [80]: Administration.html#Deploy_Configuration
+   [81]: Administration.html#Initialization
+   [82]: Administration.html#Running
+   [83]: Administration.html#Starting_Accumulo
+   [84]: Administration.html#Stopping_Accumulo
+   [85]: Administration.html#Monitoring
+   [86]: Administration.html#Logging
+   [87]: Administration.html#Recovery
+   [88]: Shell_Commands.html
 

Modified: incubator/accumulo/site/trunk/content/accumulo/user_manual_1.4-incubating/High_Speed_Ingest.mdtext
URL: http://svn.apache.org/viewvc/incubator/accumulo/site/trunk/content/accumulo/user_manual_1.4-incubating/High_Speed_Ingest.mdtext?rev=1196033&r1=1196032&r2=1196033&view=diff
==============================================================================
--- incubator/accumulo/site/trunk/content/accumulo/user_manual_1.4-incubating/High_Speed_Ingest.mdtext (original)
+++ incubator/accumulo/site/trunk/content/accumulo/user_manual_1.4-incubating/High_Speed_Ingest.mdtext Tue Nov  1 14:49:30 2011
@@ -21,15 +21,16 @@ Notice:    Licensed to the Apache Softwa
 * [Pre-Splitting New Tables][1]
 * [Multiple Ingester Clients][2]
 * [Bulk Ingest][3]
-* [MapReduce Ingest][4]
+* [Logical Time for Bulk Ingest][4]
+* [MapReduce Ingest][5]
 
 * * *
 
-## <a id=SECTION00800000000000000000></a> High-Speed Ingest
+## <a id=High-Speed_Ingest></a> High-Speed Ingest
 
 Accumulo is often used as part of a larger data processing and storage system. To maximize the performance of a parallel system involving Accumulo, the ingestion and query components should be designed to provide enough parallelism and concurrency to avoid creating bottlenecks for users and other systems writing to and reading from Accumulo. There are several ways to achieve high ingest performance. 
 
-## <a id=SECTION00810000000000000000></a> Pre-Splitting New Tables
+## <a id=Pre-Splitting_New_Tables></a> Pre-Splitting New Tables
 
 New tables consist of a single tablet by default. As mutations are applied, the table grows and splits into multiple tablets which are balanced by the Master across TabletServers. This implies that the aggregate ingest rate will be limited to fewer servers than are available within the cluster until the table has reached the point where there are tablets on every TabletServer. 
 
@@ -41,13 +42,13 @@ Pre-splitting a table ensures that there
 
 For the purposes of providing parallelism to ingest, it is not necessary to create more tablets than there are physical machines within the cluster, as the aggregate ingest rate is a function of the number of physical machines. Note that the aggregate ingest rate is still subject to the number of machines running ingest clients and the distribution of rowIDs across the table. The aggregate ingest rate will be suboptimal if there are many inserts into a small number of rowIDs. 
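As a sketch of how split points might be chosen, the snippet below generates evenly spaced two-digit hex prefixes, suitable for a table whose rowIDs are uniformly distributed hex strings (an assumption). The actual Accumulo call, `connector.tableOperations().addSplits(table, splits)`, is left as a comment so the snippet stays self-contained; the connector and table name are assumed.

```java
import java.util.SortedSet;
import java.util.TreeSet;

// Generate evenly spaced split points over a hex rowID space so a new table
// starts with roughly one tablet per server. Assumes uniformly distributed
// hex rowIDs; adjust to the actual rowID scheme.
public class PreSplit {
    static SortedSet<String> hexSplits(int numTablets) {
        SortedSet<String> splits = new TreeSet<>();
        // numTablets tablets need numTablets - 1 split points.
        for (int i = 1; i < numTablets; i++) {
            // Evenly spaced two-digit hex prefixes across 0x00..0xff.
            splits.add(String.format("%02x", i * 256 / numTablets));
        }
        return splits;
    }

    public static void main(String[] args) {
        System.out.println(hexSplits(4)); // [40, 80, c0]
        // With an Accumulo connector in scope, these would be applied as:
        // connector.tableOperations().addSplits(tableName, textSplits);
        // (each String wrapped in an org.apache.hadoop.io.Text)
    }
}
```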
 
-## <a id=SECTION00820000000000000000></a> Multiple Ingester Clients
+## <a id=Multiple_Ingester_Clients></a> Multiple Ingester Clients
 
 Accumulo is capable of scaling to very high rates of ingest, which is dependent upon not just the number of TabletServers in operation but also the number of ingest clients. This is because a single client, while capable of batching mutations and sending them to all TabletServers, is ultimately limited by the amount of data that can be processed on a single machine. The aggregate ingest rate will scale linearly with the number of clients up to the point at which either the aggregate I/O of TabletServers or total network bandwidth capacity is reached. 
 
 In operational settings where high rates of ingest are paramount, clusters are often configured to dedicate some number of machines solely to running Ingester Clients. The exact ratio of clients to TabletServers necessary for optimum ingestion rates will vary according to the distribution of resources per machine and by data type. 
 
-## <a id=SECTION00830000000000000000></a> Bulk Ingest
+## <a id=Bulk_Ingest></a> Bulk Ingest
 
 Accumulo supports the ability to import files produced by an external process such as MapReduce into an existing table. In some cases it may be faster to load data this way rather than via ingesting through clients using BatchWriters. This allows a large number of machines to format data the way Accumulo expects. The new files can then simply be introduced to Accumulo via a shell command. 
 
@@ -75,7 +76,13 @@ Note that the paths referenced are direc
 A complete example of using Bulk Ingest can be found at   
 accumulo/docs/examples/README.bulkIngest 
 
-## <a id=SECTION00840000000000000000></a> MapReduce Ingest
+## <a id=Logical_Time_for_Bulk_Ingest></a> Logical Time for Bulk Ingest
+
+Logical time is important for bulk imported data, for which the client code may be choosing a timestamp. At bulk import time, the user can choose to enable logical time for the set of files being imported. When it is enabled, Accumulo uses a specialized system iterator to lazily set times in a bulk imported file. This mechanism guarantees that times set by unsynchronized multi-node applications (such as those running on MapReduce) will maintain some semblance of causal ordering, and mitigates the problem of the time being wrong on the system that created the file. These times are not set when the file is imported, but whenever it is read by scans or compactions. A single time is obtained at import and is always used by the specialized system iterator when setting the times. 
+
+The timestamp assigned by Accumulo will be the same for every key in the file. This could cause problems if the file contains multiple keys that are identical except for the timestamp; in that case, the sort order of the keys will be undefined. This could occur if an insert and an update were placed in the same bulk import file. 
+
+## <a id=MapReduce_Ingest></a> MapReduce Ingest
 
 It is possible to efficiently write many mutations to Accumulo in parallel via a MapReduce job. In this scenario the MapReduce is written to process data that lives in HDFS and write mutations to Accumulo using the AccumuloOutputFormat. See the MapReduce section under Analytics for details. 
 
@@ -87,8 +94,9 @@ accumulo/docs/examples/README.mapred 
 
 * * *
 
-   [1]: High_Speed_Ingest.html#SECTION00810000000000000000
-   [2]: High_Speed_Ingest.html#SECTION00820000000000000000
-   [3]: High_Speed_Ingest.html#SECTION00830000000000000000
-   [4]: High_Speed_Ingest.html#SECTION00840000000000000000
+   [1]: High_Speed_Ingest.html#Pre-Splitting_New_Tables
+   [2]: High_Speed_Ingest.html#Multiple_Ingester_Clients
+   [3]: High_Speed_Ingest.html#Bulk_Ingest
+   [4]: High_Speed_Ingest.html#Logical_Time_for_Bulk_Ingest
+   [5]: High_Speed_Ingest.html#MapReduce_Ingest
 

Modified: incubator/accumulo/site/trunk/content/accumulo/user_manual_1.4-incubating/Introduction.mdtext
URL: http://svn.apache.org/viewvc/incubator/accumulo/site/trunk/content/accumulo/user_manual_1.4-incubating/Introduction.mdtext?rev=1196033&r1=1196032&r2=1196033&view=diff
==============================================================================
--- incubator/accumulo/site/trunk/content/accumulo/user_manual_1.4-incubating/Introduction.mdtext (original)
+++ incubator/accumulo/site/trunk/content/accumulo/user_manual_1.4-incubating/Introduction.mdtext Tue Nov  1 14:49:30 2011
@@ -16,7 +16,7 @@ Notice:    Licensed to the Apache Softwa
            specific language governing permissions and limitations
            under the License.
 
-## <a id=SECTION00200000000000000000></a> Introduction
+## <a id=Introduction></a> Introduction
 
 Accumulo is a highly scalable structured store based on Google's BigTable. Accumulo is written in Java and operates over the Hadoop Distributed File System (HDFS), which is part of the popular Apache Hadoop project. Accumulo supports efficient storage and retrieval of structured data, including queries for ranges, and provides support for using Accumulo tables as input and output for MapReduce jobs. 
 

Modified: incubator/accumulo/site/trunk/content/accumulo/user_manual_1.4-incubating/Security.mdtext
URL: http://svn.apache.org/viewvc/incubator/accumulo/site/trunk/content/accumulo/user_manual_1.4-incubating/Security.mdtext?rev=1196033&r1=1196032&r2=1196033&view=diff
==============================================================================
--- incubator/accumulo/site/trunk/content/accumulo/user_manual_1.4-incubating/Security.mdtext (original)
+++ incubator/accumulo/site/trunk/content/accumulo/user_manual_1.4-incubating/Security.mdtext Tue Nov  1 14:49:30 2011
@@ -27,11 +27,11 @@ Notice:    Licensed to the Apache Softwa
 
 * * *
 
-## <a id=SECTION001000000000000000000></a> Security
+## <a id=Security></a> Security
 
 Accumulo extends the BigTable data model to implement a security mechanism known as cell-level security. Every key-value pair has its own security label, stored under the column visibility element of the key, which is used to determine whether a given user meets the security requirements to read the value. This enables data of various security levels to be stored within the same row, and users of varying degrees of access to query the same table, while preserving data confidentiality. 
 
-## <a id=SECTION001010000000000000000></a> Security Label Expressions
+## <a id=Security_Label_Expressions></a> Security Label Expressions
 
 When mutations are applied, users can specify a security label for each value. This is done as the Mutation is created by passing a ColumnVisibility object to the put() method: 
     
@@ -48,7 +48,7 @@ When mutations are applied, users can sp
     mutation.put(colFam, colQual, colVis, timestamp, value);
     
 
-## <a id=SECTION001020000000000000000></a> Security Label Expression Syntax
+## <a id=Security_Label_Expression_Syntax></a> Security Label Expression Syntax
 
 Security labels consist of a set of user-defined tokens that are required to read the value the label is associated with. The set of tokens required can be specified using syntax that supports logical AND and OR combinations of tokens, as well as nesting groups of tokens together. 
 
@@ -78,7 +78,7 @@ These can be specified alone or combined
 
 When both `|` and `&` operators are used, parentheses must be used to specify precedence of the operators. 
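To make the semantics concrete, here is a toy evaluator for expressions of tokens joined by `&` and `|` with parentheses. It illustrates the meaning of the operators only and is not Accumulo's parser: unlike Accumulo, it silently gives `&` higher precedence rather than rejecting unparenthesized mixes, and all names are hypothetical.

```java
import java.util.Set;

// Toy evaluator for the label syntax sketched above: alphanumeric tokens
// combined with '&' (AND) and '|' (OR), grouped by parentheses. Illustration
// only; not Accumulo's parser (which requires parentheses when mixing & and |,
// whereas this grammar simply gives '&' higher precedence).
public class LabelExpr {
    private final String expr;
    private final Set<String> auths;
    private int pos = 0;

    LabelExpr(String expr, Set<String> auths) {
        this.expr = expr;
        this.auths = auths;
    }

    static boolean satisfies(String expr, Set<String> auths) {
        LabelExpr p = new LabelExpr(expr, auths);
        boolean result = p.parseOr();
        if (p.pos != expr.length()) throw new IllegalArgumentException("bad expression: " + expr);
        return result;
    }

    // or := and ('|' and)*
    private boolean parseOr() {
        boolean result = parseAnd();
        while (pos < expr.length() && expr.charAt(pos) == '|') {
            pos++;
            result = parseAnd() || result;  // left operand parsed first, so parsing always advances
        }
        return result;
    }

    // and := term ('&' term)*
    private boolean parseAnd() {
        boolean result = parseTerm();
        while (pos < expr.length() && expr.charAt(pos) == '&') {
            pos++;
            result = parseTerm() && result;
        }
        return result;
    }

    // term := '(' or ')' | token   (a token is satisfied if the user holds it)
    private boolean parseTerm() {
        if (expr.charAt(pos) == '(') {
            pos++;                      // consume '('
            boolean result = parseOr();
            pos++;                      // consume ')' (toy code: assumes it is present)
            return result;
        }
        int start = pos;
        while (pos < expr.length() && Character.isLetterOrDigit(expr.charAt(pos))) pos++;
        return auths.contains(expr.substring(start, pos));
    }

    public static void main(String[] args) {
        System.out.println(satisfies("(a&b)|c", Set.of("c"))); // true
    }
}
```

For example, a user holding only `c` satisfies `(a&b)|c` but not `a&b`.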
 
-## <a id=SECTION001030000000000000000></a> Authorization
+## <a id=Authorization></a> Authorization
 
 When clients attempt to read data from Accumulo, any security labels present are examined against the set of authorizations passed by the client code when the Scanner or BatchScanner is created. If the authorizations are determined to be insufficient to satisfy the security label, the value is suppressed from the set of results sent back to the client. 
 
@@ -91,7 +91,7 @@ Authorizations are specified as a comma-
     Scanner s = connector.createScanner("table", auths);
     
 
-## <a id=SECTION001040000000000000000></a> User Authorizations
+## <a id=User_Authorizations></a> User Authorizations
 
 Each Accumulo user has a set of associated security labels. To manipulate these in the shell, use the setauths and getauths commands. These may also be modified using the Java security operations API. 
 
@@ -105,13 +105,13 @@ To prevent users from writing data they 
 
 Any user with the alter table permission can add or remove this constraint. This constraint is not applied to bulk imported data; if this is a concern, disable the bulk import permission. 
 
-## <a id=SECTION001050000000000000000></a> Secure Authorizations Handling
+## <a id=Secure_Authorizations_Handling></a> Secure Authorizations Handling
 
 For applications serving many users, it is not expected that an Accumulo user will be created for each application user. In this case an Accumulo user with all authorizations needed by any of the application's users must be created. To service queries, the application should create a scanner with the application user's authorizations. These authorizations could be obtained from a trusted third party. 
 
 Often production systems will integrate with Public-Key Infrastructure (PKI) and designate client code within the query layer to negotiate with PKI servers in order to authenticate users and retrieve their authorization tokens (credentials). This requires users to specify only the information necessary to authenticate themselves to the system. Once user identity is established, their credentials can be accessed by the client code and passed to Accumulo outside of the reach of the user. 
 
-## <a id=SECTION001060000000000000000></a> Query Services Layer
+## <a id=Query_Services_Layer></a> Query Services Layer
 
 Since the primary method of interaction with Accumulo is through the Java API, production environments often call for the implementation of a Query Services Layer. This can be done using web services in containers such as Apache Tomcat, but is not a requirement. The Query Services Layer provides a platform on which user-facing applications can be built. This allows the application designers to isolate potentially complex query logic, and provides a convenient point at which to perform essential security functions. 
 
@@ -124,10 +124,10 @@ Typically, the query services layer sits
 
 * * *
 
-   [1]: Security.html#SECTION001010000000000000000
-   [2]: Security.html#SECTION001020000000000000000
-   [3]: Security.html#SECTION001030000000000000000
-   [4]: Security.html#SECTION001040000000000000000
-   [5]: Security.html#SECTION001050000000000000000
-   [6]: Security.html#SECTION001060000000000000000
+   [1]: Security.html#Security_Label_Expressions
+   [2]: Security.html#Security_Label_Expression_Syntax
+   [3]: Security.html#Authorization
+   [4]: Security.html#User_Authorizations
+   [5]: Security.html#Secure_Authorizations_Handling
+   [6]: Security.html#Query_Services_Layer
 

Modified: incubator/accumulo/site/trunk/content/accumulo/user_manual_1.4-incubating/Shell_Commands.mdtext
URL: http://svn.apache.org/viewvc/incubator/accumulo/site/trunk/content/accumulo/user_manual_1.4-incubating/Shell_Commands.mdtext?rev=1196033&r1=1196032&r2=1196033&view=diff
==============================================================================
--- incubator/accumulo/site/trunk/content/accumulo/user_manual_1.4-incubating/Shell_Commands.mdtext (original)
+++ incubator/accumulo/site/trunk/content/accumulo/user_manual_1.4-incubating/Shell_Commands.mdtext Tue Nov  1 14:49:30 2011
@@ -16,7 +16,7 @@ Notice:    Licensed to the Apache Softwa
            specific language governing permissions and limitations
            under the License.
 
-## <a id=SECTION001200000000000000000></a> Shell Commands
+## <a id=Shell_Commands></a> Shell Commands
 
 **?**   
   

Modified: incubator/accumulo/site/trunk/content/accumulo/user_manual_1.4-incubating/Table_Configuration.mdtext
URL: http://svn.apache.org/viewvc/incubator/accumulo/site/trunk/content/accumulo/user_manual_1.4-incubating/Table_Configuration.mdtext?rev=1196033&r1=1196032&r2=1196033&view=diff
==============================================================================
--- incubator/accumulo/site/trunk/content/accumulo/user_manual_1.4-incubating/Table_Configuration.mdtext (original)
+++ incubator/accumulo/site/trunk/content/accumulo/user_manual_1.4-incubating/Table_Configuration.mdtext Tue Nov  1 14:49:30 2011
@@ -26,22 +26,23 @@ Notice:    Licensed to the Apache Softwa
 * [Filters][6]
 * [Aggregating Iterators][7]
 * [Pre-splitting tables][8]
-* [Merging tables][9]
+* [Merging tablets][9]
 * [Delete Range][10]
+* [Cloning Tables][11]
 
 * * *
 
-## <a id=SECTION00600000000000000000></a> Table Configuration
+## <a id=Table_Configuration></a> Table Configuration
 
 Accumulo tables have a few options that can be configured to alter the default behavior of Accumulo as well as improve performance based on the data stored. These include locality groups, constraints, and iterators. 
 
-## <a id=SECTION00610000000000000000></a> Locality Groups
+## <a id=Locality_Groups></a> Locality Groups
 
 Accumulo supports storing sets of column families separately on disk so that clients can efficiently scan over columns that are frequently used together and avoid scanning over column families that are not requested. After a locality group is set, Scanner and BatchScanner operations will automatically take advantage of it whenever the fetchColumnFamilies() method is used. 
 
 By default, tables place all column families into the same ``default'' locality group. Additional locality groups can be configured at any time via the shell or programmatically as follows: 
 
-### <a id=SECTION00611000000000000000></a> Managing Locality Groups via the Shell
+### <a id=Managing_Locality_Groups_via_the_Shell></a> Managing Locality Groups via the Shell
     
     
     usage: setgroups <group>=<col fam>{,<col fam>}{ <group>=<col fam>{,<col
@@ -53,7 +54,7 @@ By default tables place all column famil
     group_one=colf1,colf2
     
 
-### <a id=SECTION00612000000000000000></a> Managing Locality Groups via the Client API
+### <a id=Managing_Locality_Groups_via_the_Client_API></a> Managing Locality Groups via the Client API
     
     
     Connector conn;
@@ -85,7 +86,7 @@ The assignment of Column Families to Loc
     user@myinstance mytable> compact -t mytable
     
 
-## <a id=SECTION00620000000000000000></a> Constraints
+## <a id=Constraints></a> Constraints
 
 Accumulo supports constraints applied on mutations at insert time. This can be used to disallow certain inserts according to a user defined policy. Any mutation that fails to meet the requirements of the constraint is rejected and sent back to the client. 
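The constraint mechanism can be sketched as a policy function that inspects each mutation and returns violation codes, with an empty result meaning the mutation is accepted. The sketch below is illustrative Python, not Accumulo's Java Constraint interface; the function name, violation code, and mutation shape are all hypothetical.

```python
# Illustrative sketch of the constraint idea (hypothetical names, not
# Accumulo's Java Constraint interface): a policy function inspects a
# mutation and returns violation codes; an empty list means it is accepted.

def check_numeric_rowid(row, updates):
    """Reject mutations whose row ID is not purely numeric."""
    violations = []
    if not row.isdigit():
        violations.append(1)  # code 1: non-numeric row ID
    return violations

def apply_mutations(mutations):
    """Accept or reject each (row, updates) mutation, as a server would
    at insert time, sending rejects back to the client."""
    accepted, rejected = [], []
    for row, updates in mutations:
        if check_numeric_rowid(row, updates):
            rejected.append(row)
        else:
            accepted.append(row)
    return accepted, rejected
```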
 
@@ -111,7 +112,7 @@ An example of constraints can be found i
 accumulo/docs/examples/README.constraints with corresponding code under   
 accumulo/src/examples/main/java/accumulo/examples/constraints . 
 
-## <a id=SECTION00630000000000000000></a> Bloom Filters
+## <a id=Bloom_Filters></a> Bloom Filters
 
 As mutations are applied to an Accumulo table, several files are created per tablet. If bloom filters are enabled, Accumulo will create and load a small data structure into memory to determine whether a file contains a given key before opening the file. This can speed up lookups considerably. 
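The check a bloom filter performs can be sketched in a few lines: it answers "definitely absent" or "possibly present" without reading the file. This is an illustrative Python model of the data structure, not the implementation Accumulo uses; the size and hash choices here are arbitrary.

```python
import hashlib

class BloomSketch:
    """Tiny Bloom filter: k hash positions per key over an m-bit array.
    A lookup that finds any unset bit proves the key was never added."""
    def __init__(self, m=1024, k=3):
        self.m, self.k = m, k
        self.bits = bytearray(m)
    def _positions(self, key):
        for i in range(self.k):
            h = hashlib.md5(f"{i}:{key}".encode()).hexdigest()
            yield int(h, 16) % self.m
    def add(self, key):
        for p in self._positions(key):
            self.bits[p] = 1
    def might_contain(self, key):
        # False means the key is definitely absent; True means it may be
        # present (false positives are possible, false negatives are not).
        return all(self.bits[p] for p in self._positions(key))
```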
 
@@ -124,11 +125,11 @@ To enable bloom filters, enter the follo
 An extensive example of using Bloom Filters can be found at   
 accumulo/docs/examples/README.bloom . 
 
-## <a id=SECTION00640000000000000000></a> Iterators
+## <a id=Iterators></a> Iterators
 
 Iterators provide a modular mechanism for adding functionality to be executed by TabletServers when scanning or compacting data. This allows users to efficiently summarize, filter, and aggregate data. In fact, the built-in features of cell-level security and column fetching are implemented using Iterators. Some useful Iterators are provided with Accumulo and can be found in the org.apache.accumulo.core.iterators.user package. 
 
-### <a id=SECTION00641000000000000000></a> Setting Iterators via the Shell
+### <a id=Setting_Iterators_via_the_Shell></a> Setting Iterators via the Shell
     
     
     usage: setiter [-?] -agg | -class <name> | -filter | -nolabel | 
@@ -138,7 +139,7 @@ Iterators provide a modular mechanism fo
     user@myinstance mytable> setiter -t mytable -scan -p 10 -n myiter
     
 
-### <a id=SECTION00642000000000000000></a> Setting Iterators Programmatically
+### <a id=Setting_Iterators_Programmatically></a> Setting Iterators Programmatically
     
     
     scanner.addIterator(new IteratorSetting(
@@ -158,7 +159,7 @@ Some iterators take additional parameter
 
 Tables support separate Iterator settings to be applied at scan time, upon minor compaction and upon major compaction. For most uses, tables will have identical iterator settings for all three to avoid inconsistent results. 
 
-## <a id=SECTION00650000000000000000></a> Versioning Iterators and Timestamps
+## <a id=Versioning_Iterators_and_Timestamps></a> Versioning Iterators and Timestamps
 
 Accumulo provides the capability to manage versioned data through the use of timestamps within the Key. If a timestamp is not specified in the key created by the client, then the system will set the timestamp to the current time. Two keys with identical rowIDs and columns but different timestamps are considered two versions of the same key. If two inserts are made into Accumulo with the same rowID, column, and timestamp, then the behavior is non-deterministic. 
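The versioning behavior can be modeled as trimming each (row, column) cell down to its most recent timestamps. This is a conceptual Python sketch, not Accumulo code; the built-in VersioningIterator keeps the most recent version by default.

```python
from collections import defaultdict

def trim_versions(entries, max_versions=1):
    """Given (row, column, timestamp, value) tuples, keep only the
    max_versions most recent timestamps per (row, column) cell."""
    by_key = defaultdict(list)
    for row, col, ts, val in entries:
        by_key[(row, col)].append((ts, val))
    kept = []
    for (row, col), versions in by_key.items():
        # Newest timestamps first, keep the first max_versions of them.
        for ts, val in sorted(versions, reverse=True)[:max_versions]:
            kept.append((row, col, ts, val))
    return sorted(kept)
```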
 
@@ -177,9 +178,9 @@ The version policy can be changed by cha
     table.iterator.majc.vers.opt.maxVersions=3
     
 
-### <a id=SECTION00651000000000000000></a> Logical Time
+### <a id=Logical_Time></a> Logical Time
 
-Accumulo 1.2 introduces the concept of logical time. This ensures that timestamps set by accumulo always move forward. This helps avoid problems caused by TabletServers that have different time settings. The per tablet counter gives unique one up time stamps on a per mutation basis. When using time in milliseconds, if two things arrive within the same millisecond then both receive the same timestamp. 
+Accumulo 1.2 introduces the concept of logical time. This ensures that timestamps set by Accumulo always move forward, which helps avoid problems caused by TabletServers that have different time settings. The per-tablet counter gives unique, monotonically increasing timestamps on a per-mutation basis. When using time in milliseconds, if two mutations arrive within the same millisecond then both receive the same timestamp; timestamps set by Accumulo will still always move forward and never backwards. 
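The two timekeeping modes described above can be sketched as small counters. This is a conceptual Python model of the behavior, not Accumulo's implementation.

```python
import itertools
import time

class LogicalClock:
    """Logical time sketch: a per-tablet counter handing out strictly
    increasing timestamps, one per mutation, ignoring the wall clock."""
    def __init__(self):
        self._counter = itertools.count(1)
    def next_ts(self):
        return next(self._counter)

class MillisClock:
    """Milliseconds-mode sketch: wall-clock based, but never moves
    backwards; arrivals in the same millisecond share a timestamp."""
    def __init__(self, now=time.time):
        self._now = now
        self._last = 0
    def next_ts(self):
        ts = max(int(self._now() * 1000), self._last)
        self._last = ts
        return ts
```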
 
 A table can be configured to use logical timestamps at creation time as follows: 
     
@@ -187,11 +188,11 @@ A table can be configured to use logical
     user@myinstance> createtable -tl logical
     
 
-### <a id=SECTION00652000000000000000></a> Deletes
+### <a id=Deletes></a> Deletes
 
 Deletes are special keys in Accumulo that get sorted along with all the other data. When a delete key is inserted, Accumulo will not show anything that has a timestamp less than or equal to the delete key. During major compaction, any keys older than a delete key are omitted from the new file created, and the omitted keys are removed from disk as part of the regular garbage collection process. 
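Delete-marker semantics can be sketched as a filter over the stored entries: a delete for a cell at timestamp T hides every version of that cell at or before T. Illustrative Python, not Accumulo code.

```python
def visible_entries(entries, deletes):
    """Hide versions shadowed by delete markers.

    entries: (row, column, timestamp, value) tuples
    deletes: {(row, column): delete_timestamp}
    """
    kept = []
    for row, col, ts, val in entries:
        dts = deletes.get((row, col))
        if dts is not None and ts <= dts:
            continue  # at or older than the delete marker: hidden
        kept.append((row, col, ts, val))
    return kept
```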
 
-## <a id=SECTION00660000000000000000></a> Filters
+## <a id=Filters></a> Filters
 
 When scanning over a set of key-value pairs it is possible to apply an arbitrary filtering policy through the use of a Filter. Filters are types of iterators that return only key-value pairs that satisfy the filter logic. Accumulo has a few built-in filters that can be configured on any table: AgeOff, ColumnAgeOff, Timestamp, NoVis, and RegEx. More can be added by writing a Java class that extends the   
 org.apache.accumulo.core.iterators.Filter class. 
@@ -237,7 +238,7 @@ To see the iterator settings for a table
     ---------+------------------------------------------+------------------
     
 
-## <a id=SECTION00670000000000000000></a> Aggregating Iterators
+## <a id=Aggregating_Iterators></a> Aggregating Iterators
 
 Accumulo allows aggregating iterators to be configured on tables and column families. When an aggregating iterator is set, the iterator is applied across the values associated with any keys that share rowID, column family, and column qualifier. This is similar to the reduce step in MapReduce, which applies some function to all the values associated with a particular key. 
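The aggregation step can be sketched as a reduce over sorted entries that share the same row and column. Illustrative Python, with sum standing in for whatever aggregator is configured.

```python
from itertools import groupby

def aggregate(entries, combine=sum):
    """Collapse values that share (row, column family, column qualifier)
    into one value, like an aggregating iterator or a MapReduce reduce.

    entries: (row, column_family, column_qualifier, value) tuples
    """
    entries = sorted(entries)
    out = []
    for key, group in groupby(entries, key=lambda e: e[:3]):
        out.append((*key, combine(v for *_, v in group)))
    return out
```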
 
@@ -286,7 +287,7 @@ Additional Aggregators can be added by c
 An example of an aggregator can be found under   
 accumulo/src/examples/main/java/accumulo/examples/aggregation/SortedSetAggregator.java 
 
-## <a id=SECTION00680000000000000000></a> Pre-splitting tables
+## <a id=Pre-splitting_tables></a> Pre-splitting tables
 
 Accumulo will balance and distribute tables across servers. Before a table gets large, it will be maintained as a single tablet on a single server. This limits the speed at which data can be added or queried to the speed of a single node. To improve performance when a table is new or small, you can add split points and generate new tablets. 
 
@@ -299,7 +300,7 @@ In the shell: 
 
 This will create a new table with 4 tablets. The table will be split on the letters ``g'', ``n'', and ``t'', which will work nicely if the row data start with lower-case alphabetic characters. If your row data includes binary or numeric information, or if the distribution of the row information is not flat, then you would pick different split points. Ingest and query can now proceed on 4 nodes, which can improve performance. 
 
-## <a id=SECTION00690000000000000000></a> Merging tables
+## <a id=Merging_tablets></a> Merging tablets
 
 Over time, a table can get very large, so large that it has hundreds of thousands of split points. Once there are enough tablets to spread a table across the entire cluster, additional splits may not improve performance, and may create unnecessary bookkeeping. The distribution of data may change over time. For example, if row data contains date information, and data is continually added and removed to maintain a window of current information, tablets for older rows may be empty. 
 
@@ -334,9 +335,9 @@ Merging away small tablets works on one 
     root@myinstance> merge -t myTable
     
 
-## <a id=SECTION006100000000000000000></a> Delete Range
+## <a id=Delete_Range></a> Delete Range
 
-Consider an indexing scheme that uses date information in each row. For example ``20110823-15:20:25.013'' might be a row that specifies a date and time. In some cases, we might like to delete rows based on this date, say to remove all the data older than the current year. Accumulo supports a delete range operation which can efficiently remove data between two rows. For example: 
+Consider an indexing scheme that uses date information in each row. For example ``20110823-15:20:25.013'' might be a row that specifies a date and time. In some cases, we might like to delete rows based on this date, say to remove all the data older than the current year. Accumulo supports a delete range operation which can efficiently remove data between two rows. For example: 
     
     
     root@myinstance> deleterange -t myTable -s 2010 -e 2011
@@ -352,19 +353,75 @@ The shell will not allow you to delete a
 
 Range deletion is implemented using splits at the given start/end positions, and will affect the number of splits in the table. 
 
+## <a id=Cloning_Tables></a> Cloning Tables
+
+A new table can be created that points to an existing table's data. This is a very quick metadata operation; no data is actually copied. The cloned table and the source table can change independently after the clone operation. One use case for this feature is testing. For example, to test a new filtering iterator, clone the table, add the filter to the clone, and force a major compaction. To perform a test on less data, clone a table and then use delete range to efficiently remove a lot of data from the clone. Another use case is generating a snapshot to guard against human error. To create a snapshot, clone a table and then disable write permissions on the clone. 
+
+The clone operation will point to the source table's files. This is why the flush option is present and is enabled by default in the shell. If the flush option is not enabled, then any data the source table currently has in memory will not exist in the clone. 
+
+A cloned table copies the configuration of the source table. However, the permissions of the source table are not copied to the clone. After a clone is created, only the user that created the clone can read and write to it. 
+
+In the following example we see that data inserted after the clone operation is not visible in the clone. 
+    
+    
+    root@a14> createtable people
+    root@a14 people> insert 890435 name last Doe
+    root@a14 people> insert 890435 name first John
+    root@a14 people> clonetable people test  
+    root@a14 people> insert 890436 name first Jane
+    root@a14 people> insert 890436 name last Doe  
+    root@a14 people> scan
+    890435 name:first []    John
+    890435 name:last []    Doe
+    890436 name:first []    Jane
+    890436 name:last []    Doe
+    root@a14 people> table test
+    root@a14 test> scan
+    890435 name:first []    John
+    890435 name:last []    Doe
+    root@a14 test>
+    
+
+The du command in the shell shows how much space a table is using in HDFS. This command can also show how much overlapping space two cloned tables have in HDFS. In the example below, du shows table ci is using 428M. Then ci is cloned to cic, and du shows that both tables share 428M. After three entries are inserted into cic and it is flushed, du shows the two tables still share 428M but cic has 226 bytes to itself. Finally, table cic is compacted, and then du shows that each table uses 428M. 
+    
+    
+    root@a14> du ci           
+                 428,482,573 [ci]
+    root@a14> clonetable ci cic
+    root@a14> du ci cic
+                 428,482,573 [ci, cic]
+    root@a14> table cic
+    root@a14 cic> insert r1 cf1 cq1 v1
+    root@a14 cic> insert r1 cf1 cq2 v2
+    root@a14 cic> insert r1 cf1 cq3 v3 
+    root@a14 cic> flush -t cic -w 
+    27 15:00:13,908 [shell.Shell] INFO : Flush of table cic completed.
+    root@a14 cic> du ci cic       
+                 428,482,573 [ci, cic]
+                         226 [cic]
+    root@a14 cic> compact -t cic -w
+    27 15:00:35,871 [shell.Shell] INFO : Compacting table ...
+    27 15:03:03,303 [shell.Shell] INFO : Compaction of table cic completed for given range
+    root@a14 cic> du ci cic        
+                 428,482,573 [ci]
+                 428,482,612 [cic]
+    root@a14 cic>
+    
+
   
 
 
 * * *
 
-   [1]: Table_Configuration.html#SECTION00610000000000000000
-   [2]: Table_Configuration.html#SECTION00620000000000000000
-   [3]: Table_Configuration.html#SECTION00630000000000000000
-   [4]: Table_Configuration.html#SECTION00640000000000000000
-   [5]: Table_Configuration.html#SECTION00650000000000000000
-   [6]: Table_Configuration.html#SECTION00660000000000000000
-   [7]: Table_Configuration.html#SECTION00670000000000000000
-   [8]: Table_Configuration.html#SECTION00680000000000000000
-   [9]: Table_Configuration.html#SECTION00690000000000000000
-   [10]: Table_Configuration.html#SECTION006100000000000000000
+   [1]: Table_Configuration.html#Locality_Groups
+   [2]: Table_Configuration.html#Constraints
+   [3]: Table_Configuration.html#Bloom_Filters
+   [4]: Table_Configuration.html#Iterators
+   [5]: Table_Configuration.html#Versioning_Iterators_and_Timestamps
+   [6]: Table_Configuration.html#Filters
+   [7]: Table_Configuration.html#Aggregating_Iterators
+   [8]: Table_Configuration.html#Pre-splitting_tables
+   [9]: Table_Configuration.html#Merging_tablets
+   [10]: Table_Configuration.html#Delete_Range
+   [11]: Table_Configuration.html#Cloning_Tables
 

Modified: incubator/accumulo/site/trunk/content/accumulo/user_manual_1.4-incubating/Table_Design.mdtext
URL: http://svn.apache.org/viewvc/incubator/accumulo/site/trunk/content/accumulo/user_manual_1.4-incubating/Table_Design.mdtext?rev=1196033&r1=1196032&r2=1196033&view=diff
==============================================================================
--- incubator/accumulo/site/trunk/content/accumulo/user_manual_1.4-incubating/Table_Design.mdtext (original)
+++ incubator/accumulo/site/trunk/content/accumulo/user_manual_1.4-incubating/Table_Design.mdtext Tue Nov  1 14:49:30 2011
@@ -26,9 +26,9 @@ Notice:    Licensed to the Apache Softwa
 
 * * *
 
-## <a id=SECTION00700000000000000000></a> Table Design
+## <a id=Table_Design></a> Table Design
 
-## <a id=SECTION00710000000000000000></a> Basic Table
+## <a id=Basic_Table></a> Basic Table
 
 Since Accumulo tables are sorted by row ID, each table can be thought of as being indexed by the row ID. Lookups performed via row ID can be executed quickly by doing a binary search, first across the tablets and then within a tablet. Clients should choose a row ID carefully in order to support their desired application. A simple rule is to select a unique identifier as the row ID for each entity to be stored and assign all the other attributes to be tracked as columns under this row ID. For example, if we have the following data in a comma-separated file: 
     
@@ -59,7 +59,7 @@ We could then retrieve any of the column
         System.out.println(entry.getValue().toString());
     
 
-## <a id=SECTION00720000000000000000></a> RowID Design
+## <a id=RowID_Design></a> RowID Design
 
 Often it is necessary to transform the rowID in order to have rows ordered in a way that is optimal for anticipated access patterns. A good example of this is reversing the order of components of internet domain names in order to group rows of the same parent domain together: 
     
@@ -95,7 +95,7 @@ It could also be done by adding a string
 
 Appending dates provides the additional capability of restricting a scan to a given date range. 
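Both transformations can be sketched in a few lines: reverse the domain components so rows of the same parent domain sort together, and optionally append a date to support date-range scans. The helper name and date format here are illustrative.

```python
def domain_rowid(hostname, date=None):
    """Build a rowID from a hostname by reversing its components, so
    e.g. all *.google.com rows sort adjacently; an optional date suffix
    additionally supports restricting scans to a date range."""
    rowid = ".".join(reversed(hostname.split(".")))
    return f"{rowid}_{date}" if date else rowid
```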
 
-## <a id=SECTION00730000000000000000></a> Indexing
+## <a id=Indexing></a> Indexing
 
 In order to support lookups via more than one attribute of an entity, additional indexes can be built. However, because Accumulo tables can support any number of columns without specifying them beforehand, a single additional index will often suffice for supporting lookups of records in the main table. Here, the index has, as the rowID, the Value or Term from the main table, the column families are the same, and the column qualifier of the index table contains the rowID from the main table. 
 
@@ -133,7 +133,7 @@ To support efficient lookups of multiple
 
 One advantage of the dynamic schema capabilities of Accumulo is that different fields may be indexed into the same physical table. However, it may be necessary to create different index tables if the terms must be formatted differently in order to maintain proper sort order. For example, real numbers must be formatted differently than their usual notation in order to be sorted correctly. In these cases, usually one index per unique data type will suffice. 
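A minimal instance of such formatting is zero-padding non-negative integers so that lexicographic (byte) order in the index matches numeric order. Illustrative sketch; the fixed width is an arbitrary assumption.

```python
def encode_sortable(n, width=10):
    """Encode a non-negative integer so that string sort order matches
    numeric order: '2' would sort after '10', but '0000000002' sorts
    before '0000000010'."""
    return str(n).zfill(width)
```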
 
-## <a id=SECTION00740000000000000000></a> Entity-Attribute and Graph Tables
+## <a id=Entity-Attribute_and_Graph_Tables></a> Entity-Attribute and Graph Tables
 
 Accumulo is ideal for storing entities and their attributes, especially if the attributes are sparse. It is often useful to join several datasets together on common entities within the same table. This can allow for the representation of graphs, including nodes, their attributes, and connections to other nodes. 
 
@@ -151,7 +151,7 @@ For example, to keep track of employees,
 
 To allow efficient updating of edge weights, an aggregating iterator can be configured to add the value of all mutations applied with the same key. These types of tables can easily be created from raw events by simply extracting the entities, attributes, and relationships from individual events and inserting the keys into Accumulo each with a count of 1. The aggregating iterator will take care of maintaining the edge weights. 
 
-## <a id=SECTION00750000000000000000></a> Document-Partitioned Indexing
+## <a id=Document-Partitioned_Indexing></a> Document-Partitioned Indexing
 
 Using a simple index as described above works well when looking for records that match one of a set of given criteria. When looking for records that match more than one criterion simultaneously, such as when looking for documents that contain all of the words `the' and `white' and `house', there are several issues. 
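The multi-term lookup this scheme supports can be sketched as follows: each partition holds its own term-to-documents map, candidate documents must match every query term, and the intersection is computed inside each partition (on the servers in parallel, in the real system). Illustrative Python, not Accumulo code.

```python
def query_all_terms(partitions, terms):
    """Return document IDs containing *all* of the given terms.

    partitions: list of per-partition inverted indexes,
                each mapping term -> set of document IDs
    """
    hits = []
    for index in partitions:
        # Intersect the posting sets for every term within this partition.
        sets = [index.get(t, set()) for t in terms]
        hits.extend(set.intersection(*sets) if sets else set())
    return sorted(hits)
```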
 
@@ -190,11 +190,11 @@ Of course, fast servers will return thei
 
 * * *
 
-   [1]: Table_Design.html#SECTION00710000000000000000
-   [2]: Table_Design.html#SECTION00720000000000000000
-   [3]: Table_Design.html#SECTION00730000000000000000
-   [4]: Table_Design.html#SECTION00740000000000000000
-   [5]: Table_Design.html#SECTION00750000000000000000
+   [1]: Table_Design.html#Basic_Table
+   [2]: Table_Design.html#RowID_Design
+   [3]: Table_Design.html#Indexing
+   [4]: Table_Design.html#Entity-Attribute_and_Graph_Tables
+   [5]: Table_Design.html#Document-Partitioned_Indexing
    [6]: img2.png
    [7]: img3.png
    [8]: img4.png

Modified: incubator/accumulo/site/trunk/content/accumulo/user_manual_1.4-incubating/Writing_Accumulo_Clients.mdtext
URL: http://svn.apache.org/viewvc/incubator/accumulo/site/trunk/content/accumulo/user_manual_1.4-incubating/Writing_Accumulo_Clients.mdtext?rev=1196033&r1=1196032&r2=1196033&view=diff
==============================================================================
--- incubator/accumulo/site/trunk/content/accumulo/user_manual_1.4-incubating/Writing_Accumulo_Clients.mdtext (original)
+++ incubator/accumulo/site/trunk/content/accumulo/user_manual_1.4-incubating/Writing_Accumulo_Clients.mdtext Tue Nov  1 14:49:30 2011
@@ -23,7 +23,7 @@ Notice:    Licensed to the Apache Softwa
 
 * * *
 
-## <a id=SECTION00500000000000000000></a> Writing Accumulo Clients
+## <a id=Writing_Accumulo_Clients></a> Writing Accumulo Clients
 
 All clients must first identify the Accumulo instance to which they will be communicating. Code to do this is as follows: 
     
@@ -35,7 +35,7 @@ All clients must first identify the Accu
     Connector conn = inst.getConnector("user", "passwd");
     
 
-## <a id=SECTION00510000000000000000></a> Writing Data
+## <a id=Writing_Data></a> Writing Data
 
 Data are written to Accumulo by creating Mutation objects that represent all the changes to the columns of a single row. The changes are made atomically in the TabletServer. Clients then add Mutations to a BatchWriter which submits them to the appropriate TabletServers. 
 
@@ -54,7 +54,7 @@ Mutations can be created thus: 
     mutation.put(colFam, colQual, colVis, timestamp, value);
     
 
-### <a id=SECTION00511000000000000000></a> BatchWriter
+### <a id=BatchWriter></a> BatchWriter
 
 The BatchWriter is highly optimized to send Mutations to multiple TabletServers and automatically batches Mutations destined for the same TabletServer to amortize network overhead. Care must be taken to avoid changing the contents of any Object passed to the BatchWriter since it keeps objects in memory while batching. 
 
@@ -76,11 +76,11 @@ Mutations are added to a BatchWriter thu
 An example of using the batch writer can be found at   
 accumulo/docs/examples/README.batch 
 
-## <a id=SECTION00520000000000000000></a> Reading Data
+## <a id=Reading_Data></a> Reading Data
 
 Accumulo is optimized to quickly retrieve the value associated with a given key, and to efficiently return ranges of consecutive keys and their associated values. 
 
-### <a id=SECTION00521000000000000000></a> Scanner
+### <a id=Scanner></a> Scanner
 
 To retrieve data, clients use a Scanner, which acts like an Iterator over keys and values. Scanners can be configured to start and stop at particular keys, and to return a subset of the columns available. 
     
@@ -100,7 +100,22 @@ To retrieve data, Clients use a Scanner,
     }
     
 
-### <a id=SECTION00522000000000000000></a> BatchScanner
+### <a id=Isolated_Scanner></a> Isolated Scanner
+
+Accumulo supports the ability to present an isolated view of rows when scanning. There are three possible ways that a row could change in Accumulo: 
+
+* a mutation applied to a table 
+* iterators executed as part of a minor or major compaction 
+* bulk import of new files 
+
+Isolation guarantees that either all or none of the changes made by these operations to a row are seen. Use the IsolatedScanner to obtain an isolated view of an Accumulo table. With the regular scanner it is possible to see a non-isolated view of a row. For example, if a mutation modifies three columns, you may see only two of those modifications. With the isolated scanner, either all three of the changes are seen or none. 
+
+The IsolatedScanner buffers rows on the client side so a large row will not crash a tablet server. By default, rows are buffered in memory, but users can supply their own buffer implementation if they wish to buffer to disk when rows are large. 
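The row-buffering idea can be sketched as grouping a sorted entry stream by row and handing back one whole row at a time, so a consumer never observes a partially applied row. Illustrative Python, not the IsolatedScanner implementation.

```python
from itertools import groupby

def isolated_rows(entries):
    """Yield (row, entries_for_row) pairs from a sorted stream of
    (row, column, value) entries, buffering each row completely before
    handing it to the consumer."""
    for row, group in groupby(entries, key=lambda e: e[0]):
        yield row, list(group)
```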
+
+For an example, look at the following   
+src/examples/src/main/java/org/apache/accumulo/examples/isolation/InterferenceTest.java
+
+### <a id=BatchScanner></a> BatchScanner
 
 For some types of access, it is more efficient to retrieve several ranges simultaneously. This arises, for example, when accessing a set of non-consecutive rows whose IDs have been retrieved from a secondary index. 
 
@@ -128,6 +143,6 @@ accumulo/docs/examples/README.batch 
 
 * * *
 
-   [1]: Writing_Accumulo_Clients.html#SECTION00510000000000000000
-   [2]: Writing_Accumulo_Clients.html#SECTION00520000000000000000
+   [1]: Writing_Accumulo_Clients.html#Writing_Data
+   [2]: Writing_Accumulo_Clients.html#Reading_Data