Posted to commits@accumulo.apache.org by bi...@apache.org on 2014/05/08 04:16:19 UTC

[2/5] ACCUMULO-1327 converted latex manual to asciidoc

http://git-wip-us.apache.org/repos/asf/accumulo/blob/900d6abb/docs/src/main/latex/accumulo_user_manual/chapters/design.tex
----------------------------------------------------------------------
diff --git a/docs/src/main/latex/accumulo_user_manual/chapters/design.tex b/docs/src/main/latex/accumulo_user_manual/chapters/design.tex
deleted file mode 100644
index 8507f24..0000000
--- a/docs/src/main/latex/accumulo_user_manual/chapters/design.tex
+++ /dev/null
@@ -1,186 +0,0 @@
-
-% Licensed to the Apache Software Foundation (ASF) under one or more
-% contributor license agreements. See the NOTICE file distributed with
-% this work for additional information regarding copyright ownership.
-% The ASF licenses this file to You under the Apache License, Version 2.0
-% (the "License"); you may not use this file except in compliance with
-% the License. You may obtain a copy of the License at
-%
-%     http://www.apache.org/licenses/LICENSE-2.0
-%
-% Unless required by applicable law or agreed to in writing, software
-% distributed under the License is distributed on an "AS IS" BASIS,
-% WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-% See the License for the specific language governing permissions and
-% limitations under the License.
-
-\chapter{Accumulo Design}
-
-\section{Data Model}
-
-Accumulo provides a richer data model than simple key-value stores, but is not a
-fully relational database. Data is represented as key-value pairs, where the key and
-value are composed of the following elements:
-
-\begin{center}
-$\begin{array}{|c|c|c|c|c|c|} \hline
-\multicolumn{5}{|c|}{\mbox{Key}} & \multirow{3}{*}{\mbox{Value}}\\ \cline{1-5}
-\multirow{2}{*}{\mbox{Row ID}}& \multicolumn{3}{|c|}{\mbox{Column}} & \multirow{2}{*}{\mbox{Timestamp}} & \\ \cline{2-4}
-& \mbox{Family} & \mbox{Qualifier} & \mbox{Visibility} & & \\ \hline
-\end{array}$
-\end{center}
-
-All elements of the Key and the Value are represented as byte arrays except for
-Timestamp, which is a Long. Accumulo sorts keys by element and lexicographically
-in ascending order. Timestamps are sorted in descending order so that later
-versions of the same Key appear first in a sequential scan. Tables consist of a set of
-sorted key-value pairs.
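-
-For example, a single key-value pair can be constructed with the client API
-roughly as follows (the row, column names, and visibility label are illustrative):
-
-\begingroup\fontsize{8pt}{8pt}\selectfont\begin{verbatim}
-// Key elements: Row ID, Column Family, Column Qualifier, Column Visibility, Timestamp
-Key key = new Key(new Text("row1"), new Text("myColFam"),
-    new Text("myColQual"), new Text("public"), System.currentTimeMillis());
-
-// The Value is an uninterpreted byte array
-Value value = new Value("myValue".getBytes());
-\end{verbatim}\endgroup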
-
-\section{Architecture}
-
-Accumulo is a distributed data storage and retrieval system and as such consists of
-several architectural components, some of which run on many individual servers.
-Much of the work Accumulo does involves maintaining certain properties of the
-data, such as organization, availability, and integrity, across many commodity-class
-machines.
-
-\section{Components}
-
-An instance of Accumulo includes many TabletServers, one Garbage Collector process, 
-one Master server and many Clients.
-
-\subsection{Tablet Server}
-
-The TabletServer manages some subset of all the tablets (partitions of tables). This includes receiving writes from clients, persisting writes to a
-write-ahead log, sorting new key-value pairs in memory, periodically
-flushing sorted key-value pairs to new files in HDFS, and responding
-to reads from clients, forming a merge-sorted view of all keys and
-values from all the files it has created and the sorted in-memory
-store.
-
-TabletServers also perform recovery of a tablet
-that was previously on a server that failed, reapplying any writes
-found in the write-ahead log to the tablet.
-
-\subsection{Garbage Collector}
-
-Accumulo processes will share files stored in HDFS. Periodically, the Garbage
-Collector will identify files that are no longer needed by any process, and
-delete them. Multiple garbage collectors can be run to provide hot-standby support.
-They will perform leader election among themselves to choose a single active instance.
-
-\subsection{Master}
-
-The Accumulo Master is responsible for detecting and responding to TabletServer
-failure. It tries to balance the load across TabletServers by assigning tablets carefully
-and instructing TabletServers to unload tablets when necessary. The Master ensures all
-tablets are assigned to one TabletServer each, and handles table creation, alteration,
-and deletion requests from clients. The Master also coordinates startup, graceful
-shutdown and recovery of changes in write-ahead logs when Tablet servers fail.
-
-Multiple Masters may be run. The Masters will elect a single active Master among
-themselves, and the others will become backups should the active Master fail.
-
-\subsection{Tracer}
-
-The Accumulo Tracer process supports the distributed timing API provided by Accumulo.
-One to many of these processes can be run on a cluster; they will write the timing
-information to a given Accumulo table for future reference. See the section on
-Tracing for more information on this support.
-
-\subsection{Monitor}
-
-The Accumulo Monitor is a web application that provides a wealth of information about
-the state of an instance. The Monitor shows graphs and tables which contain information
-about read/write rates, cache hit/miss rates, and Accumulo table information such as scan
-rate and active/queued compactions. Additionally, the Monitor should always be the first
-point of entry when attempting to debug an Accumulo problem as it will show high-level problems
-in addition to aggregated errors from all nodes in the cluster. See the section on Monitoring
-for more information.
-
-Multiple Monitors can be run to provide hot-standby support in the face of failure. Due to the
-forwarding of logs from remote hosts to the Monitor, only one Monitor process should be active
-at one time. Leader election will be performed internally to choose the active Monitor.
-
-\subsection{Client}
-
-Accumulo includes a client library that is linked to every application. The client
-library contains logic for finding servers managing a particular tablet, and
-communicating with TabletServers to write and retrieve key-value pairs.
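-
-A minimal write through the client library looks roughly like the following
-sketch (the instance, credentials, and table name are placeholders):
-
-\begingroup\fontsize{8pt}{8pt}\selectfont\begin{verbatim}
-Instance instance = new ZooKeeperInstance("myinstance", "zoo1,zoo2");
-Connector conn = instance.getConnector("user", new PasswordToken("pass"));
-
-BatchWriter writer = conn.createBatchWriter("mytable", new BatchWriterConfig());
-
-Mutation m = new Mutation("row1");
-m.put("colFam", "colQual", new ColumnVisibility("public"), "value");
-writer.addMutation(m);
-writer.close();
-\end{verbatim}\endgroup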
-
-\section{Data Management}
-
-Accumulo stores data in tables, which are partitioned into tablets. Tablets are
-partitioned on row boundaries so that all of the columns and values for a particular
-row are found together within the same tablet. The Master assigns Tablets to one
-TabletServer at a time. This enables row-level transactions to take place without
-using distributed locking or some other complicated synchronization mechanism. As
-clients insert and query data, and as machines are added and removed from the
-cluster, the Master migrates tablets to ensure they remain available and that the
-ingest and query load is balanced across the cluster.
-
-\begin{center}
-\includegraphics[scale=0.4]{images/data_distribution.png}
-\end{center}
-
-\section{Tablet Service}
-
-
-When a write arrives at a TabletServer it is written to a Write-Ahead Log and
-then inserted into a sorted data structure in memory called a MemTable. When the
-MemTable reaches a certain size, the TabletServer writes out the sorted key-value
-pairs to a file in HDFS called an Indexed Sequential Access Method (ISAM)
-file. This process is called a minor compaction. A new MemTable is then created,
-and the fact of the compaction is recorded in the Write-Ahead Log.
-
-When a request to read data arrives at a TabletServer, the TabletServer does a
-binary search across the MemTable as well as the in-memory indexes associated
-with each ISAM file to find the relevant values. If clients are performing a
-scan, several key-value pairs are returned to the client in order from the
-MemTable and the set of ISAM files by performing a merge-sort as they are read.
-
-\section{Compactions}
-
-In order to manage the number of files per tablet, periodically the TabletServer
-performs Major Compactions of files within a tablet, in which some set of ISAM
-files are combined into one file. The previous files will eventually be removed
-by the Garbage Collector. This also provides an opportunity to permanently
-remove deleted key-value pairs by omitting key-value pairs suppressed by a
-delete entry when the new file is created.
-
-\section{Splitting}
-
-When a table is created it has one tablet. As the table grows, its initial
-tablet eventually splits into two tablets. It is likely that one of these
-tablets will migrate to another tablet server. As the table continues to grow,
-its tablets will continue to split and be migrated. The decision to
-automatically split a tablet is based on the size of a tablet's files. The
-size threshold at which a tablet splits is configurable per table. In addition
-to automatic splitting, a user can manually add split points to a table to
-create new tablets. Manually splitting a new table can parallelize reads and
-writes giving better initial performance without waiting for automatic
-splitting.
-
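-For example, split points can be added programmatically through the client API
-(assuming a Connector conn; the table name and split points are illustrative):
-
-\begingroup\fontsize{8pt}{8pt}\selectfont\begin{verbatim}
-SortedSet<Text> splits = new TreeSet<Text>();
-splits.add(new Text("f"));
-splits.add(new Text("m"));
-splits.add(new Text("t"));
-
-conn.tableOperations().addSplits("mytable", splits);
-\end{verbatim}\endgroup
-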
-As data is deleted from a table, tablets may shrink. Over time this can lead
-to small or empty tablets. To deal with this, merging of tablets was
-introduced in Accumulo 1.4. This is discussed in more detail later.
-
-\section{Fault-Tolerance}
-
-If a TabletServer fails, the Master detects it and automatically reassigns the tablets
-from the failed server to other servers. Any key-value pairs that were in
-memory at the time the TabletServer failed are automatically reapplied from the Write-Ahead
-Log (WAL) to prevent any loss of data.
-
-Tablet servers write their WALs directly to HDFS so the logs are available to all tablet
-servers for recovery. To make the recovery process efficient, the updates within a log are
-grouped by tablet.  TabletServers can quickly apply the mutations from the sorted logs
-that are destined for the tablets they have now been assigned.
-
-TabletServer failures are noted on the Master's monitor page, accessible via\\
-\mbox{http://master-address:50095/monitor}.
-
-\begin{center}
-\includegraphics[scale=0.4]{images/failure_handling.png}
-\end{center}
-

http://git-wip-us.apache.org/repos/asf/accumulo/blob/900d6abb/docs/src/main/latex/accumulo_user_manual/chapters/development_clients.tex
----------------------------------------------------------------------
diff --git a/docs/src/main/latex/accumulo_user_manual/chapters/development_clients.tex b/docs/src/main/latex/accumulo_user_manual/chapters/development_clients.tex
deleted file mode 100644
index fb7195d..0000000
--- a/docs/src/main/latex/accumulo_user_manual/chapters/development_clients.tex
+++ /dev/null
@@ -1,109 +0,0 @@
-
-% Licensed to the Apache Software Foundation (ASF) under one or more
-% contributor license agreements. See the NOTICE file distributed with
-% this work for additional information regarding copyright ownership.
-% The ASF licenses this file to You under the Apache License, Version 2.0
-% (the "License"); you may not use this file except in compliance with
-% the License. You may obtain a copy of the License at
-%
-%     http://www.apache.org/licenses/LICENSE-2.0
-%
-% Unless required by applicable law or agreed to in writing, software
-% distributed under the License is distributed on an "AS IS" BASIS,
-% WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-% See the License for the specific language governing permissions and
-% limitations under the License.
-
-\chapter{Development Clients}
-
-Normally, Accumulo consists of lots of moving parts. Even a stand-alone version of
-Accumulo requires Hadoop, Zookeeper, the Accumulo master, a tablet server, etc. If
-you want to write a unit test that uses Accumulo, you need a lot of infrastructure
-in place before your test can run.
-
-\section{Mock Accumulo}
-
-Mock Accumulo supplies mock implementations for much of the client API. It presently
-does not enforce users, logins, permissions, etc. It does support Iterators and Combiners.
-Note that MockAccumulo holds all data in memory, and will not retain any data or
-settings between runs.
-
-While normal interaction with the Accumulo client looks like this:
-
-\begingroup\fontsize{8pt}{8pt}\selectfont\begin{verbatim}
-Instance instance = new ZooKeeperInstance(...);
-Connector conn = instance.getConnector(user, passwordToken);
-\end{verbatim}\endgroup
-
-To interact with MockAccumulo, just replace the ZooKeeperInstance with MockInstance:
-
-\begingroup\fontsize{8pt}{8pt}\selectfont\begin{verbatim}
-Instance instance = new MockInstance();
-\end{verbatim}\endgroup
-
-In fact, you can use the "fake" option to the Accumulo shell and interact with
-MockAccumulo:
-
-\begingroup\fontsize{8pt}{8pt}\selectfont\begin{verbatim}
-$ ./bin/accumulo shell --fake -u root -p ''
-
-Shell - Apache Accumulo Interactive Shell
--
-- version: 1.6
-- instance name: fake
-- instance id: mock-instance-id
--
-- type 'help' for a list of available commands
--
-root@fake> createtable test
-root@fake test> insert row1 cf cq value
-root@fake test> insert row2 cf cq value2
-root@fake test> insert row3 cf cq value3
-root@fake test> scan
-row1 cf:cq []    value
-row2 cf:cq []    value2
-row3 cf:cq []    value3
-root@fake test> scan -b row2 -e row2
-row2 cf:cq []    value2
-root@fake test>
-\end{verbatim}\endgroup
-
-When testing Map Reduce jobs, you can also set the Mock Accumulo on the AccumuloInputFormat
-and AccumuloOutputFormat classes:
-
-\begingroup\fontsize{8pt}{8pt}\selectfont\begin{verbatim}
-// ... set up job configuration
-AccumuloInputFormat.setMockInstance(job, "mockInstance");
-AccumuloOutputFormat.setMockInstance(job, "mockInstance");
-\end{verbatim}\endgroup
-
-\section{Mini Accumulo Cluster}
-
-While the Mock Accumulo provides a lightweight implementation of the client API for unit
-testing, it is often necessary to write more realistic end-to-end integration tests that
-take advantage of the entire ecosystem. The Mini Accumulo Cluster makes this possible by
-configuring and starting Zookeeper, initializing Accumulo, and starting the Master as well
-as some Tablet Servers. It runs against the local filesystem instead of having to start
-up HDFS.
-
-To start it up, you will need to supply an empty directory and a root password as arguments:
-
-\begingroup\fontsize{8pt}{8pt}\selectfont\begin{verbatim}
-File tempDirectory = // JUnit and Guava supply mechanisms for creating temp directories
-MiniAccumuloCluster accumulo = new MiniAccumuloCluster(tempDirectory, "password");
-accumulo.start();
-\end{verbatim}\endgroup
-
-Once we have our mini cluster running, we will want to interact with the Accumulo client API:
-
-\begingroup\fontsize{8pt}{8pt}\selectfont\begin{verbatim}
-Instance instance = new ZooKeeperInstance(accumulo.getInstanceName(), accumulo.getZooKeepers());
-Connector conn = instance.getConnector("root", new PasswordToken("password"));
-\end{verbatim}\endgroup
-
-Upon completion of our development code, we will want to shut down our MiniAccumuloCluster:
-
-\begingroup\fontsize{8pt}{8pt}\selectfont\begin{verbatim}
-accumulo.stop();
-// delete your temporary folder
-\end{verbatim}\endgroup

http://git-wip-us.apache.org/repos/asf/accumulo/blob/900d6abb/docs/src/main/latex/accumulo_user_manual/chapters/high_speed_ingest.tex
----------------------------------------------------------------------
diff --git a/docs/src/main/latex/accumulo_user_manual/chapters/high_speed_ingest.tex b/docs/src/main/latex/accumulo_user_manual/chapters/high_speed_ingest.tex
deleted file mode 100644
index ab766d0..0000000
--- a/docs/src/main/latex/accumulo_user_manual/chapters/high_speed_ingest.tex
+++ /dev/null
@@ -1,133 +0,0 @@
-
-% Licensed to the Apache Software Foundation (ASF) under one or more
-% contributor license agreements. See the NOTICE file distributed with
-% this work for additional information regarding copyright ownership.
-% The ASF licenses this file to You under the Apache License, Version 2.0
-% (the "License"); you may not use this file except in compliance with
-% the License. You may obtain a copy of the License at
-%
-%     http://www.apache.org/licenses/LICENSE-2.0
-%
-% Unless required by applicable law or agreed to in writing, software
-% distributed under the License is distributed on an "AS IS" BASIS,
-% WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-% See the License for the specific language governing permissions and
-% limitations under the License.
-
-\chapter{High-Speed Ingest}
-
-Accumulo is often used as part of a larger data processing and storage system. To
-maximize the performance of a parallel system involving Accumulo, the ingestion
-and query components should be designed to provide enough parallelism and
-concurrency to avoid creating bottlenecks for users and other systems writing to
-and reading from Accumulo. There are several ways to achieve high ingest
-performance.
-
-\section{Pre-Splitting New Tables}
-
-New tables consist of a single tablet by default. As mutations are applied, the table
-grows and splits into multiple tablets which are balanced by the Master across
-TabletServers. This implies that the aggregate ingest rate will be limited to fewer
-servers than are available within the cluster until the table has reached the point
-where there are tablets on every TabletServer.
-
-Pre-splitting a table ensures that as many tablets as desired are available
-before ingest begins, so that all the parallelism possible with the cluster
-hardware can be exploited. Tables can be split anytime by using the shell:
-
-\begingroup\fontsize{8pt}{8pt}\selectfont\begin{verbatim}
-user@myinstance mytable> addsplits -sf /local_splitfile -t mytable
-\end{verbatim}\endgroup
-
-For the purposes of providing parallelism to ingest, it is not necessary to create more
-tablets than there are physical machines within the cluster, as the aggregate ingest
-rate is a function of the number of physical machines. Note that the aggregate ingest
-rate is still subject to the number of machines running ingest clients and the
-distribution of rowIDs across the table. The aggregate ingest rate will be
-suboptimal if there are many inserts into a small number of rowIDs.
-
-\section{Multiple Ingester Clients}
-
-Accumulo is capable of scaling to very high rates of ingest, which is dependent upon
-not just the number of TabletServers in operation but also the number of ingest
-clients. This is because a single client, while capable of batching mutations and
-sending them to all TabletServers, is ultimately limited by the amount of data that
-can be processed on a single machine. The aggregate ingest rate will scale linearly
-with the number of clients up to the point at which either the aggregate I/O of
-TabletServers or total network bandwidth capacity is reached.
-
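-Within each ingest client, the BatchWriter can be tuned to buffer more data and
-use more send threads; a sketch (assuming a Connector conn; the values are
-illustrative, not recommendations):
-
-\begingroup\fontsize{8pt}{8pt}\selectfont\begin{verbatim}
-BatchWriterConfig config = new BatchWriterConfig();
-config.setMaxMemory(64 * 1024 * 1024);      // buffer up to 64MB of mutations
-config.setMaxLatency(2, TimeUnit.MINUTES);  // flush buffered mutations at least every 2 minutes
-config.setMaxWriteThreads(8);               // threads used to send data to TabletServers
-
-BatchWriter writer = conn.createBatchWriter("mytable", config);
-\end{verbatim}\endgroup
-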
-In operational settings where high rates of ingest are paramount, clusters are often
-configured to dedicate some number of machines solely to running Ingester Clients.
-The exact ratio of clients to TabletServers necessary for optimum ingestion rates
-will vary according to the distribution of resources per machine and by data type.
-
-\section{Bulk Ingest}
-
-Accumulo supports the ability to import files produced by an external process such
-as MapReduce into an existing table. In some cases it may be faster to load data this
-way rather than via ingesting through clients using BatchWriters. This allows a large
-number of machines to format data the way Accumulo expects. The new files can
-then simply be introduced to Accumulo via a shell command.
-
-To configure MapReduce to format data in preparation for bulk loading, the job
-should be set to use a range partitioner instead of the default hash partitioner. The
-range partitioner uses the split points of the Accumulo table that will receive the
-data. The split points can be obtained from the shell and used by the MapReduce
-RangePartitioner. Note that this is only useful if the existing table is already split
-into multiple tablets.
-
-\begingroup\fontsize{8pt}{8pt}\selectfont\begin{verbatim}
-user@myinstance mytable> getsplits
-aa
-ab
-ac
-...
-zx
-zy
-zz
-\end{verbatim}\endgroup
-
-Run the MapReduce job, using the AccumuloFileOutputFormat to create the files to
-be introduced to Accumulo. Once this is complete, the files can be added to
-Accumulo via the shell:
-
-\begingroup\fontsize{8pt}{8pt}\selectfont\begin{verbatim}
-user@myinstance mytable> importdirectory /files_dir /failures
-\end{verbatim}\endgroup
-
-Note that the paths referenced are directories within the same HDFS instance over
-which Accumulo is running. Accumulo places any files that failed to be added to the
-second directory specified.
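-
-The same import can be performed through the client API; a sketch (assuming a
-Connector conn; the directory paths are illustrative, and the final argument
-controls whether Accumulo assigns times, as described in the next section):
-
-\begingroup\fontsize{8pt}{8pt}\selectfont\begin{verbatim}
-// import pre-formatted files, sending any failures to the second directory
-conn.tableOperations().importDirectory("mytable", "/files_dir", "/failures", false);
-\end{verbatim}\endgroup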
-
-A complete example of using Bulk Ingest can be found at\\
-accumulo/docs/examples/README.bulkIngest
-
-\section{Logical Time for Bulk Ingest}
-
-Logical time is important for bulk imported data, for which the client code may
-be choosing a timestamp. At bulk import time, the user can choose to enable
-logical time for the set of files being imported. When it is enabled, Accumulo
-uses a specialized system iterator to lazily set times in a bulk imported file.
-This mechanism guarantees that times set by unsynchronized multi-node
-applications (such as those running on MapReduce) will maintain some semblance
-of causal ordering. This mitigates the problem of the time being wrong on the
-system that created the file for bulk import. These times are not set when the
-file is imported, but whenever it is read by scans or compactions. At import, a
-time is obtained and is always used by the specialized system iterator to set
-that time.
-
-The timestamp assigned by Accumulo will be the same for every key in the file.
-This could cause problems if the file contains multiple keys that are identical
-except for the timestamp. In this case, the sort order of the keys will be
-undefined. This could occur if an insert and an update were in the same bulk
-import file.
-
-\section{MapReduce Ingest}
-It is possible to efficiently write many mutations to Accumulo in parallel via a
-MapReduce job. In this scenario the MapReduce job is written to process data that lives
-in HDFS and writes mutations to Accumulo using the AccumuloOutputFormat. See
-the MapReduce section under Analytics for details.
-
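-A sketch of the job configuration (the instance name, ZooKeeper hosts, and
-credentials are placeholders; in the map or reduce task, Mutations are then
-emitted with the destination table name as the output key):
-
-\begingroup\fontsize{8pt}{8pt}\selectfont\begin{verbatim}
-Job job = Job.getInstance(conf);
-job.setOutputFormatClass(AccumuloOutputFormat.class);
-
-AccumuloOutputFormat.setConnectorInfo(job, "user", new PasswordToken("pass"));
-AccumuloOutputFormat.setZooKeeperInstance(job, "myinstance", "zoo1,zoo2");
-AccumuloOutputFormat.setDefaultTableName(job, "mytable");
-AccumuloOutputFormat.setCreateTables(job, true);
-\end{verbatim}\endgroup
-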
-An example of using MapReduce can be found under\\
-accumulo/docs/examples/README.mapred
-

http://git-wip-us.apache.org/repos/asf/accumulo/blob/900d6abb/docs/src/main/latex/accumulo_user_manual/chapters/introduction.tex
----------------------------------------------------------------------
diff --git a/docs/src/main/latex/accumulo_user_manual/chapters/introduction.tex b/docs/src/main/latex/accumulo_user_manual/chapters/introduction.tex
deleted file mode 100644
index 09d9f14..0000000
--- a/docs/src/main/latex/accumulo_user_manual/chapters/introduction.tex
+++ /dev/null
@@ -1,27 +0,0 @@
-
-% Licensed to the Apache Software Foundation (ASF) under one or more
-% contributor license agreements. See the NOTICE file distributed with
-% this work for additional information regarding copyright ownership.
-% The ASF licenses this file to You under the Apache License, Version 2.0
-% (the "License"); you may not use this file except in compliance with
-% the License. You may obtain a copy of the License at
-%
-%     http://www.apache.org/licenses/LICENSE-2.0
-%
-% Unless required by applicable law or agreed to in writing, software
-% distributed under the License is distributed on an "AS IS" BASIS,
-% WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-% See the License for the specific language governing permissions and
-% limitations under the License.
-
-\chapter{Introduction}
-Apache Accumulo is a highly scalable structured store based on Google's BigTable.
-Accumulo is written in Java and operates over the Hadoop Distributed File System
-(HDFS), which is part of the popular Apache Hadoop project. Accumulo supports
-efficient storage and retrieval of structured data, including queries for ranges, and
-provides support for using Accumulo tables as input and output for MapReduce
-jobs.
-
-Accumulo features automatic load-balancing and partitioning, data compression
-and fine-grained security labels.
-

http://git-wip-us.apache.org/repos/asf/accumulo/blob/900d6abb/docs/src/main/latex/accumulo_user_manual/chapters/multivolume.tex
----------------------------------------------------------------------
diff --git a/docs/src/main/latex/accumulo_user_manual/chapters/multivolume.tex b/docs/src/main/latex/accumulo_user_manual/chapters/multivolume.tex
deleted file mode 100644
index 0a0e6fe..0000000
--- a/docs/src/main/latex/accumulo_user_manual/chapters/multivolume.tex
+++ /dev/null
@@ -1,85 +0,0 @@
-
-% Licensed to the Apache Software Foundation (ASF) under one or more
-% contributor license agreements. See the NOTICE file distributed with
-% this work for additional information regarding copyright ownership.
-% The ASF licenses this file to You under the Apache License, Version 2.0
-% (the "License"); you may not use this file except in compliance with
-% the License. You may obtain a copy of the License at
-%
-%     http://www.apache.org/licenses/LICENSE-2.0
-%
-% Unless required by applicable law or agreed to in writing, software
-% distributed under the License is distributed on an "AS IS" BASIS,
-% WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-% See the License for the specific language governing permissions and
-% limitations under the License.
-
-\chapter{Multi-Volume Installations}
-
-This is an advanced configuration setting for very large clusters
-under a lot of write pressure.
-
-The HDFS NameNode holds all of the metadata about the files in
-HDFS. For fast performance, all of this information needs to be stored
-in memory.  A single NameNode with 64G of memory can store the
-metadata for tens of millions of files. However, when scaling beyond a
-thousand nodes, an active Accumulo system can generate lots of updates
-to the file system, especially when data is being ingested.  The large
-number of write transactions to the NameNode, and the speed of a
-single edit log, can become the limiting factor for large scale
-Accumulo installations.
-
-You can see the effect of slow write transactions when the Accumulo
-Garbage Collector takes a long time (more than 5 minutes) to delete
-the files Accumulo no longer needs.  If your Garbage Collector
-routinely runs in less than a minute, the NameNode is performing well.
-
-However, if you do begin to experience slow-down and poor GC
-performance, Accumulo can be configured to use multiple NameNode
-servers.  The configuration ``instance.volumes'' should be set to a
-comma-separated list, using full URI references to different NameNode
-servers:
-
-\begingroup\fontsize{8pt}{8pt}\selectfont\begin{verbatim}
-  <property>
-    <name>instance.volumes</name>
-    <value>hdfs://ns1:9001,hdfs://ns2:9001</value>
-  </property>
-\end{verbatim}\endgroup
-
-The introduction of multiple volume support in 1.6 changed the way Accumulo
-stores pointers to files.  It now stores fully qualified URI references to
-files.  Before 1.6, Accumulo stored paths that were relative to a table
-directory.   After an upgrade these relative paths will still exist and are
-resolved using instance.dfs.dir, instance.dfs.uri, and Hadoop configuration in
-the same way they were before 1.6. 
-
-If the URI for a namenode changes (e.g. the namenode was running on host1 and it is
-moved to host2), then Accumulo will no longer function.  Even if Hadoop and
-Accumulo configurations are changed, the fully qualified URIs stored in
-Accumulo will still contain the old URI.  To handle this, Accumulo has the
-following configuration property for replacing URIs stored in its metadata.  The
-example configuration below will replace ns1 with nsA and ns2 with nsB in
-Accumulo metadata.  For this property to take effect, Accumulo will need to be
-restarted.
-
-\begingroup\fontsize{8pt}{8pt}\selectfont\begin{verbatim}
-  <property>
-    <name>instance.volumes.replacements</name>
-    <value>hdfs://ns1:9001 hdfs://nsA:9001, hdfs://ns2:9001 hdfs://nsB:9001</value>
-  </property>
-\end{verbatim}\endgroup
-
-Using viewfs or an HA namenode, introduced in Hadoop 2, offers another option for
-managing the fully qualified URIs stored in Accumulo.  Viewfs and HA namenodes
-both introduce a level of indirection in the Hadoop configuration.   For
-example, assume viewfs://nn1 maps to hdfs://nn1 in the Hadoop configuration.
-If viewfs://nn1 is used by Accumulo, then it is easy to map viewfs://nn1 to
-hdfs://nnA by changing the Hadoop configuration without doing anything to Accumulo.
-A production system should probably use an HA namenode.  Viewfs may be useful on
-a test system with a single non-HA namenode.
-
-You may also want to configure your cluster to use Federation,
-available in Hadoop 2.0, which allows DataNodes to respond to multiple
-NameNode servers, so you do not have to partition your DataNodes by
-NameNode.

http://git-wip-us.apache.org/repos/asf/accumulo/blob/900d6abb/docs/src/main/latex/accumulo_user_manual/chapters/security.tex
----------------------------------------------------------------------
diff --git a/docs/src/main/latex/accumulo_user_manual/chapters/security.tex b/docs/src/main/latex/accumulo_user_manual/chapters/security.tex
deleted file mode 100644
index 83cfb21..0000000
--- a/docs/src/main/latex/accumulo_user_manual/chapters/security.tex
+++ /dev/null
@@ -1,179 +0,0 @@
-
-% Licensed to the Apache Software Foundation (ASF) under one or more
-% contributor license agreements. See the NOTICE file distributed with
-% this work for additional information regarding copyright ownership.
-% The ASF licenses this file to You under the Apache License, Version 2.0
-% (the "License"); you may not use this file except in compliance with
-% the License. You may obtain a copy of the License at
-%
-%     http://www.apache.org/licenses/LICENSE-2.0
-%
-% Unless required by applicable law or agreed to in writing, software
-% distributed under the License is distributed on an "AS IS" BASIS,
-% WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-% See the License for the specific language governing permissions and
-% limitations under the License.
-
-\chapter{Security}
-
-Accumulo extends the BigTable data model to implement a security mechanism
-known as cell-level security. Every key-value pair has its own security label, stored
-under the column visibility element of the key, which is used to determine whether
-a given user meets the security requirements to read the value. This enables data of
-various security levels to be stored within the same row, and users of varying
-degrees of access to query the same table, while preserving data confidentiality.
-
-\section{Security Label Expressions}
-
-When mutations are applied, users can specify a security label for each value. This is
-done as the Mutation is created by passing a ColumnVisibility object to the put()
-method:
-
-\begingroup\fontsize{8pt}{8pt}\selectfont\begin{verbatim}
-Text rowID = new Text("row1");
-Text colFam = new Text("myColFam");
-Text colQual = new Text("myColQual");
-ColumnVisibility colVis = new ColumnVisibility("public");
-long timestamp = System.currentTimeMillis();
-
-Value value = new Value("myValue".getBytes());
-
-Mutation mutation = new Mutation(rowID);
-mutation.put(colFam, colQual, colVis, timestamp, value);
-\end{verbatim}\endgroup
-
-\section{Security Label Expression Syntax}
-
-Security labels consist of a set of user-defined tokens that are required to read the
-value the label is associated with. The set of tokens required can be specified using
-syntax that supports logical AND and OR combinations of tokens, as well as nesting
-groups of tokens together.
-
-For example, suppose within our organization we want to label our data values with
-security labels defined in terms of user roles. We might have tokens such as:
-
-\begingroup\fontsize{8pt}{8pt}\selectfont\begin{verbatim}
-admin
-audit
-system
-\end{verbatim}\endgroup
-
-These can be specified alone or combined using logical operators:
-
-\begingroup\fontsize{8pt}{8pt}\selectfont\begin{verbatim}
-// Users must have admin privileges:
-admin
-
-// Users must have admin and audit privileges
-admin&audit
-
-// Users with either admin or audit privileges
-admin|audit
-
-// Users must have audit and one or both of admin or system
-(admin|system)&audit
-\end{verbatim}\endgroup
-
-When both \verb^|^ and \verb^&^ operators are used, parentheses must be used to specify
-precedence of the operators.
-
-\section{Authorization}
-
-When clients attempt to read data from Accumulo, any security labels present are
-examined against the set of authorizations passed by the client code when the
-Scanner or BatchScanner are created. If the authorizations are determined to be
-insufficient to satisfy the security label, the value is suppressed from the set of
-results sent back to the client.
-
-Authorizations are specified as a comma-separated list of tokens the user possesses:
-
-\begingroup\fontsize{8pt}{8pt}\selectfont\begin{verbatim}
-// user possesses both admin and system level access
-Authorizations auths = new Authorizations("admin", "system");
-
-Scanner s = connector.createScanner("table", auths);
-\end{verbatim}\endgroup
-
-\section{User Authorizations}
-
-Each Accumulo user has a set of associated security labels. To manipulate
-these in the shell while using the default authorizor, use the setauths and getauths commands.
-These may also be modified for the default authorizor using the Java security operations API.
-
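-Through the Java API, the equivalent operations look roughly like this
-(assuming a Connector conn; the user name and tokens are illustrative):
-
-\begingroup\fontsize{8pt}{8pt}\selectfont\begin{verbatim}
-// grant the tokens "admin" and "audit" to user "bob"
-conn.securityOperations().changeUserAuthorizations("bob",
-    new Authorizations("admin", "audit"));
-
-// inspect the authorizations currently assigned to the user
-Authorizations auths = conn.securityOperations().getUserAuthorizations("bob");
-\end{verbatim}\endgroup
-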
-When a user creates a scanner, a set of Authorizations is passed. If the
-authorizations passed to the scanner are not a subset of the user's
-authorizations, then an exception will be thrown.
-
-To prevent users from writing data they cannot read, add the visibility
-constraint to a table. Use the -evc option in the createtable shell command to
-enable this constraint. For existing tables, use the following shell command to
-enable the visibility constraint. Ensure the constraint number does not
-conflict with any existing constraints.
-  
-\begingroup\fontsize{8pt}{8pt}\selectfont\begin{verbatim}
-config -t table -s table.constraint.1=org.apache.accumulo.core.security.VisibilityConstraint
-\end{verbatim}\endgroup
-
-Any user with the alter table permission can add or remove this constraint.
-This constraint is not applied to bulk imported data; if this is a concern, then
-disable the bulk import permission.
-
-\section{Pluggable Security}
-
-New in Accumulo 1.5 is a pluggable security mechanism. It can be broken into three actions:
-authentication, authorization, and permission handling. By default all of these are handled in
-Zookeeper, which is how things were handled in Accumulo 1.4 and before. It is worth noting at this
-point that this is a new feature in 1.5 and may be adjusted in future releases without the standard
-deprecation cycle.
-
-Authentication simply handles the ability for a user to verify their identity. A combination of
-principal and authentication token is used to verify a user is who they say they are. An
-authentication token can be constructed directly through its constructor, but it is
-advised to use the init(Property) method to populate an authentication token. It is expected that a
-user knows what the appropriate token to use for their system is. The default token is
-PasswordToken.
-
-Once a user is authenticated by the Authenticator, the user has access to the other actions within
-Accumulo. All actions in Accumulo are ACLed, and this ACL check is handled by the Permission
-Handler. This is what manages all of the permissions, which are divided into system and per-table
-levels. From there, if a user is doing an action which requires authorizations, the Authorizor is
-queried to determine what authorizations the user has.
-
-This setup allows a variety of different mechanisms to be used for handling different aspects of 
-Accumulo's security. A system like Kerberos can be used for authentication, then a system like LDAP 
-could be used to determine if a user has a specific permission, and then it may default back to the 
-default ZookeeperAuthorizor to determine what Authorizations a user is ultimately allowed to use. 
-This is a pluggable system so custom components can be created depending on your need.
-
-\section{Secure Authorizations Handling}
-
-For applications serving many users, it is not expected that an Accumulo user
-will be created for each application user. In this case an Accumulo user with
-all authorizations needed by any of the applications users must be created. To
-service queries, the application should create a scanner with the application
-user's authorizations. These authorizations could be obtained from a trusted 3rd
-party.
-
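-A sketch of this pattern, assuming the end user's tokens have already been
-obtained from the trusted third party (the table name and tokens are illustrative):
-
-\begingroup\fontsize{8pt}{8pt}\selectfont\begin{verbatim}
-// conn is authenticated as the application's Accumulo user, which holds all
-// needed authorizations; each query is restricted to the end user's subset
-Authorizations userAuths = new Authorizations("public", "finance");
-Scanner scanner = conn.createScanner("records", userAuths);
-\end{verbatim}\endgroup
-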
-Often production systems will integrate with Public-Key Infrastructure (PKI) and
-designate client code within the query layer to negotiate with PKI servers in order
-to authenticate users and retrieve their authorization tokens (credentials). This
-requires users to specify only the information necessary to authenticate themselves
-to the system. Once user identity is established, their credentials can be accessed by
-the client code and passed to Accumulo outside of the reach of the user.
-
-\section{Query Services Layer}
-
-Since the primary method of interaction with Accumulo is through the Java API,
-production environments often call for the implementation of a Query layer. This
-can be done using web services in containers such as Apache Tomcat, but is not a
-requirement. The Query Services Layer provides a mechanism for providing a
-platform on which user facing applications can be built. This allows the application
-designers to isolate potentially complex query logic, and enables a convenient point
-at which to perform essential security functions.
-
-Several production environments choose to implement authentication at this layer,
-where user identifiers are used to retrieve their access credentials which are then
-cached within the query layer and presented to Accumulo through the
-Authorizations mechanism.
-
-Typically, the query services layer sits between Accumulo and user workstations.

http://git-wip-us.apache.org/repos/asf/accumulo/blob/900d6abb/docs/src/main/latex/accumulo_user_manual/chapters/shell.tex
----------------------------------------------------------------------
diff --git a/docs/src/main/latex/accumulo_user_manual/chapters/shell.tex b/docs/src/main/latex/accumulo_user_manual/chapters/shell.tex
deleted file mode 100644
index f3c11ff..0000000
--- a/docs/src/main/latex/accumulo_user_manual/chapters/shell.tex
+++ /dev/null
@@ -1,138 +0,0 @@
-
-% Licensed to the Apache Software Foundation (ASF) under one or more
-% contributor license agreements. See the NOTICE file distributed with
-% this work for additional information regarding copyright ownership.
-% The ASF licenses this file to You under the Apache License, Version 2.0
-% (the "License"); you may not use this file except in compliance with
-% the License. You may obtain a copy of the License at
-%
-%     http://www.apache.org/licenses/LICENSE-2.0
-%
-% Unless required by applicable law or agreed to in writing, software
-% distributed under the License is distributed on an "AS IS" BASIS,
-% WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-% See the License for the specific language governing permissions and
-% limitations under the License.
-
-\chapter{Accumulo Shell} 
-Accumulo provides a simple shell that can be used to examine the contents and
-configuration settings of tables, insert/update/delete values, and change
-configuration settings. 
-
-The shell can be started by the following command:
-
-\begingroup\fontsize{8pt}{8pt}\selectfont\begin{verbatim}
-$ACCUMULO_HOME/bin/accumulo shell -u [username]
-\end{verbatim}\endgroup
-
-The shell will prompt for the password corresponding to the specified username
-and then display the following prompt:
-
-\begingroup\fontsize{8pt}{8pt}\selectfont\begin{verbatim}
-Shell - Apache Accumulo Interactive Shell
--
-- version 1.5
-- instance name: myinstance
-- instance id: 00000000-0000-0000-0000-000000000000
--
-- type 'help' for a list of available commands
--
-\end{verbatim}\endgroup
-
-\section{Basic Administration}
-
-The Accumulo shell can be used to create and delete tables, as well as to configure
-table and instance specific options.
-
-\begingroup\fontsize{8pt}{8pt}\selectfont\begin{verbatim}
-root@myinstance> tables
-accumulo.metadata
-accumulo.root
-
-root@myinstance> createtable mytable
-
-root@myinstance mytable>
-
-root@myinstance mytable> tables
-accumulo.metadata
-accumulo.root
-mytable
-
-root@myinstance mytable> createtable testtable
-
-root@myinstance testtable>
-
-root@myinstance testtable> deletetable testtable
-deletetable { testtable } (yes|no)? yes
-Table: [testtable] has been deleted. 
-
-root@myinstance>
-\end{verbatim}\endgroup
-
-The Shell can also be used to insert updates and scan tables. This is useful for
-inspecting tables.
-
-\begingroup\fontsize{8pt}{8pt}\selectfont\begin{verbatim}
-root@myinstance mytable> scan
-
-root@myinstance mytable> insert row1 colf colq value1
-insert successful
-
-root@myinstance mytable> scan
-row1 colf:colq [] value1
-\end{verbatim}\endgroup
-
-The value in brackets "[]" would be the visibility labels. Since none were used, this is empty for this row.
-You can use the "-st" option to scan to see the timestamp for the cell, too.
-
-\section{Table Maintenance}
-
-The \textbf{compact} command instructs Accumulo to schedule a compaction of the table during which
-files are consolidated and deleted entries are removed.
-
-\begingroup\fontsize{8pt}{8pt}\selectfont\begin{verbatim}
-root@myinstance mytable> compact -t mytable
-07 16:13:53,201 [shell.Shell] INFO : Compaction of table mytable started for given range
-\end{verbatim}\endgroup
-
-The \textbf{flush} command instructs Accumulo to write all entries currently in memory for a given table
-to disk.
-
-\begingroup\fontsize{8pt}{8pt}\selectfont\begin{verbatim}
-root@myinstance mytable> flush -t mytable
-07 16:14:19,351 [shell.Shell] INFO : Flush of table mytable
-initiated...
-\end{verbatim}\endgroup
-
-\section{User Administration}
-
-The Shell can be used to add, remove, and grant privileges to users.
-
-\begingroup\fontsize{8pt}{8pt}\selectfont\begin{verbatim}
-root@myinstance mytable> createuser bob
-Enter new password for 'bob': *********
-Please confirm new password for 'bob': *********
-
-root@myinstance mytable> authenticate bob
-Enter current password for 'bob': *********
-Valid
-
-root@myinstance mytable> grant System.CREATE_TABLE -s -u bob
-
-root@myinstance mytable> user bob
-Enter current password for 'bob': *********
-
-bob@myinstance mytable> userpermissions
-System permissions: System.CREATE_TABLE
-Table permissions (accumulo.metadata): Table.READ
-Table permissions (mytable): NONE
-
-bob@myinstance mytable> createtable bobstable
-bob@myinstance bobstable>
-
-bob@myinstance bobstable> user root
-Enter current password for 'root': *********
-
-root@myinstance bobstable> revoke System.CREATE_TABLE -s -u bob
-\end{verbatim}\endgroup
-

http://git-wip-us.apache.org/repos/asf/accumulo/blob/900d6abb/docs/src/main/latex/accumulo_user_manual/chapters/table_configuration.tex
----------------------------------------------------------------------
diff --git a/docs/src/main/latex/accumulo_user_manual/chapters/table_configuration.tex b/docs/src/main/latex/accumulo_user_manual/chapters/table_configuration.tex
deleted file mode 100644
index a19cb52..0000000
--- a/docs/src/main/latex/accumulo_user_manual/chapters/table_configuration.tex
+++ /dev/null
@@ -1,663 +0,0 @@
-
-% Licensed to the Apache Software Foundation (ASF) under one or more
-% contributor license agreements. See the NOTICE file distributed with
-% this work for additional information regarding copyright ownership.
-% The ASF licenses this file to You under the Apache License, Version 2.0
-% (the "License"); you may not use this file except in compliance with
-% the License. You may obtain a copy of the License at
-%
-%     http://www.apache.org/licenses/LICENSE-2.0
-%
-% Unless required by applicable law or agreed to in writing, software
-% distributed under the License is distributed on an "AS IS" BASIS,
-% WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-% See the License for the specific language governing permissions and
-% limitations under the License.
-
-\chapter{Table Configuration}
-
-Accumulo tables have a few options that can be configured to alter the default
-behavior of Accumulo as well as improve performance based on the data stored.
-These include locality groups, constraints, bloom filters, iterators, and block
-cache. For a complete list of available configuration options, see
-Appendix~\ref{app:config}.
-
-\section{Locality Groups}
-Accumulo supports storing sets of column families separately on disk to allow
-clients to efficiently scan over columns that are frequently used together and to avoid
-scanning over column families that are not requested. After locality groups are set,
-Scanner and BatchScanner operations will automatically take advantage of them
-whenever the fetchColumnFamilies() method is used.
-
-By default, tables place all column families into the same ``default'' locality group.
-Additional locality groups can be configured anytime via the shell or
-programmatically as follows:
-
-\subsection{Managing Locality Groups via the Shell}
-
-\begingroup\fontsize{8pt}{8pt}\selectfont\begin{verbatim}
-usage: setgroups <group>=<col fam>{,<col fam>}{ <group>=<col fam>{,<col
-fam>}} [-?] -t <table>
-
-user@myinstance mytable> setgroups group_one=colf1,colf2 -t mytable
-
-user@myinstance mytable> getgroups -t mytable
-\end{verbatim}\endgroup
-
-\subsection{Managing Locality Groups via the Client API}
-
-\begingroup\fontsize{8pt}{8pt}\selectfont\begin{verbatim}
-Connector conn;
-
-HashMap<String,Set<Text>> localityGroups = new HashMap<String, Set<Text>>();
-
-HashSet<Text> metadataColumns = new HashSet<Text>();
-metadataColumns.add(new Text("domain"));
-metadataColumns.add(new Text("link"));
-
-HashSet<Text> contentColumns = new HashSet<Text>();
-contentColumns.add(new Text("body"));
-contentColumns.add(new Text("images"));
-
-localityGroups.put("metadata", metadataColumns);
-localityGroups.put("content", contentColumns);
-
-conn.tableOperations().setLocalityGroups("mytable", localityGroups);
-
-// existing locality groups can be obtained as follows
-Map<String, Set<Text>> groups =
-    conn.tableOperations().getLocalityGroups("mytable");
-\end{verbatim}\endgroup
-
-The assignment of Column Families to Locality Groups can be changed anytime. The
-physical movement of column families into their new locality groups occurs via
-the periodic Major Compaction process that runs continuously in the
-background. Major Compaction can also be scheduled to take place immediately
-through the shell:
-
-\begingroup\fontsize{8pt}{8pt}\selectfont\begin{verbatim}
-user@myinstance mytable> compact -t mytable
-\end{verbatim}\endgroup
-
-\section{Constraints}
-
-Accumulo supports constraints applied on mutations at insert time. This can be
-used to disallow certain inserts according to a user defined policy. Any mutation
-that fails to meet the requirements of the constraint is rejected and sent back to the
-client.
-
-Constraints can be enabled by setting a table property as follows:
-
-\begingroup\fontsize{8pt}{8pt}\selectfont\begin{verbatim}
-user@myinstance mytable> constraint -t mytable -a com.test.ExampleConstraint com.test.AnotherConstraint
-user@myinstance mytable> constraint -l
-com.test.ExampleConstraint=1
-com.test.AnotherConstraint=2
-\end{verbatim}\endgroup
-
-Currently there are no general-purpose constraints provided with the Accumulo
-distribution. New constraints can be created by writing a Java class that implements
-the following interface:
-
-\begingroup\fontsize{8pt}{8pt}\selectfont\begin{verbatim}
-org.apache.accumulo.core.constraints.Constraint
-\end{verbatim}\endgroup
-
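-A minimal sketch of such a class, which rejects any mutation whose values are
-not purely numeric (the violation code and class name are illustrative):
-
-\begingroup\fontsize{8pt}{8pt}\selectfont\begin{verbatim}
-public class NumericValueConstraint implements Constraint {
-
-  private static final short NON_NUMERIC = 1;
-
-  public String getViolationDescription(short violationCode) {
-    return violationCode == NON_NUMERIC ? "Value is not numeric" : null;
-  }
-
-  public List<Short> check(Environment env, Mutation mutation) {
-    for (ColumnUpdate update : mutation.getUpdates()) {
-      for (byte b : update.getValue()) {
-        if (b < '0' || b > '9')
-          return Collections.singletonList(NON_NUMERIC);
-      }
-    }
-    return null; // no violations
-  }
-}
-\end{verbatim}\endgroup
-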
-To deploy a new constraint, create a jar file containing the class implementing the
-new constraint and place it in the lib directory of the Accumulo installation. New
-constraint jars can be added to Accumulo and enabled without restarting but any
-change to an existing constraint class requires Accumulo to be restarted.
-
-An example of constraints can be found in\\
-\texttt{accumulo/docs/examples/README.constraints} with corresponding code under\\
-\texttt{accumulo/examples/simple/src/main/java/accumulo/examples/simple/constraints} .
-
-\section{Bloom Filters}
-As mutations are applied to an Accumulo table, several files are created per tablet. If
-bloom filters are enabled, Accumulo will create and load a small data structure into
-memory to determine whether a file contains a given key before opening the file.
-This can speed up lookups considerably.
-
-To enable bloom filters, enter the following command in the Shell:
-
-\begingroup\fontsize{8pt}{8pt}\selectfont\begin{verbatim}
-user@myinstance> config -t mytable -s table.bloom.enabled=true
-\end{verbatim}\endgroup
-
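-The same setting can be applied through the client API (assuming a Connector conn):
-
-\begingroup\fontsize{8pt}{8pt}\selectfont\begin{verbatim}
-conn.tableOperations().setProperty("mytable", "table.bloom.enabled", "true");
-\end{verbatim}\endgroup
-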
-An extensive example of using Bloom Filters can be found at\\
-\texttt{accumulo/docs/examples/README.bloom} .
-
-\section{Iterators}
-Iterators provide a modular mechanism for adding functionality to be executed by
-TabletServers when scanning or compacting data. This allows users to efficiently
-summarize, filter, and aggregate data. In fact, the built-in features of cell-level
-security and column fetching are implemented using Iterators.
-Some useful Iterators are provided with Accumulo and can be found in the org.apache.accumulo.core.iterators.user package.
-In each case, any custom Iterators must be included in Accumulo's classpath,
-typically by including a jar in \texttt{\$ACCUMULO\_HOME/lib} or
-\texttt{\$ACCUMULO\_HOME/lib/ext}, although the VFS classloader allows for
-classpath manipulation using a variety of schemes including URLs and HDFS URIs.
-
-\subsection{Setting Iterators via the Shell}
-
-Iterators can be configured on a table at scan, minor compaction and/or major
-compaction scopes. If the Iterator implements the OptionDescriber interface, the
-setiter command can be used, which will interactively prompt the user to provide
-values for the necessary options.
-
-\begingroup\fontsize{8pt}{8pt}\selectfont\begin{verbatim}
-usage: setiter [-?] -ageoff | -agg | -class <name> | -regex | 
--reqvis | -vers   [-majc] [-minc] [-n <itername>] -p <pri>   
-[-scan] [-t <table>]
-
-user@myinstance mytable> setiter -t mytable -scan -p 15 -n myiter -class com.company.MyIterator
-\end{verbatim}\endgroup
-
-The config command can always be used to manually configure iterators which is useful 
-in cases where the Iterator does not implement the OptionDescriber interface.
-
-\begingroup\fontsize{8pt}{8pt}\selectfont\begin{verbatim}
-config -t mytable -s table.iterator.scan.myiter=15,com.company.MyIterator
-config -t mytable -s table.iterator.minc.myiter=15,com.company.MyIterator
-config -t mytable -s table.iterator.majc.myiter=15,com.company.MyIterator
-config -t mytable -s table.iterator.scan.myiter.opt.myoptionname=myoptionvalue
-config -t mytable -s table.iterator.minc.myiter.opt.myoptionname=myoptionvalue
-config -t mytable -s table.iterator.majc.myiter.opt.myoptionname=myoptionvalue
-\end{verbatim}\endgroup
-
-\subsection{Setting Iterators Programmatically}
-
-\begingroup\fontsize{8pt}{8pt}\selectfont\begin{verbatim}
-scanner.addIterator(new IteratorSetting(
-    15, // priority
-    "myiter", // name this iterator
-    "com.company.MyIterator" // class name
-));
-\end{verbatim}\endgroup
-
-Some iterators take additional parameters from client code, as in the following
-example:
-
-\begingroup\fontsize{8pt}{8pt}\selectfont\begin{verbatim}
-IteratorSetting iter = new IteratorSetting(...);
-iter.addOption("myoptionname", "myoptionvalue");
-scanner.addIterator(iter);
-\end{verbatim}\endgroup
-
-Tables support separate Iterator settings to be applied at scan time, upon minor
-compaction and upon major compaction. For most uses, tables will have identical
-iterator settings for all three to avoid inconsistent results.
-
-\subsection{Versioning Iterators and Timestamps}
-
-Accumulo provides the capability to manage versioned data through the use of
-timestamps within the Key. If a timestamp is not specified in the key created by the
-client then the system will set the timestamp to the current time. Two keys with
-identical rowIDs and columns but different timestamps are considered two versions
-of the same key. If two inserts are made into Accumulo with the same rowID,
-column, and timestamp, then the behavior is non-deterministic.
-
-Timestamps are sorted in descending order, so the most recent data comes first.
-Accumulo can be configured to return the top k versions, or versions later than a
-given date. The default is to return the one most recent version.
-
-The version policy can be changed by changing the VersioningIterator options for a
-table as follows:
-
-\begingroup\fontsize{8pt}{8pt}\selectfont\begin{verbatim}
-user@myinstance mytable> config -t mytable -s table.iterator.scan.vers.opt.maxVersions=3
-
-user@myinstance mytable> config -t mytable -s table.iterator.minc.vers.opt.maxVersions=3
-
-user@myinstance mytable> config -t mytable -s table.iterator.majc.vers.opt.maxVersions=3
-\end{verbatim}\endgroup
-
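-The same properties can be set through the client API; a sketch (assuming a
-Connector conn):
-
-\begingroup\fontsize{8pt}{8pt}\selectfont\begin{verbatim}
-// equivalent of the shell commands above; the scopes are scan, minc, and majc
-for (String scope : new String[] {"scan", "minc", "majc"}) {
-  conn.tableOperations().setProperty("mytable",
-      "table.iterator." + scope + ".vers.opt.maxVersions", "3");
-}
-\end{verbatim}\endgroup
-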
-When a table is created, by default it is configured to use the
-VersioningIterator and keep one version. A table can be created without the
-VersioningIterator with the -ndi option in the shell. Also, the Java API
-has the following method:
-
-\begingroup\fontsize{8pt}{8pt}\selectfont\begin{verbatim}
-connector.tableOperations().create(String tableName, boolean limitVersion)
-\end{verbatim}\endgroup
-
-
-\subsubsection{Logical Time}
-
-Accumulo 1.2 introduces the concept of logical time. This ensures that timestamps
-set by Accumulo always move forward. This helps avoid problems caused by
-TabletServers that have different time settings. The per-tablet counter gives unique
-one-up timestamps on a per-mutation basis. When using time in milliseconds, if
-two things arrive within the same millisecond then both receive the same
-timestamp. When using time in milliseconds, Accumulo-set times will still
-always move forward and never backwards.
-
-A table can be configured to use logical timestamps at creation time as follows:
-
-\begingroup\fontsize{8pt}{8pt}\selectfont\begin{verbatim}
-user@myinstance> createtable -tl logical
-\end{verbatim}\endgroup
-
-\subsubsection{Deletes}
-Deletes are special keys in Accumulo that get sorted along with all the other data.
-When a delete key is inserted, Accumulo will not show anything that has a
-timestamp less than or equal to the delete key. During major compaction, any keys
-older than a delete key are omitted from the new file created, and the omitted keys
-are removed from disk as part of the regular garbage collection process.
-
-\subsection{Filters}
-When scanning over a set of key-value pairs it is possible to apply an arbitrary
-filtering policy through the use of a Filter. Filters are types of iterators that return
-only key-value pairs that satisfy the filter logic. Accumulo has a few built-in filters
-that can be configured on any table: AgeOff, ColumnAgeOff, Timestamp, NoVis, and RegEx. More can be added
-by writing a Java class that extends the\\
-org.apache.accumulo.core.iterators.Filter class.
-
-The AgeOff filter can be configured to remove data older than a certain date or a fixed
-amount of time from the present. The following example sets a table to delete
-everything inserted over 30 seconds ago:
-
-\begingroup\fontsize{8pt}{8pt}\selectfont\begin{verbatim}
-user@myinstance> createtable filtertest
-user@myinstance filtertest> setiter -t filtertest -scan -minc -majc -p 10 -n myfilter -ageoff
-AgeOffFilter removes entries with timestamps more than <ttl> milliseconds old
-----------> set org.apache.accumulo.core.iterators.user.AgeOffFilter parameter negate, default false
-                keeps k/v that pass accept method, true rejects k/v that pass accept method: 
-----------> set org.apache.accumulo.core.iterators.user.AgeOffFilter parameter ttl, time to
-                live (milliseconds): 3000
-----------> set org.apache.accumulo.core.iterators.user.AgeOffFilter parameter currentTime, if set,
-                use the given value as the absolute time in milliseconds as the current time of day: 
-user@myinstance filtertest> 
-user@myinstance filtertest> scan
-user@myinstance filtertest> insert foo a b c
-user@myinstance filtertest> scan
-foo a:b [] c
-user@myinstance filtertest> sleep 4
-user@myinstance filtertest> scan
-user@myinstance filtertest>
-\end{verbatim}\endgroup
-
-To see the iterator settings for a table, use:
-
-\begingroup\fontsize{8pt}{8pt}\selectfont\begin{verbatim}
-user@example filtertest> config -t filtertest -f iterator
----------+---------------------------------------------+------------------
-SCOPE    | NAME                                        | VALUE
----------+---------------------------------------------+------------------
-table    | table.iterator.majc.myfilter .............. | 10,org.apache.accumulo.core.iterators.user.AgeOffFilter
-table    | table.iterator.majc.myfilter.opt.ttl ...... | 3000
-table    | table.iterator.majc.vers .................. | 20,org.apache.accumulo.core.iterators.VersioningIterator
-table    | table.iterator.majc.vers.opt.maxVersions .. | 1
-table    | table.iterator.minc.myfilter .............. | 10,org.apache.accumulo.core.iterators.user.AgeOffFilter
-table    | table.iterator.minc.myfilter.opt.ttl ...... | 3000
-table    | table.iterator.minc.vers .................. | 20,org.apache.accumulo.core.iterators.VersioningIterator
-table    | table.iterator.minc.vers.opt.maxVersions .. | 1
-table    | table.iterator.scan.myfilter .............. | 10,org.apache.accumulo.core.iterators.user.AgeOffFilter
-table    | table.iterator.scan.myfilter.opt.ttl ...... | 3000
-table    | table.iterator.scan.vers .................. | 20,org.apache.accumulo.core.iterators.VersioningIterator
-table    | table.iterator.scan.vers.opt.maxVersions .. | 1
----------+---------------------------------------------+------------------
-\end{verbatim}\endgroup
-
-\subsection{Combiners}
-
-Accumulo allows Combiners to be configured on tables and column
-families. When a Combiner is set it is applied across the values
-associated with any keys that share rowID, column family, and column qualifier.
-This is similar to the reduce step in MapReduce, which applies some function to all
-the values associated with a particular key.
-
-For example, if a summing combiner were configured on a table and the following
-mutations were inserted:
-
-\begingroup\fontsize{8pt}{8pt}\selectfont\begin{verbatim}
-Row     Family Qualifier Timestamp  Value
-rowID1  colfA  colqA     20100101   1
-rowID1  colfA  colqA     20100102   1
-\end{verbatim}\endgroup
-
-The table would reflect only one aggregate value:
-
-\begingroup\fontsize{8pt}{8pt}\selectfont\begin{verbatim}
-rowID1  colfA  colqA     -          2
-\end{verbatim}\endgroup
-
-Combiners can be enabled for a table using the setiter command in the shell. Below is an example.
-
-\begingroup\fontsize{8pt}{8pt}\selectfont\begin{verbatim}
-root@a14 perDayCounts> setiter -t perDayCounts -p 10 -scan -minc -majc -n daycount 
-                       -class org.apache.accumulo.core.iterators.user.SummingCombiner
-TypedValueCombiner can interpret Values as a variety of number encodings 
-  (VLong, Long, or String) before combining
-----------> set SummingCombiner parameter columns, 
-            <col fam>[:<col qual>]{,<col fam>[:<col qual>]} : day
-----------> set SummingCombiner parameter type, <VARNUM|LONG|STRING>: STRING
-
-root@a14 perDayCounts> insert foo day 20080101 1
-root@a14 perDayCounts> insert foo day 20080101 1
-root@a14 perDayCounts> insert foo day 20080103 1
-root@a14 perDayCounts> insert bar day 20080101 1
-root@a14 perDayCounts> insert bar day 20080101 1
-
-root@a14 perDayCounts> scan
-bar day:20080101 []    2
-foo day:20080101 []    2
-foo day:20080103 []    1
-\end{verbatim}\endgroup
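-
-The same combiner can be configured through the Java API. The following is a
-sketch that assumes an existing \texttt{Connector} named \texttt{conn} and the
-perDayCounts table from the shell example:
-
-\begingroup\fontsize{8pt}{8pt}\selectfont\begin{verbatim}
-import java.util.Collections;
-
-import org.apache.accumulo.core.client.IteratorSetting;
-import org.apache.accumulo.core.iterators.LongCombiner;
-import org.apache.accumulo.core.iterators.user.SummingCombiner;
-
-// Sketch: sum values in the "day" column family, encoding longs as strings.
-IteratorSetting setting = new IteratorSetting(10, "daycount", SummingCombiner.class);
-SummingCombiner.setEncodingType(setting, LongCombiner.Type.STRING);
-SummingCombiner.setColumns(setting,
-    Collections.singletonList(new IteratorSetting.Column("day")));
-conn.tableOperations().attachIterator("perDayCounts", setting);
-\end{verbatim}\endgroup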
-
-Accumulo includes some useful Combiners out of the box. To find these look in
-the\\ \texttt{org.apache.accumulo.core.iterators.user} package.
-
-Additional Combiners can be added by creating a Java class that extends\\
-\texttt{org.apache.accumulo.core.iterators.Combiner} and adding a jar containing that
-class to Accumulo's lib/ext directory.
-
-An example of a Combiner can be found under
-
-\begingroup\fontsize{8pt}{8pt}\selectfont\begin{verbatim}
-accumulo/examples/simple/src/main/java/org/apache/accumulo/examples/simple/combiner/StatsCombiner.java
-\end{verbatim}\endgroup
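-
-As a hypothetical illustration of the structure, a custom Combiner only needs
-to implement the reduce method:
-
-\begingroup\fontsize{8pt}{8pt}\selectfont\begin{verbatim}
-import java.util.Iterator;
-
-import org.apache.accumulo.core.data.Key;
-import org.apache.accumulo.core.data.Value;
-import org.apache.accumulo.core.iterators.Combiner;
-
-// Hypothetical combiner: keep the longest value seen for each key.
-public class LongestValueCombiner extends Combiner {
-  @Override
-  public Value reduce(Key key, Iterator<Value> iter) {
-    Value longest = new Value(new byte[0]);
-    while (iter.hasNext()) {
-      Value v = iter.next();
-      if (v.getSize() > longest.getSize())
-        longest = v;
-    }
-    return longest;
-  }
-}
-\end{verbatim}\endgroup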
-
-
-\section{Block Cache}
-
-In order to increase throughput of commonly accessed entries, Accumulo employs a block cache.
-This block cache buffers data in memory so that it doesn't have to be read off of disk.
-The RFile format that Accumulo prefers is a mix of index blocks and data blocks, where the index blocks are used to find the appropriate data blocks.
-Typical queries to Accumulo result in a binary search over several index blocks followed by a linear scan of one or more data blocks.
-
-The block cache can be configured on a per-table basis, and all tablets hosted on a tablet server share a single resource pool.
-To configure the size of the tablet server's block cache, set the following properties:
-
-\begingroup\fontsize{8pt}{8pt}\selectfont\begin{verbatim}
-tserver.cache.data.size: Specifies the size of the cache for file data blocks.
-tserver.cache.index.size: Specifies the size of the cache for file indices.
-\end{verbatim}\endgroup
-
-To enable the block cache for your table, set the following properties:
-
-\begingroup\fontsize{8pt}{8pt}\selectfont\begin{verbatim}
-table.cache.block.enable: Determines whether file (data) block cache is enabled.
-table.cache.index.enable: Determines whether index cache is enabled.
-\end{verbatim}\endgroup
-
-The block cache can have a significant effect on alleviating hot spots, as well as reducing query latency.
-It is enabled by default for the metadata tables.
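-
-The table-level cache properties can be set from the shell with the config
-command or through the Java API; the following sketch assumes an existing
-\texttt{Connector} named \texttt{conn} and a table named mytable:
-
-\begingroup\fontsize{8pt}{8pt}\selectfont\begin{verbatim}
-// Sketch: enable both the data block cache and the index cache for a table.
-conn.tableOperations().setProperty("mytable", "table.cache.block.enable", "true");
-conn.tableOperations().setProperty("mytable", "table.cache.index.enable", "true");
-\end{verbatim}\endgroup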
-
-\section{Compaction}
-
-As data is written to Accumulo it is buffered in memory. The data buffered in
-memory is eventually written to HDFS on a per tablet basis. Files can also be
-added to tablets directly by bulk import. In the background, tablet servers run
-major compactions to merge multiple files into one. The tablet server has to
-decide which tablets to compact and which files within a tablet to compact.
-This decision is made using the compaction ratio, which is configurable on a
-per table basis. To configure this ratio modify the following property:
-
-\begingroup\fontsize{8pt}{8pt}\selectfont\begin{verbatim}
-table.compaction.major.ratio
-\end{verbatim}\endgroup
-
-Increasing this ratio will result in more files per tablet and less compaction
-work. More files per tablet means higher query latency, so adjusting
-this ratio is a trade-off between ingest and query performance. The ratio
-defaults to 3.
-
-The way the ratio works is that a set of files is compacted into one file if the
-sum of the sizes of the files in the set is larger than the ratio multiplied by
-the size of the largest file in the set. If this is not true for the set of all
-files in a tablet, the largest file is removed from consideration, and the
-remaining files are considered for compaction. This is repeated until a
-compaction is triggered or there are no files left to consider.
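-
-The following is not Accumulo's implementation; it simply restates the
-selection rule above in code form:
-
-\begingroup\fontsize{8pt}{8pt}\selectfont\begin{verbatim}
-import java.util.ArrayList;
-import java.util.Collections;
-import java.util.List;
-
-// Sketch of the rule described above: a set of files is compacted when its
-// total size exceeds ratio * (size of the largest file in the set); otherwise
-// the largest file is dropped from consideration and the check repeats.
-public static boolean wouldCompact(List<Long> fileSizes, double ratio) {
-  List<Long> candidates = new ArrayList<Long>(fileSizes);
-  Collections.sort(candidates, Collections.reverseOrder()); // largest first
-  while (!candidates.isEmpty()) {
-    long largest = candidates.get(0);
-    long sum = 0;
-    for (long size : candidates)
-      sum += size;
-    if (sum > ratio * largest)
-      return true;
-    candidates.remove(0);
-  }
-  return false;
-}
-\end{verbatim}\endgroup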
-
-The number of background threads tablet servers use to run major compactions is
-configurable. To configure this modify the following property:
-
-\begingroup\fontsize{8pt}{8pt}\selectfont\begin{verbatim}
-tserver.compaction.major.concurrent.max
-\end{verbatim}\endgroup
-
-Also, the number of threads tablet servers use for minor compactions is
-configurable. To configure this modify the following property:
-
-\begingroup\fontsize{8pt}{8pt}\selectfont\begin{verbatim}
-tserver.compaction.minor.concurrent.max
-\end{verbatim}\endgroup
-
-The numbers of minor and major compactions running and queued are visible on the
-Accumulo monitor page. This allows you to see if compactions are backing up
-and adjustments to the above settings are needed. When adjusting the number of
-threads available for compactions, consider the number of cores and other tasks
-running on the nodes such as maps and reduces.
-
-If major compactions are not keeping up, then the number of files per tablet
-will grow to a point such that query performance starts to suffer. One way to
-handle this situation is to increase the compaction ratio. For example, if the
-compaction ratio were set to 1, then every new file added to a tablet by minor
-compaction would immediately queue the tablet for major compaction. So if a
-tablet has a 200M file and minor compaction writes a 1M file, then the major
-compaction will attempt to merge the 200M and 1M file. If the tablet server
-has lots of tablets trying to do this sort of thing, then major compactions
-will back up and the number of files per tablet will start to grow, assuming
-data is being continuously written. Increasing the compaction ratio will
-alleviate backups by lowering the amount of major compaction work that needs to
-be done.
-
-Another option to deal with the files per tablet growing too large is to adjust
-the following property:
-
-\begingroup\fontsize{8pt}{8pt}\selectfont\begin{verbatim}
-table.file.max  
-\end{verbatim}\endgroup
-
-When a tablet reaches this number of files and needs to flush its in-memory
-data to disk, it will choose to do a merging minor compaction. A merging minor
-compaction will merge the tablet's smallest file with the data in memory at
-minor compaction time. Therefore the number of files will not grow beyond this
-limit. This will make minor compactions take longer, which will cause ingest
-performance to decrease. This can cause ingest to slow down until major
-compactions have enough time to catch up. When adjusting this property, also
-consider adjusting the compaction ratio. Ideally, merging minor compactions
-never need to occur and major compactions will keep up. It is possible to
-configure the file max and compaction ratio such that only merging minor
-compactions occur and major compactions never occur. This should be avoided
-because doing only merging minor compactions causes $O(N^2)$ work to be done.
-The amount of work done by major compactions is $O(N*\log_R(N))$ where
-\textit{R} is the compaction ratio.
-
-Compactions can be initiated manually for a table. To initiate a minor
-compaction, use the flush command in the shell. To initiate a major compaction,
-use the compact command in the shell. The compact command will compact all
-tablets in a table to one file. Even tablets with one file are compacted. This
-is useful for the case where a major compaction filter is configured for a
-table. In 1.4 the ability to compact a range of a table was added. To use this
-feature, specify start and stop rows for the compact command. This will only
-compact tablets that overlap the given row range.
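-
-Flushes and compactions can also be requested through the Java API. The sketch
-below assumes an existing \texttt{Connector} named \texttt{conn} and a table
-named mytable:
-
-\begingroup\fontsize{8pt}{8pt}\selectfont\begin{verbatim}
-// Sketch: flush in-memory data to a file (minor compaction) and wait.
-conn.tableOperations().flush("mytable", null, null, true);
-
-// Sketch: major-compact the entire table (null start/end rows), flushing
-// first and waiting for the compaction to complete.
-conn.tableOperations().compact("mytable", null, null, true, true);
-\end{verbatim}\endgroup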
-
-\section{Pre-splitting tables}
-
-Accumulo will balance and distribute tables across servers. Before a
-table gets large, it will be maintained as a single tablet on a single
-server. This limits the speed at which data can be added or queried
-to the speed of a single node. To improve performance when a table
-is new or small, you can add split points and generate new tablets.
-
-In the shell:
-
-\begingroup\fontsize{8pt}{8pt}\selectfont\begin{verbatim}
-root@myinstance> createtable newTable
-root@myinstance> addsplits -t newTable g n t
-\end{verbatim}\endgroup
-
-This will create a new table with 4 tablets. The table will be split
-on the letters ``g'', ``n'', and ``t'', which will work nicely if the
-row data starts with lower-case alphabetic characters. If your row
-data includes binary information or numeric information, or if the
-distribution of the row information is not flat, then you would pick
-different split points. Now ingest and query can proceed on 4 nodes,
-which can improve performance.
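-
-The same split points can be added through the Java API; this sketch assumes
-an existing \texttt{Connector} named \texttt{conn}:
-
-\begingroup\fontsize{8pt}{8pt}\selectfont\begin{verbatim}
-import java.util.SortedSet;
-import java.util.TreeSet;
-
-import org.apache.hadoop.io.Text;
-
-// Sketch: create the table and pre-split it at "g", "n", and "t".
-SortedSet<Text> splits = new TreeSet<Text>();
-splits.add(new Text("g"));
-splits.add(new Text("n"));
-splits.add(new Text("t"));
-conn.tableOperations().create("newTable");
-conn.tableOperations().addSplits("newTable", splits);
-\end{verbatim}\endgroup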
-
-\section{Merging tablets}
-
-Over time, a table can get very large, so large that it has hundreds
-of thousands of split points. Once there are enough tablets to spread
-a table across the entire cluster, additional splits may not improve
-performance, and may create unnecessary bookkeeping. The distribution
-of data may change over time. For example, if row data contains date
-information, and data is continually added and removed to maintain a
-window of current information, tablets for older rows may be empty.
-
-Accumulo supports tablet merging, which can be used to reduce 
-the number of split points. The following command will merge all rows
-from ``A'' to ``Z'' into a single tablet:
-
-\begingroup\fontsize{8pt}{8pt}\selectfont\begin{verbatim}
-root@myinstance> merge -t myTable -s A -e Z
-\end{verbatim}\endgroup
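-
-The equivalent range merge through the Java API is sketched below, assuming an
-existing \texttt{Connector} named \texttt{conn}:
-
-\begingroup\fontsize{8pt}{8pt}\selectfont\begin{verbatim}
-import org.apache.hadoop.io.Text;
-
-// Sketch: merge all tablets whose rows fall between "A" and "Z".
-conn.tableOperations().merge("myTable", new Text("A"), new Text("Z"));
-\end{verbatim}\endgroup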
-
-If the result of a merge produces a tablet that is larger than the
-configured split size, the tablet may be split by the tablet server.
-Be sure to increase your tablet size prior to any merges if the goal
-is to have larger tablets:
-
-\begingroup\fontsize{8pt}{8pt}\selectfont\begin{verbatim}
-root@myinstance> config -t myTable -s table.split.threshold=2G
-\end{verbatim}\endgroup
-
-In order to merge small tablets, you can ask Accumulo to merge
-sections of a table smaller than a given size.
-
-\begingroup\fontsize{8pt}{8pt}\selectfont\begin{verbatim}
-root@myinstance> merge -t myTable -s 100M
-\end{verbatim}\endgroup
-
-By default, small tablets will not be merged into tablets that are
-already larger than the given size. This can leave isolated small
-tablets. To force small tablets to be merged into larger tablets use
-the ``-{}-force'' option:
-
-\begingroup\fontsize{8pt}{8pt}\selectfont\begin{verbatim}
-root@myinstance> merge -t myTable -s 100M --force
-\end{verbatim}\endgroup
-
-Merging away small tablets works on one section at a time. If your
-table contains many sections of small split points, or you are
-attempting to change the split size of the entire table, it will be
-faster to set the split threshold and merge the entire table:
-
-\begingroup\fontsize{8pt}{8pt}\selectfont\begin{verbatim}
-root@myinstance> config -t myTable -s table.split.threshold=256M
-root@myinstance> merge -t myTable
-\end{verbatim}\endgroup
-
-\section{Delete Range}
-
-Consider an indexing scheme that uses date information in each row.
-For example ``20110823-15:20:25.013'' might be a row that specifies a
-date and time. In some cases, we might like to delete rows based on
-this date, say to remove all the data older than the current year.
-Accumulo supports a delete range operation which efficiently
-removes data between two rows. For example:
-
-\begingroup\fontsize{8pt}{8pt}\selectfont\begin{verbatim}
-root@myinstance> deleterange -t myTable -s 2010 -e 2011
-\end{verbatim}\endgroup
-
-This will delete all rows starting with ``2010'' and it will stop at
-any row starting with ``2011''. You can delete any data prior to 2011
-with:
-
-\begingroup\fontsize{8pt}{8pt}\selectfont\begin{verbatim}
-root@myinstance> deleterange -t myTable -e 2011 --force
-\end{verbatim}\endgroup
-
-The shell will not allow you to delete an unbounded range (no start)
-unless you provide the ``-{}-force'' option.
-
-Range deletion is implemented using splits at the given start/end
-positions, and will affect the number of splits in the table.
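-
-A similar operation is available through the Java API as deleteRows. The
-sketch below assumes an existing \texttt{Connector} named \texttt{conn};
-check the javadoc for your version for the exact inclusivity of the start
-and end rows:
-
-\begingroup\fontsize{8pt}{8pt}\selectfont\begin{verbatim}
-import org.apache.hadoop.io.Text;
-
-// Sketch: remove the rows falling between the given start and end rows.
-conn.tableOperations().deleteRows("myTable", new Text("2010"), new Text("2011"));
-\end{verbatim}\endgroup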
-
-\section{Cloning Tables}
-
-A new table can be created that points to an existing table's data. This is a
-very quick metadata operation; no data is actually copied. The cloned table
-and the source table can change independently after the clone operation. One
-use case for this feature is testing. For example, to test a new filtering
-iterator, clone the table, add the filter to the clone, and force a major
-compaction. To perform a test on less data, clone a table and then use delete
-range to efficiently remove a lot of data from the clone. Another use case is
-generating a snapshot to guard against human error. To create a snapshot,
-clone a table and then disable write permissions on the clone.
-
-The clone operation will point to the source table's files. This is why the
-flush option is present and is enabled by default in the shell. If the flush
-option is not enabled, then any data the source table currently has in memory
-will not exist in the clone.
-
-A cloned table copies the configuration of the source table. However, the
-permissions of the source table are not copied to the clone. After a clone is
-created, only the user that created the clone can read and write to it.
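-
-Tables can also be cloned through the Java API. The sketch below clones the
-people table to test with the flush option enabled and no property overrides;
-\texttt{conn} is assumed to be an existing \texttt{Connector}:
-
-\begingroup\fontsize{8pt}{8pt}\selectfont\begin{verbatim}
-import java.util.Collections;
-import java.util.Map;
-import java.util.Set;
-
-// Sketch: clone "people" to "test", flushing in-memory data first. The two
-// empty collections are properties to set and properties to exclude on the
-// clone.
-Map<String,String> propertiesToSet = Collections.emptyMap();
-Set<String> propertiesToExclude = Collections.emptySet();
-conn.tableOperations().clone("people", "test", true, propertiesToSet, propertiesToExclude);
-\end{verbatim}\endgroup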
-
-In the following example we see that data inserted after the clone operation is
-not visible in the clone.
-
-\begingroup\fontsize{8pt}{8pt}\selectfont\begin{verbatim}
-root@a14> createtable people
-root@a14 people> insert 890435 name last Doe
-root@a14 people> insert 890435 name first John
-root@a14 people> clonetable people test  
-root@a14 people> insert 890436 name first Jane
-root@a14 people> insert 890436 name last Doe  
-root@a14 people> scan
-890435 name:first []    John
-890435 name:last []    Doe
-890436 name:first []    Jane
-890436 name:last []    Doe
-root@a14 people> table test
-root@a14 test> scan
-890435 name:first []    John
-890435 name:last []    Doe
-root@a14 test> 
-\end{verbatim}\endgroup
-
-The du command in the shell shows how much space a table is using in HDFS.
-This command can also show how much overlapping space two cloned tables have in
-HDFS. In the example below du shows table ci is using 428M. Then ci is cloned
-to cic and du shows that both tables share 428M. After three entries are
-inserted into cic and it is flushed, du shows the two tables still share 428M but
-cic has 226 bytes to itself. Finally, table cic is compacted and then du shows
-that each table uses 428M.
-
-\begingroup\fontsize{8pt}{8pt}\selectfont\begin{verbatim}
-root@a14> du ci           
-             428,482,573 [ci]
-root@a14> clonetable ci cic
-root@a14> du ci cic
-             428,482,573 [ci, cic]
-root@a14> table cic
-root@a14 cic> insert r1 cf1 cq1 v1
-root@a14 cic> insert r1 cf1 cq2 v2
-root@a14 cic> insert r1 cf1 cq3 v3 
-root@a14 cic> flush -t cic -w 
-27 15:00:13,908 [shell.Shell] INFO : Flush of table cic completed.
-root@a14 cic> du ci cic       
-             428,482,573 [ci, cic]
-                     226 [cic]
-root@a14 cic> compact -t cic -w
-27 15:00:35,871 [shell.Shell] INFO : Compacting table ...
-27 15:03:03,303 [shell.Shell] INFO : Compaction of table cic completed for given range
-root@a14 cic> du ci cic        
-             428,482,573 [ci]
-             428,482,612 [cic]
-root@a14 cic> 
-\end{verbatim}\endgroup
-
-\section{Exporting Tables}
-
-Accumulo supports exporting tables for the purpose of copying tables to another
-cluster. Exporting and importing tables preserves the table's configuration,
-splits, and logical time. Tables are exported and then copied via the hadoop
-distcp command. To export a table, it must be offline and stay offline while
-distcp runs. The reason it needs to stay offline is to prevent files from being
-deleted. A table can be cloned and the clone taken offline in order to avoid
-losing access to the table. See docs/examples/README.export for an example.
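-
-A sketch of the export through the Java API follows; the table and directory
-names are assumptions, and \texttt{conn} is assumed to be an existing
-\texttt{Connector}:
-
-\begingroup\fontsize{8pt}{8pt}\selectfont\begin{verbatim}
-// Sketch: take the (cloned) table offline, then write the export metadata to
-// a directory in HDFS. The files listed there are then copied to the other
-// cluster with hadoop distcp and brought in with importTable.
-conn.tableOperations().offline("mytable_clone");
-conn.tableOperations().exportTable("mytable_clone", "/exports/mytable");
-\end{verbatim}\endgroup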