Posted to commits@accumulo.apache.org by kt...@apache.org on 2015/02/03 01:07:57 UTC
accumulo git commit: ACCUMULO-1515 Reorganized README and converted to markdown

Repository: accumulo
Updated Branches:
  refs/heads/master d7dcb8773 -> c479f874a


ACCUMULO-1515 Reorganized README and converted to markdown


Project: http://git-wip-us.apache.org/repos/asf/accumulo/repo
Commit: http://git-wip-us.apache.org/repos/asf/accumulo/commit/c479f874
Tree: http://git-wip-us.apache.org/repos/asf/accumulo/tree/c479f874
Diff: http://git-wip-us.apache.org/repos/asf/accumulo/diff/c479f874

Branch: refs/heads/master
Commit: c479f874aa7cf9a842e021c3b719068c4c2abeef
Parents: d7dcb87
Author: Keith Turner <kt...@apache.org>
Authored: Mon Feb 2 19:01:33 2015 -0500
Committer: Keith Turner <kt...@apache.org>
Committed: Mon Feb 2 19:01:33 2015 -0500

----------------------------------------------------------------------
 INSTALL.md                                 | 160 ++++++++
 NOTICE                                     |  32 ++
 README                                     | 467 ------------------------
 README.md                                  | 102 ++++++
 TESTING                                    | 113 ------
 TESTING.md                                 | 114 ++++++
 assemble/src/main/assemblies/component.xml |   4 +-
 7 files changed, 411 insertions(+), 581 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/accumulo/blob/c479f874/INSTALL.md
----------------------------------------------------------------------
diff --git a/INSTALL.md b/INSTALL.md
new file mode 100644
index 0000000..32f74ca
--- /dev/null
+++ b/INSTALL.md
@@ -0,0 +1,160 @@
+<!--
+Licensed to the Apache Software Foundation (ASF) under one or more
+contributor license agreements.  See the NOTICE file distributed with
+this work for additional information regarding copyright ownership.
+The ASF licenses this file to You under the Apache License, Version 2.0
+(the "License"); you may not use this file except in compliance with
+the License.  You may obtain a copy of the License at
+
+    http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+-->
+
+Installing Accumulo
+===================
+
+This document covers installing Accumulo on single and multi-node environments.
+Either [download][1] or [build][2] a binary distribution of Accumulo from
+source code.  Unpack as follows.
+
+    cd <install location>
+    tar xzf <some dir>/accumulo-X.Y.Z-bin.tar.gz
+    cd accumulo-X.Y.Z
+
+Accumulo has some optional native code that improves its performance and
+stability.  Before configuring Accumulo attempt to build this native code
+with the following command.
+
+    ./bin/build_native_library.sh
+
+If the command fails, it's OK to continue with setup and resolve the issue
+later.
+
+
+Configuring
+-----------
+
+The Accumulo conf directory needs to be populated with initial config files.
+The following script is provided to assist with this.  Run the script and
+answer the questions.  When the script asks about memory-map type, choose Native
+if the build native script was successful.  Otherwise choose Java.
+
+    ./bin/bootstrap_config.sh
+
+The script will prompt for memory usage.   Please note that the footprints are
+only for the Accumulo system processes, so ample space should be left for other
+processes like hadoop, zookeeper, and the accumulo client code.  If Accumulo
+worker processes are swapped out and unresponsive, they may be killed.
+
+After this script runs, the conf directory should be populated and now a few
+edits are needed.
+
+### Secret
+
+Accumulo coordination and worker processes can only communicate with each other
+if they share the same secret key.  To change the secret key set
+`instance.secret` in `conf/accumulo-site.xml`.  Changing this secret key from
+the default is highly recommended.
+
+### Dependencies
+
+Accumulo requires running [Zookeeper][3] and [HDFS][4] instances.  Also, the
+Accumulo binary distribution does not include jars for Zookeeper and Hadoop.
+When configuring Accumulo the following information about these dependencies
+must be provided.
+
+ * **Location of Zookeepers** :  Provide this by setting `instance.zookeeper.host`
+   in `conf/accumulo-site.xml`.
+ * **Where to store data** :  Provide this by setting `instance.volumes` in
+   `conf/accumulo-site.xml`.  If your namenode is running at 192.168.1.9:9000
+   and you want to store data in `/accumulo` in HDFS, then set
+   `instance.volumes` to `hdfs://192.168.1.9:9000/accumulo`.
+ * **Location of Zookeeper and Hadoop jars** :  Setting `ZOOKEEPER_HOME` and
+   `HADOOP_PREFIX` in `conf/accumulo-env.sh` will help Accumulo find these
+   jars.
+
+If Accumulo has problems later on finding jars, then run `bin/accumulo
+classpath` to print out info about where Accumulo is finding jars.  If the
+settings mentioned above are correct, then inspect `general.classpaths` in
+`conf/accumulo-site.xml`.
+
+Initialization
+--------------
+
+Accumulo needs to initialize the locations where it stores data in Zookeeper
+and HDFS.  The following command will do this.
+
+    ./bin/accumulo init
+
+The initialization command will prompt for the following information.
+
+ * **Instance name** : This is the name of the Accumulo instance.  Accumulo
+   clients need to know it in order to connect.
+ * **Root password** : Initialization sets up an initial Accumulo root user and
+   prompts for its password.  This information will be needed to later connect
+   to Accumulo.
+
+Multiple Nodes
+--------------
+
+Skip this section if running Accumulo on a single node.  Accumulo has
+coordinating, monitoring, and worker processes that run on specified nodes in
+the cluster.  The following files should be populated with a newline separated
+list of node names.  Each file must be changed from localhost.
+
+ * `conf/masters` : Accumulo primary coordinating process.  Must specify one
+                    node.  Can specify a few for fault tolerance.
+ * `conf/gc`      : Accumulo garbage collector.  Must specify one node.  Can
+                    specify a few for fault tolerance.
+ * `conf/monitor` : Node where Accumulo monitoring web server is run.
+ * `conf/slaves`  : Accumulo worker processes.   List all of the nodes where
+                    tablet servers should run in this file.
+ * `conf/tracers` : Optional capability. Can specify zero or more nodes. 
+
+The Accumulo, Hadoop, and Zookeeper software should be present at the same
+location on every node.  Also the files in the `conf` directory must be copied
+to every node.  There are many ways to replicate the software and
+configuration.  Two tools that can help replicate software and/or
+config are [pdcp][5] and [prsync][6].
+
+Starting Accumulo
+-----------------
+
+The Accumulo scripts use ssh to start processes on remote nodes.  Before
+attempting to start Accumulo, [passwordless ssh][7] must be set up on the
+cluster.
+
+After configuring and initializing Accumulo, use the following command to start
+it.
+
+    ./bin/start-all.sh
+
+First steps
+-----------
+
+Once the `start-all.sh` script completes, use the following command to run the
+Accumulo shell.
+
+    ./bin/accumulo shell -u root
+
+Use your web browser to connect to the Accumulo monitor page on port 50095.
+
+    http://<hostname in conf/monitor>:50095/
+
+When finished, use the following command to stop Accumulo.
+
+    ./bin/stop-all.sh
+
+[1]: http://accumulo.apache.org/
+[2]: README.md#building-
+[3]: http://zookeeper.apache.org/
+[4]: http://hadoop.apache.org/
+[5]: https://code.google.com/p/pdsh/
+[6]: https://code.google.com/p/parallel-ssh/
+[7]: https://www.google.com/search?q=hadoop+passwordless+ssh&ie=utf-8&oe=utf-8
+
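
As a rough illustration of the settings described in the INSTALL.md above, a
`conf/accumulo-site.xml` fragment might look like the sketch below.  The
Zookeeper host names and the secret value are hypothetical placeholders; the
`instance.volumes` value is the example from the Dependencies section.  The
file generated by `bootstrap_config.sh` contains other properties, which are
omitted here.

```xml
<configuration>
  <!-- Shared secret for Accumulo processes; placeholder value, choose your own. -->
  <property>
    <name>instance.secret</name>
    <value>CHANGE_ME</value>
  </property>

  <!-- Location of Zookeepers; host names and ports here are hypothetical. -->
  <property>
    <name>instance.zookeeper.host</name>
    <value>zkhost1:2181,zkhost2:2181</value>
  </property>

  <!-- Where to store data; namenode address taken from the INSTALL.md example. -->
  <property>
    <name>instance.volumes</name>
    <value>hdfs://192.168.1.9:9000/accumulo</value>
  </property>
</configuration>
```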

http://git-wip-us.apache.org/repos/asf/accumulo/blob/c479f874/NOTICE
----------------------------------------------------------------------
diff --git a/NOTICE b/NOTICE
index af212c2..72eb8c1 100644
--- a/NOTICE
+++ b/NOTICE
@@ -6,3 +6,35 @@ The Apache Software Foundation (http://www.apache.org/).
 
 This product includes JCommander (https://github.com/cbeust/jcommander),
 Copyright 2010 Cedric Beust cedric@beust.com.
+
+******************************************************************************
+
+Export Control  
+
+This distribution includes cryptographic software. The country in which you 
+currently reside may have restrictions on the import, possession, use, and/or
+re-export to another country, of encryption software. BEFORE using any 
+encryption software, please check your country's laws, regulations and 
+policies concerning the import, possession, or use, and re-export of encryption
+software, to see if this is permitted. See <http://www.wassenaar.org/> for more
+information.
+
+The U.S. Government Department of Commerce, Bureau of Industry and Security 
+(BIS), has classified this software as Export Commodity Control Number (ECCN) 
+5D002.C.1, which includes information security software using or performing 
+cryptographic functions with asymmetric algorithms. The form and manner of this
+Apache Software Foundation distribution makes it eligible for export under the 
+License Exception ENC Technology Software Unrestricted (TSU) exception (see the
+BIS Export Administration Regulations, Section 740.13) for both object code and
+source code.
+
+The following provides more details on the included cryptographic software: ...
+
+Apache Accumulo uses the built-in java cryptography libraries in its RFile 
+encryption implementation. See 
+http://www.oracle.com/us/products/export/export-regulations-345813.html
+for more details on Java's cryptography features. Apache Accumulo also uses
+the bouncycastle library for some cryptographic technology as well. See 
+http://www.bouncycastle.org/wiki/display/JA1/Frequently+Asked+Questions for
+more details on bouncycastle's cryptography features.
+

http://git-wip-us.apache.org/repos/asf/accumulo/blob/c479f874/README
----------------------------------------------------------------------
diff --git a/README b/README
deleted file mode 100644
index 4ebb078..0000000
--- a/README
+++ /dev/null
@@ -1,467 +0,0 @@
-Title: Apache Accumulo
-Notice:    Licensed to the Apache Software Foundation (ASF) under one
-           or more contributor license agreements.  See the NOTICE file
-           distributed with this work for additional information
-           regarding copyright ownership.  The ASF licenses this file
-           to you under the Apache License, Version 2.0 (the
-           "License"); you may not use this file except in compliance
-           with the License.  You may obtain a copy of the License at
-           .
-             http://www.apache.org/licenses/LICENSE-2.0
-           .
-           Unless required by applicable law or agreed to in writing,
-           software distributed under the License is distributed on an
-           "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
-           KIND, either express or implied.  See the License for the
-           specific language governing permissions and limitations
-           under the License.
-
-******************************************************************************
-0. Introduction
-
-Apache Accumulo is a sorted, distributed key/value store based on Google's 
-BigTable design. It is built on top of Apache Hadoop, Zookeeper, and Thrift. It 
-features a few novel improvements on the BigTable design in the form of 
-cell-level access labels and a server-side programming mechanism that can modify
-key/value pairs at various points in the data management process.
-
-******************************************************************************
-1. Building
-
-In the normal tarball release of accumulo, everything is built and
-ready to go on x86 GNU/Linux: there is no build step.
-
-However, if you only have source code, or you wish to make changes, you need to
-have maven configured to get Accumulo prerequisites from repositories.  See
-the pom.xml file for the necessary components. 
-
-You can build an Accumulo binary distribution, which is created in the 
-assemble/target directory, using the following command. Note that maven 3
-is required starting with Accumulo v1.5.0. By default, Accumulo compiles
-against Apache Hadoop 2.2.0, but these artifacts should be compatible with newer
-releases of Hadoop which are backwards-compatible with Hadoop 2.2.0.
-
-  mvn package -P assemble
-
-By default, Accumulo compiles against Apache Hadoop 2.2.0. To compile against 
-a  different Hadoop 2-compatible version, specify the profile and version,
-e.g. "-Dhadoop.version=0.23.5".
-
-Support for Apache Hadoop 1.x versions has been dropped. To support these
-versions of Hadoop, use an older version of Accumulo.
-
-If you are running on another Unix-like operating system (OSX, etc) then
-you may wish to build the native libraries.  They are not strictly necessary
-but having them available suppresses a runtime warning and enables Accumulo
-to run faster. You can execute the following script to automatically unpack
-and install the native library. Be sure to have a JDK and a C++ compiler 
-installed with the JAVA_HOME environment variable set.
-
-  $ ./bin/build_native_library.sh
-
-If your system's default compiler options are insufficient, you can add
-additional compiler options to the command line, such as options for the
-architecture. These will be passed to the Makefile in the environment variable
-USERFLAGS:
-
-  $ ./bin/build_native_library.sh -m32
-
-Alternatively, you can manually unpack the accumulo-native tarball in the 
-$ACCUMULO_HOME/lib directory. Change to the accumulo-native directory in 
-the current directory and issue `make`. Then, copy the resulting 'libaccumulo' 
-library into the $ACCUMULO_HOME/lib/native/map.
-
-  $ mkdir -p $ACCUMULO_HOME/lib/native/map
-  $ cp libaccumulo.* $ACCUMULO_HOME/lib/native/map
-
-
-Building Documentation
-
-Use the following command to build the User Manual (docs/target/generated-docs/accumulo_user_manual.html)
-and the configuration HTML page (docs/target/config.html)
-
-  mvn package -P docs -DskipTests
-
-******************************************************************************
-2. Deployment
-
-Copy the accumulo tar file produced by mvn package from the assemble/target/
-directory to the desired destination, then untar it (e.g. 
-tar xzf accumulo-1.6.0-bin.tar.gz).
-
-Another option is to package Accumulo directly to a working directory. For example,
-
-  mvn package -DskipTests -DDEV_ACCUMULO_HOME=/var/tmp
-
-The above command would create a directory with a name similar to
-/var/tmp/accumulo-1.6.0-dev/accumulo-1.6.0/, containing all the contents
-that are normally contained in accumulo-1.6.0-bin.tar.gz, but already unpacked.
-If the DEV_ACCUMULO_HOME parameter is not specified, this directory would
-normally be created in assemble/target, but that is subject to deletion by
-the 'mvn clean' command. Specifying an external directory would not be subject
-to 'mvn clean'. When executed more than once, newer files overwrite older files,
-and files a user adds (such as configuration files in conf/) will be left alone.
-
-If HDFS and Zookeeper are running, you can run Accumulo directly from this
-working directory. See the 'Running Apache Accumulo' section later in this document.
-
-You can avoid specifying the working directory each time you compile by adding
-a profile to maven's settings.xml file. Below is an example of $HOME/.m2/settings.xml
-
- <settings xmlns="http://maven.apache.org/SETTINGS/1.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/SETTINGS/1.0.0 http://maven.apache.org/xsd/settings-1.0.0.xsd">
-   <profiles>
-     <profile>
-       <id>inject-accumulo-home</id>
-       <properties>
-         <DEV_ACCUMULO_HOME>/var/tmp</DEV_ACCUMULO_HOME>
-       </properties>
-     </profile>
-   </profiles>
-   <activeProfiles>
-     <activeProfile>inject-accumulo-home</activeProfile>
-   </activeProfiles>
- </settings>
-
-******************************************************************************
-3. Upgrading
-
-3.1. From 1.5 to 1.6
-
- This happens automatically the first time Accumulo 1.6 is started.
-
- If your instance previously upgraded from 1.4 to 1.5, you must verify that your
- 1.5 instance has no outstanding local write ahead logs. You can do this by ensuring
- either:
-
-  - All of your tables are online and the Monitor shows all tablets hosted
-  - The directory for write ahead logs (logger.dir.walog) from 1.4 has no files remaining
-    on any tablet server / logger hosts
-
- To upgrade from 1.5 to 1.6 you must:
-
-  * Verify that there are no outstanding FATE operations
-    - Under 1.5 you can list what's in FATE by running
-      $ACCUMULO_HOME/bin/accumulo org.apache.accumulo.server.fate.Admin print
-    - Note that operations in any state will prevent an upgrade. It is safe
-      to delete operations with status SUCCESSFUL. For others, you should restart
-      your 1.5 cluster and allow them to finish.
-  * Stop the 1.5 instance.
-  * Configure 1.6 to use the hdfs directory and zookeepers that 1.5 was using.
-  * Copy other 1.5 configuration options as needed.
-  * Start Accumulo 1.6.
-
-  The upgrade process must make changes to Accumulo's internal state in both ZooKeeper and
-  the table metadata. This process may take some time if Tablet Servers have to go through
-  recovery. During this time, the Monitor will claim that the Master is down and some
-  services may send the Monitor log messages about failure to communicate with each other.
-  These messages are safe to ignore. If you need detail on the upgrade's progress you should
-  view the local logs on the Tablet Servers and active Master.
-
-3.2. From 1.4 to 1.6
-
- To upgrade from 1.4 to 1.6 you must perform a manual initial step.
-
- Prior to upgrading you must:
-   * Verify that there are no outstanding FATE operations
-     - Under 1.4 you can list what's in FATE by running
-       $ACCUMULO_HOME/bin/accumulo org.apache.accumulo.server.fate.Admin print
-     - Note that operations in any state will prevent an upgrade. It is safe
-       to delete operations with status SUCCESSFUL. For others, you should restart
-       your 1.4 cluster and allow them to finish.
-   * Stop the 1.4 instance.
-   * Configure 1.6 to use the hdfs directory, walog directories, and zookeepers
-     that 1.4 was using.
-   * Copy other 1.4 configuration options as needed.
-
-  Prior to starting the 1.6 instance you will need to run the LocalWALRecovery tool
-  on each node that previously ran an instance of the Logger role.
-
-    $ACCUMULO_HOME/bin/accumulo org.apache.accumulo.tserver.log.LocalWALRecovery
-
-  The recovery tool will rewrite the 1.4 write ahead logs into a format that 1.6 can read.
-  After this step has completed on all nodes, start the 1.6 cluster to continue the upgrade.
-
-  The upgrade process must make changes to Accumulo's internal state in both ZooKeeper and
-  the table metadata. This process may take some time if Tablet Servers have to go through
-  recovery. During this time, the Monitor will claim that the Master is down and some
-  services may send the Monitor log messages about failure to communicate with each other.
-  While the upgrade is in progress, the Garbage Collector may complain about invalid paths.
-  The Master may also complain about failure to create the trace table because it already
-  exists. These messages are safe to ignore. If other error messages occur, you should seek
-  out support before continuing to use Accumulo. If you need detail on the upgrade's progress
-  you should view the local logs on the Tablet Servers and active Master.
-
-  Note that the LocalWALRecovery tool does not delete the local files. Once you confirm that
-  1.6 is successfully running, you should delete these files on the local filesystem.
-
-******************************************************************************
-4. Configuring
-
-Apache Accumulo has two prerequisites, hadoop and zookeeper. Zookeeper 3.4.* series is
-preferred. Both zookeeper and hadoop must be installed and configured. Some versions of
-Zookeeper may only allow 10 connections from one computer by default. On a single-host
-install, this number is a little too low. Add the following to
-the $ZOOKEEPER_HOME/conf/zoo.cfg file:
-
-   maxClientCnxns=100
-
-Ensure you (or the some special hadoop user account) have accounts on all of
-the machines in the cluster and that hadoop and accumulo install files can be
-found in the same location on every machine in the cluster.  You will need to
-have password-less ssh set up as described in the hadoop documentation. 
-
-You will need to have hadoop installed and configured on your system.  Accumulo
-1.6.0 has been tested with hadoop version 1.0.4.  To avoid data loss,
-you must enable HDFS durable sync.  How you enable this depends on your version
-of Hadoop. Please consult the table below for information regarding your version.
-If you need to set the coniguration, please be sure to restart HDFS. See 
-ACCUMULO-623 and ACCUMULO-1637 for more information.
-
-The following releases of Apache Hadoop require special configuration to ensure 
-that data is not inadvertently lost; however, in all releases of Apache Hadoop, 
-`dfs.durable.sync` and `dfs.support.append` should *not* be configured as `false`.
-
-VERSION        NAME=VALUE
-0.20.205.0  -  dfs.support.append=true
-1.0.x       -  dfs.support.append=true
-
-Additionally, it is strongly recommended that you enable 'dfs.datanode.synconclose'
-(only available in Apache Hadoop >=1.1.1 or >=0.23) in your hdfs-site.xml configuration 
-file to ensure that, in the face of unexpected power loss to a datanode, files are 
-wholly synced to disk.
-
-Accumulo's own configuration files can be bootstrapped with the
-$ACCUMULO_HOME/bin/bootstrap_config.sh script. This script will allow you to
-select options which correspond closely to your particular environment. The
-configuration files produced by this script are examples. You should always
-inspect any configuration files you use to ensure they are appropriate for your
-environment, and to tailor them to your needs, as they are not guaranteed to be
-suitable for all users and all environments.
-
-Some example accumulo configuration files are placed in directories based on the
-memory footprint for the accumulo processes. These are pre-generated from
-particular selections from the bootstrap_config.sh script for your convenience.
-If you are using native libraries for you tablet server in-memory map, then you
-can use the files in "native-standalone".  If you get warnings about not being
-able to load the native libraries, you can use the configuration files in
-"standalone".
-
-For testing on a single computer, use a fairly small configuration:
-
-  $ cp conf/examples/512MB/native-standalone/* conf
-
-Please note that the footprints are for only the Accumulo system processes, so 
-ample space should be left for other processes like hadoop, zookeeper, and the 
-accumulo client code.  These directories must be at the same location on every 
-node in the cluster.
-
-If you are configuring a larger cluster you will need to create the configuration
-files yourself and propogate the changes to the $ACCUMULO_CONF_DIR directories:
-
-   Create a "slaves" file in $ACCUMULO_CONF_DIR/.  This is a list of machines
-   where tablet servers and loggers will run.
-
-   Create a "masters" file in $ACCUMULO_CONF_DIR/.  This is a list of
-   machines where the master server will run. 
-
-   Create conf/accumulo-env.sh following the template of
-   example/3GB/native-standalone/accumulo-env.sh.  
-
-However you create your configuration files, you will need to set 
-JAVA_HOME, HADOOP_HOME, and ZOOKEEPER_HOME in conf/accumulo-env.sh
-
-Note that zookeeper client jar files must be installed on every machine, but 
-the server should not be run on every machine.
-
-Create the $ACCUMULO_LOG_DIR on every machine in the slaves file.
-
-* Note that you will be specifying the Java heap space in accumulo-env.sh.  
-You should make sure that the total heap space used for the accumulo tserver,
-logger and the hadoop datanode and tasktracker is less than the available
-memory on each slave node in the cluster.  On large clusters, it is recommended
-that the accumulo master, hadoop namenode, secondary namenode, and hadoop
-jobtracker all be run on separate machines to allow them to use more heap
-space.  If you are running these on the same machine on a small cluster, make
-sure their heap space settings fit within the available memory.  The zookeeper
-instances are also time sensitive and should be on machines that will not be
-heavily loaded, or over-subscribed for memory.
-
-Edit conf/accumulo-site.xml.  You must set the zookeeper servers in this
-file (instance.zookeeper.host).  Look at the "Configuration Management" section
-of the user manual to see what additional variables you can modify and what
-the defaults are.
-
-It is advisable to change the instance secret (instance.secret) to some new
-value.  Also ensure that the accumulo-site.xml file is not readable by other
-users on the machine.
-
-Synchronize your accumulo conf directory across the cluster.  As a precaution
-against mis-configured systems, servers using different configuration files
-will not communicate with the rest of the cluster.
-
-Accumulo requires the hadoop "commons-io" java package.  This is normally
-distributed with hadoop.  However, it was not distributed with hadoop-0.20.
-If your hadoop distribution does not provide this package, you will need
-to obtain it and put the commons-io jar file in $ACCUMULO_HOME/lib. See the
-pom.xml file for version information.
-
-******************************************************************************
-5. Running Apache Accumulo
-
-Make sure hadoop is configured on all of the machines in the cluster, including
-access to a shared hdfs instance.  Make sure hdfs is running.
-
-Make sure zookeeper is configured and running on at least one machine in the
-cluster.
-
-Run "bin/accumulo init" to create the hdfs directory structure
-(hdfs:///accumulo/*) and initial zookeeper settings. This will also allow you
-to also configure the initial root password. Only do this once. 
-
-Start accumulo using the bin/start-all.sh script.
-
-Use the "bin/accumulo shell -u <username>" command to run an accumulo shell
-interpreter.  Within this interpreter, run "createtable <tablename>" to create
-a table, and run "table <tablename>" followed by "scan" to scan a table.
-
-In the example below a table is created, data is inserted, and the table is
-scanned.
-
-    $ ./bin/accumulo shell -u root
-    Enter current password for 'root'@'accumulo': ******
-
-    Shell - Apache Accumulo Interactive Shell
-    - 
-    - version: 1.5.0
-    - instance name: accumulo
-    - instance id: f5947fe6-081e-41a8-9877-43730c4dfc6f
-    - 
-    - type 'help' for a list of available commands
-    - 
-    root@ac> createtable foo
-    root@ac foo> insert row1 colf1 colq1 val1
-    root@ac foo> insert row1 colf1 colq2 val2
-    root@ac foo> scan
-    row1 colf1:colq1 []    val1
-    row1 colf1:colq2 []    val2
-
-The example below start the shell, switches to table foo, and scans for a
-certain column.
-
-    $ ./bin/accumulo shell -u root
-    Enter current password for 'root'@'accumulo': ******
-
-    Shell - Apache Accumulo Interactive Shell
-    - 
-    - version: 1.5.0
-    - instance name: accumulo
-    - instance id: f5947fe6-081e-41a8-9877-43730c4dfc6f
-    - 
-    - type 'help' for a list of available commands
-    - 
-    root@ac> table foo
-    root@ac foo> scan -c colf1:colq2
-    row1 colf1:colq2 []    val2
-
-
-For information on how to configure Accumulo for on top of Secure HDFS with
-Kerberos, please consult the Accumulo user manual section specifically devoted
-to client and server configuration with Kerberos.
-
-******************************************************************************
-6. Monitoring Apache Accumulo
-
-You can point your browser to the master host, on port 50095 to see the status
-of accumulo across the cluster.  You can even do this with the text-based
-browser "links":
-
- $ links http://localhost:50095
-
-From this GUI, you can ensure that tablets are assigned, tables are online,
-tablet servers are up. You can monitor query and ingest rates across the
-cluster.
-
-******************************************************************************
-7. Stopping Apache Accumulo
-
-Do not kill the tabletservers or run bin/tdown.sh unless absolutely necessary.
-Recovery from a catastrophic loss of servers can take a long time. To shutdown
-cleanly, run "bin/stop-all.sh" and the master will orchestrate the shutdown of
-all the tablet servers.  Shutdown waits for all writes to finish, so it may
-take some time for particular configurations.  
-
-******************************************************************************
-8. Logging
-
-DEBUG and above are logged to the logs/ dir.  To modify this behavior change
-the scripts in conf/.  To change the logging dir, set ACCUMULO_LOG_DIR in
-conf/accumulo-env.sh.  Stdout and stderr of each accumulo process is
-redirected to the log dir.
-
-******************************************************************************
-9. API
-
-The public Accumulo API is composed of :
-
- * All public classes and interfaces in the org.apache.accumulo.core.client
-   package, as as well as all of its subpackages excluding those named "impl".
- * Key, Mutation, Value, Range, Condition, and ConditionalMutation in
-   org.apache.accumulo.core.data.
- * All public classes and interfaces in the org.apache.accumulo.minicluster
-   package, as well as all of its subpackages excluding those named "impl".
- * Anything with public or protected acccess within any Class or Interface that
-   is in the public API. This includes, but is not limited to: methods, members
-   classes, interfaces, and enums.
-
-The Accumulo project maintains binary compatibility across this API within a major
-release, as defined in the Java Language Specification 3rd ed. API changes should
-only be made on major releases, with continued support of deprecated API elements
-for at least one major revision.
-
-To get started using accumulo review the example and the javadoc for the
-packages and classes mentioned above.
-
-******************************************************************************
-10. Performance Tuning
-
-Apache Accumulo has exposed several configuration properties that can be 
-changed.  These properties and configuration management are described in detail 
-in the user manual. While the default value is usually optimal, there are
-cases where a change can increase query and ingest performance.
-
-Before changing a property from its default in a production system, you should 
-develop a good understanding of the property and consider creating a test to 
-prove the increased performance.
-
-******************************************************************************
-
-11. Export Control  
-
-This distribution includes cryptographic software. The country in which you 
-currently reside may have restrictions on the import, possession, use, and/or
-re-export to another country, of encryption software. BEFORE using any 
-encryption software, please check your country's laws, regulations and 
-policies concerning the import, possession, or use, and re-export of encryption
-software, to see if this is permitted. See <http://www.wassenaar.org/> for more
-information.
-
-The U.S. Government Department of Commerce, Bureau of Industry and Security 
-(BIS), has classified this software as Export Commodity Control Number (ECCN) 
-5D002.C.1, which includes information security software using or performing 
-cryptographic functions with asymmetric algorithms. The form and manner of this
-Apache Software Foundation distribution makes it eligible for export under the 
-License Exception ENC Technology Software Unrestricted (TSU) exception (see the
-BIS Export Administration Regulations, Section 740.13) for both object code and
-source code.
-
-The following provides more details on the included cryptographic software: ...
-
-Apache Accumulo uses the built-in java cryptography libraries in it's RFile 
-encryption implementation. See 
-http://www.oracle.com/us/products/export/export-regulations-345813.html
-for more details for on Java's cryptography features. Apache Accumulo also uses
-the bouncycastle library for some crypographic technology as well. See 
-http://www.bouncycastle.org/wiki/display/JA1/Frequently+Asked+Questions for
-more details on bouncycastle's cryptography features.
-
-******************************************************************************

http://git-wip-us.apache.org/repos/asf/accumulo/blob/c479f874/README.md
----------------------------------------------------------------------
diff --git a/README.md b/README.md
new file mode 100644
index 0000000..15be8db
--- /dev/null
+++ b/README.md
@@ -0,0 +1,102 @@
+<!--
+Licensed to the Apache Software Foundation (ASF) under one or more
+contributor license agreements.  See the NOTICE file distributed with
+this work for additional information regarding copyright ownership.
+The ASF licenses this file to You under the Apache License, Version 2.0
+(the "License"); you may not use this file except in compliance with
+the License.  You may obtain a copy of the License at
+
+    http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+-->
+
+Apache Accumulo
+===============
+
+The [Apache Accumulo™][1] sorted, distributed key/value store is a robust,
+scalable, high-performance data storage and retrieval system.  Apache Accumulo
+is based on Google's [BigTable][4] design and is built on top of Apache
+[Hadoop][5], [ZooKeeper][6], and [Thrift][7]. Apache Accumulo features a few
+novel improvements on the BigTable design in the form of cell-based access
+control and a server-side programming mechanism that can modify key/value pairs
+at various points in the data management process. Other notable improvements
+and features are outlined [here][8].
+
+To install and run an Accumulo binary distribution, follow the [install][2]
+instructions.
+  
+Documentation
+-------------
+
+Accumulo provides the following documentation:
+
+ * **User Manual** : In-depth developer and administrator documentation.
+ * **Examples** : Code with corresponding readme files that give step by step
+                  instructions for running example code.
+
+This documentation is available on the [Accumulo site][1].  In the source and
+binary distributions of Accumulo, the documentation is at different locations.
+
+In the Accumulo binary distribution, all documentation is in the `docs`
+directory.  The binary distribution does not include example source code, but
+it does include a jar with the compiled examples.   This examples jar makes it
+easy to step through the example readmes, after following the [install][2]
+instructions.
+
+In the Accumulo source, documentation is found at the following locations.
+
+ * [Example Source](examples/simple/src/main/java/org/apache/accumulo/examples/simple)
+ * [Example Readmes](docs/src/main/resources/examples)
+ * [User Manual Source](docs/src/main/asciidoc)
+
+Building 
+--------
+
+Accumulo uses [Maven][9] to compile, [test][3], and package its source.  The
+following command will build the binary tar.gz from source.  Note, these
+instructions will not work for the Accumulo binary distribution as it does not
+include source.
+
+    mvn package -P assemble
+
+This command produces a file at the following location.
+
+    assemble/target/accumulo-X.Y.Z-SNAPSHOT-bin.tar.gz
+
+This will not include documentation; adding the `-P docs` option to the Maven
+command will build documentation.
+
+API
+---
+
+The public Accumulo API is composed of:
+
+ * All public classes and interfaces in the org.apache.accumulo.core.client
+   package, as well as all of its subpackages excluding those named *impl*.
+ * Key, Mutation, Value, Range, Condition, and ConditionalMutation in
+   org.apache.accumulo.core.data.
+ * All public classes and interfaces in the org.apache.accumulo.minicluster
+   package, as well as all of its subpackages excluding those named *impl*.
+ * Anything with public or protected access within any Class or Interface that
+   is in the public API. This includes, but is not limited to: methods, members
+   classes, interfaces, and enums.
+
+The Accumulo project maintains binary compatibility across this API within a
+major release, as defined in the Java Language Specification 3rd ed. Starting
+with Accumulo 1.6.2 and 1.7.0, all API changes will follow [semver 2.0][12].
+
+[1]: http://accumulo.apache.org
+[2]: INSTALL.md
+[3]: TESTING.md
+[4]: http://research.google.com/archive/bigtable.html
+[5]: http://hadoop.apache.org
+[6]: http://zookeeper.apache.org
+[7]: http://thrift.apache.org/
+[8]: http://accumulo.apache.org/notable_features.html
+[9]: http://maven.apache.org/
+[12]: http://semver.org/spec/v2.0.0.html
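
As a reading aid for the API section of README.md above: under semver 2.0, a breaking change to the public API requires a new major version, a backward-compatible addition requires a new minor version, and a patch release carries only backward-compatible fixes. The Java sketch below classifies a version bump on that basis. The class and method names (`SemverBump`, `classify`, `mayBreakPublicApi`) are hypothetical and are not part of Accumulo; this is an illustration only and ignores pre-release and build suffixes.

```java
public class SemverBump {
    /** Kinds of version change under semver 2.0 (MAJOR.MINOR.PATCH). */
    public enum Kind { MAJOR, MINOR, PATCH, NONE }

    /** Parses "X.Y.Z" into three integers; suffixes such as -SNAPSHOT are not handled in this sketch. */
    static int[] parse(String version) {
        String[] parts = version.split("\\.");
        if (parts.length != 3) {
            throw new IllegalArgumentException("expected X.Y.Z, got: " + version);
        }
        return new int[] {
            Integer.parseInt(parts[0]), Integer.parseInt(parts[1]), Integer.parseInt(parts[2])
        };
    }

    /** Reports which version component differs first between the two versions. */
    public static Kind classify(String from, String to) {
        int[] a = parse(from);
        int[] b = parse(to);
        if (a[0] != b[0]) return Kind.MAJOR;
        if (a[1] != b[1]) return Kind.MINOR;
        if (a[2] != b[2]) return Kind.PATCH;
        return Kind.NONE;
    }

    /** Under semver 2.0, only a major version change may break the public API. */
    public static boolean mayBreakPublicApi(String from, String to) {
        return classify(from, to) == Kind.MAJOR;
    }

    public static void main(String[] args) {
        System.out.println(classify("1.6.1", "1.6.2"));
        System.out.println(classify("1.6.2", "1.7.0"));
        System.out.println(mayBreakPublicApi("1.7.0", "2.0.0"));
    }
}
```

A client compiled against the public API packages listed above could use a check like this to decide whether an upgrade needs a compatibility review.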

http://git-wip-us.apache.org/repos/asf/accumulo/blob/c479f874/TESTING
----------------------------------------------------------------------
diff --git a/TESTING b/TESTING
deleted file mode 100644
index cf2afba..0000000
--- a/TESTING
+++ /dev/null
@@ -1,113 +0,0 @@
-Title: Testing Apache Accumulo
-Notice:    Licensed to the Apache Software Foundation (ASF) under one
-           or more contributor license agreements.  See the NOTICE file
-           distributed with this work for additional information
-           regarding copyright ownership.  The ASF licenses this file
-           to you under the Apache License, Version 2.0 (the
-           "License"); you may not use this file except in compliance
-           with the License.  You may obtain a copy of the License at
-           .
-             http://www.apache.org/licenses/LICENSE-2.0
-           .
-           Unless required by applicable law or agreed to in writing,
-           software distributed under the License is distributed on an
-           "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
-           KIND, either express or implied.  See the License for the
-           specific language governing permissions and limitations
-           under the License.
-
-# Testing Apache Accumulo
-
-This document is meant to serve as a quick reference to the automated test suites included in Apache Accumulo for users
-to run which validate the product and developers to continue to iterate upon to ensure that the product is stable and as
-free of bugs as possible.
-
-The automated testing suite can be categorized as two sets of tests: unit tests and integration tests. These are the
-traditional unit and integrations tests as defined by the Apache Maven lifecycle phases (unit tests run at `test` and
-integration tests run at `integration-test`).
-
-# Unit tests
-
-Unit tests can be run by invoking `mvn test` at the top of the Apache Accumulo source tree; however, it is more often
-the case that these tests are automatically run by invoking `mvn package` instead.  Either invocation should work
-successfully.
-
-The unit tests should run rather quickly (order of minutes for the entire project) and, in nearly all cases, do not
-require any noticable amount of computer resources (the compilation of the files typically exceeds running the tests).
-Maven will automatically generate a report for each unit test run and will give a summary at the end of each Maven
-module for the total run/failed/errored/skipped tests.
-
-The Apache Accumulo developers expect that these tests are always passing on every revision of the code. If this is not
-the case, it is almost certainly in error.
-
-# Integration tests
-
-Integration tests can be run by invoking `mvn integration-test` at the top of the Apache Accumulo source tree; however,
-like `mvn package` being recommended for unit tests, `mvn verify` is often the recommended avenue to run the integration tests.
-
-The integration tests are medium length tests (order minutes for each test class and order hours for the complete suite
-with single threaded execution) but are very encompassing of checking for regressions that were previously seen in the
-codebase. These tests do require a noticable amount of resources, at least another gigabyte of memory over what Maven
-itself requires. As such, it's recommended to have at least 3-4GB of free memory and 10GB of free disk space.
-
-Take note that when invoking the `integration-test` lifecycle phase, other functions will also be enabled which include
-static analysis (findbugs) and software license checks (release analysis tool -- RAT).
-
-## Accumulo for testing
-
-The primary reason these tests take so much longer than the unit tests is that most are using an Accumulo instance to
-perform the test. It's a necessary evil; however, there are things we can do to improve this.
-
-## MiniAccumuloCluster
-
-By default, these tests will use a MiniAccumuloCluster which is a multi-process "implementation" of Accumulo, managed
-through Java interfaces. This MiniAccumuloCluster has the ability to use the local filesystem or Apache Hadoop's
-MiniDFSCluster, as well as starting one to many tablet servers. MiniAccumuloCluster tends to be a very useful tool in
-that it can automatically provide a workable instance that mimics how an actual deployment functions.
-
-The downside of using MiniAccumuloCluster is that a significant portion of each test is now devoted to starting and
-stopping the MiniAccumuloCluster.  While this is a surefire way to isolate tests from interferring with one another, it
-increases the actual runtime of the test by, on average, 10x.
-
-## Standalone Cluster
-
-An alternative to the MiniAccumuloCluster for testing, a standalone Accumulo cluster can also be configured for use by
-most tests. This requires a manual step of building and deploying the Accumulo cluster by hand. The build can then be
-configured to use this cluster instead of always starting a MiniAccumuloCluster.  Not all of the integration tests are
-good candidates to run against a standalone Accumulo cluster, these tests will still launch a MiniAccumuloCluster for
-their use.
-
-Use of a standalone cluster can be enabled using system properties on the Maven command line or, more concisely, by
-providing a Java properties file on the Maven command line. The use of a properties file is recommended since it is
-typically a fixed file per standalone cluster you want to run the tests against.
-
-### Configuration
-
-The following properties can be used to configure a standalone cluster:
-
-- `accumulo.it.cluster.type`, Required: The type of cluster is being defined (valid options: MINI and STANDALONE)
-- `accumulo.it.cluster.standalone.principal`, Required: Standalone cluster principal (user)
-- `accumulo.it.cluster.standalone.password`, Required: Password for the principal
-- `accumulo.it.cluster.standalone.zookeepers`, Required: ZooKeeper quorum used by the standalone cluster
-- `accumulo.it.cluster.standalone.instance.name`, Required: Accumulo instance name for the cluster
-- `accumulo.it.cluster.standalone.home`, Optional: `ACCUMULO_HOME`
-- `accumulo.it.cluster.standalone.conf`, Optional: `ACCUMULO_CONF_DIR`
-- `accumulo.it.cluster.standalone.hadoop.conf`, Optional: `HADOOP_CONF_DIR`
-
-Each of the above properties can be set on the commandline (-Daccumulo.it.cluster.standalone.principal=root), or the
-collection can be placed into a properties file and referenced using "accumulo.it.cluster.properties".  For example, the
-following might be similar to what is executed for a standalone cluster.
-
-  `mvn verify -Daccumulo.it.properties=/home/user/my_cluster.properties`
-
-For the optional properties, each of them will be extracted from the environment if not explicitly provided.
-Specifically, `ACCUMULO_HOME` and `ACCUMULO_CONF_DIR` are used to ensure the correct version of the bundled
-Accumulo scripts are invoked and, in the event that multiple Accumulo processes exist on the same physical machine,
-but for different instances, the correct version is terminated. `HADOOP_CONF_DIR` is used to ensure that the necessary
-files to construct the FileSystem object for the cluster can be constructed (e.g. core-site.xml and hdfs-site.xml).
-
-# Manual Distributed Testing
-
-Apache Accumulo also contains a number of tests which are suitable for running against large clusters for hours to days
-at a time, for example the Continuous Ingest and Randomwalk test suites. These all exist in the repository under
-`test/system` and contain their own README files for configuration and use.

http://git-wip-us.apache.org/repos/asf/accumulo/blob/c479f874/TESTING.md
----------------------------------------------------------------------
diff --git a/TESTING.md b/TESTING.md
new file mode 100644
index 0000000..fc4d574
--- /dev/null
+++ b/TESTING.md
@@ -0,0 +1,114 @@
+<!--
+Licensed to the Apache Software Foundation (ASF) under one or more
+contributor license agreements.  See the NOTICE file distributed with
+this work for additional information regarding copyright ownership.
+The ASF licenses this file to You under the Apache License, Version 2.0
+(the "License"); you may not use this file except in compliance with
+the License.  You may obtain a copy of the License at
+
+    http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+-->
+
+# Testing Apache Accumulo
+
+This document is a quick reference to the automated test suites included in Apache Accumulo. Users can run them to
+validate the product, and developers can continue to iterate on them to ensure that the product is stable and as
+free of bugs as possible.
+
+The automated testing suite can be categorized as two sets of tests: unit tests and integration tests. These are the
+traditional unit and integration tests as defined by the Apache Maven [lifecycle][3] phases.
+
+# Unit tests
+
+Unit tests can be run by invoking `mvn test` at the root of the Apache Accumulo source tree.  For more information see
+the [maven-surefire-plugin docs][4].
+
+The unit tests should run rather quickly (on the order of minutes for the entire project) and, in nearly all cases, do
+not require a noticeable amount of computer resources (compiling the files typically takes longer than the tests).
+Maven will automatically generate a report for each unit test run and will give a summary at the end of each Maven
+module for the total run/failed/errored/skipped tests.
+
+The Apache Accumulo developers expect these tests to pass on every revision of the code. If they do not, something is
+almost certainly in error.
+
+# Integration tests
+
+Integration tests can be run by invoking `mvn verify` at the root of the Apache Accumulo source tree.  For more
+information see the [maven-failsafe-plugin docs][5].
+
+The integration tests are medium-length tests (on the order of minutes for each test class and hours for the complete
+suite) that check for regressions previously seen in the codebase. These tests do require a noticeable amount of
+resources, at least another gigabyte of memory over what Maven itself requires. As such, it's recommended to have at
+least 3-4GB of free memory and 10GB of free disk space.
+
+## Accumulo for testing
+
+The primary reason these tests take so much longer than the unit tests is that most are using an Accumulo instance to
+perform the test. It's a necessary evil; however, there are things we can do to improve this.
+
+## MiniAccumuloCluster
+
+By default, these tests will use a MiniAccumuloCluster which is a multi-process "implementation" of Accumulo, managed
+through Java interfaces. This MiniAccumuloCluster has the ability to use the local filesystem or Apache Hadoop's
+MiniDFSCluster, as well as starting one to many tablet servers. MiniAccumuloCluster tends to be a very useful tool in
+that it can automatically provide a workable instance that mimics how an actual deployment functions.
+
+The downside of using MiniAccumuloCluster is that a significant portion of each test is now devoted to starting and
+stopping the MiniAccumuloCluster.  While this is a surefire way to isolate tests from interfering with one another, it
+increases the actual runtime of the test by, on average, 10x.
+
+## Standalone Cluster
+
+As an alternative to the MiniAccumuloCluster for testing, a standalone Accumulo cluster can also be configured for use
+by most tests. This requires a manual step of building and deploying the Accumulo cluster by hand. The build can then be
+configured to use this cluster instead of always starting a MiniAccumuloCluster.  Not all of the integration tests are
+good candidates to run against a standalone Accumulo cluster; those that are not will still launch a MiniAccumuloCluster
+for their own use.
+
+Use of a standalone cluster can be enabled using system properties on the Maven command line or, more concisely, by
+providing a Java properties file on the Maven command line. The use of a properties file is recommended since it is
+typically a fixed file per standalone cluster you want to run the tests against.
+
+### Configuration
+
+The following properties can be used to configure a standalone cluster:
+
+- `accumulo.it.cluster.type`, Required: The type of cluster being defined (valid options: MINI and STANDALONE)
+- `accumulo.it.cluster.standalone.principal`, Required: Standalone cluster principal (user)
+- `accumulo.it.cluster.standalone.password`, Required: Password for the principal
+- `accumulo.it.cluster.standalone.zookeepers`, Required: ZooKeeper quorum used by the standalone cluster
+- `accumulo.it.cluster.standalone.instance.name`, Required: Accumulo instance name for the cluster
+- `accumulo.it.cluster.standalone.home`, Optional: `ACCUMULO_HOME`
+- `accumulo.it.cluster.standalone.conf`, Optional: `ACCUMULO_CONF_DIR`
+- `accumulo.it.cluster.standalone.hadoop.conf`, Optional: `HADOOP_CONF_DIR`
+
+Each of the above properties can be set on the command line (-Daccumulo.it.cluster.standalone.principal=root), or the
+collection can be placed into a properties file and referenced using "accumulo.it.properties".  For example, the
+following might be similar to what is executed for a standalone cluster.
+
+  `mvn verify -Daccumulo.it.properties=/home/user/my_cluster.properties`
+
+Each optional property is read from the environment if it is not explicitly provided.
+Specifically, `ACCUMULO_HOME` and `ACCUMULO_CONF_DIR` are used to ensure the correct version of the bundled
+Accumulo scripts is invoked and, in the event that multiple Accumulo processes exist on the same physical machine,
+but for different instances, the correct version is terminated. `HADOOP_CONF_DIR` is used to ensure that the necessary
+files to construct the FileSystem object for the cluster can be found (e.g. core-site.xml and hdfs-site.xml).
+
+# Manual Distributed Testing
+
+Apache Accumulo also contains a number of tests which are suitable for running against large clusters for hours to days
+at a time, for example the [Continuous Ingest][1] and [Randomwalk test][2] suites. These all exist in the repository under
+`test/system` and contain their own README files for configuration and use.
+
+[1]: test/system/continuous/README.md
+[2]: test/system/randomwalk/README.md
+[3]: https://maven.apache.org/guides/introduction/introduction-to-the-lifecycle.html
+[4]: http://maven.apache.org/surefire/maven-surefire-plugin/
+[5]: http://maven.apache.org/surefire/maven-failsafe-plugin/
+
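
The standalone-cluster settings listed under Configuration in TESTING.md above can be sketched as a small pre-flight check: report which required properties are missing, and fall back to the environment for optional ones. This is only an illustration. The class `StandaloneItProperties` and its methods are hypothetical and are not Accumulo's test harness; only the property and environment variable names come from the document, and all values are placeholders.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Properties;

public class StandaloneItProperties {
    // Property names marked "Required" in the Configuration list.
    static final String[] REQUIRED = {
        "accumulo.it.cluster.type",
        "accumulo.it.cluster.standalone.principal",
        "accumulo.it.cluster.standalone.password",
        "accumulo.it.cluster.standalone.zookeepers",
        "accumulo.it.cluster.standalone.instance.name"
    };

    /** Returns the required property names that are absent or blank. */
    public static List<String> missingRequired(Properties props) {
        List<String> missing = new ArrayList<>();
        for (String key : REQUIRED) {
            String value = props.getProperty(key);
            if (value == null || value.trim().isEmpty()) {
                missing.add(key);
            }
        }
        return missing;
    }

    /** Returns the explicit property if set, otherwise the named environment variable (may be null). */
    public static String resolveOptional(Properties props, String key,
                                         Map<String, String> env, String envName) {
        String explicit = props.getProperty(key);
        return explicit != null ? explicit : env.get(envName);
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty("accumulo.it.cluster.type", "STANDALONE");
        props.setProperty("accumulo.it.cluster.standalone.principal", "root");
        props.setProperty("accumulo.it.cluster.standalone.password", "placeholder-password");
        props.setProperty("accumulo.it.cluster.standalone.zookeepers", "localhost:2181");
        props.setProperty("accumulo.it.cluster.standalone.instance.name", "placeholder-instance");

        System.out.println("Missing required properties: " + missingRequired(props));
        System.out.println("Accumulo home: " + resolveOptional(
            props, "accumulo.it.cluster.standalone.home", System.getenv(), "ACCUMULO_HOME"));
    }
}
```

Running a check of this kind before `mvn verify` can surface a missing required property before a long integration test run starts.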

http://git-wip-us.apache.org/repos/asf/accumulo/blob/c479f874/assemble/src/main/assemblies/component.xml
----------------------------------------------------------------------
diff --git a/assemble/src/main/assemblies/component.xml b/assemble/src/main/assemblies/component.xml
index 3f18da3..8dfce4d 100644
--- a/assemble/src/main/assemblies/component.xml
+++ b/assemble/src/main/assemblies/component.xml
@@ -248,7 +248,9 @@
         <include>CHANGES</include>
         <include>LICENSE</include>
         <include>NOTICE</include>
-        <include>README</include>
+        <include>README.md</include>
+        <include>INSTALL.md</include>
+        <include>BUILD.md</include>
       </includes>
     </fileSet>
   </fileSets>