Posted to commits@accumulo.apache.org by kt...@apache.org on 2015/02/03 01:07:57 UTC
accumulo git commit: ACCUMULO-1515 Reorganized README and converted
to markdown
Repository: accumulo
Updated Branches:
refs/heads/master d7dcb8773 -> c479f874a
ACCUMULO-1515 Reorganized README and converted to markdown
Project: http://git-wip-us.apache.org/repos/asf/accumulo/repo
Commit: http://git-wip-us.apache.org/repos/asf/accumulo/commit/c479f874
Tree: http://git-wip-us.apache.org/repos/asf/accumulo/tree/c479f874
Diff: http://git-wip-us.apache.org/repos/asf/accumulo/diff/c479f874
Branch: refs/heads/master
Commit: c479f874aa7cf9a842e021c3b719068c4c2abeef
Parents: d7dcb87
Author: Keith Turner <kt...@apache.org>
Authored: Mon Feb 2 19:01:33 2015 -0500
Committer: Keith Turner <kt...@apache.org>
Committed: Mon Feb 2 19:01:33 2015 -0500
----------------------------------------------------------------------
INSTALL.md | 160 ++++++++
NOTICE | 32 ++
README | 467 ------------------------
README.md | 102 ++++++
TESTING | 113 ------
TESTING.md | 114 ++++++
assemble/src/main/assemblies/component.xml | 4 +-
7 files changed, 411 insertions(+), 581 deletions(-)
----------------------------------------------------------------------
http://git-wip-us.apache.org/repos/asf/accumulo/blob/c479f874/INSTALL.md
----------------------------------------------------------------------
diff --git a/INSTALL.md b/INSTALL.md
new file mode 100644
index 0000000..32f74ca
--- /dev/null
+++ b/INSTALL.md
@@ -0,0 +1,160 @@
+<!--
+Licensed to the Apache Software Foundation (ASF) under one or more
+contributor license agreements. See the NOTICE file distributed with
+this work for additional information regarding copyright ownership.
+The ASF licenses this file to You under the Apache License, Version 2.0
+(the "License"); you may not use this file except in compliance with
+the License. You may obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+-->
+
+Installing Accumulo
+===================
+
+This document covers installing Accumulo on single and multi-node environments.
+Either [download][1] or [build][2] a binary distribution of Accumulo from
+source code. Unpack as follows.
+
+ cd <install location>
+ tar xzf <some dir>/accumulo-X.Y.Z-bin.tar.gz
+ cd accumulo-X.Y.Z
+
+Accumulo has some optional native code that improves its performance and
+stability. Before configuring Accumulo attempt to build this native code
+with the following command.
+
+ ./bin/build_native_library.sh
+
+If the command fails, it's OK to continue with setup and resolve the issue
+later.
+
+
+Configuring
+-----------
+
+The Accumulo conf directory needs to be populated with initial config files.
+The following script is provided to assist with this. Run the script and
+answer the questions. When the script asks about memory-map type, choose Native
+if the build native script was successful. Otherwise choose Java.
+
+ ./bin/bootstrap_config.sh
+
+The script will prompt for memory usage. Please note that the footprints are
+only for the Accumulo system processes, so ample space should be left for other
+processes like hadoop, zookeeper, and the accumulo client code. If Accumulo
+worker processes are swapped out and unresponsive, they may be killed.
+
+After this script runs, the conf directory should be populated and now a few
+edits are needed.
+
+### Secret
+
+Accumulo coordination and worker processes can only communicate with each other
+if they share the same secret key. To change the secret key set
+`instance.secret` in `conf/accumulo-site.xml`. Changing this secret key from
+the default is highly recommended.
+
+### Dependencies
+
+Accumulo requires running [Zookeeper][3] and [HDFS][4] instances. Also, the
+Accumulo binary distribution does not include jars for Zookeeper and Hadoop.
+When configuring Accumulo the following information about these dependencies
+must be provided.
+
+ * **Location of Zookeepers** : Provide this by setting `instance.zookeeper.host`
+ in `conf/accumulo-site.xml`.
+ * **Where to store data** : Provide this by setting `instance.volumes` in
+ `conf/accumulo-site.xml`. If your namenode is running at 192.168.1.9:9000
+ and you want to store data in `/accumulo` in HDFS, then set
+ `instance.volumes` to `hdfs://192.168.1.9:9000/accumulo`.
+ * **Location of Zookeeper and Hadoop jars** : Setting `ZOOKEEPER_HOME` and
+ `HADOOP_PREFIX` in `conf/accumulo-env.sh` will help Accumulo find these
+ jars.
+
+If Accumulo has problems later on finding jars, then run `bin/accumulo
+classpath` to print out info about where Accumulo is finding jars. If the
+settings mentioned above are correct, then inspect `general.classpaths` in
+`conf/accumulo-site.xml`.
+
+Initialization
+--------------
+
+Accumulo needs to initialize the locations where it stores data in Zookeeper
+and HDFS. The following command will do this.
+
+ ./bin/accumulo init
+
+The initialization command will prompt for the following information.
+
+ * **Instance name** : This is the name of the Accumulo instance. Accumulo
+ clients need to know it in order to connect.
+ * **Root password** : Initialization sets up an initial Accumulo root user and
+ prompts for its password. This information will be needed to later connect
+ to Accumulo.
+
+Multiple Nodes
+--------------
+
+Skip this section if running Accumulo on a single node. Accumulo has
+coordinating, monitoring, and worker processes that run on specified nodes in
+the cluster. The following files should be populated with a newline-separated
+list of node names, replacing the default `localhost` entries.
+
+ * `conf/masters` : Accumulo primary coordinating process. Must specify one
+ node. Can specify a few for fault tolerance.
+ * `conf/gc` : Accumulo garbage collector. Must specify one node. Can
+ specify a few for fault tolerance.
+ * `conf/monitor` : Node where Accumulo monitoring web server is run.
+ * `conf/slaves` : Accumulo worker processes. List all of the nodes where
+ tablet servers should run in this file.
+ * `conf/tracers` : Optional capability. Can specify zero or more nodes.
+
+The Accumulo, Hadoop, and Zookeeper software should be present at the same
+location on every node. Also the files in the `conf` directory must be copied
+to every node. There are many ways to replicate the software and
+configuration; two tools that can help with either are [pdcp][5] and
+[prsync][6].
+
+Starting Accumulo
+-----------------
+
+The Accumulo scripts use ssh to start processes on remote nodes. Before
+attempting to start Accumulo, [passwordless ssh][7] must be set up on the
+cluster.
+
+After configuring and initializing Accumulo, use the following command to start
+it.
+
+ ./bin/start-all.sh
+
+First steps
+-----------
+
+Once the `start-all.sh` script completes, use the following command to run the
+Accumulo shell.
+
+ ./bin/accumulo shell -u root
+
+Use your web browser to connect to the Accumulo monitor page on port 50095.
+
+ http://<hostname in conf/monitor>:50095/
+
+When finished, use the following command to stop Accumulo.
+
+ ./bin/stop-all.sh
+
+[1]: http://accumulo.apache.org/
+[2]: README.md#building-
+[3]: http://zookeeper.apache.org/
+[4]: http://hadoop.apache.org/
+[5]: https://code.google.com/p/pdsh/
+[6]: https://code.google.com/p/parallel-ssh/
+[7]: https://www.google.com/search?q=hadoop+passwordless+ssh&ie=utf-8&oe=utf-8
+
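For readers following the Configuring section of INSTALL.md above, here is a minimal Python sketch (not part of this commit) of the three `conf/accumulo-site.xml` settings it names: `instance.secret`, `instance.zookeeper.host`, and `instance.volumes`. It assumes the file uses the Hadoop-style `<configuration>`/`<property>` layout. The helper name `build_site_xml`, the secret, and the ZooKeeper host are hypothetical placeholders; only the `instance.volumes` value is taken from the example in the text.

```python
import xml.etree.ElementTree as ET


def build_site_xml(properties):
    """Return Hadoop-style configuration XML for the given name/value pairs.

    Hypothetical helper for illustration only; Accumulo does not ship it.
    """
    root = ET.Element("configuration")
    for name, value in properties.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")


# Property names come from INSTALL.md; the secret and the ZooKeeper host
# are placeholder assumptions and must be replaced with real values.
site = build_site_xml({
    "instance.secret": "CHANGE_ME",
    "instance.zookeeper.host": "zk1.example.com:2181",
    "instance.volumes": "hdfs://192.168.1.9:9000/accumulo",
})
print(site)
```

The printed XML is only a fragment to compare against the file generated by `bootstrap_config.sh`; that generated file contains other properties that should be kept.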
http://git-wip-us.apache.org/repos/asf/accumulo/blob/c479f874/NOTICE
----------------------------------------------------------------------
diff --git a/NOTICE b/NOTICE
index af212c2..72eb8c1 100644
--- a/NOTICE
+++ b/NOTICE
@@ -6,3 +6,35 @@ The Apache Software Foundation (http://www.apache.org/).
This product includes JCommander (https://github.com/cbeust/jcommander),
Copyright 2010 Cedric Beust cedric@beust.com.
+
+******************************************************************************
+
+Export Control
+
+This distribution includes cryptographic software. The country in which you
+currently reside may have restrictions on the import, possession, use, and/or
+re-export to another country, of encryption software. BEFORE using any
+encryption software, please check your country's laws, regulations and
+policies concerning the import, possession, or use, and re-export of encryption
+software, to see if this is permitted. See <http://www.wassenaar.org/> for more
+information.
+
+The U.S. Government Department of Commerce, Bureau of Industry and Security
+(BIS), has classified this software as Export Commodity Control Number (ECCN)
+5D002.C.1, which includes information security software using or performing
+cryptographic functions with asymmetric algorithms. The form and manner of this
+Apache Software Foundation distribution makes it eligible for export under the
+License Exception ENC Technology Software Unrestricted (TSU) exception (see the
+BIS Export Administration Regulations, Section 740.13) for both object code and
+source code.
+
+The following provides more details on the included cryptographic software: ...
+
+Apache Accumulo uses the built-in java cryptography libraries in its RFile
+encryption implementation. See
+http://www.oracle.com/us/products/export/export-regulations-345813.html
+for more details on Java's cryptography features. Apache Accumulo also uses
+the bouncycastle library for some cryptographic technology. See
+http://www.bouncycastle.org/wiki/display/JA1/Frequently+Asked+Questions for
+more details on bouncycastle's cryptography features.
+
http://git-wip-us.apache.org/repos/asf/accumulo/blob/c479f874/README
----------------------------------------------------------------------
diff --git a/README b/README
deleted file mode 100644
index 4ebb078..0000000
--- a/README
+++ /dev/null
@@ -1,467 +0,0 @@
-Title: Apache Accumulo
-Notice: Licensed to the Apache Software Foundation (ASF) under one
- or more contributor license agreements. See the NOTICE file
- distributed with this work for additional information
- regarding copyright ownership. The ASF licenses this file
- to you under the Apache License, Version 2.0 (the
- "License"); you may not use this file except in compliance
- with the License. You may obtain a copy of the License at
- .
- http://www.apache.org/licenses/LICENSE-2.0
- .
- Unless required by applicable law or agreed to in writing,
- software distributed under the License is distributed on an
- "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
- KIND, either express or implied. See the License for the
- specific language governing permissions and limitations
- under the License.
-
-******************************************************************************
-0. Introduction
-
-Apache Accumulo is a sorted, distributed key/value store based on Google's
-BigTable design. It is built on top of Apache Hadoop, Zookeeper, and Thrift. It
-features a few novel improvements on the BigTable design in the form of
-cell-level access labels and a server-side programming mechanism that can modify
-key/value pairs at various points in the data management process.
-
-******************************************************************************
-1. Building
-
-In the normal tarball release of accumulo, everything is built and
-ready to go on x86 GNU/Linux: there is no build step.
-
-However, if you only have source code, or you wish to make changes, you need to
-have maven configured to get Accumulo prerequisites from repositories. See
-the pom.xml file for the necessary components.
-
-You can build an Accumulo binary distribution, which is created in the
-assemble/target directory, using the following command. Note that maven 3
-is required starting with Accumulo v1.5.0. By default, Accumulo compiles
-against Apache Hadoop 2.2.0, but these artifacts should be compatible with newer
-releases of Hadoop which are backwards-compatible with Hadoop 2.2.0.
-
- mvn package -P assemble
-
-By default, Accumulo compiles against Apache Hadoop 2.2.0. To compile against
-a different Hadoop 2-compatible version, specify the profile and version,
-e.g. "-Dhadoop.version=0.23.5".
-
-Support for Apache Hadoop 1.x versions has been dropped. To support these
-versions of Hadoop, use an older version of Accumulo.
-
-If you are running on another Unix-like operating system (OSX, etc) then
-you may wish to build the native libraries. They are not strictly necessary
-but having them available suppresses a runtime warning and enables Accumulo
-to run faster. You can execute the following script to automatically unpack
-and install the native library. Be sure to have a JDK and a C++ compiler
-installed with the JAVA_HOME environment variable set.
-
- $ ./bin/build_native_library.sh
-
-If your system's default compiler options are insufficient, you can add
-additional compiler options to the command line, such as options for the
-architecture. These will be passed to the Makefile in the environment variable
-USERFLAGS:
-
- $ ./bin/build_native_library.sh -m32
-
-Alternatively, you can manually unpack the accumulo-native tarball in the
-$ACCUMULO_HOME/lib directory. Change to the accumulo-native directory in
-the current directory and issue `make`. Then, copy the resulting 'libaccumulo'
-library into the $ACCUMULO_HOME/lib/native/map.
-
- $ mkdir -p $ACCUMULO_HOME/lib/native/map
- $ cp libaccumulo.* $ACCUMULO_HOME/lib/native/map
-
-
-Building Documentation
-
-Use the following command to build the User Manual (docs/target/generated-docs/accumulo_user_manual.html)
-and the configuration HTML page (docs/target/config.html)
-
- mvn package -P docs -DskipTests
-
-******************************************************************************
-2. Deployment
-
-Copy the accumulo tar file produced by mvn package from the assemble/target/
-directory to the desired destination, then untar it (e.g.
-tar xzf accumulo-1.6.0-bin.tar.gz).
-
-Another option is to package Accumulo directly to a working directory. For example,
-
- mvn package -DskipTests -DDEV_ACCUMULO_HOME=/var/tmp
-
-The above command would create a directory with a name similar to
-/var/tmp/accumulo-1.6.0-dev/accumulo-1.6.0/, containing all the contents
-that are normally contained in accumulo-1.6.0-bin.tar.gz, but already unpacked.
-If the DEV_ACCUMULO_HOME parameter is not specified, this directory would
-normally be created in assemble/target, but that is subject to deletion by
-the 'mvn clean' command. Specifying an external directory would not be subject
-to 'mvn clean'. When executed more than once, newer files overwrite older files,
-and files a user adds (such as configuration files in conf/) will be left alone.
-
-If HDFS and Zookeeper are running, you can run Accumulo directly from this
-working directory. See the 'Running Apache Accumulo' section later in this document.
-
-You can avoid specifying the working directory each time you compile by adding
-a profile to maven's settings.xml file. Below is an example of $HOME/.m2/settings.xml
-
- <settings xmlns="http://maven.apache.org/SETTINGS/1.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/SETTINGS/1.0.0 http://maven.apache.org/xsd/settings-1.0.0.xsd">
- <profiles>
- <profile>
- <id>inject-accumulo-home</id>
- <properties>
- <DEV_ACCUMULO_HOME>/var/tmp</DEV_ACCUMULO_HOME>
- </properties>
- </profile>
- </profiles>
- <activeProfiles>
- <activeProfile>inject-accumulo-home</activeProfile>
- </activeProfiles>
- </settings>
-
-******************************************************************************
-3. Upgrading
-
-3.1. From 1.5 to 1.6
-
- This happens automatically the first time Accumulo 1.6 is started.
-
- If your instance previously upgraded from 1.4 to 1.5, you must verify that your
- 1.5 instance has no outstanding local write ahead logs. You can do this by ensuring
- either:
-
- - All of your tables are online and the Monitor shows all tablets hosted
- - The directory for write ahead logs (logger.dir.walog) from 1.4 has no files remaining
- on any tablet server / logger hosts
-
- To upgrade from 1.5 to 1.6 you must:
-
- * Verify that there are no outstanding FATE operations
- - Under 1.5 you can list what's in FATE by running
- $ACCUMULO_HOME/bin/accumulo org.apache.accumulo.server.fate.Admin print
- - Note that operations in any state will prevent an upgrade. It is safe
- to delete operations with status SUCCESSFUL. For others, you should restart
- your 1.5 cluster and allow them to finish.
- * Stop the 1.5 instance.
- * Configure 1.6 to use the hdfs directory and zookeepers that 1.5 was using.
- * Copy other 1.5 configuration options as needed.
- * Start Accumulo 1.6.
-
- The upgrade process must make changes to Accumulo's internal state in both ZooKeeper and
- the table metadata. This process may take some time if Tablet Servers have to go through
- recovery. During this time, the Monitor will claim that the Master is down and some
- services may send the Monitor log messages about failure to communicate with each other.
- These messages are safe to ignore. If you need detail on the upgrade's progress you should
- view the local logs on the Tablet Servers and active Master.
-
-3.2. From 1.4 to 1.6
-
- To upgrade from 1.4 to 1.6 you must perform a manual initial step.
-
- Prior to upgrading you must:
- * Verify that there are no outstanding FATE operations
- - Under 1.4 you can list what's in FATE by running
- $ACCUMULO_HOME/bin/accumulo org.apache.accumulo.server.fate.Admin print
- - Note that operations in any state will prevent an upgrade. It is safe
- to delete operations with status SUCCESSFUL. For others, you should restart
- your 1.4 cluster and allow them to finish.
- * Stop the 1.4 instance.
- * Configure 1.6 to use the hdfs directory, walog directories, and zookeepers
- that 1.4 was using.
- * Copy other 1.4 configuration options as needed.
-
- Prior to starting the 1.6 instance you will need to run the LocalWALRecovery tool
- on each node that previously ran an instance of the Logger role.
-
- $ACCUMULO_HOME/bin/accumulo org.apache.accumulo.tserver.log.LocalWALRecovery
-
- The recovery tool will rewrite the 1.4 write ahead logs into a format that 1.6 can read.
- After this step has completed on all nodes, start the 1.6 cluster to continue the upgrade.
-
- The upgrade process must make changes to Accumulo's internal state in both ZooKeeper and
- the table metadata. This process may take some time if Tablet Servers have to go through
- recovery. During this time, the Monitor will claim that the Master is down and some
- services may send the Monitor log messages about failure to communicate with each other.
- While the upgrade is in progress, the Garbage Collector may complain about invalid paths.
- The Master may also complain about failure to create the trace table because it already
- exists. These messages are safe to ignore. If other error messages occur, you should seek
- out support before continuing to use Accumulo. If you need detail on the upgrade's progress
- you should view the local logs on the Tablet Servers and active Master.
-
- Note that the LocalWALRecovery tool does not delete the local files. Once you confirm that
- 1.6 is successfully running, you should delete these files on the local filesystem.
-
-******************************************************************************
-4. Configuring
-
-Apache Accumulo has two prerequisites, hadoop and zookeeper. Zookeeper 3.4.* series is
-preferred. Both zookeeper and hadoop must be installed and configured. Some versions of
-Zookeeper may only allow 10 connections from one computer by default. On a single-host
-install, this number is a little too low. Add the following to
-the $ZOOKEEPER_HOME/conf/zoo.cfg file:
-
- maxClientCnxns=100
-
-Ensure you (or some special hadoop user account) have accounts on all of
-the machines in the cluster and that hadoop and accumulo install files can be
-found in the same location on every machine in the cluster. You will need to
-have password-less ssh set up as described in the hadoop documentation.
-
-You will need to have hadoop installed and configured on your system. Accumulo
-1.6.0 has been tested with hadoop version 1.0.4. To avoid data loss,
-you must enable HDFS durable sync. How you enable this depends on your version
-of Hadoop. Please consult the table below for information regarding your version.
-If you need to set the configuration, please be sure to restart HDFS. See
-ACCUMULO-623 and ACCUMULO-1637 for more information.
-
-The following releases of Apache Hadoop require special configuration to ensure
-that data is not inadvertently lost; however, in all releases of Apache Hadoop,
-`dfs.durable.sync` and `dfs.support.append` should *not* be configured as `false`.
-
-VERSION NAME=VALUE
-0.20.205.0 - dfs.support.append=true
-1.0.x - dfs.support.append=true
-
-Additionally, it is strongly recommended that you enable 'dfs.datanode.synconclose'
-(only available in Apache Hadoop >=1.1.1 or >=0.23) in your hdfs-site.xml configuration
-file to ensure that, in the face of unexpected power loss to a datanode, files are
-wholly synced to disk.
-
-Accumulo's own configuration files can be bootstrapped with the
-$ACCUMULO_HOME/bin/bootstrap_config.sh script. This script will allow you to
-select options which correspond closely to your particular environment. The
-configuration files produced by this script are examples. You should always
-inspect any configuration files you use to ensure they are appropriate for your
-environment, and to tailor them to your needs, as they are not guaranteed to be
-suitable for all users and all environments.
-
-Some example accumulo configuration files are placed in directories based on the
-memory footprint for the accumulo processes. These are pre-generated from
-particular selections from the bootstrap_config.sh script for your convenience.
-If you are using native libraries for your tablet server in-memory map, then you
-can use the files in "native-standalone". If you get warnings about not being
-able to load the native libraries, you can use the configuration files in
-"standalone".
-
-For testing on a single computer, use a fairly small configuration:
-
- $ cp conf/examples/512MB/native-standalone/* conf
-
-Please note that the footprints are for only the Accumulo system processes, so
-ample space should be left for other processes like hadoop, zookeeper, and the
-accumulo client code. These directories must be at the same location on every
-node in the cluster.
-
-If you are configuring a larger cluster you will need to create the configuration
-files yourself and propagate the changes to the $ACCUMULO_CONF_DIR directories:
-
- Create a "slaves" file in $ACCUMULO_CONF_DIR/. This is a list of machines
- where tablet servers and loggers will run.
-
- Create a "masters" file in $ACCUMULO_CONF_DIR/. This is a list of
- machines where the master server will run.
-
- Create conf/accumulo-env.sh following the template of
- example/3GB/native-standalone/accumulo-env.sh.
-
-However you create your configuration files, you will need to set
-JAVA_HOME, HADOOP_HOME, and ZOOKEEPER_HOME in conf/accumulo-env.sh
-
-Note that zookeeper client jar files must be installed on every machine, but
-the server should not be run on every machine.
-
-Create the $ACCUMULO_LOG_DIR on every machine in the slaves file.
-
-* Note that you will be specifying the Java heap space in accumulo-env.sh.
-You should make sure that the total heap space used for the accumulo tserver,
-logger and the hadoop datanode and tasktracker is less than the available
-memory on each slave node in the cluster. On large clusters, it is recommended
-that the accumulo master, hadoop namenode, secondary namenode, and hadoop
-jobtracker all be run on separate machines to allow them to use more heap
-space. If you are running these on the same machine on a small cluster, make
-sure their heap space settings fit within the available memory. The zookeeper
-instances are also time sensitive and should be on machines that will not be
-heavily loaded, or over-subscribed for memory.
-
-Edit conf/accumulo-site.xml. You must set the zookeeper servers in this
-file (instance.zookeeper.host). Look at the "Configuration Management" section
-of the user manual to see what additional variables you can modify and what
-the defaults are.
-
-It is advisable to change the instance secret (instance.secret) to some new
-value. Also ensure that the accumulo-site.xml file is not readable by other
-users on the machine.
-
-Synchronize your accumulo conf directory across the cluster. As a precaution
-against mis-configured systems, servers using different configuration files
-will not communicate with the rest of the cluster.
-
-Accumulo requires the hadoop "commons-io" java package. This is normally
-distributed with hadoop. However, it was not distributed with hadoop-0.20.
-If your hadoop distribution does not provide this package, you will need
-to obtain it and put the commons-io jar file in $ACCUMULO_HOME/lib. See the
-pom.xml file for version information.
-
-******************************************************************************
-5. Running Apache Accumulo
-
-Make sure hadoop is configured on all of the machines in the cluster, including
-access to a shared hdfs instance. Make sure hdfs is running.
-
-Make sure zookeeper is configured and running on at least one machine in the
-cluster.
-
-Run "bin/accumulo init" to create the hdfs directory structure
-(hdfs:///accumulo/*) and initial zookeeper settings. This will also allow you
-to configure the initial root password. Only do this once.
-
-Start accumulo using the bin/start-all.sh script.
-
-Use the "bin/accumulo shell -u <username>" command to run an accumulo shell
-interpreter. Within this interpreter, run "createtable <tablename>" to create
-a table, and run "table <tablename>" followed by "scan" to scan a table.
-
-In the example below a table is created, data is inserted, and the table is
-scanned.
-
- $ ./bin/accumulo shell -u root
- Enter current password for 'root'@'accumulo': ******
-
- Shell - Apache Accumulo Interactive Shell
- -
- - version: 1.5.0
- - instance name: accumulo
- - instance id: f5947fe6-081e-41a8-9877-43730c4dfc6f
- -
- - type 'help' for a list of available commands
- -
- root@ac> createtable foo
- root@ac foo> insert row1 colf1 colq1 val1
- root@ac foo> insert row1 colf1 colq2 val2
- root@ac foo> scan
- row1 colf1:colq1 [] val1
- row1 colf1:colq2 [] val2
-
-The example below starts the shell, switches to table foo, and scans for a
-certain column.
-
- $ ./bin/accumulo shell -u root
- Enter current password for 'root'@'accumulo': ******
-
- Shell - Apache Accumulo Interactive Shell
- -
- - version: 1.5.0
- - instance name: accumulo
- - instance id: f5947fe6-081e-41a8-9877-43730c4dfc6f
- -
- - type 'help' for a list of available commands
- -
- root@ac> table foo
- root@ac foo> scan -c colf1:colq2
- row1 colf1:colq2 [] val2
-
-
-For information on how to configure Accumulo on top of Secure HDFS with
-Kerberos, please consult the Accumulo user manual section specifically devoted
-to client and server configuration with Kerberos.
-
-******************************************************************************
-6. Monitoring Apache Accumulo
-
-You can point your browser to the master host, on port 50095 to see the status
-of accumulo across the cluster. You can even do this with the text-based
-browser "links":
-
- $ links http://localhost:50095
-
-From this GUI, you can ensure that tablets are assigned, tables are online,
-tablet servers are up. You can monitor query and ingest rates across the
-cluster.
-
-******************************************************************************
-7. Stopping Apache Accumulo
-
-Do not kill the tabletservers or run bin/tdown.sh unless absolutely necessary.
-Recovery from a catastrophic loss of servers can take a long time. To shutdown
-cleanly, run "bin/stop-all.sh" and the master will orchestrate the shutdown of
-all the tablet servers. Shutdown waits for all writes to finish, so it may
-take some time for particular configurations.
-
-******************************************************************************
-8. Logging
-
-DEBUG and above are logged to the logs/ dir. To modify this behavior change
-the scripts in conf/. To change the logging dir, set ACCUMULO_LOG_DIR in
-conf/accumulo-env.sh. Stdout and stderr of each accumulo process are
-redirected to the log dir.
-
-******************************************************************************
-9. API
-
-The public Accumulo API is composed of :
-
- * All public classes and interfaces in the org.apache.accumulo.core.client
- package, as well as all of its subpackages excluding those named "impl".
- * Key, Mutation, Value, Range, Condition, and ConditionalMutation in
- org.apache.accumulo.core.data.
- * All public classes and interfaces in the org.apache.accumulo.minicluster
- package, as well as all of its subpackages excluding those named "impl".
- * Anything with public or protected access within any Class or Interface that
- is in the public API. This includes, but is not limited to: methods, members
- classes, interfaces, and enums.
-
-The Accumulo project maintains binary compatibility across this API within a major
-release, as defined in the Java Language Specification 3rd ed. API changes should
-only be made on major releases, with continued support of deprecated API elements
-for at least one major revision.
-
-To get started using accumulo review the example and the javadoc for the
-packages and classes mentioned above.
-
-******************************************************************************
-10. Performance Tuning
-
-Apache Accumulo has exposed several configuration properties that can be
-changed. These properties and configuration management are described in detail
-in the user manual. While the default value is usually optimal, there are
-cases where a change can increase query and ingest performance.
-
-Before changing a property from its default in a production system, you should
-develop a good understanding of the property and consider creating a test to
-prove the increased performance.
-
-******************************************************************************
-
-11. Export Control
-
-This distribution includes cryptographic software. The country in which you
-currently reside may have restrictions on the import, possession, use, and/or
-re-export to another country, of encryption software. BEFORE using any
-encryption software, please check your country's laws, regulations and
-policies concerning the import, possession, or use, and re-export of encryption
-software, to see if this is permitted. See <http://www.wassenaar.org/> for more
-information.
-
-The U.S. Government Department of Commerce, Bureau of Industry and Security
-(BIS), has classified this software as Export Commodity Control Number (ECCN)
-5D002.C.1, which includes information security software using or performing
-cryptographic functions with asymmetric algorithms. The form and manner of this
-Apache Software Foundation distribution makes it eligible for export under the
-License Exception ENC Technology Software Unrestricted (TSU) exception (see the
-BIS Export Administration Regulations, Section 740.13) for both object code and
-source code.
-
-The following provides more details on the included cryptographic software: ...
-
-Apache Accumulo uses the built-in java cryptography libraries in its RFile
-encryption implementation. See
-http://www.oracle.com/us/products/export/export-regulations-345813.html
-for more details on Java's cryptography features. Apache Accumulo also uses
-the bouncycastle library for some cryptographic technology. See
-http://www.bouncycastle.org/wiki/display/JA1/Frequently+Asked+Questions for
-more details on bouncycastle's cryptography features.
-
-******************************************************************************
http://git-wip-us.apache.org/repos/asf/accumulo/blob/c479f874/README.md
----------------------------------------------------------------------
diff --git a/README.md b/README.md
new file mode 100644
index 0000000..15be8db
--- /dev/null
+++ b/README.md
@@ -0,0 +1,102 @@
+<!--
+Licensed to the Apache Software Foundation (ASF) under one or more
+contributor license agreements. See the NOTICE file distributed with
+this work for additional information regarding copyright ownership.
+The ASF licenses this file to You under the Apache License, Version 2.0
+(the "License"); you may not use this file except in compliance with
+the License. You may obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+-->
+
+Apache Accumulo
+===============
+
+The [Apache Accumulo™][1] sorted, distributed key/value store is a robust,
+scalable, high performance data storage and retrieval system. Apache Accumulo
+is based on Google's [BigTable][4] design and is built on top of Apache
+[Hadoop][5], [ZooKeeper][6], and [Thrift][7]. Apache Accumulo features a few
+novel improvements on the BigTable design in the form of cell-based access
+control and a server-side programming mechanism that can modify key/value pairs
+at various points in the data management process. Other notable improvements
+and features are outlined [here][8].
+
+To install and run an Accumulo binary distribution, follow the [install][2]
+instructions.
+
+Documentation
+-------------
+
+Accumulo provides the following documentation :
+
+ * **User Manual** : In-depth developer and administrator documentation.
+ * **Examples** : Code with corresponding readme files that give step by step
+ instructions for running example code.
+
+This documentation is available on the [Accumulo site][1]. In the source and
+binary distributions of Accumulo, the documentation is at different locations.
+
+In the Accumulo binary distribution, all documentation is in the `docs`
+directory. The binary distribution does not include example source code, but
+it does include a jar with the compiled examples. This examples jar makes it
+easy to step through the example readmes, after following the [install][2]
+instructions.
+
+In the Accumulo source, documentation is found at the following locations:
+
+ * [Example Source](examples/simple/src/main/java/org/apache/accumulo/examples/simple)
+ * [Example Readmes](docs/src/main/resources/examples)
+ * [User Manual Source](docs/src/main/asciidoc)
+
+Building
+--------
+
+Accumulo uses [Maven][9] to compile, [test][3], and package its source. The
+following command will build the binary tar.gz from source. Note, these
+instructions will not work for the Accumulo binary distribution as it does not
+include source.
+
+ mvn package -P assemble
+
+This command produces a file at the following location.
+
+ assemble/target/accumulo-X.Y.Z-SNAPSHOT-bin.tar.gz
+
+This will not include documentation. Adding the `-P docs` option to the Maven
+command will also build the documentation.
+
+API
+---
+
+The public Accumulo API is composed of :
+
+ * All public classes and interfaces in the org.apache.accumulo.core.client
+ package, as well as all of its subpackages excluding those named *impl*.
+ * Key, Mutation, Value, Range, Condition, and ConditionalMutation in
+ org.apache.accumulo.core.data.
+ * All public classes and interfaces in the org.apache.accumulo.minicluster
+ package, as well as all of its subpackages excluding those named *impl*.
+ * Anything with public or protected access within any class or interface that
+ is in the public API. This includes, but is not limited to: methods, member
+ classes, interfaces, and enums.
+
+The Accumulo project maintains binary compatibility across this API within a
+major release, as defined in the Java Language Specification 3rd ed. Starting
+with Accumulo 1.6.2 and 1.7.0, all API changes will follow [semver 2.0][12].
+
+[1]: http://accumulo.apache.org
+[2]: INSTALL.md
+[3]: TESTING.md
+[4]: http://research.google.com/archive/bigtable.html
+[5]: http://hadoop.apache.org
+[6]: http://zookeeper.apache.org
+[7]: http://thrift.apache.org/
+[8]: http://accumulo.apache.org/notable_features.html
+[9]: http://maven.apache.org/
+[12]: http://semver.org/spec/v2.0.0.html
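The API section of the new README.md defines the public API by package and class name. As a rough illustration only, those rules can be expressed as a small name check. This is a minimal sketch, not code from the Accumulo repository: the class `PublicApiCheck` and its method are hypothetical, the check handles only fully qualified top-level class names, and the class names in `main` are used purely as sample strings.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper: applies the README's public API rules to a class name.
public class PublicApiCheck {

  // The only classes in org.apache.accumulo.core.data that the README lists.
  private static final Set<String> DATA_CLASSES = new HashSet<>(Arrays.asList(
      "Key", "Mutation", "Value", "Range", "Condition", "ConditionalMutation"));

  // Packages whose public types (and those of their subpackages) are API.
  private static final String[] API_PACKAGES = {
      "org.apache.accumulo.core.client",
      "org.apache.accumulo.minicluster"};

  public static boolean isPublicApi(String className) {
    int lastDot = className.lastIndexOf('.');
    if (lastDot < 0) {
      return false; // no package, so it cannot match any rule
    }
    String pkg = className.substring(0, lastDot);
    String simpleName = className.substring(lastDot + 1);

    if (pkg.equals("org.apache.accumulo.core.data")) {
      return DATA_CLASSES.contains(simpleName);
    }
    for (String root : API_PACKAGES) {
      if (pkg.equals(root) || pkg.startsWith(root + ".")) {
        // Subpackages named "impl" are excluded from the public API.
        return !Arrays.asList(pkg.split("\\.")).contains("impl");
      }
    }
    return false;
  }

  public static void main(String[] args) {
    System.out.println(isPublicApi("org.apache.accumulo.core.client.Connector"));           // true
    System.out.println(isPublicApi("org.apache.accumulo.core.client.impl.TabletLocator")); // false
    System.out.println(isPublicApi("org.apache.accumulo.core.data.Mutation"));             // true
    System.out.println(isPublicApi("org.apache.accumulo.core.data.KeyExtent"));            // false
  }
}
```

The fourth rule in the README (public and protected members of API types) is not modeled here, since it depends on the members of each class rather than on its name.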
http://git-wip-us.apache.org/repos/asf/accumulo/blob/c479f874/TESTING
----------------------------------------------------------------------
diff --git a/TESTING b/TESTING
deleted file mode 100644
index cf2afba..0000000
--- a/TESTING
+++ /dev/null
@@ -1,113 +0,0 @@
-Title: Testing Apache Accumulo
-Notice: Licensed to the Apache Software Foundation (ASF) under one
- or more contributor license agreements. See the NOTICE file
- distributed with this work for additional information
- regarding copyright ownership. The ASF licenses this file
- to you under the Apache License, Version 2.0 (the
- "License"); you may not use this file except in compliance
- with the License. You may obtain a copy of the License at
- .
- http://www.apache.org/licenses/LICENSE-2.0
- .
- Unless required by applicable law or agreed to in writing,
- software distributed under the License is distributed on an
- "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
- KIND, either express or implied. See the License for the
- specific language governing permissions and limitations
- under the License.
-
-# Testing Apache Accumulo
-
-This document is meant to serve as a quick reference to the automated test suites included in Apache Accumulo for users
-to run which validate the product and developers to continue to iterate upon to ensure that the product is stable and as
-free of bugs as possible.
-
-The automated testing suite can be categorized as two sets of tests: unit tests and integration tests. These are the
-traditional unit and integrations tests as defined by the Apache Maven lifecycle phases (unit tests run at `test` and
-integration tests run at `integration-test`).
-
-# Unit tests
-
-Unit tests can be run by invoking `mvn test` at the top of the Apache Accumulo source tree; however, it is more often
-the case that these tests are automatically run by invoking `mvn package` instead. Either invocation should work
-successfully.
-
-The unit tests should run rather quickly (order of minutes for the entire project) and, in nearly all cases, do not
-require any noticable amount of computer resources (the compilation of the files typically exceeds running the tests).
-Maven will automatically generate a report for each unit test run and will give a summary at the end of each Maven
-module for the total run/failed/errored/skipped tests.
-
-The Apache Accumulo developers expect that these tests are always passing on every revision of the code. If this is not
-the case, it is almost certainly in error.
-
-# Integration tests
-
-Integration tests can be run by invoking `mvn integration-test` at the top of the Apache Accumulo source tree; however,
-like `mvn package` being recommended for unit tests, `mvn verify` is often the recommended avenue to run the integration tests.
-
-The integration tests are medium length tests (order minutes for each test class and order hours for the complete suite
-with single threaded execution) but are very encompassing of checking for regressions that were previously seen in the
-codebase. These tests do require a noticable amount of resources, at least another gigabyte of memory over what Maven
-itself requires. As such, it's recommended to have at least 3-4GB of free memory and 10GB of free disk space.
-
-Take note that when invoking the `integration-test` lifecycle phase, other functions will also be enabled which include
-static analysis (findbugs) and software license checks (release analysis tool -- RAT).
-
-## Accumulo for testing
-
-The primary reason these tests take so much longer than the unit tests is that most are using an Accumulo instance to
-perform the test. It's a necessary evil; however, there are things we can do to improve this.
-
-## MiniAccumuloCluster
-
-By default, these tests will use a MiniAccumuloCluster which is a multi-process "implementation" of Accumulo, managed
-through Java interfaces. This MiniAccumuloCluster has the ability to use the local filesystem or Apache Hadoop's
-MiniDFSCluster, as well as starting one to many tablet servers. MiniAccumuloCluster tends to be a very useful tool in
-that it can automatically provide a workable instance that mimics how an actual deployment functions.
-
-The downside of using MiniAccumuloCluster is that a significant portion of each test is now devoted to starting and
-stopping the MiniAccumuloCluster. While this is a surefire way to isolate tests from interferring with one another, it
-increases the actual runtime of the test by, on average, 10x.
-
-## Standalone Cluster
-
-An alternative to the MiniAccumuloCluster for testing, a standalone Accumulo cluster can also be configured for use by
-most tests. This requires a manual step of building and deploying the Accumulo cluster by hand. The build can then be
-configured to use this cluster instead of always starting a MiniAccumuloCluster. Not all of the integration tests are
-good candidates to run against a standalone Accumulo cluster, these tests will still launch a MiniAccumuloCluster for
-their use.
-
-Use of a standalone cluster can be enabled using system properties on the Maven command line or, more concisely, by
-providing a Java properties file on the Maven command line. The use of a properties file is recommended since it is
-typically a fixed file per standalone cluster you want to run the tests against.
-
-### Configuration
-
-The following properties can be used to configure a standalone cluster:
-
-- `accumulo.it.cluster.type`, Required: The type of cluster is being defined (valid options: MINI and STANDALONE)
-- `accumulo.it.cluster.standalone.principal`, Required: Standalone cluster principal (user)
-- `accumulo.it.cluster.standalone.password`, Required: Password for the principal
-- `accumulo.it.cluster.standalone.zookeepers`, Required: ZooKeeper quorum used by the standalone cluster
-- `accumulo.it.cluster.standalone.instance.name`, Required: Accumulo instance name for the cluster
-- `accumulo.it.cluster.standalone.home`, Optional: `ACCUMULO_HOME`
-- `accumulo.it.cluster.standalone.conf`, Optional: `ACCUMULO_CONF_DIR`
-- `accumulo.it.cluster.standalone.hadoop.conf`, Optional: `HADOOP_CONF_DIR`
-
-Each of the above properties can be set on the commandline (-Daccumulo.it.cluster.standalone.principal=root), or the
-collection can be placed into a properties file and referenced using "accumulo.it.cluster.properties". For example, the
-following might be similar to what is executed for a standalone cluster.
-
- `mvn verify -Daccumulo.it.properties=/home/user/my_cluster.properties`
-
-For the optional properties, each of them will be extracted from the environment if not explicitly provided.
-Specifically, `ACCUMULO_HOME` and `ACCUMULO_CONF_DIR` are used to ensure the correct version of the bundled
-Accumulo scripts are invoked and, in the event that multiple Accumulo processes exist on the same physical machine,
-but for different instances, the correct version is terminated. `HADOOP_CONF_DIR` is used to ensure that the necessary
-files to construct the FileSystem object for the cluster can be constructed (e.g. core-site.xml and hdfs-site.xml).
-
-# Manual Distributed Testing
-
-Apache Accumulo also contains a number of tests which are suitable for running against large clusters for hours to days
-at a time, for example the Continuous Ingest and Randomwalk test suites. These all exist in the repository under
-`test/system` and contain their own README files for configuration and use.
http://git-wip-us.apache.org/repos/asf/accumulo/blob/c479f874/TESTING.md
----------------------------------------------------------------------
diff --git a/TESTING.md b/TESTING.md
new file mode 100644
index 0000000..fc4d574
--- /dev/null
+++ b/TESTING.md
@@ -0,0 +1,114 @@
+<!--
+Licensed to the Apache Software Foundation (ASF) under one or more
+contributor license agreements. See the NOTICE file distributed with
+this work for additional information regarding copyright ownership.
+The ASF licenses this file to You under the Apache License, Version 2.0
+(the "License"); you may not use this file except in compliance with
+the License. You may obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+-->
+
+# Testing Apache Accumulo
+
+This document is a quick reference to the automated test suites included in Apache Accumulo. Users can run them to
+validate the product, and developers can continue to iterate on them to ensure that the product is stable and as free
+of bugs as possible.
+
+The automated testing suite can be categorized as two sets of tests: unit tests and integration tests. These are the
+traditional unit and integration tests as defined by the Apache Maven [lifecycle][3] phases.
+
+# Unit tests
+
+Unit tests can be run by invoking `mvn test` at the root of the Apache Accumulo source tree. For more information see
+the [maven-surefire-plugin docs][4].
+
+The unit tests should run rather quickly (order of minutes for the entire project) and, in nearly all cases, do not
+require any noticeable amount of computer resources (compiling the files typically takes longer than running the tests).
+Maven will automatically generate a report for each unit test run and will give a summary at the end of each Maven
+module for the total run/failed/errored/skipped tests.
+
+The Apache Accumulo developers expect that these tests are always passing on every revision of the code. If this is not
+the case, the failure almost certainly indicates an error.
+
+# Integration tests
+
+Integration tests can be run by invoking `mvn verify` at the root of the Apache Accumulo source tree. For more
+information see the [maven-failsafe-plugin docs][5].
+
+The integration tests are medium-length tests (on the order of minutes for each test class and hours for the complete
+suite) that check for regressions previously seen in the codebase. These tests do require a noticeable amount of
+resources, at least another gigabyte of memory over what Maven itself requires. As such, it's recommended to have at
+least 3-4GB of free memory and 10GB of free disk space.
+
+## Accumulo for testing
+
+The primary reason these tests take so much longer than the unit tests is that most are using an Accumulo instance to
+perform the test. It's a necessary evil; however, there are things we can do to improve this.
+
+## MiniAccumuloCluster
+
+By default, these tests will use a MiniAccumuloCluster which is a multi-process "implementation" of Accumulo, managed
+through Java interfaces. This MiniAccumuloCluster has the ability to use the local filesystem or Apache Hadoop's
+MiniDFSCluster, as well as starting one to many tablet servers. MiniAccumuloCluster tends to be a very useful tool in
+that it can automatically provide a workable instance that mimics how an actual deployment functions.
+
+The downside of using MiniAccumuloCluster is that a significant portion of each test is now devoted to starting and
+stopping the MiniAccumuloCluster. While this is a surefire way to isolate tests from interfering with one another, it
+increases the actual runtime of the test by, on average, 10x.
+
+## Standalone Cluster
+
+As an alternative to the MiniAccumuloCluster for testing, a standalone Accumulo cluster can also be configured for use
+by most tests. This requires a manual step of building and deploying the Accumulo cluster by hand. The build can then
+be configured to use this cluster instead of always starting a MiniAccumuloCluster. Not all of the integration tests
+are good candidates to run against a standalone Accumulo cluster; such tests will still launch a MiniAccumuloCluster
+for their use.
+
+Use of a standalone cluster can be enabled using system properties on the Maven command line or, more concisely, by
+providing a Java properties file on the Maven command line. The use of a properties file is recommended since it is
+typically a fixed file per standalone cluster you want to run the tests against.
+
+### Configuration
+
+The following properties can be used to configure a standalone cluster:
+
+- `accumulo.it.cluster.type`, Required: The type of cluster being defined (valid options: MINI and STANDALONE)
+- `accumulo.it.cluster.standalone.principal`, Required: Standalone cluster principal (user)
+- `accumulo.it.cluster.standalone.password`, Required: Password for the principal
+- `accumulo.it.cluster.standalone.zookeepers`, Required: ZooKeeper quorum used by the standalone cluster
+- `accumulo.it.cluster.standalone.instance.name`, Required: Accumulo instance name for the cluster
+- `accumulo.it.cluster.standalone.home`, Optional: `ACCUMULO_HOME`
+- `accumulo.it.cluster.standalone.conf`, Optional: `ACCUMULO_CONF_DIR`
+- `accumulo.it.cluster.standalone.hadoop.conf`, Optional: `HADOOP_CONF_DIR`
+
+Each of the above properties can be set on the command line (`-Daccumulo.it.cluster.standalone.principal=root`), or the
+collection can be placed into a properties file and referenced using `accumulo.it.properties`. For example, the
+following might be similar to what is executed for a standalone cluster.
+
+ `mvn verify -Daccumulo.it.properties=/home/user/my_cluster.properties`
+
+For the optional properties, each of them will be extracted from the environment if not explicitly provided.
+Specifically, `ACCUMULO_HOME` and `ACCUMULO_CONF_DIR` are used to ensure the correct version of the bundled
+Accumulo scripts is invoked and, in the event that multiple Accumulo processes exist on the same physical machine,
+but for different instances, the correct version is terminated. `HADOOP_CONF_DIR` is used to locate the files needed
+to construct the FileSystem object for the cluster (e.g. core-site.xml and hdfs-site.xml).
+
+# Manual Distributed Testing
+
+Apache Accumulo also contains a number of tests which are suitable for running against large clusters for hours to days
+at a time, for example the [Continuous Ingest][1] and [Randomwalk test][2] suites. These all exist in the repository under
+`test/system` and contain their own README files for configuration and use.
+
+[1]: test/system/continuous/README.md
+[2]: test/system/randomwalk/README.md
+[3]: https://maven.apache.org/guides/introduction/introduction-to-the-lifecycle.html
+[4]: http://maven.apache.org/surefire/maven-surefire-plugin/
+[5]: http://maven.apache.org/surefire/maven-failsafe-plugin/
+
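The Configuration section of the new TESTING.md lists five required properties for a standalone cluster. Before passing a properties file to Maven, it may help to check that none of them is missing. The following is a minimal sketch under stated assumptions: the class `StandaloneClusterProps` is hypothetical and not part of Accumulo, the property values in `main` are made-up samples, and the required keys are copied from the list in TESTING.md.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

// Hypothetical helper: reports which required standalone-cluster properties are absent.
public class StandaloneClusterProps {

  // Keys marked "Required" in TESTING.md.
  static final String[] REQUIRED = {
      "accumulo.it.cluster.type",
      "accumulo.it.cluster.standalone.principal",
      "accumulo.it.cluster.standalone.password",
      "accumulo.it.cluster.standalone.zookeepers",
      "accumulo.it.cluster.standalone.instance.name"};

  // Returns the required keys that are missing or blank, in declaration order.
  public static List<String> missingRequired(Properties props) {
    List<String> missing = new ArrayList<>();
    for (String key : REQUIRED) {
      String value = props.getProperty(key);
      if (value == null || value.trim().isEmpty()) {
        missing.add(key);
      }
    }
    return missing;
  }

  public static void main(String[] args) throws IOException {
    // Sample file contents; every value here is an illustrative assumption.
    String example =
        "accumulo.it.cluster.type=STANDALONE\n"
            + "accumulo.it.cluster.standalone.principal=root\n"
            + "accumulo.it.cluster.standalone.zookeepers=localhost:2181\n"
            + "accumulo.it.cluster.standalone.instance.name=test\n";

    Properties props = new Properties();
    props.load(new StringReader(example));

    // The password line was left out on purpose.
    System.out.println(missingRequired(props)); // [accumulo.it.cluster.standalone.password]
  }
}
```

The optional properties are not checked because, as TESTING.md notes, they fall back to the environment when not provided.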
http://git-wip-us.apache.org/repos/asf/accumulo/blob/c479f874/assemble/src/main/assemblies/component.xml
----------------------------------------------------------------------
diff --git a/assemble/src/main/assemblies/component.xml b/assemble/src/main/assemblies/component.xml
index 3f18da3..8dfce4d 100644
--- a/assemble/src/main/assemblies/component.xml
+++ b/assemble/src/main/assemblies/component.xml
@@ -248,7 +248,9 @@
<include>CHANGES</include>
<include>LICENSE</include>
<include>NOTICE</include>
- <include>README</include>
+ <include>README.md</include>
+ <include>INSTALL.md</include>
+ <include>BUILD.md</include>
</includes>
</fileSet>
</fileSets>