You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@accumulo.apache.org by ct...@apache.org on 2013/12/13 23:12:55 UTC
[1/2] git commit: ACCUMULO-2011 Escape unescaped underscores to fix
LaTeX
Updated Branches:
refs/heads/1.5.1-SNAPSHOT 5fa1a75e3 -> b65bdd60c
ACCUMULO-2011 Escape unescaped underscores to fix LaTeX
Project: http://git-wip-us.apache.org/repos/asf/accumulo/repo
Commit: http://git-wip-us.apache.org/repos/asf/accumulo/commit/07a0433b
Tree: http://git-wip-us.apache.org/repos/asf/accumulo/tree/07a0433b
Diff: http://git-wip-us.apache.org/repos/asf/accumulo/diff/07a0433b
Branch: refs/heads/1.5.1-SNAPSHOT
Commit: 07a0433bd73295d733bc3430db64de86b4088ea3
Parents: 72106a6
Author: Christopher Tubbs <ct...@apache.org>
Authored: Fri Dec 13 16:05:46 2013 -0500
Committer: Christopher Tubbs <ct...@apache.org>
Committed: Fri Dec 13 16:05:46 2013 -0500
----------------------------------------------------------------------
docs/src/user_manual/chapters/administration.tex | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)
----------------------------------------------------------------------
http://git-wip-us.apache.org/repos/asf/accumulo/blob/07a0433b/docs/src/user_manual/chapters/administration.tex
----------------------------------------------------------------------
diff --git a/docs/src/user_manual/chapters/administration.tex b/docs/src/user_manual/chapters/administration.tex
index d0533d9..4723481 100644
--- a/docs/src/user_manual/chapters/administration.tex
+++ b/docs/src/user_manual/chapters/administration.tex
@@ -186,14 +186,14 @@ take some time for particular configurations.
\subsection{Adding a Node}
-Update your \texttt{\$ACCUMULO_HOME/conf/slaves} (or \texttt{\$ACCUMULO_CONF_DIR/slaves}) file to account for the addition.
+Update your \texttt{\$ACCUMULO\_HOME/conf/slaves} (or \texttt{\$ACCUMULO\_CONF\_DIR/slaves}) file to account for the addition.
\begin{verbatim}
$ACCUMULO_HOME/bin/accumulo admin start <host(s)> {<host> ...}
\end{verbatim}
Alternatively, you can ssh to each of the hosts you want to add and run
-\texttt{\$ACCUMULO_HOME/bin/start-here.sh}.
+\texttt{\$ACCUMULO\_HOME/bin/start-here.sh}.
Make sure the host in question has the new configuration, or else the tablet
server won't start; at a minimum this needs to be on the host(s) being added,
@@ -209,9 +209,9 @@ $ACCUMULO_HOME/bin/accumulo admin stop <host(s)> {<host> ...}
\end{verbatim}
Alternatively, you can ssh to each of the hosts you want to remove and run
-\texttt{\$ACCUMULO_HOME/bin/stop-here.sh}.
+\texttt{\$ACCUMULO\_HOME/bin/stop-here.sh}.
-Be sure to update your \texttt{\$ACCUMULO_HOME/conf/slaves} (or \texttt{\$ACCUMULO_CONF_DIR/slaves}) file to
+Be sure to update your \texttt{\$ACCUMUL\_HOME/conf/slaves} (or \texttt{\$ACCUMULO\_CONF\_DIR/slaves}) file to
account for the removal of these hosts. Bear in mind that the monitor will not re-read the
slaves file automatically, so it will report the decomissioned servers as down; it's
recommended that you restart the monitor so that the node list is up to date.
[2/2] git commit: Merge remote-tracking branch
'origin/1.4.5-SNAPSHOT' into 1.5.1-SNAPSHOT
Posted by ct...@apache.org.
Merge remote-tracking branch 'origin/1.4.5-SNAPSHOT' into 1.5.1-SNAPSHOT
Project: http://git-wip-us.apache.org/repos/asf/accumulo/repo
Commit: http://git-wip-us.apache.org/repos/asf/accumulo/commit/b65bdd60
Tree: http://git-wip-us.apache.org/repos/asf/accumulo/tree/b65bdd60
Diff: http://git-wip-us.apache.org/repos/asf/accumulo/diff/b65bdd60
Branch: refs/heads/1.5.1-SNAPSHOT
Commit: b65bdd60cd928e05da91d0b22658fac4c638624f
Parents: 5fa1a75 07a0433
Author: Christopher Tubbs <ct...@apache.org>
Authored: Fri Dec 13 16:46:04 2013 -0500
Committer: Christopher Tubbs <ct...@apache.org>
Committed: Fri Dec 13 16:46:04 2013 -0500
----------------------------------------------------------------------
.../latex/accumulo_user_manual/chapters/administration.tex | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)
----------------------------------------------------------------------
http://git-wip-us.apache.org/repos/asf/accumulo/blob/b65bdd60/docs/src/main/latex/accumulo_user_manual/chapters/administration.tex
----------------------------------------------------------------------
diff --cc docs/src/main/latex/accumulo_user_manual/chapters/administration.tex
index 086f41c,0000000..204f549
mode 100644,000000..100644
--- a/docs/src/main/latex/accumulo_user_manual/chapters/administration.tex
+++ b/docs/src/main/latex/accumulo_user_manual/chapters/administration.tex
@@@ -1,394 -1,0 +1,394 @@@
+
+% Licensed to the Apache Software Foundation (ASF) under one or more
+% contributor license agreements. See the NOTICE file distributed with
+% this work for additional information regarding copyright ownership.
+% The ASF licenses this file to You under the Apache License, Version 2.0
+% (the "License"); you may not use this file except in compliance with
+% the License. You may obtain a copy of the License at
+%
+% http://www.apache.org/licenses/LICENSE-2.0
+%
+% Unless required by applicable law or agreed to in writing, software
+% distributed under the License is distributed on an "AS IS" BASIS,
+% WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+% See the License for the specific language governing permissions and
+% limitations under the License.
+
+\chapter{Administration}
+
+\section{Hardware}
+
+Because we are running essentially two or three systems simultaneously layered
+across the cluster: HDFS, Accumulo and MapReduce, it is typical for hardware to
+consist of 4 to 8 cores, and 8 to 32 GB RAM. This is so each running process can have
+at least one core and 2 - 4 GB each.
+
+One core running HDFS can typically keep 2 to 4 disks busy, so each machine may
+typically have as little as 2 x 300GB disks and as much as 4 x 1TB or 2TB disks.
+
+It is possible to do with less than this, such as with 1u servers with 2 cores and 4GB
+each, but in this case it is recommended to only run up to two processes per
+machine - i.e. DataNode and TabletServer or DataNode and MapReduce worker but
+not all three. The constraint here is having enough available heap space for all the
+processes on a machine.
+
+\section{Network}
+
+Accumulo communicates via remote procedure calls over TCP/IP for both passing
+data and control messages. In addition, Accumulo uses HDFS clients to
+communicate with HDFS. To achieve good ingest and query performance, sufficient
+network bandwidth must be available between any two machines.
+
+\section{Installation}
+Choose a directory for the Accumulo installation. This directory will be referenced
+by the environment variable \texttt{\$ACCUMULO\_HOME}. Run the following:
+
+\small
+\begin{verbatim}
+$ tar xzf accumulo-1.5.0-bin.tar.gz # unpack to subdirectory
+$ mv accumulo-1.5.0 $ACCUMULO_HOME # move to desired location
+\end{verbatim}
+\normalsize
+
+Repeat this step at each machine within the cluster. Usually all machines have the
+same \texttt{\$ACCUMULO\_HOME}.
+
+\section{Dependencies}
+Accumulo requires HDFS and ZooKeeper to be configured and running
+before starting. Password-less SSH should be configured between at least the
+Accumulo master and TabletServer machines. It is also a good idea to run Network
+Time Protocol (NTP) within the cluster to ensure nodes' clocks don't get too out of
+sync, which can cause problems with automatically timestamped data.
+
+\section{Configuration}
+
+Accumulo is configured by editing several Shell and XML files found in
+\texttt{\$ACCUMULO\_HOME/conf}. The structure closely resembles Hadoop's configuration
+files.
+
+\subsection{Edit conf/accumulo-env.sh}
+
+Accumulo needs to know where to find the software it depends on. Edit accumulo-env.sh
+and specify the following:
+
+\begin{enumerate}
+\item{Enter the location of the installation directory of Accumulo for \texttt{\$ACCUMULO\_HOME}}
+\item{Enter your system's Java home for \texttt{\$JAVA\_HOME}}
+\item{Enter the location of Hadoop for \texttt{\$HADOOP\_PREFIX}}
+\item{Choose a location for Accumulo logs and enter it for \texttt{\$ACCUMULO\_LOG\_DIR}}
+\item{Enter the location of ZooKeeper for \texttt{\$ZOOKEEPER\_HOME}}
+\end{enumerate}
+
+By default Accumulo TabletServers are set to use 1GB of memory. You may change
+this by altering the value of \texttt{\$ACCUMULO\_TSERVER\_OPTS}. Note the syntax is that of
+the Java JVM command line options. This value should be less than the physical
+memory of the machines running TabletServers.
+
+There are similar options for the master's memory usage and the garbage collector
+process. Reduce these if they exceed the physical RAM of your hardware and
+increase them, within the bounds of the physical RAM, if a process fails because of
+insufficient memory.
+
+Note that you will be specifying the Java heap space in accumulo-env.sh. You should
+make sure that the total heap space used for the Accumulo tserver and the Hadoop
+DataNode and TaskTracker is less than the available memory on each slave node in
+the cluster. On large clusters, it is recommended that the Accumulo master, Hadoop
+NameNode, secondary NameNode, and Hadoop JobTracker all be run on separate
+machines to allow them to use more heap space. If you are running these on the
+same machine on a small cluster, likewise make sure their heap space settings fit
+within the available memory.
+
+\subsection{Native Map}
+
+The tablet server uses a data structure called a MemTable to store sorted key/value
+pairs in memory when they are first received from the client. When a minor compaction
+occurs, this data structure is written to HDFS. The MemTable will default to using
+memory in the JVM but a JNI version, called the native map, can be used to significantly
+speed up performance by utilizing the memory space of the native operating system. The
+native map also avoids the performance implications brought on by garbage collection
+in the JVM by causing it to pause much less frequently.
+
+32-bit and 64-bit Linux versions of the native map ship with the Accumulo dist package.
+For other operating systems, the native map can be built from the codebase in two ways-
+from maven or from the Makefile.
+
+\begin{enumerate}
+\item{Build from maven using the following command: \texttt{mvn clean package -Pnative.}}
+\item{Build from the c++ source by running \texttt{make} in the \texttt{\$ACCUMULO\_HOME/server/src/main/c++} directory.}
+\end{enumerate}
+
+After building the native map from the source, you will find the artifact in
+\texttt{\$ACCUMULO\_HOME/lib/native.} Upon starting up, the tablet server will look
+in this directory for the map library. If the file is renamed or moved from its
+target directory, the tablet server may not be able to find it.
+
+\subsection{Cluster Specification}
+
+On the machine that will serve as the Accumulo master:
+
+\begin{enumerate}
+\item{Write the IP address or domain name of the Accumulo Master to the\\\texttt{\$ACCUMULO\_HOME/conf/masters} file.}
+\item{Write the IP addresses or domain name of the machines that will be TabletServers in\\\texttt{\$ACCUMULO\_HOME/conf/slaves}, one per line.}
+\end{enumerate}
+
+Note that if using domain names rather than IP addresses, DNS must be configured
+properly for all machines participating in the cluster. DNS can be a confusing source
+of errors.
+
+\subsection{Accumulo Settings}
+Specify appropriate values for the following settings in\\
+\texttt{\$ACCUMULO\_HOME/conf/accumulo-site.xml} :
+
+\small
+\begin{verbatim}
+<property>
+ <name>zookeeper</name>
+ <value>zooserver-one:2181,zooserver-two:2181</value>
+ <description>list of zookeeper servers</description>
+</property>
+\end{verbatim}
+\normalsize
+
+This enables Accumulo to find ZooKeeper. Accumulo uses ZooKeeper to coordinate
+settings between processes and helps finalize TabletServer failure.
+
+
+\small
+\begin{verbatim}
+<property>
+ <name>instance.secret</name>
+ <value>DEFAULT</value>
+</property>
+\end{verbatim}
+\normalsize
+
+The instance needs a secret to enable secure communication between servers. Configure your
+secret and make sure that the \texttt{accumulo-site.xml} file is not readable to other users.
+
+Some settings can be modified via the Accumulo shell and take effect immediately, but
+some settings require a process restart to take effect. See the configuration documentation
+(available on the monitor web pages) for details.
+
+\subsection{Deploy Configuration}
+
+Copy the masters, slaves, accumulo-env.sh, and if necessary, accumulo-site.xml
+from the\\\texttt{\$ACCUMULO\_HOME/conf/} directory on the master to all the machines
+specified in the slaves file.
+
+\section{Initialization}
+
+Accumulo must be initialized to create the structures it uses internally to locate
+data across the cluster. HDFS is required to be configured and running before
+Accumulo can be initialized.
+
+Once HDFS is started, initialization can be performed by executing\\
+\texttt{\$ACCUMULO\_HOME/bin/accumulo init} . This script will prompt for a name
+for this instance of Accumulo. The instance name is used to identify a set of tables
+and instance-specific settings. The script will then write some information into
+HDFS so Accumulo can start properly.
+
+The initialization script will prompt you to set a root password. Once Accumulo is
+initialized it can be started.
+
+\section{Running}
+
+\subsection{Starting Accumulo}
+
+Make sure Hadoop is configured on all of the machines in the cluster, including
+access to a shared HDFS instance. Make sure HDFS and ZooKeeper are running.
+Make sure ZooKeeper is configured and running on at least one machine in the
+cluster.
+Start Accumulo using the \texttt{bin/start-all.sh} script.
+
+To verify that Accumulo is running, check the Status page as described under
+\emph{Monitoring}. In addition, the Shell can provide some information about the status of
+tables via reading the !METADATA table.
+
+\subsection{Stopping Accumulo}
+
+To shutdown cleanly, run \texttt{bin/stop-all.sh} and the master will orchestrate the
+shutdown of all the tablet servers. Shutdown waits for all minor compactions to finish, so it may
+take some time for particular configurations.
+
+\subsection{Adding a Node}
+
- Update your \texttt{\$ACCUMULO_HOME/conf/slaves} (or \texttt{\$ACCUMULO_CONF_DIR/slaves}) file to account for the addition.
++Update your \texttt{\$ACCUMULO\_HOME/conf/slaves} (or \texttt{\$ACCUMULO\_CONF\_DIR/slaves}) file to account for the addition.
+
+\begin{verbatim}
+$ACCUMULO_HOME/bin/accumulo admin start <host(s)> {<host> ...}
+\end{verbatim}
+
+Alternatively, you can ssh to each of the hosts you want to add and run
- \texttt{\$ACCUMULO_HOME/bin/start-here.sh}.
++\texttt{\$ACCUMULO\_HOME/bin/start-here.sh}.
+
+Make sure the host in question has the new configuration, or else the tablet
+server won't start; at a minimum this needs to be on the host(s) being added,
+but in practice it's good to ensure consistent configuration across all nodes.
+
+\subsection{Decomissioning a Node}
+
+If you need to take a node out of operation, you can trigger a graceful shutdown of a tablet
+server. Accumulo will automatically rebalance the tablets across the available tablet servers.
+
+\begin{verbatim}
+$ACCUMULO_HOME/bin/accumulo admin stop <host(s)> {<host> ...}
+\end{verbatim}
+
+Alternatively, you can ssh to each of the hosts you want to remove and run
- \texttt{\$ACCUMULO_HOME/bin/stop-here.sh}.
++\texttt{\$ACCUMULO\_HOME/bin/stop-here.sh}.
+
- Be sure to update your \texttt{\$ACCUMULO_HOME/conf/slaves} (or \texttt{\$ACCUMULO_CONF_DIR/slaves}) file to
++Be sure to update your \texttt{\$ACCUMUL\_HOME/conf/slaves} (or \texttt{\$ACCUMULO\_CONF\_DIR/slaves}) file to
+account for the removal of these hosts. Bear in mind that the monitor will not re-read the
+slaves file automatically, so it will report the decomissioned servers as down; it's
+recommended that you restart the monitor so that the node list is up to date.
+
+\section{Monitoring}
+
+The Accumulo Master provides an interface for monitoring the status and health of
+Accumulo components. This interface can be accessed by pointing a web browser to\\
+\texttt{http://accumulomaster:50095/status}
+
+\section{Tracing}
+It can be difficult to determine why some operations are taking longer
+than expected. For example, you may be looking up items with very low
+latency, but sometimes the lookups take much longer. Determining the
+cause of the delay is difficult because the system is distributed, and
+the typical lookup is fast.
+
+Accumulo has been instrumented to record the time that various
+operations take when tracing is turned on. The fact that tracing is
+enabled follows all the requests made on behalf of the user throughout
+the distributed infrastructure of accumulo, and across all threads of
+execution.
+
+These time spans will be inserted into the \texttt{trace} table in
+Accumulo. You can browse recent traces from the Accumulo monitor
+page. You can also read the \texttt{trace} table directly like any
+other table.
+
+The design of Accumulo's distributed tracing follows that of
+\href{http://research.google.com/pubs/pub36356.html}{Google's Dapper}.
+
+\subsection{Tracers}
+To collect traces, Accumulo needs at least one server listed in
+\\\texttt{\$ACCUMULO\_HOME/conf/tracers}. The server collects traces
+from clients and writes them to the \texttt{trace} table. The Accumulo
+user that the tracer connects to Accumulo with can be configured with
+the following properties
+
+\begin{verbatim}
+trace.user
+trace.token.property.password
+\end{verbatim}
+
+\subsection{Instrumenting a Client}
+Tracing can be used to measure a client operation, such as a scan, as
+the operation traverses the distributed system. To enable tracing for
+your application call
+
+\begin{verbatim}
+DistributedTrace.enable(instance, new ZooReader(instance), hostname, "myApplication");
+\end{verbatim}
+
+Once tracing has been enabled, a client can wrap an operation in a trace.
+
+\begin{verbatim}
+Trace.on("Client Scan");
+BatchScanner scanner = conn.createBatchScanner(...);
+// Configure your scanner
+for (Entry entry : scanner) {
+}
+Trace.off();
+\end{verbatim}
+
+Additionally, the user can create additional Spans within a Trace.
+\begin{verbatim}
+Trace.on("Client Update");
+...
+Span readSpan = Trace.start("Read");
+...
+readSpan.stop();
+...
+Span writeSpan = Trace.start("Write");
+...
+writeSpan.stop();
+Trace.off();
+\end{verbatim}
+
+Like Dapper, Accumulo tracing supports user defined annotations to associate additional data with a Trace.
+\begin{verbatim}
+...
+int numberOfEntriesRead = 0;
+Span readSpan = Trace.start("Read");
+// Do the read, update the counter
+...
+readSpan.data("Number of Entries Read", String.valueOf(numberOfEntriesRead));
+\end{verbatim}
+
+Some client operations may have a high volume within your
+application. As such, you may wish to only sample a percentage of
+operations for tracing. As seen below, the CountSampler can be used to
+help enable tracing for 1-in-1000 operations
+\begin{verbatim}
+Sampler sampler = new CountSampler(1000);
+...
+if (sampler.next()) {
+ Trace.on("Read");
+}
+...
+Trace.offNoFlush();
+\end{verbatim}
+
+It should be noted that it is safe to turn off tracing even if it
+isn't currently active. The Trace.offNoFlush() should be used if the
+user does not wish to have Trace.off() block while flushing trace
+data.
+
+\subsection{Viewing Collected Traces}
+To view collected traces, use the "Recent Traces" link on the Monitor
+UI. You can also programmatically access and print traces using the
+\texttt{TraceDump} class.
+
+\subsection{Tracing from the Shell}
+You can enable tracing for operations run from the shell by using the
+\texttt{trace on} and \texttt{trace off} commands.
+
+\begin{verbatim}
+root@test test> trace on
+root@test test> scan
+a b:c [] d
+root@test test> trace off
+Waiting for trace information
+Waiting for trace information
+Trace started at 2013/08/26 13:24:08.332
+Time Start Service@Location Name
+ 3628+0 shell@localhost shell:root
+ 8+1690 shell@localhost scan
+ 7+1691 shell@localhost scan:location
+ 6+1692 tserver@localhost startScan
+ 5+1692 tserver@localhost tablet read ahead 6
+\end{verbatim}
+
+\section{Logging}
+Accumulo processes each write to a set of log files. By default these are found under\\
+\texttt{\$ACCUMULO/logs/}.
+
+\section{Recovery}
+
+In the event of TabletServer failure or error on shutting Accumulo down, some
+mutations may not have been minor compacted to HDFS properly. In this case,
+Accumulo will automatically reapply such mutations from the write-ahead log
+either when the tablets from the failed server are reassigned by the Master, in the
+case of a single TabletServer failure or the next time Accumulo starts, in the event of
+failure during shutdown.
+
+Recovery is performed by asking a tablet server to sort the logs so that tablets can easily find their missing
+updates. The sort status of each file is displayed on
+Accumulo monitor status page. Once the recovery is complete any
+tablets involved should return to an ``online" state. Until then those tablets will be
+unavailable to clients.
+
+The Accumulo client library is configured to retry failed mutations and in many
+cases clients will be able to continue processing after the recovery process without
+throwing an exception.
+