You are viewing a plain text version of this content. The canonical link for it is here.

Posted to commits@accumulo.apache.org by el...@apache.org on 2014/12/08 04:09:38 UTC

[1/3] accumulo git commit: ACCUMULO-3308 Expand docs on configuration of IMM.

Repository: accumulo
Updated Branches:
  refs/heads/1.6 a3267d3e7 -> 359f851e5
  refs/heads/master 53991fe7b -> 7580952a7


ACCUMULO-3308 Expand docs on configuration of IMM.

Expand on how the minc threshold, IMM size and WAL size
must be altered together. Additionally, mention how
the HDFS block size for the WAL should be increased to 2GB
when the WAL size is increased over 2GB.


Project: http://git-wip-us.apache.org/repos/asf/accumulo/repo
Commit: http://git-wip-us.apache.org/repos/asf/accumulo/commit/359f851e
Tree: http://git-wip-us.apache.org/repos/asf/accumulo/tree/359f851e
Diff: http://git-wip-us.apache.org/repos/asf/accumulo/diff/359f851e

Branch: refs/heads/1.6
Commit: 359f851e57e0a5765352988e2db5c495ffe8d552
Parents: a3267d3
Author: Josh Elser <el...@apache.org>
Authored: Sun Dec 7 21:44:54 2014 -0500
Committer: Josh Elser <el...@apache.org>
Committed: Sun Dec 7 21:44:54 2014 -0500

----------------------------------------------------------------------
 .../chapters/administration.tex                 | 30 ++++++++++++++++++++
 1 file changed, 30 insertions(+)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/accumulo/blob/359f851e/docs/src/main/latex/accumulo_user_manual/chapters/administration.tex
----------------------------------------------------------------------
diff --git a/docs/src/main/latex/accumulo_user_manual/chapters/administration.tex b/docs/src/main/latex/accumulo_user_manual/chapters/administration.tex
index d524def..e326cb0 100644
--- a/docs/src/main/latex/accumulo_user_manual/chapters/administration.tex
+++ b/docs/src/main/latex/accumulo_user_manual/chapters/administration.tex
@@ -143,6 +143,8 @@ speed up performance by utilizing the memory space of the native operating syste
 native map also avoids the performance implications brought on by garbage collection
 in the JVM by causing it to pause much less frequently.
 
+\subsubsection{Building}
+
 32-bit and 64-bit Linux and Mac OS X versions of the native map can be built
 from the Accumulo bin package by executing
 \texttt{\$ACCUMULO\_HOME/bin/build\_native\_library.sh}. If your system's
@@ -164,6 +166,34 @@ target directory, the tablet server may not be able to find it. The system can
 also locate the native maps shared library by setting LD\_LIBRARY\_PATH
 (or DYLD\_LIBRARY\_PATH on Mac OS X) in \texttt{\$ACCUMULO\_HOME/conf/accumulo-env.sh}.
 
+\subsubsection{Configuration}
+
+As mentioned, Accumulo will use the native libraries if they are found in the expected
+location and if it is not configured to ignore them. Using the native maps over JVM
+Maps nets a noticable improvement in ingest rates; however, certain configuration
+variables are important to modify when increasing the size of the native map.
+
+To adjust the size of the native map, increase the value of \texttt{tserver.memory.maps.max}.
+By default, the maximum size of the native map is 1GB. When increasing this value, it is
+also important to adjust the values of \texttt{table.compaction.minor.logs.threshold} and
+\texttt{tserver.walog.max.size}. \texttt{table.compaction.minor.logs.threshold} is the maximum
+number of write-ahead log files that a tablet can reference before they will be automatically
+minor compacted. \texttt{tserver.walog.max.size} is the maximum size of a write-ahead log.
+
+The maximum size of the native maps for a server should be less than the product
+of the write-ahead log maximum size and minor compaction threshold for log files:
+
+\(\$table.compaction.minor.logs.threshold * \$tserver.walog.max.size >= \$tserver.memory.maps.max\)
+
+This formula ensures that minor compactions won't be automatically triggered before the native
+maps can be completely saturated.
+
+Subsequently, when increasing the size of the write-ahead logs, it can also be important
+to increase the HDFS block size that Accumulo uses when creating the files for the write-ahead log.
+This is controlled via \texttt{tserver.wal.blocksize}. A basic recommendation is that when
+\texttt{tserver.walog.max.size} is larger than 2GB in size, set \texttt{tserver.wal.blocksize}
+to 2GB.
+
 \subsection{Cluster Specification}
 
 On the machine that will serve as the Accumulo master:

[3/3] accumulo git commit: Merge branch '1.6'

Posted by el...@apache.org.

Merge branch '1.6'

Conflicts:
	docs/src/main/latex/accumulo_user_manual/chapters/administration.tex


Project: http://git-wip-us.apache.org/repos/asf/accumulo/repo
Commit: http://git-wip-us.apache.org/repos/asf/accumulo/commit/7580952a
Tree: http://git-wip-us.apache.org/repos/asf/accumulo/tree/7580952a
Diff: http://git-wip-us.apache.org/repos/asf/accumulo/diff/7580952a

Branch: refs/heads/master
Commit: 7580952a783a5c585a638c5abf811be3131d62d4
Parents: 53991fe 359f851
Author: Josh Elser <el...@apache.org>
Authored: Sun Dec 7 22:08:17 2014 -0500
Committer: Josh Elser <el...@apache.org>
Committed: Sun Dec 7 22:08:17 2014 -0500

----------------------------------------------------------------------
 .../main/asciidoc/chapters/administration.txt   | 29 ++++++++++++++++++++
 1 file changed, 29 insertions(+)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/accumulo/blob/7580952a/docs/src/main/asciidoc/chapters/administration.txt
----------------------------------------------------------------------
diff --cc docs/src/main/asciidoc/chapters/administration.txt
index bf0b0d1,0000000..e0e1535
mode 100644,000000..100644
--- a/docs/src/main/asciidoc/chapters/administration.txt
+++ b/docs/src/main/asciidoc/chapters/administration.txt
@@@ -1,602 -1,0 +1,631 @@@
 +// Licensed to the Apache Software Foundation (ASF) under one or more
 +// contributor license agreements.  See the NOTICE file distributed with
 +// this work for additional information regarding copyright ownership.
 +// The ASF licenses this file to You under the Apache License, Version 2.0
 +// (the "License"); you may not use this file except in compliance with
 +// the License.  You may obtain a copy of the License at
 +//
 +//     http://www.apache.org/licenses/LICENSE-2.0
 +//
 +// Unless required by applicable law or agreed to in writing, software
 +// distributed under the License is distributed on an "AS IS" BASIS,
 +// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 +// See the License for the specific language governing permissions and
 +// limitations under the License.
 +
 +== Administration
 +
 +=== Hardware
 +
 +Because we are running essentially two or three systems simultaneously layered
 +across the cluster: HDFS, Accumulo and MapReduce, it is typical for hardware to
 +consist of 4 to 8 cores, and 8 to 32 GB RAM. This is so each running process can have
 +at least one core and 2 - 4 GB each.
 +
 +One core running HDFS can typically keep 2 to 4 disks busy, so each machine may
 +typically have as little as 2 x 300GB disks and as much as 4 x 1TB or 2TB disks.
 +
 +It is possible to do with less than this, such as with 1u servers with 2 cores and 4GB
 +each, but in this case it is recommended to only run up to two processes per
 +machine -- i.e. DataNode and TabletServer or DataNode and MapReduce worker but
 +not all three. The constraint here is having enough available heap space for all the
 +processes on a machine.
 +
 +=== Network
 +
 +Accumulo communicates via remote procedure calls over TCP/IP for both passing
 +data and control messages. In addition, Accumulo uses HDFS clients to
 +communicate with HDFS. To achieve good ingest and query performance, sufficient
 +network bandwidth must be available between any two machines.
 +
 +In addition to needing access to ports associated with HDFS and ZooKeeper, Accumulo will
 +use the following default ports. Please make sure that they are open, or change
 +their value in conf/accumulo-site.xml.
 +
 +.Accumulo default ports
 +[width="75%",cols=">,^2,^2"]
 +[options="header"]
 +|====
 +|Port | Description | Property Name
 +|4445 | Shutdown Port (Accumulo MiniCluster) | n/a
 +|4560 | Accumulo monitor (for centralized log display) | monitor.port.log4j
 +|9997 | Tablet Server | tserver.port.client
 +|9999 | Master Server | master.port.client
 +|12234 | Accumulo Tracer | trace.port.client
 +|42424 | Accumulo Proxy Server | n/a
 +|50091 | Accumulo GC | gc.port.client
 +|50095 | Accumulo HTTP monitor | monitor.port.client
 +|10001 | Master Replication service | master.replication.coordinator.port
 +|10002 | TabletServer Replication service | replication.receipt.service.port
 +|====
 +
 +In addition, the user can provide +0+ and an ephemeral port will be chosen instead. This
 +ephemeral port is likely to be unique and not already bound. Thus, configuring ports to
 +use +0+ instead of an explicit value, should, in most cases, work around any issues of
 +running multiple distinct Accumulo instances (or any other process which tries to use the
 +same default ports) on the same hardware.
 +
 +=== Installation
 +Choose a directory for the Accumulo installation. This directory will be referenced
 +by the environment variable +$ACCUMULO_HOME+. Run the following:
 +
 +  $ tar xzf accumulo-1.6.0-bin.tar.gz    # unpack to subdirectory
 +  $ mv accumulo-1.6.0 $ACCUMULO_HOME # move to desired location
 +
 +Repeat this step at each machine within the cluster. Usually all machines have the
 +same +$ACCUMULO_HOME+.
 +
 +=== Dependencies
 +Accumulo requires HDFS and ZooKeeper to be configured and running
 +before starting. Password-less SSH should be configured between at least the
 +Accumulo master and TabletServer machines. It is also a good idea to run Network
 +Time Protocol (NTP) within the cluster to ensure nodes' clocks don't get too out of
 +sync, which can cause problems with automatically timestamped data.
 +
 +=== Configuration
 +
 +Accumulo is configured by editing several Shell and XML files found in
 ++$ACCUMULO_HOME/conf+. The structure closely resembles Hadoop's configuration
 +files.
 +
 +==== Edit conf/accumulo-env.sh
 +
 +Accumulo needs to know where to find the software it depends on. Edit accumulo-env.sh
 +and specify the following:
 +
 +. Enter the location of the installation directory of Accumulo for +$ACCUMULO_HOME+
 +. Enter your system's Java home for +$JAVA_HOME+
 +. Enter the location of Hadoop for +$HADOOP_PREFIX+
 +. Choose a location for Accumulo logs and enter it for +$ACCUMULO_LOG_DIR+
 +. Enter the location of ZooKeeper for +$ZOOKEEPER_HOME+
 +
 +By default Accumulo TabletServers are set to use 1GB of memory. You may change
 +this by altering the value of +$ACCUMULO_TSERVER_OPTS+. Note the syntax is that of
 +the Java JVM command line options. This value should be less than the physical
 +memory of the machines running TabletServers.
 +
 +There are similar options for the master's memory usage and the garbage collector
 +process. Reduce these if they exceed the physical RAM of your hardware and
 +increase them, within the bounds of the physical RAM, if a process fails because of
 +insufficient memory.
 +
 +Note that you will be specifying the Java heap space in accumulo-env.sh. You should
 +make sure that the total heap space used for the Accumulo tserver and the Hadoop
 +DataNode and TaskTracker is less than the available memory on each slave node in
 +the cluster. On large clusters, it is recommended that the Accumulo master, Hadoop
 +NameNode, secondary NameNode, and Hadoop JobTracker all be run on separate
 +machines to allow them to use more heap space. If you are running these on the
 +same machine on a small cluster, likewise make sure their heap space settings fit
 +within the available memory.
 +
 +==== Native Map
 +
 +The tablet server uses a data structure called a MemTable to store sorted key/value
 +pairs in memory when they are first received from the client. When a minor compaction
 +occurs, this data structure is written to HDFS. The MemTable will default to using
 +memory in the JVM but a JNI version, called the native map, can be used to significantly
 +speed up performance by utilizing the memory space of the native operating system. The
 +native map also avoids the performance implications brought on by garbage collection
 +in the JVM by causing it to pause much less frequently.
 +
++===== Building
++
 +32-bit and 64-bit Linux and Mac OS X versions of the native map can be built
 +from the Accumulo bin package by executing
 ++$ACCUMULO_HOME/bin/build_native_library.sh+. If your system's
 +default compiler options are insufficient, you can add additional compiler
 +options to the command line, such as options for the architecture. These will be
 +passed to the Makefile in the environment variable +USERFLAGS+.
 +
 +Examples:
 +
 +. +$ACCUMULO_HOME/bin/build_native_library.sh+
 +. +$ACCUMULO_HOME/bin/build_native_library.sh -m32+
 +
 +After building the native map from the source, you will find the artifact in
 ++$ACCUMULO_HOME/lib/native+. Upon starting up, the tablet server will look
 +in this directory for the map library. If the file is renamed or moved from its
 +target directory, the tablet server may not be able to find it. The system can
 +also locate the native maps shared library by setting +LD_LIBRARY_PATH+
 +(or +DYLD_LIBRARY_PATH+ on Mac OS X) in +$ACCUMULO_HOME/conf/accumulo-env.sh+.
 +
++===== Configuration
++
++As mentioned, Accumulo will use the native libraries if they are found in the expected
++location and if it is not configured to ignore them. Using the native maps over JVM
++Maps nets a noticable improvement in ingest rates; however, certain configuration
++variables are important to modify when increasing the size of the native map.
++
++To adjust the size of the native map, increase the value of +tserver.memory.maps.max+.
++By default, the maximum size of the native map is 1GB. When increasing this value, it is
++also important to adjust the values of +table.compaction.minor.logs.threshold+ and
+++tserver.walog.max.size+. +table.compaction.minor.logs.threshold+ is the maximum
++number of write-ahead log files that a tablet can reference before they will be automatically
++minor compacted. +tserver.walog.max.size+ is the maximum size of a write-ahead log.
++
++The maximum size of the native maps for a server should be less than the product
++of the write-ahead log maximum size and minor compaction threshold for log files:
++
+++$table.compaction.minor.logs.threshold * $tserver.walog.max.size >= $tserver.memory.maps.max+
++
++This formula ensures that minor compactions won't be automatically triggered before the native
++maps can be completely saturated.
++
++Subsequently, when increasing the size of the write-ahead logs, it can also be important
++to increase the HDFS block size that Accumulo uses when creating the files for the write-ahead log.
++This is controlled via +tserver.wal.blocksize+. A basic recommendation is that when
+++tserver.walog.max.size+ is larger than 2GB in size, set +tserver.wal.blocksize+ to 2GB.
++
 +==== Cluster Specification
 +
 +On the machine that will serve as the Accumulo master:
 +
 +. Write the IP address or domain name of the Accumulo Master to the +$ACCUMULO_HOME/conf/masters+ file.
 +. Write the IP addresses or domain name of the machines that will be TabletServers in +$ACCUMULO_HOME/conf/slaves+, one per line.
 +
 +Note that if using domain names rather than IP addresses, DNS must be configured
 +properly for all machines participating in the cluster. DNS can be a confusing source
 +of errors.
 +
 +==== Accumulo Settings
 +Specify appropriate values for the following settings in
 ++$ACCUMULO_HOME/conf/accumulo-site.xml+ :
 +
 +[source,xml]
 +<property>
 +    <name>instance.zookeeper.host</name>
 +    <value>zooserver-one:2181,zooserver-two:2181</value>
 +    <description>list of zookeeper servers</description>
 +</property>
 +
 +This enables Accumulo to find ZooKeeper. Accumulo uses ZooKeeper to coordinate
 +settings between processes and helps finalize TabletServer failure.
 +
 +[source,xml]
 +<property>
 +    <name>instance.secret</name>
 +    <value>DEFAULT</value>
 +</property>
 +
 +The instance needs a secret to enable secure communication between servers. Configure your
 +secret and make sure that the +accumulo-site.xml+ file is not readable to other users.
 +For alternatives to storing the +instance.secret+ in plaintext, please read the
 ++Sensitive Configuration Values+ section.
 +
 +Some settings can be modified via the Accumulo shell and take effect immediately, but
 +some settings require a process restart to take effect. See the configuration documentation
 +(available in the docs directory of the tarball and in <<configuration>>) for details.
 +
 +==== Deploy Configuration
 +
 +Copy the masters, slaves, accumulo-env.sh, and if necessary, accumulo-site.xml
 +from the +$ACCUMULO_HOME/conf/+ directory on the master to all the machines
 +specified in the slaves file.
 +
 +==== Sensitive Configuration Values
 +
 +Accumulo has a number of properties that can be specified via the accumulo-site.xml
 +file which are sensitive in nature, instance.secret and trace.token.property.password
 +are two common examples. Both of these properties, if compromised, have the ability
 +to result in data being leaked to users who should not have access to that data.
 +
 +In Hadoop-2.6.0, a new CredentialProvider class was introduced which serves as a common
 +implementation to abstract away the storage and retrieval of passwords from plaintext
 +storage in configuration files. Any Property marked with the +Sensitive+ annotation
 +is a candidate for use with these CredentialProviders. For version of Hadoop which lack
 +these classes, the feature will just be unavailable for use.
 +
 +A comma separated list of CredentialProviders can be configured using the Accumulo Property
 ++general.security.credential.provider.paths+. Each configured URL will be consulted
 +when the Configuration object for accumulo-site.xml is accessed.
 +
 +==== Using a JavaKeyStoreCredentialProvider for storage
 +
 +One of the implementations provided in Hadoop-2.6.0 is a Java KeyStore CredentialProvider.
 +Each entry in the KeyStore is the Accumulo Property key name. For example, to store the
 +\texttt{instance.secret}, the following command can be used:
 +
 +  hadoop credential create instance.secret --provider jceks://file/etc/accumulo/conf/accumulo.jceks
 +
 +The command will then prompt you to enter the secret to use and create a keystore in: 
 +
 +  /etc/accumulo/conf/accumulo.jceks
 +
 +Then, accumulo-site.xml must be configured to use this KeyStore as a CredentialProvider:
 +
 +[source,xml]
 +<property>
 +    <name>general.security.credential.provider.paths</name>
 +    <value>jceks://file/etc/accumulo/conf/accumulo.jceks</value>
 +</property>
 +
 +This configuration will then transparently extract the +instance.secret+ from
 +the configured KeyStore and alleviates a human readable storage of the sensitive
 +property.
 +
 +A KeyStore can also be stored in HDFS, which will make the KeyStore readily available to
 +all Accumulo servers. If the local filesystem is used, be aware that each Accumulo server
 +will expect the KeyStore in the same location.
 +
 +==== Custom Table Tags
 +
 +Accumulo has the ability for users to add custom tags to tables.  This allows
 +applications to set application-level metadata about a table.  These tags can be
 +anything from a table description, administrator notes, date created, etc.
 +This is done by naming and setting a property with a prefix +table.custom.*+.
 +
 +=== Initialization
 +
 +Accumulo must be initialized to create the structures it uses internally to locate
 +data across the cluster. HDFS is required to be configured and running before
 +Accumulo can be initialized.
 +
 +Once HDFS is started, initialization can be performed by executing
 ++$ACCUMULO_HOME/bin/accumulo init+ . This script will prompt for a name
 +for this instance of Accumulo. The instance name is used to identify a set of tables
 +and instance-specific settings. The script will then write some information into
 +HDFS so Accumulo can start properly.
 +
 +The initialization script will prompt you to set a root password. Once Accumulo is
 +initialized it can be started.
 +
 +=== Running
 +
 +==== Starting Accumulo
 +
 +Make sure Hadoop is configured on all of the machines in the cluster, including
 +access to a shared HDFS instance. Make sure HDFS and ZooKeeper are running.
 +Make sure ZooKeeper is configured and running on at least one machine in the
 +cluster.
 +Start Accumulo using the +bin/start-all.sh+ script.
 +
 +To verify that Accumulo is running, check the Status page as described in
 +<<monitoring>>. In addition, the Shell can provide some information about the status of
 +tables via reading the metadata tables.
 +
 +==== Stopping Accumulo
 +
 +To shutdown cleanly, run +bin/stop-all.sh+ and the master will orchestrate the
 +shutdown of all the tablet servers. Shutdown waits for all minor compactions to finish, so it may
 +take some time for particular configurations.
 +
 +==== Adding a Node
 +
 +Update your +$ACCUMULO_HOME/conf/slaves+ (or +$ACCUMULO_CONF_DIR/slaves+) file to account for the addition.
 +
 +  $ACCUMULO_HOME/bin/accumulo admin start <host(s)> {<host> ...}
 +
 +Alternatively, you can ssh to each of the hosts you want to add and run:
 +
 +  $ACCUMULO_HOME/bin/start-here.sh
 +
 +Make sure the host in question has the new configuration, or else the tablet
 +server won't start; at a minimum this needs to be on the host(s) being added,
 +but in practice it's good to ensure consistent configuration across all nodes.
 +
 +==== Decomissioning a Node
 +
 +If you need to take a node out of operation, you can trigger a graceful shutdown of a tablet
 +server. Accumulo will automatically rebalance the tablets across the available tablet servers.
 +
 +  $ACCUMULO_HOME/bin/accumulo admin stop <host(s)> {<host> ...}
 +
 +Alternatively, you can ssh to each of the hosts you want to remove and run:
 +
 +  $ACCUMULO_HOME/bin/stop-here.sh
 +
 +Be sure to update your +$ACCUMULO_HOME/conf/slaves+ (or +$ACCUMULO_CONF_DIR/slaves+) file to
 +account for the removal of these hosts. Bear in mind that the monitor will not re-read the
 +slaves file automatically, so it will report the decomissioned servers as down; it's
 +recommended that you restart the monitor so that the node list is up to date.
 +
 +[[monitoring]]
 +=== Monitoring
 +
 +==== Accumulo Monitor
 +The Accumulo Monitor provides an interface for monitoring the status and health of
 +Accumulo components. The Accumulo Monitor provides a web UI for accessing this information at
 ++http://_monitorhost_:50095/+.
 +
 +Things highlighted in yellow may be in need of attention.
 +If anything is highlighted in red on the monitor page, it is something that definitely needs attention.
 +
 +The Overview page contains some summary information about the Accumulo instance, including the version, instance name, and instance ID.
 +There is a table labeled Accumulo Master with current status, a table listing the active Zookeeper servers, and graphs displaying various metrics over time.
 +These include ingest and scan performance and other useful measurements.
 +
 +The Master Server, Tablet Servers, and Tables pages display metrics grouped in different ways (e.g. by tablet server or by table).
 +Metrics typically include number of entries (key/value pairs), ingest and query rates.
 +The number of running scans, major and minor compactions are in the form _number_running_ (_number_queued_).
 +Another important metric is hold time, which is the amount of time a tablet has been waiting but unable to flush its memory in a minor compaction.
 +
 +The Server Activity page graphically displays tablet server status, with each server represented as a circle or square.
 +Different metrics may be assigned to the nodes' color and speed of oscillation.
 +The Overall Avg metric is only used on the Server Activity page, and represents the average of all the other metrics (after normalization).
 +Similarly, the Overall Max metric picks the metric with the maximum normalized value.
 +
 +The Garbage Collector page displays a list of garbage collection cycles, the number of files found of each type (including deletion candidates in use and files actually deleted), and the length of the deletion cycle.
 +The Traces page displays data for recent traces performed (see the following section for information on <<tracing>>).
 +The Recent Logs page displays warning and error logs forwarded to the monitor from all Accumulo processes.
 +Also, the XML and JSON links provide metrics in XML and JSON formats, respectively.
 +
 +==== SSL
 +SSL may be enabled for the monitor page by setting the following properties in the +accumulo-site.xml+ file:
 +
 +  monitor.ssl.keyStore
 +  monitor.ssl.keyStorePassword
 +  monitor.ssl.trustStore
 +  monitor.ssl.trustStorePassword
 +
 +If the Accumulo conf directory has been configured (in particular the +accumulo-env.sh+ file must be set up), the +generate_monitor_certificate.sh+ script in the Accumulo +bin+ directory can be used to create the keystore and truststore files with random passwords.
 +The script will print out the properties that need to be added to the +accumulo-site.xml+ file.
 +The stores can also be generated manually with the Java +keytool+ command, whose usage can be seen in the +generate_monitor_certificate.sh+ script.
 +
 +If desired, the SSL ciphers allowed for connections can be controlled via the following properties in +accumulo-site.xml+:
 +
 +  monitor.ssl.include.ciphers
 +  monitor.ssl.exclude.ciphers
 +
 +If SSL is enabled, the monitor URL can only be accessed via https.
 +This also allows you to access the Accumulo shell through the monitor page.
 +The left navigation bar will have a new link to Shell.
 +An Accumulo user name and password must be entered for access to the shell.
 +
 +=== Metrics
 +Accumulo is capable of using the Hadoop Metrics2 library and is configured by default to use it. Metrics2 is a library
 +which allows for routing of metrics generated by registered MetricsSources to configured MetricsSinks. Examples of sinks
 +that are implemented by Hadoop include file-based logging, Graphite and Ganglia. All metric sources are exposed via JMX
 +when using Metrics2.
 +
 +Previous to Accumulo 1.7.0, JMX endpoints could be exposed in addition to file-based logging of those metrics configured via
 +the +accumulo-metrics.xml+ file. This mechanism can still be used by setting +general.legacy.metrics+ to +true+ in +accumulo-site.xml+.
 +
 +==== Metrics2 Configuration
 +
 +Metrics2 is configured by examining the classpath for a file that matches +hadoop-metrics2*.properties+. The example configuration
 +files that Accumulo provides for use include +hadoop-metrics2-accumulo.properties+ as a template which can be used to enable 
 +file, Graphite or Ganglia sinks (some minimal configuration required for Graphite and Ganglia). Because the Hadoop configuration is
 +also on the Accumulo classpath, be sure that you do not have multiple Metrics2 configuration files. It is recommended to consolidate
 +metrics in a single properties file in a central location to remove ambiguity. The contents of +hadoop-metrics2-accumulo.properties+
 +can be added to a central +hadoop-metrics2.properties+ in +$HADOOP_CONF_DIR+.
 +
 +As a note for configuring the file sink, the provided path should be absolute. A relative path or file name will be created relative
 +to the directory in which the Accumulo process was started. External tools, such as logrotate, can be used to prevent these files
 +from growing without bound.
 +
 +Each server process should have log messages from the Metrics2 library about the sinks that were created. Be sure to check
 +the Accumulo processes log files when debugging missing metrics output.
 +
 +For additional information on configuring Metrics2, visit the
 +https://hadoop.apache.org/docs/current/api/org/apache/hadoop/metrics2/package-summary.html[Javadoc page for Metrics2].
 +
 +[[tracing]]
 +=== Tracing
 +It can be difficult to determine why some operations are taking longer
 +than expected. For example, you may be looking up items with very low
 +latency, but sometimes the lookups take much longer. Determining the
 +cause of the delay is difficult because the system is distributed, and
 +the typical lookup is fast.
 +
 +Accumulo has been instrumented to record the time that various
 +operations take when tracing is turned on. The fact that tracing is
 +enabled follows all the requests made on behalf of the user throughout
 +the distributed infrastructure of accumulo, and across all threads of
 +execution.
 +
 +These time spans will be inserted into the +trace+ table in
 +Accumulo. You can browse recent traces from the Accumulo monitor
 +page. You can also read the +trace+ table directly like any
 +other table.
 +
 +The design of Accumulo's distributed tracing follows that of
 +http://research.google.com/pubs/pub36356.html[Google's Dapper].
 +
 +==== Tracers
 +To collect traces, Accumulo needs at least one server listed in
 + +$ACCUMULO_HOME/conf/tracers+. The server collects traces
 +from clients and writes them to the +trace+ table. The Accumulo
 +user that the tracer connects to Accumulo with can be configured with
 +the following properties
 +
 +  trace.user
 +  trace.token.property.password
 +
 +Other tracer configuration properties include
 +
 +  trace.port.client
 +  trace.table
 +
 +==== Configuring Tracing
 +Traces are collected via SpanReceivers. The default SpanReceiver
 +configured is org.apache.accumulo.core.trace.ZooTraceClient, which
 +sends spans to an Accumulo Tracer process, as discussed in the
 +previous section. This default can be changed to a different span
 +receiver, or additional span receivers can be added in a
 +comma-separated list, by modifying the property
 +
 +  trace.span.receivers
 +
 +Individual span receivers may require their own configuration
 +parameters, which are grouped under the trace.span.receiver.*
 +prefix.  The ZooTraceClient requires the following property that
 +indicates where the tracer servers will register themselves in
 +ZooKeeper.
 +
 +  trace.span.receiver.zookeeper.path
 +
 +This is configured to /tracers by default.  If multiple Accumulo
 +instances are sharing the same ZooKeeper quorum, take care to
 +configure Accumulo with unique values for this property.
 +
 +Hadoop can also be configured to send traces to Accumulo, as of
 +Hadoop 2.6.0, by setting the following properties in Hadoop's
 +core-site.xml file (the path property is optional if left as the
 +default).
 +
 +  <property>
 +    <name>hadoop.htrace.spanreceiver.classes</name>
 +    <value>org.apache.accumulo.core.trace.ZooTraceClient</value>
 +  </property>
 +  <property>
 +    <name>hadoop.tracer.zookeeper.host</name>
 +    <value>zookeeperHost:2181</value>
 +  </property>
 +  <property>
 +    <name>hadoop.tracer.zookeeper.path</name>
 +    <value>/tracers</value>
 +  </property>
 +
 +The accumulo-core, accumulo-trace, and libthrift jars must also
 +be placed on Hadoop's classpath.
 +
 +==== Instrumenting a Client
 +Tracing can be used to measure a client operation, such as a scan, as
 +the operation traverses the distributed system. To enable tracing for
 +your application call
 +
 +[source,java]
 +import org.apache.accumulo.core.trace.DistributedTrace;
 +...
 +DistributedTrace.enable(hostname, "myApplication");
 +// do some tracing
 +...
 +DistributedTrace.disable();
 +
 +Once tracing has been enabled, a client can wrap an operation in a trace.
 +
 +[source,java]
 +import org.htrace.Sampler;
 +import org.htrace.Trace;
 +import org.htrace.TraceScope;
 +...
 +TraceScope scope = Trace.startSpan("Client Scan", Sampler.ALWAYS);
 +BatchScanner scanner = conn.createBatchScanner(...);
 +// Configure your scanner
 +for (Entry entry : scanner) {
 +}
 +scope.close();
 +
 +Additionally, the user can create additional Spans within a Trace.
 +The sampler for the trace should only be specified with the first span, and subsequent spans will be collected depending on whether that first span was sampled.
 +
 +[source,java]
 +TraceScope scope = Trace.startSpan("Client Update", Sampler.ALWAYS);
 +...
 +TraceScope readScope = Trace.startSpan("Read");
 +...
 +readScope.close();
 +...
 +TraceScope writeScope = Trace.startSpan("Write");
 +...
 +writeScope.close();
 +scope.close();
 +
 +Like Dapper, Accumulo tracing supports user defined annotations to associate additional data with a Trace.
 +Checking whether currently tracing is necessary when using a sampler other than Sampler.ALWAYS.
 +
 +[source,java]
 +...
 +int numberOfEntriesRead = 0;
 +TraceScope readScope = Trace.startSpan("Read");
 +// Do the read, update the counter
 +...
 +if (Trace.isTracing)
 +  readScope.getSpan().addKVAnnotation("Number of Entries Read".getBytes(StandardCharsets.UTF_8),
 +      String.valueOf(numberOfEntriesRead).getBytes(StandardCharsets.UTF_8));
 +
 +It is also possible to add timeline annotations to your spans.
 +This associates a string with a given timestamp between the start and stop times for a span.
 +
 +[source,java]
 +...
 +writeScope.getSpan().addTimelineAnnotation("Initiating Flush");
 +
 +Some client operations may have a high volume within your
 +application. As such, you may wish to only sample a percentage of
 +operations for tracing. As seen below, the CountSampler can be used to
 +help enable tracing for 1-in-1000 operations
 +
 +[source,java]
 +import org.htrace.impl.CountSampler;
 +...
 +Sampler sampler = new CountSampler(1000);
 +...
 +TraceScope readScope = Trace.startSpan("Read", sampler);
 +...
 +readScope.close();
 +
 +Remember to close all spans and disable tracing when finished.
 +
 +[source,java]
 +DistributedTrace.disable();
 +
 +==== Viewing Collected Traces
 +To view collected traces, use the "Recent Traces" link on the Monitor
 +UI. You can also programmatically access and print traces using the
 ++TraceDump+ class.
 +
 +==== Tracing from the Shell
 +You can enable tracing for operations run from the shell by using the
 ++trace on+ and +trace off+ commands.
 +
 +----
 +root@test test> trace on
 +
 +root@test test> scan
 +a b:c []    d
 +
 +root@test test> trace off
 +Waiting for trace information
 +Waiting for trace information
 +Trace started at 2013/08/26 13:24:08.332
 +Time  Start  Service@Location       Name
 + 3628+0      shell@localhost shell:root
 +    8+1690     shell@localhost scan
 +    7+1691       shell@localhost scan:location
 +    6+1692         tserver@localhost startScan
 +    5+1692           tserver@localhost tablet read ahead 6
 +----
 +
 +=== Logging
 +Accumulo processes each write to a set of log files. By default these are found under
 ++$ACCUMULO/logs/+.
 +
 +=== Recovery
 +
 +In the event of TabletServer failure or error on shutting Accumulo down, some
 +mutations may not have been minor compacted to HDFS properly. In this case,
 +Accumulo will automatically reapply such mutations from the write-ahead log
 +either when the tablets from the failed server are reassigned by the Master (in the
 +case of a single TabletServer failure) or the next time Accumulo starts (in the event of
 +failure during shutdown).
 +
 +Recovery is performed by asking a tablet server to sort the logs so that tablets can easily find their missing
 +updates. The sort status of each file is displayed on
 +Accumulo monitor status page. Once the recovery is complete any
 +tablets involved should return to an ``online'' state. Until then those tablets will be
 +unavailable to clients.
 +
 +The Accumulo client library is configured to retry failed mutations and in many
 +cases clients will be able to continue processing after the recovery process without
 +throwing an exception.

[2/3] accumulo git commit: ACCUMULO-3308 Expand docs on configuration of IMM.

Posted by el...@apache.org.

ACCUMULO-3308 Expand docs on configuration of IMM.

Expand on how the minc threshold, IMM size and WAL size
must be altered together. Additionally, mention how
the HDFS block size for the WAL should be increased to 2GB
when the WAL size is increased over 2GB.


Project: http://git-wip-us.apache.org/repos/asf/accumulo/repo
Commit: http://git-wip-us.apache.org/repos/asf/accumulo/commit/359f851e
Tree: http://git-wip-us.apache.org/repos/asf/accumulo/tree/359f851e
Diff: http://git-wip-us.apache.org/repos/asf/accumulo/diff/359f851e

Branch: refs/heads/master
Commit: 359f851e57e0a5765352988e2db5c495ffe8d552
Parents: a3267d3
Author: Josh Elser <el...@apache.org>
Authored: Sun Dec 7 21:44:54 2014 -0500
Committer: Josh Elser <el...@apache.org>
Committed: Sun Dec 7 21:44:54 2014 -0500

----------------------------------------------------------------------
 .../chapters/administration.tex                 | 30 ++++++++++++++++++++
 1 file changed, 30 insertions(+)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/accumulo/blob/359f851e/docs/src/main/latex/accumulo_user_manual/chapters/administration.tex
----------------------------------------------------------------------
diff --git a/docs/src/main/latex/accumulo_user_manual/chapters/administration.tex b/docs/src/main/latex/accumulo_user_manual/chapters/administration.tex
index d524def..e326cb0 100644
--- a/docs/src/main/latex/accumulo_user_manual/chapters/administration.tex
+++ b/docs/src/main/latex/accumulo_user_manual/chapters/administration.tex
@@ -143,6 +143,8 @@ speed up performance by utilizing the memory space of the native operating syste
 native map also avoids the performance implications brought on by garbage collection
 in the JVM by causing it to pause much less frequently.
 
+\subsubsection{Building}
+
 32-bit and 64-bit Linux and Mac OS X versions of the native map can be built
 from the Accumulo bin package by executing
 \texttt{\$ACCUMULO\_HOME/bin/build\_native\_library.sh}. If your system's
@@ -164,6 +166,34 @@ target directory, the tablet server may not be able to find it. The system can
 also locate the native maps shared library by setting LD\_LIBRARY\_PATH
 (or DYLD\_LIBRARY\_PATH on Mac OS X) in \texttt{\$ACCUMULO\_HOME/conf/accumulo-env.sh}.
 
+\subsubsection{Configuration}
+
+As mentioned, Accumulo will use the native libraries if they are found in the expected
+location and if it is not configured to ignore them. Using the native maps over JVM
+Maps nets a noticable improvement in ingest rates; however, certain configuration
+variables are important to modify when increasing the size of the native map.
+
+To adjust the size of the native map, increase the value of \texttt{tserver.memory.maps.max}.
+By default, the maximum size of the native map is 1GB. When increasing this value, it is
+also important to adjust the values of \texttt{table.compaction.minor.logs.threshold} and
+\texttt{tserver.walog.max.size}. \texttt{table.compaction.minor.logs.threshold} is the maximum
+number of write-ahead log files that a tablet can reference before they will be automatically
+minor compacted. \texttt{tserver.walog.max.size} is the maximum size of a write-ahead log.
+
+The maximum size of the native maps for a server should be less than the product
+of the write-ahead log maximum size and minor compaction threshold for log files:
+
+\(\$table.compaction.minor.logs.threshold * \$tserver.walog.max.size >= \$tserver.memory.maps.max\)
+
+This formula ensures that minor compactions won't be automatically triggered before the native
+maps can be completely saturated.
+
+Subsequently, when increasing the size of the write-ahead logs, it can also be important
+to increase the HDFS block size that Accumulo uses when creating the files for the write-ahead log.
+This is controlled via \texttt{tserver.wal.blocksize}. A basic recommendation is that when
+\texttt{tserver.walog.max.size} is larger than 2GB in size, set \texttt{tserver.wal.blocksize}
+to 2GB.
+
 \subsection{Cluster Specification}
 
 On the machine that will serve as the Accumulo master: