Posted to commits@zookeeper.apache.org by an...@apache.org on 2018/07/04 13:11:24 UTC
[04/12] zookeeper git commit: ZOOKEEPER-3022: MAVEN MIGRATION 3.4 -
Iteration 1 - docs, it
http://git-wip-us.apache.org/repos/asf/zookeeper/blob/c1efa954/zookeeper-docs/src/documentation/content/xdocs/zookeeperAdmin.xml
----------------------------------------------------------------------
diff --git a/zookeeper-docs/src/documentation/content/xdocs/zookeeperAdmin.xml b/zookeeper-docs/src/documentation/content/xdocs/zookeeperAdmin.xml
new file mode 100644
index 0000000..d88ddbd
--- /dev/null
+++ b/zookeeper-docs/src/documentation/content/xdocs/zookeeperAdmin.xml
@@ -0,0 +1,1861 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<!--
+ Copyright 2002-2004 The Apache Software Foundation
+
+ Licensed under the Apache License, Version 2.0 (the "License");
+ you may not use this file except in compliance with the License.
+ You may obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+ Unless required by applicable law or agreed to in writing, software
+ distributed under the License is distributed on an "AS IS" BASIS,
+ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ See the License for the specific language governing permissions and
+ limitations under the License.
+-->
+<!DOCTYPE article PUBLIC "-//OASIS//DTD Simplified DocBook XML V1.0//EN"
+"http://www.oasis-open.org/docbook/xml/simple/1.0/sdocbook.dtd">
+<article id="bk_Admin">
+ <title>ZooKeeper Administrator's Guide</title>
+
+ <subtitle>A Guide to Deployment and Administration</subtitle>
+
+ <articleinfo>
+ <legalnotice>
+ <para>Licensed under the Apache License, Version 2.0 (the "License");
+ you may not use this file except in compliance with the License. You may
+ obtain a copy of the License at <ulink
+ url="http://www.apache.org/licenses/LICENSE-2.0">http://www.apache.org/licenses/LICENSE-2.0</ulink>.</para>
+
+ <para>Unless required by applicable law or agreed to in writing,
+ software distributed under the License is distributed on an "AS IS"
+ BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
+ implied. See the License for the specific language governing permissions
+ and limitations under the License.</para>
+ </legalnotice>
+
+ <abstract>
+ <para>This document contains information about deploying, administering
+ and maintaining ZooKeeper. It also discusses best practices and common
+ problems.</para>
+ </abstract>
+ </articleinfo>
+
+ <section id="ch_deployment">
+ <title>Deployment</title>
+
+ <para>This section contains information about deploying ZooKeeper and
+ covers these topics:</para>
+
+ <itemizedlist>
+ <listitem>
+ <para><xref linkend="sc_systemReq" /></para>
+ </listitem>
+
+ <listitem>
+ <para><xref linkend="sc_zkMulitServerSetup" /></para>
+ </listitem>
+
+ <listitem>
+ <para><xref linkend="sc_singleAndDevSetup" /></para>
+ </listitem>
+ </itemizedlist>
+
+ <para>The first two sections assume you are interested in installing
+ ZooKeeper in a production environment such as a datacenter. The final
+ section covers situations in which you are setting up ZooKeeper on a
+ limited basis - for evaluation, testing, or development - but not in a
+ production environment.</para>
+
+ <section id="sc_systemReq">
+ <title>System Requirements</title>
+
+ <section id="sc_supportedPlatforms">
+ <title>Supported Platforms</title>
+
+ <para>ZooKeeper consists of multiple components. Some components are
+ supported broadly, and other components are supported only on a smaller
+ set of platforms.</para>
+
+ <itemizedlist>
+ <listitem>
+ <para><emphasis role="bold">Client</emphasis> is the Java client
+ library, used by applications to connect to a ZooKeeper ensemble.
+ </para>
+ </listitem>
+ <listitem>
+ <para><emphasis role="bold">Server</emphasis> is the Java server
+ that runs on the ZooKeeper ensemble nodes.</para>
+ </listitem>
+ <listitem>
+ <para><emphasis role="bold">Native Client</emphasis> is a client
+ implemented in C, similar to the Java client, used by applications
+ to connect to a ZooKeeper ensemble.</para>
+ </listitem>
+ <listitem>
+ <para><emphasis role="bold">Contrib</emphasis> refers to multiple
+ optional add-on components.</para>
+ </listitem>
+ </itemizedlist>
+
+ <para>The following matrix describes the level of support committed for
+ running each component on different operating system platforms.</para>
+
+ <table>
+ <title>Support Matrix</title>
+ <tgroup cols="5" align="left" colsep="1" rowsep="1">
+ <thead>
+ <row>
+ <entry>Operating System</entry>
+ <entry>Client</entry>
+ <entry>Server</entry>
+ <entry>Native Client</entry>
+ <entry>Contrib</entry>
+ </row>
+ </thead>
+ <tbody>
+ <row>
+ <entry>GNU/Linux</entry>
+ <entry>Development and Production</entry>
+ <entry>Development and Production</entry>
+ <entry>Development and Production</entry>
+ <entry>Development and Production</entry>
+ </row>
+ <row>
+ <entry>Solaris</entry>
+ <entry>Development and Production</entry>
+ <entry>Development and Production</entry>
+ <entry>Not Supported</entry>
+ <entry>Not Supported</entry>
+ </row>
+ <row>
+ <entry>FreeBSD</entry>
+ <entry>Development and Production</entry>
+ <entry>Development and Production</entry>
+ <entry>Not Supported</entry>
+ <entry>Not Supported</entry>
+ </row>
+ <row>
+ <entry>Windows</entry>
+ <entry>Development and Production</entry>
+ <entry>Development and Production</entry>
+ <entry>Not Supported</entry>
+ <entry>Not Supported</entry>
+ </row>
+ <row>
+ <entry>Mac OS X</entry>
+ <entry>Development Only</entry>
+ <entry>Development Only</entry>
+ <entry>Not Supported</entry>
+ <entry>Not Supported</entry>
+ </row>
+ </tbody>
+ </tgroup>
+ </table>
+
+ <para>For any operating system not explicitly mentioned as supported in
+ the matrix, components may or may not work. The ZooKeeper community
+ will fix obvious bugs that are reported for other platforms, but there
+ is no full support.</para>
+ </section>
+
+ <section id="sc_requiredSoftware">
+ <title>Required Software</title>
+
+ <para>ZooKeeper runs in Java, release 1.6 or greater (JDK 6 or
+ greater). It runs as an <emphasis>ensemble</emphasis> of
+ ZooKeeper servers. Three ZooKeeper servers is the minimum
+ recommended size for an ensemble, and we also recommend that
+ they run on separate machines. At Yahoo!, ZooKeeper is
+ usually deployed on dedicated RHEL boxes, with dual-core
+ processors, 2GB of RAM, and 80GB IDE hard drives.</para>
+ </section>
+
+ </section>
+
+ <section id="sc_zkMulitServerSetup">
+ <title>Clustered (Multi-Server) Setup</title>
+
+ <para>For reliable ZooKeeper service, you should deploy ZooKeeper in a
+ cluster known as an <emphasis>ensemble</emphasis>. As long as a majority
+ of the ensemble are up, the service will be available. Because ZooKeeper
+ requires a majority, it is best to use an
+ odd number of machines. For example, with four machines ZooKeeper can
+ only handle the failure of a single machine; if two machines fail, the
+ remaining two machines do not constitute a majority. However, with five
+ machines ZooKeeper can handle the failure of two machines. </para>
+ <note>
+ <para>
+ As mentioned in the
+ <ulink url="zookeeperStarted.html">ZooKeeper Getting Started Guide</ulink>,
+ a minimum of three servers is required for a fault-tolerant
+ clustered setup, and it is strongly recommended that you have an
+ odd number of servers.
+ </para>
+ <para>Usually three servers is more than enough for a production
+ install, but for maximum reliability during maintenance, you may
+ wish to install five servers. With three servers, if you perform
+ maintenance on one of them, you are vulnerable to a failure on one
+ of the other two servers during that maintenance. If you have five
+ of them running, you can take one down for maintenance, and know
+ that you're still OK if one of the other four suddenly fails.
+ </para>
+ <para>Your redundancy considerations should include all aspects of
+ your environment. If you have three ZooKeeper servers, but their
+ network cables are all plugged into the same network switch, then
+ the failure of that switch will take down your entire ensemble.
+ </para>
+ </note>
+ <para>Here are the steps to set up a server that will be part of an
+ ensemble. These steps should be performed on every host in the
+ ensemble:</para>
+
+ <orderedlist>
+ <listitem>
+ <para>Install the Java JDK. You can use the native packaging system
+ for your system, or download the JDK from:</para>
+
+ <para><ulink
+ url="http://java.sun.com/javase/downloads/index.jsp">http://java.sun.com/javase/downloads/index.jsp</ulink></para>
+ </listitem>
+
+ <listitem>
+ <para>Set the Java heap size. This is very important to avoid
+ swapping, which will seriously degrade ZooKeeper performance. To
+ determine the correct value, use load tests, and make sure you are
+ well below the usage limit that would cause you to swap. Be
+ conservative - use a maximum heap size of 3GB for a 4GB
+ machine.</para>
+ </listitem>
+
+ <listitem>
+ <para>Install the ZooKeeper Server Package. It can be downloaded
+ from:
+ </para>
+ <para>
+ <ulink url="http://zookeeper.apache.org/releases.html">
+ http://zookeeper.apache.org/releases.html
+ </ulink>
+ </para>
+ </listitem>
+
+ <listitem>
+ <para>Create a configuration file. This file can be called anything.
+ Use the following settings as a starting point:</para>
+
+ <programlisting>
+tickTime=2000
+dataDir=/var/lib/zookeeper/
+clientPort=2181
+initLimit=5
+syncLimit=2
+server.1=zoo1:2888:3888
+server.2=zoo2:2888:3888
+server.3=zoo3:2888:3888</programlisting>
+
+ <para>You can find the meanings of these and other configuration
+ settings in the section <xref linkend="sc_configuration" />. A word
+ though about a few here:</para>
+
+ <para>Every machine that is part of the ZooKeeper ensemble should know
+ about every other machine in the ensemble. You accomplish this with
+ the series of lines of the form <emphasis
+ role="bold">server.id=host:port:port</emphasis>. The parameters <emphasis
+ role="bold">host</emphasis> and <emphasis
+ role="bold">port</emphasis> are straightforward. You attribute the
+ server id to each machine by creating a file named
+ <filename>myid</filename>, one for each server, which resides in
+ that server's data directory, as specified by the configuration file
+ parameter <emphasis role="bold">dataDir</emphasis>.</para></listitem>
+
+ <listitem><para>The myid file
+ consists of a single line containing only the text of that machine's
+ id. So <filename>myid</filename> of server 1 would contain the text
+ "1" and nothing else. The id must be unique within the
+ ensemble and should have a value between 1 and 255.</para>
+ </listitem>
+
+ <listitem>
+ <para>If your configuration file is set up, you can start a
+ ZooKeeper server:</para>
+
+ <para><computeroutput>$ java -cp zookeeper.jar:lib/slf4j-api-1.6.1.jar:lib/slf4j-log4j12-1.6.1.jar:lib/log4j-1.2.15.jar:conf \
+ org.apache.zookeeper.server.quorum.QuorumPeerMain zoo.cfg
+ </computeroutput></para>
+
+ <para>QuorumPeerMain starts a ZooKeeper server.
+ <ulink url="http://java.sun.com/javase/technologies/core/mntr-mgmt/javamanagement/">JMX</ulink>
+ management beans are also registered, which allows
+ management through a JMX management console.
+ The <ulink url="zookeeperJMX.html">ZooKeeper JMX
+ document</ulink> contains details on managing ZooKeeper with JMX.
+ </para>
+
+ <para>See the script <emphasis>bin/zkServer.sh</emphasis>,
+ which is included in the release, for an example
+ of starting server instances.</para>
+
+ </listitem>
+
+ <listitem>
+ <para>Test your deployment by connecting to the hosts:</para>
+
+ <para>In Java, you can run the following command to execute
+ simple operations:</para>
+
+ <para><computeroutput>$ bin/zkCli.sh -server 127.0.0.1:2181</computeroutput></para>
+ </listitem>
+ </orderedlist>
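+
+ <para>As a minimal sketch of the heap size and <filename>myid</filename>
+ steps above, assuming the sample <emphasis role="bold">dataDir</emphasis>
+ of <filename>/var/lib/zookeeper/</filename>, a host listed as
+ <emphasis role="bold">server.1</emphasis>, and a 4GB machine, the
+ commands might look like the following. The id, path and heap value
+ are illustrative and must be adapted on each host:</para>
+
+ <programlisting>
+$ echo 1 &gt; /var/lib/zookeeper/myid
+$ java -Xmx3g -cp zookeeper.jar:lib/slf4j-api-1.6.1.jar:lib/slf4j-log4j12-1.6.1.jar:lib/log4j-1.2.15.jar:conf \
+    org.apache.zookeeper.server.quorum.QuorumPeerMain zoo.cfg</programlisting>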
+ </section>
+
+ <section id="sc_singleAndDevSetup">
+ <title>Single Server and Developer Setup</title>
+
+ <para>If you want to set up ZooKeeper for development purposes, you will
+ probably want to set up a single server instance of ZooKeeper, and then
+ install either the Java or C client-side libraries and bindings on your
+ development machine.</para>
+
+ <para>The steps to setting up a single server instance are similar
+ to the above, except the configuration file is simpler. You can find the
+ complete instructions in the <ulink
+ url="zookeeperStarted.html#sc_InstallingSingleMode">Installing and
+ Running ZooKeeper in Single Server Mode</ulink> section of the <ulink
+ url="zookeeperStarted.html">ZooKeeper Getting Started
+ Guide</ulink>.</para>
+
+ <para>For information on installing the client side libraries, refer to
+ the <ulink url="zookeeperProgrammers.html#Bindings">Bindings</ulink>
+ section of the <ulink url="zookeeperProgrammers.html">ZooKeeper
+ Programmer's Guide</ulink>.</para>
+ </section>
+ </section>
+
+ <section id="ch_administration">
+ <title>Administration</title>
+
+ <para>This section contains information about running and maintaining
+ ZooKeeper and covers these topics: </para>
+ <itemizedlist>
+ <listitem>
+ <para><xref linkend="sc_designing" /></para>
+ </listitem>
+
+ <listitem>
+ <para><xref linkend="sc_provisioning" /></para>
+ </listitem>
+
+ <listitem>
+ <para><xref linkend="sc_strengthsAndLimitations" /></para>
+ </listitem>
+
+ <listitem>
+ <para><xref linkend="sc_administering" /></para>
+ </listitem>
+
+ <listitem>
+ <para><xref linkend="sc_maintenance" /></para>
+ </listitem>
+
+ <listitem>
+ <para><xref linkend="sc_supervision" /></para>
+ </listitem>
+
+ <listitem>
+ <para><xref linkend="sc_monitoring" /></para>
+ </listitem>
+
+ <listitem>
+ <para><xref linkend="sc_logging" /></para>
+ </listitem>
+
+ <listitem>
+ <para><xref linkend="sc_troubleshooting" /></para>
+ </listitem>
+
+ <listitem>
+ <para><xref linkend="sc_configuration" /></para>
+ </listitem>
+
+ <listitem>
+ <para><xref linkend="sc_zkCommands" /></para>
+ </listitem>
+
+ <listitem>
+ <para><xref linkend="sc_dataFileManagement" /></para>
+ </listitem>
+
+ <listitem>
+ <para><xref linkend="sc_commonProblems" /></para>
+ </listitem>
+
+ <listitem>
+ <para><xref linkend="sc_bestPractices" /></para>
+ </listitem>
+ </itemizedlist>
+
+ <section id="sc_designing">
+ <title>Designing a ZooKeeper Deployment</title>
+
+ <para>The reliability of ZooKeeper rests on two basic assumptions.</para>
+ <orderedlist>
+ <listitem><para> Only a minority of servers in a deployment
+ will fail. <emphasis>Failure</emphasis> in this context
+ means a machine crash, or some error in the network that
+ partitions a server off from the majority.</para>
+ </listitem>
+ <listitem><para> Deployed machines operate correctly. To
+ operate correctly means to execute code correctly, to have
+ clocks that work properly, and to have storage and network
+ components that perform consistently.</para>
+ </listitem>
+ </orderedlist>
+
+ <para>The sections below contain considerations for ZooKeeper
+ administrators to maximize the probability for these assumptions
+ to hold true. Some of these are cross-machine considerations,
+ and others are things you should consider for each and every
+ machine in your deployment.</para>
+
+ <section id="sc_CrossMachineRequirements">
+ <title>Cross Machine Requirements</title>
+
+ <para>For the ZooKeeper service to be active, there must be a
+ majority of non-failing machines that can communicate with
+ each other. To create a deployment that can tolerate the
+ failure of F machines, you should count on deploying 2xF+1
+ machines. Thus, a deployment that consists of three machines
+ can handle one failure, and a deployment of five machines can
+ handle two failures. Note that a deployment of six machines
+ can only handle two failures since three machines is not a
+ majority. For this reason, ZooKeeper deployments are usually
+ made up of an odd number of machines.</para>
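+
+ <para>As a worked illustration of the arithmetic above (derived from
+ the majority rule, not an additional requirement): an ensemble of N
+ machines needs floor(N/2)+1 machines for a majority, so it tolerates
+ N - (floor(N/2)+1) failures:</para>
+
+ <programlisting>
+machines (N)   majority needed   failures tolerated
+     3                2                  1
+     4                3                  1
+     5                3                  2
+     6                4                  2
+     7                4                  3</programlisting>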
+
+ <para>To achieve the highest probability of tolerating a failure
+ you should try to make machine failures independent. For
+ example, if most of the machines share the same switch,
+ failure of that switch could cause a correlated failure and
+ bring down the service. The same holds true of shared power
+ circuits, cooling systems, etc.</para>
+ </section>
+
+ <section>
+ <title>Single Machine Requirements</title>
+
+ <para>If ZooKeeper has to contend with other applications for
+ access to resources like storage media, CPU, network, or
+ memory, its performance will suffer markedly. ZooKeeper has
+ strong durability guarantees, which means it uses storage
+ media to log changes before the operation responsible for the
+ change is allowed to complete. You should be aware of this
+ dependency then, and take great care if you want to ensure
+ that ZooKeeper operations aren’t held up by your media. Here
+ are some things you can do to minimize that sort of
+ degradation:
+ </para>
+
+ <itemizedlist>
+ <listitem>
+ <para>ZooKeeper's transaction log must be on a dedicated
+ device. (A dedicated partition is not enough.) ZooKeeper
+ writes the log sequentially, without seeking. Sharing your
+ log device with other processes can cause seeks and
+ contention, which in turn can cause multi-second
+ delays.</para>
+ </listitem>
+
+ <listitem>
+ <para>Do not put ZooKeeper in a situation that can cause a
+ swap. In order for ZooKeeper to function with any sort of
+ timeliness, it simply cannot be allowed to swap.
+ Therefore, make certain that the maximum heap size given
+ to ZooKeeper is not bigger than the amount of real memory
+ available to ZooKeeper. For more on this, see
+ <xref linkend="sc_commonProblems"/>
+ below. </para>
+ </listitem>
+ </itemizedlist>
+ </section>
+ </section>
+
+ <section id="sc_provisioning">
+ <title>Provisioning</title>
+
+ <para></para>
+ </section>
+
+ <section id="sc_strengthsAndLimitations">
+ <title>Things to Consider: ZooKeeper Strengths and Limitations</title>
+
+ <para></para>
+ </section>
+
+ <section id="sc_administering">
+ <title>Administering</title>
+
+ <para></para>
+ </section>
+
+ <section id="sc_maintenance">
+ <title>Maintenance</title>
+
+ <para>Little long-term maintenance is required for a ZooKeeper
+ cluster; however, you must be aware of the following:</para>
+
+ <section>
+ <title>Ongoing Data Directory Cleanup</title>
+
+ <para>The ZooKeeper <ulink url="#var_datadir">Data
+ Directory</ulink> contains files which are a persistent copy
+ of the znodes stored by a particular serving ensemble. These
+ are the snapshot and transactional log files. As changes are
+ made to the znodes these changes are appended to a
+ transaction log. Occasionally, when a log grows large, a
+ snapshot of the current state of all znodes will be written
+ to the filesystem and a new transaction log file is created
+ for future transactions. During snapshotting, ZooKeeper may
+ continue appending incoming transactions to the old log file.
+ Therefore, some transactions which are newer than a snapshot
+ may be found in the last transaction log preceding the
+ snapshot.
+ </para>
+
+ <para>A ZooKeeper server <emphasis role="bold">will not remove
+ old snapshots and log files</emphasis> when using the default
+ configuration (see autopurge below); this is the
+ responsibility of the operator. Every serving environment is
+ different and therefore the requirements of managing these
+ files may differ from install to install (backup for example).
+ </para>
+
+ <para>The PurgeTxnLog utility implements a simple retention
+ policy that administrators can use. The <ulink
+ url="ext:api/index">API docs</ulink> contain details on
+ calling conventions (arguments, etc...).
+ </para>
+
+ <para>In the following example the last count snapshots and
+ their corresponding logs are retained and the others are
+ deleted. The value of &lt;count&gt; should typically be
+ greater than 3 (although not required, this provides 3 backups
+ in the unlikely event a recent log has become corrupted). This
+ can be run as a cron job on the ZooKeeper server machines to
+ clean up the logs daily.</para>
+
+ <programlisting> java -cp zookeeper.jar:lib/slf4j-api-1.6.1.jar:lib/slf4j-log4j12-1.6.1.jar:lib/log4j-1.2.15.jar:conf org.apache.zookeeper.server.PurgeTxnLog &lt;dataDir&gt; &lt;snapDir&gt; -n &lt;count&gt;</programlisting>
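+
+ <para>As a hedged sketch of the daily cron job mentioned above, a
+ crontab entry might look like the following. The installation
+ directory <filename>/opt/zookeeper</filename>, the 03:00 schedule, the
+ data directory and the retention count of 5 are all illustrative
+ assumptions; substitute the values for your own install:</para>
+
+ <programlisting>
+# Hypothetical crontab entry: purge daily at 03:00, keeping the 5 most recent snapshots
+0 3 * * * cd /opt/zookeeper &amp;&amp; java -cp zookeeper.jar:lib/slf4j-api-1.6.1.jar:lib/slf4j-log4j12-1.6.1.jar:lib/log4j-1.2.15.jar:conf org.apache.zookeeper.server.PurgeTxnLog /var/lib/zookeeper /var/lib/zookeeper -n 5</programlisting>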
+
+ <para>Automatic purging of the snapshots and corresponding
+ transaction logs was introduced in version 3.4.0 and can be
+ enabled via the following configuration parameters: <emphasis
+ role="bold">autopurge.snapRetainCount</emphasis> and <emphasis
+ role="bold">autopurge.purgeInterval</emphasis>. For more on
+ this, see <xref linkend="sc_advancedConfiguration"/>
+ below.</para>
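+
+ <para>For instance, the following configuration file lines would be
+ one way to enable automatic purging. The values are illustrative: keep
+ the three most recent snapshots and run the purge task every hour:</para>
+
+ <programlisting>
+autopurge.snapRetainCount=3
+autopurge.purgeInterval=1</programlisting>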
+ </section>
+
+ <section>
+ <title>Debug Log Cleanup (log4j)</title>
+
+ <para>See the section on <ulink
+ url="#sc_logging">logging</ulink> in this document. It is
+ expected that you will set up a rolling file appender using the
+ in-built log4j feature. The sample configuration file in the
+ release tar's conf/log4j.properties provides an example of
+ this.
+ </para>
+ </section>
+
+ </section>
+
+ <section id="sc_supervision">
+ <title>Supervision</title>
+
+ <para>You will want to have a supervisory process that manages
+ each of your ZooKeeper server processes (JVM). The ZK server is
+ designed to be "fail fast" meaning that it will shut down
+ (process exit) if an error occurs that it cannot recover
+ from. As a ZooKeeper serving cluster is highly reliable, this
+ means that while the server may go down the cluster as a whole
+ is still active and serving requests. Additionally, as the
+ cluster is "self healing" the failed server once restarted will
+ automatically rejoin the ensemble without any manual
+ interaction.</para>
+
+ <para>Having a supervisory process such as <ulink
+ url="http://cr.yp.to/daemontools.html">daemontools</ulink> or
+ <ulink
+ url="http://en.wikipedia.org/wiki/Service_Management_Facility">SMF</ulink>
+ managing your ZooKeeper server ensures that if the
+ process does exit abnormally it will automatically be restarted
+ and will quickly rejoin the cluster. These are just two examples;
+ other supervisory tools are available, and the choice is up to
+ you.</para>
+ </section>
+
+ <section id="sc_monitoring">
+ <title>Monitoring</title>
+
+ <para>The ZooKeeper service can be monitored in one of two
+ primary ways: 1) the command port through the use of <ulink
+ url="#sc_zkCommands">4 letter words</ulink> and 2) <ulink
+ url="zookeeperJMX.html">JMX</ulink>. See the appropriate section for
+ your environment/requirements.</para>
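+
+ <para>For example, assuming a server listening on the sample
+ <emphasis role="bold">clientPort</emphasis> of 2181 on the local host
+ and the <computeroutput>nc</computeroutput> (netcat) utility being
+ installed, the four letter words can be sent from a shell. This is a
+ sketch only; the host and port are assumptions to adapt:</para>
+
+ <programlisting>
+$ echo ruok | nc 127.0.0.1 2181
+imok
+$ echo stat | nc 127.0.0.1 2181</programlisting>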
+ </section>
+
+ <section id="sc_logging">
+ <title>Logging</title>
+
+ <para>ZooKeeper uses <emphasis role="bold">log4j</emphasis> version 1.2 as
+ its logging infrastructure. The ZooKeeper default <filename>log4j.properties</filename>
+ file resides in the <filename>conf</filename> directory. Log4j requires that
+ <filename>log4j.properties</filename> either be in the working directory
+ (the directory from which ZooKeeper is run) or be accessible from the classpath.</para>
+
+ <para>For more information, see
+ <ulink url="http://logging.apache.org/log4j/1.2/manual.html#defaultInit">Log4j Default Initialization Procedure</ulink>
+ of the log4j manual.</para>
+
+ </section>
+
+ <section id="sc_troubleshooting">
+ <title>Troubleshooting</title>
+ <variablelist>
+ <varlistentry>
+ <term> Server not coming up because of file corruption</term>
+ <listitem>
+ <para>A server might not be able to read its database and fail to come up because of
+ some file corruption in the transaction logs of the ZooKeeper server. You will
+ see some IOException on loading ZooKeeper database. In such a case,
+ make sure all the other servers in your ensemble are up and working. Use "stat"
+ command on the command port to see if they are in good health. After you have verified that
+ all the other servers of the ensemble are up, you can go ahead and clean the database
+ of the corrupt server. Delete all the files in datadir/version-2 and datalogdir/version-2/.
+ Restart the server.
+ </para>
+ </listitem>
+ </varlistentry>
+ </variablelist>
+ </section>
+
+ <section id="sc_configuration">
+ <title>Configuration Parameters</title>
+
+ <para>ZooKeeper's behavior is governed by the ZooKeeper configuration
+ file. This file is designed so that the exact same file can be used by
+ all the servers that make up a ZooKeeper ensemble, assuming the disk
+ layouts are the same. If servers use different configuration files, care
+ must be taken to ensure that the list of servers in all of the different
+ configuration files match.</para>
+
+ <section id="sc_minimumConfiguration">
+ <title>Minimum Configuration</title>
+
+ <para>Here are the minimum configuration keywords that must be defined
+ in the configuration file:</para>
+
+ <variablelist>
+ <varlistentry>
+ <term>clientPort</term>
+
+ <listitem>
+ <para>the port to listen for client connections; that is, the
+ port that clients attempt to connect to.</para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry id="var_datadir">
+ <term>dataDir</term>
+
+ <listitem>
+ <para>the location where ZooKeeper will store the in-memory
+ database snapshots and, unless specified otherwise, the
+ transaction log of updates to the database.</para>
+
+ <note>
+ <para>Be careful where you put the transaction log. A
+ dedicated transaction log device is key to consistent good
+ performance. Putting the log on a busy device will adversely
+ affect performance.</para>
+ </note>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry id="id_tickTime">
+ <term>tickTime</term>
+
+ <listitem>
+ <para>the length of a single tick, which is the basic time unit
+ used by ZooKeeper, as measured in milliseconds. It is used to
+ regulate heartbeats and timeouts. For example, the minimum
+ session timeout will be two ticks.</para>
+ </listitem>
+ </varlistentry>
+ </variablelist>
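+
+ <para>Taken together, a minimal configuration file might look like the
+ following sketch. The values are the sample settings used earlier in
+ this guide, not required values:</para>
+
+ <programlisting>
+tickTime=2000
+dataDir=/var/lib/zookeeper/
+clientPort=2181</programlisting>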
+ </section>
+
+ <section id="sc_advancedConfiguration">
+ <title>Advanced Configuration</title>
+
+ <para>The configuration settings in this section are optional. You can
+ use them to further fine tune the behaviour of your ZooKeeper servers.
+ Some can also be set using Java system properties, generally of the
+ form <emphasis>zookeeper.keyword</emphasis>. The exact system
+ property, when available, is noted below.</para>
+
+ <variablelist>
+ <varlistentry>
+ <term>dataLogDir</term>
+
+ <listitem>
+ <para>(No Java system property)</para>
+
+ <para>This option will direct the machine to write the
+ transaction log to the <emphasis
+ role="bold">dataLogDir</emphasis> rather than the <emphasis
+ role="bold">dataDir</emphasis>. This allows a dedicated log
+ device to be used, and helps avoid competition between logging
+ and snapshots.</para>
+
+ <note>
+ <para>Having a dedicated log device has a large impact on
+ throughput and stable latencies. It is highly recommended to
+ dedicate a log device and set <emphasis
+ role="bold">dataLogDir</emphasis> to point to a directory on
+ that device, and then make sure to point <emphasis
+ role="bold">dataDir</emphasis> to a directory
+ <emphasis>not</emphasis> residing on that device.</para>
+ </note>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term>globalOutstandingLimit</term>
+
+ <listitem>
+ <para>(Java system property: <emphasis
+ role="bold">zookeeper.globalOutstandingLimit</emphasis>)</para>
+
+ <para>Clients can submit requests faster than ZooKeeper can
+ process them, especially if there are a lot of clients. To
+ prevent ZooKeeper from running out of memory due to queued
+ requests, ZooKeeper will throttle clients so that there are no
+ more than globalOutstandingLimit outstanding requests in the
+ system. The default limit is 1,000.</para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term>preAllocSize</term>
+
+ <listitem>
+ <para>(Java system property: <emphasis
+ role="bold">zookeeper.preAllocSize</emphasis>)</para>
+
+ <para>To avoid seeks ZooKeeper allocates space in the
+ transaction log file in blocks of preAllocSize kilobytes. The
+ default block size is 64M. One reason for changing the size of
+ the blocks is to reduce the block size if snapshots are taken
+ more often. (Also, see <emphasis
+ role="bold">snapCount</emphasis>).</para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term>snapCount</term>
+
+ <listitem>
+ <para>(Java system property: <emphasis
+ role="bold">zookeeper.snapCount</emphasis>)</para>
+
+ <para>ZooKeeper records its transactions using snapshots and
+ a transaction log (think write-ahead log). The number of
+ transactions recorded in the transaction log before a snapshot
+ can be taken (and the transaction log rolled) is determined
+ by snapCount. In order to prevent all of the machines in the quorum
+ from taking a snapshot at the same time, each ZooKeeper server
+ will take a snapshot when the number of transactions in the transaction log
+ reaches a runtime generated random value in the [snapCount/2+1, snapCount]
+ range. The default snapCount is 100,000.</para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term>maxClientCnxns</term>
+ <listitem>
+ <para>(No Java system property)</para>
+
+ <para>Limits the number of concurrent connections (at the socket
+ level) that a single client, identified by IP address, may make
+ to a single member of the ZooKeeper ensemble. This is used to
+ prevent certain classes of DoS attacks, including file
+ descriptor exhaustion. The default is 60. Setting this to 0
+ entirely removes the limit on concurrent connections.</para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term>clientPortAddress</term>
+
+ <listitem>
+ <para><emphasis role="bold">New in 3.3.0:</emphasis> the
+ address (ipv4, ipv6 or hostname) to listen for client
+ connections; that is, the address that clients attempt
+ to connect to. This is optional, by default we bind in
+ such a way that any connection to the <emphasis
+ role="bold">clientPort</emphasis> for any
+ address/interface/nic on the server will be
+ accepted.</para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term>minSessionTimeout</term>
+ <listitem>
+ <para>(No Java system property)</para>
+
+ <para><emphasis role="bold">New in 3.3.0:</emphasis> the
+ minimum session timeout in milliseconds that the server
+ will allow the client to negotiate. Defaults to 2 times
+ the <emphasis role="bold">tickTime</emphasis>.</para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term>maxSessionTimeout</term>
+ <listitem>
+ <para>(No Java system property)</para>
+
+ <para><emphasis role="bold">New in 3.3.0:</emphasis> the
+ maximum session timeout in milliseconds that the server
+ will allow the client to negotiate. Defaults to 20 times
+ the <emphasis role="bold">tickTime</emphasis>.</para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term>fsync.warningthresholdms</term>
+ <listitem>
+ <para>(Java system property: <emphasis
+ role="bold">zookeeper.fsync.warningthresholdms</emphasis>)</para>
+
+ <para><emphasis role="bold">New in 3.3.4:</emphasis> A
+ warning message will be output to the log whenever an
+ fsync in the Transactional Log (WAL) takes longer than
+ this value. The value is specified in milliseconds and
+ defaults to 1000. This value can only be set as a
+ system property.</para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term>autopurge.snapRetainCount</term>
+
+ <listitem>
+ <para>(No Java system property)</para>
+
+ <para><emphasis role="bold">New in 3.4.0:</emphasis>
+ When enabled, the ZooKeeper auto purge feature retains
+ the <emphasis role="bold">autopurge.snapRetainCount</emphasis> most
+ recent snapshots and the corresponding transaction logs in the
+ <emphasis role="bold">dataDir</emphasis> and <emphasis
+ role="bold">dataLogDir</emphasis> respectively and deletes the rest.
+ Defaults to 3. Minimum value is 3.</para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term>autopurge.purgeInterval</term>
+
+ <listitem>
+ <para>(No Java system property)</para>
+
+ <para><emphasis role="bold">New in 3.4.0:</emphasis> The
+ interval, in hours, at which the purge task is
+ triggered. Set to a positive integer (1 or above)
+ to enable auto purging. Defaults to 0.</para>
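+
+ <para>A minimal sketch of the two auto purge settings used together
+ in the configuration file. The retain count of 5 and the 24 hour
+ interval are illustrative assumptions, not recommended values:</para>
+ <programlisting>
+ # keep the 5 most recent snapshots and their transaction logs
+ autopurge.snapRetainCount=5
+ # run the purge task once every 24 hours
+ autopurge.purgeInterval=24
+ </programlisting>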
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term>syncEnabled</term>
+
+ <listitem>
+ <para>(Java system property: <emphasis
+ role="bold">zookeeper.observer.syncEnabled</emphasis>)</para>
+
+ <para><emphasis role="bold">New in 3.4.6, 3.5.0:</emphasis>
+ Observers now log transactions and write snapshots to disk
+ by default, as the participants do. This reduces the recovery time
+ of the observers on restart. Set to "false" to disable this
+ feature. Default is "true".</para>
+ </listitem>
+ </varlistentry>
+ </variablelist>
+ </section>
+
+ <section id="sc_clusterOptions">
+ <title>Cluster Options</title>
+
+ <para>The options in this section are designed for use with an ensemble
+ of servers -- that is, when deploying clusters of servers.</para>
+
+ <variablelist>
+ <varlistentry>
+ <term>electionAlg</term>
+
+ <listitem>
+ <para>(No Java system property)</para>
+
+ <para>Election implementation to use. A value of "0" corresponds
+ to the original UDP-based version, "1" corresponds to the
+ non-authenticated UDP-based version of fast leader election, "2"
+ corresponds to the authenticated UDP-based version of fast
+ leader election, and "3" corresponds to the TCP-based version of
+ fast leader election. Currently, algorithm 3 is the default.</para>
+
+ <note>
+ <para>The implementations of leader election 0, 1, and 2 are now
+ <emphasis role="bold">deprecated</emphasis>. We intend to
+ remove them in the next release, at which point only the
+ FastLeaderElection will be available.
+ </para>
+ </note>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term>initLimit</term>
+
+ <listitem>
+ <para>(No Java system property)</para>
+
+ <para>Amount of time, in ticks (see <ulink
+ url="#id_tickTime">tickTime</ulink>), to allow followers to
+ connect and sync to a leader. Increase this value as needed if
+ the amount of data managed by ZooKeeper is large.</para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term>leaderServes</term>
+
+ <listitem>
+ <para>(Java system property: zookeeper.<emphasis
+ role="bold">leaderServes</emphasis>)</para>
+
+ <para>Leader accepts client connections. Default value is "yes".
+ The leader machine coordinates updates. For higher update
+ throughput at the slight expense of read throughput, the leader
+ can be configured to not accept clients and focus on
+ coordination.</para>
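+
+ <para>As a sketch, the configuration file line below would stop the
+ leader from accepting client connections so that it can focus on
+ coordination. Whether this trade-off pays off depends on your
+ workload:</para>
+ <programlisting>
+ # the leader only coordinates updates; clients connect to followers
+ leaderServes=no
+ </programlisting>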
+
+ <note>
+ <para>Turning on leader selection is highly recommended when
+ you have more than three ZooKeeper servers in an ensemble.</para>
+ </note>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term>server.x=[hostname]:nnnnn[:nnnnn], etc</term>
+
+ <listitem>
+ <para>(No Java system property)</para>
+
+ <para>Servers making up the ZooKeeper ensemble. When the server
+ starts up, it determines which server it is by looking for the
+ file <filename>myid</filename> in the data directory. That file
+ contains the server number, in ASCII, and it should match
+ <emphasis role="bold">x</emphasis> in <emphasis
+ role="bold">server.x</emphasis> on the left-hand side of this
+ setting.</para>
+
+ <para>The list of ZooKeeper servers used by the clients must match
+ the list of ZooKeeper servers that each ZooKeeper server
+ has.</para>
+
+ <para>There are two port numbers <emphasis role="bold">nnnnn</emphasis>.
+ Followers use the first to connect to the leader, and the second is used for
+ leader election. The leader election port is only necessary if electionAlg
+ is 1, 2, or 3 (default). If electionAlg is 0, then the second port is not
+ necessary. If you want to test multiple servers on a single machine, then
+ different ports can be used for each server.</para>
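+
+ <para>For illustration, a three-server ensemble might be declared as
+ follows. The host names zoo1, zoo2 and zoo3 and the port numbers 2888
+ and 3888 are placeholders chosen for this sketch. The same lines would
+ appear in the configuration file of every server, and the
+ <filename>myid</filename> file of each server would hold its own
+ number (1, 2 or 3):</para>
+ <programlisting>
+ # server.x=hostname:quorum-port:election-port
+ server.1=zoo1:2888:3888
+ server.2=zoo2:2888:3888
+ server.3=zoo3:2888:3888
+ </programlisting>
+
+ <para>When testing several servers on one machine, a variant along
+ these lines gives each server its own pair of ports (again, the port
+ numbers are only an assumption for the example):</para>
+ <programlisting>
+ server.1=localhost:2888:3888
+ server.2=localhost:2889:3889
+ server.3=localhost:2890:3890
+ </programlisting>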
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term>syncLimit</term>
+
+ <listitem>
+ <para>(No Java system property)</para>
+
+ <para>Amount of time, in ticks (see <ulink
+ url="#id_tickTime">tickTime</ulink>), to allow followers to sync
+ with ZooKeeper. If followers fall too far behind a leader, they
+ will be dropped.</para>
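+
+ <para>Because both <emphasis role="bold">initLimit</emphasis> and
+ <emphasis role="bold">syncLimit</emphasis> are expressed in ticks, the
+ effective time depends on <emphasis role="bold">tickTime</emphasis>.
+ The sketch below assumes illustrative values: with a tickTime of
+ 2000 milliseconds, an initLimit of 10 allows 20 seconds for the
+ initial connect and sync, and a syncLimit of 5 allows 10 seconds:</para>
+ <programlisting>
+ # illustrative values only
+ tickTime=2000
+ # 10 ticks x 2000 ms = 20 seconds
+ initLimit=10
+ # 5 ticks x 2000 ms = 10 seconds
+ syncLimit=5
+ </programlisting>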
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term>group.x=nnnnn[:nnnnn]</term>
+
+ <listitem>
+ <para>(No Java system property)</para>
+
+ <para>Enables a hierarchical quorum construction. "x" is a group identifier
+ and the numbers following the "=" sign correspond to server identifiers.
+ The right-hand side of the assignment is a colon-separated list of server
+ identifiers. Note that groups must be disjoint and the union of all groups
+ must be the ZooKeeper ensemble. </para>
+
+ <para> You will find an example <ulink url="zookeeperHierarchicalQuorums.html">here</ulink>
+ </para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term>weight.x=nnnnn</term>
+
+ <listitem>
+ <para>(No Java system property)</para>
+
+ <para>Used along with "group", it assigns a weight to a server when
+ forming quorums. Such a value corresponds to the weight of a server
+ when voting. There are a few parts of ZooKeeper that require voting
+ such as leader election and the atomic broadcast protocol. By default
+ the weight of a server is 1. If the configuration defines groups, but not
+ weights, then a value of 1 will be assigned to all servers.
+ </para>
+
+ <para> You will find an example <ulink url="zookeeperHierarchicalQuorums.html">here</ulink>
+ </para>
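+
+ <para>As a sketch of how "group" and "weight" fit together, the
+ fragment below assumes a nine-server ensemble whose server identifiers
+ 1 through 9 are already declared with server.x lines (not shown). The
+ servers are split into three disjoint groups, and every weight is
+ written out explicitly as 1, which matches the default:</para>
+ <programlisting>
+ # three disjoint groups covering the whole (assumed) ensemble
+ group.1=1:2:3
+ group.2=4:5:6
+ group.3=7:8:9
+ # explicit weights; 1 is also the default
+ weight.1=1
+ weight.2=1
+ weight.3=1
+ weight.4=1
+ weight.5=1
+ weight.6=1
+ weight.7=1
+ weight.8=1
+ weight.9=1
+ </programlisting>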
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term>cnxTimeout</term>
+
+ <listitem>
+ <para>(Java system property: zookeeper.<emphasis
+ role="bold">cnxTimeout</emphasis>)</para>
+
+ <para>Sets the timeout value for opening connections for leader election notifications.
+ Only applicable if you are using electionAlg 3.
+ </para>
+
+ <note>
+ <para>Default value is 5 seconds.</para>
+ </note>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term>4lw.commands.whitelist</term>
+
+ <listitem>
+ <para>(Java system property: <emphasis
+ role="bold">zookeeper.4lw.commands.whitelist</emphasis>)</para>
+
+ <para><emphasis role="bold">New in 3.4.10:</emphasis>
+ This property contains a comma separated list of
+ <ulink url="#sc_zkCommands">Four Letter Words</ulink> commands. It was introduced
+ to provide fine grained control over the set of commands ZooKeeper can execute,
+ so users can turn off certain commands if necessary.
+ If the property is not specified, the whitelist defaults to all supported four letter
+ word commands except "wchp" and "wchc". If the property is specified, then only commands listed
+ in the whitelist are enabled.
+ </para>
+
+ <para>Here's an example of a configuration that enables the stat, ruok, conf, and isro
+ commands while disabling the rest of the Four Letter Words commands:</para>
+ <programlisting>
+ 4lw.commands.whitelist=stat, ruok, conf, isro
+ </programlisting>
+
+ <para>Users can also use the asterisk option so they don't have to include every command one by one in the list.
+ As an example, this will enable all four letter word commands:
+ </para>
+ <programlisting>
+ 4lw.commands.whitelist=*
+ </programlisting>
+
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term>ipReachableTimeout</term>
+
+ <listitem>
+ <para>(Java system property: <emphasis
+ role="bold">zookeeper.ipReachableTimeout</emphasis>)</para>
+
+ <para><emphasis role="bold">New in 3.4.11:</emphasis>
+ Sets the timeout, in milliseconds, for checking whether an IP address is reachable
+ when a hostname is resolved.
+ By default, ZooKeeper will use the first IP address of the hostname (without any reachability check).
+ When zookeeper.ipReachableTimeout is set (larger than 0), ZooKeeper will try to pick the first
+ IP address which is reachable. This is done by calling the Java API function
+ InetAddress.isReachable(int timeout), in which this timeout value is used. If no reachable
+ IP address can be found, the first IP address of the hostname will be used anyway.
+ </para>
+
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term>tcpKeepAlive</term>
+
+ <listitem>
+ <para>(Java system property: <emphasis
+ role="bold">zookeeper.tcpKeepAlive</emphasis>)</para>
+
+ <para><emphasis role="bold">New in 3.4.11:</emphasis>
+ Setting this to true sets the TCP keepAlive flag on the
+ sockets used by quorum members to perform elections.
+ This will allow for connections between quorum members to
+ remain up when there is network infrastructure that may
+ otherwise break them. Some NATs and firewalls may terminate
+ or lose state for long running or idle connections.</para>
+
+ <para>Enabling this option relies on OS level settings to work
+ properly; check your operating system's options regarding TCP
+ keepalive for more information. Defaults to
+ <emphasis role="bold">false</emphasis>.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ </variablelist>
+ <para></para>
+ </section>
+
+ <section id="sc_authOptions">
+ <title>Authentication &amp; Authorization Options</title>
+
+ <para>The options in this section allow control over
+ authentication/authorization performed by the service.</para>
+
+ <variablelist>
+ <varlistentry>
+ <term>zookeeper.DigestAuthenticationProvider.superDigest</term>
+
+ <listitem>
+ <para>(Java system property only: <emphasis
+ role="bold">zookeeper.DigestAuthenticationProvider.superDigest</emphasis>)</para>
+
+ <para>By default this feature is <emphasis
+ role="bold">disabled</emphasis>.</para>
+
+ <para><emphasis role="bold">New in 3.2:</emphasis>
+ Enables a ZooKeeper ensemble administrator to access the
+ znode hierarchy as a "super" user. In particular no ACL
+ checking occurs for a user authenticated as
+ super.</para>
+
+ <para>org.apache.zookeeper.server.auth.DigestAuthenticationProvider
+ can be used to generate the superDigest; call it with
+ one parameter of "super:&lt;password&gt;". Provide the
+ generated "super:&lt;data&gt;" as the system property value
+ when starting each server of the ensemble.</para>
+
+ <para>When authenticating to a ZooKeeper server (from a
+ ZooKeeper client) pass a scheme of "digest" and authdata
+ of "super:&lt;password&gt;". Note that digest auth passes
+ the authdata in plaintext to the server, so it would be
+ prudent to use this authentication method only on
+ localhost (not over the network) or over an encrypted
+ connection.</para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term>isro</term>
+
+ <listitem>
+ <para><emphasis role="bold">New in 3.4.0:</emphasis> Tests if
+ server is running in read-only mode. The server will respond with
+ "ro" if in read-only mode or "rw" if not in read-only mode.</para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term>gtmk</term>
+
+ <listitem>
+ <para>Gets the current trace mask as a 64-bit signed long value in
+ decimal format. See <command>stmk</command> for an explanation of
+ the possible values.</para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term>stmk</term>
+
+ <listitem>
+ <para>Sets the current trace mask. The trace mask is 64 bits,
+ where each bit enables or disables a specific category of trace
+ logging on the server. Log4J must be configured to enable
+ <command>TRACE</command> level first in order to see trace logging
+ messages. The bits of the trace mask correspond to the following
+ trace logging categories.</para>
+
+ <table>
+ <title>Trace Mask Bit Values</title>
+ <tgroup cols="2" align="left" colsep="1" rowsep="1">
+ <tbody>
+ <row>
+ <entry>0b0000000000</entry>
+ <entry>Unused, reserved for future use.</entry>
+ </row>
+ <row>
+ <entry>0b0000000010</entry>
+ <entry>Logs client requests, excluding ping
+ requests.</entry>
+ </row>
+ <row>
+ <entry>0b0000000100</entry>
+ <entry>Unused, reserved for future use.</entry>
+ </row>
+ <row>
+ <entry>0b0000001000</entry>
+ <entry>Logs client ping requests.</entry>
+ </row>
+ <row>
+ <entry>0b0000010000</entry>
+ <entry>Logs packets received from the quorum peer that is
+ the current leader, excluding ping requests.</entry>
+ </row>
+ <row>
+ <entry>0b0000100000</entry>
+ <entry>Logs addition, removal and validation of client
+ sessions.</entry>
+ </row>
+ <row>
+ <entry>0b0001000000</entry>
+ <entry>Logs delivery of watch events to client
+ sessions.</entry>
+ </row>
+ <row>
+ <entry>0b0010000000</entry>
+ <entry>Logs ping packets received from the quorum peer
+ that is the current leader.</entry>
+ </row>
+ <row>
+ <entry>0b0100000000</entry>
+ <entry>Unused, reserved for future use.</entry>
+ </row>
+ <row>
+ <entry>0b1000000000</entry>
+ <entry>Unused, reserved for future use.</entry>
+ </row>
+ </tbody>
+ </tgroup>
+ </table>
+
+ <para>All remaining bits in the 64-bit value are unused and
+ reserved for future use. Multiple trace logging categories are
+ specified by calculating the bitwise OR of the documented values.
+ The default trace mask is 0b0100110010. Thus, by default, trace
+ logging includes client requests, packets received from the
+ leader and sessions.</para>
+
+ <para>To set a different trace mask, send a request containing the
+ <command>stmk</command> four-letter word followed by the trace
+ mask represented as a 64-bit signed long value. This example uses
+ the Perl <command>pack</command> function to construct a trace
+ mask that enables all trace logging categories described above and
+ convert it to a 64-bit signed long value with big-endian byte
+ order. The result is appended to <command>stmk</command> and sent
+ to the server using netcat. The server responds with the new
+ trace mask in decimal format.</para>
+
+ <programlisting>$ perl -e "print 'stmk', pack('q>', 0b0011111010)" | nc localhost 2181
+250
+ </programlisting>
+ </listitem>
+ </varlistentry>
+ </variablelist>
+ </section>
+
+ <section>
+ <title>Experimental Options/Features</title>
+
+ <para>New features that are currently considered experimental.</para>
+
+ <variablelist>
+ <varlistentry>
+ <term>Read Only Mode Server</term>
+
+ <listitem>
+ <para>(Java system property: <emphasis
+ role="bold">readonlymode.enabled</emphasis>)</para>
+
+ <para><emphasis role="bold">New in 3.4.0:</emphasis>
+ Setting this value to true enables Read Only Mode server
+ support (disabled by default). ROM allows client
+ sessions which requested ROM support to connect to the
+ server even when the server might be partitioned from
+ the quorum. In this mode ROM clients can still read
+ values from the ZK service, but will be unable to write
+ values or see changes from other clients. See
+ ZOOKEEPER-784 for more details.
+ </para>
+ </listitem>
+ </varlistentry>
+
+ </variablelist>
+ </section>
+
+ <section>
+ <title>Unsafe Options</title>
+
+ <para>The following options can be useful, but be careful when you use
+ them. The risk of each is explained along with the explanation of what
+ the variable does.</para>
+
+ <variablelist>
+ <varlistentry>
+ <term>forceSync</term>
+
+ <listitem>
+ <para>(Java system property: <emphasis
+ role="bold">zookeeper.forceSync</emphasis>)</para>
+
+ <para>Requires updates to be synced to media of the transaction
+ log before finishing processing the update. If this option is
+ set to no, ZooKeeper will not require updates to be synced to
+ the media.</para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term>jute.maxbuffer:</term>
+
+ <listitem>
+ <para>(Java system property:<emphasis role="bold">
+ jute.maxbuffer</emphasis>)</para>
+
+ <para>This option can only be set as a Java system property.
+ There is no zookeeper prefix on it. It specifies the maximum
+ size of the data that can be stored in a znode. The default is
+ 0xfffff, or just under 1M. If this option is changed, the system
+ property must be set on all servers and clients; otherwise
+ problems will arise. This is really a sanity check. ZooKeeper is
+ designed to store data on the order of kilobytes in size.</para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term>skipACL</term>
+
+ <listitem>
+ <para>(Java system property: <emphasis
+ role="bold">zookeeper.skipACL</emphasis>)</para>
+
+ <para>Skips ACL checks. This results in a boost in throughput,
+ but opens up full access to the data tree to everyone.</para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term>quorumListenOnAllIPs</term>
+
+ <listitem>
+ <para>When set to true the ZooKeeper server will listen
+ for connections from its peers on all available IP addresses,
+ and not only the address configured in the server list of the
+ configuration file. It affects the connections handling the
+ ZAB protocol and the Fast Leader Election protocol. Default
+ value is <emphasis role="bold">false</emphasis>.</para>
+ </listitem>
+ </varlistentry>
+
+ </variablelist>
+ </section>
+
+ <section>
+ <title>Communication using the Netty framework</title>
+
+ <para><emphasis role="bold">New in
+ 3.4:</emphasis> <ulink url="http://jboss.org/netty">Netty</ulink>
+ is an NIO based client/server communication framework. It
+ simplifies (compared with using NIO directly) many of the
+ complexities of network level communication for Java
+ applications. Additionally, the Netty framework has built
+ in support for encryption (SSL) and authentication
+ (certificates). These are optional features and can be
+ turned on or off individually.
+ </para>
+ <para>Prior to version 3.4 ZooKeeper always used NIO
+ directly. In versions 3.4 and later Netty is
+ supported as an alternative that replaces NIO. NIO continues to
+ be the default; however, Netty based communication can be
+ used in place of NIO by setting the Java system property
+ "zookeeper.serverCnxnFactory" to
+ "org.apache.zookeeper.server.NettyServerCnxnFactory". You
+ have the option of setting this on either the client(s) or
+ server(s). Typically you would want to set this on both,
+ but that is at your discretion.
+ </para>
+ <para>
+ TBD - tuning options for netty - currently there are none that are netty specific but we should add some. Esp around max bound on the number of reader worker threads netty creates.
+ </para>
+ <para>
+ TBD - how to manage encryption
+ </para>
+ <para>
+ TBD - how to manage certificates
+ </para>
+
+ </section>
+
+ </section>
+
+ <section id="sc_zkCommands">
+ <title>ZooKeeper Commands: The Four Letter Words</title>
+
+ <para>ZooKeeper responds to a small set of commands. Each command is
+ composed of four letters. You issue the commands to ZooKeeper via telnet
+ or nc, at the client port.</para>
+
+ <para>Three of the more interesting commands: "stat" gives some
+ general information about the server and connected clients,
+ while "srvr" and "cons" give extended details on server and
+ connections respectively.</para>
+
+ <variablelist>
+ <varlistentry>
+ <term>conf</term>
+
+ <listitem>
+ <para><emphasis role="bold">New in 3.3.0:</emphasis> Print
+ details about serving configuration.</para>
+ </listitem>
+
+ </varlistentry>
+
+ <varlistentry>
+ <term>cons</term>
+
+ <listitem>
+ <para><emphasis role="bold">New in 3.3.0:</emphasis> List
+ full connection/session details for all clients connected
+ to this server. Includes information on numbers of packets
+ received/sent, session id, operation latencies, last
+ operation performed, etc...</para>
+ </listitem>
+
+ </varlistentry>
+
+ <varlistentry>
+ <term>crst</term>
+
+ <listitem>
+ <para><emphasis role="bold">New in 3.3.0:</emphasis> Reset
+ connection/session statistics for all connections.</para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term>dump</term>
+
+ <listitem>
+ <para>Lists the outstanding sessions and ephemeral nodes. This
+ only works on the leader.</para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term>envi</term>
+
+ <listitem>
+ <para>Print details about serving environment.</para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term>ruok</term>
+
+ <listitem>
+ <para>Tests if server is running in a non-error state. The server
+ will respond with imok if it is running. Otherwise it will not
+ respond at all.</para>
+
+ <para>A response of "imok" does not necessarily indicate that the
+ server has joined the quorum, just that the server process is active
+ and bound to the specified client port. Use "stat" for details on
+ state with respect to quorum and client connection information.</para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term>srst</term>
+
+ <listitem>
+ <para>Reset server statistics.</para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term>srvr</term>
+
+ <listitem>
+ <para><emphasis role="bold">New in 3.3.0:</emphasis> Lists
+ full details for the server.</para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term>stat</term>
+
+ <listitem>
+ <para>Lists brief details for the server and connected
+ clients.</para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term>wchs</term>
+
+ <listitem>
+ <para><emphasis role="bold">New in 3.3.0:</emphasis> Lists
+ brief information on watches for the server.</para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term>wchc</term>
+
+ <listitem>
+ <para><emphasis role="bold">New in 3.3.0:</emphasis> Lists
+ detailed information on watches for the server, by
+ session. This outputs a list of sessions (connections)
+ with associated watches (paths). Note, depending on the
+ number of watches this operation may be expensive (i.e.,
+ impact server performance), so use it carefully.</para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term>wchp</term>
+
+ <listitem>
+ <para><emphasis role="bold">New in 3.3.0:</emphasis> Lists
+ detailed information on watches for the server, by path.
+ This outputs a list of paths (znodes) with associated
+ sessions. Note, depending on the number of watches this
+ operation may be expensive (i.e., impact server performance),
+ so use it carefully.</para>
+ </listitem>
+ </varlistentry>
+
+
+ <varlistentry>
+ <term>mntr</term>
+
+ <listitem>
+ <para><emphasis role="bold">New in 3.4.0:</emphasis> Outputs a list
+ of variables that could be used for monitoring the health of the cluster.</para>
+
+ <programlisting>$ echo mntr | nc localhost 2185
+
+zk_version 3.4.0
+zk_avg_latency 0
+zk_max_latency 0
+zk_min_latency 0
+zk_packets_received 70
+zk_packets_sent 69
+zk_outstanding_requests 0
+zk_server_state leader
+zk_znode_count 4
+zk_watch_count 0
+zk_ephemerals_count 0
+zk_approximate_data_size 27
+zk_followers 4 - only exposed by the Leader
+zk_synced_followers 4 - only exposed by the Leader
+zk_pending_syncs 0 - only exposed by the Leader
+zk_open_file_descriptor_count 23 - only available on Unix platforms
+zk_max_file_descriptor_count 1024 - only available on Unix platforms
+zk_fsync_threshold_exceed_count 0
+</programlisting>
+
+ <para>The output is compatible with the Java properties format and the content
+ may change over time (new keys added). Your scripts should expect changes.</para>
+
+ <para>ATTENTION: Some of the keys are platform specific and some of the keys are only exported by the Leader. </para>
+
+ <para>The output contains multiple lines with the following format:</para>
+ <programlisting>key \t value</programlisting>
+ </listitem>
+ </varlistentry>
+ </variablelist>
+
+ <para>Here's an example of the <emphasis role="bold">ruok</emphasis>
+ command:</para>
+
+ <programlisting>$ echo ruok | nc 127.0.0.1 5111
+imok
+</programlisting>
+
+
+ </section>
+
+ <section id="sc_dataFileManagement">
+ <title>Data File Management</title>
+
+ <para>ZooKeeper stores its data in a data directory and its transaction
+ log in a transaction log directory. By default these two directories are
+ the same. The server can (and should) be configured to store the
+ transaction log files in a directory separate from the data files.
+ Throughput increases and latency decreases when transaction logs reside
+ on a dedicated log device.</para>
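+
+ <para>A minimal sketch of such a split in the configuration file,
+ using the <emphasis role="bold">dataDir</emphasis> and
+ <emphasis role="bold">dataLogDir</emphasis> settings. Both paths are
+ placeholders invented for this example, with the second assumed to
+ sit on a dedicated device:</para>
+ <programlisting>
+ # snapshots and the myid file (placeholder path)
+ dataDir=/var/lib/zookeeper/data
+ # transaction logs on a dedicated device (placeholder path)
+ dataLogDir=/mnt/zk-txnlog
+ </programlisting>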
+
+ <section>
+ <title>The Data Directory</title>
+
+ <para>This directory has two files in it:</para>
+
+ <itemizedlist>
+ <listitem>
+ <para><filename>myid</filename> - contains a single integer in
+ human readable ASCII text that represents the server id.</para>
+ </listitem>
+
+ <listitem>
+ <para><filename>snapshot.&lt;zxid&gt;</filename> - holds the fuzzy
+ snapshot of a data tree.</para>
+ </listitem>
+ </itemizedlist>
+
+ <para>Each ZooKeeper server has a unique id. This id is used in two
+ places: the <filename>myid</filename> file and the configuration file.
+ The <filename>myid</filename> file identifies the server that
+ corresponds to the given data directory. The configuration file lists
+ the contact information for each server identified by its server id.
+ When a ZooKeeper server instance starts, it reads its id from the
+ <filename>myid</filename> file and then, using that id, reads from the
+ configuration file, looking up the port on which it should
+ listen.</para>
+
+ <para>The <filename>snapshot</filename> files stored in the data
+ directory are fuzzy snapshots in the sense that during the time the
+ ZooKeeper server is taking the snapshot, updates are occurring to the
+ data tree. The suffix of the <filename>snapshot</filename> file names
+ is the <emphasis>zxid</emphasis>, the ZooKeeper transaction id, of the
+ last committed transaction at the start of the snapshot. Thus, the
+ snapshot includes a subset of the updates to the data tree that
+ occurred while the snapshot was in progress. The snapshot, then, may
+ not correspond to any data tree that actually existed, and for this
+ reason we refer to it as a fuzzy snapshot. Still, ZooKeeper can
+ recover using this snapshot because it takes advantage of the
+ idempotent nature of its updates. By replaying the transaction log
+ against fuzzy snapshots, ZooKeeper gets the state of the system at the
+ end of the log.</para>
+ </section>
+
+ <section>
+ <title>The Log Directory</title>
+
+ <para>The Log Directory contains the ZooKeeper transaction logs.
+ Before any update takes place, ZooKeeper ensures that the transaction
+ that represents the update is written to non-volatile storage. A new
+ log file is started when the number of transactions written to the
+ current log file reaches a (variable) threshold. The threshold is
+ computed using the same parameter which influences the frequency of
+ snapshotting (see snapCount above). The log file's suffix is the first
+ zxid written to that log.</para>
+ </section>
+
+ <section id="sc_filemanagement">
+ <title>File Management</title>
+
+ <para>The format of snapshot and log files does not change between
+ standalone ZooKeeper servers and different configurations of
+ replicated ZooKeeper servers. Therefore, you can pull these files from
+ a running replicated ZooKeeper server to a development machine with a
+ standalone ZooKeeper server for troubleshooting.</para>
+
+ <para>Using older log and snapshot files, you can look at the previous
+ state of ZooKeeper servers and even restore that state. The
+ LogFormatter class allows an administrator to look at the transactions
+ in a log.</para>
+
+ <para>The ZooKeeper server creates snapshot and log files, but
+ never deletes them. The retention policy of the data and log
+ files is implemented outside of the ZooKeeper server. The
+ server itself only needs the latest complete fuzzy snapshot, all log
+ files following it, and the last log file preceding it. The latter
+ requirement is necessary to include updates which happened after this
+ snapshot was started but went into the existing log file at that time.
+ This is possible because snapshotting and rolling over of logs
+ proceed somewhat independently in ZooKeeper. See the
+ <ulink url="#sc_maintenance">maintenance</ulink> section in
+ this document for more details on setting a retention policy
+ and maintenance of ZooKeeper storage.
+ </para>
+ <note>
+ <para>The data stored in these files is not encrypted. In the case of
+ storing sensitive data in ZooKeeper, necessary measures need to be
+ taken to prevent unauthorized access. Such measures are external to
+ ZooKeeper (e.g., control access to the files) and depend on the
+ individual settings in which it is being deployed. </para>
+ </note>
+ </section>
+
+ <section>
+ <title>Recovery - TxnLogToolkit</title>
+
+ <para>TxnLogToolkit is a command line tool shipped with ZooKeeper which
+ is capable of recovering transaction log entries with broken CRC.</para>
+ <para>When run without any command line parameters or with the "-h,--help"
+ argument, it prints the following help page:</para>
+
+ <programlisting>
+ $ bin/zkTxnLogToolkit.sh
+
+ usage: TxnLogToolkit [-dhrv] txn_log_file_name
+ -d,--dump Dump mode. Dump all entries of the log file. (this is the default)
+ -h,--help Print help message
+ -r,--recover Recovery mode. Re-calculate CRC for broken entries.
+ -v,--verbose Be verbose in recovery mode: print all entries, not just fixed ones.
+ -y,--yes Non-interactive mode: repair all CRC errors without asking
+ </programlisting>
+
+ <para>The default behaviour is safe: it dumps the entries of the given
+ transaction log file to the screen (the same as using the '-d,--dump' parameter):</para>
+
+ <programlisting>
+ $ bin/zkTxnLogToolkit.sh log.100000001
+ ZooKeeper Transactional Log File with dbid 0 txnlog format version 2
+ 4/5/18 2:15:58 PM CEST session 0x16295bafcc40000 cxid 0x0 zxid 0x100000001 createSession 30000
+ <emphasis role="bold">CRC ERROR - 4/5/18 2:16:05 PM CEST session 0x16295bafcc40000 cxid 0x1 zxid 0x100000002 closeSession null</emphasis>
+ 4/5/18 2:16:05 PM CEST session 0x16295bafcc40000 cxid 0x1 zxid 0x100000002 closeSession null
+ 4/5/18 2:16:12 PM CEST session 0x26295bafcc90000 cxid 0x0 zxid 0x100000003 createSession 30000
+ 4/5/18 2:17:34 PM CEST session 0x26295bafcc90000 cxid 0x0 zxid 0x200000001 closeSession null
+ 4/5/18 2:17:34 PM CEST session 0x16295bd23720000 cxid 0x0 zxid 0x200000002 createSession 30000
+ 4/5/18 2:18:02 PM CEST session 0x16295bd23720000 cxid 0x2 zxid 0x200000003 create '/andor,#626262,v{s{31,s{'world,'anyone}}},F,1
+ EOF reached after 6 txns.
+ </programlisting>
+
+ <para>There's a CRC error in the 2nd entry of the above transaction log file. In <emphasis role="bold">dump</emphasis>
+ mode, the toolkit only prints this information to the screen without touching the original file. In
+ <emphasis role="bold">recovery</emphasis> mode (-r,--recover flag) the original file also remains
+ untouched, and all transactions are copied over to a new txn log file with a ".fixed" suffix. The tool recalculates
+ each CRC value and, where it doesn't match the original txn entry, writes the calculated value instead.
+ By default, the tool works interactively: it asks for confirmation whenever a CRC error is encountered.</para>
+
+ <programlisting>
+ $ bin/zkTxnLogToolkit.sh -r log.100000001
+ ZooKeeper Transactional Log File with dbid 0 txnlog format version 2
+ CRC ERROR - 4/5/18 2:16:05 PM CEST session 0x16295bafcc40000 cxid 0x1 zxid 0x100000002 closeSession null
+ Would you like to fix it (Yes/No/Abort) ?
+ </programlisting>
+
+ <para>Answering <emphasis role="bold">Yes</emphasis> means the newly calculated CRC value will be written
+ to the new file. <emphasis role="bold">No</emphasis> means that the original CRC value will be copied over.
+ <emphasis role="bold">Abort</emphasis> will abort the entire operation and exit.
+ (In this case the ".fixed" file will not be deleted and is left in a half-complete state: it contains only the entries which
+ have already been processed, or only the header if the operation was aborted at the first entry.)</para>
+
+ <programlisting>
+ $ bin/zkTxnLogToolkit.sh -r log.100000001
+ ZooKeeper Transactional Log File with dbid 0 txnlog format version 2
+ CRC ERROR - 4/5/18 2:16:05 PM CEST session 0x16295bafcc40000 cxid 0x1 zxid 0x100000002 closeSession null
+ Would you like to fix it (Yes/No/Abort) ? y
+ EOF reached after 6 txns.
+ Recovery file log.100000001.fixed has been written with 1 fixed CRC error(s)
+ </programlisting>
+
+ <para>The default behaviour of recovery is to be silent: only entries with a CRC error are printed to the screen.
+ Verbose mode, which prints all records, can be turned on with the -v,--verbose parameter.
+ Interactive mode can be turned off with the -y,--yes parameter. In this case all CRC errors will be fixed
+ in the new transaction file.</para>
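+
+ <para>As a minimal sketch, assuming the same log file as in the examples above, a non-interactive
+ and verbose recovery run would combine the flags already described (the output is omitted here):</para>
+
+ <programlisting>
+ $ bin/zkTxnLogToolkit.sh -r -v -y log.100000001
+ </programlisting>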
+ </section>
+ </section>
+
+ <section id="sc_commonProblems">
+ <title>Things to Avoid</title>
+
+ <para>Here are some common problems you can avoid by configuring
+ ZooKeeper correctly:</para>
+
+ <variablelist>
+ <varlistentry>
+ <term>inconsistent lists of servers</term>
+
+ <listitem>
+ <para>The list of ZooKeeper servers used by the clients must match
+ the list of ZooKeeper servers that each ZooKeeper server has.
+ Things work okay if the client list is a subset of the real list,
+ but clients will behave unpredictably if their list contains
+ ZooKeeper servers that belong to different ZooKeeper clusters. Also,
+ the server lists in each ZooKeeper server configuration file
+ should be consistent with one another.</para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term>incorrect placement of transaction log</term>
+
+ <listitem>
+ <para>The most performance-critical part of ZooKeeper is the
+ transaction log. ZooKeeper syncs transactions to media before it
+ returns a response. A dedicated transaction log device is key to
+ consistently good performance. Putting the log on a busy device will
+ adversely affect performance. If you only have one storage device,
+ put trace files on NFS and increase the snapshotCount; it doesn't
+ eliminate the problem, but it should mitigate it.</para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term>incorrect Java heap size</term>
+
+ <listitem>
+ <para>You should take special care to set your Java max heap size
+ correctly. In particular, you should not create a situation in
+ which ZooKeeper swaps to disk. The disk is death to ZooKeeper.
+ Everything is ordered, so if processing one request swaps to
+ disk, all other queued requests will probably do the same.
+ DON'T SWAP.</para>
+
+ <para>Be conservative in your estimates: if you have 4G of RAM, do
+ not set the Java max heap size to 6G or even 4G. For example, it
+ is more likely you would use a 3G heap for a 4G machine, as the
+ operating system and the cache also need memory. The best and only
+ recommended practice for estimating the heap size your system needs
+ is to run load tests, and then make sure you are well below the
+ usage limit that would cause the system to swap.</para>
+ </listitem>
+ </varlistentry>
+
+ <varlistentry>
+ <term>Publicly accessible deployment</term>
+ <listitem>
+ <para>
+ A ZooKeeper ensemble is expected to operate in a trusted computing environment.
+ It is thus recommended to deploy ZooKeeper behind a firewall.
+ </para>
+ </listitem>
+ </varlistentry>
+ </variablelist>
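+
+ <para>As a minimal sketch of a configuration that avoids the first two problems above,
+ assuming a three-server ensemble, the following fragment could appear identically in the
+ configuration file of every server. The host names and directory paths are hypothetical
+ placeholders; <emphasis role="bold">dataLogDir</emphasis> places the transaction log on a
+ dedicated device, separate from the snapshots kept in <emphasis role="bold">dataDir</emphasis>:</para>
+
+ <programlisting>
+ dataDir=/var/lib/zookeeper
+ dataLogDir=/txnlog/zookeeper
+ server.1=zoo1.example.com:2888:3888
+ server.2=zoo2.example.com:2888:3888
+ server.3=zoo3.example.com:2888:3888
+ </programlisting>
+
+ <para>Clients of this ensemble would then use a connect string that names only these
+ servers (assuming the client port is 2181), for example
+ <computeroutput>zoo1.example.com:2181,zoo2.example.com:2181,zoo3.example.com:2181</computeroutput>.</para>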
+ </section>
+
+ <section id="sc_bestPractices">
+ <title>Best Practices</title>
+
+ <para>For best results, take note of the following list of good
+ ZooKeeper practices:</para>
+
+
+ <para>For multi-tenant installations see the <ulink
+ url="zookeeperProgrammers.html#ch_zkSessions">section</ulink>
+ detailing ZooKeeper "chroot" support. This can be very useful
+ when deploying many applications/services interfacing to a
+ single ZooKeeper cluster.</para>
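+
+ <para>As a minimal sketch, assuming a hypothetical three-server ensemble and a hypothetical
+ application path <computeroutput>/app/a</computeroutput>, a client selects its chroot by
+ appending the path to the connect string. All paths used by that client are then interpreted
+ relative to it:</para>
+
+ <programlisting>
+ zoo1.example.com:2181,zoo2.example.com:2181,zoo3.example.com:2181/app/a
+ </programlisting>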
+
+ </section>
+ </section>
+</article>
http://git-wip-us.apache.org/repos/asf/zookeeper/blob/c1efa954/zookeeper-docs/src/documentation/content/xdocs/zookeeperHierarchicalQuorums.xml
----------------------------------------------------------------------
diff --git a/zookeeper-docs/src/documentation/content/xdocs/zookeeperHierarchicalQuorums.xml b/zookeeper-docs/src/documentation/content/xdocs/zookeeperHierarchicalQuorums.xml
new file mode 100644
index 0000000..f71c4a8
--- /dev/null
+++ b/zookeeper-docs/src/documentation/content/xdocs/zookeeperHierarchicalQuorums.xml
@@ -0,0 +1,75 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<!--
+ Copyright 2002-2004 The Apache Software Foundation
+
+ Licensed under the Apache License, Version 2.0 (the "License");
+ you may not use this file except in compliance with the License.
+ You may obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+ Unless required by applicable law or agreed to in writing, software
+ distributed under the License is distributed on an "AS IS" BASIS,
+ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ See the License for the specific language governing permissions and
+ limitations under the License.
+-->
+
+<!DOCTYPE article PUBLIC "-//OASIS//DTD Simplified DocBook XML V1.0//EN"
+"http://www.oasis-open.org/docbook/xml/simple/1.0/sdocbook.dtd">
+<article id="zk_HierarchicalQuorums">
+ <title>Introduction to hierarchical quorums</title>
+
+ <articleinfo>
+ <legalnotice>
+ <para>Licensed under the Apache License, Version 2.0 (the "License");
+ you may not use this file except in compliance with the License. You may
+ obtain a copy of the License at <ulink
+ url="http://www.apache.org/licenses/LICENSE-2.0">http://www.apache.org/licenses/LICENSE-2.0</ulink>.</para>
+
+ <para>Unless required by applicable law or agreed to in writing,
+ software distributed under the License is distributed on an "AS IS"
+ BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
+ implied. See the License for the specific language governing permissions
+ and limitations under the License.</para>
+ </legalnotice>
+
+ <abstract>
+ <para>This document contains information about hierarchical quorums.</para>
+ </abstract>
+ </articleinfo>
+
+ <para>
+ This document gives an example of how to use hierarchical quorums. The basic idea is
+ very simple. First, we split servers into groups, and add a line for each group listing
+ the servers that form this group. Next we have to assign a weight to each server.
+ </para>
+
+ <para>
+ The following example shows how to configure a system with three groups of three servers
+ each, and we assign a weight of 1 to each server:
+ </para>
+
+ <programlisting>
+ group.1=1:2:3
+ group.2=4:5:6
+ group.3=7:8:9
+
+ weight.1=1
+ weight.2=1
+ weight.3=1
+ weight.4=1
+ weight.5=1
+ weight.6=1
+ weight.7=1
+ weight.8=1
+ weight.9=1
+ </programlisting>
+
+ <para>
+ When running the system, we are able to form a quorum once we have a majority of votes from
+ a majority of non-zero-weight groups. Groups that have zero weight are discarded and not
+ considered when forming quorums. Looking at the example, we are able to form a quorum once
+ we have votes from at least two servers from each of two different groups.
+ </para>
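+
+ <para>
+ To make the rule concrete for the configuration above: each group has a total weight of 3,
+ so a group is won with votes from two of its servers, and two of the three groups form a
+ majority of groups. The listing below is a worked illustration derived from that rule, not
+ tool output:
+ </para>
+
+ <programlisting>
+ votes from servers 1,2,4,5     group.1: 2 of 3   group.2: 2 of 3   group.3: 0 of 3   quorum
+ votes from servers 1,2,3,4,7   group.1: 3 of 3   group.2: 1 of 3   group.3: 1 of 3   no quorum
+ </programlisting>
+
+ <para>
+ Note that the second set contains more servers than the first (five of nine versus four of
+ nine) yet does not form a quorum, because it holds a majority in only one group.
+ </para>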
+ </article>
\ No newline at end of file