Posted to commits@zookeeper.apache.org by ma...@apache.org on 2010/02/24 21:04:06 UTC
svn commit: r915956 - in /hadoop/zookeeper/trunk: CHANGES.txt
docs/zookeeperAdmin.html docs/zookeeperAdmin.pdf
src/docs/src/documentation/content/xdocs/zookeeperAdmin.xml
Author: mahadev
Date: Wed Feb 24 20:04:06 2010
New Revision: 915956
URL: http://svn.apache.org/viewvc?rev=915956&view=rev
Log:
ZOOKEEPER-485. Need ops documentation that details supervision of ZK server processes. (phunt via mahadev)
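As a rough illustration of the supervision this change documents, the sketch below restarts a child process whenever it exits abnormally. It is only a minimal sketch, not a replacement for daemontools or SMF: the function name supervise and its parameters max_restarts and delay_seconds are hypothetical, and a real supervisor would also handle logging, signals, and restart back-off.

```python
import subprocess
import sys
import time


def supervise(cmd, max_restarts=5, delay_seconds=1.0):
    """Run cmd and restart it after each abnormal (non-zero) exit.

    Returns the number of restarts performed. Supervision stops when
    the child exits with status 0 or when max_restarts is reached.
    All names here are illustrative assumptions, not ZooKeeper APIs.
    """
    restarts = 0
    while True:
        # Blocks until the child process exits, then yields its exit status.
        exit_code = subprocess.call(cmd)
        if exit_code == 0:
            # Clean shutdown: nothing to restart.
            return restarts
        if restarts >= max_restarts:
            # Give up rather than restart forever.
            print("giving up after %d restarts" % restarts, file=sys.stderr)
            return restarts
        restarts += 1
        time.sleep(delay_seconds)
```

A call such as supervise(["java", "-cp", "zookeeper.jar:log4j.jar:conf", "org.apache.zookeeper.server.quorum.QuorumPeerMain", "conf/zoo.cfg"]) would keep a server process running under these assumptions; the classpath and configuration path shown are placeholders that depend on the installation.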
Modified:
hadoop/zookeeper/trunk/CHANGES.txt
hadoop/zookeeper/trunk/docs/zookeeperAdmin.html
hadoop/zookeeper/trunk/docs/zookeeperAdmin.pdf
hadoop/zookeeper/trunk/src/docs/src/documentation/content/xdocs/zookeeperAdmin.xml
Modified: hadoop/zookeeper/trunk/CHANGES.txt
URL: http://svn.apache.org/viewvc/hadoop/zookeeper/trunk/CHANGES.txt?rev=915956&r1=915955&r2=915956&view=diff
==============================================================================
--- hadoop/zookeeper/trunk/CHANGES.txt (original)
+++ hadoop/zookeeper/trunk/CHANGES.txt Wed Feb 24 20:04:06 2010
@@ -293,6 +293,9 @@
ZOOKEEPER-607. improve bookkeeper overview (flavio via mahadev)
+ ZOOKEEPER-485. Need ops documentation that details supervision of ZK server
+ processes. (phunt via mahadev)
+
NEW FEATURES:
ZOOKEEPER-539. generate eclipse project via ant target. (phunt via mahadev)
Modified: hadoop/zookeeper/trunk/docs/zookeeperAdmin.html
URL: http://svn.apache.org/viewvc/hadoop/zookeeper/trunk/docs/zookeeperAdmin.html?rev=915956&r1=915955&r2=915956&view=diff
==============================================================================
--- hadoop/zookeeper/trunk/docs/zookeeperAdmin.html (original)
+++ hadoop/zookeeper/trunk/docs/zookeeperAdmin.html Wed Feb 24 20:04:06 2010
@@ -263,6 +263,9 @@
</ul>
</li>
<li>
+<a href="#sc_supervision">Supervision</a>
+</li>
+<li>
<a href="#sc_monitoring">Monitoring</a>
</li>
<li>
@@ -673,6 +676,15 @@
<li>
<p>
+<a href="#sc_supervision">Supervision</a>
+</p>
+
+</li>
+
+
+<li>
+
+<p>
<a href="#sc_monitoring">Monitoring</a>
</p>
@@ -742,7 +754,7 @@
</li>
</ul>
-<a name="N101AE"></a><a name="sc_designing"></a>
+<a name="N101B6"></a><a name="sc_designing"></a>
<h3 class="h4">Designing a ZooKeeper Deployment</h3>
<p>The reliablity of ZooKeeper rests on two basic assumptions.</p>
<ol>
@@ -769,7 +781,7 @@
to hold true. Some of these are cross-machines considerations,
and others are things you should consider for each and every
machine in your deployment.</p>
-<a name="N101CA"></a><a name="sc_CrossMachineRequirements"></a>
+<a name="N101D2"></a><a name="sc_CrossMachineRequirements"></a>
<h4>Cross Machine Requirements</h4>
<p>For the ZooKeeper service to be active, there must be a
majority of non-failing machines that can communicate with
@@ -787,7 +799,7 @@
failure of that switch could cause a correlated failure and
bring down the service. The same holds true of shared power
circuits, cooling systems, etc.</p>
-<a name="N101D7"></a><a name="Single+Machine+Requirements"></a>
+<a name="N101DF"></a><a name="Single+Machine+Requirements"></a>
<h4>Single Machine Requirements</h4>
<p>If ZooKeeper has to contend with other applications for
access to resourses like storage media, CPU, network, or
@@ -828,20 +840,20 @@
</li>
</ul>
-<a name="N101F5"></a><a name="sc_provisioning"></a>
+<a name="N101FD"></a><a name="sc_provisioning"></a>
<h3 class="h4">Provisioning</h3>
<p></p>
-<a name="N101FE"></a><a name="sc_strengthsAndLimitations"></a>
+<a name="N10206"></a><a name="sc_strengthsAndLimitations"></a>
<h3 class="h4">Things to Consider: ZooKeeper Strengths and Limitations</h3>
<p></p>
-<a name="N10207"></a><a name="sc_administering"></a>
+<a name="N1020F"></a><a name="sc_administering"></a>
<h3 class="h4">Administering</h3>
<p></p>
-<a name="N10210"></a><a name="sc_maintenance"></a>
+<a name="N10218"></a><a name="sc_maintenance"></a>
<h3 class="h4">Maintenance</h3>
<p>Little long term maintenance is required for a ZooKeeper
cluster however you must be aware of the following:</p>
-<a name="N10219"></a><a name="Ongoing+Data+Directory+Cleanup"></a>
+<a name="N10221"></a><a name="Ongoing+Data+Directory+Cleanup"></a>
<h4>Ongoing Data Directory Cleanup</h4>
<p>The ZooKeeper <a href="#var_datadir">Data
Directory</a> contains files which are a persistent copy
@@ -871,7 +883,7 @@
can be run as a cron job on the ZooKeeper server machines to
clean up the logs daily.</p>
<pre class="code"> java -cp zookeeper.jar:log4j.jar:conf org.apache.zookeeper.server.PurgeTxnLog <dataDir> <snapDir> -n <count></pre>
-<a name="N1023A"></a><a name="Debug+Log+Cleanup+%28log4j%29"></a>
+<a name="N10242"></a><a name="Debug+Log+Cleanup+%28log4j%29"></a>
<h4>Debug Log Cleanup (log4j)</h4>
<p>See the section on <a href="#sc_logging">logging</a> in this document. It is
expected that you will setup a rolling file appender using the
@@ -879,10 +891,31 @@
release tar's conf/log4j.properties provides an example of
this.
</p>
-<a name="N10249"></a><a name="sc_monitoring"></a>
+<a name="N10251"></a><a name="sc_supervision"></a>
+<h3 class="h4">Supervision</h3>
+<p>You will want to have a supervisory process that manages
+ each of your ZooKeeper server processes (JVM). The ZK server is
+ designed to be "fail fast", meaning that it will shut down
+ (process exit) if an error occurs that it cannot recover
+ from. Because a ZooKeeper serving cluster is highly reliable,
+ the cluster as a whole remains active and continues serving
+ requests even while that server is down. Additionally, because the
+ cluster is "self healing", the failed server, once restarted, will
+ automatically rejoin the ensemble without any manual
+ interaction.</p>
+<p>Having a supervisory process such as <a href="http://cr.yp.to/daemontools.html">daemontools</a> or
+ <a href="http://en.wikipedia.org/wiki/Service_Management_Facility">SMF</a>
+ manage your ZooKeeper server ensures that if the process does
+ exit abnormally it will automatically be restarted and will
+ quickly rejoin the cluster. These are just two examples; other
+ supervisory processes are also available, and the choice is
+ up to you.</p>
+<a name="N10266"></a><a name="sc_monitoring"></a>
<h3 class="h4">Monitoring</h3>
-<p></p>
-<a name="N10252"></a><a name="sc_logging"></a>
+<p>The ZooKeeper service can be monitored in one of two
+ primary ways: 1) the command port, through the use of <a href="#sc_zkCommands">4 letter words</a>, and 2) <a href="zookeeperJMX.html">JMX</a>. See the appropriate section for
+ your environment/requirements.</p>
+<a name="N10278"></a><a name="sc_logging"></a>
<h3 class="h4">Logging</h3>
<p>ZooKeeper uses <strong>log4j</strong> version 1.2 as
its logging infrastructure. The ZooKeeper default <span class="codefrag filename">log4j.properties</span>
@@ -892,10 +925,10 @@
<p>For more information, see
<a href="http://logging.apache.org/log4j/1.2/manual.html#defaultInit">Log4j Default Initialization Procedure</a>
of the log4j manual.</p>
-<a name="N10272"></a><a name="sc_troubleshooting"></a>
+<a name="N10298"></a><a name="sc_troubleshooting"></a>
<h3 class="h4">Troubleshooting</h3>
<p></p>
-<a name="N1027B"></a><a name="sc_configuration"></a>
+<a name="N102A1"></a><a name="sc_configuration"></a>
<h3 class="h4">Configuration Parameters</h3>
<p>ZooKeeper's behavior is governed by the ZooKeeper configuration
file. This file is designed so that the exact same file can be used by
@@ -903,7 +936,7 @@
layouts are the same. If servers use different configuration files, care
must be taken to ensure that the list of servers in all of the different
configuration files match.</p>
-<a name="N10284"></a><a name="sc_minimumConfiguration"></a>
+<a name="N102AA"></a><a name="sc_minimumConfiguration"></a>
<h4>Minimum Configuration</h4>
<p>Here are the minimum configuration keywords that must be defined
in the configuration file:</p>
@@ -950,7 +983,7 @@
</dd>
</dl>
-<a name="N102AB"></a><a name="sc_advancedConfiguration"></a>
+<a name="N102D1"></a><a name="sc_advancedConfiguration"></a>
<h4>Advanced Configuration</h4>
<p>The configuration settings in the section are optional. You can
use them to further fine tune the behaviour of your ZooKeeper servers.
@@ -1050,7 +1083,7 @@
</dd>
</dl>
-<a name="N10314"></a><a name="sc_clusterOptions"></a>
+<a name="N1033A"></a><a name="sc_clusterOptions"></a>
<h4>Cluster Options</h4>
<p>The options in this section are designed for use with an ensemble
of servers -- that is, when deploying clusters of servers.</p>
@@ -1174,7 +1207,7 @@
</dl>
<p></p>
-<a name="N1038F"></a><a name="sc_authOptions"></a>
+<a name="N103B5"></a><a name="sc_authOptions"></a>
<h4>Authentication & Authorization Options</h4>
<p>The options in this section allow control over
authentication/authorization performed by the service.</p>
@@ -1208,7 +1241,7 @@
</dd>
</dl>
-<a name="N103B2"></a><a name="Unsafe+Options"></a>
+<a name="N103D8"></a><a name="Unsafe+Options"></a>
<h4>Unsafe Options</h4>
<p>The following options can be useful, but be careful when you use
them. The risk of each is explained along with the explanation of what
@@ -1253,7 +1286,7 @@
</dd>
</dl>
-<a name="N103E4"></a><a name="sc_zkCommands"></a>
+<a name="N1040A"></a><a name="sc_zkCommands"></a>
<h3 class="h4">ZooKeeper Commands: The Four Letter Words</h3>
<p>ZooKeeper responds to a small set of commands. Each command is
composed of four letters. You issue the commands to ZooKeeper via telnet
@@ -1374,7 +1407,7 @@
<pre class="code">$ echo ruok | nc 127.0.0.1 5111
imok
</pre>
-<a name="N1044C"></a><a name="sc_dataFileManagement"></a>
+<a name="N10472"></a><a name="sc_dataFileManagement"></a>
<h3 class="h4">Data File Management</h3>
<p>ZooKeeper stores its data in a data directory and its transaction
log in a transaction log directory. By default these two directories are
@@ -1382,7 +1415,7 @@
transaction log files in a separate directory than the data files.
Throughput increases and latency decreases when transaction logs reside
on a dedicated log devices.</p>
-<a name="N10455"></a><a name="The+Data+Directory"></a>
+<a name="N1047B"></a><a name="The+Data+Directory"></a>
<h4>The Data Directory</h4>
<p>This directory has two files in it:</p>
<ul>
@@ -1428,14 +1461,14 @@
idempotent nature of its updates. By replaying the transaction log
against fuzzy snapshots ZooKeeper gets the state of the system at the
end of the log.</p>
-<a name="N10491"></a><a name="The+Log+Directory"></a>
+<a name="N104B7"></a><a name="The+Log+Directory"></a>
<h4>The Log Directory</h4>
<p>The Log Directory contains the ZooKeeper transaction logs.
Before any update takes place, ZooKeeper ensures that the transaction
that represents the update is written to non-volatile storage. A new
log file is started each time a snapshot is begun. The log file's
suffix is the first zxid written to that log.</p>
-<a name="N1049B"></a><a name="sc_filemanagement"></a>
+<a name="N104C1"></a><a name="sc_filemanagement"></a>
<h4>File Management</h4>
<p>The format of snapshot and log files does not change between
standalone ZooKeeper servers and different configurations of
@@ -1455,7 +1488,7 @@
this document for more details on setting a retention policy
and maintenance of ZooKeeper storage.
</p>
-<a name="N104B0"></a><a name="sc_commonProblems"></a>
+<a name="N104D6"></a><a name="sc_commonProblems"></a>
<h3 class="h4">Things to Avoid</h3>
<p>Here are some common problems you can avoid by configuring
ZooKeeper correctly:</p>
@@ -1509,7 +1542,7 @@
</dd>
</dl>
-<a name="N104D4"></a><a name="sc_bestPractices"></a>
+<a name="N104FA"></a><a name="sc_bestPractices"></a>
<h3 class="h4">Best Practices</h3>
<p>For best results, take note of the following list of good
Zookeeper practices:</p>
Modified: hadoop/zookeeper/trunk/docs/zookeeperAdmin.pdf
URL: http://svn.apache.org/viewvc/hadoop/zookeeper/trunk/docs/zookeeperAdmin.pdf?rev=915956&r1=915955&r2=915956&view=diff
==============================================================================
Binary files - no diff available.
Modified: hadoop/zookeeper/trunk/src/docs/src/documentation/content/xdocs/zookeeperAdmin.xml
URL: http://svn.apache.org/viewvc/hadoop/zookeeper/trunk/src/docs/src/documentation/content/xdocs/zookeeperAdmin.xml?rev=915956&r1=915955&r2=915956&view=diff
==============================================================================
--- hadoop/zookeeper/trunk/src/docs/src/documentation/content/xdocs/zookeeperAdmin.xml (original)
+++ hadoop/zookeeper/trunk/src/docs/src/documentation/content/xdocs/zookeeperAdmin.xml Wed Feb 24 20:04:06 2010
@@ -299,6 +299,10 @@
</listitem>
<listitem>
+ <para><xref linkend="sc_supervision" /></para>
+ </listitem>
+
+ <listitem>
<para><xref linkend="sc_monitoring" /></para>
</listitem>
@@ -492,10 +496,39 @@
</section>
+ <section id="sc_supervision">
+ <title>Supervision</title>
+
+ <para>You will want to have a supervisory process that manages
+ each of your ZooKeeper server processes (JVM). The ZK server is
+ designed to be "fail fast", meaning that it will shut down
+ (process exit) if an error occurs that it cannot recover
+ from. Because a ZooKeeper serving cluster is highly reliable,
+ the cluster as a whole remains active and continues serving
+ requests even while that server is down. Additionally, because the
+ cluster is "self healing", the failed server, once restarted, will
+ automatically rejoin the ensemble without any manual
+ interaction.</para>
+
+ <para>Having a supervisory process such as <ulink
+ url="http://cr.yp.to/daemontools.html">daemontools</ulink> or
+ <ulink
+ url="http://en.wikipedia.org/wiki/Service_Management_Facility">SMF</ulink>
+ manage your ZooKeeper server ensures that if the process does
+ exit abnormally it will automatically be restarted and will
+ quickly rejoin the cluster. These are just two examples; other
+ supervisory processes are also available, and the choice is
+ up to you.</para>
+ </section>
+
<section id="sc_monitoring">
<title>Monitoring</title>
- <para></para>
+ <para>The ZooKeeper service can be monitored in one of two
+ primary ways: 1) the command port, through the use of <ulink
+ url="#sc_zkCommands">4 letter words</ulink>, and 2) <ulink
+ url="zookeeperJMX.html">JMX</ulink>. See the appropriate section for
+ your environment/requirements.</para>
</section>
<section id="sc_logging">