Posted to hdfs-commits@hadoop.apache.org by sz...@apache.org on 2009/08/04 20:30:34 UTC
svn commit: r800910 - in /hadoop/hdfs/trunk: ./
src/docs/src/documentation/content/xdocs/
src/docs/src/documentation/resources/images/
Author: szetszwo
Date: Tue Aug 4 18:30:34 2009
New Revision: 800910
URL: http://svn.apache.org/viewvc?rev=800910&view=rev
Log:
HDFS-498. Add development guide and documentation for the fault injection framework. Contributed by Konstantin Boudnik
Added:
hadoop/hdfs/trunk/src/docs/src/documentation/content/xdocs/faultinject_framework.xml
hadoop/hdfs/trunk/src/docs/src/documentation/resources/images/FI-framework.gif (with props)
hadoop/hdfs/trunk/src/docs/src/documentation/resources/images/FI-framework.odg (with props)
Modified:
hadoop/hdfs/trunk/CHANGES.txt
hadoop/hdfs/trunk/src/docs/src/documentation/content/xdocs/site.xml
Modified: hadoop/hdfs/trunk/CHANGES.txt
URL: http://svn.apache.org/viewvc/hadoop/hdfs/trunk/CHANGES.txt?rev=800910&r1=800909&r2=800910&view=diff
==============================================================================
--- hadoop/hdfs/trunk/CHANGES.txt (original)
+++ hadoop/hdfs/trunk/CHANGES.txt Tue Aug 4 18:30:34 2009
@@ -13,8 +13,6 @@
HDFS-461. Tool to analyze file size distribution in HDFS. (shv)
- HDFS-446. Improvements to Offline Image Viewer. (Jakob Homan via shv)
-
IMPROVEMENTS
HDFS-381. Remove blocks from DataNode maps when corresponding file
@@ -70,6 +68,11 @@
HDFS-504. Update the modification time of a file when the file
is closed. (Chun Zhang via dhruba)
+ HDFS-446. Improvements to Offline Image Viewer. (Jakob Homan via shv)
+
+ HDFS-498. Add development guide and documentation for the fault injection
+ framework. (Konstantin Boudnik via szetszwo)
+
BUG FIXES
HDFS-76. Better error message to users when commands fail because of
lack of quota. Allow quota to be set even if the limit is lower than
Added: hadoop/hdfs/trunk/src/docs/src/documentation/content/xdocs/faultinject_framework.xml
URL: http://svn.apache.org/viewvc/hadoop/hdfs/trunk/src/docs/src/documentation/content/xdocs/faultinject_framework.xml?rev=800910&view=auto
==============================================================================
--- hadoop/hdfs/trunk/src/docs/src/documentation/content/xdocs/faultinject_framework.xml (added)
+++ hadoop/hdfs/trunk/src/docs/src/documentation/content/xdocs/faultinject_framework.xml Tue Aug 4 18:30:34 2009
@@ -0,0 +1,390 @@
+<?xml version="1.0"?>
+<!--
+ Licensed to the Apache Software Foundation (ASF) under one or more
+ contributor license agreements. See the NOTICE file distributed with
+ this work for additional information regarding copyright ownership.
+ The ASF licenses this file to You under the Apache License, Version 2.0
+ (the "License"); you may not use this file except in compliance with
+ the License. You may obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+ Unless required by applicable law or agreed to in writing, software
+ distributed under the License is distributed on an "AS IS" BASIS,
+ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ See the License for the specific language governing permissions and
+ limitations under the License.
+-->
+
+<!DOCTYPE document PUBLIC "-//APACHE//DTD Documentation V2.0//EN" "http://forrest.apache.org/dtd/document-v20.dtd">
+
+
+<document>
+ <header>
+ <title>Fault Injection Framework and Development Guide</title>
+ </header>
+
+ <body>
+ <section>
+ <title>Introduction</title>
+ <p>This document is a brief guide to Hadoop's Fault Injection (FI)
+ framework, and a developer's guide for those who will be developing
+ their own faults (aspects).
+ </p>
+ <p>The idea of Fault Injection (FI) is fairly simple: inject errors
+ and exceptions into an application's logic to achieve higher test
+ coverage and to verify the fault tolerance of the system.
+ Several implementations of this idea exist today.
+ Hadoop's FI framework is built on top of Aspect-Oriented Programming
+ (AOP) as implemented by the AspectJ toolkit.
+ </p>
+ </section>
+ <section>
+ <title>Assumptions</title>
+ <p>The current implementation of the framework assumes that the faults
+ it emulates are non-deterministic in nature, i.e. the moment at which
+ a fault occurs isn't known in advance and is determined by a coin
+ flip.
+ </p>
+ </section>
+ <section>
+ <title>Architecture of the Fault Injection Framework</title>
+ <figure src="images/FI-framework.gif" alt="Components layout" />
+ <section>
+ <title>Configuration management</title>
+ <p>This piece of the framework allows you to set expectations for
+ faults to happen. The settings can be applied either statically (in
+ advance) or at runtime. There are two ways to configure the desired
+ level of faults in the framework (a configuration sketch follows the
+ list below):
+ </p>
+ <ul>
+ <li>
+ editing the
+ <code>src/aop/fi-site.xml</code>
+ configuration file. This file is similar to other Hadoop
+ configuration files
+ </li>
+ <li>
+ setting JVM system properties through VM startup parameters or in the
+ <code>build.properties</code>
+ file
+ </li>
+ </ul>
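+ <p>As an illustration only (assuming the standard Hadoop property
+ format, which
+ <code>fi-site.xml</code>
+ shares with other Hadoop configuration files), a fault level entry
+ might look like this:
+ </p>
+ <source>
+&lt;property&gt;
+  &lt;name&gt;fi.hdfs.datanode.BlockReceiver&lt;/name&gt;
+  &lt;value&gt;0.12&lt;/value&gt;
+  &lt;description&gt;Fire this fault with 12% probability&lt;/description&gt;
+&lt;/property&gt;
+ </source>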
+ </section>
+ <section>
+ <title>Probability model</title>
+ <p>This is fundamentally a coin flipper. The methods of this class
+ draw a random number between 0.0 and 1.0 and check whether it falls
+ between 0.0 and the configured level for the fault in question. If it
+ does, the fault occurs.
+ </p>
+ <p>Thus, to guarantee that a fault happens, set its level to 1.0.
+ To completely prevent a fault from happening, set its probability
+ level to 0.0. A simplified sketch of this check follows.
+ </p>
+ <p><strong>Nota bene</strong>: the default probability level is 0
+ (zero) unless the level is changed explicitly through the
+ configuration file or at runtime. The name of the default
+ level's configuration parameter is
+ <code>fi.*</code>
+ </p>
+ </section>
+ <section>
+ <title>Fault injection mechanism: AOP and AspectJ</title>
+ <p>At the foundation of Hadoop's fault injection framework lies the
+ cross-cutting concept implemented by AspectJ. The following basic
+ terms are important to remember:
+ </p>
+ <ul>
+ <li>
+ <strong>A cross-cutting concept</strong>
+ (aspect) is behavior, and often data, that is used across the scope
+ of a piece of software
+ </li>
+ <li>In AOP, the
+ <strong>aspects</strong>
+ provide a mechanism by which a cross-cutting concern can be
+ specified in a modular way
+ </li>
+ <li>
+ <strong>Advice</strong>
+ is the
+ code that is executed when an aspect is invoked
+ </li>
+ <li>
+ A <strong>join point</strong>
+ is a specific point within the application that may or may not
+ invoke some advice; a
+ <strong>pointcut</strong>
+ picks out such points
+ </li>
+ </ul>
+ </section>
+ <section>
+ <title>Existing join points</title>
+ <p>
+ The following readily available join points are provided by AspectJ
+ (a short example follows the list):
+ </p>
+ <ul>
+ <li>Join when a method is called
+ </li>
+ <li>Join during a method's execution
+ </li>
+ <li>Join when a constructor is invoked
+ </li>
+ <li>Join during a constructor's execution
+ </li>
+ <li>Join during aspect advice execution
+ </li>
+ <li>Join before an object is initialized
+ </li>
+ <li>Join during object initialization
+ </li>
+ <li>Join during static initializer execution
+ </li>
+ <li>Join when a class's field is referenced
+ </li>
+ <li>Join when a class's field is assigned
+ </li>
+ <li>Join when a handler is executed
+ </li>
+ </ul>
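+ <p>For instance, the difference between the first two join points in
+ the list above could be expressed with the following pointcuts (an
+ illustrative sketch with a hypothetical aspect name, not part of the
+ framework):
+ </p>
+ <source>
+public aspect JoinPointDemo {
+  // Matches call sites: applies wherever write() is invoked.
+  pointcut writeCall() : call(* java.io.OutputStream.write(..));
+
+  // Matches executions: applies inside the body of write() itself.
+  pointcut writeExec() : execution(* java.io.OutputStream.write(..));
+}
+ </source>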
+ </section>
+ </section>
+ <section>
+ <title>Aspects examples</title>
+ <source>
+package org.apache.hadoop.hdfs.server.datanode;
+
+import org.apache.commons.logging.Log;
+import org.apache.commons.logging.LogFactory;
+import org.apache.hadoop.fi.ProbabilityModel;
+import org.apache.hadoop.hdfs.server.datanode.DataNode;
+import org.apache.hadoop.util.DiskChecker.*;
+
+import java.io.IOException;
+import java.io.OutputStream;
+import java.io.DataOutputStream;
+
+/**
+* This aspect takes care of faults injected into the
+* datanode.BlockReceiver class
+*/
+public aspect BlockReceiverAspects {
+ public static final Log LOG = LogFactory.getLog(BlockReceiverAspects.class);
+
+ public static final String BLOCK_RECEIVER_FAULT="hdfs.datanode.BlockReceiver";
+ pointcut callReceivePacket() : call (* OutputStream.write(..))
+ && withincode (* BlockReceiver.receivePacket(..))
+ // to further limit the application of this aspect a very narrow 'target' can be used as follows
+ // && target(DataOutputStream)
+ && !within(BlockReceiverAspects+);
+
+ before () throws IOException : callReceivePacket () {
+ if (ProbabilityModel.injectCriteria(BLOCK_RECEIVER_FAULT)) {
+ LOG.info("Before the injection point");
+ Thread.dumpStack();
+ throw new DiskOutOfSpaceException ("FI: injected fault point at " +
+ thisJoinPoint.getStaticPart().getSourceLocation());
+ }
+ }
+}
+ </source>
+ <p>
+ The aspect has two main parts: the pointcut
+ <code>callReceivePacket()</code>,
+ which serves as an identification mark of a specific point (in control
+ and/or data flow) in the life of an application; and the advice -
+ <code>before () throws IOException : callReceivePacket()</code>
+ - which will be
+ <a href="#Putting+it+all+together">injected</a>
+ before that specific spot of the application's code.
+ </p>
+
+ <p>The pointcut identifies an invocation of the
+ <code>java.io.OutputStream</code>
+ class'
+ <code>write()</code>
+ method with any number of parameters and any return type. This
+ invocation must take place within the body of the
+ <code>receivePacket()</code>
+ method of the class
+ <code>BlockReceiver</code>.
+ That method, too, can have any parameters and any return type. Any
+ invocations of the
+ <code>write()</code>
+ method happening within the aspect
+ <code>BlockReceiverAspects</code>
+ itself or its subclasses will be ignored.
+ </p>
+ <p><strong>Note 1</strong>: This short example doesn't illustrate
+ that you can have more than one injection point per class. In such
+ a case the fault names have to be different if a developer wants to
+ trigger them separately.
+ </p>
+ <p><strong>Note 2</strong>: After the
+ <a href="#Putting+it+all+together">injection step</a>
+ you can verify that the faults were properly injected by
+ searching for
+ <code>ajc</code>
+ keywords in a disassembled class file, as shown below.
+ </p>
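+ <p>For example, assuming the woven classes end up under
+ <code>build/classes</code>
+ (the exact path depends on your build), one quick check might be:
+ </p>
+ <source>
+% javap -private -classpath build/classes \
+    org.apache.hadoop.hdfs.server.datanode.BlockReceiver | grep ajc
+ </source>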
+
+ </section>
+
+ <section>
+ <title>Fault naming convention &amp; namespaces</title>
+ <p>For the sake of a unified naming
+ convention, the following two types of names are recommended when
+ developing new aspects:</p>
+ <ul>
+ <li>Activity-specific notation (used
+ when we don't care about the particular location where a fault
+ happens). In this case the name of the fault is rather abstract:
+ <code>fi.hdfs.DiskError</code>
+ </li>
+ <li>Location-specific notation.
+ Here, the fault's name is mnemonic, as in:
+ <code>fi.hdfs.datanode.BlockReceiver[optional location details]</code>
+ </li>
+ </ul>
+ </section>
+
+ <section>
+ <title>Development tools</title>
+ <ul>
+ <li>The Eclipse
+ <a href="http://www.eclipse.org/ajdt/">AspectJ
+ Development Tools (AJDT)
+ </a>
+ might help you in the aspect development
+ process.
+ </li>
+ <li>IntelliJ IDEA provides AspectJ weaver and Spring-AOP plugins
+ </li>
+ </ul>
+ </section>
+
+ <section>
+ <title>Putting it all together</title>
+ <p>Faults (or aspects) have to be injected (i.e. woven) into the code
+ before they can be used. Here are step-by-step instructions for how
+ this can be done.</p>
+ <p>Weaving aspects in place:</p>
+ <source>
+% ant injectfaults
+ </source>
+ <p>If you have
+ misidentified the join point of your aspect, then you'll see a
+ warning similar to the one below when the 'injectfaults' target
+ completes:</p>
+ <source>
+[iajc] warning at
+src/test/aop/org/apache/hadoop/hdfs/server/datanode/ \
+ BlockReceiverAspects.aj:44::0
+advice defined in org.apache.hadoop.hdfs.server.datanode.BlockReceiverAspects
+has not been applied [Xlint:adviceDidNotMatch]
+ </source>
+ <p>This isn't an error, so the build will still report success.
+
+ To prepare a dev.jar file with all your faults woven in
+ place, run (HDFS-475 pending):</p>
+ <source>
+% ant jar-fault-inject
+ </source>
+
+ <p>Test jars can be created with:</p>
+ <source>
+% ant jar-test-fault-inject
+ </source>
+
+ <p>To run HDFS tests with faults injected:</p>
+ <source>
+% ant run-test-hdfs-fault-inject
+ </source>
+ <section>
+ <title>How to use fault injection framework</title>
+ <p>Faults can be triggered in the following two ways:
+ </p>
+ <ul>
+ <li>At runtime, as in:
+ <source>
+% ant run-test-hdfs -Dfi.hdfs.datanode.BlockReceiver=0.12
+ </source>
+ To set a certain level, e.g. 25%, for all injected faults, one can run
+ <br/>
+ <source>
+% ant run-test-hdfs-fault-inject -Dfi.*=0.25
+ </source>
+ </li>
+ <li>or from a program as follows:
+ </li>
+ </ul>
+ <source>
+package org.apache.hadoop.fs;
+
+import org.junit.After;
+import org.junit.Before;
+import org.junit.Test;
+import junit.framework.TestCase;
+
+public class DemoFiTest extends TestCase {
+ public static final String BLOCK_RECEIVER_FAULT="hdfs.datanode.BlockReceiver";
+ @Override
+ @Before
+ public void setUp(){
+ //Setting up the test's environment as required
+ }
+
+ @Test
+ public void testFI() {
+ // It triggers the fault, assuming that there's one called 'hdfs.datanode.BlockReceiver'
+ System.setProperty("fi." + BLOCK_RECEIVER_FAULT, "0.12");
+ //
+ // The main logic of your tests goes here
+ //
+ // Now set the level back to 0 (zero) to prevent this fault from happening again
+ System.setProperty("fi." + BLOCK_RECEIVER_FAULT, "0.0");
+ // or delete its trigger completely
+ System.getProperties().remove("fi." + BLOCK_RECEIVER_FAULT);
+ }
+
+ @Override
+ @After
+ public void tearDown() {
+ //Cleaning up the test environment
+ }
+}
+ </source>
+ <p>
+ As you can see above, these two methods do the same thing: they
+ set the probability level of
+ <code>hdfs.datanode.BlockReceiver</code>
+ to 12%.
+ The difference, however, is that the programmatic approach provides
+ more flexibility and allows you to turn a fault off when a test no
+ longer needs it.
+ </p>
+ </section>
+ </section>
+
+ <section>
+ <title>Additional information and contacts</title>
+ <p>These two sources of information are particularly
+ interesting and worth further reading:
+ </p>
+ <ul>
+ <li>
+ <a href="http://www.eclipse.org/aspectj/doc/next/devguide/">
+ http://www.eclipse.org/aspectj/doc/next/devguide/
+ </a>
+ </li>
+ <li>AspectJ Cookbook (ISBN-13: 978-0-596-00654-9)
+ </li>
+ </ul>
+ <p>Should you have any further comments or questions for the author,
+ check
+ <a href="http://issues.apache.org/jira/browse/HDFS-435">HDFS-435</a>
+ </p>
+ </section>
+ </body>
+</document>
Modified: hadoop/hdfs/trunk/src/docs/src/documentation/content/xdocs/site.xml
URL: http://svn.apache.org/viewvc/hadoop/hdfs/trunk/src/docs/src/documentation/content/xdocs/site.xml?rev=800910&r1=800909&r2=800910&view=diff
==============================================================================
--- hadoop/hdfs/trunk/src/docs/src/documentation/content/xdocs/site.xml (original)
+++ hadoop/hdfs/trunk/src/docs/src/documentation/content/xdocs/site.xml Tue Aug 4 18:30:34 2009
@@ -60,6 +60,9 @@
<hdfs_SLG label="Synthetic Load Generator Guide" href="SLG_user_guide.html" />
<hdfs_imageviewer label="Offline Image Viewer Guide" href="hdfs_imageviewer.html" />
<hdfs_libhdfs label="C API libhdfs" href="libhdfs.html" />
+ <docs label="Testing">
+ <faultinject_framework label="Fault Injection" href="faultinject_framework.html" />
+ </docs>
</docs>
<docs label="HOD">
Added: hadoop/hdfs/trunk/src/docs/src/documentation/resources/images/FI-framework.gif
URL: http://svn.apache.org/viewvc/hadoop/hdfs/trunk/src/docs/src/documentation/resources/images/FI-framework.gif?rev=800910&view=auto
==============================================================================
Binary file - no diff available.
Propchange: hadoop/hdfs/trunk/src/docs/src/documentation/resources/images/FI-framework.gif
------------------------------------------------------------------------------
svn:mime-type = application/octet-stream
Added: hadoop/hdfs/trunk/src/docs/src/documentation/resources/images/FI-framework.odg
URL: http://svn.apache.org/viewvc/hadoop/hdfs/trunk/src/docs/src/documentation/resources/images/FI-framework.odg?rev=800910&view=auto
==============================================================================
Binary file - no diff available.
Propchange: hadoop/hdfs/trunk/src/docs/src/documentation/resources/images/FI-framework.odg
------------------------------------------------------------------------------
svn:mime-type = application/octet-stream