Posted to common-commits@hadoop.apache.org by om...@apache.org on 2011/03/04 05:07:36 UTC
svn commit: r1077365 [5/5] - in
/hadoop/common/branches/branch-0.20-security-patches/src/docs/src/documentation:
./ content/xdocs/ resources/images/
Modified: hadoop/common/branches/branch-0.20-security-patches/src/docs/src/documentation/content/xdocs/libhdfs.xml
URL: http://svn.apache.org/viewvc/hadoop/common/branches/branch-0.20-security-patches/src/docs/src/documentation/content/xdocs/libhdfs.xml?rev=1077365&r1=1077364&r2=1077365&view=diff
==============================================================================
--- hadoop/common/branches/branch-0.20-security-patches/src/docs/src/documentation/content/xdocs/libhdfs.xml (original)
+++ hadoop/common/branches/branch-0.20-security-patches/src/docs/src/documentation/content/xdocs/libhdfs.xml Fri Mar 4 04:07:36 2011
@@ -1,18 +1,19 @@
<?xml version="1.0"?>
<!--
- Copyright 2002-2004 The Apache Software Foundation
-
- Licensed under the Apache License, Version 2.0 (the "License");
- you may not use this file except in compliance with the License.
- You may obtain a copy of the License at
-
- http://www.apache.org/licenses/LICENSE-2.0
-
- Unless required by applicable law or agreed to in writing, software
- distributed under the License is distributed on an "AS IS" BASIS,
- WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
- See the License for the specific language governing permissions and
- limitations under the License.
+ Licensed to the Apache Software Foundation (ASF) under one or more
+ contributor license agreements. See the NOTICE file distributed with
+ this work for additional information regarding copyright ownership.
+ The ASF licenses this file to You under the Apache License, Version 2.0
+ (the "License"); you may not use this file except in compliance with
+ the License. You may obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+ Unless required by applicable law or agreed to in writing, software
+ distributed under the License is distributed on an "AS IS" BASIS,
+ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ See the License for the specific language governing permissions and
+ limitations under the License.
-->
<!DOCTYPE document PUBLIC "-//APACHE//DTD Documentation V2.0//EN"
@@ -21,17 +22,20 @@
<document>
<header>
-<title>C API to HDFS: libhdfs</title>
+<title>C API libhdfs</title>
<meta name="http-equiv">Content-Type</meta>
<meta name="content">text/html;</meta>
<meta name="charset">utf-8</meta>
</header>
<body>
<section>
-<title>C API to HDFS: libhdfs</title>
+<title>Overview</title>
<p>
-libhdfs is a JNI based C api for Hadoop's DFS. It provides C apis to a subset of the HDFS APIs to manipulate DFS files and the filesystem. libhdfs is part of the hadoop distribution and comes pre-compiled in ${HADOOP_HOME}/libhdfs/libhdfs.so .
+libhdfs is a JNI based C API for Hadoop's Distributed File System (HDFS).
+It provides C APIs to a subset of the HDFS APIs to manipulate HDFS files and
+the filesystem. libhdfs is part of the Hadoop distribution and comes
+pre-compiled in ${HADOOP_HOME}/libhdfs/libhdfs.so.
</p>
</section>
@@ -46,7 +50,7 @@ The header file for libhdfs describes ea
</p>
</section>
<section>
-<title>A sample program</title>
+<title>A Sample Program</title>
<source>
#include "hdfs.h"
@@ -73,24 +77,36 @@ int main(int argc, char **argv) {
</section>
<section>
-<title>How to link with the library</title>
+<title>How To Link With The Library</title>
<p>
-See the Makefile for hdfs_test.c in the libhdfs source directory (${HADOOP_HOME}/src/c++/libhdfs/Makefile) or something like:
+See the Makefile for hdfs_test.c in the libhdfs source directory (${HADOOP_HOME}/src/c++/libhdfs/Makefile) or use a command such as:<br />
gcc above_sample.c -I${HADOOP_HOME}/src/c++/libhdfs -L${HADOOP_HOME}/libhdfs -lhdfs -o above_sample
</p>
</section>
<section>
-<title>Common problems</title>
+<title>Common Problems</title>
<p>
-The most common problem is the CLASSPATH is not set properly when calling a program that uses libhdfs. Make sure you set it to all the hadoop jars needed to run Hadoop itself. Currently, there is no way to programmatically generate the classpath, but a good bet is to include all the jar files in ${HADOOP_HOME} and ${HADOOP_HOME}/lib as well as the right configuration directory containing hdfs-site.xml
+The most common problem is that the CLASSPATH is not set properly when calling a program that uses libhdfs.
+Make sure you set it to all the Hadoop jars needed to run Hadoop itself. Currently, there is no way to
+programmatically generate the classpath, but a good bet is to include all the jar files in ${HADOOP_HOME}
+and ${HADOOP_HOME}/lib as well as the right configuration directory containing hdfs-site.xml.
</p>
</section>
<section>
-<title>libhdfs is thread safe</title>
-<p>Concurrency and Hadoop FS "handles" - the hadoop FS implementation includes a FS handle cache which caches based on the URI of the namenode along with the user connecting. So, all calls to hdfsConnect will return the same handle but calls to hdfsConnectAsUser with different users will return different handles. But, since HDFS client handles are completely thread safe, this has no bearing on concurrency.
-</p>
-<p>Concurrency and libhdfs/JNI - the libhdfs calls to JNI should always be creating thread local storage, so (in theory), libhdfs should be as thread safe as the underlying calls to the Hadoop FS.
-</p>
+<title>Thread Safe</title>
+<p>libhdfs is thread safe.</p>
+<ul>
+<li>Concurrency and Hadoop FS "handles"
+<br />The Hadoop FS implementation includes a FS handle cache which caches based on the URI of the
+namenode along with the user connecting. So, all calls to hdfsConnect will return the same handle but
+calls to hdfsConnectAsUser with different users will return different handles. But, since HDFS client
+handles are completely thread safe, this has no bearing on concurrency.
+</li>
+<li>Concurrency and libhdfs/JNI
+<br />The libhdfs calls to JNI should always be creating thread local storage, so (in theory), libhdfs
+should be as thread safe as the underlying calls to the Hadoop FS.
+</li>
+</ul>
</section>
</body>
</document>
Modified: hadoop/common/branches/branch-0.20-security-patches/src/docs/src/documentation/content/xdocs/mapred_tutorial.xml
URL: http://svn.apache.org/viewvc/hadoop/common/branches/branch-0.20-security-patches/src/docs/src/documentation/content/xdocs/mapred_tutorial.xml?rev=1077365&r1=1077364&r2=1077365&view=diff
==============================================================================
--- hadoop/common/branches/branch-0.20-security-patches/src/docs/src/documentation/content/xdocs/mapred_tutorial.xml (original)
+++ hadoop/common/branches/branch-0.20-security-patches/src/docs/src/documentation/content/xdocs/mapred_tutorial.xml Fri Mar 4 04:07:36 2011
@@ -20,7 +20,7 @@
<document>
<header>
- <title>Map/Reduce Tutorial</title>
+ <title>MapReduce Tutorial</title>
</header>
<body>
@@ -29,21 +29,21 @@
<title>Purpose</title>
<p>This document comprehensively describes all user-facing facets of the
- Hadoop Map/Reduce framework and serves as a tutorial.
+ Hadoop MapReduce framework and serves as a tutorial.
</p>
</section>
<section>
- <title>Pre-requisites</title>
+ <title>Prerequisites</title>
<p>Ensure that Hadoop is installed, configured and is running. More
details:</p>
<ul>
<li>
- <a href="quickstart.html">Hadoop Quick Start</a> for first-time users.
+ <a href="single_node_setup.html">Single Node Setup</a> for first-time users.
</li>
<li>
- <a href="cluster_setup.html">Hadoop Cluster Setup</a> for large,
+ <a href="cluster_setup.html">Cluster Setup</a> for large,
distributed clusters.
</li>
</ul>
@@ -52,12 +52,12 @@
<section>
<title>Overview</title>
- <p>Hadoop Map/Reduce is a software framework for easily writing
+ <p>Hadoop MapReduce is a software framework for easily writing
applications which process vast amounts of data (multi-terabyte data-sets)
in-parallel on large clusters (thousands of nodes) of commodity
hardware in a reliable, fault-tolerant manner.</p>
- <p>A Map/Reduce <em>job</em> usually splits the input data-set into
+ <p>A MapReduce <em>job</em> usually splits the input data-set into
independent chunks which are processed by the <em>map tasks</em> in a
completely parallel manner. The framework sorts the outputs of the maps,
which are then input to the <em>reduce tasks</em>. Typically both the
@@ -66,13 +66,13 @@
tasks.</p>
<p>Typically the compute nodes and the storage nodes are the same, that is,
- the Map/Reduce framework and the Hadoop Distributed File System (see <a href="hdfs_design.html">HDFS Architecture </a>)
+ the MapReduce framework and the Hadoop Distributed File System (see <a href="hdfs_design.html">HDFS Architecture Guide</a>)
are running on the same set of nodes. This configuration
allows the framework to effectively schedule tasks on the nodes where data
is already present, resulting in very high aggregate bandwidth across the
cluster.</p>
- <p>The Map/Reduce framework consists of a single master
+ <p>The MapReduce framework consists of a single master
<code>JobTracker</code> and one slave <code>TaskTracker</code> per
cluster-node. The master is responsible for scheduling the jobs' component
tasks on the slaves, monitoring them and re-executing the failed tasks. The
@@ -89,7 +89,7 @@
information to the job-client.</p>
<p>Although the Hadoop framework is implemented in Java<sup>TM</sup>,
- Map/Reduce applications need not be written in Java.</p>
+ MapReduce applications need not be written in Java.</p>
<ul>
<li>
<a href="ext:api/org/apache/hadoop/streaming/package-summary">
@@ -100,7 +100,7 @@
<li>
<a href="ext:api/org/apache/hadoop/mapred/pipes/package-summary">
Hadoop Pipes</a> is a <a href="http://www.swig.org/">SWIG</a>-
- compatible <em>C++ API</em> to implement Map/Reduce applications (non
+ compatible <em>C++ API</em> to implement MapReduce applications (non
JNI<sup>TM</sup> based).
</li>
</ul>
@@ -109,7 +109,7 @@
<section>
<title>Inputs and Outputs</title>
- <p>The Map/Reduce framework operates exclusively on
+ <p>The MapReduce framework operates exclusively on
<code><key, value></code> pairs, that is, the framework views the
input to the job as a set of <code><key, value></code> pairs and
produces a set of <code><key, value></code> pairs as the output of
@@ -123,7 +123,7 @@
WritableComparable</a> interface to facilitate sorting by the framework.
</p>
- <p>Input and Output types of a Map/Reduce job:</p>
+ <p>Input and Output types of a MapReduce job:</p>
<p>
(input) <code><k1, v1></code>
->
@@ -144,14 +144,14 @@
<section>
<title>Example: WordCount v1.0</title>
- <p>Before we jump into the details, lets walk through an example Map/Reduce
+ <p>Before we jump into the details, let's walk through an example MapReduce
application to get a flavour for how they work.</p>
<p><code>WordCount</code> is a simple application that counts the number of
occurrences of each word in a given input set.</p>
<p>This works with a local-standalone, pseudo-distributed or fully-distributed
- Hadoop installation(see <a href="quickstart.html"> Hadoop Quick Start</a>).</p>
+ Hadoop installation (<a href="single_node_setup.html">Single Node Setup</a>).</p>
<section>
<title>Source Code</title>
@@ -608,7 +608,7 @@
as arguments that are unzipped/unjarred and a link with name of the
jar/zip are created in the current working directory of tasks. More
details about the command line options are available at
- <a href="commands_manual.html"> Hadoop Command Guide.</a></p>
+ <a href="commands_manual.html">Commands Guide.</a></p>
<p>Running <code>wordcount</code> example with
<code>-libjars</code> and <code>-files</code>:<br/>
@@ -696,10 +696,10 @@
</section>
<section>
- <title>Map/Reduce - User Interfaces</title>
+ <title>MapReduce - User Interfaces</title>
<p>This section provides a reasonable amount of detail on every user-facing
- aspect of the Map/Reduce framwork. This should help users implement,
+ aspect of the MapReduce framework. This should help users implement,
configure and tune their jobs in a fine-grained manner. However, please
note that the javadoc for each class/interface remains the most
comprehensive documentation available; this is only meant to be a tutorial.
@@ -738,7 +738,7 @@
to be of the same type as the input records. A given input pair may
map to zero or many output pairs.</p>
- <p>The Hadoop Map/Reduce framework spawns one map task for each
+ <p>The Hadoop MapReduce framework spawns one map task for each
<code>InputSplit</code> generated by the <code>InputFormat</code> for
the job.</p>
@@ -949,7 +949,7 @@
<title>Reporter</title>
<p><a href="ext:api/org/apache/hadoop/mapred/reporter">
- Reporter</a> is a facility for Map/Reduce applications to report
+ Reporter</a> is a facility for MapReduce applications to report
progress, set application-level status messages and update
<code>Counters</code>.</p>
@@ -972,12 +972,12 @@
<p><a href="ext:api/org/apache/hadoop/mapred/outputcollector">
OutputCollector</a> is a generalization of the facility provided by
- the Map/Reduce framework to collect data output by the
+ the MapReduce framework to collect data output by the
<code>Mapper</code> or the <code>Reducer</code> (either the
intermediate outputs or the output of the job).</p>
</section>
- <p>Hadoop Map/Reduce comes bundled with a
+ <p>Hadoop MapReduce comes bundled with a
<a href="ext:api/org/apache/hadoop/mapred/lib/package-summary">
library</a> of generally useful mappers, reducers, and partitioners.</p>
</section>
@@ -986,10 +986,10 @@
<title>Job Configuration</title>
<p><a href="ext:api/org/apache/hadoop/mapred/jobconf">
- JobConf</a> represents a Map/Reduce job configuration.</p>
+ JobConf</a> represents a MapReduce job configuration.</p>
<p><code>JobConf</code> is the primary interface for a user to describe
- a Map/Reduce job to the Hadoop framework for execution. The framework
+ a MapReduce job to the Hadoop framework for execution. The framework
tries to faithfully execute the job as described by <code>JobConf</code>,
however:</p>
<ul>
@@ -1057,7 +1057,7 @@
<code>-Djava.library.path=<></code> etc. If the
<code>mapred.{map|reduce}.child.java.opts</code> parameters contains the
symbol <em>@taskid@</em> it is interpolated with value of
- <code>taskid</code> of the Map/Reduce task.</p>
+ <code>taskid</code> of the MapReduce task.</p>
<p>Here is an example with multiple arguments and substitutions,
showing jvm GC logging, and start of a passwordless JVM JMX agent so that
@@ -1110,7 +1110,7 @@
for configuring the launched child tasks from task tracker. Configuring
the memory options for daemons is documented in
<a href="cluster_setup.html#Configuring+the+Environment+of+the+Hadoop+Daemons">
- cluster_setup.html </a></p>
+ Configuring the Environment of the Hadoop Daemons</a>.</p>
<p>The memory available to some parts of the framework is also
configurable. In map and reduce tasks, performance may be influenced
@@ -1460,7 +1460,7 @@
with the <code>JobTracker</code>.</p>
<p><code>JobClient</code> provides facilities to submit jobs, track their
- progress, access component-tasks' reports and logs, get the Map/Reduce
+ progress, access component-tasks' reports and logs, get the MapReduce
cluster's status information and so on.</p>
<p>The job submission process involves:</p>
@@ -1472,7 +1472,7 @@
<code>DistributedCache</code> of the job, if necessary.
</li>
<li>
- Copying the job's jar and configuration to the Map/Reduce system
+ Copying the job's jar and configuration to the MapReduce system
directory on the <code>FileSystem</code>.
</li>
<li>
@@ -1512,7 +1512,7 @@
<code>mapreduce.cluster.job-authorization-enabled</code> is set to
true. When enabled, access control checks are done by the JobTracker
and the TaskTracker before allowing users to view
- job details or to modify a job using Map/Reduce APIs,
+ job details or to modify a job using MapReduce APIs,
CLI or web user interfaces.</p>
<p>A job submitter can specify access control lists for viewing or
@@ -1563,8 +1563,8 @@
<section>
<title>Job Control</title>
- <p>Users may need to chain Map/Reduce jobs to accomplish complex
- tasks which cannot be done via a single Map/Reduce job. This is fairly
+ <p>Users may need to chain MapReduce jobs to accomplish complex
+ tasks which cannot be done via a single MapReduce job. This is fairly
easy since the output of the job typically goes to distributed
file-system, and the output, in turn, can be used as the input for the
next job.</p>
@@ -1675,10 +1675,10 @@
<title>Job Input</title>
<p><a href="ext:api/org/apache/hadoop/mapred/inputformat">
- InputFormat</a> describes the input-specification for a Map/Reduce job.
+ InputFormat</a> describes the input-specification for a MapReduce job.
</p>
- <p>The Map/Reduce framework relies on the <code>InputFormat</code> of
+ <p>The MapReduce framework relies on the <code>InputFormat</code> of
the job to:</p>
<ol>
<li>Validate the input-specification of the job.</li>
@@ -1757,10 +1757,10 @@
<title>Job Output</title>
<p><a href="ext:api/org/apache/hadoop/mapred/outputformat">
- OutputFormat</a> describes the output-specification for a Map/Reduce
+ OutputFormat</a> describes the output-specification for a MapReduce
job.</p>
- <p>The Map/Reduce framework relies on the <code>OutputFormat</code> of
+ <p>The MapReduce framework relies on the <code>OutputFormat</code> of
the job to:</p>
<ol>
<li>
@@ -1782,9 +1782,9 @@
<p><a href="ext:api/org/apache/hadoop/mapred/outputcommitter">
OutputCommitter</a> describes the commit of task output for a
- Map/Reduce job.</p>
+ MapReduce job.</p>
- <p>The Map/Reduce framework relies on the <code>OutputCommitter</code>
+ <p>The MapReduce framework relies on the <code>OutputCommitter</code>
of the job to:</p>
<ol>
<li>
@@ -1842,7 +1842,7 @@
(using the attemptid, say <code>attempt_200709221812_0001_m_000000_0</code>),
not just per task.</p>
- <p>To avoid these issues the Map/Reduce framework, when the
+ <p>To avoid these issues the MapReduce framework, when the
<code>OutputCommitter</code> is <code>FileOutputCommitter</code>,
maintains a special
<code>${mapred.output.dir}/_temporary/_${taskid}</code> sub-directory
@@ -1866,10 +1866,10 @@
<p>Note: The value of <code>${mapred.work.output.dir}</code> during
execution of a particular task-attempt is actually
<code>${mapred.output.dir}/_temporary/_{$taskid}</code>, and this value is
- set by the Map/Reduce framework. So, just create any side-files in the
+ set by the MapReduce framework. So, just create any side-files in the
path returned by
<a href="ext:api/org/apache/hadoop/mapred/fileoutputformat/getworkoutputpath">
- FileOutputFormat.getWorkOutputPath() </a>from Map/Reduce
+ FileOutputFormat.getWorkOutputPath() </a>from MapReduce
task to take advantage of this feature.</p>
<p>The entire discussion holds true for maps of jobs with
@@ -1918,7 +1918,7 @@
<title>Counters</title>
<p><code>Counters</code> represent global counters, defined either by
- the Map/Reduce framework or applications. Each <code>Counter</code> can
+ the MapReduce framework or applications. Each <code>Counter</code> can
be of any <code>Enum</code> type. Counters of a particular
<code>Enum</code> are bunched into groups of type
<code>Counters.Group</code>.</p>
@@ -1942,7 +1942,7 @@
files efficiently.</p>
<p><code>DistributedCache</code> is a facility provided by the
- Map/Reduce framework to cache files (text, archives, jars and so on)
+ MapReduce framework to cache files (text, archives, jars and so on)
needed by applications.</p>
<p>Applications specify the files to be cached via urls (hdfs://)
@@ -2049,7 +2049,7 @@
interface supports the handling of generic Hadoop command-line options.
</p>
- <p><code>Tool</code> is the standard for any Map/Reduce tool or
+ <p><code>Tool</code> is the standard for any MapReduce tool or
application. The application should delegate the handling of
standard command-line options to
<a href="ext:api/org/apache/hadoop/util/genericoptionsparser">
@@ -2082,7 +2082,7 @@
<title>IsolationRunner</title>
<p><a href="ext:api/org/apache/hadoop/mapred/isolationrunner">
- IsolationRunner</a> is a utility to help debug Map/Reduce programs.</p>
+ IsolationRunner</a> is a utility to help debug MapReduce programs.</p>
<p>To use the <code>IsolationRunner</code>, first set
<code>keep.failed.task.files</code> to <code>true</code>
@@ -2122,7 +2122,7 @@
<p>Once user configures that profiling is needed, she/he can use
the configuration property
<code>mapred.task.profile.{maps|reduces}</code> to set the ranges
- of Map/Reduce tasks to profile. The value can be set using the api
+ of MapReduce tasks to profile. The value can be set using the api
<a href="ext:api/org/apache/hadoop/mapred/jobconf/setprofiletaskrange">
JobConf.setProfileTaskRange(boolean,String)</a>.
By default, the specified range is <code>0-2</code>.</p>
@@ -2143,8 +2143,8 @@
<section>
<title>Debugging</title>
- <p>The Map/Reduce framework provides a facility to run user-provided
- scripts for debugging. When a Map/Reduce task fails, a user can run
+ <p>The MapReduce framework provides a facility to run user-provided
+ scripts for debugging. When a MapReduce task fails, a user can run
a debug script, to process task logs for example. The script is
given access to the task's stdout and stderr outputs, syslog and
jobconf. The output from the debug script's stdout and stderr is
@@ -2177,7 +2177,7 @@
<p>The arguments to the script are the task's stdout, stderr,
syslog and jobconf files. The debug command, run on the node where
- the Map/Reduce task failed, is: <br/>
+ the MapReduce task failed, is: <br/>
<code> $script $stdout $stderr $syslog $jobconf </code> </p>
<p> Pipes programs have the c++ program name as a fifth argument
@@ -2197,14 +2197,14 @@
<title>JobControl</title>
<p><a href="ext:api/org/apache/hadoop/mapred/jobcontrol/package-summary">
- JobControl</a> is a utility which encapsulates a set of Map/Reduce jobs
+ JobControl</a> is a utility which encapsulates a set of MapReduce jobs
and their dependencies.</p>
</section>
<section>
<title>Data Compression</title>
- <p>Hadoop Map/Reduce provides facilities for the application-writer to
+ <p>Hadoop MapReduce provides facilities for the application-writer to
specify compression for both intermediate map-outputs and the
job-outputs i.e. output of the reduces. It also comes bundled with
<a href="ext:api/org/apache/hadoop/io/compress/compressioncodec">
@@ -2333,12 +2333,12 @@
<title>Example: WordCount v2.0</title>
<p>Here is a more complete <code>WordCount</code> which uses many of the
- features provided by the Map/Reduce framework we discussed so far.</p>
+ features provided by the MapReduce framework we discussed so far.</p>
<p>This needs the HDFS to be up and running, especially for the
<code>DistributedCache</code>-related features. Hence it only works with a
- <a href="quickstart.html#SingleNodeSetup">pseudo-distributed</a> or
- <a href="quickstart.html#Fully-Distributed+Operation">fully-distributed</a>
+ <a href="single_node_setup.html#SingleNodeSetup">pseudo-distributed</a> or
+ <a href="single_node_setup.html#Fully-Distributed+Operation">fully-distributed</a>
Hadoop installation.</p>
<section>
@@ -3285,7 +3285,7 @@
<title>Highlights</title>
<p>The second version of <code>WordCount</code> improves upon the
- previous one by using some features offered by the Map/Reduce framework:
+ previous one by using some features offered by the MapReduce framework:
</p>
<ul>
<li>
Modified: hadoop/common/branches/branch-0.20-security-patches/src/docs/src/documentation/content/xdocs/native_libraries.xml
URL: http://svn.apache.org/viewvc/hadoop/common/branches/branch-0.20-security-patches/src/docs/src/documentation/content/xdocs/native_libraries.xml?rev=1077365&r1=1077364&r2=1077365&view=diff
==============================================================================
--- hadoop/common/branches/branch-0.20-security-patches/src/docs/src/documentation/content/xdocs/native_libraries.xml (original)
+++ hadoop/common/branches/branch-0.20-security-patches/src/docs/src/documentation/content/xdocs/native_libraries.xml Fri Mar 4 04:07:36 2011
@@ -1,10 +1,11 @@
<?xml version="1.0"?>
<!--
- Copyright 2002-2004 The Apache Software Foundation
-
- Licensed under the Apache License, Version 2.0 (the "License");
- you may not use this file except in compliance with the License.
- You may obtain a copy of the License at
+ Licensed to the Apache Software Foundation (ASF) under one or more
+ contributor license agreements. See the NOTICE file distributed with
+ this work for additional information regarding copyright ownership.
+ The ASF licenses this file to You under the Apache License, Version 2.0
+ (the "License"); you may not use this file except in compliance with
+ the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
@@ -25,114 +26,114 @@
<body>
+ <section>
+ <title>Overview</title>
+
+<p>This guide describes the native hadoop library and includes a small discussion about native shared libraries.</p>
+
+ <p><strong>Note:</strong> Depending on your environment, the term "native libraries" <em>could</em>
+ refer to all *.so's you need to compile; and, the term "native compression" <em>could</em> refer to all *.so's
+ you need to compile that are specifically related to compression.
+ Currently, however, this document only addresses the native hadoop library (<em>libhadoop.so</em>).</p>
+
+ </section>
+
<section>
- <title>Purpose</title>
-
- <p>Hadoop has native implementations of certain components for reasons of
- both performance and non-availability of Java implementations. These
- components are available in a single, dynamically-linked, native library.
- On the *nix platform it is <em>libhadoop.so</em>. This document describes
- the usage and details on how to build the native libraries.</p>
- </section>
-
- <section>
- <title>Components</title>
-
- <p>Hadoop currently has the following
- <a href="ext:api/org/apache/hadoop/io/compress/compressioncodec">
- compression codecs</a> as the native components:</p>
- <ul>
- <li><a href="ext:zlib">zlib</a></li>
- <li><a href="ext:gzip">gzip</a></li>
- <li><a href="ext:bzip">bzip2</a></li>
- </ul>
+ <title>Native Hadoop Library </title>
- <p>Of the above, the availability of native hadoop libraries is imperative
- for the gzip and bzip2 compression codecs to work.</p>
- </section>
-
+ <p>Hadoop has native implementations of certain components, either for
+ performance reasons or because no Java implementation is available. These
+ components are available in a single, dynamically-linked native library called
+ the native hadoop library. On the *nix platforms the library is named <em>libhadoop.so</em>. </p>
+
<section>
<title>Usage</title>
- <p>It is fairly simple to use the native hadoop libraries:</p>
+ <p>It is fairly easy to use the native hadoop library:</p>
- <ul>
+ <ol>
+ <li>
+ Review the <a href="#Components">components</a>.
+ </li>
<li>
- Take a look at the
- <a href="#Supported+Platforms">supported platforms</a>.
+ Review the <a href="#Supported+Platforms">supported platforms</a>.
</li>
<li>
- Either <a href="ext:releases/download">download</a> the pre-built
- 32-bit i386-Linux native hadoop libraries (available as part of hadoop
- distribution in <code>lib/native</code> directory) or
- <a href="#Building+Native+Hadoop+Libraries">build</a> them yourself.
+ Either <a href="#Download">download</a> a hadoop release, which will
+ include a pre-built version of the native hadoop library, or
+ <a href="#Build">build</a> your own version of the
+ native hadoop library. Whether you download or build, the name for the library is
+ the same: <em>libhadoop.so</em>
</li>
<li>
- Make sure you have any of or all of <strong>>zlib-1.2</strong>,
- <strong>>gzip-1.2</strong>, and <strong>>bzip2-1.0</strong>
- packages for your platform installed;
- depending on your needs.
+ Install the compression codec development packages
+ (<strong>>zlib-1.2</strong>, <strong>>gzip-1.2</strong>):
+ <ul>
+ <li>If you download the library, install one or more development packages -
+ whichever compression codecs you want to use with your deployment.</li>
+ <li>If you build the library, it is <strong>mandatory</strong>
+ to install both development packages.</li>
+ </ul>
</li>
- </ul>
-
- <p>The <code>bin/hadoop</code> script ensures that the native hadoop
- library is on the library path via the system property
- <em>-Djava.library.path=<path></em>.</p>
-
- <p>To check everything went alright check the hadoop log files for:</p>
-
- <p>
- <code>
- DEBUG util.NativeCodeLoader - Trying to load the custom-built
- native-hadoop library...
- </code><br/>
- <code>
- INFO util.NativeCodeLoader - Loaded the native-hadoop library
- </code>
- </p>
-
- <p>If something goes wrong, then:</p>
- <p>
- <code>
- INFO util.NativeCodeLoader - Unable to load native-hadoop library for
- your platform... using builtin-java classes where applicable
- </code>
+ <li>
+ Check the <a href="#Runtime">runtime</a> log files.
+ </li>
+ </ol>
+ </section>
+ <section>
+ <title>Components</title>
+ <p>The native hadoop library includes two components, the zlib and gzip
+ <a href="http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/io/compress/CompressionCodec.html">
+ compression codecs</a>:
</p>
+ <ul>
+ <li><a href="ext:zlib">zlib</a></li>
+ <li><a href="ext:gzip">gzip</a></li>
+ </ul>
+ <p>The native hadoop library is required for the gzip codec to work.</p>
</section>
<section>
<title>Supported Platforms</title>
- <p>Hadoop native library is supported only on *nix platforms only.
- Unfortunately it is known not to work on <a href="ext:cygwin">Cygwin</a>
- and <a href="ext:osx">Mac OS X</a> and has mainly been used on the
- GNU/Linux platform.</p>
+ <p>The native hadoop library is supported on *nix platforms only.
+ The library does not work with <a href="ext:cygwin">Cygwin</a>
+ or the <a href="ext:osx">Mac OS X</a> platform.</p>
- <p>It has been tested on the following GNU/Linux distributions:</p>
+ <p>The native hadoop library is mainly used on the GNU/Linux platform and
+ has been tested on these distributions:</p>
<ul>
<li>
- <a href="http://www.redhat.com/rhel/">RHEL4</a>/<a href="http://fedora.redhat.com/">Fedora</a>
+ <a href="http://www.redhat.com/rhel/">RHEL4</a>/<a href="http://fedoraproject.org/">Fedora</a>
</li>
<li><a href="http://www.ubuntu.com/">Ubuntu</a></li>
<li><a href="http://www.gentoo.org/">Gentoo</a></li>
</ul>
- <p>On all the above platforms a 32/64 bit Hadoop native library will work
+ <p>On all the above distributions, a 32/64 bit native hadoop library will work
with a respective 32/64 bit jvm.</p>
</section>
<section>
- <title>Building Native Hadoop Libraries</title>
+ <title>Download</title>
+
+ <p>The pre-built 32-bit i386-Linux native hadoop library is available as part of the
+ hadoop distribution and is located in the <code>lib/native</code> directory. You can download the
+ hadoop distribution from <a href="ext:releases/download">Hadoop Common Releases</a>.</p>
+
+ <p>Be sure to install the zlib and/or gzip development packages - whichever compression
+ codecs you want to use with your deployment.</p>
+ </section>
+
+ <section>
+ <title>Build</title>
- <p>Hadoop native library is written in
- <a href="http://en.wikipedia.org/wiki/ANSI_C">ANSI C</a> and built using
- the GNU autotools-chain (autoconf, autoheader, automake, autoscan, libtool).
- This means it should be straight-forward to build them on any platform with
- a standards compliant C compiler and the GNU autotools-chain.
- See <a href="#Supported+Platforms">supported platforms</a>.</p>
+ <p>The native hadoop library is written in <a href="http://en.wikipedia.org/wiki/ANSI_C">ANSI C</a>
+ and is built using the GNU autotools-chain (autoconf, autoheader, automake, autoscan, libtool).
+ This means it should be straightforward to build the library on any platform with a standards-compliant
+ C compiler and the GNU autotools-chain (see the <a href="#Supported+Platforms">supported platforms</a>).</p>
- <p>In particular the various packages you would need on the target
- platform are:</p>
+ <p>The packages you need to install on the target platform are:</p>
<ul>
<li>
C compiler (e.g. <a href="http://gcc.gnu.org/">GNU C Compiler</a>)
@@ -148,52 +149,69 @@
</li>
</ul>
- <p>Once you have the pre-requisites use the standard <code>build.xml</code>
- and pass along the <code>compile.native</code> flag (set to
- <code>true</code>) to build the native hadoop library:</p>
+ <p>Once you have installed the prerequisite packages, use the standard hadoop <code>build.xml</code>
+ file and pass along the <code>compile.native</code> flag (set to <code>true</code>) to build the native hadoop library:</p>
<p><code>$ ant -Dcompile.native=true <target></code></p>
- <p>The native hadoop library is not built by default since not everyone is
- interested in building them.</p>
-
- <p>You should see the newly-built native hadoop library in:</p>
+ <p>You should see the newly-built library in:</p>
<p><code>$ build/native/<platform>/lib</code></p>
- <p>where <platform> is combination of the system-properties:
- <code>${os.name}-${os.arch}-${sun.arch.data.model}</code>; for e.g.
- Linux-i386-32.</p>
-
- <section>
- <title>Notes</title>
-
+ <p>where <<code>platform</code>> is a combination of the system-properties:
+ <code>${os.name}-${os.arch}-${sun.arch.data.model}</code> (for example, Linux-i386-32).</p>
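The platform string above can be reproduced with a short sketch. This is illustrative only, not Hadoop build code; note that `sun.arch.data.model` is a HotSpot-specific property and may be absent on other JVMs.

```java
// Minimal sketch: assemble the <platform> directory name from the same
// three system properties the build uses. Not part of Hadoop itself.
public class PlatformName {
    public static String platform() {
        return System.getProperty("os.name") + "-"
             + System.getProperty("os.arch") + "-"
             + System.getProperty("sun.arch.data.model");
    }

    public static void main(String[] args) {
        // Prints something like Linux-amd64-64 on a 64-bit Linux JVM.
        System.out.println(platform());
    }
}
```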
+
+ <p>Please note the following:</p>
<ul>
<li>
- It is <strong>mandatory</strong> to have the
- zlib, gzip, and bzip2
- development packages on the target platform for building the
- native hadoop library; however for deployment it is sufficient to
- install one of them if you wish to use only one of them.
+ It is <strong>mandatory</strong> to install both the zlib and gzip
+ development packages on the target platform in order to build the
+ native hadoop library; however, for deployment it is sufficient to
+ install just one package if you wish to use only one codec.
</li>
<li>
- It is necessary to have the correct 32/64 libraries of both zlib
- depending on the 32/64 bit jvm for the target platform for
- building/deployment of the native hadoop library.
+ It is necessary to have the correct 32/64 libraries for zlib,
+ depending on the 32/64 bit jvm for the target platform, in order to
+ build and deploy the native hadoop library.
</li>
</ul>
- </section>
</section>
+
+ <section>
+ <title>Runtime</title>
+ <p>The <code>bin/hadoop</code> script ensures that the native hadoop
+ library is on the library path via the system property: <br/>
+ <em>-Djava.library.path=<path></em></p>
+
+ <p>During runtime, check the hadoop log files for your MapReduce tasks.</p>
+
+ <ul>
+ <li>If everything is all right, then:<br/><br/>
+ <code> DEBUG util.NativeCodeLoader - Trying to load the custom-built native-hadoop library... </code><br/>
+ <code> INFO util.NativeCodeLoader - Loaded the native-hadoop library </code><br/>
+ </li>
+
+ <li>If something goes wrong, then:<br/><br/>
+ <code>
+ INFO util.NativeCodeLoader - Unable to load native-hadoop library for
+ your platform... using builtin-java classes where applicable
+ </code>
+
+ </li>
+ </ul>
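The two log outcomes above come from a load-with-fallback pattern. The sketch below mirrors that idea in plain JDK code; it is not Hadoop's `NativeCodeLoader`, and the library name `"hadoop"` stands in for whatever `-Djava.library.path` makes resolvable.

```java
// Hedged sketch of try-native-then-fall-back loading, echoing the log
// lines above. Not Hadoop code; the messages are only illustrative.
public class NativeLoadCheck {
    public static boolean tryLoad(String name) {
        try {
            System.loadLibrary(name);
            System.out.println("Loaded the native library: " + name);
            return true;
        } catch (UnsatisfiedLinkError e) {
            System.out.println("Unable to load native library '" + name
                + "'... using builtin-java classes where applicable");
            return false;
        }
    }

    public static void main(String[] args) {
        // The JVM searches the directories listed in this property.
        System.out.println("java.library.path = "
            + System.getProperty("java.library.path"));
        tryLoad("hadoop");
    }
}
```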
+ </section>
+ </section>
+
<section>
- <title> Loading native libraries through DistributedCache </title>
- <p>User can load native shared libraries through
- <a href="mapred_tutorial.html#DistributedCache">DistributedCache</a>
- for <em>distributing</em> and <em>symlinking</em> the library files</p>
+ <title>Native Shared Libraries</title>
+ <p>You can load <strong>any</strong> native shared library using
+ <a href="mapred_tutorial.html#DistributedCache">DistributedCache</a>
+ for <em>distributing</em> and <em>symlinking</em> the library files.</p>
- <p>Here is an example, describing how to distribute the library and
- load it from map/reduce task. </p>
+ <p>This example shows you how to distribute a shared library, <code>mylib.so</code>,
+ and load it from a MapReduce task.</p>
<ol>
- <li> First copy the library to the HDFS. <br/>
+ <li> First copy the library to the HDFS: <br/>
<code>bin/hadoop fs -copyFromLocal mylib.so.1 /libraries/mylib.so.1</code>
</li>
<li> The job launching program should contain the following: <br/>
@@ -201,10 +219,13 @@
<code> DistributedCache.addCacheFile("hdfs://host:port/libraries/mylib.so.1#mylib.so", conf);
</code>
</li>
- <li> The map/reduce task can contain: <br/>
+ <li> The MapReduce task can contain: <br/>
<code> System.loadLibrary("mylib.so"); </code>
</li>
</ol>
+
+ <p><br/><strong>Note:</strong> If you downloaded or built the native hadoop library, you don't need to use DistributedCache to
+ make the library available to your MapReduce tasks.</p>
</section>
</body>
Modified: hadoop/common/branches/branch-0.20-security-patches/src/docs/src/documentation/content/xdocs/service_level_auth.xml
URL: http://svn.apache.org/viewvc/hadoop/common/branches/branch-0.20-security-patches/src/docs/src/documentation/content/xdocs/service_level_auth.xml?rev=1077365&r1=1077364&r2=1077365&view=diff
==============================================================================
--- hadoop/common/branches/branch-0.20-security-patches/src/docs/src/documentation/content/xdocs/service_level_auth.xml (original)
+++ hadoop/common/branches/branch-0.20-security-patches/src/docs/src/documentation/content/xdocs/service_level_auth.xml Fri Mar 4 04:07:36 2011
@@ -1,10 +1,11 @@
<?xml version="1.0"?>
<!--
- Copyright 2002-2004 The Apache Software Foundation
-
- Licensed under the Apache License, Version 2.0 (the "License");
- you may not use this file except in compliance with the License.
- You may obtain a copy of the License at
+ Licensed to the Apache Software Foundation (ASF) under one or more
+ contributor license agreements. See the NOTICE file distributed with
+ this work for additional information regarding copyright ownership.
+ The ASF licenses this file to You under the Apache License, Version 2.0
+ (the "License"); you may not use this file except in compliance with
+ the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
@@ -33,17 +34,15 @@
</section>
<section>
- <title>Pre-requisites</title>
+ <title>Prerequisites</title>
- <p>Ensure that Hadoop is installed, configured and setup correctly. More
- details:</p>
+ <p>Make sure Hadoop is installed, configured, and set up correctly. For more information see:</p>
<ul>
<li>
- <a href="quickstart.html">Hadoop Quick Start</a> for first-time users.
+ <a href="single_node_setup.html">Single Node Setup</a> for first-time users.
</li>
<li>
- <a href="cluster_setup.html">Hadoop Cluster Setup</a> for large,
- distributed clusters.
+ <a href="cluster_setup.html">Cluster Setup</a> for large, distributed clusters.
</li>
</ul>
</section>
@@ -54,7 +53,7 @@
<p>Service Level Authorization is the initial authorization mechanism to
ensure clients connecting to a particular Hadoop <em>service</em> have the
necessary, pre-configured, permissions and are authorized to access the given
- service. For e.g. a Map/Reduce cluster can use this mechanism to allow a
+ service. For example, a MapReduce cluster can use this mechanism to allow a
configured list of users/groups to submit jobs.</p>
<p>The <code>${HADOOP_CONF_DIR}/hadoop-policy.xml</code> configuration file
@@ -197,33 +196,33 @@
<title>Examples</title>
<p>Allow only users <code>alice</code>, <code>bob</code> and users in the
- <code>mapreduce</code> group to submit jobs to the Map/Reduce cluster:</p>
+ <code>mapreduce</code> group to submit jobs to the MapReduce cluster:</p>
- <table>
- <tr><td> <property></td></tr>
- <tr><td> <name>security.job.submission.protocol.acl</name></td></tr>
- <tr><td> <value>alice,bob mapreduce</value></td></tr>
- <tr><td> </property></td></tr>
- </table>
+<source>
+<property>
+ <name>security.job.submission.protocol.acl</name>
+ <value>alice,bob mapreduce</value>
+</property>
+</source>
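The ACL value format used here can be made concrete with a small sketch. This is illustrative only, not Hadoop's actual ACL parser: the value holds a comma-separated user list and a comma-separated group list separated by a single space, and a leading space means groups only.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Illustrative sketch (not Hadoop's parser) of splitting an ACL value
// like "alice,bob mapreduce" into its user and group lists.
public class AclValue {
    final List<String> users;
    final List<String> groups;

    AclValue(String value) {
        // Format: "<users> <groups>", each comma-separated; a leading
        // space (as in " datanodes") means no users, groups only.
        String[] parts = value.split(" ", 2);
        users = parts[0].isEmpty()
            ? Collections.<String>emptyList()
            : Arrays.asList(parts[0].split(","));
        groups = (parts.length > 1 && !parts[1].isEmpty())
            ? Arrays.asList(parts[1].split(","))
            : Collections.<String>emptyList();
    }

    public static void main(String[] args) {
        AclValue acl = new AclValue("alice,bob mapreduce");
        System.out.println(acl.users);   // [alice, bob]
        System.out.println(acl.groups);  // [mapreduce]
    }
}
```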
<p></p><p>Allow only DataNodes running as the users who belong to the
group <code>datanodes</code> to communicate with the NameNode:</p>
-
- <table>
- <tr><td> <property></td></tr>
- <tr><td> <name>security.datanode.protocol.acl</name></td></tr>
- <tr><td> <value> datanodes</value></td></tr>
- <tr><td> </property></td></tr>
- </table>
+
+<source>
+<property>
+ <name>security.datanode.protocol.acl</name>
+ <value> datanodes</value>
+</property>
+</source>
<p></p><p>Allow any user to talk to the HDFS cluster as a DFSClient:</p>
-
- <table>
- <tr><td> <property></td></tr>
- <tr><td> <name>security.client.protocol.acl</name></td></tr>
- <tr><td> <value>*</value></td></tr>
- <tr><td> </property></td></tr>
- </table>
+
+<source>
+<property>
+ <name>security.client.protocol.acl</name>
+ <value>*</value>
+</property>
+</source>
</section>
</section>
Added: hadoop/common/branches/branch-0.20-security-patches/src/docs/src/documentation/content/xdocs/single_node_setup.xml
URL: http://svn.apache.org/viewvc/hadoop/common/branches/branch-0.20-security-patches/src/docs/src/documentation/content/xdocs/single_node_setup.xml?rev=1077365&view=auto
==============================================================================
--- hadoop/common/branches/branch-0.20-security-patches/src/docs/src/documentation/content/xdocs/single_node_setup.xml (added)
+++ hadoop/common/branches/branch-0.20-security-patches/src/docs/src/documentation/content/xdocs/single_node_setup.xml Fri Mar 4 04:07:36 2011
@@ -0,0 +1,293 @@
+<?xml version="1.0"?>
+<!--
+ Licensed to the Apache Software Foundation (ASF) under one or more
+ contributor license agreements. See the NOTICE file distributed with
+ this work for additional information regarding copyright ownership.
+ The ASF licenses this file to You under the Apache License, Version 2.0
+ (the "License"); you may not use this file except in compliance with
+ the License. You may obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+ Unless required by applicable law or agreed to in writing, software
+ distributed under the License is distributed on an "AS IS" BASIS,
+ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ See the License for the specific language governing permissions and
+ limitations under the License.
+-->
+
+<!DOCTYPE document PUBLIC "-//APACHE//DTD Documentation V2.0//EN" "http://forrest.apache.org/dtd/document-v20.dtd">
+
+<document>
+
+ <header>
+ <title>Single Node Setup</title>
+ </header>
+
+ <body>
+
+ <section>
+ <title>Purpose</title>
+
+ <p>This document describes how to set up and configure a single-node Hadoop
+ installation so that you can quickly perform simple operations using Hadoop
+ MapReduce and the Hadoop Distributed File System (HDFS).</p>
+
+ </section>
+
+ <section id="PreReqs">
+ <title>Prerequisites</title>
+
+ <section>
+ <title>Supported Platforms</title>
+
+ <ul>
+ <li>
+ GNU/Linux is supported as a development and production platform.
+ Hadoop has been demonstrated on GNU/Linux clusters with 2000 nodes.
+ </li>
+ <li>
+ Win32 is supported as a <em>development platform</em>. Distributed
+ operation has not been well tested on Win32, so it is not
+ supported as a <em>production platform</em>.
+ </li>
+ </ul>
+ </section>
+
+ <section>
+ <title>Required Software</title>
+ <p>Required software for Linux and Windows includes:</p>
+ <ol>
+ <li>
+ Java<sup>TM</sup> 1.6.x, preferably from Sun, must be installed.
+ </li>
+ <li>
+ <strong>ssh</strong> must be installed and <strong>sshd</strong> must
+ be running to use the Hadoop scripts that manage remote Hadoop
+ daemons.
+ </li>
+ </ol>
+ <p>Additional requirements for Windows include:</p>
+ <ol>
+ <li>
+ <a href="http://www.cygwin.com/">Cygwin</a> - Required for shell
+ support in addition to the required software above.
+ </li>
+ </ol>
+ </section>
+
+ <section>
+ <title>Installing Software</title>
+
+ <p>If your cluster doesn't have the requisite software you will need to
+ install it.</p>
+
+ <p>For example on Ubuntu Linux:</p>
+ <p>
+ <code>$ sudo apt-get install ssh</code><br/>
+ <code>$ sudo apt-get install rsync</code>
+ </p>
+
+ <p>On Windows, if you did not install the required software when you
+ installed cygwin, start the cygwin installer and select the packages:</p>
+ <ul>
+ <li>openssh - the <em>Net</em> category</li>
+ </ul>
+ </section>
+
+ </section>
+
+ <section>
+ <title>Download</title>
+
+ <p>
+ To get a Hadoop distribution, download a recent
+ <a href="ext:releases">stable release</a> from one of the Apache Download
+ Mirrors.
+ </p>
+ </section>
+
+ <section>
+ <title>Prepare to Start the Hadoop Cluster</title>
+ <p>
+ Unpack the downloaded Hadoop distribution. In the distribution, edit the
+ file <code>conf/hadoop-env.sh</code> to define at least
+ <code>JAVA_HOME</code> to be the root of your Java installation.
+ </p>
+
+ <p>
+ Try the following command:<br/>
+ <code>$ bin/hadoop</code><br/>
+ This will display the usage documentation for the <strong>hadoop</strong>
+ script.
+ </p>
+
+ <p>Now you are ready to start your Hadoop cluster in one of the three supported
+ modes:
+ </p>
+ <ul>
+ <li>Local (Standalone) Mode</li>
+ <li>Pseudo-Distributed Mode</li>
+ <li>Fully-Distributed Mode</li>
+ </ul>
+ </section>
+
+ <section id="Local">
+ <title>Standalone Operation</title>
+
+ <p>By default, Hadoop is configured to run in a non-distributed
+ mode, as a single Java process. This is useful for debugging.</p>
+
+ <p>
+ The following example copies the unpacked <code>conf</code> directory to
+ use as input and then finds and displays every match of the given regular
+ expression. Output is written to the given <code>output</code> directory.
+ <br/>
+ <code>$ mkdir input</code><br/>
+ <code>$ cp conf/*.xml input</code><br/>
+ <code>
+ $ bin/hadoop jar hadoop-*-examples.jar grep input output 'dfs[a-z.]+'
+ </code><br/>
+ <code>$ cat output/*</code>
+ </p>
+ </section>
+
+ <section id="PseudoDistributed">
+ <title>Pseudo-Distributed Operation</title>
+
+ <p>Hadoop can also be run on a single node in a pseudo-distributed mode
+ where each Hadoop daemon runs in a separate Java process.</p>
+
+ <section>
+ <title>Configuration</title>
+ <p>Use the following:
+ <br/><br/>
+ <code>conf/core-site.xml</code>:</p>
+
+ <source>
+<configuration>
+ <property>
+ <name>fs.default.name</name>
+ <value>hdfs://localhost:9000</value>
+ </property>
+</configuration>
+</source>
+
+ <p><br/><code>conf/hdfs-site.xml</code>:</p>
+<source>
+<configuration>
+ <property>
+ <name>dfs.replication</name>
+ <value>1</value>
+ </property>
+</configuration>
+</source>
+
+
+ <p><br/><code>conf/mapred-site.xml</code>:</p>
+<source>
+<configuration>
+ <property>
+ <name>mapred.job.tracker</name>
+ <value>localhost:9001</value>
+ </property>
+</configuration>
+</source>
+
+
+
+ </section>
+
+ <section>
+ <title>Setup passphraseless <em>ssh</em></title>
+
+ <p>
+ Now check that you can ssh to the localhost without a passphrase:<br/>
+ <code>$ ssh localhost</code>
+ </p>
+
+ <p>
+ If you cannot ssh to localhost without a passphrase, execute the
+ following commands:<br/>
+ <code>$ ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa</code><br/>
+ <code>$ cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys</code>
+ </p>
+ </section>
+
+ <section>
+ <title>Execution</title>
+
+ <p>
+ Format a new distributed filesystem:<br/>
+ <code>$ bin/hadoop namenode -format</code>
+ </p>
+
+ <p>
+ Start the hadoop daemons:<br/>
+ <code>$ bin/start-all.sh</code>
+ </p>
+
+ <p>The hadoop daemon log output is written to the
+ <code>${HADOOP_LOG_DIR}</code> directory (defaults to
+ <code>${HADOOP_HOME}/logs</code>).</p>
+
+ <p>Browse the web interface for the NameNode and the JobTracker; by
+ default they are available at:</p>
+ <ul>
+ <li>
+ <code>NameNode</code> -
+ <a href="http://localhost:50070/">http://localhost:50070/</a>
+ </li>
+ <li>
+ <code>JobTracker</code> -
+ <a href="http://localhost:50030/">http://localhost:50030/</a>
+ </li>
+ </ul>
+
+ <p>
+ Copy the input files into the distributed filesystem:<br/>
+ <code>$ bin/hadoop fs -put conf input</code>
+ </p>
+
+ <p>
+ Run some of the examples provided:<br/>
+ <code>
+ $ bin/hadoop jar hadoop-*-examples.jar grep input output 'dfs[a-z.]+'
+ </code>
+ </p>
+
+ <p>Examine the output files:</p>
+ <p>
+ Copy the output files from the distributed filesystem to the local
+ filesystem and examine them:<br/>
+ <code>$ bin/hadoop fs -get output output</code><br/>
+ <code>$ cat output/*</code>
+ </p>
+ <p> or </p>
+ <p>
+ View the output files on the distributed filesystem:<br/>
+ <code>$ bin/hadoop fs -cat output/*</code>
+ </p>
+
+ <p>
+ When you're done, stop the daemons with:<br/>
+ <code>$ bin/stop-all.sh</code>
+ </p>
+ </section>
+ </section>
+
+ <section id="FullyDistributed">
+ <title>Fully-Distributed Operation</title>
+
+ <p>For information on setting up fully-distributed, non-trivial clusters
+ see <a href="cluster_setup.html">Cluster Setup</a>.</p>
+ </section>
+
+ <p>
+ <em>Java and JNI are trademarks or registered trademarks of
+ Sun Microsystems, Inc. in the United States and other countries.</em>
+ </p>
+
+ </body>
+
+</document>
Modified: hadoop/common/branches/branch-0.20-security-patches/src/docs/src/documentation/content/xdocs/site.xml
URL: http://svn.apache.org/viewvc/hadoop/common/branches/branch-0.20-security-patches/src/docs/src/documentation/content/xdocs/site.xml?rev=1077365&r1=1077364&r2=1077365&view=diff
==============================================================================
--- hadoop/common/branches/branch-0.20-security-patches/src/docs/src/documentation/content/xdocs/site.xml (original)
+++ hadoop/common/branches/branch-0.20-security-patches/src/docs/src/documentation/content/xdocs/site.xml Fri Mar 4 04:07:36 2011
@@ -31,52 +31,50 @@ See http://forrest.apache.org/docs/linki
<site label="Hadoop" href="" xmlns="http://apache.org/forrest/linkmap/1.0">
- <docs label="Getting Started">
- <overview label="Overview" href="index.html" />
- <quickstart label="Quick Start" href="quickstart.html" />
- <setup label="Cluster Setup" href="cluster_setup.html" />
- <mapred label="Map/Reduce Tutorial" href="mapred_tutorial.html" />
- </docs>
-
- <docs label="Programming Guides">
- <commands label="Commands" href="commands_manual.html" />
- <distcp label="DistCp" href="distcp.html" />
- <native_lib label="Native Libraries" href="native_libraries.html" />
- <streaming label="Streaming" href="streaming.html" />
- <fair_scheduler label="Fair Scheduler" href="fair_scheduler.html"/>
- <cap_scheduler label="Capacity Scheduler" href="capacity_scheduler.html"/>
- <SLA label="Service Level Authorization" href="service_level_auth.html"/>
- <vaidya label="Vaidya" href="vaidya.html"/>
- <archives label="Archives" href="hadoop_archives.html"/>
- <gridmix label="Gridmix" href="gridmix.html"/>
- <sec_impersonation label="Secure Impersonation" href="Secure_Impersonation.html"/>
- </docs>
-
- <docs label="HDFS">
- <hdfs_user label="User Guide" href="hdfs_user_guide.html" />
- <hdfs_arch label="Architecture" href="hdfs_design.html" />
- <hdfs_fs label="File System Shell Guide" href="hdfs_shell.html" />
- <hdfs_perm label="Permissions Guide" href="hdfs_permissions_guide.html" />
- <hdfs_quotas label="Quotas Guide" href="hdfs_quota_admin_guide.html" />
- <hdfs_SLG label="Synthetic Load Generator Guide" href="SLG_user_guide.html" />
- <hdfs_libhdfs label="C API libhdfs" href="libhdfs.html" />
- </docs>
-
- <docs label="HOD">
- <hod_user label="User Guide" href="hod_user_guide.html"/>
- <hod_admin label="Admin Guide" href="hod_admin_guide.html"/>
- <hod_config label="Config Guide" href="hod_config_guide.html"/>
- </docs>
-
- <docs label="Miscellaneous">
- <api label="API Docs" href="ext:api/index" />
- <jdiff label="API Changes" href="ext:jdiff/changes" />
- <wiki label="Wiki" href="ext:wiki" />
- <faq label="FAQ" href="ext:faq" />
- <relnotes label="Release Notes" href="ext:relnotes" />
- <changes label="Change Log" href="ext:changes" />
- </docs>
-
+ <docs label="Getting Started">
+ <overview label="Overview" href="index.html" />
+ <single label="Single Node Setup" href="single_node_setup.html" />
+ <cluster label="Cluster Setup" href="cluster_setup.html" />
+ </docs>
+
+ <docs label="MapReduce">
+ <mapred label="MapReduce Tutorial" href="mapred_tutorial.html" />
+ <streaming label="Hadoop Streaming" href="streaming.html" />
+ <commands label="Hadoop Commands" href="commands_manual.html" />
+ <distcp label="DistCp" href="distcp.html" />
+ <vaidya label="Vaidya" href="vaidya.html"/>
+ <archives label="Hadoop Archives" href="hadoop_archives.html"/>
+ <gridmix label="Gridmix" href="gridmix.html"/>
+ <cap_scheduler label="Capacity Scheduler" href="capacity_scheduler.html"/>
+ <fair_scheduler label="Fair Scheduler" href="fair_scheduler.html"/>
+ <hod_scheduler label="HOD Scheduler" href="hod_scheduler.html"/>
+ </docs>
+
+ <docs label="HDFS">
+ <hdfs_user label="HDFS Users" href="hdfs_user_guide.html" />
+ <hdfs_arch label="HDFS Architecture" href="hdfs_design.html" />
+ <hdfs_perm label="Permissions" href="hdfs_permissions_guide.html" />
+ <hdfs_quotas label="Quotas" href="hdfs_quota_admin_guide.html" />
+ <hdfs_SLG label="Synthetic Load Generator" href="SLG_user_guide.html" />
+ <hdfs_libhdfs label="C API libhdfs" href="libhdfs.html" />
+ </docs>
+
+ <docs label="Common">
+ <fsshell label="File System Shell" href="file_system_shell.html" />
+ <SLA label="Service Level Authorization" href="service_level_auth.html"/>
+ <native_lib label="Native Libraries" href="native_libraries.html" />
+ </docs>
+
+ <docs label="Miscellaneous">
+ <sec_impersonation label="Secure Impersonation" href="Secure_Impersonation.html"/>
+ <api label="API Docs" href="ext:api/index" />
+ <jdiff label="API Changes" href="ext:jdiff/changes" />
+ <wiki label="Wiki" href="ext:wiki" />
+ <faq label="FAQ" href="ext:faq" />
+ <relnotes label="Release Notes" href="ext:relnotes" />
+ <changes label="Change Log" href="ext:changes" />
+ </docs>
+
<external-refs>
<site href="http://hadoop.apache.org/core/"/>
<lists href="http://hadoop.apache.org/core/mailing_lists.html"/>
Modified: hadoop/common/branches/branch-0.20-security-patches/src/docs/src/documentation/content/xdocs/streaming.xml
URL: http://svn.apache.org/viewvc/hadoop/common/branches/branch-0.20-security-patches/src/docs/src/documentation/content/xdocs/streaming.xml?rev=1077365&r1=1077364&r2=1077365&view=diff
==============================================================================
--- hadoop/common/branches/branch-0.20-security-patches/src/docs/src/documentation/content/xdocs/streaming.xml (original)
+++ hadoop/common/branches/branch-0.20-security-patches/src/docs/src/documentation/content/xdocs/streaming.xml Fri Mar 4 04:07:36 2011
@@ -552,6 +552,8 @@ if __name__ == "__main__":
</source>
</section>
+
+<!-- QUESTION -->
<section>
<title>Hadoop Field Selection Class</title>
<p>
@@ -789,6 +791,17 @@ For example, mapred.job.id becomes mapre
</p>
</section>
+
+<!-- QUESTION -->
+<section>
+<title>How do I get the JobConf variables in a streaming job's mapper/reducer?</title>
+<p>
+See the <a href="mapred_tutorial.html#Configured+Parameters">Configured Parameters</a> section.
+During the execution of a streaming job, the names of the "mapred" parameters are transformed. The dots ( . ) become underscores ( _ ).
+For example, mapred.job.id becomes mapred_job_id and mapred.jar becomes mapred_jar. In your code, use the parameter names with the underscores.
+</p>
+</section>
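The renaming rule above is a plain character substitution, sketched here for clarity (illustrative helper, not part of Hadoop Streaming itself):

```java
// Sketch of the dot-to-underscore renaming described above:
// streaming exposes "mapred" configuration names with the dots
// replaced by underscores, e.g. mapred.job.id -> mapred_job_id.
public class StreamingParamName {
    public static String envName(String prop) {
        return prop.replace('.', '_');
    }

    public static void main(String[] args) {
        System.out.println(envName("mapred.job.id")); // mapred_job_id
        System.out.println(envName("mapred.jar"));    // mapred_jar
    }
}
```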
+
</section>
</body>
</document>
Modified: hadoop/common/branches/branch-0.20-security-patches/src/docs/src/documentation/content/xdocs/vaidya.xml
URL: http://svn.apache.org/viewvc/hadoop/common/branches/branch-0.20-security-patches/src/docs/src/documentation/content/xdocs/vaidya.xml?rev=1077365&r1=1077364&r2=1077365&view=diff
==============================================================================
--- hadoop/common/branches/branch-0.20-security-patches/src/docs/src/documentation/content/xdocs/vaidya.xml (original)
+++ hadoop/common/branches/branch-0.20-security-patches/src/docs/src/documentation/content/xdocs/vaidya.xml Fri Mar 4 04:07:36 2011
@@ -40,7 +40,7 @@
</section>
<section>
- <title>Pre-requisites</title>
+ <title>Prerequisites</title>
<p>Ensure that Hadoop is installed and configured. More details:</p>
<ul>
@@ -58,11 +58,11 @@
<p>Hadoop Vaidya (Vaidya in Sanskrit language means "one who knows", or "a physician")
is a rule based performance diagnostic tool for
- Map/Reduce jobs. It performs a post execution analysis of map/reduce
+ MapReduce jobs. It performs a post execution analysis of map/reduce
job by parsing and collecting execution statistics through job history
and job configuration files. It runs a set of predefined tests/rules
against job execution statistics to diagnose various performance problems.
- Each test rule detects a specific performance problem with the Map/Reduce job and provides
+ Each test rule detects a specific performance problem with the MapReduce job and provides
a targeted advice to the user. This tool generates an XML report based on
the evaluation results of individual test rules.
</p>
@@ -74,9 +74,9 @@
<p> This section describes main concepts and terminology involved with Hadoop Vaidya,</p>
<ul>
- <li> <em>PostExPerformanceDiagnoser</em>: This class extends the base Diagnoser class and acts as a driver for post execution performance analysis of Map/Reduce Jobs.
+ <li> <em>PostExPerformanceDiagnoser</em>: This class extends the base Diagnoser class and acts as a driver for post execution performance analysis of MapReduce Jobs.
It detects performance inefficiencies by executing a set of performance diagnosis rules against the job execution statistics.</li>
- <li> <em>Job Statistics</em>: This includes the job configuration information (job.xml) and various counters logged by Map/Reduce job as a part of the job history log
+ <li> <em>Job Statistics</em>: This includes the job configuration information (job.xml) and various counters logged by MapReduce job as a part of the job history log
file. The counters are parsed and collected into the Job Statistics data structures, which contains global job level aggregate counters and
a set of counters for each Map and Reduce task.</li>
<li> <em>Diagnostic Test/Rule</em>: This is a program logic that detects the inefficiency of M/R job based on the job statistics. The
@@ -140,8 +140,7 @@
<section>
<title>How to Write and Execute your own Tests</title>
<p>Writing and executing your own test rules is not very hard. You can take a look at Hadoop Vaidya source code for existing set of tests.
- The source code is at this <a href="http://svn.apache.org/viewvc/hadoop/core/trunk/src/contrib/vaidya/src/java/org/apache/hadoop/vaidya/">hadoop svn repository location</a>
- . The default set of tests are under <code>"postexdiagnosis/tests/"</code> folder.</p>
+ The source code is at this <a href="http://svn.apache.org/viewvc/hadoop/core/trunk/src/contrib/vaidya/src/java/org/apache/hadoop/vaidya/">hadoop svn repository location</a>. The default set of tests are under <code>"postexdiagnosis/tests/"</code> folder.</p>
<ul>
<li>Writing a test class for your new test case should extend the <code>org.apache.hadoop.vaidya.DiagnosticTest</code> class and
it should override following three methods from the base class,
Added: hadoop/common/branches/branch-0.20-security-patches/src/docs/src/documentation/resources/images/hadoop-logo-2.gif
URL: http://svn.apache.org/viewvc/hadoop/common/branches/branch-0.20-security-patches/src/docs/src/documentation/resources/images/hadoop-logo-2.gif?rev=1077365&view=auto
==============================================================================
Binary file - no diff available.
Propchange: hadoop/common/branches/branch-0.20-security-patches/src/docs/src/documentation/resources/images/hadoop-logo-2.gif
------------------------------------------------------------------------------
svn:mime-type = application/octet-stream
Modified: hadoop/common/branches/branch-0.20-security-patches/src/docs/src/documentation/skinconf.xml
URL: http://svn.apache.org/viewvc/hadoop/common/branches/branch-0.20-security-patches/src/docs/src/documentation/skinconf.xml?rev=1077365&r1=1077364&r2=1077365&view=diff
==============================================================================
--- hadoop/common/branches/branch-0.20-security-patches/src/docs/src/documentation/skinconf.xml (original)
+++ hadoop/common/branches/branch-0.20-security-patches/src/docs/src/documentation/skinconf.xml Fri Mar 4 04:07:36 2011
@@ -67,7 +67,7 @@ which will be used to configure the chos
<project-name>Hadoop</project-name>
<project-description>Scalable Computing Platform</project-description>
<project-url>http://hadoop.apache.org/core/</project-url>
- <project-logo>images/core-logo.gif</project-logo>
+ <project-logo>images/hadoop-logo-2.gif</project-logo>
<!-- group logo -->
<group-name>Hadoop</group-name>