Posted to hdfs-commits@hadoop.apache.org by sh...@apache.org on 2009/09/22 00:33:12 UTC
svn commit: r817449 [2/8] - in /hadoop/hdfs/branches/HDFS-265: ./
.eclipse.templates/.launches/ lib/ src/contrib/block_forensics/
src/contrib/block_forensics/client/ src/contrib/block_forensics/ivy/
src/contrib/block_forensics/src/java/org/apache/hadoo...
Added: hadoop/hdfs/branches/HDFS-265/src/docs/src/documentation/content/xdocs/commands_manual.xml
URL: http://svn.apache.org/viewvc/hadoop/hdfs/branches/HDFS-265/src/docs/src/documentation/content/xdocs/commands_manual.xml?rev=817449&view=auto
==============================================================================
--- hadoop/hdfs/branches/HDFS-265/src/docs/src/documentation/content/xdocs/commands_manual.xml (added)
+++ hadoop/hdfs/branches/HDFS-265/src/docs/src/documentation/content/xdocs/commands_manual.xml Mon Sep 21 22:33:09 2009
@@ -0,0 +1,732 @@
+<?xml version="1.0"?>
+<!--
+ Licensed to the Apache Software Foundation (ASF) under one or more
+ contributor license agreements. See the NOTICE file distributed with
+ this work for additional information regarding copyright ownership.
+ The ASF licenses this file to You under the Apache License, Version 2.0
+ (the "License"); you may not use this file except in compliance with
+ the License. You may obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+ Unless required by applicable law or agreed to in writing, software
+ distributed under the License is distributed on an "AS IS" BASIS,
+ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ See the License for the specific language governing permissions and
+ limitations under the License.
+-->
+
+<!DOCTYPE document PUBLIC "-//APACHE//DTD Documentation V2.0//EN" "http://forrest.apache.org/dtd/document-v20.dtd">
+<document>
+ <header>
+ <title>Commands Guide</title>
+ </header>
+
+ <body>
+ <section>
+ <title>Overview</title>
+ <p>
+      All Hadoop commands are invoked by the bin/hadoop script. Running the hadoop
+      script without any arguments prints the description of all commands.
+ </p>
+ <p>
+ <code>Usage: hadoop [--config confdir] [COMMAND] [GENERIC_OPTIONS] [COMMAND_OPTIONS]</code>
+ </p>
+ <p>
+      Hadoop has an option parsing framework that parses generic options and then runs the requested command class.
+ </p>
+ <table>
+ <tr><th> COMMAND_OPTION </th><th> Description </th></tr>
+
+ <tr>
+ <td><code>--config confdir</code></td>
+	    <td>Overrides the default configuration directory. Default is ${HADOOP_HOME}/conf.</td>
+ </tr>
+ <tr>
+ <td><code>GENERIC_OPTIONS</code></td>
+ <td>The common set of options supported by multiple commands.</td>
+ </tr>
+ <tr>
+ <td><code>COMMAND</code><br/><code>COMMAND_OPTIONS</code></td>
+ <td>Various commands with their options are described in the following sections. The commands
+ have been grouped into <a href="commands_manual.html#User+Commands">User Commands</a>
+ and <a href="commands_manual.html#Administration+Commands">Administration Commands</a>.</td>
+ </tr>
+ </table>
+ <section>
+ <title>Generic Options</title>
+ <p>
+ The following options are supported by <a href="commands_manual.html#dfsadmin">dfsadmin</a>,
+ <a href="commands_manual.html#fs">fs</a>, <a href="commands_manual.html#fsck">fsck</a> and
+ <a href="commands_manual.html#job">job</a>.
+ Applications should implement
+ <a href="ext:api/org/apache/hadoop/util/tool">Tool</a> to support
+ <a href="ext:api/org/apache/hadoop/util/genericoptionsparser">
+ GenericOptions</a>.
+ </p>
+ <table>
+ <tr><th> GENERIC_OPTION </th><th> Description </th></tr>
+
+ <tr>
+ <td><code>-conf <configuration file></code></td>
+ <td>Specify an application configuration file.</td>
+ </tr>
+ <tr>
+ <td><code>-D <property=value></code></td>
+ <td>Use value for given property.</td>
+ </tr>
+ <tr>
+ <td><code>-fs <local|namenode:port></code></td>
+ <td>Specify a namenode.</td>
+ </tr>
+ <tr>
+ <td><code>-jt <local|jobtracker:port></code></td>
+ <td>Specify a job tracker. Applies only to <a href="commands_manual.html#job">job</a>.</td>
+ </tr>
+ <tr>
+ <td><code>-files <comma separated list of files></code></td>
+ <td>Specify comma separated files to be copied to the map reduce cluster.
+ Applies only to <a href="commands_manual.html#job">job</a>.</td>
+ </tr>
+ <tr>
+            <td><code>-libjars <comma separated list of jars></code></td>
+ <td>Specify comma separated jar files to include in the classpath.
+ Applies only to <a href="commands_manual.html#job">job</a>.</td>
+ </tr>
+ <tr>
+ <td><code>-archives <comma separated list of archives></code></td>
+ <td>Specify comma separated archives to be unarchived on the compute machines.
+ Applies only to <a href="commands_manual.html#job">job</a>.</td>
+ </tr>
+ </table>
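+           <p>
+             For illustration, a command that combines several generic options might look like the
+             following; the configuration file, property value and path shown here are placeholders
+             rather than required values:
+           </p>
+           <p>
+             <code>hadoop fs -conf conf/cluster.xml -D dfs.replication=2 -ls /user/sample</code>
+           </p>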
+ </section>
+ </section>
+
+ <section>
+ <title> User Commands </title>
+ <p>Commands useful for users of a hadoop cluster.</p>
+ <section>
+ <title> archive </title>
+ <p>
+ Creates a hadoop archive. More information can be found at <a href="hadoop_archives.html">Hadoop Archives</a>.
+ </p>
+ <p>
+ <code>Usage: hadoop archive -archiveName NAME <src>* <dest></code>
+ </p>
+ <table>
+ <tr><th> COMMAND_OPTION </th><th> Description </th></tr>
+ <tr>
+ <td><code>-archiveName NAME</code></td>
+ <td>Name of the archive to be created.</td>
+ </tr>
+ <tr>
+ <td><code>src</code></td>
+ <td>Filesystem pathnames which work as usual with regular expressions.</td>
+ </tr>
+ <tr>
+ <td><code>dest</code></td>
+ <td>Destination directory which would contain the archive.</td>
+ </tr>
+ </table>
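+          <p>
+            For example, the following invocation (the paths and archive name are placeholders)
+            packs two directories into a single archive under /user/hadoop/archives:
+          </p>
+          <p>
+            <code>hadoop archive -archiveName foo.har /user/hadoop/dir1 /user/hadoop/dir2 /user/hadoop/archives</code>
+          </p>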
+ </section>
+
+ <section>
+ <title> distcp </title>
+ <p>
+        Copies files or directories recursively. More information can be found in the <a href="distcp.html">Hadoop DistCp Guide</a>.
+ </p>
+ <p>
+ <code>Usage: hadoop distcp <srcurl> <desturl></code>
+ </p>
+ <table>
+ <tr><th> COMMAND_OPTION </th><th> Description </th></tr>
+
+ <tr>
+ <td><code>srcurl</code></td>
+            <td>Source URL</td>
+ </tr>
+ <tr>
+ <td><code>desturl</code></td>
+            <td>Destination URL</td>
+ </tr>
+ </table>
+ </section>
+
+ <section>
+ <title> fs </title>
+ <p>
+ <code>Usage: hadoop fs [</code><a href="commands_manual.html#Generic+Options">GENERIC_OPTIONS</a><code>]
+ [COMMAND_OPTIONS]</code>
+ </p>
+ <p>
+ Runs a generic filesystem user client.
+ </p>
+ <p>
+ The various COMMAND_OPTIONS can be found at <a href="hdfs_shell.html">Hadoop FS Shell Guide</a>.
+ </p>
+ </section>
+
+ <section>
+ <title> fsck </title>
+ <p>
+        Runs an HDFS filesystem checking utility. See <a href="hdfs_user_guide.html#Fsck">Fsck</a> for more info.
+ </p>
+ <p><code>Usage: hadoop fsck [</code><a href="commands_manual.html#Generic+Options">GENERIC_OPTIONS</a><code>]
+ <path> [-move | -delete | -openforwrite] [-files [-blocks
+ [-locations | -racks]]]</code></p>
+ <table>
+ <tr><th> COMMAND_OPTION </th><th> Description </th></tr>
+ <tr>
+ <td><code><path></code></td>
+ <td>Start checking from this path.</td>
+ </tr>
+ <tr>
+ <td><code>-move</code></td>
+ <td>Move corrupted files to /lost+found</td>
+ </tr>
+ <tr>
+ <td><code>-delete</code></td>
+ <td>Delete corrupted files.</td>
+ </tr>
+ <tr>
+ <td><code>-openforwrite</code></td>
+ <td>Print out files opened for write.</td>
+ </tr>
+ <tr>
+ <td><code>-files</code></td>
+ <td>Print out files being checked.</td>
+ </tr>
+ <tr>
+ <td><code>-blocks</code></td>
+ <td>Print out block report.</td>
+ </tr>
+ <tr>
+ <td><code>-locations</code></td>
+ <td>Print out locations for every block.</td>
+ </tr>
+ <tr>
+ <td><code>-racks</code></td>
+ <td>Print out network topology for data-node locations.</td>
+ </tr>
+ </table>
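+          <p>
+            For example, the following (the path is a placeholder) checks a home directory and
+            prints the files being checked together with their blocks and block locations:
+          </p>
+          <p>
+            <code>hadoop fsck /user/hadoop -files -blocks -locations</code>
+          </p>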
+ </section>
+
+ <section>
+ <title> jar </title>
+ <p>
+ Runs a jar file. Users can bundle their Map Reduce code in a jar file and execute it using this command.
+ </p>
+ <p>
+ <code>Usage: hadoop jar <jar> [mainClass] args...</code>
+ </p>
+ <p>
+        Streaming jobs are run via this command. For examples, see
+ <a href="streaming.html#More+usage+examples">Streaming examples</a>
+ </p>
+ <p>
+        The word count example is also run using the jar command; see the
+ <a href="mapred_tutorial.html#Usage">Wordcount example</a>
+ </p>
+ </section>
+
+ <section>
+ <title> job </title>
+ <p>
+ Command to interact with Map Reduce Jobs.
+ </p>
+ <p>
+ <code>Usage: hadoop job [</code><a href="commands_manual.html#Generic+Options">GENERIC_OPTIONS</a><code>]
+ [-submit <job-file>] | [-status <job-id>] |
+ [-counter <job-id> <group-name> <counter-name>] | [-kill <job-id>] |
+ [-events <job-id> <from-event-#> <#-of-events>] | [-history [all] <jobOutputDir>] |
+ [-list [all]] | [-kill-task <task-id>] | [-fail-task <task-id>] |
+ [-set-priority <job-id> <priority>]</code>
+ </p>
+ <table>
+ <tr><th> COMMAND_OPTION </th><th> Description </th></tr>
+
+ <tr>
+ <td><code>-submit <job-file></code></td>
+ <td>Submits the job.</td>
+ </tr>
+ <tr>
+ <td><code>-status <job-id></code></td>
+ <td>Prints the map and reduce completion percentage and all job counters.</td>
+ </tr>
+ <tr>
+ <td><code>-counter <job-id> <group-name> <counter-name></code></td>
+ <td>Prints the counter value.</td>
+ </tr>
+ <tr>
+ <td><code>-kill <job-id></code></td>
+ <td>Kills the job.</td>
+ </tr>
+ <tr>
+ <td><code>-events <job-id> <from-event-#> <#-of-events></code></td>
+            <td>Prints the details of the events received by the jobtracker for the given range.</td>
+ </tr>
+ <tr>
+ <td><code>-history [all] <jobOutputDir></code></td>
+ <td>-history <jobOutputDir> prints job details, failed and killed tip details. More details
+ about the job such as successful tasks and task attempts made for each task can be viewed by
+ specifying the [all] option. </td>
+ </tr>
+ <tr>
+ <td><code>-list [all]</code></td>
+ <td>-list all displays all jobs. -list displays only jobs which are yet to complete.</td>
+ </tr>
+ <tr>
+ <td><code>-kill-task <task-id></code></td>
+ <td>Kills the task. Killed tasks are NOT counted against failed attempts.</td>
+ </tr>
+ <tr>
+ <td><code>-fail-task <task-id></code></td>
+ <td>Fails the task. Failed tasks are counted against failed attempts.</td>
+ </tr>
+ <tr>
+ <td><code>-set-priority <job-id> <priority></code></td>
+ <td>Changes the priority of the job.
+ Allowed priority values are VERY_HIGH, HIGH, NORMAL, LOW, VERY_LOW</td>
+ </tr>
+ </table>
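+          <p>
+            For example, the following commands (the job id is a placeholder) print a job's
+            status and then raise its priority:
+          </p>
+          <p>
+            <code>hadoop job -status job_200909220001_0005</code><br/>
+            <code>hadoop job -set-priority job_200909220001_0005 HIGH</code>
+          </p>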
+ </section>
+
+ <section>
+ <title> pipes </title>
+ <p>
+ Runs a pipes job.
+ </p>
+ <p>
+ <code>Usage: hadoop pipes [-conf <path>] [-jobconf <key=value>, <key=value>, ...]
+ [-input <path>] [-output <path>] [-jar <jar file>] [-inputformat <class>]
+ [-map <class>] [-partitioner <class>] [-reduce <class>] [-writer <class>]
+ [-program <executable>] [-reduces <num>] </code>
+ </p>
+ <table>
+ <tr><th> COMMAND_OPTION </th><th> Description </th></tr>
+
+ <tr>
+ <td><code>-conf <path></code></td>
+ <td>Configuration for job</td>
+ </tr>
+ <tr>
+ <td><code>-jobconf <key=value>, <key=value>, ...</code></td>
+ <td>Add/override configuration for job</td>
+ </tr>
+ <tr>
+ <td><code>-input <path></code></td>
+ <td>Input directory</td>
+ </tr>
+ <tr>
+ <td><code>-output <path></code></td>
+ <td>Output directory</td>
+ </tr>
+ <tr>
+ <td><code>-jar <jar file></code></td>
+ <td>Jar filename</td>
+ </tr>
+ <tr>
+ <td><code>-inputformat <class></code></td>
+ <td>InputFormat class</td>
+ </tr>
+ <tr>
+ <td><code>-map <class></code></td>
+ <td>Java Map class</td>
+ </tr>
+ <tr>
+ <td><code>-partitioner <class></code></td>
+ <td>Java Partitioner</td>
+ </tr>
+ <tr>
+ <td><code>-reduce <class></code></td>
+ <td>Java Reduce class</td>
+ </tr>
+ <tr>
+ <td><code>-writer <class></code></td>
+ <td>Java RecordWriter</td>
+ </tr>
+ <tr>
+ <td><code>-program <executable></code></td>
+ <td>Executable URI</td>
+ </tr>
+ <tr>
+ <td><code>-reduces <num></code></td>
+ <td>Number of reduces</td>
+ </tr>
+ </table>
+ </section>
+ <section>
+ <title> queue </title>
+ <p>
+        Command to interact with and view job queue information.
+ </p>
+ <p>
+ <code>Usage : hadoop queue [-list] | [-info <job-queue-name> [-showJobs]] | [-showacls]</code>
+ </p>
+ <table>
+ <tr>
+ <th> COMMAND_OPTION </th><th> Description </th>
+ </tr>
+ <tr>
+ <td><code>-list</code> </td>
+          <td>Gets the list of job queues configured in the system, along with the scheduling information
+          associated with them.
+ </td>
+ </tr>
+ <tr>
+ <td><code>-info <job-queue-name> [-showJobs]</code></td>
+ <td>
+           Displays the job queue information and associated scheduling information of the particular
+           job queue. If the -showJobs option is present, a list of jobs submitted to that job
+           queue is displayed.
+ </td>
+ </tr>
+ <tr>
+ <td><code>-showacls</code></td>
+ <td>Displays the queue name and associated queue operations allowed for the current user.
+ The list consists of only those queues to which the user has access.
+ </td>
+ </tr>
+ </table>
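+        <p>
+          For example, assuming a queue named <code>default</code> exists, the following lists
+          its scheduling information along with the jobs submitted to it:
+        </p>
+        <p>
+          <code>hadoop queue -info default -showJobs</code>
+        </p>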
+ </section>
+ <section>
+ <title> version </title>
+ <p>
+ Prints the version.
+ </p>
+ <p>
+ <code>Usage: hadoop version</code>
+ </p>
+ </section>
+ <section>
+ <title> CLASSNAME </title>
+ <p>
+        The hadoop script can be used to invoke any class.
+ </p>
+ <p>
+ <code>Usage: hadoop CLASSNAME</code>
+ </p>
+ <p>
+ Runs the class named CLASSNAME.
+ </p>
+ </section>
+ </section>
+ <section>
+ <title> Administration Commands </title>
+ <p>Commands useful for administrators of a hadoop cluster.</p>
+ <section>
+ <title> balancer </title>
+ <p>
+ Runs a cluster balancing utility. An administrator can simply press Ctrl-C to stop the
+ rebalancing process. See <a href="hdfs_user_guide.html#Rebalancer">Rebalancer</a> for more details.
+ </p>
+ <p>
+ <code>Usage: hadoop balancer [-threshold <threshold>]</code>
+ </p>
+ <table>
+ <tr><th> COMMAND_OPTION </th><th> Description </th></tr>
+
+ <tr>
+ <td><code>-threshold <threshold></code></td>
+            <td>Percentage of disk capacity. This overrides the default threshold.</td>
+ </tr>
+ </table>
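+          <p>
+            For example, the following runs the balancer until each datanode's utilization is
+            within 5 percent of the cluster average:
+          </p>
+          <p>
+            <code>hadoop balancer -threshold 5</code>
+          </p>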
+ </section>
+
+ <section>
+ <title> daemonlog </title>
+ <p>
+ Get/Set the log level for each daemon.
+ </p>
+ <p>
+ <code>Usage: hadoop daemonlog -getlevel <host:port> <name></code><br/>
+ <code>Usage: hadoop daemonlog -setlevel <host:port> <name> <level></code>
+ </p>
+ <table>
+ <tr><th> COMMAND_OPTION </th><th> Description </th></tr>
+
+ <tr>
+ <td><code>-getlevel <host:port> <name></code></td>
+ <td>Prints the log level of the daemon running at <host:port>.
+ This command internally connects to http://<host:port>/logLevel?log=<name></td>
+ </tr>
+ <tr>
+ <td><code>-setlevel <host:port> <name> <level></code></td>
+ <td>Sets the log level of the daemon running at <host:port>.
+ This command internally connects to http://<host:port>/logLevel?log=<name></td>
+ </tr>
+ </table>
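+          <p>
+            For example, the following (the host, port and logger name are placeholders) raises a
+            daemon's log level to DEBUG and then reads it back:
+          </p>
+          <p>
+            <code>hadoop daemonlog -setlevel nn.example.com:50070 org.apache.hadoop.hdfs.StateChange DEBUG</code><br/>
+            <code>hadoop daemonlog -getlevel nn.example.com:50070 org.apache.hadoop.hdfs.StateChange</code>
+          </p>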
+ </section>
+
+ <section>
+ <title> datanode</title>
+ <p>
+        Runs an HDFS datanode.
+ </p>
+ <p>
+ <code>Usage: hadoop datanode [-rollback]</code>
+ </p>
+ <table>
+ <tr><th> COMMAND_OPTION </th><th> Description </th></tr>
+
+ <tr>
+ <td><code>-rollback</code></td>
+            <td>Rolls back the datanode to the previous version. This should be used after stopping the datanode
+ and distributing the old hadoop version.</td>
+ </tr>
+ </table>
+ </section>
+
+ <section>
+ <title> dfsadmin </title>
+ <p>
+        Runs an HDFS dfsadmin client.
+ </p>
+ <p>
+ <code>Usage: hadoop dfsadmin [</code><a href="commands_manual.html#Generic+Options">GENERIC_OPTIONS</a><code>]
+ [-report][-safemode enter | leave | get | wait] [-saveNamespace] [-restoreFailedStorage true|false|check] [-refreshNodes] [-finalizeUpgrade]
+ [-upgradeProgress status | details | force] [-metasave filename] [-refreshServiceAcl] [-printTopology] [-setQuota <quota> <dirname>...<dirname>]
+ [-clrQuota <dirname>...<dirname>] [-setSpaceQuota <quota> <dirname>...<dirname>] [-clrSpaceQuota <dirname>...<dirname>] [-help [cmd]]
+ </code>
+ </p>
+ <table>
+ <tr><th> COMMAND_OPTION </th><th> Description </th></tr>
+
+ <tr>
+ <td><code>-report</code></td>
+ <td>Reports basic filesystem information and statistics.</td>
+ </tr>
+ <tr>
+ <td><code>-safemode enter | leave | get | wait</code></td>
+ <td>Safe mode maintenance command.
+ Safe mode is a Namenode state in which it <br/>
+ 1. does not accept changes to the name space (read-only) <br/>
+ 2. does not replicate or delete blocks. <br/>
+            Safe mode is entered automatically at Namenode startup, and is
+            left automatically when the configured minimum
+ percentage of blocks satisfies the minimum replication
+ condition. Safe mode can also be entered manually, but then
+ it can only be turned off manually as well.</td>
+ </tr>
+ <tr>
+          	<td><code>-saveNamespace</code></td>
+ <td>Save current namespace into storage directories and reset edits log.
+ Requires superuser permissions and safe mode.
+ </td>
+ </tr>
+ <tr>
+ <td><code> -restoreFailedStorage</code></td>
+ <td> Set/Unset/Check flag to attempt restore of failed storage replicas if they become available.
+ Requires superuser permissions.</td>
+ </tr>
+ <tr>
+ <td><code>-refreshServiceAcl</code></td>
+          	<td> Reload the service-level authorization policy file. The
+          	Namenode will reload the authorization policy file.</td>
+ </tr>
+ <tr>
+ <td><code>-setSpaceQuota <quota> <dirname>...<dirname></code></td>
+            <td>Set the disk space quota <quota> for each directory <dirName>. The space quota is a long integer that puts a hard limit
+            on the total size of the files under the directory tree.
+            The quota can also be specified with a binary prefix for terabytes,
+            petabytes etc. (e.g. 50t is 50TB, 5m is 5MB, 3p is 3PB).<br/>
+ For each directory, attempt to set the quota. An error will be reported if<br/>
+ 1. N is not a positive integer, or<br/>
+ 2. user is not an administrator, or<br/>
+ 3. the directory does not exist or is a file, or<br/>
+ 4. the directory would immediately exceed the new space quota.</td>
+ </tr>
+ <tr>
+ <td><code>-clrSpaceQuota <dirname>...<dirname></code></td>
+ <td> Clear the disk space quota for each directory <dirName>.
+            For each directory, attempt to clear the quota. An error will be reported if<br/>
+ 1. the directory does not exist or is a file, or<br/>
+ 2. user is not an administrator.<br/>
+ It does not fault if the directory has no quota.</td>
+ </tr>
+ <tr>
+ <td><code>-setQuota <quota> <dirname>...<dirname></code></td>
+ <td>Set the quota <quota> for each directory <dirname>.
+ The directory quota is a long integer that puts a hard limit on the number of names in the directory tree.<br/>
+ For each directory, attempt to set the quota. An error will be reported if<br/>
+ 1. N is not a positive integer, or<br/>
+ 2. user is not an administrator, or<br/>
+ 3. the directory does not exist or is a file, or<br/>
+ 4. the directory would immediately exceed the new quota.</td>
+ </tr>
+ <tr>
+ <td><code>-clrQuota <dirname>...<dirname></code></td>
+ <td>Clear the quota for each directory <dirname>.<br/>
+            For each directory, attempt to clear the quota. An error will be reported if<br/>
+ 1. the directory does not exist or is a file, or<br/>
+ 2. user is not an administrator.<br/>
+ It does not fault if the directory has no quota.</td>
+ </tr>
+ <tr>
+ <td><code>-refreshNodes</code></td>
+ <td>Re-read the hosts and exclude files to update the set
+ of Datanodes that are allowed to connect to the Namenode
+ and those that should be decommissioned or recommissioned.</td>
+ </tr>
+ <tr>
+ <td><code>-finalizeUpgrade</code></td>
+ <td>Finalize upgrade of HDFS.
+ Datanodes delete their previous version working directories,
+ followed by Namenode doing the same.
+ This completes the upgrade process.</td>
+ </tr>
+ <tr>
+ <td><code>-printTopology</code></td>
+ <td>Print a tree of the rack/datanode topology of the
+ cluster as seen by the NameNode.</td>
+ </tr>
+ <tr>
+ <td><code>-upgradeProgress status | details | force</code></td>
+ <td>Request current distributed upgrade status,
+ a detailed status or force the upgrade to proceed.</td>
+ </tr>
+ <tr>
+ <td><code>-metasave filename</code></td>
+ <td>Save Namenode's primary data structures
+ to <filename> in the directory specified by hadoop.log.dir property.
+ <filename> will contain one line for each of the following <br/>
+ 1. Datanodes heart beating with Namenode<br/>
+ 2. Blocks waiting to be replicated<br/>
+            3. Blocks currently being replicated<br/>
+ 4. Blocks waiting to be deleted</td>
+ </tr>
+ <tr>
+ <td><code>-restoreFailedStorage true | false | check</code></td>
+            <td>This option turns on/off automatic attempts to restore failed storage replicas.
+            If a failed storage becomes available again, the system will attempt to restore
+            edits and/or fsimage during checkpoint. The 'check' option returns the current setting.</td>
+ </tr>
+ <tr>
+ <td><code>-help [cmd]</code></td>
+ <td> Displays help for the given command or all commands if none
+ is specified.</td>
+ </tr>
+ </table>
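+          <p>
+            For example, the following (the path is a placeholder) checks whether the Namenode is
+            in safe mode and then sets a one-terabyte space quota on a project directory:
+          </p>
+          <p>
+            <code>hadoop dfsadmin -safemode get</code><br/>
+            <code>hadoop dfsadmin -setSpaceQuota 1t /user/hadoop/project</code>
+          </p>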
+ </section>
+ <section>
+ <title>mradmin</title>
+        <p>Runs the MR admin client.</p>
+ <p><code>Usage: hadoop mradmin [</code>
+ <a href="commands_manual.html#Generic+Options">GENERIC_OPTIONS</a>
+ <code>] [-refreshQueueAcls] </code></p>
+ <table>
+ <tr>
+ <th> COMMAND_OPTION </th><th> Description </th>
+ </tr>
+ <tr>
+ <td><code>-refreshQueueAcls</code></td>
+          <td> Refresh the queue ACLs used by Hadoop to check access during job submission
+          and administration by the user. The properties present in
+          <code>mapred-queue-acls.xml</code> are reloaded by the queue manager.</td>
+ </tr>
+ </table>
+ </section>
+ <section>
+ <title> jobtracker </title>
+ <p>
+ Runs the MapReduce job Tracker node.
+ </p>
+ <p>
+ <code>Usage: hadoop jobtracker</code>
+ </p>
+ </section>
+
+ <section>
+ <title> namenode </title>
+ <p>
+        Runs the namenode. More information about upgrade, rollback and finalize can be found at
+        <a href="hdfs_user_guide.html#Upgrade+and+Rollback">Upgrade and Rollback</a>.
+ </p>
+ <p>
+ <code>Usage: hadoop namenode [-format] | [-upgrade] | [-rollback] | [-finalize] | [-importCheckpoint]</code>
+ </p>
+ <table>
+ <tr><th> COMMAND_OPTION </th><th> Description </th></tr>
+
+ <tr>
+ <td><code>-regular</code></td>
+ <td>Start namenode in standard, active role rather than as backup or checkpoint node. This is the default role.</td>
+ </tr>
+ <tr>
+ <td><code>-checkpoint</code></td>
+ <td>Start namenode in checkpoint role, creating periodic checkpoints of the active namenode metadata.</td>
+ </tr>
+ <tr>
+ <td><code>-backup</code></td>
+ <td>Start namenode in backup role, maintaining an up-to-date in-memory copy of the namespace and creating periodic checkpoints.</td>
+ </tr>
+ <tr>
+ <td><code>-format</code></td>
+            <td>Formats the namenode. It starts the namenode, formats it and then shuts it down.</td>
+ </tr>
+ <tr>
+ <td><code>-upgrade</code></td>
+            <td>The namenode should be started with the upgrade option after a new Hadoop version has been distributed.</td>
+ </tr>
+ <tr>
+ <td><code>-rollback</code></td>
+            <td>Rolls back the namenode to the previous version. This should be used after stopping the cluster
+ and distributing the old hadoop version.</td>
+ </tr>
+ <tr>
+ <td><code>-finalize</code></td>
+            <td>Finalize will remove the previous state of the file system. The most recent upgrade will become permanent.
+            The rollback option will no longer be available. After finalization it shuts the namenode down.</td>
+ </tr>
+ <tr>
+ <td><code>-importCheckpoint</code></td>
+ <td>Loads image from a checkpoint directory and saves it into the current one. Checkpoint directory
+ is read from property fs.checkpoint.dir</td>
+ </tr>
+ </table>
+ </section>
+
+ <section>
+ <title> secondarynamenode </title>
+ <p>
+ Use of the Secondary NameNode has been deprecated. Instead, consider using a
+ <a href="hdfs_user_guide.html#Checkpoint+node">Checkpoint node</a> or
+ <a href="hdfs_user_guide.html#Backup+node">Backup node</a>. Runs the HDFS secondary
+ namenode. See <a href="hdfs_user_guide.html#Secondary+NameNode">Secondary NameNode</a>
+ for more info.
+ </p>
+ <p>
+ <code>Usage: hadoop secondarynamenode [-checkpoint [force]] | [-geteditsize]</code>
+ </p>
+ <table>
+ <tr><th> COMMAND_OPTION </th><th> Description </th></tr>
+
+ <tr>
+ <td><code>-checkpoint [force]</code></td>
+ <td>Checkpoints the Secondary namenode if EditLog size >= fs.checkpoint.size.
+ If -force is used, checkpoint irrespective of EditLog size.</td>
+ </tr>
+ <tr>
+ <td><code>-geteditsize</code></td>
+ <td>Prints the EditLog size.</td>
+ </tr>
+ </table>
+ </section>
+
+ <section>
+ <title> tasktracker </title>
+ <p>
+ Runs a MapReduce task Tracker node.
+ </p>
+ <p>
+ <code>Usage: hadoop tasktracker</code>
+ </p>
+ </section>
+
+ </section>
+
+
+
+
+ </body>
+</document>
Propchange: hadoop/hdfs/branches/HDFS-265/src/docs/src/documentation/content/xdocs/commands_manual.xml
------------------------------------------------------------------------------
svn:mime-type = text/plain
Added: hadoop/hdfs/branches/HDFS-265/src/docs/src/documentation/content/xdocs/distcp.xml
URL: http://svn.apache.org/viewvc/hadoop/hdfs/branches/HDFS-265/src/docs/src/documentation/content/xdocs/distcp.xml?rev=817449&view=auto
==============================================================================
--- hadoop/hdfs/branches/HDFS-265/src/docs/src/documentation/content/xdocs/distcp.xml (added)
+++ hadoop/hdfs/branches/HDFS-265/src/docs/src/documentation/content/xdocs/distcp.xml Mon Sep 21 22:33:09 2009
@@ -0,0 +1,352 @@
+<?xml version="1.0"?>
+<!--
+ Licensed to the Apache Software Foundation (ASF) under one or more
+ contributor license agreements. See the NOTICE file distributed with
+ this work for additional information regarding copyright ownership.
+ The ASF licenses this file to You under the Apache License, Version 2.0
+ (the "License"); you may not use this file except in compliance with
+ the License. You may obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+ Unless required by applicable law or agreed to in writing, software
+ distributed under the License is distributed on an "AS IS" BASIS,
+ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ See the License for the specific language governing permissions and
+ limitations under the License.
+-->
+
+<!DOCTYPE document PUBLIC "-//APACHE//DTD Documentation V2.0//EN" "http://forrest.apache.org/dtd/document-v20.dtd">
+
+<document>
+
+ <header>
+ <title>DistCp Guide</title>
+ </header>
+
+ <body>
+
+ <section>
+ <title>Overview</title>
+
+ <p>DistCp (distributed copy) is a tool used for large inter/intra-cluster
+ copying. It uses Map/Reduce to effect its distribution, error
+ handling and recovery, and reporting. It expands a list of files and
+ directories into input to map tasks, each of which will copy a partition
+ of the files specified in the source list. Its Map/Reduce pedigree has
+ endowed it with some quirks in both its semantics and execution. The
+ purpose of this document is to offer guidance for common tasks and to
+ elucidate its model.</p>
+
+ </section>
+
+ <section>
+ <title>Usage</title>
+
+ <section>
+ <title>Basic</title>
+ <p>The most common invocation of DistCp is an inter-cluster copy:</p>
+        <p><code>bash$ hadoop distcp hdfs://nn1:8020/foo/bar \</code><br/>
+        <code>                    hdfs://nn2:8020/bar/foo</code></p>
+
+ <p>This will expand the namespace under <code>/foo/bar</code> on nn1
+ into a temporary file, partition its contents among a set of map
+ tasks, and start a copy on each TaskTracker from nn1 to nn2. Note
+ that DistCp expects absolute paths.</p>
+
+ <p>One can also specify multiple source directories on the command
+ line:</p>
+        <p><code>bash$ hadoop distcp hdfs://nn1:8020/foo/a \</code><br/>
+        <code>                    hdfs://nn1:8020/foo/b \</code><br/>
+        <code>                    hdfs://nn2:8020/bar/foo</code></p>
+
+ <p>Or, equivalently, from a file using the <code>-f</code> option:<br/>
+        <code>bash$ hadoop distcp -f hdfs://nn1:8020/srclist \</code><br/>
+        <code>                    hdfs://nn2:8020/bar/foo</code><br/></p>
+
+ <p>Where <code>srclist</code> contains<br/>
+ <code> hdfs://nn1:8020/foo/a</code><br/>
+ <code> hdfs://nn1:8020/foo/b</code></p>
+
+ <p>When copying from multiple sources, DistCp will abort the copy with
+ an error message if two sources collide, but collisions at the
+ destination are resolved per the <a href="#options">options</a>
+ specified. By default, files already existing at the destination are
+ skipped (i.e. not replaced by the source file). A count of skipped
+ files is reported at the end of each job, but it may be inaccurate if a
+ copier failed for some subset of its files, but succeeded on a later
+ attempt (see <a href="#etc">Appendix</a>).</p>
+
+ <p>It is important that each TaskTracker can reach and communicate with
+ both the source and destination file systems. For HDFS, both the source
+ and destination must be running the same version of the protocol or use
+ a backwards-compatible protocol (see <a href="#cpver">Copying Between
+ Versions</a>).</p>
+
+ <p>After a copy, it is recommended that one generates and cross-checks
+ a listing of the source and destination to verify that the copy was
+ truly successful. Since DistCp employs both Map/Reduce and the
+ FileSystem API, issues in or between any of the three could adversely
+ and silently affect the copy. Some have had success running with
+ <code>-update</code> enabled to perform a second pass, but users should
+ be acquainted with its semantics before attempting this.</p>
+
+ <p>It's also worth noting that if another client is still writing to a
+ source file, the copy will likely fail. Attempting to overwrite a file
+ being written at the destination should also fail on HDFS. If a source
+ file is (re)moved before it is copied, the copy will fail with a
+ FileNotFoundException.</p>
+
+ </section> <!-- Basic -->
+
+ <section id="options">
+ <title>Options</title>
+
+ <section>
+ <title>Option Index</title>
+ <table>
+ <tr><th> Flag </th><th> Description </th><th> Notes </th></tr>
+
+ <tr><td><code>-p[rbugp]</code></td>
+ <td>Preserve<br/>
+ r: replication number<br/>
+ b: block size<br/>
+ u: user<br/>
+ g: group<br/>
+ p: permission<br/></td>
+ <td>Modification times are not preserved. Also, when
+ <code>-update</code> is specified, status updates will
+ <strong>not</strong> be synchronized unless the file sizes
+ also differ (i.e. unless the file is re-created).
+ </td></tr>
+ <tr><td><code>-i</code></td>
+ <td>Ignore failures</td>
+ <td>As explained in the <a href="#etc">Appendix</a>, this option
+ will keep more accurate statistics about the copy than the
+ default case. It also preserves logs from failed copies, which
+ can be valuable for debugging. Finally, a failing map will not
+ cause the job to fail before all splits are attempted.
+ </td></tr>
+ <tr><td><code>-log <logdir></code></td>
+ <td>Write logs to <logdir></td>
+ <td>DistCp keeps logs of each file it attempts to copy as map
+ output. If a map fails, the log output will not be retained if
+ it is re-executed.
+ </td></tr>
+ <tr><td><code>-m <num_maps></code></td>
+ <td>Maximum number of simultaneous copies</td>
+ <td>Specify the number of maps to copy data. Note that more maps
+ may not necessarily improve throughput.
+ </td></tr>
+ <tr><td><code>-overwrite</code></td>
+ <td>Overwrite destination</td>
+ <td>If a map fails and <code>-i</code> is not specified, all the
+ files in the split, not only those that failed, will be recopied.
+ As discussed in the <a href="#uo">following</a>, it also changes
+ the semantics for generating destination paths, so users should
+ use this carefully.
+ </td></tr>
+ <tr><td><code>-update</code></td>
+ <td>Overwrite if src size different from dst size</td>
+ <td>As noted in the preceding, this is not a "sync"
+ operation. The only criterion examined is the source and
+ destination file sizes; if they differ, the source file
+ replaces the destination file. As discussed in the
+ <a href="#uo">following</a>, it also changes the semantics for
+ generating destination paths, so users should use this carefully.
+ </td></tr>
+ <tr><td><code>-f <urilist_uri></code></td>
+ <td>Use list at <urilist_uri> as src list</td>
+ <td>This is equivalent to listing each source on the command
+ line. The <code>urilist_uri</code> list should be a fully
+ qualified URI.
+ </td></tr>
+ <tr><td><code>-filelimit <n></code></td>
+ <td>Limit the total number of files to be <= n</td>
+ <td>See also <a href="#Symbolic-Representations">Symbolic
+ Representations</a>.
+ </td></tr>
+ <tr><td><code>-sizelimit <n></code></td>
+ <td>Limit the total size to be <= n bytes</td>
+ <td>See also <a href="#Symbolic-Representations">Symbolic
+ Representations</a>.
+ </td></tr>
+ <tr><td><code>-delete</code></td>
+ <td>Delete the files existing in the dst but not in src</td>
+        <td>The deletion is done by the FS Shell, so the trash will be used
+        if it is enabled.
+ </td></tr>
+
+ </table>
+
+ </section>
+
+ <section id="Symbolic-Representations">
+ <title>Symbolic Representations</title>
+ <p>
+ The parameter <n> in <code>-filelimit</code>
+ and <code>-sizelimit</code> can be specified with symbolic
+          representation. For example:
+ </p>
+ <ul>
+ <li>1230k = 1230 * 1024 = 1259520</li>
+ <li>891g = 891 * 1024^3 = 956703965184</li>
+ </ul>
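+        <p>
+          For instance, the following (the paths are placeholders) limits a copy to at most
+          100k (102400) files and 1t (one terabyte) of data:
+        </p>
+        <p>
+          <code>hadoop distcp -filelimit 100k -sizelimit 1t hdfs://nn1:8020/foo hdfs://nn2:8020/bar</code>
+        </p>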
+ </section>
+
+ <section id="uo">
+ <title>Update and Overwrite</title>
+
+ <p>It's worth giving some examples of <code>-update</code> and
+ <code>-overwrite</code>. Consider a copy from <code>/foo/a</code> and
+ <code>/foo/b</code> to <code>/bar/foo</code>, where the sources contain
+ the following:</p>
+
+ <p><code> hdfs://nn1:8020/foo/a</code><br/>
+ <code> hdfs://nn1:8020/foo/a/aa</code><br/>
+ <code> hdfs://nn1:8020/foo/a/ab</code><br/>
+ <code> hdfs://nn1:8020/foo/b</code><br/>
+ <code> hdfs://nn1:8020/foo/b/ba</code><br/>
+ <code> hdfs://nn1:8020/foo/b/ab</code></p>
+
+ <p>If either <code>-update</code> or <code>-overwrite</code> is set,
+ then both sources will map an entry to <code>/bar/foo/ab</code> at the
+ destination. For both options, the contents of each source directory
+ are compared with the <strong>contents</strong> of the destination
+ directory. Rather than permit this conflict, DistCp will abort.</p>
+
+ <p>In the default case, both <code>/bar/foo/a</code> and
+ <code>/bar/foo/b</code> will be created and neither will collide.</p>
+
+ <p>Now consider a legal copy using <code>-update</code>:<br/>
+        <code>distcp -update hdfs://nn1:8020/foo/a \</code><br/>
+        <code>                hdfs://nn1:8020/foo/b \</code><br/>
+        <code>                hdfs://nn2:8020/bar</code></p>
+
+ <p>With sources/sizes:</p>
+
+ <p><code> hdfs://nn1:8020/foo/a</code><br/>
+ <code> hdfs://nn1:8020/foo/a/aa 32</code><br/>
+ <code> hdfs://nn1:8020/foo/a/ab 32</code><br/>
+ <code> hdfs://nn1:8020/foo/b</code><br/>
+ <code> hdfs://nn1:8020/foo/b/ba 64</code><br/>
+ <code> hdfs://nn1:8020/foo/b/bb 32</code></p>
+
+ <p>And destination/sizes:</p>
+
+ <p><code> hdfs://nn2:8020/bar</code><br/>
+ <code> hdfs://nn2:8020/bar/aa 32</code><br/>
+ <code> hdfs://nn2:8020/bar/ba 32</code><br/>
+ <code> hdfs://nn2:8020/bar/bb 64</code></p>
+
+ <p>Will effect:</p>
+
+ <p><code> hdfs://nn2:8020/bar</code><br/>
+ <code> hdfs://nn2:8020/bar/aa 32</code><br/>
+ <code> hdfs://nn2:8020/bar/ab 32</code><br/>
+ <code> hdfs://nn2:8020/bar/ba 64</code><br/>
+ <code> hdfs://nn2:8020/bar/bb 32</code></p>
+
+ <p>Only <code>aa</code> is not overwritten on nn2. If
+ <code>-overwrite</code> were specified, all elements would be
+ overwritten.</p>
+
+ </section> <!-- Update and Overwrite -->
+
+ </section> <!-- Options -->
+
+ </section> <!-- Usage -->
+
+ <section id="etc">
+ <title>Appendix</title>
+
+ <section>
+ <title>Map sizing</title>
+
+ <p>DistCp makes a faint attempt to size each map comparably so that
+ each copies roughly the same number of bytes. Note that files are the
+ finest level of granularity, so increasing the number of simultaneous
+ copiers (i.e. maps) may not always increase the number of
+ simultaneous copies nor the overall throughput.</p>
+
+ <p>If <code>-m</code> is not specified, DistCp will attempt to
+ schedule work for <code>min (total_bytes / bytes.per.map, 20 *
+ num_task_trackers)</code> where <code>bytes.per.map</code> defaults
+ to 256MB.</p>
+
+ <p>Tuning the number of maps to the size of the source and
+ destination clusters, the size of the copy, and the available
+ bandwidth is recommended for long-running and regularly run jobs.</p>
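+      <p>For example, a large, regularly scheduled copy might request more maps explicitly
+      (the paths are placeholders):</p>
+      <p><code>bash$ hadoop distcp -m 40 hdfs://nn1:8020/logs hdfs://nn2:8020/logs</code></p>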
+
+ </section>
+
+ <section id="cpver">
+ <title>Copying between versions of HDFS</title>
+
+ <p>For copying between two different versions of Hadoop, one will
+ usually use HftpFileSystem. This is a read-only FileSystem, so DistCp
+ must be run on the destination cluster (more specifically, on
+ TaskTrackers that can write to the destination cluster). Each source is
+ specified as <code>hftp://<dfs.http.address>/<path></code>
+ (the default <code>dfs.http.address</code> is
+ <namenode>:50070).</p>
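+      <p>For example, run from the destination (newer) cluster, such a copy might look like
+      the following; the hostnames and paths are placeholders:</p>
+      <p><code>bash$ hadoop distcp hftp://nn1.example.com:50070/foo hdfs://nn2.example.com:8020/foo</code></p>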
+
+ </section>
+
+ <section>
+ <title>Map/Reduce and other side-effects</title>
+
+ <p>As has been mentioned in the preceding, should a map fail to copy
+ one of its inputs, there will be several side-effects.</p>
+
+ <ul>
+
+ <li>Unless <code>-i</code> is specified, the logs generated by that
+ task attempt will be replaced by the previous attempt.</li>
+
+ <li>Unless <code>-overwrite</code> is specified, files successfully
+ copied by a previous map on a re-execution will be marked as
+ "skipped".</li>
+
+ <li>If a map fails <code>mapred.map.max.attempts</code> times, the
+ remaining map tasks will be killed (unless <code>-i</code> is
+ set).</li>
+
+        <li>If <code>mapred.speculative.execution</code> is set
+ <code>final</code> and <code>true</code>, the result of the copy is
+ undefined.</li>
+
+ </ul>
+
+ </section>
+
+ <!--
+ <section>
+ <title>Firewalls and SSL</title>
+
+ <p>To copy over HTTP, use the HftpFileSystem as described in the
+ preceding <a href="#cpver">section</a>, and ensure that the required
+ port(s) are open.</p>
+
+ <p>TODO</p>
+
+ </section>
+ -->
+
+ </section> <!-- Appendix -->
+
+ </body>
+
+</document>
Propchange: hadoop/hdfs/branches/HDFS-265/src/docs/src/documentation/content/xdocs/distcp.xml
------------------------------------------------------------------------------
svn:mime-type = text/plain
Added: hadoop/hdfs/branches/HDFS-265/src/docs/src/documentation/content/xdocs/fair_scheduler.xml
URL: http://svn.apache.org/viewvc/hadoop/hdfs/branches/HDFS-265/src/docs/src/documentation/content/xdocs/fair_scheduler.xml?rev=817449&view=auto
==============================================================================
--- hadoop/hdfs/branches/HDFS-265/src/docs/src/documentation/content/xdocs/fair_scheduler.xml (added)
+++ hadoop/hdfs/branches/HDFS-265/src/docs/src/documentation/content/xdocs/fair_scheduler.xml Mon Sep 21 22:33:09 2009
@@ -0,0 +1,371 @@
+<?xml version="1.0"?>
+ <!--
+ Licensed to the Apache Software Foundation (ASF) under one or more
+ contributor license agreements. See the NOTICE file distributed with
+ this work for additional information regarding copyright ownership.
+ The ASF licenses this file to You under the Apache License, Version
+ 2.0 (the "License"); you may not use this file except in compliance
+ with the License. You may obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0 Unless required by
+ applicable law or agreed to in writing, software distributed under
+ the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES
+ OR CONDITIONS OF ANY KIND, either express or implied. See the
+ License for the specific language governing permissions and
+ limitations under the License.
+ -->
+
+<!DOCTYPE document PUBLIC "-//APACHE//DTD Documentation V2.0//EN" "http://forrest.apache.org/dtd/document-v20.dtd">
+<document>
+ <header>
+ <title>Fair Scheduler Guide</title>
+ </header>
+ <body>
+
+ <section>
+ <title>Purpose</title>
+
+ <p>This document describes the Fair Scheduler, a pluggable
+ Map/Reduce scheduler for Hadoop which provides a way to share
+ large clusters.</p>
+ </section>
+
+ <section>
+ <title>Introduction</title>
+ <p>Fair scheduling is a method of assigning resources to jobs
+ such that all jobs get, on average, an equal share of resources
+ over time. When there is a single job running, that job uses the
+      entire cluster. When other jobs are submitted, task slots that
+ free up are assigned to the new jobs, so that each job gets
+ roughly the same amount of CPU time. Unlike the default Hadoop
+ scheduler, which forms a queue of jobs, this lets short jobs finish
+ in reasonable time while not starving long jobs. It is also a
+ reasonable way to share a cluster between a number of users. Finally,
+ fair sharing can also work with job priorities - the priorities are
+ used as weights to determine the fraction of total compute time that
+ each job should get.
+ </p>
+ <p>
+ The scheduler actually organizes jobs further into "pools", and
+ shares resources fairly between these pools. By default, there is a
+ separate pool for each user, so that each user gets the same share
+ of the cluster no matter how many jobs they submit. However, it is
+ also possible to set a job's pool based on the user's Unix group or
+ any other jobconf property, such as the queue name property used by
+ <a href="capacity_scheduler.html">Capacity Scheduler</a>.
+ Within each pool, fair sharing is used to share capacity between
+ the running jobs. Pools can also be given weights to share the
+ cluster non-proportionally in the config file.
+ </p>
+ <p>
+ In addition to providing fair sharing, the Fair Scheduler allows
+ assigning guaranteed minimum shares to pools, which is useful for
+ ensuring that certain users, groups or production applications
+ always get sufficient resources. When a pool contains jobs, it gets
+ at least its minimum share, but when the pool does not need its full
+ guaranteed share, the excess is split between other running jobs.
+ This lets the scheduler guarantee capacity for pools while utilizing
+ resources efficiently when these pools don't contain jobs.
+ </p>
+ <p>
+ The Fair Scheduler lets all jobs run by default, but it is also
+ possible to limit the number of running jobs per user and per pool
+ through the config file. This can be useful when a user must submit
+ hundreds of jobs at once, or in general to improve performance if
+ running too many jobs at once would cause too much intermediate data
+ to be created or too much context-switching. Limiting the jobs does
+ not cause any subsequently submitted jobs to fail, only to wait in the
+      scheduler's queue until some of the user's earlier jobs finish. Jobs to
+ run from each user/pool are chosen in order of priority and then
+ submit time, as in the default FIFO scheduler in Hadoop.
+ </p>
+ <p>
+ Finally, the fair scheduler provides several extension points where
+ the basic functionality can be extended. For example, the weight
+ calculation can be modified to give a priority boost to new jobs,
+ implementing a "shortest job first" policy which reduces response
+ times for interactive jobs even further.
+ </p>
+ </section>
+
+ <section>
+ <title>Installation</title>
+ <p>
+ To run the fair scheduler in your Hadoop installation, you need to put
+ it on the CLASSPATH. The easiest way is to copy the
+ <em>hadoop-*-fairscheduler.jar</em> from
+ <em>HADOOP_HOME/contrib/fairscheduler</em> to <em>HADOOP_HOME/lib</em>.
+ Alternatively you can modify <em>HADOOP_CLASSPATH</em> to include this jar, in
+ <em>HADOOP_CONF_DIR/hadoop-env.sh</em>
+ </p>
+ <p>
+      To compile the fair scheduler from source, execute <em>ant
+      package</em> in the source folder and copy the
+ <em>build/contrib/fair-scheduler/hadoop-*-fairscheduler.jar</em>
+ to <em>HADOOP_HOME/lib</em>
+ </p>
+ <p>
+ You will also need to set the following property in the Hadoop config
+ file <em>HADOOP_CONF_DIR/mapred-site.xml</em> to have Hadoop use
+ the fair scheduler: <br/>
+ <code><property></code><br/>
+ <code> <name>mapred.jobtracker.taskScheduler</name></code><br/>
+ <code> <value>org.apache.hadoop.mapred.FairScheduler</value></code><br/>
+ <code></property></code>
+ </p>
+ <p>
+ Once you restart the cluster, you can check that the fair scheduler
+ is running by going to http://<jobtracker URL>/scheduler
+ on the JobTracker's web UI. A "job scheduler administration" page should
+ be visible there. This page is described in the Administration section.
+ </p>
+ </section>
+
+ <section>
+ <title>Configuring the Fair scheduler</title>
+ <p>
+ The following properties can be set in mapred-site.xml to configure
+ the fair scheduler:
+ </p>
+ <table>
+ <tr>
+ <th>Name</th><th>Description</th>
+ </tr>
+ <tr>
+ <td>
+ mapred.fairscheduler.allocation.file
+ </td>
+ <td>
+ Specifies an absolute path to an XML file which contains the
+ allocations for each pool, as well as the per-pool and per-user
+ limits on number of running jobs. If this property is not
+ provided, allocations are not used.<br/>
+ This file must be in XML format, and can contain three types of
+ elements:
+ <ul>
+ <li>pool elements, which may contain elements for minMaps,
+ minReduces, maxRunningJobs (limit the number of jobs from the
+            pool to run at once), and weight (to share the cluster
+ non-proportionally with other pools).
+ </li>
+ <li>user elements, which may contain a maxRunningJobs to limit
+ jobs. Note that by default, there is a separate pool for each
+ user, so these may not be necessary; they are useful, however,
+ if you create a pool per user group or manually assign jobs
+ to pools.</li>
+ <li>A userMaxJobsDefault element, which sets the default running
+ job limit for any users whose limit is not specified.</li>
+ </ul>
+ <br/>
+          An example allocation file is listed below:<br/>
+ <code><?xml version="1.0"?> </code> <br/>
+ <code><allocations></code> <br/>
+ <code> <pool name="sample_pool"></code><br/>
+ <code> <minMaps>5</minMaps></code><br/>
+ <code> <minReduces>5</minReduces></code><br/>
+ <code> <weight>2.0</weight></code><br/>
+ <code> </pool></code><br/>
+ <code> <user name="sample_user"></code><br/>
+ <code> <maxRunningJobs>6</maxRunningJobs></code><br/>
+ <code> </user></code><br/>
+ <code> <userMaxJobsDefault>3</userMaxJobsDefault></code><br/>
+ <code></allocations></code>
+ <br/>
+ This example creates a pool sample_pool with a guarantee of 5 map
+ slots and 5 reduce slots. The pool also has a weight of 2.0, meaning
+ it has a 2x higher share of the cluster than other pools (the default
+ weight is 1). Finally, the example limits the number of running jobs
+ per user to 3, except for sample_user, who can run 6 jobs concurrently.
+ Any pool not defined in the allocations file will have no guaranteed
+ capacity and a weight of 1.0. Also, any pool or user with no max
+ running jobs set in the file will be allowed to run an unlimited
+ number of jobs.
+ </td>
+ </tr>
+ <tr>
+ <td>
+ mapred.fairscheduler.assignmultiple
+ </td>
+ <td>
+ Allows the scheduler to assign both a map task and a reduce task
+ on each heartbeat, which improves cluster throughput when there
+ are many small tasks to run. Boolean value, default: true.
+ </td>
+ </tr>
+ <tr>
+ <td>
+ mapred.fairscheduler.sizebasedweight
+ </td>
+ <td>
+ Take into account job sizes in calculating their weights for fair
+          sharing. By default, weights are only based on job priorities.
+          Setting this flag to true will make them based on the size of the
+          job (number of tasks needed) as well, though not linearly
+ (the weight will be proportional to the log of the number of tasks
+ needed). This lets larger jobs get larger fair shares while still
+ providing enough of a share to small jobs to let them finish fast.
+ Boolean value, default: false.
+ </td>
+ </tr>
+ <tr>
+ <td>
+ mapred.fairscheduler.poolnameproperty
+ </td>
+ <td>
+ Specify which jobconf property is used to determine the pool that a
+ job belongs in. String, default: user.name (i.e. one pool for each
+ user). Some other useful values to set this to are: <br/>
+ <ul>
+ <li> group.name (to create a pool per Unix group).</li>
+ <li>mapred.job.queue.name (the same property as the queue name in
+ <a href="capacity_scheduler.html">Capacity Scheduler</a>).</li>
+ </ul>
+ </td>
+ </tr>
+ <tr>
+ <td>
+ mapred.fairscheduler.weightadjuster
+ </td>
+ <td>
+ An extensibility point that lets you specify a class to adjust the
+ weights of running jobs. This class should implement the
+ <em>WeightAdjuster</em> interface. There is currently one example
+ implementation - <em>NewJobWeightBooster</em>, which increases the
+ weight of jobs for the first 5 minutes of their lifetime to let
+ short jobs finish faster. To use it, set the weightadjuster
+ property to the full class name,
+ <code>org.apache.hadoop.mapred.NewJobWeightBooster</code>
+ NewJobWeightBooster itself provides two parameters for setting the
+ duration and boost factor. <br/>
+ <ol>
+ <li> <em>mapred.newjobweightbooster.factor</em>
+ Factor by which new jobs weight should be boosted. Default is 3</li>
+ <li><em>mapred.newjobweightbooster.duration</em>
+ Duration in milliseconds, default 300000 for 5 minutes</li>
+ </ol>
+ </td>
+ </tr>
+ <tr>
+ <td>
+ mapred.fairscheduler.loadmanager
+ </td>
+ <td>
+ An extensibility point that lets you specify a class that determines
+ how many maps and reduces can run on a given TaskTracker. This class
+ should implement the LoadManager interface. By default the task caps
+ in the Hadoop config file are used, but this option could be used to
+ make the load based on available memory and CPU utilization for example.
+ </td>
+ </tr>
+ <tr>
+ <td>
+          mapred.fairscheduler.taskselector
+ </td>
+ <td>
+ An extensibility point that lets you specify a class that determines
+ which task from within a job to launch on a given tracker. This can be
+ used to change either the locality policy (e.g. keep some jobs within
+ a particular rack) or the speculative execution algorithm (select
+ when to launch speculative tasks). The default implementation uses
+ Hadoop's default algorithms from JobInProgress.
+ </td>
+ </tr>
+ </table>
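+      <p>
+        As a further illustration, the following mapred-site.xml fragment (the file path is a
+        placeholder) points the scheduler at an allocation file and makes pools follow Unix groups:
+      </p>
+      <p>
+        <code><property></code><br/>
+        <code>  <name>mapred.fairscheduler.allocation.file</name></code><br/>
+        <code>  <value>/etc/hadoop/fair-scheduler.xml</value></code><br/>
+        <code></property></code><br/>
+        <code><property></code><br/>
+        <code>  <name>mapred.fairscheduler.poolnameproperty</name></code><br/>
+        <code>  <value>group.name</value></code><br/>
+        <code></property></code>
+      </p>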
+ </section>
+ <section>
+ <title> Administration</title>
+ <p>
+ The fair scheduler provides support for administration at runtime
+ through two mechanisms:
+ </p>
+ <ol>
+ <li>
+ It is possible to modify pools' allocations
+ and user and pool running job limits at runtime by editing the allocation
+ config file. The scheduler will reload this file 10-15 seconds after it
+ sees that it was modified.
+ </li>
+ <li>
+ Current jobs, pools, and fair shares can be examined through the
+ JobTracker's web interface, at http://<jobtracker URL>/scheduler.
+ On this interface, it is also possible to modify jobs' priorities or
+ move jobs from one pool to another and see the effects on the fair
+ shares (this requires JavaScript).
+ </li>
+ </ol>
+ <p>
+ The following fields can be seen for each job on the web interface:
+ </p>
+ <ul>
+ <li><em>Submitted</em> - Date and time job was submitted.</li>
+ <li><em>JobID, User, Name</em> - Job identifiers as on the standard
+ web UI.</li>
+ <li><em>Pool</em> - Current pool of job. Select another value to move job to
+ another pool.</li>
+ <li><em>Priority</em> - Current priority. Select another value to change the
+ job's priority</li>
+ <li><em>Maps/Reduces Finished</em>: Number of tasks finished / total tasks.</li>
+ <li><em>Maps/Reduces Running</em>: Tasks currently running.</li>
+ <li><em>Map/Reduce Fair Share</em>: The average number of task slots that this
+ job should have at any given time according to fair sharing. The actual
+ number of tasks will go up and down depending on how much compute time
+ the job has had, but on average it will get its fair share amount.</li>
+ </ul>
+ <p>
+ In addition, it is possible to turn on an "advanced" view for the web UI,
+ by going to http://<jobtracker URL>/scheduler?advanced. This view shows
+ four more columns used for calculations internally:
+ </p>
+ <ul>
+ <li><em>Maps/Reduce Weight</em>: Weight of the job in the fair sharing
+ calculations. This depends on priority and potentially also on
+ job size and job age if the <em>sizebasedweight</em> and
+ <em>NewJobWeightBooster</em> are enabled.</li>
+ <li><em>Map/Reduce Deficit</em>: The job's scheduling deficit in machine-
+ seconds - the amount of resources it should have gotten according to
+ its fair share, minus how many it actually got. Positive deficit means
+ the job will be scheduled again in the near future because it needs to
+ catch up to its fair share. The scheduler schedules jobs with higher
+ deficit ahead of others. Please see the Implementation section of
+ this document for details.</li>
+ </ul>
+ </section>
+ <section>
+ <title>Implementation</title>
+ <p>There are two aspects to implementing fair scheduling: Calculating
+ each job's fair share, and choosing which job to run when a task slot
+ becomes available.</p>
+      <p>To select jobs to run, the scheduler keeps track of a
+ "deficit" for each job - the difference between the amount of
+ compute time it should have gotten on an ideal scheduler, and the amount
+ of compute time it actually got. This is a measure of how
+ "unfair" we've been to the job. Every few hundred
+ milliseconds, the scheduler updates the deficit of each job by looking
+ at how many tasks each job had running during this interval vs. its
+ fair share. Whenever a task slot becomes available, it is assigned to
+ the job with the highest deficit. There is one exception - if there
+ were one or more jobs who were not meeting their pool capacity
+ guarantees, we only choose among these "needy" jobs (based
+ again on their deficit), to ensure that the scheduler meets pool
+ guarantees as soon as possible.</p>
+ <p>
+ The fair shares are calculated by dividing the capacity of the cluster
+ among runnable jobs according to a "weight" for each job. By
+ default the weight is based on priority, with each level of priority
+ having 2x higher weight than the next (for example, VERY_HIGH has 4x the
+ weight of NORMAL). However, weights can also be based on job sizes and ages,
+ as described in the Configuring section. For jobs that are in a pool,
+ fair shares also take into account the minimum guarantee for that pool.
+ This capacity is divided among the jobs in that pool according again to
+ their weights.
+ </p>
+ <p>Finally, when limits on a user's running jobs or a pool's running jobs
+ are in place, we choose which jobs get to run by sorting all jobs in order
+ of priority and then submit time, as in the standard Hadoop scheduler. Any
+ jobs that fall after the user/pool's limit in this ordering are queued up
+      and wait idle until they can be run. During this time, they are excluded
+      from the fair sharing calculations and do not gain or lose deficit (their
+ fair share is set to zero).</p>
+ </section>
+ </body>
+</document>
Propchange: hadoop/hdfs/branches/HDFS-265/src/docs/src/documentation/content/xdocs/fair_scheduler.xml
------------------------------------------------------------------------------
svn:mime-type = text/plain
Modified: hadoop/hdfs/branches/HDFS-265/src/docs/src/documentation/content/xdocs/faultinject_framework.xml
URL: http://svn.apache.org/viewvc/hadoop/hdfs/branches/HDFS-265/src/docs/src/documentation/content/xdocs/faultinject_framework.xml?rev=817449&r1=817448&r2=817449&view=diff
==============================================================================
--- hadoop/hdfs/branches/HDFS-265/src/docs/src/documentation/content/xdocs/faultinject_framework.xml (original)
+++ hadoop/hdfs/branches/HDFS-265/src/docs/src/documentation/content/xdocs/faultinject_framework.xml Mon Sep 21 22:33:09 2009
@@ -21,40 +21,41 @@
<document>
<header>
- <title>Fault Injection Framework and Development Guide</title>
+ <title>Fault injection Framework and Development Guide</title>
</header>
<body>
<section>
<title>Introduction</title>
- <p>This guide provides an overview of the Hadoop Fault Injection (FI) framework for those
- who will be developing their own faults (aspects).
+      <p>The following is a brief guide to Hadoop's Fault Injection (FI)
+      framework and a development guide for those who will be developing
+      their own faults (aspects).
</p>
- <p>The idea of fault injection is fairly simple: it is an
+      <p>The idea of Fault Injection (FI) is fairly simple: it is an
infusion of errors and exceptions into an application's logic to
achieve a higher coverage and fault tolerance of the system.
- Different implementations of this idea are available today.
+      Different implementations of this idea are available today.
Hadoop's FI framework is built on top of Aspect Oriented Paradigm
(AOP) implemented by AspectJ toolkit.
</p>
</section>
<section>
<title>Assumptions</title>
- <p>The current implementation of the FI framework assumes that the faults it
- will be emulating are of non-deterministic nature. That is, the moment
- of a fault's happening isn't known in advance and is a coin-flip based.
+      <p>The current implementation of the framework assumes that the faults it
+        will be emulating are non-deterministic in nature; that is, the moment
+        a fault happens is not known in advance and is decided by a coin
+        flip.
</p>
</section>
-
<section>
<title>Architecture of the Fault Injection Framework</title>
<figure src="images/FI-framework.gif" alt="Components layout" />
-
<section>
- <title>Configuration Management</title>
- <p>This piece of the FI framework allows you to set expectations for faults to happen.
- The settings can be applied either statically (in advance) or in runtime.
- The desired level of faults in the framework can be configured two ways:
+        <title>Configuration Management</title>
+        <p>This piece of the framework allows you to
+          set expectations for faults to happen. The settings can be applied
+          either statically (in advance) or at runtime. There are two ways to
+          configure the desired level of faults in the framework:
</p>
<ul>
<li>
@@ -70,31 +71,31 @@
</li>
</ul>
</section>
-
<section>
- <title>Probability Model</title>
- <p>This is fundamentally a coin flipper. The methods of this class are
+        <title>Probability Model</title>
+        <p>This is fundamentally a coin flipper. The methods of this class are
getting a random number between 0.0
- and 1.0 and then checking if a new number has happened in the
- range of 0.0 and a configured level for the fault in question. If that
- condition is true then the fault will occur.
+          and 1.0 and then checking whether the new number falls within the
+          range between 0.0 and the configured level for the fault in question.
+          If that condition is true then the fault will occur.
</p>
- <p>Thus, to guarantee the happening of a fault one needs to set an
+        <p>Thus, to guarantee the happening of a fault one needs to set an
appropriate level to 1.0.
To completely prevent a fault from happening its probability level
- has to be set to 0.0.
+          has to be set to 0.0.
</p>
- <p><strong>Note</strong>: The default probability level is set to 0
+        <p><strong>Note</strong>: the default probability level is set to 0
(zero) unless the level is changed explicitly through the
configuration file or in the runtime. The name of the default
level's configuration parameter is
<code>fi.*</code>
</p>
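+        <p>
+          As a minimal sketch (not the framework's actual class or method
+          names), the coin flip described above amounts to the following:
+        </p>
+<source>
+// Minimal sketch of the coin-flip check; names are illustrative only.
+import java.util.Random;
+
+class ProbabilitySketch {
+  private static final Random RANDOM = new Random();
+
+  // Returns true if a fault should be injected, given the configured
+  // probability level for that fault (0.0 = never, 1.0 = always).
+  static boolean shouldInject(double configuredLevel) {
+    double coin = RANDOM.nextDouble();   // uniform value in [0.0, 1.0)
+    return configuredLevel > coin;       // fault occurs if the coin falls below the level
+  }
+}
+</source>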
</section>
-
<section>
- <title>Fault Injection Mechanism: AOP and AspectJ</title>
- <p>The foundation of Hadoop's FI framework includes a
+        <title>Fault Injection Mechanism: AOP and AspectJ</title>
+        <p>The foundation of Hadoop's fault injection framework is a
cross-cutting concept implemented by AspectJ. The following basic
terms are important to remember:
</p>
@@ -121,9 +122,8 @@
</li>
</ul>
</section>
-
<section>
- <title>Existing Join Points</title>
+        <title>Existing Join Points</title>
<p>
The following readily available join points are provided by AspectJ:
</p>
@@ -154,7 +154,7 @@
</section>
</section>
<section>
- <title>Aspect Example</title>
+      <title>Aspect Example</title>
<source>
package org.apache.hadoop.hdfs.server.datanode;
@@ -191,22 +191,17 @@
}
}
}
-</source>
-
- <p>The aspect has two main parts: </p>
- <ul>
- <li>The join point
+ </source>
+ <p>
+ The aspect has two main parts: the join point
<code>pointcut callReceivepacket()</code>
        which serves as an identification mark of a specific point (in control
- and/or data flow) in the life of an application. </li>
-
- <li> A call to the advice -
+        and/or data flow) in the life of an application. A call to the advice
<code>before () throws IOException : callReceivepacket()</code>
- - will be injected (see
- <a href="#Putting+it+all+together">Putting It All Together</a>)
- before that specific spot of the application's code.</li>
- </ul>
-
+        will be
+        <a href="#Putting+it+all+together">injected</a>
+        before that specific spot in the application's code.
+ </p>
<p>The pointcut identifies an invocation of class'
<code>java.io.OutputStream write()</code>
@@ -215,8 +210,8 @@
take place within the body of method
<code>receivepacket()</code>
        from class <code>BlockReceiver</code>.
- The method can have any parameters and any return type.
- Possible invocations of
+        The method can have any parameters and any return type. Possible
+        invocations of
<code>write()</code>
method happening anywhere within the aspect
<code>BlockReceiverAspects</code>
@@ -227,22 +222,24 @@
class. In such a case the names of the faults have to be different
if a developer wants to trigger them separately.
</p>
- <p><strong>Note 2</strong>: After the injection step (see
- <a href="#Putting+it+all+together">Putting It All Together</a>)
+      <p><strong>Note 2</strong>: After the
+        <a href="#Putting+it+all+together">injection step</a>
you can verify that the faults were properly injected by
- searching for <code>ajc</code> keywords in a disassembled class file.
+ searching for
+ <code>ajc</code>
+ keywords in a disassembled class file.
</p>
</section>
<section>
- <title>Fault Naming Convention and Namespaces</title>
- <p>For the sake of a unified naming
+      <title>Fault Naming Convention and Namespaces</title>
+      <p>For the sake of a unified naming
        convention, the following two types of names are recommended for
        new aspect development:</p>
<ul>
- <li>Activity specific notation
- (when we don't care about a particular location of a fault's
+        <li>Activity specific notation
+          (when we don't care about a particular location of a fault's
happening). In this case the name of the fault is rather abstract:
<code>fi.hdfs.DiskError</code>
</li>
@@ -254,11 +251,14 @@
</section>
<section>
- <title>Development Tools</title>
+      <title>Development Tools</title>
<ul>
- <li>The Eclipse
- <a href="http://www.eclipse.org/ajdt/">AspectJ Development Toolkit</a>
- may help you when developing aspects
+        <li>The Eclipse
+          <a href="http://www.eclipse.org/ajdt/">AspectJ
+            Development Toolkit
+          </a>
+          may help you in the aspect development
+          process.
</li>
<li>IntelliJ IDEA provides AspectJ weaver and Spring-AOP plugins
</li>
@@ -266,67 +266,60 @@
</section>
<section>
- <title>Putting It All Together</title>
- <p>Faults (aspects) have to injected (or woven) together before
- they can be used. Follow these instructions:</p>
-
- <ul>
- <li>To weave aspects in place use:
-<source>
+ <title>Putting it all together</title>
+      <p>Faults (or aspects) have to be injected (or woven) in place before
+        they can be used. Here is a step-by-step description of how this can
+        be done.</p>
+      <p>To weave aspects in place, run:</p>
+ <source>
% ant injectfaults
-</source>
- </li>
-
- <li>If you
- misidentified the join point of your aspect you will see a
- warning (similar to the one shown here) when 'injectfaults' target is
- completed:
-<source>
+ </source>
+      <p>If you
+        have misidentified the join point of your aspect, you will see a
+        warning similar to the one below when the 'injectfaults' target
+        completes:</p>
+ <source>
[iajc] warning at
src/test/aop/org/apache/hadoop/hdfs/server/datanode/ \
BlockReceiverAspects.aj:44::0
advice defined in org.apache.hadoop.hdfs.server.datanode.BlockReceiverAspects
has not been applied [Xlint:adviceDidNotMatch]
-</source>
- </li>
-
- <li>It isn't an error, so the build will report the successful result. <br />
- To prepare dev.jar file with all your faults weaved in place (HDFS-475 pending) use:
-<source>
+ </source>
+      <p>It isn't an error, so the build will report a successful result.
+        To prepare a dev.jar file with all your faults woven in
+        place, run (HDFS-475 pending):</p>
+ <source>
% ant jar-fault-inject
-</source>
- </li>
+ </source>
- <li>To create test jars use:
-<source>
+      <p>Test jars can be created by running:</p>
+ <source>
% ant jar-test-fault-inject
-</source>
- </li>
+ </source>
- <li>To run HDFS tests with faults injected use:
-<source>
+ <p>To run HDFS tests with faults injected:</p>
+ <source>
% ant run-test-hdfs-fault-inject
-</source>
- </li>
- </ul>
-
+ </source>
<section>
- <title>How to Use the Fault Injection Framework</title>
- <p>Faults can be triggered as follows:
+        <title>How to Use the Fault Injection Framework</title>
+        <p>Faults can be triggered in either of the following two ways:
</p>
<ul>
- <li>During runtime:
-<source>
+          <li>At runtime:
+ <source>
% ant run-test-hdfs -Dfi.hdfs.datanode.BlockReceiver=0.12
-</source>
- To set a certain level, for example 25%, of all injected faults use:
+ </source>
+            To set a certain level, for example 25%, for all injected faults, run:
<br/>
-<source>
+ <source>
% ant run-test-hdfs-fault-inject -Dfi.*=0.25
-</source>
+ </source>
</li>
- <li>From a program:
-
+          <li>From a program, as shown in the example below:
+ </li>
+ </ul>
<source>
package org.apache.hadoop.fs;
@@ -361,23 +354,23 @@
//Cleaning up test test environment
}
}
-</source>
- </li>
- </ul>
-
+ </source>
<p>
- As you can see above these two methods do the same thing. They are
- setting the probability level of <code>hdfs.datanode.BlockReceiver</code>
- at 12%. The difference, however, is that the program provides more
- flexibility and allows you to turn a fault off when a test no longer needs it.
+        As you can see above, these two methods do the same thing. They are
+        setting the probability level of
+        <code>hdfs.datanode.BlockReceiver</code>
+        at 12%.
+        The difference, however, is that the program provides more
+        flexibility and allows you to turn a fault off when a test no
+        longer needs it.
</p>
</section>
</section>
<section>
- <title>Additional Information and Contacts</title>
- <p>These two sources of information are particularly
- interesting and worth reading:
+      <title>Additional Information and Contacts</title>
+      <p>These two sources of information are particularly
+        interesting and worth further reading:
</p>
<ul>
<li>
@@ -388,8 +381,9 @@
<li>AspectJ Cookbook (ISBN-13: 978-0-596-00654-9)
</li>
</ul>
- <p>If you have additional comments or questions for the author check
- <a href="http://issues.apache.org/jira/browse/HDFS-435">HDFS-435</a>.
+      <p>Should you have any further comments or questions for the author,
+        check
+        <a href="http://issues.apache.org/jira/browse/HDFS-435">HDFS-435</a>.
</p>
</section>
</body>
Added: hadoop/hdfs/branches/HDFS-265/src/docs/src/documentation/content/xdocs/hadoop_archives.xml
URL: http://svn.apache.org/viewvc/hadoop/hdfs/branches/HDFS-265/src/docs/src/documentation/content/xdocs/hadoop_archives.xml?rev=817449&view=auto
==============================================================================
--- hadoop/hdfs/branches/HDFS-265/src/docs/src/documentation/content/xdocs/hadoop_archives.xml (added)
+++ hadoop/hdfs/branches/HDFS-265/src/docs/src/documentation/content/xdocs/hadoop_archives.xml Mon Sep 21 22:33:09 2009
@@ -0,0 +1,80 @@
+<?xml version="1.0"?>
+<!--
+ Licensed to the Apache Software Foundation (ASF) under one or more
+ contributor license agreements. See the NOTICE file distributed with
+ this work for additional information regarding copyright ownership.
+ The ASF licenses this file to You under the Apache License, Version 2.0
+ (the "License"); you may not use this file except in compliance with
+ the License. You may obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+ Unless required by applicable law or agreed to in writing, software
+ distributed under the License is distributed on an "AS IS" BASIS,
+ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ See the License for the specific language governing permissions and
+ limitations under the License.
+-->
+<!DOCTYPE document PUBLIC "-//APACHE//DTD Documentation V2.0//EN" "http://forrest.apache.org/dtd/document-v20.dtd">
+<document>
+ <header>
+ <title>Archives Guide</title>
+ </header>
+ <body>
+ <section>
+ <title> What are Hadoop archives? </title>
+ <p>
+ Hadoop archives are special format archives. A Hadoop archive
+ maps to a file system directory. A Hadoop archive always has a *.har
+ extension. A Hadoop archive directory contains metadata (in the form
+ of _index and _masterindex) and data (part-*) files. The _index file contains
+        the names of the files that are part of the archive and their locations
+        within the part files.
+ </p>
+ </section>
+ <section>
+ <title> How to create an archive? </title>
+ <p>
+        <code>Usage: hadoop archive -archiveName name &lt;src&gt;* &lt;dest&gt;</code>
+ </p>
+ <p>
+        -archiveName is the name of the archive you would like to create,
+        for example foo.har. The name should have a *.har extension.
+        The inputs are file system pathnames which work as usual with regular
+        expressions. The destination directory will contain the archive.
+        Note that a Map/Reduce job creates the archives, so you
+        need a Map/Reduce cluster to run it. The following is an example:</p>
+ <p>
+ <code>hadoop archive -archiveName foo.har /user/hadoop/dir1 /user/hadoop/dir2 /user/zoo/</code>
+ </p><p>
+        In the above example, /user/hadoop/dir1 and /user/hadoop/dir2 will be
+        archived in the file system directory /user/zoo/foo.har.
+ The sources are not changed or removed when an archive is created.
+ </p>
+ </section>
+ <section>
+ <title> How to look up files in archives? </title>
+ <p>
+        The archive exposes itself as a file system layer, so all the fs shell
+        commands work on archives, but with a different URI. Also, note that
+        archives are immutable, so renames, deletes and creates return
+        an error. The URI for Hadoop archives is
+        </p><p><code>har://scheme-hostname:port/archivepath/fileinarchive</code></p><p>
+        If no scheme is provided, the underlying file system is assumed.
+        In that case the URI would look like
+ </p><p><code>
+ har:///archivepath/fileinarchive</code></p>
+ <p>
+        Here is an example of an archive. The input to the archive is /dir. The directory dir contains
+        files filea and fileb. To archive /dir to /user/hadoop/foo.har, the command is
+ </p>
+ <p><code>hadoop archive -archiveName foo.har /dir /user/hadoop</code>
+ </p><p>
+        To get the file listing for files in the created archive, use
+ </p>
+ <p><code>hadoop dfs -lsr har:///user/hadoop/foo.har</code></p>
+        <p>To cat filea in the archive, use
+ </p><p><code>hadoop dfs -cat har:///user/hadoop/foo.har/dir/filea</code></p>
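+      <p>
+        Because an archive behaves like any other file system, it can also be
+        read programmatically. The following is a minimal sketch (assuming the
+        har file system is configured as usual and reusing the example paths
+        above) that prints filea from the archive:
+      </p>
+<source>
+// Minimal sketch: read a file from a Hadoop archive via the FileSystem API.
+import java.net.URI;
+
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.fs.FSDataInputStream;
+import org.apache.hadoop.fs.FileSystem;
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.io.IOUtils;
+
+public class HarCatExample {
+  public static void main(String[] args) throws Exception {
+    Configuration conf = new Configuration();
+    // The har URI points at the archive; paths inside it are ordinary paths.
+    FileSystem fs = FileSystem.get(URI.create("har:///user/hadoop/foo.har"), conf);
+    FSDataInputStream in = fs.open(new Path("har:///user/hadoop/foo.har/dir/filea"));
+    try {
+      IOUtils.copyBytes(in, System.out, conf, false);   // copy the file contents to stdout
+    } finally {
+      in.close();
+    }
+  }
+}
+</source>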
+ </section>
+ </body>
+</document>
Propchange: hadoop/hdfs/branches/HDFS-265/src/docs/src/documentation/content/xdocs/hadoop_archives.xml
------------------------------------------------------------------------------
svn:mime-type = text/plain