Posted to common-commits@hadoop.apache.org by tu...@apache.org on 2012/12/20 14:41:43 UTC
svn commit: r1424459 [2/2] - in
/hadoop/common/trunk/hadoop-common-project/hadoop-common: ./
src/main/docs/src/documentation/content/xdocs/ src/site/apt/
Added: hadoop/common/trunk/hadoop-common-project/hadoop-common/src/site/apt/CommandsManual.apt.vm
URL: http://svn.apache.org/viewvc/hadoop/common/trunk/hadoop-common-project/hadoop-common/src/site/apt/CommandsManual.apt.vm?rev=1424459&view=auto
==============================================================================
--- hadoop/common/trunk/hadoop-common-project/hadoop-common/src/site/apt/CommandsManual.apt.vm (added)
+++ hadoop/common/trunk/hadoop-common-project/hadoop-common/src/site/apt/CommandsManual.apt.vm Thu Dec 20 13:41:43 2012
@@ -0,0 +1,490 @@
+~~ Licensed to the Apache Software Foundation (ASF) under one or more
+~~ contributor license agreements. See the NOTICE file distributed with
+~~ this work for additional information regarding copyright ownership.
+~~ The ASF licenses this file to You under the Apache License, Version 2.0
+~~ (the "License"); you may not use this file except in compliance with
+~~ the License. You may obtain a copy of the License at
+~~
+~~ http://www.apache.org/licenses/LICENSE-2.0
+~~
+~~ Unless required by applicable law or agreed to in writing, software
+~~ distributed under the License is distributed on an "AS IS" BASIS,
+~~ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+~~ See the License for the specific language governing permissions and
+~~ limitations under the License.
+
+ ---
+ Hadoop Commands Guide
+ ---
+ ---
+ ${maven.build.timestamp}
+
+%{toc}
+
+Overview
+
+ All hadoop commands are invoked by the <<<bin/hadoop>>> script. Running the
+ hadoop script without any arguments prints the description for all
+ commands.
+
+ Usage: <<<hadoop [--config confdir] [COMMAND] [GENERIC_OPTIONS] [COMMAND_OPTIONS]>>>
+
+ Hadoop has an option parsing framework that handles parsing generic
+ options as well as running classes.
+
+*-----------------------+---------------+
+|| COMMAND_OPTION || Description
+*-----------------------+---------------+
+| <<<--config confdir>>>| Overwrites the default configuration directory. Default is <<<${HADOOP_HOME}/conf>>>.
+*-----------------------+---------------+
+| GENERIC_OPTIONS | The common set of options supported by multiple commands.
+*-----------------------+---------------+
+| COMMAND_OPTIONS | Various commands with their options are described in the following sections. The commands have been grouped into User Commands and Administration Commands.
+*-----------------------+---------------+
+
+Generic Options
+
+ The following options are supported by {{dfsadmin}}, {{fs}}, {{fsck}},
+ {{job}} and {{fetchdt}}. Applications should implement {{{some_useful_url}Tool}} to support
+ {{{another_useful_url}GenericOptions}}.
+
+*------------------------------------------------+-----------------------------+
+|| GENERIC_OPTION || Description
+*------------------------------------------------+-----------------------------+
+|<<<-conf \<configuration file\> >>> | Specify an application
+ | configuration file.
+*------------------------------------------------+-----------------------------+
+|<<<-D \<property\>=\<value\> >>> | Use value for given property.
+*------------------------------------------------+-----------------------------+
+|<<<-jt \<local\> or \<jobtracker:port\> >>> | Specify a job tracker.
+ | Applies only to job.
+*------------------------------------------------+-----------------------------+
+|<<<-files \<comma separated list of files\> >>> | Specify comma separated files
+ | to be copied to the map
+ | reduce cluster. Applies only
+ | to job.
+*------------------------------------------------+-----------------------------+
+|<<<-libjars \<comma separated list of jars\> >>>| Specify comma separated jar
+ | files to include in the
+ | classpath. Applies only to
+ | job.
+*------------------------------------------------+-----------------------------+
+|<<<-archives \<comma separated list of archives\> >>> | Specify comma separated
+ | archives to be unarchived on
+ | the compute machines. Applies
+ | only to job.
+*------------------------------------------------+-----------------------------+
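
As a rough illustration of the behavior described above (not Hadoop's actual GenericOptionsParser, which is a Java class), the following sketch splits <<<-D>>> and <<<-files>>> generic options from the remaining command options; the space-separated <<<-D \<property\>=\<value\> >>> form is assumed:

```python
# Illustrative sketch only: separate generic options from command options.
def split_generic_options(argv):
    """Return (conf_overrides, files, remaining_args)."""
    conf = {}
    files = []
    rest = []
    i = 0
    while i < len(argv):
        arg = argv[i]
        if arg == "-D":                      # -D <property>=<value>
            key, _, value = argv[i + 1].partition("=")
            conf[key] = value
            i += 2
        elif arg == "-files":                # comma separated list of files
            files.extend(argv[i + 1].split(","))
            i += 2
        else:                                # everything else is a command option
            rest.append(arg)
            i += 1
    return conf, files, rest

conf, files, rest = split_generic_options(
    ["-D", "mapred.reduce.tasks=2", "-files", "a.txt,b.txt", "-kill", "job_1"])
```

The real parser handles the remaining generic options (<<<-conf>>>, <<<-jt>>>, <<<-libjars>>>, <<<-archives>>>) the same way.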
+
+User Commands
+
+ Commands useful for users of a hadoop cluster.
+
+* <<<archive>>>
+
+ Creates a hadoop archive. More information can be found at Hadoop
+ Archives.
+
+ Usage: <<<hadoop archive -archiveName NAME <src>* <dest> >>>
+
+*-------------------+-------------------------------------------------------+
+||COMMAND_OPTION || Description
+*-------------------+-------------------------------------------------------+
+| -archiveName NAME | Name of the archive to be created.
+*-------------------+-------------------------------------------------------+
+| src | Filesystem pathnames which work as usual with regular
+ | expressions.
+*-------------------+-------------------------------------------------------+
+| dest | Destination directory which would contain the archive.
+*-------------------+-------------------------------------------------------+
+
+* <<<distcp>>>
+
+ Copy files or directories recursively. More information can be found at
+ Hadoop DistCp Guide.
+
+ Usage: <<<hadoop distcp <srcurl> <desturl> >>>
+
+*-------------------+--------------------------------------------+
+||COMMAND_OPTION || Description
+*-------------------+--------------------------------------------+
+| srcurl | Source URL
+*-------------------+--------------------------------------------+
+| desturl | Destination URL
+*-------------------+--------------------------------------------+
+
+* <<<fs>>>
+
+ Usage: <<<hadoop fs [GENERIC_OPTIONS] [COMMAND_OPTIONS]>>>
+
+ Deprecated, use <<<hdfs dfs>>> instead.
+
+ Runs a generic filesystem user client.
+
+ The various COMMAND_OPTIONS can be found at File System Shell Guide.
+
+* <<<fsck>>>
+
+ Runs an HDFS filesystem checking utility. See {{Fsck}} for more info.
+
+ Usage: <<<hadoop fsck [GENERIC_OPTIONS] <path> [-move | -delete | -openforwrite] [-files [-blocks [-locations | -racks]]]>>>
+
+*------------------+---------------------------------------------+
+|| COMMAND_OPTION || Description
+*------------------+---------------------------------------------+
+| <path> | Start checking from this path.
+*------------------+---------------------------------------------+
+| -move | Move corrupted files to /lost+found
+*------------------+---------------------------------------------+
+| -delete | Delete corrupted files.
+*------------------+---------------------------------------------+
+| -openforwrite | Print out files opened for write.
+*------------------+---------------------------------------------+
+| -files | Print out files being checked.
+*------------------+---------------------------------------------+
+| -blocks | Print out block report.
+*------------------+---------------------------------------------+
+| -locations | Print out locations for every block.
+*------------------+---------------------------------------------+
+| -racks | Print out network topology for data-node locations.
+*------------------+---------------------------------------------+
+
+* <<<fetchdt>>>
+
+ Gets the delegation token from a NameNode. See {{fetchdt}} for more info.
+
+ Usage: <<<hadoop fetchdt [GENERIC_OPTIONS] [--webservice <namenode_http_addr>] <path> >>>
+
+*------------------------------+---------------------------------------------+
+|| COMMAND_OPTION || Description
+*------------------------------+---------------------------------------------+
+| <fileName> | File name to store the token into.
+*------------------------------+---------------------------------------------+
+| --webservice <https_address> | Use HTTP protocol instead of RPC.
+*------------------------------+---------------------------------------------+
+
+* <<<jar>>>
+
+ Runs a jar file. Users can bundle their Map Reduce code in a jar file and
+ execute it using this command.
+
+ Usage: <<<hadoop jar <jar> [mainClass] args...>>>
+
+ Streaming jobs are run via this command. For examples, see the Streaming
+ examples.
+
+ The word count example is also run using the jar command; see the
+ Wordcount example.
+
+* <<<job>>>
+
+ Command to interact with Map Reduce Jobs.
+
+ Usage: <<<hadoop job [GENERIC_OPTIONS] [-submit <job-file>] | [-status <job-id>] | [-counter <job-id> <group-name> <counter-name>] | [-kill <job-id>] | [-events <job-id> <from-event-#> <#-of-events>] | [-history [all] <jobOutputDir>] | [-list [all]] | [-kill-task <task-id>] | [-fail-task <task-id>] | [-set-priority <job-id> <priority>]>>>
+
+*------------------------------+---------------------------------------------+
+|| COMMAND_OPTION || Description
+*------------------------------+---------------------------------------------+
+| -submit <job-file> | Submits the job.
+*------------------------------+---------------------------------------------+
+| -status <job-id> | Prints the map and reduce completion
+ | percentage and all job counters.
+*------------------------------+---------------------------------------------+
+| -counter <job-id> <group-name> <counter-name> | Prints the counter value.
+*------------------------------+---------------------------------------------+
+| -kill <job-id> | Kills the job.
+*------------------------------+---------------------------------------------+
+| -events <job-id> <from-event-#> <#-of-events> | Prints the events' details
+ | received by jobtracker for the given range.
+*------------------------------+---------------------------------------------+
+| -history [all] <jobOutputDir> | Prints job details, failed and killed tip
+ | details. More details about the job such as
+ | successful tasks and task attempts made for
+ | each task can be viewed by specifying the [all]
+ | option.
+*------------------------------+---------------------------------------------+
+| -list [all] | Displays jobs which are yet to complete.
+ | <<<-list all>>> displays all jobs.
+*------------------------------+---------------------------------------------+
+| -kill-task <task-id> | Kills the task. Killed tasks are NOT counted
+ | against failed attempts.
+*------------------------------+---------------------------------------------+
+| -fail-task <task-id> | Fails the task. Failed tasks are counted
+ | against failed attempts.
+*------------------------------+---------------------------------------------+
+| -set-priority <job-id> <priority> | Changes the priority of the job. Allowed
+ | priority values are VERY_HIGH, HIGH, NORMAL,
+ | LOW, VERY_LOW
+*------------------------------+---------------------------------------------+
+
+* <<<pipes>>>
+
+ Runs a pipes job.
+
+ Usage: <<<hadoop pipes [-conf <path>] [-jobconf <key=value>, <key=value>,
+ ...] [-input <path>] [-output <path>] [-jar <jar file>] [-inputformat
+ <class>] [-map <class>] [-partitioner <class>] [-reduce <class>] [-writer
+ <class>] [-program <executable>] [-reduces <num>]>>>
+
+*----------------------------------------+------------------------------------+
+|| COMMAND_OPTION || Description
+*----------------------------------------+------------------------------------+
+| -conf <path> | Configuration for job
+*----------------------------------------+------------------------------------+
+| -jobconf <key=value>, <key=value>, ... | Add/override configuration for job
+*----------------------------------------+------------------------------------+
+| -input <path> | Input directory
+*----------------------------------------+------------------------------------+
+| -output <path> | Output directory
+*----------------------------------------+------------------------------------+
+| -jar <jar file> | Jar filename
+*----------------------------------------+------------------------------------+
+| -inputformat <class> | InputFormat class
+*----------------------------------------+------------------------------------+
+| -map <class> | Java Map class
+*----------------------------------------+------------------------------------+
+| -partitioner <class> | Java Partitioner
+*----------------------------------------+------------------------------------+
+| -reduce <class> | Java Reduce class
+*----------------------------------------+------------------------------------+
+| -writer <class> | Java RecordWriter
+*----------------------------------------+------------------------------------+
+| -program <executable> | Executable URI
+*----------------------------------------+------------------------------------+
+| -reduces <num> | Number of reduces
+*----------------------------------------+------------------------------------+
+
+* <<<queue>>>
+
+ Command to interact with and view Job Queue information.
+
+ Usage: <<<hadoop queue [-list] | [-info <job-queue-name> [-showJobs]] | [-showacls]>>>
+
+*-----------------+-----------------------------------------------------------+
+|| COMMAND_OPTION || Description
+*-----------------+-----------------------------------------------------------+
+| -list | Gets the list of job queues configured in the system, along
+ | with the scheduling information associated with them.
+*-----------------+-----------------------------------------------------------+
+| -info <job-queue-name> [-showJobs] | Displays the job queue information and
+ | the scheduling information associated with a particular
+ | job queue. If the <<<-showJobs>>> option is present, a
+ | list of jobs submitted to the particular job queue is
+ | displayed.
+*-----------------+-----------------------------------------------------------+
+| -showacls | Displays the queue name and associated queue operations
+ | allowed for the current user. The list consists of only
+ | those queues to which the user has access.
+*-----------------+-----------------------------------------------------------+
+
+* <<<version>>>
+
+ Prints the version.
+
+ Usage: <<<hadoop version>>>
+
+* <<<CLASSNAME>>>
+
+ The hadoop script can be used to invoke any class.
+
+ Usage: <<<hadoop CLASSNAME>>>
+
+ Runs the class named <<<CLASSNAME>>>.
+
+* <<<classpath>>>
+
+ Prints the class path needed to get the Hadoop jar and the required
+ libraries.
+
+ Usage: <<<hadoop classpath>>>
+
+Administration Commands
+
+ Commands useful for administrators of a hadoop cluster.
+
+* <<<balancer>>>
+
+ Runs a cluster balancing utility. An administrator can simply press Ctrl-C
+ to stop the rebalancing process. See Rebalancer for more details.
+
+ Usage: <<<hadoop balancer [-threshold <threshold>]>>>
+
+*------------------------+-----------------------------------------------------------+
+|| COMMAND_OPTION || Description
+*------------------------+-----------------------------------------------------------+
+| -threshold <threshold> | Percentage of disk capacity. This overwrites the
+ | default threshold.
+*------------------------+-----------------------------------------------------------+
+
+* <<<daemonlog>>>
+
+ Get/Set the log level for each daemon.
+
+ Usage: <<<hadoop daemonlog -getlevel <host:port> <name> >>>
+ Usage: <<<hadoop daemonlog -setlevel <host:port> <name> <level> >>>
+
+*------------------------------+-----------------------------------------------------------+
+|| COMMAND_OPTION || Description
+*------------------------------+-----------------------------------------------------------+
+| -getlevel <host:port> <name> | Prints the log level of the daemon running at
+ | <host:port>. This command internally connects
+ | to http://<host:port>/logLevel?log=<name>
+*------------------------------+-----------------------------------------------------------+
+| -setlevel <host:port> <name> <level> | Sets the log level of the daemon
+ | running at <host:port>. This command internally
+ | connects to http://<host:port>/logLevel?log=<name>
+*------------------------------+-----------------------------------------------------------+
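
Both subcommands boil down to an HTTP GET against the <<</logLevel>>> servlet shown in the table. The URL construction can be sketched as follows (illustrative only; the <<<level>>> query parameter used for -setlevel is an assumption about the servlet interface, not documented above):

```python
# Sketch of the URLs daemonlog requests; not Hadoop's implementation.
from urllib.parse import urlencode

def log_level_url(host_port, name, level=None):
    # -getlevel passes only the logger name; -setlevel (assumed) adds level
    params = {"log": name}
    if level is not None:
        params["level"] = level
    return "http://%s/logLevel?%s" % (host_port, urlencode(params))

get_url = log_level_url("nn.example.com:50070", "org.apache.hadoop.hdfs")
set_url = log_level_url("nn.example.com:50070", "org.apache.hadoop.hdfs", "DEBUG")
```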
+
+* <<<datanode>>>
+
+ Runs an HDFS datanode.
+
+ Usage: <<<hadoop datanode [-rollback]>>>
+
+*-----------------+-----------------------------------------------------------+
+|| COMMAND_OPTION || Description
+*-----------------+-----------------------------------------------------------+
+| -rollback | Rolls back the datanode to the previous version. This should
+ | be used after stopping the datanode and distributing the old
+ | hadoop version.
+*-----------------+-----------------------------------------------------------+
+
+* <<<dfsadmin>>>
+
+ Runs an HDFS dfsadmin client.
+
+ Usage: <<<hadoop dfsadmin [GENERIC_OPTIONS] [-report] [-safemode enter | leave | get | wait] [-refreshNodes] [-finalizeUpgrade] [-upgradeProgress status | details | force] [-metasave filename] [-setQuota <quota> <dirname>...<dirname>] [-clrQuota <dirname>...<dirname>] [-help [cmd]]>>>
+
+*-----------------+-----------------------------------------------------------+
+|| COMMAND_OPTION || Description
+*-----------------+-----------------------------------------------------------+
+| -report | Reports basic filesystem information and statistics.
+*-----------------+-----------------------------------------------------------+
+| -safemode enter / leave / get / wait | Safe mode maintenance command. Safe
+ | mode is a Namenode state in which it \
+ | 1. does not accept changes to the name space (read-only) \
+ | 2. does not replicate or delete blocks. \
+ | Safe mode is entered automatically at Namenode startup, and
+ | the Namenode leaves safe mode automatically when the
+ | configured minimum percentage of blocks satisfies the
+ | minimum replication condition. Safe mode can also be
+ | entered manually, but then it can only be turned off
+ | manually as well.
+*-----------------+-----------------------------------------------------------+
+| -refreshNodes | Re-read the hosts and exclude files to update the set of
+ | Datanodes that are allowed to connect to the Namenode and
+ | those that should be decommissioned or recommissioned.
+*-----------------+-----------------------------------------------------------+
+| -finalizeUpgrade| Finalize upgrade of HDFS. Datanodes delete their previous
+ | version working directories, followed by Namenode doing the
+ | same. This completes the upgrade process.
+*-----------------+-----------------------------------------------------------+
+| -upgradeProgress status / details / force | Request current distributed
+ | upgrade status, a detailed status or force the upgrade to
+ | proceed.
+*-----------------+-----------------------------------------------------------+
+| -metasave filename | Save Namenode's primary data structures to <filename> in
+ | the directory specified by hadoop.log.dir property.
+ | <filename> will contain one line for each of the following\
+ | 1. Datanodes heart beating with Namenode\
+ | 2. Blocks waiting to be replicated\
+ | 3. Blocks currently being replicated\
+ | 4. Blocks waiting to be deleted\
+*-----------------+-----------------------------------------------------------+
+| -setQuota <quota> <dirname>...<dirname> | Set the quota <quota> for each
+ | directory <dirname>. The directory quota is a long integer
+ | that puts a hard limit on the number of names in the
+ | directory tree. Best effort for the directory, with faults
+ | reported if \
+ | 1. N is not a positive integer, or \
+ | 2. user is not an administrator, or \
+ | 3. the directory does not exist or is a file, or \
+ | 4. the directory would immediately exceed the new quota. \
+*-----------------+-----------------------------------------------------------+
+| -clrQuota <dirname>...<dirname> | Clear the quota for each directory
+ | <dirname>. Best effort for the directory, with faults
+ | reported if \
+ | 1. the directory does not exist or is a file, or \
+ | 2. user is not an administrator. It does not fault if the
+ | directory has no quota.
+*-----------------+-----------------------------------------------------------+
+| -help [cmd] | Displays help for the given command or all commands if none
+ | is specified.
+*-----------------+-----------------------------------------------------------+
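
The -setQuota fault conditions listed above amount to a short validation routine. A hypothetical sketch (the function name and flag arguments are illustrative, not part of Hadoop):

```python
# Illustrative check mirroring the -setQuota fault conditions above.
def set_quota_faults(quota, is_admin, exists, is_dir, name_count):
    """Return the list of fault messages for a -setQuota style request."""
    faults = []
    if not isinstance(quota, int) or quota <= 0:
        faults.append("quota is not a positive integer")
    if not is_admin:
        faults.append("user is not an administrator")
    if not (exists and is_dir):
        faults.append("directory does not exist or is a file")
    elif isinstance(quota, int) and quota > 0 and name_count > quota:
        faults.append("directory would immediately exceed the new quota")
    return faults
```

The -clrQuota conditions are the subset of these that concern the caller and the directory, with no quota-value check.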
+
+* <<<mradmin>>>
+
+ Runs the MR admin client.
+
+ Usage: <<<hadoop mradmin [ GENERIC_OPTIONS ] [-refreshQueueAcls]>>>
+
+*-------------------+-----------------------------------------------------------+
+|| COMMAND_OPTION || Description
+*-------------------+-----------------------------------------------------------+
+| -refreshQueueAcls | Refresh the queue acls used by hadoop, to check access
+ | during submissions and administration of the job by the
+ | user. The properties present in mapred-queue-acls.xml are
+ | reloaded by the queue manager.
+*-------------------+-----------------------------------------------------------+
+
+* <<<jobtracker>>>
+
+ Runs the MapReduce job Tracker node.
+
+ Usage: <<<hadoop jobtracker [-dumpConfiguration]>>>
+
+*--------------------+-----------------------------------------------------------+
+|| COMMAND_OPTION || Description
+*--------------------+-----------------------------------------------------------+
+| -dumpConfiguration | Dumps the configuration used by the JobTracker, along
+ | with the queue configuration, in JSON format to standard
+ | output, and exits.
+*--------------------+-----------------------------------------------------------+
+
+* <<<namenode>>>
+
+ Runs the namenode. More info about the upgrade, rollback and finalize is
+ at Upgrade Rollback
+
+ Usage: <<<hadoop namenode [-format] | [-upgrade] | [-rollback] | [-finalize] | [-importCheckpoint]>>>
+
+*--------------------+-----------------------------------------------------------+
+|| COMMAND_OPTION || Description
+*--------------------+-----------------------------------------------------------+
+| -format | Formats the namenode. It starts the namenode, formats
+ | it, and then shuts it down.
+*--------------------+-----------------------------------------------------------+
+| -upgrade | The namenode should be started with the upgrade option
+ | after the distribution of a new hadoop version.
+*--------------------+-----------------------------------------------------------+
+| -rollback | Rolls back the namenode to the previous version. This
+ | should be used after stopping the cluster and
+ | distributing the old hadoop version.
+*--------------------+-----------------------------------------------------------+
+| -finalize | Finalize will remove the previous state of the file
+ | system. The recent upgrade will become permanent. The
+ | rollback option will not be available anymore. After
+ | finalization it shuts the namenode down.
+*--------------------+-----------------------------------------------------------+
+| -importCheckpoint | Loads an image from a checkpoint directory and saves
+ | it into the current one. The checkpoint dir is read
+ | from the property fs.checkpoint.dir.
+*--------------------+-----------------------------------------------------------+
+
+* <<<secondarynamenode>>>
+
+ Runs the HDFS secondary namenode. See Secondary Namenode for more
+ info.
+
+ Usage: <<<hadoop secondarynamenode [-checkpoint [force]] | [-geteditsize]>>>
+
+*----------------------+-----------------------------------------------------------+
+|| COMMAND_OPTION || Description
+*----------------------+-----------------------------------------------------------+
+| -checkpoint [-force] | Checkpoints the Secondary namenode if EditLog size
+ | >= fs.checkpoint.size. If <<<-force>>> is used,
+ | checkpoint irrespective of EditLog size.
+*----------------------+-----------------------------------------------------------+
+| -geteditsize | Prints the EditLog size.
+*----------------------+-----------------------------------------------------------+
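
The -checkpoint trigger reduces to a simple predicate. In the sketch below, the 64 MB default for fs.checkpoint.size is an assumption chosen for illustration:

```python
# Checkpoint when the EditLog reaches fs.checkpoint.size, or when forced.
def should_checkpoint(edit_log_size, checkpoint_size=64 * 1024 * 1024,
                      force=False):
    # Sizes in bytes; force corresponds to "-checkpoint force".
    return force or edit_log_size >= checkpoint_size
```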
+
+* <<<tasktracker>>>
+
+ Runs a MapReduce task Tracker node.
+
+ Usage: <<<hadoop tasktracker>>>
Added: hadoop/common/trunk/hadoop-common-project/hadoop-common/src/site/apt/FileSystemShell.apt.vm
URL: http://svn.apache.org/viewvc/hadoop/common/trunk/hadoop-common-project/hadoop-common/src/site/apt/FileSystemShell.apt.vm?rev=1424459&view=auto
==============================================================================
--- hadoop/common/trunk/hadoop-common-project/hadoop-common/src/site/apt/FileSystemShell.apt.vm (added)
+++ hadoop/common/trunk/hadoop-common-project/hadoop-common/src/site/apt/FileSystemShell.apt.vm Thu Dec 20 13:41:43 2012
@@ -0,0 +1,418 @@
+~~ Licensed to the Apache Software Foundation (ASF) under one or more
+~~ contributor license agreements. See the NOTICE file distributed with
+~~ this work for additional information regarding copyright ownership.
+~~ The ASF licenses this file to You under the Apache License, Version 2.0
+~~ (the "License"); you may not use this file except in compliance with
+~~ the License. You may obtain a copy of the License at
+~~
+~~ http://www.apache.org/licenses/LICENSE-2.0
+~~
+~~ Unless required by applicable law or agreed to in writing, software
+~~ distributed under the License is distributed on an "AS IS" BASIS,
+~~ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+~~ See the License for the specific language governing permissions and
+~~ limitations under the License.
+
+ ---
+ File System Shell Guide
+ ---
+ ---
+ ${maven.build.timestamp}
+
+%{toc}
+
+Overview
+
+ The File System (FS) shell includes various shell-like commands that
+ directly interact with the Hadoop Distributed File System (HDFS) as well as
+ other file systems that Hadoop supports, such as Local FS, HFTP FS, S3 FS,
+ and others. The FS shell is invoked by:
+
++---
+bin/hadoop fs <args>
++---
+
+ All FS shell commands take path URIs as arguments. The URI format is
+ <<<scheme://authority/path>>>. For HDFS the scheme is <<<hdfs>>>, and for
+ the Local FS the scheme is <<<file>>>. The scheme and authority are
+ optional. If not specified, the default scheme specified in the
+ configuration is used. An HDFS file or directory such as /parent/child can
+ be specified as <<<hdfs://namenodehost/parent/child>>> or simply as
+ <<</parent/child>>> (given that your configuration is set to point to
+ <<<hdfs://namenodehost>>>).
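
The fallback to the configured default filesystem can be sketched with urllib.parse; the default authority below is the hypothetical namenodehost from the example above, standing in for the configured value:

```python
# Illustrative path resolution, not Hadoop's implementation.
from urllib.parse import urlparse

DEFAULT_FS = "hdfs://namenodehost"   # stand-in for the configured default

def resolve(path):
    # Fully qualified URIs pass through; bare paths get the default
    # scheme and authority from the configuration.
    if urlparse(path).scheme:
        return path
    return DEFAULT_FS + path

full = resolve("/parent/child")
qualified = resolve("hdfs://other/parent/child")
```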
+
+ Most of the commands in FS shell behave like corresponding Unix commands.
+ Differences are described with each of the commands. Error information is
+ sent to stderr and the output is sent to stdout.
+
+cat
+
+ Usage: <<<hdfs dfs -cat URI [URI ...]>>>
+
+ Copies source paths to stdout.
+
+ Example:
+
+ * <<<hdfs dfs -cat hdfs://nn1.example.com/file1 hdfs://nn2.example.com/file2>>>
+
+ * <<<hdfs dfs -cat file:///file3 /user/hadoop/file4>>>
+
+ Exit Code:
+
+ Returns 0 on success and -1 on error.
+
+chgrp
+
+ Usage: <<<hdfs dfs -chgrp [-R] GROUP URI [URI ...]>>>
+
+ Change group association of files. With -R, make the change recursively
+ through the directory structure. The user must be the owner of files, or
+ else a super-user. Additional information is in the
+ {{{betterurl}Permissions Guide}}.
+
+chmod
+
+ Usage: <<<hdfs dfs -chmod [-R] <MODE[,MODE]... | OCTALMODE> URI [URI ...]>>>
+
+ Change the permissions of files. With -R, make the change recursively
+ through the directory structure. The user must be the owner of the file, or
+ else a super-user. Additional information is in the
+ {{{betterurl}Permissions Guide}}.
+
+chown
+
+ Usage: <<<hdfs dfs -chown [-R] [OWNER][:[GROUP]] URI [URI ]>>>
+
+ Change the owner of files. With -R, make the change recursively through the
+ directory structure. The user must be a super-user. Additional information
+ is in the {{{betterurl}Permissions Guide}}.
+
+copyFromLocal
+
+ Usage: <<<hdfs dfs -copyFromLocal <localsrc> URI>>>
+
+ Similar to the put command, except that the source is restricted to a
+ local file reference.
+
+copyToLocal
+
+ Usage: <<<hdfs dfs -copyToLocal [-ignorecrc] [-crc] URI <localdst> >>>
+
+ Similar to the get command, except that the destination is restricted to
+ a local file reference.
+
+count
+
+ Usage: <<<hdfs dfs -count [-q] <paths> >>>
+
+ Count the number of directories, files and bytes under the paths that match
+ the specified file pattern. The output columns with -count are: DIR_COUNT,
+ FILE_COUNT, CONTENT_SIZE, FILE_NAME
+
+ The output columns with -count -q are: QUOTA, REMAINING_QUOTA, SPACE_QUOTA,
+ REMAINING_SPACE_QUOTA, DIR_COUNT, FILE_COUNT, CONTENT_SIZE, FILE_NAME
+
+ Example:
+
+ * <<<hdfs dfs -count hdfs://nn1.example.com/file1 hdfs://nn2.example.com/file2>>>
+
+ * <<<hdfs dfs -count -q hdfs://nn1.example.com/file1>>>
+
+ Exit Code:
+
+ Returns 0 on success and -1 on error.
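
A local-filesystem analogue of those columns, using os.walk in place of an HDFS listing; this sketch assumes DIR_COUNT includes the starting directory itself:

```python
import os
import tempfile

def count(path):
    # DIR_COUNT, FILE_COUNT, CONTENT_SIZE, FILE_NAME as described above
    dirs, files, size = 1, 0, 0   # the starting path counts as a directory
    for root, dirnames, filenames in os.walk(path):
        dirs += len(dirnames)
        files += len(filenames)
        size += sum(os.path.getsize(os.path.join(root, f)) for f in filenames)
    return dirs, files, size, path

# Tiny demonstration tree: one 2-byte file and one empty subdirectory
tmp = tempfile.mkdtemp()
with open(os.path.join(tmp, "a.txt"), "w") as f:
    f.write("hi")
os.mkdir(os.path.join(tmp, "sub"))
dir_count, file_count, content_size, file_name = count(tmp)
```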
+
+cp
+
+ Usage: <<<hdfs dfs -cp URI [URI ...] <dest> >>>
+
+ Copy files from source to destination. This command allows multiple sources
+ as well in which case the destination must be a directory.
+
+ Example:
+
+ * <<<hdfs dfs -cp /user/hadoop/file1 /user/hadoop/file2>>>
+
+ * <<<hdfs dfs -cp /user/hadoop/file1 /user/hadoop/file2 /user/hadoop/dir>>>
+
+ Exit Code:
+
+ Returns 0 on success and -1 on error.
+
+du
+
+ Usage: <<<hdfs dfs -du [-s] [-h] URI [URI ...]>>>
+
+ Displays sizes of files and directories contained in the given directory,
+ or the length of a file in case it's just a file.
+
+ Options:
+
+ * The -s option will result in an aggregate summary of file lengths being
+ displayed, rather than the individual files.
+
+ * The -h option will format file sizes in a "human-readable" fashion (e.g.,
+ 64.0m instead of 67108864).
+
+ Example:
+
+ * <<<hdfs dfs -du /user/hadoop/dir1 /user/hadoop/file1 hdfs://nn.example.com/user/hadoop/dir1>>>
+
+ Exit Code:
+
+ Returns 0 on success and -1 on error.
+
+dus
+
+ Usage: <<<hdfs dfs -dus <args> >>>
+
+ Displays a summary of file lengths. This is an alternate form of
+ <<<hdfs dfs -du -s>>>.
+
+expunge
+
+ Usage: <<<hdfs dfs -expunge>>>
+
+ Empty the Trash. Refer to the {{{betterurl}HDFS Architecture Guide}} for
+ more information on the Trash feature.
+
+get
+
+ Usage: <<<hdfs dfs -get [-ignorecrc] [-crc] <src> <localdst> >>>
+
+ Copy files to the local file system. Files that fail the CRC check may be
+ copied with the -ignorecrc option. Files and CRCs may be copied using the
+ -crc option.
+
+ Example:
+
+ * <<<hdfs dfs -get /user/hadoop/file localfile>>>
+
+ * <<<hdfs dfs -get hdfs://nn.example.com/user/hadoop/file localfile>>>
+
+ Exit Code:
+
+ Returns 0 on success and -1 on error.
+
+getmerge
+
+ Usage: <<<hdfs dfs -getmerge <src> <localdst> [addnl]>>>
+
+ Takes a source directory and a destination file as input and concatenates
+ files in src into the destination local file. Optionally addnl can be set
+ to enable adding a newline character at the end of each file.
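
A local sketch of that concatenation, taking an explicit file list rather than scanning a source directory, for brevity:

```python
import os
import tempfile

def getmerge(sources, dst, addnl=False):
    # Concatenate each source file into dst, optionally adding a trailing
    # newline after each file when addnl is set.
    with open(dst, "wb") as out:
        for src in sources:
            with open(src, "rb") as f:
                out.write(f.read())
            if addnl:
                out.write(b"\n")

# Demonstration on two small local files
tmp = tempfile.mkdtemp()
a, b, merged = (os.path.join(tmp, n) for n in ("a", "b", "merged"))
with open(a, "w") as f:
    f.write("one")
with open(b, "w") as f:
    f.write("two")
getmerge([a, b], merged, addnl=True)
with open(merged, "rb") as f:
    result = f.read()
```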
+
+ls
+
+ Usage: <<<hdfs dfs -ls <args> >>>
+
+ For a file, returns stat on the file with the following format:
+
++---+
+permissions number_of_replicas userid groupid filesize modification_date modification_time filename
++---+
+
+ For a directory it returns the list of its direct children, as in Unix. A
+ directory is listed as:
+
++---+
+permissions userid groupid modification_date modification_time dirname
++---+
+
+ Example:
+
+ * <<<hdfs dfs -ls /user/hadoop/file1>>>
+
+ Exit Code:
+
+ Returns 0 on success and -1 on error.
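
The file stat line shown above can be assembled from its fields as follows; this is purely illustrative formatting with made-up sample values, not HDFS's Java implementation:

```python
def format_file_line(permissions, replicas, userid, groupid, filesize,
                     mdate, mtime, filename):
    # Field order matches the ls file output format documented above
    return "%s %d %s %s %d %s %s %s" % (
        permissions, replicas, userid, groupid, filesize,
        mdate, mtime, filename)

line = format_file_line("-rw-r--r--", 3, "hadoop", "supergroup",
                        1366, "2012-12-20", "13:41", "/user/hadoop/file1")
```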
+
+lsr
+
+ Usage: <<<hdfs dfs -lsr <args> >>>
+
+ Recursive version of ls. Similar to Unix ls -R.
+
+mkdir
+
+ Usage: <<<hdfs dfs -mkdir [-p] <paths> >>>
+
+ Takes path URIs as argument and creates directories. With -p the behavior
+ is much like Unix mkdir -p, creating parent directories along the path.
+
+ Example:
+
+ * <<<hdfs dfs -mkdir /user/hadoop/dir1 /user/hadoop/dir2>>>
+
+ * <<<hdfs dfs -mkdir hdfs://nn1.example.com/user/hadoop/dir hdfs://nn2.example.com/user/hadoop/dir>>>
+
+ Exit Code:
+
+ Returns 0 on success and -1 on error.
+
+moveFromLocal
+
+ Usage: <<<hdfs dfs -moveFromLocal <localsrc> <dst> >>>
+
+ Similar to the put command, except that the source localsrc is deleted
+ after it's copied.
+
+moveToLocal
+
+ Usage: <<<hdfs dfs -moveToLocal [-crc] <src> <dst> >>>
+
+ Displays a "Not implemented yet" message.
+
+mv
+
+ Usage: <<<hdfs dfs -mv URI [URI ...] <dest> >>>
+
+ Moves files from source to destination. This command also allows multiple
+ sources, in which case the destination needs to be a directory. Moving files
+ across file systems is not permitted.
+
+ Example:
+
+ * <<<hdfs dfs -mv /user/hadoop/file1 /user/hadoop/file2>>>
+
+ * <<<hdfs dfs -mv hdfs://nn.example.com/file1 hdfs://nn.example.com/file2 hdfs://nn.example.com/file3 hdfs://nn.example.com/dir1>>>
+
+ Exit Code:
+
+ Returns 0 on success and -1 on error.
+
+put
+
+ Usage: <<<hdfs dfs -put <localsrc> ... <dst> >>>
+
+ Copies a single src, or multiple srcs, from the local file system to the
+ destination file system. Also reads input from stdin and writes it to the
+ destination file system.
+
+ Example:
+
+ * <<<hdfs dfs -put localfile /user/hadoop/hadoopfile>>>
+
+ * <<<hdfs dfs -put localfile1 localfile2 /user/hadoop/hadoopdir>>>
+
+ * <<<hdfs dfs -put localfile hdfs://nn.example.com/hadoop/hadoopfile>>>
+
+ * <<<hdfs dfs -put - hdfs://nn.example.com/hadoop/hadoopfile>>>
+ Reads the input from stdin.
+
+ Exit Code:
+
+ Returns 0 on success and -1 on error.
+
+rm
+
+ Usage: <<<hdfs dfs -rm [-skipTrash] URI [URI ...]>>>
+
+ Delete files specified as args. Only deletes files and empty directories.
+ If the -skipTrash option is specified, the trash, if enabled, will be
+ bypassed and the specified file(s) deleted immediately. This can be useful
+ when it is necessary to delete files from an over-quota directory. Refer to
+ rmr for recursive deletes.
+
+ Example:
+
+ * <<<hdfs dfs -rm hdfs://nn.example.com/file /user/hadoop/emptydir>>>
+
+ Exit Code:
+
+ Returns 0 on success and -1 on error.
+
+rmr
+
+ Usage: <<<hdfs dfs -rmr [-skipTrash] URI [URI ...]>>>
+
+ Recursive version of delete. If the -skipTrash option is specified, the
+ trash, if enabled, will be bypassed and the specified file(s) deleted
+ immediately. This can be useful when it is necessary to delete files from an
+ over-quota directory.
+
+ Example:
+
+ * <<<hdfs dfs -rmr /user/hadoop/dir>>>
+
+ * <<<hdfs dfs -rmr hdfs://nn.example.com/user/hadoop/dir>>>
+
+ Exit Code:
+
+ Returns 0 on success and -1 on error.
+
+setrep
+
+ Usage: <<<hdfs dfs -setrep [-R] [-w] <rep> <path> >>>
+
+ Changes the replication factor of a file to <rep>. The -R option recursively
+ changes the replication factor of all files within a directory. The -w flag
+ requests that the command wait for the replication to complete.
+
+ Example:
+
+ * <<<hdfs dfs -setrep -w 3 -R /user/hadoop/dir1>>>
+
+ Exit Code:
+
+ Returns 0 on success and -1 on error.
+
+stat
+
+ Usage: <<<hdfs dfs -stat URI [URI ...]>>>
+
+ Returns the stat information on the path.
+
+ Example:
+
+ * <<<hdfs dfs -stat path>>>
+
+ Exit Code:
+
+ Returns 0 on success and -1 on error.
+
+tail
+
+ Usage: <<<hdfs dfs -tail [-f] URI>>>
+
+ Displays the last kilobyte of the file to stdout. The -f option can be used
+ as in Unix.
+
+ Example:
+
+ * <<<hdfs dfs -tail pathname>>>
+
+ Exit Code:
+
+ Returns 0 on success and -1 on error.
+
+test
+
+ Usage: <<<hdfs dfs -test -[ezd] URI>>>
+
+ Options:
+
+*----+------------+
+| -e | check to see if the file exists. Return 0 if true.
+*----+------------+
+| -z | check to see if the file is zero length. Return 0 if true.
+*----+------------+
+| -d | check to see if the path is a directory. Return 0 if true.
+*----+------------+
+
+ Example:
+
+ * <<<hdfs dfs -test -e filename>>>
+
+text
+
+ Usage: <<<hdfs dfs -text <src> >>>
+
+ Takes a source file and outputs the file in text format. The allowed formats
+ are zip and TextRecordInputStream.
+
+touchz
+
+ Usage: <<<hdfs dfs -touchz URI [URI ...]>>>
+
+ Create a file of zero length.
+
+ Example:
+
+ * <<<hdfs dfs -touchz pathname>>>
+
+ Exit Code:
+
+ Returns 0 on success and -1 on error.
Added: hadoop/common/trunk/hadoop-common-project/hadoop-common/src/site/apt/HttpAuthentication.apt.vm
URL: http://svn.apache.org/viewvc/hadoop/common/trunk/hadoop-common-project/hadoop-common/src/site/apt/HttpAuthentication.apt.vm?rev=1424459&view=auto
==============================================================================
--- hadoop/common/trunk/hadoop-common-project/hadoop-common/src/site/apt/HttpAuthentication.apt.vm (added)
+++ hadoop/common/trunk/hadoop-common-project/hadoop-common/src/site/apt/HttpAuthentication.apt.vm Thu Dec 20 13:41:43 2012
@@ -0,0 +1,99 @@
+~~ Licensed under the Apache License, Version 2.0 (the "License");
+~~ you may not use this file except in compliance with the License.
+~~ You may obtain a copy of the License at
+~~
+~~ http://www.apache.org/licenses/LICENSE-2.0
+~~
+~~ Unless required by applicable law or agreed to in writing, software
+~~ distributed under the License is distributed on an "AS IS" BASIS,
+~~ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+~~ See the License for the specific language governing permissions and
+~~ limitations under the License. See accompanying LICENSE file.
+
+ ---
+ Authentication for Hadoop HTTP web-consoles
+ ---
+ ---
+ ${maven.build.timestamp}
+
+Authentication for Hadoop HTTP web-consoles
+
+%{toc|section=1|fromDepth=0}
+
+* Introduction
+
+ This document describes how to configure Hadoop HTTP web-consoles to
+ require user authentication.
+
+ By default Hadoop HTTP web-consoles (JobTracker, NameNode, TaskTrackers
+ and DataNodes) allow access without any form of authentication.
+
+ Similar to Hadoop RPC, Hadoop HTTP web-consoles can be configured to
+ require Kerberos authentication using the HTTP SPNEGO protocol (supported
+ by browsers like Firefox and Internet Explorer).
+
+ In addition, Hadoop HTTP web-consoles support the equivalent of
+ Hadoop's Pseudo/Simple authentication. If this option is enabled, users
+ must specify their user name in the first browser interaction using the
+ user.name query string parameter. For example:
+ <<<http://localhost:50030/jobtracker.jsp?user.name=babu>>>.
+
+ If a custom authentication mechanism is required for the HTTP
+ web-consoles, it is possible to implement a plugin to support the
+ alternate authentication mechanism (refer to Hadoop hadoop-auth for details
+ on writing an <<<AuthenticationHandler>>>).
+
+ The next section describes how to configure Hadoop HTTP web-consoles to
+ require user authentication.
+
+* Configuration
+
+ The following properties should be in the <<<core-site.xml>>> of all the
+ nodes in the cluster.
+
+ <<<hadoop.http.filter.initializers>>>: add to this property the
+ <<<org.apache.hadoop.security.AuthenticationFilterInitializer>>> initializer
+ class.
+
+ <<<hadoop.http.authentication.type>>>: Defines authentication used for the
+ HTTP web-consoles. The supported values are: <<<simple>>> | <<<kerberos>>> |
+ <<<#AUTHENTICATION_HANDLER_CLASSNAME#>>>. The default value is <<<simple>>>.
+
+ <<<hadoop.http.authentication.token.validity>>>: Indicates how long (in
+ seconds) an authentication token is valid before it has to be renewed.
+ The default value is <<<36000>>>.
+
+ <<<hadoop.http.authentication.signature.secret.file>>>: The signature secret
+ file for signing the authentication tokens. If not set a random secret is
+ generated at startup time. The same secret should be used for all nodes
+ in the cluster: JobTracker, NameNode, DataNode and TaskTracker. The
+ default value is <<<${user.home}/hadoop-http-auth-signature-secret>>>.
+ IMPORTANT: This file should be readable only by the Unix user running the
+ daemons.
+
+ <<<hadoop.http.authentication.cookie.domain>>>: The domain to use for the
+ HTTP cookie that stores the authentication token. In order for
+ authentication to work correctly across all nodes in the cluster, the
+ domain must be correctly set. There is no default value; in that case the
+ HTTP cookie will not have a domain and will work only with the hostname
+ issuing the HTTP cookie.
+
+ IMPORTANT: when using IP addresses, browsers ignore cookies with domain
+ settings. For this setting to work properly, all nodes in the cluster
+ must be configured to generate URLs with <<<hostname.domain>>> names in them.
+
+ <<<hadoop.http.authentication.simple.anonymous.allowed>>>: Indicates if
+ anonymous requests are allowed when using 'simple' authentication. The
+ default value is <<<true>>>.
+
+ <<<hadoop.http.authentication.kerberos.principal>>>: Indicates the Kerberos
+ principal to be used for HTTP endpoint when using 'kerberos'
+ authentication. The principal short name must be <<<HTTP>>> per Kerberos HTTP
+ SPNEGO specification. The default value is <<<HTTP/_HOST@$LOCALHOST>>>,
+ where <<<_HOST>>>, if present, is replaced with the bind address of the
+ HTTP server.
+
+ <<<hadoop.http.authentication.kerberos.keytab>>>: Location of the keytab file
+ with the credentials for the Kerberos principal used for the HTTP
+ endpoint. The default value is <<<${user.home}/hadoop.keytab>>>.
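Taken together, the properties above might appear in core-site.xml as in the following sketch for Kerberos (SPNEGO) authentication; the secret file path, cookie domain, realm and keytab location are illustrative and must be adapted to the cluster:

```xml
<!-- Illustrative core-site.xml fragment enabling Kerberos (SPNEGO)
     authentication for the Hadoop HTTP web-consoles. -->
<property>
  <name>hadoop.http.filter.initializers</name>
  <value>org.apache.hadoop.security.AuthenticationFilterInitializer</value>
</property>
<property>
  <name>hadoop.http.authentication.type</name>
  <value>kerberos</value>
</property>
<property>
  <name>hadoop.http.authentication.token.validity</name>
  <value>36000</value>
</property>
<property>
  <name>hadoop.http.authentication.signature.secret.file</name>
  <value>/etc/hadoop/hadoop-http-auth-signature-secret</value>
</property>
<property>
  <name>hadoop.http.authentication.cookie.domain</name>
  <value>example.com</value>
</property>
<property>
  <name>hadoop.http.authentication.kerberos.principal</name>
  <value>HTTP/_HOST@EXAMPLE.COM</value>
</property>
<property>
  <name>hadoop.http.authentication.kerberos.keytab</name>
  <value>/etc/hadoop/hadoop.keytab</value>
</property>
```

The same fragment, with <<<type>>> set to <<<simple>>> and the Kerberos properties omitted, enables Pseudo/Simple authentication instead.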
+