Posted to common-user@hadoop.apache.org by Roshan James <ro...@gmail.com> on 2009/06/19 01:28:39 UTC

Can a hadoop pipes job be given multiple input directories?

The documentation for Hadoop Streaming says that the "-input" option
can be specified multiple times for multiple input directories. The same
does not seem to work with Pipes.
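For comparison, this is the kind of Streaming invocation that works with repeated -input flags (paths and the trivial cat/wc mapper and reducer here are just placeholders):

```shell
# Hadoop Streaming accepts -input more than once; all listed
# directories are combined into the job's input path list.
bin/hadoop jar hadoop-streaming.jar \
  -input /in-dir-1 \
  -input /in-dir-2 \
  -output /out-dir \
  -mapper /bin/cat \
  -reducer /usr/bin/wc
```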

Is there some way to specify multiple input directories for pipes jobs?
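One possible workaround, assuming Pipes honors the underlying mapred.input.dir property (which FileInputFormat stores as a comma-separated list of paths), would be to set it via the generic -D option instead of -input:

```shell
# Sketch, not verified against Pipes: pass both directories as a
# comma-separated value of mapred.input.dir through the generic
# -D option rather than repeating -input.
bin/hadoop pipes \
  -conf pipes.xml \
  -D mapred.input.dir=/in-dir-har/test.har,/in-dir \
  -output /out-dir
```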

Roshan

P.S. With multiple input dirs this is what happens, i.e. the usage text
is printed and there is no clear error message of any sort:

+ bin/hadoop pipes -conf pipes.xml -input /in-dir-har/test.har -input /in-dir -output /out-dir
bin/hadoop pipes
  [-input <path>] // Input directory
  [-output <path>] // Output directory
  [-jar <jar file> // jar filename
  [-inputformat <class>] // InputFormat class
  [-map <class>] // Java Map class
  [-partitioner <class>] // Java Partitioner
  [-reduce <class>] // Java Reduce class
  [-writer <class>] // Java RecordWriter
  [-program <executable>] // executable URI
  [-reduces <num>] // number of reduces

Generic options supported are
-conf <configuration file>     specify an application configuration file
-D <property=value>            use value for given property
-fs <local|namenode:port>      specify a namenode
-jt <local|jobtracker:port>    specify a job tracker
-files <comma separated list of files>    specify comma separated files to be copied to the map reduce cluster
-libjars <comma separated list of jars>    specify comma separated jar files to include in the classpath.
-archives <comma separated list of archives>    specify comma separated archives to be unarchived on the compute machines.

The general command line syntax is
bin/hadoop command [genericOptions] [commandOptions]