Posted to common-user@hadoop.apache.org by "W.P. McNeill" <bi...@gmail.com> on 2011/01/26 19:53:04 UTC

Is there a standard command line convention for input and output directories

I like to pass positional arguments to Hadoop processes where all but the
last argument is an input directory and the last argument is an output
directory.  It seems like there are a couple ways to integrate this with
specifying these directories with -D mapred.input.dir and -D
mapred.output.dir.  Is there an accepted standard way to specify Hadoop
input and output directories?

Re: Is there a standard command line convention for input and output directories

Posted by Harsh J <qw...@gmail.com>.
Hadoop Streaming has -input and -output options for specifying input
and output directories or file patterns, but I believe
GenericOptionsParser (which plain Java MapReduce programs use via
ToolRunner) does not, so far, support those two options by name.

Instead, you will have to specify them with -D, and make sure the
property you're setting is the right one for the release you're on.
[For example, mapred.input.dir is deprecated in favor of
mapreduce.input.fileinputformat.inputdir in newer versions.]
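For example, an invocation might look something like this (the jar and
class names below are placeholders, not from your job):

```shell
# Sketch only: myjob.jar and com.example.MyJob are placeholder names.
# GenericOptionsParser consumes the -D options before run() sees the
# remaining args, so the job picks the directories up from the config.
hadoop jar myjob.jar com.example.MyJob \
    -D mapreduce.input.fileinputformat.inputdir=/user/me/input \
    -D mapreduce.output.fileoutputformat.outputdir=/user/me/output
```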

Or you could write an additional options parser of your own and let it
handle such arguments (-input/-output, like Streaming has) after
ToolRunner is done parsing the options it accepts.
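A minimal sketch of such a parser, applied to whatever args remain
after ToolRunner has consumed its own options. The class and field
names here are hypothetical, not part of any Hadoop API:

```java
// Hypothetical -input/-output parser for the leftover args that
// ToolRunner passes to run(String[] args). No Hadoop dependencies.
import java.util.ArrayList;
import java.util.List;

public class IoArgs {
    public final List<String> inputs = new ArrayList<>();
    public String output;

    public static IoArgs parse(String[] args) {
        IoArgs parsed = new IoArgs();
        for (int i = 0; i < args.length; i++) {
            if ("-input".equals(args[i]) && i + 1 < args.length) {
                parsed.inputs.add(args[++i]);   // -input may repeat
            } else if ("-output".equals(args[i]) && i + 1 < args.length) {
                parsed.output = args[++i];      // last -output wins
            } else {
                throw new IllegalArgumentException("Unknown arg: " + args[i]);
            }
        }
        return parsed;
    }
}
```

Inside run() you would then hand parsed.inputs to
FileInputFormat.addInputPath and parsed.output to
FileOutputFormat.setOutputPath, in whichever order your convention
dictates.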

On Thu, Jan 27, 2011 at 12:23 AM, W.P. McNeill <bi...@gmail.com> wrote:
> I like to pass positional arguments to Hadoop processes where all but the
> last argument is an input directory and the last argument is an output
> directory.  It seems like there are a couple ways to integrate this with
> specifying these directories with -D mapred.input.dir and -D
> mapred.output.dir.  Is there an accepted standard way to specify Hadoop
> input and output directories?
>



-- 
Harsh J
www.harshj.com