Posted to common-user@hadoop.apache.org by Varad Meru <me...@gmail.com> on 2012/09/23 09:54:40 UTC

Passing Command-line Parameters to the Job Submit Command

Hi,

I want to run the PiEstimator example using the following command:

$hadoop job -submit pieestimatorconf.xml

which contains all the info required by Hadoop to run the job, e.g. the
input file location, the output file location and other details:

<property><name>mapred.jar</name><value>file:////Users/varadmeru/Work/Hadoop/hadoop-examples-1.0.3.jar</value></property>
<property><name>mapred.map.tasks</name><value>20</value></property>
<property><name>mapred.reduce.tasks</name><value>2</value></property>
...
<property><name>mapred.job.name</name><value>PiEstimator</value></property>
<property><name>mapred.output.dir</name><value>file:////Users/varadmeru/Work/out</value></property>

Now, as we know, the PiEstimator can also be run with the following command:

$hadoop jar hadoop-examples-1.0.3.jar pi 5 10

where 5 and 10 are the arguments to the main class of PiEstimator. How
can I pass the same arguments (5 and 10) using the job -submit command,
through the conf. file or any other way, without changing the code of the
examples to use environment variables?

Thanks in advance,
Varad

-----------------
Varad Meru
Software Engineer,
Business Intelligence and Analytics,
Persistent Systems and Solutions Ltd.,
Pune, India.

Re: Passing Command-line Parameters to the Job Submit Command

Posted by Varad Meru <me...@gmail.com>.
Thanks Hemanth,

Yes, I meant the Java variables passed as -Dkey=value. But for the arguments passed to the main method (i.e. String[] args) I cannot find any way to pass them apart from hadoop jar CLASSNAME arguments. So if I have a job file, I will necessarily have to use the Java variables, and not the command-line arguments.

Thanks,
Varad


Re: Passing Command-line Parameters to the Job Submit Command

Posted by Mohit Anchlia <mo...@gmail.com>.
You could always write your own properties file and read it as a resource.
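A minimal sketch of that idea in plain Java. The file name pi-job.properties and the keys pi.maps / pi.samples are purely hypothetical, not anything PiEstimator actually reads:

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Sketch of the properties-file approach: bundle a .properties file
// into the job jar and parse it at runtime. The keys below are
// hypothetical; PiEstimator does not actually read them.
public class JobParams {

    // Parse properties text into the two PiEstimator-style arguments,
    // falling back to defaults when a key is absent.
    static int[] parse(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) { // cannot happen for an in-memory reader
            throw new UncheckedIOException(e);
        }
        int maps = Integer.parseInt(props.getProperty("pi.maps", "5"));
        int samples = Integer.parseInt(props.getProperty("pi.samples", "10"));
        return new int[] { maps, samples };
    }

    public static void main(String[] args) {
        // In a real job the text would come from the classpath, e.g.
        // JobParams.class.getResourceAsStream("/pi-job.properties").
        int[] p = parse("pi.maps=5\npi.samples=10");
        System.out.println("maps=" + p[0] + ", samples=" + p[1]);
    }
}
```

The file itself would be packaged into the job jar so it travels with the code, which sidesteps the job.xml question entirely.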


Re: Passing Command-line Parameters to the Job Submit Command

Posted by Bertrand Dechoux <de...@gmail.com>.
Building on Hemanth's answer: in the end your variables should be in the
job.xml (the second file needed, along with the jar, to run a job). This
job.xml can be built in various ways; it inherits from your local
configuration and you can change it through the Java API, but in the end it
is only an XML file, so your hands are not tied.

I know there is a job file that you can provide with the shell command:
http://hadoop.apache.org/docs/r1.0.3/commands_manual.html#job

But I haven't used it yet, so I can't tell you more about this option.

Regards

Bertrand
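Since job.xml is plain XML, property entries can be produced or patched with ordinary string or XML tooling rather than the Hadoop Configuration API. A toy sketch of that point (the property names are just examples, and no XML escaping is done, which real values may need):

```java
// Toy illustration of the point that job.xml is "only an xml file":
// a <property> entry can be produced by any means, not just the
// Hadoop Configuration API. Real job.xml files hold many such
// entries, and real values may need XML escaping.
public class JobXml {

    // Render one <property> element in the job.xml format.
    static String property(String name, String value) {
        return "<property><name>" + name
                + "</name><value>" + value + "</value></property>";
    }

    public static void main(String[] args) {
        System.out.println(property("mapred.reduce.tasks", "2"));
        System.out.println(property("mapred.job.name", "PiEstimator"));
    }
}
```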


-- 
Bertrand Dechoux

Re: Passing Command-line Parameters to the Job Submit Command

Posted by Hemanth Yamijala <yh...@gmail.com>.
By Java environment variables, do you mean the ones passed as
-Dkey=value? That's one way of passing them. I suppose another way is
to have a client-side site configuration (like mapred-site.xml) that
is in the classpath of the client app.

Thanks
Hemanth
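For tools that go through ToolRunner/GenericOptionsParser, each -Dkey=value argument is folded into the job Configuration before the tool's own arguments are seen. A stdlib-only sketch of that folding step, not Hadoop's actual parser (which, among other things, also accepts a space between -D and the pair):

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model of how -Dkey=value options are separated from the
// remaining program arguments, in the spirit of Hadoop's
// GenericOptionsParser. Not the real implementation.
public class DashD {

    final Map<String, String> conf = new HashMap<>();
    final List<String> remaining = new ArrayList<>();

    DashD(String[] args) {
        for (String arg : args) {
            int eq = arg.indexOf('=');
            if (arg.startsWith("-D") && eq > 2) {
                // "-Dkey=value" -> conf entry {key: value}
                conf.put(arg.substring(2, eq), arg.substring(eq + 1));
            } else {
                remaining.add(arg); // left for the tool's main logic
            }
        }
    }

    public static void main(String[] args) {
        DashD d = new DashD(new String[] {
                "-Dmapred.reduce.tasks=2", "pi", "5", "10" });
        System.out.println(d.conf.get("mapred.reduce.tasks")); // "2"
        System.out.println(d.remaining);                       // [pi, 5, 10]
    }
}
```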


Re: Passing Command-line Parameters to the Job Submit Command

Posted by Varad Meru <me...@gmail.com>.
Thanks Hemanth,

But in general, if we want to pass arguments to any job (not only
PiEstimator from the examples jar) and submit it to the job queue
scheduler, by the looks of it, we would always need to use the Java
environment variables.

Is my above assumption correct?

Thanks,
Varad


Re: Passing Command-line Parameters to the Job Submit Command

Posted by Hemanth Yamijala <yh...@gmail.com>.
Varad,

Looking at the code for the PiEstimator class, which implements the
'pi' example, the two arguments are mandatory and are used *before*
the job is submitted for execution, i.e. on the client side. In
particular, one of them (nSamples) is used not by the MapReduce job,
but by the client code (i.e. PiEstimator) to generate some input.

Hence, I believe all of this additional work done by the PiEstimator
class will be bypassed if we directly use the job -submit command.
In other words, I don't think these two ways of running the job:

- using "hadoop jar examples pi"
- using hadoop job -submit

are equivalent.

As a general answer to your question though, if additional parameters
are used by the mappers or reducers, they will generally be set as
additional job-specific configuration items. So, one way of using them
with the job -submit command is to find out the specific names of the
configuration items (from code, or other documentation) and include
them in the job.xml used when submitting the job.

Thanks
Hemanth
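To make that concrete: a job-specific item would sit in job.xml alongside the standard properties. The names below are purely hypothetical illustrations, not keys PiEstimator actually defines (which is exactly why such names must first be dug out of the code or documentation):

```xml
<!-- Hypothetical job-specific parameters in job.xml; a mapper would
     read them via context.getConfiguration().get("example.pi.samples") -->
<property><name>example.pi.maps</name><value>5</value></property>
<property><name>example.pi.samples</name><value>10</value></property>
```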
