Posted to common-user@hadoop.apache.org by Devajyoti Sarkar <ds...@q-kk.com> on 2010/07/29 10:55:52 UTC

Parameters that can be set per job

Hi,

Is there a list of configuration parameters that can be set per job?
Specifically, can one set:

- mapred.tasktracker.map.tasks.maximum
- mapred.tasktracker.reduce.tasks.maximum
- mapred.map.multithreadedrunner.threads
- mapred.child.java.opts
- mapred.task.timeout

Also, I am trying to migrate from 0.18.3 to 0.20.2. In the new API, JobConf
is deprecated. How does one set the per-job configuration parameters that
were available in JobConf (e.g. mapred.map.max.attempts)?

I guess there must be documentation on this, but I could not find it. I
would appreciate any advice you may have.

Cheers,
Dev

Re: Parameters that can be set per job

Posted by Harsh J <qw...@gmail.com>.
On Thu, Jul 29, 2010 at 2:25 PM, Devajyoti Sarkar <ds...@q-kk.com> wrote:
> Hi,
>
> Is there a list of configuration parameters that can be set per job.
> Specifically, can one set:
>
> - mapred.tasktracker.map.tasks.maximum
> - mapred.tasktracker.reduce.tasks.maximum
> - mapred.map.multithreadedrunner.threads
> - mapred.child.java.opts
> - mapred.task.timeout
I'm not sure whether the tasktracker options can be set at the job level. I
might be wrong, because I've not tried this.
>
> Also, I am trying to migrate from 0.18.3 to 0.20.2. In the new API JobConf
> is deprecated. How does one set the per job configuration parameters that
> were available in JobConf (e.g. mapred.map.max.attempts, etc.).
You would set them on the 'Configuration' object directly, before a Job
object is made from it, e.g. conf.setInt("mapred.map.max.attempts", 4).
Or use Job.getConfiguration() to access the configuration from your
'Job' object.
>
> I guess there must be documentation on this but I could not find it. I
> appreciate any advice you may have.

These things are also covered in the API documentation and in many wikis
and blogs from the community.

Here's a good, light article on porting the driver/map/reduce to the new API:
http://sonerbalkir.blogspot.com/2010/01/new-hadoop-api-020x.html

>
> Cheers,
> Dev
>



-- 
Harsh J
www.harshj.com

Re: Parameters that can be set per job

Posted by Devajyoti Sarkar <ds...@q-kk.com>.
Thanks a lot!

On Fri, Jul 30, 2010 at 9:58 AM, Hemanth Yamijala <yh...@gmail.com> wrote:

> Hi,
>
> > Is there a list of configuration parameters that can be set per job.
>
> I'm almost certain there's no list that documents per-job settable
> parameters that well. From 0.21 onwards, I think a convention adopted
> is to name all job-related or task-related parameters to include 'job'
> or 'map' or 'reduce' or 'task' in the name somewhere. These can be set
> per job. The best option is to go over the documentation of any
> parameters you are interested in, in the *-default.xml files.
>
> > Specifically, can one set:
> >
> > - mapred.tasktracker.map.tasks.maximum
> > - mapred.tasktracker.reduce.tasks.maximum
>
> No, these are tasktracker specific parameters (as is indicated in the
> name also). They cannot be set per job.
>
> > - mapred.map.multithreadedrunner.threads
> > - mapred.child.java.opts
> > - mapred.task.timeout
>
> These can be set (again, the naming convention is helpful)
>
> > Also, I am trying to migrate from 0.18.3 to 0.20.2. In the new API JobConf
> > is deprecated. How does one set the per job configuration parameters that
> > were available in JobConf (e.g. mapred.map.max.attempts, etc.).
> >
> > I guess there must be documentation on this but I could not find it. I
> > appreciate any advice you may have.
> >
> > Cheers,
> > Dev
> >
>

Re: Parameters that can be set per job

Posted by Hemanth Yamijala <yh...@gmail.com>.
Hi,

> Is there a list of configuration parameters that can be set per job.

I'm almost certain there's no list that documents per-job settable
parameters well. From 0.21 onwards, I think the convention adopted is to
name all job-related or task-related parameters so that they include 'job',
'map', 'reduce', or 'task' somewhere in the name. These can be set
per job. The best option is to go over the documentation of any
parameters you are interested in, in the *-default.xml files.

> Specifically, can one set:
>
> - mapred.tasktracker.map.tasks.maximum
> - mapred.tasktracker.reduce.tasks.maximum

No, these are tasktracker-specific parameters (as the name also
indicates). They cannot be set per job.

> - mapred.map.multithreadedrunner.threads
> - mapred.child.java.opts
> - mapred.task.timeout

These can be set per job (again, the naming convention is helpful).
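
As a rough illustration of the naming rule of thumb described above, here
is a minimal Java sketch. The class PerJobParamCheck, its Scope enum, and
guessScope are hypothetical names invented for this example and are not
part of Hadoop. The sketch only looks at the dot-separated parts of a
parameter name, so treat its answer as a hint, not as a replacement for the
documentation in the *-default.xml files.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of Hadoop: applies the naming rule of thumb
// from this thread to a parameter name.
class PerJobParamCheck {

    enum Scope { LIKELY_PER_JOB, TASKTRACKER_LEVEL, UNKNOWN }

    // Name parts that, per the convention mentioned above, hint at a
    // job-level or task-level parameter.
    private static final List<String> JOB_HINTS =
            Arrays.asList("job", "map", "reduce", "task");

    static Scope guessScope(String name) {
        String[] parts = name.split("\\.");

        // Skip the first part (the namespace, e.g. "mapred"), which would
        // otherwise always match because "mapred" starts with "map".
        // Tasktracker settings win over any other hint in the name.
        for (int i = 1; i < parts.length; i++) {
            if (parts[i].equals("tasktracker")) {
                return Scope.TASKTRACKER_LEVEL;
            }
        }
        // Exact match on a name part; "tasks" or "jobs" do not count here.
        for (int i = 1; i < parts.length; i++) {
            if (JOB_HINTS.contains(parts[i])) {
                return Scope.LIKELY_PER_JOB;
            }
        }
        // The name alone does not say; check the *-default.xml description.
        return Scope.UNKNOWN;
    }

    public static void main(String[] args) {
        String[] names = {
            "mapred.tasktracker.map.tasks.maximum",
            "mapred.tasktracker.reduce.tasks.maximum",
            "mapred.map.multithreadedrunner.threads",
            "mapred.child.java.opts",
            "mapred.task.timeout"
        };
        for (String name : names) {
            System.out.println(name + " -> " + guessScope(name));
        }
    }
}
```

For the five parameters from the question, this sketch marks the two
tasktracker settings as tasktracker-level and the multithreadedrunner and
timeout settings as likely per job. It leaves mapred.child.java.opts
undecided, although that one can be set per job as noted above, which is
why the name should only be used as a first guess.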

> Also, I am trying to migrate from 0.18.3 to 0.20.2. In the new API JobConf
> is deprecated. How does one set the per job configuration parameters that
> were available in JobConf (e.g. mapred.map.max.attempts, etc.).
>
> I guess there must be documentation on this but I could not find it. I
> appreciate any advice you may have.
>
> Cheers,
> Dev
>