Posted to common-user@hadoop.apache.org by pa...@gmail.com on 2009/03/08 07:56:55 UTC

Does "hadoop-default.xml" + "hadoop-site.xml" matter for whole cluster or each node?

Do "hadoop-default.xml" + "hadoop-site.xml" on the master host apply to the
whole job, or do they apply to each node independently?
For example, if one of them (or both) contains:
<property>
   <name>mapred.map.tasks</name>
   <value>6</value>
</property>

then does that mean six mappers will be executed across all nodes, or six on
each node?

Thanks.

Re: Does "hadoop-default.xml" + "hadoop-site.xml" matter for whole cluster or each node?

Posted by Owen O'Malley <om...@apache.org>.

On Mar 9, 2009, at 8:10 AM, Nick Cen wrote:

> A clear naming convention would make configuration easier. But I think
> that, besides the system and job levels, there are also some parameters
> that take effect at the node level, like
> mapred.tasktracker.map.tasks.maximum; as far as I can remember, we can
> set this differently for different nodes.

There are only a few that are actually pushed around by the system.  
The system directory and the heartbeat interval are the only ones that  
readily come to mind. For the most part, the other ones that act like  
that are only used by the job tracker and therefore only need to be  
present on the job tracker.

And probably a better structure would be:

mapred.job.* -- job specific
mapred.system.* -- used by both master and slaves
mapred.master.* -- used by the JobTracker
mapred.slave.*  -- used by the TaskTrackers

-- Owen

Re: Does "hadoop-default.xml" + "hadoop-site.xml" matter for whole cluster or each node?

Posted by Nick Cen <ce...@gmail.com>.

A clear naming convention would make configuration easier. But I think
that, besides the system and job levels, there are also some parameters
that take effect at the node level, like
mapred.tasktracker.map.tasks.maximum; as far as I can remember, we can
set this differently for different nodes.

2009/3/9 Owen O'Malley <om...@apache.org>

> On Mar 7, 2009, at 10:56 PM, pavelkolodin@gmail.com wrote:
>
>
>> Does "hadoop-default.xml" + "hadoop-site.xml" of master host matter for
>> whole Job
>> or they matter for each node independently?
>>
>
> Please never modify hadoop-default. That is for the system defaults. Please
> use hadoop-site for your configuration.
>
> It depends on the property whether they come from the job's configuration
> or the system's. Some  like io.sort.mb and mapred.map.tasks come from the
> job, while others like mapred.tasktracker.map.tasks.maximum come from the
> system. The job parameters come from the submitting client, while the system
> parameters need to be distributed to each worker node.
>
> -- Owen
>
>  For example, if one of them (or both) contains:
>> <property>
>>  <name>mapred.map.tasks</name>
>>  <value>6</value>
>> </property>
>>
>> then is it means that six mappers will be executed on all nodes or 6 on
>> each node?
>>
>
> That means that your job will default to 6 maps.
> mapred.tasktracker.map.tasks.maximum specifies the number of maps running on
> each node.
>
> And yes, we really should do a cleanup of the property names to do
> something like:
>
> mapred.job.*
> mapred.system.*
>
> to separate the job from the system parameters.
>
> -- Owen
>



-- 
http://daily.appspot.com/food/

Re: Does "hadoop-default.xml" + "hadoop-site.xml" matter for whole cluster or each node?

Posted by Doug Cutting <cu...@apache.org>.

Owen O'Malley wrote:
> It depends on the property whether they come from the job's 
> configuration or the system's. Some  like io.sort.mb and 
> mapred.map.tasks come from the job, while others like 
> mapred.tasktracker.map.tasks.maximum come from the system.

There is some method to the madness.

Things that are only set programmatically, like most job parameters 
(e.g., the mapper and reducer), are not listed in hadoop-default.xml, 
since they don't make sense to configure cluster-wide.

Defaults are overridden by hadoop-site.xml, but a job can then override 
hadoop-site.xml unless hadoop-site.xml declares the property to be final, 
in which case any value specified in a job is ignored.
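
As a minimal sketch of the "final" mechanism described above (the property
chosen and its value are illustrative assumptions, not a recommendation), a
hadoop-site.xml entry that jobs cannot override might look like this:

```xml
<!-- hadoop-site.xml (illustrative fragment) -->
<configuration>
  <property>
    <name>io.sort.mb</name>
    <!-- illustrative value -->
    <value>100</value>
    <!-- final: a value for io.sort.mb set by a job is ignored -->
    <final>true</final>
  </property>
</configuration>
```

Without the <final> element, a job that sets io.sort.mb in its own
configuration would take precedence over this site value.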

There are a few odd cases of things that jobs might want to override but 
they cannot.  For example, a job might wish to override 
mapred.tasktracker.map.tasks.maximum, but, if you think a bit more, this 
is read by the tasktracker at startup and cannot be reasonably changed 
per job, since a tasktracker can run tasks from different jobs 
simultaneously.

So things that make sense per-job and are not declared final in your 
hadoop-site.xml can generally be overridden by the job.

Doug

Re: Does "hadoop-default.xml" + "hadoop-site.xml" matter for whole cluster or each node?

Posted by Owen O'Malley <om...@apache.org>.

On Mar 7, 2009, at 10:56 PM, pavelkolodin@gmail.com wrote:

>
> Does "hadoop-default.xml" + "hadoop-site.xml" of master host matter  
> for whole Job
> or they matter for each node independently?

Please never modify hadoop-default. That is for the system defaults.  
Please use hadoop-site for your configuration.

Whether a property comes from the job's configuration or the system's  
depends on the property. Some, like io.sort.mb and mapred.map.tasks, come  
from the job, while others, like mapred.tasktracker.map.tasks.maximum,  
come from the system. The job parameters come from the submitting client,  
while the system parameters need to be distributed to each worker node.

-- Owen

> For example, if one of them (or both) contains:
> <property>
>  <name>mapred.map.tasks</name>
>  <value>6</value>
> </property>
>
> then does that mean six mappers will be executed across all nodes, or six  
> on each node?

That means that your job will default to 6 maps.  
mapred.tasktracker.map.tasks.maximum specifies the maximum number of maps  
running at the same time on each node.
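
To make the distinction concrete, here is a hedged sketch of the two
settings side by side. The value 2 for the per-node limit is an
illustrative assumption, and the comments on where each fragment lives
follow the explanation above rather than any fixed rule:

```xml
<!-- Job-level: taken from the configuration of the client that submits
     the job; the job defaults to 6 map tasks in total. -->
<property>
  <name>mapred.map.tasks</name>
  <value>6</value>
</property>

<!-- System-level: read by each TaskTracker from the hadoop-site.xml on
     its own node; caps how many maps run on that node at once.
     The value 2 is illustrative. -->
<property>
  <name>mapred.tasktracker.map.tasks.maximum</name>
  <value>2</value>
</property>
```

With these two fragments, the job would ask for 6 maps overall, and any
single node configured as shown would run at most 2 of them at a time.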

And yes, we really should do a cleanup of the property names to do  
something like:

mapred.job.*
mapred.system.*

to separate the job from the system parameters.

-- Owen

Re: Does "hadoop-default.xml" + "hadoop-site.xml" matter for whole cluster or each node?

Posted by Rasit OZDAS <ra...@gmail.com>.

Some parameters are global (I can't give an example right now);
they are cluster-wide even if they're defined in hadoop-site.xml.

Rasit

2009/3/9 Nick Cen <ce...@gmail.com>

> for Q1: i think so , but i think it is a good practice to keep the
> hadoop-default.xml untouched.
> for Q2: i use this property for debugging in eclipse.
>
>
>
> 2009/3/9 <pa...@gmail.com>
>
> >
> >
> >  The hadoop-site.xml will take effect only on that specified node. So
> each
> >> node can have its own configuration with hadoop-site.xml.
> >>
> >>
> > As i understand, parameters in "hadoop-site" overwrites these ones in
> > "hadoop-default".
> > So "hadoop-default" also individual for each node?
> >
> > Q2: what means "local" as value of "mapred.job.tracker"?
> >
> > thanks
> >
>
>
>
> --
> http://daily.appspot.com/food/
>



-- 
M. Raşit ÖZDAŞ

Re: Does "hadoop-default.xml" + "hadoop-site.xml" matter for whole cluster or each node?

Posted by Nick Cen <ce...@gmail.com>.

For Q1: I think so, but it is good practice to keep
hadoop-default.xml untouched.
For Q2: I use this property for debugging in Eclipse.
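
For reference, a sketch of the two kinds of value this property usually
takes. As far as I understand the description in the stock
hadoop-default.xml, "local" makes jobs run in-process as a single map and
reduce task instead of being submitted to a JobTracker, which is why it is
handy for debugging in an IDE. The host name and port in the second
fragment are hypothetical placeholders:

```xml
<!-- Local mode: the job runs inside the submitting process -->
<property>
  <name>mapred.job.tracker</name>
  <value>local</value>
</property>

<!-- Cluster mode: host:port of the JobTracker
     (jobtracker.example.com:9001 is a made-up address) -->
<property>
  <name>mapred.job.tracker</name>
  <value>jobtracker.example.com:9001</value>
</property>
```

Only one of the two fragments would appear in a given hadoop-site.xml.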



2009/3/9 <pa...@gmail.com>

>
>
>  The hadoop-site.xml will take effect only on that specified node. So each
>> node can have its own configuration with hadoop-site.xml.
>>
>>
> As i understand, parameters in "hadoop-site" overwrites these ones in
> "hadoop-default".
> So "hadoop-default" also individual for each node?
>
> Q2: what means "local" as value of "mapred.job.tracker"?
>
> thanks
>



-- 
http://daily.appspot.com/food/

Re: Does "hadoop-default.xml" + "hadoop-site.xml" matter for whole cluster or each node?

Posted by pa...@gmail.com.


> The hadoop-site.xml will take effect only on that specified node. So each
> node can have its own configuration with hadoop-site.xml.
>

As I understand it, parameters in "hadoop-site" override those in
"hadoop-default".
So is "hadoop-default" also individual to each node?

Q2: What does "local" mean as the value of "mapred.job.tracker"?

Thanks

Re: Does "hadoop-default.xml" + "hadoop-site.xml" matter for whole cluster or each node?

Posted by Nick Cen <ce...@gmail.com>.

The hadoop-site.xml takes effect only on the node where it resides, so each
node can have its own configuration in hadoop-site.xml.

2009/3/8 <pa...@gmail.com>

>
> Does "hadoop-default.xml" + "hadoop-site.xml" of master host matter for
> whole Job
> or they matter for each node independently?
> For example, if one of them (or both) contains:
> <property>
>  <name>mapred.map.tasks</name>
>  <value>6</value>
> </property>
>
> then does that mean six mappers will be executed across all nodes, or six on
> each node?
>
> Thanks.
>



-- 
http://daily.appspot.com/food/