Posted to common-user@hadoop.apache.org by pa...@gmail.com on 2009/03/08 07:56:55 UTC
Does "hadoop-default.xml" + "hadoop-site.xml" matter for whole cluster or each node?
Do the master host's "hadoop-default.xml" + "hadoop-site.xml" matter for
the whole job, or do they matter for each node independently?
For example, if one of them (or both) contains:
<property>
<name>mapred.map.tasks</name>
<value>6</value>
</property>
then does it mean that six mappers will be executed in total across all
nodes, or six on each node?
Thanks.
Re: Does "hadoop-default.xml" + "hadoop-site.xml" matter for whole cluster or each node?
Posted by Owen O'Malley <om...@apache.org>.
On Mar 9, 2009, at 8:10 AM, Nick Cen wrote:
> A clear naming convention would make configuration easier. But besides
> the system and job levels, I think there are also some parameters that
> take effect at the node level, like mapred.tasktracker.map.tasks.maximum.
> As far as I can remember, we can set this differently for different nodes.
There are only a few that are actually pushed around by the system.
The system directory and the heartbeat interval are the only ones that
readily come to mind. For the most part, the other ones that act like
that are only used by the job tracker and therefore only need to be
present on the job tracker.
And probably a better structure would be:
mapred.job.* -- job specific
mapred.system.* -- used by both master and slaves
mapred.master.* -- used by the JobTracker
mapred.slave.* -- used by the TaskTrackers
-- Owen
Re: Does "hadoop-default.xml" + "hadoop-site.xml" matter for whole cluster or each node?
Posted by Nick Cen <ce...@gmail.com>.
A clear naming convention would make configuration easier. But besides the
system and job levels, I think there are also some parameters that take
effect at the node level, like mapred.tasktracker.map.tasks.maximum. As far
as I can remember, we can set this differently for different nodes.
2009/3/9 Owen O'Malley <om...@apache.org>
> [...]
--
http://daily.appspot.com/food/
Re: Does "hadoop-default.xml" + "hadoop-site.xml" matter for whole cluster or each node?
Posted by Doug Cutting <cu...@apache.org>.
Owen O'Malley wrote:
> It depends on the property whether they come from the job's
> configuration or the system's. Some like io.sort.mb and
> mapred.map.tasks come from the job, while others like
> mapred.tasktracker.map.tasks.maximum come from the system.
There is some method to the madness.
Things that are only set programmatically, like most job parameters,
e.g., the mapper, reducer, etc., are not listed in hadoop-default.xml,
since they don't make sense to configure cluster-wide.
Defaults are overridden by hadoop-site.xml, but a job can then override
hadoop-site.xml unless hadoop-site.xml declares the property final, in
which case any value specified in a job is ignored.
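Doug's precedence rule can be modeled in a few lines. The sketch below is plain Python, not Hadoop's Java `Configuration` class: it reads files in the hadoop-default.xml/hadoop-site.xml `<property>` format, treats `<final>true</final>` as blocking later overrides, and applies job values last. The function names and the sample values are illustrative assumptions.

```python
import xml.etree.ElementTree as ET


def load_properties(xml_text):
    """Parse <property> entries into {name: (value, is_final)} (hypothetical helper)."""
    props = {}
    for prop in ET.fromstring(xml_text).iter("property"):
        name = prop.findtext("name")
        value = prop.findtext("value")
        is_final = (prop.findtext("final") or "").strip() == "true"
        props[name] = (value, is_final)
    return props


def resolve(default_xml, site_xml, job_overrides):
    """Apply defaults, then site values, then job values; a final property stops later layers."""
    effective, finals = {}, set()
    for layer in (load_properties(default_xml), load_properties(site_xml)):
        for name, (value, is_final) in layer.items():
            if name in finals:
                continue  # already declared final by an earlier file
            effective[name] = value
            if is_final:
                finals.add(name)
    for name, value in job_overrides.items():
        if name not in finals:  # a job value for a final property is ignored
            effective[name] = value
    return effective


# Illustrative values only.
DEFAULT_XML = """<configuration>
  <property><name>io.sort.mb</name><value>100</value></property>
  <property><name>mapred.map.tasks</name><value>2</value></property>
</configuration>"""

SITE_XML = """<configuration>
  <property><name>io.sort.mb</name><value>200</value><final>true</final></property>
  <property><name>mapred.map.tasks</name><value>6</value></property>
</configuration>"""

conf = resolve(DEFAULT_XML, SITE_XML, {"io.sort.mb": "50", "mapred.map.tasks": "10"})
print(conf["io.sort.mb"], conf["mapred.map.tasks"])  # 200 10
```

The site's final io.sort.mb survives the job's attempt to set 50, while the non-final mapred.map.tasks takes the job's value of 10.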
There are a few odd cases of things that jobs might want to override but
they cannot. For example, a job might wish to override
mapred.tasktracker.map.tasks.maximum, but, if you think a bit more, this
is read by the tasktracker at startup and cannot be reasonably changed
per job, since a tasktracker can run tasks from different jobs
simultaneously.
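Why a per-job value cannot work here can be shown with a toy model: the slot limit is read once from the node's own configuration when the TaskTracker starts, and every job's tasks then compete for those same slots. `ToyTaskTracker` is a hypothetical Python stand-in, not Hadoop's TaskTracker; the fallback of 2 slots is an assumption for the example.

```python
class ToyTaskTracker:
    """Hypothetical model of a TaskTracker's map-slot limit."""

    def __init__(self, node_conf):
        # Read once at startup from this node's own configuration.
        self.max_maps = int(node_conf.get("mapred.tasktracker.map.tasks.maximum", "2"))
        self.running = []

    def try_launch(self, job_id, job_conf):
        # job_conf is deliberately not consulted: the slots are shared by all jobs,
        # so no single job's setting can change how many exist.
        if len(self.running) < self.max_maps:
            self.running.append(job_id)
            return True
        return False


tracker = ToyTaskTracker({"mapred.tasktracker.map.tasks.maximum": "2"})
greedy = {"mapred.tasktracker.map.tasks.maximum": "8"}  # a job asking for more slots
print([tracker.try_launch("job_A", greedy),
       tracker.try_launch("job_B", {}),
       tracker.try_launch("job_A", greedy)])  # [True, True, False]
```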
So things that make sense per-job and are not declared final in your
hadoop-site.xml can generally be overridden by the job.
Doug
Re: Does "hadoop-default.xml" + "hadoop-site.xml" matter for whole cluster or each node?
Posted by Owen O'Malley <om...@apache.org>.
On Mar 7, 2009, at 10:56 PM, pavelkolodin@gmail.com wrote:
> Do the master host's "hadoop-default.xml" + "hadoop-site.xml" matter
> for the whole job, or do they matter for each node independently?
Please never modify hadoop-default. That is for the system defaults.
Please use hadoop-site for your configuration.
It depends on the property whether they come from the job's
configuration or the system's. Some like io.sort.mb and
mapred.map.tasks come from the job, while others like
mapred.tasktracker.map.tasks.maximum come from the system. The job
parameters come from the submitting client, while the system
parameters need to be distributed to each worker node.
-- Owen
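Owen's distinction can be sketched as a lookup rule: a job parameter is taken from the configuration the client submitted with the job, and a system parameter from the files installed on the worker node. The `SYSTEM_PROPERTIES` set below is a hypothetical classification covering only properties named in this thread; it is an assumption for illustration, not a list taken from Hadoop.

```python
# Hypothetical classification, limited to properties named in this thread.
SYSTEM_PROPERTIES = {"mapred.tasktracker.map.tasks.maximum"}


def value_used(name, submitted_job_conf, worker_node_conf):
    """Job parameters travel with the submitted job; system parameters
    come from the configuration files on the worker node itself."""
    if name in SYSTEM_PROPERTIES:
        return worker_node_conf.get(name)
    return submitted_job_conf.get(name)


# Illustrative configurations for a submitting client and one worker node.
client = {"io.sort.mb": "200", "mapred.map.tasks": "6",
          "mapred.tasktracker.map.tasks.maximum": "8"}
worker = {"io.sort.mb": "100", "mapred.tasktracker.map.tasks.maximum": "4"}

for prop in ("io.sort.mb", "mapred.map.tasks", "mapred.tasktracker.map.tasks.maximum"):
    print(prop, value_used(prop, client, worker))
```

The client's 8 for the per-node maximum has no effect; the worker's own 4 applies.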
> For example, if one of them (or both) contains:
> <property>
> <name>mapred.map.tasks</name>
> <value>6</value>
> </property>
>
> then does it mean that six mappers will be executed in total across
> all nodes, or six on each node?
That means that your job will default to 6 maps.
mapred.tasktracker.map.tasks.maximum specifies the number of maps
running on each node.
And yes, we really should do a cleanup of the property names to do
something like:
mapred.job.*
mapred.system.*
to separate the job from the system parameters.
-- Owen
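Putting the two settings side by side answers the original question with a little arithmetic. `mapred.map.tasks` sets how many maps one job asks for (the JobConf documentation calls it a hint; the InputFormat's splits decide the final count), while `mapred.tasktracker.map.tasks.maximum` caps how many maps each node runs at once. A simplified sketch that ignores scheduling details; the cluster sizes are hypothetical:

```python
import math


def map_waves(num_maps, nodes, max_maps_per_node):
    """Rounds of map tasks needed if each node runs at most max_maps_per_node at once.
    Simplified: ignores data locality, failures and other jobs sharing the cluster."""
    slots = nodes * max_maps_per_node  # cluster-wide concurrent map slots
    return math.ceil(num_maps / slots)


# 6 maps for the job, 3 hypothetical nodes with 2 map slots each: one round.
print(map_waves(6, 3, 2))  # 1
# The same 6 maps on 2 nodes with 2 slots each: 4 run first, then the last 2.
print(map_waves(6, 2, 2))  # 2
```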
Re: Does "hadoop-default.xml" + "hadoop-site.xml" matter for whole cluster or each node?
Posted by Rasit OZDAS <ra...@gmail.com>.
Some parameters are global (I can't give an example right now): they
apply cluster-wide even if they are defined in hadoop-site.xml.
Rasit
2009/3/9 Nick Cen <ce...@gmail.com>
> [...]
--
M. Raşit ÖZDAŞ
Re: Does "hadoop-default.xml" + "hadoop-site.xml" matter for whole cluster or each node?
Posted by Nick Cen <ce...@gmail.com>.
For Q1: I think so, but it is good practice to keep hadoop-default.xml
untouched.
For Q2: I use this property for debugging in Eclipse.
2009/3/9 <pa...@gmail.com>
> [...]
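On Q2: in hadoop-default.xml, `mapred.job.tracker` defaults to `local`, which runs the job in-process (via the LocalJobRunner) instead of submitting it to a JobTracker at a host:port address; running in a single process is why it is handy for debugging in an IDE such as Eclipse. The function below is a hypothetical Python sketch of how the value is interpreted, not Hadoop's code; the host and port in the example are made up.

```python
def parse_job_tracker(value):
    """Interpret mapred.job.tracker (hypothetical helper).
    "local" -> run in-process; otherwise expect "host:port" of a JobTracker."""
    if value == "local":
        return ("local", None, None)
    host, sep, port = value.rpartition(":")
    if not sep or not host or not port.isdigit():
        raise ValueError(f"expected 'local' or host:port, got {value!r}")
    return ("remote", host, int(port))


print(parse_job_tracker("local"))        # ('local', None, None)
print(parse_job_tracker("master:9001"))  # ('remote', 'master', 9001)
```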
Re: Does "hadoop-default.xml" + "hadoop-site.xml" matter for whole cluster or each node?
Posted by pa...@gmail.com.
> The hadoop-site.xml takes effect only on the node where it is installed.
> So each node can have its own configuration in hadoop-site.xml.
>
As I understand it, parameters in "hadoop-site" override the ones in
"hadoop-default".
Q1: So is "hadoop-default" also individual to each node?
Q2: What does "local" mean as the value of "mapred.job.tracker"?
Thanks.
Re: Does "hadoop-default.xml" + "hadoop-site.xml" matter for whole cluster or each node?
Posted by Nick Cen <ce...@gmail.com>.
The hadoop-site.xml takes effect only on the node where it is installed.
So each node can have its own configuration in hadoop-site.xml.
2009/3/8 <pa...@gmail.com>
> [...]