Posted to common-user@hadoop.apache.org by "Zhang, Zhang" <zh...@corp.aol.com> on 2010/01/20 01:31:56 UTC
Which config parameters are node-specific?
Where do I find information about which config parameters can be set as per-node properties, and which ones apply to all nodes? For example, I have a cluster consisting of two classes of nodes. One class is dual-core 4GB memory nodes, and the other class is 16-core 128GB memory nodes. It certainly makes sense to configure them differently. So the question is, which parameters should I pay attention to? I vaguely know that probably at least the following ones can be set as node-specific:
mapred.tasktracker.map.tasks.maximum
mapred.tasktracker.reduce.tasks.maximum
But anything beyond that? How about the following ones, can I set them as node-specific parameters?
mapred.child.java.opts
tasktracker.http.threads
dfs.datanode.handler.count
io.sort.factor
io.sort.mb
mapred.inmem.merge.threshold
mapred.job.reduce.input.buffer.percent
Thanks!
Zhang
Re: Which config parameters are node-specific?
Posted by Jeff Zhang <zj...@gmail.com>.
I believe all these parameters can be set as node-specific, because they are
read by different JVMs on each node.
Correct me if I am wrong.
On Wed, Jan 20, 2010 at 8:31 AM, Zhang, Zhang <zh...@corp.aol.com> wrote:
>
> Where do I find information about which config parameters can be set as
> per-node properties, and which ones apply to all nodes? For example, I have a
> cluster consisting of two classes of nodes. One class is dual-core 4GB
> memory nodes, and the other class is 16-core 128GB memory nodes. It
> certainly makes sense to configure them differently. So the question is,
> which parameters should I pay attention to? I vaguely know that probably at
> least the following ones can be set as node-specific:
>
> mapred.tasktracker.map.tasks.maximum
> mapred.tasktracker.reduce.tasks.maximum
>
>
> But anything beyond that? How about the following ones, can I set them as
> node-specific parameters?
>
> mapred.child.java.opts
> tasktracker.http.threads
> dfs.datanode.handler.count
> io.sort.factor
> io.sort.mb
> mapred.inmem.merge.threshold
> mapred.job.reduce.input.buffer.percent
>
>
> Thanks!
>
> Zhang
>
>
--
Best Regards
Jeff Zhang
Re: Which config parameters are node-specific?
Posted by Edward Capriolo <ed...@gmail.com>.
This is a tricky problem. To add further confusion, some variables are
used in multiple components.
mapred.local.dir is used by both the task tracker and the job tracker.
hadoop.tmp.dir is the default base for everything.
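To illustrate how one base setting can feed several components, here is a
minimal Python sketch of ${...} expansion in property values. It is a toy
model, not Hadoop's own code. The default values in DEFAULTS are assumed to
match the 0.20-era *-default.xml files and should be checked against your
release; the expand() helper and the "user.name" value are hypothetical.

```python
import re

# Assumed 0.20-era defaults; verify against your own *-default.xml files.
DEFAULTS = {
    "hadoop.tmp.dir": "/tmp/hadoop-${user.name}",
    "mapred.local.dir": "${hadoop.tmp.dir}/mapred/local",
    "dfs.data.dir": "${hadoop.tmp.dir}/dfs/data",
}

_VAR = re.compile(r"\$\{([^}]+)\}")


def expand(value, props, max_depth=20):
    """Repeatedly substitute ${name} references using props.

    Unknown names are left untouched. max_depth is only a guard
    against circular references in this sketch.
    """
    for _ in range(max_depth):
        expanded = _VAR.sub(lambda m: props.get(m.group(1), m.group(0)), value)
        if expanded == value:
            break
        value = expanded
    return value


if __name__ == "__main__":
    # Hypothetical user name, for illustration only.
    props = dict(DEFAULTS, **{"user.name": "hadoop"})
    for name in ("mapred.local.dir", "dfs.data.dir"):
        print(name, "=", expand(props[name], props))
```

Under these assumptions, overriding hadoop.tmp.dir on one node moves every
directory that still uses its default, which is why a single property can
affect more than one daemon.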
On 1/20/10, Allen Wittenauer <aw...@linkedin.com> wrote:
>
>
>
> On 1/19/10 7:54 PM, "Amareshwari Sri Ramadasu" <am...@yahoo-inc.com>
> wrote:
>
>> Hi Zhang,
>>
>> The following parameters are node specific.
>> mapred.tasktracker.map.tasks.maximum
>> mapred.tasktracker.reduce.tasks.maximum
>> tasktracker.http.threads
>> dfs.datanode.handler.count
>>
>> The rest of the parameters are Job-specific.
>
> ... Except for the ones that are namenode and jobtracker specific.
>
> :(
>
> Hadoop configuration sucks greatly, and the lack of real documentation on
> what parameters exist (it seems like every month there is a "new" hidden
> param) and what actually uses them (i.e., where is final actually taking
> into account?) doesn't help.
>
> [ and no, *-default.xml and/or "read the source!" is not good enough.]
>
>
Re: Which config parameters are node-specific?
Posted by Allen Wittenauer <aw...@linkedin.com>.
On 1/19/10 7:54 PM, "Amareshwari Sri Ramadasu" <am...@yahoo-inc.com>
wrote:
> Hi Zhang,
>
> The following parameters are node specific.
> mapred.tasktracker.map.tasks.maximum
> mapred.tasktracker.reduce.tasks.maximum
> tasktracker.http.threads
> dfs.datanode.handler.count
>
> The rest of the parameters are Job-specific.
... Except for the ones that are namenode and jobtracker specific.
:(
Hadoop configuration sucks greatly, and the lack of real documentation on
what parameters exist (it seems like every month there is a "new" hidden
param) and what actually uses them (i.e., where is final actually taking
into account?) doesn't help.
[ and no, *-default.xml and/or "read the source!" is not good enough.]
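For readers unfamiliar with "final": a property marked <final>true</final>
in a site file is meant to resist being overridden by configuration loaded
later, such as a submitted job's settings. The Python sketch below is only a
toy model of that intended behavior. The merge_configs() function and the
sample values are hypothetical, and it does not answer the question of which
components actually honor the flag.

```python
def merge_configs(site, job):
    """Merge job settings over site settings, respecting 'final'.

    site: dict of name -> (value, is_final), as an admin might set it.
    job:  dict of name -> value, as a job submitter might request it.
    Returns the effective name -> value mapping in this toy model.
    """
    effective = {name: value for name, (value, _) in site.items()}
    finals = {name for name, (_, is_final) in site.items() if is_final}
    for name, value in job.items():
        if name in finals:
            continue  # final in the site config: the job's value is ignored
        effective[name] = value
    return effective


if __name__ == "__main__":
    # Illustrative values only.
    site = {
        "io.sort.mb": ("100", True),       # admin pins this one
        "io.sort.factor": ("10", False),   # jobs may override this one
    }
    job = {"io.sort.mb": "400", "io.sort.factor": "50"}
    print(merge_configs(site, job))
```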
Re: Which config parameters are node-specific?
Posted by Amareshwari Sri Ramadasu <am...@yahoo-inc.com>.
Hi Zhang,
The following parameters are node specific.
mapred.tasktracker.map.tasks.maximum
mapred.tasktracker.reduce.tasks.maximum
tasktracker.http.threads
dfs.datanode.handler.count
The rest of the parameters are Job-specific.
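As a rough sketch of what configuring the two node classes differently could
look like, the Python below renders a separate mapred-site.xml body for each
class. It assumes the 0.20-style split config files, and every number in it
is an illustrative placeholder rather than a tuned recommendation. The
NODE_CLASSES table and render_site_xml() are hypothetical names.
dfs.datanode.handler.count is left out because it would go in hdfs-site.xml
on each datanode instead.

```python
import xml.etree.ElementTree as ET

# Hypothetical per-class TaskTracker settings; values are placeholders.
NODE_CLASSES = {
    "small": {  # dual-core, 4GB nodes
        "mapred.tasktracker.map.tasks.maximum": 2,
        "mapred.tasktracker.reduce.tasks.maximum": 1,
        "tasktracker.http.threads": 20,
    },
    "large": {  # 16-core, 128GB nodes
        "mapred.tasktracker.map.tasks.maximum": 12,
        "mapred.tasktracker.reduce.tasks.maximum": 4,
        "tasktracker.http.threads": 80,
    },
}


def render_site_xml(props):
    """Return a <configuration> document with one <property> per setting."""
    root = ET.Element("configuration")
    for name, value in sorted(props.items()):
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = str(value)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    for node_class, props in NODE_CLASSES.items():
        print("# mapred-site.xml for", node_class, "nodes")
        print(render_site_xml(props))
```

Each rendered file would be deployed only to the nodes of its class, so each
TaskTracker reads its own slot counts at startup.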
Thanks
Amareshwari
On 1/20/10 6:01 AM, "Zhang, Zhang" <zh...@corp.aol.com> wrote:
Where do I find information about which config parameters can be set as per-node properties, and which ones apply to all nodes? For example, I have a cluster consisting of two classes of nodes. One class is dual-core 4GB memory nodes, and the other class is 16-core 128GB memory nodes. It certainly makes sense to configure them differently. So the question is, which parameters should I pay attention to? I vaguely know that probably at least the following ones can be set as node-specific:
mapred.tasktracker.map.tasks.maximum
mapred.tasktracker.reduce.tasks.maximum
But anything beyond that? How about the following ones, can I set them as node-specific parameters?
mapred.child.java.opts
tasktracker.http.threads
dfs.datanode.handler.count
io.sort.factor
io.sort.mb
mapred.inmem.merge.threshold
mapred.job.reduce.input.buffer.percent
Thanks!
Zhang