Posted to common-user@hadoop.apache.org by "Zhang, Zhang" <zh...@corp.aol.com> on 2010/01/20 01:31:56 UTC

Which config parameters are node-specific?

Where do I find information about which config parameters can be set as per-node properties, and which ones apply to all nodes? For example, I have a cluster consisting of two classes of nodes. One class is dual-core nodes with 4GB of memory, and the other class is 16-core nodes with 128GB of memory. It certainly makes sense to configure them differently. So the question is, which parameters should I pay attention to? As far as I know, at least the following ones can be set as node-specific:

    mapred.tasktracker.map.tasks.maximum
    mapred.tasktracker.reduce.tasks.maximum
    

But anything beyond that? How about the following ones, can I set them as node-specific parameters?

    mapred.child.java.opts
    tasktracker.http.threads
    dfs.datanode.handler.count
    io.sort.factor
    io.sort.mb
    mapred.inmem.merge.threshold
    mapred.job.reduce.input.buffer.percent


Thanks!

Zhang


Re: Which config parameters are node-specific?

Posted by Jeff Zhang <zj...@gmail.com>.
I believe all of these parameters can be set as node-specific, because each
node runs its daemons in its own JVMs.
Correct me if I am wrong.



On Wed, Jan 20, 2010 at 8:31 AM, Zhang, Zhang <zh...@corp.aol.com> wrote:

>
> Where do I find information about which config parameters can be set as
> per-node properties, and which ones apply to all nodes? For example, I have
> a cluster consisting of two classes of nodes. One class is dual-core nodes
> with 4GB of memory, and the other class is 16-core nodes with 128GB of
> memory. It certainly makes sense to configure them differently. So the
> question is, which parameters should I pay attention to? As far as I know,
> at least the following ones can be set as node-specific:
>
>    mapred.tasktracker.map.tasks.maximum
>    mapred.tasktracker.reduce.tasks.maximum
>
>
> But anything beyond that? How about the following ones, can I set them as
> node-specific parameters?
>
>    mapred.child.java.opts
>    tasktracker.http.threads
>    dfs.datanode.handler.count
>    io.sort.factor
>    io.sort.mb
>    mapred.inmem.merge.threshold
>    mapred.job.reduce.input.buffer.percent
>
>
> Thanks!
>
> Zhang
>
>


-- 
Best Regards

Jeff Zhang

Re: Which config parameters are node-specific?

Posted by Edward Capriolo <ed...@gmail.com>.
This is a tricky problem. To add further confusion, some variables are
used by multiple components.

mapred.local.dir is used by both the task tracker and the job tracker.
hadoop.tmp.dir is the default base for everything.
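
To illustrate the hadoop.tmp.dir point: as far as I recall, the stock
defaults define mapred.local.dir and dfs.data.dir in terms of
${hadoop.tmp.dir}, so overriding that one value on a node moves several
directories at once. Below is a minimal Python sketch of that expansion. It
is a toy model, not Hadoop's Configuration code; the expand helper and the
/data/1/tmp path are hypothetical.

```python
import re

# Defaults as recalled from the 0.20-era *-default.xml files (assumption).
DEFAULTS = {
    "hadoop.tmp.dir": "/tmp/hadoop-${user.name}",
    "mapred.local.dir": "${hadoop.tmp.dir}/mapred/local",
    "dfs.data.dir": "${hadoop.tmp.dir}/dfs/data",
}


def expand(name, conf, max_depth=20):
    """Resolve ${var} references in conf[name]; unknown names are left as-is."""
    value = conf[name]
    for _ in range(max_depth):
        expanded = re.sub(
            r"\$\{([^}]+)\}",
            lambda m: conf.get(m.group(1), m.group(0)),
            value,
        )
        if expanded == value:
            break  # nothing left that we know how to substitute
        value = expanded
    return value


# Hypothetical per-node override of the base directory only.
node_conf = {**DEFAULTS, "hadoop.tmp.dir": "/data/1/tmp"}
print(expand("mapred.local.dir", node_conf))
print(expand("dfs.data.dir", node_conf))
```

Both printed paths end up under /data/1/tmp, even though neither
mapred.local.dir nor dfs.data.dir was set on the node.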



On 1/20/10, Allen Wittenauer <aw...@linkedin.com> wrote:
>
>
>
> On 1/19/10 7:54 PM, "Amareshwari Sri Ramadasu" <am...@yahoo-inc.com>
> wrote:
>
>> Hi Zhang,
>>
>> The following parameters are node specific.
>>    mapred.tasktracker.map.tasks.maximum
>>    mapred.tasktracker.reduce.tasks.maximum
>>    tasktracker.http.threads
>>    dfs.datanode.handler.count
>>
>> The rest of the parameters are Job-specific.
>
> ... Except for the ones that are namenode and jobtracker specific.
>
> :(
>
> Hadoop configuration sucks greatly, and the lack of real documentation on
> what parameters exist (it seems like every month there is a "new" hidden
> param) and what actually uses them (i.e., where is final actually taken
> into account?) doesn't help.
>
> [ and no, *-default.xml and/or "read the source!" is not good enough.]
>
>

Re: Which config parameters are node-specific?

Posted by Allen Wittenauer <aw...@linkedin.com>.


On 1/19/10 7:54 PM, "Amareshwari Sri Ramadasu" <am...@yahoo-inc.com>
wrote:

> Hi Zhang,
> 
> The following parameters are node specific.
>    mapred.tasktracker.map.tasks.maximum
>    mapred.tasktracker.reduce.tasks.maximum
>    tasktracker.http.threads
>    dfs.datanode.handler.count
> 
> The rest of the parameters are Job-specific.

... Except for the ones that are namenode and jobtracker specific.

:(

Hadoop configuration sucks greatly, and the lack of real documentation on
what parameters exist (it seems like every month there is a "new" hidden
param) and what actually uses them (i.e., where is final actually taken
into account?) doesn't help.

[ and no, *-default.xml and/or "read the source!" is not good enough.]
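
For reference, the documented behaviour of final is that once a resource
marks a property <final>true</final>, resources loaded afterwards (a job's
configuration, for example) cannot change it. The Python sketch below is a
toy model of that rule only. The load_resources helper and all values are
hypothetical, and it says nothing about which daemons actually honor final
for a given parameter.

```python
def load_resources(resources):
    """Merge config resources in load order, honoring 'final' (toy model).

    Each resource is a list of (name, value, is_final) tuples.
    """
    values, finals = {}, set()
    for resource in resources:
        for name, value, is_final in resource:
            if name in finals:
                continue  # an earlier resource locked this value
            values[name] = value
            if is_final:
                finals.add(name)
    return values


# Hypothetical node-side site config: the slot count is locked, io.sort.mb is not.
site = [
    ("mapred.tasktracker.map.tasks.maximum", "12", True),
    ("io.sort.mb", "100", False),
]
# Hypothetical job config that tries to override both.
job = [
    ("mapred.tasktracker.map.tasks.maximum", "64", False),
    ("io.sort.mb", "200", False),
]

merged = load_resources([site, job])
print(merged)
```

In this model the slot count stays at the site value while io.sort.mb takes
the job's value.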


Re: Which config parameters are node-specific?

Posted by Amareshwari Sri Ramadasu <am...@yahoo-inc.com>.
Hi Zhang,

The following parameters are node specific.
   mapred.tasktracker.map.tasks.maximum
   mapred.tasktracker.reduce.tasks.maximum
   tasktracker.http.threads
   dfs.datanode.handler.count

The rest of the parameters are Job-specific.
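
As a rough illustration of that split, the node-specific settings can be
generated separately for each class of machine and placed in that class's
mapred-site.xml (assuming the 0.20-style split config files). The Python
sketch below is only a sketch: the class names and every value in
NODE_CLASSES are made-up placeholders, not tuning advice, and
render_site_xml is a hypothetical helper, not part of Hadoop.
dfs.datanode.handler.count is left out because it is an HDFS setting and
would go in hdfs-site.xml instead.

```python
import xml.etree.ElementTree as ET

# Placeholder values per node class (assumptions, not recommendations).
NODE_CLASSES = {
    "small": {  # dual-core, 4GB
        "mapred.tasktracker.map.tasks.maximum": 2,
        "mapred.tasktracker.reduce.tasks.maximum": 1,
        "tasktracker.http.threads": 20,
    },
    "large": {  # 16-core, 128GB
        "mapred.tasktracker.map.tasks.maximum": 12,
        "mapred.tasktracker.reduce.tasks.maximum": 6,
        "tasktracker.http.threads": 40,
    },
}


def render_site_xml(properties):
    """Render a dict of properties as a Hadoop-style <configuration> document."""
    root = ET.Element("configuration")
    for name, value in sorted(properties.items()):
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = str(value)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    for node_class, properties in NODE_CLASSES.items():
        print(node_class)
        print(render_site_xml(properties))
```

Job-specific parameters such as io.sort.mb would not go through this path;
they travel with each job's own configuration.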

Thanks
Amareshwari

On 1/20/10 6:01 AM, "Zhang, Zhang" <zh...@corp.aol.com> wrote:



Where do I find information about which config parameters can be set as per-node properties, and which ones apply to all nodes? For example, I have a cluster consisting of two classes of nodes. One class is dual-core nodes with 4GB of memory, and the other class is 16-core nodes with 128GB of memory. It certainly makes sense to configure them differently. So the question is, which parameters should I pay attention to? As far as I know, at least the following ones can be set as node-specific:

    mapred.tasktracker.map.tasks.maximum
    mapred.tasktracker.reduce.tasks.maximum


But anything beyond that? How about the following ones, can I set them as node-specific parameters?

    mapred.child.java.opts
    tasktracker.http.threads
    dfs.datanode.handler.count
    io.sort.factor
    io.sort.mb
    mapred.inmem.merge.threshold
    mapred.job.reduce.input.buffer.percent


Thanks!

Zhang