
Posted to common-user@hadoop.apache.org by Wei Jiang <ha...@gmail.com> on 2008/07/15 22:17:34 UTC

Two questions about hadoop

Hi all,

I am new to hadoop and have some questions about it.

1) About setting the number of maps/reduces: Running hadoop on an 8-node
cluster, I set mapred.map.tasks to 64 and
mapred.tasktracker.map.tasks.maximum to 8, but the "launched map tasks"
counter in the job output shows that hadoop launched between 96 and 110 map
tasks in different jobs. The size of the dataset is 6.4GB and
dfs.block.size is set to 64MB. Why does the number of launched map tasks
differ between runs with the same dataset size and block size? Is there a
way to make hadoop launch exactly the number of map tasks specified?

2) About the launched map tasks: Does the number of launched map tasks imply
that hadoop spawns a new thread for each map task? How can I find out the
number of threads launched by hadoop in a particular job?

Thanks very much~~

-- 
---
Wei

Re: Two questions about hadoop

Posted by chaitanya krishna <ch...@gmail.com>.

Hi,

Try setting the number of map tasks in the program itself. For example, in
the WordCount example, you can set the number of map tasks in the run method
with

conf.setNumMapTasks(<no. of map tasks>);
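
For context on why the count can still differ from the requested value: as far
as I understand the old-API FileInputFormat, the requested number of maps is
only a hint that feeds into the split size, roughly
splitSize = max(minSize, min(totalSize / requestedMaps, blockSize)).
Below is a minimal standalone sketch of that arithmetic. It does not use any
Hadoop classes; the class name SplitSketch and its methods are hypothetical,
it assumes the 6.4GB dataset is a single file of 6400MB, and it ignores the
small slack Hadoop allows on the last split.

```java
public class SplitSketch {
    // Split-size rule as I understand it from the old-API FileInputFormat:
    // never below minSize, never above the block size, otherwise the goal size.
    public static long computeSplitSize(long goalSize, long minSize, long blockSize) {
        return Math.max(minSize, Math.min(goalSize, blockSize));
    }

    // Estimated number of splits (and so map tasks) for one file.
    // requestedMaps plays the role of mapred.map.tasks and is only a hint.
    public static long estimateSplits(long fileLen, int requestedMaps,
                                      long minSize, long blockSize) {
        long goalSize = fileLen / Math.max(requestedMaps, 1);
        long splitSize = computeSplitSize(goalSize, minSize, blockSize);
        return (fileLen + splitSize - 1) / splitSize; // ceiling division
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        long fileLen = 6400L * mb;   // assumption: 6.4GB stored as one 6400MB file
        long blockSize = 64L * mb;   // dfs.block.size from the question

        // Asking for 64 maps gives a 100MB goal, which is capped at the block size.
        System.out.println(estimateSplits(fileLen, 64, 1, blockSize));

        // Asking for more maps than blocks shrinks the split below a block.
        System.out.println(estimateSplits(fileLen, 200, 1, blockSize));
    }
}
```

Under these assumptions, asking for 64 maps still yields about 100 splits,
because the split size is capped at the 64MB block size. The "launched map
tasks" counter also includes speculative and re-run task attempts, which would
explain why it varies from run to run around that figure.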

I hope this answers your first query.

Regards,
V.V.Chaitanya Krishna
IIIT,Hyderabad

