Posted to user@hbase.apache.org by Something Something <ma...@gmail.com> on 2010/01/18 08:21:13 UTC

How do I trigger multiple Mapper tasks?

Hello,

I read the documentation about running multiple Mapper tasks, but I can't
get multiple Mappers to work.  I am running under EC2 with 10 nodes.

Here's what I know:

1)  As I understand it, the number of Mapper tasks is determined by the DFS
block size by default, but I would like to override that.  My file is small,
but each line triggers a long-running, complicated calculation, so the lines
should be processed in parallel.

2)  I tried setting the following property in the mapred-site.xml (only on
Master), but that doesn't seem to help:

<property>
  <name>mapred.map.tasks</name>
  <value>10</value>
</property>

I still see the following message:

10/01/18 01:56:34 INFO mapred.JobClient:     Launched map tasks=1
10/01/18 01:56:34 INFO mapred.JobClient:     Data-local map tasks=1

(Also, I know for a fact that multiple mappers are not running!)


3)  I read somewhere that JobConf has a method called setNumMapTasks, but
this class has been deprecated, so I am not using it.  Besides, I have heard
that this method only provides a hint to Hadoop.

So how do I trigger multiple Mapper tasks?  Please let me know.  Thanks.

Re: How do I trigger multiple Mapper tasks?

Posted by Chandraprakash Bhagtani <cp...@gmail.com>.
You can set the *mapred.max.split.size* property in mapred-site.xml to create
more splits, and therefore more map tasks.


-- 
Thanks & Regards,
Chandra Prakash Bhagtani,
Impetus Infotech (india) Pvt Ltd.

Re: How do I trigger multiple Mapper tasks?

Posted by Something Something <ma...@gmail.com>.
Thanks for the replies.  NLineInputFormat uses JobConf, which has been
deprecated, so I would rather not use that class.  However, I looked at
FileInputFormat, which has the following method:

    FileInputFormat.setMinInputSplitSize(job, 100);

I thought that if I set the minimum input split size to 100, a Mapper would
be triggered for every 100 lines in the input file.  My input file has 500
lines, so I expected to see 5 Mappers, but only one Mapper is triggered.

Please help.  Thanks.
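
The value passed to setMinInputSplitSize is a size in bytes, not a line
count, and raising the minimum can only make splits larger.  Below is a
minimal sketch of the arithmetic, assuming the new-API FileInputFormat picks
the split size as max(minSize, min(maxSize, blockSize)) and tolerates a last
piece up to 10% larger than that.  The class name, the 64 MB block size, and
the 50,000-byte file length are assumptions for illustration, not values from
this thread.

```java
// Standalone sketch; it does not use any Hadoop classes.
class SplitSizeSketch {
    // Assumed split-size rule: clamp the block size between the
    // minimum and maximum split sizes (all values are in bytes).
    static long computeSplitSize(long blockSize, long minSize, long maxSize) {
        return Math.max(minSize, Math.min(maxSize, blockSize));
    }

    // Counts the splits for one file. A final piece that is at most
    // 10% larger than splitSize is kept whole rather than split again.
    static int countSplits(long fileLength, long splitSize) {
        final double slop = 1.1;
        int splits = 0;
        long remaining = fileLength;
        while ((double) remaining / splitSize > slop) {
            splits++;
            remaining -= splitSize;
        }
        if (remaining != 0) {
            splits++;
        }
        return splits;
    }

    public static void main(String[] args) {
        long blockSize = 64L * 1024 * 1024; // assumed DFS block size
        long fileLength = 50_000L;          // hypothetical size of a 500-line file

        long defaults = computeSplitSize(blockSize, 1L, Long.MAX_VALUE);
        long minOf100 = computeSplitSize(blockSize, 100L, Long.MAX_VALUE);
        long maxOf5000 = computeSplitSize(blockSize, 1L, 5_000L);

        System.out.println(countSplits(fileLength, defaults));  // 1
        System.out.println(countSplits(fileLength, minOf100));  // 1
        System.out.println(countSplits(fileLength, maxOf5000)); // 10
    }
}
```

Under these assumptions, a minimum of 100 bytes leaves the split size at the
block size, so a small file still yields one split.  Only capping the maximum
split size (the mapred.max.split.size suggestion above, or
FileInputFormat.setMaxInputSplitSize in the new API) produces more splits.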


On Sun, Jan 17, 2010 at 11:45 PM, Amareshwari Sri Ramadasu <
amarsri@yahoo-inc.com> wrote:

>
> Changing the audience to mapreduce-user.
>
> Setting the number of map tasks (mapred.map.tasks or
> JobConf.setNumMapTasks()) does not guarantee that number of maps in the job
> will be set to that. It will only be used as a hint. Number of maps is
> decided by your InputFormat. You should implement InputFormat.getSplits() to
> define how the input should be split. The fact is "number of splits is equal
> to the number of maps".
> If you are using default InputFormat (i.e. TextInputFormat), number of maps
> is decided by DFS block size. If you use NLineInputFormat with
> mapred.line.input.format.linespermap=1, number of maps will be number of
> lines in the file.
> More details @
>
> http://hadoop.apache.org/common/docs/r0.20.0/api/org/apache/hadoop/mapred/JobConf.html#setNumMapTasks%28int%29
>
> Thanks
> Amareshwari

Re: How do I trigger multiple Mapper tasks?

Posted by Amareshwari Sri Ramadasu <am...@yahoo-inc.com>.
Changing the audience to mapreduce-user.

Setting the number of map tasks (mapred.map.tasks or JobConf.setNumMapTasks()) does not guarantee that the job will run that many maps; the value is only used as a hint. The number of maps is decided by your InputFormat. You should implement InputFormat.getSplits() to define how the input should be split, because the number of splits is equal to the number of maps.
If you are using the default InputFormat (i.e. TextInputFormat), the number of maps is decided by the DFS block size. If you use NLineInputFormat with mapred.line.input.format.linespermap=1, the number of maps will equal the number of lines in the file.
More details:
http://hadoop.apache.org/common/docs/r0.20.0/api/org/apache/hadoop/mapred/JobConf.html#setNumMapTasks%28int%29
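
The "one split per N lines" idea described above can be pictured with a small
standalone sketch. This is not NLineInputFormat's code (the real class
produces file splits rather than in-memory lists); groupLines and the
generated record names are hypothetical, and only the 500-line input size and
the 10 nodes are taken from this thread.

```java
import java.util.ArrayList;
import java.util.List;

// Standalone sketch; it does not use any Hadoop classes.
class LinesPerSplitSketch {
    // Hypothetical helper: groups lines into chunks of at most
    // linesPerSplit lines. Each chunk stands in for one split,
    // and each split would be handed to one map task.
    static List<List<String>> groupLines(List<String> lines, int linesPerSplit) {
        if (linesPerSplit < 1) {
            throw new IllegalArgumentException("linesPerSplit must be at least 1");
        }
        List<List<String>> splits = new ArrayList<>();
        for (int start = 0; start < lines.size(); start += linesPerSplit) {
            int end = Math.min(start + linesPerSplit, lines.size());
            splits.add(new ArrayList<>(lines.subList(start, end)));
        }
        return splits;
    }

    public static void main(String[] args) {
        List<String> lines = new ArrayList<>();
        for (int i = 0; i < 500; i++) {
            lines.add("record-" + i); // placeholder input lines
        }
        System.out.println(groupLines(lines, 1).size());  // 500
        System.out.println(groupLines(lines, 50).size()); // 10
    }
}
```

With one line per split, the 500-line file would give 500 map tasks; 50 lines
per split would give 10, one for each node in a 10-node cluster.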

Thanks
Amareshwari

