Posted to user@hbase.apache.org by Jean-Marc Spaggiari <je...@spaggiari.org> on 2012/10/11 21:59:31 UTC

Force the number of map tasks in MR?

Hi,

Is there a way to force the number of map tasks in an MR job?

I have a 25-region table split over 6 nodes, but the MR job is running
the tasks only 2 at a time.

Is there a way to force it to run one task on each regionserver
serving at least one region? Why is the MR job waiting for 2 tasks to
complete before starting the other tasks?

I'm starting the MR job with scanner caching set to 100.

I tried mapred.map.tasks and speculative=false with no success.

Any idea how I can increase this number of tasks?

Thanks,

JM

Re: Force the number of map tasks in MR?

Posted by Jean-Marc Spaggiari <je...@spaggiari.org>.
Hi Bryan,

J-D replied in another thread. The issue was caused by a
misconfiguration on the mapred side: I was only hitting the local job
tracker, which is why only 2 tasks were running at a time.

I re-configured the cluster and it's now working very well.
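For readers who hit the same symptom: in MR1 (the JobTracker/TaskTracker setup discussed in this thread), the job client reads mapred.job.tracker, whose default value is "local". With that value the job runs in-process through the LocalJobRunner instead of being submitted to the cluster's TaskTrackers, so one possible way to end up here is a client configuration that never sets this property. The sketch below is a hypothetical helper, not a Hadoop API: it models only that one rule, over a plain Map standing in for the job configuration. The JobTracker address in main is an assumed placeholder.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper (not part of Hadoop): models how an MR1 client decides
// between the in-process LocalJobRunner and a real JobTracker.
final class JobTrackerCheck {
    // MR1 property naming the JobTracker's host:port; its default is "local".
    static final String JOB_TRACKER_KEY = "mapred.job.tracker";

    // True when a job submitted with this configuration would run in-process
    // (LocalJobRunner) rather than on the cluster's TaskTrackers.
    static boolean runsLocally(Map<String, String> conf) {
        String tracker = conf.getOrDefault(JOB_TRACKER_KEY, "local");
        return "local".equals(tracker);
    }

    public static void main(String[] args) {
        Map<String, String> unconfigured = new HashMap<>();
        Map<String, String> configured = new HashMap<>();
        // Assumed placeholder address; substitute your own JobTracker host:port.
        configured.put(JOB_TRACKER_KEY, "jobtracker.example.com:8021");

        System.out.println("unconfigured client runs locally: " + runsLocally(unconfigured));
        System.out.println("configured client runs locally: " + runsLocally(configured));
    }
}
```

If the first line prints true for the configuration your job actually uses, the job never reaches the cluster, and no TaskTracker setting will change its parallelism.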

Next step is to build my own MapReduce job for testing...

Thanks,

JM

2012/10/11, Bryan Beaudreault <bb...@hubspot.com>:
> JM,
>
> Are you trying to use TableInputFormat to scan HBase from MapReduce? If
> so, there is one map task per region, so you should have 25 map tasks. If
> only 2 are running at once, that's a problem with your Hadoop setup. Is your
> job running in a pool with only 2 slots available?
>
> If you're not using TableInputFormat, Jon is right. If your input is in a
> splittable format, like SequenceFileInputFormat, you can further split it
> using the setting mapred.max.split.size. I believe the default is 100 MB or
> so. This can be set on a per-job basis using the job conf.
>
> - Bryan

Re: Force the number of map tasks in MR?

Posted by Bryan Beaudreault <bb...@hubspot.com>.
JM,

Are you trying to use TableInputFormat to scan HBase from MapReduce? If
so, there is one map task per region, so you should have 25 map tasks. If
only 2 are running at once, that's a problem with your Hadoop setup. Is your
job running in a pool with only 2 slots available?
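Bryan's first point can be made concrete with a simplified model. As we understand it, the HBase table input format reads the table's region boundaries and creates one split, and therefore one map task, per region that overlaps the scan's row range, tagging each split with the hosting region server's name as a locality hint (the scheduler prefers that host but does not guarantee it). The sketch below is a hypothetical stand-alone model of that rule, not HBase's implementation: class and field names are made up, and row keys are Strings instead of byte arrays. It decides how many map tasks exist, not how many run at once.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified, hypothetical model of how an HBase table input format turns
// regions into map tasks: one split per region overlapping the scan range.
// Row keys are Strings here (HBase uses byte[]); "" means unbounded.
final class RegionSplits {
    static final class Region {
        final String startKey, endKey, host;
        Region(String startKey, String endKey, String host) {
            this.startKey = startKey; this.endKey = endKey; this.host = host;
        }
    }

    static final class Split {
        final String startRow, stopRow, host; // host = region server, a locality hint
        Split(String startRow, String stopRow, String host) {
            this.startRow = startRow; this.stopRow = stopRow; this.host = host;
        }
    }

    static List<Split> splitsFor(List<Region> regions, String scanStart, String scanStop) {
        List<Split> splits = new ArrayList<>();
        for (Region r : regions) {
            // Region ends (exclusive) at or before the scan start: no overlap.
            boolean endsBefore = !r.endKey.isEmpty() && !scanStart.isEmpty()
                    && r.endKey.compareTo(scanStart) <= 0;
            // Region starts at or after the (exclusive) scan stop row: no overlap.
            boolean startsAfter = !scanStop.isEmpty() && r.startKey.compareTo(scanStop) >= 0;
            if (endsBefore || startsAfter) {
                continue;
            }
            // Clip the region's [startKey, endKey) range to the scan's range.
            String start = (scanStart.isEmpty() || r.startKey.compareTo(scanStart) > 0)
                    ? r.startKey : scanStart;
            String stop;
            if (r.endKey.isEmpty()) {
                stop = scanStop;
            } else if (scanStop.isEmpty() || r.endKey.compareTo(scanStop) < 0) {
                stop = r.endKey;
            } else {
                stop = scanStop;
            }
            splits.add(new Split(start, stop, r.host));
        }
        return splits;
    }

    public static void main(String[] args) {
        List<Region> regions = List.of(
                new Region("", "g", "rs1"), new Region("g", "n", "rs2"),
                new Region("n", "t", "rs3"), new Region("t", "", "rs1"));
        System.out.println("full scan: " + splitsFor(regions, "", "").size() + " map tasks");
        for (Split s : splitsFor(regions, "h", "p")) {
            System.out.println("[" + s.startRow + ", " + s.stopRow + ") on " + s.host);
        }
    }
}
```

With four regions, a full-table scan yields four map tasks; a scan from "h" to "p" touches only two regions and yields two. By the same rule, a full scan of JM's 25-region table would produce 25 map tasks.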

If you're not using TableInputFormat, Jon is right. If your input is in a
splittable format, like SequenceFileInputFormat, you can further split it
using the setting mapred.max.split.size. I believe the default is 100 MB or
so. This can be set on a per-job basis using the job conf.
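To make the split-size point concrete: assuming the new-API FileInputFormat (org.apache.hadoop.mapreduce.lib.input), the split size is max(minSize, min(maxSize, blockSize)), where maxSize comes from mapred.max.split.size, and each file is cut into splits of that size, with the last one allowed to be up to 10% larger. In that code path the maximum defaults to Long.MAX_VALUE, so by default a split is about one HDFS block, and lowering the maximum below the block size is what yields more map tasks. The older org.apache.hadoop.mapred API instead derives a goal size from the requested number of maps (mapred.map.tasks). The sketch below re-implements only the new-API arithmetic in plain Java (no Hadoop calls); the 64 MB block size and 1000 MB file are assumed example values.

```java
import java.util.ArrayList;
import java.util.List;

// Stand-alone re-implementation (no Hadoop calls) of the split arithmetic in
// the new-API FileInputFormat, as we understand it.
final class SplitMath {
    // The last split may be up to 10% larger than splitSize.
    static final double SPLIT_SLOP = 1.1;

    // splitSize = max(minSize, min(maxSize, blockSize))
    static long computeSplitSize(long blockSize, long minSize, long maxSize) {
        return Math.max(minSize, Math.min(maxSize, blockSize));
    }

    // Lengths of the splits (one map task each) produced for a single file.
    static List<Long> splitLengths(long fileLength, long splitSize) {
        if (splitSize <= 0) {
            throw new IllegalArgumentException("splitSize must be positive");
        }
        List<Long> lengths = new ArrayList<>();
        long remaining = fileLength;
        while ((double) remaining / splitSize > SPLIT_SLOP) {
            lengths.add(splitSize);
            remaining -= splitSize;
        }
        if (remaining != 0) {
            lengths.add(remaining);
        }
        return lengths;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        long blockSize = 64 * mb;      // assumed HDFS block size
        long fileLength = 1000 * mb;   // assumed input file size
        long byDefault = computeSplitSize(blockSize, 1, Long.MAX_VALUE);
        long capped = computeSplitSize(blockSize, 1, 16 * mb); // max split size 16 MB
        System.out.println("default: " + splitLengths(fileLength, byDefault).size() + " splits");
        System.out.println("16 MB max: " + splitLengths(fileLength, capped).size() + " splits");
    }
}
```

Running it prints 16 splits for the default and 63 with a 16 MB cap. Note that this only applies to file-based inputs; it does not change the one-split-per-region behaviour of the HBase input format.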

- Bryan

On Thu, Oct 11, 2012 at 4:50 PM, Jonathan Bishop <jb...@gmail.com> wrote:

> JM,
>
> The number of map tasks will be limited by the number of input splits
> available, assuming you are reading files.
>
> Also, you need to restart your cluster for those settings to take effect.
>
> Hope this helps,
>
> Jon Bishop

Re: Force the number of map tasks in MR?

Posted by Jonathan Bishop <jb...@gmail.com>.
JM,

The number of map tasks will be limited by the number of input splits
available, assuming you are reading files.

Also, you need to restart your cluster for those settings to take effect.

Hope this helps,

Jon Bishop

On Thu, Oct 11, 2012 at 1:44 PM, Jean-Marc Spaggiari <
jean-marc@spaggiari.org> wrote:

> But this is the limit per tasktracker, right?
>
> And I have 6 nodes, so 6 tasktrackers, which means it should go up to 12
> tasks?
>
> Take a look at 2.7 here: http://wiki.apache.org/hadoop/FAQ
>
> I just tried with the setting below (changing 2 to 6) but I'm getting
> the same result.
>
> JM

Re: Force the number of map tasks in MR?

Posted by Kevin O'dell <ke...@cloudera.com>.
Let's combine this with J-D's request and work off of that thread. Can we
follow up there with the LocalJobRunner question?

On Thu, Oct 11, 2012 at 4:44 PM, Jean-Marc Spaggiari <
jean-marc@spaggiari.org> wrote:

> But this is the limit per tasktracker, right?
>
> And I have 6 nodes, so 6 tasktrackers, which means it should go up to 12
> tasks?
>
> Take a look at 2.7 here: http://wiki.apache.org/hadoop/FAQ
>
> I just tried with the setting below (changing 2 to 6) but I'm getting
> the same result.
>
> JM

-- 
Kevin O'Dell
Customer Operations Engineer, Cloudera

Re: Force the number of map tasks in MR?

Posted by Jean-Marc Spaggiari <je...@spaggiari.org>.
But this is the limit per tasktracker, right?

And I have 6 nodes, so 6 tasktrackers, which means it should go up to 12 tasks?

Take a look at 2.7 here: http://wiki.apache.org/hadoop/FAQ

I just tried with the setting below (changing 2 to 6) but I'm getting
the same result.

JM
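JM's arithmetic is the right sanity check. In MR1 the limit is per TaskTracker, so the number of map tasks that can run at once is roughly the number of TaskTrackers times the map slots per TaskTracker, and a job with more tasks than that runs in waves. The sketch below is a hypothetical helper in plain Java (no Hadoop calls) that assumes equal-length tasks and no other jobs competing for slots.

```java
// Back-of-the-envelope model of MR1 map-slot capacity: each TaskTracker runs
// at most mapred.tasktracker.map.tasks.maximum map tasks at once.
final class SlotMath {
    // Total map tasks the cluster can run concurrently.
    static int clusterMapSlots(int taskTrackers, int mapSlotsPerTracker) {
        return taskTrackers * mapSlotsPerTracker;
    }

    // Number of "waves" needed to run all map tasks (ceiling division),
    // assuming equal task durations and no competing jobs.
    static int waves(int mapTasks, int clusterSlots) {
        if (clusterSlots <= 0) {
            throw new IllegalArgumentException("need at least one slot");
        }
        return (mapTasks + clusterSlots - 1) / clusterSlots;
    }

    public static void main(String[] args) {
        int slots = clusterMapSlots(6, 2); // 6 TaskTrackers x 2 map slots each
        System.out.println(slots + " concurrent map tasks, " + waves(25, slots) + " waves");
        // Only 2 tasks at a time for the whole job means 13 waves.
        System.out.println(waves(25, 2) + " waves with 2 slots in total");
    }
}
```

With 6 TaskTrackers at 2 slots each, 25 map tasks need 3 waves of up to 12 tasks. Seeing exactly 2 tasks at a time across 6 nodes therefore points at something other than the per-TaskTracker limit.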

2012/10/11 Kevin O'dell <ke...@cloudera.com>:
> J-M,
>
>   They should be in mapred-site.xml; the properties
> are mapred.tasktracker.map.tasks.maximum and
> mapred.tasktracker.reduce.tasks.maximum. These are the defaults in CDH4:
>
> <property>
>   <name>mapreduce.tasktracker.map.tasks.maximum</name>
>   <value>2</value>
>   <description>The maximum number of map tasks that will be run
>   simultaneously by a task tracker.
>   </description>
> </property>
>
> <property>
>   <name>mapreduce.tasktracker.reduce.tasks.maximum</name>
>   <value>2</value>
>   <description>The maximum number of reduce tasks that will be run
>   simultaneously by a task tracker.
>   </description>
> </property>
>
> This would explain why they are running 2 at a time. Does this help?

Re: Force the number of map tasks in MR?

Posted by Kevin O'dell <ke...@cloudera.com>.
J-M,

  They should be in mapred-site.xml; the properties
are mapred.tasktracker.map.tasks.maximum and
mapred.tasktracker.reduce.tasks.maximum. These are the defaults in CDH4:

<property>
  <name>mapreduce.tasktracker.map.tasks.maximum</name>
  <value>2</value>
  <description>The maximum number of map tasks that will be run
  simultaneously by a task tracker.
  </description>
</property>

<property>
  <name>mapreduce.tasktracker.reduce.tasks.maximum</name>
  <value>2</value>
  <description>The maximum number of reduce tasks that will be run
  simultaneously by a task tracker.
  </description>
</property>

This would explain why they are running 2 at a time. Does this help?
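Note that the prose above names the MR1 properties (mapred.tasktracker.*), while the pasted XML uses the newer mapreduce.tasktracker.* names; in Hadoop 2 the former are deprecated aliases of the latter, and Hadoop 1's mapred-default.xml documents the mapred.* form. Each TaskTracker reads this value when it starts, so it has to be set on every TaskTracker node and the TaskTrackers restarted. The sketch below is a hypothetical helper, not a Hadoop API: it reads the map-slot limit from the text of a mapred-site.xml using only the JDK's DOM parser, accepts either name (checking the MR1 name first), and falls back to 2, the default quoted above. Real Hadoop resolves deprecated names and overrides through its own Configuration class, which this does not model.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical helper (not a Hadoop API): reads the per-TaskTracker map-slot
// limit from mapred-site.xml text, accepting the MR1 name or the newer
// mapreduce.* name (checked in that order), defaulting to 2.
final class MapSlotConfig {
    static final String[] KEYS = {
        "mapred.tasktracker.map.tasks.maximum",
        "mapreduce.tasktracker.map.tasks.maximum"
    };
    static final int DEFAULT_MAP_SLOTS = 2;

    // Trimmed text of the first descendant element with this tag, or null.
    private static String childText(Element parent, String tag) {
        NodeList nodes = parent.getElementsByTagName(tag);
        return nodes.getLength() == 0 ? null : nodes.item(0).getTextContent().trim();
    }

    static int mapSlots(String siteXml) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(siteXml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("cannot parse site XML", e);
        }
        NodeList props = doc.getElementsByTagName("property");
        for (String key : KEYS) {
            for (int i = 0; i < props.getLength(); i++) {
                Element prop = (Element) props.item(i);
                String value = childText(prop, "value");
                if (key.equals(childText(prop, "name")) && value != null) {
                    return Integer.parseInt(value);
                }
            }
        }
        return DEFAULT_MAP_SLOTS;
    }

    public static void main(String[] args) {
        String xml = "<configuration><property>"
                + "<name>mapred.tasktracker.map.tasks.maximum</name>"
                + "<value>6</value>"
                + "</property></configuration>";
        System.out.println("map slots per TaskTracker: " + mapSlots(xml));
    }
}
```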

On Thu, Oct 11, 2012 at 4:25 PM, Jean-Marc Spaggiari <
jean-marc@spaggiari.org> wrote:

> I don't know; I didn't touch that. Where can I find this information?

Re: Force the number of map tasks in MR?

Posted by Jean-Marc Spaggiari <je...@spaggiari.org>.
I don't know; I didn't touch that. Where can I find this information?

2012/10/11 Kevin O'dell <ke...@cloudera.com>:
> What are your max tasks set to?

Re: Force the number of map tasks in MR?

Posted by Kevin O'dell <ke...@cloudera.com>.
What are your max tasks set to?

On Thu, Oct 11, 2012 at 3:59 PM, Jean-Marc Spaggiari <
jean-marc@spaggiari.org> wrote:

> Hi,
>
> Is there a way to force the number of map tasks in an MR job?
>
> I have a 25-region table split over 6 nodes, but the MR job is running
> the tasks only 2 at a time.
>
> Is there a way to force it to run one task on each regionserver
> serving at least one region? Why is the MR job waiting for 2 tasks to
> complete before starting the other tasks?
>
> I'm starting the MR job with scanner caching set to 100.
>
> I tried mapred.map.tasks and speculative=false with no success.
>
> Any idea how I can increase this number of tasks?
>
> Thanks,
>
> JM