Posted to common-user@hadoop.apache.org by javateck javateck <ja...@gmail.com> on 2009/04/21 22:20:16 UTC
mapred.tasktracker.map.tasks.maximum
I set "mapred.tasktracker.map.tasks.maximum" to 10, but when I run a
job, it only uses 2 of the 10 map slots. Is there any way to find out why
only 2 are used? Thanks.
Re: mapred.tasktracker.map.tasks.maximum
Posted by Miles Osborne <mi...@inf.ed.ac.uk>.
Those are the places to check. A job can itself override the number
of mappers and reducers. For example, when using streaming, I often state
the number of mappers and reducers I want:
-jobconf mapred.reduce.tasks=30
This would tell Hadoop to use 30 reducers.
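For context, a complete streaming invocation using that flag might look like the sketch below; the jar path, input/output paths, and mapper/reducer scripts are placeholders, not details from this thread:

```shell
hadoop jar contrib/streaming/hadoop-streaming.jar \
  -input /user/me/input \
  -output /user/me/output \
  -mapper my_mapper.py \
  -reducer my_reducer.py \
  -jobconf mapred.map.tasks=20 \
  -jobconf mapred.reduce.tasks=30
```

Note that mapred.map.tasks is only a hint (the actual map count is driven by the input splits), whereas mapred.reduce.tasks is honored exactly.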
If you don't have enough memory to run a mapper, you will see
error messages logged somewhere, so it might be useful to check
all the logs.
Miles
--
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.
Re: mapred.tasktracker.map.tasks.maximum
Posted by javateck javateck <ja...@gmail.com>.
I want to clarify something. For the max task slots, are these the places
to check:
1. hadoop-site.xml
2. the specific job's job.conf, which can be retrieved through the job, for
example logs/job_200904212336_0002_conf.xml
Is there any other place that limits the map task count?
In my case it's strange: I set "mapred.tasktracker.map.tasks.maximum" to 10,
and the job's conf also shows 10, but Hadoop is only using 2 map tasks.
Re: mapred.tasktracker.map.tasks.maximum
Posted by javateck javateck <ja...@gmail.com>.
No, the input is plain text, tab (\t) delimited. I'm expecting one mapper
per file: I have 175 files, and the web UI shows 189 map tasks. My issue is
that with 189 map tasks waiting, Hadoop is using only 2 of my 10 map slots,
even though all the map tasks should be independent.
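The 189 figure is consistent with how FileInputFormat splits files: with the default 64 MB HDFS block size, a file gets one split per block, but a final chunk within 10% of the split size (the SPLIT_SLOP factor) is merged into the previous split, so files up to about 70 MB get one map task and larger ones get two. A rough sketch of that arithmetic; the exact file-size mix below is hypothetical, chosen only to match 175 files and 189 tasks:

```python
MB = 1024 * 1024
BLOCK_SIZE = 64 * MB   # HDFS default block size in Hadoop 0.18
SPLIT_SLOP = 1.1       # FileInputFormat tolerates a 10% oversized last split

def num_splits(file_size, split_size=BLOCK_SIZE):
    """Approximate FileInputFormat's per-file split count."""
    splits = 0
    remaining = file_size
    while remaining / split_size > SPLIT_SLOP:
        splits += 1
        remaining -= split_size
    if remaining > 0:
        splits += 1
    return splits

# Hypothetical mix: 161 files small enough for one split, 14 needing two.
sizes = [62 * MB] * 161 + [75 * MB] * 14
print(sum(num_splits(s) for s in sizes))  # prints 189
```

Whatever the exact mix, the split count only determines how many map tasks are queued; it does not explain why only 2 of the 10 slots run at once.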
On Tue, Apr 21, 2009 at 2:23 PM, Miles Osborne <mi...@inf.ed.ac.uk> wrote:
> is your input data compressed? if so then you will get one mapper per file
>
> Miles
Re: mapred.tasktracker.map.tasks.maximum
Posted by Miles Osborne <mi...@inf.ed.ac.uk>.
Is your input data compressed? If so, you will get one mapper per file.
Miles
Re: mapred.tasktracker.map.tasks.maximum
Posted by javateck javateck <ja...@gmail.com>.
Hi Koji,
Thanks for helping.
I don't know why Hadoop is using only 2 of the 10 map task slots.
Sure, I cut and pasted from the job tracker web UI. I clearly set the max
map tasks to 10 (which I can verify in hadoop-site.xml and in the individual
job configuration), and the first MapReduce job did run with 10 map tasks
when I checked the UI, but all subsequent jobs run with only 2 map tasks.
I have almost 176 files, each input file around 62~75 MB.
mapred.tasktracker.map.tasks.maximum: 10

Kind     % Complete   Num Tasks   Pending   Running   Complete   Killed   Failed/Killed
map      28.04%       189         134       2         53         0        0 / 0
reduce    0.00%       1           1         0         0          0        0 / 0
On Tue, Apr 21, 2009 at 1:56 PM, Koji Noguchi <kn...@yahoo-inc.com> wrote:
> It's probably a silly question, but you do have more than 2 mappers on
> your second job?
>
> If yes, I have no idea what's happening.
>
> Koji
>
RE: mapred.tasktracker.map.tasks.maximum
Posted by Koji Noguchi <kn...@yahoo-inc.com>.
It's probably a silly question, but do you have more than 2 mappers in
your second job?
If yes, I have no idea what's happening.
Koji
Re: mapred.tasktracker.map.tasks.maximum
Posted by javateck javateck <ja...@gmail.com>.
Right, I set it in hadoop-site.xml before starting the Hadoop processes.
One job ran fully utilizing the 10 map slots, but subsequent jobs use only
2 of them; I don't know why.
I have enough RAM as well (no paging is happening), and I'm running 0.18.3.
Right now I run all processes (namenode, datanode, jobtracker, tasktracker)
on one machine with two quad-core CPUs and 20 GB RAM.
RE: mapred.tasktracker.map.tasks.maximum
Posted by Koji Noguchi <kn...@yahoo-inc.com>.
This is a cluster config, not a per-job config.
So it has to be set when the MapReduce cluster first comes up.
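In hadoop-site.xml on the TaskTracker node, the setting looks like the fragment below (the value is the one from this thread); the TaskTracker has to be restarted for a change to take effect:

```xml
<property>
  <name>mapred.tasktracker.map.tasks.maximum</name>
  <value>10</value>
  <description>Max map tasks run simultaneously by one TaskTracker.</description>
</property>
```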
Koji