Posted to common-user@hadoop.apache.org by He Chen <ai...@gmail.com> on 2010/04/22 18:50:13 UTC

Hadoop does not follow my setting

Hi everyone

I am benchmarking Hadoop 0.20.0's wordcount example on a 30 GB file. I plan
to test the performance of different numbers of mappers: for a wordcount
job, 22, 44, 66, and 110 mappers.

However, when I set "mapred.map.tasks" to 22 and ran the job, it showed 436
mappers in total.

I thought the wordcount example might set its parameters inside its own
program, so I passed "-Dmapred.map.tasks=22" to it, but it was still 436 on
another try. I noticed that 30 GB divided by 436 is roughly 64 MB, which is
exactly my block size.
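The arithmetic I am seeing, sketched in Python (sizes are approximate; note
that 436 × 64 MB is actually closer to 27 GB, so my file must be a bit
under 30 GB, but the one-mapper-per-block pattern is the point):

```python
GB = 1024 ** 3
MB = 1024 ** 2

file_size = 30 * GB    # approximate input size
block_size = 64 * MB   # HDFS block size

# One map task per block-sized split:
splits = -(-file_size // block_size)  # ceiling division
print(splits)  # 480 for exactly 30 GiB; I observed 436, so the file is smaller
```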

Any suggestions will be appreciated.

Thank you in advance!

-- 
Best Wishes!
顺送商祺!

--
Chen He
(402)613-9298
PhD. student of CSE Dept.
Holland Computing Center
University of Nebraska-Lincoln
Lincoln NE 68588

Re: Hadoop does not follow my setting

Posted by He Chen <ai...@gmail.com>.
Hey Eric Sammer

Thank you for the reply. Actually, I only care about the number of mappers
in my case. It looks like I should write the wordcount program with my own
InputFormat class.
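For what it's worth, if the old-API FileInputFormat really clamps the split
size as max(minSize, min(goalSize, blockSize)) (my assumption here, not
verified against the source), then instead of a custom InputFormat I could
probably just raise mapred.min.split.size. The arithmetic for hitting 22
maps on a 30 GB file:

```python
GB = 1024 ** 3
file_size = 30 * GB
target_maps = 22

# Setting mapred.min.split.size to ceil(file_size / target_maps) forces
# splits large enough that only about 22 of them fit in the file.
min_split = -(-file_size // target_maps)   # ceiling division
resulting_maps = -(-file_size // min_split)
print(resulting_maps)  # 22
```

The trade-off Eric mentioned still applies: each multi-block split loses
data locality for all but one of its blocks.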

2010/4/22 Eric Sammer <es...@cloudera.com>

> This is normal and expected. The mapred.map.tasks parameter is only a
> hint. The InputFormat gets to decide how to calculate splits.
> FileInputFormat and all subclasses, including TextInputFormat, use a
> few parameters to figure out what the appropriate split size will be
> but under most circumstances, this winds up being the block size. If
> you used fewer map tasks than blocks, you would sacrifice data
> locality which would only hurt performance.

Re: Hadoop does not follow my setting

Posted by Eric Sammer <es...@cloudera.com>.
This is normal and expected. The mapred.map.tasks parameter is only a
hint; the InputFormat decides how to calculate splits. FileInputFormat
and all of its subclasses, including TextInputFormat, use a few
parameters to figure out the appropriate split size, but under most
circumstances this winds up being the block size. If you used fewer map
tasks than blocks, you would sacrifice data locality, which would only
hurt performance.
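As a rough model of that calculation (a sketch of the old-API
FileInputFormat behavior, not the actual Hadoop source; goal_size is
derived from mapred.map.tasks and min_size from mapred.min.split.size):

```python
def split_size(total_size, requested_maps, min_size, block_size):
    # The requested map count only sets a *goal* split size, which is
    # then clamped between min_size and the block size.
    goal_size = total_size // max(requested_maps, 1)
    return max(min_size, min(goal_size, block_size))

GB, MB = 1024 ** 3, 1024 ** 2
# 30 GiB input, 22 requested maps, tiny default min split, 64 MiB blocks:
print(split_size(30 * GB, 22, 1, 64 * MB) == 64 * MB)  # True: block size wins
```

With a default min split size, the block size wins no matter how few maps
you request, which is why the job ignored the setting of 22.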


-- 
Eric Sammer
phone: +1-917-287-2675
twitter: esammer
data: www.cloudera.com

Re: Hadoop does not follow my setting

Posted by He Chen <ai...@gmail.com>.
To some extent, for a 30 GB file, if the load is well balanced, the
overhead imposed by losing data locality may not be too large. We will
see; I will report my results to this mailing list.

On Thu, Apr 22, 2010 at 2:44 PM, Allen Wittenauer
<aw...@linkedin.com> wrote:

> But you'll sacrifice data locality, which means that instead of testing the
> cpu, you'll be testing cpu+network.

-- 
Best Wishes!
顺送商祺!

--
Chen He
(402)613-9298
PhD. student of CSE Dept.
Holland Computing Center
University of Nebraska-Lincoln
Lincoln NE 68588

Re: Hadoop does not follow my setting

Posted by Allen Wittenauer <aw...@linkedin.com>.
On Apr 22, 2010, at 11:46 AM, He Chen wrote:

> Yes, but if you have more mappers, you may have more waves to execute. I
> mean, if I have 110 mappers for a job and only 22 cores, then it will
> execute approximately 5 waves. If I have only 22 mappers, it will save
> that overhead time.

But you'll sacrifice data locality, which means that instead of testing the cpu, you'll be testing cpu+network.



Re: Hadoop does not follow my setting

Posted by He Chen <ai...@gmail.com>.
Yes, but if you have more mappers, you may have more waves to execute. I
mean, if I have 110 mappers for a job and only 22 cores, then it will
execute approximately 5 waves. If I have only 22 mappers, it will save
that overhead time.
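The wave counts I have in mind, assuming 22 concurrent map slots (just
ceiling division, nothing Hadoop-specific):

```python
import math

slots = 22  # concurrent map slots in my cluster
for maps in (22, 44, 66, 110, 436):
    waves = math.ceil(maps / slots)
    print(maps, waves)  # e.g. 110 maps -> 5 waves, 436 maps -> 20 waves
```

Each extra wave adds per-task startup overhead, which is what I want to
avoid by keeping the map count at 22.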

2010/4/22 Edward Capriolo <ed...@gmail.com>

> No matter how many total mappers exist for the job only a certain number of
> them run at once.
>



-- 
Best Wishes!
顺送商祺!

--
Chen He
(402)613-9298
PhD. student of CSE Dept.
Holland Computing Center
University of Nebraska-Lincoln
Lincoln NE 68588

Re: Hadoop does not follow my setting

Posted by Edward Capriolo <ed...@gmail.com>.
2010/4/22 He Chen <ai...@gmail.com>

> Hi Raymond Jennings III
>
> I use 22 mappers because I have 22 cores in my cluster. Is this what you
> want?
>

No matter how many total mappers exist for the job only a certain number of
them run at once.

Re: Hadoop does not follow my setting

Posted by He Chen <ai...@gmail.com>.
Hi Raymond Jennings III

I use 22 mappers because I have 22 cores in my cluster. Is this what you
want?

On Thu, Apr 22, 2010 at 11:55 AM, Raymond Jennings III <
raymondjiii@yahoo.com> wrote:

> Isn't the number of mappers specified "only a suggestion" ?
>

-- 
Best Wishes!
顺送商祺!

--
Chen He
(402)613-9298
PhD. student of CSE Dept.
Holland Computing Center
University of Nebraska-Lincoln
Lincoln NE 68588

Re: Hadoop does not follow my setting

Posted by Raymond Jennings III <ra...@yahoo.com>.
Isn't the number of mappers specified "only a suggestion" ?
