Posted to common-user@hadoop.apache.org by Naama Kraus <na...@gmail.com> on 2008/07/31 08:24:35 UTC

Determining number of mappers and number of input splits

Hi,

I am a bit confused about how the framework determines the number of mappers of
a job and the number of input splits.
Could anyone summarize?

I thought the number of mappers can't be set by the user (only
hinted); is that correct? What are the related configuration properties /
methods?

How is the number of input splits determined? Their size? How is
this affected by the number of mappers?

There are various threads on this issue (including some of my own questions), but
I still don't have a clear picture.

Thanks for any information,
Naama

-- 
oo 00 oo 00 oo 00 oo 00 oo 00 oo 00 oo 00 oo 00 oo 00 oo 00 oo 00 oo 00 oo
00 oo 00 oo
"If you want your children to be intelligent, read them fairy tales. If you
want them to be more intelligent, read them more fairy tales." (Albert
Einstein)

Re: Determining number of mappers and number of input splits

Posted by Naama Kraus <na...@gmail.com>.
Thanks for the info,
Naama

On Sat, Aug 2, 2008 at 1:14 AM, James Moore <ja...@gmail.com> wrote:

> On Wed, Jul 30, 2008 at 11:24 PM, Naama Kraus <na...@gmail.com>
> wrote:
> > Hi,
> >
> > I am a bit confused about how the framework determines the number of mappers of
> > a job and the number of input splits.
> > Could anyone summarize?
>
> Take a look at http://wiki.apache.org/hadoop/HowManyMapsAndReduces
>
> Things become a little clearer when you think about
> Hadoop-size datasets.  Usually you care about tuning
> the number of tasks running simultaneously on a single machine (one per
> core?  one per hard drive? one per <whatever>?), and the total number
> is just "many."
>
> --
> James Moore | james@restphone.com
> Ruby and Ruby on Rails consulting
> blog.restphone.com
>

Re: Determining number of mappers and number of input splits

Posted by James Moore <ja...@gmail.com>.
On Wed, Jul 30, 2008 at 11:24 PM, Naama Kraus <na...@gmail.com> wrote:
> Hi,
>
> I am a bit confused about how the framework determines the number of mappers of
> a job and the number of input splits.
> Could anyone summarize?

Take a look at http://wiki.apache.org/hadoop/HowManyMapsAndReduces

Things become a little clearer when you think about
Hadoop-size datasets.  Usually you care about tuning
the number of tasks running simultaneously on a single machine (one per
core?  one per hard drive? one per <whatever>?), and the total number
is just "many."
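To make the "tune per machine, total is many" point concrete, here is a minimal Java sketch. It is not Hadoop code: the class SplitSketch and its methods are hypothetical. It assumes the split-size rule of the old org.apache.hadoop.mapred.FileInputFormat, splitSize = max(minSize, min(goalSize, blockSize)). Here goalSize is the total input size divided by the mapred.map.tasks hint (set via JobConf.setNumMapTasks), and minSize comes from mapred.min.split.size. Each split becomes one map task, so under this rule the requested map count only matters when it pushes goalSize below the block size. The 1.1 "slop" factor, the single-file input, and the 10-node, 4-slot cluster are assumptions for illustration. Check the FileInputFormat source of your release for the exact details.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical, simplified model (not Hadoop code) of how the old-API
 * FileInputFormat is assumed to cut one input file into splits, and of
 * how many "waves" of map tasks a cluster then runs.
 */
public class SplitSketch {
    // Assumed slop factor: the last chunk may be up to 10% larger than
    // splitSize instead of producing a tiny extra split.
    static final double SPLIT_SLOP = 1.1;

    // splitSize = max(minSize, min(goalSize, blockSize))
    static long computeSplitSize(long goalSize, long minSize, long blockSize) {
        return Math.max(minSize, Math.min(goalSize, blockSize));
    }

    /** Lengths of the splits for a single file; one map task per split. */
    static List<Long> splitLengths(long fileLength, long splitSize) {
        List<Long> lengths = new ArrayList<>();
        long remaining = fileLength;
        while (((double) remaining) / splitSize > SPLIT_SLOP) {
            lengths.add(splitSize);
            remaining -= splitSize;
        }
        if (remaining != 0) {
            lengths.add(remaining); // leftover tail becomes the last split
        }
        return lengths;
    }

    /** Waves = ceil(mapTasks / (nodes * slotsPerNode)). */
    static long waves(long mapTasks, int nodes, int slotsPerNode) {
        long slots = (long) nodes * slotsPerNode;
        return (mapTasks + slots - 1) / slots;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        long fileLength = 10_000L * mb; // one hypothetical 10,000 MB file
        long blockSize = 64L * mb;      // 64 MB HDFS block size
        long minSize = 1L;              // effective minimum split size (assumed)
        int requestedMaps = 2;          // the mapred.map.tasks "hint"

        long goalSize = fileLength / Math.max(requestedMaps, 1);
        long splitSize = computeSplitSize(goalSize, minSize, blockSize);
        List<Long> splits = splitLengths(fileLength, splitSize);

        System.out.println("split size (MB): " + splitSize / mb);
        System.out.println("map tasks: " + splits.size());
        System.out.println("waves on 10 nodes x 4 slots: " + waves(splits.size(), 10, 4));
    }
}
```

Running main prints a split size of 64 MB, 157 map tasks, and 4 waves. With a hint of 2 maps, goalSize (5,000 MB) is far above the block size, so the block size wins and the job gets one map per block. The 10,000 MB file is not an exact multiple of 64 MB, so the last split is 16 MB. The per-machine concurrency James mentions corresponds to slotsPerNode here, assumed to be controlled by mapred.tasktracker.map.tasks.maximum in releases of this era.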

-- 
James Moore | james@restphone.com
Ruby and Ruby on Rails consulting
blog.restphone.com