Posted to common-user@hadoop.apache.org by Naama Kraus <na...@gmail.com> on 2008/07/31 08:24:35 UTC
Determining number of mappers and number of input splits
Hi,
I am a bit confused about how the framework determines the number of mappers of
a job and the number of input splits.
Could anyone summarize?
I thought the number of mappers can't be set by the user (only
hinted); is that correct? What are the related configuration properties /
methods?
How is the number of input splits determined? What about their size? How is
this affected by the number of mappers?
There are various threads on this issue (including some of my own questions), but
I still don't have a clear picture.
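For reference, here is a minimal plain-Java sketch of the split arithmetic. It is roughly modeled on the old-API org.apache.hadoop.mapred.FileInputFormat of Hadoop releases from around this time, and details may differ between versions. The map count requested with JobConf.setNumMapTasks (mapred.map.tasks) is only a hint. It sets a goal split size of totalSize / hint, which is then clamped to at most the block size and at least mapred.min.split.size, and one map task runs per resulting split. The 1.1 "slop" factor (letting the last split of a file run up to 10% over) and the class and method names here are assumptions for illustration. They are not calls into Hadoop's API.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of old-API FileInputFormat split sizing (not Hadoop code). */
public class SplitSizeSketch {

    // Assumption: the last split of a file may be up to 10% larger than splitSize.
    static final double SPLIT_SLOP = 1.1;

    /** Goal size, capped by the block size, but never below the minimum split size. */
    static long computeSplitSize(long goalSize, long minSize, long blockSize) {
        return Math.max(minSize, Math.min(goalSize, blockSize));
    }

    /** Lengths of the splits cut from one file of the given length. */
    static List<Long> splitFile(long fileLength, long splitSize) {
        List<Long> splits = new ArrayList<>();
        long remaining = fileLength;
        // Cut full-size splits while more than SPLIT_SLOP splits' worth remains.
        while ((double) remaining / splitSize > SPLIT_SLOP) {
            splits.add(splitSize);
            remaining -= splitSize;
        }
        if (remaining != 0) {
            splits.add(remaining); // tail split, at most SPLIT_SLOP * splitSize
        }
        return splits;
    }

    /** Number of map tasks = total number of splits over all input files. */
    static int countMaps(long[] fileLengths, int numMapsHint, long minSize, long blockSize) {
        long totalSize = 0;
        for (long len : fileLengths) {
            totalSize += len;
        }
        // The user's map count only influences the goal size; it is not a hard setting.
        long goalSize = totalSize / Math.max(1, numMapsHint);
        // The minimum split size is treated as at least 1 byte.
        long splitSize = computeSplitSize(goalSize, Math.max(1L, minSize), blockSize);
        int maps = 0;
        for (long len : fileLengths) {
            maps += splitFile(len, splitSize).size();
        }
        return maps;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        long[] files = {1000 * mb, 10 * mb}; // hypothetical input: two files
        long blockSize = 64 * mb;
        System.out.println(countMaps(files, 2, 1, blockSize));         // prints 17
        System.out.println(countMaps(files, 100, 1, blockSize));       // prints 100
        System.out.println(countMaps(files, 2, 128 * mb, blockSize));  // prints 9
    }
}
```

With a 1,000 MB file, a 10 MB file and 64 MB blocks, a hint of 2 still yields 17 maps because the block size caps the split size. A hint of 100 does yield 100 maps, and raising the minimum split size to 128 MB cuts the count to 9.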
Thanks for any information,
Naama
--
oo 00 oo 00 oo 00 oo 00 oo 00 oo 00 oo 00 oo 00 oo 00 oo 00 oo 00 oo 00 oo
00 oo 00 oo
"If you want your children to be intelligent, read them fairy tales. If you
want them to be more intelligent, read them more fairy tales." (Albert
Einstein)
Re: Determining number of mappers and number of input splits
Posted by Naama Kraus <na...@gmail.com>.
Thanks for the info, Naama
On Sat, Aug 2, 2008 at 1:14 AM, James Moore <ja...@gmail.com> wrote:
Re: Determining number of mappers and number of input splits
Posted by James Moore <ja...@gmail.com>.
On Wed, Jul 30, 2008 at 11:24 PM, Naama Kraus <na...@gmail.com> wrote:
> Hi,
>
> I am a bit confused about how the framework determines the number of mappers of
> a job and the number of input splits.
> Could anyone summarize?
Take a look at http://wiki.apache.org/hadoop/HowManyMapsAndReduces
Things become clearer when you think about
Hadoop-sized datasets. What you usually care about is tuning
the number of simultaneous tasks running on a single machine (one per
core? one per hard drive? one per <whatever>?), and the total number
is just "many."
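As a rough illustration of why the per-machine number matters more than the total, here is a hypothetical back-of-the-envelope calculation in plain Java. It assumes every map slot stays busy and all maps take about the same time, which real jobs won't quite do. In Hadoop releases of this era the per-node cap on concurrent map tasks is the mapred.tasktracker.map.tasks.maximum setting. The cluster size, slot counts and input size below are made-up numbers.

```java
/** Hypothetical estimate of how many "waves" of map tasks a job needs. */
public class MapWaves {

    /** Concurrent map slots in the cluster: nodes times per-node slots. */
    static int clusterSlots(int nodes, int slotsPerNode) {
        return nodes * slotsPerNode;
    }

    /** Waves needed to run all maps, i.e. ceil(totalMaps / slots). */
    static int waves(int totalMaps, int nodes, int slotsPerNode) {
        int slots = clusterSlots(nodes, slotsPerNode);
        return (totalMaps + slots - 1) / slots;
    }

    public static void main(String[] args) {
        // Assumed: 1 TB (2^40 bytes) of input in 64 MB splits = 16,384 maps.
        int maps = 16384;
        // Assumed cluster: 10 nodes, each with 4 cores and 2 disks.
        System.out.println("one per core: " + waves(maps, 10, 4) + " waves"); // 410
        System.out.println("one per disk: " + waves(maps, 10, 2) + " waves"); // 820
    }
}
```

Either way the total is just "many." What changes the job's running time is how many waves the per-machine slot count allows.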
--
James Moore | james@restphone.com
Ruby and Ruby on Rails consulting
blog.restphone.com