Posted to common-dev@hadoop.apache.org by eltonsky <el...@hotmail.com> on 2010/04/15 12:49:48 UTC

What's gonna happen when it comes to a large number of maps?

Hello everyone,

I know that when the map function generates intermediate output, the reduce
function pulls the data directly from every map's local disk. Although we can
use a combiner function to minimize the amount of data, when we have many
mappers, say 10,000, that will be a crazy IO headache. And that doesn't seem
right.
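
(To illustrate what I mean by the combiner, here is a minimal sketch based on
the word-count example from the tutorial, using the 0.20 mapreduce API; the
class and job names are just the tutorial's, not anything from our actual job.
The reducer class is reused as the combiner, so each map pre-aggregates its own
output before the shuffle ever starts.)

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCountWithCombiner {

  // Emits (word, 1) for every token in the input line.
  public static class TokenizerMapper
      extends Mapper<Object, Text, Text, IntWritable> {
    private final static IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    public void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, ONE);
      }
    }
  }

  // Sums the counts for a word; used both as combiner and as reducer.
  public static class IntSumReducer
      extends Reducer<Text, IntWritable, Text, IntWritable> {
    private final IntWritable result = new IntWritable();

    public void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = new Job(conf, "word count with combiner");
    job.setJarByClass(WordCountWithCombiner.class);
    job.setMapperClass(TokenizerMapper.class);
    // The same reducer class doubles as the combiner: each map task
    // pre-aggregates its own (word, 1) pairs, so the reducers fetch
    // far less data during the shuffle.
    job.setCombinerClass(IntSumReducer.class);
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}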

Can anyone highlighten me on this?

Regards,
Elton


Re: What's gonna happen when it comes to a large number of maps?

Posted by Rekha Joshi <re...@yahoo-inc.com>.
No guru to enlighten/highlighten ;), but here's what I think -

This is a balancing call between in-memory and disk usage. If you think the number of mappers is high (which is again a function of your input size and block size), you can tune further parameters during the map/reduce steps to avoid expensive IO operations.

Have you looked into the parameters on the Hadoop tutorial pages? A rough sketch of setting a few of them follows the links below.
http://hadoop.apache.org/common/docs/current/mapred_tutorial.html#Map+Parameters
http://hadoop.apache.org/common/docs/current/mapred_tutorial.html#Shuffle%2FReduce+Parameters
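
For example, here is a rough sketch of setting a few of those knobs from the
driver code. The property names are the 0.20-era ones described on those pages;
the values are purely illustrative, not recommendations, since the right numbers
depend on heap size, record size and how many map outputs each reducer has to
fetch.

import org.apache.hadoop.conf.Configuration;

public class ShuffleTuningSketch {
  public static Configuration tune(Configuration conf) {
    // Map side: a bigger sort buffer and wider merges mean fewer spill
    // files and fewer merge passes per map task, i.e. less local disk IO.
    conf.setInt("io.sort.mb", 200);                 // in-memory sort buffer (MB)
    conf.setInt("io.sort.factor", 50);              // streams merged at once
    conf.setFloat("io.sort.spill.percent", 0.90f);  // buffer fill level that triggers a spill

    // Compressing map output shrinks what 10,000 maps write locally and
    // what every reducer has to pull over the network.
    conf.setBoolean("mapred.compress.map.output", true);

    // Reduce side: how many fetches run in parallel and how much heap is
    // used to hold fetched map outputs before merging/spilling to disk.
    conf.setInt("mapred.reduce.parallel.copies", 20);
    conf.setFloat("mapred.job.shuffle.input.buffer.percent", 0.70f);
    conf.setFloat("mapred.job.reduce.input.buffer.percent", 0.50f);
    return conf;
  }
}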

Cheers,
/
