Posted to general@hadoop.apache.org by Syed Wasti <md...@hotmail.com> on 2010/07/21 19:15:29 UTC

Lazy initialization of Reducers

Hi,

I read about lazy initialization of Reducers in the document at the URL below.

http://www.scribd.com/doc/23046928/Hadoop-Performance-Tuning



It says: “In M/R job Reducers are initialized with Mappers at the job initialization, but the reduce method is called in reduce phase when all the maps had been finished. So in large jobs where Reducer loads data (>100 MB for business logic) in-memory on initialization, the performance can be increased by lazily initializing Reducers i.e. loading data in reduce method controlled by an initialize flag variable which assures that it is loaded only once. By lazily initializing Reducers which require memory (for business logic) on initialization, number of maps can be increased.”



But I did not find any other resource that talks about lazy initialization of Reducers.

Does anyone have experience with this?

If yes, how and where can I set this parameter to get it working?



Thanks for the support.


Regards
Syed Wasti

Re: Lazy initialization of Reducers

Posted by Arun C Murthy <ac...@yahoo-inc.com>.

Moving to mapreduce-user@, bcc general@. Please do not use the  
general@ list for project specific discussions.

On Jul 21, 2010, at 10:15 AM, Syed Wasti wrote:
> It says: “In M/R job Reducers are initialized with Mappers at the  
> job initialization, but the reduce method is called in reduce phase  
> when all the maps had been finished. So in large jobs where Reducer  
> loads data (>100 MB for business logic) in-memory on initialization,  
> the performance can be increased by lazily initializing Reducers  
> i.e. loading data in reduce method controlled by an initialize flag  
> variable which assures that it is loaded only once. By lazily  
> initializing Reducers which require memory (for business logic) on  
> initialization, number of maps can be increased.”

The part about 'loading data in reduce method controlled by an  
initialize flag variable which assures that it is loaded only once'  
makes no sense to me.

However, you can 'slowstart' reduces by ensuring sufficient maps are  
complete before _any_ reduces are launched... from mapred-default.xml:

<property>
   <name>mapred.reduce.slowstart.completed.maps</name>
   <value>0.05</value>
   <description>Fraction of the number of maps in the job which should be
   complete before reduces are scheduled for the job.
   </description>
</property>

Arun

Re: Lazy initialization of Reducers

Posted by Ted Yu <yu...@gmail.com>.

I don't find such a parameter in 0.20.2.

Please create such a flag in your own class.
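
For illustration only, here is a minimal sketch of what such a flag could look like. It is plain Java and deliberately does not extend Hadoop's Reducer class, so it runs without Hadoop on the classpath. The class name LazyInitReducer, the loadLookupData() helper, the loadCount counter and the sample lookup entries are all made up for this example. In a real job the same guard would go at the top of your own reduce() method; the usual alternative for one-time setup is configure(JobConf) in the old API or setup(Context) in the new one.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Plain-Java stand-in for a reducer. It does NOT extend Hadoop's Reducer;
 * all names here are hypothetical and exist only to show the flag pattern.
 */
class LazyInitReducer {
    // The "initialize flag": false until the first reduce() call.
    private boolean initialized = false;

    // Side data needed by the business logic; null until loaded.
    private Map<String, String> lookup;

    // Counts how often the load ran, so the once-only behavior is visible.
    int loadCount = 0;

    // Hypothetical loader. A real job might read a large file here instead.
    private Map<String, String> loadLookupData() {
        loadCount++;
        Map<String, String> data = new HashMap<>();
        data.put("us", "United States");
        data.put("in", "India");
        return data;
    }

    // Sums the values for a key and labels the result using the lookup data.
    List<String> reduce(String key, List<Integer> values) {
        if (!initialized) {
            // Runs on the first call only; later calls skip the load.
            lookup = loadLookupData();
            initialized = true;
        }
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        List<String> out = new ArrayList<>();
        // Fall back to the raw key when the lookup has no entry for it.
        out.add(lookup.getOrDefault(key, key) + "\t" + sum);
        return out;
    }
}
```

Constructing the object loads nothing. The first reduce() call triggers the load, and every later call on the same instance reuses the data already in memory.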

On Wed, Jul 21, 2010 at 10:15 AM, Syed Wasti <md...@hotmail.com> wrote:

>
> Hi,
>
> I read about this Reducer Lazy initialization in a document found in the
> below URL.
>
> http://www.scribd.com/doc/23046928/Hadoop-Performance-Tuning
>
>
>
> It says: “In M/R job Reducers are initialized with Mappers at the job
> initialization, but the reduce method is called in reduce phase when all the
> maps had been finished. So in large jobs where Reducer loads data (>100 MB
> for business logic) in-memory on initialization, the performance can be
> increased by lazily initializing Reducers i.e. loading data in reduce method
> controlled by an initialize flag variable which assures that it is loaded
> only once. By lazily initializing Reducers which require memory (for
> business logic) on initialization, number of maps can be increased.”
>
>
>
> But I did not find any other resource which talks about Reducer Lazy
> initialization.
>
> Does anyone have experience on this ?
>
> If yes, how and where can I set this parameter to get it working.
>
>
>
> Thanks for the support.
>
>
> Regards
> Syed Wasti
>
>
>
