Posted to mapreduce-issues@hadoop.apache.org by "Ted Yu (JIRA)" <ji...@apache.org> on 2010/07/21 19:39:53 UTC

[jira] Created: (MAPREDUCE-1956) allow reducer to initialize lazily

allow reducer to initialize lazily
----------------------------------

                 Key: MAPREDUCE-1956
                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1956
             Project: Hadoop Map/Reduce
          Issue Type: Improvement
          Components: tasktracker
    Affects Versions: 0.20.2
            Reporter: Ted Yu


From http://www.scribd.com/doc/23046928/Hadoop-Performance-Tuning:
"In M/R job Reducers are initialized with Mappers at the job initialization, but the reduce method is called in reduce phase when all the maps had been finished. So in large jobs where Reducer loads data (>100 MB for business logic) in-memory on initialization, the performance can be increased by lazily initializing Reducers i.e. loading data in reduce method controlled by an initialize flag variable which assures that it is loaded only once. By lazily initializing Reducers which require memory (for business logic) on initialization, number of maps can be increased."

Introducing a parameter for this purpose would allow more people to utilize the above pattern.
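
For illustration, the pattern in the quoted passage can be sketched as below. This is a minimal stand-in written in plain Java, not the Hadoop Reducer API: the class LazyReducerSketch, the loadSideData() helper, and the sample data are all hypothetical, and the load counter exists only to show that the expensive load runs once.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Stand-in for a reducer; hypothetical, does not use the real Hadoop classes. */
public class LazyReducerSketch {
    // Side data needed by the business logic; stays null until the first reduce call.
    private Map<String, Integer> lookup;
    // The "initialize flag" from the quoted text: ensures the data is loaded only once.
    private boolean initialized = false;
    // Counts how many times the (hypothetical) expensive load actually ran.
    private int loadCount = 0;

    // Hypothetical loader; a real job might read a large file into memory here.
    private Map<String, Integer> loadSideData() {
        loadCount++;
        Map<String, Integer> data = new HashMap<>();
        data.put("a", 10);
        data.put("b", 20);
        return data;
    }

    // Sums the values for a key and scales the sum by the key's side-data entry.
    public int reduce(String key, List<Integer> values) {
        if (!initialized) {
            // Deferred until the reduce phase instead of task initialization.
            lookup = loadSideData();
            initialized = true;
        }
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        return sum * lookup.getOrDefault(key, 1);
    }

    public int getLoadCount() {
        return loadCount;
    }

    public static void main(String[] args) {
        LazyReducerSketch reducer = new LazyReducerSketch();
        System.out.println(reducer.reduce("a", Arrays.asList(1, 2))); // prints 30
        System.out.println(reducer.reduce("b", Arrays.asList(3)));    // prints 60
        System.out.println(reducer.getLoadCount());                   // prints 1
    }
}
```

Nothing is held in memory until reduce is first called, which is the saving the quoted passage describes.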

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Resolved: (MAPREDUCE-1956) allow reducer to initialize lazily

Posted by "Arun C Murthy (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-1956?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Arun C Murthy resolved MAPREDUCE-1956.
--------------------------------------

    Resolution: Invalid

The part about 'loading data in reduce method controlled by an  
initialize flag variable which assures that it is loaded only once'  
makes no sense to me.

However, you can 'slowstart' reduces by ensuring sufficient maps are  
complete before _any_ reduces are launched... from mapred-default.xml:

<property>
  <name>mapred.reduce.slowstart.completed.maps</name>
  <value>0.05</value>
  <description>Fraction of the number of maps in the job which should be
  complete before reduces are scheduled for the job.
  </description>
  </description>
</property>
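
As a rough illustration of what the fraction means, the sketch below computes how many maps would have to finish before reduces become eligible for scheduling. It assumes the threshold is simply the fraction multiplied by the number of maps, rounded up; the exact rounding used by the scheduler is not stated in this thread, and the class and method names are hypothetical.

```java
/** Hypothetical helper; assumes threshold = ceil(fraction * totalMaps). */
public class SlowstartSketch {

    // Number of completed maps assumed to be required before reduces are scheduled.
    public static int mapsRequired(double fraction, int totalMaps) {
        if (fraction < 0.0 || fraction > 1.0) {
            throw new IllegalArgumentException("fraction must be in [0, 1]");
        }
        return (int) Math.ceil(fraction * totalMaps);
    }

    // True once enough maps have completed under the assumption above.
    public static boolean reducesEligible(double fraction, int totalMaps, int completedMaps) {
        return completedMaps >= mapsRequired(fraction, totalMaps);
    }

    public static void main(String[] args) {
        // With a fraction of 0.5 and 10 maps, 5 maps must finish first.
        System.out.println(mapsRequired(0.5, 10));       // prints 5
        System.out.println(reducesEligible(0.5, 10, 4)); // prints false
        System.out.println(reducesEligible(0.5, 10, 5)); // prints true
    }
}
```

Under this assumption, raising the value toward 1.0 delays reduce startup until nearly all maps are done.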

