Posted to mapreduce-issues@hadoop.apache.org by "Jerry Chen (JIRA)" <ji...@apache.org> on 2012/12/11 03:27:21 UTC

[jira] [Created] (MAPREDUCE-4868) Allow multiple iteration for map

Jerry Chen created MAPREDUCE-4868:
-------------------------------------

             Summary: Allow multiple iteration for map
                 Key: MAPREDUCE-4868
                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4868
             Project: Hadoop Map/Reduce
          Issue Type: Improvement
          Components: mrv2
    Affects Versions: 3.0.0, 2.0.3-alpha
            Reporter: Jerry Chen
             Fix For: 3.0.0, 2.0.3-alpha


Currently, the Mapper class allows advanced users to override the "public void run(Context context)" method for more control over the execution of the mapper. However, the Context interface limits the operations available on the input data, and control over that data is the foundation of "more control".

One use case: while working on a Hive optimization problem, I want to make two passes over the input data instead of using another job or task (which may slow down the whole process). Each pass does the same thing but with different parameters.

This is a new paradigm of MapReduce usage and can be achieved easily by extending the Context interface slightly to give more control over the data, such as resetting the input.
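
A minimal sketch of how such a two-pass run() might look, written in plain Java with no Hadoop dependency so it can run on its own. The names ResettableContext, InMemoryContext, TwoPassMapper and in particular resetInput() are assumptions made for illustration; resetInput() does not exist in Hadoop's Context today. Only nextKeyValue() and getCurrentValue() mirror methods of the real Context.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical stand-in for the record-reading part of Mapper.Context.
// nextKeyValue() and getCurrentValue() mirror real Context methods;
// resetInput() is the assumed extension proposed in this issue.
interface ResettableContext<V> {
    boolean nextKeyValue();
    V getCurrentValue();
    void resetInput();                 // hypothetical: rewind to the first record
    void write(String key, V value);   // simplified output call
}

// In-memory implementation so the sketch runs without a cluster.
class InMemoryContext<V> implements ResettableContext<V> {
    private final List<V> records;
    private final List<String> output = new ArrayList<>();
    private Iterator<V> it;
    private V current;

    InMemoryContext(List<V> records) {
        this.records = records;
        this.it = records.iterator();
    }

    public boolean nextKeyValue() {
        if (!it.hasNext()) {
            return false;
        }
        current = it.next();
        return true;
    }

    public V getCurrentValue() {
        return current;
    }

    public void resetInput() {
        // Start reading again from the first record.
        it = records.iterator();
        current = null;
    }

    public void write(String key, V value) {
        output.add(key + "=" + value);
    }

    List<String> getOutput() {
        return output;
    }
}

class TwoPassMapper {
    // Plays the role of an overridden Mapper.run(Context): the same logic
    // is applied once per parameter, rewinding the input between passes.
    static void run(ResettableContext<Integer> context, int[] thresholds) {
        for (int pass = 0; pass < thresholds.length; pass++) {
            if (pass > 0) {
                context.resetInput();
            }
            while (context.nextKeyValue()) {
                int value = context.getCurrentValue();
                if (value >= thresholds[pass]) {
                    context.write("pass" + pass, value);
                }
            }
        }
    }
}
```

With the records 1, 5, 10 and the thresholds 5 and 10, the first pass emits 5 and 10 and the second pass emits only 10. A real implementation would have to decide how resetting interacts with the underlying RecordReader and input split, which this sketch does not address.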

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira