Posted to common-dev@hadoop.apache.org by "Vivek Ratan (JIRA)" <ji...@apache.org> on 2007/07/11 02:16:04 UTC

[jira] Assigned: (HADOOP-475) The value iterator to reduce function should be clonable

     [ https://issues.apache.org/jira/browse/HADOOP-475?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Vivek Ratan reassigned HADOOP-475:
----------------------------------

    Assignee: Vivek Ratan  (was: Owen O'Malley)

> The value iterator to reduce function should be clonable
> --------------------------------------------------------
>
>                 Key: HADOOP-475
>                 URL: https://issues.apache.org/jira/browse/HADOOP-475
>             Project: Hadoop
>          Issue Type: New Feature
>          Components: mapred
>            Reporter: Runping Qi
>            Assignee: Vivek Ratan
>
> In the current framework, when the user implements the reduce method of the Reducer class,
> the user can iterate through the value iterator only once.
> This makes it hard for the user to perform join-like operations within the reduce method.
> To address this problem, one approach is to make the input value iterator clonable, so that the user can iterate over the values in different ways.
> If the iterator can also be reset, then the user can perform nested iterations over the data and thus
> carry out join-like operations.
> The user code in the reduce method would be something like:
>                  iterator1 = values.clone();
>                  iterator2 = values.clone();
>                  while (iterator1.hasNext()) {
>                       val1 = iterator1.next();
>                       iterator2.reset();
>                       while (iterator2.hasNext()) {
>                            val2 = iterator2.next();
>                            // do something based on val1 and val2
>                            // ...
>                       }
>                  }
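
The nested loop above could be approximated today by buffering the values in memory. The following is only a sketch under that assumption: BufferedValues, copy(), reset() and NestedIterationDemo are hypothetical names and not part of Hadoop, and the approach works only when all values for one key fit in memory. If the framework reuses value objects between calls to next(), each value would also have to be copied before it is buffered.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

/** Hypothetical helper (not a Hadoop class): buffers a one-pass iterator so it can be re-read. */
final class BufferedValues<T> implements Iterator<T> {
    private final List<T> buffer;   // shared between copies, never modified after construction
    private int position = 0;       // cursor of this particular copy

    /** Drains the one-pass source iterator into an in-memory buffer. */
    BufferedValues(Iterator<T> source) {
        this.buffer = new ArrayList<>();
        while (source.hasNext()) {
            buffer.add(source.next());
        }
    }

    private BufferedValues(List<T> shared) {
        this.buffer = shared;
    }

    /** Plays the role of clone(): an independent cursor over the same values, at the start. */
    BufferedValues<T> copy() {
        return new BufferedValues<>(buffer);
    }

    /** Moves this cursor back to the first value. */
    void reset() {
        position = 0;
    }

    @Override
    public boolean hasNext() {
        return position < buffer.size();
    }

    @Override
    public T next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        return buffer.get(position++);
    }
}

final class NestedIterationDemo {
    /** Builds every ordered pair (val1, val2) with the nested loop from the issue. */
    static List<String> pairs(Iterator<String> values) {
        BufferedValues<String> iterator1 = new BufferedValues<>(values);
        BufferedValues<String> iterator2 = iterator1.copy();
        List<String> out = new ArrayList<>();
        while (iterator1.hasNext()) {
            String val1 = iterator1.next();
            iterator2.reset();
            while (iterator2.hasNext()) {
                String val2 = iterator2.next();
                out.add(val1 + "," + val2);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(pairs(Arrays.asList("a", "b").iterator()));
    }
}
```

The copies share one buffer and differ only in their cursor position, so creating a second iterator costs no extra memory beyond the buffer itself.
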
> One possible optimization is that if the values are sorted based on a secondary key,
> the reset function can take a secondary key as an argument and reset the iterator to the beginning
> position of that secondary key. It would also be very helpful to have a utility that returns a map of iterators,
> one per secondary key value, from the given iterator:
>                           TreeMap getIteratorsBasedOnSecondaryKey(iterator);
> Each entry in the returned map object is a pair of <secondary key, iterator for the values with the same secondary key>.
>   
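
One way the proposed getIteratorsBasedOnSecondaryKey utility might look is sketched below. The issue does not say how the secondary key is obtained from a value, so the keyOf parameter is an assumption, as are the class name SecondaryKeyGrouping and the generic signature. The sketch groups the values in memory, so it shares the limitation that all values for one key must fit in memory.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.TreeMap;
import java.util.function.Function;

/** Hypothetical utility (not a Hadoop class) illustrating the grouping proposed in the issue. */
final class SecondaryKeyGrouping {
    /**
     * Consumes the one-pass iterator and returns one iterator per secondary key,
     * ordered by secondary key. keyOf is an assumed way of extracting that key.
     */
    static <K extends Comparable<K>, V> TreeMap<K, Iterator<V>> getIteratorsBasedOnSecondaryKey(
            Iterator<V> values, Function<V, K> keyOf) {
        // First pass: collect the values of each secondary key in arrival order.
        TreeMap<K, List<V>> groups = new TreeMap<>();
        while (values.hasNext()) {
            V value = values.next();
            groups.computeIfAbsent(keyOf.apply(value), k -> new ArrayList<>()).add(value);
        }
        // Second pass: expose each group as an iterator.
        TreeMap<K, Iterator<V>> result = new TreeMap<>();
        groups.forEach((key, group) -> result.put(key, group.iterator()));
        return result;
    }

    public static void main(String[] args) {
        // Values of the form "<secondary key>:<payload>", purely for illustration.
        Iterator<String> values = Arrays.asList("x:1", "y:3", "x:2").iterator();
        TreeMap<String, Iterator<String>> byKey =
                getIteratorsBasedOnSecondaryKey(values, s -> s.substring(0, s.indexOf(':')));
        byKey.forEach((key, it) -> {
            while (it.hasNext()) {
                System.out.println(key + " -> " + it.next());
            }
        });
    }
}
```

Unlike the sorted-input optimization described in the issue, this version does not require the values to arrive sorted by secondary key, at the cost of buffering every value before any iterator is returned.
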

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.