Posted to common-dev@hadoop.apache.org by "Vivek Ratan (JIRA)" <ji...@apache.org> on 2007/07/11 02:16:05 UTC
[jira] Resolved: (HADOOP-475) The value iterator to reduce function should be clonable
[ https://issues.apache.org/jira/browse/HADOOP-475?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Vivek Ratan resolved HADOOP-475.
--------------------------------
Resolution: Won't Fix
As per Runping's comments, we don't need this functionality right away.
> The value iterator to reduce function should be clonable
> --------------------------------------------------------
>
> Key: HADOOP-475
> URL: https://issues.apache.org/jira/browse/HADOOP-475
> Project: Hadoop
> Issue Type: New Feature
> Components: mapred
> Reporter: Runping Qi
> Assignee: Vivek Ratan
>
> In the current framework, when the user implements the reduce method of the Reducer class,
> the user can iterate through the value iterator only once.
> This makes it hard for the user to perform join-like operations within the reduce method.
> To address this problem, one approach is to make the input value iterator clonable. Then the user can iterate over the values in different ways.
> If the iterator can also be reset, then the user can perform nested iterations over the data and thus
> carry out join-like operations.
> The user code in the reduce method would be something like:
>     iterator1 = values.clone();
>     iterator2 = values.clone();
>     while (iterator1.hasNext()) {
>         val1 = iterator1.next();
>         iterator2.reset();
>         while (iterator2.hasNext()) {
>             val2 = iterator2.next();
>             // do something based on val1 and val2
>             // ...
>         }
>     }
> One possible optimization is that if the values are sorted based on a secondary key,
> the reset function can take a secondary key as an argument and reset the iterator to the beginning
> position of that secondary key. It would be very helpful to have a utility that returns a map of iterators,
> one per secondary key value, from the given iterator:
>     TreeMap getIteratorsBasedOnSecondaryKey(iterator);
> Each entry in the returned map object is a pair of <secondary key, iterator for the values with the same secondary key>.
>
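Because the clonable iterator was not implemented, the nested iteration described above can be approximated in user code by buffering the values first. The following is only a minimal sketch under stated assumptions: ReplayableValues, allPairs, and groupBySecondaryKey are hypothetical names, not part of Hadoop, and the sketch assumes that all values for one key fit in memory. The framework may also reuse the value object between calls to next(), so real reducer code would probably need to copy each value before buffering it.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.TreeMap;
import java.util.function.Function;

/** Hypothetical helper (not a Hadoop class): buffers a one-pass iterator so it can be replayed. */
final class ReplayableValues<V> {
    private final List<V> buffer = new ArrayList<>();

    ReplayableValues(Iterator<? extends V> values) {
        // Drain the single-pass iterator once. Assumption: the values fit in memory.
        while (values.hasNext()) {
            buffer.add(values.next());
        }
    }

    /** Each call returns a fresh, independent iterator positioned at the first value. */
    Iterator<V> iterator() {
        return buffer.iterator();
    }

    /** Groups the buffered values by a caller-supplied secondary key, in sorted key order. */
    <K extends Comparable<K>> TreeMap<K, List<V>> groupBySecondaryKey(
            Function<? super V, ? extends K> secondaryKey) {
        TreeMap<K, List<V>> groups = new TreeMap<>();
        for (V value : buffer) {
            groups.computeIfAbsent(secondaryKey.apply(value), k -> new ArrayList<>()).add(value);
        }
        return groups;
    }

    /** Nested iteration in the style of the issue's pseudocode: pairs every value with every value. */
    static <V> List<String> allPairs(ReplayableValues<V> values) {
        List<String> pairs = new ArrayList<>();
        for (Iterator<V> outer = values.iterator(); outer.hasNext(); ) {
            V val1 = outer.next();
            // Requesting a fresh inner iterator plays the role of iterator2.reset().
            for (Iterator<V> inner = values.iterator(); inner.hasNext(); ) {
                V val2 = inner.next();
                pairs.add(val1 + "," + val2);
            }
        }
        return pairs;
    }
}
```

Calling iterator() on each value list in the map returned by groupBySecondaryKey gives one iterator per secondary key, which is roughly what the proposed getIteratorsBasedOnSecondaryKey utility would return. The cost is memory proportional to the number of values for the key, which the proposed framework-level iterator might have been able to avoid.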
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.