You are viewing a plain text version of this content. The canonical link for it is here.

Posted to common-dev@hadoop.apache.org by "eric baldeschwieler (JIRA)" <ji...@apache.org> on 2006/11/07 05:12:38 UTC

[jira] Commented: (HADOOP-686) job.setOutputValueComparatorClass(theClass) should be supported

    [ http://issues.apache.org/jira/browse/HADOOP-686?page=comments#action_12447621 ] 
            
eric baldeschwieler commented on HADOOP-686:
--------------------------------------------

There exist hacks to achieve this effect now.  I'll try to scare up an expert to document them.  Also a good long term feature.

> job.setOutputValueComparatorClass(theClass) should be supported
> ---------------------------------------------------------------
>
>                 Key: HADOOP-686
>                 URL: http://issues.apache.org/jira/browse/HADOOP-686
>             Project: Hadoop
>          Issue Type: New Feature
>          Components: mapred
>         Environment: all environment
>            Reporter: Feng Jiang
>
> if the input of Reduce phase is :
> K2, V3
> K2, V2
> K1, V5
> K1, V3
> K1, V4
> in the current hadoop, the reduce output could be:
> K1, (V5, V3, V4)
> K2, (V3, V2)
> But I hope hadoop supports job.setOutputValueComparatorClass(theClass), so that i can make values are in order, and the output could be:
> K1, (V3, V4, V5) 
> K2, (V2, V3)
> This feature is very important, I think. Without it, we have to take the sorting by ourselves, and have to worry about the possibility that the values are too large to fit into memory. Then the codes becomes too hard to read. That is the reason why i think this feature is so important, and should be done in the hadoop framework.

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira