Posted to mapreduce-dev@hadoop.apache.org by "Harsh J (Resolved) (JIRA)" <ji...@apache.org> on 2012/01/16 10:17:40 UTC

[jira] [Resolved] (MAPREDUCE-221) Generic 'Sort' Infrastructure for Map-Reduce framework.

     [ https://issues.apache.org/jira/browse/MAPREDUCE-221?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Harsh J resolved MAPREDUCE-221.
-------------------------------

    Resolution: Incomplete

This has grown stale, so I am closing it out.

Most of this can be done (rather easily) with the pluggable APIs we provide today. If you still feel the framework ought to carry this itself in a generic fashion (which would somewhat limit it, given the many data formats Hadoop users crunch today), please do reopen.
                
> Generic 'Sort' Infrastructure for Map-Reduce framework.
> -------------------------------------------------------
>
>                 Key: MAPREDUCE-221
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-221
>             Project: Hadoop Map/Reduce
>          Issue Type: New Feature
>            Reporter: Arun C Murthy
>            Assignee: Arun C Murthy
>
> It would be useful to add a generic *sort* infrastructure to the Map-Reduce framework to ease usage.
> Specifically, the idea is to add a fairly generic and powerful *comparator* which the user can configure to meet their specific needs.
> Spec:
> --------
>  
>   The proposal is to model a generic (uber) comparator along the lines of the standard unix *sort* command. The comparator provides the following (configurable) functionality:
>   a) Separator for breaking up the data (stream) into 'columns'.
>   b) Multiple key ranges for specifying priorities of 'columns' (à la the --key/-k option of unix sort, e.g. -k 2,3 -k 1,4).
>   c) A variant of a) to let the user specify byte range boundaries for 'columns' without using a separator.
>   d) Option to sort 'reverse'.
>   e) Option to do a 'stable' sort, i.e. don't do a last-ditch comparison of all bytes if all key ranges match.
>   f) Option to do 'numeric' comparisons instead of lexicographical comparisons?
>   Of course, all of these are optional, with the default behaviour staying as it is today.
>      - * - * -
>  Anything more/less?
> thanks,
> Arun
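
For illustration only, items a), b), d), e) and f) of the spec quoted above could be sketched roughly as below, in plain Java with no Hadoop dependency. The class name ColumnSortComparator, the nested KeyRange type and all parameters are hypothetical and do not come from the issue or from Hadoop. The sketch also makes three assumptions that the spec does not state: 'reverse' and 'numeric' are set per key range (as with unix sort's -k modifiers), unparsable numeric keys are treated as 0, and records are compared as Strings rather than raw bytes. Item c) (byte ranges) is not covered.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical sketch of the proposed comparator; not a Hadoop API.
final class ColumnSortComparator implements Comparator<String> {

    /** One key range, like unix sort's -k start,end (1-based, inclusive). */
    static final class KeyRange {
        final int start;
        final int end;
        final boolean numeric;  // (f) compare as numbers instead of text
        final boolean reverse;  // (d) invert the order for this range

        KeyRange(int start, int end, boolean numeric, boolean reverse) {
            if (start < 1 || end < start) {
                throw new IllegalArgumentException("bad key range");
            }
            this.start = start;
            this.end = end;
            this.numeric = numeric;
            this.reverse = reverse;
        }
    }

    private final String separator;       // (a) column separator
    private final List<KeyRange> ranges;  // (b) key ranges in priority order
    private final boolean stable;         // (e) skip the last-ditch comparison

    ColumnSortComparator(String separator, List<KeyRange> ranges, boolean stable) {
        this.separator = separator;
        this.ranges = new ArrayList<>(ranges);
        this.stable = stable;
    }

    @Override
    public int compare(String a, String b) {
        // Limit -1 keeps trailing empty columns instead of dropping them.
        String[] colsA = a.split(Pattern.quote(separator), -1);
        String[] colsB = b.split(Pattern.quote(separator), -1);
        for (KeyRange r : ranges) {
            String ka = extract(colsA, r);
            String kb = extract(colsB, r);
            int c = r.numeric
                    ? Double.compare(parse(ka), parse(kb))
                    : ka.compareTo(kb);
            if (c != 0) {
                return r.reverse ? -c : c;
            }
        }
        // All key ranges matched: either report equality ('stable') or
        // fall back to comparing the whole record.
        return stable ? 0 : a.compareTo(b);
    }

    /** Joins the columns covered by the range; missing columns are skipped. */
    private String extract(String[] cols, KeyRange r) {
        StringBuilder sb = new StringBuilder();
        for (int i = r.start; i <= r.end && i <= cols.length; i++) {
            if (i > r.start) {
                sb.append(separator);
            }
            sb.append(cols[i - 1]);
        }
        return sb.toString();
    }

    /** Assumption: a key that does not parse as a number counts as 0. */
    private static double parse(String s) {
        try {
            return Double.parseDouble(s.trim());
        } catch (NumberFormatException e) {
            return 0.0;
        }
    }
}
```

With a tab separator and a single numeric key range on column 2 (the equivalent of -k 2,2n), sorting a list of records with this comparator orders them by the numeric value of the second column, so a record with 2 in that column sorts before one with 10. With the 'stable' flag set, records whose key ranges all match compare as equal, so a stable sort such as List.sort leaves them in input order.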
