Posted to common-dev@hadoop.apache.org by "Doug Cutting (JIRA)" <ji...@apache.org> on 2006/05/25 22:25:30 UTC

[jira] Commented: (HADOOP-254) use http to shuffle data between the maps and the reduces

    [ http://issues.apache.org/jira/browse/HADOOP-254?page=comments#action_12413298 ] 

Doug Cutting commented on HADOOP-254:
-------------------------------------

This looks great!

A few improvements:

1. In MapOutputLocation.getFile(), shouldn't the open streams be closed in a 'finally' clause?
2. Does MapOutputFile still need to be a Writable? I don't think so. We should remove its write() and readFields() implementations and any other methods that are no longer called.
3. Do we have any way to detect when map outputs are lost or corrupted? That was a useful mechanism that I'd hate to lose.
4. Sameer promised that you'd remove RPC.callRaw() in this patch.
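Items 1 and 3 can be illustrated together with a small sketch. It is hypothetical and is not the code in http-shuffle.patch: the class MapOutputCopier, its methods, and the idea that the map side publishes a CRC32 checksum for each output are all assumptions. It copies a fetched map output to a local file with both streams closed in a 'finally' clause, then compares a CRC32 of the copied bytes against the expected value so that a corrupted copy is detected and discarded.

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.zip.CRC32;

// Hypothetical helper for illustration only; not Hadoop's MapOutputLocation code.
class MapOutputCopier {

  /**
   * Copies a map output stream to a local file and returns the CRC32 of the
   * bytes copied. Both streams are closed in a finally clause, so nothing
   * leaks if the copy throws part way through.
   */
  static long copyToFile(InputStream in, File dest) throws IOException {
    CRC32 crc = new CRC32();
    OutputStream out = null;
    try {
      out = new FileOutputStream(dest);
      byte[] buf = new byte[64 * 1024];
      int n;
      while ((n = in.read(buf)) != -1) {
        out.write(buf, 0, n);
        crc.update(buf, 0, n);
      }
    } finally {
      // Nested finally: the output is closed even if closing the input fails.
      try {
        in.close();
      } finally {
        if (out != null) {
          out.close();
        }
      }
    }
    return crc.getValue();
  }

  /**
   * Assumes the expected checksum was reported by the map side (hypothetical).
   * On a mismatch the local copy is deleted so it can be fetched again.
   */
  static void verify(long actual, long expected, File dest) throws IOException {
    if (actual != expected) {
      dest.delete();
      throw new IOException("checksum mismatch for " + dest
          + ": expected " + expected + ", got " + actual);
    }
  }
}
```

A lost output would surface as an IOException from the fetch itself, and a corrupted one as a checksum mismatch; either way the reduce side could report the failure and retry.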


> use http to shuffle data between the maps and the reduces
> ---------------------------------------------------------
>
>          Key: HADOOP-254
>          URL: http://issues.apache.org/jira/browse/HADOOP-254
>      Project: Hadoop
>         Type: Improvement
>   Components: mapred
>     Versions: 0.2.1
>     Reporter: Owen O'Malley
>     Assignee: Owen O'Malley
>      Fix For: 0.3
>  Attachments: http-shuffle.patch
>
> To speed up the shuffle time, I'll use http (via the task tracker's jetty server) to send the map outputs.
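The approach in the issue description can be sketched from the reduce side as a plain HTTP GET against the task tracker's embedded web server. This is an illustration under stated assumptions, not the patch's implementation: the /mapOutput path, the map and reduce query parameters, and the class MapOutputFetcher are hypothetical, and the whole output is buffered in memory for simplicity.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URI;
import java.net.URL;

// Hypothetical reduce-side fetcher; the URL scheme below is illustrative only.
class MapOutputFetcher {

  /** Builds the URL for one map task's output partition (hypothetical path). */
  static URL mapOutputUrl(String host, int port, String mapTaskId, int reducePartition)
      throws IOException {
    // Assumes the ids contain no characters that need URL-encoding.
    return URI.create("http://" + host + ":" + port
        + "/mapOutput?map=" + mapTaskId + "&reduce=" + reducePartition).toURL();
  }

  /** Fetches the bytes at url, failing on any non-200 response. */
  static byte[] fetch(URL url) throws IOException {
    HttpURLConnection conn = (HttpURLConnection) url.openConnection();
    InputStream in = null;
    try {
      int status = conn.getResponseCode();
      if (status != HttpURLConnection.HTTP_OK) {
        throw new IOException("fetch of " + url + " failed: HTTP " + status);
      }
      in = conn.getInputStream();
      ByteArrayOutputStream buf = new ByteArrayOutputStream();
      byte[] chunk = new byte[64 * 1024];
      int n;
      while ((n = in.read(chunk)) != -1) {
        buf.write(chunk, 0, n);
      }
      return buf.toByteArray();
    } finally {
      // Close the stream and release the connection on every path.
      try {
        if (in != null) {
          in.close();
        }
      } finally {
        conn.disconnect();
      }
    }
  }
}
```

A real shuffle would more likely stream large outputs straight to disk than hold them in a byte array; the in-memory buffer here only keeps the sketch short.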
