Posted to common-dev@hadoop.apache.org by "Arun C Murthy (JIRA)" <ji...@apache.org> on 2007/04/19 08:15:15 UTC

[jira] Created: (HADOOP-1270) Randomize the fetch of map outputs

Randomize the fetch of map outputs
----------------------------------

                 Key: HADOOP-1270
                 URL: https://issues.apache.org/jira/browse/HADOOP-1270
             Project: Hadoop
          Issue Type: Improvement
          Components: mapred
            Reporter: Arun C Murthy
             Fix For: 0.13.0


HADOOP-248 did away with random probing of maps to locate map outputs; we now rely on TaskCompletionEvents instead.

However, we lost a side benefit of that random probing: it kept the map's jetty from being overloaded with requests for its outputs. We now have a situation where a map completes, the JT is notified, *all* the reduces get the TaskCompletionEvent and pretty much swamp the poor map's jetty, and this repeats for each map.

I propose we make a minor change where we collect a set of TaskCompletionEvents and randomize the list before firing the fetches. This should help fix the mass hysteria at the map's jetty.
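
A minimal sketch of the idea, not the actual patch: each reduce gathers a batch of completed-map locations and shuffles a copy of it before fetching, so different reduces are unlikely to hit the same map's jetty at the same moment. The names ShuffleFetchSketch, MapOutput and randomizeFetchOrder are hypothetical and only illustrate the proposal; the one real API used is java.util.Collections.shuffle.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

public class ShuffleFetchSketch {
    /** Hypothetical stand-in for the location of one completed map's output. */
    static final class MapOutput {
        final int mapId;
        final String host;

        MapOutput(int mapId, String host) {
            this.mapId = mapId;
            this.host = host;
        }

        @Override
        public String toString() {
            return "map_" + mapId + "@" + host;
        }
    }

    /** Returns a shuffled copy of the batch; the input list is left untouched. */
    static List<MapOutput> randomizeFetchOrder(List<MapOutput> batch, Random rng) {
        List<MapOutput> order = new ArrayList<>(batch);
        Collections.shuffle(order, rng);
        return order;
    }

    public static void main(String[] args) {
        // Pretend these locations arrived via a batch of TaskCompletionEvents.
        List<MapOutput> batch = new ArrayList<>();
        for (int i = 0; i < 8; i++) {
            batch.add(new MapOutput(i, "host" + i));
        }
        // Each reduce would use its own Random, so fetch orders differ across reduces.
        System.out.println(randomizeFetchOrder(batch, new Random()));
    }
}
```

Shuffling a copy keeps the arrival order available to the caller; whether the real code shuffles in place is not something this sketch claims.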

Thoughts?

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-1270) Randomize the fetch of map outputs

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-1270?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12491774 ] 

Doug Cutting commented on HADOOP-1270:
--------------------------------------

This looks reasonable to me.  Have you yet tested whether it improves performance?



[jira] Updated: (HADOOP-1270) Randomize the fetch of map outputs

Posted by "Arun C Murthy (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-1270?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Arun C Murthy updated HADOOP-1270:
----------------------------------

    Attachment: HADOOP-1270_20070505_3.patch

Minuscule documentation fix:

$ diff HADOOP-1270_20070504_2.patch HADOOP-1270_20070505_3.patch 
> -     * a hashmap from mapId to MapOutputLocation for retrials
> +     * a list of map output locations for fetch retrials 




[jira] Updated: (HADOOP-1270) Randomize the fetch of map outputs

Posted by "Arun C Murthy (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-1270?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Arun C Murthy updated HADOOP-1270:
----------------------------------

    Affects Version/s: 0.12.3
               Status: Patch Available  (was: Open)



[jira] Assigned: (HADOOP-1270) Randomize the fetch of map outputs

Posted by "Arun C Murthy (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-1270?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Arun C Murthy reassigned HADOOP-1270:
-------------------------------------

    Assignee: Arun C Murthy



[jira] Commented: (HADOOP-1270) Randomize the fetch of map outputs

Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-1270?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12493395 ] 

Hadoop QA commented on HADOOP-1270:
-----------------------------------

+1

http://issues.apache.org/jira/secure/attachment/12356229/HADOOP-1270_20070425_1.patch applied and successfully tested against trunk revision r534624.

Test results:   http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/109/testReport/
Console output: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/109/console



[jira] Commented: (HADOOP-1270) Randomize the fetch of map outputs

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-1270?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12490118 ] 

Doug Cutting commented on HADOOP-1270:
--------------------------------------

> randomize the list before firing the fetches

Yes, that was the original reason for randomizing, to avoid overloading nodes as their maps complete.  +1




[jira] Updated: (HADOOP-1270) Randomize the fetch of map outputs

Posted by "Arun C Murthy (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-1270?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Arun C Murthy updated HADOOP-1270:
----------------------------------

    Attachment: HADOOP-1270_20070504_2.patch

Here is an updated version of the patch to reflect changes to trunk. HADOOP-1144 conflicted with this patch, which couldn't be helped.



[jira] Updated: (HADOOP-1270) Randomize the fetch of map outputs

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-1270?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Doug Cutting updated HADOOP-1270:
---------------------------------

    Resolution: Fixed
        Status: Resolved  (was: Patch Available)

I just committed this.  Thanks, Arun!



[jira] Updated: (HADOOP-1270) Randomize the fetch of map outputs

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-1270?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Doug Cutting updated HADOOP-1270:
---------------------------------

    Status: Open  (was: Patch Available)

Sorry, this patch no longer applies cleanly to trunk.  Can you please generate a new version?  Thanks!



[jira] Updated: (HADOOP-1270) Randomize the fetch of map outputs

Posted by "Arun C Murthy (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-1270?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Arun C Murthy updated HADOOP-1270:
----------------------------------

    Attachment: post-H-1270.png
                pre-H-1270.png

A couple of screenshots to illustrate the performance gains of this patch during a run of the 'sort' benchmark on ~400 nodes.



[jira] Updated: (HADOOP-1270) Randomize the fetch of map outputs

Posted by "Arun C Murthy (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-1270?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Arun C Murthy updated HADOOP-1270:
----------------------------------

    Status: Patch Available  (was: Open)

Sorry, got sidetracked on other issues.

I've tested this patch and see a ~10-15% improvement in shuffle throughput (copy phase, in MB/s). Marking this ready to go.



[jira] Updated: (HADOOP-1270) Randomize the fetch of map outputs

Posted by "Arun C Murthy (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-1270?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Arun C Murthy updated HADOOP-1270:
----------------------------------

    Attachment: HADOOP-1270_20070425_1.patch

Here is a straightforward patch which randomizes the shuffles...



[jira] Commented: (HADOOP-1270) Randomize the fetch of map outputs

Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-1270?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12494252 ] 

Hadoop QA commented on HADOOP-1270:
-----------------------------------

Integrated in Hadoop-Nightly #82 (See http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Nightly/82/)



[jira] Commented: (HADOOP-1270) Randomize the fetch of map outputs

Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-1270?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12493600 ] 

Hadoop QA commented on HADOOP-1270:
-----------------------------------

+1

http://issues.apache.org/jira/secure/attachment/12356749/HADOOP-1270_20070504_2.patch applied and successfully tested against trunk revision r534975.

Test results:   http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/116/testReport/
Console output: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/116/console



[jira] Commented: (HADOOP-1270) Randomize the fetch of map outputs

Posted by "Devaraj Das (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-1270?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12493440 ] 

Devaraj Das commented on HADOOP-1270:
-------------------------------------

+1
Also, this patch solves another issue that was reported as part of HADOOP-1183: old map events (corresponding to failing fetches from locations that are no longer valid) overwriting new, valid map events. The data structure knownOutputs is a List in this patch (earlier it was a Map), so MapOutputLocations get appended to the list rather than overwritten.
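
To illustrate the Map-versus-List point, here is a rough sketch under assumed names (KnownOutputsSketch, Location, recordInMap and recordInList are hypothetical, not taken from the patch). It assumes the old Map was keyed by map id, so a stale location re-queued after a failed fetch could replace a newer, valid location for the same map, whereas a List keeps both entries.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class KnownOutputsSketch {
    /** Hypothetical stand-in for MapOutputLocation: a map id plus the host serving it. */
    static final class Location {
        final int mapId;
        final String host;

        Location(int mapId, String host) {
            this.mapId = mapId;
            this.host = host;
        }
    }

    /** Map keyed by map id: a later put for the same id replaces the earlier entry. */
    static Map<Integer, Location> recordInMap(List<Location> arrivals) {
        Map<Integer, Location> known = new HashMap<>();
        for (Location loc : arrivals) {
            known.put(loc.mapId, loc);
        }
        return known;
    }

    /** List: every arrival is appended, so no entry is lost. */
    static List<Location> recordInList(List<Location> arrivals) {
        return new ArrayList<>(arrivals);
    }

    public static void main(String[] args) {
        List<Location> arrivals = new ArrayList<>();
        arrivals.add(new Location(7, "hostB")); // new, valid event for map 7
        arrivals.add(new Location(7, "hostA")); // stale location re-queued after a failed fetch

        System.out.println(recordInMap(arrivals).get(7).host); // prints hostA: hostB was overwritten
        System.out.println(recordInList(arrivals).size());     // prints 2: both entries are kept
    }
}
```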
