Posted to common-dev@hadoop.apache.org by "Devaraj Das (JIRA)" <ji...@apache.org> on 2008/04/22 15:02:22 UTC

[jira] Resolved: (HADOOP-3275) Reduce task does not handle map outputs fetching efficiently

     [ https://issues.apache.org/jira/browse/HADOOP-3275?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Devaraj Das resolved HADOOP-3275.
---------------------------------

    Resolution: Duplicate

The discussion and the associated fix can continue in HADOOP-3297.

> Reduce task does not handle map outputs fetching efficiently 
> -------------------------------------------------------------
>
>                 Key: HADOOP-3275
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3275
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: mapred
>            Reporter: Runping Qi
>
> I ran a job that just counts the number of records in the input data (with a combiner).
> The map phase took less than 10 minutes.
> But the shuffle took an additional 30 minutes!
> After examining the code and experimenting with a few tweaks, we discovered that the probe_sample_size (50/100) in the task tracker is too small.
> The fetchers simply cannot be kept busy. After changing the probe size to 5000, the shuffle time dropped to 13 minutes.
> With that setting, the fetching (30 threads) became the bottleneck. It is basically limited by how many HTTP fetches a thread can do per second.
> To further improve shuffling, we may need to consider using keep-alive HTTP connections and fetching multiple segments per HTTP connection.
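
The last two points in the report (fetch rate per thread as the limit, and
keep-alive connections as a possible remedy) can be illustrated with a rough
model. The sketch below is not Hadoop code. The class ShuffleFetchModel, the
method estimateSeconds, and every timing and segment count in it are
hypothetical; only the figure of 30 fetcher threads comes from the report. It
assumes each new HTTP connection pays a fixed setup cost, each segment pays a
fixed transfer cost, and the work divides evenly across threads.

```java
// Back-of-envelope model of shuffle fetch time.
// Hypothetical illustration only: not part of Hadoop, and all inputs are assumed values.
public final class ShuffleFetchModel {

    private ShuffleFetchModel() {}

    /**
     * Estimates wall-clock seconds needed to fetch all map output segments.
     *
     * @param segments              total number of segments to fetch
     * @param threads               number of parallel fetcher threads
     * @param setupMillis           assumed cost of opening one HTTP connection
     * @param transferMillis        assumed cost of transferring one segment
     * @param segmentsPerConnection segments fetched per connection (1 means no keep-alive)
     */
    public static double estimateSeconds(long segments, int threads,
                                         double setupMillis, double transferMillis,
                                         int segmentsPerConnection) {
        if (segments < 0 || threads <= 0 || segmentsPerConnection <= 0) {
            throw new IllegalArgumentException("arguments must be positive");
        }
        // Number of connections opened: ceil(segments / segmentsPerConnection).
        long connections = (segments + segmentsPerConnection - 1) / segmentsPerConnection;
        // Setup cost is paid once per connection, transfer cost once per segment.
        double totalMillis = connections * setupMillis + segments * transferMillis;
        // Assume the work is spread evenly over the fetcher threads.
        return totalMillis / threads / 1000.0;
    }

    public static void main(String[] args) {
        long segments = 300_000;   // hypothetical segment count
        int threads = 30;          // thread count mentioned in the report
        double setupMs = 50.0;     // hypothetical connection setup cost
        double transferMs = 10.0;  // hypothetical per-segment transfer cost

        double oneShot = estimateSeconds(segments, threads, setupMs, transferMs, 1);
        double keepAlive = estimateSeconds(segments, threads, setupMs, transferMs, 10);
        System.out.printf("one segment per connection:  %.1f s%n", oneShot);
        System.out.printf("ten segments per connection: %.1f s%n", keepAlive);
    }
}
```

Under these assumptions, the more the per-connection setup cost dominates the
per-segment transfer cost, the more a thread gains from reusing a connection
for several segments. Real gains depend on measured costs, which this sketch
does not supply.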
