Posted to common-dev@hadoop.apache.org by "Arun C Murthy (JIRA)" <ji...@apache.org> on 2009/05/14 09:18:45 UTC

[jira] Created: (HADOOP-5830) Reuse output collectors across maps running on the same jvm

Reuse output collectors across maps running on the same jvm
-----------------------------------------------------------

                 Key: HADOOP-5830
                 URL: https://issues.apache.org/jira/browse/HADOOP-5830
             Project: Hadoop Core
          Issue Type: Improvement
          Components: mapred
            Reporter: Arun C Murthy


We have evidence that cutting the shuffle crossbar between maps and reduces (m * r) leads to more performant applications (the arithmetic is sketched after this list), since:
# It cuts down the number of connections necessary to shuffle, and hence reduces load on the serving side (the TaskTracker) and improves latency (terasort, HADOOP-1338, HADOOP-5223)
# It reduces the seeks required for the TaskTracker to serve the map-outputs
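
To put rough numbers on that crossbar: with m maps and r reduces, every reduce fetches one output segment from every map, so the shuffle performs m * r fetches in total. Holding r fixed, cutting m by 10x (as in the petasort run below) cuts both the connection count and the seeks each TaskTracker serves by roughly 10x.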

So far we've had to manually tune applications to cut down the shuffle crossbar by having fatter maps with custom input formats etc. For example, we saw a significant improvement while running the petasort when we went from ~800,000 maps to ~80,000 maps (1.5G to 15G per map), i.e. from 48+ hours to 16 hours.

The downsides are:
# The burden falls on the application-writer to tune this with custom input-formats etc.
# The naive method of using a higher min.split.size leads to considerable non-local i/o on the maps (a sketch of that naive tuning follows this list).
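
For reference, the naive tuning is a one-line config bump. A minimal sketch against the old mapred API, assuming the mapred.min.split.size knob; the 15G figure echoes the petasort run above:

{code}
import org.apache.hadoop.mapred.JobConf;

public class FatMapConf {
  public static void main(String[] args) {
    JobConf conf = new JobConf();
    // Ask for ~15G splits so each map processes more data. Only the first
    // block of such a split is likely to be node-local, hence the
    // considerable non-local i/o called out above.
    conf.setLong("mapred.min.split.size", 15L * 1024L * 1024L * 1024L);
    System.out.println(conf.getLong("mapred.min.split.size", 0L));
  }
}
{code}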

Given these, the proposal is to keep the 'output collector' open across jvm reuse for maps, thereby enabling 'combiners' across map-tasks. This would have the happy effect of fixing both of the above. The downsides are that it will add latency to jobs (since map-outputs cannot be shuffled until a few maps on the same jvm are done, followed by a final sort/merge/combine) and that the failure cases get a bit more complicated. A toy sketch of what cross-task combining buys follows.
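
Here is a toy, non-Hadoop illustration of 'one collector, many maps': because the buffer survives across map tasks in the jvm, the combiner (a sum here) folds together records from several tasks instead of just one. All names are illustrative, not MapTask internals:

{code}
import java.util.*;

public class SharedCollectorDemo {
  // The collector's buffer survives across "map tasks" run in this jvm.
  static final Map<String, Long> buffer = new HashMap<>();

  // Collect with an eager combine (sum) instead of buffering raw records.
  static void collect(String key, long value) {
    buffer.merge(key, value, Long::sum);
  }

  // One "map task": word-count over its input split.
  static void runMap(List<String> split) {
    for (String word : split) collect(word, 1L);
  }

  public static void main(String[] args) {
    runMap(Arrays.asList("a", "b", "a"));  // task 1
    runMap(Arrays.asList("b", "c"));       // task 2 reuses the same buffer
    // One final sort/merge/combine would happen here before the shuffle.
    System.out.println(buffer);            // {a=2, b=2, c=1}, combined across tasks
  }
}
{code}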

Thoughts? Let's discuss...

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-5830) Reuse output collectors across maps running on the same jvm

Posted by "Runping Qi (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-5830?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12709396#action_12709396 ] 

Runping Qi commented on HADOOP-5830:
------------------------------------


I am glad this topic re-appeared. :)
Why not pursue the idea of having large logical map tasks, as discussed in https://issues.apache.org/jira/browse/HADOOP-2560?




[jira] Commented: (HADOOP-5830) Reuse output collectors across maps running on the same jvm

Posted by "Hong Tang (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-5830?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12709456#action_12709456 ] 

Hong Tang commented on HADOOP-5830:
-----------------------------------

This is probably complementary to HADOOP-2560 (btw, is the new CombineFileInputFormat capable of picking blocks from different files and lumping them into one input split?). I consider this approach better because it could achieve better load balancing and lower the overhead of map failures or speculative execution. A toy sketch of the split-packing question follows.
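
For what it's worth, the packing in question can be sketched independently of the Hadoop API. A toy illustration with made-up block lengths and a greedy policy; this is an assumption about the flavor of the logic, not CombineFileInputFormat's actual code:

{code}
import java.util.*;

public class SplitPackingDemo {
  public static void main(String[] args) {
    long maxSplitSize = 256;
    // Block lengths, possibly drawn from different files.
    long[] blocks = {128, 64, 128, 200, 32};
    List<List<Long>> splits = new ArrayList<>();
    List<Long> current = new ArrayList<>();
    long size = 0;
    for (long b : blocks) {
      // Greedily pack blocks into the current split up to maxSplitSize.
      if (size + b > maxSplitSize && !current.isEmpty()) {
        splits.add(current);
        current = new ArrayList<>();
        size = 0;
      }
      current.add(b);
      size += b;
    }
    if (!current.isEmpty()) splits.add(current);
    System.out.println(splits);  // [[128, 64], [128], [200, 32]]
  }
}
{code}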



[jira] Commented: (HADOOP-5830) Reuse output collectors across maps running on the same jvm

Posted by "Hong Tang (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-5830?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12709302#action_12709302 ] 

Hong Tang commented on HADOOP-5830:
-----------------------------------

To minimize the impact on job latency, we may disable this while less than x% of the maps have finished. A sketch of such a gate follows.
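
A minimal sketch of that gate, with hypothetical names (x would presumably become a job config knob):

{code}
public class CollectorReuseGate {
  // Reuse the collector across maps only once at least `threshold` of the
  // job's maps have finished, so early map outputs still shuffle immediately.
  static boolean shouldReuse(int finishedMaps, int totalMaps, double threshold) {
    return totalMaps > 0 && (double) finishedMaps / totalMaps >= threshold;
  }

  public static void main(String[] args) {
    System.out.println(shouldReuse(10, 100, 0.25));  // false: shuffle eagerly
    System.out.println(shouldReuse(30, 100, 0.25));  // true: start reusing
  }
}
{code}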
