Posted to mapreduce-issues@hadoop.apache.org by "Todd Lipcon (JIRA)" <ji...@apache.org> on 2009/12/04 23:33:20 UTC

[jira] Created: (MAPREDUCE-1266) Allow heartbeat interval smaller than 3 seconds for tiny clusters

Allow heartbeat interval smaller than 3 seconds for tiny clusters
-----------------------------------------------------------------

                 Key: MAPREDUCE-1266
                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1266
             Project: Hadoop Map/Reduce
          Issue Type: Improvement
          Components: jobtracker, task, tasktracker
    Affects Versions: 0.22.0
            Reporter: Todd Lipcon
            Priority: Minor


For small clusters, the heartbeat interval has a large effect on job latency. This is especially true on pseudo-distributed or other "tiny" (<5 nodes) clusters. It's not a big deal for production, but new users would have a happier first experience if Hadoop seemed snappier.

I'd like to change the minimum heartbeat interval from 3.0 seconds to perhaps 0.5 seconds (but have it governed by an undocumented config parameter in case people don't like this change). The cluster-size-based ramp-up of the interval will maintain the current scalable behavior for large clusters, with no negative effect.
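The undocumented override proposed here could look roughly like the following minimal sketch. It is an illustration, not Hadoop code: the property key example.heartbeat.interval.min.ms is hypothetical, a plain java.util.Properties stands in for Hadoop's configuration object, and the 500 ms default is simply the 0.5 second floor proposed above.

```java
import java.util.Properties;

/**
 * Sketch of an undocumented override for the minimum heartbeat interval.
 * The key below is HYPOTHETICAL and is not a real Hadoop property.
 */
class MinHeartbeatSetting {
    // Hypothetical property name, for illustration only.
    static final String MIN_INTERVAL_KEY = "example.heartbeat.interval.min.ms";
    // Proposed new floor: 0.5 s instead of the current 3.0 s.
    static final int DEFAULT_MIN_INTERVAL_MS = 500;

    /** Returns the configured floor in ms, or the default if unset or invalid. */
    static int minIntervalMs(Properties conf) {
        String raw = conf.getProperty(MIN_INTERVAL_KEY);
        if (raw == null) {
            return DEFAULT_MIN_INTERVAL_MS;
        }
        try {
            int value = Integer.parseInt(raw.trim());
            // Treat non-positive values as a configuration mistake and fall back.
            return value > 0 ? value : DEFAULT_MIN_INTERVAL_MS;
        } catch (NumberFormatException e) {
            return DEFAULT_MIN_INTERVAL_MS;
        }
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        System.out.println(minIntervalMs(conf));       // prints 500 (key unset)
        conf.setProperty(MIN_INTERVAL_KEY, "3000");    // opt back into the old 3 s floor
        System.out.println(minIntervalMs(conf));       // prints 3000
    }
}
```

Falling back on bad input (rather than failing fast) is one possible choice for a knob nobody is expected to touch; the thread does not settle this.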

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (MAPREDUCE-1266) Allow heartbeat interval smaller than 3 seconds for tiny clusters

Posted by "Todd Lipcon (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-1266?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12788349#action_12788349 ] 

Todd Lipcon commented on MAPREDUCE-1266:
----------------------------------------

bq. If so should we just set the minimum lower to 1.5s or 2s and get it over with?

Sure, I'm happy to not add the conf knob. I'd say we should set the minimum to 0.5s - a localhost ping every 0.5 seconds is negligible on modern machines.
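To put "negligible" into numbers: each TaskTracker sends one heartbeat per interval, so the JobTracker receives roughly trackers * 1000 / intervalMs heartbeats per second. The sketch below is back-of-envelope arithmetic only; it counts requests and says nothing about the cost of processing each one. The comparison value of 100 is the default of mapreduce.jobtracker.heartbeats.in.second cited elsewhere in this thread.

```java
// Back-of-envelope arithmetic, not Hadoop code.
class HeartbeatRate {
    /** Cluster-wide heartbeats per second if every tracker heartbeats once per interval. */
    static double heartbeatsPerSecond(int trackers, int intervalMs) {
        return trackers * 1000.0 / intervalMs;
    }

    public static void main(String[] args) {
        // Pseudo-distributed cluster: a single TaskTracker with a 0.5 s interval.
        double proposed = heartbeatsPerSecond(1, 500);
        System.out.println(proposed);           // prints 2.0
        // Fraction of the 100 heartbeats/s default target that this represents.
        System.out.println(proposed / 100.0);   // prints 0.02
    }
}
```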



[jira] Commented: (MAPREDUCE-1266) Allow heartbeat interval smaller than 3 seconds for tiny clusters

Posted by "Allen Wittenauer (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-1266?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12786213#action_12786213 ] 

Allen Wittenauer commented on MAPREDUCE-1266:
---------------------------------------------

I'm probably being forgetful, but... we have:

a) heartbeat interval
b) minimum heartbeat interval

such that

a >= b, always.

If someone doesn't like b, does it matter? Wouldn't they just tune a? I guess I'm asking: why make b configurable at all?



[jira] Commented: (MAPREDUCE-1266) Allow heartbeat interval smaller than 3 seconds for tiny clusters

Posted by "Todd Lipcon (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-1266?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12786474#action_12786474 ] 

Todd Lipcon commented on MAPREDUCE-1266:
----------------------------------------

It basically does that already: the clusterSize variable in the heartbeat-interval code is the number of task trackers. Reducing the minimum but leaving the other argument to Math.max alone should maintain the current behavior for large clusters and automatically reduce the interval on small ones.
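The claim that large clusters are unaffected can be checked against the Math.max expression quoted in this thread. The sketch below assumes HEARTBEATS_SCALING_FACTOR is 1.0 and the default of 100 heartbeats per second (the factor's actual value is not stated here); the class and method names are illustrative. It finds the smallest cluster size at which the old 3000 ms floor and the proposed 500 ms floor produce the same interval.

```java
// Illustrative check of the Math.max formula quoted in this thread.
// ASSUMPTIONS: HEARTBEATS_SCALING_FACTOR = 1.0, 100 heartbeats per second.
class MinCrossover {
    /** The quoted expression, with its constants passed in as parameters. */
    static int interval(int clusterSize, int heartbeatsPerSecond,
                        double scalingFactor, int minMs) {
        int scaled = (int) (1000 * scalingFactor
                * Math.ceil((double) clusterSize / heartbeatsPerSecond));
        return Math.max(scaled, minMs);
    }

    /**
     * Smallest cluster size at which two floors yield the same interval.
     * The scaled term never decreases as clusterSize grows, so once the two
     * floors agree they agree for every larger cluster too.
     */
    static int firstAgreeingSize(int oldMinMs, int newMinMs,
                                 int heartbeatsPerSecond, double scalingFactor) {
        int n = 1;
        while (interval(n, heartbeatsPerSecond, scalingFactor, oldMinMs)
                != interval(n, heartbeatsPerSecond, scalingFactor, newMinMs)) {
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        // Old floor 3000 ms vs. proposed 500 ms.
        System.out.println(firstAgreeingSize(3000, 500, 100, 1.0)); // prints 201
    }
}
```

Under those assumptions the two floors agree from 201 trackers upward, because the scaled term reaches 3000 ms once ceil(clusterSize / 100) is 3; only smaller clusters see a different interval.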



[jira] Commented: (MAPREDUCE-1266) Allow heartbeat interval smaller than 3 seconds for tiny clusters

Posted by "Arun C Murthy (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-1266?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12787419#action_12787419 ] 

Arun C Murthy commented on MAPREDUCE-1266:
------------------------------------------

bq. "We have 1 second of unavoidable slowness, so 3 more per task is OK?"

My response was:

bq. I'm failing to see how *sub-second* heartbeat intervals will help.

----

bq. Are there any substantive reasons against this? I didn't anticipate much discussion over a change in a single constant that only affects tiny clusters 

Adding more config knobs doesn't help... we will need to maintain them even if they are undocumented. I'm wary of adding more knobs without more thought. 

The blog post is quite vague. 

For example, would setting it to 2s help as much as setting it to 0.5s? Or 1.5s? If so should we just set the minimum lower to 1.5s or 2s and get it over with? 
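One way to frame the 2s vs. 1.5s vs. 0.5s question is to evaluate the quoted formula for a tiny cluster under each candidate floor. This sketch again assumes HEARTBEATS_SCALING_FACTOR = 1.0 and 100 heartbeats per second; with those values the Math.ceil term is already 1000 ms for any cluster of 1 to 100 trackers, so a floor below 1000 ms would not change the result.

```java
// Evaluates the quoted formula for tiny clusters under candidate floors.
// ASSUMPTIONS: HEARTBEATS_SCALING_FACTOR = 1.0, 100 heartbeats per second.
class CandidateMinimums {
    static int interval(int clusterSize, int heartbeatsPerSecond,
                        double scalingFactor, int minMs) {
        int scaled = (int) (1000 * scalingFactor
                * Math.ceil((double) clusterSize / heartbeatsPerSecond));
        return Math.max(scaled, minMs);
    }

    public static void main(String[] args) {
        int[] candidateFloorsMs = {3000, 2000, 1500, 500};
        for (int trackers : new int[] {1, 4}) {  // pseudo-distributed and "<5 nodes"
            for (int floorMs : candidateFloorsMs) {
                System.out.printf("trackers=%d floor=%dms -> interval=%dms%n",
                        trackers, floorMs, interval(trackers, 100, 1.0, floorMs));
            }
        }
    }
}
```

Under these assumptions a 0.5 s floor behaves like a 1 s floor unless the formula itself changes.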




[jira] Commented: (MAPREDUCE-1266) Allow heartbeat interval smaller than 3 seconds for tiny clusters

Posted by "Todd Lipcon (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-1266?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12831644#action_12831644 ] 

Todd Lipcon commented on MAPREDUCE-1266:
----------------------------------------

bq. If you are using JVM reuse, then that 1s disappears, right? 

Not really, since JVM reuse doesn't reuse between maps and reduces.

The time sequence of a small job looks like:

Client:
  Submit job
JT:
  Create tasks ("initialize job") on JT
  wait for a TT to heartbeat
TT:
  start JVM
child:
  process map task
TT:
  send accelerated heartbeat once map task is complete (I forget whether this is in 0.20 or came later)
  receive reduce task, start reduce JVM (regardless of JVM reuse)
child:
  process reduce task
TT:
  send completion heartbeat

I guess there are some setup/cleanup tasks in there as well. Since we're talking about a hypothetical job with one map and one reduce, we're only cutting down the time between initializing the job and getting the first JVM started on a TT.

In a job with multiple mappers or reducers, the cost shows up in how long it takes for all of the tasks to get scheduled: some schedulers assign only one task per heartbeat. The fair scheduler after MAPREDUCE-706 can assign multiple tasks per heartbeat, which should help substantially.
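A rough model of the multi-task case, under stated assumptions: if a scheduler hands each tracker at most tasksPerHeartbeat tasks per heartbeat and enough slots are free, launching N tasks across T trackers takes ceil(N / (T * tasksPerHeartbeat)) heartbeat rounds, so the worst-case scheduling delay is about that many intervals. This is a simplification for illustration, not how any particular Hadoop scheduler is implemented; it ignores JVM start-up, task run time and slot limits.

```java
// Simplified scheduling-delay model (an assumption, not Hadoop code).
class SchedulingRounds {
    /** Heartbeat rounds needed to hand out all tasks: integer ceil(tasks / perRound). */
    static int rounds(int tasks, int trackers, int tasksPerHeartbeat) {
        int perRound = trackers * tasksPerHeartbeat;
        return (tasks + perRound - 1) / perRound;
    }

    /** Upper bound on scheduling delay: each round can cost a full interval. */
    static long worstCaseSchedulingMs(int tasks, int trackers,
                                      int tasksPerHeartbeat, int intervalMs) {
        return (long) rounds(tasks, trackers, tasksPerHeartbeat) * intervalMs;
    }

    public static void main(String[] args) {
        // Four map tasks on a single tracker.
        System.out.println(worstCaseSchedulingMs(4, 1, 1, 3000)); // prints 12000
        System.out.println(worstCaseSchedulingMs(4, 1, 1, 500));  // prints 2000
        System.out.println(worstCaseSchedulingMs(4, 1, 4, 3000)); // prints 3000
    }
}
```

The model suggests that either a shorter interval or multi-task assignment per heartbeat shrinks the same term, which is why the two changes are complementary.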



[jira] Commented: (MAPREDUCE-1266) Allow heartbeat interval smaller than 3 seconds for tiny clusters

Posted by "Vinod K V (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-1266?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12787295#action_12787295 ] 

Vinod K V commented on MAPREDUCE-1266:
--------------------------------------

bq. BTW, the inspiration was this blog post: http://pero.blogs.aprilmayjune.org/2009/11/30/improve-performance-on-small-hadoop-clusters/
Just a thought: is this improvement noticeable under full/heavy job load? Heartbeats on an idle cluster are mostly very cheap. Under heavy load, heartbeat processing is costlier, so the performance improvements you have outlined may or may not show up.



[jira] Commented: (MAPREDUCE-1266) Allow heartbeat interval smaller than 3 seconds for tiny clusters

Posted by "Arun C Murthy (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-1266?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12786552#action_12786552 ] 

Arun C Murthy commented on MAPREDUCE-1266:
------------------------------------------

It takes ~1s for the map/reduce task JVM to come up... 

I'm failing to see how sub-second heartbeat intervals will help.



[jira] Commented: (MAPREDUCE-1266) Allow heartbeat interval smaller than 3 seconds for tiny clusters

Posted by "Todd Lipcon (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-1266?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12786219#action_12786219 ] 

Todd Lipcon commented on MAPREDUCE-1266:
----------------------------------------

Well, actually, in trunk there's mapreduce.jobtracker.heartbeats.in.second, which sets the individual trackers' intervals such that that many heartbeats arrive at the JobTracker every second. The default is 100, which would be a 10ms interval for a pseudo-distributed cluster, which is silly. So there's also a hard-coded minimum. Here's the relevant code:
{code}
    // clusterSize is the number of TaskTrackers; the result is in milliseconds.
    int heartbeatInterval = Math.max(
                                (int)(1000 * HEARTBEATS_SCALING_FACTOR *
                                      Math.ceil((double)clusterSize /
                                                NUM_HEARTBEATS_IN_SECOND)),
                                HEARTBEAT_INTERVAL_MIN);
{code}

HEARTBEAT_INTERVAL_MIN is hard-coded to 3 seconds (3000 ms) in MRConstants.java.
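The quoted snippet can be transcribed into a small standalone class to see what it computes for a pseudo-distributed cluster. HEARTBEAT_INTERVAL_MIN (3000 ms) and the default of 100 heartbeats per second come from this thread; HEARTBEATS_SCALING_FACTOR = 1.0 is an assumption. The sketch contrasts the literal per-tracker interval (1000 * clusterSize / 100 ms, the "10ms" mentioned above) with what the Math.ceil expression actually yields.

```java
// Standalone transcription of the quoted snippet.
class HeartbeatIntervalFormula {
    static final int HEARTBEAT_INTERVAL_MIN = 3 * 1000;  // "hard-coded to 3 seconds"
    static final int NUM_HEARTBEATS_IN_SECOND = 100;      // default quoted in this thread
    static final double HEARTBEATS_SCALING_FACTOR = 1.0;  // ASSUMED value

    /** Interval a tracker would get if the target rate were applied literally. */
    static double naiveIntervalMs(int clusterSize) {
        return 1000.0 * clusterSize / NUM_HEARTBEATS_IN_SECOND;
    }

    /** The first argument to Math.max in the quoted snippet. */
    static int scaledIntervalMs(int clusterSize) {
        return (int) (1000 * HEARTBEATS_SCALING_FACTOR
                * Math.ceil((double) clusterSize / NUM_HEARTBEATS_IN_SECOND));
    }

    /** The quoted snippet in full. */
    static int heartbeatIntervalMs(int clusterSize) {
        return Math.max(scaledIntervalMs(clusterSize), HEARTBEAT_INTERVAL_MIN);
    }

    public static void main(String[] args) {
        int clusterSize = 1; // pseudo-distributed: one TaskTracker
        System.out.println(naiveIntervalMs(clusterSize));     // prints 10.0
        System.out.println(scaledIntervalMs(clusterSize));    // prints 1000
        System.out.println(heartbeatIntervalMs(clusterSize)); // prints 3000
    }
}
```

As quoted, Math.ceil rounds clusterSize / NUM_HEARTBEATS_IN_SECOND up to a whole number before scaling, so for any non-empty cluster the scaled term is at least 1000 * HEARTBEATS_SCALING_FACTOR ms. The 10 ms case never arises, and the minimum only matters when it is larger than that value.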

Maybe I'm misunderstanding your question - are you in support of lowering the minimum and just asking why make it undocumented-configurable instead of hardcoded? I was offering the undocumented configuration option just in case someone had an argument against this change. If everyone's for it, happy to just change the constant.



[jira] Commented: (MAPREDUCE-1266) Allow heartbeat interval smaller than 3 seconds for tiny clusters

Posted by "Todd Lipcon (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-1266?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12787276#action_12787276 ] 

Todd Lipcon commented on MAPREDUCE-1266:
----------------------------------------

Arun: I'm not following. "We have 1 second of unavoidable slowness, so 3 more per task is OK?" BTW, the inspiration was this blog post: http://pero.blogs.aprilmayjune.org/2009/11/30/improve-performance-on-small-hadoop-clusters/
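A back-of-envelope version of this argument, assuming the ~1 s JVM start-up mentioned in the thread and one heartbeat wait per task launch: the worst case adds a full interval, and if the task becomes ready at a uniformly random point in the heartbeat cycle (an assumption), the expected wait is half an interval.

```java
// Per-task launch overhead under simple, stated assumptions (not Hadoop code).
class PerTaskOverhead {
    static final int JVM_STARTUP_MS = 1000; // "~1s" per the thread

    /** JVM start-up plus a full heartbeat interval of waiting. */
    static int worstCaseOverheadMs(int intervalMs) {
        return JVM_STARTUP_MS + intervalMs;
    }

    /** Assumes readiness falls uniformly within the heartbeat cycle. */
    static double expectedOverheadMs(int intervalMs) {
        return JVM_STARTUP_MS + intervalMs / 2.0;
    }

    public static void main(String[] args) {
        for (int interval : new int[] {3000, 500}) {
            System.out.printf("interval=%dms worst=%dms expected=%.0fms%n",
                    interval, worstCaseOverheadMs(interval), expectedOverheadMs(interval));
        }
        // prints: interval=3000ms worst=4000ms expected=2500ms
        //         interval=500ms worst=1500ms expected=1250ms
    }
}
```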



[jira] Commented: (MAPREDUCE-1266) Allow heartbeat interval smaller than 3 seconds for tiny clusters

Posted by "Konstantin Boudnik (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-1266?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12786470#action_12786470 ] 

Konstantin Boudnik commented on MAPREDUCE-1266:
-----------------------------------------------

Here's an interesting thought: why don't we build some ergonomics into Hadoop? If the heartbeat interval seems to increase latency on small clusters, perhaps Hadoop could lower it dynamically when a small cluster is 'detected'?



[jira] Commented: (MAPREDUCE-1266) Allow heartbeat interval smaller than 3 seconds for tiny clusters

Posted by "Allen Wittenauer (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-1266?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12831631#action_12831631 ] 

Allen Wittenauer commented on MAPREDUCE-1266:
---------------------------------------------

If you are using JVM reuse, then that 1s disappears, right?




[jira] Commented: (MAPREDUCE-1266) Allow heartbeat interval smaller than 3 seconds for tiny clusters

Posted by "Todd Lipcon (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-1266?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12787296#action_12787296 ] 

Todd Lipcon commented on MAPREDUCE-1266:
----------------------------------------

That's actually not my blog post; I just came across it the other day and figured he had a good point.

Regarding idle vs "heavy job load", again, this is not going to be an improvement for any real clusters. Its only purpose is making pseudo-distributed or other "too small for real work" clusters a bit more responsive.

Are there any substantive reasons against this? I didn't anticipate much discussion over a change in a single constant that only affects tiny clusters :)
