Posted to mapreduce-issues@hadoop.apache.org by "Scott Carey (JIRA)" <ji...@apache.org> on 2010/07/01 19:09:50 UTC

[jira] Created: (MAPREDUCE-1906) Lower minimum heartbeat interval for tasktracker > Jobtracker

Lower minimum heartbeat interval for tasktracker > Jobtracker
-------------------------------------------------------------

                 Key: MAPREDUCE-1906
                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1906
             Project: Hadoop Map/Reduce
          Issue Type: Improvement
    Affects Versions: 0.20.2, 0.20.1
            Reporter: Scott Carey


I get a 0% to 15% performance increase on smaller clusters by making the heartbeat throttle stop penalizing clusters with fewer than 300 nodes.

Between 0.19 and 0.20, the default minimum heartbeat interval increased from 2s to 3s. If a JobTracker is throttled at 100 heartbeats/sec for large clusters, why should a cluster with 10 nodes be throttled to 3.3 heartbeats per second?

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (MAPREDUCE-1906) Lower minimum heartbeat interval for tasktracker > Jobtracker

Posted by "Scott Carey (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-1906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12884354#action_12884354 ] 

Scott Carey commented on MAPREDUCE-1906:
----------------------------------------

JobTracker.java has this code:

(0.21 branch, line 2497)
{code}
public int getNextHeartbeatInterval() {
  // get the no of task trackers
  int clusterSize = getClusterStatus().getTaskTrackers();
  int heartbeatInterval = Math.max(
      (int) (1000 * HEARTBEATS_SCALING_FACTOR *
             Math.ceil((double) clusterSize / NUM_HEARTBEATS_IN_SECOND)),
      HEARTBEAT_INTERVAL_MIN);
  return heartbeatInterval;
}
{code}

HEARTBEAT_INTERVAL_MIN is 3000 (milliseconds). This means that the JobTracker receives 100 heartbeats/second only once a cluster has reached 300 nodes.
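The step behavior is easier to see in isolation. Below is a minimal standalone sketch of the same arithmetic, not the JobTracker code itself: the class and method names are hypothetical, and it assumes HEARTBEATS_SCALING_FACTOR = 1.0 and NUM_HEARTBEATS_IN_SECOND = 100, the values that match the intervals discussed in this issue.

{code:java}
// Hypothetical standalone re-implementation of the getNextHeartbeatInterval() arithmetic.
// Assumes HEARTBEATS_SCALING_FACTOR = 1.0f and NUM_HEARTBEATS_IN_SECOND = 100.
class StepHeartbeat {
    static final float HEARTBEATS_SCALING_FACTOR = 1.0f; // assumed value
    static final int NUM_HEARTBEATS_IN_SECOND = 100;     // assumed value
    static final int HEARTBEAT_INTERVAL_MIN = 3000;      // ms, the current floor

    // Per-TaskTracker interval in ms: one whole second per started block of 100 nodes,
    // never below HEARTBEAT_INTERVAL_MIN.
    static int nextInterval(int clusterSize) {
        return Math.max(
            (int) (1000 * HEARTBEATS_SCALING_FACTOR
                   * Math.ceil((double) clusterSize / NUM_HEARTBEATS_IN_SECOND)),
            HEARTBEAT_INTERVAL_MIN);
    }

    // Cluster-wide heartbeats per second arriving at the JobTracker.
    static double jobTrackerRate(int clusterSize) {
        return clusterSize * 1000.0 / nextInterval(clusterSize);
    }

    public static void main(String[] args) {
        for (int n : new int[] {10, 100, 300}) {
            System.out.printf("%4d nodes: %5d ms, %.2f heartbeats/sec%n",
                n, nextInterval(n), jobTrackerRate(n));
        }
    }
}
{code}

Under these assumptions, 10 nodes get a 3000 ms interval (about 3.33 heartbeats/sec at the JobTracker), and the rate first reaches 100/sec at 300 nodes.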

This throttle is far too large in my experience. I have a development cluster with 10 nodes; each node can handle 10 maps and 10 reduces concurrently. With 0.20, the scheduler assigns at most one map and one reduce per heartbeat. The result is an underutilized cluster whenever anything but very large jobs are running. Many of our data flows start out large, then end with a couple dozen smaller jobs that are mostly chained together.

I have been running in production and development with a patch to MRConstants.java that changes HEARTBEAT_INTERVAL_MIN to 300 ms, which improves cluster utilization significantly. In small clusters, a heartbeat every 300 ms is not an issue. The code above already throttles the system; the 3000 ms floor is simply too large. With a 300 ms floor, it still takes a cluster of 30 machines to reach the 100 heartbeats/sec threshold.

I also could not find an explanation why this was increased from 2000 to 3000 between 0.19 and 0.20.  





[jira] Updated: (MAPREDUCE-1906) Lower minimum heartbeat interval for tasktracker > Jobtracker

Posted by "Scott Carey (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-1906?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Scott Carey updated MAPREDUCE-1906:
-----------------------------------

    Attachment:     (was: MAPREDUCE-1906-0.21.patch)



[jira] Updated: (MAPREDUCE-1906) Lower minimum heartbeat interval for tasktracker > Jobtracker

Posted by "Scott Carey (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-1906?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Scott Carey updated MAPREDUCE-1906:
-----------------------------------

    Status: Patch Available  (was: Open)

Re-submit for Hudson.



[jira] Updated: (MAPREDUCE-1906) Lower minimum heartbeat interval for tasktracker > Jobtracker

Posted by "Scott Carey (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-1906?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Scott Carey updated MAPREDUCE-1906:
-----------------------------------

    Status: Open  (was: Patch Available)



[jira] Updated: (MAPREDUCE-1906) Lower minimum heartbeat interval for tasktracker > Jobtracker

Posted by "Scott Carey (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-1906?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Scott Carey updated MAPREDUCE-1906:
-----------------------------------

    Status: Patch Available  (was: Open)

Is it possible to consider this for 0.21?  



[jira] Updated: (MAPREDUCE-1906) Lower minimum heartbeat interval for tasktracker > Jobtracker

Posted by "Scott Carey (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-1906?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Scott Carey updated MAPREDUCE-1906:
-----------------------------------

    Attachment: MAPREDUCE-1906-0.21.patch

This patch changes the default minimum TaskTracker-to-JobTracker heartbeat interval from 3000 ms to 300 ms.

Effectively, this raises the heartbeat rate of clusters between 30 and 300 nodes to a cluster-wide 100 heartbeats per second.
Clusters larger than 300 nodes remain unchanged at a cluster-wide 100 heartbeats per second.

Clusters with fewer than 30 nodes have a constant 300 ms between pings per node, so a 15-node cluster sends 50 heartbeats per second and a 3-node cluster sends 10 heartbeats per second.
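The arithmetic above is cluster-wide rate = nodes * 1000 / interval_ms. A tiny sketch, assuming every TaskTracker heartbeats exactly once per 300 ms; the class and helper names are hypothetical and only for illustration:

{code:java}
// Hypothetical helper: cluster-wide heartbeats/sec when each node pings once per intervalMs.
// Assumes the per-node interval is pinned at the 300 ms minimum.
class ClampedRate {
    static double clusterRate(int nodes, int intervalMs) {
        return nodes * 1000.0 / intervalMs;
    }

    public static void main(String[] args) {
        int intervalMs = 300; // proposed HEARTBEAT_INTERVAL_MIN
        for (int n : new int[] {3, 15, 30}) {
            System.out.printf("%2d nodes -> %.1f heartbeats/sec%n", n, clusterRate(n, intervalMs));
        }
    }
}
{code}

At 30 nodes this reaches the 100 heartbeats/sec level, which is why 30 nodes is the boundary in the description above.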



[jira] Updated: (MAPREDUCE-1906) Lower minimum heartbeat interval for tasktracker > Jobtracker

Posted by "Scott Carey (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-1906?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Scott Carey updated MAPREDUCE-1906:
-----------------------------------

    Status: Open  (was: Patch Available)



[jira] Updated: (MAPREDUCE-1906) Lower minimum heartbeat interval for tasktracker > Jobtracker

Posted by "Scott Carey (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-1906?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Scott Carey updated MAPREDUCE-1906:
-----------------------------------

    Attachment: MAPREDUCE-1906-0.21-v2.patch

This patch adds a one-line change to JobTracker.java that makes the heartbeat interval a smooth function of cluster size instead of a step function. In total, the patch consists of two one-line changes.



[jira] Commented: (MAPREDUCE-1906) Lower minimum heartbeat interval for tasktracker > Jobtracker

Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-1906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12885726#action_12885726 ] 

Hadoop QA commented on MAPREDUCE-1906:
--------------------------------------

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12448507/MAPREDUCE-1906-0.21.patch
  against trunk revision 960808.

    +1 @author.  The patch does not contain any @author tags.

    -1 tests included.  The patch doesn't appear to include any new or modified tests.
                        Please justify why no new tests are needed for this patch.
                        Also please list what manual steps were performed to verify this patch.

    +1 javadoc.  The javadoc tool did not generate any warning messages.

    +1 javac.  The applied patch does not increase the total number of javac compiler warnings.

    +1 findbugs.  The patch does not introduce any new Findbugs warnings.

    +1 release audit.  The applied patch does not increase the total number of release audit warnings.

    -1 core tests.  The patch failed core unit tests.

    -1 contrib tests.  The patch failed contrib unit tests.

Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/288/testReport/
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/288/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/288/artifact/trunk/build/test/checkstyle-errors.html
Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/288/console

This message is automatically generated.



[jira] Commented: (MAPREDUCE-1906) Lower minimum heartbeat interval for tasktracker > Jobtracker

Posted by "Scott Carey (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MAPREDUCE-1906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12887982#action_12887982 ] 

Scott Carey commented on MAPREDUCE-1906:
----------------------------------------

This is a one-line change to a static constant; no new unit tests are needed.

The three tests that fail are:
 org.apache.hadoop.mapred.TestSimulatorSerialJobSubmission.testMain 
 org.apache.hadoop.mapred.TestSimulatorDeterministicReplay.testMain 
 org.apache.hadoop.mapred.TestMapredHeartbeat.testJobDirCleanup (TestMapredHeartbeat.java:46)


The first two seem unrelated.  

The last one looks like it is explicitly testing the constant. The test assumes that the minimum heartbeat interval will in fact be HEARTBEAT_INTERVAL_MIN, but the calculation in JobTracker.getNextHeartbeatInterval() is a step function: each block of NUM_HEARTBEATS_IN_SECOND nodes adds a full 1000 * HEARTBEATS_SCALING_FACTOR ms to the heartbeat delay. Because of the Math.ceil(), any cluster with at least one TaskTracker gets at least one such step, so a HEARTBEAT_INTERVAL_MIN below that value never takes effect.
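A sketch of why a sub-second minimum is masked by the Math.ceil() term: for any non-empty cluster, ceil(clusterSize / 100) is at least 1, so the computed interval is at least 1000 ms. This assumes HEARTBEATS_SCALING_FACTOR = 1.0 and NUM_HEARTBEATS_IN_SECOND = 100; the class name is hypothetical.

{code:java}
// Hypothetical: the step formula with the minimum lowered to 300 ms but Math.ceil() kept.
// Assumes HEARTBEATS_SCALING_FACTOR = 1.0f and NUM_HEARTBEATS_IN_SECOND = 100.
class CeilFloor {
    static final float HEARTBEATS_SCALING_FACTOR = 1.0f; // assumed value
    static final int NUM_HEARTBEATS_IN_SECOND = 100;     // assumed value

    static int stepInterval(int clusterSize, int minIntervalMs) {
        return Math.max(
            (int) (1000 * HEARTBEATS_SCALING_FACTOR
                   * Math.ceil((double) clusterSize / NUM_HEARTBEATS_IN_SECOND)),
            minIntervalMs);
    }

    public static void main(String[] args) {
        // Every non-empty cluster is rounded up to at least one full 1000 ms step.
        for (int n : new int[] {1, 10, 100, 101}) {
            System.out.println(n + " nodes: " + stepInterval(n, 300) + " ms");
        }
    }
}
{code}

Under these assumptions, the 300 ms floor is only ever reached by an empty cluster.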

Currently, with NUM_HEARTBEATS_IN_SECOND = 100 and HEARTBEATS_SCALING_FACTOR = 1.0, a cluster with 500 nodes has a 5-second heartbeat interval, but one with 501 nodes has a 6-second interval. Is there a good reason for the intervals to be rounded up to the next whole second? How about we just remove the Math.ceil() and let the (int) cast truncate to a whole millisecond? This would make the test's assumption true and provide smooth throttling as nodes come and go.

However, it is possible that code elsewhere assumes JobTracker heartbeats arrive at whole-second intervals.

{code}
public int getNextHeartbeatInterval() {
  // get the no of task trackers
  int clusterSize = getClusterStatus().getTaskTrackers();
  // No Math.ceil(): the interval grows smoothly with cluster size,
  // and the (int) cast truncates it to whole milliseconds.
  int heartbeatInterval = Math.max(
      (int) (1000 * HEARTBEATS_SCALING_FACTOR *
             ((double) clusterSize / NUM_HEARTBEATS_IN_SECOND)),
      HEARTBEAT_INTERVAL_MIN);
  return heartbeatInterval;
}
{code}  
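To check the 500/501-node example, here is a minimal sketch that evaluates both versions side by side. It assumes HEARTBEATS_SCALING_FACTOR = 1.0, NUM_HEARTBEATS_IN_SECOND = 100 and the current 3000 ms minimum; the class name is hypothetical.

{code:java}
// Hypothetical side-by-side of the step (Math.ceil) and smooth (no ceil) interval formulas.
// Assumes HEARTBEATS_SCALING_FACTOR = 1.0f, NUM_HEARTBEATS_IN_SECOND = 100, minimum 3000 ms.
class StepVsSmooth {
    static final float HEARTBEATS_SCALING_FACTOR = 1.0f; // assumed value
    static final int NUM_HEARTBEATS_IN_SECOND = 100;     // assumed value
    static final int HEARTBEAT_INTERVAL_MIN = 3000;      // ms, current default

    // Original: rounded up to whole seconds per started block of 100 nodes.
    static int stepInterval(int clusterSize) {
        return Math.max((int) (1000 * HEARTBEATS_SCALING_FACTOR
                * Math.ceil((double) clusterSize / NUM_HEARTBEATS_IN_SECOND)),
                HEARTBEAT_INTERVAL_MIN);
    }

    // Proposed: no Math.ceil(); the (int) cast truncates to whole milliseconds.
    static int smoothInterval(int clusterSize) {
        return Math.max((int) (1000 * HEARTBEATS_SCALING_FACTOR
                * ((double) clusterSize / NUM_HEARTBEATS_IN_SECOND)),
                HEARTBEAT_INTERVAL_MIN);
    }

    public static void main(String[] args) {
        for (int n : new int[] {500, 501}) {
            System.out.println(n + " nodes: step " + stepInterval(n)
                    + " ms, smooth " + smoothInterval(n) + " ms");
        }
    }
}
{code}

Adding one node to a 500-node cluster moves the step version from 5000 ms to 6000 ms, while the smooth version moves only to 5010 ms.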



What were the reasons for the long minimum ping time in the first place?  Why did it go up from 2 to 3 seconds between 0.19 and 0.20?  



[jira] Updated: (MAPREDUCE-1906) Lower minimum heartbeat interval for tasktracker > Jobtracker

Posted by "Scott Carey (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-1906?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Scott Carey updated MAPREDUCE-1906:
-----------------------------------

    Attachment:     (was: MAPREDUCE-1906-0.21-v2.patch)



[jira] Updated: (MAPREDUCE-1906) Lower minimum heartbeat interval for tasktracker > Jobtracker

Posted by "Scott Carey (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-1906?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Scott Carey updated MAPREDUCE-1906:
-----------------------------------

    Attachment: MAPREDUCE-1906-0.21.patch

Replaced the original patch with the latest version.



[jira] Updated: (MAPREDUCE-1906) Lower minimum heartbeat interval for tasktracker > Jobtracker

Posted by "Scott Carey (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-1906?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Scott Carey updated MAPREDUCE-1906:
-----------------------------------

    Status: Open  (was: Patch Available)

Re-submit for Hudson.



[jira] Updated: (MAPREDUCE-1906) Lower minimum heartbeat interval for tasktracker > Jobtracker

Posted by "Scott Carey (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MAPREDUCE-1906?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Scott Carey updated MAPREDUCE-1906:
-----------------------------------

    Status: Patch Available  (was: Open)

MAPREDUCE-1906-0.21-v2.patch

changes the ping interval from a step function to a smooth function of cluster size and lowers the minimum to 300 ms. Clusters larger than 300 nodes see only the step-to-smooth change. Clusters between 30 and 300 nodes smoothly increase their ping interval with size. Clusters with 30 nodes or fewer have 300 ms ping intervals when the TaskTracker has nothing to do. This significantly improves scheduling latency on small clusters.

The cluster-wide ping rate is roughly proportional to how fast the cluster can schedule a job.
|| cluster size || current ping interval (ms) || current ping rate at JT || patched ping interval (ms) || patched ping rate at JT ||
| 10 | 3000 | 3.33 /sec | 300 | 33.3 /sec |
| 30 | 3000 | 10 /sec | 300 | 100 /sec |
| 100 | 3000 | 33.3 /sec | 1000 | 100 /sec |
| 300 | 3000 | 100 /sec | 3000 | 100 /sec |
| 301 | 4000 | 75 /sec | 3010 | 100 /sec |
| 1000 | 10000 | 100 /sec | 10000 | 100 /sec |
| 1001 | 11000 | 91 /sec | 10010 | 100 /sec |
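The table can be reproduced from the two formulas. A minimal sketch, assuming HEARTBEATS_SCALING_FACTOR = 1.0 and NUM_HEARTBEATS_IN_SECOND = 100; "current" keeps Math.ceil() with the 3000 ms minimum, "patched" drops it with a 300 ms minimum, and the class name is hypothetical.

{code:java}
// Hypothetical sketch that recomputes the table rows from both interval formulas.
// Assumes HEARTBEATS_SCALING_FACTOR = 1.0f and NUM_HEARTBEATS_IN_SECOND = 100.
class HeartbeatTable {
    static final float HEARTBEATS_SCALING_FACTOR = 1.0f; // assumed value
    static final int NUM_HEARTBEATS_IN_SECOND = 100;     // assumed value

    // Current: step function with a 3000 ms floor.
    static int currentInterval(int clusterSize) {
        return Math.max((int) (1000 * HEARTBEATS_SCALING_FACTOR
                * Math.ceil((double) clusterSize / NUM_HEARTBEATS_IN_SECOND)), 3000);
    }

    // Patched: smooth function with a 300 ms floor.
    static int patchedInterval(int clusterSize) {
        return Math.max((int) (1000 * HEARTBEATS_SCALING_FACTOR
                * ((double) clusterSize / NUM_HEARTBEATS_IN_SECOND)), 300);
    }

    // Heartbeats per second arriving at the JobTracker from the whole cluster.
    static double rate(int clusterSize, int intervalMs) {
        return clusterSize * 1000.0 / intervalMs;
    }

    public static void main(String[] args) {
        System.out.println("size | current ms | current /sec | patched ms | patched /sec");
        for (int n : new int[] {10, 30, 100, 300, 301, 1000, 1001}) {
            int cur = currentInterval(n);
            int pat = patchedInterval(n);
            System.out.printf("%4d | %10d | %12.2f | %10d | %12.2f%n",
                    n, cur, rate(n, cur), pat, rate(n, pat));
        }
    }
}
{code}

The table rounds some rates; for example, 301 nodes at 4000 ms works out to 75.25/sec, shown as 75.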
