Posted to mapreduce-issues@hadoop.apache.org by "Ahmed Hussein (JIRA)" <ji...@apache.org> on 2019/05/30 16:03:00 UTC

[jira] [Commented] (MAPREDUCE-7208) Tuning TaskRuntimeEstimator

    [ https://issues.apache.org/jira/browse/MAPREDUCE-7208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16852022#comment-16852022 ] 

Ahmed Hussein commented on MAPREDUCE-7208:
------------------------------------------

 

[~jeagles], [~tgraves], [~vinodkv], [~nroberts]

I ran into some issues using {{ExponentiallySmoothedTaskRuntimeEstimator}}. After some investigation, I implemented a new estimator that addresses several problems with the existing smoothing-factor estimator. Would you mind taking a look at the suggested fixes and implementations?

 

 *{{SimpleExponentialTaskRuntimeEstimator}} (new) vs. {{ExponentiallySmoothedTaskRuntimeEstimator}} (old)*
 # The new estimator follows basic exponential smoothing.
 # The new estimator does not return an estimate for the first few cycles. This increases the accuracy of the estimates, especially for long-running tasks.
 # The new estimator detects tasks that are slowing down. The old estimator fails to detect such scenarios.
 # The new estimator detects stalled tasks. The old estimator will not launch any speculative attempts when an attempt slows down sharply.
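
The smoothing behavior described above can be sketched roughly as follows. This is an illustrative Python sketch of the math only, not the patch code; the class name, parameter names, and thresholds are all hypothetical, and the real implementation is the Java {{SimpleExponentialTaskRuntimeEstimator}} in the patch.

```python
class SimpleExponentialSketch:
    """Basic exponential smoothing of a task's progress rate, with a
    warm-up window during which no estimate is produced (hypothetical
    sketch of the approach, not the actual Hadoop code)."""

    def __init__(self, alpha=0.3, skip_initials=2):
        self.alpha = alpha                # smoothing factor in (0, 1]
        self.skip_initials = skip_initials  # readings to ignore before estimating
        self.readings = 0
        self.smoothed_rate = None         # smoothed progress-per-millisecond

    def update(self, rate):
        """Feed one observed progress rate (progress delta / time delta)."""
        self.readings += 1
        if self.smoothed_rate is None:
            self.smoothed_rate = rate
        else:
            # s_t = alpha * x_t + (1 - alpha) * s_{t-1}
            self.smoothed_rate = (self.alpha * rate
                                  + (1.0 - self.alpha) * self.smoothed_rate)

    def estimate_remaining(self, remaining_progress):
        """Return None ("no information") during the warm-up window or
        when the smoothed rate is zero (a stalled attempt)."""
        if self.readings <= self.skip_initials or not self.smoothed_rate:
            return None
        return remaining_progress / self.smoothed_rate
```

Returning `None` during warm-up is what keeps early, noisy readings from polluting the estimate; a stalled attempt drives the smoothed rate toward zero, which the speculator can then treat as a candidate for speculation.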

*Is the default speculator affected?*
 * The speculator is still using the {{LegacyTaskRuntimeEstimator}} by default.
 * The existing implementation uses the statistics mean to get an {{estimatedNewAttemptRuntime()}}. This causes frequent speculation, because even the smallest difference between the {{estimatedRuntime}} and the mean will create a new speculativeAttempt. I changed the implementation of {{estimatedNewAttemptRuntime()}} so that it uses (mean + a small delta).
 * I created a new JUnit test, {{TestSpeculativeExecOnCluster}}, that verifies the speculator running on a {{MiniMRYarnCluster}}. The test case can also be used with the old estimators.
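
The "mean + a small delta" change can be sketched like this. A hypothetical Python illustration only; the function names and the delta value are made up, and the real {{estimatedNewAttemptRuntime()}} is a Java method in the speculator.

```python
def estimated_new_attempt_runtime(mean_runtime_ms, delta_fraction=0.05):
    """Pad the historical mean runtime by a small delta so that a
    marginal difference between the current attempt's estimate and the
    mean does not, by itself, trigger speculation (illustrative value)."""
    return mean_runtime_ms * (1.0 + delta_fraction)

def should_speculate(estimated_runtime_ms, mean_runtime_ms):
    # Speculate only when the running attempt is expected to finish
    # later than a fresh attempt (mean + delta) would.
    return estimated_runtime_ms > estimated_new_attempt_runtime(mean_runtime_ms)
```

Without the delta, any attempt estimated even a millisecond over the mean would spawn a speculative attempt; with it, only attempts meaningfully slower than average do.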

*Tuning parameters:*
 * {{job.task.estimator.simple.exponential.smooth.lambda-ms}}: the lambda value in the smoothing function of the task estimator.
 * {{job.task.estimator.simple.exponential.smooth.stagnated-ms}}: the window length after which the simple exponential smoothing considers a task attempt stagnated. This allows the speculator to detect stalled progress.
 * {{job.task.estimator.simple.exponential.smooth.skip-initials}}: the number of initial readings that the estimator ignores before giving a prediction. Simple smoothing needs several iterations before it adjusts and returns good estimates. The skip-initials parameter instructs the estimator to return "no-information" until the number of progress updates reaches that value.
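
For illustration, the parameters above could be wired into a job configuration along these lines. The values are made up, and the exact key prefix (the keys are shown here exactly as listed above) may differ in the final patch:

```xml
<!-- Illustrative configuration fragment; values are examples only and
     the full property prefix may differ in the committed patch. -->
<property>
  <name>job.task.estimator.simple.exponential.smooth.lambda-ms</name>
  <value>120000</value> <!-- smoothing lambda, in milliseconds -->
</property>
<property>
  <name>job.task.estimator.simple.exponential.smooth.stagnated-ms</name>
  <value>360000</value> <!-- no progress for this long => attempt is stagnated -->
</property>
<property>
  <name>job.task.estimator.simple.exponential.smooth.skip-initials</name>
  <value>24</value> <!-- initial readings to ignore before estimating -->
</property>
```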

 

 

> Tuning TaskRuntimeEstimator 
> ----------------------------
>
>                 Key: MAPREDUCE-7208
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-7208
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>            Reporter: Ahmed Hussein
>            Assignee: Ahmed Hussein
>            Priority: Minor
>         Attachments: MAPREDUCE-7208.001.patch, smoothing-exponential.md
>
>
> By default, MR uses the LegacyTaskRuntimeEstimator to get an estimate of the runtime. The estimator does not adjust dynamically to the progress rate of the tasks. On the other hand, the behavior of the existing alternative, "ExponentiallySmoothedTaskRuntimeEstimator", is unpredictable.
>  
> There are several dimensions to improve the exponential implementation:
>  # Exponential smoothing needs a warmup period. Otherwise, the estimate will be affected by the initial values.
>  # Using a single smoothing factor (Lambda) does not work well for all the tasks. To increase the level of smoothing across the majority of tasks, we need to give a range of flexibility to dynamically adjust the smoothing factor based on the history of the task progress.
>  # Design wise, it is better to separate between the statistical model and the MR interface. We need to have a way to evaluate estimators statistically, without the need to run MR. For example, an estimator can be evaluated as a black box by using a stream of raw data as input and testing the accuracy of the generated stream of estimates.
>  # The exponential estimator speculates frequently and fails to detect slowing tasks. As a result, a taskAttempt that makes no progress won't trigger a new speculation.
>  
> The file [^smoothing-exponential.md] describes how Simple Exponential smoothing factor works.
>  
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: mapreduce-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-help@hadoop.apache.org