Posted to mapreduce-issues@hadoop.apache.org by "Steve Loughran (JIRA)" <ji...@apache.org> on 2012/06/30 10:26:44 UTC

[jira] [Commented] (MAPREDUCE-4381) Make PROGRESS_INTERVAL of org.apache.hadoop.mapred.Task a tunable

    [ https://issues.apache.org/jira/browse/MAPREDUCE-4381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13404425#comment-13404425 ] 

Steve Loughran commented on MAPREDUCE-4381:
-------------------------------------------

I can see the value in this, though I worry that per-job tuning may lead some people to submit tasks that trigger scalability issues. Someone else will have to comment on that.

Independent of that, 
# Yes, the field should be renamed to follow the standard naming convention for a variable; since it is private, there are no compatibility problems.
# The parameter name should be extracted into a static final field of Task, along with the default value, for ease of setup from code.
# It needs to be documented somewhere.
# How do you think this should be tested?
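
To make the second and fourth points concrete, here is a rough sketch of what I mean. It is not taken from the attached patch: the property name "mapred.task.progress.interval", the class name, and the fall-back-to-default behaviour for bad values are all assumptions for illustration, and java.util.Properties stands in for JobConf so the snippet runs on its own.

```java
import java.util.Properties;

public class ProgressIntervalSketch {
    // Hypothetical property name; the real key is whatever the patch settles on.
    public static final String PROGRESS_INTERVAL_KEY = "mapred.task.progress.interval";

    // Default mirrors the currently hard-coded 3000 msec.
    public static final long DEFAULT_PROGRESS_INTERVAL_MS = 3000L;

    // Lower-case name now that the field is an ordinary private instance variable.
    private final long progressInterval;

    public ProgressIntervalSketch(Properties conf) {
        this.progressInterval = readInterval(conf);
    }

    // Reads the interval, falling back to the default when the key is
    // missing, unparseable, or not positive (an assumed policy).
    static long readInterval(Properties conf) {
        String raw = conf.getProperty(PROGRESS_INTERVAL_KEY);
        if (raw == null) {
            return DEFAULT_PROGRESS_INTERVAL_MS;
        }
        try {
            long value = Long.parseLong(raw.trim());
            return value > 0 ? value : DEFAULT_PROGRESS_INTERVAL_MS;
        } catch (NumberFormatException e) {
            return DEFAULT_PROGRESS_INTERVAL_MS;
        }
    }

    public long getProgressInterval() {
        return progressInterval;
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        // No value set: the default applies.
        System.out.println(new ProgressIntervalSketch(conf).getProgressInterval());
        // Per-job override, set from code via the shared constant.
        conf.setProperty(PROGRESS_INTERVAL_KEY, "1000");
        System.out.println(new ProgressIntervalSketch(conf).getProgressInterval());
    }
}
```

Keeping the lookup in one small method like readInterval also gives a unit test something to call directly (unset key, valid override, garbage value) without having to run a real task.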
                
> Make PROGRESS_INTERVAL of org.apache.hadoop.mapred.Task a tunable
> -----------------------------------------------------------------
>
>                 Key: MAPREDUCE-4381
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4381
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: task, tasktracker
>            Reporter: Shrinivas Joshi
>            Priority: Minor
>         Attachments: progress_interval.patch
>
>
> Currently PROGRESS_INTERVAL is a hard-coded value and is set to 3000 msec. We tried making it a tunable and experimented with different values. In some cases setting it to a smaller value like 1000 msec helps significantly improve performance of short running jobs such as piEstimator. This is because the task threads do not end up blocking for as many as 3 seconds for their last progress update event. We also noticed close to 14% improvement on Mahout KMeans iteration jobs which take more than 5 minutes on the test cluster that we are using. Please let me know if this seems to be a good idea. I have an initial patch that I have attached here. This is based on branch-1 tree. It may need some rework on MRv2 based branches I think. Also note that I have not changed the variable naming style for PROGRESS_INTERVAL even though it is not a public static final anymore. I can revise the patch if there are no objections to this idea. 
> Thanks.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira