Posted to common-dev@hadoop.apache.org by "Doug Cutting (JIRA)" <ji...@apache.org> on 2008/05/23 19:35:55 UTC

[jira] Commented: (HADOOP-3439) TaskTracker.addDiagnostics(String file, int num, String tag) could exit early if num==0

    [ https://issues.apache.org/jira/browse/HADOOP-3439?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12599442#action_12599442 ] 

Doug Cutting commented on HADOOP-3439:
--------------------------------------

> loads in a conf option (that is not in hadoop-default, incidentally) 

The rule for whether a parameter belongs in hadoop-default.xml is whether it is intended to be overridden in hadoop-site.xml.  Many parameters are only intended to be set by code; adding these to hadoop-default.xml just clutters what's primarily meant to be documentation.  Parameters meant to be set only by code should have static accessor methods on a relevant class, e.g., Foo#setFoo(Configuration c, String value).  Also, it's reasonable to leave debugging parameters that are intended only for use by developers, not by end users, out of hadoop-default.xml.
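
As a rough illustration of the static-accessor convention, here is a minimal sketch.  The class name DebugOutputConf and its method names are hypothetical, and java.util.Properties stands in for Hadoop's Configuration so that the snippet compiles on its own; only the key mapred.debug.out.lines and its default of -1 come from the issue description.

```java
import java.util.Properties;

// Hypothetical holder class; not part of Hadoop.
// Properties is a stand-in for org.apache.hadoop.conf.Configuration.
final class DebugOutputConf {
    // Key and default taken from the issue description. The key is
    // intentionally not listed in hadoop-default.xml.
    private static final String LINES_KEY = "mapred.debug.out.lines";
    private static final int DEFAULT_LINES = -1;

    private DebugOutputConf() {}

    // Code sets the parameter through this accessor, never via a site file.
    static void setLines(Properties conf, int lines) {
        conf.setProperty(LINES_KEY, Integer.toString(lines));
    }

    // Returns the configured line count, or the default when unset.
    static int getLines(Properties conf) {
        String value = conf.getProperty(LINES_KEY);
        return value == null ? DEFAULT_LINES : Integer.parseInt(value.trim());
    }
}
```

Keeping the key string private to one class means callers cannot misspell it, and the default lives in exactly one place.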

That's been the (unwritten?) policy.  Does it make sense?  If so, perhaps we should record it somewhere...

> TaskTracker.addDiagnostics(String file, int num, String tag) could exit early if num==0
> ---------------------------------------------------------------------------------------
>
>                 Key: HADOOP-3439
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3439
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: mapred
>    Affects Versions: 0.18.0
>            Reporter: Steve Loughran
>            Priority: Minor
>
> When a TaskTracker job finishes, taskFinished() is invoked.
> As part of its work it:
>  1. loads a conf option (that is not in hadoop-default, incidentally), mapred.debug.out.lines, default value -1;
>  2. calls addDiagnostics, passing in that line count.
> addDiagnostics either builds a string buffer of all the output, or creates a linear array of lines and adds them, shuffling them up if there are more lines than expected.
> This is all unneeded if the number of lines to print == 0; the entire reading in of the output file can be skipped. This may speed up termination slightly on a run with a large output file and mapred.debug.out.lines == 0.
> Note also that a circular buffer would handle the lines > 0 problem without having to copy all the strings around.
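
The two suggestions in the description above (an early exit when num == 0, and a circular buffer for num > 0) could look roughly like the following.  This is a sketch under stated assumptions, not the actual TaskTracker code: the class TailLines and the method tail are hypothetical, the input is a generic Reader rather than a file name, and treating a negative count as "keep everything" is an assumption based on the default of -1.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;

// Hypothetical helper; not the real TaskTracker.addDiagnostics.
final class TailLines {
    private TailLines() {}

    // Returns the last num lines of the input, each terminated by '\n'.
    // num == 0: returns "" without reading anything (the early exit).
    // num < 0:  assumed to mean "all lines".
    static String tail(Reader in, int num) {
        if (num == 0) {
            return "";
        }
        StringBuilder out = new StringBuilder();
        try {
            BufferedReader reader = new BufferedReader(in);
            String line;
            if (num < 0) {
                while ((line = reader.readLine()) != null) {
                    out.append(line).append('\n');
                }
                return out.toString();
            }
            // Circular buffer: each new line overwrites the oldest slot,
            // so no strings are shuffled when the buffer is full.
            String[] ring = new String[num];
            long count = 0;
            while ((line = reader.readLine()) != null) {
                ring[(int) (count % num)] = line;
                count++;
            }
            // Emit the surviving lines in their original order.
            for (long i = Math.max(0, count - num); i < count; i++) {
                out.append(ring[(int) (i % num)]).append('\n');
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toString();
    }
}
```

For example, tail(new java.io.StringReader("a\nb\nc\nd\n"), 2) keeps only the lines "c" and "d", and a count of 0 returns immediately without touching the reader.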
