Posted to mapreduce-issues@hadoop.apache.org by "Vinod K V (JIRA)" <ji...@apache.org> on 2009/10/13 10:23:31 UTC

[jira] Commented: (MAPREDUCE-1100) User's task-logs filling up local disks on the TaskTrackers

    [ https://issues.apache.org/jira/browse/MAPREDUCE-1100?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12764999#action_12764999 ] 

Vinod K V commented on MAPREDUCE-1100:
--------------------------------------

The framework currently offers two features that can be used for limiting user logs:
 - a limit on user-log size via mapreduce.task.userlog.limit.kb
 - log cleanup via mapred.task.userlog.retain.hours

mapreduce.task.userlog.limit.kb is not usable in its current form because of these limitations:
 - When it is set, the userlogs cannot be shown until the task finishes or fails. This is not acceptable.
 - The stdout/stderr files are capped by running 'tail -c' on the stdout/stderr of the task JVM. This tail process uses some of the precious memory allocated to the user, and that usage is not accounted for or controlled anywhere.
 - Tasks write the syslog files, but the JVM and its child processes can also write to those files arbitrarily, without respecting any of these limits.
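
To illustrate the first two points, here is a rough sketch in Python of what keeping only the last N bytes of a piped stream involves. It is not the TaskTracker's code; tail_bytes and the 4096-byte chunk size are made up for illustration. The reader has to hold up to N bytes in memory and only knows what the "tail" is once the stream ends.

```python
import io
from collections import deque


def tail_bytes(stream, limit):
    """Return the last `limit` bytes of `stream`, similar to `tail -c` on a pipe."""
    kept = deque(maxlen=limit)    # holds at most `limit` bytes in memory
    while True:
        chunk = stream.read(4096)
        if not chunk:             # EOF: only now is the final tail known
            break
        kept.extend(chunk)        # older bytes fall off the front
    return bytes(kept)


# Nothing can be returned before the whole stream has been consumed.
print(tail_bytes(io.BytesIO(b"0123456789"), 4))  # b'6789'
```

The buffer grows with the configured limit, and nothing is emitted until end of input, which matches the two complaints above.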

mapred.task.userlog.retain.hours cannot completely solve the issue because:
 - it only takes into account how long the logs have to be retained, *and not* their disk usage
 - because of MAPREDUCE-927, the cleanup mechanism is not guaranteed to work even in terms of time.
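
A small sketch of the first point, in Python. It is purely illustrative: expired_logs, the dictionary fields, and the use of modification time as the age are assumptions, not the actual cleanup code. A purely time-based selection never looks at size, so a recent but huge log directory stays on disk.

```python
def expired_logs(log_dirs, now, retain_hours):
    """Select log dirs older than the retention window; size plays no part."""
    cutoff = now - retain_hours * 3600
    return [d for d in log_dirs if d["mtime"] < cutoff]


# Hypothetical entries: last-modified time in seconds, size in bytes.
logs = [
    {"name": "attempt_a", "mtime": 0, "size": 10 * 1024},
    {"name": "attempt_b", "mtime": 90_000, "size": 50 * 1024**3},  # 50 GB, recent
]
now = 100_000
doomed = expired_logs(logs, now, retain_hours=24)
remaining = sum(d["size"] for d in logs if d not in doomed)

print([d["name"] for d in doomed])  # ['attempt_a']
print(remaining)                    # 53687091200 bytes still on disk
```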

We should have a concrete mechanism to limit the amount of disk space used by logs.

> User's task-logs filling up local disks on the TaskTrackers
> -----------------------------------------------------------
>
>                 Key: MAPREDUCE-1100
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1100
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: tasktracker
>    Affects Versions: 0.21.0
>            Reporter: Vinod K V
>
> Some users' jobs are filling up TT disks with outrageous logging. mapreduce.task.userlog.limit.kb is not enabled on the cluster. Disks are getting filled up before task-log cleanup via mapred.task.userlog.retain.hours can kick in.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.