Posted to common-dev@hadoop.apache.org by "dhruba borthakur (JIRA)" <ji...@apache.org> on 2008/09/03 03:05:44 UTC

[jira] Updated: (HADOOP-4018) limit memory usage in jobtracker

     [ https://issues.apache.org/jira/browse/HADOOP-4018?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

dhruba borthakur updated HADOOP-4018:
-------------------------------------

    Attachment: maxSplits4.patch

Hi Amar, thanks for your comments.

>1. If the job fails on init(), JobTracker invokes JobInProgress.kill(). So ideally you should simply throw an exception if the limit is crossed

Can you please explain which portion of the code you are referring to here?

>2. The api totalNumTasks() is not used anywhere and can be removed.
This API is used by JobInProgress.initTasks, which computes the number of tasks needed by this job.

Regarding 3 and 4, I agree with you that it would be better to check these limits in the constructor of JobInProgress. But the number of splits for the current job is not yet available when the constructor is invoked. That is why I do these checks in initTasks. Does that make sense?

Regarding point 5, my latest patch has this fix.
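For reference, the check in initTasks can be sketched roughly as below. This is a minimal standalone illustration, not the patch itself: the class and method names (TaskLimitCheck, checkTaskLimits) and the default value are hypothetical, and the real patch works against JobInProgress/JobTracker state rather than plain ints.

```java
import java.io.IOException;

// Hypothetical sketch of a per-job task-count limit, enforced in
// initTasks() rather than in the JobInProgress constructor, because
// the number of splits is only known once the splits have been read.
public class TaskLimitCheck {

    // Illustrative default for a configurable limit such as the one
    // proposed in HADOOP-4018; the actual parameter name and default
    // are defined by the patch, not here.
    static final int DEFAULT_MAX_TASKS_PER_JOB = 100000;

    /**
     * Throws if the job would create more tasks than the tracker
     * allows. A limit of zero or less disables the check.
     */
    public static void checkTaskLimits(int numMapTasks, int numReduceTasks,
                                       int maxTasksPerJob) throws IOException {
        if (maxTasksPerJob <= 0) {
            return; // limit disabled
        }
        int totalTasks = numMapTasks + numReduceTasks;
        if (totalTasks > maxTasksPerJob) {
            // Failing initTasks here causes the job to be killed
            // before any task trackers are assigned work.
            throw new IOException("The number of tasks for this job ("
                + totalTasks + ") exceeds the configured limit ("
                + maxTasksPerJob + ")");
        }
    }
}
```

The point of doing this in initTasks is that both the map count (from the splits) and the reduce count are known at that time, so a single comparison can reject the job before the JobTracker allocates per-task data structures.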


> limit memory usage in jobtracker
> --------------------------------
>
>                 Key: HADOOP-4018
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4018
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: mapred
>            Reporter: dhruba borthakur
>            Assignee: dhruba borthakur
>         Attachments: maxSplits.patch, maxSplits2.patch, maxSplits3.patch, maxSplits4.patch
>
>
> We have seen instances where a user submitted a job with many thousands of mappers. The JobTracker was running with a 3GB heap, but that was still not enough to prevent memory thrashing from garbage collection; effectively, the JobTracker was not able to serve jobs and had to be restarted.
> One simple proposal would be to limit the maximum number of tasks per job. This could be a configurable parameter. Are there other things that consume huge globs of memory in the JobTracker?

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.