You are viewing a plain text version of this content. The canonical link for it is here.
Posted to mapreduce-dev@hadoop.apache.org by "Jonathon Hare (JIRA)" <ji...@apache.org> on 2011/09/09 18:43:09 UTC

[jira] [Created] (MAPREDUCE-2973) Logic for determining whether to create a new JVM can interfere with Capacity-Scheduler and JVM reuse

Logic for determining whether to create a new JVM can interfere with Capacity-Scheduler and JVM reuse
-----------------------------------------------------------------------------------------------------

                 Key: MAPREDUCE-2973
                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2973
             Project: Hadoop Map/Reduce
          Issue Type: Bug
          Components: tasktracker
    Affects Versions: 0.20.2
         Environment: N/A
            Reporter: Jonathon Hare
            Priority: Minor


We use the capacity scheduler to enable jobs with large memory requirements to be run on our cluster. The individual tasks have a large initial overhead when they load cached data. Using the JVM reuse option ({{mapred.job.reuse.jvm.num.tasks}}) and by caching data in a static variable we can reduce the overhead. 

The current {{JvmManager}} implementation will prefer creating new JVMs to reusing existing ones if the number of already created JVMs is less than the maximum. In the extreme case where the capacity scheduler is used to limit the number of tasks on a node to 1, but the number of [map|reduce] tasks per node is set to say 16, then 16 JVMs will be created before one of them is reused. Obviously, if the amount of cached data in the memory of each JVM is large, then node can rapidly run out of memory! What should really happen in this case is that the first created JVM should be reused, and others should not be spawned.

To work-around this problem on our cluster, we have modified the logic in the {{reapJVM()}} method in {{JvmManager}} to prefer to reuse an existing JVM (idle & belonging to the same job) over starting a new JVM, or killing an existing idle JVM.



--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira