Posted to common-user@hadoop.apache.org by Shrinivas Joshi <js...@gmail.com> on 2011/03/29 22:08:49 UTC

map JVMs do not cease to exist

I noticed this on a TeraSort run: map JVM processes do not exit even long
after all map tasks have completed successfully. The resources held by
these JVM processes do not seem to be released either, which hurts
performance in the remainder of the reduce phase that continues after the
map phase is done.

Do you see this as an issue? If so, is it a known issue?

Thanks,
-Shrinivas

Re: map JVMs do not cease to exist

Posted by Shrinivas Joshi <js...@gmail.com>.
What would be the right way to query numMapTasks (as returned by
desiredMaps() in the org.apache.hadoop.mapred.JobInProgress class) from the
JvmManager class? When JVM reuse is enabled, numTasksToRun is set to -1 and
hence the ranAll() method never returns true. Would setting
numTasksToRun = ceil(numMapTasks/maxJvms) get around this issue?
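
To make the question concrete, here is a minimal, self-contained sketch of
the bookkeeping described above. It is a simplified model and not the Hadoop
source: the JvmSlot class and the tasksPerJvm helper are hypothetical names,
and the equality test in ranAll() is an assumption about how the check
behaves, based only on the observation that it never returns true when
numTasksToRun is -1.

```java
// Simplified model of the bookkeeping discussed above. This is NOT the
// Hadoop source: JvmSlot and tasksPerJvm are hypothetical, illustrative names.
public class JvmReuseSketch {

    /** Tracks how many tasks one reused JVM has run against its quota. */
    static final class JvmSlot {
        private final int numTasksToRun; // -1 models "reuse without limit"
        private int numTasksRan = 0;

        JvmSlot(int numTasksToRun) {
            this.numTasksToRun = numTasksToRun;
        }

        /** Records that one more task finished in this JVM. */
        void taskRan() {
            numTasksRan++;
        }

        /** True only when the number of finished tasks equals the quota. */
        boolean ranAll() {
            return numTasksRan == numTasksToRun;
        }
    }

    /** Integer form of ceil(numMapTasks / maxJvms), the quota proposed above. */
    static int tasksPerJvm(int numMapTasks, int maxJvms) {
        if (numMapTasks < 0 || maxJvms <= 0) {
            throw new IllegalArgumentException(
                "need numMapTasks >= 0 and maxJvms > 0");
        }
        return (numMapTasks + maxJvms - 1) / maxJvms;
    }

    public static void main(String[] args) {
        JvmSlot unlimited = new JvmSlot(-1);
        JvmSlot bounded = new JvmSlot(tasksPerJvm(10, 4)); // quota of 3
        for (int i = 0; i < 3; i++) {
            unlimited.taskRan();
            bounded.taskRan();
        }
        // A counter that only grows can never equal -1.
        System.out.println("unlimited ranAll: " + unlimited.ranAll()); // false
        System.out.println("bounded ranAll:   " + bounded.ranAll());   // true
    }
}
```

One caveat with the ceil-based quota: it assumes map tasks are spread evenly
across the JVMs. A JVM that happens to receive fewer tasks than the quota
would still never satisfy ranAll(), so an explicit "no more map tasks for
this job" signal may be needed as well.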

Thanks,
-Shrinivas

On Tue, Apr 26, 2011 at 11:56 AM, Shrinivas Joshi <js...@gmail.com> wrote:

> The JVM reuse policy seems to have an effect here. When JVM reuse is
> disabled, each map JVM exits soon after its map task finishes. When JVM
> reuse is enabled, however, there is no code that checks whether all map
> tasks assigned to a particular JVM process have finished and then kills
> that process.
>
> When JVM reuse is enabled, the JVM processes are killed only at the end,
> once the job is dead. You can see this in the tasktracker log, which shows
> messages like "Killing JVM jvm_201104131138_0001_m_182588697 since
> job job_201104131138_0001 is dead"
>
> JVM processes exit with exit code 0 when reuse is disabled, whereas they
> exit with exit code 143 when reuse is enabled.
>
> What would be the right place in the code to check whether all map tasks
> assigned to a JVM process are done executing, so that the process can then
> be killed? I have looked through JvmManager.java, Child.java and
> TaskTracker.java but was not sure about the right way of doing this.
>
> If this is an issue that can be fixed, I think it could have a sizable
> impact on performance in certain cases.
>
> Thanks,
> -Shrinivas

Re: map JVMs do not cease to exist

Posted by Shrinivas Joshi <js...@gmail.com>.
The JVM reuse policy seems to have an effect here. When JVM reuse is
disabled, each map JVM exits soon after its map task finishes. When JVM
reuse is enabled, however, there is no code that checks whether all map
tasks assigned to a particular JVM process have finished and then kills
that process.

When JVM reuse is enabled, the JVM processes are killed only at the end,
once the job is dead. You can see this in the tasktracker log, which shows
messages like "Killing JVM jvm_201104131138_0001_m_182588697 since
job job_201104131138_0001 is dead"
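
For anyone who wants to count how many map JVMs lingered until job end, the
following sketch scans log lines for that message. The regular expression is
inferred from the single example quoted above, so the exact id format
(including whether the trailing number can be negative) is an assumption,
and KillLogSketch is a hypothetical helper, not part of Hadoop.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Hadoop. The pattern is inferred from the
// one log message quoted in this thread and may not cover every variant.
public class KillLogSketch {

    private static final Pattern KILLED_AT_JOB_END = Pattern.compile(
        "Killing JVM (jvm_\\d+_\\d+_([mr])_-?\\d+) since job (job_\\d+_\\d+) is dead");

    /**
     * Returns the JVM id if the line reports a map JVM that was killed
     * because its job ended, and an empty Optional otherwise.
     */
    static Optional<String> mapJvmKilledAtJobEnd(String logLine) {
        Matcher m = KILLED_AT_JOB_END.matcher(logLine);
        if (m.find() && "m".equals(m.group(2))) {
            return Optional.of(m.group(1));
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        String line = "Killing JVM jvm_201104131138_0001_m_182588697 since "
            + "job job_201104131138_0001 is dead";
        System.out.println(mapJvmKilledAtJobEnd(line).orElse("no match"));
    }
}
```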

JVM processes exit with exit code 0 when reuse is disabled, whereas they
exit with exit code 143 when reuse is enabled.
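
The value 143 is consistent with the JVMs being terminated by a signal
rather than exiting on their own: by common Unix convention a process killed
by signal n reports exit status 128 + n, and SIGTERM is signal 15 on Linux.
The small sketch below decodes that convention. ExitCodeSketch is a
hypothetical helper, and the assumption that the tasktracker host follows
this convention is mine, not something stated in the logs.

```java
// Hypothetical helper illustrating the 128 + signal-number convention.
// Assumes a Linux-like host where SIGTERM is signal 15.
public class ExitCodeSketch {

    private static final int SIGNAL_BASE = 128;
    private static final int SIGTERM = 15;

    /** Returns the signal number encoded in an exit code, or -1 if none. */
    static int signalFromExitCode(int exitCode) {
        return exitCode > SIGNAL_BASE ? exitCode - SIGNAL_BASE : -1;
    }

    /** Gives a short human-readable reading of an exit code. */
    static String describe(int exitCode) {
        int signal = signalFromExitCode(exitCode);
        if (signal == SIGTERM) {
            return "terminated by SIGTERM";
        }
        if (signal > 0) {
            return "terminated by signal " + signal;
        }
        return exitCode == 0 ? "clean exit" : "exited with error " + exitCode;
    }

    public static void main(String[] args) {
        System.out.println("0   -> " + describe(0));   // clean exit
        System.out.println("143 -> " + describe(143)); // terminated by SIGTERM
    }
}
```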

What would be the right place in the code to check whether all map tasks
assigned to a JVM process are done executing, so that the process can then
be killed? I have looked through JvmManager.java, Child.java and
TaskTracker.java but was not sure about the right way of doing this.

If this is an issue that can be fixed, I think it could have a sizable
impact on performance in certain cases.

Thanks,
-Shrinivas
