Posted to mapreduce-issues@hadoop.apache.org by "Karthik Kambatla (JIRA)" <ji...@apache.org> on 2014/01/27 18:33:38 UTC

[jira] [Commented] (MAPREDUCE-5736) Jobtracker to hang when jobs with lot of tasks running

    [ https://issues.apache.org/jira/browse/MAPREDUCE-5736?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13882989#comment-13882989 ] 

Karthik Kambatla commented on MAPREDUCE-5736:
---------------------------------------------

bq. On a more general idea, I was wondering if the usage of the synchronized statement in the JT shouldn't be re-thought. Or maybe all this has already been addressed in YARN.
In YARN/MR2, much of the JT's functionality has been moved to the MR-AM and the JobHistoryServer. On top of that, the RM isn't as heavily synchronized as the JT.

That said, as long as it is not risky, we can see if we can have finer-grained locking for JT.getJobCounters.
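
To make the idea concrete, here is a minimal sketch of what finer-grained locking could look like. It is not the actual JobTracker code: the Tracker and Job classes, the getJobCountersCoarse/getJobCountersFine names, the long return type and the -1 sentinel are all assumptions made for illustration. The point is only that the global lock is held for the map lookup, while the potentially slow counter aggregation runs under the per-job lock.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch only; these classes are hypothetical stand-ins,
// not the real JobTracker / JobInProgress from MRv1.
class FinerGrainedLockingSketch {

    /** Hypothetical job holding one counter value per task. */
    static class Job {
        private final long[] taskCounters;

        Job(long[] taskCounters) {
            this.taskCounters = taskCounters.clone();
        }

        // Per-job lock: aggregating a job with many tasks blocks only this job.
        synchronized long getCounters() {
            long total = 0;
            for (long c : taskCounters) {
                total += c;
            }
            return total;
        }
    }

    /** Hypothetical tracker with a single global lock (its own monitor). */
    static class Tracker {
        private final Map<String, Job> jobs = new HashMap<>();

        synchronized void addJob(String id, Job job) {
            jobs.put(id, job);
        }

        // Coarse variant: the global lock is held for the whole aggregation.
        synchronized long getJobCountersCoarse(String id) {
            Job job = jobs.get(id);
            return job == null ? -1 : job.getCounters();
        }

        // Finer variant: the global lock covers only the lookup; the
        // aggregation then runs under the per-job lock alone.
        long getJobCountersFine(String id) {
            Job job;
            synchronized (this) {
                job = jobs.get(id);
            }
            return job == null ? -1 : job.getCounters();
        }
    }

    public static void main(String[] args) {
        Tracker jt = new Tracker();
        jt.addJob("job_1", new Job(new long[] {1, 2, 3}));
        System.out.println(jt.getJobCountersFine("job_1")); // prints 6
    }
}
```

Whether this is safe in the real JT depends on what else relies on the global lock being held across the call, which is the risk mentioned above.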

> Jobtracker to hang when jobs with lot of tasks running
> ------------------------------------------------------
>
>                 Key: MAPREDUCE-5736
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5736
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: jobtracker
>            Reporter: Benoit Perroud
>            Priority: Minor
>
> The jobtracker (in MRv1) progresses slowly when a job with a lot of tasks is running. The reason is that JT.getJobCounters holds a global lock, and with a big job (50K+ mappers, for instance), it can take a while to instantiate the ``Counters`` class. This global lock prevents all other activities from running normally, queuing them and degrading the normal operation of the JT.
> I was wondering whether job.getCounters(), which is synchronized at a finer granularity (i.e. per job rather than globally), could be taken out of the global synchronized block.
> On a more general idea, I was wondering if the usage of the synchronized statement in the JT shouldn't be re-thought. Or maybe all this has already been addressed in YARN.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)