Posted to mapreduce-issues@hadoop.apache.org by "Vladimir Klimontovich (JIRA)" <ji...@apache.org> on 2010/12/27 16:25:45 UTC
[jira] Created: (MAPREDUCE-2235) JobTracker "over-synchronization"
makes it hang up in certain cases
JobTracker "over-synchronization" makes it hang up in certain cases
--------------------------------------------------------------------
Key: MAPREDUCE-2235
URL: https://issues.apache.org/jira/browse/MAPREDUCE-2235
Project: Hadoop Map/Reduce
Issue Type: Bug
Components: jobtracker
Affects Versions: 0.21.0, 0.20.2, 0.20.1
Reporter: Vladimir Klimontovich
There is a general problem in the JobTracker.java code: it synchronizes on "this" everywhere, so only one method can execute at a time. When the job submission rate is low (less than one job every several seconds), the tracker works without a problem. When the job submission rate is high, the following problem occurs:
Inside submitJob(), the JT copies the job jar and xml to the local filesystem and then runs "chmod" on those files. Hadoop does chmod by spawning a child process. When the JT heap is big (several gigabytes), spawning a child process takes a long time (because Java calls fork()); in our case it takes about 1-2 seconds. So the JobTracker can't handle high-frequency job submissions.
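To make the cost concrete, here is a minimal sketch of running chmod through a child process and timing it. This is not Hadoop's actual code: the class name SpawnChmodDemo and its helpers are hypothetical, and the sketch assumes a Unix-like system with a chmod binary on the PATH. How long the spawn takes depends on the JVM, the OS, and the heap size, so no timing is claimed here.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class SpawnChmodDemo {
    // Runs "chmod <mode> <file>" in a child process and returns its exit code.
    static int chmodViaChildProcess(Path file, String mode) {
        try {
            Process p = new ProcessBuilder("chmod", mode, file.toString())
                    .inheritIO()
                    .start();
            return p.waitFor();   // blocks the caller until the child exits
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    // Creates a temporary file, chmods it via a child process, cleans up,
    // and returns the child's exit code.
    static int demo() {
        try {
            Path tmp = Files.createTempFile("job", ".xml");
            try {
                return chmodViaChildProcess(tmp, "600");
            } finally {
                Files.deleteIfExists(tmp);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        long start = System.nanoTime();
        int exit = demo();
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        System.out.println("exit=" + exit + " elapsedMs=" + elapsedMs);
    }
}
```

If a call like this runs while a lock is held, the whole spawn time is added to the time the lock is unavailable to other threads.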
In addition, because the heartbeat() method is also synchronized, the JT stops processing heartbeats while the "this" monitor is held by submitJob(). That makes the JT think that a lot of TaskTrackers are down.
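The blocking effect can be reproduced in isolation. The sketch below is an illustration only: Tracker is a hypothetical stand-in for JobTracker, and the sleep stands in for the slow file copy and chmod. Both methods lock the same monitor, so a heartbeat that arrives while a submission is in progress cannot run until the submission returns.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.CountDownLatch;

public class CoarseLockDemo {
    // Hypothetical stand-in for JobTracker: every method locks "this".
    static class Tracker {
        final List<String> events = Collections.synchronizedList(new ArrayList<>());
        final CountDownLatch submitHoldsLock = new CountDownLatch(1);

        synchronized void submitJob(long slowMillis) {
            events.add("submit-start");
            submitHoldsLock.countDown();   // signal that the monitor is now held
            try {
                Thread.sleep(slowMillis);  // stands in for the copy + chmod work
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            events.add("submit-end");
        }

        synchronized void heartbeat() {
            events.add("heartbeat");
        }
    }

    // Starts a slow submission, then sends a heartbeat while it is running.
    static List<String> run(long slowMillis) {
        Tracker jt = new Tracker();
        Thread submitter = new Thread(() -> jt.submitJob(slowMillis));
        submitter.start();
        try {
            jt.submitHoldsLock.await();    // wait until submitJob owns the monitor
            jt.heartbeat();                // blocks until submitJob has returned
            submitter.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return new ArrayList<>(jt.events);
    }

    public static void main(String[] args) {
        // prints [submit-start, submit-end, heartbeat]
        System.out.println(run(200));
    }
}
```

The heartbeat is always recorded after "submit-end", however long the submission takes.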
The following solution could help:
"chmod" is called from the submitJob() method, inside the following line:
JobInProgress job = new JobInProgress(jobId, this, this.conf);
This line could be moved out of the synchronized code:
public JobStatus submitJob(JobID jobId) throws IOException {
  synchronized (this) {
    .... the rest
  }
  // here we're leaving this line outside the synchronized code as it doesn't
  // depend on the state of JobTracker. Also this line
  JobInProgress job = new JobInProgress(jobId, this, this.conf);
  synchronized (this) {
    .... the rest
  }
}
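The effect of narrowing the lock this way can be shown with a small, self-contained sketch. Everything in it is hypothetical (Tracker, the event names, the latches); it is not the attached patch. The latch wait stands in for the slow JobInProgress construction, which now runs without holding the monitor, so a heartbeat can be served in the middle of a submission.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

public class NarrowLockDemo {
    // Hypothetical stand-in for JobTracker with two short critical sections.
    static class Tracker {
        final List<String> events = Collections.synchronizedList(new ArrayList<>());
        final CountDownLatch slowWorkStarted = new CountDownLatch(1);
        final CountDownLatch heartbeatDone = new CountDownLatch(1);

        void submitJob() throws InterruptedException {
            synchronized (this) {
                events.add("checked");     // first short critical section
            }
            // Slow work happens here WITHOUT the monitor. The latch wait
            // stands in for it and ends once a heartbeat has been served.
            slowWorkStarted.countDown();
            boolean served = heartbeatDone.await(5, TimeUnit.SECONDS);
            synchronized (this) {
                // second short critical section: publish the new job
                events.add(served ? "registered" : "registered-after-timeout");
            }
        }

        synchronized void heartbeat() {
            events.add("heartbeat");
        }
    }

    // Sends a heartbeat while the submission is in its slow, unlocked phase.
    static List<String> run() {
        Tracker jt = new Tracker();
        Thread submitter = new Thread(() -> {
            try {
                jt.submitJob();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        submitter.start();
        try {
            jt.slowWorkStarted.await();    // submission is in its slow phase
            jt.heartbeat();                // acquires the monitor immediately
            jt.heartbeatDone.countDown();
            submitter.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return new ArrayList<>(jt.events);
    }

    public static void main(String[] args) {
        // prints [checked, heartbeat, registered]
        System.out.println(run());
    }
}
```

One design point to keep in mind: other threads can change the tracker's state between the two synchronized blocks, so anything checked in the first block may need to be checked again in the second.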
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Resolved: (MAPREDUCE-2235) JobTracker "over-synchronization"
makes it hang up in certain cases
Posted by "Todd Lipcon (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/MAPREDUCE-2235?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Todd Lipcon resolved MAPREDUCE-2235.
------------------------------------
Resolution: Duplicate
Hi Vladimir. I think this was already covered by MAPREDUCE-1354 in trunk. Let me know if you disagree and we can reopen.
[jira] Commented: (MAPREDUCE-2235) JobTracker
"over-synchronization" makes it hang up in certain cases
Posted by "Vladimir Klimontovich (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/MAPREDUCE-2235?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12975269#action_12975269 ]
Vladimir Klimontovich commented on MAPREDUCE-2235:
--------------------------------------------------
I'll provide a patch for Cloudera's distribution shortly. I hope this patch will also work for the mainstream branches.
[jira] Updated: (MAPREDUCE-2235) JobTracker "over-synchronization"
makes it hang up in certain cases
Posted by "Vladimir Klimontovich (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/MAPREDUCE-2235?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Vladimir Klimontovich updated MAPREDUCE-2235:
---------------------------------------------
Attachment: MAPREDUCE-2235-patch1.txt