
Posted to mapreduce-issues@hadoop.apache.org by "dhruba borthakur (JIRA)" <ji...@apache.org> on 2009/12/30 02:28:29 UTC

[jira] Created: (MAPREDUCE-1345) JobTracker is slowed down because it forks subprocesses to do a df command

JobTracker is slowed down because it forks subprocesses to do a df command
--------------------------------------------------------------------------

                 Key: MAPREDUCE-1345
                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1345
             Project: Hadoop Map/Reduce
          Issue Type: Bug
            Reporter: dhruba borthakur
            Assignee: Scott Chen


The JobTracker periodically runs df on its local directories, forking a shell to execute the df command. Creating the child process is slow because, on every fork, the OS must duplicate the parent's address space (with copy-on-write, at least its page tables). The cost therefore grows with the JT's heap size, and the problem becomes worse when the JT is configured with a large heap.
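For illustration, the fork-based pattern described above looks roughly like the Java sketch below. It is a hypothetical example, not Hadoop's actual DF implementation: the class name DfViaFork and the `df -k -P` invocation are assumptions, and it launches df directly rather than through a shell. Either way, every call asks the JVM to create a child process, which is the step whose cost grows with heap size.

```java
import java.io.BufferedReader;
import java.io.File;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: query disk usage by forking an external df process. */
public class DfViaFork {

    /**
     * Returns {totalKB, usedKB, availableKB} for the filesystem holding dir.
     * Every call spawns a new child process; in a JVM with a large heap,
     * creating that child is the expensive step.
     */
    public static long[] df(File dir) throws IOException, InterruptedException {
        // -k: 1024-byte blocks; -P: POSIX output, one line per filesystem
        ProcessBuilder pb = new ProcessBuilder("df", "-k", "-P", dir.getAbsolutePath());
        pb.redirectErrorStream(true);
        Process p = pb.start();

        List<String> lines = new ArrayList<>();
        try (BufferedReader r = new BufferedReader(
                new InputStreamReader(p.getInputStream(), StandardCharsets.UTF_8))) {
            String line;
            while ((line = r.readLine()) != null) {
                lines.add(line);
            }
        }
        int exit = p.waitFor();
        if (exit != 0 || lines.size() < 2) {
            throw new IOException("df failed (exit " + exit + "): " + lines);
        }
        // Line 0 is the header; line 1 holds:
        // Filesystem 1024-blocks Used Available Capacity Mounted-on
        String[] f = lines.get(1).trim().split("\\s+");
        return new long[] {
            Long.parseLong(f[1]), Long.parseLong(f[2]), Long.parseLong(f[3])
        };
    }

    public static void main(String[] args) throws Exception {
        long[] kb = df(new File(args.length > 0 ? args[0] : "."));
        System.out.printf("total=%d KB used=%d KB available=%d KB%n", kb[0], kb[1], kb[2]);
    }
}
```

Timing repeated calls to df() in a JVM started with a multi-gigabyte heap is one way to observe the per-fork overhead the report complains about.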

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (MAPREDUCE-1345) JobTracker is slowed down because it forks subprocesses to do a df command

Posted by "dhruba borthakur (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/MAPREDUCE-1345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12795215#action_12795215 ] 

dhruba borthakur commented on MAPREDUCE-1345:
---------------------------------------------

This problem becomes acute when the JT is configured with more than 24GB of heap space and a new job arrives once every 5 seconds or so.

On most Unix-like systems, one can scan /proc/diskstats to determine the amount of disk space used for each of the local dirs.
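A fork-free approach can be sketched in Java, with one caveat: /proc/diskstats reports per-device I/O counters (reads, writes, sectors transferred) rather than free or used space. Space figures come from the statfs(2)/statvfs(3) family of system calls, which Java 6 and later expose through File.getTotalSpace(), File.getFreeSpace() and File.getUsableSpace(). The LocalDirSpace helper below is hypothetical and is not necessarily what HADOOP-5958 did; it only shows that the figures df prints can be obtained in-process.

```java
import java.io.File;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch: per-directory disk usage without forking a child process. */
public class LocalDirSpace {

    /** Space figures, in bytes, for the filesystem holding one directory. */
    public static final class Usage {
        public final long total;
        public final long used;
        public final long usable;

        Usage(long total, long used, long usable) {
            this.total = total;
            this.used = used;
            this.usable = usable;
        }

        @Override
        public String toString() {
            return "total=" + total + " used=" + used + " usable=" + usable;
        }
    }

    /**
     * Queries each directory in-process; on Unix the JDK answers these calls
     * with a statvfs-style system call, so no subprocess is created.
     * Directories for which the JDK reports a total of 0 (e.g. nonexistent
     * paths) are skipped.
     */
    public static Map<String, Usage> usage(String... dirs) {
        Map<String, Usage> result = new LinkedHashMap<>();
        for (String d : dirs) {
            File f = new File(d);
            long total = f.getTotalSpace();   // partition size, 0 if unknown
            if (total == 0L) {
                continue;
            }
            long free = f.getFreeSpace();     // unallocated bytes, incl. reserved blocks
            long usable = f.getUsableSpace(); // bytes this JVM can actually write
            result.put(d, new Usage(total, total - free, usable));
        }
        return result;
    }

    public static void main(String[] args) {
        String[] dirs = args.length > 0 ? args : new String[] {System.getProperty("java.io.tmpdir")};
        usage(dirs).forEach((dir, u) -> System.out.println(dir + ": " + u));
    }
}
```

Passing the JT's local directories to usage() returns one entry per existing directory; because nothing is forked, the cost of the call does not depend on the size of the JVM's heap.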

[jira] Commented: (MAPREDUCE-1345) JobTracker is slowed down because it forks subprocesses to do a df command

Posted by "Scott Chen (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/MAPREDUCE-1345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12795219#action_12795219 ] 

Scott Chen commented on MAPREDUCE-1345:
---------------------------------------

Yes, HADOOP-5958 fixed this problem.

[jira] Commented: (MAPREDUCE-1345) JobTracker is slowed down because it forks subprocesses to do a df command

Posted by "Todd Lipcon (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/MAPREDUCE-1345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12795216#action_12795216 ] 

Todd Lipcon commented on MAPREDUCE-1345:
----------------------------------------

Is this covered by HADOOP-5958?

[jira] Resolved: (MAPREDUCE-1345) JobTracker is slowed down because it forks subprocesses to do a df command

Posted by "Scott Chen (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/MAPREDUCE-1345?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Scott Chen resolved MAPREDUCE-1345.
-----------------------------------

    Resolution: Duplicate
