You are viewing a plain text version of this content. The canonical link for it is here.

Posted to mapreduce-issues@hadoop.apache.org by "Koji Noguchi (JIRA)" <ji...@apache.org> on 2010/08/14 02:10:17 UTC

[jira] Created: (MAPREDUCE-2011) Reduce number of getFileStatus call made from every task(TaskDistributedCache) setup

Reduce number of getFileStatus call made from every task(TaskDistributedCache) setup
------------------------------------------------------------------------------------

                 Key: MAPREDUCE-2011
                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2011
             Project: Hadoop Map/Reduce
          Issue Type: Improvement
          Components: distributed-cache
            Reporter: Koji Noguchi


On our cluster, we had jobs with 20 dist cache and very short-lived tasks resulting in 500 map tasks launched per second resulting in  10,000 getFileStatus calls to the namenode.  Namenode can handle this but asking to see if we can reduce this somehow.  


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (MAPREDUCE-2011) Reduce number of getFileStatus call made from every task(TaskDistributedCache) setup

Posted by "Koji Noguchi (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/MAPREDUCE-2011?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12898981#action_12898981 ] 

Koji Noguchi commented on MAPREDUCE-2011:
-----------------------------------------

MAPREDUCE-1901 has a detail proposal of how to handle distributed cache better for those loaded by jobclient (-libjars).
As part of it, it mentions 

{quote}
The TaskTracker, on being requested to run a task requiring CAR resource md5_F checks whether md5_F is localized.

    * If md5_F is already localized - then nothing more needs to be done. the localized version is used by the Task
    * If md5_F is not localized - then its fetched from the CAR repository
{quote}

This Jira is basically almost asking the same except for asking to use existing mtime instead of a new md5_F proposed.
Just to reduce the mtime/getFileStatus calls, mtime check is enough and can keep the change small.



> Reduce number of getFileStatus call made from every task(TaskDistributedCache) setup
> ------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-2011
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2011
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: distributed-cache
>            Reporter: Koji Noguchi
>
> On our cluster, we had jobs with 20 dist cache and very short-lived tasks resulting in 500 map tasks launched per second resulting in  10,000 getFileStatus calls to the namenode.  Namenode can handle this but asking to see if we can reduce this somehow.  

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (MAPREDUCE-2011) Reduce number of getFileStatus call made from every task(TaskDistributedCache) setup

Posted by "Koji Noguchi (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/MAPREDUCE-2011?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12898481#action_12898481 ] 

Koji Noguchi commented on MAPREDUCE-2011:
-----------------------------------------

When a task is initialized, it calls getFileStatus for every distributed cache file/archive entries it has (_dfsFileStamp_) and compare it with task's timestamp specified in the config (_confFileStamp_).
This makes sure that tasks fail *at start up* if distributed cache files were changed after the job was submitted and before the task started.

(It still doesn't guarantee that job would fail reliably since all the tasks could have been started before the modification.)


Now asking if we can change this logic to,
If exact localized cache exists ('lcacheStatus.mtime == confFileStamp ') on the TaskTracker, use that and do not call getFileStatus(_dfsFileStamp_). 

With this, no getFileStatus calls are made if TaskTracker already has the localized cache with the same timestamp.  This should reduce the amount of getFileStatus calls significantly when people submit jobs using the same distributed cache files.

This still makes sure that all the tasks use the same dist cache files specified at the job startup. (corectness)

But with this change, tasks that would have failed at start-up due to (_dfsFileStamp_ != _confFileStamp_) can now succeed.



> Reduce number of getFileStatus call made from every task(TaskDistributedCache) setup
> ------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-2011
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2011
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: distributed-cache
>            Reporter: Koji Noguchi
>
> On our cluster, we had jobs with 20 dist cache and very short-lived tasks resulting in 500 map tasks launched per second resulting in  10,000 getFileStatus calls to the namenode.  Namenode can handle this but asking to see if we can reduce this somehow.  

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.