Posted to yarn-issues@hadoop.apache.org by "Omkar Vinit Joshi (JIRA)" <ji...@apache.org> on 2013/11/12 01:55:19 UTC

[jira] [Commented] (YARN-1338) Recover localized resource cache state upon nodemanager restart

    [ https://issues.apache.org/jira/browse/YARN-1338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13819678#comment-13819678 ] 

Omkar Vinit Joshi commented on YARN-1338:
-----------------------------------------

Here are some things we may want to track as part of this:
* Info from LocalizedResource
** Local disk path
** Timestamp
** Remote URL (do we need to trust that the old and new URLs are identical, i.e. unchanged?)
** We store resources inside the distributed cache in a hierarchical manner (to avoid the Unix directory limit), so we may need to recover that layout too.
** Checksum?
* We will also need to track the containers that are using each resource. It would be better to isolate this from the place where we store LocalizedResource, so that changes to LocalizedResource stay minimal.
** Do we need to store the symlink we are creating?
Is anyone working on this actively?
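
To make the list above concrete, here is a rough Java sketch of the state that might need to be persisted. Everything in it is hypothetical: RecoverySketch, ResourceRecord and UsageTracker are illustration-only names, not existing YARN classes, and the URL, timestamp and paths are made-up example values. It only shows the per-resource fields and the separately stored container references; it does not show how they would be written to or read from disk.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: none of these types exist in YARN under these names.
public class RecoverySketch {

    /** Per-resource state that would have to survive a nodemanager restart. */
    static final class ResourceRecord {
        final String remoteUrl;  // where the resource was downloaded from
        final long timestamp;    // remote timestamp recorded at download time
        final String localPath;  // full local path, including hierarchical subdirectories
        final String checksum;   // optional; null when none was recorded

        ResourceRecord(String remoteUrl, long timestamp, String localPath, String checksum) {
            this.remoteUrl = remoteUrl;
            this.timestamp = timestamp;
            this.localPath = localPath;
            this.checksum = checksum;
        }

        /** Treat a recovered record as reusable only if URL and timestamp are unchanged. */
        boolean matches(String requestedUrl, long requestedTimestamp) {
            return remoteUrl.equals(requestedUrl) && timestamp == requestedTimestamp;
        }
    }

    /** Container references, kept apart from ResourceRecord so each can change independently. */
    static final class UsageTracker {
        private final Map<String, Set<String>> containersByLocalPath = new HashMap<>();

        void addReference(String localPath, String containerId) {
            containersByLocalPath.computeIfAbsent(localPath, k -> new HashSet<>()).add(containerId);
        }

        void removeReference(String localPath, String containerId) {
            Set<String> users = containersByLocalPath.get(localPath);
            if (users != null) {
                users.remove(containerId);
                if (users.isEmpty()) {
                    containersByLocalPath.remove(localPath);
                }
            }
        }

        int referenceCount(String localPath) {
            Set<String> users = containersByLocalPath.get(localPath);
            return users == null ? 0 : users.size();
        }
    }

    public static void main(String[] args) {
        // Example values only.
        ResourceRecord record = new ResourceRecord(
                "hdfs://nn/apps/job.jar", 1384217719000L, "/local/filecache/10/job.jar", null);
        UsageTracker tracker = new UsageTracker();
        tracker.addReference(record.localPath, "container_01");

        System.out.println("reusable: " + record.matches("hdfs://nn/apps/job.jar", 1384217719000L));
        System.out.println("references: " + tracker.referenceCount(record.localPath));
    }
}
```

If the symlink does need to be stored, it would presumably belong with the per-container reference rather than with the resource record, since it is created per container; that is an assumption, not something settled here.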

> Recover localized resource cache state upon nodemanager restart
> ---------------------------------------------------------------
>
>                 Key: YARN-1338
>                 URL: https://issues.apache.org/jira/browse/YARN-1338
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: nodemanager
>    Affects Versions: 2.3.0
>            Reporter: Jason Lowe
>            Assignee: Ravi Prakash
>
> Today, when the node manager restarts, we clean up all the distributed cache files from disk. This is not ideal for two reasons:
> * For a work-preserving restart we definitely want to keep them, since running containers are using them.
> * Even for a non-work-preserving restart keeping them is useful, because we don't have to download them again if future tasks need them.



--
This message was sent by Atlassian JIRA
(v6.1#6144)