Posted to mapreduce-issues@hadoop.apache.org by "Hemanth Yamijala (JIRA)" <ji...@apache.org> on 2009/12/14 11:53:18 UTC

[jira] Commented: (MAPREDUCE-1186) While localizing a DistributedCache file, TT sets permissions recursively on the whole base-dir

    [ https://issues.apache.org/jira/browse/MAPREDUCE-1186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12790095#action_12790095 ] 

Hemanth Yamijala commented on MAPREDUCE-1186:
---------------------------------------------

Amarsri, Vinod and I discussed the trunk patch a bit. The current implementation attempts to work as follows:
- Before task launch, the task controller is launched to secure the localized cache files. Previously, all files under $mapred-local-dir/$user/taskTracker/archive were secured; fixing that is the purpose of this JIRA.
- The patch lists the directories under $mapred-local-dir/$user/taskTracker/archive (which, after MAPREDUCE-1098, are the random id directories that were localized).
- For each directory that is not already secured, it secures the path recursively.
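The list-and-secure steps above can be sketched roughly as follows. This is only an illustration, not the code in the patch: the class name, the in-memory `secured` set, and the empty `secureRecursively` placeholder are all assumptions, and the real work is done by the task controller rather than by Java code like this.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class ArchiveSecuringSketch {
    // Hypothetical bookkeeping of directories that were already secured.
    private final Set<Path> secured = new HashSet<>();

    /** Secures every not-yet-secured directory directly under archiveDir. */
    List<Path> secureAll(Path archiveDir) throws IOException {
        List<Path> newlySecured = new ArrayList<>();
        try (DirectoryStream<Path> dirs =
                 Files.newDirectoryStream(archiveDir, p -> Files.isDirectory(p))) {
            for (Path dir : dirs) {
                if (secured.add(dir)) {      // false if the directory was secured earlier
                    secureRecursively(dir);
                    newlySecured.add(dir);
                }
            }
        }
        return newlySecured;
    }

    /** Placeholder: a real task controller would fix ownership and permissions here. */
    void secureRecursively(Path dir) throws IOException {
        // Intentionally empty in this sketch.
    }
}
```

Note that the sketch secures whatever directories the listing returns, regardless of which task created them.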

This approach has a race condition that we identified:
- Say a task has localized a file and has launched the task controller to secure the path, and the task controller is still running.
- In parallel, say another task is localizing another file into a different random id directory.
- While listing directories, the task controller could pick up the random id directory created by the second task and set permissions on it. However, this directory does not yet contain fully localized files, so it would be secured while only partially localized.
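The interleaving described above can be replayed deterministically in a small in-memory model. This is a sketch under assumptions: the directory names `id-A` and `id-B`, the file names, and the idea that a later run skips a directory already marked as secured are illustrative choices, not behaviour confirmed from the patch.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class SecuringRaceSketch {
    // Random id directory name -> files localized into it so far.
    final Map<String, List<String>> archive = new HashMap<>();
    final Set<String> securedDirs = new HashSet<>();
    final Set<String> securedFiles = new HashSet<>();

    void localize(String dir, String file) {
        archive.computeIfAbsent(dir, d -> new ArrayList<>()).add(file);
    }

    /** Models one task controller run: secure every listed directory not yet secured. */
    void secureListedDirs() {
        for (Map.Entry<String, List<String>> e : archive.entrySet()) {
            if (securedDirs.add(e.getKey())) {
                for (String f : e.getValue()) {
                    securedFiles.add(e.getKey() + "/" + f);
                }
            }
        }
    }

    /** Returns localized files that no run ever secured. */
    List<String> unsecuredFiles() {
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, List<String>> e : archive.entrySet()) {
            for (String f : e.getValue()) {
                String path = e.getKey() + "/" + f;
                if (!securedFiles.contains(path)) {
                    result.add(path);
                }
            }
        }
        return result;
    }

    /** Replays the problematic ordering of two tasks. */
    static SecuringRaceSketch replayRace() {
        SecuringRaceSketch s = new SecuringRaceSketch();
        s.localize("id-A", "a.jar");   // task 1 finishes localizing
        s.localize("id-B", "part1");   // task 2 is still localizing
        s.secureListedDirs();          // task 1's run also sees id-B
        s.localize("id-B", "part2");   // task 2 finishes afterwards
        s.secureListedDirs();          // id-B is skipped as already secured
        return s;
    }
}
```

In this model, `replayRace().unsecuredFiles()` contains only `id-B/part2`, the file that arrived after its directory had been marked as secured.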

The key problem here is that this approach does not know which files a task localized as part of the distributed cache. One way to fix that would be to pass the paths to the task controller, as a list of the random id directories under $mapred-local-dir/$user/taskTracker/archive that were localized in this task. This is what I suggested in the proposal above. However, there are a few problems with this proposal as well:

- How do we get the list of these paths? The distributed cache currently exposes no way to obtain them.
- This could be a huge list if several tens of files are being localized in a task. How would we transfer all this information to the task controller? A huge command line of paths could be unmanageable, hit command line length limits, etc. Other approaches (like transferring the information through a file) would also be cumbersome.
- It could result in duplicate work. If two tasks running in parallel share a file, both of them would get the same random id directory to secure, and both would try to secure the path.

To solve these problems, I am proposing the following:
- Change the directory structure for localized cache files to: $mapred-local-dir/$user/taskTracker/archive/$task-id, where task-id is the task attempt on behalf of which localization is happening. Note that a task could use files that have already been localized for another task-id. Since a cache entry stores the full path for a cache key, it can retrieve this information.
- Move the securing of the cache file path into the same code path where localization of the cache files happens.
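A minimal sketch of the two points above, assuming a hypothetical `PerTaskCacheSketch` class: the per-task path layout follows the proposal, while the method names, the `synchronized` guard, and the `securedPaths` list that stands in for the actual securing step are illustrative assumptions. The file copy itself is omitted.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class PerTaskCacheSketch {
    private final Path localDir;
    private final String user;
    // Cache key -> full localized path, so later tasks can find files
    // that live under an earlier task-id directory.
    private final Map<String, Path> entries = new HashMap<>();
    // Stand-in for the real securing step (hypothetical).
    final List<Path> securedPaths = new ArrayList<>();

    PerTaskCacheSketch(String localDir, String user) {
        this.localDir = Paths.get(localDir);
        this.user = user;
    }

    /** Builds $mapred-local-dir/$user/taskTracker/archive/$task-id. */
    static Path taskArchiveDir(Path localDir, String user, String taskId) {
        return localDir.resolve(user).resolve("taskTracker")
                       .resolve("archive").resolve(taskId);
    }

    /**
     * Localizes and secures in one code path. A caller only ever receives
     * a path after the securing step for it has run.
     */
    synchronized Path getLocalCache(String cacheKey, String fileName, String taskId) {
        Path existing = entries.get(cacheKey);
        if (existing != null) {
            return existing;               // localized and secured for an earlier task-id
        }
        Path target = taskArchiveDir(localDir, user, taskId).resolve(fileName);
        // The actual copy into 'target' would happen here (omitted).
        securedPaths.add(target);          // secure before publishing the entry
        entries.put(cacheKey, target);
        return target;
    }
}
```

With this shape, a second task attempt asking for the same cache key gets back the path under the first attempt's directory, and no second securing pass is triggered.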

The last point is important in this new proposal, because without it a task could use files that were localized under a prior task-id but are not yet secured. If we don't wait for that securing to finish, we would have incompletely secured cache files in use.

One drawback I can think of with this approach is that the new task-id directory in the path might give the wrong impression that the files localized under it are all the distributed cache files used by the task. But clearly, files localized under other task-ids could be used as well.

Comments on this proposal?

> While localizing a DistributedCache file, TT sets permissions recursively on the whole base-dir
> -----------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-1186
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1186
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: tasktracker
>    Affects Versions: 0.21.0
>            Reporter: Vinod K V
>            Assignee: Amareshwari Sriramadasu
>             Fix For: 0.21.0
>
>         Attachments: patch-1186-1.txt, patch-1186-3-ydist.txt, patch-1186-3-ydist.txt, patch-1186-ydist.txt, patch-1186-ydist.txt, patch-1186.txt
>
>
> This is a performance problem.
