Posted to mapreduce-issues@hadoop.apache.org by "Devaraj Das (JIRA)" <ji...@apache.org> on 2010/07/27 04:12:19 UTC

[jira] Updated: (MAPREDUCE-1288) DistributedCache localizes only once per cache URI

     [ https://issues.apache.org/jira/browse/MAPREDUCE-1288?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Devaraj Das updated MAPREDUCE-1288:
-----------------------------------

    Attachment: MR-1288-bp20-1.patch

This bug surfaced on one of the secure Yahoo clusters. This is the scenario:
1. There is a file "/a/b/c/1.txt" on HDFS that is private (one of the directories in the path leading up to the file does not have EXECUTE permission for OTHERS).
2. A user "foo" uses this file in his job as a DistributedCache file, and the TT localizes it in a location owned by user "foo" (since the file is private, it ends up in a protected location).
3. A second user "bar" also tries to use the same file in his job. Both users belong to the same unix group.
4. Assume a TT that previously localized "/a/b/c/1.txt" while running foo's task is later assigned a task of bar's job. It concludes the file is already localized because the mapping has an entry for /a/b/c/1.txt (the mapping here is the one between cache URIs and CacheStatus objects that the TT maintains).
5. The TT doesn't localize the file again. Instead, it points the tasks to the copy localized in step 2. Since the directory holding that copy is not readable by anyone other than "foo", the tasks of "bar"'s job fail.
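Steps 4 and 5 can be reproduced with a toy model of the TT's cache map. This is a minimal sketch, not Hadoop code: the class UriKeyedCache, the method getLocalPath, and the "/local/taskTracker/<user>/distcache" layout are all hypothetical. The point is only that a map keyed by the cache URI alone hands bar the path that was created for foo.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Toy model of the pre-fix behaviour: the cache map is keyed only by the
 * cache URI. All names here are hypothetical, not Hadoop's actual API.
 */
class UriKeyedCache {
    // cache URI -> local path where the file was localized
    private final Map<String, String> localized = new HashMap<>();

    /** Returns the local path for a cache file, localizing it on first use. */
    String getLocalPath(String cacheUri, String user) {
        // Bug being modelled: the lookup ignores the requesting user.
        return localized.computeIfAbsent(cacheUri,
            uri -> localizeIntoPrivateDir(uri, user));
    }

    // Hypothetical layout: private files land under a per-user directory
    // that only that user can read.
    private static String localizeIntoPrivateDir(String uri, String user) {
        return "/local/taskTracker/" + user + "/distcache" + uri;
    }

    public static void main(String[] args) {
        UriKeyedCache cache = new UriKeyedCache();
        String forFoo = cache.getLocalPath("/a/b/c/1.txt", "foo");
        String forBar = cache.getLocalPath("/a/b/c/1.txt", "bar");
        System.out.println(forFoo);
        System.out.println(forBar); // same path, under foo's directory
    }
}
```

Both calls print /local/taskTracker/foo/distcache/a/b/c/1.txt, which is the path bar's tasks cannot read.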

I guess this issue didn't arise earlier (pre-security) because distributed cache files, even private ones, were localized in directories readable by all users.
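With security, where a file is localized depends on whether it counts as private, which step 1 defines in terms of EXECUTE permission for OTHERS on the directories leading up to the file. A minimal sketch of that test follows. It uses java.nio's PosixFilePermission as a stand-in for HDFS permissions (an assumption; this is not Hadoop's FsPermission API), and the extra requirement that the file itself be readable by OTHERS is also an assumption beyond what this issue states.

```java
import java.nio.file.attribute.PosixFilePermission;
import java.util.EnumSet;
import java.util.List;
import java.util.Set;

/**
 * Sketch of the public/private decision from step 1. Each path component is
 * modelled by its permission set; the class name is hypothetical.
 */
final class CacheVisibility {
    private CacheVisibility() {}

    /**
     * @param ancestors permissions of each directory from the root down to the file's parent
     * @param file      permissions of the file itself
     * @return true if any user can reach and read the file
     */
    static boolean isPublic(List<Set<PosixFilePermission>> ancestors,
                            Set<PosixFilePermission> file) {
        // Assumption: the file itself must also be readable by OTHERS.
        if (!file.contains(PosixFilePermission.OTHERS_READ)) {
            return false;
        }
        // From the issue: one ancestor without EXECUTE for OTHERS makes the file private.
        for (Set<PosixFilePermission> dir : ancestors) {
            if (!dir.contains(PosixFilePermission.OTHERS_EXECUTE)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        Set<PosixFilePermission> open = EnumSet.of(
            PosixFilePermission.OWNER_READ, PosixFilePermission.OWNER_EXECUTE,
            PosixFilePermission.OTHERS_READ, PosixFilePermission.OTHERS_EXECUTE);
        Set<PosixFilePermission> ownerOnly = EnumSet.of(
            PosixFilePermission.OWNER_READ, PosixFilePermission.OWNER_WRITE,
            PosixFilePermission.OWNER_EXECUTE);
        Set<PosixFilePermission> worldReadableFile = EnumSet.of(
            PosixFilePermission.OWNER_READ, PosixFilePermission.OTHERS_READ);

        // "/", "/a", "/a/b", "/a/b/c", with "/a/b" locked down to its owner.
        List<Set<PosixFilePermission>> path = List.of(open, open, ownerOnly, open);
        System.out.println(isPublic(path, worldReadableFile)); // false: private
    }
}
```

Before security, the outcome of this check did not change where the file was placed, which is why the URI-only key was harmless.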

Attaching a patch for Y20S that addresses the issue.

> DistributedCache localizes only once per cache URI
> --------------------------------------------------
>
>                 Key: MAPREDUCE-1288
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1288
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: distributed-cache, security, tasktracker
>    Affects Versions: 0.21.0
>            Reporter: Devaraj Das
>            Priority: Critical
>         Attachments: MR-1288-bp20-1.patch
>
>
> As part of file localization, the distributed cache localizer creates a copy of the file in the corresponding user's private directory. DistributedCache uses the URI of the cache file as the localization key, and if that key already exists in the map, the file is not localized again. This means that another user cannot access the same distributed cache file. We should change the key to include the username so that localization is done for every user.
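The proposed change, including the username in the key, can be sketched by replacing the URI-only key with a (user, URI) pair. Again this is a hypothetical model rather than the attached patch: UserKeyedCache, CacheKey, and the local path layout are illustrative names, and a Java 16+ record is used for the composite key.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Sketch of the proposed fix: the cache key includes the user, so each user
 * gets their own localized copy. Names are hypothetical.
 */
final class UserKeyedCache {
    /** Composite key: (user, cache URI). Records supply equals/hashCode. */
    private record CacheKey(String user, String cacheUri) {}

    private final Map<CacheKey, String> localized = new HashMap<>();
    private int localizations = 0; // number of copies actually made

    String getLocalPath(String cacheUri, String user) {
        return localized.computeIfAbsent(new CacheKey(user, cacheUri), key -> {
            localizations++;
            return "/local/taskTracker/" + key.user() + "/distcache" + key.cacheUri();
        });
    }

    int localizationCount() {
        return localizations;
    }

    public static void main(String[] args) {
        UserKeyedCache cache = new UserKeyedCache();
        System.out.println(cache.getLocalPath("/a/b/c/1.txt", "foo"));
        System.out.println(cache.getLocalPath("/a/b/c/1.txt", "bar"));
        System.out.println(cache.getLocalPath("/a/b/c/1.txt", "foo")); // reused, not re-localized
        System.out.println(cache.localizationCount()); // 2
    }
}
```

Keying every entry by user also means a file that all users could read gets one copy per user; whether such files should keep a shared entry is a separate design choice that this sketch does not make.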

-- 
This message is automatically generated by JIRA.