Posted to common-dev@hadoop.apache.org by "Hemanth Yamijala (JIRA)" <ji...@apache.org> on 2009/04/03 13:28:13 UTC

[jira] Commented: (HADOOP-4490) Map and Reduce tasks should run as the user who submitted the job

    [ https://issues.apache.org/jira/browse/HADOOP-4490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12695339#action_12695339 ] 

Hemanth Yamijala commented on HADOOP-4490:
------------------------------------------

Some comments:

- Use getLocalJobDir in LinuxTaskController.localizeJob
- Whenever mkdir or mkdirs fails, we should continue to the next iteration of the loop instead of proceeding (see the first sketch after this list).
- Changes in TaskRunner seem unnecessary.
- Changes in DistributedCache to pass the baseDir seem unnecessary. Note that localizeCache already takes a CacheStatus object that has the baseDir.
- This comment has not been incorporated: "Modify TaskController.launchTaskJVM to set up permissions for the log dir and the pid dir associated with that task. This will remove the call to initializeTask from the JvmManager.runChild API." I think this can be done by calling setup*FileAccess from launchTaskJVM.
- localizeJob could be renamed to initializeJob.
- In setupTaskCacheFileAccess, we are setting permissions recursively from the job directory. But that is already done in LinuxTaskController.localizeJob, so here we should set permissions starting from the taskCacheDirectory only.
- writeCommand should ideally check for the existence of the file before it tries to change permissions in the finally clause (see the second sketch after this list).
- JvmManagerForType.getTaskForJvm() - the comment "Incase of JVM reuse, tasks returned previously launched" has a grammatical mistake and should be reworded.
- In the kill path, I think it would be nice to add an info-level log message when we are not doing the kill - both in the JVM manager and in LinuxTaskController.
- TaskLog.getLogDir() - Make this getUserLogDir(), and the javadoc need not mention TaskControllers. It should be generic documentation stating that the method returns the base location of the user logs.
- mapred-default.xml should have the config variable for the task controller along with documentation (see the third sketch after this list).
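
For the mkdir/mkdirs point above, here is a minimal sketch of the intended control flow. The class, method and directory names and the log message are hypothetical, used only for illustration; they are not taken from the patch.

    import java.io.File;
    import org.apache.commons.logging.Log;
    import org.apache.commons.logging.LogFactory;

    // Hypothetical helper for illustration; names are not from the patch.
    class JobDirLocalizer {
      private static final Log LOG = LogFactory.getLog(JobDirLocalizer.class);

      void createJobDirs(String[] localDirs, String jobDirName) {
        for (String localDir : localDirs) {
          File jobDir = new File(localDir, jobDirName);
          if (!jobDir.mkdirs() && !jobDir.isDirectory()) {
            // mkdirs failed: log and continue with the next local directory
            // rather than falling through and using a missing directory.
            LOG.warn("Unable to create job directory " + jobDir);
            continue;
          }
          // ... localize job files under jobDir ...
        }
      }
    }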
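
For the writeCommand point, this is a sketch of the shape suggested for the finally clause, assuming a command file is written and then made executable; the method signature, file handling and permission call are illustrative, not the patch's actual code.

    import java.io.File;
    import java.io.FileWriter;
    import java.io.IOException;

    // Illustrative only; writeCommand in the patch may differ in details.
    class CommandFileWriter {
      void writeCommand(String cmd, File commandFile) throws IOException {
        FileWriter out = null;
        try {
          out = new FileWriter(commandFile);
          out.write(cmd);
        } finally {
          if (out != null) {
            out.close();
          }
          // Only change permissions if the file was actually created; an
          // earlier failure (e.g. in the FileWriter constructor) may leave
          // no file behind.
          if (commandFile.exists()) {
            commandFile.setExecutable(true, false);
          }
        }
      }
    }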
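
For the config variable, one way the TaskTracker could resolve the controller class from it is sketched below. The property name "mapred.task.tracker.task-controller" and the DefaultTaskController fallback are assumptions made here for illustration; whatever names the patch actually uses are what should be documented in mapred-default.xml.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.util.ReflectionUtils;

    // Sketch only: the property name and the DefaultTaskController fallback
    // are assumptions, not confirmed against the patch. Assumes this helper
    // sits in the same package as the TaskController classes from the patch.
    class TaskControllerLoader {
      static TaskController load(Configuration conf) {
        Class<? extends TaskController> clazz =
            conf.getClass("mapred.task.tracker.task-controller",
                          DefaultTaskController.class, TaskController.class);
        return ReflectionUtils.newInstance(clazz, conf);
      }
    }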

Comments on documentation:

- I think we should first describe the use case the TaskControllers are trying to solve - namely, the requirement to run tasks as the job owner.
- It would be nice to give a little description of how the LinuxTaskController works - just something like: a setuid executable is used, and the TaskTracker invokes this executable to launch and kill tasks.
- We should definitely mention that until other JIRAs like HADOOP-4491 etc. are fixed, we open up permissions on the intermediate, localized and log files in the LinuxTaskController case.
- Making the executable setuid is a deployment step. It is currently added as a build step.
- The path to the taskcontroller cfg - mention that this should be the path on the cluster nodes where the taskcontroller.cfg file will be deployed.
- We should also mention that the LinuxTaskController is currently supported only on Linux (though it sounds obvious).
- Should we mention the permissions required on mapred.local.dir and hadoop.log.dir (they should be 777, and the paths leading up to them 755)?

> Map and Reduce tasks should run as the user who submitted the job
> -----------------------------------------------------------------
>
>                 Key: HADOOP-4490
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4490
>             Project: Hadoop Core
>          Issue Type: Sub-task
>          Components: mapred, security
>            Reporter: Arun C Murthy
>            Assignee: Hemanth Yamijala
>             Fix For: 0.21.0
>
>         Attachments: cluster_setup.pdf, HADOOP-4490-1.patch, HADOOP-4490-1.patch, HADOOP-4490-2.patch, HADOOP-4490-3.patch, hadoop-4490-4.patch, hadoop-4490-5.patch, hadoop-4490-6.patch, hadoop-4490-design.pdf, HADOOP-4490.patch, HADOOP-4490.patch, HADOOP-4490.patch, HADOOP-4490.patch, HADOOP-4490.patch, HADOOP-4490.patch, HADOOP-4490.patch, HADOOP-4490_streaming.patch
>
>
> Currently the TaskTracker spawns the map/reduce tasks, resulting in them running as the user who started the TaskTracker.
> For security and accounting purposes the tasks should be run as the job-owner.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.