Posted to common-dev@hadoop.apache.org by "Amar Kamat (JIRA)" <ji...@apache.org> on 2009/04/07 05:35:13 UTC
[jira] Commented: (HADOOP-3578) mapred.system.dir should be accessible only to hadoop daemons

    [ https://issues.apache.org/jira/browse/HADOOP-3578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12696382#action_12696382 ] 

Amar Kamat commented on HADOOP-3578:
------------------------------------

Here is the proposal :

_Terms :_
# mapred.system.dir : the common location where users (via the jobclient) upload job files (the job split and the job jar). This dir will have rwx-w--w- permissions.
# mapred.system.dir/jobtracker : the jobtracker's private scratch space, with rwx------ permissions. This is where the jobtracker moves a job's files once the submission succeeds (upload + validation).
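For reference, the symbolic permission strings above correspond to octal modes: rwx-w--w- is 0722, rwx------ is 0700, and the current rwx-wx-wx is 0733. The sketch below derives these using only the JDK's java.nio.file.attribute.PosixFilePermissions; the class name SystemDirModes is hypothetical, and the code does not model how HDFS itself checks these bits.

```java
import java.nio.file.attribute.PosixFilePermission;
import java.nio.file.attribute.PosixFilePermissions;
import java.util.Set;

public class SystemDirModes {
    // Converts a 9-character symbolic mode such as "rwx-w--w-" to its numeric (octal) form.
    static int toOctal(String symbolic) {
        Set<PosixFilePermission> perms = PosixFilePermissions.fromString(symbolic);
        int mode = 0;
        for (PosixFilePermission p : perms) {
            // The enum is declared OWNER_READ ... OTHERS_EXECUTE, i.e. from bit 8 down to bit 0.
            mode |= 1 << (8 - p.ordinal());
        }
        return mode;
    }

    public static void main(String[] args) {
        String[][] dirs = {
            {"mapred.system.dir (proposed)", "rwx-w--w-"},
            {"mapred.system.dir/jobtracker", "rwx------"},
            {"mapred.system.dir (current)", "rwx-wx-wx"},
        };
        for (String[] d : dirs) {
            System.out.printf("%-30s %s = 0%o%n", d[0], d[1], toOctal(d[1]));
        }
    }
}
```

The key difference is in the group and other bits: the proposed mode keeps write for everyone (so users can still upload) but drops the read and execute bits that rwx-wx-wx grants them.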

The process of job submission is as follows:
# The jobclient/user asks the jobtracker for a new jobid.
# The jobclient generates a new x-digit random number and uploads the job files (split and jar) to mapred.system.dir/jobid-random-number.
# The jobclient/user passes this information and the jobconf to the jobtracker via RPC (the submitJob API).
# The jobtracker receives the conf via the RPC and performs the ACL checks; only then is the job *accepted* (moved to mapred.system.dir/jobtracker).
# The jobtracker serializes job.xml to mapred.system.dir/jobtracker/jobid (updating the locations of the split and jar files in the conf) and moves job.jar and job.split to mapred.system.dir/jobtracker/jobid. This is important because tasktrackers rely on the conf for the locations of job.jar and job.split.
# Upon restart, all jobs present in mapred.system.dir/jobtracker/ will be loaded blindly (without re-validation), and jobs left in mapred.system.dir/ will be queued for cleanup.
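The submission flow above can be sketched as a single-process simulation on the local filesystem. This is a minimal illustration, not JobTracker code: the class JobSubmissionSketch, the job-id format, the allowed-user set standing in for the ACL check, and the conf keys sketch.job.jar / sketch.job.split are all hypothetical, job.xml is written as a Java properties XML file rather than Hadoop's configuration format, and the directory permissions are only noted in comments, not enforced.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.SecureRandom;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Properties;
import java.util.Set;

public class JobSubmissionSketch {
    // Hypothetical conf keys standing in for wherever the real conf records the file locations.
    static final String JAR_KEY = "sketch.job.jar";
    static final String SPLIT_KEY = "sketch.job.split";

    final Path systemDir;            // mapred.system.dir (rwx-w--w- in the proposal; not enforced here)
    final Path jtDir;                // mapred.system.dir/jobtracker (rwx------; not enforced here)
    final Set<String> allowedUsers;  // hypothetical stand-in for the real ACL check
    private final SecureRandom random = new SecureRandom();
    private int nextId = 1;

    JobSubmissionSketch(Path systemDir, Set<String> allowedUsers) throws IOException {
        this.systemDir = systemDir;
        this.jtDir = Files.createDirectories(systemDir.resolve("jobtracker"));
        this.allowedUsers = allowedUsers;
    }

    // Step 1: the jobtracker hands out a new job id (format is illustrative only).
    synchronized String newJobId() {
        return String.format("job_%04d", nextId++);
    }

    // Step 2: the client uploads job.jar and job.split to mapred.system.dir/<jobid>-<x random digits>.
    Path upload(String jobId, byte[] jar, byte[] split, int digits) throws IOException {
        StringBuilder suffix = new StringBuilder();
        for (int i = 0; i < digits; i++) {
            suffix.append(random.nextInt(10));
        }
        // createDirectory fails if the name already exists, so a collision is never silently reused.
        Path staging = Files.createDirectory(systemDir.resolve(jobId + "-" + suffix));
        Files.write(staging.resolve("job.jar"), jar);
        Files.write(staging.resolve("job.split"), split);
        return staging;
    }

    // Steps 3-5: submitJob receives the conf (as if over RPC), checks ACLs, then accepts the job.
    boolean submitJob(String jobId, Path staging, Properties conf, String user) throws IOException {
        if (!allowedUsers.contains(user)) {
            return false; // rejected: files stay in mapred.system.dir until cleanup
        }
        Path accepted = Files.createDirectory(jtDir.resolve(jobId));
        Path jar = Files.move(staging.resolve("job.jar"), accepted.resolve("job.jar"));
        Path split = Files.move(staging.resolve("job.split"), accepted.resolve("job.split"));
        Files.delete(staging); // now empty
        // Point the conf at the new locations, since tasktrackers read them from job.xml.
        conf.setProperty(JAR_KEY, jar.toString());
        conf.setProperty(SPLIT_KEY, split.toString());
        try (OutputStream out = Files.newOutputStream(accepted.resolve("job.xml"))) {
            conf.storeToXML(out, "conf for " + jobId); // Java properties XML, not Hadoop's format
        }
        return true;
    }

    // Step 6 (restart): jobs under jobtracker/ are reloaded as-is.
    List<String> acceptedJobs() throws IOException {
        return listNames(jtDir, null);
    }

    // Step 6 (restart): everything else left in mapred.system.dir is queued for cleanup.
    List<String> cleanupCandidates() throws IOException {
        return listNames(systemDir, "jobtracker");
    }

    private static List<String> listNames(Path dir, String exclude) throws IOException {
        List<String> names = new ArrayList<>();
        try (DirectoryStream<Path> entries = Files.newDirectoryStream(dir)) {
            for (Path p : entries) {
                String name = p.getFileName().toString();
                if (!name.equals(exclude)) {
                    names.add(name);
                }
            }
        }
        Collections.sort(names);
        return names;
    }

    public static void main(String[] args) throws IOException {
        JobSubmissionSketch jt = new JobSubmissionSketch(
                Files.createTempDirectory("mapred-system"), Set.of("alice"));
        String ok = jt.newJobId();
        Path s1 = jt.upload(ok, new byte[] {1}, new byte[] {2}, 8);
        System.out.println(ok + " accepted: " + jt.submitJob(ok, s1, new Properties(), "alice"));
        String bad = jt.newJobId();
        Path s2 = jt.upload(bad, new byte[] {1}, new byte[] {2}, 8);
        System.out.println(bad + " accepted: " + jt.submitJob(bad, s2, new Properties(), "mallory"));
        System.out.println("reloaded on restart: " + jt.acceptedJobs());
        System.out.println("queued for cleanup: " + jt.cleanupCandidates());
    }
}
```

Because rejected uploads stay in mapred.system.dir and only accepted jobs are moved into mapred.system.dir/jobtracker, the restart rule reduces to listing the two directories, which is all the sketch's recovery methods do.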

_Benefits :_
# Guessing a job's directory will be hard, since a random number is appended to its name.
# Faulty jobs (e.g. jobs failing access checks) will be clearly separated from accepted jobs, which helps in recovery.
# The jobtracker's system dir will stay clean and cannot be corrupted by users.
# The jobconf need not be read from the fs, as it will be passed via RPC; this lets the jobtracker quickly decide whether a job is faulty.
# Re-initializing the jobtracker is as simple as deleting its system dir (mapred.system.dir/jobtracker) without touching mapred.system.dir.
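As a rough measure of the first benefit: assuming the suffix is x uniformly random decimal digits, and that other users cannot list mapred.system.dir (the proposed rwx-w--w- gives them no read bit), a single blind guess succeeds with probability 10^-x, and an exhaustive search needs about 10^x / 2 tries on average. The snippet below only tabulates that search space; the class name SuffixGuessing and the digit counts shown are illustrative, since the proposal leaves x open.

```java
import java.math.BigInteger;

public class SuffixGuessing {
    // Number of distinct x-digit decimal suffixes (leading zeros allowed): 10^x.
    static BigInteger searchSpace(int digits) {
        return BigInteger.TEN.pow(digits);
    }

    public static void main(String[] args) {
        for (int x : new int[] {4, 8, 12}) {
            BigInteger n = searchSpace(x);
            // A blind guess hits with probability 1/n; exhaustive search takes about n/2 tries on average.
            System.out.println(x + " digits: " + n + " suffixes, ~" + n.shiftRight(1) + " expected tries");
        }
    }
}
```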

_Questions :_
# Should the default API assume that job.xml, job.jar and job.split are still present in mapred.system.dir/jobid?

----
Thoughts? Comments?

> mapred.system.dir should be accessible only to hadoop daemons 
> --------------------------------------------------------------
>
>                 Key: HADOOP-3578
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3578
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: mapred
>            Reporter: Amar Kamat
>            Assignee: Amar Kamat
>
> Currently the jobclient accesses {{mapred.system.dir}} to add job details, so {{mapred.system.dir}} has {{rwx-wx-wx}} permissions. This could be a security loophole: job files might be overwritten or tampered with after job submission.

-- 
This message is automatically generated by JIRA.