You are viewing a plain text version of this content. The canonical link for it is here.
Posted to common-dev@hadoop.apache.org by "Arun C Murthy (JIRA)" <ji...@apache.org> on 2008/10/22 22:25:44 UTC

[jira] Created: (HADOOP-4490) Map and Reduce tasks should run as the user who submitted the job

Map and Reduce tasks should run as the user who submitted the job
-----------------------------------------------------------------

                 Key: HADOOP-4490
                 URL: https://issues.apache.org/jira/browse/HADOOP-4490
             Project: Hadoop Core
          Issue Type: Sub-task
            Reporter: Arun C Murthy
            Assignee: Arun C Murthy
             Fix For: 0.20.0


Currently the TaskTracker spawns the map/reduce tasks, resulting in them running as the user who started the TaskTracker.

For security and accounting purposes the tasks should be run as the job-owner.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-4490) Map and Reduce tasks should run as the user who submitted the job

Posted by "Hudson (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-4490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12700733#action_12700733 ] 

Hudson commented on HADOOP-4490:
--------------------------------

Integrated in Hadoop-trunk #811 (See [http://hudson.zones.apache.org/hudson/job/Hadoop-trunk/811/])
    

> Map and Reduce tasks should run as the user who submitted the job
> -----------------------------------------------------------------
>
>                 Key: HADOOP-4490
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4490
>             Project: Hadoop Core
>          Issue Type: Sub-task
>          Components: mapred, security
>            Reporter: Arun C Murthy
>            Assignee: Hemanth Yamijala
>             Fix For: 0.21.0
>
>         Attachments: cluster_setup.pdf, cluster_setup.pdf, HADOOP-4490-1.patch, HADOOP-4490-1.patch, hadoop-4490-10.patch, hadoop-4490-11.patch, hadoop-4490-12.patch, hadoop-4490-13.patch, hadoop-4490-14.patch, HADOOP-4490-2.patch, HADOOP-4490-3.patch, hadoop-4490-4.patch, hadoop-4490-5.patch, hadoop-4490-6.patch, hadoop-4490-7.patch, hadoop-4490-8.patch, hadoop-4490-9.patch, hadoop-4490-design.pdf, HADOOP-4490.patch, HADOOP-4490.patch, HADOOP-4490.patch, HADOOP-4490.patch, HADOOP-4490.patch, HADOOP-4490.patch, HADOOP-4490.patch, HADOOP-4490_streaming.patch
>
>
> Currently the TaskTracker spawns the map/reduce tasks, resulting in them running as the user who started the TaskTracker.
> For security and accounting purposes the tasks should be run as the job-owner.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-4490) Map and Reduce tasks should run as the user who submitted the job

Posted by "Hemanth Yamijala (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-4490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12699793#action_12699793 ] 

Hemanth Yamijala commented on HADOOP-4490:
------------------------------------------

The release audit warning is due to a new public API in FileUtil. This is expected. 

This patch has been tested extensively manually, and a version of this patch is also deployed in production environment for a while now. There is a plan to develop unit tests as a followup.

Given these, I am committing this patch, as it is blocking some other jiras in M/R.

> Map and Reduce tasks should run as the user who submitted the job
> -----------------------------------------------------------------
>
>                 Key: HADOOP-4490
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4490
>             Project: Hadoop Core
>          Issue Type: Sub-task
>          Components: mapred, security
>            Reporter: Arun C Murthy
>            Assignee: Hemanth Yamijala
>             Fix For: 0.21.0
>
>         Attachments: cluster_setup.pdf, cluster_setup.pdf, HADOOP-4490-1.patch, HADOOP-4490-1.patch, hadoop-4490-10.patch, hadoop-4490-11.patch, hadoop-4490-12.patch, hadoop-4490-13.patch, hadoop-4490-14.patch, HADOOP-4490-2.patch, HADOOP-4490-3.patch, hadoop-4490-4.patch, hadoop-4490-5.patch, hadoop-4490-6.patch, hadoop-4490-7.patch, hadoop-4490-8.patch, hadoop-4490-9.patch, hadoop-4490-design.pdf, HADOOP-4490.patch, HADOOP-4490.patch, HADOOP-4490.patch, HADOOP-4490.patch, HADOOP-4490.patch, HADOOP-4490.patch, HADOOP-4490.patch, HADOOP-4490_streaming.patch
>
>
> Currently the TaskTracker spawns the map/reduce tasks, resulting in them running as the user who started the TaskTracker.
> For security and accounting purposes the tasks should be run as the job-owner.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-4490) Map and Reduce tasks should run as the user who submitted the job

Posted by "Vinod K V (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-4490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12642042#action_12642042 ] 

Vinod K V commented on HADOOP-4490:
-----------------------------------

Duplicate of HADOOP-4451.

> Map and Reduce tasks should run as the user who submitted the job
> -----------------------------------------------------------------
>
>                 Key: HADOOP-4490
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4490
>             Project: Hadoop Core
>          Issue Type: Sub-task
>          Components: mapred, security
>            Reporter: Arun C Murthy
>            Assignee: Arun C Murthy
>             Fix For: 0.20.0
>
>
> Currently the TaskTracker spawns the map/reduce tasks, resulting in them running as the user who started the TaskTracker.
> For security and accounting purposes the tasks should be run as the job-owner.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (HADOOP-4490) Map and Reduce tasks should run as the user who submitted the job

Posted by "Sreekanth Ramakrishnan (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-4490?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sreekanth Ramakrishnan updated HADOOP-4490:
-------------------------------------------

    Attachment: HADOOP-4490-1.patch

Fixing, findbugs warning. Plus had made mistake in localizejob in LinuxTaskController corrected it.

> Map and Reduce tasks should run as the user who submitted the job
> -----------------------------------------------------------------
>
>                 Key: HADOOP-4490
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4490
>             Project: Hadoop Core
>          Issue Type: Sub-task
>          Components: mapred, security
>            Reporter: Arun C Murthy
>            Assignee: Hemanth Yamijala
>             Fix For: 0.21.0
>
>         Attachments: HADOOP-4490-1.patch, HADOOP-4490-1.patch, hadoop-4490-design.pdf, HADOOP-4490.patch, HADOOP-4490.patch, HADOOP-4490.patch, HADOOP-4490.patch, HADOOP-4490.patch, HADOOP-4490.patch, HADOOP-4490.patch, HADOOP-4490_streaming.patch
>
>
> Currently the TaskTracker spawns the map/reduce tasks, resulting in them running as the user who started the TaskTracker.
> For security and accounting purposes the tasks should be run as the job-owner.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (HADOOP-4490) Map and Reduce tasks should run as the user who submitted the job

Posted by "Arun C Murthy (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-4490?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Arun C Murthy updated HADOOP-4490:
----------------------------------

    Component/s: security
                 mapred

> Map and Reduce tasks should run as the user who submitted the job
> -----------------------------------------------------------------
>
>                 Key: HADOOP-4490
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4490
>             Project: Hadoop Core
>          Issue Type: Sub-task
>          Components: mapred, security
>            Reporter: Arun C Murthy
>            Assignee: Arun C Murthy
>             Fix For: 0.20.0
>
>
> Currently the TaskTracker spawns the map/reduce tasks, resulting in them running as the user who started the TaskTracker.
> For security and accounting purposes the tasks should be run as the job-owner.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (HADOOP-4490) Map and Reduce tasks should run as the user who submitted the job

Posted by "Sreekanth Ramakrishnan (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-4490?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sreekanth Ramakrishnan updated HADOOP-4490:
-------------------------------------------

    Attachment: hadoop-4490-10.patch

Attaching new patch, the patch modifies the kill task part from the previous patch. This passes on the configuration value {{mapred.tasktracker.tasks.sleeptime-before-sigkill}} to task-controller binary. Current trunk version of the binary would ignore this value. Once fix for HADOOP-5420 goes in, the binary would sleep the interval after issuing SIGTERM then pass SIGTERM to child task.

> Map and Reduce tasks should run as the user who submitted the job
> -----------------------------------------------------------------
>
>                 Key: HADOOP-4490
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4490
>             Project: Hadoop Core
>          Issue Type: Sub-task
>          Components: mapred, security
>            Reporter: Arun C Murthy
>            Assignee: Hemanth Yamijala
>             Fix For: 0.21.0
>
>         Attachments: cluster_setup.pdf, cluster_setup.pdf, HADOOP-4490-1.patch, HADOOP-4490-1.patch, hadoop-4490-10.patch, HADOOP-4490-2.patch, HADOOP-4490-3.patch, hadoop-4490-4.patch, hadoop-4490-5.patch, hadoop-4490-6.patch, hadoop-4490-7.patch, hadoop-4490-8.patch, hadoop-4490-9.patch, hadoop-4490-design.pdf, HADOOP-4490.patch, HADOOP-4490.patch, HADOOP-4490.patch, HADOOP-4490.patch, HADOOP-4490.patch, HADOOP-4490.patch, HADOOP-4490.patch, HADOOP-4490_streaming.patch
>
>
> Currently the TaskTracker spawns the map/reduce tasks, resulting in them running as the user who started the TaskTracker.
> For security and accounting purposes the tasks should be run as the job-owner.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-4490) Map and Reduce tasks should run as the user who submitted the job

Posted by "Owen O'Malley (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-4490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12654477#action_12654477 ] 

Owen O'Malley commented on HADOOP-4490:
---------------------------------------

>From what I've heard, running under a security manager kills performance, which is pretty much a non-starter. Especially given that we need unix-level security anyways. In our environment, running as the real user is important. If I run a job, I should not be able to look at or kill your job's data or tasks, even if we are sharing a machine. Of course this feature needs to be optional, since:
  1. It requires that all cluster users have accounts on all slave nodes in the cluster
  2. It requires native code that may not work on all platforms
  3. It requires root access, which not all Hadoop admins have.

> Map and Reduce tasks should run as the user who submitted the job
> -----------------------------------------------------------------
>
>                 Key: HADOOP-4490
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4490
>             Project: Hadoop Core
>          Issue Type: Sub-task
>          Components: mapred, security
>            Reporter: Arun C Murthy
>            Assignee: Hemanth Yamijala
>
> Currently the TaskTracker spawns the map/reduce tasks, resulting in them running as the user who started the TaskTracker.
> For security and accounting purposes the tasks should be run as the job-owner.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-4490) Map and Reduce tasks should run as the user who submitted the job

Posted by "Hemanth Yamijala (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-4490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12671042#action_12671042 ] 

Hemanth Yamijala commented on HADOOP-4490:
------------------------------------------

Merged with trunk. It was broken by a commit done yesterday night. Also removed extraneous logs. Running ant test.

test-patch gives following output:

     [exec] -1 overall.
     [exec]
     [exec]     +1 @author.  The patch does not contain any @author tags.
     [exec]
     [exec]     -1 tests included.  The patch doesn't appear to include any new or modified tests.
     [exec]                         Please justify why no tests are needed for this patch.
     [exec]
     [exec]     +1 javadoc.  The javadoc tool did not generate any warning messages.
     [exec]
     [exec]     +1 javac.  The applied patch does not increase the total number of javac compiler warnings.
     [exec]
     [exec]     +1 findbugs.  The patch does not introduce any new Findbugs warnings.
     [exec]
     [exec]     +1 Eclipse classpath. The patch retains Eclipse classpath integrity.
     [exec]
     [exec]     -1 release audit.  The applied patch generated 823 release audit warnings (more than the trunk's current 820 warnings).

The -1 on release audit is because jdiff has generated some changes to public classes and packages (DistributedCache, FileUtil and the filecache package). The release audit seems to be flagging these new jdiff changed files as warnings. I've cross checked the ASF license header is included for all new files I've put up.

The -1 on tests is as explained above.

There's no change in functionality to the last patch I uploaded. Only a merge.

> Map and Reduce tasks should run as the user who submitted the job
> -----------------------------------------------------------------
>
>                 Key: HADOOP-4490
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4490
>             Project: Hadoop Core
>          Issue Type: Sub-task
>          Components: mapred, security
>            Reporter: Arun C Murthy
>            Assignee: Hemanth Yamijala
>         Attachments: hadoop-4490-design.pdf, HADOOP-4490.patch, HADOOP-4490.patch, HADOOP-4490.patch, HADOOP-4490.patch, HADOOP-4490.patch, HADOOP-4490.patch
>
>
> Currently the TaskTracker spawns the map/reduce tasks, resulting in them running as the user who started the TaskTracker.
> For security and accounting purposes the tasks should be run as the job-owner.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-4490) Map and Reduce tasks should run as the user who submitted the job

Posted by "Steve Loughran (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-4490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12654392#action_12654392 ] 

Steve Loughran commented on HADOOP-4490:
----------------------------------------

Hemanth -you are right, for streaming/pipes stuff a second identity is needed. What some of the grid toolkits have done in the past is have some low-privilege user for running work; there isn't a 1:1 mapping of grid users to user accounts, instead the worker is allowed access to the relevant files of a user for a while, then at the end of the job, that data goes away. This eliminates some of the account management problems, though forces you to make sure that the worker doesnt have access to any old/shared data on the same filesystem. 

> Map and Reduce tasks should run as the user who submitted the job
> -----------------------------------------------------------------
>
>                 Key: HADOOP-4490
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4490
>             Project: Hadoop Core
>          Issue Type: Sub-task
>          Components: mapred, security
>            Reporter: Arun C Murthy
>            Assignee: Hemanth Yamijala
>
> Currently the TaskTracker spawns the map/reduce tasks, resulting in them running as the user who started the TaskTracker.
> For security and accounting purposes the tasks should be run as the job-owner.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-4490) Map and Reduce tasks should run as the user who submitted the job

Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-4490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12671509#action_12671509 ] 

Hadoop QA commented on HADOOP-4490:
-----------------------------------

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12399725/HADOOP-4490.patch
  against trunk revision 741776.

    +1 @author.  The patch does not contain any @author tags.

    -1 tests included.  The patch doesn't appear to include any new or modified tests.
                        Please justify why no tests are needed for this patch.

    +1 javadoc.  The javadoc tool did not generate any warning messages.

    +1 javac.  The applied patch does not increase the total number of javac compiler warnings.

    +1 findbugs.  The patch does not introduce any new Findbugs warnings.

    +1 Eclipse classpath. The patch retains Eclipse classpath integrity.

    +1 release audit.  The applied patch does not increase the total number of release audit warnings.

    +1 core tests.  The patch passed core unit tests.

    -1 contrib tests.  The patch failed contrib unit tests.

Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3812/testReport/
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3812/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3812/artifact/trunk/build/test/checkstyle-errors.html
Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3812/console

This message is automatically generated.

> Map and Reduce tasks should run as the user who submitted the job
> -----------------------------------------------------------------
>
>                 Key: HADOOP-4490
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4490
>             Project: Hadoop Core
>          Issue Type: Sub-task
>          Components: mapred, security
>            Reporter: Arun C Murthy
>            Assignee: Hemanth Yamijala
>         Attachments: hadoop-4490-design.pdf, HADOOP-4490.patch, HADOOP-4490.patch, HADOOP-4490.patch, HADOOP-4490.patch, HADOOP-4490.patch, HADOOP-4490.patch, HADOOP-4490.patch
>
>
> Currently the TaskTracker spawns the map/reduce tasks, resulting in them running as the user who started the TaskTracker.
> For security and accounting purposes the tasks should be run as the job-owner.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (HADOOP-4490) Map and Reduce tasks should run as the user who submitted the job

Posted by "Hemanth Yamijala (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-4490?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hemanth Yamijala updated HADOOP-4490:
-------------------------------------

    Status: Patch Available  (was: Open)

The attached file fixes the streaming test failures. I'd made changes to FileUtil.java to run the chmod command using the ShellCommandExecutor. Previously this was using the Process class. There was no reason to change it, so I moved back to using Process and the tests passed locally.

I thought ant test runs contrib tests as well, which is why I missed these on the first patch.

> Map and Reduce tasks should run as the user who submitted the job
> -----------------------------------------------------------------
>
>                 Key: HADOOP-4490
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4490
>             Project: Hadoop Core
>          Issue Type: Sub-task
>          Components: mapred, security
>            Reporter: Arun C Murthy
>            Assignee: Hemanth Yamijala
>         Attachments: hadoop-4490-design.pdf, HADOOP-4490.patch, HADOOP-4490.patch, HADOOP-4490.patch, HADOOP-4490.patch, HADOOP-4490.patch, HADOOP-4490.patch, HADOOP-4490.patch
>
>
> Currently the TaskTracker spawns the map/reduce tasks, resulting in them running as the user who started the TaskTracker.
> For security and accounting purposes the tasks should be run as the job-owner.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-4490) Map and Reduce tasks should run as the user who submitted the job

Posted by "Hemanth Yamijala (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-4490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12671098#action_12671098 ] 

Hemanth Yamijala commented on HADOOP-4490:
------------------------------------------

ant test passes locally.

> Map and Reduce tasks should run as the user who submitted the job
> -----------------------------------------------------------------
>
>                 Key: HADOOP-4490
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4490
>             Project: Hadoop Core
>          Issue Type: Sub-task
>          Components: mapred, security
>            Reporter: Arun C Murthy
>            Assignee: Hemanth Yamijala
>         Attachments: hadoop-4490-design.pdf, HADOOP-4490.patch, HADOOP-4490.patch, HADOOP-4490.patch, HADOOP-4490.patch, HADOOP-4490.patch, HADOOP-4490.patch
>
>
> Currently the TaskTracker spawns the map/reduce tasks, resulting in them running as the user who started the TaskTracker.
> For security and accounting purposes the tasks should be run as the job-owner.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-4490) Map and Reduce tasks should run as the user who submitted the job

Posted by "Hemanth Yamijala (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-4490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12696451#action_12696451 ] 

Hemanth Yamijala commented on HADOOP-4490:
------------------------------------------

Code changes look fine to me. Looking at the documentation changes.

> Map and Reduce tasks should run as the user who submitted the job
> -----------------------------------------------------------------
>
>                 Key: HADOOP-4490
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4490
>             Project: Hadoop Core
>          Issue Type: Sub-task
>          Components: mapred, security
>            Reporter: Arun C Murthy
>            Assignee: Hemanth Yamijala
>             Fix For: 0.21.0
>
>         Attachments: cluster_setup.pdf, cluster_setup.pdf, HADOOP-4490-1.patch, HADOOP-4490-1.patch, HADOOP-4490-2.patch, HADOOP-4490-3.patch, hadoop-4490-4.patch, hadoop-4490-5.patch, hadoop-4490-6.patch, hadoop-4490-7.patch, hadoop-4490-8.patch, hadoop-4490-design.pdf, HADOOP-4490.patch, HADOOP-4490.patch, HADOOP-4490.patch, HADOOP-4490.patch, HADOOP-4490.patch, HADOOP-4490.patch, HADOOP-4490.patch, HADOOP-4490_streaming.patch
>
>
> Currently the TaskTracker spawns the map/reduce tasks, resulting in them running as the user who started the TaskTracker.
> For security and accounting purposes the tasks should be run as the job-owner.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (HADOOP-4490) Map and Reduce tasks should run as the user who submitted the job

Posted by "Hemanth Yamijala (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-4490?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hemanth Yamijala updated HADOOP-4490:
-------------------------------------

    Attachment: HADOOP-4490.patch

> Map and Reduce tasks should run as the user who submitted the job
> -----------------------------------------------------------------
>
>                 Key: HADOOP-4490
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4490
>             Project: Hadoop Core
>          Issue Type: Sub-task
>          Components: mapred, security
>            Reporter: Arun C Murthy
>            Assignee: Hemanth Yamijala
>         Attachments: HADOOP-4490.patch
>
>
> Currently the TaskTracker spawns the map/reduce tasks, resulting in them running as the user who started the TaskTracker.
> For security and accounting purposes the tasks should be run as the job-owner.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (HADOOP-4490) Map and Reduce tasks should run as the user who submitted the job

Posted by "Sreekanth Ramakrishnan (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-4490?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sreekanth Ramakrishnan updated HADOOP-4490:
-------------------------------------------

    Attachment: hadoop-4490-8.patch

* Removing an unused variable in JVM Manager. 

> Map and Reduce tasks should run as the user who submitted the job
> -----------------------------------------------------------------
>
>                 Key: HADOOP-4490
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4490
>             Project: Hadoop Core
>          Issue Type: Sub-task
>          Components: mapred, security
>            Reporter: Arun C Murthy
>            Assignee: Hemanth Yamijala
>             Fix For: 0.21.0
>
>         Attachments: cluster_setup.pdf, cluster_setup.pdf, HADOOP-4490-1.patch, HADOOP-4490-1.patch, HADOOP-4490-2.patch, HADOOP-4490-3.patch, hadoop-4490-4.patch, hadoop-4490-5.patch, hadoop-4490-6.patch, hadoop-4490-7.patch, hadoop-4490-8.patch, hadoop-4490-design.pdf, HADOOP-4490.patch, HADOOP-4490.patch, HADOOP-4490.patch, HADOOP-4490.patch, HADOOP-4490.patch, HADOOP-4490.patch, HADOOP-4490.patch, HADOOP-4490_streaming.patch
>
>
> Currently the TaskTracker spawns the map/reduce tasks, resulting in them running as the user who started the TaskTracker.
> For security and accounting purposes the tasks should be run as the job-owner.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (HADOOP-4490) Map and Reduce tasks should run as the user who submitted the job

Posted by "Hemanth Yamijala (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-4490?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hemanth Yamijala updated HADOOP-4490:
-------------------------------------

    Attachment: hadoop-4490-design.pdf

A lot of discussion has happened on various comments in this JIRA. The attached document collates all of them. I hope this will make it easier to follow the approach and review the changes.

> Map and Reduce tasks should run as the user who submitted the job
> -----------------------------------------------------------------
>
>                 Key: HADOOP-4490
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4490
>             Project: Hadoop Core
>          Issue Type: Sub-task
>          Components: mapred, security
>            Reporter: Arun C Murthy
>            Assignee: Hemanth Yamijala
>         Attachments: hadoop-4490-design.pdf, HADOOP-4490.patch
>
>
> Currently the TaskTracker spawns the map/reduce tasks, resulting in them running as the user who started the TaskTracker.
> For security and accounting purposes the tasks should be run as the job-owner.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-4490) Map and Reduce tasks should run as the user who submitted the job

Posted by "Hemanth Yamijala (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-4490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12658463#action_12658463 ] 

Hemanth Yamijala commented on HADOOP-4490:
------------------------------------------

bq. It would also be good to have the task tracker root directories in a separate config file that can be owned by root. 

We are taking care of this point in the setuid executable. One question is to determine how the location of this secure config file will known to the executable. Following are our options:

Option 1: Read from the environment variable HADOOP_CONF_DIR
Option 2: Take a command line option to specify the location of the file.
Option 3: Have it as a build time configuration parameter, and encode into the executable (like for instance, pass it as an autoconf option).

Options 1 and 2 may allow users to launch the executable pointing to some custom path. Option 3 would completely avoid this, and make it more secure.

For the sake of deployment, I think the setuid executable should be built using a separate ant target, as it would need to be setup as owned by root etc. So, maybe it is easy to do Option 3 in that case. If we decide to go with one of the other two options, we should mandate additional checks to make sure that the configuration file is owned by the root user, as Owen mentioned.

Any comments ?

> Map and Reduce tasks should run as the user who submitted the job
> -----------------------------------------------------------------
>
>                 Key: HADOOP-4490
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4490
>             Project: Hadoop Core
>          Issue Type: Sub-task
>          Components: mapred, security
>            Reporter: Arun C Murthy
>            Assignee: Hemanth Yamijala
>
> Currently the TaskTracker spawns the map/reduce tasks, resulting in them running as the user who started the TaskTracker.
> For security and accounting purposes the tasks should be run as the job-owner.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-4490) Map and Reduce tasks should run as the user who submitted the job

Posted by "Arun C Murthy (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-4490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12666220#action_12666220 ] 

Arun C Murthy commented on HADOOP-4490:
---------------------------------------

Hemanth, this is look good. 

Some comments:

# We should use mapred.local.dir instead of hadoop.tmp.dir in LinuxTaskController.
# Use Path's methods instead of String manipulation for all path-related manipulations.
# Pass mode, user/group to DistributedCache rather than rely on the newly introduced DistributedCache.isFreshlyLoaded which is then unnecessary.
# Move setting up of JVM-specific files e.g. task's log directory to TaskController.launchJVM.


> Map and Reduce tasks should run as the user who submitted the job
> -----------------------------------------------------------------
>
>                 Key: HADOOP-4490
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4490
>             Project: Hadoop Core
>          Issue Type: Sub-task
>          Components: mapred, security
>            Reporter: Arun C Murthy
>            Assignee: Hemanth Yamijala
>         Attachments: hadoop-4490-design.pdf, HADOOP-4490.patch, HADOOP-4490.patch, HADOOP-4490.patch
>
>
> Currently the TaskTracker spawns the map/reduce tasks, resulting in them running as the user who started the TaskTracker.
> For security and accounting purposes the tasks should be run as the job-owner.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-4490) Map and Reduce tasks should run as the user who submitted the job

Posted by "Hemanth Yamijala (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-4490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12670819#action_12670819 ] 

Hemanth Yamijala commented on HADOOP-4490:
------------------------------------------

Attached a hopefully last version of the patch. This one is extensively tested and has fixed a couple of bugs related to incorrect assumptions about multiple mapred local directories. Thanks to Sreekanth and Amar for help in testing this. We're run randomwriter, sort with and without JVM reuse, and also streaming and using distributed cache.

The test-patch results are showing a -1 on release audit which I've written to core-dev about. I am not sure why a -1 is coming, will continue to debug that. There's also a -1 on tests. It is difficult to write unit tests for this patch since it requires support for multiple users.

There are a lot of log statements in the patch which should be removed before commit. I am attaching this with the hope that someone can take a look at the changes. Arun ?

> Map and Reduce tasks should run as the user who submitted the job
> -----------------------------------------------------------------
>
>                 Key: HADOOP-4490
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4490
>             Project: Hadoop Core
>          Issue Type: Sub-task
>          Components: mapred, security
>            Reporter: Arun C Murthy
>            Assignee: Hemanth Yamijala
>         Attachments: hadoop-4490-design.pdf, HADOOP-4490.patch, HADOOP-4490.patch, HADOOP-4490.patch, HADOOP-4490.patch, HADOOP-4490.patch
>
>
> Currently the TaskTracker spawns the map/reduce tasks, resulting in them running as the user who started the TaskTracker.
> For security and accounting purposes the tasks should be run as the job-owner.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-4490) Map and Reduce tasks should run as the user who submitted the job

Posted by "Hemanth Yamijala (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-4490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12651568#action_12651568 ] 

Hemanth Yamijala commented on HADOOP-4490:
------------------------------------------

I had some offline discussions with Arun and Sameer and here are some initial thoughts on approach. A lot of details still need to be flushed out, but I am posting this to get some early feedback.

We do want to run the daemons as non-privileged users, and yet go with a setuid based approach to run tasks as a regular user. One approach that was proposed to do this is as follows:
- We create a setuid executable, say a taskcontroller, that will be owned by root.
- This executable can take the following arguments - <user> <command> <command arguments>.
- <user> will be the job owner.
- <command> will be an action that needs to be performed, such as LAUNCH_JVM, KILL_TASK, etc.
- <command arguments> will depend on the command. For e.g. LAUNCH_JVM would have the arguments currently used to launch a JVM via the ShellCommandExecutor.
- The tasktracker will launch this executable with the appropriate command and arguments when needed.
- As the executable is a setuid exe, it will run as root, and will quickly drop privileges using setuid, to run as the user.
- Then the arguments will be used to execute the required action, for e.g. launching a VM or killing a task.
- Before dropping privileges, if needed, the executable could set up directories with appropriate ownership, etc.
- Naturally this would be platform specific. Hence, we can define a TaskController class that defines APIs to encapsulate these actions. For e.g., something like:
{code}
abstract class TaskController {

  abstract void launchTask();
  abstract void killTask(Task t);
  // etc...
}
{code}
- This could be extended by a LinuxTaskController, that converts the generic arguments into something that can be passed to executable - for e.g. maybe a process ID.
- One specific point is about the directory / file permissions. Sameer was of the opinion that the permissions should be quite strict, that is, world readable rights are not allowed. There are cases where the task as well as the daemon may need to access files. To handle this, one suggestion is to first set the permissions to the user, and then change the ownership to the daemon after the task is done.

The points above specify a broad approach. Please comment on whether this seems reasonable, reasonable in parts, or completely way off the mark. *smile*. Based on feedback, I would start implementing a prototype to flush out the details.

> Map and Reduce tasks should run as the user who submitted the job
> -----------------------------------------------------------------
>
>                 Key: HADOOP-4490
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4490
>             Project: Hadoop Core
>          Issue Type: Sub-task
>          Components: mapred, security
>            Reporter: Arun C Murthy
>            Assignee: Hemanth Yamijala
>             Fix For: 0.20.0
>
>
> Currently the TaskTracker spawns the map/reduce tasks, resulting in them running as the user who started the TaskTracker.
> For security and accounting purposes the tasks should be run as the job-owner.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-4490) Map and Reduce tasks should run as the user who submitted the job

Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-4490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12671234#action_12671234 ] 

Hadoop QA commented on HADOOP-4490:
-----------------------------------

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12399627/HADOOP-4490.patch
  against trunk revision 741330.

    +1 @author.  The patch does not contain any @author tags.

    -1 tests included.  The patch doesn't appear to include any new or modified tests.
                        Please justify why no tests are needed for this patch.

    +1 javadoc.  The javadoc tool did not generate any warning messages.

    +1 javac.  The applied patch does not increase the total number of javac compiler warnings.

    +1 findbugs.  The patch does not introduce any new Findbugs warnings.

    +1 Eclipse classpath. The patch retains Eclipse classpath integrity.

    -1 release audit.  The applied patch generated 821 release audit warnings (more than the trunk's current 819 warnings).

    +1 core tests.  The patch passed core unit tests.

    -1 contrib tests.  The patch failed contrib unit tests.

Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3806/testReport/
Release audit warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3806/artifact/trunk/current/releaseAuditDiffWarnings.txt
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3806/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3806/artifact/trunk/build/test/checkstyle-errors.html
Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/3806/console

This message is automatically generated.

> Map and Reduce tasks should run as the user who submitted the job
> -----------------------------------------------------------------
>
>                 Key: HADOOP-4490
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4490
>             Project: Hadoop Core
>          Issue Type: Sub-task
>          Components: mapred, security
>            Reporter: Arun C Murthy
>            Assignee: Hemanth Yamijala
>         Attachments: hadoop-4490-design.pdf, HADOOP-4490.patch, HADOOP-4490.patch, HADOOP-4490.patch, HADOOP-4490.patch, HADOOP-4490.patch, HADOOP-4490.patch
>
>
> Currently the TaskTracker spawns the map/reduce tasks, resulting in them running as the user who started the TaskTracker.
> For security and accounting purposes the tasks should be run as the job-owner.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (HADOOP-4490) Map and Reduce tasks should run as the user who submitted the job

Posted by "Sreekanth Ramakrishnan (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-4490?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sreekanth Ramakrishnan updated HADOOP-4490:
-------------------------------------------

    Attachment: hadoop-4490-4.patch

Attaching patch modifiying a minor thing,  to check jvm env member of TaskControllerContext. The JVM env needs to be set for both controllers. So the check has been moved into JVMRunner.

> Map and Reduce tasks should run as the user who submitted the job
> -----------------------------------------------------------------
>
>                 Key: HADOOP-4490
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4490
>             Project: Hadoop Core
>          Issue Type: Sub-task
>          Components: mapred, security
>            Reporter: Arun C Murthy
>            Assignee: Hemanth Yamijala
>             Fix For: 0.21.0
>
>         Attachments: HADOOP-4490-1.patch, HADOOP-4490-1.patch, HADOOP-4490-2.patch, HADOOP-4490-3.patch, hadoop-4490-4.patch, hadoop-4490-design.pdf, HADOOP-4490.patch, HADOOP-4490.patch, HADOOP-4490.patch, HADOOP-4490.patch, HADOOP-4490.patch, HADOOP-4490.patch, HADOOP-4490.patch, HADOOP-4490_streaming.patch
>
>
> Currently the TaskTracker spawns the map/reduce tasks, resulting in them running as the user who started the TaskTracker.
> For security and accounting purposes the tasks should be run as the job-owner.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (HADOOP-4490) Map and Reduce tasks should run as the user who submitted the job

Posted by "Sreekanth Ramakrishnan (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-4490?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sreekanth Ramakrishnan updated HADOOP-4490:
-------------------------------------------

    Attachment: hadoop-4490-12.patch

Attaching patch fixing an issue found while testing on large clusters while task trackers are re-inited.

* Changed {{FileUtil.chmod()}} to use {{ShellCommandExecutor}} instead of building process directly and executing the same.

> Map and Reduce tasks should run as the user who submitted the job
> -----------------------------------------------------------------
>
>                 Key: HADOOP-4490
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4490
>             Project: Hadoop Core
>          Issue Type: Sub-task
>          Components: mapred, security
>            Reporter: Arun C Murthy
>            Assignee: Hemanth Yamijala
>             Fix For: 0.21.0
>
>         Attachments: cluster_setup.pdf, cluster_setup.pdf, HADOOP-4490-1.patch, HADOOP-4490-1.patch, hadoop-4490-10.patch, hadoop-4490-11.patch, hadoop-4490-12.patch, HADOOP-4490-2.patch, HADOOP-4490-3.patch, hadoop-4490-4.patch, hadoop-4490-5.patch, hadoop-4490-6.patch, hadoop-4490-7.patch, hadoop-4490-8.patch, hadoop-4490-9.patch, hadoop-4490-design.pdf, HADOOP-4490.patch, HADOOP-4490.patch, HADOOP-4490.patch, HADOOP-4490.patch, HADOOP-4490.patch, HADOOP-4490.patch, HADOOP-4490.patch, HADOOP-4490_streaming.patch
>
>
> Currently the TaskTracker spawns the map/reduce tasks, resulting in them running as the user who started the TaskTracker.
> For security and accounting purposes the tasks should be run as the job-owner.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-4490) Map and Reduce tasks should run as the user who submitted the job

Posted by "Hemanth Yamijala (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-4490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12695339#action_12695339 ] 

Hemanth Yamijala commented on HADOOP-4490:
------------------------------------------

Some comments:

- Use getLocalJobDir in LinuxTaskController.localizeJob
- Whenever mkdir or mkdirs fails, we should continue from loops.
- Changes in TaskRunner seem unnecessary.
- Changes in DistributedCache to pass the baseDir seems unnecessary. Note that localizeCache already takes a CacheStatus object that has the baseDir.
- This comment is not incorporated: "Modify TaskController.launchTaskJVM to set up permissions for the log dir and the pid dir associated with that task. This will remove the call to initializeTask from the JvmManager.runChild API." I think this can be done by calling setup*FileAccess from launchTaskJVM
- localizeJob can be called initializeJob
- In setupTaskCacheFileAccess, we are setting permissions recursively from the job directory. But this is what we do in LinuxTaskController.localizeJob. So we should be setting permissions from the taskCacheDirectory only.
- writeCommand should ideally check for existence of file before it tries to change permissions in the finally clause.
- JvmManagerForType.getTaskForJvm() - "Incase of JVM reuse, tasks returned previously launched" - some grammatical mistake here.
- In the kill part, I think it will be nice to add a info level log message when we are not doing the kill - both in JVM manager and in LinuxTaskController.
- TaskLog.getLogDir() - Make this getUserLogDir(), and the javadoc need not mention TaskControllers. It should be a generic documentation that it returns the base location for the user logs.
- mapred-defaults.xml should have the config variable for the task controller along with documentation.

Comments on documentation:

- I think we should first describe the use case for the Task controllers are trying to solve - as in the requirement to run tasks as a job owners.
- It would be nice to give a little description of how the LinuxTaskController works - just saying something like we use a setuid executable, the tasktracker uses this exe to launch and kill tasks.
- We should definitely mention that until other JIRAs like H-4491 etc are fixed, we open up permissions on the intermediate, localized and log files in the Linux TaskController case.
- making the executable a setuid exe is a deployment step. It is currently added as a build step.
- The path to the taskcontroller cfg - mention that this should be the path on the cluster nodes where the deployment of the taskcontroller.cfg file will happen
- We should also mention that the LinuxTaskController is currently supported only on Linux. (though it sounds obvious)
- Should we mention about permissions regarding mapred.local.dir and hadoop.log.dir (should be 777 and path leading up to them be 755) ?

> Map and Reduce tasks should run as the user who submitted the job
> -----------------------------------------------------------------
>
>                 Key: HADOOP-4490
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4490
>             Project: Hadoop Core
>          Issue Type: Sub-task
>          Components: mapred, security
>            Reporter: Arun C Murthy
>            Assignee: Hemanth Yamijala
>             Fix For: 0.21.0
>
>         Attachments: cluster_setup.pdf, HADOOP-4490-1.patch, HADOOP-4490-1.patch, HADOOP-4490-2.patch, HADOOP-4490-3.patch, hadoop-4490-4.patch, hadoop-4490-5.patch, hadoop-4490-6.patch, hadoop-4490-design.pdf, HADOOP-4490.patch, HADOOP-4490.patch, HADOOP-4490.patch, HADOOP-4490.patch, HADOOP-4490.patch, HADOOP-4490.patch, HADOOP-4490.patch, HADOOP-4490_streaming.patch
>
>
> Currently the TaskTracker spawns the map/reduce tasks, resulting in them running as the user who started the TaskTracker.
> For security and accounting purposes the tasks should be run as the job-owner.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-4490) Map and Reduce tasks should run as the user who submitted the job

Posted by "Hemanth Yamijala (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-4490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12653375#action_12653375 ] 

Hemanth Yamijala commented on HADOOP-4490:
------------------------------------------

I have been able to make some progress and get a wordcount job to run as the job submitter. The design follows the basic approach mentioned above, minus the plugin abstraction, which I need to create yet.

- Created a setuid C executable.
- This executable currently takes the following commands:
-- SETUP_DIRS <list of directories>:
   This command sets up task specific directories to be owned by the user. The general approach I followed for handling directory permissions is that the root directories, such as hadoop.tmp.dir/mapred/local/taskTracker/jobcache/jobid would be owned by the tasktracker daemon, which creates task directories under it when needed. Then the taskcontroller exe will change the ownership and permissions of the task directory and sub folders to the user.
-- RUN_TASK <path to a file containing the M/R task to execute> 
   The file is a temp file created under the user's work directory itself - executable by the user
-- MOVE_FILES <source directory> <destination directory>
   This command is used to copy the intermediate output and task logs from the task directories to a system specific directory owned by the daemon. The servlets serving this data are modified to read from the system specific directory.
- These are called from the JvmManager class at appropriate places.

A couple of things came up when doing this:

- Task logs: Currently task logs can be viewed when the task is still executing. Further the task logs are read by the TaskLogServlet, which is running in the daemon context. We want the task logs to be owned by the user. I still need to figure out how to achieve this. Currently, I am only able to access task logs after they are done, by executing the MOVE_FILES command. Any ideas are welcome.

- JVM Reuse: Currently, I've only handled this with one JVM per task. Need to check the approach when JVM reuse is in the picture.

- Still need to work on cleanup and kill actions, as also distributed cache.

The code I have is very raw and needs lots of polishing even as a first draft. Will try to do so in a couple of days. Any comments on the approach so far ?

> Map and Reduce tasks should run as the user who submitted the job
> -----------------------------------------------------------------
>
>                 Key: HADOOP-4490
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4490
>             Project: Hadoop Core
>          Issue Type: Sub-task
>          Components: mapred, security
>            Reporter: Arun C Murthy
>            Assignee: Hemanth Yamijala
>
> Currently the TaskTracker spawns the map/reduce tasks, resulting in them running as the user who started the TaskTracker.
> For security and accounting purposes the tasks should be run as the job-owner.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-4490) Map and Reduce tasks should run as the user who submitted the job

Posted by "Owen O'Malley (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-4490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12655721#action_12655721 ] 

Owen O'Malley commented on HADOOP-4490:
---------------------------------------

That sounds reasonable. Please ensure that the pid file is owned by the user given on the command line and has permissions of 600. This avoids someone leaving this file writable and having someone point it at a different process.


> Map and Reduce tasks should run as the user who submitted the job
> -----------------------------------------------------------------
>
>                 Key: HADOOP-4490
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4490
>             Project: Hadoop Core
>          Issue Type: Sub-task
>          Components: mapred, security
>            Reporter: Arun C Murthy
>            Assignee: Hemanth Yamijala
>
> Currently the TaskTracker spawns the map/reduce tasks, resulting in them running as the user who started the TaskTracker.
> For security and accounting purposes the tasks should be run as the job-owner.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (HADOOP-4490) Map and Reduce tasks should run as the user who submitted the job

Posted by "Hemanth Yamijala (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-4490?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hemanth Yamijala updated HADOOP-4490:
-------------------------------------

    Assignee:     (was: Arun C Murthy)

> Map and Reduce tasks should run as the user who submitted the job
> -----------------------------------------------------------------
>
>                 Key: HADOOP-4490
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4490
>             Project: Hadoop Core
>          Issue Type: Sub-task
>          Components: mapred, security
>            Reporter: Arun C Murthy
>             Fix For: 0.20.0
>
>
> Currently the TaskTracker spawns the map/reduce tasks, resulting in them running as the user who started the TaskTracker.
> For security and accounting purposes the tasks should be run as the job-owner.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-4490) Map and Reduce tasks should run as the user who submitted the job

Posted by "Vinod K V (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-4490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12688213#action_12688213 ] 

Vinod K V commented on HADOOP-4490:
-----------------------------------


Found a problem while this patch was being tested. When a TT was being re-inited by JT after it's lost for some time, TT got the following NPE and it crashed completely.
{code}
2009-03-22 22:22:36,155 ERROR org.apache.hadoop.mapred.TaskTracker: Can not start task tracker because java.lang.NullPointerException
        at org.apache.hadoop.mapred.LinuxTaskController.buildTaskCommandArgs(LinuxTaskController.java:183)
        at org.apache.hadoop.mapred.LinuxTaskController.killTaskJVM(LinuxTaskController.java:237)
        at org.apache.hadoop.mapred.JvmManager$JvmManagerForType$JvmRunner.kill(JvmManager.java:401)
        at org.apache.hadoop.mapred.JvmManager$JvmManagerForType.stop(JvmManager.java:211)
        at org.apache.hadoop.mapred.JvmManager.stop(JvmManager.java:61)
        at org.apache.hadoop.mapred.TaskTracker.close(TaskTracker.java:925)
        at org.apache.hadoop.mapred.TaskTracker.run(TaskTracker.java:1815)
        at org.apache.hadoop.mapred.TaskTracker.main(TaskTracker.java:2899)
{code}

> Map and Reduce tasks should run as the user who submitted the job
> -----------------------------------------------------------------
>
>                 Key: HADOOP-4490
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4490
>             Project: Hadoop Core
>          Issue Type: Sub-task
>          Components: mapred, security
>            Reporter: Arun C Murthy
>            Assignee: Hemanth Yamijala
>             Fix For: 0.21.0
>
>         Attachments: HADOOP-4490-1.patch, HADOOP-4490-1.patch, HADOOP-4490-2.patch, hadoop-4490-design.pdf, HADOOP-4490.patch, HADOOP-4490.patch, HADOOP-4490.patch, HADOOP-4490.patch, HADOOP-4490.patch, HADOOP-4490.patch, HADOOP-4490.patch, HADOOP-4490_streaming.patch
>
>
> Currently the TaskTracker spawns the map/reduce tasks, resulting in them running as the user who started the TaskTracker.
> For security and accounting purposes the tasks should be run as the job-owner.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (HADOOP-4490) Map and Reduce tasks should run as the user who submitted the job

Posted by "Hemanth Yamijala (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-4490?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hemanth Yamijala updated HADOOP-4490:
-------------------------------------

    Attachment: HADOOP-4490.patch

> Map and Reduce tasks should run as the user who submitted the job
> -----------------------------------------------------------------
>
>                 Key: HADOOP-4490
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4490
>             Project: Hadoop Core
>          Issue Type: Sub-task
>          Components: mapred, security
>            Reporter: Arun C Murthy
>            Assignee: Hemanth Yamijala
>         Attachments: hadoop-4490-design.pdf, HADOOP-4490.patch, HADOOP-4490.patch, HADOOP-4490.patch, HADOOP-4490.patch, HADOOP-4490.patch, HADOOP-4490.patch, HADOOP-4490.patch
>
>
> Currently the TaskTracker spawns the map/reduce tasks, resulting in them running as the user who started the TaskTracker.
> For security and accounting purposes the tasks should be run as the job-owner.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (HADOOP-4490) Map and Reduce tasks should run as the user who submitted the job

Posted by "Hemanth Yamijala (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-4490?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hemanth Yamijala updated HADOOP-4490:
-------------------------------------

    Attachment: HADOOP-4490.patch

> Map and Reduce tasks should run as the user who submitted the job
> -----------------------------------------------------------------
>
>                 Key: HADOOP-4490
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4490
>             Project: Hadoop Core
>          Issue Type: Sub-task
>          Components: mapred, security
>            Reporter: Arun C Murthy
>            Assignee: Hemanth Yamijala
>         Attachments: hadoop-4490-design.pdf, HADOOP-4490.patch, HADOOP-4490.patch, HADOOP-4490.patch, HADOOP-4490.patch, HADOOP-4490.patch, HADOOP-4490.patch
>
>
> Currently the TaskTracker spawns the map/reduce tasks, resulting in them running as the user who started the TaskTracker.
> For security and accounting purposes the tasks should be run as the job-owner.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-4490) Map and Reduce tasks should run as the user who submitted the job

Posted by "Sreekanth Ramakrishnan (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-4490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12699566#action_12699566 ] 

Sreekanth Ramakrishnan commented on HADOOP-4490:
------------------------------------------------

Attaching output of the {{ant test-patch}}

{noformat}
     [exec] 
     [exec] -1 overall.  
     [exec] 
     [exec]     +1 @author.  The patch does not contain any @author tags.
     [exec] 
     [exec]     -1 tests included.  The patch doesn't appear to include any new or modified tests.
     [exec]                         Please justify why no tests are needed for this patch.
     [exec] 
     [exec]     +1 javadoc.  The javadoc tool did not generate any warning messages.
     [exec] 
     [exec]     +1 javac.  The applied patch does not increase the total number of javac compiler warnings.
     [exec] 
     [exec]     +1 findbugs.  The patch does not introduce any new Findbugs warnings.
     [exec] 
     [exec]     +1 Eclipse classpath. The patch retains Eclipse classpath integrity.
     [exec] 
     [exec]     -1 release audit.  The applied patch generated 652 release audit warnings (more than the trunk's current 651 warnings).
     [exec] 
     [exec] 
{noformat}


All core and contrib tests passed on the local machine except TestQueueCapacities which is being handled in seperate JIRA.

> Map and Reduce tasks should run as the user who submitted the job
> -----------------------------------------------------------------
>
>                 Key: HADOOP-4490
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4490
>             Project: Hadoop Core
>          Issue Type: Sub-task
>          Components: mapred, security
>            Reporter: Arun C Murthy
>            Assignee: Hemanth Yamijala
>             Fix For: 0.21.0
>
>         Attachments: cluster_setup.pdf, cluster_setup.pdf, HADOOP-4490-1.patch, HADOOP-4490-1.patch, hadoop-4490-10.patch, hadoop-4490-11.patch, hadoop-4490-12.patch, hadoop-4490-13.patch, hadoop-4490-14.patch, HADOOP-4490-2.patch, HADOOP-4490-3.patch, hadoop-4490-4.patch, hadoop-4490-5.patch, hadoop-4490-6.patch, hadoop-4490-7.patch, hadoop-4490-8.patch, hadoop-4490-9.patch, hadoop-4490-design.pdf, HADOOP-4490.patch, HADOOP-4490.patch, HADOOP-4490.patch, HADOOP-4490.patch, HADOOP-4490.patch, HADOOP-4490.patch, HADOOP-4490.patch, HADOOP-4490_streaming.patch
>
>
> Currently the TaskTracker spawns the map/reduce tasks, resulting in them running as the user who started the TaskTracker.
> For security and accounting purposes the tasks should be run as the job-owner.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-4490) Map and Reduce tasks should run as the user who submitted the job

Posted by "Hemanth Yamijala (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-4490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12655207#action_12655207 ] 

Hemanth Yamijala commented on HADOOP-4490:
------------------------------------------

I had an offline discussion with Devaraj regarding the implementation, and we also went over the impact this would have when clubbed with JVM reuse.

A few comments from him that I am documenting here:
- Task directories under the tasktracker system or root directory to which files (such as intermediate outputs) are copied after task completion should be in the same disk as the original user's task directories. This is to prevent across disk copies.
- Regarding the problem of serving log outputs which I've mentioned [here|#action_12653375], we discussed one approach could be to have a command in the executable to read the data and return to the TaskLogServlet on demand. This would happen reasonably rarely and does not affect any other functionality. Hence it seems like the performance overhead can be ignored.
- Another comment was to reduce the number of times the executable is launched. For e.g. *without* JVM reuse, I can setup the directories, run the task, and then move the outputs with a single launch of the executable. This is possible because all actions are per task, and there is one JVM per task. Hence the lifecycle of the task fits well with the setuid changes.

With JVM reuse though, the last point becomes problematic. We can easily setup the directories and move the output before and after the task. However, that needs to be done with a separate launch of the executable - three times actually. The performance impact this would have (and would it offset the advantage of JVM reuse) is something to measure and see.

> Map and Reduce tasks should run as the user who submitted the job
> -----------------------------------------------------------------
>
>                 Key: HADOOP-4490
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4490
>             Project: Hadoop Core
>          Issue Type: Sub-task
>          Components: mapred, security
>            Reporter: Arun C Murthy
>            Assignee: Hemanth Yamijala
>
> Currently the TaskTracker spawns the map/reduce tasks, resulting in them running as the user who started the TaskTracker.
> For security and accounting purposes the tasks should be run as the job-owner.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (HADOOP-4490) Map and Reduce tasks should run as the user who submitted the job

Posted by "Sreekanth Ramakrishnan (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-4490?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sreekanth Ramakrishnan updated HADOOP-4490:
-------------------------------------------

    Attachment: hadoop-4490-6.patch

Mixed up merging in the last patch. Attaching new patch which corrects the issue.

> Map and Reduce tasks should run as the user who submitted the job
> -----------------------------------------------------------------
>
>                 Key: HADOOP-4490
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4490
>             Project: Hadoop Core
>          Issue Type: Sub-task
>          Components: mapred, security
>            Reporter: Arun C Murthy
>            Assignee: Hemanth Yamijala
>             Fix For: 0.21.0
>
>         Attachments: cluster_setup.pdf, HADOOP-4490-1.patch, HADOOP-4490-1.patch, HADOOP-4490-2.patch, HADOOP-4490-3.patch, hadoop-4490-4.patch, hadoop-4490-5.patch, hadoop-4490-6.patch, hadoop-4490-design.pdf, HADOOP-4490.patch, HADOOP-4490.patch, HADOOP-4490.patch, HADOOP-4490.patch, HADOOP-4490.patch, HADOOP-4490.patch, HADOOP-4490.patch, HADOOP-4490_streaming.patch
>
>
> Currently the TaskTracker spawns the map/reduce tasks, resulting in them running as the user who started the TaskTracker.
> For security and accounting purposes the tasks should be run as the job-owner.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-4490) Map and Reduce tasks should run as the user who submitted the job

Posted by "Hemanth Yamijala (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-4490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12669354#action_12669354 ] 

Hemanth Yamijala commented on HADOOP-4490:
------------------------------------------

I've updated the patch to trunk, incorporating most of Arun's comments above. Arun, can you please take a look.

bq. We should use mapred.local.dir instead of hadoop.tmp.dir in LinuxTaskController.
Done.

bq. Use Path's methods instead of String manipulation for all path-related manipulations.
Done.

bq. Pass mode, user/group to DistributedCache rather than rely on the newly introduced DistributedCache.isFreshlyLoaded which is then unnecessary.
Done. I've added a new overloaded API that passes the information to DistributedCache. Just to keep options open, I've defined a new public class DistributedCacheFileAccessInfo - a simple class that can be used to define permissions and ownership information for localized files in DistributedCache. Can you take a specific look at this, and let me know if this looks OK ?

bq. Move setting up of JVM-specific files e.g. task's log directory to TaskController.launchJVM
I've not done this one alone. It was not very clear what information is necessary at launch time. For e.g. if there are some localized files under the task cache directory that need to be loaded at launch time, we'll need permissions for these also. In general, it seemed a little risky to launch the JVM without giving full access to all jars etc, even if the Task will start running later only. So, I've left this as is. I think the main concern here was about the special check I had in JvmManager where I was avoiding setting the permissions again when getting the task to launch. This seems a simple enough check, and I've documented the rationale in code. Can you verify this again, and let me know your thoughts ?

> Map and Reduce tasks should run as the user who submitted the job
> -----------------------------------------------------------------
>
>                 Key: HADOOP-4490
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4490
>             Project: Hadoop Core
>          Issue Type: Sub-task
>          Components: mapred, security
>            Reporter: Arun C Murthy
>            Assignee: Hemanth Yamijala
>         Attachments: hadoop-4490-design.pdf, HADOOP-4490.patch, HADOOP-4490.patch, HADOOP-4490.patch, HADOOP-4490.patch
>
>
> Currently the TaskTracker spawns the map/reduce tasks, resulting in them running as the user who started the TaskTracker.
> For security and accounting purposes the tasks should be run as the job-owner.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (HADOOP-4490) Map and Reduce tasks should run as the user who submitted the job

Posted by "Arun C Murthy (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-4490?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Arun C Murthy updated HADOOP-4490:
----------------------------------

    Status: Open  (was: Patch Available)

Some comments after a discussion with Hemanth:

# We agree that DistributedCacheFileAccessInfo isn't necessary. We'll wait for HADOOP-4493 to fix access control to the DistributedCache, until which we will just allow requisite access to all files in the cache (755).
# FileUtil.setPermissionsForPathComponents and opening of permissions to mapred.local.dir via LinuxTaskController.setConf bother me. I see 4 different hooks we that we need to provide for setup/cleanup: 
  #* per-tracker (i.e. at TaskTracker initialization e.g. setting up mapred.local.dir)
  #* per-job (job jars)
  #* per-jvm (task log files)
  #* per-task 
 Of course we might not need all the cleanup hooks.
# The TaskController itself should be stateless, the above hooks should be plugged into at appropriate places e.g. TaskTracker.localizeJob should call TaskController.initializeJob rather than having LinuxTaskController maintaining state as in the patch.
 


> Map and Reduce tasks should run as the user who submitted the job
> -----------------------------------------------------------------
>
>                 Key: HADOOP-4490
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4490
>             Project: Hadoop Core
>          Issue Type: Sub-task
>          Components: mapred, security
>            Reporter: Arun C Murthy
>            Assignee: Hemanth Yamijala
>         Attachments: hadoop-4490-design.pdf, HADOOP-4490.patch, HADOOP-4490.patch, HADOOP-4490.patch, HADOOP-4490.patch, HADOOP-4490.patch, HADOOP-4490.patch, HADOOP-4490.patch
>
>
> Currently the TaskTracker spawns the map/reduce tasks, resulting in them running as the user who started the TaskTracker.
> For security and accounting purposes the tasks should be run as the job-owner.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (HADOOP-4490) Map and Reduce tasks should run as the user who submitted the job

Posted by "Hemanth Yamijala (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-4490?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hemanth Yamijala updated HADOOP-4490:
-------------------------------------

    Attachment: HADOOP-4490.patch

> Map and Reduce tasks should run as the user who submitted the job
> -----------------------------------------------------------------
>
>                 Key: HADOOP-4490
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4490
>             Project: Hadoop Core
>          Issue Type: Sub-task
>          Components: mapred, security
>            Reporter: Arun C Murthy
>            Assignee: Hemanth Yamijala
>         Attachments: hadoop-4490-design.pdf, HADOOP-4490.patch, HADOOP-4490.patch, HADOOP-4490.patch, HADOOP-4490.patch
>
>
> Currently the TaskTracker spawns the map/reduce tasks, resulting in them running as the user who started the TaskTracker.
> For security and accounting purposes the tasks should be run as the job-owner.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-4490) Map and Reduce tasks should run as the user who submitted the job

Posted by "Craig Macdonald (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-4490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12649658#action_12649658 ] 

Craig Macdonald commented on HADOOP-4490:
-----------------------------------------

I think that (2) depends on how (1) is proposed to be addressed. If you assume that (1) is addressed by using seteuid() or the su command such that processes actually run on the system as the appropriate user,  then (2) is extremely difficult without being ruin as root.

If (1) is addressed just by setting the UGI in some way, then this had disadvantages compared to the  seteuid/su - which facilitates secured access to non-HDFS resources (e.g. NFS in smaller environments).



> Map and Reduce tasks should run as the user who submitted the job
> -----------------------------------------------------------------
>
>                 Key: HADOOP-4490
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4490
>             Project: Hadoop Core
>          Issue Type: Sub-task
>          Components: mapred, security
>            Reporter: Arun C Murthy
>            Assignee: Hemanth Yamijala
>             Fix For: 0.20.0
>
>
> Currently the TaskTracker spawns the map/reduce tasks, resulting in them running as the user who started the TaskTracker.
> For security and accounting purposes the tasks should be run as the job-owner.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-4490) Map and Reduce tasks should run as the user who submitted the job

Posted by "Hemanth Yamijala (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-4490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12661068#action_12661068 ] 

Hemanth Yamijala commented on HADOOP-4490:
------------------------------------------

I attached a new patch that is more comprehensive. All changes from the previous patch still hold good. This one adds the correct permissions for all relevant files and directories, except distributed cache.

The previous patch only set relevant permissions on the task and log cache directories for all users, with the intent that tasks running as any user should be able to create and use other files and directories under them. This requirement still applies. However, there are other files and directories whose access needs to be adjusted too. The new patch addresses these changes:

- It sets permissions on the job related jar files and other directories allowing access to everyone.
- It sets read and execute permissions on directory paths until the task / job cache and log directories. For e.g. if a task cache directory is created under ${mapred.local.dir}/taskTracker/jobcache, all paths in this component are attempted to be given read and execute (and no write) access for all users. This is required for looking up paths and locating / reading files created by the tasktracker.

Both the changes above are required in future as well. Except then, the permission string would be more restrictive (disallowing access to group and others).

The previous patch was working because of a subtle behavior in setuid. On the systems where we tested, the umask was set such that read and execute permissions were provided to group by default. So, any of the job files created by the tasktracker had read and execute to the group to which the tasktracker user belonged. When the setuid executable switched users, it does not clear the supplementary group information of the launcher. Hence, the new process running as the job owner still had access to the groups to which the tasktracker belonged, and hence worked. Again, in HADOOP-4491, we propose to remove all access for the group ownership also, and hence this will not be an issue.

> Map and Reduce tasks should run as the user who submitted the job
> -----------------------------------------------------------------
>
>                 Key: HADOOP-4490
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4490
>             Project: Hadoop Core
>          Issue Type: Sub-task
>          Components: mapred, security
>            Reporter: Arun C Murthy
>            Assignee: Hemanth Yamijala
>         Attachments: hadoop-4490-design.pdf, HADOOP-4490.patch, HADOOP-4490.patch
>
>
> Currently the TaskTracker spawns the map/reduce tasks, resulting in them running as the user who started the TaskTracker.
> For security and accounting purposes the tasks should be run as the job-owner.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (HADOOP-4490) Map and Reduce tasks should run as the user who submitted the job

Posted by "Hemanth Yamijala (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-4490?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hemanth Yamijala updated HADOOP-4490:
-------------------------------------

    Attachment: HADOOP-4490.patch

> Map and Reduce tasks should run as the user who submitted the job
> -----------------------------------------------------------------
>
>                 Key: HADOOP-4490
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4490
>             Project: Hadoop Core
>          Issue Type: Sub-task
>          Components: mapred, security
>            Reporter: Arun C Murthy
>            Assignee: Hemanth Yamijala
>         Attachments: hadoop-4490-design.pdf, HADOOP-4490.patch, HADOOP-4490.patch, HADOOP-4490.patch
>
>
> Currently the TaskTracker spawns the map/reduce tasks, resulting in them running as the user who started the TaskTracker.
> For security and accounting purposes the tasks should be run as the job-owner.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (HADOOP-4490) Map and Reduce tasks should run as the user who submitted the job

Posted by "Hemanth Yamijala (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-4490?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hemanth Yamijala updated HADOOP-4490:
-------------------------------------

    Fix Version/s: 0.21.0

> Map and Reduce tasks should run as the user who submitted the job
> -----------------------------------------------------------------
>
>                 Key: HADOOP-4490
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4490
>             Project: Hadoop Core
>          Issue Type: Sub-task
>          Components: mapred, security
>            Reporter: Arun C Murthy
>            Assignee: Hemanth Yamijala
>             Fix For: 0.21.0
>
>         Attachments: hadoop-4490-design.pdf, HADOOP-4490.patch, HADOOP-4490.patch, HADOOP-4490.patch, HADOOP-4490.patch, HADOOP-4490.patch, HADOOP-4490.patch, HADOOP-4490.patch
>
>
> Currently the TaskTracker spawns the map/reduce tasks, resulting in them running as the user who started the TaskTracker.
> For security and accounting purposes the tasks should be run as the job-owner.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (HADOOP-4490) Map and Reduce tasks should run as the user who submitted the job

Posted by "Sreekanth Ramakrishnan (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-4490?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sreekanth Ramakrishnan updated HADOOP-4490:
-------------------------------------------

    Attachment: hadoop-4490-11.patch

After offline discussion with Hemanth and Vinod the changes to LinuxTaskController for resolving HADOOP-5420 can be addressed in a different JIRA.

Reverting to previous version of patch. 

> Map and Reduce tasks should run as the user who submitted the job
> -----------------------------------------------------------------
>
>                 Key: HADOOP-4490
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4490
>             Project: Hadoop Core
>          Issue Type: Sub-task
>          Components: mapred, security
>            Reporter: Arun C Murthy
>            Assignee: Hemanth Yamijala
>             Fix For: 0.21.0
>
>         Attachments: cluster_setup.pdf, cluster_setup.pdf, HADOOP-4490-1.patch, HADOOP-4490-1.patch, hadoop-4490-10.patch, hadoop-4490-11.patch, HADOOP-4490-2.patch, HADOOP-4490-3.patch, hadoop-4490-4.patch, hadoop-4490-5.patch, hadoop-4490-6.patch, hadoop-4490-7.patch, hadoop-4490-8.patch, hadoop-4490-9.patch, hadoop-4490-design.pdf, HADOOP-4490.patch, HADOOP-4490.patch, HADOOP-4490.patch, HADOOP-4490.patch, HADOOP-4490.patch, HADOOP-4490.patch, HADOOP-4490.patch, HADOOP-4490_streaming.patch
>
>
> Currently the TaskTracker spawns the map/reduce tasks, resulting in them running as the user who started the TaskTracker.
> For security and accounting purposes the tasks should be run as the job-owner.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-4490) Map and Reduce tasks should run as the user who submitted the job

Posted by "Hemanth Yamijala (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-4490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12649065#action_12649065 ] 

Hemanth Yamijala commented on HADOOP-4490:
------------------------------------------

Before beginning discussions on approach, I wanted to summarize my understanding of this task, and also start discussion on a few points that I have some questions on.

The following are some salient points:
# We want to run tasks as the user who submitted the job, rather than as the user running the daemon.
# I think we also don't want to run the daemon as a privileged user (such as root) in order to solve this requirement. Right ?
# The directories and files used by the task should have appropriate permissions. Currently these directories and files are mostly created by the daemons, but used by the task. A few are used/accessed by the daemons also. Some of these directories and files are the following:
## mapred.local.dir/taskTracker/archive - directories containing distributed cache archives
## mapred.local.dir/taskTracker/jobcache/$jobid/ - Include work (which is a scratch space), jars (containing the job jars), job.xml.
## mapred.local.dir/taskTracker/jobcache/$jobid/$taskid - Include job.xml, output (intermediate files), work (current working dir) and temp (work/tmp) directories for the task.
## mapred.local.dir/taskTracker/pids/$taskid - Written by the shell launching the task, but read by the daemons.
# What should 'appropriate' permissions mean ? I guess read/write/execute (on directories) for the owner of the job is required. What should the permissions be for others ? If the task is the only consumer, then the permissions for others can be turned off. However, there are cases where the daemon / other processes might read the files. For instance:
## The distributed cache files can be shared across jobs.
## Jetty seems to require read permissions on the intermediate files to serve them to the reducers.
In the above cases, can we make these world readable ?
## Task logs are currently generated under ${hadoop.log.dir}/userlogs/$taskid. These are served from the TaskLogServlet of the TaskTracker.
# Apart from launching the task itself, we may need some other actions to be performed as the job owner. For instance:
## Killing of a task
## Maybe setting up and cleaning up of the directories / files
## Running the debug script - {{mapred.map|reduce.task.debug.script}}

Is there anything that I am missing ? Comments on the questions of shared directories / files - distributed cache, intermediate outputs, log files ?

> Map and Reduce tasks should run as the user who submitted the job
> -----------------------------------------------------------------
>
>                 Key: HADOOP-4490
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4490
>             Project: Hadoop Core
>          Issue Type: Sub-task
>          Components: mapred, security
>            Reporter: Arun C Murthy
>            Assignee: Hemanth Yamijala
>             Fix For: 0.20.0
>
>
> Currently the TaskTracker spawns the map/reduce tasks, resulting in them running as the user who started the TaskTracker.
> For security and accounting purposes the tasks should be run as the job-owner.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-4490) Map and Reduce tasks should run as the user who submitted the job

Posted by "Hemanth Yamijala (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-4490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12653327#action_12653327 ] 

Hemanth Yamijala commented on HADOOP-4490:
------------------------------------------

Steve, I do agree that the security manager approach is simpler and portable. However, I think the requirement is also to support features like streaming. Given that, I think the security manager approach would not work. Am I right ?

Also, I agree that authentication and authorization is a pre-requisite for this. It is being handled by the other tasks under the jira HADOOP-4487. Here, I am focussing only on the mechanisms to make tasks run as the users who submitted the jobs - a small part of the larger framework.

> Map and Reduce tasks should run as the user who submitted the job
> -----------------------------------------------------------------
>
>                 Key: HADOOP-4490
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4490
>             Project: Hadoop Core
>          Issue Type: Sub-task
>          Components: mapred, security
>            Reporter: Arun C Murthy
>            Assignee: Hemanth Yamijala
>
> Currently the TaskTracker spawns the map/reduce tasks, resulting in them running as the user who started the TaskTracker.
> For security and accounting purposes the tasks should be run as the job-owner.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (HADOOP-4490) Map and Reduce tasks should run as the user who submitted the job

Posted by "Hemanth Yamijala (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-4490?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hemanth Yamijala updated HADOOP-4490:
-------------------------------------

    Attachment: HADOOP-4490.patch

> Map and Reduce tasks should run as the user who submitted the job
> -----------------------------------------------------------------
>
>                 Key: HADOOP-4490
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4490
>             Project: Hadoop Core
>          Issue Type: Sub-task
>          Components: mapred, security
>            Reporter: Arun C Murthy
>            Assignee: Hemanth Yamijala
>         Attachments: hadoop-4490-design.pdf, HADOOP-4490.patch, HADOOP-4490.patch
>
>
> Currently the TaskTracker spawns the map/reduce tasks, resulting in them running as the user who started the TaskTracker.
> For security and accounting purposes the tasks should be run as the job-owner.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-4490) Map and Reduce tasks should run as the user who submitted the job

Posted by "Hemanth Yamijala (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-4490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12699071#action_12699071 ] 

Hemanth Yamijala commented on HADOOP-4490:
------------------------------------------

bq. Check before TaskController.initializeTask() in JvmManager.getTaskForJVM() to see to that there is no double initalization of same task being done.
I think we should move the call to initializeTask into JvmRunner.runChild(). This way it is more explicit why the check is required.

Other than this +1. Please make this PA.

> Map and Reduce tasks should run as the user who submitted the job
> -----------------------------------------------------------------
>
>                 Key: HADOOP-4490
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4490
>             Project: Hadoop Core
>          Issue Type: Sub-task
>          Components: mapred, security
>            Reporter: Arun C Murthy
>            Assignee: Hemanth Yamijala
>             Fix For: 0.21.0
>
>         Attachments: cluster_setup.pdf, cluster_setup.pdf, HADOOP-4490-1.patch, HADOOP-4490-1.patch, hadoop-4490-10.patch, hadoop-4490-11.patch, hadoop-4490-12.patch, hadoop-4490-13.patch, HADOOP-4490-2.patch, HADOOP-4490-3.patch, hadoop-4490-4.patch, hadoop-4490-5.patch, hadoop-4490-6.patch, hadoop-4490-7.patch, hadoop-4490-8.patch, hadoop-4490-9.patch, hadoop-4490-design.pdf, HADOOP-4490.patch, HADOOP-4490.patch, HADOOP-4490.patch, HADOOP-4490.patch, HADOOP-4490.patch, HADOOP-4490.patch, HADOOP-4490.patch, HADOOP-4490_streaming.patch
>
>
> Currently the TaskTracker spawns the map/reduce tasks, resulting in them running as the user who started the TaskTracker.
> For security and accounting purposes the tasks should be run as the job-owner.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (HADOOP-4490) Map and Reduce tasks should run as the user who submitted the job

Posted by "Sreekanth Ramakrishnan (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-4490?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sreekanth Ramakrishnan updated HADOOP-4490:
-------------------------------------------

    Attachment: HADOOP-4490-1.patch

Attaching patch incorporating Comments from Hemanth and Arun. The only change which has been done which move away from the above are:

* No seperate permissions being set for pid directories, as they are not being used in the current trunk.
* Assumption that path to jobcache directory, log directory the user who runs task would have read and execute permission along the path.



> Map and Reduce tasks should run as the user who submitted the job
> -----------------------------------------------------------------
>
>                 Key: HADOOP-4490
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4490
>             Project: Hadoop Core
>          Issue Type: Sub-task
>          Components: mapred, security
>            Reporter: Arun C Murthy
>            Assignee: Hemanth Yamijala
>             Fix For: 0.21.0
>
>         Attachments: HADOOP-4490-1.patch, hadoop-4490-design.pdf, HADOOP-4490.patch, HADOOP-4490.patch, HADOOP-4490.patch, HADOOP-4490.patch, HADOOP-4490.patch, HADOOP-4490.patch, HADOOP-4490.patch, HADOOP-4490_streaming.patch
>
>
> Currently the TaskTracker spawns the map/reduce tasks, resulting in them running as the user who started the TaskTracker.
> For security and accounting purposes the tasks should be run as the job-owner.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-4490) Map and Reduce tasks should run as the user who submitted the job

Posted by "Arun C Murthy (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-4490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12674504#action_12674504 ] 

Arun C Murthy commented on HADOOP-4490:
---------------------------------------

bq. If it is felt that this should be per TaskController, then we can easily move this to the TaskController API.
On second thoughts - I'd go there right-away, since it means that we don't need to open it up to 777 for mapred.local.dir if the DefaultTaskController is in effect.

bq. Introduce a TaskController.initializeJob [...] 
+1

bq. Modify TaskController.launchTaskJVM to set up permissions [...]
+1

bq. Modify TaskController.initializeTask to set up permissions for [...]
+1

bq. Modify DistributedCache.localizeCache to set up permissions for the localized files.  [...]
+1


> Map and Reduce tasks should run as the user who submitted the job
> -----------------------------------------------------------------
>
>                 Key: HADOOP-4490
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4490
>             Project: Hadoop Core
>          Issue Type: Sub-task
>          Components: mapred, security
>            Reporter: Arun C Murthy
>            Assignee: Hemanth Yamijala
>         Attachments: hadoop-4490-design.pdf, HADOOP-4490.patch, HADOOP-4490.patch, HADOOP-4490.patch, HADOOP-4490.patch, HADOOP-4490.patch, HADOOP-4490.patch, HADOOP-4490.patch
>
>
> Currently the TaskTracker spawns the map/reduce tasks, resulting in them running as the user who started the TaskTracker.
> For security and accounting purposes the tasks should be run as the job-owner.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-4490) Map and Reduce tasks should run as the user who submitted the job

Posted by "Hemanth Yamijala (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-4490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12671049#action_12671049 ] 

Hemanth Yamijala commented on HADOOP-4490:
------------------------------------------

I also have to mention that some additional work would be required in the LinuxTaskController to sync up with changes from HADOOP-4759 that was committed yesterday, as it introduced a new task (TaskCleanup task) which would create directories like task-attempt-id-cleanup.

However, the default path (covered by the DefaultTaskController) is not affected, and hence there will be no regression. I am proposing that apart from this, if other things are fine, we should commit HADOOP-4490 and then in a follow up JIRA fix the changes with respect to HADOOP-4759. Otherwise I am seeing that HADOOP-4490 is becoming a moving target (*smile*). Does this make sense ?

> Map and Reduce tasks should run as the user who submitted the job
> -----------------------------------------------------------------
>
>                 Key: HADOOP-4490
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4490
>             Project: Hadoop Core
>          Issue Type: Sub-task
>          Components: mapred, security
>            Reporter: Arun C Murthy
>            Assignee: Hemanth Yamijala
>         Attachments: hadoop-4490-design.pdf, HADOOP-4490.patch, HADOOP-4490.patch, HADOOP-4490.patch, HADOOP-4490.patch, HADOOP-4490.patch, HADOOP-4490.patch
>
>
> Currently the TaskTracker spawns the map/reduce tasks, resulting in them running as the user who started the TaskTracker.
> For security and accounting purposes the tasks should be run as the job-owner.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (HADOOP-4490) Map and Reduce tasks should run as the user who submitted the job

Posted by "Nigel Daley (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-4490?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Nigel Daley updated HADOOP-4490:
--------------------------------

    Attachment: hadoop-4490-br20-3.2.patch

Another patch needed if applying this to the 0.20 branch.

> Map and Reduce tasks should run as the user who submitted the job
> -----------------------------------------------------------------
>
>                 Key: HADOOP-4490
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4490
>             Project: Hadoop Core
>          Issue Type: Sub-task
>          Components: mapred, security
>            Reporter: Arun C Murthy
>            Assignee: Hemanth Yamijala
>             Fix For: 0.21.0
>
>         Attachments: cluster_setup.pdf, cluster_setup.pdf, HADOOP-4490-1.patch, HADOOP-4490-1.patch, hadoop-4490-10.patch, hadoop-4490-11.patch, hadoop-4490-12.patch, hadoop-4490-13.patch, hadoop-4490-14.patch, HADOOP-4490-2.patch, HADOOP-4490-3.patch, hadoop-4490-4.patch, hadoop-4490-5.patch, hadoop-4490-6.patch, hadoop-4490-7.patch, hadoop-4490-8.patch, hadoop-4490-9.patch, hadoop-4490-br20-3.2.patch, hadoop-4490-br20-3.patch, hadoop-4490-design.pdf, HADOOP-4490.patch, HADOOP-4490.patch, HADOOP-4490.patch, HADOOP-4490.patch, HADOOP-4490.patch, HADOOP-4490.patch, HADOOP-4490.patch, HADOOP-4490_streaming.patch
>
>
> Currently the TaskTracker spawns the map/reduce tasks, resulting in them running as the user who started the TaskTracker.
> For security and accounting purposes the tasks should be run as the job-owner.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-4490) Map and Reduce tasks should run as the user who submitted the job

Posted by "Allen Wittenauer (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-4490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12654530#action_12654530 ] 

Allen Wittenauer commented on HADOOP-4490:
------------------------------------------

The user who submits the job should be the user who runs the code on the compute nodes due to issues that surround the environment outside Haddop.  For example, it is possible to submit a job that writes junk data to the low priv user's home dir.  Without tracking who submitted that job, ops would never know who to go bonk on the head.

... and then there is streaming.

 I can think of instances where it might be useful to have generic accounts run stuff.  In those instances, it is still much better to have that handled outside Hadoop. [Either through setuid scripts, roles, sudo, kinit a special keytab prior to job submit, whatever.]  Let the OS/tool/ops team/whatever deal with the accounting in those situations.

> Map and Reduce tasks should run as the user who submitted the job
> -----------------------------------------------------------------
>
>                 Key: HADOOP-4490
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4490
>             Project: Hadoop Core
>          Issue Type: Sub-task
>          Components: mapred, security
>            Reporter: Arun C Murthy
>            Assignee: Hemanth Yamijala
>
> Currently the TaskTracker spawns the map/reduce tasks, resulting in them running as the user who started the TaskTracker.
> For security and accounting purposes the tasks should be run as the job-owner.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (HADOOP-4490) Map and Reduce tasks should run as the user who submitted the job

Posted by "Hemanth Yamijala (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-4490?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hemanth Yamijala updated HADOOP-4490:
-------------------------------------

    Status: Open  (was: Patch Available)

> Map and Reduce tasks should run as the user who submitted the job
> -----------------------------------------------------------------
>
>                 Key: HADOOP-4490
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4490
>             Project: Hadoop Core
>          Issue Type: Sub-task
>          Components: mapred, security
>            Reporter: Arun C Murthy
>            Assignee: Hemanth Yamijala
>         Attachments: hadoop-4490-design.pdf, HADOOP-4490.patch, HADOOP-4490.patch, HADOOP-4490.patch, HADOOP-4490.patch, HADOOP-4490.patch, HADOOP-4490.patch, HADOOP-4490.patch
>
>
> Currently the TaskTracker spawns the map/reduce tasks, resulting in them running as the user who started the TaskTracker.
> For security and accounting purposes the tasks should be run as the job-owner.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-4490) Map and Reduce tasks should run as the user who submitted the job

Posted by "Owen O'Malley (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-4490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12654482#action_12654482 ] 

Owen O'Malley commented on HADOOP-4490:
---------------------------------------

Hemanth,
  This sounds good, but from a security standpoint, I think that it would be better to make the tasks more specific. So something like:

CREATE_TASK_DIR owen task_20080101_0001_m_000001_1
MOVE_TASK_OUTPUT owen task_20080101_0001_m_000001_1
REMOVE_TASK_DIR owen task_20080101_0001_m_000001_1

It would also be good to have the task tracker root directories in a separate config file that can be owned by root. My goal is to make this executable as limited as possible. It should also block root as the user. What I do *not* want to see is having this work:

MOVE_FILES root /tmp/foo /etc/passwd


> Map and Reduce tasks should run as the user who submitted the job
> -----------------------------------------------------------------
>
>                 Key: HADOOP-4490
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4490
>             Project: Hadoop Core
>          Issue Type: Sub-task
>          Components: mapred, security
>            Reporter: Arun C Murthy
>            Assignee: Hemanth Yamijala
>
> Currently the TaskTracker spawns the map/reduce tasks, resulting in them running as the user who started the TaskTracker.
> For security and accounting purposes the tasks should be run as the job-owner.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (HADOOP-4490) Map and Reduce tasks should run as the user who submitted the job

Posted by "Sreekanth Ramakrishnan (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-4490?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sreekanth Ramakrishnan updated HADOOP-4490:
-------------------------------------------

    Attachment: hadoop-4490-14.patch

Making the initalizeTask call explicit in runChild.

> Map and Reduce tasks should run as the user who submitted the job
> -----------------------------------------------------------------
>
>                 Key: HADOOP-4490
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4490
>             Project: Hadoop Core
>          Issue Type: Sub-task
>          Components: mapred, security
>            Reporter: Arun C Murthy
>            Assignee: Hemanth Yamijala
>             Fix For: 0.21.0
>
>         Attachments: cluster_setup.pdf, cluster_setup.pdf, HADOOP-4490-1.patch, HADOOP-4490-1.patch, hadoop-4490-10.patch, hadoop-4490-11.patch, hadoop-4490-12.patch, hadoop-4490-13.patch, hadoop-4490-14.patch, HADOOP-4490-2.patch, HADOOP-4490-3.patch, hadoop-4490-4.patch, hadoop-4490-5.patch, hadoop-4490-6.patch, hadoop-4490-7.patch, hadoop-4490-8.patch, hadoop-4490-9.patch, hadoop-4490-design.pdf, HADOOP-4490.patch, HADOOP-4490.patch, HADOOP-4490.patch, HADOOP-4490.patch, HADOOP-4490.patch, HADOOP-4490.patch, HADOOP-4490.patch, HADOOP-4490_streaming.patch
>
>
> Currently the TaskTracker spawns the map/reduce tasks, resulting in them running as the user who started the TaskTracker.
> For security and accounting purposes the tasks should be run as the job-owner.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-4490) Map and Reduce tasks should run as the user who submitted the job

Posted by "Hemanth Yamijala (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-4490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12652015#action_12652015 ] 

Hemanth Yamijala commented on HADOOP-4490:
------------------------------------------

Thanks for the comments, Owen.

bq. It should be written in C, not Java to ensure it has enough access to the platform to actually be secure. In particular, it has to clear both real and effective user ids.
Yes, I had that in mind. Specifically, I was planning to do something like setuid(getpwnam(user_name)->pw_uid). Since this would be done by a program running as superuser (the setuid exe), it would clear both the real and effective uids.

bq. I'd like to see the proposed list of commands for the setuid program.
Sure, I will work on that and post the list here. In order to be reasonably complete, I think I should have a version that's working. So, I will start prototyping on the lines I described above.

> Map and Reduce tasks should run as the user who submitted the job
> -----------------------------------------------------------------
>
>                 Key: HADOOP-4490
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4490
>             Project: Hadoop Core
>          Issue Type: Sub-task
>          Components: mapred, security
>            Reporter: Arun C Murthy
>            Assignee: Hemanth Yamijala
>             Fix For: 0.20.0
>
>
> Currently the TaskTracker spawns the map/reduce tasks, resulting in them running as the user who started the TaskTracker.
> For security and accounting purposes the tasks should be run as the job-owner.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-4490) Map and Reduce tasks should run as the user who submitted the job

Posted by "Hemanth Yamijala (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-4490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12679158#action_12679158 ] 

Hemanth Yamijala commented on HADOOP-4490:
------------------------------------------

Mahadev, thanks for debugging this issue. The problem was that the permissions being used in LinuxTaskController were making an assumption that files submitted to the cluster, say via streaming jobs, or in distributed cache, will *already* be executable if they need to be executed as part of the tasks. However, there seem to be scenarios in which this is not mandated, and the framework handles this in the default case (when user tasks are run as the tasktracker itself). The right fix is to modify the permissions without this assumption. If we do that, I guess the change in RunJar is not necessary. One more concern is that the RunJar change would fix the problem only for jar files, but not for other types of archives. We are testing this (with the failed streaming test case) with the modified permissions in LinuxTaskController. If that works, I suggest we just use the new version of the LinuxTaskController itself.

Again, thanks for debugging this issue !

> Map and Reduce tasks should run as the user who submitted the job
> -----------------------------------------------------------------
>
>                 Key: HADOOP-4490
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4490
>             Project: Hadoop Core
>          Issue Type: Sub-task
>          Components: mapred, security
>            Reporter: Arun C Murthy
>            Assignee: Hemanth Yamijala
>             Fix For: 0.21.0
>
>         Attachments: hadoop-4490-design.pdf, HADOOP-4490.patch, HADOOP-4490.patch, HADOOP-4490.patch, HADOOP-4490.patch, HADOOP-4490.patch, HADOOP-4490.patch, HADOOP-4490.patch, HADOOP-4490_streaming.patch
>
>
> Currently the TaskTracker spawns the map/reduce tasks, resulting in them running as the user who started the TaskTracker.
> For security and accounting purposes the tasks should be run as the job-owner.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (HADOOP-4490) Map and Reduce tasks should run as the user who submitted the job

Posted by "Sreekanth Ramakrishnan (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-4490?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sreekanth Ramakrishnan updated HADOOP-4490:
-------------------------------------------

    Attachment: HADOOP-4490-3.patch

Attaching a patch which resolves the NullPointerException raised when TT is re-inted. The problem was with race condition in which JVMRunner was issued kill before it launched the task. The patch now checks if the inital context is set. i.e task is launched if not, it merely updates the data structures.

> Map and Reduce tasks should run as the user who submitted the job
> -----------------------------------------------------------------
>
>                 Key: HADOOP-4490
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4490
>             Project: Hadoop Core
>          Issue Type: Sub-task
>          Components: mapred, security
>            Reporter: Arun C Murthy
>            Assignee: Hemanth Yamijala
>             Fix For: 0.21.0
>
>         Attachments: HADOOP-4490-1.patch, HADOOP-4490-1.patch, HADOOP-4490-2.patch, HADOOP-4490-3.patch, hadoop-4490-design.pdf, HADOOP-4490.patch, HADOOP-4490.patch, HADOOP-4490.patch, HADOOP-4490.patch, HADOOP-4490.patch, HADOOP-4490.patch, HADOOP-4490.patch, HADOOP-4490_streaming.patch
>
>
> Currently the TaskTracker spawns the map/reduce tasks, resulting in them running as the user who started the TaskTracker.
> For security and accounting purposes the tasks should be run as the job-owner.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (HADOOP-4490) Map and Reduce tasks should run as the user who submitted the job

Posted by "Sreekanth Ramakrishnan (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-4490?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sreekanth Ramakrishnan updated HADOOP-4490:
-------------------------------------------

    Attachment: hadoop-4490-6.patch

Latest Merged patch.

> Map and Reduce tasks should run as the user who submitted the job
> -----------------------------------------------------------------
>
>                 Key: HADOOP-4490
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4490
>             Project: Hadoop Core
>          Issue Type: Sub-task
>          Components: mapred, security
>            Reporter: Arun C Murthy
>            Assignee: Hemanth Yamijala
>             Fix For: 0.21.0
>
>         Attachments: cluster_setup.pdf, HADOOP-4490-1.patch, HADOOP-4490-1.patch, HADOOP-4490-2.patch, HADOOP-4490-3.patch, hadoop-4490-4.patch, hadoop-4490-5.patch, hadoop-4490-6.patch, hadoop-4490-design.pdf, HADOOP-4490.patch, HADOOP-4490.patch, HADOOP-4490.patch, HADOOP-4490.patch, HADOOP-4490.patch, HADOOP-4490.patch, HADOOP-4490.patch, HADOOP-4490_streaming.patch
>
>
> Currently the TaskTracker spawns the map/reduce tasks, resulting in them running as the user who started the TaskTracker.
> For security and accounting purposes the tasks should be run as the job-owner.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (HADOOP-4490) Map and Reduce tasks should run as the user who submitted the job

Posted by "Hemanth Yamijala (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-4490?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hemanth Yamijala updated HADOOP-4490:
-------------------------------------

    Attachment: HADOOP-4490.patch

> Map and Reduce tasks should run as the user who submitted the job
> -----------------------------------------------------------------
>
>                 Key: HADOOP-4490
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4490
>             Project: Hadoop Core
>          Issue Type: Sub-task
>          Components: mapred, security
>            Reporter: Arun C Murthy
>            Assignee: Hemanth Yamijala
>         Attachments: hadoop-4490-design.pdf, HADOOP-4490.patch, HADOOP-4490.patch, HADOOP-4490.patch, HADOOP-4490.patch, HADOOP-4490.patch
>
>
> Currently the TaskTracker spawns the map/reduce tasks, resulting in them running as the user who started the TaskTracker.
> For security and accounting purposes the tasks should be run as the job-owner.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-4490) Map and Reduce tasks should run as the user who submitted the job

Posted by "Sreekanth Ramakrishnan (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-4490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12697375#action_12697375 ] 

Sreekanth Ramakrishnan commented on HADOOP-4490:
------------------------------------------------

The contrib test failures are not related to the patch.

> Map and Reduce tasks should run as the user who submitted the job
> -----------------------------------------------------------------
>
>                 Key: HADOOP-4490
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4490
>             Project: Hadoop Core
>          Issue Type: Sub-task
>          Components: mapred, security
>            Reporter: Arun C Murthy
>            Assignee: Hemanth Yamijala
>             Fix For: 0.21.0
>
>         Attachments: cluster_setup.pdf, cluster_setup.pdf, HADOOP-4490-1.patch, HADOOP-4490-1.patch, HADOOP-4490-2.patch, HADOOP-4490-3.patch, hadoop-4490-4.patch, hadoop-4490-5.patch, hadoop-4490-6.patch, hadoop-4490-7.patch, hadoop-4490-8.patch, hadoop-4490-9.patch, hadoop-4490-design.pdf, HADOOP-4490.patch, HADOOP-4490.patch, HADOOP-4490.patch, HADOOP-4490.patch, HADOOP-4490.patch, HADOOP-4490.patch, HADOOP-4490.patch, HADOOP-4490_streaming.patch
>
>
> Currently the TaskTracker spawns the map/reduce tasks, resulting in them running as the user who started the TaskTracker.
> For security and accounting purposes the tasks should be run as the job-owner.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (HADOOP-4490) Map and Reduce tasks should run as the user who submitted the job

Posted by "Robert Chansler (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-4490?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Chansler updated HADOOP-4490:
------------------------------------

    Attachment: hadoop-4490-br20-3.patch

Attached example for earlier version not to be committed.


> Map and Reduce tasks should run as the user who submitted the job
> -----------------------------------------------------------------
>
>                 Key: HADOOP-4490
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4490
>             Project: Hadoop Core
>          Issue Type: Sub-task
>          Components: mapred, security
>            Reporter: Arun C Murthy
>            Assignee: Hemanth Yamijala
>             Fix For: 0.21.0
>
>         Attachments: cluster_setup.pdf, cluster_setup.pdf, HADOOP-4490-1.patch, HADOOP-4490-1.patch, hadoop-4490-10.patch, hadoop-4490-11.patch, hadoop-4490-12.patch, hadoop-4490-13.patch, hadoop-4490-14.patch, HADOOP-4490-2.patch, HADOOP-4490-3.patch, hadoop-4490-4.patch, hadoop-4490-5.patch, hadoop-4490-6.patch, hadoop-4490-7.patch, hadoop-4490-8.patch, hadoop-4490-9.patch, hadoop-4490-br20-3.patch, hadoop-4490-design.pdf, HADOOP-4490.patch, HADOOP-4490.patch, HADOOP-4490.patch, HADOOP-4490.patch, HADOOP-4490.patch, HADOOP-4490.patch, HADOOP-4490.patch, HADOOP-4490_streaming.patch
>
>
> Currently the TaskTracker spawns the map/reduce tasks, resulting in them running as the user who started the TaskTracker.
> For security and accounting purposes the tasks should be run as the job-owner.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-4490) Map and Reduce tasks should run as the user who submitted the job

Posted by "Hemanth Yamijala (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-4490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12654802#action_12654802 ] 

Hemanth Yamijala commented on HADOOP-4490:
------------------------------------------

bq.CREATE_TASK_DIR owen task_20080101_0001_m_000001_1

Owen, sure this makes sense. Just one point is that I might need the job id along with the task id. But that's still within the same spirit.

I will also make the other changes you have suggested like blocking root user.

I've also added a CLEANUP_TASK_DIR command to the executable which is now able to cleanup the directories after task is completed. This is called from the CleanupQueue thread in task tracker.



> Map and Reduce tasks should run as the user who submitted the job
> -----------------------------------------------------------------
>
>                 Key: HADOOP-4490
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4490
>             Project: Hadoop Core
>          Issue Type: Sub-task
>          Components: mapred, security
>            Reporter: Arun C Murthy
>            Assignee: Hemanth Yamijala
>
> Currently the TaskTracker spawns the map/reduce tasks, resulting in them running as the user who started the TaskTracker.
> For security and accounting purposes the tasks should be run as the job-owner.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-4490) Map and Reduce tasks should run as the user who submitted the job

Posted by "Hemanth Yamijala (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-4490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12663257#action_12663257 ] 

Hemanth Yamijala commented on HADOOP-4490:
------------------------------------------

The latest patch also adds correct permissions to files localized as part of the distributed cache. In order to do this, I introduced a new API in distributed cache to indicate whether the files were localized freshly (i.e. as part of the current task's localization), or whether they are already existing in cache. I use this API to avoid setting permissions on the same cache files repeatedly for each task. If there's a better way to do this, I would be glad to know that.

All changes made in the previous patch still hold good, except for minor refactoring.

This patch is now complete to the best of my knowledge. Please do offer your comments.

> Map and Reduce tasks should run as the user who submitted the job
> -----------------------------------------------------------------
>
>                 Key: HADOOP-4490
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4490
>             Project: Hadoop Core
>          Issue Type: Sub-task
>          Components: mapred, security
>            Reporter: Arun C Murthy
>            Assignee: Hemanth Yamijala
>         Attachments: hadoop-4490-design.pdf, HADOOP-4490.patch, HADOOP-4490.patch, HADOOP-4490.patch
>
>
> Currently the TaskTracker spawns the map/reduce tasks, resulting in them running as the user who started the TaskTracker.
> For security and accounting purposes the tasks should be run as the job-owner.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (HADOOP-4490) Map and Reduce tasks should run as the user who submitted the job

Posted by "Hemanth Yamijala (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-4490?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hemanth Yamijala updated HADOOP-4490:
-------------------------------------

      Resolution: Fixed
    Release Note: Added the ability to run tasks as users who submitted the job.
    Hadoop Flags: [Reviewed]
          Status: Resolved  (was: Patch Available)

I just committed this. Thanks, Sreekanth !

> Map and Reduce tasks should run as the user who submitted the job
> -----------------------------------------------------------------
>
>                 Key: HADOOP-4490
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4490
>             Project: Hadoop Core
>          Issue Type: Sub-task
>          Components: mapred, security
>            Reporter: Arun C Murthy
>            Assignee: Hemanth Yamijala
>             Fix For: 0.21.0
>
>         Attachments: cluster_setup.pdf, cluster_setup.pdf, HADOOP-4490-1.patch, HADOOP-4490-1.patch, hadoop-4490-10.patch, hadoop-4490-11.patch, hadoop-4490-12.patch, hadoop-4490-13.patch, hadoop-4490-14.patch, HADOOP-4490-2.patch, HADOOP-4490-3.patch, hadoop-4490-4.patch, hadoop-4490-5.patch, hadoop-4490-6.patch, hadoop-4490-7.patch, hadoop-4490-8.patch, hadoop-4490-9.patch, hadoop-4490-design.pdf, HADOOP-4490.patch, HADOOP-4490.patch, HADOOP-4490.patch, HADOOP-4490.patch, HADOOP-4490.patch, HADOOP-4490.patch, HADOOP-4490.patch, HADOOP-4490_streaming.patch
>
>
> Currently the TaskTracker spawns the map/reduce tasks, resulting in them running as the user who started the TaskTracker.
> For security and accounting purposes the tasks should be run as the job-owner.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-4490) Map and Reduce tasks should run as the user who submitted the job

Posted by "Sreekanth Ramakrishnan (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-4490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12696883#action_12696883 ] 

Sreekanth Ramakrishnan commented on HADOOP-4490:
------------------------------------------------

Attaching output of the ant test patch:

{noformat}
     [exec] 
     [exec] -1 overall.  
     [exec] 
     [exec]     +1 @author.  The patch does not contain any @author tags.
     [exec] 
     [exec]     -1 tests included.  The patch doesn't appear to include any new or modified tests.
     [exec]                         Please justify why no tests are needed for this patch.
     [exec] 
     [exec]     +1 javadoc.  The javadoc tool did not generate any warning messages.
     [exec] 
     [exec]     +1 javac.  The applied patch does not increase the total number of javac compiler warnings.
     [exec] 
     [exec]     +1 findbugs.  The patch does not introduce any new Findbugs warnings.
     [exec] 
     [exec]     +1 Eclipse classpath. The patch retains Eclipse classpath integrity.
     [exec] 
     [exec]     +1 release audit.  The applied patch does not increase the total number of release audit warnings.
{noformat}


> Map and Reduce tasks should run as the user who submitted the job
> -----------------------------------------------------------------
>
>                 Key: HADOOP-4490
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4490
>             Project: Hadoop Core
>          Issue Type: Sub-task
>          Components: mapred, security
>            Reporter: Arun C Murthy
>            Assignee: Hemanth Yamijala
>             Fix For: 0.21.0
>
>         Attachments: cluster_setup.pdf, cluster_setup.pdf, HADOOP-4490-1.patch, HADOOP-4490-1.patch, HADOOP-4490-2.patch, HADOOP-4490-3.patch, hadoop-4490-4.patch, hadoop-4490-5.patch, hadoop-4490-6.patch, hadoop-4490-7.patch, hadoop-4490-8.patch, hadoop-4490-9.patch, hadoop-4490-design.pdf, HADOOP-4490.patch, HADOOP-4490.patch, HADOOP-4490.patch, HADOOP-4490.patch, HADOOP-4490.patch, HADOOP-4490.patch, HADOOP-4490.patch, HADOOP-4490_streaming.patch
>
>
> Currently the TaskTracker spawns the map/reduce tasks, resulting in them running as the user who started the TaskTracker.
> For security and accounting purposes the tasks should be run as the job-owner.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (HADOOP-4490) Map and Reduce tasks should run as the user who submitted the job

Posted by "Mahadev konar (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-4490?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mahadev konar updated HADOOP-4490:
----------------------------------

    Attachment: HADOOP-4490_streaming.patch

we were running with this patch and had problems with streaming scripts being set to 666. This patch fixes that to make all the files unjarred to be  readable and executable by all.


> Map and Reduce tasks should run as the user who submitted the job
> -----------------------------------------------------------------
>
>                 Key: HADOOP-4490
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4490
>             Project: Hadoop Core
>          Issue Type: Sub-task
>          Components: mapred, security
>            Reporter: Arun C Murthy
>            Assignee: Hemanth Yamijala
>             Fix For: 0.21.0
>
>         Attachments: hadoop-4490-design.pdf, HADOOP-4490.patch, HADOOP-4490.patch, HADOOP-4490.patch, HADOOP-4490.patch, HADOOP-4490.patch, HADOOP-4490.patch, HADOOP-4490.patch, HADOOP-4490_streaming.patch
>
>
> Currently the TaskTracker spawns the map/reduce tasks, resulting in them running as the user who started the TaskTracker.
> For security and accounting purposes the tasks should be run as the job-owner.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-4490) Map and Reduce tasks should run as the user who submitted the job

Posted by "Doug Cutting (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-4490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12654496#action_12654496 ] 

Doug Cutting commented on HADOOP-4490:
--------------------------------------

> Steve: have some low-privilege user for running work; there isn't a 1:1 mapping of grid users to user accounts
> Owen: running as the real user is important. If I run a job, I should not be able to look at or kill your job's data or tasks

Might it be possible to have a pool of low-privileged users, to remove the requirement that every user has an account on every machine?  Or maybe that requirement's not that onerous, with PAM/LDAP?


> Map and Reduce tasks should run as the user who submitted the job
> -----------------------------------------------------------------
>
>                 Key: HADOOP-4490
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4490
>             Project: Hadoop Core
>          Issue Type: Sub-task
>          Components: mapred, security
>            Reporter: Arun C Murthy
>            Assignee: Hemanth Yamijala
>
> Currently the TaskTracker spawns the map/reduce tasks, resulting in them running as the user who started the TaskTracker.
> For security and accounting purposes the tasks should be run as the job-owner.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (HADOOP-4490) Map and Reduce tasks should run as the user who submitted the job

Posted by "Sreekanth Ramakrishnan (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-4490?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sreekanth Ramakrishnan updated HADOOP-4490:
-------------------------------------------

    Attachment: hadoop-4490-7.patch

Attaching patch addressing Review comments by Hemanth.

> Map and Reduce tasks should run as the user who submitted the job
> -----------------------------------------------------------------
>
>                 Key: HADOOP-4490
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4490
>             Project: Hadoop Core
>          Issue Type: Sub-task
>          Components: mapred, security
>            Reporter: Arun C Murthy
>            Assignee: Hemanth Yamijala
>             Fix For: 0.21.0
>
>         Attachments: cluster_setup.pdf, HADOOP-4490-1.patch, HADOOP-4490-1.patch, HADOOP-4490-2.patch, HADOOP-4490-3.patch, hadoop-4490-4.patch, hadoop-4490-5.patch, hadoop-4490-6.patch, hadoop-4490-7.patch, hadoop-4490-design.pdf, HADOOP-4490.patch, HADOOP-4490.patch, HADOOP-4490.patch, HADOOP-4490.patch, HADOOP-4490.patch, HADOOP-4490.patch, HADOOP-4490.patch, HADOOP-4490_streaming.patch
>
>
> Currently the TaskTracker spawns the map/reduce tasks, resulting in them running as the user who started the TaskTracker.
> For security and accounting purposes the tasks should be run as the job-owner.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (HADOOP-4490) Map and Reduce tasks should run as the user who submitted the job

Posted by "Sreekanth Ramakrishnan (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-4490?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sreekanth Ramakrishnan updated HADOOP-4490:
-------------------------------------------

    Status: Patch Available  (was: Open)

Running thro' Hudson.

> Map and Reduce tasks should run as the user who submitted the job
> -----------------------------------------------------------------
>
>                 Key: HADOOP-4490
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4490
>             Project: Hadoop Core
>          Issue Type: Sub-task
>          Components: mapred, security
>            Reporter: Arun C Murthy
>            Assignee: Hemanth Yamijala
>             Fix For: 0.21.0
>
>         Attachments: cluster_setup.pdf, cluster_setup.pdf, HADOOP-4490-1.patch, HADOOP-4490-1.patch, hadoop-4490-10.patch, HADOOP-4490-2.patch, HADOOP-4490-3.patch, hadoop-4490-4.patch, hadoop-4490-5.patch, hadoop-4490-6.patch, hadoop-4490-7.patch, hadoop-4490-8.patch, hadoop-4490-9.patch, hadoop-4490-design.pdf, HADOOP-4490.patch, HADOOP-4490.patch, HADOOP-4490.patch, HADOOP-4490.patch, HADOOP-4490.patch, HADOOP-4490.patch, HADOOP-4490.patch, HADOOP-4490_streaming.patch
>
>
> Currently the TaskTracker spawns the map/reduce tasks, resulting in them running as the user who started the TaskTracker.
> For security and accounting purposes the tasks should be run as the job-owner.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-4490) Map and Reduce tasks should run as the user who submitted the job

Posted by "Hemanth Yamijala (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-4490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12642043#action_12642043 ] 

Hemanth Yamijala commented on HADOOP-4490:
------------------------------------------

I think HADOOP-4451 is either related or a duplicate of this. Correct ?

> Map and Reduce tasks should run as the user who submitted the job
> -----------------------------------------------------------------
>
>                 Key: HADOOP-4490
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4490
>             Project: Hadoop Core
>          Issue Type: Sub-task
>          Components: mapred, security
>            Reporter: Arun C Murthy
>            Assignee: Arun C Murthy
>             Fix For: 0.20.0
>
>
> Currently the TaskTracker spawns the map/reduce tasks, resulting in them running as the user who started the TaskTracker.
> For security and accounting purposes the tasks should be run as the job-owner.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-4490) Map and Reduce tasks should run as the user who submitted the job

Posted by "Hemanth Yamijala (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-4490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12658744#action_12658744 ] 

Hemanth Yamijala commented on HADOOP-4490:
------------------------------------------

The attached patch implements changes in the tasktracker to launch tasks using the setuid executable defined in HADOOP-4930. By doing so, it runs tasks as job owners. The CLI for the setuid exe is:
{code}
 task-controller <user-name> <command-enum-value> <job-id> <task-id> <tasktracker-root>
{code}

As mentioned in comments above, this patch only handles launching and killing of tasks, and does not handle file and directory permissions securely. In fact, it opens up the permissions so that both the tasktracker and task can share files and directories. However, this change is only done when the feature is enabled, and does not affect the default Hadoop behavior. When HADOOP-4491 and other issues are fixed, secure permissions will be replaced.

The changes in the patch include:
- A TaskController class that defines abstract methods for launching and killing tasks
- A DefaultTaskController where a little code from JvmManager has been moved
- A LinuxTaskController which implements the methods by calling the setuid executable of HADOOP-4930.
- A new configuration variable mapred.task.tracker.task-controller to define the specific type of TaskController to use. Defaults to DefaultTaskController.

Tested this on a single node cluster, along with the setuid executable of HADOOP-4930. Will follow-up with testing on larger clusters.

I request a review for the same.

> Map and Reduce tasks should run as the user who submitted the job
> -----------------------------------------------------------------
>
>                 Key: HADOOP-4490
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4490
>             Project: Hadoop Core
>          Issue Type: Sub-task
>          Components: mapred, security
>            Reporter: Arun C Murthy
>            Assignee: Hemanth Yamijala
>         Attachments: HADOOP-4490.patch
>
>
> Currently the TaskTracker spawns the map/reduce tasks, resulting in them running as the user who started the TaskTracker.
> For security and accounting purposes the tasks should be run as the job-owner.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-4490) Map and Reduce tasks should run as the user who submitted the job

Posted by "Hemanth Yamijala (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-4490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12674476#action_12674476 ] 

Hemanth Yamijala commented on HADOOP-4490:
------------------------------------------

Thanks for the review Arun.

I had a discussion with Sreekanth about the changes, and we are proposing the following:

- Introduce a {{TaskTracker.initializeSystemDirs}}. This will create $mapred.local.dir/taskTracker/jobCache/, $mapred.local.dir/taskTracker/archives, and $hadoop.log.dir/userlogs on all relevant disks.
Currently, as per Arun's comments, we'll have this API in TaskTracker, which will be called at Tracker initialization time. If it is felt that this should be per TaskController, then we can easily move this to the TaskController API. I think this may need 777 on the $mapred.local.dir/taskTracker/jobCache/ directory currently because the files would be created both by the task and the tracker - for e.g. the task could create the output directories on a new disk which has yet not been touched by the tracker.

- Introduce a {{TaskController.initializeJob}}. This will be called from {{TaskTracker.localizeJob}}, with the jobid as parameter. This will set up the access for $mapred.local.dir/taskTracker/jobCache/jobid directories on all disks which have been touched by localization.

- Modify {{TaskController.launchTaskJVM}} to set up permissions for the log dir and the pid dir associated with that task. This will remove the call to {{initializeTask}} from the {{JvmManager.runChild}} API

- Modify {{TaskController.initializeTask}} to set up permissions for the log dir, pid dir, and task cache dir for the task. There is no need to set up things for the job, because it's been done in {{initializeJob}} already. We will need to repeat the permission setting for the log dir and pid dir.

- Modify {{DistributedCache.localizeCache}} to set up permissions for the localized files. We propose to recursively set up 755 permissions (hardcoded) for all files under the $mapred.local.dir/taskTracker/archive/ directory for now. This might repeatedly set up permissions for files that are already correctly setup. However, it will keep things simple. If there's a performance issue, it is easy to address it, by setting it only for the files being localized, and by walking up its parent paths. Please let me know if this seems a bad choice.

The above changes will mean we can remove:

- DistributedCacheFileAccessInfo
- FileUtil.setPermissionsForPathComponents
- TaskController.cleanup
- The runningJobs state maintained by LinuxTaskController.

Arun, does this tie in with your expectations ?

> Map and Reduce tasks should run as the user who submitted the job
> -----------------------------------------------------------------
>
>                 Key: HADOOP-4490
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4490
>             Project: Hadoop Core
>          Issue Type: Sub-task
>          Components: mapred, security
>            Reporter: Arun C Murthy
>            Assignee: Hemanth Yamijala
>         Attachments: hadoop-4490-design.pdf, HADOOP-4490.patch, HADOOP-4490.patch, HADOOP-4490.patch, HADOOP-4490.patch, HADOOP-4490.patch, HADOOP-4490.patch, HADOOP-4490.patch
>
>
> Currently the TaskTracker spawns the map/reduce tasks, resulting in them running as the user who started the TaskTracker.
> For security and accounting purposes the tasks should be run as the job-owner.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (HADOOP-4490) Map and Reduce tasks should run as the user who submitted the job

Posted by "Sreekanth Ramakrishnan (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-4490?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sreekanth Ramakrishnan updated HADOOP-4490:
-------------------------------------------

    Status: Open  (was: Patch Available)

> Map and Reduce tasks should run as the user who submitted the job
> -----------------------------------------------------------------
>
>                 Key: HADOOP-4490
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4490
>             Project: Hadoop Core
>          Issue Type: Sub-task
>          Components: mapred, security
>            Reporter: Arun C Murthy
>            Assignee: Hemanth Yamijala
>             Fix For: 0.21.0
>
>         Attachments: cluster_setup.pdf, cluster_setup.pdf, HADOOP-4490-1.patch, HADOOP-4490-1.patch, HADOOP-4490-2.patch, HADOOP-4490-3.patch, hadoop-4490-4.patch, hadoop-4490-5.patch, hadoop-4490-6.patch, hadoop-4490-7.patch, hadoop-4490-8.patch, hadoop-4490-9.patch, hadoop-4490-design.pdf, HADOOP-4490.patch, HADOOP-4490.patch, HADOOP-4490.patch, HADOOP-4490.patch, HADOOP-4490.patch, HADOOP-4490.patch, HADOOP-4490.patch, HADOOP-4490_streaming.patch
>
>
> Currently the TaskTracker spawns the map/reduce tasks, resulting in them running as the user who started the TaskTracker.
> For security and accounting purposes the tasks should be run as the job-owner.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-4490) Map and Reduce tasks should run as the user who submitted the job

Posted by "Andrzej Bialecki (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-4490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12652028#action_12652028 ] 

Andrzej Bialecki  commented on HADOOP-4490:
-------------------------------------------

What about Cygwin / Windows users?

> Map and Reduce tasks should run as the user who submitted the job
> -----------------------------------------------------------------
>
>                 Key: HADOOP-4490
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4490
>             Project: Hadoop Core
>          Issue Type: Sub-task
>          Components: mapred, security
>            Reporter: Arun C Murthy
>            Assignee: Hemanth Yamijala
>             Fix For: 0.20.0
>
>
> Currently the TaskTracker spawns the map/reduce tasks, resulting in them running as the user who started the TaskTracker.
> For security and accounting purposes the tasks should be run as the job-owner.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-4490) Map and Reduce tasks should run as the user who submitted the job

Posted by "Hemanth Yamijala (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-4490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12655922#action_12655922 ] 

Hemanth Yamijala commented on HADOOP-4490:
------------------------------------------

I had an offline discussion with Sameer about how to get this patch in. To make it easier for reviewing, maybe it makes sense to split the task up into multiple sub tasks. Atleast 3 that are identified are:
- Launch and kill tasks (this would involve RUN_TASK and KILL_TASK commands)
- Handle local data securely (this would involve SETUP_TASK and MOVE_TASK_OUTPUT and CLEANUP_TASK commands)
- Handle distributed cache.

In order to get a working launch and kill tasks patch though, the file and directory permissions will need to be opened up to allow access to all users. Each of the other patches will make it more secure. 

Please note that we have discussed the approach of how we will address directory and file permissions (such as intermediate outputs) in this JIRA already. This proposal is only to make it simpler to get some incremental patches in. Would this work ? If yes, I will use this JIRA to handle the first of the three tasks, then use HADOOP-4491 and HADOOP-4493 for the others.

> Map and Reduce tasks should run as the user who submitted the job
> -----------------------------------------------------------------
>
>                 Key: HADOOP-4490
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4490
>             Project: Hadoop Core
>          Issue Type: Sub-task
>          Components: mapred, security
>            Reporter: Arun C Murthy
>            Assignee: Hemanth Yamijala
>
> Currently the TaskTracker spawns the map/reduce tasks, resulting in them running as the user who started the TaskTracker.
> For security and accounting purposes the tasks should be run as the job-owner.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (HADOOP-4490) Map and Reduce tasks should run as the user who submitted the job

Posted by "Sreekanth Ramakrishnan (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-4490?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sreekanth Ramakrishnan updated HADOOP-4490:
-------------------------------------------

    Attachment: cluster_setup.pdf

Uploading built pdf documentation for the change made by the patch for review.

> Map and Reduce tasks should run as the user who submitted the job
> -----------------------------------------------------------------
>
>                 Key: HADOOP-4490
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4490
>             Project: Hadoop Core
>          Issue Type: Sub-task
>          Components: mapred, security
>            Reporter: Arun C Murthy
>            Assignee: Hemanth Yamijala
>             Fix For: 0.21.0
>
>         Attachments: cluster_setup.pdf, cluster_setup.pdf, HADOOP-4490-1.patch, HADOOP-4490-1.patch, HADOOP-4490-2.patch, HADOOP-4490-3.patch, hadoop-4490-4.patch, hadoop-4490-5.patch, hadoop-4490-6.patch, hadoop-4490-7.patch, hadoop-4490-design.pdf, HADOOP-4490.patch, HADOOP-4490.patch, HADOOP-4490.patch, HADOOP-4490.patch, HADOOP-4490.patch, HADOOP-4490.patch, HADOOP-4490.patch, HADOOP-4490_streaming.patch
>
>
> Currently the TaskTracker spawns the map/reduce tasks, resulting in them running as the user who started the TaskTracker.
> For security and accounting purposes the tasks should be run as the job-owner.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (HADOOP-4490) Map and Reduce tasks should run as the user who submitted the job

Posted by "Sreekanth Ramakrishnan (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-4490?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sreekanth Ramakrishnan updated HADOOP-4490:
-------------------------------------------

    Attachment: hadoop-4490-9.patch

Attaching new patch with documentation changes.

> Map and Reduce tasks should run as the user who submitted the job
> -----------------------------------------------------------------
>
>                 Key: HADOOP-4490
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4490
>             Project: Hadoop Core
>          Issue Type: Sub-task
>          Components: mapred, security
>            Reporter: Arun C Murthy
>            Assignee: Hemanth Yamijala
>             Fix For: 0.21.0
>
>         Attachments: cluster_setup.pdf, cluster_setup.pdf, HADOOP-4490-1.patch, HADOOP-4490-1.patch, HADOOP-4490-2.patch, HADOOP-4490-3.patch, hadoop-4490-4.patch, hadoop-4490-5.patch, hadoop-4490-6.patch, hadoop-4490-7.patch, hadoop-4490-8.patch, hadoop-4490-9.patch, hadoop-4490-design.pdf, HADOOP-4490.patch, HADOOP-4490.patch, HADOOP-4490.patch, HADOOP-4490.patch, HADOOP-4490.patch, HADOOP-4490.patch, HADOOP-4490.patch, HADOOP-4490_streaming.patch
>
>
> Currently the TaskTracker spawns the map/reduce tasks, resulting in them running as the user who started the TaskTracker.
> For security and accounting purposes the tasks should be run as the job-owner.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-4490) Map and Reduce tasks should run as the user who submitted the job

Posted by "Hadoop QA (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-4490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12697345#action_12697345 ] 

Hadoop QA commented on HADOOP-4490:
-----------------------------------

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12404914/hadoop-4490-9.patch
  against trunk revision 763247.

    +1 @author.  The patch does not contain any @author tags.

    -1 tests included.  The patch doesn't appear to include any new or modified tests.
                        Please justify why no tests are needed for this patch.

    +1 javadoc.  The javadoc tool did not generate any warning messages.

    +1 javac.  The applied patch does not increase the total number of javac compiler warnings.

    +1 findbugs.  The patch does not introduce any new Findbugs warnings.

    +1 Eclipse classpath. The patch retains Eclipse classpath integrity.

    +1 release audit.  The applied patch does not increase the total number of release audit warnings.

    +1 core tests.  The patch passed core unit tests.

    -1 contrib tests.  The patch failed contrib unit tests.

Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/166/testReport/
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/166/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/166/artifact/trunk/build/test/checkstyle-errors.html
Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/166/console

This message is automatically generated.

> Map and Reduce tasks should run as the user who submitted the job
> -----------------------------------------------------------------
>
>                 Key: HADOOP-4490
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4490
>             Project: Hadoop Core
>          Issue Type: Sub-task
>          Components: mapred, security
>            Reporter: Arun C Murthy
>            Assignee: Hemanth Yamijala
>             Fix For: 0.21.0
>
>         Attachments: cluster_setup.pdf, cluster_setup.pdf, HADOOP-4490-1.patch, HADOOP-4490-1.patch, HADOOP-4490-2.patch, HADOOP-4490-3.patch, hadoop-4490-4.patch, hadoop-4490-5.patch, hadoop-4490-6.patch, hadoop-4490-7.patch, hadoop-4490-8.patch, hadoop-4490-9.patch, hadoop-4490-design.pdf, HADOOP-4490.patch, HADOOP-4490.patch, HADOOP-4490.patch, HADOOP-4490.patch, HADOOP-4490.patch, HADOOP-4490.patch, HADOOP-4490.patch, HADOOP-4490_streaming.patch
>
>
> Currently the TaskTracker spawns the map/reduce tasks, resulting in them running as the user who started the TaskTracker.
> For security and accounting purposes the tasks should be run as the job-owner.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-4490) Map and Reduce tasks should run as the user who submitted the job

Posted by "Owen O'Malley (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-4490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12651616#action_12651616 ] 

Owen O'Malley commented on HADOOP-4490:
---------------------------------------

+1 for a setuid program.

It should be written in C, not Java to ensure it has enough access to the platform to actually be secure. In particular, it has to clear both real and effective user ids.

I'd like to see the proposed list of commands for the setuid program.

No user-specified strings should be included on the command line, to avoid special character attacks.

I agree with Sameer that we should have very tight permissions on the map output and task directories. One of the subcommands should probably be to move the outputs from somewhere like $task/output to somewhere like $tt/output/$job/$task.

Having a plugin that lets us switch between the current pure-java implementation that doesn't change user ids and a setuid implementation sounds reasonable. We should continue to support the non-user-switch by default for clusters run by a single non-root user.


> Map and Reduce tasks should run as the user who submitted the job
> -----------------------------------------------------------------
>
>                 Key: HADOOP-4490
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4490
>             Project: Hadoop Core
>          Issue Type: Sub-task
>          Components: mapred, security
>            Reporter: Arun C Murthy
>            Assignee: Hemanth Yamijala
>             Fix For: 0.20.0
>
>
> Currently the TaskTracker spawns the map/reduce tasks, resulting in them running as the user who started the TaskTracker.
> For security and accounting purposes the tasks should be run as the job-owner.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (HADOOP-4490) Map and Reduce tasks should run as the user who submitted the job

Posted by "Sreekanth Ramakrishnan (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-4490?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sreekanth Ramakrishnan updated HADOOP-4490:
-------------------------------------------

    Status: Patch Available  (was: Open)

> Map and Reduce tasks should run as the user who submitted the job
> -----------------------------------------------------------------
>
>                 Key: HADOOP-4490
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4490
>             Project: Hadoop Core
>          Issue Type: Sub-task
>          Components: mapred, security
>            Reporter: Arun C Murthy
>            Assignee: Hemanth Yamijala
>             Fix For: 0.21.0
>
>         Attachments: cluster_setup.pdf, cluster_setup.pdf, HADOOP-4490-1.patch, HADOOP-4490-1.patch, HADOOP-4490-2.patch, HADOOP-4490-3.patch, hadoop-4490-4.patch, hadoop-4490-5.patch, hadoop-4490-6.patch, hadoop-4490-7.patch, hadoop-4490-8.patch, hadoop-4490-9.patch, hadoop-4490-design.pdf, HADOOP-4490.patch, HADOOP-4490.patch, HADOOP-4490.patch, HADOOP-4490.patch, HADOOP-4490.patch, HADOOP-4490.patch, HADOOP-4490.patch, HADOOP-4490_streaming.patch
>
>
> Currently the TaskTracker spawns the map/reduce tasks, resulting in them running as the user who started the TaskTracker.
> For security and accounting purposes the tasks should be run as the job-owner.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (HADOOP-4490) Map and Reduce tasks should run as the user who submitted the job

Posted by "Sreekanth Ramakrishnan (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-4490?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sreekanth Ramakrishnan updated HADOOP-4490:
-------------------------------------------

    Status: Patch Available  (was: Open)

> Map and Reduce tasks should run as the user who submitted the job
> -----------------------------------------------------------------
>
>                 Key: HADOOP-4490
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4490
>             Project: Hadoop Core
>          Issue Type: Sub-task
>          Components: mapred, security
>            Reporter: Arun C Murthy
>            Assignee: Hemanth Yamijala
>             Fix For: 0.21.0
>
>         Attachments: cluster_setup.pdf, cluster_setup.pdf, HADOOP-4490-1.patch, HADOOP-4490-1.patch, hadoop-4490-10.patch, hadoop-4490-11.patch, hadoop-4490-12.patch, hadoop-4490-13.patch, hadoop-4490-14.patch, HADOOP-4490-2.patch, HADOOP-4490-3.patch, hadoop-4490-4.patch, hadoop-4490-5.patch, hadoop-4490-6.patch, hadoop-4490-7.patch, hadoop-4490-8.patch, hadoop-4490-9.patch, hadoop-4490-design.pdf, HADOOP-4490.patch, HADOOP-4490.patch, HADOOP-4490.patch, HADOOP-4490.patch, HADOOP-4490.patch, HADOOP-4490.patch, HADOOP-4490.patch, HADOOP-4490_streaming.patch
>
>
> Currently the TaskTracker spawns the map/reduce tasks, resulting in them running as the user who started the TaskTracker.
> For security and accounting purposes the tasks should be run as the job-owner.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (HADOOP-4490) Map and Reduce tasks should run as the user who submitted the job

Posted by "Sreekanth Ramakrishnan (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-4490?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sreekanth Ramakrishnan updated HADOOP-4490:
-------------------------------------------

    Attachment: HADOOP-4490-2.patch

Attaching patch incorporating Hemanth's Comment. Tested the patch and exception was not thrown.

> Map and Reduce tasks should run as the user who submitted the job
> -----------------------------------------------------------------
>
>                 Key: HADOOP-4490
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4490
>             Project: Hadoop Core
>          Issue Type: Sub-task
>          Components: mapred, security
>            Reporter: Arun C Murthy
>            Assignee: Hemanth Yamijala
>             Fix For: 0.21.0
>
>         Attachments: HADOOP-4490-1.patch, HADOOP-4490-1.patch, HADOOP-4490-2.patch, hadoop-4490-design.pdf, HADOOP-4490.patch, HADOOP-4490.patch, HADOOP-4490.patch, HADOOP-4490.patch, HADOOP-4490.patch, HADOOP-4490.patch, HADOOP-4490.patch, HADOOP-4490_streaming.patch
>
>
> Currently the TaskTracker spawns the map/reduce tasks, resulting in them running as the user who started the TaskTracker.
> For security and accounting purposes the tasks should be run as the job-owner.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (HADOOP-4490) Map and Reduce tasks should run as the user who submitted the job

Posted by "Sreekanth Ramakrishnan (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-4490?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sreekanth Ramakrishnan updated HADOOP-4490:
-------------------------------------------

    Status: Open  (was: Patch Available)

> Map and Reduce tasks should run as the user who submitted the job
> -----------------------------------------------------------------
>
>                 Key: HADOOP-4490
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4490
>             Project: Hadoop Core
>          Issue Type: Sub-task
>          Components: mapred, security
>            Reporter: Arun C Murthy
>            Assignee: Hemanth Yamijala
>             Fix For: 0.21.0
>
>         Attachments: cluster_setup.pdf, cluster_setup.pdf, HADOOP-4490-1.patch, HADOOP-4490-1.patch, hadoop-4490-10.patch, HADOOP-4490-2.patch, HADOOP-4490-3.patch, hadoop-4490-4.patch, hadoop-4490-5.patch, hadoop-4490-6.patch, hadoop-4490-7.patch, hadoop-4490-8.patch, hadoop-4490-9.patch, hadoop-4490-design.pdf, HADOOP-4490.patch, HADOOP-4490.patch, HADOOP-4490.patch, HADOOP-4490.patch, HADOOP-4490.patch, HADOOP-4490.patch, HADOOP-4490.patch, HADOOP-4490_streaming.patch
>
>
> Currently the TaskTracker spawns the map/reduce tasks, resulting in them running as the user who started the TaskTracker.
> For security and accounting purposes the tasks should be run as the job-owner.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-4490) Map and Reduce tasks should run as the user who submitted the job

Posted by "Steve Loughran (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-4490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12653267#action_12653267 ] 

Steve Loughran commented on HADOOP-4490:
----------------------------------------

You are about to take on one of the big problems they hit in the grid world: identity. all the grid tools (condor, platform, etc) have lots of effort put in at the OS level to create new users on target machines, manage the disk and cpu usage limits of that user, etc. But you also need to propagate identity over the wire, which gets you into SAML and other things. Because right now the JobTracker trusts you to be who you say you are -having caller authentication would be a prerequisite to doing back-end user switching.

If you are interested in running pure Java apps under different rights, this could be done via a security manager. Every task would be started with an explicit security manager/policy that limited what it could do, file and network operations would be checked against the policy. This would be portable and easier to test. It also eliminates the need to run the TT as root, to keep the unix user database in sync with the hadoop user list, etc.




> Map and Reduce tasks should run as the user who submitted the job
> -----------------------------------------------------------------
>
>                 Key: HADOOP-4490
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4490
>             Project: Hadoop Core
>          Issue Type: Sub-task
>          Components: mapred, security
>            Reporter: Arun C Murthy
>            Assignee: Hemanth Yamijala
>
> Currently the TaskTracker spawns the map/reduce tasks, resulting in them running as the user who started the TaskTracker.
> For security and accounting purposes the tasks should be run as the job-owner.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (HADOOP-4490) Map and Reduce tasks should run as the user who submitted the job

Posted by "Sreekanth Ramakrishnan (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-4490?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sreekanth Ramakrishnan updated HADOOP-4490:
-------------------------------------------

    Attachment:     (was: hadoop-4490-6.patch)

> Map and Reduce tasks should run as the user who submitted the job
> -----------------------------------------------------------------
>
>                 Key: HADOOP-4490
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4490
>             Project: Hadoop Core
>          Issue Type: Sub-task
>          Components: mapred, security
>            Reporter: Arun C Murthy
>            Assignee: Hemanth Yamijala
>             Fix For: 0.21.0
>
>         Attachments: cluster_setup.pdf, HADOOP-4490-1.patch, HADOOP-4490-1.patch, HADOOP-4490-2.patch, HADOOP-4490-3.patch, hadoop-4490-4.patch, hadoop-4490-5.patch, hadoop-4490-design.pdf, HADOOP-4490.patch, HADOOP-4490.patch, HADOOP-4490.patch, HADOOP-4490.patch, HADOOP-4490.patch, HADOOP-4490.patch, HADOOP-4490.patch, HADOOP-4490_streaming.patch
>
>
> Currently the TaskTracker spawns the map/reduce tasks, resulting in them running as the user who started the TaskTracker.
> For security and accounting purposes the tasks should be run as the job-owner.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-4490) Map and Reduce tasks should run as the user who submitted the job

Posted by "Hemanth Yamijala (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-4490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12671561#action_12671561 ] 

Hemanth Yamijala commented on HADOOP-4490:
------------------------------------------

The failed contrib test is in Chukwa and is the same as HADOOP-5172.

> Map and Reduce tasks should run as the user who submitted the job
> -----------------------------------------------------------------
>
>                 Key: HADOOP-4490
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4490
>             Project: Hadoop Core
>          Issue Type: Sub-task
>          Components: mapred, security
>            Reporter: Arun C Murthy
>            Assignee: Hemanth Yamijala
>         Attachments: hadoop-4490-design.pdf, HADOOP-4490.patch, HADOOP-4490.patch, HADOOP-4490.patch, HADOOP-4490.patch, HADOOP-4490.patch, HADOOP-4490.patch, HADOOP-4490.patch
>
>
> Currently the TaskTracker spawns the map/reduce tasks, resulting in them running as the user who started the TaskTracker.
> For security and accounting purposes the tasks should be run as the job-owner.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-4490) Map and Reduce tasks should run as the user who submitted the job

Posted by "Hemanth Yamijala (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-4490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12655673#action_12655673 ] 

Hemanth Yamijala commented on HADOOP-4490:
------------------------------------------

I have a version now that runs with JVM reuse enabled also. The main changes to get this to work was to correct the way I was figuring out the current task ids in the JvmManager class. 

Also added a KILL_TASK command. This will look as follows:

KILL_TASK user_name job_id task_attempt_id

This will be called from JvmRunner.kill(), which in turn is called whenever a TT gets a kill task action. Since the JVM process is running as the job owner (different from the TT), we can't directly destroy the JVM process. Instead, what I've done is the following:
- Write a (hidden) .pid file into the task directory when a task is executed. This is owned by the job owner and not readable by anyone else. The pid file contains the JVM's pid.
- When the JVM needs to be killed, we call the taskcontroller executable with the job_id and task_id. 
- The taskcontroller drops privileges to the job owner, then reads the pid file and gets the pid of the jvm.
- Then the taskcontroller issues a kill(pid, SIGTERM) to kill the jvm.

Any concerns with this approach ?

Currently other than distributed cache, all other aspects of the task life cycle are functioning. I'll probably upload a single writeup (as Arun had done for HADOOP-4348) that will capture all the information in comments above for easy reference.

And of course, follow that up with the first patch. *smile*

> Map and Reduce tasks should run as the user who submitted the job
> -----------------------------------------------------------------
>
>                 Key: HADOOP-4490
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4490
>             Project: Hadoop Core
>          Issue Type: Sub-task
>          Components: mapred, security
>            Reporter: Arun C Murthy
>            Assignee: Hemanth Yamijala
>
> Currently the TaskTracker spawns the map/reduce tasks, resulting in them running as the user who started the TaskTracker.
> For security and accounting purposes the tasks should be run as the job-owner.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (HADOOP-4490) Map and Reduce tasks should run as the user who submitted the job

Posted by "Hemanth Yamijala (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-4490?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hemanth Yamijala updated HADOOP-4490:
-------------------------------------

    Fix Version/s:     (was: 0.20.0)

Reseting fix for version, as this will not make the feature freeze.

> Map and Reduce tasks should run as the user who submitted the job
> -----------------------------------------------------------------
>
>                 Key: HADOOP-4490
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4490
>             Project: Hadoop Core
>          Issue Type: Sub-task
>          Components: mapred, security
>            Reporter: Arun C Murthy
>            Assignee: Hemanth Yamijala
>
> Currently the TaskTracker spawns the map/reduce tasks, resulting in them running as the user who started the TaskTracker.
> For security and accounting purposes the tasks should be run as the job-owner.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (HADOOP-4490) Map and Reduce tasks should run as the user who submitted the job

Posted by "Sreekanth Ramakrishnan (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-4490?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sreekanth Ramakrishnan updated HADOOP-4490:
-------------------------------------------

    Attachment: hadoop-4490-13.patch

Attaching patch with following changes:

* FileUtil now uses {{ShellCommandExecutor}} for chmod operations. It supresses the {{IOException}} thrown while doing chmod and logs it if debug enabled.
* Check  before {{TaskController.initializeTask()}}  in  {{JvmManager.getTaskForJVM()}}  to see to that there is no double initalization of same task being done.

> Map and Reduce tasks should run as the user who submitted the job
> -----------------------------------------------------------------
>
>                 Key: HADOOP-4490
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4490
>             Project: Hadoop Core
>          Issue Type: Sub-task
>          Components: mapred, security
>            Reporter: Arun C Murthy
>            Assignee: Hemanth Yamijala
>             Fix For: 0.21.0
>
>         Attachments: cluster_setup.pdf, cluster_setup.pdf, HADOOP-4490-1.patch, HADOOP-4490-1.patch, hadoop-4490-10.patch, hadoop-4490-11.patch, hadoop-4490-12.patch, hadoop-4490-13.patch, HADOOP-4490-2.patch, HADOOP-4490-3.patch, hadoop-4490-4.patch, hadoop-4490-5.patch, hadoop-4490-6.patch, hadoop-4490-7.patch, hadoop-4490-8.patch, hadoop-4490-9.patch, hadoop-4490-design.pdf, HADOOP-4490.patch, HADOOP-4490.patch, HADOOP-4490.patch, HADOOP-4490.patch, HADOOP-4490.patch, HADOOP-4490.patch, HADOOP-4490.patch, HADOOP-4490_streaming.patch
>
>
> Currently the TaskTracker spawns the map/reduce tasks, resulting in them running as the user who started the TaskTracker.
> For security and accounting purposes the tasks should be run as the job-owner.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-4490) Map and Reduce tasks should run as the user who submitted the job

Posted by "Hemanth Yamijala (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-4490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12681701#action_12681701 ] 

Hemanth Yamijala commented on HADOOP-4490:
------------------------------------------

One more change required. I've noticed that when the JVM Manager calls a killTaskJVM, the current working directory for the LinuxTaskController is set to a task attempt directory. This will fail, because this directory will no longer exist. While this is not creating problems as well behaved JVMs will exit themselves, it is not something to be relied on.

The relevant piece of code is this:

{code}
    private ShellCommandExecutor buildTaskControllerExecutor(TaskCommands command, 
                                            String userName, 
                                            List<String> cmdArgs, JvmEnv env) 
                                              throws IOException {

      ...
      if (env != null) {
       shExec = new ShellCommandExecutor(taskControllerCmd,
            env.workDir, env.env);
      }else {
        shExec = new ShellCommandExecutor(taskControllerCmd);
      }
    }
{code}

Setting env.workDir will set the working directory to that particular task attempt directory which may no longer exist. I get the following error when this happens:

2009-03-13 10:39:52,750 WARN org.apache.hadoop.mapred.LinuxTaskController: IOException in killing task: Cannot run program "/path/to/bin/task-controller" (in directory      "/path/to/mapred-local/taskTracker/jobcache/job_200903130908_0051/attempt_200903130908_0051_m_001012_1000/work"): java.io.IOException: error=2, No such file or directory


    


> Map and Reduce tasks should run as the user who submitted the job
> -----------------------------------------------------------------
>
>                 Key: HADOOP-4490
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4490
>             Project: Hadoop Core
>          Issue Type: Sub-task
>          Components: mapred, security
>            Reporter: Arun C Murthy
>            Assignee: Hemanth Yamijala
>             Fix For: 0.21.0
>
>         Attachments: HADOOP-4490-1.patch, HADOOP-4490-1.patch, hadoop-4490-design.pdf, HADOOP-4490.patch, HADOOP-4490.patch, HADOOP-4490.patch, HADOOP-4490.patch, HADOOP-4490.patch, HADOOP-4490.patch, HADOOP-4490.patch, HADOOP-4490_streaming.patch
>
>
> Currently the TaskTracker spawns the map/reduce tasks, resulting in them running as the user who started the TaskTracker.
> For security and accounting purposes the tasks should be run as the job-owner.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Assigned: (HADOOP-4490) Map and Reduce tasks should run as the user who submitted the job

Posted by "Hemanth Yamijala (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-4490?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hemanth Yamijala reassigned HADOOP-4490:
----------------------------------------

    Assignee: Hemanth Yamijala

> Map and Reduce tasks should run as the user who submitted the job
> -----------------------------------------------------------------
>
>                 Key: HADOOP-4490
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4490
>             Project: Hadoop Core
>          Issue Type: Sub-task
>          Components: mapred, security
>            Reporter: Arun C Murthy
>            Assignee: Hemanth Yamijala
>             Fix For: 0.20.0
>
>
> Currently the TaskTracker spawns the map/reduce tasks, resulting in them running as the user who started the TaskTracker.
> For security and accounting purposes the tasks should be run as the job-owner.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-4490) Map and Reduce tasks should run as the user who submitted the job

Posted by "Hemanth Yamijala (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-4490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12655883#action_12655883 ] 

Hemanth Yamijala commented on HADOOP-4490:
------------------------------------------

Thanks, Owen. I will take care of that.

Some discussion is required for handling distributed cache (HADOOP-4493). Firstly, localized files from distributed cache are not localized per job. Since anything can be passed through distributed cache, I think it should support the same level of access control as the rest of the files. That is, they should be changed to be localized per job and subject to the same access control mechanisms we are using for the rest of the files - like output directories etc.

I don't think this is a big impact for users as they can't assume the cache to contain the files they want on the nodes where the task is running. However, from the system perspective, probably if a lot of users (say working on the same project that requires the same data files) want to share this but across multiple jobs, we would be copying only once per node, saving both space and time. If we modify this to be localized per job, we could lose that advantage, no ?

Any thoughts on this trade off ?



> Map and Reduce tasks should run as the user who submitted the job
> -----------------------------------------------------------------
>
>                 Key: HADOOP-4490
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4490
>             Project: Hadoop Core
>          Issue Type: Sub-task
>          Components: mapred, security
>            Reporter: Arun C Murthy
>            Assignee: Hemanth Yamijala
>
> Currently the TaskTracker spawns the map/reduce tasks, resulting in them running as the user who started the TaskTracker.
> For security and accounting purposes the tasks should be run as the job-owner.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (HADOOP-4490) Map and Reduce tasks should run as the user who submitted the job

Posted by "Hemanth Yamijala (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-4490?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hemanth Yamijala updated HADOOP-4490:
-------------------------------------

    Status: Patch Available  (was: Open)

> Map and Reduce tasks should run as the user who submitted the job
> -----------------------------------------------------------------
>
>                 Key: HADOOP-4490
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4490
>             Project: Hadoop Core
>          Issue Type: Sub-task
>          Components: mapred, security
>            Reporter: Arun C Murthy
>            Assignee: Hemanth Yamijala
>         Attachments: hadoop-4490-design.pdf, HADOOP-4490.patch, HADOOP-4490.patch, HADOOP-4490.patch, HADOOP-4490.patch, HADOOP-4490.patch, HADOOP-4490.patch
>
>
> Currently the TaskTracker spawns the map/reduce tasks, resulting in them running as the user who started the TaskTracker.
> For security and accounting purposes the tasks should be run as the job-owner.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (HADOOP-4490) Map and Reduce tasks should run as the user who submitted the job

Posted by "Sreekanth Ramakrishnan (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-4490?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sreekanth Ramakrishnan updated HADOOP-4490:
-------------------------------------------

    Attachment: hadoop-4490-5.patch
                cluster_setup.pdf

Attaching latest patch with following changes:

* Merged the patch with the trunk.
* Added documentation to the cluster_setup page. Created a subsection in Site configuration to list task controller configuration after real world cluster configuration. Attaching the cluster setup document pdf for checking output and text of documentation.

> Map and Reduce tasks should run as the user who submitted the job
> -----------------------------------------------------------------
>
>                 Key: HADOOP-4490
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4490
>             Project: Hadoop Core
>          Issue Type: Sub-task
>          Components: mapred, security
>            Reporter: Arun C Murthy
>            Assignee: Hemanth Yamijala
>             Fix For: 0.21.0
>
>         Attachments: cluster_setup.pdf, HADOOP-4490-1.patch, HADOOP-4490-1.patch, HADOOP-4490-2.patch, HADOOP-4490-3.patch, hadoop-4490-4.patch, hadoop-4490-5.patch, hadoop-4490-design.pdf, HADOOP-4490.patch, HADOOP-4490.patch, HADOOP-4490.patch, HADOOP-4490.patch, HADOOP-4490.patch, HADOOP-4490.patch, HADOOP-4490.patch, HADOOP-4490_streaming.patch
>
>
> Currently the TaskTracker spawns the map/reduce tasks, resulting in them running as the user who started the TaskTracker.
> For security and accounting purposes the tasks should be run as the job-owner.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.