Posted to dev@slider.apache.org by Manoj Samel <ma...@gmail.com> on 2016/05/02 23:25:47 UTC

Update : Run each component of application as different user working - except stop command

Hello,

I wanted to provide an update on the progress I have made so far and the
pending issue ...

1. I have logged https://issues.apache.org/jira/browse/SLIDER-1114 to
describe the use case in detail as a new feature request. Josh - regarding
your suggestion on launching each component as separate app; I have added a
comment in Jira on why that is not feasible - hope that explains the reason
for ask

2. I didn't find any out-of-the-box option to run each component as a
separate user. I have achieved this as follows:

* All original components are the same Java class. Each gets different
parameters via -D as part of the Execute() command, i.e. "java -Dp1 -Dp2
-cp <my class>". One of the parameters is the component's target user
* I developed a C program that takes similar arguments to the original
program. The binary is installed owned by root, with the setuid bit set
* At start, the program does sanity checks and drops privileges to the
intended component user. Then it does an execle() on the java component
* The component runs with that user's effective uid and gid, achieving the
security requirement
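
For illustration, the order of operations in such a launcher can be
sketched as below. This is a minimal Python sketch of the sequence, not the
C program itself: Linux ignores the setuid bit on interpreted scripts, so a
real launcher has to be a compiled binary, and this sketch only works when
it is started as root. The function names resolve_target and drop_and_exec
are hypothetical, and refusing uid 0 is just one example of a sanity check.

```python
import os
import pwd


def resolve_target(username):
    """Sanity check: the target account must exist and must not be root."""
    entry = pwd.getpwnam(username)  # raises KeyError for an unknown user
    if entry.pw_uid == 0:
        raise ValueError("refusing to run the component as root")
    return entry


def drop_and_exec(username, argv, env):
    """Drop from root to `username`, then replace this process with argv."""
    entry = resolve_target(username)
    # Order matters: supplementary groups and the gid must be changed while
    # the process is still root. setuid() comes last because it gives up
    # the privilege that the earlier calls need.
    os.initgroups(username, entry.pw_gid)
    os.setgid(entry.pw_gid)
    os.setuid(entry.pw_uid)
    # Like execle() in C: no new process is created, the component takes
    # over the current process ID.
    os.execve(argv[0], argv, env)
```

A call would look like drop_and_exec("user_A", ["/usr/bin/java", "-Dp1",
"-Dp2", "-cp", "<my class>"], {}), where the java path is an assumption.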

3. This way, I can start the slider AM as some admin user, say "admin", and
start each component as a different non-admin user, e.g. "user_A", "user_B"
etc. This seems to be working fine

4. After the component came up successfully, I found that slider was
somehow not detecting the running component. It kept starting additional
components (the issue I reported last week). The issue was as follows

In the status command of the component class, I was using the
out-of-the-box function check_process_status(pid_file). This function
detects a running component by sending "kill 0" (signal 0, a no-op that
only checks whether a signal could be delivered) to the process ID. There
are two issues with the "kill(pid, 0)" approach

* The "kill 0" (or any signal) can only be sent to a process owned by the
same user (unless you are root). The agent is running as the "admin" user
and the component is ** NOW ** running as a different user, e.g. "user_A".
Thus the kill 0 fails because the agent has no permission to signal the
process, so the slider agent fails to detect the running component and
starts new ones rapidly. I changed the status() command not to use
check_process_status and instead implemented my own checks. It works now -
detecting both running and stopped components. This is progress since my
last mail ...

* The other potential issue with check_process_status(pid_file) as it is
implemented right now is that it only checks whether a process with the PID
listed in the PID file exists. However, the component process may have died
and another process (run by the same user) may have been allocated the same
(now unused) PID. Then check_process_status() will wrongly report that the
component is running. In the status() command I have implemented, I have
more checks based on the "ps --pid" command to ensure that the process is
indeed running the Java component and not some other random command
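
A minimal sketch of a status check that addresses both issues is shown
below. It is not the exact code of the status() command described above: it
assumes Linux and reads /proc/<pid>/cmdline instead of calling "ps --pid",
and the names pid_alive and component_running, as well as the idea of
matching on one known argument (the marker), are hypothetical. The first
function relies on the fact that signal 0 fails with EPERM when the process
exists but belongs to another user, and with ESRCH when there is no such
process.

```python
import errno
import os


def pid_alive(pid):
    """Signal-0 probe that also works when another user owns the process."""
    try:
        os.kill(pid, 0)
    except OSError as err:
        if err.errno == errno.EPERM:
            return True   # process exists, we are just not allowed to signal it
        if err.errno == errno.ESRCH:
            return False  # no such process
        raise
    return True


def component_running(pid, marker):
    """True if `pid` is alive and one of its arguments equals `marker`."""
    if not pid_alive(pid):
        return False
    try:
        with open("/proc/%d/cmdline" % pid, "rb") as handle:
            raw = handle.read()
    except (IOError, OSError):
        return False  # process exited between the two checks
    # Arguments in /proc/<pid>/cmdline are separated by NUL bytes.
    args = raw.decode("utf-8", "replace").split("\0")
    return marker in args
```

The marker could be one of the -D parameters that identifies the component,
so that a recycled PID running some other command is not mistaken for it.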

5. Now the component starts and the status command works. However, the
"slider stop" command is failing to stop the component. I suspect it is the
same or a similar issue to the one I hit in the status command - agent and
component running as separate users, hence failing to stop - but I haven't
had a chance yet to investigate the stop issue. If anyone can think of any
pointers, they are most welcome.


Thanks again for your feedback and support so far !

Manoj


On Mon, Apr 25, 2016 at 7:51 AM, Billie Rinaldi <bi...@gmail.com>
wrote:

> Hi Manoj,
>
> The "/usr/bin/python -S <path to component script.py> START ..." command
> looks normal; that is how the component scripts are executed.  I would
> expect this to be the parent of the "/bin/bash --login -c <Execute
> command>" process.  If you're seeing the python as the parent of the
> Execute command, it is probably not why your process is failing.  One thing
> you can do is add logoutput=True to your Execute command; maybe that will
> show you the error that is happening.
>
> I wonder if the problem is related to the use of execle.  I understand that
> exec commands replace the current process rather than creating a child
> process, like the system command would.  Perhaps if you used the system
> command in your C code, that would produce a different result.
>
> Billie
>
> On Fri, Apr 22, 2016 at 12:17 PM, Manoj Samel <ma...@gmail.com>
> wrote:
>
> > Hello Again !
> >
> > One more observation .. hopefully that triggers some feedback from this
> > forum ...
> >
> > 1) Without the setuid option, the component Execute() command is "java
> > -Dxx -cp yy abc" etc. This runs fine. On the node running the component,
> > I can see this process as well its parent process as "/bin/bash --login
> > -c java -Dxx -cp yy abc" etc. So all is good and parent process is the
> > shell as expected
> >
> > 2) With the setuid option, the component Execute() command is not java
> > but the path to my C executable and its parameters e.g.
> > "/a/b/processlauncher arg1 arg2". When I run this, the parent of this
> > dies quickly -- but I was able to capture the parent process before it
> > dies. The parent is NOT "/bin/bash --login -c " as I was expecting but is
> > "/usr/bin/python -S <path to component script.py> START
> > /xxx/application_1461117905837_0276/container_e13_1461117905837_0276_01_000002/command-2.json
> > /xxx/appcache/application_1461117905837_0276/filecache/11/spas-1.0.0.zip/package
> > /xyz/application_1461117905837_0276/container_e13_1461117905837_0276_01_000002/structured-out-2.json
> > INFO
> > /foo/application_1461117905837_0276/container_e13_1461117905837_0276_01_000002
> >
> > It appears that when the component is a executable, rather than Java (as
> > was in case 1), it is run as Python script !  Any idea why ? Could this
> > be reason why the parent process is dying quickly ?
> >
> > I also tried this with a simple C program as component that does nothing
> > but loops infinitely. I.e. without it being setuid or doing other
> > execle() etc. Even with the simple C binary, I see above behavior. So
> > something different about using a executable rather than Java command as
> > component ??? Should I execute the C binary component in different
> > manner ?
> >
> > Any guidance on this will be really appreciated !!!!
> >
> >
> > Thanks,
> >
> > Manoj
> >
> > ---------- Forwarded message ----------
> > From: Manoj Samel <ma...@gmail.com>
> > Date: Thu, Apr 21, 2016 at 2:40 PM
> > Subject: Need Help !: Run each component of application as different user
> > To: dev@slider.incubator.apache.org
> >
> >
> > Hi,
> >
> > See use case background below
> >
> > I have implemented option 2 mentioned below (as a C program deployed on
> > nodes as setuid root binary). Need help in debugging issue I am seeing
> >
> > Without the setuid option, the execution is
> >
> > 1. Launch Slider AM as user "A"
> > 2. Launch java component using command like "java -cp ....". These run as
> > user "A" as well. Things run well
> >
> > With setuid root option, the execution is
> >
> > 1. Launch slider AM as user "A" as before
> > 2. Instead of launching java program as the component, launch the setuid
> > program as a component. The program gets the end user name , say "B" as
> > parameter. It does a setuid() and setgid() to user "B" (remember, its
> > running as setuid root) and does a "execle()" for the java component,
> > setting java parameters etc.
> >
> > The component comes up fine but I noticed that the "status" command fails
> > ... Digging further, it seems that the parent process dies when I use the
> > setuid
> >
> > With the normal execution, I noticed that there are two processes
> > launched for a component on a node. The first process is "/bin/bash
> > --login -c java ..." coming from my Execute() (which is traced to sliders
> > resource_management/core/shell.py). The child process then is "java xxx".
> > User for both processes is user "A"
> >
> > With the setuid execution, the parent process dies quickly. The child
> > process gets orphaned and gets parent process ID as 1 (and is running as
> > user "B")
> >
> > Any help in identifying why is the parent process dying ?
> >
> > Thanks in advance !!
> >
> > Manoj
> >
> > PS : Please ignore my last mail sent with same title few minutes back. I
> > hit return by mistake when it was not complete :(
> >
> > On Fri, Apr 8, 2016 at 10:30 AM, Manoj Samel <ma...@gmail.com>
> > wrote:
> >
> > > Hello,
> > >
> > > Environment is slider .80 on Hadoop 2.6 secured cluster
> > >
> > > A component is launched for each distinct user of the service (via
> > > upgrade). E.g. when user A accesses service, do a "upgrade" and create
> > > a component for user A. When user B comes, create another component for
> > > user B etc.
> > >
> > > At present, all of these components are launched & run as single linux
> > > user. What are the options to run each component as different user ?
> > >
> > > Following are couple of options I can think of, each involving starting
> > > as root and then setting the uid / gid to the desired user
> > >
> > > 1. Launch the java command using "sudo". Then at the start, the Java
> > > program sets its real uid to the target user (passed as option to
> > > program) using a small C function used as JNI call. From then on, it
> > > runs as that effective user for rest of its life. One open research
> > > question is - Since sudo has to be run by a non-root user, then the
> > > sudoer need to be updated to allow this without password. Not yet sure
> > > what command should the sudoer should contain, as this is launched by
> > > python class.
> > >
> > > 2. Have a C executable that is setUID root. Launch this executable as
> > > component (with user as one of the parameter). The first thing it does
> > > is drops its UID to the target user and then does a exec "java xxx",
> > > running java as the target user
> > >
> > > Any other out-of-box options ?
> > > In resource_management/core/resources/system.py, I noticed that class
> > > Execute can take a parameter "user" <  user = ResourceArgument() >. Its
> > > not clear if and how this could be used. In core/shell.py, the logic
> > > around "user" is commented out with comment " Do not su to the supplied
> > > user" ..
> > >
> > > Any feedback on approach / pros / cons / potential issues will be
> > > appreciated !
> > >
> > > Thanks,
> > >
> > > Manoj
> > >
> > >
> > >
> >
>

Re: Update : Run each component of application as different user working - except stop command

Posted by Manoj Samel <ma...@gmail.com>.
Hi Josh,

MapReduce on a secured cluster already works like this - jobs run under the
end users' logins, not under some admin login. This is a key security
feature: it ensures that rogue jobs cannot interfere with each other or
perform unauthorized Linux-level access.

So this feature will bring slider component security to parity with what
traditional MapReduce does. I believe this will be an important feature as
slider gets more adoption for running custom services (beyond HBase etc.,
which can be run as a single user)

Thanks,

Manoj

On Wed, May 4, 2016 at 7:28 AM, Josh Elser <jo...@gmail.com> wrote:

> Manoj Samel wrote:
>
>> 1. I have logged https://issues.apache.org/jira/browse/SLIDER-1114 to
>> describe the use case in detail as a new feature request. Josh - regarding
>> your suggestion on launching each component as separate app; I have added
>> a comment in Jira on why that is not feasible - hope that explains the
>> reason for ask
>>
>
> Thanks for the details, Manoj.
>
> I'm still a little worried about the scope of the change since it's not at
> all what Slider was originally intending to do, but don't let me stop you
> from working it out!
>

Re: Update : Run each component of application as different user working - except stop command

Posted by Josh Elser <jo...@gmail.com>.
Manoj Samel wrote:
> 1. I have logged https://issues.apache.org/jira/browse/SLIDER-1114 to
> describe the use case in detail as a new feature request. Josh - regarding
> your suggestion on launching each component as separate app; I have added a
> comment in Jira on why that is not feasible - hope that explains the reason
> for ask

Thanks for the details, Manoj.

I'm still a little worried about the scope of the change since it's not
at all what Slider was originally intending to do, but don't let me stop
you from working it out!