Posted to issues@aurora.apache.org by "Ovidiu Predescu (JIRA)" <ji...@apache.org> on 2015/05/03 22:55:05 UTC

[jira] [Created] (AURORA-1303) Thermos runner broken with non-root account

Ovidiu Predescu created AURORA-1303:
---------------------------------------

             Summary: Thermos runner broken with non-root account
                 Key: AURORA-1303
                 URL: https://issues.apache.org/jira/browse/AURORA-1303
             Project: Aurora
          Issue Type: Bug
          Components: Executor
    Affects Versions: 0.7.0
            Reporter: Ovidiu Predescu


This happens with the latest code from GitHub.

I'm trying to schedule the hello_world example using a non-root role. The thermos_runner crashes when it tries to write the checkpoint in the fetch_package process.

It looks like the runner is executing as the non-root user, but the checkpoint is owned by root.
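
For anyone trying to confirm the same mismatch, here is a minimal diagnostic sketch. It is not part of Aurora or Thermos; the helper name describe_access is made up for illustration. It reports who owns a path, its permission bits, and whether the calling user can write to it.

```python
import os
import pwd
import stat


def describe_access(path):
    """Report owner, permission bits, and write access for `path`.

    Hypothetical helper for diagnosis only; not an Aurora/Thermos API.
    """
    st = os.stat(path)
    try:
        owner = pwd.getpwuid(st.st_uid).pw_name
    except KeyError:
        # The uid has no passwd entry; fall back to the numeric id.
        owner = str(st.st_uid)
    return {
        "path": path,
        "owner": owner,
        "uid": st.st_uid,
        # Permission bits in octal, e.g. "755".
        "mode": "%o" % stat.S_IMODE(st.st_mode),
        # os.access checks against the real uid/gid of the caller.
        "writable": os.access(path, os.W_OK),
    }
```

Run as the non-root user against the checkpoint file and its parent directory under /var/run/thermos, this should show an owner of root and "writable" set to False if the diagnosis above is right. Note that os.access always reports True for root, so the check is only meaningful when run as the affected user.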

Unfortunately, Aurora's error handling hides the problem. The exception thrown by the runner is silently swallowed, and the fetch_package process appears to be running, with no failures shown in the log files. I was only able to figure out what was going on by running the command manually.

As a workaround, I added user 'ovidiu' to group 'root', since the directory containing the checkpoint has 'rwx' permissions for the group.
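
The workaround depends on the directory's group permission bits, which can be checked with a small sketch like the one below. The helper name group_access is hypothetical and not something Aurora provides. It reports the owning group, whether that group has full 'rwx' access, and whether the calling process is currently a member of it.

```python
import grp
import os
import stat


def group_access(path):
    """Report the owning group of `path` and its group permission bits.

    Hypothetical helper for diagnosis only; not an Aurora/Thermos API.
    """
    st = os.stat(path)
    try:
        group = grp.getgrgid(st.st_gid).gr_name
    except KeyError:
        # The gid has no group entry; fall back to the numeric id.
        group = str(st.st_gid)
    return {
        "group": group,
        # True only if read, write, and execute are all set for the group.
        "group_rwx": (st.st_mode & stat.S_IRWXG) == stat.S_IRWXG,
        # Membership as seen by the current process (effective and
        # supplementary groups), not by the group database.
        "caller_in_group": st.st_gid == os.getegid()
        or st.st_gid in os.getgroups(),
    }
```

A process only sees new group memberships after it starts a fresh session, so "caller_in_group" may stay False in shells opened before the change. Adding a user to group 'root' also grants access to everything else that group can reach, so this is best treated as a temporary measure.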

This is the command:

/usr/bin/python2.7 /var/lib/mesos/slaves/20150502-132057-838930604-5050-17297-S23/frameworks/20150502-132057-838930604-5050-17297-0000/executors/thermos-1430629905212-ovidiu-devel-hello_world-0-bc87c672-9cb2-4e4b-84c1-2b7d0e8726c1/runs/68c1af87-c531-424f-9fdb-0840cde02815/thermos_runner.pex --setuid=ovidiu --thermos_json=/var/lib/mesos/slaves/20150502-132057-838930604-5050-17297-S23/frameworks/20150502-132057-838930604-5050-17297-0000/executors/thermos-1430629905212-ovidiu-devel-hello_world-0-bc87c672-9cb2-4e4b-84c1-2b7d0e8726c1/runs/68c1af87-c531-424f-9fdb-0840cde02815/task.json --sandbox=/var/lib/mesos/slaves/20150502-132057-838930604-5050-17297-S23/frameworks/20150502-132057-838930604-5050-17297-0000/executors/thermos-1430629905212-ovidiu-devel-hello_world-0-bc87c672-9cb2-4e4b-84c1-2b7d0e8726c1/runs/68c1af87-c531-424f-9fdb-0840cde02815/sandbox --log_dir=. --task_id=1430629905212-ovidiu-devel-hello_world-0-bc87c672-9cb2-4e4b-84c1-2b7d0e8726c1 --log_to_disk=DEBUG --checkpoint_root=/var/run/thermos --hostname=m1a.dc

And here is the output:

Writing log files to disk in .
ERROR] Found existing runner, cannot take control.
ERROR] Unknown exception: Unable to open checkpoint /var/run/thermos/checkpoints/1430629905212-ovidiu-devel-hello_world-0-bc87c672-9cb2-4e4b-84c1-2b7d0e8726c1/runner
ERROR] Traceback (most recent call last):
ERROR]   File "/var/lib/mesos/slaves/20150502-132057-838930604-5050-17297-S23/frameworks/20150502-132057-838930604-5050-17297-0000/executors/thermos-1430629905212-ovidiu-devel-hello_world-0-bc87c672-9cb2-4e4b-84c1-2b7d0e8726c1/runs/68c1af87-c531-424f-9fdb-0840cde02815/thermos_runner.pex/apache/thermos/bin/thermos_runner.py", line 176, in proxy_main
ERROR]   File "/var/lib/mesos/slaves/20150502-132057-838930604-5050-17297-S23/frameworks/20150502-132057-838930604-5050-17297-0000/executors/thermos-1430629905212-ovidiu-devel-hello_world-0-bc87c672-9cb2-4e4b-84c1-2b7d0e8726c1/runs/68c1af87-c531-424f-9fdb-0840cde02815/thermos_runner.pex/apache/thermos/core/runner.py", line 859, in run
ERROR]     with self.control(force):
ERROR]   File "/usr/lib/python2.7/contextlib.py", line 17, in __enter__
ERROR]     return self.gen.next()
ERROR]   File "/var/lib/mesos/slaves/20150502-132057-838930604-5050-17297-S23/frameworks/20150502-132057-838930604-5050-17297-0000/executors/thermos-1430629905212-ovidiu-devel-hello_world-0-bc87c672-9cb2-4e4b-84c1-2b7d0e8726c1/runs/68c1af87-c531-424f-9fdb-0840cde02815/thermos_runner.pex/apache/thermos/core/runner.py", line 552, in control
ERROR]     raise self.PermissionError('Unable to open checkpoint %s' % ckpt_file)
ERROR] PermissionError: Unable to open checkpoint /var/run/thermos/checkpoints/1430629905212-ovidiu-devel-hello_world-0-bc87c672-9cb2-4e4b-84c1-2b7d0e8726c1/runner




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)