Posted to issues@mesos.apache.org by "Artem Harutyunyan (JIRA)" <ji...@apache.org> on 2016/05/27 22:31:12 UTC
[jira] [Updated] (MESOS-5195) Docker executor: task logs lost on shutdown
[ https://issues.apache.org/jira/browse/MESOS-5195?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Artem Harutyunyan updated MESOS-5195:
-------------------------------------
Fix Version/s: 1.0.0
> Docker executor: task logs lost on shutdown
> -------------------------------------------
>
> Key: MESOS-5195
> URL: https://issues.apache.org/jira/browse/MESOS-5195
> Project: Mesos
> Issue Type: Bug
> Components: containerization, docker
> Affects Versions: 0.27.2
> Environment: Linux 4.4.2 "Ubuntu 14.04.2 LTS"
> Reporter: Steven Schlansker
> Fix For: 1.0.0
>
>
> When you try to kill a task running in the Docker executor (in our case via Singularity), the task shuts down cleanly but the last lines it writes to standard output / standard error are lost during teardown.
> For example, we run dumb-init. With debugging on, you can see it should write:
> {noformat}
> DEBUG("Forwarded signal %d to children.\n", signum);
> {noformat}
> If you attach strace to the process, you can see it clearly writes the text to stderr. But that message is lost and is never written to the sandbox 'stderr' file.
> We believe the issue starts here, in Docker executor.cpp:
> {code}
> void shutdown(ExecutorDriver* driver)
> {
>   cout << "Shutting down" << endl;
>   if (run.isSome() && !killed) {
>     // The docker daemon might still be in progress starting the
>     // container, therefore we kill both the docker run process
>     // and also ask the daemon to stop the container.
>     // Making a mutable copy of the future so we can call discard.
>     Future<Nothing>(run.get()).discard();
>     stop = docker->stop(containerName, stopTimeout);
>     killed = true;
>   }
> }
> {code}
> Notice how the "run" future is discarded *before* the Docker daemon is told to stop -- now what will discarding it do?
> {code}
> void commandDiscarded(const Subprocess& s, const string& cmd)
> {
>   VLOG(1) << "'" << cmd << "' is being discarded";
>   os::killtree(s.pid(), SIGKILL);
> }
> {code}
> Oops, just sent SIGKILL to the entire process tree...
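As a general illustration of why SIGKILL is a problem here (this is not the Mesos code path, only a minimal POSIX sketch): a process that installs a SIGTERM handler gets a chance to write a final message, while SIGKILL cannot be caught, so nothing more is written. The names `lastWordsAfter`, `onTerm`, and the message text are hypothetical and chosen only for this example.

```cpp
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#include <string>

namespace {
int outFd = -1;  // Write end of the pipe, used by the child's handler.

// Hypothetical handler: emit a final message, then exit immediately.
// Only async-signal-safe calls (write, _exit) are used.
void onTerm(int) {
  const char msg[] = "forwarded signal\n";
  ssize_t ignored = write(outFd, msg, sizeof(msg) - 1);
  (void)ignored;
  _exit(0);
}
}  // namespace

// Forks a child that waits for a signal, sends it `signum`, and returns
// whatever the child managed to write before it died.
std::string lastWordsAfter(int signum) {
  int ready[2];
  int out[2];
  if (pipe(ready) != 0 || pipe(out) != 0) return "<pipe failed>";

  pid_t pid = fork();
  if (pid < 0) return "<fork failed>";

  if (pid == 0) {
    // Child: install the handler, tell the parent we are ready, then wait.
    close(ready[0]);
    close(out[0]);
    outFd = out[1];
    signal(SIGTERM, onTerm);
    ssize_t ignored = write(ready[1], "r", 1);
    (void)ignored;
    for (;;) pause();
  }

  // Parent: wait until the child's handler is installed.
  close(ready[1]);
  close(out[1]);
  char c = 0;
  ssize_t got = read(ready[0], &c, 1);
  (void)got;
  close(ready[0]);

  kill(pid, signum);
  waitpid(pid, nullptr, 0);

  // Drain whatever the child wrote; read() returns 0 once the child is gone.
  std::string text;
  char buf[64];
  ssize_t n;
  while ((n = read(out[0], buf, sizeof(buf))) > 0) {
    text.append(buf, static_cast<size_t>(n));
  }
  close(out[0]);
  return text;
}
```

Calling `lastWordsAfter(SIGTERM)` should return the handler's message, whereas `lastWordsAfter(SIGKILL)` should return an empty string, because the child is terminated before it can write anything.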
> You can see another (harmless?) side effect in the Docker daemon logs, where the daemon never gets a chance to kill the task:
> {noformat}
> ERROR Handler for DELETE /v1.22/containers/mesos-f3bb39fe-8fd9-43d2-80a6-93df6a76807e-S2.0c509380-c326-4ff7-bb68-86a37b54f233 returned error: No such container: mesos-f3bb39fe-8fd9-43d2-80a6-93df6a76807e-S2.0c509380-c326-4ff7-bb68-86a37b54f233
> {noformat}
> I suspect that the fix is to wait for 'docker->stop()' to complete before discarding the 'run' future.
> Happy to provide more information if necessary.
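The ordering suggested in the report (stop first, discard afterwards) might be sketched as follows. This is only an illustration of the sequencing, not the actual Mesos patch: `FakeDocker`, `FakeRun`, and `shutdownStopFirst` are hypothetical stand-ins that record events in order, and they do not use the libprocess `Future` API or the real `Docker` class.

```cpp
#include <functional>
#include <string>
#include <vector>

// Hypothetical stand-in for the Docker client wrapper. It pretends to ask
// the daemon to stop the container and invokes `onStopped` once the
// (simulated) stop has completed.
struct FakeDocker {
  std::vector<std::string>* events;

  void stop(const std::string& containerName,
            const std::function<void()>& onStopped) {
    events->push_back("stop requested: " + containerName);
    events->push_back("stop completed");
    onStopped();
  }
};

// Hypothetical stand-in for the `run` future. In the report, discarding it
// ends in os::killtree(pid, SIGKILL); here it only records the event.
struct FakeRun {
  std::vector<std::string>* events;

  void discard() { events->push_back("run discarded"); }
};

// Proposed ordering: ask the daemon to stop the container first, and
// discard `run` only after the stop has completed.
std::vector<std::string> shutdownStopFirst(const std::string& containerName) {
  std::vector<std::string> events;
  FakeDocker docker{&events};
  FakeRun run{&events};
  docker.stop(containerName, [&run]() { run.discard(); });
  return events;
}
```

With this ordering the recorded events are the stop request, the stop completion, and only then the discard, so the container has a chance to shut down and flush its output before anything is sent SIGKILL.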
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)