Posted to issues@mesos.apache.org by "Anand Mazumdar (JIRA)" <ji...@apache.org> on 2017/01/26 22:44:24 UTC

[jira] [Updated] (MESOS-6989) Docker executor segfaults in ~MesosExecutorDriver()

     [ https://issues.apache.org/jira/browse/MESOS-6989?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Anand Mazumdar updated MESOS-6989:
----------------------------------
    Priority: Blocker  (was: Major)

Moving it to blocker since it does result in a stack trace in the task's stderr. Note that our existing tests might not be catching this because they might not validate that the docker/default executor exits with a zero status code.
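A test can catch this class of bug by asserting on how the executor process itself exited, not only on the task's final state. In the report below, the task reached TASK_FINISHED and the executor crashed only afterwards. The following is a minimal POSIX sketch that assumes the test can obtain the executor's raw `waitpid()` status. `describeWaitStatus`, `exitedCleanly`, and `runInChild` are hypothetical names for illustration, not Mesos test utilities.

```cpp
#include <cerrno>
#include <csignal>
#include <stdexcept>
#include <string>
#include <string.h>  // strsignal (POSIX)
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical helper (not a Mesos API): turn a raw waitpid() status into a
// short description shaped like the agent's "terminated with signal ..." log.
std::string describeWaitStatus(int status) {
  if (WIFEXITED(status)) {
    return "exited with status " + std::to_string(WEXITSTATUS(status));
  }
  if (WIFSIGNALED(status)) {
    std::string text = "terminated with signal ";
    text += strsignal(WTERMSIG(status));  // "Segmentation fault" on glibc
#ifdef WCOREDUMP
    if (WCOREDUMP(status)) {
      text += " (core dumped)";
    }
#endif
    return text;
  }
  return "unrecognized wait status " + std::to_string(status);
}

// Hypothetical test-side check: a clean shutdown is a normal exit with code 0.
// A segfault in a destructor after the task finished shows up as WIFSIGNALED
// with WTERMSIG == SIGSEGV and fails this check.
bool exitedCleanly(int status) {
  return WIFEXITED(status) && WEXITSTATUS(status) == 0;
}

// Runs `body` in a forked child and returns the child's raw wait status.
template <typename Body>
int runInChild(Body body) {
  pid_t pid = fork();
  if (pid < 0) {
    throw std::runtime_error("fork failed");
  }
  if (pid == 0) {
    body();
    _exit(0);  // Only reached if `body` returned normally.
  }
  int status = 0;
  // Retry if waitpid() is interrupted by an unrelated signal.
  while (waitpid(pid, &status, 0) < 0) {
    if (errno != EINTR) {
      throw std::runtime_error("waitpid failed");
    }
  }
  return status;
}
```

In a real test, the status would come from whatever code reaps the executor. Checking a clean exit, rather than only the TASK_FINISHED update, is what would flag a crash that happens after the task has already completed.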

> Docker executor segfaults in ~MesosExecutorDriver()
> ---------------------------------------------------
>
>                 Key: MESOS-6989
>                 URL: https://issues.apache.org/jira/browse/MESOS-6989
>             Project: Mesos
>          Issue Type: Bug
>          Components: docker
>            Reporter: Jan-Philip Gehrcke
>            Assignee: Joseph Wu
>            Priority: Blocker
>              Labels: mesosphere
>
> With the current state of the Mesos master branch (commit 42e515bc5c175a318e914d34473016feda4db6ff), the Docker executor segfaults during shutdown.
> Steps to reproduce:
> 1) Start master:
> {code}
> $ ./bin/mesos-master.sh --ip=127.0.0.1 --work_dir=/tmp/jp/mesos
> WARNING: Logging before InitGoogleLogging() is written to STDERR
> I0125 13:41:15.963775 14744 main.cpp:278] Build: 2017-01-25 13:37:42 by jp
> I0125 13:41:15.963868 14744 main.cpp:279] Version: 1.2.0
> I0125 13:41:15.963877 14744 main.cpp:286] Git SHA: 42e515bc5c175a318e914d34473016feda4db6ff
> {code}
> (note that building it at 13:37 is not part of the repro)
> 2) Start agent:
> {code}
> $ ./bin/mesos-slave.sh --containerizers=mesos,docker --master=127.0.0.1:5050 --work_dir=/tmp/jp/mesos
> {code}
> 3) Run {{mesos-execute}} with the Docker containerizer:
> {code}
> $ ./src/mesos-execute --master=127.0.0.1:5050 --name=testcommand --containerizer=docker --docker_image=debian --command=env
> I0125 13:43:59.704973 14951 scheduler.cpp:184] Version: 1.2.0
> I0125 13:43:59.706425 14952 scheduler.cpp:470] New master detected at master@127.0.0.1:5050
> Subscribed with ID 57596743-06f4-45f1-a975-348cf70589b1-0000
> Submitted task 'testcommand' to agent '57596743-06f4-45f1-a975-348cf70589b1-S0'
> Received status update TASK_RUNNING for task 'testcommand'
>   source: SOURCE_EXECUTOR
> Received status update TASK_FINISHED for task 'testcommand'
>   message: 'Container exited with status 0'
>   source: SOURCE_EXECUTOR
> {code}
> Relevant agent output that shows the executor segfault:
> {code}
> [...]
> I0125 13:44:16.249191 14823 slave.cpp:4328] Got exited event for executor(1)@192.99.40.208:33529
> I0125 13:44:16.347095 14830 docker.cpp:2358] Executor for container 396282a9-7bf0-48ee-ba07-3ff2ca801d53 has exited
> I0125 13:44:16.347127 14830 docker.cpp:2052] Destroying container 396282a9-7bf0-48ee-ba07-3ff2ca801d53
> I0125 13:44:16.347439 14830 docker.cpp:2179] Running docker stop on container 396282a9-7bf0-48ee-ba07-3ff2ca801d53
> I0125 13:44:16.349215 14826 slave.cpp:4691] Executor 'testcommand' of framework 57596743-06f4-45f1-a975-348cf70589b1-0000 terminated with signal Segmentation fault (core dumped)
> [...]
> {code}
> The complete task stderr:
> {code}
> $ cat /tmp/jp/mesos/slaves/57596743-06f4-45f1-a975-348cf70589b1-S0/frameworks/57596743-06f4-45f1-a975-348cf70589b1-0000/executors/testcommand/runs/latest/stderr 
> I0125 13:44:12.850073 15030 exec.cpp:162] Version: 1.2.0
> I0125 13:44:12.864229 15050 exec.cpp:237] Executor registered on agent 57596743-06f4-45f1-a975-348cf70589b1-S0
> I0125 13:44:12.865842 15054 docker.cpp:850] Running docker -H unix:///var/run/docker.sock run --cpu-shares 1024 --memory 134217728 --env-file /tmp/xFZ8G9 -v /tmp/jp/mesos/slaves/57596743-06f4-45f1-a975-348cf70589b1-S0/frameworks/57596743-06f4-45f1-a975-348cf70589b1-0000/executors/testcommand/runs/396282a9-7bf0-48ee-ba07-3ff2ca801d53:/mnt/mesos/sandbox --net host --entrypoint /bin/sh --name mesos-57596743-06f4-45f1-a975-348cf70589b1-S0.396282a9-7bf0-48ee-ba07-3ff2ca801d53 debian -c env
> I0125 13:44:15.248721 15064 exec.cpp:410] Executor asked to shutdown
> *** Aborted at 1485369856 (unix time) try "date -d @1485369856" if you are using GNU date ***
> PC: @     0x7fb38f153dd0 (unknown)
> *** SIGSEGV (@0x68) received by PID 15030 (TID 0x7fb3961a88c0) from PID 104; stack trace: ***
>     @     0x7fb38f15b5c0 (unknown)
>     @     0x7fb38f153dd0 (unknown)
>     @     0x7fb39332c607 __gthread_mutex_lock()
>     @     0x7fb39332c657 __gthread_recursive_mutex_lock()
>     @     0x7fb39332edca std::recursive_mutex::lock()
>     @     0x7fb393337bd8 _ZZ11synchronizeISt15recursive_mutexE12SynchronizedIT_EPS2_ENKUlPS0_E_clES5_
>     @     0x7fb393337bf8 _ZZ11synchronizeISt15recursive_mutexE12SynchronizedIT_EPS2_ENUlPS0_E_4_FUNES5_
>     @     0x7fb39333ba6b Synchronized<>::Synchronized()
>     @     0x7fb393337cac synchronize<>()
>     @     0x7fb39492f15c process::ProcessManager::wait()
>     @     0x7fb3949353f0 process::wait()
>     @     0x55fd63f31fe5 process::wait()
>     @     0x7fb39332ce3c mesos::MesosExecutorDriver::~MesosExecutorDriver()
>     @     0x55fd63f2bd86 main
>     @     0x7fb38e4fc401 __libc_start_main
>     @     0x55fd63f2ab5a _start
> {code}
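The SIGSEGV line reports fault address 0x68 together with "from PID 104". Since 0x68 is 104 in decimal, the "PID" is probably not a real sender. glog's failure handler prints `si_pid`, and on Linux/x86_64 that `int` field shares its offset in the `siginfo_t` union with the pointer field `si_addr`. As a result, for a fault signal it shows the low 32 bits of the fault address. A fault address this close to zero is also consistent with reading a field at offset 0x68 through a null pointer, although the trace alone does not show which object that was. The sketch below only illustrates the aliasing arithmetic. `pidViewOfFaultAddress` is a hypothetical name, and the sketch assumes a little-endian 64-bit host.

```cpp
#include <cstdint>
#include <cstring>

// Illustration only: copy the first four bytes of a pointer-sized value into
// a 32-bit int, mimicking how an int field (si_pid) that overlaps a pointer
// field (si_addr) at the start of the same union reads that pointer. On a
// little-endian CPU the first four bytes are the low 32 bits.
std::int32_t pidViewOfFaultAddress(std::uintptr_t faultAddress) {
  std::int32_t pid = 0;
  std::memcpy(&pid, &faultAddress, sizeof pid);
  return pid;
}
```

On a big-endian host the same bytes would yield a different value, so this reading of the log applies to x86_64-style platforms such as the one in the report.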
