Posted to issues@mesos.apache.org by "Milan Baran (JIRA)" <ji...@apache.org> on 2017/02/23 14:19:44 UTC

[jira] [Commented] (MESOS-6909) ABORT execvpe() crash when binaries from launcher_dir cannot be found

    [ https://issues.apache.org/jira/browse/MESOS-6909?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15880497#comment-15880497 ] 

Milan Baran commented on MESOS-6909:
------------------------------------

I ran into a similar issue.

I'd suggest extending the ABORT log message with *argv* and *envp* to make debugging easier.

{code}
  os::execvpe(path.c_str(), argv, envp);

  ABORT("Failed to os::execvpe on path '" + path + "': " + os::strerror(errno));
{code}
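
A minimal sketch of what the extended message could look like. The helpers {{formatStringArray}} and {{execFailureMessage}} are hypothetical names I made up for illustration; they are not part of Mesos or libprocess. The sketch assumes argv and envp are NULL-terminated arrays, as execvpe expects.

```cpp
#include <cerrno>
#include <cstddef>
#include <cstring>
#include <sstream>
#include <string>

// Hypothetical helper (not a Mesos/libprocess API): renders a
// NULL-terminated array of C strings as "['a', 'b']" for logging.
// A null array pointer is rendered as "[]".
std::string formatStringArray(char* const* array)
{
  std::ostringstream out;
  out << "[";
  if (array != nullptr) {
    for (std::size_t i = 0; array[i] != nullptr; ++i) {
      if (i > 0) {
        out << ", ";
      }
      out << "'" << array[i] << "'";
    }
  }
  out << "]";
  return out.str();
}

// Hypothetical helper: builds the existing failure message and appends
// argv and envp. 'error' is the errno value captured right after the
// failed exec call.
std::string execFailureMessage(
    const std::string& path,
    char* const* argv,
    char* const* envp,
    int error)
{
  return "Failed to os::execvpe on path '" + path + "': " +
         std::strerror(error) +
         "; argv=" + formatStringArray(argv) +
         "; envp=" + formatStringArray(envp);
}
```

With something like this, the call site could become {{ABORT(execFailureMessage(path, argv, envp, errno));}}. Note that this allocates memory in the forked child, just as the string concatenation in the current message does, and that envp may contain secrets, so logging it in full might not always be desirable.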

In my case the problem is locating docker: the log below shows execvpe failing on the path '/usr/bin/docker --tls'.

Full stack trace:
{code}
I0223 14:04:21.628685   644 docker.cpp:1022] Starting container 'af5156f0-b204-4d5b-9c10-f45dc386c8c2' for task 'xxxx.f9a088c7-f9d0-11e6-88f0-00505689f8fd' (and executor 'xxxx.f9a088c7-f9d0-11e6-88f0-00505689f8fd') of framework cb0578e4-e2ed-4a7e-9a8c-ad946194f49b-0001
E0223 14:04:21.752529   644 slave.cpp:4423] Container 'af5156f0-b204-4d5b-9c10-f45dc386c8c2' for executor 'xxxx.f9a088c7-f9d0-11e6-88f0-00505689f8fd' of framework cb0578e4-e2ed-4a7e-9a8c-ad946194f49b-0001 failed to start: Failed to run '/usr/bin/docker --tls -H unix:///var/run/docker.sock pull busybox:latest': terminated with signal Aborted; stderr='ABORT: (../../../../../..//tmp/mesos-build/mesos-repo/3rdparty/libprocess/include/process/posix/subprocess.hpp:214): Failed to os::execvpe on path '/usr/bin/docker --tls': No such file or directory
*** Aborted at 1487858661 (unix time) try "date -d @1487858661" if you are using GNU date ***
PC: @     0x7f336d1dac37 (unknown)
*** SIGABRT (@0x2c8) received by PID 712 (TID 0x7f3365912700) from PID 712; stack trace: ***
    @     0x7f336d579330 (unknown)
    @     0x7f336d1dac37 (unknown)
    @     0x7f336d1de028 (unknown)
    @           0x4131ac _Abort()
    @           0x4131ec _Abort()
    @     0x7f336f437a3f process::internal::childMain()
    @     0x7f336f436c3c std::_Function_handler<>::_M_invoke()
    @     0x7f336f436bf3 process::defaultClone()
    @     0x7f336f4388be process::internal::cloneChild()
    @     0x7f336f436120 process::subprocess()
    @     0x7f336e8962ea Docker::__pull()
    @     0x7f336e898ad7 Docker::_pull()
    @     0x7f336e8a679f std::_Function_handler<>::_M_invoke()
    @     0x7f336e8bfebe process::internal::thenf<>()
    @     0x7f336e931b26 _ZN7process8internal3runISt8functionIFvRKNS_6FutureI6OptionIiEEEEEJRS6_EEEvRKSt6vectorIT_SaISD_EEDpOT0_
    @     0x7f336e9341f7 process::Future<>::_set<>()
    @     0x7f336f436a3c process::internal::cleanup()
    @     0x7f336e931b26 _ZN7process8internal3runISt8functionIFvRKNS_6FutureI6OptionIiEEEEEJRS6_EEEvRKSt6vectorIT_SaISD_EEDpOT0_
    @     0x7f336e9341f7 process::Future<>::_set<>()
    @     0x7f336ed7d796 _ZN7process8internal3runISt8functionIFvRK6OptionIiEEEJRS4_EEEvRKSt6vectorIT_SaISB_EEDpOT0_
    @     0x7f336ed7d880 process::Future<>::_set<>()
    @     0x7f336f4312f4 process::ReaperProcess::notify()
    @     0x7f336f4314c2 process::ReaperProcess::wait()
    @     0x7f336f402451 process::ProcessManager::resume()
    @     0x7f336f402757 _ZNSt6thread5_ImplISt12_Bind_simpleIFZN7process14ProcessManager12init_threadsEvEUt_vEEE6_M_runEv
    @     0x7f336da4ea60 (unknown)
    @     0x7f336d571184 start_thread
    @     0x7f336d29e37d (unknown)
{code}

> ABORT execvpe() crash when binaries from launcher_dir cannot be found
> ---------------------------------------------------------------------
>
>                 Key: MESOS-6909
>                 URL: https://issues.apache.org/jira/browse/MESOS-6909
>             Project: Mesos
>          Issue Type: Bug
>          Components: agent
>    Affects Versions: 1.1.0
>            Reporter: Aaron Wood
>            Assignee: Kevin Klues
>
> When running the Mesos agent either without --launcher_dir or with a --launcher_dir that does not point to the right place, you'll get a crash when tasks are launched:
> {code}
> E0111 10:50:56.665149 20924 slave.cpp:4423] Container '6cdd0c9b-cb29-42b0-b6cf-51f410df0f31' for executor '99D50FCB-ADB0-6B2A-3FC3-8A47FF178C10' of framework d3bc8031-29b6-4c2f-9fe3-a73c1b8b6360-0007 failed to start: Collect failed: Failed to setup hostname and network files: ABORT: (../../../3rdparty/libprocess/include/process/posix/subprocess.hpp:214): Failed to os::execvpe on path '/usr/local/libexec/mesos/mesos-containerizer': No such file or directory
> *** Aborted at 1484149856 (unix time) try "date -d @1484149856" if you are using GNU date ***
> PC: @     0x7fc3bd418428 (unknown)
> *** SIGABRT (@0x51d8) received by PID 20952 (TID 0x7fc3b6007700) from PID 20952; stack trace: ***
>     @     0x7fc3bd7bd390 (unknown)
>     @     0x7fc3bd418428 (unknown)
>     @     0x7fc3bd41a02a (unknown)
>     @           0x47fafc _Abort()
>     @           0x47fb2a _Abort()
>     @     0x7fc3c385f092 process::internal::childMain()
>     @     0x7fc3c3864227 _ZNSt5_BindIFPFiRKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEEPPcS9_RKN7process10Subprocess2IO20InputFileDescriptorsERKNSC_21OutputFileDescriptorsESI_bPiRKSt6vectorINSB_9ChildHookESaISL_EEES5_S9_S9_SD_SG_SG_bSJ_SN_EE6__callIiJEJLm0ELm1ELm2ELm3ELm4ELm5ELm6ELm7ELm8EEEET_OSt5tupleIJDpT0_EESt12_Index_tupleIJXspT1_EEE
>     @     0x7fc3c38635d3 std::_Bind<>::operator()<>()
>     @     0x7fc3c3862682 std::_Function_handler<>::_M_invoke()
>     @           0x48a4b8 std::function<>::operator()()
>     @     0x7fc3c247de67 process::defaultClone()
>     @     0x7fc3c3861c40 std::_Function_handler<>::_M_invoke()
>     @     0x7fc3c3861411 std::function<>::operator()()
>     @     0x7fc3c385f8f5 process::internal::cloneChild()
>     @     0x7fc3c385d50e process::subprocess()
>     @     0x7fc3c30d318f mesos::internal::slave::NetworkCniIsolatorProcess::__isolate()
>     @     0x7fc3c30cf909 mesos::internal::slave::NetworkCniIsolatorProcess::isolate()
>     @     0x7fc3c2d4db56 _ZZN7process8dispatchI7NothingN5mesos8internal5slave20MesosIsolatorProcessERKNS2_11ContainerIDEiS6_iEENS_6FutureIT_EERKNS_3PIDIT0_EEMSD_FSB_T1_T2_ET3_T4_ENKUlPNS_11ProcessBaseEE_clESO_
>     @     0x7fc3c2d50eb8 _ZNSt17_Function_handlerIFvPN7process11ProcessBaseEEZNS0_8dispatchI7NothingN5mesos8internal5slave20MesosIsolatorProcessERKNS6_11ContainerIDEiSA_iEENS0_6FutureIT_EERKNS0_3PIDIT0_EEMSH_FSF_T1_T2_ET3_T4_EUlS2_E_E9_M_invokeERKSt9_Any_dataOS2_
>     @     0x7fc3c380a1dd std::function<>::operator()()
>     @     0x7fc3c37eb094 process::ProcessBase::visit()
>     @     0x7fc3c37f3b26 process::DispatchEvent::visit()
>     @     0x7fc3c2244a08 process::ProcessBase::serve()
>     @     0x7fc3c37e6f50 process::ProcessManager::resume()
>     @     0x7fc3c37e3a78 _ZZN7process14ProcessManager12init_threadsEvENKUt_clEv
>     @     0x7fc3c37f3148 _ZNSt12_Bind_simpleIFZN7process14ProcessManager12init_threadsEvEUt_vEE9_M_invokeIJEEEvSt12_Index_tupleIJXspT_EEE
>     @     0x7fc3c37f309e _ZNSt12_Bind_simpleIFZN7process14ProcessManager12init_threadsEvEUt_vEEclEv
>     @     0x7fc3c37f302e _ZNSt6thread5_ImplISt12_Bind_simpleIFZN7process14ProcessManager12init_threadsEvEUt_vEEE6_M_runEv
>     @     0x7fc3bdc97c80 (unknown)
>     @     0x7fc3bd7b36ba start_thread
>     @     0x7fc3bd4e982d (unknown)
> {code}
> Note that this does not crash hard so the agent stays running.


