Posted to issues@mesos.apache.org by "Vinod Kone (JIRA)" <ji...@apache.org> on 2019/02/06 16:30:00 UTC

[jira] [Commented] (MESOS-8096) Enqueueing events in MockHTTPScheduler can lead to segfaults.

    [ https://issues.apache.org/jira/browse/MESOS-8096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16761903#comment-16761903 ] 

Vinod Kone commented on MESOS-8096:
-----------------------------------

Observed this with LauncherAndIsolationParam/PersistentVolumeDefaultExecutor.ROOT_TaskGroupsSharingViaSandboxVolumes/2

{code}
...
...
I0206 05:23:37.884572 19578 task_status_update_manager.cpp:383] Forwarding task status update TASK_FINISHED (Status UUID: 2612f9b7-a190-4924-b40a-8193bced2dd8) for task producer of framework ffd3400c-13b0-4d40-b63a-f4d3efc720de-0000 to the agent
I0206 05:23:37.884624 19578 slave.cpp:5808] Forwarding the update TASK_FINISHED (Status UUID: 2612f9b7-a190-4924-b40a-8193bced2dd8) for task producer of framework ffd3400c-13b0-4d40-b63a-f4d3efc720de-0000 to master@172.16.10.36:45979
I0206 05:23:37.884678 19578 slave.cpp:5701] Task status update manager successfully handled status update TASK_FINISHED (Status UUID: 2612f9b7-a190-4924-b40a-8193bced2dd8) for task producer of framework ffd3400c-13b0-4d40-b63a-f4d3efc720de-0000
I0206 05:23:37.884764 19578 master.cpp:8516] Status update TASK_FINISHED (Status UUID: 2612f9b7-a190-4924-b40a-8193bced2dd8) for task producer of framework ffd3400c-13b0-4d40-b63a-f4d3efc720de-0000 from agent ffd3400c-13b0-4d40-b63a-f4d3efc720de-S0 at slave(1170)@172.16.10.36:45979 (ip-172-16-10-36.ec2.internal)
I0206 05:23:37.884784 19578 master.cpp:8573] Forwarding status update TASK_FINISHED (Status UUID: 2612f9b7-a190-4924-b40a-8193bced2dd8) for task producer of framework ffd3400c-13b0-4d40-b63a-f4d3efc720de-0000
I0206 05:23:37.884881 19578 master.cpp:11210] Updating the state of task producer of framework ffd3400c-13b0-4d40-b63a-f4d3efc720de-0000 (latest state: TASK_FINISHED, status update state: TASK_FINISHED)
I0206 05:23:37.885048 19577 hierarchical.cpp:1230] Recovered cpus(allocated: default-role)(reservations: [(DYNAMIC,default-role,test-principal)]):0.1; mem(allocated: default-role)(reservations: [(DYNAMIC,default-role,test-principal)]):32; disk(allocated: default-role)(reservations: [(DYNAMIC,default-role,test-principal)]):32 (total: cpus:1.7; mem:928; disk:928; ports:[31000-32000]; cpus(reservations: [(DYNAMIC,default-role,test-principal)]):0.3; mem(reservations: [(DYNAMIC,default-role,test-principal)]):96; disk(reservations: [(DYNAMIC,default-role,test-principal)]):95; disk(reservations: [(DYNAMIC,default-role,test-principal)])[executor:executor_volume_path]:1, allocated: disk(allocated: default-role)(reservations: [(DYNAMIC,default-role,test-principal)])[executor:executor_volume_path]:1; disk(allocated: default-role)(reservations: [(DYNAMIC,default-role,test-principal)]):63; mem(allocated: default-role)(reservations: [(DYNAMIC,default-role,test-principal)]):64; cpus(allocated: default-role)(reservations: [(DYNAMIC,default-role,test-principal)]):0.2) on agent ffd3400c-13b0-4d40-b63a-f4d3efc720de-S0 from framework ffd3400c-13b0-4d40-b63a-f4d3efc720de-0000
I0206 05:23:37.885195 19572 scheduler.cpp:845] Enqueuing event UPDATE received from http://172.16.10.36:45979/master/api/v1/scheduler
I0206 05:23:37.885380 19571 scheduler.cpp:248] Sending ACKNOWLEDGE call to http://172.16.10.36:45979/master/api/v1/scheduler
I0206 05:23:37.885645 19572 task_status_update_manager.cpp:328] Received task status update TASK_FINISHED (Status UUID: 2dd9e000-d74f-4d94-ad72-0b3177773492) for task consumer of framework ffd3400c-13b0-4d40-b63a-f4d3efc720de-0000
I0206 05:23:37.885682 19572 task_status_update_manager.cpp:383] Forwarding task status update TASK_FINISHED (Status UUID: 2dd9e000-d74f-4d94-ad72-0b3177773492) for task consumer of framework ffd3400c-13b0-4d40-b63a-f4d3efc720de-0000 to the agent
I0206 05:23:37.885735 19572 slave.cpp:5808] Forwarding the update TASK_FINISHED (Status UUID: 2dd9e000-d74f-4d94-ad72-0b3177773492) for task consumer of framework ffd3400c-13b0-4d40-b63a-f4d3efc720de-0000 to master@172.16.10.36:45979
I0206 05:23:37.885792 19572 slave.cpp:5701] Task status update manager successfully handled status update TASK_FINISHED (Status UUID: 2dd9e000-d74f-4d94-ad72-0b3177773492) for task consumer of framework ffd3400c-13b0-4d40-b63a-f4d3efc720de-0000
I0206 05:23:37.885802 19578 process.cpp:3588] Handling HTTP event for process 'master' with path: '/master/api/v1/scheduler'
I0206 05:23:37.885885 19578 master.cpp:8516] Status update TASK_FINISHED (Status UUID: 2dd9e000-d74f-4d94-ad72-0b3177773492) for task consumer of framework ffd3400c-13b0-4d40-b63a-f4d3efc720de-0000 from agent ffd3400c-13b0-4d40-b63a-f4d3efc720de-S0 at slave(1170)@172.16.10.36:45979 (ip-172-16-10-36.ec2.internal)
I0206 05:23:37.885905 19578 master.cpp:8573] Forwarding status update TASK_FINISHED (Status UUID: 2dd9e000-d74f-4d94-ad72-0b3177773492) for task consumer of framework ffd3400c-13b0-4d40-b63a-f4d3efc720de-0000
I0206 05:23:37.885991 19578 master.cpp:11210] Updating the state of task consumer of framework ffd3400c-13b0-4d40-b63a-f4d3efc720de-0000 (latest state: TASK_FINISHED, status update state: TASK_FINISHED)
I0206 05:23:37.886134 19578 hierarchical.cpp:1230] Recovered cpus(allocated: default-role)(reservations: [(DYNAMIC,default-role,test-principal)]):0.1; mem(allocated: default-role)(reservations: [(DYNAMIC,default-role,test-principal)]):32; disk(allocated: default-role)(reservations: [(DYNAMIC,default-role,test-principal)]):32 (total: cpus:1.7; mem:928; disk:928; ports:[31000-32000]; cpus(reservations: [(DYNAMIC,default-role,test-principal)]):0.3; mem(reservations: [(DYNAMIC,default-role,test-principal)]):96; disk(reservations: [(DYNAMIC,default-role,test-principal)]):95; disk(reservations: [(DYNAMIC,default-role,test-principal)])[executor:executor_volume_path]:1, allocated: disk(allocated: default-role)(reservations: [(DYNAMIC,default-role,test-principal)])[executor:executor_volume_path]:1; disk(allocated: default-role)(reservations: [(DYNAMIC,default-role,test-principal)]):31; mem(allocated: default-role)(reservations: [(DYNAMIC,default-role,test-principal)]):32; cpus(allocated: default-role)(reservations: [(DYNAMIC,default-role,test-principal)]):0.1) on agent ffd3400c-13b0-4d40-b63a-f4d3efc720de-S0 from framework ffd3400c-13b0-4d40-b63a-f4d3efc720de-0000
I0206 05:23:37.886597 19577 scheduler.cpp:845] Enqueuing event UPDATE received from http://172.16.10.36:45979/master/api/v1/scheduler
I0206 05:23:37.887490 19573 master.cpp:1384] Framework ffd3400c-13b0-4d40-b63a-f4d3efc720de-0000 (default) disconnected
I0206 05:23:37.887507 19573 master.cpp:3255] Deactivating framework ffd3400c-13b0-4d40-b63a-f4d3efc720de-0000 (default)
I0206 05:23:37.887521 19573 master.cpp:3232] Disconnecting framework ffd3400c-13b0-4d40-b63a-f4d3efc720de-0000 (default)
I0206 05:23:37.887529 19573 master.cpp:1399] Giving framework ffd3400c-13b0-4d40-b63a-f4d3efc720de-0000 (default) 0ns to failover
I0206 05:23:37.887567 19573 hierarchical.cpp:419] Deactivated framework ffd3400c-13b0-4d40-b63a-f4d3efc720de-0000
I0206 05:23:37.887969 19575 master.cpp:9509] Framework failover timeout, removing framework ffd3400c-13b0-4d40-b63a-f4d3efc720de-0000 (default)
I0206 05:23:37.887985 19575 master.cpp:10449] Removing framework ffd3400c-13b0-4d40-b63a-f4d3efc720de-0000 (default)
I0206 05:23:37.888017 19575 master.cpp:11210] Updating the state of task consumer of framework ffd3400c-13b0-4d40-b63a-f4d3efc720de-0000 (latest state: TASK_FINISHED, status update state: TASK_KILLED)
I0206 05:23:37.888039 19575 master.cpp:11308] Removing task consumer with resources cpus(allocated: default-role)(reservations: [(DYNAMIC,default-role,test-principal)]):0.1; mem(allocated: default-role)(reservations: [(DYNAMIC,default-role,test-principal)]):32; disk(allocated: default-role)(reservations: [(DYNAMIC,default-role,test-principal)]):32 of framework ffd3400c-13b0-4d40-b63a-f4d3efc720de-0000 on agent ffd3400c-13b0-4d40-b63a-f4d3efc720de-S0 at slave(1170)@172.16.10.36:45979 (ip-172-16-10-36.ec2.internal)
I0206 05:23:37.888121 19575 master.cpp:11210] Updating the state of task producer of framework ffd3400c-13b0-4d40-b63a-f4d3efc720de-0000 (latest state: TASK_FINISHED, status update state: TASK_KILLED)
I0206 05:23:37.888139 19575 master.cpp:11308] Removing task producer with resources cpus(allocated: default-role)(reservations: [(DYNAMIC,default-role,test-principal)]):0.1; mem(allocated: default-role)(reservations: [(DYNAMIC,default-role,test-principal)]):32; disk(allocated: default-role)(reservations: [(DYNAMIC,default-role,test-principal)]):32 of framework ffd3400c-13b0-4d40-b63a-f4d3efc720de-0000 on agent ffd3400c-13b0-4d40-b63a-f4d3efc720de-S0 at slave(1170)@172.16.10.36:45979 (ip-172-16-10-36.ec2.internal)
I0206 05:23:37.888237 19575 master.cpp:11345] Removing executor 'default' with resources cpus(allocated: default-role)(reservations: [(DYNAMIC,default-role,test-principal)]):0.1; mem(allocated: default-role)(reservations: [(DYNAMIC,default-role,test-principal)]):32; disk(allocated: default-role)(reservations: [(DYNAMIC,default-role,test-principal)]):31; disk(allocated: default-role)(reservations: [(DYNAMIC,default-role,test-principal)])[executor:executor_volume_path]:1 of framework ffd3400c-13b0-4d40-b63a-f4d3efc720de-0000 on agent ffd3400c-13b0-4d40-b63a-f4d3efc720de-S0 at slave(1170)@172.16.10.36:45979 (ip-172-16-10-36.ec2.internal)
I0206 05:23:37.888494 19575 slave.cpp:3912] Asked to shut down framework ffd3400c-13b0-4d40-b63a-f4d3efc720de-0000 by master@172.16.10.36:45979
I0206 05:23:37.888509 19575 slave.cpp:3937] Shutting down framework ffd3400c-13b0-4d40-b63a-f4d3efc720de-0000
I0206 05:23:37.888519 19575 slave.cpp:6735] Shutting down executor 'default' of framework ffd3400c-13b0-4d40-b63a-f4d3efc720de-0000 (via HTTP)
I0206 05:23:37.888630 19575 hierarchical.cpp:1230] Recovered cpus(allocated: default-role)(reservations: [(DYNAMIC,default-role,test-principal)]):0.1; mem(allocated: default-role)(reservations: [(DYNAMIC,default-role,test-principal)]):32; disk(allocated: default-role)(reservations: [(DYNAMIC,default-role,test-principal)]):31; disk(allocated: default-role)(reservations: [(DYNAMIC,default-role,test-principal)])[executor:executor_volume_path]:1 (total: cpus:1.7; mem:928; disk:928; ports:[31000-32000]; cpus(reservations: [(DYNAMIC,default-role,test-principal)]):0.3; mem(reservations: [(DYNAMIC,default-role,test-principal)]):96; disk(reservations: [(DYNAMIC,default-role,test-principal)]):95; disk(reservations: [(DYNAMIC,default-role,test-principal)])[executor:executor_volume_path]:1, allocated: {}) on agent ffd3400c-13b0-4d40-b63a-f4d3efc720de-S0 from framework ffd3400c-13b0-4d40-b63a-f4d3efc720de-0000
I0206 05:23:37.888712 19575 hierarchical.cpp:358] Removed framework ffd3400c-13b0-4d40-b63a-f4d3efc720de-0000
I0206 05:23:37.888762 19575 process.cpp:3650] Failed to process request for '/master/api/v1/scheduler': discarded
*** Aborted at 1549430617 (unix time) try "date -d @1549430617" if you are using GNU date ***
PC: @     0x7fa52525e3f3 mesos::v1::scheduler::Mesos::send()
*** SIGSEGV (@0x0) received by PID 28957 (TID 0x7fa517c2d700) from PID 0; stack trace: ***
    @     0x7fa4ee96c40d (unknown)
    @     0x7fa4ee970f19 (unknown)
    @     0x7fa4ee964d88 (unknown)
    @     0x7fa521e43330 (unknown)
    @     0x7fa52525e3f3 mesos::v1::scheduler::Mesos::send()
    @     0x7fa527b7a596 _ZNK5mesos8internal5tests2v19scheduler23SendAcknowledgeActionP2INS_2v111FrameworkIDENS5_7AgentIDEE10gmock_ImplIFvPNS5_9scheduler5MesosERKNSA_12Event_UpdateEEE17gmock_PerformImplISC_SF_N7testing8internal12ExcessiveArgESL_SL_SL_SL_SL_SL_SL_EEvRKSt5tupleIJSC_SF_EET_T0_T1_T2_T3_T4_T5_T6_T7_T8_
    @     0x7fa527b7a717 _ZN5mesos8internal5tests2v19scheduler23SendAcknowledgeActionP2INS_2v111FrameworkIDENS5_7AgentIDEE10gmock_ImplIFvPNS5_9scheduler5MesosERKNSA_12Event_UpdateEEE7PerformERKSt5tupleIJSC_SF_EE
    @     0x7fa527a8c45f _ZN7testing8internal12DoBothActionI17PromiseArgActionPILi1EPN7process7PromiseIN5mesos2v19scheduler12Event_UpdateEEEENS5_8internal5tests2v19scheduler23SendAcknowledgeActionP2INS6_11FrameworkIDENS6_7AgentIDEEEE4ImplIFvPNS7_5MesosERKS8_EE7PerformERKSt5tupleIJSN_SP_EE
    @     0x7fa527ab4ff4 testing::internal::FunctionMockerBase<>::UntypedPerformAction()
    @     0x7fa528e89717 testing::internal::UntypedFunctionMockerBase::UntypedInvokeWith()
    @     0x7fa527b7c1e2 mesos::internal::tests::scheduler::MockHTTPScheduler<>::events()
    @     0x7fa527b13af1 std::_Function_handler<>::_M_invoke()
    @     0x7fa525262c78 process::AsyncExecutorProcess::execute<>()
    @     0x7fa52526da9d _ZNO6lambda12CallableOnceIFvPN7process11ProcessBaseEEE10CallableFnINS_8internal7PartialIZNS1_8dispatchI7NothingNS1_20AsyncExecutorProcessERKSt8functionIFvRKSt5queueIN5mesos2v19scheduler5EventESt5dequeISH_SaISH_EEEEESL_SR_RSL_EENS1_6FutureIT_EERKNS1_3PIDIT0_EEMSX_FSU_T1_T2_EOT3_OT4_EUlSt10unique_ptrINS1_7PromiseISA_EESt14default_deleteIS1B_EEOSP_OSL_S3_E_IS1E_SP_SL_St12_PlaceholderILi1EEEEEEclEOS3_
    @     0x7fa525fcc261 process::ProcessBase::consume()
    @     0x7fa525fdf48c process::ProcessManager::resume()
    @     0x7fa525fe4e56 _ZNSt6thread5_ImplISt12_Bind_simpleIFZN7process14ProcessManager12init_threadsEvEUlvE_vEEE6_M_runEv
    @     0x7fa52261ea60 (unknown)
    @     0x7fa521e3b184 start_thread
    @     0x7fa521b6803d (unknown)
timeout: the monitored command dumped core
The test binary has crashed OR the timeout has been exceeded!
{code}
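
For readers less familiar with this failure mode: in the trace above, the SIGSEGV at address 0x0 is raised inside mesos::v1::scheduler::Mesos::send(), which is reached from the SendAcknowledge test action via MockHTTPScheduler::events() on an AsyncExecutorProcess thread, shortly after the last "Enqueuing event UPDATE" line and the framework's removal. One possible reading, and only a guess since the ticket itself calls the cause unknown, is that a queued event is handled after the object it wants to call has gone away. The sketch below is not Mesos code. Client, EventQueue and acknowledge() are hypothetical stand-ins invented for illustration; it only shows the general pattern of a deferred handler checking that its target is still alive before calling it.

```cpp
#include <functional>
#include <iostream>
#include <memory>
#include <queue>
#include <string>
#include <utility>

// Hypothetical stand-in for a scheduler client. This is NOT the Mesos API.
struct Client {
  int sent = 0;

  void send(const std::string& call) {
    ++sent;
    std::cout << "sent " << call << "\n";
  }
};

// Hypothetical event queue: handlers are stored now and run later, so a
// handler may run after the object it refers to has been destroyed.
class EventQueue {
public:
  void enqueue(std::function<bool()> handler) {
    pending_.push(std::move(handler));
  }

  // Runs every queued handler in order and returns how many of them
  // reported that they actually acted on a live client.
  int drain() {
    int handled = 0;
    while (!pending_.empty()) {
      if (pending_.front()()) {
        ++handled;
      }
      pending_.pop();
    }
    return handled;
  }

private:
  std::queue<std::function<bool()>> pending_;
};

// Builds a handler that holds only a weak reference to the client. If the
// client has been destroyed by the time the handler runs, the event is
// dropped instead of dereferencing a dangling or null pointer.
std::function<bool()> acknowledge(std::weak_ptr<Client> weak) {
  return [weak]() {
    if (auto client = weak.lock()) {
      client->send("ACKNOWLEDGE");
      return true;
    }
    return false;  // Client is gone; skip the call.
  };
}
```

With this shape, enqueueing acknowledge(client) and then resetting the last shared_ptr to the client before drain() runs leaves the event unhandled rather than crashing. Whether an equivalent guard is the right fix for MockHTTPScheduler is a separate question that this sketch does not answer.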

> Enqueueing events in MockHTTPScheduler can lead to segfaults.
> -------------------------------------------------------------
>
>                 Key: MESOS-8096
>                 URL: https://issues.apache.org/jira/browse/MESOS-8096
>             Project: Mesos
>          Issue Type: Bug
>          Components: scheduler driver, test
>         Environment: Fedora 23, Ubuntu 14.04, Ubuntu 16
>            Reporter: Alexander Rukletsov
>            Assignee: Alexander Rukletsov
>            Priority: Critical
>              Labels: flaky-test, integration, mesosphere
>         Attachments: AsyncExecutorProcess-badrun-1.txt, AsyncExecutorProcess-badrun-2.txt, AsyncExecutorProcess-badrun-3.txt, mesos-8096-1.txt, mesos-8096-2.txt, mesos-8096-3.txt, scheduler-shutdown-invalid-driver-2.txt, scheduler-shutdown-invalid-driver.txt
>
>
> Various tests segfault for an as-yet-unknown reason. Comparing the attached logs suggests that the problem might be in the scheduler's event queue.


