You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@mesos.apache.org by "Greg Mann (JIRA)" <ji...@apache.org> on 2018/02/14 00:44:00 UTC

[jira] [Commented] (MESOS-8540) ROOT_TasksSharingViaSandboxVolumes is flaky.

    [ https://issues.apache.org/jira/browse/MESOS-8540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16363288#comment-16363288 ] 

Greg Mann commented on MESOS-8540:
----------------------------------

I just saw a similar stacktrace occur during test teardown on a different test: {{MesosContainerizer/DefaultExecutorTest.TaskWithFileURI/0}}. This was on Debian 9 with SSL and GRPC enabled. Log attached as {{TaskWithFileURI.txt}}.

> ROOT_TasksSharingViaSandboxVolumes is flaky.
> --------------------------------------------
>
>                 Key: MESOS-8540
>                 URL: https://issues.apache.org/jira/browse/MESOS-8540
>             Project: Mesos
>          Issue Type: Bug
>    Affects Versions: 1.5.0
>         Environment: Ubuntu 17.04
> autotools, ssl build
>            Reporter: Till Toenshoff
>            Priority: Major
>              Labels: flaky, flaky-test
>         Attachments: TaskWithFileURI.txt
>
>
> {noformat}
> 22:50:20 [ RUN      ] LauncherAndIsolationParam/PersistentVolumeDefaultExecutor.ROOT_TasksSharingViaSandboxVolumes/2
> [...]
> 22:50:20 I0202 22:50:20.661401  4573 default_executor.cpp:191] Received ACKNOWLEDGED event
> 22:50:20 I0202 22:50:20.661367 18013 hierarchical.cpp:1192] Recovered cpus(allocated: default-role)(reservations: [(DYNAMIC,default-role,test-principal)]):0.1; mem(allocated: default-role)(reservations: [(DYNAMIC,default-role,test-principal)]):32; disk(allocated: default-role)(reservations: [(DYNAMIC,default-role,test-principal)]):32 (total: cpus:1.7; mem:928; disk:928; ports:[31000-32000]; cpus(reservations: [(DYNAMIC,default-role,test-principal)]):0.3; mem(reservations: [(DYNAMIC,default-role,test-principal)]):96; disk(reservations: [(DYNAMIC,default-role,test-principal)]):95; disk(reservations: [(DYNAMIC,default-role,test-principal)])[executor:executor_volume_path]:1, allocated: disk(allocated: default-role)(reservations: [(DYNAMIC,default-role,test-principal)])[executor:executor_volume_path]:1; disk(allocated: default-role)(reservations: [(DYNAMIC,default-role,test-principal)]):63; mem(allocated: default-role)(reservations: [(DYNAMIC,default-role,test-principal)]):64; cpus(allocated: default-role)(reservations: [(DYNAMIC,default-role,test-principal)]):0.2) on agent a3d02bc8-96b0-4c84-a92b-1845429070a0-S0 from framework a3d02bc8-96b0-4c84-a92b-1845429070a0-0000
> 22:50:20 I0202 22:50:20.661736 18006 scheduler.cpp:739] Enqueuing event UPDATE received from http://172.16.10.208:39629/master/api/v1/scheduler
> 22:50:20 I0202 22:50:20.661929 18011 scheduler.cpp:247] Sending ACKNOWLEDGE call to http://172.16.10.208:39629/master/api/v1/scheduler
> 22:50:20 I0202 22:50:20.662333 18010 process.cpp:3554] Handling HTTP event for process 'master' with path: '/master/api/v1/scheduler'
> 22:50:20 I0202 22:50:20.697870 18007 containerizer.cpp:2791] Container 8acd5ca5-98f0-4373-9914-2118c9c74bc4.e463748e-59c9-46fe-8f0a-24c33a506068 has exited
> 22:50:20 I0202 22:50:20.697901 18007 containerizer.cpp:2338] Destroying container 8acd5ca5-98f0-4373-9914-2118c9c74bc4.e463748e-59c9-46fe-8f0a-24c33a506068 in RUNNING state
> 22:50:20 I0202 22:50:20.697909 18007 containerizer.cpp:2952] Transitioning the state of container 8acd5ca5-98f0-4373-9914-2118c9c74bc4.e463748e-59c9-46fe-8f0a-24c33a506068 from RUNNING to DESTROYING
> 22:50:20 I0202 22:50:20.697983 18007 linux_launcher.cpp:514] Asked to destroy container 8acd5ca5-98f0-4373-9914-2118c9c74bc4.e463748e-59c9-46fe-8f0a-24c33a506068
> 22:50:20 I0202 22:50:20.698537 18007 linux_launcher.cpp:560] Using freezer to destroy cgroup mesos/8acd5ca5-98f0-4373-9914-2118c9c74bc4/mesos/e463748e-59c9-46fe-8f0a-24c33a506068
> 22:50:20 I0202 22:50:20.699381 18008 cgroups.cpp:3060] Freezing cgroup /sys/fs/cgroup/freezer/mesos/8acd5ca5-98f0-4373-9914-2118c9c74bc4/mesos/e463748e-59c9-46fe-8f0a-24c33a506068
> 22:50:20 I0202 22:50:20.700407 18008 cgroups.cpp:1415] Successfully froze cgroup /sys/fs/cgroup/freezer/mesos/8acd5ca5-98f0-4373-9914-2118c9c74bc4/mesos/e463748e-59c9-46fe-8f0a-24c33a506068 after 992us
> 22:50:20 I0202 22:50:20.701438 18008 cgroups.cpp:3078] Thawing cgroup /sys/fs/cgroup/freezer/mesos/8acd5ca5-98f0-4373-9914-2118c9c74bc4/mesos/e463748e-59c9-46fe-8f0a-24c33a506068
> 22:50:20 I0202 22:50:20.702519 18008 cgroups.cpp:1444] Successfully thawed cgroup /sys/fs/cgroup/freezer/mesos/8acd5ca5-98f0-4373-9914-2118c9c74bc4/mesos/e463748e-59c9-46fe-8f0a-24c33a506068 after 1.05088ms
> 22:50:20 I0202 22:50:20.703819 18010 http.cpp:1185] HTTP POST for /master/api/v1/scheduler from 172.16.10.208:45566
> 22:50:20 I0202 22:50:20.704056 18010 master.cpp:5775] Processing ACKNOWLEDGE call for status 5c032406-9bb1-45dd-a053-1af6f064ccfa for task producer of framework a3d02bc8-96b0-4c84-a92b-1845429070a0-0000 (default) on agent a3d02bc8-96b0-4c84-a92b-1845429070a0-S0
> 22:50:20 I0202 22:50:20.704083 18011 provisioner.cpp:598] Ignoring destroy request for unknown container 8acd5ca5-98f0-4373-9914-2118c9c74bc4.e463748e-59c9-46fe-8f0a-24c33a506068
> 22:50:20 I0202 22:50:20.704332 18011 containerizer.cpp:2628] Checkpointing termination state to nested container's runtime directory '/tmp/LauncherAndIsolationParam_PersistentVolumeDefaultExecutor_ROOT_TasksSharingViaSandboxVolumes_2_BfTlNd/containers/8acd5ca5-98f0-4373-9914-2118c9c74bc4/containers/e463748e-59c9-46fe-8f0a-24c33a506068/termination'
> 22:50:20 I0202 22:50:20.704195 18010 master.cpp:10303] Removing task producer with resources cpus(allocated: default-role)(reservations: [(DYNAMIC,default-role,test-principal)]):0.1; mem(allocated: default-role)(reservations: [(DYNAMIC,default-role,test-principal)]):32; disk(allocated: default-role)(reservations: [(DYNAMIC,default-role,test-principal)]):32 of framework a3d02bc8-96b0-4c84-a92b-1845429070a0-0000 on agent a3d02bc8-96b0-4c84-a92b-1845429070a0-S0 at slave(1025)@172.16.10.208:39629 (ip-172-16-10-208)
> 22:50:20 I0202 22:50:20.704879 18011 task_status_update_manager.cpp:401] Received task status update acknowledgement (UUID: 5c032406-9bb1-45dd-a053-1af6f064ccfa) for task producer of framework a3d02bc8-96b0-4c84-a92b-1845429070a0-0000
> 22:50:20 I0202 22:50:20.705235 18011 task_status_update_manager.cpp:538] Cleaning up status update stream for task producer of framework a3d02bc8-96b0-4c84-a92b-1845429070a0-0000
> 22:50:20 I0202 22:50:20.705646 18011 slave.cpp:4057] Task status update manager successfully handled status update acknowledgement (UUID: 5c032406-9bb1-45dd-a053-1af6f064ccfa) for task producer of framework a3d02bc8-96b0-4c84-a92b-1845429070a0-0000
> 22:50:20 I0202 22:50:20.705756  4573 default_executor.cpp:889] Child container 8acd5ca5-98f0-4373-9914-2118c9c74bc4.e463748e-59c9-46fe-8f0a-24c33a506068 of task 'consumer' completed in state TASK_FINISHED: Command exited with status 0
> 22:50:20 I0202 22:50:20.705790  4573 default_executor.cpp:1018] Terminating after 1secs
> 22:50:20 I0202 22:50:20.705934 18011 slave.cpp:9069] Completing task producer
> 22:50:20 I0202 22:50:20.706215 18011 process.cpp:3554] Handling HTTP event for process 'slave(1025)' with path: '/slave(1025)/api/v1/executor'
> 22:50:20 I0202 22:50:20.748010 18006 http.cpp:1185] HTTP POST for /slave(1025)/api/v1/executor from 172.16.10.208:45572
> 22:50:20 I0202 22:50:20.748122 18006 slave.cpp:4809] Handling status update TASK_FINISHED (Status UUID: 0a1559c6-5681-44c1-a043-6cf06beaa78c) for task consumer of framework a3d02bc8-96b0-4c84-a92b-1845429070a0-0000
> 22:50:20 I0202 22:50:20.748740 18006 task_status_update_manager.cpp:328] Received task status update TASK_FINISHED (Status UUID: 0a1559c6-5681-44c1-a043-6cf06beaa78c) for task consumer of framework a3d02bc8-96b0-4c84-a92b-1845429070a0-0000
> 22:50:20 I0202 22:50:20.748795 18006 task_status_update_manager.cpp:383] Forwarding task status update TASK_FINISHED (Status UUID: 0a1559c6-5681-44c1-a043-6cf06beaa78c) for task consumer of framework a3d02bc8-96b0-4c84-a92b-1845429070a0-0000 to the agent
> 22:50:20 I0202 22:50:20.748862 18006 slave.cpp:5291] Forwarding the update TASK_FINISHED (Status UUID: 0a1559c6-5681-44c1-a043-6cf06beaa78c) for task consumer of framework a3d02bc8-96b0-4c84-a92b-1845429070a0-0000 to master@172.16.10.208:39629
> 22:50:20 I0202 22:50:20.748937 18006 slave.cpp:5184] Task status update manager successfully handled status update TASK_FINISHED (Status UUID: 0a1559c6-5681-44c1-a043-6cf06beaa78c) for task consumer of framework a3d02bc8-96b0-4c84-a92b-1845429070a0-0000
> 22:50:20 I0202 22:50:20.749053 18006 master.cpp:7850] Status update TASK_FINISHED (Status UUID: 0a1559c6-5681-44c1-a043-6cf06beaa78c) for task consumer of framework a3d02bc8-96b0-4c84-a92b-1845429070a0-0000 from agent a3d02bc8-96b0-4c84-a92b-1845429070a0-S0 at slave(1025)@172.16.10.208:39629 (ip-172-16-10-208)
> 22:50:20 I0202 22:50:20.749079 18006 master.cpp:7906] Forwarding status update TASK_FINISHED (Status UUID: 0a1559c6-5681-44c1-a043-6cf06beaa78c) for task consumer of framework a3d02bc8-96b0-4c84-a92b-1845429070a0-0000
> 22:50:20 I0202 22:50:20.749179 18006 master.cpp:10204] Updating the state of task consumer of framework a3d02bc8-96b0-4c84-a92b-1845429070a0-0000 (latest state: TASK_FINISHED, status update state: TASK_FINISHED)
> 22:50:20 I0202 22:50:20.749598 18013 scheduler.cpp:739] Enqueuing event UPDATE received from http://172.16.10.208:39629/master/api/v1/scheduler
> 22:50:20 I0202 22:50:20.749830 18006 hierarchical.cpp:1192] Recovered cpus(allocated: default-role)(reservations: [(DYNAMIC,default-role,test-principal)]):0.1; mem(allocated: default-role)(reservations: [(DYNAMIC,default-role,test-principal)]):32; disk(allocated: default-role)(reservations: [(DYNAMIC,default-role,test-principal)]):32 (total: cpus:1.7; mem:928; disk:928; ports:[31000-32000]; cpus(reservations: [(DYNAMIC,default-role,test-principal)]):0.3; mem(reservations: [(DYNAMIC,default-role,test-principal)]):96; disk(reservations: [(DYNAMIC,default-role,test-principal)]):95; disk(reservations: [(DYNAMIC,default-role,test-principal)])[executor:executor_volume_path]:1, allocated: disk(allocated: default-role)(reservations: [(DYNAMIC,default-role,test-principal)])[executor:executor_volume_path]:1; disk(allocated: default-role)(reservations: [(DYNAMIC,default-role,test-principal)]):31; mem(allocated: default-role)(reservations: [(DYNAMIC,default-role,test-principal)]):32; cpus(allocated: default-role)(reservations: [(DYNAMIC,default-role,test-principal)]):0.1) on agent a3d02bc8-96b0-4c84-a92b-1845429070a0-S0 from framework a3d02bc8-96b0-4c84-a92b-1845429070a0-0000
> 22:50:20 *** Aborted at 1517611820 (unix time) try "date -d @1517611820" if you are using GNU date ***
> 22:50:20 PC: @     0x7f5fdf402163 mesos::v1::scheduler::Mesos::send()
> 22:50:20 I0202 22:50:20.753651 18007 master.cpp:1422] Framework a3d02bc8-96b0-4c84-a92b-1845429070a0-0000 (default) disconnected
> 22:50:20 I0202 22:50:20.753679 18007 master.cpp:3239] Deactivating framework a3d02bc8-96b0-4c84-a92b-1845429070a0-0000 (default)
> 22:50:20 I0202 22:50:20.753697 18007 master.cpp:3216] Disconnecting framework a3d02bc8-96b0-4c84-a92b-1845429070a0-0000 (default)
> 22:50:20 I0202 22:50:20.753707 18007 master.cpp:1437] Giving framework a3d02bc8-96b0-4c84-a92b-1845429070a0-0000 (default) 0ns to failover
> 22:50:20 I0202 22:50:20.753773 18007 master.cpp:8623] Framework failover timeout, removing framework a3d02bc8-96b0-4c84-a92b-1845429070a0-0000 (default)
> 22:50:20 I0202 22:50:20.753793 18007 master.cpp:9500] Removing framework a3d02bc8-96b0-4c84-a92b-1845429070a0-0000 (default)
> 22:50:20 I0202 22:50:20.753829 18007 master.cpp:10204] Updating the state of task consumer of framework a3d02bc8-96b0-4c84-a92b-1845429070a0-0000 (latest state: TASK_FINISHED, status update state: TASK_KILLED)
> 22:50:20 I0202 22:50:20.753865 18007 master.cpp:10303] Removing task consumer with resources cpus(allocated: default-role)(reservations: [(DYNAMIC,default-role,test-principal)]):0.1; mem(allocated: default-role)(reservations: [(DYNAMIC,default-role,test-principal)]):32; disk(allocated: default-role)(reservations: [(DYNAMIC,default-role,test-principal)]):32 of framework a3d02bc8-96b0-4c84-a92b-1845429070a0-0000 on agent a3d02bc8-96b0-4c84-a92b-1845429070a0-S0 at slave(1025)@172.16.10.208:39629 (ip-172-16-10-208)
> 22:50:20 I0202 22:50:20.754012 18007 master.cpp:10332] Removing executor 'default' with resources [{"allocation_info":{"role":"default-role"},"name":"cpus","reservations":[{"principal":"test-principal","role":"default-role","type":"DYNAMIC"}],"scalar":{"value":0.1},"type":"SCALAR"},{"allocation_info":{"role":"default-role"},"name":"mem","reservations":[{"principal":"test-principal","role":"default-role","type":"DYNAMIC"}],"scalar":{"value":32.0},"type":"SCALAR"},{"allocation_info":{"role":"default-role"},"name":"disk","reservations":[{"principal":"test-principal","role":"default-role","type":"DYNAMIC"}],"scalar":{"value":31.0},"type":"SCALAR"},{"allocation_info":{"role":"default-role"},"disk":{"persistence":{"id":"executor","principal":"test-principal"},"volume":{"container_path":"executor_volume_path","mode":"RW"}},"name":"disk","reservations":[{"principal":"test-principal","role":"default-role","type":"DYNAMIC"}],"scalar":{"value":1.0},"type":"SCALAR"}] of framework a3d02bc8-96b0-4c84-a92b-1845429070a0-0000 on agent a3d02bc8-96b0-4c84-a92b-1845429070a0-S0 at slave(1025)@172.16.10.208:39629 (ip-172-16-10-208)
> 22:50:20 I0202 22:50:20.754346 18007 hierarchical.cpp:405] Deactivated framework a3d02bc8-96b0-4c84-a92b-1845429070a0-0000
> 22:50:20 I0202 22:50:20.754549 18007 hierarchical.cpp:1192] Recovered cpus(allocated: default-role)(reservations: [(DYNAMIC,default-role,test-principal)]):0.1; mem(allocated: default-role)(reservations: [(DYNAMIC,default-role,test-principal)]):32; disk(allocated: default-role)(reservations: [(DYNAMIC,default-role,test-principal)]):31; disk(allocated: default-role)(reservations: [(DYNAMIC,default-role,test-principal)])[executor:executor_volume_path]:1 (total: cpus:1.7; mem:928; disk:928; ports:[31000-32000]; cpus(reservations: [(DYNAMIC,default-role,test-principal)]):0.3; mem(reservations: [(DYNAMIC,default-role,test-principal)]):96; disk(reservations: [(DYNAMIC,default-role,test-principal)]):95; disk(reservations: [(DYNAMIC,default-role,test-principal)])[executor:executor_volume_path]:1, allocated: {}) on agent a3d02bc8-96b0-4c84-a92b-1845429070a0-S0 from framework a3d02bc8-96b0-4c84-a92b-1845429070a0-0000
> 22:50:20 I0202 22:50:20.754637 18007 hierarchical.cpp:344] Removed framework a3d02bc8-96b0-4c84-a92b-1845429070a0-0000
> 22:50:20 I0202 22:50:20.754680 18007 slave.cpp:3454] Asked to shut down framework a3d02bc8-96b0-4c84-a92b-1845429070a0-0000 by master@172.16.10.208:39629
> 22:50:20 I0202 22:50:20.754700 18007 slave.cpp:3479] Shutting down framework a3d02bc8-96b0-4c84-a92b-1845429070a0-0000
> 22:50:20 I0202 22:50:20.754710 18007 slave.cpp:6178] Shutting down executor 'default' of framework a3d02bc8-96b0-4c84-a92b-1845429070a0-0000 (via HTTP)
> 22:50:20 *** SIGSEGV (@0x0) received by PID 30314 (TID 0x7f5fd0b4a700) from PID 0; stack trace: ***
> 22:50:20     @     0x7f5fcce095d2 os::Linux::chained_handler()
> 22:50:20     @     0x7f5fcce0e299 JVM_handle_linux_signal
> 22:50:20     @     0x7f5fcce01ff8 signalHandler()
> 22:50:20     @     0x7f5fdcc79670 (unknown)
> 22:50:20     @     0x7f5fdf402163 mesos::v1::scheduler::Mesos::send()
> 22:50:20     @     0x559267eb57c6 _ZNK5mesos8internal5tests2v19scheduler23SendAcknowledgeActionP2INS_2v111FrameworkIDENS5_7AgentIDEE10gmock_ImplIFvPNS5_9scheduler5MesosERKNSA_12Event_UpdateEEE17gmock_PerformImplISC_SF_N7testing8internal12ExcessiveArgESL_SL_SL_SL_SL_SL_SL_EEvRKSt5tupleIJSC_SF_EET_T0_T1_T2_T3_T4_T5_T6_T7_T8_
> 22:50:20     @     0x559267eb593a _ZN5mesos8internal5tests2v19scheduler23SendAcknowledgeActionP2INS_2v111FrameworkIDENS5_7AgentIDEE10gmock_ImplIFvPNS5_9scheduler5MesosERKNSA_12Event_UpdateEEE7PerformERKSt5tupleIJSC_SF_EE
> 22:50:20     @     0x559267dc547e _ZN7testing8internal12DoBothActionI17PromiseArgActionPILi1EPN7process7PromiseIN5mesos2v19scheduler12Event_UpdateEEEENS5_8internal5tests2v19scheduler23SendAcknowledgeActionP2INS6_11FrameworkIDENS6_7AgentIDEEEE4ImplIFvPNS7_5MesosERKS8_EE7PerformERKSt5tupleIJSN_SP_EE
> 22:50:20     @     0x559267df459b testing::internal::FunctionMockerBase<>::UntypedPerformAction()
> 22:50:20     @     0x559269038869 testing::internal::UntypedFunctionMockerBase::UntypedInvokeWith()
> 22:50:20     @     0x559267ecbc0a mesos::internal::tests::scheduler::MockHTTPScheduler<>::events()
> 22:50:20     @     0x559267e4c2c3 std::_Function_handler<>::_M_invoke()
> 22:50:20     @     0x7f5fdf405e18 process::AsyncExecutorProcess::execute<>()
> 22:50:20     @     0x7f5fdf40d77d _ZNO6lambda12CallableOnceIFvPN7process11ProcessBaseEEE10CallableFnINS_8internal7PartialIZNS1_8dispatchI7NothingNS1_20AsyncExecutorProcessERKSt8functionIFvRKSt5queueIN5mesos2v19scheduler5EventESt5dequeISH_SaISH_EEEEESL_SR_RSL_EENS1_6FutureIT_EERKNS1_3PIDIT0_EEMSX_FSU_T1_T2_EOT3_OT4_EUlSt10unique_ptrINS1_7PromiseISA_EESt14default_deleteIS1B_EEOSP_OSL_S3_E_JS1E_SP_SL_St12_PlaceholderILi1EEEEEEclEOS3_
> 22:50:20     @     0x7f5fdffc4df1 process::ProcessBase::consume()
> 22:50:20     @     0x7f5fdffd79a2 process::ProcessManager::resume()
> 22:50:20     @     0x7f5fdffdb6b6 _ZNSt6thread11_State_implISt12_Bind_simpleIFZN7process14ProcessManager12init_threadsEvEUlvE_vEEE6_M_runEv
> 22:50:20     @     0x7f5fdd15883f (unknown)
> 22:50:20     @     0x7f5fdcc6f6da start_thread
> 22:50:20     @     0x7f5fdc9a9d7f (unknown)
> 22:50:21 timeout: the monitored command dumped core
> 22:50:21 ./mesos-ci/mesos_ci: line 151: 30312 Segmentation fault      (core dumped) $SUDO env GTEST_FILTER="$GTEST_FILTER" GLOG_v=1 PATH="$PATH" $TIMEOUT 60m "$test_binary" --verbose --gtest_output="xml:${xml}"
> 22:50:21 The test binary has crashed OR the timeout has been exceeded!
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)