Posted to user@mesos.apache.org by Nils De Moor <ni...@gmail.com> on 2014/10/16 10:37:00 UTC

Staging docker task KILLED after 1 minute

Hi,

Environment:
- Clean vagrant install, 1 master, 1 slave (same behaviour on production
cluster with 3 masters, 6 slaves)
- Mesos 0.20.1
- Marathon 0.7.3
- Docker 1.2.0

Slave config:
- containerizers: "docker,mesos"
- executor_registration_timeout: 5mins
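
(In case it helps, this roughly corresponds to starting the slave with a
command line like the sketch below; the master address is the one from the
logs further down, and the rest of our flags are omitted:)

---
mesos-slave --master=10.0.10.11:5050 \
    --containerizers=docker,mesos \
    --executor_registration_timeout=5mins
---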

When I start Docker container tasks, the images start being pulled from the
Hub, but after 1 minute Mesos kills the tasks.
In the background, though, the pull keeps running, and once everything is
pulled the Docker container is started, without Mesos knowing about it.
When I start the same task in Mesos again (after I know the pull of the
image is done), it runs normally.

So this leaves the slaves with 'dirty' Docker containers that Mesos has no
knowledge of.
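
For now such leftovers can be spotted and removed by hand on the slave,
roughly like this (assuming you can tell the orphaned containers apart from
any you still want to keep):

---
# list what Docker is still running on the slave
docker ps
# remove an orphaned container that Mesos no longer tracks
docker rm -f <container-id>
---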

From the logs I get this:
---
I1009 15:30:02.990291  1414 slave.cpp:1002] Got assigned task
test-app.23755452-4fc9-11e4-839b-080027c4337a for framework
20140904-160348-185204746-5050-27588-0000
I1009 15:30:02.990979  1414 slave.cpp:1112] Launching task
test-app.23755452-4fc9-11e4-839b-080027c4337a for framework
20140904-160348-185204746-5050-27588-0000
I1009 15:30:02.993341  1414 slave.cpp:1222] Queuing task
'test-app.23755452-4fc9-11e4-839b-080027c4337a' for executor
test-app.23755452-4fc9-11e4-839b-080027c4337a of framework
'20140904-160348-185204746-5050-27588-0000
I1009 15:30:02.995818  1409 docker.cpp:743] Starting container
'25ac3310-71e4-4d10-8a4b-38add4537308' for task
'test-app.23755452-4fc9-11e4-839b-080027c4337a' (and executor
'test-app.23755452-4fc9-11e4-839b-080027c4337a') of framework
'20140904-160348-185204746-5050-27588-0000'

I1009 15:31:07.033287  1413 slave.cpp:1278] Asked to kill task
test-app.23755452-4fc9-11e4-839b-080027c4337a of framework
20140904-160348-185204746-5050-27588-0000
I1009 15:31:07.034742  1413 slave.cpp:2088] Handling status update
TASK_KILLED (UUID: a8ec88a1-1809-4108-b2ed-056a725ecd41) for task
test-app.23755452-4fc9-11e4-839b-080027c4337a of framework
20140904-160348-185204746-5050-27588-0000 from @0.0.0.0:0
W1009 15:31:07.034881  1413 slave.cpp:1354] Killing the unregistered
executor 'test-app.23755452-4fc9-11e4-839b-080027c4337a' of framework
20140904-160348-185204746-5050-27588-0000 because it has no tasks
E1009 15:31:07.034945  1413 slave.cpp:2205] Failed to update resources for
container 25ac3310-71e4-4d10-8a4b-38add4537308 of executor
test-app.23755452-4fc9-11e4-839b-080027c4337a running task
test-app.23755452-4fc9-11e4-839b-080027c4337a on status update for terminal
task, destroying container: No container found
I1009 15:31:07.035133  1413 status_update_manager.cpp:320] Received status
update TASK_KILLED (UUID: a8ec88a1-1809-4108-b2ed-056a725ecd41) for task
test-app.23755452-4fc9-11e4-839b-080027c4337a of framework
20140904-160348-185204746-5050-27588-0000
I1009 15:31:07.035210  1413 status_update_manager.cpp:373] Forwarding
status update TASK_KILLED (UUID: a8ec88a1-1809-4108-b2ed-056a725ecd41) for
task test-app.23755452-4fc9-11e4-839b-080027c4337a of framework
20140904-160348-185204746-5050-27588-0000 to master@10.0.10.11:5050
I1009 15:31:07.046167  1408 status_update_manager.cpp:398] Received status
update acknowledgement (UUID: a8ec88a1-1809-4108-b2ed-056a725ecd41) for
task test-app.23755452-4fc9-11e4-839b-080027c4337a of framework
20140904-160348-185204746-5050-27588-0000

I1009 15:35:02.993736  1414 slave.cpp:3010] Terminating executor
test-app.23755452-4fc9-11e4-839b-080027c4337a of framework
20140904-160348-185204746-5050-27588-0000 because it did not register
within 5mins
---

I already posted my question on the marathon board, as I first thought it
was an issue on marathon's end:
https://groups.google.com/forum/#!topic/marathon-framework/NT7_YIZnNoY


Kind regards,
Nils

Re: Staging docker task KILLED after 1 minute

Posted by Tim Chen <ti...@mesosphere.io>.
The case where Mesos loses track of these killed containers is going to be
fixed soon. I have a review up on ReviewBoard, and once it's merged we
shouldn't have untracked containers.

Tim

Re: Staging docker task KILLED after 1 minute

Posted by Dick Davies <di...@hellooperator.net>.
Good catch! Sorry, the docs are right, I just had a brain fart :)

Re: Staging docker task KILLED after 1 minute

Posted by Nils De Moor <ni...@gmail.com>.
Hi guys,

Thanks for the swift feedback. I can confirm that tweaking
the task_launch_timeout setting in Marathon, setting it to a value bigger
than the executor_registration_timeout setting in Mesos, fixed our
problem.

One side note though: the task_launch_timeout setting is in milliseconds,
so for 5 minutes it's 300000 (vs 300 if it were in seconds).
Knowing that will save you some hair pulling when you see your tasks being
killed immediately after being launched. ;)
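
Just to spell out the conversion we tripped over (a rough sketch of the two
settings side by side, other flags left out):

---
executor_registration_timeout (mesos-slave):  5mins
task_launch_timeout (marathon):               5 * 60 * 1000 = 300000   (milliseconds)
---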

Thanks again!

Kr,
Nils

Re: Staging docker task KILLED after 1 minute

Posted by Michael Babineau <mi...@gmail.com>.
See also https://issues.apache.org/jira/browse/MESOS-1915

Re: Staging docker task KILLED after 1 minute

Posted by Dick Davies <di...@hellooperator.net>.
One gotcha - the Marathon timeout is in seconds, so pass '300' in your case.

Let us know if it works. I spotted this the other day and anecdotally it
addresses the issue for some users; it would be good to get more feedback.

Re: Staging docker task KILLED after 1 minute

Posted by Grzegorz Graczyk <gr...@gmail.com>.
Make sure you have --task_launch_timeout in Marathon set to the same value
as executor_registration_timeout:
https://github.com/mesosphere/marathon/blob/master/docs/docs/native-docker.md#configure-marathon
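
For a 5 minute executor_registration_timeout that would mean something like
the following (just a sketch of the two flags; if I read the linked doc
right, the Marathon value is in milliseconds):

---
# mesos-slave
--executor_registration_timeout=5mins
# marathon
--task_launch_timeout 300000
---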
