Posted to dev@mesos.apache.org by Brenden Matthews <br...@airbedandbreakfast.com> on 2013/07/05 23:38:05 UTC

Tasks stuck in 'STAGING'

Hey guys,

I'm currently having a problem where tasks get stuck in the staging
state, though according to the logs they should have been terminated.  They
hang indefinitely, or until I restart the slave.  Below is a screenshot +
logs.  Also interesting are the 'Failed to collect resource usage ...'
messages.

[image: Inline image 2]

I0705 16:19:51.551512  9706 slave.cpp:739] Got assigned task ct:1373041190990:0:add_latest_reservation_survey_events_partition for framework chronos
> I0705 16:19:51.552150  9706 slave.cpp:837] Launching task ct:1373041190990:0:add_latest_reservation_survey_events_partition for framework chronos
> I0705 16:19:51.553956  9706 paths.hpp:303] Created executor directory '/tmp/mesos/slaves/201307030043-2037266954-5050-15277-1517/frameworks/chronos/executors/ct:1373041190990:0:add_latest_reservation_survey_events_partition/runs/611ba128-557f-4b5e-8cf2-4d1ce60d618f'
> I0705 16:19:51.554576  9706 slave.cpp:948] Queuing task 'ct:1373041190990:0:add_latest_reservation_survey_events_partition' for executor ct:1373041190990:0:add_latest_reservation_survey_events_partition of framework 'chronos
> I0705 16:19:51.555027  9706 slave.cpp:511] Successfully attached file '/tmp/mesos/slaves/201307030043-2037266954-5050-15277-1517/frameworks/chronos/executors/ct:1373041190990:0:add_latest_reservation_survey_events_partition/runs/611ba128-557f-4b5e-8cf2-4d1ce60d618f'
> I0705 16:19:54.048754  9724 slave.cpp:2530] Current usage 42.18%. Max
> allowed age: 22.955009563956388hrs
> W0705 16:19:54.108963  9724 monitor.cpp:186] Failed to collect resource
> usage for executor 'executor_Task_Tracker_8023' of framework
> '201307030043-2037266954-5050-15277-0006': Future discarded
> W0705 16:19:59.110787  9729 monitor.cpp:186] Failed to collect resource
> usage for executor 'executor_Task_Tracker_8023' of framework
> '201307030043-2037266954-5050-15277-0006': Future discarded
> W0705 16:20:04.112406  9704 monitor.cpp:186] Failed to collect resource
> usage for executor 'executor_Task_Tracker_8023' of framework
> '201307030043-2037266954-5050-15277-0006': Future discarded
> W0705 16:20:09.114367  9705 monitor.cpp:186] Failed to collect resource
> usage for executor 'executor_Task_Tracker_8023' of framework
> '201307030043-2037266954-5050-15277-0006': Future discarded
> W0705 16:20:14.116312  9706 monitor.cpp:186] Failed to collect resource
> usage for executor 'executor_Task_Tracker_8023' of framework
> '201307030043-2037266954-5050-15277-0006': Future discarded
> W0705 16:20:19.118370  9699 monitor.cpp:186] Failed to collect resource
> usage for executor 'executor_Task_Tracker_8023' of framework
> '201307030043-2037266954-5050-15277-0006': Future discarded
> W0705 16:20:24.120311  9701 monitor.cpp:186] Failed to collect resource
> usage for executor 'executor_Task_Tracker_8023' of framework
> '201307030043-2037266954-5050-15277-0006': Future discarded
> W0705 16:20:29.122355  9700 monitor.cpp:186] Failed to collect resource
> usage for executor 'executor_Task_Tracker_8023' of framework
> '201307030043-2037266954-5050-15277-0006': Future discarded
> W0705 16:20:34.123443  9722 monitor.cpp:186] Failed to collect resource
> usage for executor 'executor_Task_Tracker_8023' of framework
> '201307030043-2037266954-5050-15277-0006': Future discarded
> W0705 16:20:39.125660  9718 monitor.cpp:186] Failed to collect resource
> usage for executor 'executor_Task_Tracker_8023' of framework
> '201307030043-2037266954-5050-15277-0006': Future discarded
> W0705 16:20:44.127464  9724 monitor.cpp:186] Failed to collect resource
> usage for executor 'executor_Task_Tracker_8023' of framework
> '201307030043-2037266954-5050-15277-0006': Future discarded
> W0705 16:20:49.129385  9725 monitor.cpp:186] Failed to collect resource
> usage for executor 'executor_Task_Tracker_8023' of framework
> '201307030043-2037266954-5050-15277-0006': Future discarded
> I0705 16:20:51.555174  9703 slave.cpp:2482] Terminating executor
> ct:1373041190990:0:add_latest_reservation_survey_events_partition of
> framework chronos because it did not register within 1mins
> I0705 16:20:54.050434  9717 slave.cpp:2530] Current usage 42.18%. Max
> allowed age: 22.955009342481944hrs
> W0705 16:20:54.130730  9699 monitor.cpp:186] Failed to collect resource
> usage for executor 'executor_Task_Tracker_8023' of framework
> '201307030043-2037266954-5050-15277-0006': Future discarded
> W0705 16:20:59.132472  9702 monitor.cpp:186] Failed to collect resource
> usage for executor 'executor_Task_Tracker_8023' of framework
> '201307030043-2037266954-5050-15277-0006': Future discarded
> W0705 16:21:04.134557  9713 monitor.cpp:186] Failed to collect resource
> usage for executor 'executor_Task_Tracker_8023' of framework
> '201307030043-2037266954-5050-15277-0006': Future discarded
> W0705 16:21:09.135619  9701 monitor.cpp:186] Failed to collect resource
> usage for executor 'executor_Task_Tracker_8023' of framework
> '201307030043-2037266954-5050-15277-0006': Future discarded
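For anyone triaging a similar slave log, here is a minimal Python sketch that pulls out the two kinds of entries discussed in this post: executors terminated for not registering, and 'Failed to collect resource usage' warnings. It assumes the standard glog prefix (severity letter, MMDD, time, thread id, file:line) and that wrapped entries have been rejoined into single lines first. The function and variable names are made up for illustration and are not part of Mesos.

```python
import re

# glog prefix: severity, MMDD, HH:MM:SS.micros, thread id, source file:line]
GLOG_LINE = re.compile(
    r"^(?P<sev>[IWEF])(?P<date>\d{4}) (?P<time>\d{2}:\d{2}:\d{2}\.\d{6})\s+"
    r"(?P<tid>\d+) (?P<src>[\w.]+):(?P<line>\d+)\] (?P<msg>.*)$"
)
TERMINATED = re.compile(
    r"Terminating executor (?P<executor>\S+) of framework (?P<framework>\S+) "
    r"because it did not register"
)
USAGE_FAILED = re.compile(
    r"Failed to collect resource usage for executor '(?P<executor>[^']+)'"
)


def summarize(lines):
    """Return (unregistered, usage_failures) for an iterable of log lines.

    unregistered:   list of (time, executor id) for registration timeouts
    usage_failures: dict of executor id -> number of usage warnings
    """
    unregistered = []
    usage_failures = {}
    for raw in lines:
        entry = GLOG_LINE.match(raw.strip())
        if entry is None:
            continue  # not a glog entry (blank line, wrapped fragment, ...)
        msg = entry.group("msg")
        timed_out = TERMINATED.search(msg)
        if timed_out:
            unregistered.append((entry.group("time"), timed_out.group("executor")))
            continue
        failed = USAGE_FAILED.search(msg)
        if failed:
            name = failed.group("executor")
            usage_failures[name] = usage_failures.get(name, 0) + 1
    return unregistered, usage_failures


# Two entries from the log above, rejoined into single lines.
SAMPLE = [
    "I0705 16:20:51.555174  9703 slave.cpp:2482] Terminating executor "
    "ct:1373041190990:0:add_latest_reservation_survey_events_partition of "
    "framework chronos because it did not register within 1mins",
    "W0705 16:20:54.130730  9699 monitor.cpp:186] Failed to collect resource "
    "usage for executor 'executor_Task_Tracker_8023' of framework "
    "'201307030043-2037266954-5050-15277-0006': Future discarded",
]

if __name__ == "__main__":
    print(summarize(SAMPLE))
```

Running it over a whole slave log gives a quick list of executors that hit the registration timeout, which can then be compared against the tasks the web UI still shows as STAGING.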

Re: Tasks stuck in 'STAGING'

Posted by Brenden Matthews <br...@airbedandbreakfast.com>.
This might have been a misconfiguration.  I'll report back if I see it
again.


On Mon, Jul 8, 2013 at 1:55 PM, Benjamin Mahler <be...@gmail.com> wrote:

> Are these the un-edited logs? I'm expecting to see some logs from the
> process_isolator or cgroups_isolator in there.
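The check suggested here can be roughly automated. The Python sketch below lists which source files show up in the glog prefixes of a log excerpt and reports whether any of them looks like an isolator. It assumes isolator entries come from a source file with "isolator" in its name (a process_isolator.cpp entry appears in a later slave log in this thread; the cgroups file name is a guess). The helper names are hypothetical, not Mesos APIs.

```python
import re

# Captures the source file name from a glog prefix such as
# "I0708 23:36:44.261446 10991 process_isolator.cpp:161] ..."
SOURCE_FILE = re.compile(
    r"^[IWEF]\d{4} \d{2}:\d{2}:\d{2}\.\d{6}\s+\d+ ([\w.]+):\d+\]"
)


def source_files(lines):
    """Return the set of source file names seen in glog prefixes."""
    found = set()
    for raw in lines:
        match = SOURCE_FILE.match(raw.strip())
        if match:
            found.add(match.group(1))
    return found


def has_isolator_output(lines):
    """True if any entry was logged from a file whose name contains 'isolator'."""
    return any("isolator" in name for name in source_files(lines))


# Entries from the original post (rejoined): no isolator lines.
FIRST_EXCERPT = [
    "I0705 16:19:51.552150  9706 slave.cpp:837] Launching task "
    "ct:1373041190990:0:add_latest_reservation_survey_events_partition "
    "for framework chronos",
    "W0705 16:19:54.108963  9724 monitor.cpp:186] Failed to collect resource "
    "usage for executor 'executor_Task_Tracker_8023' of framework "
    "'201307030043-2037266954-5050-15277-0006': Future discarded",
]

# An entry of the kind that shows up when the executor is actually forked.
FORKED_EXCERPT = [
    "I0708 23:36:44.261446 10991 process_isolator.cpp:161] Forked executor at 2220",
]

if __name__ == "__main__":
    print(sorted(source_files(FIRST_EXCERPT)), has_isolator_output(FIRST_EXCERPT))
    print(has_isolator_output(FORKED_EXCERPT))
```

A 'Launching task' entry with no isolator entry after it suggests the executor was never forked, or that the log was trimmed.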

Re: Tasks stuck in 'STAGING'

Posted by 王国栋 <wa...@gmail.com>.
Task_Tracker_242 is not running now. I restarted the slave to recover the
node. From the work dir of Task_Tracker_242 I could see that the directory
was created successfully, but there was no log in it. The executor
failed to start from the very beginning.

Guodong


On Wed, Jul 10, 2013 at 1:51 PM, Vinod Kone <vi...@gmail.com> wrote:

> When the executor is unhealthy, the slave sends a kill command to the
> executor. Only when the executor terminates does the slave transition all
> its tasks to LOST and inform the master, and hence the scheduler. From your
> observation, it sounds like the kill signal was either being trapped by the
> executor or was not being properly sent to the executor. Is the
> task/executor Task_Tracker_242 still running on the slave? You can check
> this with 'ps'. What do the executor logs say? Did it receive any signal?
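The ordering described above explains why a task can sit in STAGING forever. Here is a toy Python model of it, assuming only what the message says: a registration timeout triggers a kill, and the tasks are moved to LOST only once the executor's termination is observed. This is not Mesos code; the class and method names are invented for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class ToySlave:
    """Toy model of the kill/terminate ordering; not the real slave."""
    task_states: dict = field(default_factory=dict)  # task id -> state
    reported: list = field(default_factory=list)     # updates sent to the master
    kill_sent: set = field(default_factory=set)      # executors asked to die

    def launch(self, task_id):
        # A task is STAGING until its executor reports something else.
        self.task_states[task_id] = "TASK_STAGING"

    def registration_timeout(self, executor_id):
        # The executor is deemed unhealthy: a kill is sent, but task state
        # is deliberately left untouched at this point.
        self.kill_sent.add(executor_id)

    def executor_terminated(self, executor_id, task_ids):
        # Only when termination is observed are the tasks moved to LOST
        # and the master (and hence the scheduler) informed.
        self.kill_sent.discard(executor_id)
        for task_id in task_ids:
            self.task_states[task_id] = "TASK_LOST"
            self.reported.append((task_id, "TASK_LOST"))


if __name__ == "__main__":
    slave = ToySlave()
    slave.launch("Task_Tracker_242")
    slave.registration_timeout("executor_Task_Tracker_242")
    # If termination is never observed, the task stays in STAGING.
    print(slave.task_states, slave.reported)
    slave.executor_terminated("executor_Task_Tracker_242", ["Task_Tracker_242"])
    print(slave.task_states, slave.reported)
```

In this model, if the termination event never arrives (the signal is trapped, never delivered, or the executor never really started), nothing moves the task out of STAGING. That matches what the web UI showed.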
>
>
> On Tue, Jul 9, 2013 at 10:37 PM, 王国栋 <wa...@gmail.com> wrote:
>
> > No log about killing Task_Tracker_242. But I can see from the master/slave
> > web UI that Task_Tracker_242 is in STAGING, and it is stuck. The hadoop
> > framework considers Task_Tracker_242 launched, though it is not running.
> >
> > I think if the slave deems Task_Tracker_242 unhealthy, it should report to
> > the master that this task is lost. But I am not sure why it cannot report
> > this. I am trying to go through the code.
> >
> > Guodong
> >
> >
> > On Wed, Jul 10, 2013 at 2:51 AM, Vinod Kone <vi...@gmail.com> wrote:
> >
> > > Hey Guodong,
> > >
> > > So, it looks like Task_Tracker_242 did not register with the slave within
> > > 1 minute, and the slave decided to kill it because it was deemed unhealthy.
> > > At this point the executor should've received a kill signal from the slave.
> > > Do you see anything of that sort in the slave or executor logs?
> > >
> > >
> > > On Mon, Jul 8, 2013 at 11:30 PM, 王国栋 <wa...@gmail.com> wrote:
> > >
> > > > Hi Vinod,
> > > >
> > > > I am using the code from trunk. I think the latest commit is from Jul
> > > > 1st. I will grep some master logs in another mail.
> > > >
> > > > The task "Task_Tracker_242" is stuck in STAGING. I think "Task_Tracker_224"
> > > > and "Task_Tracker_230" exited successfully. But it is strange that there
> > > > are a lot of "Failed to collect resource..." warnings.
> > > >
> > > > I0709 00:46:11.288698 11002 slave.cpp:739] Got assigned task Task_Tracker_242 for framework 201307040929-252063498-5050-27411-0000
> > > > I0709 00:46:11.289136 11002 slave.cpp:837] Launching task Task_Tracker_242 for framework 201307040929-252063498-5050-27411-0000
> > > > I0709 00:46:11.291296 11002 paths.hpp:303] Created executor directory '/data/mesos-slave-work-dir//slaves/201307041648-252063498-5050-8038-5/frameworks/201307040929-252063498-5050-27411-0000/executors/executor_Task_Tracker_242/runs/5c47ad99-1c78-43c8-9f27-9509f1d39c3d'
> > > > I0709 00:46:11.291647 11002 slave.cpp:948] Queuing task 'Task_Tracker_242' for executor executor_Task_Tracker_242 of framework '201307040929-252063498-5050-27411-0000
> > > > I0709 00:46:11.292162 11002 slave.cpp:511] Successfully attached file '/data/mesos-slave-work-dir//slaves/201307041648-252063498-5050-8038-5/frameworks/201307040929-252063498-5050-27411-0000/executors/executor_Task_Tracker_242/runs/5c47ad99-1c78-43c8-9f27-9509f1d39c3d'
> > > > W0709 00:46:12.197242 10992 monitor.cpp:186] Failed to collect
> resource
> > > > usage for executor 'executor_Task_Tracker_230' of framework
> > > > '201307040929-252063498-5050-27411-0000': Future discarded
> > > > W0709 00:46:16.100548 10994 monitor.cpp:186] Failed to collect
> resource
> > > > usage for executor 'executor_Task_Tracker_224' of framework
> > > > '201307040929-252063498-5050-27411-0000': Future discarded
> > > > W0709 00:46:17.197463 11001 monitor.cpp:186] Failed to collect
> resource
> > > > usage for executor 'executor_Task_Tracker_230' of framework
> > > > '201307040929-252063498-5050-27411-0000': Future discarded
> > > > W0709 00:46:21.101570 11002 monitor.cpp:186] Failed to collect
> resource
> > > > usage for executor 'executor_Task_Tracker_224' of framework
> > > > '201307040929-252063498-5050-27411-0000': Future discarded
> > > > W0709 00:46:22.198303 11005 monitor.cpp:186] Failed to collect
> resource
> > > > usage for executor 'executor_Task_Tracker_230' of framework
> > > > '201307040929-252063498-5050-27411-0000': Future discarded
> > > > W0709 00:46:26.102522 11002 monitor.cpp:186] Failed to collect
> resource
> > > > usage for executor 'executor_Task_Tracker_224' of framework
> > > > '201307040929-252063498-5050-27411-0000': Future discarded
> > > > W0709 00:46:27.199403 10998 monitor.cpp:186] Failed to collect
> resource
> > > > usage for executor 'executor_Task_Tracker_230' of framework
> > > > '201307040929-252063498-5050-27411-0000': Future discarded
> > > > W0709 00:46:31.103610 10998 monitor.cpp:186] Failed to collect
> resource
> > > > usage for executor 'executor_Task_Tracker_224' of framework
> > > > '201307040929-252063498-5050-27411-0000': Future discarded
> > > > W0709 00:46:32.200248 11001 monitor.cpp:186] Failed to collect
> resource
> > > > usage for executor 'executor_Task_Tracker_230' of framework
> > > > '201307040929-252063498-5050-27411-0000': Future discarded
> > > > W0709 00:46:36.104547 11004 monitor.cpp:186] Failed to collect
> resource
> > > > usage for executor 'executor_Task_Tracker_224' of framework
> > > > '201307040929-252063498-5050-27411-0000': Future discarded
> > > > W0709 00:46:37.201236 10991 monitor.cpp:186] Failed to collect
> resource
> > > > usage for executor 'executor_Task_Tracker_230' of framework
> > > > '201307040929-252063498-5050-27411-0000': Future discarded
> > > > W0709 00:46:41.105523 10997 monitor.cpp:186] Failed to collect
> resource
> > > > usage for executor 'executor_Task_Tracker_224' of framework
> > > > '201307040929-252063498-5050-27411-0000': Future discarded
> > > > W0709 00:46:42.202250 10991 monitor.cpp:186] Failed to collect
> resource
> > > > usage for executor 'executor_Task_Tracker_230' of framework
> > > > '201307040929-252063498-5050-27411-0000': Future discarded
> > > > I0709 00:46:45.283098 11002 slave.cpp:2511] Current usage 57.43%. Max
> > > > allowed age: 2.279812884766227days
> > > > W0709 00:46:46.106760 10994 monitor.cpp:186] Failed to collect
> resource
> > > > usage for executor 'executor_Task_Tracker_224' of framework
> > > > '201307040929-252063498-5050-27411-0000': Future discarded
> > > > W0709 00:46:47.203474 10993 monitor.cpp:186] Failed to collect
> resource
> > > > usage for executor 'executor_Task_Tracker_230' of framework
> > > > '201307040929-252063498-5050-27411-0000': Future discarded
> > > > W0709 00:46:51.107544 11006 monitor.cpp:186] Failed to collect
> resource
> > > > usage for executor 'executor_Task_Tracker_224' of framework
> > > > '201307040929-252063498-5050-27411-0000': Future discarded
> > > > W0709 00:46:52.204280 10997 monitor.cpp:186] Failed to collect
> resource
> > > > usage for executor 'executor_Task_Tracker_230' of framework
> > > > '201307040929-252063498-5050-27411-0000': Future discarded
> > > > W0709 00:46:56.108530 10995 monitor.cpp:186] Failed to collect
> resource
> > > > usage for executor 'executor_Task_Tracker_224' of framework
> > > > '201307040929-252063498-5050-27411-0000': Future discarded
> > > > W0709 00:46:57.205417 10997 monitor.cpp:186] Failed to collect
> resource
> > > > usage for executor 'executor_Task_Tracker_230' of framework
> > > > '201307040929-252063498-5050-27411-0000': Future discarded
> > > > W0709 00:47:01.109284 10997 monitor.cpp:186] Failed to collect
> resource
> > > > usage for executor 'executor_Task_Tracker_224' of framework
> > > > '201307040929-252063498-5050-27411-0000': Future discarded
> > > > W0709 00:47:02.206368 11002 monitor.cpp:186] Failed to collect
> resource
> > > > usage for executor 'executor_Task_Tracker_230' of framework
> > > > '201307040929-252063498-5050-27411-0000': Future discarded
> > > > I0709 00:47:05.288517 11002 slave.cpp:2463] Terminating executor executor_Task_Tracker_238 of framework 201307040929-252063498-5050-27411-0000 because it did not register within 1mins
> > > > W0709 00:47:06.110532 11005 monitor.cpp:186] Failed to collect resource usage for executor 'executor_Task_Tracker_224' of framework '201307040929-252063498-5050-27411-0000': Future discarded
> > > > W0709 00:47:07.207320 10997 monitor.cpp:186] Failed to collect resource usage for executor 'executor_Task_Tracker_230' of framework '201307040929-252063498-5050-27411-0000': Future discarded
> > > > W0709 00:47:11.111778 10996 monitor.cpp:186] Failed to collect resource usage for executor 'executor_Task_Tracker_224' of framework '201307040929-252063498-5050-27411-0000': Future discarded
> > > > I0709 00:47:11.292485 10991 slave.cpp:2463] Terminating executor executor_Task_Tracker_242 of framework 201307040929-252063498-5050-27411-0000 because it did not register within 1mins
> > > >
> > > >
> > > > Guodong
> > > >
> > > >
> > > > > On Tue, Jul 9, 2013 at 2:21 PM, Vinod Kone <vi...@gmail.com> wrote:
> > > > >
> > > > > > Hey Guodong, which of these task(s) is stuck in STAGING? The
> > > > > > corresponding master logs would also be helpful here. Also, which
> > > > > > version of Mesos are you running?
> > > > >
> > > > >
> > > > > On Mon, Jul 8, 2013 at 11:02 PM, 王国栋 <wa...@gmail.com> wrote:
> > > > >
> > > > > > It is very interesting that there are these logs.
> > > > > >
> > > > > > > I0709 00:33:43.833853 11002 slave.cpp:996] Asked to kill task Task_Tracker_224 of framework 201307040929-252063498-5050-27411-0000
> > > > > > > I0709 00:33:43.835552 11006 slave.cpp:996] Asked to kill task Task_Tracker_230 of framework 201307040929-252063498-5050-27411-0000
> > > > > > > I0709 00:33:43.972771 10994 slave.cpp:1692] Handling status update TASK_FINISHED (UUID: 372081cc-edf2-4183-a461-9345ab6d279c) for task Task_Tracker_230 of framework 201307040929-252063498-5050-27411-0000 from executor(1)@10.47.6.21:27786
> > > > > > > I0709 00:33:43.973132 10994 status_update_manager.cpp:290] Received status update TASK_FINISHED (UUID: 372081cc-edf2-4183-a461-9345ab6d279c) for task Task_Tracker_230 of framework 201307040929-252063498-5050-27411-0000 with checkpoint=false
> > > > > > > I0709 00:33:43.973192 10994 status_update_manager.cpp:336] Forwarding status update TASK_FINISHED (UUID: 372081cc-edf2-4183-a461-9345ab6d279c) for task Task_Tracker_230 of framework 201307040929-252063498-5050-27411-0000 to master@10.47.6.15:5050
> > > > > > > I0709 00:33:43.973846 11005 slave.cpp:1809] Sending acknowledgement for status update TASK_FINISHED (UUID: 372081cc-edf2-4183-a461-9345ab6d279c) for task Task_Tracker_230 of framework 201307040929-252063498-5050-27411-0000 to executor(1)@10.47.6.21:27786
> > > > > > > I0709 00:33:43.974591 11000 status_update_manager.cpp:360] Received status update acknowledgement 372081cc-edf2-4183-a461-9345ab6d279c for task Task_Tracker_230 of framework 201307040929-252063498-5050-27411-0000
> > > > > > > I0709 00:33:43.974652 11000 status_update_manager.cpp:481] Cleaning up status update stream for task Task_Tracker_230 of framework 201307040929-252063498-5050-27411-0000
> > > > > > > I0709 00:33:44.090603 11003 slave.cpp:1692] Handling status update TASK_FINISHED (UUID: 61d5775a-2375-412a-a5a4-80ab55163d88) for task Task_Tracker_224 of framework 201307040929-252063498-5050-27411-0000 from executor(1)@10.47.6.21:2310
> > > > > > > I0709 00:33:44.090860 11003 status_update_manager.cpp:290] Received status update TASK_FINISHED (UUID: 61d5775a-2375-412a-a5a4-80ab55163d88) for task Task_Tracker_224 of framework 201307040929-252063498-5050-27411-0000 with checkpoint=false
> > > > > > > I0709 00:33:44.090973 11003 status_update_manager.cpp:336] Forwarding status update TASK_FINISHED (UUID: 61d5775a-2375-412a-a5a4-80ab55163d88) for task Task_Tracker_224 of framework 201307040929-252063498-5050-27411-0000 to master@10.47.6.15:5050
> > > > > > > I0709 00:33:44.091279 11003 slave.cpp:1809] Sending acknowledgement for status update TASK_FINISHED (UUID: 61d5775a-2375-412a-a5a4-80ab55163d88) for task Task_Tracker_224 of framework 201307040929-252063498-5050-27411-0000 to executor(1)@10.47.6.21:2310
> > > > > > > I0709 00:33:44.093286 11003 status_update_manager.cpp:360] Received status update acknowledgement 61d5775a-2375-412a-a5a4-80ab55163d88 for task Task_Tracker_224 of framework 201307040929-252063498-5050-27411-0000
> > > > > > > I0709 00:33:44.093359 11003 status_update_manager.cpp:481] Cleaning up status update stream for task Task_Tracker_224 of framework 201307040929-252063498-5050-27411-0000
> > > > > > > I0709 00:33:45.259831 10997 slave.cpp:2511] Current usage 57.44%. Max allowed age: 2.279168852469954days
> > > > > > W0709 00:33:45.949470 10996 monitor.cpp:186] Failed to collect
> > > resource
> > > > > > usage for executor 'executor_Task_Tracker_224' of framework
> > > > > > '201307040929-252063498-5050-27411-0000': Future discarded
> > > > > > W0709 00:33:47.063181 11005 monitor.cpp:186] Failed to collect
> > > resource
> > > > > > usage for executor 'executor_Task_Tracker_230' of framework
> > > > > > '201307040929-252063498-5050-27411-0000': Future discarded
> > > > > > W0709 00:33:50.950412 11000 monitor.cpp:186] Failed to collect
> > > resource
> > > > > > usage for executor 'executor_Task_Tracker_224' of framework
> > > > > > '201307040929-252063498-5050-27411-0000': Future discarded
> > > > > > W0709 00:33:52.063576 10993 monitor.cpp:186] Failed to collect
> > > resource
> > > > > > usage for executor 'executor_Task_Tracker_230' of framework
> > > > > > '201307040929-252063498-5050-27411-0000': Future discarded
> > > > > > W0709 00:33:55.951427 11003 monitor.cpp:186] Failed to collect
> > > resource
> > > > > > usage for executor 'executor_Task_Tracker_224' of framework
> > > > > > '201307040929-252063498-5050-27411-0000': Future discarded
> > > > > > W0709 00:33:57.064575 10998 monitor.cpp:186] Failed to collect
> > > resource
> > > > > > usage for executor 'executor_Task_Tracker_230' of framework
> > > > > > '201307040929-252063498-5050-27411-0000': Future discarded
> > > > > > W0709 00:34:00.952390 11003 monitor.cpp:186] Failed to collect
> > > resource
> > > > > > usage for executor 'executor_Task_Tracker_224' of framework
> > > > > > '201307040929-252063498-5050-27411-0000': Future discarded
> > > > > > W0709 00:34:02.065218 10998 monitor.cpp:186] Failed to collect
> > > resource
> > > > > > usage for executor 'executor_Task_Tracker_230' of framework
> > > > > > '201307040929-252063498-5050-27411-0000': Future discarded
> > > > > > W0709 00:34:05.953456 10995 monitor.cpp:186] Failed to collect
> > > resource
> > > > > > usage for executor 'executor_Task_Tracker_224' of framework
> > > > > > '201307040929-252063498-5050-27411-0000': Future discarded
> > > > > > W0709 00:34:07.066515 10995 monitor.cpp:186] Failed to collect
> > > resource
> > > > > > usage for executor 'executor_Task_Tracker_230' of framework
> > > > > > '201307040929-252063498-5050-27411-0000': Future discarded
> > > > > > W0709 00:34:10.954479 10998 monitor.cpp:186] Failed to collect
> > > resource
> > > > > > usage for executor 'executor_Task_Tracker_224' of framework
> > > > > > '201307040929-252063498-5050-27411-0000': Future discarded
> > > > > > W0709 00:34:12.067471 11005 monitor.cpp:186] Failed to collect
> > > resource
> > > > > > usage for executor 'executor_Task_Tracker_230' of framework
> > > > > > '201307040929-252063498-5050-27411-0000': Future discarded
> > > > > > W0709 00:34:15.955461 10996 monitor.cpp:186] Failed to collect
> > > resource
> > > > > > usage for executor 'executor_Task_Tracker_224' of framework
> > > > > > '201307040929-252063498-5050-27411-0000': Future discarded
> > > > > > W0709 00:34:17.068209 10996 monitor.cpp:186] Failed to collect
> > > resource
> > > > > > usage for executor 'executor_Task_Tracker_230' of framework
> > > > > > '201307040929-252063498-5050-27411-0000': Future discarded
> > > > > >
> > > > > >
> > > > > >
> > > > > > Guodong
> > > > > >
> > > > > >
> > > > > > On Tue, Jul 9, 2013 at 1:59 PM, 王国栋 <wa...@gmail.com> wrote:
> > > > > >
> > > > > > > > Hi Ben,
> > > > > > > >
> > > > > > > > I ran into the same issue here.
> > > > > > > >
> > > > > > > > This also happens in our hadoop framework. The slave log is below.
> > > > > > > > At that time, I think the workload on the node was very high.
> > > > > > >
> > > > > > > > I0708 23:36:44.253880 11005 slave.cpp:739] Got assigned task Task_Tracker_224 for framework 201307040929-252063498-5050-27411-0000
> > > > > > > > I0708 23:36:44.255221 10999 gc.cpp:84] Unscheduling '/data/mesos-slave-work-dir//slaves/201307041648-252063498-5050-8038-5/frameworks/201307040929-252063498-5050-27411-0000' for removal
> > > > > > > > I0708 23:36:44.256206 11001 slave.cpp:837] Launching task Task_Tracker_224 for framework 201307040929-252063498-5050-27411-0000
> > > > > > > > I0708 23:36:44.258117 11001 paths.hpp:303] Created executor directory '/data/mesos-slave-work-dir//slaves/201307041648-252063498-5050-8038-5/frameworks/201307040929-252063498-5050-27411-0000/executors/executor_Task_Tracker_224/runs/953d3565-424c-4ab3-9926-a3fa71042bf1'
> > > > > > > > I0708 23:36:44.258467 10991 process_isolator.cpp:99] Launching executor_Task_Tracker_224 (cd hadoop && ./bin/mesos-executor) in /data/mesos-slave-work-dir//slaves/201307041648-252063498-5050-8038-5/frameworks/201307040929-252063498-5050-27411-0000/executors/executor_Task_Tracker_224/runs/953d3565-424c-4ab3-9926-a3fa71042bf1 with resources cpus=1; mem=1280' for framework 201307040929-252063498-5050-27411-0000
> > > > > > > > I0708 23:36:44.258496 11001 slave.cpp:948] Queuing task 'Task_Tracker_224' for executor executor_Task_Tracker_224 of framework '201307040929-252063498-5050-27411-0000
> > > > > > > > I0708 23:36:44.261446 10991 process_isolator.cpp:161] Forked executor at 2220
> > > > > > > > I0708 23:36:44.261787 10996 slave.cpp:511] Successfully attached file '/data/mesos-slave-work-dir//slaves/201307041648-252063498-5050-8038-5/frameworks/201307040929-252063498-5050-27411-0000/executors/executor_Task_Tracker_224/runs/953d3565-424c-4ab3-9926-a3fa71042bf1'
> > > > > > > > I0708 23:36:44.580497 10996 slave.cpp:2511] Current usage 57.21%. Max allowed age: 2.295155852123924days
> > > > > > > > I0708 23:36:44.750393 11002 slave.cpp:1395] Got registration for executor 'executor_Task_Tracker_224' of framework 201307040929-252063498-5050-27411-0000
> > > > > > > > I0708 23:36:44.751095 11002 slave.cpp:1510] Flushing queued task Task_Tracker_224 for executor 'executor_Task_Tracker_224' of framework 201307040929-252063498-5050-27411-0000
> > > > > > > > I0708 23:36:46.144317 11006 slave.cpp:1692] Handling status update TASK_RUNNING (UUID: 364ee347-f6a2-4c7b-8702-460aa0ece579) for task Task_Tracker_224 of framework 201307040929-252063498-5050-27411-0000 from executor(1)@10.47.6.21:2310
> > > > > > > > I0708 23:36:46.144745 11006 status_update_manager.cpp:290] Received status update TASK_RUNNING (UUID: 364ee347-f6a2-4c7b-8702-460aa0ece579) for task Task_Tracker_224 of framework 201307040929-252063498-5050-27411-0000 with checkpoint=false
> > > > > > > > I0708 23:36:46.144821 11006 status_update_manager.cpp:450] Creating StatusUpdate stream for task Task_Tracker_224 of framework 201307040929-252063498-5050-27411-0000
> > > > > > > > I0708 23:36:46.145076 11006 status_update_manager.cpp:336] Forwarding status update TASK_RUNNING (UUID: 364ee347-f6a2-4c7b-8702-460aa0ece579) for task Task_Tracker_224 of framework 201307040929-252063498-5050-27411-0000 to master@10.47.6.15:5050
> > > > > > > > I0708 23:36:46.145882 10997 slave.cpp:1809] Sending acknowledgement for status update TASK_RUNNING (UUID: 364ee347-f6a2-4c7b-8702-460aa0ece579) for task Task_Tracker_224 of framework 201307040929-252063498-5050-27411-0000 to executor(1)@10.47.6.21:2310
> > > > > > > > I0708 23:36:46.146870 10993 status_update_manager.cpp:360] Received status update acknowledgement 364ee347-f6a2-4c7b-8702-460aa0ece579 for task Task_Tracker_224 of framework 201307040929-252063498-5050-27411-0000
> > > > > > > I0708 23:36:50.258347 11005 slave.cpp:739] Got assigned task
> > > > > > > Task_Tracker_230 for framework
> > > 201307040929-252063498-5050-27411-0000
> > > > > > > I0708 23:36:50.259472 11005 slave.cpp:837] Launching task
> > > > > > Task_Tracker_230
> > > > > > > for framework 201307040929-252063498-5050-27411-0000
> > > > > > > I0708 23:36:50.261641 11005 paths.hpp:303] Created executor
> > > directory
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> '/data/mesos-slave-work-dir//slaves/201307041648-252063498-5050-8038-5/frameworks/201307040929-252063498-5050-27411-0000/executors/exe
> > > > > > >
> cutor_Task_Tracker_230/runs/1dc87acc-d090-469f-ba30-0477139ee7fd'
> > > > > > > I0708 23:36:50.262265 11005 slave.cpp:948] Queuing task
> > > > > > 'Task_Tracker_230'
> > > > > > > for executor executor_Task_Tracker_230 of framework
> > > > > > > '201307040929-252063498-5050-27411-0000
> > > > > > > I0708 23:36:50.262557 11005 process_isolator.cpp:99] Launching
> > > > > > > executor_Task_Tracker_230 (cd hadoop && ./bin/mesos-executor)
> in
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> /data/mesos-slave-work-dir//slaves/201307041648-252063498-5050-8038-5/framew
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> orks/201307040929-252063498-5050-27411-0000/executors/executor_Task_Tracker_230/runs/1dc87acc-d090-469f-ba30-0477139ee7fd
> > > > > > > with resources cpus=1; mem=1280' for framework
> > > > > > > 201307040929-252063498-5050-27411-0
> > > > > > > 000
> > > > > > > I0708 23:36:50.265396 10999 slave.cpp:511] Successfully
> attached
> > > file
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> '/data/mesos-slave-work-dir//slaves/201307041648-252063498-5050-8038-5/frameworks/201307040929-252063498-5050-27411-0000/executors/exe
> > > > > > >
> cutor_Task_Tracker_230/runs/1dc87acc-d090-469f-ba30-0477139ee7fd'
> > > > > > > I0708 23:36:50.265419 11005 process_isolator.cpp:161] Forked
> > > executor
> > > > > at
> > > > > > > 2851
> > > > > > > I0708 23:36:50.835607 10995 slave.cpp:1395] Got registration
> for
> > > > > executor
> > > > > > > 'executor_Task_Tracker_230' of framework
> > > > > > > 201307040929-252063498-5050-27411-0000
> > > > > > > I0708 23:36:50.836174 10995 slave.cpp:1510] Flushing queued
> task
> > > > > > > Task_Tracker_230 for executor 'executor_Task_Tracker_230' of
> > > > framework
> > > > > > > 201307040929-252063498-5050-27411-0000
> > > > > > > I0708 23:36:54.617856 10994 slave.cpp:1692] Handling status
> > update
> > > > > > > TASK_RUNNING (UUID: 7753252d-c90b-4b0d-adca-7c97f38f692e) for
> > task
> > > > > > > Task_Tracker_230 of framework
> > > 201307040929-252063498-5050-27411-0000
> > > > f
> > > > > > > rom executor(1)@10.47.6.21:27786
> > > > > > > I0708 23:36:54.618275 10994 status_update_manager.cpp:290]
> > Received
> > > > > > status
> > > > > > > update TASK_RUNNING (UUID:
> 7753252d-c90b-4b0d-adca-7c97f38f692e)
> > > for
> > > > > task
> > > > > > > Task_Tracker_230 of framework 201307040929-252063498-50
> > > > > > > 50-27411-0000 with checkpoint=false
> > > > > > > I0708 23:36:54.618326 10994 status_update_manager.cpp:450]
> > Creating
> > > > > > > StatusUpdate stream for task Task_Tracker_230 of framework
> > > > > > > 201307040929-252063498-5050-27411-0000
> > > > > > > I0708 23:36:54.618443 10994 status_update_manager.cpp:336]
> > > Forwarding
> > > > > > > status update TASK_RUNNING (UUID:
> > > > 7753252d-c90b-4b0d-adca-7c97f38f692e)
> > > > > > for
> > > > > > > task Task_Tracker_230 of framework 201307040929-252063498-
> > > > > > > 5050-27411-0000 to master@10.47.6.15:5050
> > > > > > > I0708 23:36:54.619137 10994 slave.cpp:1809] Sending
> > acknowledgement
> > > > for
> > > > > > > status update TASK_RUNNING (UUID:
> > > > 7753252d-c90b-4b0d-adca-7c97f38f692e)
> > > > > > for
> > > > > > > task Task_Tracker_230 of framework 201307040929-25206349
> > > > > > > 8-5050-27411-0000 to executor(1)@10.47.6.21:27786
> > > > > > > I0708 23:36:54.637682 10994 status_update_manager.cpp:360]
> > Received
> > > > > > status
> > > > > > > update acknowledgement 7753252d-c90b-4b0d-adca-7c97f38f692e for
> > > task
> > > > > > > Task_Tracker_230 of framework 201307040929-252063498-5050-27
> > > > > > > 411-0000
> > > > > > > I0708 23:37:44.583014 11002 slave.cpp:2511] Current usage
> 57.23%.
> > > Max
> > > > > > > allowed age: 2.293704423241597days
> > > > > > > I0708 23:38:44.585233 11003 slave.cpp:2511] Current usage
> 57.23%.
> > > Max
> > > > > > > allowed age: 2.293703916528542days
> > > > > > > I0708 23:39:44.599442 11006 slave.cpp:2511] Current usage
> 57.23%.
> > > Max
> > > > > > > allowed age: 2.293639867998055days
> > > > > > > I0708 23:40:44.603996 10997 slave.cpp:2511] Current usage
> 57.24%.
> > > Max
> > > > > > > allowed age: 2.292921551567535days
> > > > > > > I0708 23:41:44.608608 11006 slave.cpp:2511] Current usage
> 57.26%.
> > > Max
> > > > > > > allowed age: 2.291521098018820days
> > > > > > > I0708 23:42:44.609956 10992 slave.cpp:2511] Current usage
> 57.23%.
> > > Max
> > > > > > > allowed age: 2.293668041244063days
> > > > > > > I0708 23:43:44.682621 11000 slave.cpp:2511] Current usage
> 57.24%.
> > > Max
> > > > > > > allowed age: 2.292935638190544days
> > > > > > > I0708 23:44:44.684306 10993 slave.cpp:2511] Current usage
> 57.24%.
> > > Max
> > > > > > > allowed age: 2.292916079066516days
> > > > > > > I0708 23:45:44.686172 11001 slave.cpp:2511] Current usage
> 57.26%.
> > > Max
> > > > > > > allowed age: 2.291485324076945days
> > > > > > > I0708 23:46:44.699095 10995 slave.cpp:2511] Current usage
> 57.23%.
> > > Max
> > > > > > > allowed age: 2.293641894850289days
> > > > > > > I0708 23:47:44.721156 10998 slave.cpp:2511] Current usage
> 57.23%.
> > > Max
> > > > > > > allowed age: 2.293629429709074days
> > > > > > > I0708 23:48:44.779767 10992 slave.cpp:2511] Current usage
> 57.24%.
> > > Max
> > > > > > > allowed age: 2.293525350847025days
> > > > > > > I0708 23:49:44.812389 11004 slave.cpp:2511] Current usage
> 57.24%.
> > > Max
> > > > > > > allowed age: 2.292909289111539days
> > > > > > > I0708 23:50:44.814146 10999 slave.cpp:2511] Current usage
> 57.27%.
> > > Max
> > > > > > > allowed age: 2.291438098419977days
> > > > > > > I0708 23:51:44.814877 11005 slave.cpp:2511] Current usage
> 57.23%.
> > > Max
> > > > > > > allowed age: 2.293635104895313days
> > > > > > > I0708 23:52:44.818620 10998 slave.cpp:2511] Current usage
> 57.24%.
> > > Max
> > > > > > > allowed age: 2.292983775931019days
> > > > > > > I0708 23:53:44.829911 10997 slave.cpp:2511] Current usage
> 57.33%.
> > > Max
> > > > > > > allowed age: 2.286910009194236days
> > > > > > > I0708 23:54:44.831307 10999 slave.cpp:2511] Current usage
> 57.33%.
> > > Max
> > > > > > > allowed age: 2.286909502481169days
> > > > > > > I0708 23:55:44.902858 10994 slave.cpp:2511] Current usage
> 57.37%.
> > > Max
> > > > > > > allowed age: 2.284414244700093days
> > > > > > > I0708 23:56:44.905398 11002 slave.cpp:2511] Current usage
> 57.42%.
> > > Max
> > > > > > > allowed age: 2.280636901540567days
> > > > > > > I0708 23:57:44.933673 10991 slave.cpp:2511] Current usage
> 57.44%.
> > > Max
> > > > > > > allowed age: 2.279481899796968days
> > > > > > > I0708 23:58:44.934840 11004 slave.cpp:2511] Current usage
> 57.48%.
> > > Max
> > > > > > > allowed age: 2.276566475548496days
> > > > > > > I0708 23:59:44.936063 11001 slave.cpp:2511] Current usage
> 57.49%.
> > > Max
> > > > > > > allowed age: 2.275690368671817days
> > > > > > > I0709 00:00:44.937433 11004 slave.cpp:2511] Current usage
> 57.50%.
> > > Max
> > > > > > > allowed age: 2.275057180034989days
> > > > > > > I0709 00:01:44.938940 11001 slave.cpp:2511] Current usage
> 57.51%.
> > > Max
> > > > > > > allowed age: 2.273999467198449days
> > > > > > > I0709 00:02:44.955103 10996 slave.cpp:2511] Current usage
> 57.52%.
> > > Max
> > > > > > > allowed age: 2.273472384275891days
> > > > > > > I0709 00:03:44.956354 10993 slave.cpp:2511] Current usage
> 57.39%.
> > > Max
> > > > > > > allowed age: 2.282894612240220days
> > > > > > > I0709 00:04:44.957926 10997 slave.cpp:2511] Current usage
> 57.40%.
> > > Max
> > > > > > > allowed age: 2.281966516603831days
> > > > > > > I0709 00:05:44.969205 10996 slave.cpp:2511] Current usage
> 57.40%.
> > > Max
> > > > > > > allowed age: 2.281962260214144days
> > > > > > > I0709 00:06:44.969987 10992 slave.cpp:2511] Current usage
> 57.40%.
> > > Max
> > > > > > > allowed age: 2.281791801941551days
> > > > > > > I0709 00:07:44.977504 11004 slave.cpp:2511] Current usage
> 57.40%.
> > > Max
> > > > > > > allowed age: 2.281715288269849days
> > > > > > > I0709 00:08:44.982868 10998 slave.cpp:2511] Current usage
> 57.40%.
> > > Max
> > > > > > > allowed age: 2.281699782850289days
> > > > > > > I0709 00:09:44.997082 11000 slave.cpp:2511] Current usage
> 57.42%.
> > > Max
> > > > > > > allowed age: 2.280776044946192days
> > > > > > > I0709 00:10:44.998754 10994 slave.cpp:2511] Current usage
> 57.42%.
> > > Max
> > > > > > > allowed age: 2.280772193926956days
> > > > > > > I0709 00:11:44.999949 11002 slave.cpp:2511] Current usage
> 57.44%.
> > > Max
> > > > > > > allowed age: 2.279204525069213days
> > > > > > > I0709 00:12:45.001539 10995 slave.cpp:2511] Current usage
> 57.47%.
> > > Max
> > > > > > > allowed age: 2.277132676719109days
> > > > > > > I0709 00:13:45.002728 10992 slave.cpp:2511] Current usage
> 57.43%.
> > > Max
> > > > > > > allowed age: 2.280012428368322days
> > > > > > > I0709 00:14:45.009699 10998 slave.cpp:2511] Current usage
> 57.48%.
> > > Max
> > > > > > > allowed age: 2.276733690857512days
> > > > > > > I0709 00:15:45.013483 10996 slave.cpp:2511] Current usage
> 57.53%.
> > > Max
> > > > > > > allowed age: 2.272715152282546days
> > > > > > > I0709 00:16:45.015496 10998 slave.cpp:2511] Current usage
> 57.57%.
> > > Max
> > > > > > > allowed age: 2.270354274804352days
> > > > > > > I0709 00:17:45.016628 11000 slave.cpp:2511] Current usage
> 57.62%.
> > > Max
> > > > > > > allowed age: 2.266927678423322days
> > > > > > > I0709 00:18:45.032670 11002 slave.cpp:2511] Current usage
> 57.65%.
> > > Max
> > > > > > > allowed age: 2.264218182361482days
> > > > > > > I0709 00:19:45.043442 10998 slave.cpp:2511] Current usage
> 57.69%.
> > > Max
> > > > > > > allowed age: 2.261509598383137days
> > > > > > > I0709 00:20:45.080648 10992 slave.cpp:2511] Current usage
> 57.72%.
> > > Max
> > > > > > > allowed age: 2.259379478031400days
> > > > > > > I0709 00:21:45.081632 10995 slave.cpp:2511] Current usage
> 57.77%.
> > > Max
> > > > > > > allowed age: 2.255819920144039days
> > > > > > > I0709 00:22:45.082593 11005 slave.cpp:2511] Current usage
> 57.81%.
> > > Max
> > > > > > > allowed age: 2.253314528101817days
> > > > > > > I0709 00:23:45.193588 10997 slave.cpp:2511] Current usage
> 57.85%.
> > > Max
> > > > > > > allowed age: 2.250524870034248days
> > > > > > > I0709 00:24:45.220617 10994 slave.cpp:2511] Current usage
> 57.90%.
> > > Max
> > > > > > > allowed age: 2.246784618270532days
> > > > > > > I0709 00:25:45.241602 10992 slave.cpp:2511] Current usage
> 57.97%.
> > > Max
> > > > > > > allowed age: 2.242399422127049days
> > > > > > > I0709 00:26:45.248977 11000 slave.cpp:2511] Current usage
> 58.00%.
> > > Max
> > > > > > > allowed age: 2.240250654734792days
> > > > > > > I0709 00:27:45.250953 10993 slave.cpp:2511] Current usage
> 57.99%.
> > > Max
> > > > > > > allowed age: 2.240516983117894days
> > > > > > > I0709 00:28:45.252694 10996 slave.cpp:2511] Current usage
> 58.06%.
> > > Max
> > > > > > > allowed age: 2.235834143724352days
> > > > > > > I0709 00:29:45.254992 11003 slave.cpp:2511] Current usage
> 58.10%.
> > > Max
> > > > > > > allowed age: 2.233297436815162days
> > > > > > > W0709 00:30:06.753098 10999 monitor.cpp:186] Failed to collect
> > > > resource
> > > > > > > usage for executor 'executor_Task_Tracker_230' of framework
> > > > > > > '201307040929-252063498-5050-27411-0000': Future discarded
> > > > > > > W0709 00:30:10.715373 10996 monitor.cpp:186] Failed to collect
> > > > resource
> > > > > > > usage for executor 'executor_Task_Tracker_224' of framework
> > > > > > > '201307040929-252063498-5050-27411-0000': Future discarded
> > > > > > > W0709 00:30:11.754446 11003 monitor.cpp:186] Failed to collect
> > > > resource
> > > > > > > usage for executor 'executor_Task_Tracker_230' of framework
> > > > > > > '201307040929-252063498-5050-27411-0000': Future discarded
> > > > > > > W0709 00:30:15.719880 11003 monitor.cpp:186] Failed to collect
> > > > resource
> > > > > > > usage for executor 'executor_Task_Tracker_224' of framework
> > > > > > > '201307040929-252063498-5050-27411-0000': Future discarded
> > > > > > > W0709 00:30:16.755473 11003 monitor.cpp:186] Failed to collect
> > > > resource
> > > > > > > usage for executor 'executor_Task_Tracker_230' of framework
> > > > > > > '201307040929-252063498-5050-27411-0000': Future discarded
> > > > > > > W0709 00:30:20.720330 11003 monitor.cpp:186] Failed to collect
> > > > resource
> > > > > > > usage for executor 'executor_Task_Tracker_224' of framework
> > > > > > > '201307040929-252063498-5050-27411-0000': Future discarded
> > > > > > > W0709 00:30:21.766019 11003 monitor.cpp:186] Failed to collect
> > > > resource
> > > > > > > usage for executor 'executor_Task_Tracker_230' of framework
> > > > > > > '201307040929-252063498-5050-27411-0000': Future discarded
> > > > > > > W0709 00:30:25.721364 11003 monitor.cpp:186] Failed to collect
> > > > resource
> > > > > > > usage for executor 'executor_Task_Tracker_224' of framework
> > > > > > > '201307040929-252063498-5050-27411-0000': Future discarded
> > > > > > > W0709 00:30:26.768874 11003 monitor.cpp:186] Failed to collect
> > > > resource
> > > > > > > usage for executor 'executor_Task_Tracker_230' of framework
> > > > > > > '201307040929-252063498-5050-27411-0000': Future discarded
> > > > > > > W0709 00:30:30.722605 11003 monitor.cpp:186] Failed to collect
> > > > resource
> > > > > > > usage for executor 'executor_Task_Tracker_224' of framework
> > > > > > > '201307040929-252063498-5050-27411-0000': Future discarded
> > > > > > > W0709 00:30:31.770354 11003 monitor.cpp:186] Failed to collect
> > > > resource
> > > > > > > usage for executor 'executor_Task_Tracker_230' of framework
> > > > > > > '201307040929-252063498-5050-27411-0000': Future discarded
> > > > > > > W0709 00:30:35.724455 10992 monitor.cpp:186] Failed to collect
> > > > resource
> > > > > > > usage for executor 'executor_Task_Tracker_224' of framework
> > > > > > > '201307040929-252063498-5050-27411-0000': Future discarded
> > > > > > > W0709 00:30:36.788751 10992 monitor.cpp:186] Failed to collect
> > > > resource
> > > > > > > usage for executor 'executor_Task_Tracker_230' of framework
> > > > > > > '201307040929-252063498-5050-27411-0000': Future discarded
> > > > > > > W0709 00:30:40.745380 10992 monitor.cpp:186] Failed to collect
> > > > resource
> > > > > > > usage for executor 'executor_Task_Tracker_224' of framework
> > > > > > > '201307040929-252063498-5050-27411-0000': Future discarded
> > > > > > > W0709 00:30:41.789358 10992 monitor.cpp:186] Failed to collect
> > > > resource
> > > > > > > usage for executor 'executor_Task_Tracker_230' of framework
> > > > > > > '201307040929-252063498-5050-27411-0000': Future discarded
> > > > > > > I0709 00:30:45.256590 11004 slave.cpp:2511] Current usage
> 58.11%.
> > > Max
> > > > > > > allowed age: 2.232469873049410days
> > > > > > >
> > > > > > >
> > > > > > > Guodong
> > > > > > >
> > > > > > >
> > > > > > > On Tue, Jul 9, 2013 at 4:55 AM, Benjamin Mahler <
> > > > > > benjamin.mahler@gmail.com
> > > > > > > > wrote:
> > > > > > >
> > > > > > >> Are these the un-edited logs? I'm expecting to see some logs
> > from
> > > > the
> > > > > > >> process_isolator or cgroups_isolator in there.
> > > > > > >>
> > > > > > >>
> > > > > > >> On Fri, Jul 5, 2013 at 2:38 PM, Brenden Matthews <
> > > > > > >> brenden.matthews@airbedandbreakfast.com> wrote:
> > > > > > >>
> > > > > > >> > Hey guys,
> > > > > > >> >
> > > > > > >> > I'm currently having a problem where tasks will get stuck in
> > the
> > > > > > staging
> > > > > > >> > state, though according to the logs they should have been
> > > > > terminated.
> > > > > > >>  They
> > > > > > >> > hang indefinitely, or until I restart the slave.  Below is a
> > > > > > screenshot
> > > > > > >> +
> > > > > > >> > logs.  Also interesting is the 'Failed to collect resource
> > usage
> > > > > ...'
> > > > > > >> > messages.
> > > > > > >> >
> > > > > > >> > [image: Inline image 2]
> > > > > > >> >
> > > > > > >> > I0705 16:19:51.551512  9706 slave.cpp:739] Got assigned task
> > > > > > >> >>
> > > ct:1373041190990:0:add_latest_reservation_survey_events_partitio
> > > > > > >> >> n for framework chronos
> > > > > > >> >> I0705 16:19:51.552150  9706 slave.cpp:837] Launching task
> > > > > > >> >>
> > > > ct:1373041190990:0:add_latest_reservation_survey_events_partition f
> > > > > > >> >> or framework chronos
> > > > > > >> >> I0705 16:19:51.553956  9706 paths.hpp:303] Created executor
> > > > > directory
> > > > > > >> >> '/tmp/mesos/slaves/201307030043-2037266954-5050-15277-1
> > > > > > >> >>
> > > > > > >> >>
> > > > > > >>
> > > > > >
> > > > >
> > > >
> > >
> >
> 517/frameworks/chronos/executors/ct:1373041190990:0:add_latest_reservation_survey_events_partition/runs/611ba128-557f-4b5e-8c
> > > > > > >> >> f2-4d1ce60d618f'
> > > > > > >> >> I0705 16:19:51.554576  9706 slave.cpp:948] Queuing task
> > > > > > >> >>
> > > > >
> 'ct:1373041190990:0:add_latest_reservation_survey_events_partition' f
> > > > > > >> >> or executor
> > > > > > >> >>
> > > ct:1373041190990:0:add_latest_reservation_survey_events_partition
> > > > > of
> > > > > > >> >> framework 'c
> > > > > > >> >> hronos
> > > > > > >> >> I0705 16:19:51.555027  9706 slave.cpp:511] Successfully
> > > attached
> > > > > file
> > > > > > >> >>
> > > > > > >>
> > > > > >
> > > > >
> > > >
> > >
> >
> '/tmp/mesos/slaves/201307030043-2037266954-5050-15277-1517/frameworks/chronos/executors/ct:1373041190990:0:add_latest_reservation_survey_events_partition/runs/611ba128-557f-4b5e-8cf2-4d1ce60d618f'
> > > > > > >> >> I0705 16:19:54.048754  9724 slave.cpp:2530] Current usage
> > > 42.18%.
> > > > > Max
> > > > > > >> >> allowed age: 22.955009563956388hrs
> > > > > > >> >> W0705 16:19:54.108963  9724 monitor.cpp:186] Failed to
> > collect
> > > > > > resource
> > > > > > >> >> usage for executor 'executor_Task_Tracker_8023' of
> framework
> > > > > > >> >> '201307030043-2037266954-5050-15277-0006': Future discarded
> > > > > > >> >> W0705 16:19:59.110787  9729 monitor.cpp:186] Failed to
> > collect
> > > > > > resource
> > > > > > >> >> usage for executor 'executor_Task_Tracker_8023' of
> framework
> > > > > > >> >> '201307030043-2037266954-5050-15277-0006': Future discarded
> > > > > > >> >> W0705 16:20:04.112406  9704 monitor.cpp:186] Failed to
> > collect
> > > > > > >> resource
> > > > > > >> >> usage for executor 'executor_Task_Tracker_8023' of
> framework
> > > > > > >> >> '201307030043-2037266954-5050-15277-0006': Future discarded
> > > > > > >> >> W0705 16:20:09.114367  9705 monitor.cpp:186] Failed to
> > collect
> > > > > > resource
> > > > > > >> >> usage for executor 'executor_Task_Tracker_8023' of
> framework
> > > > > > >> >> '201307030043-2037266954-5050-15277-0006': Future discarded
> > > > > > >> >> W0705 16:20:14.116312  9706 monitor.cpp:186] Failed to
> > collect
> > > > > > resource
> > > > > > >> >> usage for executor 'executor_Task_Tracker_8023' of
> framework
> > > > > > >> >> '201307030043-2037266954-5050-15277-0006': Future discarded
> > > > > > >> >> W0705 16:20:19.118370  9699 monitor.cpp:186] Failed to
> > collect
> > > > > > resource
> > > > > > >> >> usage for executor 'executor_Task_Tracker_8023' of
> framework
> > > > > > >> >> '201307030043-2037266954-5050-15277-0006': Future discarded
> > > > > > >> >> W0705 16:20:24.120311  9701 monitor.cpp:186] Failed to
> > collect
> > > > > > resource
> > > > > > >> >> usage for executor 'executor_Task_Tracker_8023' of
> framework
> > > > > > >> >> '201307030043-2037266954-5050-15277-0006': Future discarded
> > > > > > >> >> W0705 16:20:29.122355  9700 monitor.cpp:186] Failed to
> > collect
> > > > > > resource
> > > > > > >> >> usage for executor 'executor_Task_Tracker_8023' of
> framework
> > > > > > >> >> '201307030043-2037266954-5050-15277-0006': Future discarded
> > > > > > >> >> W0705 16:20:34.123443  9722 monitor.cpp:186] Failed to
> > collect
> > > > > > resource
> > > > > > >> >> usage for executor 'executor_Task_Tracker_8023' of
> framework
> > > > > > >> >> '201307030043-2037266954-5050-15277-0006': Future discarded
> > > > > > >> >> W0705 16:20:39.125660  9718 monitor.cpp:186] Failed to
> > collect
> > > > > > resource
> > > > > > >> >> usage for executor 'executor_Task_Tracker_8023' of
> framework
> > > > > > >> >> '201307030043-2037266954-5050-15277-0006': Future discarded
> > > > > > >> >> W0705 16:20:44.127464  9724 monitor.cpp:186] Failed to
> > collect
> > > > > > resource
> > > > > > >> >> usage for executor 'executor_Task_Tracker_8023' of
> framework
> > > > > > >> >> '201307030043-2037266954-5050-15277-0006': Future discarded
> > > > > > >> >> W0705 16:20:49.129385  9725 monitor.cpp:186] Failed to
> > collect
> > > > > > resource
> > > > > > >> >> usage for executor 'executor_Task_Tracker_8023' of
> framework
> > > > > > >> >> '201307030043-2037266954-5050-15277-0006': Future discarded
> > > > > > >> >> I0705 16:20:51.555174  9703 slave.cpp:2482] Terminating
> > > executor
> > > > > > >> >>
> > > ct:1373041190990:0:add_latest_reservation_survey_events_partition
> > > > > of
> > > > > > >> >> framework chronos because it did not register within 1mins
> > > > > > >> >> I0705 16:20:54.050434  9717 slave.cpp:2530] Current usage
> > > 42.18%.
> > > > > Max
> > > > > > >> >> allowed age: 22.955009342481944hrs
> > > > > > >> >> W0705 16:20:54.130730  9699 monitor.cpp:186] Failed to
> > collect
> > > > > > resource
> > > > > > >> >> usage for executor 'executor_Task_Tracker_8023' of
> framework
> > > > > > >> >> '201307030043-2037266954-5050-15277-0006': Future discarded
> > > > > > >> >> W0705 16:20:59.132472  9702 monitor.cpp:186] Failed to
> > collect
> > > > > > resource
> > > > > > >> >> usage for executor 'executor_Task_Tracker_8023' of
> framework
> > > > > > >> >> '201307030043-2037266954-5050-15277-0006': Future discarded
> > > > > > >> >> W0705 16:21:04.134557  9713 monitor.cpp:186] Failed to
> > collect
> > > > > > resource
> > > > > > >> >> usage for executor 'executor_Task_Tracker_8023' of
> framework
> > > > > > >> >> '201307030043-2037266954-5050-15277-0006': Future discarded
> > > > > > >> >> W0705 16:21:09.135619  9701 monitor.cpp:186] Failed to
> > collect
> > > > > > resource
> > > > > > >> >> usage for executor 'executor_Task_Tracker_8023' of
> framework
> > > > > > >> >> '201307030043-2037266954-5050-15277-0006': Future discarded
> > > > > > >> >
> > > > > > >> >
> > > > > > >> >
> > > > > > >>
> > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>

Re: Tasks stuck in 'STAGING'

Posted by Vinod Kone <vi...@gmail.com>.
When the executor is unhealthy, the slave sends a kill command to the
executor. Only when the executor terminates does the slave transition all
its tasks to LOST and inform the master, and hence the scheduler. From your
observation, it sounds like the kill signal was either being trapped by the
executor or was not being properly sent to it. Is the
task/executor Task_Tracker_242 still running on the slave? You can check
this with 'ps'. What do the executor logs say? Did it receive any signal?
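
A minimal sketch (not part of Mesos) of one way to list the executors that the slave tried to terminate for not registering, so each one can then be checked with 'ps'. It scans slave log text for the "Terminating executor ... because it did not register" message seen in the logs in this thread. The function name `unregistered_executors` and the regular expression are assumptions inferred from those excerpts, not taken from the Mesos source.

```python
import re

# Pattern inferred from the slave log excerpts in this thread (an assumption,
# not taken from the Mesos source).
TIMEOUT_RE = re.compile(
    r"Terminating executor (?P<executor>\S+) of framework (?P<framework>\S+) "
    r"because it did not register within"
)


def unregistered_executors(log_text):
    """Return (executor, framework) pairs the slave terminated for not registering."""
    # Drop leading mail quote markers ("> > ...") from each line.
    lines = (re.sub(r"^(?:>\s?)+", "", line) for line in log_text.splitlines())
    # Mail clients wrap long log lines, so collapse all whitespace before matching.
    flat = " ".join(" ".join(lines).split())
    return [(m.group("executor"), m.group("framework"))
            for m in TIMEOUT_RE.finditer(flat)]


sample = """
> > > I0709 00:47:11.292485 10991 slave.cpp:2463] Terminating executor
> > > executor_Task_Tracker_242 of framework
> > > 201307040929-252063498-5050-27411-0000 because it did not register
> within
> > > 1mins
"""

for executor, framework in unregistered_executors(sample):
    print(executor, framework)
```

Any executor reported this way that still shows up in 'ps' on the slave, and has no later terminal status update in the log, is a candidate for the stuck-in-STAGING behaviour described here.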


On Tue, Jul 9, 2013 at 10:37 PM, 王国栋 <wa...@gmail.com> wrote:

> There is no log about killing Task_Tracker_242. But I can see in the
> master/slave web UI that Task_Tracker_242 is in STAGING, and it is stuck.
> The Hadoop framework considers Task_Tracker_242 launched, though it is not
> running.
>
> I think if the slave deems Task_Tracker_242 unhealthy, it should report to
> the master that this task is LOST. But I am not sure why it cannot report
> this. I am trying to go through the code.
>
> Guodong
>
>
> On Wed, Jul 10, 2013 at 2:51 AM, Vinod Kone <vi...@gmail.com> wrote:
>
> > Hey Guodong,
> >
> > So, looks like Task_Tracker_242 did not register with the slave within 1
> > minute and the slave decided to kill it because it was deemed unhealthy. At
> > this point the executor should've received a kill signal from the slave. Do
> > you see anything of that sort in the slave or executor logs?
> >
> >
> > On Mon, Jul 8, 2013 at 11:30 PM, 王国栋 <wa...@gmail.com> wrote:
> >
> > > Hi vinod.
> > >
> > > I am using the code from trunk. I think the latest commit is from July
> > > 1st. I will grep the master logs and send them in another mail.
> > >
> > > The task "Task_Tracker_242" is stuck in STAGING. I think
> > > "Task_Tracker_224" and "Task_Tracker_230" exited successfully. But it
> > > is strange that there are a lot of "Failed to collect resource
> > > usage ..." warnings.
> > >
> > > I0709 00:46:11.288698 11002 slave.cpp:739] Got assigned task
> > > Task_Tracker_242 for framework 201307040929-252063498-5050-27411-0000
> > > I0709 00:46:11.289136 11002 slave.cpp:837] Launching task
> > Task_Tracker_242
> > > for framework 201307040929-252063498-5050-27411-0000
> > > I0709 00:46:11.291296 11002 paths.hpp:303] Created executor directory
> > >
> > >
> >
> '/data/mesos-slave-work-dir//slaves/201307041648-252063498-5050-8038-5/frameworks/201307040929-252063498-5050-27411-0000/executors/exe
> > > cutor_Task_Tracker_242/runs/5c47ad99-1c78-43c8-9f27-9509f1d39c3d'
> > > I0709 00:46:11.291647 11002 slave.cpp:948] Queuing task
> > 'Task_Tracker_242'
> > > for executor executor_Task_Tracker_242 of framework
> > > '201307040929-252063498-5050-27411-0000
> > > I0709 00:46:11.292162 11002 slave.cpp:511] Successfully attached file
> > >
> > >
> >
> '/data/mesos-slave-work-dir//slaves/201307041648-252063498-5050-8038-5/frameworks/201307040929-252063498-5050-27411-0000/executors/exe
> > > cutor_Task_Tracker_242/runs/5c47ad99-1c78-43c8-9f27-9509f1d39c3d'
> > > W0709 00:46:12.197242 10992 monitor.cpp:186] Failed to collect resource
> > > usage for executor 'executor_Task_Tracker_230' of framework
> > > '201307040929-252063498-5050-27411-0000': Future discarded
> > > W0709 00:46:16.100548 10994 monitor.cpp:186] Failed to collect resource
> > > usage for executor 'executor_Task_Tracker_224' of framework
> > > '201307040929-252063498-5050-27411-0000': Future discarded
> > > W0709 00:46:17.197463 11001 monitor.cpp:186] Failed to collect resource
> > > usage for executor 'executor_Task_Tracker_230' of framework
> > > '201307040929-252063498-5050-27411-0000': Future discarded
> > > W0709 00:46:21.101570 11002 monitor.cpp:186] Failed to collect resource
> > > usage for executor 'executor_Task_Tracker_224' of framework
> > > '201307040929-252063498-5050-27411-0000': Future discarded
> > > W0709 00:46:22.198303 11005 monitor.cpp:186] Failed to collect resource
> > > usage for executor 'executor_Task_Tracker_230' of framework
> > > '201307040929-252063498-5050-27411-0000': Future discarded
> > > W0709 00:46:26.102522 11002 monitor.cpp:186] Failed to collect resource
> > > usage for executor 'executor_Task_Tracker_224' of framework
> > > '201307040929-252063498-5050-27411-0000': Future discarded
> > > W0709 00:46:27.199403 10998 monitor.cpp:186] Failed to collect resource
> > > usage for executor 'executor_Task_Tracker_230' of framework
> > > '201307040929-252063498-5050-27411-0000': Future discarded
> > > W0709 00:46:31.103610 10998 monitor.cpp:186] Failed to collect resource
> > > usage for executor 'executor_Task_Tracker_224' of framework
> > > '201307040929-252063498-5050-27411-0000': Future discarded
> > > W0709 00:46:32.200248 11001 monitor.cpp:186] Failed to collect resource
> > > usage for executor 'executor_Task_Tracker_230' of framework
> > > '201307040929-252063498-5050-27411-0000': Future discarded
> > > W0709 00:46:36.104547 11004 monitor.cpp:186] Failed to collect resource
> > > usage for executor 'executor_Task_Tracker_224' of framework
> > > '201307040929-252063498-5050-27411-0000': Future discarded
> > > W0709 00:46:37.201236 10991 monitor.cpp:186] Failed to collect resource
> > > usage for executor 'executor_Task_Tracker_230' of framework
> > > '201307040929-252063498-5050-27411-0000': Future discarded
> > > W0709 00:46:41.105523 10997 monitor.cpp:186] Failed to collect resource
> > > usage for executor 'executor_Task_Tracker_224' of framework
> > > '201307040929-252063498-5050-27411-0000': Future discarded
> > > W0709 00:46:42.202250 10991 monitor.cpp:186] Failed to collect resource
> > > usage for executor 'executor_Task_Tracker_230' of framework
> > > '201307040929-252063498-5050-27411-0000': Future discarded
> > > I0709 00:46:45.283098 11002 slave.cpp:2511] Current usage 57.43%. Max
> > > allowed age: 2.279812884766227days
> > > W0709 00:46:46.106760 10994 monitor.cpp:186] Failed to collect resource
> > > usage for executor 'executor_Task_Tracker_224' of framework
> > > '201307040929-252063498-5050-27411-0000': Future discarded
> > > W0709 00:46:47.203474 10993 monitor.cpp:186] Failed to collect resource
> > > usage for executor 'executor_Task_Tracker_230' of framework
> > > '201307040929-252063498-5050-27411-0000': Future discarded
> > > W0709 00:46:51.107544 11006 monitor.cpp:186] Failed to collect resource
> > > usage for executor 'executor_Task_Tracker_224' of framework
> > > '201307040929-252063498-5050-27411-0000': Future discarded
> > > W0709 00:46:52.204280 10997 monitor.cpp:186] Failed to collect resource
> > > usage for executor 'executor_Task_Tracker_230' of framework
> > > '201307040929-252063498-5050-27411-0000': Future discarded
> > > W0709 00:46:56.108530 10995 monitor.cpp:186] Failed to collect resource
> > > usage for executor 'executor_Task_Tracker_224' of framework
> > > '201307040929-252063498-5050-27411-0000': Future discarded
> > > W0709 00:46:57.205417 10997 monitor.cpp:186] Failed to collect resource
> > > usage for executor 'executor_Task_Tracker_230' of framework
> > > '201307040929-252063498-5050-27411-0000': Future discarded
> > > W0709 00:47:01.109284 10997 monitor.cpp:186] Failed to collect resource
> > > usage for executor 'executor_Task_Tracker_224' of framework
> > > '201307040929-252063498-5050-27411-0000': Future discarded
> > > W0709 00:47:02.206368 11002 monitor.cpp:186] Failed to collect resource
> > > usage for executor 'executor_Task_Tracker_230' of framework
> > > '201307040929-252063498-5050-27411-0000': Future discarded
> > > I0709 00:47:05.288517 11002 slave.cpp:2463] Terminating executor
> > > executor_Task_Tracker_238 of framework
> > > > 201307040929-252063498-5050-27411-0000 because it did not register
> > > > within 1mins
> > > W0709 00:47:06.110532 11005 monitor.cpp:186] Failed to collect resource
> > > usage for executor 'executor_Task_Tracker_224' of framework
> > > '201307040929-252063498-5050-27411-0000': Future discarded
> > > W0709 00:47:07.207320 10997 monitor.cpp:186] Failed to collect resource
> > > usage for executor 'executor_Task_Tracker_230' of framework
> > > '201307040929-252063498-5050-27411-0000': Future discarded
> > > W0709 00:47:11.111778 10996 monitor.cpp:186] Failed to collect resource
> > > usage for executor 'executor_Task_Tracker_224' of framework
> > > '201307040929-252063498-5050-27411-0000': Future discarded
> > > I0709 00:47:11.292485 10991 slave.cpp:2463] Terminating executor
> > > executor_Task_Tracker_242 of framework
> > > > 201307040929-252063498-5050-27411-0000 because it did not register
> > > > within 1mins
> > >
> > >
> > > Guodong
> > >
> > >
> > > > On Tue, Jul 9, 2013 at 2:21 PM, Vinod Kone <vi...@gmail.com> wrote:
> > >
> > > > > Hey Guodong, which of these tasks is stuck in STAGING? The
> > > > > corresponding master logs would also be helpful here. Also, which
> > > > > version of Mesos are you running?
> > > >
> > > >
> > > > On Mon, Jul 8, 2013 at 11:02 PM, 王国栋 <wa...@gmail.com> wrote:
> > > >
> > > > > > Interestingly, the slave log also contains these lines:
> > > > >
> > > > > I0709 00:33:43.833853 11002 slave.cpp:996] Asked to kill task
> > > > > Task_Tracker_224 of framework
> 201307040929-252063498-5050-27411-0000
> > > > > I0709 00:33:43.835552 11006 slave.cpp:996] Asked to kill task
> > > > > Task_Tracker_230 of framework
> 201307040929-252063498-5050-27411-0000
> > > > > I0709 00:33:43.972771 10994 slave.cpp:1692] Handling status update
> > > > > TASK_FINISHED (UUID: 372081cc-edf2-4183-a461-9345ab6d279c) for task
> > > > > Task_Tracker_230 of framework
> 201307040929-252063498-5050-27411-0000
> > > > > from executor(1)@10.47.6.21:27786
> > > > > I0709 00:33:43.973132 10994 status_update_manager.cpp:290] Received
> > > > status
> > > > > update TASK_FINISHED (UUID: 372081cc-edf2-4183-a461-9345ab6d279c)
> for
> > > > task
> > > > > Task_Tracker_230 of framework 201307040929-252063498-5
> > > > > 050-27411-0000 with checkpoint=false
> > > > > I0709 00:33:43.973192 10994 status_update_manager.cpp:336]
> Forwarding
> > > > > status update TASK_FINISHED (UUID:
> > > 372081cc-edf2-4183-a461-9345ab6d279c)
> > > > > for task Task_Tracker_230 of framework 201307040929-252063498
> > > > > -5050-27411-0000 to master@10.47.6.15:5050
> > > > > I0709 00:33:43.973846 11005 slave.cpp:1809] Sending acknowledgement
> > for
> > > > > status update TASK_FINISHED (UUID:
> > > 372081cc-edf2-4183-a461-9345ab6d279c)
> > > > > for task Task_Tracker_230 of framework 201307040929-2520634
> > > > > 98-5050-27411-0000 to executor(1)@10.47.6.21:27786
> > > > > I0709 00:33:43.974591 11000 status_update_manager.cpp:360] Received
> > > > status
> > > > > update acknowledgement 372081cc-edf2-4183-a461-9345ab6d279c for
> task
> > > > > Task_Tracker_230 of framework 201307040929-252063498-5050-27
> > > > > 411-0000
> > > > > I0709 00:33:43.974652 11000 status_update_manager.cpp:481] Cleaning
> > up
> > > > > status update stream for task Task_Tracker_230 of framework
> > > > > 201307040929-252063498-5050-27411-0000
> > > > > I0709 00:33:44.090603 11003 slave.cpp:1692] Handling status update
> > > > > TASK_FINISHED (UUID: 61d5775a-2375-412a-a5a4-80ab55163d88) for task
> > > > > Task_Tracker_224 of framework
> 201307040929-252063498-5050-27411-0000
> > > > > from executor(1)@10.47.6.21:2310
> > > > > I0709 00:33:44.090860 11003 status_update_manager.cpp:290] Received
> > > > status
> > > > > update TASK_FINISHED (UUID: 61d5775a-2375-412a-a5a4-80ab55163d88)
> for
> > > > task
> > > > > Task_Tracker_224 of framework 201307040929-252063498-5
> > > > > 050-27411-0000 with checkpoint=false
> > > > > I0709 00:33:44.090973 11003 status_update_manager.cpp:336]
> Forwarding
> > > > > status update TASK_FINISHED (UUID:
> > > 61d5775a-2375-412a-a5a4-80ab55163d88)
> > > > > for task Task_Tracker_224 of framework 201307040929-252063498
> > > > > -5050-27411-0000 to master@10.47.6.15:5050
> > > > > I0709 00:33:44.091279 11003 slave.cpp:1809] Sending acknowledgement
> > for
> > > > > status update TASK_FINISHED (UUID:
> > > 61d5775a-2375-412a-a5a4-80ab55163d88)
> > > > > for task Task_Tracker_224 of framework 201307040929-2520634
> > > > > 98-5050-27411-0000 to executor(1)@10.47.6.21:2310
> > > > > I0709 00:33:44.093286 11003 status_update_manager.cpp:360] Received
> > > > status
> > > > > update acknowledgement 61d5775a-2375-412a-a5a4-80ab55163d88 for
> task
> > > > > Task_Tracker_224 of framework 201307040929-252063498-5050-27
> > > > > 411-0000
> > > > > I0709 00:33:44.093359 11003 status_update_manager.cpp:481] Cleaning
> > up
> > > > > status update stream for task Task_Tracker_224 of framework
> > > > > 201307040929-252063498-5050-27411-0000
> > > > > I0709 00:33:45.259831 10997 slave.cpp:2511] Current usage 57.44%.
> Max
> > > > > allowed age: 2.279168852469954days
> > > > > W0709 00:33:45.949470 10996 monitor.cpp:186] Failed to collect
> > resource
> > > > > usage for executor 'executor_Task_Tracker_224' of framework
> > > > > '201307040929-252063498-5050-27411-0000': Future discarded
> > > > > W0709 00:33:47.063181 11005 monitor.cpp:186] Failed to collect
> > resource
> > > > > usage for executor 'executor_Task_Tracker_230' of framework
> > > > > '201307040929-252063498-5050-27411-0000': Future discarded
> > > > > W0709 00:33:50.950412 11000 monitor.cpp:186] Failed to collect
> > resource
> > > > > usage for executor 'executor_Task_Tracker_224' of framework
> > > > > '201307040929-252063498-5050-27411-0000': Future discarded
> > > > > W0709 00:33:52.063576 10993 monitor.cpp:186] Failed to collect
> > resource
> > > > > usage for executor 'executor_Task_Tracker_230' of framework
> > > > > '201307040929-252063498-5050-27411-0000': Future discarded
> > > > > W0709 00:33:55.951427 11003 monitor.cpp:186] Failed to collect
> > resource
> > > > > usage for executor 'executor_Task_Tracker_224' of framework
> > > > > '201307040929-252063498-5050-27411-0000': Future discarded
> > > > > W0709 00:33:57.064575 10998 monitor.cpp:186] Failed to collect
> > resource
> > > > > usage for executor 'executor_Task_Tracker_230' of framework
> > > > > '201307040929-252063498-5050-27411-0000': Future discarded
> > > > > W0709 00:34:00.952390 11003 monitor.cpp:186] Failed to collect
> > resource
> > > > > usage for executor 'executor_Task_Tracker_224' of framework
> > > > > '201307040929-252063498-5050-27411-0000': Future discarded
> > > > > W0709 00:34:02.065218 10998 monitor.cpp:186] Failed to collect
> > resource
> > > > > usage for executor 'executor_Task_Tracker_230' of framework
> > > > > '201307040929-252063498-5050-27411-0000': Future discarded
> > > > > W0709 00:34:05.953456 10995 monitor.cpp:186] Failed to collect
> > resource
> > > > > usage for executor 'executor_Task_Tracker_224' of framework
> > > > > '201307040929-252063498-5050-27411-0000': Future discarded
> > > > > W0709 00:34:07.066515 10995 monitor.cpp:186] Failed to collect
> > resource
> > > > > usage for executor 'executor_Task_Tracker_230' of framework
> > > > > '201307040929-252063498-5050-27411-0000': Future discarded
> > > > > W0709 00:34:10.954479 10998 monitor.cpp:186] Failed to collect
> > resource
> > > > > usage for executor 'executor_Task_Tracker_224' of framework
> > > > > '201307040929-252063498-5050-27411-0000': Future discarded
> > > > > W0709 00:34:12.067471 11005 monitor.cpp:186] Failed to collect
> > resource
> > > > > usage for executor 'executor_Task_Tracker_230' of framework
> > > > > '201307040929-252063498-5050-27411-0000': Future discarded
> > > > > W0709 00:34:15.955461 10996 monitor.cpp:186] Failed to collect
> > resource
> > > > > usage for executor 'executor_Task_Tracker_224' of framework
> > > > > '201307040929-252063498-5050-27411-0000': Future discarded
> > > > > W0709 00:34:17.068209 10996 monitor.cpp:186] Failed to collect
> > resource
> > > > > usage for executor 'executor_Task_Tracker_230' of framework
> > > > > '201307040929-252063498-5050-27411-0000': Future discarded
> > > > >
> > > > >
> > > > >
> > > > > Guodong
> > > > >
> > > > >
> > > > > On Tue, Jul 9, 2013 at 1:59 PM, 王国栋 <wa...@gmail.com> wrote:
> > > > >
> > > > > > > Hi Ben,
> > > > > > >
> > > > > > > I ran into the same issue here.
> > > > > > >
> > > > > > > This also happens in our Hadoop framework. The slave log is shown
> > > > > > > below. I think the workload on the node was very high at that time.
> > > > > >
> > > > > > I0708 23:36:44.253880 11005 slave.cpp:739] Got assigned task
> > > > > > Task_Tracker_224 for framework
> > 201307040929-252063498-5050-27411-0000
> > > > > > > I0708 23:36:44.255221 10999 gc.cpp:84] Unscheduling
> > > > > > > '/data/mesos-slave-work-dir//slaves/201307041648-252063498-5050-8038-5/frameworks/201307040929-252063498-5050-27411-0000'
> > > > > > > for removal
> > > > > > I0708 23:36:44.256206 11001 slave.cpp:837] Launching task
> > > > > Task_Tracker_224
> > > > > > for framework 201307040929-252063498-5050-27411-0000
> > > > > > > I0708 23:36:44.258117 11001 paths.hpp:303] Created executor directory
> > > > > > > '/data/mesos-slave-work-dir//slaves/201307041648-252063498-5050-8038-5/frameworks/201307040929-252063498-5050-27411-0000/executors/executor_Task_Tracker_224/runs/953d3565-424c-4ab3-9926-a3fa71042bf1'
> > > > > > > I0708 23:36:44.258467 10991 process_isolator.cpp:99] Launching
> > > > > > > executor_Task_Tracker_224 (cd hadoop && ./bin/mesos-executor) in
> > > > > > > /data/mesos-slave-work-dir//slaves/201307041648-252063498-5050-8038-5/frameworks/201307040929-252063498-5050-27411-0000/executors/executor_Task_Tracker_224/runs/953d3565-424c-4ab3-9926-a3fa71042bf1
> > > > > > > with resources cpus=1; mem=1280' for framework
> > > > > > > 201307040929-252063498-5050-27411-0000
> > > > > > I0708 23:36:44.258496 11001 slave.cpp:948] Queuing task
> > > > > 'Task_Tracker_224'
> > > > > > for executor executor_Task_Tracker_224 of framework
> > > > > > '201307040929-252063498-5050-27411-0000
> > > > > > I0708 23:36:44.261446 10991 process_isolator.cpp:161] Forked
> > executor
> > > > at
> > > > > > 2220
> > > > > > I0708 23:36:44.261787 10996 slave.cpp:511] Successfully attached
> > file
> > > > > >
> > > > >
> > > >
> > >
> >
> '/data/mesos-slave-work-dir//slaves/201307041648-252063498-5050-8038-5/frameworks/201307040929-252063498-5050-27411-0000/executors/exe
> > > > > > cutor_Task_Tracker_224/runs/953d3565-424c-4ab3-9926-a3fa71042bf1'
> > > > > > I0708 23:36:44.580497 10996 slave.cpp:2511] Current usage 57.21%.
> > Max
> > > > > > allowed age: 2.295155852123924days
> > > > > > I0708 23:36:44.750393 11002 slave.cpp:1395] Got registration for
> > > > executor
> > > > > > 'executor_Task_Tracker_224' of framework
> > > > > > 201307040929-252063498-5050-27411-0000
> > > > > > I0708 23:36:44.751095 11002 slave.cpp:1510] Flushing queued task
> > > > > > Task_Tracker_224 for executor 'executor_Task_Tracker_224' of
> > > framework
> > > > > > 201307040929-252063498-5050-27411-0000
> > > > > > I0708 23:36:46.144317 11006 slave.cpp:1692] Handling status
> update
> > > > > > TASK_RUNNING (UUID: 364ee347-f6a2-4c7b-8702-460aa0ece579) for
> task
> > > > > > Task_Tracker_224 of framework
> > 201307040929-252063498-5050-27411-0000
> > > f
> > > > > > rom executor(1)@10.47.6.21:2310
> > > > > > I0708 23:36:46.144745 11006 status_update_manager.cpp:290]
> Received
> > > > > status
> > > > > > update TASK_RUNNING (UUID: 364ee347-f6a2-4c7b-8702-460aa0ece579)
> > for
> > > > task
> > > > > > Task_Tracker_224 of framework 201307040929-252063498-50
> > > > > > 50-27411-0000 with checkpoint=false
> > > > > > I0708 23:36:46.144821 11006 status_update_manager.cpp:450]
> Creating
> > > > > > StatusUpdate stream for task Task_Tracker_224 of framework
> > > > > > 201307040929-252063498-5050-27411-0000
> > > > > > I0708 23:36:46.145076 11006 status_update_manager.cpp:336]
> > Forwarding
> > > > > > status update TASK_RUNNING (UUID:
> > > 364ee347-f6a2-4c7b-8702-460aa0ece579)
> > > > > for
> > > > > > task Task_Tracker_224 of framework 201307040929-252063498-
> > > > > > 5050-27411-0000 to master@10.47.6.15:5050
> > > > > > I0708 23:36:46.145882 10997 slave.cpp:1809] Sending
> acknowledgement
> > > for
> > > > > > status update TASK_RUNNING (UUID:
> > > 364ee347-f6a2-4c7b-8702-460aa0ece579)
> > > > > for
> > > > > > task Task_Tracker_224 of framework 201307040929-25206349
> > > > > > 8-5050-27411-0000 to executor(1)@10.47.6.21:2310
> > > > > > I0708 23:36:46.146870 10993 status_update_manager.cpp:360]
> Received
> > > > > status
> > > > > > update acknowledgement 364ee347-f6a2-4c7b-8702-460aa0ece579 for
> > task
> > > > > > Task_Tracker_224 of framework 201307040929-252063498-5050-27
> > > > > > 411-0000
> > > > > > I0708 23:36:50.258347 11005 slave.cpp:739] Got assigned task
> > > > > > Task_Tracker_230 for framework
> > 201307040929-252063498-5050-27411-0000
> > > > > > I0708 23:36:50.259472 11005 slave.cpp:837] Launching task
> > > > > Task_Tracker_230
> > > > > > for framework 201307040929-252063498-5050-27411-0000
> > > > > > I0708 23:36:50.261641 11005 paths.hpp:303] Created executor
> > directory
> > > > > >
> > > > >
> > > >
> > >
> >
> '/data/mesos-slave-work-dir//slaves/201307041648-252063498-5050-8038-5/frameworks/201307040929-252063498-5050-27411-0000/executors/exe
> > > > > > cutor_Task_Tracker_230/runs/1dc87acc-d090-469f-ba30-0477139ee7fd'
> > > > > > I0708 23:36:50.262265 11005 slave.cpp:948] Queuing task
> > > > > 'Task_Tracker_230'
> > > > > > for executor executor_Task_Tracker_230 of framework
> > > > > > '201307040929-252063498-5050-27411-0000
> > > > > > I0708 23:36:50.262557 11005 process_isolator.cpp:99] Launching
> > > > > > executor_Task_Tracker_230 (cd hadoop && ./bin/mesos-executor) in
> > > > > >
> > > > >
> > > >
> > >
> >
> /data/mesos-slave-work-dir//slaves/201307041648-252063498-5050-8038-5/framew
> > > > > >
> > > > >
> > > >
> > >
> >
> orks/201307040929-252063498-5050-27411-0000/executors/executor_Task_Tracker_230/runs/1dc87acc-d090-469f-ba30-0477139ee7fd
> > > > > > with resources cpus=1; mem=1280' for framework
> > > > > > 201307040929-252063498-5050-27411-0
> > > > > > 000
> > > > > > I0708 23:36:50.265396 10999 slave.cpp:511] Successfully attached
> > file
> > > > > >
> > > > >
> > > >
> > >
> >
> '/data/mesos-slave-work-dir//slaves/201307041648-252063498-5050-8038-5/frameworks/201307040929-252063498-5050-27411-0000/executors/exe
> > > > > > cutor_Task_Tracker_230/runs/1dc87acc-d090-469f-ba30-0477139ee7fd'
> > > > > > I0708 23:36:50.265419 11005 process_isolator.cpp:161] Forked
> > executor
> > > > at
> > > > > > 2851
> > > > > > I0708 23:36:50.835607 10995 slave.cpp:1395] Got registration for
> > > > executor
> > > > > > 'executor_Task_Tracker_230' of framework
> > > > > > 201307040929-252063498-5050-27411-0000
> > > > > > I0708 23:36:50.836174 10995 slave.cpp:1510] Flushing queued task
> > > > > > Task_Tracker_230 for executor 'executor_Task_Tracker_230' of
> > > framework
> > > > > > 201307040929-252063498-5050-27411-0000
> > > > > > I0708 23:36:54.617856 10994 slave.cpp:1692] Handling status
> update
> > > > > > TASK_RUNNING (UUID: 7753252d-c90b-4b0d-adca-7c97f38f692e) for
> task
> > > > > > Task_Tracker_230 of framework
> > 201307040929-252063498-5050-27411-0000
> > > f
> > > > > > rom executor(1)@10.47.6.21:27786
> > > > > > I0708 23:36:54.618275 10994 status_update_manager.cpp:290]
> Received
> > > > > status
> > > > > > update TASK_RUNNING (UUID: 7753252d-c90b-4b0d-adca-7c97f38f692e)
> > for
> > > > task
> > > > > > Task_Tracker_230 of framework 201307040929-252063498-50
> > > > > > 50-27411-0000 with checkpoint=false
> > > > > > I0708 23:36:54.618326 10994 status_update_manager.cpp:450]
> Creating
> > > > > > StatusUpdate stream for task Task_Tracker_230 of framework
> > > > > > 201307040929-252063498-5050-27411-0000
> > > > > > I0708 23:36:54.618443 10994 status_update_manager.cpp:336]
> > Forwarding
> > > > > > status update TASK_RUNNING (UUID:
> > > 7753252d-c90b-4b0d-adca-7c97f38f692e)
> > > > > for
> > > > > > task Task_Tracker_230 of framework 201307040929-252063498-
> > > > > > 5050-27411-0000 to master@10.47.6.15:5050
> > > > > > I0708 23:36:54.619137 10994 slave.cpp:1809] Sending
> acknowledgement
> > > for
> > > > > > status update TASK_RUNNING (UUID:
> > > 7753252d-c90b-4b0d-adca-7c97f38f692e)
> > > > > for
> > > > > > task Task_Tracker_230 of framework 201307040929-25206349
> > > > > > 8-5050-27411-0000 to executor(1)@10.47.6.21:27786
> > > > > > I0708 23:36:54.637682 10994 status_update_manager.cpp:360]
> Received
> > > > > status
> > > > > > update acknowledgement 7753252d-c90b-4b0d-adca-7c97f38f692e for
> > task
> > > > > > Task_Tracker_230 of framework 201307040929-252063498-5050-27
> > > > > > 411-0000
> > > > > > I0708 23:37:44.583014 11002 slave.cpp:2511] Current usage 57.23%.
> > Max
> > > > > > allowed age: 2.293704423241597days
> > > > > > I0708 23:38:44.585233 11003 slave.cpp:2511] Current usage 57.23%.
> > Max
> > > > > > allowed age: 2.293703916528542days
> > > > > > I0708 23:39:44.599442 11006 slave.cpp:2511] Current usage 57.23%.
> > Max
> > > > > > allowed age: 2.293639867998055days
> > > > > > I0708 23:40:44.603996 10997 slave.cpp:2511] Current usage 57.24%.
> > Max
> > > > > > allowed age: 2.292921551567535days
> > > > > > I0708 23:41:44.608608 11006 slave.cpp:2511] Current usage 57.26%.
> > Max
> > > > > > allowed age: 2.291521098018820days
> > > > > > I0708 23:42:44.609956 10992 slave.cpp:2511] Current usage 57.23%.
> > Max
> > > > > > allowed age: 2.293668041244063days
> > > > > > I0708 23:43:44.682621 11000 slave.cpp:2511] Current usage 57.24%.
> > Max
> > > > > > allowed age: 2.292935638190544days
> > > > > > I0708 23:44:44.684306 10993 slave.cpp:2511] Current usage 57.24%.
> > Max
> > > > > > allowed age: 2.292916079066516days
> > > > > > I0708 23:45:44.686172 11001 slave.cpp:2511] Current usage 57.26%.
> > Max
> > > > > > allowed age: 2.291485324076945days
> > > > > > I0708 23:46:44.699095 10995 slave.cpp:2511] Current usage 57.23%.
> > Max
> > > > > > allowed age: 2.293641894850289days
> > > > > > I0708 23:47:44.721156 10998 slave.cpp:2511] Current usage 57.23%.
> > Max
> > > > > > allowed age: 2.293629429709074days
> > > > > > I0708 23:48:44.779767 10992 slave.cpp:2511] Current usage 57.24%.
> > Max
> > > > > > allowed age: 2.293525350847025days
> > > > > > I0708 23:49:44.812389 11004 slave.cpp:2511] Current usage 57.24%.
> > Max
> > > > > > allowed age: 2.292909289111539days
> > > > > > I0708 23:50:44.814146 10999 slave.cpp:2511] Current usage 57.27%.
> > Max
> > > > > > allowed age: 2.291438098419977days
> > > > > > I0708 23:51:44.814877 11005 slave.cpp:2511] Current usage 57.23%.
> > Max
> > > > > > allowed age: 2.293635104895313days
> > > > > > I0708 23:52:44.818620 10998 slave.cpp:2511] Current usage 57.24%.
> > Max
> > > > > > allowed age: 2.292983775931019days
> > > > > > I0708 23:53:44.829911 10997 slave.cpp:2511] Current usage 57.33%.
> > Max
> > > > > > allowed age: 2.286910009194236days
> > > > > > I0708 23:54:44.831307 10999 slave.cpp:2511] Current usage 57.33%.
> > Max
> > > > > > allowed age: 2.286909502481169days
> > > > > > I0708 23:55:44.902858 10994 slave.cpp:2511] Current usage 57.37%.
> > Max
> > > > > > allowed age: 2.284414244700093days
> > > > > > I0708 23:56:44.905398 11002 slave.cpp:2511] Current usage 57.42%.
> > Max
> > > > > > allowed age: 2.280636901540567days
> > > > > > I0708 23:57:44.933673 10991 slave.cpp:2511] Current usage 57.44%.
> > Max
> > > > > > allowed age: 2.279481899796968days
> > > > > > I0708 23:58:44.934840 11004 slave.cpp:2511] Current usage 57.48%.
> > Max
> > > > > > allowed age: 2.276566475548496days
> > > > > > I0708 23:59:44.936063 11001 slave.cpp:2511] Current usage 57.49%.
> > Max
> > > > > > allowed age: 2.275690368671817days
> > > > > > I0709 00:00:44.937433 11004 slave.cpp:2511] Current usage 57.50%.
> > Max
> > > > > > allowed age: 2.275057180034989days
> > > > > > I0709 00:01:44.938940 11001 slave.cpp:2511] Current usage 57.51%.
> > Max
> > > > > > allowed age: 2.273999467198449days
> > > > > > I0709 00:02:44.955103 10996 slave.cpp:2511] Current usage 57.52%.
> > Max
> > > > > > allowed age: 2.273472384275891days
> > > > > > I0709 00:03:44.956354 10993 slave.cpp:2511] Current usage 57.39%.
> > Max
> > > > > > allowed age: 2.282894612240220days
> > > > > > I0709 00:04:44.957926 10997 slave.cpp:2511] Current usage 57.40%.
> > Max
> > > > > > allowed age: 2.281966516603831days
> > > > > > I0709 00:05:44.969205 10996 slave.cpp:2511] Current usage 57.40%.
> > Max
> > > > > > allowed age: 2.281962260214144days
> > > > > > I0709 00:06:44.969987 10992 slave.cpp:2511] Current usage 57.40%.
> > Max
> > > > > > allowed age: 2.281791801941551days
> > > > > > I0709 00:07:44.977504 11004 slave.cpp:2511] Current usage 57.40%.
> > Max
> > > > > > allowed age: 2.281715288269849days
> > > > > > I0709 00:08:44.982868 10998 slave.cpp:2511] Current usage 57.40%.
> > Max
> > > > > > allowed age: 2.281699782850289days
> > > > > > I0709 00:09:44.997082 11000 slave.cpp:2511] Current usage 57.42%.
> > Max
> > > > > > allowed age: 2.280776044946192days
> > > > > > I0709 00:10:44.998754 10994 slave.cpp:2511] Current usage 57.42%.
> > Max
> > > > > > allowed age: 2.280772193926956days
> > > > > > I0709 00:11:44.999949 11002 slave.cpp:2511] Current usage 57.44%.
> > Max
> > > > > > allowed age: 2.279204525069213days
> > > > > > I0709 00:12:45.001539 10995 slave.cpp:2511] Current usage 57.47%.
> > Max
> > > > > > allowed age: 2.277132676719109days
> > > > > > I0709 00:13:45.002728 10992 slave.cpp:2511] Current usage 57.43%.
> > Max
> > > > > > allowed age: 2.280012428368322days
> > > > > > I0709 00:14:45.009699 10998 slave.cpp:2511] Current usage 57.48%.
> > Max
> > > > > > allowed age: 2.276733690857512days
> > > > > > I0709 00:15:45.013483 10996 slave.cpp:2511] Current usage 57.53%.
> > Max
> > > > > > allowed age: 2.272715152282546days
> > > > > > I0709 00:16:45.015496 10998 slave.cpp:2511] Current usage 57.57%.
> > Max
> > > > > > allowed age: 2.270354274804352days
> > > > > > I0709 00:17:45.016628 11000 slave.cpp:2511] Current usage 57.62%.
> > Max
> > > > > > allowed age: 2.266927678423322days
> > > > > > I0709 00:18:45.032670 11002 slave.cpp:2511] Current usage 57.65%.
> > Max
> > > > > > allowed age: 2.264218182361482days
> > > > > > I0709 00:19:45.043442 10998 slave.cpp:2511] Current usage 57.69%.
> > Max
> > > > > > allowed age: 2.261509598383137days
> > > > > > I0709 00:20:45.080648 10992 slave.cpp:2511] Current usage 57.72%.
> > Max
> > > > > > allowed age: 2.259379478031400days
> > > > > > I0709 00:21:45.081632 10995 slave.cpp:2511] Current usage 57.77%.
> > Max
> > > > > > allowed age: 2.255819920144039days
> > > > > > I0709 00:22:45.082593 11005 slave.cpp:2511] Current usage 57.81%.
> > Max
> > > > > > allowed age: 2.253314528101817days
> > > > > > I0709 00:23:45.193588 10997 slave.cpp:2511] Current usage 57.85%.
> > Max
> > > > > > allowed age: 2.250524870034248days
> > > > > > I0709 00:24:45.220617 10994 slave.cpp:2511] Current usage 57.90%.
> > Max
> > > > > > allowed age: 2.246784618270532days
> > > > > > I0709 00:25:45.241602 10992 slave.cpp:2511] Current usage 57.97%.
> > Max
> > > > > > allowed age: 2.242399422127049days
> > > > > > I0709 00:26:45.248977 11000 slave.cpp:2511] Current usage 58.00%.
> > Max
> > > > > > allowed age: 2.240250654734792days
> > > > > > I0709 00:27:45.250953 10993 slave.cpp:2511] Current usage 57.99%.
> > Max
> > > > > > allowed age: 2.240516983117894days
> > > > > > I0709 00:28:45.252694 10996 slave.cpp:2511] Current usage 58.06%.
> > Max
> > > > > > allowed age: 2.235834143724352days
> > > > > > I0709 00:29:45.254992 11003 slave.cpp:2511] Current usage 58.10%.
> > Max
> > > > > > allowed age: 2.233297436815162days
> > > > > > W0709 00:30:06.753098 10999 monitor.cpp:186] Failed to collect
> > > resource
> > > > > > usage for executor 'executor_Task_Tracker_230' of framework
> > > > > > '201307040929-252063498-5050-27411-0000': Future discarded
> > > > > > W0709 00:30:10.715373 10996 monitor.cpp:186] Failed to collect
> > > resource
> > > > > > usage for executor 'executor_Task_Tracker_224' of framework
> > > > > > '201307040929-252063498-5050-27411-0000': Future discarded
> > > > > > W0709 00:30:11.754446 11003 monitor.cpp:186] Failed to collect
> > > resource
> > > > > > usage for executor 'executor_Task_Tracker_230' of framework
> > > > > > '201307040929-252063498-5050-27411-0000': Future discarded
> > > > > > W0709 00:30:15.719880 11003 monitor.cpp:186] Failed to collect
> > > resource
> > > > > > usage for executor 'executor_Task_Tracker_224' of framework
> > > > > > '201307040929-252063498-5050-27411-0000': Future discarded
> > > > > > W0709 00:30:16.755473 11003 monitor.cpp:186] Failed to collect
> > > resource
> > > > > > usage for executor 'executor_Task_Tracker_230' of framework
> > > > > > '201307040929-252063498-5050-27411-0000': Future discarded
> > > > > > W0709 00:30:20.720330 11003 monitor.cpp:186] Failed to collect
> > > resource
> > > > > > usage for executor 'executor_Task_Tracker_224' of framework
> > > > > > '201307040929-252063498-5050-27411-0000': Future discarded
> > > > > > W0709 00:30:21.766019 11003 monitor.cpp:186] Failed to collect
> > > resource
> > > > > > usage for executor 'executor_Task_Tracker_230' of framework
> > > > > > '201307040929-252063498-5050-27411-0000': Future discarded
> > > > > > W0709 00:30:25.721364 11003 monitor.cpp:186] Failed to collect
> > > resource
> > > > > > usage for executor 'executor_Task_Tracker_224' of framework
> > > > > > '201307040929-252063498-5050-27411-0000': Future discarded
> > > > > > W0709 00:30:26.768874 11003 monitor.cpp:186] Failed to collect
> > > resource
> > > > > > usage for executor 'executor_Task_Tracker_230' of framework
> > > > > > '201307040929-252063498-5050-27411-0000': Future discarded
> > > > > > W0709 00:30:30.722605 11003 monitor.cpp:186] Failed to collect
> > > resource
> > > > > > usage for executor 'executor_Task_Tracker_224' of framework
> > > > > > '201307040929-252063498-5050-27411-0000': Future discarded
> > > > > > W0709 00:30:31.770354 11003 monitor.cpp:186] Failed to collect
> > > resource
> > > > > > usage for executor 'executor_Task_Tracker_230' of framework
> > > > > > '201307040929-252063498-5050-27411-0000': Future discarded
> > > > > > W0709 00:30:35.724455 10992 monitor.cpp:186] Failed to collect
> > > resource
> > > > > > usage for executor 'executor_Task_Tracker_224' of framework
> > > > > > '201307040929-252063498-5050-27411-0000': Future discarded
> > > > > > W0709 00:30:36.788751 10992 monitor.cpp:186] Failed to collect
> > > resource
> > > > > > usage for executor 'executor_Task_Tracker_230' of framework
> > > > > > '201307040929-252063498-5050-27411-0000': Future discarded
> > > > > > W0709 00:30:40.745380 10992 monitor.cpp:186] Failed to collect
> > > resource
> > > > > > usage for executor 'executor_Task_Tracker_224' of framework
> > > > > > '201307040929-252063498-5050-27411-0000': Future discarded
> > > > > > W0709 00:30:41.789358 10992 monitor.cpp:186] Failed to collect
> > > resource
> > > > > > usage for executor 'executor_Task_Tracker_230' of framework
> > > > > > '201307040929-252063498-5050-27411-0000': Future discarded
> > > > > > I0709 00:30:45.256590 11004 slave.cpp:2511] Current usage 58.11%.
> > Max
> > > > > > allowed age: 2.232469873049410days
> > > > > >
> > > > > >
> > > > > > Guodong
> > > > > >
> > > > > >
> > > > > > > On Tue, Jul 9, 2013 at 4:55 AM, Benjamin Mahler <benjamin.mahler@gmail.com> wrote:
> > > > > >
> > > > > > >> Are these the unedited logs? I'm expecting to see some logs from
> > > > > > >> the process_isolator or cgroups_isolator in there.
> > > > > >>
> > > > > >>
> > > > > >> On Fri, Jul 5, 2013 at 2:38 PM, Brenden Matthews <
> > > > > >> brenden.matthews@airbedandbreakfast.com> wrote:
> > > > > >>
> > > > > >> > Hey guys,
> > > > > >> >
> > > > > > >> > I'm currently having a problem where tasks will get stuck in the
> > > > > > >> > staging state, though according to the logs they should have been
> > > > > > >> > terminated.  They hang indefinitely, or until I restart the slave.
> > > > > > >> > Below is a screenshot + logs.  Also interesting is the 'Failed to
> > > > > > >> > collect resource usage ...' messages.
> > > > > >> >
> > > > > >> > [image: Inline image 2]
> > > > > >> >
> > > > > >> > I0705 16:19:51.551512  9706 slave.cpp:739] Got assigned task
> > > > > >> >>
> > ct:1373041190990:0:add_latest_reservation_survey_events_partitio
> > > > > >> >> n for framework chronos
> > > > > >> >> I0705 16:19:51.552150  9706 slave.cpp:837] Launching task
> > > > > >> >>
> > > ct:1373041190990:0:add_latest_reservation_survey_events_partition f
> > > > > >> >> or framework chronos
> > > > > >> >> I0705 16:19:51.553956  9706 paths.hpp:303] Created executor
> > > > directory
> > > > > >> >> '/tmp/mesos/slaves/201307030043-2037266954-5050-15277-1
> > > > > >> >>
> > > > > >> >>
> > > > > >>
> > > > >
> > > >
> > >
> >
> 517/frameworks/chronos/executors/ct:1373041190990:0:add_latest_reservation_survey_events_partition/runs/611ba128-557f-4b5e-8c
> > > > > >> >> f2-4d1ce60d618f'
> > > > > >> >> I0705 16:19:51.554576  9706 slave.cpp:948] Queuing task
> > > > > >> >>
> > > > 'ct:1373041190990:0:add_latest_reservation_survey_events_partition' f
> > > > > >> >> or executor
> > > > > >> >>
> > ct:1373041190990:0:add_latest_reservation_survey_events_partition
> > > > of
> > > > > >> >> framework 'c
> > > > > >> >> hronos
> > > > > >> >> I0705 16:19:51.555027  9706 slave.cpp:511] Successfully
> > attached
> > > > file
> > > > > >> >>
> > > > > >>
> > > > >
> > > >
> > >
> >
> '/tmp/mesos/slaves/201307030043-2037266954-5050-15277-1517/frameworks/chronos/executors/ct:1373041190990:0:add_latest_reservation_survey_events_partition/runs/611ba128-557f-4b5e-8cf2-4d1ce60d618f'
> > > > > >> >> I0705 16:19:54.048754  9724 slave.cpp:2530] Current usage
> > 42.18%.
> > > > Max
> > > > > >> >> allowed age: 22.955009563956388hrs
> > > > > >> >> W0705 16:19:54.108963  9724 monitor.cpp:186] Failed to
> collect
> > > > > resource
> > > > > >> >> usage for executor 'executor_Task_Tracker_8023' of framework
> > > > > >> >> '201307030043-2037266954-5050-15277-0006': Future discarded
> > > > > >> >> W0705 16:19:59.110787  9729 monitor.cpp:186] Failed to
> collect
> > > > > resource
> > > > > >> >> usage for executor 'executor_Task_Tracker_8023' of framework
> > > > > >> >> '201307030043-2037266954-5050-15277-0006': Future discarded
> > > > > >> >> W0705 16:20:04.112406  9704 monitor.cpp:186] Failed to
> collect
> > > > > >> resource
> > > > > >> >> usage for executor 'executor_Task_Tracker_8023' of framework
> > > > > >> >> '201307030043-2037266954-5050-15277-0006': Future discarded
> > > > > >> >> W0705 16:20:09.114367  9705 monitor.cpp:186] Failed to
> collect
> > > > > resource
> > > > > >> >> usage for executor 'executor_Task_Tracker_8023' of framework
> > > > > >> >> '201307030043-2037266954-5050-15277-0006': Future discarded
> > > > > >> >> W0705 16:20:14.116312  9706 monitor.cpp:186] Failed to
> collect
> > > > > resource
> > > > > >> >> usage for executor 'executor_Task_Tracker_8023' of framework
> > > > > >> >> '201307030043-2037266954-5050-15277-0006': Future discarded
> > > > > >> >> W0705 16:20:19.118370  9699 monitor.cpp:186] Failed to
> collect
> > > > > resource
> > > > > >> >> usage for executor 'executor_Task_Tracker_8023' of framework
> > > > > >> >> '201307030043-2037266954-5050-15277-0006': Future discarded
> > > > > >> >> W0705 16:20:24.120311  9701 monitor.cpp:186] Failed to
> collect
> > > > > resource
> > > > > >> >> usage for executor 'executor_Task_Tracker_8023' of framework
> > > > > >> >> '201307030043-2037266954-5050-15277-0006': Future discarded
> > > > > >> >> W0705 16:20:29.122355  9700 monitor.cpp:186] Failed to
> collect
> > > > > resource
> > > > > >> >> usage for executor 'executor_Task_Tracker_8023' of framework
> > > > > >> >> '201307030043-2037266954-5050-15277-0006': Future discarded
> > > > > >> >> W0705 16:20:34.123443  9722 monitor.cpp:186] Failed to
> collect
> > > > > resource
> > > > > >> >> usage for executor 'executor_Task_Tracker_8023' of framework
> > > > > >> >> '201307030043-2037266954-5050-15277-0006': Future discarded
> > > > > >> >> W0705 16:20:39.125660  9718 monitor.cpp:186] Failed to
> collect
> > > > > resource
> > > > > >> >> usage for executor 'executor_Task_Tracker_8023' of framework
> > > > > >> >> '201307030043-2037266954-5050-15277-0006': Future discarded
> > > > > >> >> W0705 16:20:44.127464  9724 monitor.cpp:186] Failed to
> collect
> > > > > resource
> > > > > >> >> usage for executor 'executor_Task_Tracker_8023' of framework
> > > > > >> >> '201307030043-2037266954-5050-15277-0006': Future discarded
> > > > > >> >> W0705 16:20:49.129385  9725 monitor.cpp:186] Failed to
> collect
> > > > > resource
> > > > > >> >> usage for executor 'executor_Task_Tracker_8023' of framework
> > > > > >> >> '201307030043-2037266954-5050-15277-0006': Future discarded
> > > > > >> >> I0705 16:20:51.555174  9703 slave.cpp:2482] Terminating
> > executor
> > > > > >> >>
> > ct:1373041190990:0:add_latest_reservation_survey_events_partition
> > > > of
> > > > > >> >> framework chronos because it did not register within 1mins
> > > > > >> >> I0705 16:20:54.050434  9717 slave.cpp:2530] Current usage
> > 42.18%.
> > > > Max
> > > > > >> >> allowed age: 22.955009342481944hrs
> > > > > >> >> W0705 16:20:54.130730  9699 monitor.cpp:186] Failed to
> collect
> > > > > resource
> > > > > >> >> usage for executor 'executor_Task_Tracker_8023' of framework
> > > > > >> >> '201307030043-2037266954-5050-15277-0006': Future discarded
> > > > > >> >> W0705 16:20:59.132472  9702 monitor.cpp:186] Failed to
> collect
> > > > > resource
> > > > > >> >> usage for executor 'executor_Task_Tracker_8023' of framework
> > > > > >> >> '201307030043-2037266954-5050-15277-0006': Future discarded
> > > > > >> >> W0705 16:21:04.134557  9713 monitor.cpp:186] Failed to
> collect
> > > > > resource
> > > > > >> >> usage for executor 'executor_Task_Tracker_8023' of framework
> > > > > >> >> '201307030043-2037266954-5050-15277-0006': Future discarded
> > > > > >> >> W0705 16:21:09.135619  9701 monitor.cpp:186] Failed to
> collect
> > > > > resource
> > > > > >> >> usage for executor 'executor_Task_Tracker_8023' of framework
> > > > > >> >> '201307030043-2037266954-5050-15277-0006': Future discarded
> > > > > >> >
> > > > > >> >
> > > > > >> >
> > > > > >>
> > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>

Re: Tasks stuck in 'STAGING'

Posted by 王国栋 <wa...@gmail.com>.
There is no log about killing Task_Tracker_242, but the master/slave web UI
shows Task_Tracker_242 in STAGING, and it is stuck there. The Hadoop
framework considers Task_Tracker_242 launched, though it is not running.

I think that if the slave deems Task_Tracker_242 unhealthy, it should report
to the master that this task is Lost. I am not sure why it does not report
this, so I am going through the code.
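
To narrow this down from the slave log, here is a minimal sketch. It assumes
the log is read as raw, unwrapped lines in the same format as the excerpts
quoted in this thread. The function name find_stuck_tasks and the regular
expressions are hypothetical helpers written for illustration, not part of
Mesos. It lists tasks that were queued for an executor which the slave later
terminated for not registering, and for which the slave never handled any
status update.

```python
import re

# Patterns modeled on the slave log lines quoted in this thread (assumption:
# each log entry is on a single line, not wrapped as in the mail archive).
QUEUED = re.compile(
    r"Queuing task '(?P<task>[^']+)' for executor (?P<executor>\S+) of framework")
TERMINATED = re.compile(
    r"Terminating executor (?P<executor>\S+) of framework \S+ "
    r"because it did not register")
STATUS = re.compile(
    r"Handling status update (?P<state>TASK_\w+) .* "
    r"for task (?P<task>\S+) of framework")


def find_stuck_tasks(lines):
    """Return task ids whose executor was terminated for not registering
    and for which no status update appears in the given log lines."""
    executor_of = {}    # task id -> executor id it was queued for
    terminated = set()  # executors terminated for not registering
    reported = set()    # tasks with at least one handled status update

    for line in lines:
        match = QUEUED.search(line)
        if match:
            executor_of[match["task"]] = match["executor"]
            continue
        match = TERMINATED.search(line)
        if match:
            terminated.add(match["executor"])
            continue
        match = STATUS.search(line)
        if match:
            reported.add(match["task"])

    return sorted(
        task for task, executor in executor_of.items()
        if executor in terminated and task not in reported
    )
```

Calling find_stuck_tasks(open(path)) on a slave log file (path being whatever
log location your slave uses) gives the candidate task ids. A task on that
list is only a candidate: the sketch cannot tell whether a status update was
sent later than the end of the log.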

Guodong


On Wed, Jul 10, 2013 at 2:51 AM, Vinod Kone <vi...@gmail.com> wrote:

> Hey Guodong,
>
> So, looks like Task_Tracker_242 did not register with the slave within 1
> minute and the slave decided to kill it because it was deemed unhealthy. At
> this point the executor should've received a kill signal from the slave. Do
> you see anything of that sort in the slave or executor logs?
>
>
> On Mon, Jul 8, 2013 at 11:30 PM, 王国栋 <wa...@gmail.com> wrote:
>
> > Hi Vinod,
> >
> > I am using the code from trunk. I think the latest commit is from July
> > 1st. I will grep the master log and send it in another mail.
> >
> > The task "Task_Tracker_242" is stuck in STAGING. I think
> > "Task_Tracker_224" and "Task_Tracker_230" exited successfully. But it is
> > strange that there are a lot of "Failed to collect resource..." warnings.
> >
> > I0709 00:46:11.288698 11002 slave.cpp:739] Got assigned task
> > Task_Tracker_242 for framework 201307040929-252063498-5050-27411-0000
> > I0709 00:46:11.289136 11002 slave.cpp:837] Launching task
> Task_Tracker_242
> > for framework 201307040929-252063498-5050-27411-0000
> > I0709 00:46:11.291296 11002 paths.hpp:303] Created executor directory
> >
> >
> '/data/mesos-slave-work-dir//slaves/201307041648-252063498-5050-8038-5/frameworks/201307040929-252063498-5050-27411-0000/executors/exe
> > cutor_Task_Tracker_242/runs/5c47ad99-1c78-43c8-9f27-9509f1d39c3d'
> > I0709 00:46:11.291647 11002 slave.cpp:948] Queuing task
> 'Task_Tracker_242'
> > for executor executor_Task_Tracker_242 of framework
> > '201307040929-252063498-5050-27411-0000
> > I0709 00:46:11.292162 11002 slave.cpp:511] Successfully attached file
> >
> >
> '/data/mesos-slave-work-dir//slaves/201307041648-252063498-5050-8038-5/frameworks/201307040929-252063498-5050-27411-0000/executors/exe
> > cutor_Task_Tracker_242/runs/5c47ad99-1c78-43c8-9f27-9509f1d39c3d'
> > W0709 00:46:12.197242 10992 monitor.cpp:186] Failed to collect resource
> > usage for executor 'executor_Task_Tracker_230' of framework
> > '201307040929-252063498-5050-27411-0000': Future discarded
> > W0709 00:46:16.100548 10994 monitor.cpp:186] Failed to collect resource
> > usage for executor 'executor_Task_Tracker_224' of framework
> > '201307040929-252063498-5050-27411-0000': Future discarded
> > W0709 00:46:17.197463 11001 monitor.cpp:186] Failed to collect resource
> > usage for executor 'executor_Task_Tracker_230' of framework
> > '201307040929-252063498-5050-27411-0000': Future discarded
> > W0709 00:46:21.101570 11002 monitor.cpp:186] Failed to collect resource
> > usage for executor 'executor_Task_Tracker_224' of framework
> > '201307040929-252063498-5050-27411-0000': Future discarded
> > W0709 00:46:22.198303 11005 monitor.cpp:186] Failed to collect resource
> > usage for executor 'executor_Task_Tracker_230' of framework
> > '201307040929-252063498-5050-27411-0000': Future discarded
> > W0709 00:46:26.102522 11002 monitor.cpp:186] Failed to collect resource
> > usage for executor 'executor_Task_Tracker_224' of framework
> > '201307040929-252063498-5050-27411-0000': Future discarded
> > W0709 00:46:27.199403 10998 monitor.cpp:186] Failed to collect resource
> > usage for executor 'executor_Task_Tracker_230' of framework
> > '201307040929-252063498-5050-27411-0000': Future discarded
> > W0709 00:46:31.103610 10998 monitor.cpp:186] Failed to collect resource
> > usage for executor 'executor_Task_Tracker_224' of framework
> > '201307040929-252063498-5050-27411-0000': Future discarded
> > W0709 00:46:32.200248 11001 monitor.cpp:186] Failed to collect resource
> > usage for executor 'executor_Task_Tracker_230' of framework
> > '201307040929-252063498-5050-27411-0000': Future discarded
> > W0709 00:46:36.104547 11004 monitor.cpp:186] Failed to collect resource
> > usage for executor 'executor_Task_Tracker_224' of framework
> > '201307040929-252063498-5050-27411-0000': Future discarded
> > W0709 00:46:37.201236 10991 monitor.cpp:186] Failed to collect resource
> > usage for executor 'executor_Task_Tracker_230' of framework
> > '201307040929-252063498-5050-27411-0000': Future discarded
> > W0709 00:46:41.105523 10997 monitor.cpp:186] Failed to collect resource
> > usage for executor 'executor_Task_Tracker_224' of framework
> > '201307040929-252063498-5050-27411-0000': Future discarded
> > W0709 00:46:42.202250 10991 monitor.cpp:186] Failed to collect resource
> > usage for executor 'executor_Task_Tracker_230' of framework
> > '201307040929-252063498-5050-27411-0000': Future discarded
> > I0709 00:46:45.283098 11002 slave.cpp:2511] Current usage 57.43%. Max
> > allowed age: 2.279812884766227days
> > W0709 00:46:46.106760 10994 monitor.cpp:186] Failed to collect resource
> > usage for executor 'executor_Task_Tracker_224' of framework
> > '201307040929-252063498-5050-27411-0000': Future discarded
> > W0709 00:46:47.203474 10993 monitor.cpp:186] Failed to collect resource
> > usage for executor 'executor_Task_Tracker_230' of framework
> > '201307040929-252063498-5050-27411-0000': Future discarded
> > W0709 00:46:51.107544 11006 monitor.cpp:186] Failed to collect resource
> > usage for executor 'executor_Task_Tracker_224' of framework
> > '201307040929-252063498-5050-27411-0000': Future discarded
> > W0709 00:46:52.204280 10997 monitor.cpp:186] Failed to collect resource
> > usage for executor 'executor_Task_Tracker_230' of framework
> > '201307040929-252063498-5050-27411-0000': Future discarded
> > W0709 00:46:56.108530 10995 monitor.cpp:186] Failed to collect resource
> > usage for executor 'executor_Task_Tracker_224' of framework
> > '201307040929-252063498-5050-27411-0000': Future discarded
> > W0709 00:46:57.205417 10997 monitor.cpp:186] Failed to collect resource
> > usage for executor 'executor_Task_Tracker_230' of framework
> > '201307040929-252063498-5050-27411-0000': Future discarded
> > W0709 00:47:01.109284 10997 monitor.cpp:186] Failed to collect resource
> > usage for executor 'executor_Task_Tracker_224' of framework
> > '201307040929-252063498-5050-27411-0000': Future discarded
> > W0709 00:47:02.206368 11002 monitor.cpp:186] Failed to collect resource
> > usage for executor 'executor_Task_Tracker_230' of framework
> > '201307040929-252063498-5050-27411-0000': Future discarded
> > I0709 00:47:05.288517 11002 slave.cpp:2463] Terminating executor
> > executor_Task_Tracker_238 of framework
> > 201307040929-252063498-5050-27411-0000 because it did not register within
> > 1mins
> > W0709 00:47:06.110532 11005 monitor.cpp:186] Failed to collect resource
> > usage for executor 'executor_Task_Tracker_224' of framework
> > '201307040929-252063498-5050-27411-0000': Future discarded
> > W0709 00:47:07.207320 10997 monitor.cpp:186] Failed to collect resource
> > usage for executor 'executor_Task_Tracker_230' of framework
> > '201307040929-252063498-5050-27411-0000': Future discarded
> > W0709 00:47:11.111778 10996 monitor.cpp:186] Failed to collect resource
> > usage for executor 'executor_Task_Tracker_224' of framework
> > '201307040929-252063498-5050-27411-0000': Future discarded
> > I0709 00:47:11.292485 10991 slave.cpp:2463] Terminating executor
> > executor_Task_Tracker_242 of framework
> > 201307040929-252063498-5050-27411-0000 because it did not register within
> > 1mins
> >
> >
> > Guodong
> >
> >
> > On Tue, Jul 9, 2013 at 2:21 PM, Vinod Kone <vi...@gmail.com> wrote:
> >
> > > Hey Guodong, which of these task(s) is stuck in STAGING? The
> > > corresponding master logs would also be helpful here. Also, which
> > > version of Mesos are you running?
> > >
> > >
> > > On Mon, Jul 8, 2013 at 11:02 PM, 王国栋 <wa...@gmail.com> wrote:
> > >
> > > > It is very interesting that there are these logs.
> > > >
> > > > I0709 00:33:43.833853 11002 slave.cpp:996] Asked to kill task
> > > > Task_Tracker_224 of framework 201307040929-252063498-5050-27411-0000
> > > > I0709 00:33:43.835552 11006 slave.cpp:996] Asked to kill task
> > > > Task_Tracker_230 of framework 201307040929-252063498-5050-27411-0000
> > > > I0709 00:33:43.972771 10994 slave.cpp:1692] Handling status update
> > > > TASK_FINISHED (UUID: 372081cc-edf2-4183-a461-9345ab6d279c) for task
> > > > Task_Tracker_230 of framework 201307040929-252063498-5050-27411-0000
> > > > from executor(1)@10.47.6.21:27786
> > > > I0709 00:33:43.973132 10994 status_update_manager.cpp:290] Received
> > > status
> > > > update TASK_FINISHED (UUID: 372081cc-edf2-4183-a461-9345ab6d279c) for
> > > task
> > > > Task_Tracker_230 of framework 201307040929-252063498-5
> > > > 050-27411-0000 with checkpoint=false
> > > > I0709 00:33:43.973192 10994 status_update_manager.cpp:336] Forwarding
> > > > status update TASK_FINISHED (UUID:
> > 372081cc-edf2-4183-a461-9345ab6d279c)
> > > > for task Task_Tracker_230 of framework 201307040929-252063498
> > > > -5050-27411-0000 to master@10.47.6.15:5050
> > > > I0709 00:33:43.973846 11005 slave.cpp:1809] Sending acknowledgement
> for
> > > > status update TASK_FINISHED (UUID:
> > 372081cc-edf2-4183-a461-9345ab6d279c)
> > > > for task Task_Tracker_230 of framework 201307040929-2520634
> > > > 98-5050-27411-0000 to executor(1)@10.47.6.21:27786
> > > > I0709 00:33:43.974591 11000 status_update_manager.cpp:360] Received
> > > status
> > > > update acknowledgement 372081cc-edf2-4183-a461-9345ab6d279c for task
> > > > Task_Tracker_230 of framework 201307040929-252063498-5050-27
> > > > 411-0000
> > > > I0709 00:33:43.974652 11000 status_update_manager.cpp:481] Cleaning
> up
> > > > status update stream for task Task_Tracker_230 of framework
> > > > 201307040929-252063498-5050-27411-0000
> > > > I0709 00:33:44.090603 11003 slave.cpp:1692] Handling status update
> > > > TASK_FINISHED (UUID: 61d5775a-2375-412a-a5a4-80ab55163d88) for task
> > > > Task_Tracker_224 of framework 201307040929-252063498-5050-27411-0000
> > > > from executor(1)@10.47.6.21:2310
> > > > I0709 00:33:44.090860 11003 status_update_manager.cpp:290] Received
> > > status
> > > > update TASK_FINISHED (UUID: 61d5775a-2375-412a-a5a4-80ab55163d88) for
> > > task
> > > > Task_Tracker_224 of framework 201307040929-252063498-5
> > > > 050-27411-0000 with checkpoint=false
> > > > I0709 00:33:44.090973 11003 status_update_manager.cpp:336] Forwarding
> > > > status update TASK_FINISHED (UUID:
> > 61d5775a-2375-412a-a5a4-80ab55163d88)
> > > > for task Task_Tracker_224 of framework 201307040929-252063498
> > > > -5050-27411-0000 to master@10.47.6.15:5050
> > > > I0709 00:33:44.091279 11003 slave.cpp:1809] Sending acknowledgement
> for
> > > > status update TASK_FINISHED (UUID:
> > 61d5775a-2375-412a-a5a4-80ab55163d88)
> > > > for task Task_Tracker_224 of framework 201307040929-2520634
> > > > 98-5050-27411-0000 to executor(1)@10.47.6.21:2310
> > > > I0709 00:33:44.093286 11003 status_update_manager.cpp:360] Received
> > > status
> > > > update acknowledgement 61d5775a-2375-412a-a5a4-80ab55163d88 for task
> > > > Task_Tracker_224 of framework 201307040929-252063498-5050-27
> > > > 411-0000
> > > > I0709 00:33:44.093359 11003 status_update_manager.cpp:481] Cleaning
> up
> > > > status update stream for task Task_Tracker_224 of framework
> > > > 201307040929-252063498-5050-27411-0000
> > > > I0709 00:33:45.259831 10997 slave.cpp:2511] Current usage 57.44%. Max
> > > > allowed age: 2.279168852469954days
> > > > W0709 00:33:45.949470 10996 monitor.cpp:186] Failed to collect
> resource
> > > > usage for executor 'executor_Task_Tracker_224' of framework
> > > > '201307040929-252063498-5050-27411-0000': Future discarded
> > > > W0709 00:33:47.063181 11005 monitor.cpp:186] Failed to collect
> resource
> > > > usage for executor 'executor_Task_Tracker_230' of framework
> > > > '201307040929-252063498-5050-27411-0000': Future discarded
> > > > W0709 00:33:50.950412 11000 monitor.cpp:186] Failed to collect
> resource
> > > > usage for executor 'executor_Task_Tracker_224' of framework
> > > > '201307040929-252063498-5050-27411-0000': Future discarded
> > > > W0709 00:33:52.063576 10993 monitor.cpp:186] Failed to collect
> resource
> > > > usage for executor 'executor_Task_Tracker_230' of framework
> > > > '201307040929-252063498-5050-27411-0000': Future discarded
> > > > W0709 00:33:55.951427 11003 monitor.cpp:186] Failed to collect
> resource
> > > > usage for executor 'executor_Task_Tracker_224' of framework
> > > > '201307040929-252063498-5050-27411-0000': Future discarded
> > > > W0709 00:33:57.064575 10998 monitor.cpp:186] Failed to collect
> resource
> > > > usage for executor 'executor_Task_Tracker_230' of framework
> > > > '201307040929-252063498-5050-27411-0000': Future discarded
> > > > W0709 00:34:00.952390 11003 monitor.cpp:186] Failed to collect
> resource
> > > > usage for executor 'executor_Task_Tracker_224' of framework
> > > > '201307040929-252063498-5050-27411-0000': Future discarded
> > > > W0709 00:34:02.065218 10998 monitor.cpp:186] Failed to collect
> resource
> > > > usage for executor 'executor_Task_Tracker_230' of framework
> > > > '201307040929-252063498-5050-27411-0000': Future discarded
> > > > W0709 00:34:05.953456 10995 monitor.cpp:186] Failed to collect
> resource
> > > > usage for executor 'executor_Task_Tracker_224' of framework
> > > > '201307040929-252063498-5050-27411-0000': Future discarded
> > > > W0709 00:34:07.066515 10995 monitor.cpp:186] Failed to collect
> resource
> > > > usage for executor 'executor_Task_Tracker_230' of framework
> > > > '201307040929-252063498-5050-27411-0000': Future discarded
> > > > W0709 00:34:10.954479 10998 monitor.cpp:186] Failed to collect
> resource
> > > > usage for executor 'executor_Task_Tracker_224' of framework
> > > > '201307040929-252063498-5050-27411-0000': Future discarded
> > > > W0709 00:34:12.067471 11005 monitor.cpp:186] Failed to collect
> resource
> > > > usage for executor 'executor_Task_Tracker_230' of framework
> > > > '201307040929-252063498-5050-27411-0000': Future discarded
> > > > W0709 00:34:15.955461 10996 monitor.cpp:186] Failed to collect
> resource
> > > > usage for executor 'executor_Task_Tracker_224' of framework
> > > > '201307040929-252063498-5050-27411-0000': Future discarded
> > > > W0709 00:34:17.068209 10996 monitor.cpp:186] Failed to collect
> resource
> > > > usage for executor 'executor_Task_Tracker_230' of framework
> > > > '201307040929-252063498-5050-27411-0000': Future discarded
> > > >
> > > >
> > > >
> > > > Guodong
> > > >
> > > >
> > > > On Tue, Jul 9, 2013 at 1:59 PM, 王国栋 <wa...@gmail.com> wrote:
> > > >
> > > > > Hi Ben,
> > > > >
> > > > > I ran into the same issue here.
> > > > >
> > > > > This also happens in our Hadoop framework. The slave log is shown
> > > > > below. I think the workload of the node was very high at that time.
> > > > >
> > > > > I0708 23:36:44.253880 11005 slave.cpp:739] Got assigned task
> > > > > Task_Tracker_224 for framework
> 201307040929-252063498-5050-27411-0000
> > > > > I0708 23:36:44.255221 10999 gc.cpp:84] Unscheduling
> > > > >
> > > >
> > >
> >
> '/data/mesos-slave-work-dir//slaves/201307041648-252063498-5050-8038-5/frameworks/201307040929-252063498-5050-27411-0000'
> > > > > for removal
> > > > > I0708 23:36:44.256206 11001 slave.cpp:837] Launching task
> > > > Task_Tracker_224
> > > > > for framework 201307040929-252063498-5050-27411-0000
> > > > > I0708 23:36:44.258117 11001 paths.hpp:303] Created executor
> directory
> > > > >
> > > >
> > >
> >
> '/data/mesos-slave-work-dir//slaves/201307041648-252063498-5050-8038-5/frameworks/201307040929-252063498-5050-27411-0000/executors/exe
> > > > > cutor_Task_Tracker_224/runs/953d3565-424c-4ab3-9926-a3fa71042bf1'
> > > > > I0708 23:36:44.258467 10991 process_isolator.cpp:99] Launching
> > > > > executor_Task_Tracker_224 (cd hadoop && ./bin/mesos-executor) in
> > > > >
> > > >
> > >
> >
> /data/mesos-slave-work-dir//slaves/201307041648-252063498-5050-8038-5/framew
> > > > >
> > > >
> > >
> >
> orks/201307040929-252063498-5050-27411-0000/executors/executor_Task_Tracker_224/runs/953d3565-424c-4ab3-9926-a3fa71042bf1
> > > > > with resources cpus=1; mem=1280' for framework
> > > > > 201307040929-252063498-5050-27411-0
> > > > > 000
> > > > > I0708 23:36:44.258496 11001 slave.cpp:948] Queuing task
> > > > 'Task_Tracker_224'
> > > > > for executor executor_Task_Tracker_224 of framework
> > > > > '201307040929-252063498-5050-27411-0000
> > > > > I0708 23:36:44.261446 10991 process_isolator.cpp:161] Forked
> executor
> > > at
> > > > > 2220
> > > > > I0708 23:36:44.261787 10996 slave.cpp:511] Successfully attached
> file
> > > > >
> > > >
> > >
> >
> '/data/mesos-slave-work-dir//slaves/201307041648-252063498-5050-8038-5/frameworks/201307040929-252063498-5050-27411-0000/executors/exe
> > > > > cutor_Task_Tracker_224/runs/953d3565-424c-4ab3-9926-a3fa71042bf1'
> > > > > I0708 23:36:44.580497 10996 slave.cpp:2511] Current usage 57.21%.
> Max
> > > > > allowed age: 2.295155852123924days
> > > > > I0708 23:36:44.750393 11002 slave.cpp:1395] Got registration for
> > > executor
> > > > > 'executor_Task_Tracker_224' of framework
> > > > > 201307040929-252063498-5050-27411-0000
> > > > > I0708 23:36:44.751095 11002 slave.cpp:1510] Flushing queued task
> > > > > Task_Tracker_224 for executor 'executor_Task_Tracker_224' of
> > framework
> > > > > 201307040929-252063498-5050-27411-0000
> > > > > I0708 23:36:46.144317 11006 slave.cpp:1692] Handling status update
> > > > > TASK_RUNNING (UUID: 364ee347-f6a2-4c7b-8702-460aa0ece579) for task
> > > > > Task_Tracker_224 of framework
> 201307040929-252063498-5050-27411-0000
> > f
> > > > > rom executor(1)@10.47.6.21:2310
> > > > > I0708 23:36:46.144745 11006 status_update_manager.cpp:290] Received
> > > > status
> > > > > update TASK_RUNNING (UUID: 364ee347-f6a2-4c7b-8702-460aa0ece579)
> for
> > > task
> > > > > Task_Tracker_224 of framework 201307040929-252063498-50
> > > > > 50-27411-0000 with checkpoint=false
> > > > > I0708 23:36:46.144821 11006 status_update_manager.cpp:450] Creating
> > > > > StatusUpdate stream for task Task_Tracker_224 of framework
> > > > > 201307040929-252063498-5050-27411-0000
> > > > > I0708 23:36:46.145076 11006 status_update_manager.cpp:336]
> Forwarding
> > > > > status update TASK_RUNNING (UUID:
> > 364ee347-f6a2-4c7b-8702-460aa0ece579)
> > > > for
> > > > > task Task_Tracker_224 of framework 201307040929-252063498-
> > > > > 5050-27411-0000 to master@10.47.6.15:5050
> > > > > I0708 23:36:46.145882 10997 slave.cpp:1809] Sending acknowledgement
> > for
> > > > > status update TASK_RUNNING (UUID:
> > 364ee347-f6a2-4c7b-8702-460aa0ece579)
> > > > for
> > > > > task Task_Tracker_224 of framework 201307040929-25206349
> > > > > 8-5050-27411-0000 to executor(1)@10.47.6.21:2310
> > > > > I0708 23:36:46.146870 10993 status_update_manager.cpp:360] Received
> > > > status
> > > > > update acknowledgement 364ee347-f6a2-4c7b-8702-460aa0ece579 for
> task
> > > > > Task_Tracker_224 of framework 201307040929-252063498-5050-27
> > > > > 411-0000
> > > > > I0708 23:36:50.258347 11005 slave.cpp:739] Got assigned task
> > > > > Task_Tracker_230 for framework
> 201307040929-252063498-5050-27411-0000
> > > > > I0708 23:36:50.259472 11005 slave.cpp:837] Launching task
> > > > Task_Tracker_230
> > > > > for framework 201307040929-252063498-5050-27411-0000
> > > > > I0708 23:36:50.261641 11005 paths.hpp:303] Created executor
> directory
> > > > >
> > > >
> > >
> >
> '/data/mesos-slave-work-dir//slaves/201307041648-252063498-5050-8038-5/frameworks/201307040929-252063498-5050-27411-0000/executors/exe
> > > > > cutor_Task_Tracker_230/runs/1dc87acc-d090-469f-ba30-0477139ee7fd'
> > > > > I0708 23:36:50.262265 11005 slave.cpp:948] Queuing task
> > > > 'Task_Tracker_230'
> > > > > for executor executor_Task_Tracker_230 of framework
> > > > > '201307040929-252063498-5050-27411-0000
> > > > > I0708 23:36:50.262557 11005 process_isolator.cpp:99] Launching
> > > > > executor_Task_Tracker_230 (cd hadoop && ./bin/mesos-executor) in
> > > > >
> > > >
> > >
> >
> /data/mesos-slave-work-dir//slaves/201307041648-252063498-5050-8038-5/framew
> > > > >
> > > >
> > >
> >
> orks/201307040929-252063498-5050-27411-0000/executors/executor_Task_Tracker_230/runs/1dc87acc-d090-469f-ba30-0477139ee7fd
> > > > > with resources cpus=1; mem=1280' for framework
> > > > > 201307040929-252063498-5050-27411-0
> > > > > 000
> > > > > I0708 23:36:50.265396 10999 slave.cpp:511] Successfully attached
> file
> > > > >
> > > >
> > >
> >
> '/data/mesos-slave-work-dir//slaves/201307041648-252063498-5050-8038-5/frameworks/201307040929-252063498-5050-27411-0000/executors/exe
> > > > > cutor_Task_Tracker_230/runs/1dc87acc-d090-469f-ba30-0477139ee7fd'
> > > > > I0708 23:36:50.265419 11005 process_isolator.cpp:161] Forked
> executor
> > > at
> > > > > 2851
> > > > > I0708 23:36:50.835607 10995 slave.cpp:1395] Got registration for
> > > executor
> > > > > 'executor_Task_Tracker_230' of framework
> > > > > 201307040929-252063498-5050-27411-0000
> > > > > I0708 23:36:50.836174 10995 slave.cpp:1510] Flushing queued task
> > > > > Task_Tracker_230 for executor 'executor_Task_Tracker_230' of
> > framework
> > > > > 201307040929-252063498-5050-27411-0000
> > > > > I0708 23:36:54.617856 10994 slave.cpp:1692] Handling status update
> > > > > TASK_RUNNING (UUID: 7753252d-c90b-4b0d-adca-7c97f38f692e) for task
> > > > > Task_Tracker_230 of framework
> 201307040929-252063498-5050-27411-0000
> > f
> > > > > rom executor(1)@10.47.6.21:27786
> > > > > I0708 23:36:54.618275 10994 status_update_manager.cpp:290] Received
> > > > status
> > > > > update TASK_RUNNING (UUID: 7753252d-c90b-4b0d-adca-7c97f38f692e)
> for
> > > task
> > > > > Task_Tracker_230 of framework 201307040929-252063498-50
> > > > > 50-27411-0000 with checkpoint=false
> > > > > I0708 23:36:54.618326 10994 status_update_manager.cpp:450] Creating
> > > > > StatusUpdate stream for task Task_Tracker_230 of framework
> > > > > 201307040929-252063498-5050-27411-0000
> > > > > I0708 23:36:54.618443 10994 status_update_manager.cpp:336]
> Forwarding
> > > > > status update TASK_RUNNING (UUID:
> > 7753252d-c90b-4b0d-adca-7c97f38f692e)
> > > > for
> > > > > task Task_Tracker_230 of framework 201307040929-252063498-
> > > > > 5050-27411-0000 to master@10.47.6.15:5050
> > > > > I0708 23:36:54.619137 10994 slave.cpp:1809] Sending acknowledgement
> > for
> > > > > status update TASK_RUNNING (UUID:
> > 7753252d-c90b-4b0d-adca-7c97f38f692e)
> > > > for
> > > > > task Task_Tracker_230 of framework 201307040929-25206349
> > > > > 8-5050-27411-0000 to executor(1)@10.47.6.21:27786
> > > > > I0708 23:36:54.637682 10994 status_update_manager.cpp:360] Received
> > > > status
> > > > > update acknowledgement 7753252d-c90b-4b0d-adca-7c97f38f692e for
> task
> > > > > Task_Tracker_230 of framework 201307040929-252063498-5050-27
> > > > > 411-0000
> > > > > I0708 23:37:44.583014 11002 slave.cpp:2511] Current usage 57.23%.
> Max
> > > > > allowed age: 2.293704423241597days
> > > > > I0708 23:38:44.585233 11003 slave.cpp:2511] Current usage 57.23%.
> Max
> > > > > allowed age: 2.293703916528542days
> > > > > I0708 23:39:44.599442 11006 slave.cpp:2511] Current usage 57.23%.
> Max
> > > > > allowed age: 2.293639867998055days
> > > > > I0708 23:40:44.603996 10997 slave.cpp:2511] Current usage 57.24%.
> Max
> > > > > allowed age: 2.292921551567535days
> > > > > I0708 23:41:44.608608 11006 slave.cpp:2511] Current usage 57.26%.
> Max
> > > > > allowed age: 2.291521098018820days
> > > > > I0708 23:42:44.609956 10992 slave.cpp:2511] Current usage 57.23%.
> Max
> > > > > allowed age: 2.293668041244063days
> > > > > I0708 23:43:44.682621 11000 slave.cpp:2511] Current usage 57.24%.
> Max
> > > > > allowed age: 2.292935638190544days
> > > > > I0708 23:44:44.684306 10993 slave.cpp:2511] Current usage 57.24%.
> Max
> > > > > allowed age: 2.292916079066516days
> > > > > I0708 23:45:44.686172 11001 slave.cpp:2511] Current usage 57.26%.
> Max
> > > > > allowed age: 2.291485324076945days
> > > > > I0708 23:46:44.699095 10995 slave.cpp:2511] Current usage 57.23%.
> Max
> > > > > allowed age: 2.293641894850289days
> > > > > I0708 23:47:44.721156 10998 slave.cpp:2511] Current usage 57.23%.
> Max
> > > > > allowed age: 2.293629429709074days
> > > > > I0708 23:48:44.779767 10992 slave.cpp:2511] Current usage 57.24%.
> Max
> > > > > allowed age: 2.293525350847025days
> > > > > I0708 23:49:44.812389 11004 slave.cpp:2511] Current usage 57.24%.
> Max
> > > > > allowed age: 2.292909289111539days
> > > > > I0708 23:50:44.814146 10999 slave.cpp:2511] Current usage 57.27%.
> Max
> > > > > allowed age: 2.291438098419977days
> > > > > I0708 23:51:44.814877 11005 slave.cpp:2511] Current usage 57.23%.
> Max
> > > > > allowed age: 2.293635104895313days
> > > > > I0708 23:52:44.818620 10998 slave.cpp:2511] Current usage 57.24%.
> Max
> > > > > allowed age: 2.292983775931019days
> > > > > I0708 23:53:44.829911 10997 slave.cpp:2511] Current usage 57.33%.
> Max
> > > > > allowed age: 2.286910009194236days
> > > > > I0708 23:54:44.831307 10999 slave.cpp:2511] Current usage 57.33%.
> Max
> > > > > allowed age: 2.286909502481169days
> > > > > I0708 23:55:44.902858 10994 slave.cpp:2511] Current usage 57.37%.
> Max
> > > > > allowed age: 2.284414244700093days
> > > > > I0708 23:56:44.905398 11002 slave.cpp:2511] Current usage 57.42%.
> Max
> > > > > allowed age: 2.280636901540567days
> > > > > I0708 23:57:44.933673 10991 slave.cpp:2511] Current usage 57.44%.
> Max
> > > > > allowed age: 2.279481899796968days
> > > > > I0708 23:58:44.934840 11004 slave.cpp:2511] Current usage 57.48%.
> Max
> > > > > allowed age: 2.276566475548496days
> > > > > I0708 23:59:44.936063 11001 slave.cpp:2511] Current usage 57.49%.
> Max
> > > > > allowed age: 2.275690368671817days
> > > > > I0709 00:00:44.937433 11004 slave.cpp:2511] Current usage 57.50%.
> Max
> > > > > allowed age: 2.275057180034989days
> > > > > I0709 00:01:44.938940 11001 slave.cpp:2511] Current usage 57.51%.
> Max
> > > > > allowed age: 2.273999467198449days
> > > > > I0709 00:02:44.955103 10996 slave.cpp:2511] Current usage 57.52%.
> Max
> > > > > allowed age: 2.273472384275891days
> > > > > I0709 00:03:44.956354 10993 slave.cpp:2511] Current usage 57.39%.
> Max
> > > > > allowed age: 2.282894612240220days
> > > > > I0709 00:04:44.957926 10997 slave.cpp:2511] Current usage 57.40%.
> Max
> > > > > allowed age: 2.281966516603831days
> > > > > I0709 00:05:44.969205 10996 slave.cpp:2511] Current usage 57.40%.
> Max
> > > > > allowed age: 2.281962260214144days
> > > > > I0709 00:06:44.969987 10992 slave.cpp:2511] Current usage 57.40%.
> Max
> > > > > allowed age: 2.281791801941551days
> > > > > > I0709 00:07:44.977504 11004 slave.cpp:2511] Current usage 57.40%. Max
> > > > > > allowed age: 2.281715288269849days
> > > > > > I0709 00:08:44.982868 10998 slave.cpp:2511] Current usage 57.40%. Max
> > > > > > allowed age: 2.281699782850289days
> > > > > > I0709 00:09:44.997082 11000 slave.cpp:2511] Current usage 57.42%. Max
> > > > > > allowed age: 2.280776044946192days
> > > > > > I0709 00:10:44.998754 10994 slave.cpp:2511] Current usage 57.42%. Max
> > > > > > allowed age: 2.280772193926956days
> > > > > > I0709 00:11:44.999949 11002 slave.cpp:2511] Current usage 57.44%. Max
> > > > > > allowed age: 2.279204525069213days
> > > > > > I0709 00:12:45.001539 10995 slave.cpp:2511] Current usage 57.47%. Max
> > > > > > allowed age: 2.277132676719109days
> > > > > > I0709 00:13:45.002728 10992 slave.cpp:2511] Current usage 57.43%. Max
> > > > > > allowed age: 2.280012428368322days
> > > > > > I0709 00:14:45.009699 10998 slave.cpp:2511] Current usage 57.48%. Max
> > > > > > allowed age: 2.276733690857512days
> > > > > > I0709 00:15:45.013483 10996 slave.cpp:2511] Current usage 57.53%. Max
> > > > > > allowed age: 2.272715152282546days
> > > > > > I0709 00:16:45.015496 10998 slave.cpp:2511] Current usage 57.57%. Max
> > > > > > allowed age: 2.270354274804352days
> > > > > > I0709 00:17:45.016628 11000 slave.cpp:2511] Current usage 57.62%. Max
> > > > > > allowed age: 2.266927678423322days
> > > > > > I0709 00:18:45.032670 11002 slave.cpp:2511] Current usage 57.65%. Max
> > > > > > allowed age: 2.264218182361482days
> > > > > > I0709 00:19:45.043442 10998 slave.cpp:2511] Current usage 57.69%. Max
> > > > > > allowed age: 2.261509598383137days
> > > > > > I0709 00:20:45.080648 10992 slave.cpp:2511] Current usage 57.72%. Max
> > > > > > allowed age: 2.259379478031400days
> > > > > > I0709 00:21:45.081632 10995 slave.cpp:2511] Current usage 57.77%. Max
> > > > > > allowed age: 2.255819920144039days
> > > > > > I0709 00:22:45.082593 11005 slave.cpp:2511] Current usage 57.81%. Max
> > > > > > allowed age: 2.253314528101817days
> > > > > > I0709 00:23:45.193588 10997 slave.cpp:2511] Current usage 57.85%. Max
> > > > > > allowed age: 2.250524870034248days
> > > > > > I0709 00:24:45.220617 10994 slave.cpp:2511] Current usage 57.90%. Max
> > > > > > allowed age: 2.246784618270532days
> > > > > > I0709 00:25:45.241602 10992 slave.cpp:2511] Current usage 57.97%. Max
> > > > > > allowed age: 2.242399422127049days
> > > > > > I0709 00:26:45.248977 11000 slave.cpp:2511] Current usage 58.00%. Max
> > > > > > allowed age: 2.240250654734792days
> > > > > > I0709 00:27:45.250953 10993 slave.cpp:2511] Current usage 57.99%. Max
> > > > > > allowed age: 2.240516983117894days
> > > > > > I0709 00:28:45.252694 10996 slave.cpp:2511] Current usage 58.06%. Max
> > > > > > allowed age: 2.235834143724352days
> > > > > > I0709 00:29:45.254992 11003 slave.cpp:2511] Current usage 58.10%. Max
> > > > > > allowed age: 2.233297436815162days
> > > > > > W0709 00:30:06.753098 10999 monitor.cpp:186] Failed to collect resource
> > > > > > usage for executor 'executor_Task_Tracker_230' of framework
> > > > > > '201307040929-252063498-5050-27411-0000': Future discarded
> > > > > > W0709 00:30:10.715373 10996 monitor.cpp:186] Failed to collect resource
> > > > > > usage for executor 'executor_Task_Tracker_224' of framework
> > > > > > '201307040929-252063498-5050-27411-0000': Future discarded
> > > > > > W0709 00:30:11.754446 11003 monitor.cpp:186] Failed to collect resource
> > > > > > usage for executor 'executor_Task_Tracker_230' of framework
> > > > > > '201307040929-252063498-5050-27411-0000': Future discarded
> > > > > > W0709 00:30:15.719880 11003 monitor.cpp:186] Failed to collect resource
> > > > > > usage for executor 'executor_Task_Tracker_224' of framework
> > > > > > '201307040929-252063498-5050-27411-0000': Future discarded
> > > > > > W0709 00:30:16.755473 11003 monitor.cpp:186] Failed to collect resource
> > > > > > usage for executor 'executor_Task_Tracker_230' of framework
> > > > > > '201307040929-252063498-5050-27411-0000': Future discarded
> > > > > > W0709 00:30:20.720330 11003 monitor.cpp:186] Failed to collect resource
> > > > > > usage for executor 'executor_Task_Tracker_224' of framework
> > > > > > '201307040929-252063498-5050-27411-0000': Future discarded
> > > > > > W0709 00:30:21.766019 11003 monitor.cpp:186] Failed to collect resource
> > > > > > usage for executor 'executor_Task_Tracker_230' of framework
> > > > > > '201307040929-252063498-5050-27411-0000': Future discarded
> > > > > > W0709 00:30:25.721364 11003 monitor.cpp:186] Failed to collect resource
> > > > > > usage for executor 'executor_Task_Tracker_224' of framework
> > > > > > '201307040929-252063498-5050-27411-0000': Future discarded
> > > > > > W0709 00:30:26.768874 11003 monitor.cpp:186] Failed to collect resource
> > > > > > usage for executor 'executor_Task_Tracker_230' of framework
> > > > > > '201307040929-252063498-5050-27411-0000': Future discarded
> > > > > > W0709 00:30:30.722605 11003 monitor.cpp:186] Failed to collect resource
> > > > > > usage for executor 'executor_Task_Tracker_224' of framework
> > > > > > '201307040929-252063498-5050-27411-0000': Future discarded
> > > > > > W0709 00:30:31.770354 11003 monitor.cpp:186] Failed to collect resource
> > > > > > usage for executor 'executor_Task_Tracker_230' of framework
> > > > > > '201307040929-252063498-5050-27411-0000': Future discarded
> > > > > > W0709 00:30:35.724455 10992 monitor.cpp:186] Failed to collect resource
> > > > > > usage for executor 'executor_Task_Tracker_224' of framework
> > > > > > '201307040929-252063498-5050-27411-0000': Future discarded
> > > > > > W0709 00:30:36.788751 10992 monitor.cpp:186] Failed to collect resource
> > > > > > usage for executor 'executor_Task_Tracker_230' of framework
> > > > > > '201307040929-252063498-5050-27411-0000': Future discarded
> > > > > > W0709 00:30:40.745380 10992 monitor.cpp:186] Failed to collect resource
> > > > > > usage for executor 'executor_Task_Tracker_224' of framework
> > > > > > '201307040929-252063498-5050-27411-0000': Future discarded
> > > > > > W0709 00:30:41.789358 10992 monitor.cpp:186] Failed to collect resource
> > > > > > usage for executor 'executor_Task_Tracker_230' of framework
> > > > > > '201307040929-252063498-5050-27411-0000': Future discarded
> > > > > > I0709 00:30:45.256590 11004 slave.cpp:2511] Current usage 58.11%. Max
> > > > > > allowed age: 2.232469873049410days
> > > > >
> > > > >
> > > > > Guodong
> > > > >
> > > > >
> > > > > > On Tue, Jul 9, 2013 at 4:55 AM, Benjamin Mahler <benjamin.mahler@gmail.com>
> > > > > > wrote:
> > > > >
> > > > > >> Are these the un-edited logs? I'm expecting to see some logs from the
> > > > > >> process_isolator or cgroups_isolator in there.
> > > > >>
> > > > >>
> > > > >> On Fri, Jul 5, 2013 at 2:38 PM, Brenden Matthews <
> > > > >> brenden.matthews@airbedandbreakfast.com> wrote:
> > > > >>
> > > > >> > Hey guys,
> > > > >> >
> > > > > >> > I'm currently having a problem where tasks will get stuck in the staging
> > > > > >> > state, though according to the logs they should have been terminated.  They
> > > > > >> > hang indefinitely, or until I restart the slave.  Below is a screenshot +
> > > > > >> > logs.  Also interesting is the 'Failed to collect resource usage ...'
> > > > > >> > messages.
> > > > >> >
> > > > >> > [image: Inline image 2]
> > > > >> >
> > > > > >> > I0705 16:19:51.551512  9706 slave.cpp:739] Got assigned task
> > > > > >> >> ct:1373041190990:0:add_latest_reservation_survey_events_partition
> > > > > >> >> for framework chronos
> > > > > >> >> I0705 16:19:51.552150  9706 slave.cpp:837] Launching task
> > > > > >> >> ct:1373041190990:0:add_latest_reservation_survey_events_partition
> > > > > >> >> for framework chronos
> > > > > >> >> I0705 16:19:51.553956  9706 paths.hpp:303] Created executor directory
> > > > > >> >> '/tmp/mesos/slaves/201307030043-2037266954-5050-15277-1517/frameworks/chronos/executors/ct:1373041190990:0:add_latest_reservation_survey_events_partition/runs/611ba128-557f-4b5e-8cf2-4d1ce60d618f'
> > > > > >> >> I0705 16:19:51.554576  9706 slave.cpp:948] Queuing task
> > > > > >> >> 'ct:1373041190990:0:add_latest_reservation_survey_events_partition'
> > > > > >> >> for executor
> > > > > >> >> ct:1373041190990:0:add_latest_reservation_survey_events_partition of
> > > > > >> >> framework 'chronos
> > > > > >> >> I0705 16:19:51.555027  9706 slave.cpp:511] Successfully attached file
> > > > > >> >> '/tmp/mesos/slaves/201307030043-2037266954-5050-15277-1517/frameworks/chronos/executors/ct:1373041190990:0:add_latest_reservation_survey_events_partition/runs/611ba128-557f-4b5e-8cf2-4d1ce60d618f'
> > > > > >> >> I0705 16:19:54.048754  9724 slave.cpp:2530] Current usage 42.18%. Max
> > > > > >> >> allowed age: 22.955009563956388hrs
> > > > > >> >> W0705 16:19:54.108963  9724 monitor.cpp:186] Failed to collect resource
> > > > > >> >> usage for executor 'executor_Task_Tracker_8023' of framework
> > > > > >> >> '201307030043-2037266954-5050-15277-0006': Future discarded
> > > > > >> >> W0705 16:19:59.110787  9729 monitor.cpp:186] Failed to collect resource
> > > > > >> >> usage for executor 'executor_Task_Tracker_8023' of framework
> > > > > >> >> '201307030043-2037266954-5050-15277-0006': Future discarded
> > > > > >> >> W0705 16:20:04.112406  9704 monitor.cpp:186] Failed to collect resource
> > > > > >> >> usage for executor 'executor_Task_Tracker_8023' of framework
> > > > > >> >> '201307030043-2037266954-5050-15277-0006': Future discarded
> > > > > >> >> W0705 16:20:09.114367  9705 monitor.cpp:186] Failed to collect resource
> > > > > >> >> usage for executor 'executor_Task_Tracker_8023' of framework
> > > > > >> >> '201307030043-2037266954-5050-15277-0006': Future discarded
> > > > > >> >> W0705 16:20:14.116312  9706 monitor.cpp:186] Failed to collect resource
> > > > > >> >> usage for executor 'executor_Task_Tracker_8023' of framework
> > > > > >> >> '201307030043-2037266954-5050-15277-0006': Future discarded
> > > > > >> >> W0705 16:20:19.118370  9699 monitor.cpp:186] Failed to collect resource
> > > > > >> >> usage for executor 'executor_Task_Tracker_8023' of framework
> > > > > >> >> '201307030043-2037266954-5050-15277-0006': Future discarded
> > > > > >> >> W0705 16:20:24.120311  9701 monitor.cpp:186] Failed to collect resource
> > > > > >> >> usage for executor 'executor_Task_Tracker_8023' of framework
> > > > > >> >> '201307030043-2037266954-5050-15277-0006': Future discarded
> > > > > >> >> W0705 16:20:29.122355  9700 monitor.cpp:186] Failed to collect resource
> > > > > >> >> usage for executor 'executor_Task_Tracker_8023' of framework
> > > > > >> >> '201307030043-2037266954-5050-15277-0006': Future discarded
> > > > > >> >> W0705 16:20:34.123443  9722 monitor.cpp:186] Failed to collect resource
> > > > > >> >> usage for executor 'executor_Task_Tracker_8023' of framework
> > > > > >> >> '201307030043-2037266954-5050-15277-0006': Future discarded
> > > > > >> >> W0705 16:20:39.125660  9718 monitor.cpp:186] Failed to collect resource
> > > > > >> >> usage for executor 'executor_Task_Tracker_8023' of framework
> > > > > >> >> '201307030043-2037266954-5050-15277-0006': Future discarded
> > > > > >> >> W0705 16:20:44.127464  9724 monitor.cpp:186] Failed to collect resource
> > > > > >> >> usage for executor 'executor_Task_Tracker_8023' of framework
> > > > > >> >> '201307030043-2037266954-5050-15277-0006': Future discarded
> > > > > >> >> W0705 16:20:49.129385  9725 monitor.cpp:186] Failed to collect resource
> > > > > >> >> usage for executor 'executor_Task_Tracker_8023' of framework
> > > > > >> >> '201307030043-2037266954-5050-15277-0006': Future discarded
> > > > > >> >> I0705 16:20:51.555174  9703 slave.cpp:2482] Terminating executor
> > > > > >> >> ct:1373041190990:0:add_latest_reservation_survey_events_partition of
> > > > > >> >> framework chronos because it did not register within 1mins
> > > > > >> >> I0705 16:20:54.050434  9717 slave.cpp:2530] Current usage 42.18%. Max
> > > > > >> >> allowed age: 22.955009342481944hrs
> > > > > >> >> W0705 16:20:54.130730  9699 monitor.cpp:186] Failed to collect resource
> > > > > >> >> usage for executor 'executor_Task_Tracker_8023' of framework
> > > > > >> >> '201307030043-2037266954-5050-15277-0006': Future discarded
> > > > > >> >> W0705 16:20:59.132472  9702 monitor.cpp:186] Failed to collect resource
> > > > > >> >> usage for executor 'executor_Task_Tracker_8023' of framework
> > > > > >> >> '201307030043-2037266954-5050-15277-0006': Future discarded
> > > > > >> >> W0705 16:21:04.134557  9713 monitor.cpp:186] Failed to collect resource
> > > > > >> >> usage for executor 'executor_Task_Tracker_8023' of framework
> > > > > >> >> '201307030043-2037266954-5050-15277-0006': Future discarded
> > > > > >> >> W0705 16:21:09.135619  9701 monitor.cpp:186] Failed to collect resource
> > > > > >> >> usage for executor 'executor_Task_Tracker_8023' of framework
> > > > > >> >> '201307030043-2037266954-5050-15277-0006': Future discarded
> > > > >> >
> > > > >> >
> > > > >> >
> > > > >>
> > > > >
> > > > >
> > > >
> > >
> >
>

Re: Tasks stuck in 'STAGING'

Posted by Vinod Kone <vi...@gmail.com>.
Hey Guodong,

So, looks like Task_Tracker_242 did not register with the slave within 1
minute and the slave decided to kill it because it was deemed unhealthy. At
this point the executor should've received a kill signal from the slave. Do
you see anything of that sort in the slave or executor logs?
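
To make that check easier, here is a rough sketch (not from anyone on this thread) of a small Python helper that scans a slave log and lists executors that hit the registration timeout without any isolator "Launching" line ever being logged for them. The regexes are copied from the log excerpts quoted in this thread, so they are an assumption about the message wording and may not match other Mesos versions or the cgroups isolator. The function names and the shortened `/work/dir` path are made up for illustration. It expects one log entry per line, as in the raw log file rather than the wrapped mail text.

```python
import re
from collections import defaultdict

# Message fragments taken from the slave logs quoted in this thread.
# Assumption: the wording is the same in the log being scanned.
PATTERNS = {
    "queued": re.compile(r"Queuing task '\S+' for executor (\S+) of framework"),
    "launched": re.compile(r"_isolator\.cpp:\d+\] Launching (\S+) \("),
    "registered": re.compile(r"Got registration for executor '([^']+)'"),
    "timed_out": re.compile(
        r"Terminating executor (\S+) of framework \S+ because it did not register"
    ),
}


def summarize(lines):
    """Map each executor ID to the set of lifecycle events seen for it."""
    events = defaultdict(set)
    for line in lines:
        for name, pattern in PATTERNS.items():
            match = pattern.search(line)
            if match:
                events[match.group(1)].add(name)
    return dict(events)


def never_launched(events):
    """Executors that timed out and have no isolator 'Launching' entry."""
    return sorted(
        executor
        for executor, seen in events.items()
        if "timed_out" in seen and "launched" not in seen
    )


# Shortened sample entries modelled on the logs in this thread
# (the work directory path is abbreviated to /work/dir).
SAMPLE = [
    "I0708 23:36:44.258496 11001 slave.cpp:948] Queuing task 'Task_Tracker_224' "
    "for executor executor_Task_Tracker_224 of framework "
    "'201307040929-252063498-5050-27411-0000",
    "I0708 23:36:44.258467 10991 process_isolator.cpp:99] Launching "
    "executor_Task_Tracker_224 (cd hadoop && ./bin/mesos-executor) in /work/dir",
    "I0708 23:36:44.750393 11002 slave.cpp:1395] Got registration for executor "
    "'executor_Task_Tracker_224' of framework 201307040929-252063498-5050-27411-0000",
    "I0709 00:46:11.291647 11002 slave.cpp:948] Queuing task 'Task_Tracker_242' "
    "for executor executor_Task_Tracker_242 of framework "
    "'201307040929-252063498-5050-27411-0000",
    "I0709 00:47:11.292485 10991 slave.cpp:2463] Terminating executor "
    "executor_Task_Tracker_242 of framework "
    "201307040929-252063498-5050-27411-0000 because it did not register within 1mins",
]

print(never_launched(summarize(SAMPLE)))  # ['executor_Task_Tracker_242']
```

Against a real log, something like `never_launched(summarize(open(path)))` would give the same report, where `path` is wherever the slave writes its INFO log on your machine.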


On Mon, Jul 8, 2013 at 11:30 PM, 王国栋 <wa...@gmail.com> wrote:

> Hi Vinod,
>
> I am using the code from trunk. I think the latest commit is from July
> 1st. I will send some master logs in another mail.
>
> The task "Task_Tracker_242" is stuck in STAGING. I think "Task_Tracker_224"
> and "Task_Tracker_230" exited successfully, but it is strange that there are
> a lot of "Failed to collect resource usage ..." warnings.
>
> I0709 00:46:11.288698 11002 slave.cpp:739] Got assigned task
> Task_Tracker_242 for framework 201307040929-252063498-5050-27411-0000
> I0709 00:46:11.289136 11002 slave.cpp:837] Launching task Task_Tracker_242
> for framework 201307040929-252063498-5050-27411-0000
> I0709 00:46:11.291296 11002 paths.hpp:303] Created executor directory
> '/data/mesos-slave-work-dir//slaves/201307041648-252063498-5050-8038-5/frameworks/201307040929-252063498-5050-27411-0000/executors/executor_Task_Tracker_242/runs/5c47ad99-1c78-43c8-9f27-9509f1d39c3d'
> I0709 00:46:11.291647 11002 slave.cpp:948] Queuing task 'Task_Tracker_242'
> for executor executor_Task_Tracker_242 of framework
> '201307040929-252063498-5050-27411-0000
> I0709 00:46:11.292162 11002 slave.cpp:511] Successfully attached file
> '/data/mesos-slave-work-dir//slaves/201307041648-252063498-5050-8038-5/frameworks/201307040929-252063498-5050-27411-0000/executors/executor_Task_Tracker_242/runs/5c47ad99-1c78-43c8-9f27-9509f1d39c3d'
> W0709 00:46:12.197242 10992 monitor.cpp:186] Failed to collect resource
> usage for executor 'executor_Task_Tracker_230' of framework
> '201307040929-252063498-5050-27411-0000': Future discarded
> W0709 00:46:16.100548 10994 monitor.cpp:186] Failed to collect resource
> usage for executor 'executor_Task_Tracker_224' of framework
> '201307040929-252063498-5050-27411-0000': Future discarded
> W0709 00:46:17.197463 11001 monitor.cpp:186] Failed to collect resource
> usage for executor 'executor_Task_Tracker_230' of framework
> '201307040929-252063498-5050-27411-0000': Future discarded
> W0709 00:46:21.101570 11002 monitor.cpp:186] Failed to collect resource
> usage for executor 'executor_Task_Tracker_224' of framework
> '201307040929-252063498-5050-27411-0000': Future discarded
> W0709 00:46:22.198303 11005 monitor.cpp:186] Failed to collect resource
> usage for executor 'executor_Task_Tracker_230' of framework
> '201307040929-252063498-5050-27411-0000': Future discarded
> W0709 00:46:26.102522 11002 monitor.cpp:186] Failed to collect resource
> usage for executor 'executor_Task_Tracker_224' of framework
> '201307040929-252063498-5050-27411-0000': Future discarded
> W0709 00:46:27.199403 10998 monitor.cpp:186] Failed to collect resource
> usage for executor 'executor_Task_Tracker_230' of framework
> '201307040929-252063498-5050-27411-0000': Future discarded
> W0709 00:46:31.103610 10998 monitor.cpp:186] Failed to collect resource
> usage for executor 'executor_Task_Tracker_224' of framework
> '201307040929-252063498-5050-27411-0000': Future discarded
> W0709 00:46:32.200248 11001 monitor.cpp:186] Failed to collect resource
> usage for executor 'executor_Task_Tracker_230' of framework
> '201307040929-252063498-5050-27411-0000': Future discarded
> W0709 00:46:36.104547 11004 monitor.cpp:186] Failed to collect resource
> usage for executor 'executor_Task_Tracker_224' of framework
> '201307040929-252063498-5050-27411-0000': Future discarded
> W0709 00:46:37.201236 10991 monitor.cpp:186] Failed to collect resource
> usage for executor 'executor_Task_Tracker_230' of framework
> '201307040929-252063498-5050-27411-0000': Future discarded
> W0709 00:46:41.105523 10997 monitor.cpp:186] Failed to collect resource
> usage for executor 'executor_Task_Tracker_224' of framework
> '201307040929-252063498-5050-27411-0000': Future discarded
> W0709 00:46:42.202250 10991 monitor.cpp:186] Failed to collect resource
> usage for executor 'executor_Task_Tracker_230' of framework
> '201307040929-252063498-5050-27411-0000': Future discarded
> I0709 00:46:45.283098 11002 slave.cpp:2511] Current usage 57.43%. Max
> allowed age: 2.279812884766227days
> W0709 00:46:46.106760 10994 monitor.cpp:186] Failed to collect resource
> usage for executor 'executor_Task_Tracker_224' of framework
> '201307040929-252063498-5050-27411-0000': Future discarded
> W0709 00:46:47.203474 10993 monitor.cpp:186] Failed to collect resource
> usage for executor 'executor_Task_Tracker_230' of framework
> '201307040929-252063498-5050-27411-0000': Future discarded
> W0709 00:46:51.107544 11006 monitor.cpp:186] Failed to collect resource
> usage for executor 'executor_Task_Tracker_224' of framework
> '201307040929-252063498-5050-27411-0000': Future discarded
> W0709 00:46:52.204280 10997 monitor.cpp:186] Failed to collect resource
> usage for executor 'executor_Task_Tracker_230' of framework
> '201307040929-252063498-5050-27411-0000': Future discarded
> W0709 00:46:56.108530 10995 monitor.cpp:186] Failed to collect resource
> usage for executor 'executor_Task_Tracker_224' of framework
> '201307040929-252063498-5050-27411-0000': Future discarded
> W0709 00:46:57.205417 10997 monitor.cpp:186] Failed to collect resource
> usage for executor 'executor_Task_Tracker_230' of framework
> '201307040929-252063498-5050-27411-0000': Future discarded
> W0709 00:47:01.109284 10997 monitor.cpp:186] Failed to collect resource
> usage for executor 'executor_Task_Tracker_224' of framework
> '201307040929-252063498-5050-27411-0000': Future discarded
> W0709 00:47:02.206368 11002 monitor.cpp:186] Failed to collect resource
> usage for executor 'executor_Task_Tracker_230' of framework
> '201307040929-252063498-5050-27411-0000': Future discarded
> I0709 00:47:05.288517 11002 slave.cpp:2463] Terminating executor
> executor_Task_Tracker_238 of framework
> 201307040929-252063498-5050-27411-0000 because it did not register within
> 1mins
> W0709 00:47:06.110532 11005 monitor.cpp:186] Failed to collect resource
> usage for executor 'executor_Task_Tracker_224' of framework
> '201307040929-252063498-5050-27411-0000': Future discarded
> W0709 00:47:07.207320 10997 monitor.cpp:186] Failed to collect resource
> usage for executor 'executor_Task_Tracker_230' of framework
> '201307040929-252063498-5050-27411-0000': Future discarded
> W0709 00:47:11.111778 10996 monitor.cpp:186] Failed to collect resource
> usage for executor 'executor_Task_Tracker_224' of framework
> '201307040929-252063498-5050-27411-0000': Future discarded
> I0709 00:47:11.292485 10991 slave.cpp:2463] Terminating executor
> executor_Task_Tracker_242 of framework
> 201307040929-252063498-5050-27411-0000 because it did not register within
> 1mins
>
>
> Guodong
>
>
> On Tue, Jul 9, 2013 at 2:21 PM, Vinod Kone <vi...@gmail.com> wrote:
>
> > Hey Guodong, which of these task(s) is stuck in STAGING? The
> > corresponding master's logs would also be helpful here. Also, which version
> > of Mesos are you running?
> >
> >
> > On Mon, Jul 8, 2013 at 11:02 PM, 王国栋 <wa...@gmail.com> wrote:
> >
> > > Interestingly, the slave log also contains these entries:
> > >
> > > I0709 00:33:43.833853 11002 slave.cpp:996] Asked to kill task
> > > Task_Tracker_224 of framework 201307040929-252063498-5050-27411-0000
> > > I0709 00:33:43.835552 11006 slave.cpp:996] Asked to kill task
> > > Task_Tracker_230 of framework 201307040929-252063498-5050-27411-0000
> > > I0709 00:33:43.972771 10994 slave.cpp:1692] Handling status update
> > > TASK_FINISHED (UUID: 372081cc-edf2-4183-a461-9345ab6d279c) for task
> > > Task_Tracker_230 of framework 201307040929-252063498-5050-27411-0000
> > > from executor(1)@10.47.6.21:27786
> > > I0709 00:33:43.973132 10994 status_update_manager.cpp:290] Received status
> > > update TASK_FINISHED (UUID: 372081cc-edf2-4183-a461-9345ab6d279c) for task
> > > Task_Tracker_230 of framework 201307040929-252063498-5050-27411-0000
> > > with checkpoint=false
> > > I0709 00:33:43.973192 10994 status_update_manager.cpp:336] Forwarding
> > > status update TASK_FINISHED (UUID: 372081cc-edf2-4183-a461-9345ab6d279c)
> > > for task Task_Tracker_230 of framework
> > > 201307040929-252063498-5050-27411-0000 to master@10.47.6.15:5050
> > > I0709 00:33:43.973846 11005 slave.cpp:1809] Sending acknowledgement for
> > > status update TASK_FINISHED (UUID: 372081cc-edf2-4183-a461-9345ab6d279c)
> > > for task Task_Tracker_230 of framework
> > > 201307040929-252063498-5050-27411-0000 to executor(1)@10.47.6.21:27786
> > > I0709 00:33:43.974591 11000 status_update_manager.cpp:360] Received status
> > > update acknowledgement 372081cc-edf2-4183-a461-9345ab6d279c for task
> > > Task_Tracker_230 of framework 201307040929-252063498-5050-27411-0000
> > > I0709 00:33:43.974652 11000 status_update_manager.cpp:481] Cleaning up
> > > status update stream for task Task_Tracker_230 of framework
> > > 201307040929-252063498-5050-27411-0000
> > > I0709 00:33:44.090603 11003 slave.cpp:1692] Handling status update
> > > TASK_FINISHED (UUID: 61d5775a-2375-412a-a5a4-80ab55163d88) for task
> > > Task_Tracker_224 of framework 201307040929-252063498-5050-27411-0000
> > > from executor(1)@10.47.6.21:2310
> > > I0709 00:33:44.090860 11003 status_update_manager.cpp:290] Received status
> > > update TASK_FINISHED (UUID: 61d5775a-2375-412a-a5a4-80ab55163d88) for task
> > > Task_Tracker_224 of framework 201307040929-252063498-5050-27411-0000
> > > with checkpoint=false
> > > I0709 00:33:44.090973 11003 status_update_manager.cpp:336] Forwarding
> > > status update TASK_FINISHED (UUID: 61d5775a-2375-412a-a5a4-80ab55163d88)
> > > for task Task_Tracker_224 of framework
> > > 201307040929-252063498-5050-27411-0000 to master@10.47.6.15:5050
> > > I0709 00:33:44.091279 11003 slave.cpp:1809] Sending acknowledgement for
> > > status update TASK_FINISHED (UUID: 61d5775a-2375-412a-a5a4-80ab55163d88)
> > > for task Task_Tracker_224 of framework
> > > 201307040929-252063498-5050-27411-0000 to executor(1)@10.47.6.21:2310
> > > I0709 00:33:44.093286 11003 status_update_manager.cpp:360] Received status
> > > update acknowledgement 61d5775a-2375-412a-a5a4-80ab55163d88 for task
> > > Task_Tracker_224 of framework 201307040929-252063498-5050-27411-0000
> > > I0709 00:33:44.093359 11003 status_update_manager.cpp:481] Cleaning up
> > > status update stream for task Task_Tracker_224 of framework
> > > 201307040929-252063498-5050-27411-0000
> > > I0709 00:33:45.259831 10997 slave.cpp:2511] Current usage 57.44%. Max
> > > allowed age: 2.279168852469954days
> > > W0709 00:33:45.949470 10996 monitor.cpp:186] Failed to collect resource
> > > usage for executor 'executor_Task_Tracker_224' of framework
> > > '201307040929-252063498-5050-27411-0000': Future discarded
> > > W0709 00:33:47.063181 11005 monitor.cpp:186] Failed to collect resource
> > > usage for executor 'executor_Task_Tracker_230' of framework
> > > '201307040929-252063498-5050-27411-0000': Future discarded
> > > W0709 00:33:50.950412 11000 monitor.cpp:186] Failed to collect resource
> > > usage for executor 'executor_Task_Tracker_224' of framework
> > > '201307040929-252063498-5050-27411-0000': Future discarded
> > > W0709 00:33:52.063576 10993 monitor.cpp:186] Failed to collect resource
> > > usage for executor 'executor_Task_Tracker_230' of framework
> > > '201307040929-252063498-5050-27411-0000': Future discarded
> > > W0709 00:33:55.951427 11003 monitor.cpp:186] Failed to collect resource
> > > usage for executor 'executor_Task_Tracker_224' of framework
> > > '201307040929-252063498-5050-27411-0000': Future discarded
> > > W0709 00:33:57.064575 10998 monitor.cpp:186] Failed to collect resource
> > > usage for executor 'executor_Task_Tracker_230' of framework
> > > '201307040929-252063498-5050-27411-0000': Future discarded
> > > W0709 00:34:00.952390 11003 monitor.cpp:186] Failed to collect resource
> > > usage for executor 'executor_Task_Tracker_224' of framework
> > > '201307040929-252063498-5050-27411-0000': Future discarded
> > > W0709 00:34:02.065218 10998 monitor.cpp:186] Failed to collect resource
> > > usage for executor 'executor_Task_Tracker_230' of framework
> > > '201307040929-252063498-5050-27411-0000': Future discarded
> > > W0709 00:34:05.953456 10995 monitor.cpp:186] Failed to collect resource
> > > usage for executor 'executor_Task_Tracker_224' of framework
> > > '201307040929-252063498-5050-27411-0000': Future discarded
> > > W0709 00:34:07.066515 10995 monitor.cpp:186] Failed to collect resource
> > > usage for executor 'executor_Task_Tracker_230' of framework
> > > '201307040929-252063498-5050-27411-0000': Future discarded
> > > W0709 00:34:10.954479 10998 monitor.cpp:186] Failed to collect resource
> > > usage for executor 'executor_Task_Tracker_224' of framework
> > > '201307040929-252063498-5050-27411-0000': Future discarded
> > > W0709 00:34:12.067471 11005 monitor.cpp:186] Failed to collect resource
> > > usage for executor 'executor_Task_Tracker_230' of framework
> > > '201307040929-252063498-5050-27411-0000': Future discarded
> > > W0709 00:34:15.955461 10996 monitor.cpp:186] Failed to collect resource
> > > usage for executor 'executor_Task_Tracker_224' of framework
> > > '201307040929-252063498-5050-27411-0000': Future discarded
> > > W0709 00:34:17.068209 10996 monitor.cpp:186] Failed to collect resource
> > > usage for executor 'executor_Task_Tracker_230' of framework
> > > '201307040929-252063498-5050-27411-0000': Future discarded
> > >
> > >
> > >
> > > Guodong
> > >
> > >
> > > On Tue, Jul 9, 2013 at 1:59 PM, 王国栋 <wa...@gmail.com> wrote:
> > >
> > > > Hi Ben,
> > > >
> > > > I ran into the same issue here.
> > > >
> > > > This also happens in our Hadoop framework. The slave log is shown below.
> > > > At that time, I think the workload on the node was very high.
> > > >
> > > > I0708 23:36:44.253880 11005 slave.cpp:739] Got assigned task
> > > > Task_Tracker_224 for framework 201307040929-252063498-5050-27411-0000
> > > > I0708 23:36:44.255221 10999 gc.cpp:84] Unscheduling
> > > > '/data/mesos-slave-work-dir//slaves/201307041648-252063498-5050-8038-5/frameworks/201307040929-252063498-5050-27411-0000'
> > > > for removal
> > > > I0708 23:36:44.256206 11001 slave.cpp:837] Launching task Task_Tracker_224
> > > > for framework 201307040929-252063498-5050-27411-0000
> > > > I0708 23:36:44.258117 11001 paths.hpp:303] Created executor directory
> > > > '/data/mesos-slave-work-dir//slaves/201307041648-252063498-5050-8038-5/frameworks/201307040929-252063498-5050-27411-0000/executors/executor_Task_Tracker_224/runs/953d3565-424c-4ab3-9926-a3fa71042bf1'
> > > > I0708 23:36:44.258467 10991 process_isolator.cpp:99] Launching
> > > > executor_Task_Tracker_224 (cd hadoop && ./bin/mesos-executor) in
> > > > /data/mesos-slave-work-dir//slaves/201307041648-252063498-5050-8038-5/frameworks/201307040929-252063498-5050-27411-0000/executors/executor_Task_Tracker_224/runs/953d3565-424c-4ab3-9926-a3fa71042bf1
> > > > with resources cpus=1; mem=1280' for framework
> > > > 201307040929-252063498-5050-27411-0000
> > > > I0708 23:36:44.258496 11001 slave.cpp:948] Queuing task 'Task_Tracker_224'
> > > > for executor executor_Task_Tracker_224 of framework
> > > > '201307040929-252063498-5050-27411-0000
> > > > I0708 23:36:44.261446 10991 process_isolator.cpp:161] Forked executor at
> > > > 2220
> > > > I0708 23:36:44.261787 10996 slave.cpp:511] Successfully attached file
> > > > '/data/mesos-slave-work-dir//slaves/201307041648-252063498-5050-8038-5/frameworks/201307040929-252063498-5050-27411-0000/executors/executor_Task_Tracker_224/runs/953d3565-424c-4ab3-9926-a3fa71042bf1'
> > > > I0708 23:36:44.580497 10996 slave.cpp:2511] Current usage 57.21%. Max
> > > > allowed age: 2.295155852123924days
> > > > I0708 23:36:44.750393 11002 slave.cpp:1395] Got registration for executor
> > > > 'executor_Task_Tracker_224' of framework
> > > > 201307040929-252063498-5050-27411-0000
> > > > I0708 23:36:44.751095 11002 slave.cpp:1510] Flushing queued task
> > > > Task_Tracker_224 for executor 'executor_Task_Tracker_224' of framework
> > > > 201307040929-252063498-5050-27411-0000
> > > > I0708 23:36:46.144317 11006 slave.cpp:1692] Handling status update
> > > > TASK_RUNNING (UUID: 364ee347-f6a2-4c7b-8702-460aa0ece579) for task
> > > > Task_Tracker_224 of framework 201307040929-252063498-5050-27411-0000
> > > > from executor(1)@10.47.6.21:2310
> > > > I0708 23:36:46.144745 11006 status_update_manager.cpp:290] Received status
> > > > update TASK_RUNNING (UUID: 364ee347-f6a2-4c7b-8702-460aa0ece579) for task
> > > > Task_Tracker_224 of framework 201307040929-252063498-5050-27411-0000
> > > > with checkpoint=false
> > > > I0708 23:36:46.144821 11006 status_update_manager.cpp:450] Creating
> > > > StatusUpdate stream for task Task_Tracker_224 of framework
> > > > 201307040929-252063498-5050-27411-0000
> > > > I0708 23:36:46.145076 11006 status_update_manager.cpp:336] Forwarding
> > > > status update TASK_RUNNING (UUID: 364ee347-f6a2-4c7b-8702-460aa0ece579)
> > > > for task Task_Tracker_224 of framework
> > > > 201307040929-252063498-5050-27411-0000 to master@10.47.6.15:5050
> > > > I0708 23:36:46.145882 10997 slave.cpp:1809] Sending acknowledgement for
> > > > status update TASK_RUNNING (UUID: 364ee347-f6a2-4c7b-8702-460aa0ece579)
> > > > for task Task_Tracker_224 of framework
> > > > 201307040929-252063498-5050-27411-0000 to executor(1)@10.47.6.21:2310
> > > > I0708 23:36:46.146870 10993 status_update_manager.cpp:360] Received status
> > > > update acknowledgement 364ee347-f6a2-4c7b-8702-460aa0ece579 for task
> > > > Task_Tracker_224 of framework 201307040929-252063498-5050-27411-0000
> > > > I0708 23:36:50.258347 11005 slave.cpp:739] Got assigned task
> > > > Task_Tracker_230 for framework 201307040929-252063498-5050-27411-0000
> > > > I0708 23:36:50.259472 11005 slave.cpp:837] Launching task Task_Tracker_230
> > > > for framework 201307040929-252063498-5050-27411-0000
> > > > I0708 23:36:50.261641 11005 paths.hpp:303] Created executor directory
> > > > '/data/mesos-slave-work-dir//slaves/201307041648-252063498-5050-8038-5/frameworks/201307040929-252063498-5050-27411-0000/executors/executor_Task_Tracker_230/runs/1dc87acc-d090-469f-ba30-0477139ee7fd'
> > > > I0708 23:36:50.262265 11005 slave.cpp:948] Queuing task 'Task_Tracker_230'
> > > > for executor executor_Task_Tracker_230 of framework
> > > > '201307040929-252063498-5050-27411-0000
> > > > I0708 23:36:50.262557 11005 process_isolator.cpp:99] Launching
> > > > executor_Task_Tracker_230 (cd hadoop && ./bin/mesos-executor) in
> > > > /data/mesos-slave-work-dir//slaves/201307041648-252063498-5050-8038-5/frameworks/201307040929-252063498-5050-27411-0000/executors/executor_Task_Tracker_230/runs/1dc87acc-d090-469f-ba30-0477139ee7fd
> > > > with resources cpus=1; mem=1280' for framework
> > > > 201307040929-252063498-5050-27411-0000
> > > > I0708 23:36:50.265396 10999 slave.cpp:511] Successfully attached file
> > > > '/data/mesos-slave-work-dir//slaves/201307041648-252063498-5050-8038-5/frameworks/201307040929-252063498-5050-27411-0000/executors/executor_Task_Tracker_230/runs/1dc87acc-d090-469f-ba30-0477139ee7fd'
> > > > I0708 23:36:50.265419 11005 process_isolator.cpp:161] Forked executor at
> > > > 2851
> > > > I0708 23:36:50.835607 10995 slave.cpp:1395] Got registration for executor
> > > > 'executor_Task_Tracker_230' of framework
> > > > 201307040929-252063498-5050-27411-0000
> > > > I0708 23:36:50.836174 10995 slave.cpp:1510] Flushing queued task
> > > > Task_Tracker_230 for executor 'executor_Task_Tracker_230' of framework
> > > > 201307040929-252063498-5050-27411-0000
> > > > I0708 23:36:54.617856 10994 slave.cpp:1692] Handling status update
> > > > TASK_RUNNING (UUID: 7753252d-c90b-4b0d-adca-7c97f38f692e) for task
> > > > Task_Tracker_230 of framework 201307040929-252063498-5050-27411-0000
> > > > from executor(1)@10.47.6.21:27786
> > > > I0708 23:36:54.618275 10994 status_update_manager.cpp:290] Received status
> > > > update TASK_RUNNING (UUID: 7753252d-c90b-4b0d-adca-7c97f38f692e) for task
> > > > Task_Tracker_230 of framework 201307040929-252063498-5050-27411-0000
> > > > with checkpoint=false
> > > > I0708 23:36:54.618326 10994 status_update_manager.cpp:450] Creating
> > > > StatusUpdate stream for task Task_Tracker_230 of framework
> > > > 201307040929-252063498-5050-27411-0000
> > > > I0708 23:36:54.618443 10994 status_update_manager.cpp:336] Forwarding
> > > > status update TASK_RUNNING (UUID: 7753252d-c90b-4b0d-adca-7c97f38f692e)
> > > > for task Task_Tracker_230 of framework
> > > > 201307040929-252063498-5050-27411-0000 to master@10.47.6.15:5050
> > > > I0708 23:36:54.619137 10994 slave.cpp:1809] Sending acknowledgement for
> > > > status update TASK_RUNNING (UUID: 7753252d-c90b-4b0d-adca-7c97f38f692e)
> > > > for task Task_Tracker_230 of framework
> > > > 201307040929-252063498-5050-27411-0000 to executor(1)@10.47.6.21:27786
> > > > I0708 23:36:54.637682 10994 status_update_manager.cpp:360] Received status
> > > > update acknowledgement 7753252d-c90b-4b0d-adca-7c97f38f692e for task
> > > > Task_Tracker_230 of framework 201307040929-252063498-5050-27411-0000
> > > > I0708 23:37:44.583014 11002 slave.cpp:2511] Current usage 57.23%. Max
> > > > allowed age: 2.293704423241597days
> > > > I0708 23:38:44.585233 11003 slave.cpp:2511] Current usage 57.23%. Max
> > > > allowed age: 2.293703916528542days
> > > > I0708 23:39:44.599442 11006 slave.cpp:2511] Current usage 57.23%. Max
> > > > allowed age: 2.293639867998055days
> > > > I0708 23:40:44.603996 10997 slave.cpp:2511] Current usage 57.24%. Max
> > > > allowed age: 2.292921551567535days
> > > > I0708 23:41:44.608608 11006 slave.cpp:2511] Current usage 57.26%. Max
> > > > allowed age: 2.291521098018820days
> > > > I0708 23:42:44.609956 10992 slave.cpp:2511] Current usage 57.23%. Max
> > > > allowed age: 2.293668041244063days
> > > > I0708 23:43:44.682621 11000 slave.cpp:2511] Current usage 57.24%. Max
> > > > allowed age: 2.292935638190544days
> > > > I0708 23:44:44.684306 10993 slave.cpp:2511] Current usage 57.24%. Max
> > > > allowed age: 2.292916079066516days
> > > > I0708 23:45:44.686172 11001 slave.cpp:2511] Current usage 57.26%. Max
> > > > allowed age: 2.291485324076945days
> > > > I0708 23:46:44.699095 10995 slave.cpp:2511] Current usage 57.23%. Max
> > > > allowed age: 2.293641894850289days
> > > > I0708 23:47:44.721156 10998 slave.cpp:2511] Current usage 57.23%. Max
> > > > allowed age: 2.293629429709074days
> > > > I0708 23:48:44.779767 10992 slave.cpp:2511] Current usage 57.24%. Max
> > > > allowed age: 2.293525350847025days
> > > > I0708 23:49:44.812389 11004 slave.cpp:2511] Current usage 57.24%. Max
> > > > allowed age: 2.292909289111539days
> > > > I0708 23:50:44.814146 10999 slave.cpp:2511] Current usage 57.27%. Max
> > > > allowed age: 2.291438098419977days
> > > > I0708 23:51:44.814877 11005 slave.cpp:2511] Current usage 57.23%. Max
> > > > allowed age: 2.293635104895313days
> > > > I0708 23:52:44.818620 10998 slave.cpp:2511] Current usage 57.24%. Max
> > > > allowed age: 2.292983775931019days
> > > > I0708 23:53:44.829911 10997 slave.cpp:2511] Current usage 57.33%. Max
> > > > allowed age: 2.286910009194236days
> > > > I0708 23:54:44.831307 10999 slave.cpp:2511] Current usage 57.33%. Max
> > > > allowed age: 2.286909502481169days
> > > > I0708 23:55:44.902858 10994 slave.cpp:2511] Current usage 57.37%. Max
> > > > allowed age: 2.284414244700093days
> > > > I0708 23:56:44.905398 11002 slave.cpp:2511] Current usage 57.42%. Max
> > > > allowed age: 2.280636901540567days
> > > > I0708 23:57:44.933673 10991 slave.cpp:2511] Current usage 57.44%. Max
> > > > allowed age: 2.279481899796968days
> > > > I0708 23:58:44.934840 11004 slave.cpp:2511] Current usage 57.48%. Max
> > > > allowed age: 2.276566475548496days
> > > > I0708 23:59:44.936063 11001 slave.cpp:2511] Current usage 57.49%. Max
> > > > allowed age: 2.275690368671817days
> > > > I0709 00:00:44.937433 11004 slave.cpp:2511] Current usage 57.50%. Max
> > > > allowed age: 2.275057180034989days
> > > > I0709 00:01:44.938940 11001 slave.cpp:2511] Current usage 57.51%. Max
> > > > allowed age: 2.273999467198449days
> > > > I0709 00:02:44.955103 10996 slave.cpp:2511] Current usage 57.52%. Max
> > > > allowed age: 2.273472384275891days
> > > > I0709 00:03:44.956354 10993 slave.cpp:2511] Current usage 57.39%. Max
> > > > allowed age: 2.282894612240220days
> > > > I0709 00:04:44.957926 10997 slave.cpp:2511] Current usage 57.40%. Max
> > > > allowed age: 2.281966516603831days
> > > > I0709 00:05:44.969205 10996 slave.cpp:2511] Current usage 57.40%. Max
> > > > allowed age: 2.281962260214144days
> > > > I0709 00:06:44.969987 10992 slave.cpp:2511] Current usage 57.40%. Max
> > > > allowed age: 2.281791801941551days
> > > > I0709 00:07:44.977504 11004 slave.cpp:2511] Current usage 57.40%. Max
> > > > allowed age: 2.281715288269849days
> > > > I0709 00:08:44.982868 10998 slave.cpp:2511] Current usage 57.40%. Max
> > > > allowed age: 2.281699782850289days
> > > > I0709 00:09:44.997082 11000 slave.cpp:2511] Current usage 57.42%. Max
> > > > allowed age: 2.280776044946192days
> > > > I0709 00:10:44.998754 10994 slave.cpp:2511] Current usage 57.42%. Max
> > > > allowed age: 2.280772193926956days
> > > > I0709 00:11:44.999949 11002 slave.cpp:2511] Current usage 57.44%. Max
> > > > allowed age: 2.279204525069213days
> > > > I0709 00:12:45.001539 10995 slave.cpp:2511] Current usage 57.47%. Max
> > > > allowed age: 2.277132676719109days
> > > > I0709 00:13:45.002728 10992 slave.cpp:2511] Current usage 57.43%. Max
> > > > allowed age: 2.280012428368322days
> > > > I0709 00:14:45.009699 10998 slave.cpp:2511] Current usage 57.48%. Max
> > > > allowed age: 2.276733690857512days
> > > > I0709 00:15:45.013483 10996 slave.cpp:2511] Current usage 57.53%. Max
> > > > allowed age: 2.272715152282546days
> > > > I0709 00:16:45.015496 10998 slave.cpp:2511] Current usage 57.57%. Max
> > > > allowed age: 2.270354274804352days
> > > > I0709 00:17:45.016628 11000 slave.cpp:2511] Current usage 57.62%. Max
> > > > allowed age: 2.266927678423322days
> > > > I0709 00:18:45.032670 11002 slave.cpp:2511] Current usage 57.65%. Max
> > > > allowed age: 2.264218182361482days
> > > > I0709 00:19:45.043442 10998 slave.cpp:2511] Current usage 57.69%. Max
> > > > allowed age: 2.261509598383137days
> > > > I0709 00:20:45.080648 10992 slave.cpp:2511] Current usage 57.72%. Max
> > > > allowed age: 2.259379478031400days
> > > > I0709 00:21:45.081632 10995 slave.cpp:2511] Current usage 57.77%. Max
> > > > allowed age: 2.255819920144039days
> > > > I0709 00:22:45.082593 11005 slave.cpp:2511] Current usage 57.81%. Max
> > > > allowed age: 2.253314528101817days
> > > > I0709 00:23:45.193588 10997 slave.cpp:2511] Current usage 57.85%. Max
> > > > allowed age: 2.250524870034248days
> > > > I0709 00:24:45.220617 10994 slave.cpp:2511] Current usage 57.90%. Max
> > > > allowed age: 2.246784618270532days
> > > > I0709 00:25:45.241602 10992 slave.cpp:2511] Current usage 57.97%. Max
> > > > allowed age: 2.242399422127049days
> > > > I0709 00:26:45.248977 11000 slave.cpp:2511] Current usage 58.00%. Max
> > > > allowed age: 2.240250654734792days
> > > > I0709 00:27:45.250953 10993 slave.cpp:2511] Current usage 57.99%. Max
> > > > allowed age: 2.240516983117894days
> > > > I0709 00:28:45.252694 10996 slave.cpp:2511] Current usage 58.06%. Max
> > > > allowed age: 2.235834143724352days
> > > > I0709 00:29:45.254992 11003 slave.cpp:2511] Current usage 58.10%. Max
> > > > allowed age: 2.233297436815162days
> > > > W0709 00:30:06.753098 10999 monitor.cpp:186] Failed to collect
> resource
> > > > usage for executor 'executor_Task_Tracker_230' of framework
> > > > '201307040929-252063498-5050-27411-0000': Future discarded
> > > > W0709 00:30:10.715373 10996 monitor.cpp:186] Failed to collect
> resource
> > > > usage for executor 'executor_Task_Tracker_224' of framework
> > > > '201307040929-252063498-5050-27411-0000': Future discarded
> > > > W0709 00:30:11.754446 11003 monitor.cpp:186] Failed to collect
> resource
> > > > usage for executor 'executor_Task_Tracker_230' of framework
> > > > '201307040929-252063498-5050-27411-0000': Future discarded
> > > > W0709 00:30:15.719880 11003 monitor.cpp:186] Failed to collect
> resource
> > > > usage for executor 'executor_Task_Tracker_224' of framework
> > > > '201307040929-252063498-5050-27411-0000': Future discarded
> > > > W0709 00:30:16.755473 11003 monitor.cpp:186] Failed to collect
> resource
> > > > usage for executor 'executor_Task_Tracker_230' of framework
> > > > '201307040929-252063498-5050-27411-0000': Future discarded
> > > > W0709 00:30:20.720330 11003 monitor.cpp:186] Failed to collect
> resource
> > > > usage for executor 'executor_Task_Tracker_224' of framework
> > > > '201307040929-252063498-5050-27411-0000': Future discarded
> > > > W0709 00:30:21.766019 11003 monitor.cpp:186] Failed to collect
> resource
> > > > usage for executor 'executor_Task_Tracker_230' of framework
> > > > '201307040929-252063498-5050-27411-0000': Future discarded
> > > > W0709 00:30:25.721364 11003 monitor.cpp:186] Failed to collect
> resource
> > > > usage for executor 'executor_Task_Tracker_224' of framework
> > > > '201307040929-252063498-5050-27411-0000': Future discarded
> > > > W0709 00:30:26.768874 11003 monitor.cpp:186] Failed to collect
> resource
> > > > usage for executor 'executor_Task_Tracker_230' of framework
> > > > '201307040929-252063498-5050-27411-0000': Future discarded
> > > > W0709 00:30:30.722605 11003 monitor.cpp:186] Failed to collect
> resource
> > > > usage for executor 'executor_Task_Tracker_224' of framework
> > > > '201307040929-252063498-5050-27411-0000': Future discarded
> > > > W0709 00:30:31.770354 11003 monitor.cpp:186] Failed to collect
> resource
> > > > usage for executor 'executor_Task_Tracker_230' of framework
> > > > '201307040929-252063498-5050-27411-0000': Future discarded
> > > > W0709 00:30:35.724455 10992 monitor.cpp:186] Failed to collect
> resource
> > > > usage for executor 'executor_Task_Tracker_224' of framework
> > > > '201307040929-252063498-5050-27411-0000': Future discarded
> > > > W0709 00:30:36.788751 10992 monitor.cpp:186] Failed to collect
> resource
> > > > usage for executor 'executor_Task_Tracker_230' of framework
> > > > '201307040929-252063498-5050-27411-0000': Future discarded
> > > > W0709 00:30:40.745380 10992 monitor.cpp:186] Failed to collect
> resource
> > > > usage for executor 'executor_Task_Tracker_224' of framework
> > > > '201307040929-252063498-5050-27411-0000': Future discarded
> > > > W0709 00:30:41.789358 10992 monitor.cpp:186] Failed to collect
> resource
> > > > usage for executor 'executor_Task_Tracker_230' of framework
> > > > '201307040929-252063498-5050-27411-0000': Future discarded
> > > > I0709 00:30:45.256590 11004 slave.cpp:2511] Current usage 58.11%. Max
> > > > allowed age: 2.232469873049410days
> > > >
> > > >
> > > > Guodong
> > > >
> > > >
> > > > On Tue, Jul 9, 2013 at 4:55 AM, Benjamin Mahler <
> > > benjamin.mahler@gmail.com
> > > > > wrote:
> > > >
> > > >> Are these the un-edited logs? I'm expecting to see some logs from
> the
> > > >> process_isolator or cgroups_isolator in there.
> > > >>
> > > >>
> > > >> On Fri, Jul 5, 2013 at 2:38 PM, Brenden Matthews <
> > > >> brenden.matthews@airbedandbreakfast.com> wrote:
> > > >>
> > > >> > Hey guys,
> > > >> >
> > > >> > I'm currently having a problem where tasks will get stuck in the
> > > staging
> > > >> > state, though according to the logs they should have been
> > terminated.
> > > >>  They
> > > >> > hang indefinitely, or until I restart the slave.  Below is a
> > > screenshot
> > > >> +
> > > >> > logs.  Also interesting is the 'Failed to collect resource usage
> > ...'
> > > >> > messages.
> > > >> >
> > > >> > [image: Inline image 2]
> > > >> >
> > > >> > I0705 16:19:51.551512  9706 slave.cpp:739] Got assigned task
> > > >> >> ct:1373041190990:0:add_latest_reservation_survey_events_partitio
> > > >> >> n for framework chronos
> > > >> >> I0705 16:19:51.552150  9706 slave.cpp:837] Launching task
> > > >> >>
> ct:1373041190990:0:add_latest_reservation_survey_events_partition f
> > > >> >> or framework chronos
> > > >> >> I0705 16:19:51.553956  9706 paths.hpp:303] Created executor
> > directory
> > > >> >> '/tmp/mesos/slaves/201307030043-2037266954-5050-15277-1
> > > >> >>
> > > >> >>
> > > >>
> > >
> >
> 517/frameworks/chronos/executors/ct:1373041190990:0:add_latest_reservation_survey_events_partition/runs/611ba128-557f-4b5e-8c
> > > >> >> f2-4d1ce60d618f'
> > > >> >> I0705 16:19:51.554576  9706 slave.cpp:948] Queuing task
> > > >> >>
> > 'ct:1373041190990:0:add_latest_reservation_survey_events_partition' f
> > > >> >> or executor
> > > >> >> ct:1373041190990:0:add_latest_reservation_survey_events_partition
> > of
> > > >> >> framework 'c
> > > >> >> hronos
> > > >> >> I0705 16:19:51.555027  9706 slave.cpp:511] Successfully attached
> > file
> > > >> >>
> > > >>
> > >
> >
> '/tmp/mesos/slaves/201307030043-2037266954-5050-15277-1517/frameworks/chronos/executors/ct:1373041190990:0:add_latest_reservation_survey_events_partition/runs/611ba128-557f-4b5e-8cf2-4d1ce60d618f'
> > > >> >> I0705 16:19:54.048754  9724 slave.cpp:2530] Current usage 42.18%.
> > Max
> > > >> >> allowed age: 22.955009563956388hrs
> > > >> >> W0705 16:19:54.108963  9724 monitor.cpp:186] Failed to collect
> > > resource
> > > >> >> usage for executor 'executor_Task_Tracker_8023' of framework
> > > >> >> '201307030043-2037266954-5050-15277-0006': Future discarded
> > > >> >> W0705 16:19:59.110787  9729 monitor.cpp:186] Failed to collect
> > > resource
> > > >> >> usage for executor 'executor_Task_Tracker_8023' of framework
> > > >> >> '201307030043-2037266954-5050-15277-0006': Future discarded
> > > >> >> W0705 16:20:04.112406  9704 monitor.cpp:186] Failed to collect
> > > >> resource
> > > >> >> usage for executor 'executor_Task_Tracker_8023' of framework
> > > >> >> '201307030043-2037266954-5050-15277-0006': Future discarded
> > > >> >> W0705 16:20:09.114367  9705 monitor.cpp:186] Failed to collect
> > > resource
> > > >> >> usage for executor 'executor_Task_Tracker_8023' of framework
> > > >> >> '201307030043-2037266954-5050-15277-0006': Future discarded
> > > >> >> W0705 16:20:14.116312  9706 monitor.cpp:186] Failed to collect
> > > resource
> > > >> >> usage for executor 'executor_Task_Tracker_8023' of framework
> > > >> >> '201307030043-2037266954-5050-15277-0006': Future discarded
> > > >> >> W0705 16:20:19.118370  9699 monitor.cpp:186] Failed to collect
> > > resource
> > > >> >> usage for executor 'executor_Task_Tracker_8023' of framework
> > > >> >> '201307030043-2037266954-5050-15277-0006': Future discarded
> > > >> >> W0705 16:20:24.120311  9701 monitor.cpp:186] Failed to collect
> > > resource
> > > >> >> usage for executor 'executor_Task_Tracker_8023' of framework
> > > >> >> '201307030043-2037266954-5050-15277-0006': Future discarded
> > > >> >> W0705 16:20:29.122355  9700 monitor.cpp:186] Failed to collect
> > > resource
> > > >> >> usage for executor 'executor_Task_Tracker_8023' of framework
> > > >> >> '201307030043-2037266954-5050-15277-0006': Future discarded
> > > >> >> W0705 16:20:34.123443  9722 monitor.cpp:186] Failed to collect
> > > resource
> > > >> >> usage for executor 'executor_Task_Tracker_8023' of framework
> > > >> >> '201307030043-2037266954-5050-15277-0006': Future discarded
> > > >> >> W0705 16:20:39.125660  9718 monitor.cpp:186] Failed to collect
> > > resource
> > > >> >> usage for executor 'executor_Task_Tracker_8023' of framework
> > > >> >> '201307030043-2037266954-5050-15277-0006': Future discarded
> > > >> >> W0705 16:20:44.127464  9724 monitor.cpp:186] Failed to collect
> > > resource
> > > >> >> usage for executor 'executor_Task_Tracker_8023' of framework
> > > >> >> '201307030043-2037266954-5050-15277-0006': Future discarded
> > > >> >> W0705 16:20:49.129385  9725 monitor.cpp:186] Failed to collect
> > > resource
> > > >> >> usage for executor 'executor_Task_Tracker_8023' of framework
> > > >> >> '201307030043-2037266954-5050-15277-0006': Future discarded
> > > >> >> I0705 16:20:51.555174  9703 slave.cpp:2482] Terminating executor
> > > >> >> ct:1373041190990:0:add_latest_reservation_survey_events_partition
> > of
> > > >> >> framework chronos because it did not register within 1mins
> > > >> >> I0705 16:20:54.050434  9717 slave.cpp:2530] Current usage 42.18%.
> > Max
> > > >> >> allowed age: 22.955009342481944hrs
> > > >> >> W0705 16:20:54.130730  9699 monitor.cpp:186] Failed to collect
> > > resource
> > > >> >> usage for executor 'executor_Task_Tracker_8023' of framework
> > > >> >> '201307030043-2037266954-5050-15277-0006': Future discarded
> > > >> >> W0705 16:20:59.132472  9702 monitor.cpp:186] Failed to collect
> > > resource
> > > >> >> usage for executor 'executor_Task_Tracker_8023' of framework
> > > >> >> '201307030043-2037266954-5050-15277-0006': Future discarded
> > > >> >> W0705 16:21:04.134557  9713 monitor.cpp:186] Failed to collect
> > > resource
> > > >> >> usage for executor 'executor_Task_Tracker_8023' of framework
> > > >> >> '201307030043-2037266954-5050-15277-0006': Future discarded
> > > >> >> W0705 16:21:09.135619  9701 monitor.cpp:186] Failed to collect
> > > resource
> > > >> >> usage for executor 'executor_Task_Tracker_8023' of framework
> > > >> >> '201307030043-2037266954-5050-15277-0006': Future discarded
> > > >> >
> > > >> >
> > > >> >
> > > >>
> > > >
> > > >
> > >
> >
>

Re: Tasks stuck in 'STAGING'

Posted by 王国栋 <wa...@gmail.com>.
Hi Vinod,

I am using the code from trunk. I think the latest commit is from July
1st. I will grep the master logs and send them in another mail.

The task "Task_Tracker_242" is stuck in STAGING. I think "Task_Tracker_224"
and "Task_Tracker_230" exited successfully, but it is strange that there are
a lot of "Failed to collect resource usage ..." warnings.

I0709 00:46:11.288698 11002 slave.cpp:739] Got assigned task
Task_Tracker_242 for framework 201307040929-252063498-5050-27411-0000
I0709 00:46:11.289136 11002 slave.cpp:837] Launching task Task_Tracker_242
for framework 201307040929-252063498-5050-27411-0000
I0709 00:46:11.291296 11002 paths.hpp:303] Created executor directory
'/data/mesos-slave-work-dir//slaves/201307041648-252063498-5050-8038-5/frameworks/201307040929-252063498-5050-27411-0000/executors/exe
cutor_Task_Tracker_242/runs/5c47ad99-1c78-43c8-9f27-9509f1d39c3d'
I0709 00:46:11.291647 11002 slave.cpp:948] Queuing task 'Task_Tracker_242'
for executor executor_Task_Tracker_242 of framework
'201307040929-252063498-5050-27411-0000
I0709 00:46:11.292162 11002 slave.cpp:511] Successfully attached file
'/data/mesos-slave-work-dir//slaves/201307041648-252063498-5050-8038-5/frameworks/201307040929-252063498-5050-27411-0000/executors/exe
cutor_Task_Tracker_242/runs/5c47ad99-1c78-43c8-9f27-9509f1d39c3d'
W0709 00:46:12.197242 10992 monitor.cpp:186] Failed to collect resource
usage for executor 'executor_Task_Tracker_230' of framework
'201307040929-252063498-5050-27411-0000': Future discarded
W0709 00:46:16.100548 10994 monitor.cpp:186] Failed to collect resource
usage for executor 'executor_Task_Tracker_224' of framework
'201307040929-252063498-5050-27411-0000': Future discarded
W0709 00:46:17.197463 11001 monitor.cpp:186] Failed to collect resource
usage for executor 'executor_Task_Tracker_230' of framework
'201307040929-252063498-5050-27411-0000': Future discarded
W0709 00:46:21.101570 11002 monitor.cpp:186] Failed to collect resource
usage for executor 'executor_Task_Tracker_224' of framework
'201307040929-252063498-5050-27411-0000': Future discarded
W0709 00:46:22.198303 11005 monitor.cpp:186] Failed to collect resource
usage for executor 'executor_Task_Tracker_230' of framework
'201307040929-252063498-5050-27411-0000': Future discarded
W0709 00:46:26.102522 11002 monitor.cpp:186] Failed to collect resource
usage for executor 'executor_Task_Tracker_224' of framework
'201307040929-252063498-5050-27411-0000': Future discarded
W0709 00:46:27.199403 10998 monitor.cpp:186] Failed to collect resource
usage for executor 'executor_Task_Tracker_230' of framework
'201307040929-252063498-5050-27411-0000': Future discarded
W0709 00:46:31.103610 10998 monitor.cpp:186] Failed to collect resource
usage for executor 'executor_Task_Tracker_224' of framework
'201307040929-252063498-5050-27411-0000': Future discarded
W0709 00:46:32.200248 11001 monitor.cpp:186] Failed to collect resource
usage for executor 'executor_Task_Tracker_230' of framework
'201307040929-252063498-5050-27411-0000': Future discarded
W0709 00:46:36.104547 11004 monitor.cpp:186] Failed to collect resource
usage for executor 'executor_Task_Tracker_224' of framework
'201307040929-252063498-5050-27411-0000': Future discarded
W0709 00:46:37.201236 10991 monitor.cpp:186] Failed to collect resource
usage for executor 'executor_Task_Tracker_230' of framework
'201307040929-252063498-5050-27411-0000': Future discarded
W0709 00:46:41.105523 10997 monitor.cpp:186] Failed to collect resource
usage for executor 'executor_Task_Tracker_224' of framework
'201307040929-252063498-5050-27411-0000': Future discarded
W0709 00:46:42.202250 10991 monitor.cpp:186] Failed to collect resource
usage for executor 'executor_Task_Tracker_230' of framework
'201307040929-252063498-5050-27411-0000': Future discarded
I0709 00:46:45.283098 11002 slave.cpp:2511] Current usage 57.43%. Max
allowed age: 2.279812884766227days
W0709 00:46:46.106760 10994 monitor.cpp:186] Failed to collect resource
usage for executor 'executor_Task_Tracker_224' of framework
'201307040929-252063498-5050-27411-0000': Future discarded
W0709 00:46:47.203474 10993 monitor.cpp:186] Failed to collect resource
usage for executor 'executor_Task_Tracker_230' of framework
'201307040929-252063498-5050-27411-0000': Future discarded
W0709 00:46:51.107544 11006 monitor.cpp:186] Failed to collect resource
usage for executor 'executor_Task_Tracker_224' of framework
'201307040929-252063498-5050-27411-0000': Future discarded
W0709 00:46:52.204280 10997 monitor.cpp:186] Failed to collect resource
usage for executor 'executor_Task_Tracker_230' of framework
'201307040929-252063498-5050-27411-0000': Future discarded
W0709 00:46:56.108530 10995 monitor.cpp:186] Failed to collect resource
usage for executor 'executor_Task_Tracker_224' of framework
'201307040929-252063498-5050-27411-0000': Future discarded
W0709 00:46:57.205417 10997 monitor.cpp:186] Failed to collect resource
usage for executor 'executor_Task_Tracker_230' of framework
'201307040929-252063498-5050-27411-0000': Future discarded
W0709 00:47:01.109284 10997 monitor.cpp:186] Failed to collect resource
usage for executor 'executor_Task_Tracker_224' of framework
'201307040929-252063498-5050-27411-0000': Future discarded
W0709 00:47:02.206368 11002 monitor.cpp:186] Failed to collect resource
usage for executor 'executor_Task_Tracker_230' of framework
'201307040929-252063498-5050-27411-0000': Future discarded
I0709 00:47:05.288517 11002 slave.cpp:2463] Terminating executor
executor_Task_Tracker_238 of framework
201307040929-252063498-5050-27411-0000 because it did not register within
1mins
W0709 00:47:06.110532 11005 monitor.cpp:186] Failed to collect resource
usage for executor 'executor_Task_Tracker_224' of framework
'201307040929-252063498-5050-27411-0000': Future discarded
W0709 00:47:07.207320 10997 monitor.cpp:186] Failed to collect resource
usage for executor 'executor_Task_Tracker_230' of framework
'201307040929-252063498-5050-27411-0000': Future discarded
W0709 00:47:11.111778 10996 monitor.cpp:186] Failed to collect resource
usage for executor 'executor_Task_Tracker_224' of framework
'201307040929-252063498-5050-27411-0000': Future discarded
I0709 00:47:11.292485 10991 slave.cpp:2463] Terminating executor
executor_Task_Tracker_242 of framework
201307040929-252063498-5050-27411-0000 because it did not register within
1mins
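
For anyone who wants to summarize slave logs like the ones above, here is a
minimal sketch in Python. It is not part of Mesos. It assumes the log was
pasted from mail, so each glog record may be wrapped over several lines, and
it re-joins the lines with a space (a record that was wrapped in the middle
of a word will therefore not be restored exactly). The function names
join_records and summarize are hypothetical, and the two message patterns
are copied from the log lines in this thread.

```python
import re
from collections import Counter

# A glog record starts with severity + MMDD, then HH:MM:SS.micros,
# e.g. "W0709 00:46:12.197242 ...".
RECORD_START = re.compile(r"^[IWEF]\d{4} \d{2}:\d{2}:\d{2}\.\d{6} ")

# Message patterns taken from the slave logs quoted in this thread.
COLLECT_FAILED = re.compile(
    r"Failed to collect resource usage for executor '([^']+)'")
NOT_REGISTERED = re.compile(
    r"Terminating executor (\S+) of framework \S+ "
    r"because it did not register")


def join_records(lines):
    """Re-join mail-wrapped lines into one string per log record."""
    records = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if RECORD_START.match(line) or not records:
            records.append(line)
        else:
            # Continuation of the previous record.
            records[-1] += " " + line
    return records


def summarize(lines):
    """Return (warning count per executor, executors that never registered)."""
    failures = Counter()
    unregistered = []
    for record in join_records(lines):
        match = COLLECT_FAILED.search(record)
        if match:
            failures[match.group(1)] += 1
        match = NOT_REGISTERED.search(record)
        if match:
            unregistered.append(match.group(1))
    return failures, unregistered
```

Calling summarize(open(path).read().splitlines()) on a saved copy of the
log (path is a placeholder) gives the number of "Failed to collect resource
usage" warnings per executor and the list of executors that were terminated
because they did not register.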


Guodong


On Tue, Jul 9, 2013 at 2:21 PM, Vinod Kone <vi...@gmail.com> wrote:

> hey guodong, which of these task(s) is stuck in STAGING? also, the
> corresponding master's logs would also be helpful here. also which version
> of mesos are you running?
>
>
> On Mon, Jul 8, 2013 at 11:02 PM, 王国栋 <wa...@gmail.com> wrote:
>
> > It is very interesting that there are these logs.
> >
> > I0709 00:33:43.833853 11002 slave.cpp:996] Asked to kill task
> > Task_Tracker_224 of framework 201307040929-252063498-5050-27411-0000
> > I0709 00:33:43.835552 11006 slave.cpp:996] Asked to kill task
> > Task_Tracker_230 of framework 201307040929-252063498-5050-27411-0000
> > I0709 00:33:43.972771 10994 slave.cpp:1692] Handling status update
> > TASK_FINISHED (UUID: 372081cc-edf2-4183-a461-9345ab6d279c) for task
> > Task_Tracker_230 of framework 201307040929-252063498-5050-27411-0000
> > from executor(1)@10.47.6.21:27786
> > I0709 00:33:43.973132 10994 status_update_manager.cpp:290] Received
> status
> > update TASK_FINISHED (UUID: 372081cc-edf2-4183-a461-9345ab6d279c) for
> task
> > Task_Tracker_230 of framework 201307040929-252063498-5
> > 050-27411-0000 with checkpoint=false
> > I0709 00:33:43.973192 10994 status_update_manager.cpp:336] Forwarding
> > status update TASK_FINISHED (UUID: 372081cc-edf2-4183-a461-9345ab6d279c)
> > for task Task_Tracker_230 of framework 201307040929-252063498
> > -5050-27411-0000 to master@10.47.6.15:5050
> > I0709 00:33:43.973846 11005 slave.cpp:1809] Sending acknowledgement for
> > status update TASK_FINISHED (UUID: 372081cc-edf2-4183-a461-9345ab6d279c)
> > for task Task_Tracker_230 of framework 201307040929-2520634
> > 98-5050-27411-0000 to executor(1)@10.47.6.21:27786
> > I0709 00:33:43.974591 11000 status_update_manager.cpp:360] Received
> status
> > update acknowledgement 372081cc-edf2-4183-a461-9345ab6d279c for task
> > Task_Tracker_230 of framework 201307040929-252063498-5050-27
> > 411-0000
> > I0709 00:33:43.974652 11000 status_update_manager.cpp:481] Cleaning up
> > status update stream for task Task_Tracker_230 of framework
> > 201307040929-252063498-5050-27411-0000
> > I0709 00:33:44.090603 11003 slave.cpp:1692] Handling status update
> > TASK_FINISHED (UUID: 61d5775a-2375-412a-a5a4-80ab55163d88) for task
> > Task_Tracker_224 of framework 201307040929-252063498-5050-27411-0000
> > from executor(1)@10.47.6.21:2310
> > I0709 00:33:44.090860 11003 status_update_manager.cpp:290] Received
> status
> > update TASK_FINISHED (UUID: 61d5775a-2375-412a-a5a4-80ab55163d88) for
> task
> > Task_Tracker_224 of framework 201307040929-252063498-5
> > 050-27411-0000 with checkpoint=false
> > I0709 00:33:44.090973 11003 status_update_manager.cpp:336] Forwarding
> > status update TASK_FINISHED (UUID: 61d5775a-2375-412a-a5a4-80ab55163d88)
> > for task Task_Tracker_224 of framework 201307040929-252063498
> > -5050-27411-0000 to master@10.47.6.15:5050
> > I0709 00:33:44.091279 11003 slave.cpp:1809] Sending acknowledgement for
> > status update TASK_FINISHED (UUID: 61d5775a-2375-412a-a5a4-80ab55163d88)
> > for task Task_Tracker_224 of framework 201307040929-2520634
> > 98-5050-27411-0000 to executor(1)@10.47.6.21:2310
> > I0709 00:33:44.093286 11003 status_update_manager.cpp:360] Received
> status
> > update acknowledgement 61d5775a-2375-412a-a5a4-80ab55163d88 for task
> > Task_Tracker_224 of framework 201307040929-252063498-5050-27
> > 411-0000
> > I0709 00:33:44.093359 11003 status_update_manager.cpp:481] Cleaning up
> > status update stream for task Task_Tracker_224 of framework
> > 201307040929-252063498-5050-27411-0000
> > I0709 00:33:45.259831 10997 slave.cpp:2511] Current usage 57.44%. Max
> > allowed age: 2.279168852469954days
> > W0709 00:33:45.949470 10996 monitor.cpp:186] Failed to collect resource
> > usage for executor 'executor_Task_Tracker_224' of framework
> > '201307040929-252063498-5050-27411-0000': Future discarded
> > W0709 00:33:47.063181 11005 monitor.cpp:186] Failed to collect resource
> > usage for executor 'executor_Task_Tracker_230' of framework
> > '201307040929-252063498-5050-27411-0000': Future discarded
> > W0709 00:33:50.950412 11000 monitor.cpp:186] Failed to collect resource
> > usage for executor 'executor_Task_Tracker_224' of framework
> > '201307040929-252063498-5050-27411-0000': Future discarded
> > W0709 00:33:52.063576 10993 monitor.cpp:186] Failed to collect resource
> > usage for executor 'executor_Task_Tracker_230' of framework
> > '201307040929-252063498-5050-27411-0000': Future discarded
> > W0709 00:33:55.951427 11003 monitor.cpp:186] Failed to collect resource
> > usage for executor 'executor_Task_Tracker_224' of framework
> > '201307040929-252063498-5050-27411-0000': Future discarded
> > W0709 00:33:57.064575 10998 monitor.cpp:186] Failed to collect resource
> > usage for executor 'executor_Task_Tracker_230' of framework
> > '201307040929-252063498-5050-27411-0000': Future discarded
> > W0709 00:34:00.952390 11003 monitor.cpp:186] Failed to collect resource
> > usage for executor 'executor_Task_Tracker_224' of framework
> > '201307040929-252063498-5050-27411-0000': Future discarded
> > W0709 00:34:02.065218 10998 monitor.cpp:186] Failed to collect resource
> > usage for executor 'executor_Task_Tracker_230' of framework
> > '201307040929-252063498-5050-27411-0000': Future discarded
> > W0709 00:34:05.953456 10995 monitor.cpp:186] Failed to collect resource
> > usage for executor 'executor_Task_Tracker_224' of framework
> > '201307040929-252063498-5050-27411-0000': Future discarded
> > W0709 00:34:07.066515 10995 monitor.cpp:186] Failed to collect resource
> > usage for executor 'executor_Task_Tracker_230' of framework
> > '201307040929-252063498-5050-27411-0000': Future discarded
> > W0709 00:34:10.954479 10998 monitor.cpp:186] Failed to collect resource
> > usage for executor 'executor_Task_Tracker_224' of framework
> > '201307040929-252063498-5050-27411-0000': Future discarded
> > W0709 00:34:12.067471 11005 monitor.cpp:186] Failed to collect resource
> > usage for executor 'executor_Task_Tracker_230' of framework
> > '201307040929-252063498-5050-27411-0000': Future discarded
> > W0709 00:34:15.955461 10996 monitor.cpp:186] Failed to collect resource
> > usage for executor 'executor_Task_Tracker_224' of framework
> > '201307040929-252063498-5050-27411-0000': Future discarded
> > W0709 00:34:17.068209 10996 monitor.cpp:186] Failed to collect resource
> > usage for executor 'executor_Task_Tracker_230' of framework
> > '201307040929-252063498-5050-27411-0000': Future discarded
> >
> >
> >
> > Guodong
> >
> >
> > On Tue, Jul 9, 2013 at 1:59 PM, 王国栋 <wa...@gmail.com> wrote:
> >
> > > Hi ben,
> > >
> > > I ran into the same issue here.
> > >
> > > This also happens in our hadoop framework. The slave log is like these.
> > At
> > > that time, I think the work load of the node is very high.
> > >
> > > I0708 23:36:44.253880 11005 slave.cpp:739] Got assigned task Task_Tracker_224 for framework 201307040929-252063498-5050-27411-0000
> > > I0708 23:36:44.255221 10999 gc.cpp:84] Unscheduling '/data/mesos-slave-work-dir//slaves/201307041648-252063498-5050-8038-5/frameworks/201307040929-252063498-5050-27411-0000' for removal
> > > I0708 23:36:44.256206 11001 slave.cpp:837] Launching task Task_Tracker_224 for framework 201307040929-252063498-5050-27411-0000
> > > I0708 23:36:44.258117 11001 paths.hpp:303] Created executor directory '/data/mesos-slave-work-dir//slaves/201307041648-252063498-5050-8038-5/frameworks/201307040929-252063498-5050-27411-0000/executors/executor_Task_Tracker_224/runs/953d3565-424c-4ab3-9926-a3fa71042bf1'
> > > I0708 23:36:44.258467 10991 process_isolator.cpp:99] Launching executor_Task_Tracker_224 (cd hadoop && ./bin/mesos-executor) in /data/mesos-slave-work-dir//slaves/201307041648-252063498-5050-8038-5/frameworks/201307040929-252063498-5050-27411-0000/executors/executor_Task_Tracker_224/runs/953d3565-424c-4ab3-9926-a3fa71042bf1 with resources cpus=1; mem=1280' for framework 201307040929-252063498-5050-27411-0000
> > > I0708 23:36:44.258496 11001 slave.cpp:948] Queuing task 'Task_Tracker_224' for executor executor_Task_Tracker_224 of framework '201307040929-252063498-5050-27411-0000
> > > I0708 23:36:44.261446 10991 process_isolator.cpp:161] Forked executor at 2220
> > > I0708 23:36:44.261787 10996 slave.cpp:511] Successfully attached file '/data/mesos-slave-work-dir//slaves/201307041648-252063498-5050-8038-5/frameworks/201307040929-252063498-5050-27411-0000/executors/executor_Task_Tracker_224/runs/953d3565-424c-4ab3-9926-a3fa71042bf1'
> > > I0708 23:36:44.580497 10996 slave.cpp:2511] Current usage 57.21%. Max allowed age: 2.295155852123924days
> > > I0708 23:36:44.750393 11002 slave.cpp:1395] Got registration for executor 'executor_Task_Tracker_224' of framework 201307040929-252063498-5050-27411-0000
> > > I0708 23:36:44.751095 11002 slave.cpp:1510] Flushing queued task Task_Tracker_224 for executor 'executor_Task_Tracker_224' of framework 201307040929-252063498-5050-27411-0000
> > > I0708 23:36:46.144317 11006 slave.cpp:1692] Handling status update TASK_RUNNING (UUID: 364ee347-f6a2-4c7b-8702-460aa0ece579) for task Task_Tracker_224 of framework 201307040929-252063498-5050-27411-0000 from executor(1)@10.47.6.21:2310
> > > I0708 23:36:46.144745 11006 status_update_manager.cpp:290] Received status update TASK_RUNNING (UUID: 364ee347-f6a2-4c7b-8702-460aa0ece579) for task Task_Tracker_224 of framework 201307040929-252063498-5050-27411-0000 with checkpoint=false
> > > I0708 23:36:46.144821 11006 status_update_manager.cpp:450] Creating StatusUpdate stream for task Task_Tracker_224 of framework 201307040929-252063498-5050-27411-0000
> > > I0708 23:36:46.145076 11006 status_update_manager.cpp:336] Forwarding status update TASK_RUNNING (UUID: 364ee347-f6a2-4c7b-8702-460aa0ece579) for task Task_Tracker_224 of framework 201307040929-252063498-5050-27411-0000 to master@10.47.6.15:5050
> > > I0708 23:36:46.145882 10997 slave.cpp:1809] Sending acknowledgement for status update TASK_RUNNING (UUID: 364ee347-f6a2-4c7b-8702-460aa0ece579) for task Task_Tracker_224 of framework 201307040929-252063498-5050-27411-0000 to executor(1)@10.47.6.21:2310
> > > I0708 23:36:46.146870 10993 status_update_manager.cpp:360] Received status update acknowledgement 364ee347-f6a2-4c7b-8702-460aa0ece579 for task Task_Tracker_224 of framework 201307040929-252063498-5050-27411-0000
> > > I0708 23:36:50.258347 11005 slave.cpp:739] Got assigned task Task_Tracker_230 for framework 201307040929-252063498-5050-27411-0000
> > > I0708 23:36:50.259472 11005 slave.cpp:837] Launching task Task_Tracker_230 for framework 201307040929-252063498-5050-27411-0000
> > > I0708 23:36:50.261641 11005 paths.hpp:303] Created executor directory '/data/mesos-slave-work-dir//slaves/201307041648-252063498-5050-8038-5/frameworks/201307040929-252063498-5050-27411-0000/executors/executor_Task_Tracker_230/runs/1dc87acc-d090-469f-ba30-0477139ee7fd'
> > > I0708 23:36:50.262265 11005 slave.cpp:948] Queuing task 'Task_Tracker_230' for executor executor_Task_Tracker_230 of framework '201307040929-252063498-5050-27411-0000
> > > I0708 23:36:50.262557 11005 process_isolator.cpp:99] Launching executor_Task_Tracker_230 (cd hadoop && ./bin/mesos-executor) in /data/mesos-slave-work-dir//slaves/201307041648-252063498-5050-8038-5/frameworks/201307040929-252063498-5050-27411-0000/executors/executor_Task_Tracker_230/runs/1dc87acc-d090-469f-ba30-0477139ee7fd with resources cpus=1; mem=1280' for framework 201307040929-252063498-5050-27411-0000
> > > I0708 23:36:50.265396 10999 slave.cpp:511] Successfully attached file '/data/mesos-slave-work-dir//slaves/201307041648-252063498-5050-8038-5/frameworks/201307040929-252063498-5050-27411-0000/executors/executor_Task_Tracker_230/runs/1dc87acc-d090-469f-ba30-0477139ee7fd'
> > > I0708 23:36:50.265419 11005 process_isolator.cpp:161] Forked executor at 2851
> > > I0708 23:36:50.835607 10995 slave.cpp:1395] Got registration for executor 'executor_Task_Tracker_230' of framework 201307040929-252063498-5050-27411-0000
> > > I0708 23:36:50.836174 10995 slave.cpp:1510] Flushing queued task Task_Tracker_230 for executor 'executor_Task_Tracker_230' of framework 201307040929-252063498-5050-27411-0000
> > > I0708 23:36:54.617856 10994 slave.cpp:1692] Handling status update TASK_RUNNING (UUID: 7753252d-c90b-4b0d-adca-7c97f38f692e) for task Task_Tracker_230 of framework 201307040929-252063498-5050-27411-0000 from executor(1)@10.47.6.21:27786
> > > I0708 23:36:54.618275 10994 status_update_manager.cpp:290] Received status update TASK_RUNNING (UUID: 7753252d-c90b-4b0d-adca-7c97f38f692e) for task Task_Tracker_230 of framework 201307040929-252063498-5050-27411-0000 with checkpoint=false
> > > I0708 23:36:54.618326 10994 status_update_manager.cpp:450] Creating StatusUpdate stream for task Task_Tracker_230 of framework 201307040929-252063498-5050-27411-0000
> > > I0708 23:36:54.618443 10994 status_update_manager.cpp:336] Forwarding status update TASK_RUNNING (UUID: 7753252d-c90b-4b0d-adca-7c97f38f692e) for task Task_Tracker_230 of framework 201307040929-252063498-5050-27411-0000 to master@10.47.6.15:5050
> > > I0708 23:36:54.619137 10994 slave.cpp:1809] Sending acknowledgement for status update TASK_RUNNING (UUID: 7753252d-c90b-4b0d-adca-7c97f38f692e) for task Task_Tracker_230 of framework 201307040929-252063498-5050-27411-0000 to executor(1)@10.47.6.21:27786
> > > I0708 23:36:54.637682 10994 status_update_manager.cpp:360] Received status update acknowledgement 7753252d-c90b-4b0d-adca-7c97f38f692e for task Task_Tracker_230 of framework 201307040929-252063498-5050-27411-0000
> > > I0708 23:37:44.583014 11002 slave.cpp:2511] Current usage 57.23%. Max
> > > allowed age: 2.293704423241597days
> > > I0708 23:38:44.585233 11003 slave.cpp:2511] Current usage 57.23%. Max
> > > allowed age: 2.293703916528542days
> > > I0708 23:39:44.599442 11006 slave.cpp:2511] Current usage 57.23%. Max
> > > allowed age: 2.293639867998055days
> > > I0708 23:40:44.603996 10997 slave.cpp:2511] Current usage 57.24%. Max
> > > allowed age: 2.292921551567535days
> > > I0708 23:41:44.608608 11006 slave.cpp:2511] Current usage 57.26%. Max
> > > allowed age: 2.291521098018820days
> > > I0708 23:42:44.609956 10992 slave.cpp:2511] Current usage 57.23%. Max
> > > allowed age: 2.293668041244063days
> > > I0708 23:43:44.682621 11000 slave.cpp:2511] Current usage 57.24%. Max
> > > allowed age: 2.292935638190544days
> > > I0708 23:44:44.684306 10993 slave.cpp:2511] Current usage 57.24%. Max
> > > allowed age: 2.292916079066516days
> > > I0708 23:45:44.686172 11001 slave.cpp:2511] Current usage 57.26%. Max
> > > allowed age: 2.291485324076945days
> > > I0708 23:46:44.699095 10995 slave.cpp:2511] Current usage 57.23%. Max
> > > allowed age: 2.293641894850289days
> > > I0708 23:47:44.721156 10998 slave.cpp:2511] Current usage 57.23%. Max
> > > allowed age: 2.293629429709074days
> > > I0708 23:48:44.779767 10992 slave.cpp:2511] Current usage 57.24%. Max
> > > allowed age: 2.293525350847025days
> > > I0708 23:49:44.812389 11004 slave.cpp:2511] Current usage 57.24%. Max
> > > allowed age: 2.292909289111539days
> > > I0708 23:50:44.814146 10999 slave.cpp:2511] Current usage 57.27%. Max
> > > allowed age: 2.291438098419977days
> > > I0708 23:51:44.814877 11005 slave.cpp:2511] Current usage 57.23%. Max
> > > allowed age: 2.293635104895313days
> > > I0708 23:52:44.818620 10998 slave.cpp:2511] Current usage 57.24%. Max
> > > allowed age: 2.292983775931019days
> > > I0708 23:53:44.829911 10997 slave.cpp:2511] Current usage 57.33%. Max
> > > allowed age: 2.286910009194236days
> > > I0708 23:54:44.831307 10999 slave.cpp:2511] Current usage 57.33%. Max
> > > allowed age: 2.286909502481169days
> > > I0708 23:55:44.902858 10994 slave.cpp:2511] Current usage 57.37%. Max
> > > allowed age: 2.284414244700093days
> > > I0708 23:56:44.905398 11002 slave.cpp:2511] Current usage 57.42%. Max
> > > allowed age: 2.280636901540567days
> > > I0708 23:57:44.933673 10991 slave.cpp:2511] Current usage 57.44%. Max
> > > allowed age: 2.279481899796968days
> > > I0708 23:58:44.934840 11004 slave.cpp:2511] Current usage 57.48%. Max
> > > allowed age: 2.276566475548496days
> > > I0708 23:59:44.936063 11001 slave.cpp:2511] Current usage 57.49%. Max
> > > allowed age: 2.275690368671817days
> > > I0709 00:00:44.937433 11004 slave.cpp:2511] Current usage 57.50%. Max
> > > allowed age: 2.275057180034989days
> > > I0709 00:01:44.938940 11001 slave.cpp:2511] Current usage 57.51%. Max
> > > allowed age: 2.273999467198449days
> > > I0709 00:02:44.955103 10996 slave.cpp:2511] Current usage 57.52%. Max
> > > allowed age: 2.273472384275891days
> > > I0709 00:03:44.956354 10993 slave.cpp:2511] Current usage 57.39%. Max
> > > allowed age: 2.282894612240220days
> > > I0709 00:04:44.957926 10997 slave.cpp:2511] Current usage 57.40%. Max
> > > allowed age: 2.281966516603831days
> > > I0709 00:05:44.969205 10996 slave.cpp:2511] Current usage 57.40%. Max
> > > allowed age: 2.281962260214144days
> > > I0709 00:06:44.969987 10992 slave.cpp:2511] Current usage 57.40%. Max
> > > allowed age: 2.281791801941551days
> > > I0709 00:07:44.977504 11004 slave.cpp:2511] Current usage 57.40%. Max
> > > allowed age: 2.281715288269849days
> > > I0709 00:08:44.982868 10998 slave.cpp:2511] Current usage 57.40%. Max
> > > allowed age: 2.281699782850289days
> > > I0709 00:09:44.997082 11000 slave.cpp:2511] Current usage 57.42%. Max
> > > allowed age: 2.280776044946192days
> > > I0709 00:10:44.998754 10994 slave.cpp:2511] Current usage 57.42%. Max
> > > allowed age: 2.280772193926956days
> > > I0709 00:11:44.999949 11002 slave.cpp:2511] Current usage 57.44%. Max
> > > allowed age: 2.279204525069213days
> > > I0709 00:12:45.001539 10995 slave.cpp:2511] Current usage 57.47%. Max
> > > allowed age: 2.277132676719109days
> > > I0709 00:13:45.002728 10992 slave.cpp:2511] Current usage 57.43%. Max
> > > allowed age: 2.280012428368322days
> > > I0709 00:14:45.009699 10998 slave.cpp:2511] Current usage 57.48%. Max
> > > allowed age: 2.276733690857512days
> > > I0709 00:15:45.013483 10996 slave.cpp:2511] Current usage 57.53%. Max
> > > allowed age: 2.272715152282546days
> > > I0709 00:16:45.015496 10998 slave.cpp:2511] Current usage 57.57%. Max
> > > allowed age: 2.270354274804352days
> > > I0709 00:17:45.016628 11000 slave.cpp:2511] Current usage 57.62%. Max
> > > allowed age: 2.266927678423322days
> > > I0709 00:18:45.032670 11002 slave.cpp:2511] Current usage 57.65%. Max
> > > allowed age: 2.264218182361482days
> > > I0709 00:19:45.043442 10998 slave.cpp:2511] Current usage 57.69%. Max
> > > allowed age: 2.261509598383137days
> > > I0709 00:20:45.080648 10992 slave.cpp:2511] Current usage 57.72%. Max
> > > allowed age: 2.259379478031400days
> > > I0709 00:21:45.081632 10995 slave.cpp:2511] Current usage 57.77%. Max
> > > allowed age: 2.255819920144039days
> > > I0709 00:22:45.082593 11005 slave.cpp:2511] Current usage 57.81%. Max
> > > allowed age: 2.253314528101817days
> > > I0709 00:23:45.193588 10997 slave.cpp:2511] Current usage 57.85%. Max
> > > allowed age: 2.250524870034248days
> > > I0709 00:24:45.220617 10994 slave.cpp:2511] Current usage 57.90%. Max
> > > allowed age: 2.246784618270532days
> > > I0709 00:25:45.241602 10992 slave.cpp:2511] Current usage 57.97%. Max
> > > allowed age: 2.242399422127049days
> > > I0709 00:26:45.248977 11000 slave.cpp:2511] Current usage 58.00%. Max
> > > allowed age: 2.240250654734792days
> > > I0709 00:27:45.250953 10993 slave.cpp:2511] Current usage 57.99%. Max
> > > allowed age: 2.240516983117894days
> > > I0709 00:28:45.252694 10996 slave.cpp:2511] Current usage 58.06%. Max
> > > allowed age: 2.235834143724352days
> > > I0709 00:29:45.254992 11003 slave.cpp:2511] Current usage 58.10%. Max
> > > allowed age: 2.233297436815162days
> > > W0709 00:30:06.753098 10999 monitor.cpp:186] Failed to collect resource
> > > usage for executor 'executor_Task_Tracker_230' of framework
> > > '201307040929-252063498-5050-27411-0000': Future discarded
> > > W0709 00:30:10.715373 10996 monitor.cpp:186] Failed to collect resource
> > > usage for executor 'executor_Task_Tracker_224' of framework
> > > '201307040929-252063498-5050-27411-0000': Future discarded
> > > W0709 00:30:11.754446 11003 monitor.cpp:186] Failed to collect resource
> > > usage for executor 'executor_Task_Tracker_230' of framework
> > > '201307040929-252063498-5050-27411-0000': Future discarded
> > > W0709 00:30:15.719880 11003 monitor.cpp:186] Failed to collect resource
> > > usage for executor 'executor_Task_Tracker_224' of framework
> > > '201307040929-252063498-5050-27411-0000': Future discarded
> > > W0709 00:30:16.755473 11003 monitor.cpp:186] Failed to collect resource
> > > usage for executor 'executor_Task_Tracker_230' of framework
> > > '201307040929-252063498-5050-27411-0000': Future discarded
> > > W0709 00:30:20.720330 11003 monitor.cpp:186] Failed to collect resource
> > > usage for executor 'executor_Task_Tracker_224' of framework
> > > '201307040929-252063498-5050-27411-0000': Future discarded
> > > W0709 00:30:21.766019 11003 monitor.cpp:186] Failed to collect resource
> > > usage for executor 'executor_Task_Tracker_230' of framework
> > > '201307040929-252063498-5050-27411-0000': Future discarded
> > > W0709 00:30:25.721364 11003 monitor.cpp:186] Failed to collect resource
> > > usage for executor 'executor_Task_Tracker_224' of framework
> > > '201307040929-252063498-5050-27411-0000': Future discarded
> > > W0709 00:30:26.768874 11003 monitor.cpp:186] Failed to collect resource
> > > usage for executor 'executor_Task_Tracker_230' of framework
> > > '201307040929-252063498-5050-27411-0000': Future discarded
> > > W0709 00:30:30.722605 11003 monitor.cpp:186] Failed to collect resource
> > > usage for executor 'executor_Task_Tracker_224' of framework
> > > '201307040929-252063498-5050-27411-0000': Future discarded
> > > W0709 00:30:31.770354 11003 monitor.cpp:186] Failed to collect resource
> > > usage for executor 'executor_Task_Tracker_230' of framework
> > > '201307040929-252063498-5050-27411-0000': Future discarded
> > > W0709 00:30:35.724455 10992 monitor.cpp:186] Failed to collect resource
> > > usage for executor 'executor_Task_Tracker_224' of framework
> > > '201307040929-252063498-5050-27411-0000': Future discarded
> > > W0709 00:30:36.788751 10992 monitor.cpp:186] Failed to collect resource
> > > usage for executor 'executor_Task_Tracker_230' of framework
> > > '201307040929-252063498-5050-27411-0000': Future discarded
> > > W0709 00:30:40.745380 10992 monitor.cpp:186] Failed to collect resource
> > > usage for executor 'executor_Task_Tracker_224' of framework
> > > '201307040929-252063498-5050-27411-0000': Future discarded
> > > W0709 00:30:41.789358 10992 monitor.cpp:186] Failed to collect resource
> > > usage for executor 'executor_Task_Tracker_230' of framework
> > > '201307040929-252063498-5050-27411-0000': Future discarded
> > > I0709 00:30:45.256590 11004 slave.cpp:2511] Current usage 58.11%. Max
> > > allowed age: 2.232469873049410days
> > >
> > >
> > > Guodong
> > >
> > >
> > > On Tue, Jul 9, 2013 at 4:55 AM, Benjamin Mahler <benjamin.mahler@gmail.com> wrote:
> > >
> > >> Are these the un-edited logs? I'm expecting to see some logs from the
> > >> process_isolator or cgroups_isolator in there.
> > >>
> > >>
> > >> On Fri, Jul 5, 2013 at 2:38 PM, Brenden Matthews <
> > >> brenden.matthews@airbedandbreakfast.com> wrote:
> > >>
> > >> > Hey guys,
> > >> >
> > >> > I'm currently having a problem where tasks will get stuck in the staging
> > >> > state, though according to the logs they should have been terminated.  They
> > >> > hang indefinitely, or until I restart the slave.  Below is a screenshot +
> > >> > logs.  Also interesting is the 'Failed to collect resource usage ...'
> > >> > messages.
> > >> >
> > >> > [image: Inline image 2]
> > >> >
> > >> > I0705 16:19:51.551512  9706 slave.cpp:739] Got assigned task ct:1373041190990:0:add_latest_reservation_survey_events_partition for framework chronos
> > >> >> I0705 16:19:51.552150  9706 slave.cpp:837] Launching task ct:1373041190990:0:add_latest_reservation_survey_events_partition for framework chronos
> > >> >> I0705 16:19:51.553956  9706 paths.hpp:303] Created executor directory '/tmp/mesos/slaves/201307030043-2037266954-5050-15277-1517/frameworks/chronos/executors/ct:1373041190990:0:add_latest_reservation_survey_events_partition/runs/611ba128-557f-4b5e-8cf2-4d1ce60d618f'
> > >> >> I0705 16:19:51.554576  9706 slave.cpp:948] Queuing task 'ct:1373041190990:0:add_latest_reservation_survey_events_partition' for executor ct:1373041190990:0:add_latest_reservation_survey_events_partition of framework 'chronos
> > >> >> I0705 16:19:51.555027  9706 slave.cpp:511] Successfully attached file '/tmp/mesos/slaves/201307030043-2037266954-5050-15277-1517/frameworks/chronos/executors/ct:1373041190990:0:add_latest_reservation_survey_events_partition/runs/611ba128-557f-4b5e-8cf2-4d1ce60d618f'
> > >> >> I0705 16:19:54.048754  9724 slave.cpp:2530] Current usage 42.18%. Max allowed age: 22.955009563956388hrs
> > >> >> W0705 16:19:54.108963  9724 monitor.cpp:186] Failed to collect resource usage for executor 'executor_Task_Tracker_8023' of framework '201307030043-2037266954-5050-15277-0006': Future discarded
> > >> >> W0705 16:19:59.110787  9729 monitor.cpp:186] Failed to collect resource usage for executor 'executor_Task_Tracker_8023' of framework '201307030043-2037266954-5050-15277-0006': Future discarded
> > >> >> W0705 16:20:04.112406  9704 monitor.cpp:186] Failed to collect resource usage for executor 'executor_Task_Tracker_8023' of framework '201307030043-2037266954-5050-15277-0006': Future discarded
> > >> >> W0705 16:20:09.114367  9705 monitor.cpp:186] Failed to collect resource usage for executor 'executor_Task_Tracker_8023' of framework '201307030043-2037266954-5050-15277-0006': Future discarded
> > >> >> W0705 16:20:14.116312  9706 monitor.cpp:186] Failed to collect resource usage for executor 'executor_Task_Tracker_8023' of framework '201307030043-2037266954-5050-15277-0006': Future discarded
> > >> >> W0705 16:20:19.118370  9699 monitor.cpp:186] Failed to collect resource usage for executor 'executor_Task_Tracker_8023' of framework '201307030043-2037266954-5050-15277-0006': Future discarded
> > >> >> W0705 16:20:24.120311  9701 monitor.cpp:186] Failed to collect resource usage for executor 'executor_Task_Tracker_8023' of framework '201307030043-2037266954-5050-15277-0006': Future discarded
> > >> >> W0705 16:20:29.122355  9700 monitor.cpp:186] Failed to collect resource usage for executor 'executor_Task_Tracker_8023' of framework '201307030043-2037266954-5050-15277-0006': Future discarded
> > >> >> W0705 16:20:34.123443  9722 monitor.cpp:186] Failed to collect resource usage for executor 'executor_Task_Tracker_8023' of framework '201307030043-2037266954-5050-15277-0006': Future discarded
> > >> >> W0705 16:20:39.125660  9718 monitor.cpp:186] Failed to collect resource usage for executor 'executor_Task_Tracker_8023' of framework '201307030043-2037266954-5050-15277-0006': Future discarded
> > >> >> W0705 16:20:44.127464  9724 monitor.cpp:186] Failed to collect resource usage for executor 'executor_Task_Tracker_8023' of framework '201307030043-2037266954-5050-15277-0006': Future discarded
> > >> >> W0705 16:20:49.129385  9725 monitor.cpp:186] Failed to collect resource usage for executor 'executor_Task_Tracker_8023' of framework '201307030043-2037266954-5050-15277-0006': Future discarded
> > >> >> I0705 16:20:51.555174  9703 slave.cpp:2482] Terminating executor ct:1373041190990:0:add_latest_reservation_survey_events_partition of framework chronos because it did not register within 1mins
> > >> >> I0705 16:20:54.050434  9717 slave.cpp:2530] Current usage 42.18%. Max allowed age: 22.955009342481944hrs
> > >> >> W0705 16:20:54.130730  9699 monitor.cpp:186] Failed to collect resource usage for executor 'executor_Task_Tracker_8023' of framework '201307030043-2037266954-5050-15277-0006': Future discarded
> > >> >> W0705 16:20:59.132472  9702 monitor.cpp:186] Failed to collect resource usage for executor 'executor_Task_Tracker_8023' of framework '201307030043-2037266954-5050-15277-0006': Future discarded
> > >> >> W0705 16:21:04.134557  9713 monitor.cpp:186] Failed to collect resource usage for executor 'executor_Task_Tracker_8023' of framework '201307030043-2037266954-5050-15277-0006': Future discarded
> > >> >> W0705 16:21:09.135619  9701 monitor.cpp:186] Failed to collect resource usage for executor 'executor_Task_Tracker_8023' of framework '201307030043-2037266954-5050-15277-0006': Future discarded
> > >> >
> > >> >
> > >> >
> > >>
> > >
> > >
> >
>

Re: Tasks stuck in 'STAGING'

Posted by 王国栋 <wa...@gmail.com>.
Hi Vinod,

I grepped the master log for that time period. I hope it helps. Thanks.

Guodong
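
For anyone following along who wants to narrow down which tasks never left STAGING, here is a rough sketch (not from the thread) that scans a slave log for tasks that were assigned but never produced a status update. The function name and the two regular expressions are assumptions based only on the log messages quoted in this thread; other Mesos versions may word them differently, and the log must have one entry per line (not re-wrapped by a mail client).

```python
import re

# Patterns modelled on the slave log lines quoted in this thread (assumption:
# other Mesos versions may phrase these messages differently).
ASSIGNED = re.compile(r"Got assigned task (\S+) for framework (\S+)")
STATUS = re.compile(
    r"Handling status update (TASK_\w+) \(UUID: [0-9a-f-]+\) "
    r"for task (\S+) of framework (\S+)"
)


def tasks_without_status(lines):
    """Return (task, framework) pairs that were assigned but never
    had any status update handled by the slave."""
    pending = {}  # (task, framework) -> log line that assigned it
    for line in lines:
        match = ASSIGNED.search(line)
        if match:
            pending[(match.group(1), match.group(2))] = line
            continue
        match = STATUS.search(line)
        if match:
            # Any status update means the task got past launch.
            pending.pop((match.group(2), match.group(3)), None)
    return list(pending)


sample = [
    "I0708 23:36:44.253880 11005 slave.cpp:739] Got assigned task "
    "Task_Tracker_224 for framework 201307040929-252063498-5050-27411-0000",
    "I0708 23:36:46.144317 11006 slave.cpp:1692] Handling status update "
    "TASK_RUNNING (UUID: 364ee347-f6a2-4c7b-8702-460aa0ece579) for task "
    "Task_Tracker_224 of framework 201307040929-252063498-5050-27411-0000 "
    "from executor(1)@10.47.6.21:2310",
    "I0705 16:19:51.551512  9706 slave.cpp:739] Got assigned task "
    "ct:1373041190990:0:add_latest_reservation_survey_events_partition "
    "for framework chronos",
]

for task, framework in tasks_without_status(sample):
    print(task, framework)
```

On the sample lines, only the chronos task is reported, since Task_Tracker_224 did send TASK_RUNNING. This only shows what the slave saw; the master log is still needed to confirm what state the master believes the task is in.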


On Tue, Jul 9, 2013 at 2:21 PM, Vinod Kone <vi...@gmail.com> wrote:

> hey guodong, which of these task(s) is stuck in STAGING? also, the
> corresponding master's logs would also be helpful here. also which version
> of mesos are you running?
>
>
> On Mon, Jul 8, 2013 at 11:02 PM, 王国栋 <wa...@gmail.com> wrote:
>
> > Interestingly, the slave log also contains these entries.
> >
> > I0709 00:33:43.833853 11002 slave.cpp:996] Asked to kill task Task_Tracker_224 of framework 201307040929-252063498-5050-27411-0000
> > I0709 00:33:43.835552 11006 slave.cpp:996] Asked to kill task Task_Tracker_230 of framework 201307040929-252063498-5050-27411-0000
> > I0709 00:33:43.972771 10994 slave.cpp:1692] Handling status update TASK_FINISHED (UUID: 372081cc-edf2-4183-a461-9345ab6d279c) for task Task_Tracker_230 of framework 201307040929-252063498-5050-27411-0000 from executor(1)@10.47.6.21:27786
> > I0709 00:33:43.973132 10994 status_update_manager.cpp:290] Received status update TASK_FINISHED (UUID: 372081cc-edf2-4183-a461-9345ab6d279c) for task Task_Tracker_230 of framework 201307040929-252063498-5050-27411-0000 with checkpoint=false
> > I0709 00:33:43.973192 10994 status_update_manager.cpp:336] Forwarding status update TASK_FINISHED (UUID: 372081cc-edf2-4183-a461-9345ab6d279c) for task Task_Tracker_230 of framework 201307040929-252063498-5050-27411-0000 to master@10.47.6.15:5050
> > I0709 00:33:43.973846 11005 slave.cpp:1809] Sending acknowledgement for status update TASK_FINISHED (UUID: 372081cc-edf2-4183-a461-9345ab6d279c) for task Task_Tracker_230 of framework 201307040929-252063498-5050-27411-0000 to executor(1)@10.47.6.21:27786
> > I0709 00:33:43.974591 11000 status_update_manager.cpp:360] Received status update acknowledgement 372081cc-edf2-4183-a461-9345ab6d279c for task Task_Tracker_230 of framework 201307040929-252063498-5050-27411-0000
> > I0709 00:33:43.974652 11000 status_update_manager.cpp:481] Cleaning up status update stream for task Task_Tracker_230 of framework 201307040929-252063498-5050-27411-0000
> > I0709 00:33:44.090603 11003 slave.cpp:1692] Handling status update TASK_FINISHED (UUID: 61d5775a-2375-412a-a5a4-80ab55163d88) for task Task_Tracker_224 of framework 201307040929-252063498-5050-27411-0000 from executor(1)@10.47.6.21:2310
> > I0709 00:33:44.090860 11003 status_update_manager.cpp:290] Received status update TASK_FINISHED (UUID: 61d5775a-2375-412a-a5a4-80ab55163d88) for task Task_Tracker_224 of framework 201307040929-252063498-5050-27411-0000 with checkpoint=false
> > I0709 00:33:44.090973 11003 status_update_manager.cpp:336] Forwarding status update TASK_FINISHED (UUID: 61d5775a-2375-412a-a5a4-80ab55163d88) for task Task_Tracker_224 of framework 201307040929-252063498-5050-27411-0000 to master@10.47.6.15:5050
> > I0709 00:33:44.091279 11003 slave.cpp:1809] Sending acknowledgement for status update TASK_FINISHED (UUID: 61d5775a-2375-412a-a5a4-80ab55163d88) for task Task_Tracker_224 of framework 201307040929-252063498-5050-27411-0000 to executor(1)@10.47.6.21:2310
> > I0709 00:33:44.093286 11003 status_update_manager.cpp:360] Received status update acknowledgement 61d5775a-2375-412a-a5a4-80ab55163d88 for task Task_Tracker_224 of framework 201307040929-252063498-5050-27411-0000
> > I0709 00:33:44.093359 11003 status_update_manager.cpp:481] Cleaning up status update stream for task Task_Tracker_224 of framework 201307040929-252063498-5050-27411-0000
> > I0709 00:33:45.259831 10997 slave.cpp:2511] Current usage 57.44%. Max allowed age: 2.279168852469954days
> > W0709 00:33:45.949470 10996 monitor.cpp:186] Failed to collect resource
> > usage for executor 'executor_Task_Tracker_224' of framework
> > '201307040929-252063498-5050-27411-0000': Future discarded
> > W0709 00:33:47.063181 11005 monitor.cpp:186] Failed to collect resource
> > usage for executor 'executor_Task_Tracker_230' of framework
> > '201307040929-252063498-5050-27411-0000': Future discarded
> > W0709 00:33:50.950412 11000 monitor.cpp:186] Failed to collect resource
> > usage for executor 'executor_Task_Tracker_224' of framework
> > '201307040929-252063498-5050-27411-0000': Future discarded
> > W0709 00:33:52.063576 10993 monitor.cpp:186] Failed to collect resource
> > usage for executor 'executor_Task_Tracker_230' of framework
> > '201307040929-252063498-5050-27411-0000': Future discarded
> > W0709 00:33:55.951427 11003 monitor.cpp:186] Failed to collect resource
> > usage for executor 'executor_Task_Tracker_224' of framework
> > '201307040929-252063498-5050-27411-0000': Future discarded
> > W0709 00:33:57.064575 10998 monitor.cpp:186] Failed to collect resource
> > usage for executor 'executor_Task_Tracker_230' of framework
> > '201307040929-252063498-5050-27411-0000': Future discarded
> > W0709 00:34:00.952390 11003 monitor.cpp:186] Failed to collect resource
> > usage for executor 'executor_Task_Tracker_224' of framework
> > '201307040929-252063498-5050-27411-0000': Future discarded
> > W0709 00:34:02.065218 10998 monitor.cpp:186] Failed to collect resource
> > usage for executor 'executor_Task_Tracker_230' of framework
> > '201307040929-252063498-5050-27411-0000': Future discarded
> > W0709 00:34:05.953456 10995 monitor.cpp:186] Failed to collect resource
> > usage for executor 'executor_Task_Tracker_224' of framework
> > '201307040929-252063498-5050-27411-0000': Future discarded
> > W0709 00:34:07.066515 10995 monitor.cpp:186] Failed to collect resource
> > usage for executor 'executor_Task_Tracker_230' of framework
> > '201307040929-252063498-5050-27411-0000': Future discarded
> > W0709 00:34:10.954479 10998 monitor.cpp:186] Failed to collect resource
> > usage for executor 'executor_Task_Tracker_224' of framework
> > '201307040929-252063498-5050-27411-0000': Future discarded
> > W0709 00:34:12.067471 11005 monitor.cpp:186] Failed to collect resource
> > usage for executor 'executor_Task_Tracker_230' of framework
> > '201307040929-252063498-5050-27411-0000': Future discarded
> > W0709 00:34:15.955461 10996 monitor.cpp:186] Failed to collect resource
> > usage for executor 'executor_Task_Tracker_224' of framework
> > '201307040929-252063498-5050-27411-0000': Future discarded
> > W0709 00:34:17.068209 10996 monitor.cpp:186] Failed to collect resource
> > usage for executor 'executor_Task_Tracker_230' of framework
> > '201307040929-252063498-5050-27411-0000': Future discarded
> >
> >
> >
> > Guodong
> >
> >
> > On Tue, Jul 9, 2013 at 1:59 PM, 王国栋 <wa...@gmail.com> wrote:
> >
> > > Hi ben,
> > >
> > > I ran into the same issue here.
> > >
> > > This also happens in our Hadoop framework. The slave log is below. At
> > > that time, I think the workload of the node was very high.
> > >
> > > I0708 23:36:44.253880 11005 slave.cpp:739] Got assigned task Task_Tracker_224 for framework 201307040929-252063498-5050-27411-0000
> > > I0708 23:36:44.255221 10999 gc.cpp:84] Unscheduling '/data/mesos-slave-work-dir//slaves/201307041648-252063498-5050-8038-5/frameworks/201307040929-252063498-5050-27411-0000' for removal
> > > I0708 23:36:44.256206 11001 slave.cpp:837] Launching task Task_Tracker_224 for framework 201307040929-252063498-5050-27411-0000
> > > I0708 23:36:44.258117 11001 paths.hpp:303] Created executor directory '/data/mesos-slave-work-dir//slaves/201307041648-252063498-5050-8038-5/frameworks/201307040929-252063498-5050-27411-0000/executors/executor_Task_Tracker_224/runs/953d3565-424c-4ab3-9926-a3fa71042bf1'
> > > I0708 23:36:44.258467 10991 process_isolator.cpp:99] Launching executor_Task_Tracker_224 (cd hadoop && ./bin/mesos-executor) in /data/mesos-slave-work-dir//slaves/201307041648-252063498-5050-8038-5/frameworks/201307040929-252063498-5050-27411-0000/executors/executor_Task_Tracker_224/runs/953d3565-424c-4ab3-9926-a3fa71042bf1 with resources cpus=1; mem=1280' for framework 201307040929-252063498-5050-27411-0000
> > > I0708 23:36:44.258496 11001 slave.cpp:948] Queuing task 'Task_Tracker_224' for executor executor_Task_Tracker_224 of framework '201307040929-252063498-5050-27411-0000
> > > I0708 23:36:44.261446 10991 process_isolator.cpp:161] Forked executor at 2220
> > > I0708 23:36:44.261787 10996 slave.cpp:511] Successfully attached file '/data/mesos-slave-work-dir//slaves/201307041648-252063498-5050-8038-5/frameworks/201307040929-252063498-5050-27411-0000/executors/executor_Task_Tracker_224/runs/953d3565-424c-4ab3-9926-a3fa71042bf1'
> > > I0708 23:36:44.580497 10996 slave.cpp:2511] Current usage 57.21%. Max allowed age: 2.295155852123924days
> > > I0708 23:36:44.750393 11002 slave.cpp:1395] Got registration for executor 'executor_Task_Tracker_224' of framework 201307040929-252063498-5050-27411-0000
> > > I0708 23:36:44.751095 11002 slave.cpp:1510] Flushing queued task Task_Tracker_224 for executor 'executor_Task_Tracker_224' of framework 201307040929-252063498-5050-27411-0000
> > > I0708 23:36:46.144317 11006 slave.cpp:1692] Handling status update TASK_RUNNING (UUID: 364ee347-f6a2-4c7b-8702-460aa0ece579) for task Task_Tracker_224 of framework 201307040929-252063498-5050-27411-0000 from executor(1)@10.47.6.21:2310
> > > I0708 23:36:46.144745 11006 status_update_manager.cpp:290] Received status update TASK_RUNNING (UUID: 364ee347-f6a2-4c7b-8702-460aa0ece579) for task Task_Tracker_224 of framework 201307040929-252063498-5050-27411-0000 with checkpoint=false
> > > I0708 23:36:46.144821 11006 status_update_manager.cpp:450] Creating StatusUpdate stream for task Task_Tracker_224 of framework 201307040929-252063498-5050-27411-0000
> > > I0708 23:36:46.145076 11006 status_update_manager.cpp:336] Forwarding status update TASK_RUNNING (UUID: 364ee347-f6a2-4c7b-8702-460aa0ece579) for task Task_Tracker_224 of framework 201307040929-252063498-5050-27411-0000 to master@10.47.6.15:5050
> > > I0708 23:36:46.145882 10997 slave.cpp:1809] Sending acknowledgement for status update TASK_RUNNING (UUID: 364ee347-f6a2-4c7b-8702-460aa0ece579) for task Task_Tracker_224 of framework 201307040929-252063498-5050-27411-0000 to executor(1)@10.47.6.21:2310
> > > I0708 23:36:46.146870 10993 status_update_manager.cpp:360] Received status update acknowledgement 364ee347-f6a2-4c7b-8702-460aa0ece579 for task Task_Tracker_224 of framework 201307040929-252063498-5050-27411-0000
> > > I0708 23:36:50.258347 11005 slave.cpp:739] Got assigned task
> > > Task_Tracker_230 for framework 201307040929-252063498-5050-27411-0000
> > > I0708 23:36:50.259472 11005 slave.cpp:837] Launching task
> > Task_Tracker_230
> > > for framework 201307040929-252063498-5050-27411-0000
> > > I0708 23:36:50.261641 11005 paths.hpp:303] Created executor directory
> > >
> >
> '/data/mesos-slave-work-dir//slaves/201307041648-252063498-5050-8038-5/frameworks/201307040929-252063498-5050-27411-0000/executors/exe
> > > cutor_Task_Tracker_230/runs/1dc87acc-d090-469f-ba30-0477139ee7fd'
> > > I0708 23:36:50.262265 11005 slave.cpp:948] Queuing task
> > 'Task_Tracker_230'
> > > for executor executor_Task_Tracker_230 of framework
> > > '201307040929-252063498-5050-27411-0000
> > > I0708 23:36:50.262557 11005 process_isolator.cpp:99] Launching
> > > executor_Task_Tracker_230 (cd hadoop && ./bin/mesos-executor) in
> > >
> >
> /data/mesos-slave-work-dir//slaves/201307041648-252063498-5050-8038-5/framew
> > >
> >
> orks/201307040929-252063498-5050-27411-0000/executors/executor_Task_Tracker_230/runs/1dc87acc-d090-469f-ba30-0477139ee7fd
> > > with resources cpus=1; mem=1280' for framework
> > > 201307040929-252063498-5050-27411-0
> > > 000
> > > I0708 23:36:50.265396 10999 slave.cpp:511] Successfully attached file
> > >
> >
> '/data/mesos-slave-work-dir//slaves/201307041648-252063498-5050-8038-5/frameworks/201307040929-252063498-5050-27411-0000/executors/exe
> > > cutor_Task_Tracker_230/runs/1dc87acc-d090-469f-ba30-0477139ee7fd'
> > > I0708 23:36:50.265419 11005 process_isolator.cpp:161] Forked executor
> at
> > > 2851
> > > I0708 23:36:50.835607 10995 slave.cpp:1395] Got registration for
> executor
> > > 'executor_Task_Tracker_230' of framework
> > > 201307040929-252063498-5050-27411-0000
> > > I0708 23:36:50.836174 10995 slave.cpp:1510] Flushing queued task
> > > Task_Tracker_230 for executor 'executor_Task_Tracker_230' of framework
> > > 201307040929-252063498-5050-27411-0000
> > > I0708 23:36:54.617856 10994 slave.cpp:1692] Handling status update
> > > TASK_RUNNING (UUID: 7753252d-c90b-4b0d-adca-7c97f38f692e) for task
> > > Task_Tracker_230 of framework 201307040929-252063498-5050-27411-0000 f
> > > rom executor(1)@10.47.6.21:27786
> > > I0708 23:36:54.618275 10994 status_update_manager.cpp:290] Received
> > status
> > > update TASK_RUNNING (UUID: 7753252d-c90b-4b0d-adca-7c97f38f692e) for
> task
> > > Task_Tracker_230 of framework 201307040929-252063498-50
> > > 50-27411-0000 with checkpoint=false
> > > I0708 23:36:54.618326 10994 status_update_manager.cpp:450] Creating
> > > StatusUpdate stream for task Task_Tracker_230 of framework
> > > 201307040929-252063498-5050-27411-0000
> > > I0708 23:36:54.618443 10994 status_update_manager.cpp:336] Forwarding
> > > status update TASK_RUNNING (UUID: 7753252d-c90b-4b0d-adca-7c97f38f692e)
> > for
> > > task Task_Tracker_230 of framework 201307040929-252063498-
> > > 5050-27411-0000 to master@10.47.6.15:5050
> > > I0708 23:36:54.619137 10994 slave.cpp:1809] Sending acknowledgement for
> > > status update TASK_RUNNING (UUID: 7753252d-c90b-4b0d-adca-7c97f38f692e)
> > for
> > > task Task_Tracker_230 of framework 201307040929-25206349
> > > 8-5050-27411-0000 to executor(1)@10.47.6.21:27786
> > > I0708 23:36:54.637682 10994 status_update_manager.cpp:360] Received
> > status
> > > update acknowledgement 7753252d-c90b-4b0d-adca-7c97f38f692e for task
> > > Task_Tracker_230 of framework 201307040929-252063498-5050-27
> > > 411-0000
> > > I0708 23:37:44.583014 11002 slave.cpp:2511] Current usage 57.23%. Max
> > > allowed age: 2.293704423241597days
> > > I0708 23:38:44.585233 11003 slave.cpp:2511] Current usage 57.23%. Max
> > > allowed age: 2.293703916528542days
> > > I0708 23:39:44.599442 11006 slave.cpp:2511] Current usage 57.23%. Max
> > > allowed age: 2.293639867998055days
> > > I0708 23:40:44.603996 10997 slave.cpp:2511] Current usage 57.24%. Max
> > > allowed age: 2.292921551567535days
> > > I0708 23:41:44.608608 11006 slave.cpp:2511] Current usage 57.26%. Max
> > > allowed age: 2.291521098018820days
> > > I0708 23:42:44.609956 10992 slave.cpp:2511] Current usage 57.23%. Max
> > > allowed age: 2.293668041244063days
> > > I0708 23:43:44.682621 11000 slave.cpp:2511] Current usage 57.24%. Max
> > > allowed age: 2.292935638190544days
> > > I0708 23:44:44.684306 10993 slave.cpp:2511] Current usage 57.24%. Max
> > > allowed age: 2.292916079066516days
> > > I0708 23:45:44.686172 11001 slave.cpp:2511] Current usage 57.26%. Max
> > > allowed age: 2.291485324076945days
> > > I0708 23:46:44.699095 10995 slave.cpp:2511] Current usage 57.23%. Max
> > > allowed age: 2.293641894850289days
> > > I0708 23:47:44.721156 10998 slave.cpp:2511] Current usage 57.23%. Max
> > > allowed age: 2.293629429709074days
> > > I0708 23:48:44.779767 10992 slave.cpp:2511] Current usage 57.24%. Max
> > > allowed age: 2.293525350847025days
> > > I0708 23:49:44.812389 11004 slave.cpp:2511] Current usage 57.24%. Max
> > > allowed age: 2.292909289111539days
> > > I0708 23:50:44.814146 10999 slave.cpp:2511] Current usage 57.27%. Max
> > > allowed age: 2.291438098419977days
> > > I0708 23:51:44.814877 11005 slave.cpp:2511] Current usage 57.23%. Max
> > > allowed age: 2.293635104895313days
> > > I0708 23:52:44.818620 10998 slave.cpp:2511] Current usage 57.24%. Max
> > > allowed age: 2.292983775931019days
> > > I0708 23:53:44.829911 10997 slave.cpp:2511] Current usage 57.33%. Max
> > > allowed age: 2.286910009194236days
> > > I0708 23:54:44.831307 10999 slave.cpp:2511] Current usage 57.33%. Max
> > > allowed age: 2.286909502481169days
> > > I0708 23:55:44.902858 10994 slave.cpp:2511] Current usage 57.37%. Max
> > > allowed age: 2.284414244700093days
> > > I0708 23:56:44.905398 11002 slave.cpp:2511] Current usage 57.42%. Max
> > > allowed age: 2.280636901540567days
> > > I0708 23:57:44.933673 10991 slave.cpp:2511] Current usage 57.44%. Max
> > > allowed age: 2.279481899796968days
> > > I0708 23:58:44.934840 11004 slave.cpp:2511] Current usage 57.48%. Max
> > > allowed age: 2.276566475548496days
> > > I0708 23:59:44.936063 11001 slave.cpp:2511] Current usage 57.49%. Max
> > > allowed age: 2.275690368671817days
> > > I0709 00:00:44.937433 11004 slave.cpp:2511] Current usage 57.50%. Max
> > > allowed age: 2.275057180034989days
> > > I0709 00:01:44.938940 11001 slave.cpp:2511] Current usage 57.51%. Max
> > > allowed age: 2.273999467198449days
> > > I0709 00:02:44.955103 10996 slave.cpp:2511] Current usage 57.52%. Max
> > > allowed age: 2.273472384275891days
> > > I0709 00:03:44.956354 10993 slave.cpp:2511] Current usage 57.39%. Max
> > > allowed age: 2.282894612240220days
> > > I0709 00:04:44.957926 10997 slave.cpp:2511] Current usage 57.40%. Max
> > > allowed age: 2.281966516603831days
> > > I0709 00:05:44.969205 10996 slave.cpp:2511] Current usage 57.40%. Max
> > > allowed age: 2.281962260214144days
> > > I0709 00:06:44.969987 10992 slave.cpp:2511] Current usage 57.40%. Max
> > > allowed age: 2.281791801941551days
> > > I0709 00:07:44.977504 11004 slave.cpp:2511] Current usage 57.40%. Max
> > > allowed age: 2.281715288269849days
> > > I0709 00:08:44.982868 10998 slave.cpp:2511] Current usage 57.40%. Max
> > > allowed age: 2.281699782850289days
> > > I0709 00:09:44.997082 11000 slave.cpp:2511] Current usage 57.42%. Max
> > > allowed age: 2.280776044946192days
> > > I0709 00:10:44.998754 10994 slave.cpp:2511] Current usage 57.42%. Max
> > > allowed age: 2.280772193926956days
> > > I0709 00:11:44.999949 11002 slave.cpp:2511] Current usage 57.44%. Max
> > > allowed age: 2.279204525069213days
> > > I0709 00:12:45.001539 10995 slave.cpp:2511] Current usage 57.47%. Max
> > > allowed age: 2.277132676719109days
> > > I0709 00:13:45.002728 10992 slave.cpp:2511] Current usage 57.43%. Max
> > > allowed age: 2.280012428368322days
> > > I0709 00:14:45.009699 10998 slave.cpp:2511] Current usage 57.48%. Max
> > > allowed age: 2.276733690857512days
> > > I0709 00:15:45.013483 10996 slave.cpp:2511] Current usage 57.53%. Max
> > > allowed age: 2.272715152282546days
> > > I0709 00:16:45.015496 10998 slave.cpp:2511] Current usage 57.57%. Max
> > > allowed age: 2.270354274804352days
> > > I0709 00:17:45.016628 11000 slave.cpp:2511] Current usage 57.62%. Max
> > > allowed age: 2.266927678423322days
> > > I0709 00:18:45.032670 11002 slave.cpp:2511] Current usage 57.65%. Max
> > > allowed age: 2.264218182361482days
> > > I0709 00:19:45.043442 10998 slave.cpp:2511] Current usage 57.69%. Max
> > > allowed age: 2.261509598383137days
> > > I0709 00:20:45.080648 10992 slave.cpp:2511] Current usage 57.72%. Max
> > > allowed age: 2.259379478031400days
> > > I0709 00:21:45.081632 10995 slave.cpp:2511] Current usage 57.77%. Max
> > > allowed age: 2.255819920144039days
> > > I0709 00:22:45.082593 11005 slave.cpp:2511] Current usage 57.81%. Max
> > > allowed age: 2.253314528101817days
> > > I0709 00:23:45.193588 10997 slave.cpp:2511] Current usage 57.85%. Max
> > > allowed age: 2.250524870034248days
> > > I0709 00:24:45.220617 10994 slave.cpp:2511] Current usage 57.90%. Max
> > > allowed age: 2.246784618270532days
> > > I0709 00:25:45.241602 10992 slave.cpp:2511] Current usage 57.97%. Max
> > > allowed age: 2.242399422127049days
> > > I0709 00:26:45.248977 11000 slave.cpp:2511] Current usage 58.00%. Max
> > > allowed age: 2.240250654734792days
> > > I0709 00:27:45.250953 10993 slave.cpp:2511] Current usage 57.99%. Max
> > > allowed age: 2.240516983117894days
> > > I0709 00:28:45.252694 10996 slave.cpp:2511] Current usage 58.06%. Max
> > > allowed age: 2.235834143724352days
> > > I0709 00:29:45.254992 11003 slave.cpp:2511] Current usage 58.10%. Max
> > > allowed age: 2.233297436815162days
> > > W0709 00:30:06.753098 10999 monitor.cpp:186] Failed to collect resource
> > > usage for executor 'executor_Task_Tracker_230' of framework
> > > '201307040929-252063498-5050-27411-0000': Future discarded
> > > W0709 00:30:10.715373 10996 monitor.cpp:186] Failed to collect resource
> > > usage for executor 'executor_Task_Tracker_224' of framework
> > > '201307040929-252063498-5050-27411-0000': Future discarded
> > > W0709 00:30:11.754446 11003 monitor.cpp:186] Failed to collect resource
> > > usage for executor 'executor_Task_Tracker_230' of framework
> > > '201307040929-252063498-5050-27411-0000': Future discarded
> > > W0709 00:30:15.719880 11003 monitor.cpp:186] Failed to collect resource
> > > usage for executor 'executor_Task_Tracker_224' of framework
> > > '201307040929-252063498-5050-27411-0000': Future discarded
> > > W0709 00:30:16.755473 11003 monitor.cpp:186] Failed to collect resource
> > > usage for executor 'executor_Task_Tracker_230' of framework
> > > '201307040929-252063498-5050-27411-0000': Future discarded
> > > W0709 00:30:20.720330 11003 monitor.cpp:186] Failed to collect resource
> > > usage for executor 'executor_Task_Tracker_224' of framework
> > > '201307040929-252063498-5050-27411-0000': Future discarded
> > > W0709 00:30:21.766019 11003 monitor.cpp:186] Failed to collect resource
> > > usage for executor 'executor_Task_Tracker_230' of framework
> > > '201307040929-252063498-5050-27411-0000': Future discarded
> > > W0709 00:30:25.721364 11003 monitor.cpp:186] Failed to collect resource
> > > usage for executor 'executor_Task_Tracker_224' of framework
> > > '201307040929-252063498-5050-27411-0000': Future discarded
> > > W0709 00:30:26.768874 11003 monitor.cpp:186] Failed to collect resource
> > > usage for executor 'executor_Task_Tracker_230' of framework
> > > '201307040929-252063498-5050-27411-0000': Future discarded
> > > W0709 00:30:30.722605 11003 monitor.cpp:186] Failed to collect resource
> > > usage for executor 'executor_Task_Tracker_224' of framework
> > > '201307040929-252063498-5050-27411-0000': Future discarded
> > > W0709 00:30:31.770354 11003 monitor.cpp:186] Failed to collect resource
> > > usage for executor 'executor_Task_Tracker_230' of framework
> > > '201307040929-252063498-5050-27411-0000': Future discarded
> > > W0709 00:30:35.724455 10992 monitor.cpp:186] Failed to collect resource
> > > usage for executor 'executor_Task_Tracker_224' of framework
> > > '201307040929-252063498-5050-27411-0000': Future discarded
> > > W0709 00:30:36.788751 10992 monitor.cpp:186] Failed to collect resource
> > > usage for executor 'executor_Task_Tracker_230' of framework
> > > '201307040929-252063498-5050-27411-0000': Future discarded
> > > W0709 00:30:40.745380 10992 monitor.cpp:186] Failed to collect resource
> > > usage for executor 'executor_Task_Tracker_224' of framework
> > > '201307040929-252063498-5050-27411-0000': Future discarded
> > > W0709 00:30:41.789358 10992 monitor.cpp:186] Failed to collect resource
> > > usage for executor 'executor_Task_Tracker_230' of framework
> > > '201307040929-252063498-5050-27411-0000': Future discarded
> > > I0709 00:30:45.256590 11004 slave.cpp:2511] Current usage 58.11%. Max
> > > allowed age: 2.232469873049410days
> > >
> > >
> > > Guodong
> > >
> > >
> > > On Tue, Jul 9, 2013 at 4:55 AM, Benjamin Mahler <
> > benjamin.mahler@gmail.com
> > > > wrote:
> > >
> > >> Are these the un-edited logs? I'm expecting to see some logs from the
> > >> process_isolator or cgroups_isolator in there.
> > >>
> > >>
> > >> On Fri, Jul 5, 2013 at 2:38 PM, Brenden Matthews <
> > >> brenden.matthews@airbedandbreakfast.com> wrote:
> > >>
> > >> > Hey guys,
> > >> >
> > >> > I'm currently having a problem where tasks will get stuck in the
> > staging
> > >> > state, though according to the logs they should have been
> terminated.
> > >>  They
> > >> > hang indefinitely, or until I restart the slave.  Below is a
> > screenshot
> > >> +
> > >> > logs.  Also interesting is the 'Failed to collect resource usage
> ...'
> > >> > messages.
> > >> >
> > >> > [image: Inline image 2]
> > >> >
> > >> > I0705 16:19:51.551512  9706 slave.cpp:739] Got assigned task
> > >> >> ct:1373041190990:0:add_latest_reservation_survey_events_partitio
> > >> >> n for framework chronos
> > >> >> I0705 16:19:51.552150  9706 slave.cpp:837] Launching task
> > >> >> ct:1373041190990:0:add_latest_reservation_survey_events_partition f
> > >> >> or framework chronos
> > >> >> I0705 16:19:51.553956  9706 paths.hpp:303] Created executor
> directory
> > >> >> '/tmp/mesos/slaves/201307030043-2037266954-5050-15277-1
> > >> >>
> > >> >>
> > >>
> >
> 517/frameworks/chronos/executors/ct:1373041190990:0:add_latest_reservation_survey_events_partition/runs/611ba128-557f-4b5e-8c
> > >> >> f2-4d1ce60d618f'
> > >> >> I0705 16:19:51.554576  9706 slave.cpp:948] Queuing task
> > >> >>
> 'ct:1373041190990:0:add_latest_reservation_survey_events_partition' f
> > >> >> or executor
> > >> >> ct:1373041190990:0:add_latest_reservation_survey_events_partition
> of
> > >> >> framework 'c
> > >> >> hronos
> > >> >> I0705 16:19:51.555027  9706 slave.cpp:511] Successfully attached
> file
> > >> >>
> > >>
> >
> '/tmp/mesos/slaves/201307030043-2037266954-5050-15277-1517/frameworks/chronos/executors/ct:1373041190990:0:add_latest_reservation_survey_events_partition/runs/611ba128-557f-4b5e-8cf2-4d1ce60d618f'
> > >> >> I0705 16:19:54.048754  9724 slave.cpp:2530] Current usage 42.18%.
> Max
> > >> >> allowed age: 22.955009563956388hrs
> > >> >> W0705 16:19:54.108963  9724 monitor.cpp:186] Failed to collect
> > resource
> > >> >> usage for executor 'executor_Task_Tracker_8023' of framework
> > >> >> '201307030043-2037266954-5050-15277-0006': Future discarded
> > >> >> W0705 16:19:59.110787  9729 monitor.cpp:186] Failed to collect
> > resource
> > >> >> usage for executor 'executor_Task_Tracker_8023' of framework
> > >> >> '201307030043-2037266954-5050-15277-0006': Future discarded
> > >> >> W0705 16:20:04.112406  9704 monitor.cpp:186] Failed to collect
> > >> resource
> > >> >> usage for executor 'executor_Task_Tracker_8023' of framework
> > >> >> '201307030043-2037266954-5050-15277-0006': Future discarded
> > >> >> W0705 16:20:09.114367  9705 monitor.cpp:186] Failed to collect
> > resource
> > >> >> usage for executor 'executor_Task_Tracker_8023' of framework
> > >> >> '201307030043-2037266954-5050-15277-0006': Future discarded
> > >> >> W0705 16:20:14.116312  9706 monitor.cpp:186] Failed to collect
> > resource
> > >> >> usage for executor 'executor_Task_Tracker_8023' of framework
> > >> >> '201307030043-2037266954-5050-15277-0006': Future discarded
> > >> >> W0705 16:20:19.118370  9699 monitor.cpp:186] Failed to collect
> > resource
> > >> >> usage for executor 'executor_Task_Tracker_8023' of framework
> > >> >> '201307030043-2037266954-5050-15277-0006': Future discarded
> > >> >> W0705 16:20:24.120311  9701 monitor.cpp:186] Failed to collect
> > resource
> > >> >> usage for executor 'executor_Task_Tracker_8023' of framework
> > >> >> '201307030043-2037266954-5050-15277-0006': Future discarded
> > >> >> W0705 16:20:29.122355  9700 monitor.cpp:186] Failed to collect
> > resource
> > >> >> usage for executor 'executor_Task_Tracker_8023' of framework
> > >> >> '201307030043-2037266954-5050-15277-0006': Future discarded
> > >> >> W0705 16:20:34.123443  9722 monitor.cpp:186] Failed to collect
> > resource
> > >> >> usage for executor 'executor_Task_Tracker_8023' of framework
> > >> >> '201307030043-2037266954-5050-15277-0006': Future discarded
> > >> >> W0705 16:20:39.125660  9718 monitor.cpp:186] Failed to collect
> > resource
> > >> >> usage for executor 'executor_Task_Tracker_8023' of framework
> > >> >> '201307030043-2037266954-5050-15277-0006': Future discarded
> > >> >> W0705 16:20:44.127464  9724 monitor.cpp:186] Failed to collect
> > resource
> > >> >> usage for executor 'executor_Task_Tracker_8023' of framework
> > >> >> '201307030043-2037266954-5050-15277-0006': Future discarded
> > >> >> W0705 16:20:49.129385  9725 monitor.cpp:186] Failed to collect
> > resource
> > >> >> usage for executor 'executor_Task_Tracker_8023' of framework
> > >> >> '201307030043-2037266954-5050-15277-0006': Future discarded
> > >> >> I0705 16:20:51.555174  9703 slave.cpp:2482] Terminating executor
> > >> >> ct:1373041190990:0:add_latest_reservation_survey_events_partition
> of
> > >> >> framework chronos because it did not register within 1mins
> > >> >> I0705 16:20:54.050434  9717 slave.cpp:2530] Current usage 42.18%.
> Max
> > >> >> allowed age: 22.955009342481944hrs
> > >> >> W0705 16:20:54.130730  9699 monitor.cpp:186] Failed to collect
> > resource
> > >> >> usage for executor 'executor_Task_Tracker_8023' of framework
> > >> >> '201307030043-2037266954-5050-15277-0006': Future discarded
> > >> >> W0705 16:20:59.132472  9702 monitor.cpp:186] Failed to collect
> > resource
> > >> >> usage for executor 'executor_Task_Tracker_8023' of framework
> > >> >> '201307030043-2037266954-5050-15277-0006': Future discarded
> > >> >> W0705 16:21:04.134557  9713 monitor.cpp:186] Failed to collect
> > resource
> > >> >> usage for executor 'executor_Task_Tracker_8023' of framework
> > >> >> '201307030043-2037266954-5050-15277-0006': Future discarded
> > >> >> W0705 16:21:09.135619  9701 monitor.cpp:186] Failed to collect
> > resource
> > >> >> usage for executor 'executor_Task_Tracker_8023' of framework
> > >> >> '201307030043-2037266954-5050-15277-0006': Future discarded
> > >> >
> > >> >
> > >> >
> > >>
> > >
> > >
> >
>

Re: Tasks stuck in 'STAGING'

Posted by Vinod Kone <vi...@gmail.com>.
Hey Guodong, which of these tasks is stuck in STAGING? The corresponding
master logs would also be helpful here. Also, which version of Mesos are you
running?
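
One way to answer the "which task is stuck" question from the slave side is to scan the slave log for the last state recorded per task. The following is only a minimal sketch, not part of Mesos: the function name and regular expressions are hypothetical, they assume the log lines have been unwrapped (one glog entry per line, unlike the mail-wrapped excerpts in this thread), and they only recognise the "Got assigned task" and "Handling status update" messages quoted here. A task that was assigned but never produced a status update is reported as TASK_STAGING. This reflects the slave's view only; the master's view may differ.

```python
import re

# glog prefix: severity letter + MMDD, time, thread id, "file:line]".
GLOG = re.compile(r"^[IWEF]\d{4} \d{2}:\d{2}:\d{2}\.\d+\s+\d+ [\w.]+:\d+\] (.*)$")
# Message shapes copied from the slave.cpp lines quoted in this thread.
ASSIGNED = re.compile(r"Got assigned task (\S+) for framework (\S+)")
UPDATE = re.compile(
    r"Handling status update (\w+) \(UUID: [0-9a-f-]+\) "
    r"for task (\S+) of framework (\S+)"
)


def last_known_states(lines):
    """Map (framework_id, task_id) to the last state seen in the slave log.

    Hypothetical helper: assumes each element of `lines` is one complete,
    unwrapped glog entry. Lines that do not match are ignored.
    """
    states = {}
    for line in lines:
        match = GLOG.match(line.strip())
        if not match:
            continue
        message = match.group(1)
        assigned = ASSIGNED.search(message)
        if assigned:
            # Assigned but no update seen yet: treat as still staging.
            states.setdefault((assigned.group(2), assigned.group(1)), "TASK_STAGING")
            continue
        update = UPDATE.search(message)
        if update:
            states[(update.group(3), update.group(2))] = update.group(1)
    return states


def stuck_in_staging(lines):
    """Return the sorted (framework_id, task_id) pairs with no status update."""
    return sorted(k for k, v in last_known_states(lines).items() if v == "TASK_STAGING")
```

Calling `stuck_in_staging(open(path))` on an unwrapped slave log (where `path` is whatever file your slave writes to) would list the tasks that never reported a status update to the slave.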


On Mon, Jul 8, 2013 at 11:02 PM, 王国栋 <wa...@gmail.com> wrote:

> Interestingly, the slave log also contains the following entries.
>
> I0709 00:33:43.833853 11002 slave.cpp:996] Asked to kill task
> Task_Tracker_224 of framework 201307040929-252063498-5050-27411-0000
> I0709 00:33:43.835552 11006 slave.cpp:996] Asked to kill task
> Task_Tracker_230 of framework 201307040929-252063498-5050-27411-0000
> I0709 00:33:43.972771 10994 slave.cpp:1692] Handling status update
> TASK_FINISHED (UUID: 372081cc-edf2-4183-a461-9345ab6d279c) for task
> Task_Tracker_230 of framework 201307040929-252063498-5050-27411-0000
> from executor(1)@10.47.6.21:27786
> I0709 00:33:43.973132 10994 status_update_manager.cpp:290] Received status
> update TASK_FINISHED (UUID: 372081cc-edf2-4183-a461-9345ab6d279c) for task
> Task_Tracker_230 of framework 201307040929-252063498-5
> 050-27411-0000 with checkpoint=false
> I0709 00:33:43.973192 10994 status_update_manager.cpp:336] Forwarding
> status update TASK_FINISHED (UUID: 372081cc-edf2-4183-a461-9345ab6d279c)
> for task Task_Tracker_230 of framework 201307040929-252063498
> -5050-27411-0000 to master@10.47.6.15:5050
> I0709 00:33:43.973846 11005 slave.cpp:1809] Sending acknowledgement for
> status update TASK_FINISHED (UUID: 372081cc-edf2-4183-a461-9345ab6d279c)
> for task Task_Tracker_230 of framework 201307040929-2520634
> 98-5050-27411-0000 to executor(1)@10.47.6.21:27786
> I0709 00:33:43.974591 11000 status_update_manager.cpp:360] Received status
> update acknowledgement 372081cc-edf2-4183-a461-9345ab6d279c for task
> Task_Tracker_230 of framework 201307040929-252063498-5050-27
> 411-0000
> I0709 00:33:43.974652 11000 status_update_manager.cpp:481] Cleaning up
> status update stream for task Task_Tracker_230 of framework
> 201307040929-252063498-5050-27411-0000
> I0709 00:33:44.090603 11003 slave.cpp:1692] Handling status update
> TASK_FINISHED (UUID: 61d5775a-2375-412a-a5a4-80ab55163d88) for task
> Task_Tracker_224 of framework 201307040929-252063498-5050-27411-0000
> from executor(1)@10.47.6.21:2310
> I0709 00:33:44.090860 11003 status_update_manager.cpp:290] Received status
> update TASK_FINISHED (UUID: 61d5775a-2375-412a-a5a4-80ab55163d88) for task
> Task_Tracker_224 of framework 201307040929-252063498-5
> 050-27411-0000 with checkpoint=false
> I0709 00:33:44.090973 11003 status_update_manager.cpp:336] Forwarding
> status update TASK_FINISHED (UUID: 61d5775a-2375-412a-a5a4-80ab55163d88)
> for task Task_Tracker_224 of framework 201307040929-252063498
> -5050-27411-0000 to master@10.47.6.15:5050
> I0709 00:33:44.091279 11003 slave.cpp:1809] Sending acknowledgement for
> status update TASK_FINISHED (UUID: 61d5775a-2375-412a-a5a4-80ab55163d88)
> for task Task_Tracker_224 of framework 201307040929-2520634
> 98-5050-27411-0000 to executor(1)@10.47.6.21:2310
> I0709 00:33:44.093286 11003 status_update_manager.cpp:360] Received status
> update acknowledgement 61d5775a-2375-412a-a5a4-80ab55163d88 for task
> Task_Tracker_224 of framework 201307040929-252063498-5050-27
> 411-0000
> I0709 00:33:44.093359 11003 status_update_manager.cpp:481] Cleaning up
> status update stream for task Task_Tracker_224 of framework
> 201307040929-252063498-5050-27411-0000
> I0709 00:33:45.259831 10997 slave.cpp:2511] Current usage 57.44%. Max
> allowed age: 2.279168852469954days
> W0709 00:33:45.949470 10996 monitor.cpp:186] Failed to collect resource
> usage for executor 'executor_Task_Tracker_224' of framework
> '201307040929-252063498-5050-27411-0000': Future discarded
> W0709 00:33:47.063181 11005 monitor.cpp:186] Failed to collect resource
> usage for executor 'executor_Task_Tracker_230' of framework
> '201307040929-252063498-5050-27411-0000': Future discarded
> W0709 00:33:50.950412 11000 monitor.cpp:186] Failed to collect resource
> usage for executor 'executor_Task_Tracker_224' of framework
> '201307040929-252063498-5050-27411-0000': Future discarded
> W0709 00:33:52.063576 10993 monitor.cpp:186] Failed to collect resource
> usage for executor 'executor_Task_Tracker_230' of framework
> '201307040929-252063498-5050-27411-0000': Future discarded
> W0709 00:33:55.951427 11003 monitor.cpp:186] Failed to collect resource
> usage for executor 'executor_Task_Tracker_224' of framework
> '201307040929-252063498-5050-27411-0000': Future discarded
> W0709 00:33:57.064575 10998 monitor.cpp:186] Failed to collect resource
> usage for executor 'executor_Task_Tracker_230' of framework
> '201307040929-252063498-5050-27411-0000': Future discarded
> W0709 00:34:00.952390 11003 monitor.cpp:186] Failed to collect resource
> usage for executor 'executor_Task_Tracker_224' of framework
> '201307040929-252063498-5050-27411-0000': Future discarded
> W0709 00:34:02.065218 10998 monitor.cpp:186] Failed to collect resource
> usage for executor 'executor_Task_Tracker_230' of framework
> '201307040929-252063498-5050-27411-0000': Future discarded
> W0709 00:34:05.953456 10995 monitor.cpp:186] Failed to collect resource
> usage for executor 'executor_Task_Tracker_224' of framework
> '201307040929-252063498-5050-27411-0000': Future discarded
> W0709 00:34:07.066515 10995 monitor.cpp:186] Failed to collect resource
> usage for executor 'executor_Task_Tracker_230' of framework
> '201307040929-252063498-5050-27411-0000': Future discarded
> W0709 00:34:10.954479 10998 monitor.cpp:186] Failed to collect resource
> usage for executor 'executor_Task_Tracker_224' of framework
> '201307040929-252063498-5050-27411-0000': Future discarded
> W0709 00:34:12.067471 11005 monitor.cpp:186] Failed to collect resource
> usage for executor 'executor_Task_Tracker_230' of framework
> '201307040929-252063498-5050-27411-0000': Future discarded
> W0709 00:34:15.955461 10996 monitor.cpp:186] Failed to collect resource
> usage for executor 'executor_Task_Tracker_224' of framework
> '201307040929-252063498-5050-27411-0000': Future discarded
> W0709 00:34:17.068209 10996 monitor.cpp:186] Failed to collect resource
> usage for executor 'executor_Task_Tracker_230' of framework
> '201307040929-252063498-5050-27411-0000': Future discarded
>
>
>
> Guodong
>
>
> On Tue, Jul 9, 2013 at 1:59 PM, 王国栋 <wa...@gmail.com> wrote:
>
> > Hi Ben,
> >
> > I ran into the same issue here.
> >
> > This also happens in our Hadoop framework. The slave log is shown below.
> > I think the workload on the node was very high at that time.
> >
> > I0708 23:36:44.253880 11005 slave.cpp:739] Got assigned task
> > Task_Tracker_224 for framework 201307040929-252063498-5050-27411-0000
> > I0708 23:36:44.255221 10999 gc.cpp:84] Unscheduling
> >
> '/data/mesos-slave-work-dir//slaves/201307041648-252063498-5050-8038-5/frameworks/201307040929-252063498-5050-27411-0000'
> > for removal
> > I0708 23:36:44.256206 11001 slave.cpp:837] Launching task
> Task_Tracker_224
> > for framework 201307040929-252063498-5050-27411-0000
> > I0708 23:36:44.258117 11001 paths.hpp:303] Created executor directory
> >
> '/data/mesos-slave-work-dir//slaves/201307041648-252063498-5050-8038-5/frameworks/201307040929-252063498-5050-27411-0000/executors/exe
> > cutor_Task_Tracker_224/runs/953d3565-424c-4ab3-9926-a3fa71042bf1'
> > I0708 23:36:44.258467 10991 process_isolator.cpp:99] Launching
> > executor_Task_Tracker_224 (cd hadoop && ./bin/mesos-executor) in
> >
> /data/mesos-slave-work-dir//slaves/201307041648-252063498-5050-8038-5/framew
> >
> orks/201307040929-252063498-5050-27411-0000/executors/executor_Task_Tracker_224/runs/953d3565-424c-4ab3-9926-a3fa71042bf1
> > with resources cpus=1; mem=1280' for framework
> > 201307040929-252063498-5050-27411-0
> > 000
> > I0708 23:36:44.258496 11001 slave.cpp:948] Queuing task
> 'Task_Tracker_224'
> > for executor executor_Task_Tracker_224 of framework
> > '201307040929-252063498-5050-27411-0000
> > I0708 23:36:44.261446 10991 process_isolator.cpp:161] Forked executor at
> > 2220
> > I0708 23:36:44.261787 10996 slave.cpp:511] Successfully attached file
> >
> '/data/mesos-slave-work-dir//slaves/201307041648-252063498-5050-8038-5/frameworks/201307040929-252063498-5050-27411-0000/executors/exe
> > cutor_Task_Tracker_224/runs/953d3565-424c-4ab3-9926-a3fa71042bf1'
> > I0708 23:36:44.580497 10996 slave.cpp:2511] Current usage 57.21%. Max
> > allowed age: 2.295155852123924days
> > I0708 23:36:44.750393 11002 slave.cpp:1395] Got registration for executor
> > 'executor_Task_Tracker_224' of framework
> > 201307040929-252063498-5050-27411-0000
> > I0708 23:36:44.751095 11002 slave.cpp:1510] Flushing queued task
> > Task_Tracker_224 for executor 'executor_Task_Tracker_224' of framework
> > 201307040929-252063498-5050-27411-0000
> > I0708 23:36:46.144317 11006 slave.cpp:1692] Handling status update
> > TASK_RUNNING (UUID: 364ee347-f6a2-4c7b-8702-460aa0ece579) for task
> > Task_Tracker_224 of framework 201307040929-252063498-5050-27411-0000 f
> > rom executor(1)@10.47.6.21:2310
> > I0708 23:36:46.144745 11006 status_update_manager.cpp:290] Received
> status
> > update TASK_RUNNING (UUID: 364ee347-f6a2-4c7b-8702-460aa0ece579) for task
> > Task_Tracker_224 of framework 201307040929-252063498-50
> > 50-27411-0000 with checkpoint=false
> > I0708 23:36:46.144821 11006 status_update_manager.cpp:450] Creating
> > StatusUpdate stream for task Task_Tracker_224 of framework
> > 201307040929-252063498-5050-27411-0000
> > I0708 23:36:46.145076 11006 status_update_manager.cpp:336] Forwarding
> > status update TASK_RUNNING (UUID: 364ee347-f6a2-4c7b-8702-460aa0ece579)
> for
> > task Task_Tracker_224 of framework 201307040929-252063498-
> > 5050-27411-0000 to master@10.47.6.15:5050
> > I0708 23:36:46.145882 10997 slave.cpp:1809] Sending acknowledgement for
> > status update TASK_RUNNING (UUID: 364ee347-f6a2-4c7b-8702-460aa0ece579)
> for
> > task Task_Tracker_224 of framework 201307040929-25206349
> > 8-5050-27411-0000 to executor(1)@10.47.6.21:2310
> > I0708 23:36:46.146870 10993 status_update_manager.cpp:360] Received status update acknowledgement 364ee347-f6a2-4c7b-8702-460aa0ece579 for task Task_Tracker_224 of framework 201307040929-252063498-5050-27411-0000
> > I0708 23:36:50.258347 11005 slave.cpp:739] Got assigned task Task_Tracker_230 for framework 201307040929-252063498-5050-27411-0000
> > I0708 23:36:50.259472 11005 slave.cpp:837] Launching task Task_Tracker_230 for framework 201307040929-252063498-5050-27411-0000
> > I0708 23:36:50.261641 11005 paths.hpp:303] Created executor directory '/data/mesos-slave-work-dir//slaves/201307041648-252063498-5050-8038-5/frameworks/201307040929-252063498-5050-27411-0000/executors/executor_Task_Tracker_230/runs/1dc87acc-d090-469f-ba30-0477139ee7fd'
> > I0708 23:36:50.262265 11005 slave.cpp:948] Queuing task 'Task_Tracker_230' for executor executor_Task_Tracker_230 of framework '201307040929-252063498-5050-27411-0000
> > I0708 23:36:50.262557 11005 process_isolator.cpp:99] Launching executor_Task_Tracker_230 (cd hadoop && ./bin/mesos-executor) in /data/mesos-slave-work-dir//slaves/201307041648-252063498-5050-8038-5/frameworks/201307040929-252063498-5050-27411-0000/executors/executor_Task_Tracker_230/runs/1dc87acc-d090-469f-ba30-0477139ee7fd with resources cpus=1; mem=1280' for framework 201307040929-252063498-5050-27411-0000
> > I0708 23:36:50.265396 10999 slave.cpp:511] Successfully attached file '/data/mesos-slave-work-dir//slaves/201307041648-252063498-5050-8038-5/frameworks/201307040929-252063498-5050-27411-0000/executors/executor_Task_Tracker_230/runs/1dc87acc-d090-469f-ba30-0477139ee7fd'
> > I0708 23:36:50.265419 11005 process_isolator.cpp:161] Forked executor at 2851
> > I0708 23:36:50.835607 10995 slave.cpp:1395] Got registration for executor 'executor_Task_Tracker_230' of framework 201307040929-252063498-5050-27411-0000
> > I0708 23:36:50.836174 10995 slave.cpp:1510] Flushing queued task Task_Tracker_230 for executor 'executor_Task_Tracker_230' of framework 201307040929-252063498-5050-27411-0000
> > I0708 23:36:54.617856 10994 slave.cpp:1692] Handling status update TASK_RUNNING (UUID: 7753252d-c90b-4b0d-adca-7c97f38f692e) for task Task_Tracker_230 of framework 201307040929-252063498-5050-27411-0000 from executor(1)@10.47.6.21:27786
> > I0708 23:36:54.618275 10994 status_update_manager.cpp:290] Received status update TASK_RUNNING (UUID: 7753252d-c90b-4b0d-adca-7c97f38f692e) for task Task_Tracker_230 of framework 201307040929-252063498-5050-27411-0000 with checkpoint=false
> > I0708 23:36:54.618326 10994 status_update_manager.cpp:450] Creating StatusUpdate stream for task Task_Tracker_230 of framework 201307040929-252063498-5050-27411-0000
> > I0708 23:36:54.618443 10994 status_update_manager.cpp:336] Forwarding status update TASK_RUNNING (UUID: 7753252d-c90b-4b0d-adca-7c97f38f692e) for task Task_Tracker_230 of framework 201307040929-252063498-5050-27411-0000 to master@10.47.6.15:5050
> > I0708 23:36:54.619137 10994 slave.cpp:1809] Sending acknowledgement for status update TASK_RUNNING (UUID: 7753252d-c90b-4b0d-adca-7c97f38f692e) for task Task_Tracker_230 of framework 201307040929-252063498-5050-27411-0000 to executor(1)@10.47.6.21:27786
> > I0708 23:36:54.637682 10994 status_update_manager.cpp:360] Received status update acknowledgement 7753252d-c90b-4b0d-adca-7c97f38f692e for task Task_Tracker_230 of framework 201307040929-252063498-5050-27411-0000
> > I0708 23:37:44.583014 11002 slave.cpp:2511] Current usage 57.23%. Max
> > allowed age: 2.293704423241597days
> > I0708 23:38:44.585233 11003 slave.cpp:2511] Current usage 57.23%. Max
> > allowed age: 2.293703916528542days
> > I0708 23:39:44.599442 11006 slave.cpp:2511] Current usage 57.23%. Max
> > allowed age: 2.293639867998055days
> > I0708 23:40:44.603996 10997 slave.cpp:2511] Current usage 57.24%. Max
> > allowed age: 2.292921551567535days
> > I0708 23:41:44.608608 11006 slave.cpp:2511] Current usage 57.26%. Max
> > allowed age: 2.291521098018820days
> > I0708 23:42:44.609956 10992 slave.cpp:2511] Current usage 57.23%. Max
> > allowed age: 2.293668041244063days
> > I0708 23:43:44.682621 11000 slave.cpp:2511] Current usage 57.24%. Max
> > allowed age: 2.292935638190544days
> > I0708 23:44:44.684306 10993 slave.cpp:2511] Current usage 57.24%. Max
> > allowed age: 2.292916079066516days
> > I0708 23:45:44.686172 11001 slave.cpp:2511] Current usage 57.26%. Max
> > allowed age: 2.291485324076945days
> > I0708 23:46:44.699095 10995 slave.cpp:2511] Current usage 57.23%. Max
> > allowed age: 2.293641894850289days
> > I0708 23:47:44.721156 10998 slave.cpp:2511] Current usage 57.23%. Max
> > allowed age: 2.293629429709074days
> > I0708 23:48:44.779767 10992 slave.cpp:2511] Current usage 57.24%. Max
> > allowed age: 2.293525350847025days
> > I0708 23:49:44.812389 11004 slave.cpp:2511] Current usage 57.24%. Max
> > allowed age: 2.292909289111539days
> > I0708 23:50:44.814146 10999 slave.cpp:2511] Current usage 57.27%. Max
> > allowed age: 2.291438098419977days
> > I0708 23:51:44.814877 11005 slave.cpp:2511] Current usage 57.23%. Max
> > allowed age: 2.293635104895313days
> > I0708 23:52:44.818620 10998 slave.cpp:2511] Current usage 57.24%. Max
> > allowed age: 2.292983775931019days
> > I0708 23:53:44.829911 10997 slave.cpp:2511] Current usage 57.33%. Max
> > allowed age: 2.286910009194236days
> > I0708 23:54:44.831307 10999 slave.cpp:2511] Current usage 57.33%. Max
> > allowed age: 2.286909502481169days
> > I0708 23:55:44.902858 10994 slave.cpp:2511] Current usage 57.37%. Max
> > allowed age: 2.284414244700093days
> > I0708 23:56:44.905398 11002 slave.cpp:2511] Current usage 57.42%. Max
> > allowed age: 2.280636901540567days
> > I0708 23:57:44.933673 10991 slave.cpp:2511] Current usage 57.44%. Max
> > allowed age: 2.279481899796968days
> > I0708 23:58:44.934840 11004 slave.cpp:2511] Current usage 57.48%. Max
> > allowed age: 2.276566475548496days
> > I0708 23:59:44.936063 11001 slave.cpp:2511] Current usage 57.49%. Max
> > allowed age: 2.275690368671817days
> > I0709 00:00:44.937433 11004 slave.cpp:2511] Current usage 57.50%. Max
> > allowed age: 2.275057180034989days
> > I0709 00:01:44.938940 11001 slave.cpp:2511] Current usage 57.51%. Max
> > allowed age: 2.273999467198449days
> > I0709 00:02:44.955103 10996 slave.cpp:2511] Current usage 57.52%. Max
> > allowed age: 2.273472384275891days
> > I0709 00:03:44.956354 10993 slave.cpp:2511] Current usage 57.39%. Max
> > allowed age: 2.282894612240220days
> > I0709 00:04:44.957926 10997 slave.cpp:2511] Current usage 57.40%. Max
> > allowed age: 2.281966516603831days
> > I0709 00:05:44.969205 10996 slave.cpp:2511] Current usage 57.40%. Max
> > allowed age: 2.281962260214144days
> > I0709 00:06:44.969987 10992 slave.cpp:2511] Current usage 57.40%. Max
> > allowed age: 2.281791801941551days
> > I0709 00:07:44.977504 11004 slave.cpp:2511] Current usage 57.40%. Max
> > allowed age: 2.281715288269849days
> > I0709 00:08:44.982868 10998 slave.cpp:2511] Current usage 57.40%. Max
> > allowed age: 2.281699782850289days
> > I0709 00:09:44.997082 11000 slave.cpp:2511] Current usage 57.42%. Max
> > allowed age: 2.280776044946192days
> > I0709 00:10:44.998754 10994 slave.cpp:2511] Current usage 57.42%. Max
> > allowed age: 2.280772193926956days
> > I0709 00:11:44.999949 11002 slave.cpp:2511] Current usage 57.44%. Max
> > allowed age: 2.279204525069213days
> > I0709 00:12:45.001539 10995 slave.cpp:2511] Current usage 57.47%. Max
> > allowed age: 2.277132676719109days
> > I0709 00:13:45.002728 10992 slave.cpp:2511] Current usage 57.43%. Max
> > allowed age: 2.280012428368322days
> > I0709 00:14:45.009699 10998 slave.cpp:2511] Current usage 57.48%. Max
> > allowed age: 2.276733690857512days
> > I0709 00:15:45.013483 10996 slave.cpp:2511] Current usage 57.53%. Max
> > allowed age: 2.272715152282546days
> > I0709 00:16:45.015496 10998 slave.cpp:2511] Current usage 57.57%. Max
> > allowed age: 2.270354274804352days
> > I0709 00:17:45.016628 11000 slave.cpp:2511] Current usage 57.62%. Max
> > allowed age: 2.266927678423322days
> > I0709 00:18:45.032670 11002 slave.cpp:2511] Current usage 57.65%. Max
> > allowed age: 2.264218182361482days
> > I0709 00:19:45.043442 10998 slave.cpp:2511] Current usage 57.69%. Max
> > allowed age: 2.261509598383137days
> > I0709 00:20:45.080648 10992 slave.cpp:2511] Current usage 57.72%. Max
> > allowed age: 2.259379478031400days
> > I0709 00:21:45.081632 10995 slave.cpp:2511] Current usage 57.77%. Max
> > allowed age: 2.255819920144039days
> > I0709 00:22:45.082593 11005 slave.cpp:2511] Current usage 57.81%. Max
> > allowed age: 2.253314528101817days
> > I0709 00:23:45.193588 10997 slave.cpp:2511] Current usage 57.85%. Max
> > allowed age: 2.250524870034248days
> > I0709 00:24:45.220617 10994 slave.cpp:2511] Current usage 57.90%. Max
> > allowed age: 2.246784618270532days
> > I0709 00:25:45.241602 10992 slave.cpp:2511] Current usage 57.97%. Max
> > allowed age: 2.242399422127049days
> > I0709 00:26:45.248977 11000 slave.cpp:2511] Current usage 58.00%. Max
> > allowed age: 2.240250654734792days
> > I0709 00:27:45.250953 10993 slave.cpp:2511] Current usage 57.99%. Max
> > allowed age: 2.240516983117894days
> > I0709 00:28:45.252694 10996 slave.cpp:2511] Current usage 58.06%. Max
> > allowed age: 2.235834143724352days
> > I0709 00:29:45.254992 11003 slave.cpp:2511] Current usage 58.10%. Max
> > allowed age: 2.233297436815162days
> > W0709 00:30:06.753098 10999 monitor.cpp:186] Failed to collect resource
> > usage for executor 'executor_Task_Tracker_230' of framework
> > '201307040929-252063498-5050-27411-0000': Future discarded
> > W0709 00:30:10.715373 10996 monitor.cpp:186] Failed to collect resource
> > usage for executor 'executor_Task_Tracker_224' of framework
> > '201307040929-252063498-5050-27411-0000': Future discarded
> > W0709 00:30:11.754446 11003 monitor.cpp:186] Failed to collect resource
> > usage for executor 'executor_Task_Tracker_230' of framework
> > '201307040929-252063498-5050-27411-0000': Future discarded
> > W0709 00:30:15.719880 11003 monitor.cpp:186] Failed to collect resource
> > usage for executor 'executor_Task_Tracker_224' of framework
> > '201307040929-252063498-5050-27411-0000': Future discarded
> > W0709 00:30:16.755473 11003 monitor.cpp:186] Failed to collect resource
> > usage for executor 'executor_Task_Tracker_230' of framework
> > '201307040929-252063498-5050-27411-0000': Future discarded
> > W0709 00:30:20.720330 11003 monitor.cpp:186] Failed to collect resource
> > usage for executor 'executor_Task_Tracker_224' of framework
> > '201307040929-252063498-5050-27411-0000': Future discarded
> > W0709 00:30:21.766019 11003 monitor.cpp:186] Failed to collect resource
> > usage for executor 'executor_Task_Tracker_230' of framework
> > '201307040929-252063498-5050-27411-0000': Future discarded
> > W0709 00:30:25.721364 11003 monitor.cpp:186] Failed to collect resource
> > usage for executor 'executor_Task_Tracker_224' of framework
> > '201307040929-252063498-5050-27411-0000': Future discarded
> > W0709 00:30:26.768874 11003 monitor.cpp:186] Failed to collect resource
> > usage for executor 'executor_Task_Tracker_230' of framework
> > '201307040929-252063498-5050-27411-0000': Future discarded
> > W0709 00:30:30.722605 11003 monitor.cpp:186] Failed to collect resource
> > usage for executor 'executor_Task_Tracker_224' of framework
> > '201307040929-252063498-5050-27411-0000': Future discarded
> > W0709 00:30:31.770354 11003 monitor.cpp:186] Failed to collect resource
> > usage for executor 'executor_Task_Tracker_230' of framework
> > '201307040929-252063498-5050-27411-0000': Future discarded
> > W0709 00:30:35.724455 10992 monitor.cpp:186] Failed to collect resource
> > usage for executor 'executor_Task_Tracker_224' of framework
> > '201307040929-252063498-5050-27411-0000': Future discarded
> > W0709 00:30:36.788751 10992 monitor.cpp:186] Failed to collect resource
> > usage for executor 'executor_Task_Tracker_230' of framework
> > '201307040929-252063498-5050-27411-0000': Future discarded
> > W0709 00:30:40.745380 10992 monitor.cpp:186] Failed to collect resource
> > usage for executor 'executor_Task_Tracker_224' of framework
> > '201307040929-252063498-5050-27411-0000': Future discarded
> > W0709 00:30:41.789358 10992 monitor.cpp:186] Failed to collect resource
> > usage for executor 'executor_Task_Tracker_230' of framework
> > '201307040929-252063498-5050-27411-0000': Future discarded
> > I0709 00:30:45.256590 11004 slave.cpp:2511] Current usage 58.11%. Max
> > allowed age: 2.232469873049410days
> >
> >
> > Guodong
> >
> >
> > On Tue, Jul 9, 2013 at 4:55 AM, Benjamin Mahler <benjamin.mahler@gmail.com>
> > wrote:
> >
> >> Are these the un-edited logs? I'm expecting to see some logs from the
> >> process_isolator or cgroups_isolator in there.
> >>
> >>
> >> On Fri, Jul 5, 2013 at 2:38 PM, Brenden Matthews <
> >> brenden.matthews@airbedandbreakfast.com> wrote:
> >>
> >> > Hey guys,
> >> >
> >> > I'm currently having a problem where tasks will get stuck in the staging
> >> > state, though according to the logs they should have been terminated.
> >> > They hang indefinitely, or until I restart the slave.  Below is a
> >> > screenshot + logs.  Also interesting is the 'Failed to collect resource
> >> > usage ...' messages.
> >> >
> >> > [image: Inline image 2]
> >> >
> >> >> I0705 16:19:51.551512  9706 slave.cpp:739] Got assigned task ct:1373041190990:0:add_latest_reservation_survey_events_partition for framework chronos
> >> >> I0705 16:19:51.552150  9706 slave.cpp:837] Launching task ct:1373041190990:0:add_latest_reservation_survey_events_partition for framework chronos
> >> >> I0705 16:19:51.553956  9706 paths.hpp:303] Created executor directory '/tmp/mesos/slaves/201307030043-2037266954-5050-15277-1517/frameworks/chronos/executors/ct:1373041190990:0:add_latest_reservation_survey_events_partition/runs/611ba128-557f-4b5e-8cf2-4d1ce60d618f'
> >> >> I0705 16:19:51.554576  9706 slave.cpp:948] Queuing task 'ct:1373041190990:0:add_latest_reservation_survey_events_partition' for executor ct:1373041190990:0:add_latest_reservation_survey_events_partition of framework 'chronos
> >> >> I0705 16:19:51.555027  9706 slave.cpp:511] Successfully attached file '/tmp/mesos/slaves/201307030043-2037266954-5050-15277-1517/frameworks/chronos/executors/ct:1373041190990:0:add_latest_reservation_survey_events_partition/runs/611ba128-557f-4b5e-8cf2-4d1ce60d618f'
> >> >> I0705 16:19:54.048754  9724 slave.cpp:2530] Current usage 42.18%. Max
> >> >> allowed age: 22.955009563956388hrs
> >> >> W0705 16:19:54.108963  9724 monitor.cpp:186] Failed to collect resource
> >> >> usage for executor 'executor_Task_Tracker_8023' of framework
> >> >> '201307030043-2037266954-5050-15277-0006': Future discarded
> >> >> W0705 16:19:59.110787  9729 monitor.cpp:186] Failed to collect resource
> >> >> usage for executor 'executor_Task_Tracker_8023' of framework
> >> >> '201307030043-2037266954-5050-15277-0006': Future discarded
> >> >> W0705 16:20:04.112406  9704 monitor.cpp:186] Failed to collect resource
> >> >> usage for executor 'executor_Task_Tracker_8023' of framework
> >> >> '201307030043-2037266954-5050-15277-0006': Future discarded
> >> >> W0705 16:20:09.114367  9705 monitor.cpp:186] Failed to collect resource
> >> >> usage for executor 'executor_Task_Tracker_8023' of framework
> >> >> '201307030043-2037266954-5050-15277-0006': Future discarded
> >> >> W0705 16:20:14.116312  9706 monitor.cpp:186] Failed to collect resource
> >> >> usage for executor 'executor_Task_Tracker_8023' of framework
> >> >> '201307030043-2037266954-5050-15277-0006': Future discarded
> >> >> W0705 16:20:19.118370  9699 monitor.cpp:186] Failed to collect resource
> >> >> usage for executor 'executor_Task_Tracker_8023' of framework
> >> >> '201307030043-2037266954-5050-15277-0006': Future discarded
> >> >> W0705 16:20:24.120311  9701 monitor.cpp:186] Failed to collect resource
> >> >> usage for executor 'executor_Task_Tracker_8023' of framework
> >> >> '201307030043-2037266954-5050-15277-0006': Future discarded
> >> >> W0705 16:20:29.122355  9700 monitor.cpp:186] Failed to collect resource
> >> >> usage for executor 'executor_Task_Tracker_8023' of framework
> >> >> '201307030043-2037266954-5050-15277-0006': Future discarded
> >> >> W0705 16:20:34.123443  9722 monitor.cpp:186] Failed to collect resource
> >> >> usage for executor 'executor_Task_Tracker_8023' of framework
> >> >> '201307030043-2037266954-5050-15277-0006': Future discarded
> >> >> W0705 16:20:39.125660  9718 monitor.cpp:186] Failed to collect resource
> >> >> usage for executor 'executor_Task_Tracker_8023' of framework
> >> >> '201307030043-2037266954-5050-15277-0006': Future discarded
> >> >> W0705 16:20:44.127464  9724 monitor.cpp:186] Failed to collect resource
> >> >> usage for executor 'executor_Task_Tracker_8023' of framework
> >> >> '201307030043-2037266954-5050-15277-0006': Future discarded
> >> >> W0705 16:20:49.129385  9725 monitor.cpp:186] Failed to collect resource
> >> >> usage for executor 'executor_Task_Tracker_8023' of framework
> >> >> '201307030043-2037266954-5050-15277-0006': Future discarded
> >> >> I0705 16:20:51.555174  9703 slave.cpp:2482] Terminating executor
> >> >> ct:1373041190990:0:add_latest_reservation_survey_events_partition of
> >> >> framework chronos because it did not register within 1mins
> >> >> I0705 16:20:54.050434  9717 slave.cpp:2530] Current usage 42.18%. Max
> >> >> allowed age: 22.955009342481944hrs
> >> >> W0705 16:20:54.130730  9699 monitor.cpp:186] Failed to collect resource
> >> >> usage for executor 'executor_Task_Tracker_8023' of framework
> >> >> '201307030043-2037266954-5050-15277-0006': Future discarded
> >> >> W0705 16:20:59.132472  9702 monitor.cpp:186] Failed to collect resource
> >> >> usage for executor 'executor_Task_Tracker_8023' of framework
> >> >> '201307030043-2037266954-5050-15277-0006': Future discarded
> >> >> W0705 16:21:04.134557  9713 monitor.cpp:186] Failed to collect resource
> >> >> usage for executor 'executor_Task_Tracker_8023' of framework
> >> >> '201307030043-2037266954-5050-15277-0006': Future discarded
> >> >> W0705 16:21:09.135619  9701 monitor.cpp:186] Failed to collect resource
> >> >> usage for executor 'executor_Task_Tracker_8023' of framework
> >> >> '201307030043-2037266954-5050-15277-0006': Future discarded
> >> >
> >> >
> >> >
> >>
> >
> >
>

Re: Tasks stuck in 'STAGING'

Posted by 王国栋 <wa...@gmail.com>.
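It is also interesting that the slave log contains the entries below. Both
tasks are asked to be killed and report TASK_FINISHED, but the 'Failed to
collect resource usage' warnings continue afterwards.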

I0709 00:33:43.833853 11002 slave.cpp:996] Asked to kill task Task_Tracker_224 of framework 201307040929-252063498-5050-27411-0000
I0709 00:33:43.835552 11006 slave.cpp:996] Asked to kill task Task_Tracker_230 of framework 201307040929-252063498-5050-27411-0000
I0709 00:33:43.972771 10994 slave.cpp:1692] Handling status update TASK_FINISHED (UUID: 372081cc-edf2-4183-a461-9345ab6d279c) for task Task_Tracker_230 of framework 201307040929-252063498-5050-27411-0000 from executor(1)@10.47.6.21:27786
I0709 00:33:43.973132 10994 status_update_manager.cpp:290] Received status update TASK_FINISHED (UUID: 372081cc-edf2-4183-a461-9345ab6d279c) for task Task_Tracker_230 of framework 201307040929-252063498-5050-27411-0000 with checkpoint=false
I0709 00:33:43.973192 10994 status_update_manager.cpp:336] Forwarding status update TASK_FINISHED (UUID: 372081cc-edf2-4183-a461-9345ab6d279c) for task Task_Tracker_230 of framework 201307040929-252063498-5050-27411-0000 to master@10.47.6.15:5050
I0709 00:33:43.973846 11005 slave.cpp:1809] Sending acknowledgement for status update TASK_FINISHED (UUID: 372081cc-edf2-4183-a461-9345ab6d279c) for task Task_Tracker_230 of framework 201307040929-252063498-5050-27411-0000 to executor(1)@10.47.6.21:27786
I0709 00:33:43.974591 11000 status_update_manager.cpp:360] Received status update acknowledgement 372081cc-edf2-4183-a461-9345ab6d279c for task Task_Tracker_230 of framework 201307040929-252063498-5050-27411-0000
I0709 00:33:43.974652 11000 status_update_manager.cpp:481] Cleaning up status update stream for task Task_Tracker_230 of framework 201307040929-252063498-5050-27411-0000
I0709 00:33:44.090603 11003 slave.cpp:1692] Handling status update TASK_FINISHED (UUID: 61d5775a-2375-412a-a5a4-80ab55163d88) for task Task_Tracker_224 of framework 201307040929-252063498-5050-27411-0000 from executor(1)@10.47.6.21:2310
I0709 00:33:44.090860 11003 status_update_manager.cpp:290] Received status update TASK_FINISHED (UUID: 61d5775a-2375-412a-a5a4-80ab55163d88) for task Task_Tracker_224 of framework 201307040929-252063498-5050-27411-0000 with checkpoint=false
I0709 00:33:44.090973 11003 status_update_manager.cpp:336] Forwarding status update TASK_FINISHED (UUID: 61d5775a-2375-412a-a5a4-80ab55163d88) for task Task_Tracker_224 of framework 201307040929-252063498-5050-27411-0000 to master@10.47.6.15:5050
I0709 00:33:44.091279 11003 slave.cpp:1809] Sending acknowledgement for status update TASK_FINISHED (UUID: 61d5775a-2375-412a-a5a4-80ab55163d88) for task Task_Tracker_224 of framework 201307040929-252063498-5050-27411-0000 to executor(1)@10.47.6.21:2310
I0709 00:33:44.093286 11003 status_update_manager.cpp:360] Received status update acknowledgement 61d5775a-2375-412a-a5a4-80ab55163d88 for task Task_Tracker_224 of framework 201307040929-252063498-5050-27411-0000
I0709 00:33:44.093359 11003 status_update_manager.cpp:481] Cleaning up status update stream for task Task_Tracker_224 of framework 201307040929-252063498-5050-27411-0000
I0709 00:33:45.259831 10997 slave.cpp:2511] Current usage 57.44%. Max
allowed age: 2.279168852469954days
W0709 00:33:45.949470 10996 monitor.cpp:186] Failed to collect resource
usage for executor 'executor_Task_Tracker_224' of framework
'201307040929-252063498-5050-27411-0000': Future discarded
W0709 00:33:47.063181 11005 monitor.cpp:186] Failed to collect resource
usage for executor 'executor_Task_Tracker_230' of framework
'201307040929-252063498-5050-27411-0000': Future discarded
W0709 00:33:50.950412 11000 monitor.cpp:186] Failed to collect resource
usage for executor 'executor_Task_Tracker_224' of framework
'201307040929-252063498-5050-27411-0000': Future discarded
W0709 00:33:52.063576 10993 monitor.cpp:186] Failed to collect resource
usage for executor 'executor_Task_Tracker_230' of framework
'201307040929-252063498-5050-27411-0000': Future discarded
W0709 00:33:55.951427 11003 monitor.cpp:186] Failed to collect resource
usage for executor 'executor_Task_Tracker_224' of framework
'201307040929-252063498-5050-27411-0000': Future discarded
W0709 00:33:57.064575 10998 monitor.cpp:186] Failed to collect resource
usage for executor 'executor_Task_Tracker_230' of framework
'201307040929-252063498-5050-27411-0000': Future discarded
W0709 00:34:00.952390 11003 monitor.cpp:186] Failed to collect resource
usage for executor 'executor_Task_Tracker_224' of framework
'201307040929-252063498-5050-27411-0000': Future discarded
W0709 00:34:02.065218 10998 monitor.cpp:186] Failed to collect resource
usage for executor 'executor_Task_Tracker_230' of framework
'201307040929-252063498-5050-27411-0000': Future discarded
W0709 00:34:05.953456 10995 monitor.cpp:186] Failed to collect resource
usage for executor 'executor_Task_Tracker_224' of framework
'201307040929-252063498-5050-27411-0000': Future discarded
W0709 00:34:07.066515 10995 monitor.cpp:186] Failed to collect resource
usage for executor 'executor_Task_Tracker_230' of framework
'201307040929-252063498-5050-27411-0000': Future discarded
W0709 00:34:10.954479 10998 monitor.cpp:186] Failed to collect resource
usage for executor 'executor_Task_Tracker_224' of framework
'201307040929-252063498-5050-27411-0000': Future discarded
W0709 00:34:12.067471 11005 monitor.cpp:186] Failed to collect resource
usage for executor 'executor_Task_Tracker_230' of framework
'201307040929-252063498-5050-27411-0000': Future discarded
W0709 00:34:15.955461 10996 monitor.cpp:186] Failed to collect resource
usage for executor 'executor_Task_Tracker_224' of framework
'201307040929-252063498-5050-27411-0000': Future discarded
W0709 00:34:17.068209 10996 monitor.cpp:186] Failed to collect resource
usage for executor 'executor_Task_Tracker_230' of framework
'201307040929-252063498-5050-27411-0000': Future discarded
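
For anyone who wants to check their own slave log for the same pattern
(monitor warnings that keep coming after a task has already reported a
terminal state), here is a rough Python sketch. It is not part of Mesos. It
assumes one log entry per line (unwrapped), and it assumes the executor id is
"executor_" plus the task id, which is only how the Hadoop framework in these
logs happens to name things. The function name and the regexes are my own.

```python
import re
from collections import defaultdict

# glog-style prefix: severity, MMDD, HH:MM:SS.micros, thread id, file:line]
ENTRY = re.compile(
    r"^[IWEF]\d{4} \d{2}:\d{2}:\d{2}\.\d{6}\s+\d+ [\w.]+:\d+\] (?P<msg>.*)$"
)
# Terminal status update as the slave logs it ("Handling status update ...").
TERMINAL = re.compile(
    r"Handling status update (TASK_FINISHED|TASK_FAILED|TASK_KILLED|TASK_LOST) "
    r".*? for task (\S+) of framework"
)
# Warning emitted by the resource monitor.
MONITOR = re.compile(r"Failed to collect resource usage for executor '([^']+)'")


def stale_monitor_warnings(lines, executor_prefix="executor_"):
    """Count monitor warnings seen after an executor's task went terminal.

    ASSUMPTION: executor id == executor_prefix + task id (true for the
    Hadoop framework logs in this thread, not true in general).
    """
    finished = set()           # executor ids whose task reported a terminal state
    stale = defaultdict(int)   # executor id -> warnings seen after that point
    for line in lines:
        entry = ENTRY.match(line)
        if entry is None:
            continue  # not a log entry (e.g. a wrapped continuation line)
        msg = entry.group("msg")
        terminal = TERMINAL.search(msg)
        if terminal:
            finished.add(executor_prefix + terminal.group(2))
            continue
        warning = MONITOR.search(msg)
        if warning and warning.group(1) in finished:
            stale[warning.group(1)] += 1
    return dict(stale)


if __name__ == "__main__":
    import sys

    # Usage: python stale_monitor.py /path/to/mesos-slave.INFO
    with open(sys.argv[1]) as log:
        for executor, count in sorted(stale_monitor_warnings(log).items()):
            print(executor, count)
```

On the entries above, both executor_Task_Tracker_224 and
executor_Task_Tracker_230 would be reported, since each keeps producing
warnings after its TASK_FINISHED update.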



Guodong


On Tue, Jul 9, 2013 at 1:59 PM, 王国栋 <wa...@gmail.com> wrote:

> Hi Ben,
>
> I ran into the same issue here.
>
> This also happens in our Hadoop framework. The slave log is shown below. I
> think the workload on the node was very high at that time.
>
> I0708 23:36:44.253880 11005 slave.cpp:739] Got assigned task Task_Tracker_224 for framework 201307040929-252063498-5050-27411-0000
> I0708 23:36:44.255221 10999 gc.cpp:84] Unscheduling '/data/mesos-slave-work-dir//slaves/201307041648-252063498-5050-8038-5/frameworks/201307040929-252063498-5050-27411-0000' for removal
> I0708 23:36:44.256206 11001 slave.cpp:837] Launching task Task_Tracker_224 for framework 201307040929-252063498-5050-27411-0000
> I0708 23:36:44.258117 11001 paths.hpp:303] Created executor directory '/data/mesos-slave-work-dir//slaves/201307041648-252063498-5050-8038-5/frameworks/201307040929-252063498-5050-27411-0000/executors/executor_Task_Tracker_224/runs/953d3565-424c-4ab3-9926-a3fa71042bf1'
> I0708 23:36:44.258467 10991 process_isolator.cpp:99] Launching executor_Task_Tracker_224 (cd hadoop && ./bin/mesos-executor) in /data/mesos-slave-work-dir//slaves/201307041648-252063498-5050-8038-5/frameworks/201307040929-252063498-5050-27411-0000/executors/executor_Task_Tracker_224/runs/953d3565-424c-4ab3-9926-a3fa71042bf1 with resources cpus=1; mem=1280' for framework 201307040929-252063498-5050-27411-0000
> I0708 23:36:44.258496 11001 slave.cpp:948] Queuing task 'Task_Tracker_224' for executor executor_Task_Tracker_224 of framework '201307040929-252063498-5050-27411-0000
> I0708 23:36:44.261446 10991 process_isolator.cpp:161] Forked executor at 2220
> I0708 23:36:44.261787 10996 slave.cpp:511] Successfully attached file '/data/mesos-slave-work-dir//slaves/201307041648-252063498-5050-8038-5/frameworks/201307040929-252063498-5050-27411-0000/executors/executor_Task_Tracker_224/runs/953d3565-424c-4ab3-9926-a3fa71042bf1'
> I0708 23:36:44.580497 10996 slave.cpp:2511] Current usage 57.21%. Max
> allowed age: 2.295155852123924days
> I0708 23:36:44.750393 11002 slave.cpp:1395] Got registration for executor 'executor_Task_Tracker_224' of framework 201307040929-252063498-5050-27411-0000
> I0708 23:36:44.751095 11002 slave.cpp:1510] Flushing queued task Task_Tracker_224 for executor 'executor_Task_Tracker_224' of framework 201307040929-252063498-5050-27411-0000
> I0708 23:36:46.144317 11006 slave.cpp:1692] Handling status update TASK_RUNNING (UUID: 364ee347-f6a2-4c7b-8702-460aa0ece579) for task Task_Tracker_224 of framework 201307040929-252063498-5050-27411-0000 from executor(1)@10.47.6.21:2310
> I0708 23:36:46.144745 11006 status_update_manager.cpp:290] Received status update TASK_RUNNING (UUID: 364ee347-f6a2-4c7b-8702-460aa0ece579) for task Task_Tracker_224 of framework 201307040929-252063498-5050-27411-0000 with checkpoint=false
> I0708 23:36:46.144821 11006 status_update_manager.cpp:450] Creating StatusUpdate stream for task Task_Tracker_224 of framework 201307040929-252063498-5050-27411-0000
> I0708 23:36:46.145076 11006 status_update_manager.cpp:336] Forwarding status update TASK_RUNNING (UUID: 364ee347-f6a2-4c7b-8702-460aa0ece579) for task Task_Tracker_224 of framework 201307040929-252063498-5050-27411-0000 to master@10.47.6.15:5050
> I0708 23:36:46.145882 10997 slave.cpp:1809] Sending acknowledgement for status update TASK_RUNNING (UUID: 364ee347-f6a2-4c7b-8702-460aa0ece579) for task Task_Tracker_224 of framework 201307040929-252063498-5050-27411-0000 to executor(1)@10.47.6.21:2310
> I0708 23:36:46.146870 10993 status_update_manager.cpp:360] Received status update acknowledgement 364ee347-f6a2-4c7b-8702-460aa0ece579 for task Task_Tracker_224 of framework 201307040929-252063498-5050-27411-0000
> I0708 23:36:50.258347 11005 slave.cpp:739] Got assigned task Task_Tracker_230 for framework 201307040929-252063498-5050-27411-0000
> I0708 23:36:50.259472 11005 slave.cpp:837] Launching task Task_Tracker_230 for framework 201307040929-252063498-5050-27411-0000
> I0708 23:36:50.261641 11005 paths.hpp:303] Created executor directory '/data/mesos-slave-work-dir//slaves/201307041648-252063498-5050-8038-5/frameworks/201307040929-252063498-5050-27411-0000/executors/executor_Task_Tracker_230/runs/1dc87acc-d090-469f-ba30-0477139ee7fd'
> I0708 23:36:50.262265 11005 slave.cpp:948] Queuing task 'Task_Tracker_230' for executor executor_Task_Tracker_230 of framework '201307040929-252063498-5050-27411-0000
> I0708 23:36:50.262557 11005 process_isolator.cpp:99] Launching executor_Task_Tracker_230 (cd hadoop && ./bin/mesos-executor) in /data/mesos-slave-work-dir//slaves/201307041648-252063498-5050-8038-5/frameworks/201307040929-252063498-5050-27411-0000/executors/executor_Task_Tracker_230/runs/1dc87acc-d090-469f-ba30-0477139ee7fd with resources cpus=1; mem=1280' for framework 201307040929-252063498-5050-27411-0000
> I0708 23:36:50.265396 10999 slave.cpp:511] Successfully attached file '/data/mesos-slave-work-dir//slaves/201307041648-252063498-5050-8038-5/frameworks/201307040929-252063498-5050-27411-0000/executors/executor_Task_Tracker_230/runs/1dc87acc-d090-469f-ba30-0477139ee7fd'
> I0708 23:36:50.265419 11005 process_isolator.cpp:161] Forked executor at 2851
> I0708 23:36:50.835607 10995 slave.cpp:1395] Got registration for executor 'executor_Task_Tracker_230' of framework 201307040929-252063498-5050-27411-0000
> I0708 23:36:50.836174 10995 slave.cpp:1510] Flushing queued task Task_Tracker_230 for executor 'executor_Task_Tracker_230' of framework 201307040929-252063498-5050-27411-0000
> I0708 23:36:54.617856 10994 slave.cpp:1692] Handling status update TASK_RUNNING (UUID: 7753252d-c90b-4b0d-adca-7c97f38f692e) for task Task_Tracker_230 of framework 201307040929-252063498-5050-27411-0000 from executor(1)@10.47.6.21:27786
> I0708 23:36:54.618275 10994 status_update_manager.cpp:290] Received status update TASK_RUNNING (UUID: 7753252d-c90b-4b0d-adca-7c97f38f692e) for task Task_Tracker_230 of framework 201307040929-252063498-5050-27411-0000 with checkpoint=false
> I0708 23:36:54.618326 10994 status_update_manager.cpp:450] Creating StatusUpdate stream for task Task_Tracker_230 of framework 201307040929-252063498-5050-27411-0000
> I0708 23:36:54.618443 10994 status_update_manager.cpp:336] Forwarding status update TASK_RUNNING (UUID: 7753252d-c90b-4b0d-adca-7c97f38f692e) for task Task_Tracker_230 of framework 201307040929-252063498-5050-27411-0000 to master@10.47.6.15:5050
> I0708 23:36:54.619137 10994 slave.cpp:1809] Sending acknowledgement for status update TASK_RUNNING (UUID: 7753252d-c90b-4b0d-adca-7c97f38f692e) for task Task_Tracker_230 of framework 201307040929-252063498-5050-27411-0000 to executor(1)@10.47.6.21:27786
> I0708 23:36:54.637682 10994 status_update_manager.cpp:360] Received status update acknowledgement 7753252d-c90b-4b0d-adca-7c97f38f692e for task Task_Tracker_230 of framework 201307040929-252063498-5050-27411-0000
> I0708 23:37:44.583014 11002 slave.cpp:2511] Current usage 57.23%. Max
> allowed age: 2.293704423241597days
> I0708 23:38:44.585233 11003 slave.cpp:2511] Current usage 57.23%. Max
> allowed age: 2.293703916528542days
> I0708 23:39:44.599442 11006 slave.cpp:2511] Current usage 57.23%. Max
> allowed age: 2.293639867998055days
> I0708 23:40:44.603996 10997 slave.cpp:2511] Current usage 57.24%. Max
> allowed age: 2.292921551567535days
> I0708 23:41:44.608608 11006 slave.cpp:2511] Current usage 57.26%. Max
> allowed age: 2.291521098018820days
> I0708 23:42:44.609956 10992 slave.cpp:2511] Current usage 57.23%. Max
> allowed age: 2.293668041244063days
> I0708 23:43:44.682621 11000 slave.cpp:2511] Current usage 57.24%. Max
> allowed age: 2.292935638190544days
> I0708 23:44:44.684306 10993 slave.cpp:2511] Current usage 57.24%. Max
> allowed age: 2.292916079066516days
> I0708 23:45:44.686172 11001 slave.cpp:2511] Current usage 57.26%. Max
> allowed age: 2.291485324076945days
> I0708 23:46:44.699095 10995 slave.cpp:2511] Current usage 57.23%. Max
> allowed age: 2.293641894850289days
> I0708 23:47:44.721156 10998 slave.cpp:2511] Current usage 57.23%. Max
> allowed age: 2.293629429709074days
> I0708 23:48:44.779767 10992 slave.cpp:2511] Current usage 57.24%. Max
> allowed age: 2.293525350847025days
> I0708 23:49:44.812389 11004 slave.cpp:2511] Current usage 57.24%. Max
> allowed age: 2.292909289111539days
> I0708 23:50:44.814146 10999 slave.cpp:2511] Current usage 57.27%. Max
> allowed age: 2.291438098419977days
> I0708 23:51:44.814877 11005 slave.cpp:2511] Current usage 57.23%. Max
> allowed age: 2.293635104895313days
> I0708 23:52:44.818620 10998 slave.cpp:2511] Current usage 57.24%. Max
> allowed age: 2.292983775931019days
> I0708 23:53:44.829911 10997 slave.cpp:2511] Current usage 57.33%. Max
> allowed age: 2.286910009194236days
> I0708 23:54:44.831307 10999 slave.cpp:2511] Current usage 57.33%. Max
> allowed age: 2.286909502481169days
> I0708 23:55:44.902858 10994 slave.cpp:2511] Current usage 57.37%. Max
> allowed age: 2.284414244700093days
> I0708 23:56:44.905398 11002 slave.cpp:2511] Current usage 57.42%. Max
> allowed age: 2.280636901540567days
> I0708 23:57:44.933673 10991 slave.cpp:2511] Current usage 57.44%. Max
> allowed age: 2.279481899796968days
> I0708 23:58:44.934840 11004 slave.cpp:2511] Current usage 57.48%. Max
> allowed age: 2.276566475548496days
> I0708 23:59:44.936063 11001 slave.cpp:2511] Current usage 57.49%. Max
> allowed age: 2.275690368671817days
> I0709 00:00:44.937433 11004 slave.cpp:2511] Current usage 57.50%. Max
> allowed age: 2.275057180034989days
> I0709 00:01:44.938940 11001 slave.cpp:2511] Current usage 57.51%. Max
> allowed age: 2.273999467198449days
> I0709 00:02:44.955103 10996 slave.cpp:2511] Current usage 57.52%. Max
> allowed age: 2.273472384275891days
> I0709 00:03:44.956354 10993 slave.cpp:2511] Current usage 57.39%. Max
> allowed age: 2.282894612240220days
> I0709 00:04:44.957926 10997 slave.cpp:2511] Current usage 57.40%. Max
> allowed age: 2.281966516603831days
> I0709 00:05:44.969205 10996 slave.cpp:2511] Current usage 57.40%. Max
> allowed age: 2.281962260214144days
> I0709 00:06:44.969987 10992 slave.cpp:2511] Current usage 57.40%. Max
> allowed age: 2.281791801941551days
> I0709 00:07:44.977504 11004 slave.cpp:2511] Current usage 57.40%. Max
> allowed age: 2.281715288269849days
> I0709 00:08:44.982868 10998 slave.cpp:2511] Current usage 57.40%. Max
> allowed age: 2.281699782850289days
> I0709 00:09:44.997082 11000 slave.cpp:2511] Current usage 57.42%. Max
> allowed age: 2.280776044946192days
> I0709 00:10:44.998754 10994 slave.cpp:2511] Current usage 57.42%. Max
> allowed age: 2.280772193926956days
> I0709 00:11:44.999949 11002 slave.cpp:2511] Current usage 57.44%. Max
> allowed age: 2.279204525069213days
> I0709 00:12:45.001539 10995 slave.cpp:2511] Current usage 57.47%. Max
> allowed age: 2.277132676719109days
> I0709 00:13:45.002728 10992 slave.cpp:2511] Current usage 57.43%. Max
> allowed age: 2.280012428368322days
> I0709 00:14:45.009699 10998 slave.cpp:2511] Current usage 57.48%. Max
> allowed age: 2.276733690857512days
> I0709 00:15:45.013483 10996 slave.cpp:2511] Current usage 57.53%. Max
> allowed age: 2.272715152282546days
> I0709 00:16:45.015496 10998 slave.cpp:2511] Current usage 57.57%. Max
> allowed age: 2.270354274804352days
> I0709 00:17:45.016628 11000 slave.cpp:2511] Current usage 57.62%. Max
> allowed age: 2.266927678423322days
> I0709 00:18:45.032670 11002 slave.cpp:2511] Current usage 57.65%. Max
> allowed age: 2.264218182361482days
> I0709 00:19:45.043442 10998 slave.cpp:2511] Current usage 57.69%. Max
> allowed age: 2.261509598383137days
> I0709 00:20:45.080648 10992 slave.cpp:2511] Current usage 57.72%. Max
> allowed age: 2.259379478031400days
> I0709 00:21:45.081632 10995 slave.cpp:2511] Current usage 57.77%. Max
> allowed age: 2.255819920144039days
> I0709 00:22:45.082593 11005 slave.cpp:2511] Current usage 57.81%. Max
> allowed age: 2.253314528101817days
> I0709 00:23:45.193588 10997 slave.cpp:2511] Current usage 57.85%. Max
> allowed age: 2.250524870034248days
> I0709 00:24:45.220617 10994 slave.cpp:2511] Current usage 57.90%. Max
> allowed age: 2.246784618270532days
> I0709 00:25:45.241602 10992 slave.cpp:2511] Current usage 57.97%. Max
> allowed age: 2.242399422127049days
> I0709 00:26:45.248977 11000 slave.cpp:2511] Current usage 58.00%. Max
> allowed age: 2.240250654734792days
> I0709 00:27:45.250953 10993 slave.cpp:2511] Current usage 57.99%. Max
> allowed age: 2.240516983117894days
> I0709 00:28:45.252694 10996 slave.cpp:2511] Current usage 58.06%. Max
> allowed age: 2.235834143724352days
> I0709 00:29:45.254992 11003 slave.cpp:2511] Current usage 58.10%. Max
> allowed age: 2.233297436815162days
> W0709 00:30:06.753098 10999 monitor.cpp:186] Failed to collect resource
> usage for executor 'executor_Task_Tracker_230' of framework
> '201307040929-252063498-5050-27411-0000': Future discarded
> W0709 00:30:10.715373 10996 monitor.cpp:186] Failed to collect resource
> usage for executor 'executor_Task_Tracker_224' of framework
> '201307040929-252063498-5050-27411-0000': Future discarded
> W0709 00:30:11.754446 11003 monitor.cpp:186] Failed to collect resource
> usage for executor 'executor_Task_Tracker_230' of framework
> '201307040929-252063498-5050-27411-0000': Future discarded
> W0709 00:30:15.719880 11003 monitor.cpp:186] Failed to collect resource
> usage for executor 'executor_Task_Tracker_224' of framework
> '201307040929-252063498-5050-27411-0000': Future discarded
> W0709 00:30:16.755473 11003 monitor.cpp:186] Failed to collect resource
> usage for executor 'executor_Task_Tracker_230' of framework
> '201307040929-252063498-5050-27411-0000': Future discarded
> W0709 00:30:20.720330 11003 monitor.cpp:186] Failed to collect resource
> usage for executor 'executor_Task_Tracker_224' of framework
> '201307040929-252063498-5050-27411-0000': Future discarded
> W0709 00:30:21.766019 11003 monitor.cpp:186] Failed to collect resource
> usage for executor 'executor_Task_Tracker_230' of framework
> '201307040929-252063498-5050-27411-0000': Future discarded
> W0709 00:30:25.721364 11003 monitor.cpp:186] Failed to collect resource
> usage for executor 'executor_Task_Tracker_224' of framework
> '201307040929-252063498-5050-27411-0000': Future discarded
> W0709 00:30:26.768874 11003 monitor.cpp:186] Failed to collect resource
> usage for executor 'executor_Task_Tracker_230' of framework
> '201307040929-252063498-5050-27411-0000': Future discarded
> W0709 00:30:30.722605 11003 monitor.cpp:186] Failed to collect resource
> usage for executor 'executor_Task_Tracker_224' of framework
> '201307040929-252063498-5050-27411-0000': Future discarded
> W0709 00:30:31.770354 11003 monitor.cpp:186] Failed to collect resource
> usage for executor 'executor_Task_Tracker_230' of framework
> '201307040929-252063498-5050-27411-0000': Future discarded
> W0709 00:30:35.724455 10992 monitor.cpp:186] Failed to collect resource
> usage for executor 'executor_Task_Tracker_224' of framework
> '201307040929-252063498-5050-27411-0000': Future discarded
> W0709 00:30:36.788751 10992 monitor.cpp:186] Failed to collect resource
> usage for executor 'executor_Task_Tracker_230' of framework
> '201307040929-252063498-5050-27411-0000': Future discarded
> W0709 00:30:40.745380 10992 monitor.cpp:186] Failed to collect resource
> usage for executor 'executor_Task_Tracker_224' of framework
> '201307040929-252063498-5050-27411-0000': Future discarded
> W0709 00:30:41.789358 10992 monitor.cpp:186] Failed to collect resource
> usage for executor 'executor_Task_Tracker_230' of framework
> '201307040929-252063498-5050-27411-0000': Future discarded
> I0709 00:30:45.256590 11004 slave.cpp:2511] Current usage 58.11%. Max
> allowed age: 2.232469873049410days
>
>
> Guodong
>
>
> On Tue, Jul 9, 2013 at 4:55 AM, Benjamin Mahler <benjamin.mahler@gmail.com
> > wrote:
>
>> Are these the un-edited logs? I'm expecting to see some logs from the
>> process_isolator or cgroups_isolator in there.
>
>

Re: Tasks stuck in 'STAGING'

Posted by 王国栋 <wa...@gmail.com>.
Hi Ben,

I ran into the same issue here.

This also happens in our Hadoop framework. The slave log is shown below. I
think the workload on the node was very high at the time.

I0708 23:36:44.253880 11005 slave.cpp:739] Got assigned task
Task_Tracker_224 for framework 201307040929-252063498-5050-27411-0000
I0708 23:36:44.255221 10999 gc.cpp:84] Unscheduling
'/data/mesos-slave-work-dir//slaves/201307041648-252063498-5050-8038-5/frameworks/201307040929-252063498-5050-27411-0000'
for removal
I0708 23:36:44.256206 11001 slave.cpp:837] Launching task Task_Tracker_224
for framework 201307040929-252063498-5050-27411-0000
I0708 23:36:44.258117 11001 paths.hpp:303] Created executor directory
'/data/mesos-slave-work-dir//slaves/201307041648-252063498-5050-8038-5/frameworks/201307040929-252063498-5050-27411-0000/executors/executor_Task_Tracker_224/runs/953d3565-424c-4ab3-9926-a3fa71042bf1'
I0708 23:36:44.258467 10991 process_isolator.cpp:99] Launching
executor_Task_Tracker_224 (cd hadoop && ./bin/mesos-executor) in
/data/mesos-slave-work-dir//slaves/201307041648-252063498-5050-8038-5/frameworks/201307040929-252063498-5050-27411-0000/executors/executor_Task_Tracker_224/runs/953d3565-424c-4ab3-9926-a3fa71042bf1
with resources cpus=1; mem=1280' for framework
201307040929-252063498-5050-27411-0000
I0708 23:36:44.258496 11001 slave.cpp:948] Queuing task 'Task_Tracker_224'
for executor executor_Task_Tracker_224 of framework
'201307040929-252063498-5050-27411-0000
I0708 23:36:44.261446 10991 process_isolator.cpp:161] Forked executor at
2220
I0708 23:36:44.261787 10996 slave.cpp:511] Successfully attached file
'/data/mesos-slave-work-dir//slaves/201307041648-252063498-5050-8038-5/frameworks/201307040929-252063498-5050-27411-0000/executors/executor_Task_Tracker_224/runs/953d3565-424c-4ab3-9926-a3fa71042bf1'
I0708 23:36:44.580497 10996 slave.cpp:2511] Current usage 57.21%. Max
allowed age: 2.295155852123924days
I0708 23:36:44.750393 11002 slave.cpp:1395] Got registration for executor
'executor_Task_Tracker_224' of framework
201307040929-252063498-5050-27411-0000
I0708 23:36:44.751095 11002 slave.cpp:1510] Flushing queued task
Task_Tracker_224 for executor 'executor_Task_Tracker_224' of framework
201307040929-252063498-5050-27411-0000
I0708 23:36:46.144317 11006 slave.cpp:1692] Handling status update
TASK_RUNNING (UUID: 364ee347-f6a2-4c7b-8702-460aa0ece579) for task
Task_Tracker_224 of framework 201307040929-252063498-5050-27411-0000
from executor(1)@10.47.6.21:2310
I0708 23:36:46.144745 11006 status_update_manager.cpp:290] Received status
update TASK_RUNNING (UUID: 364ee347-f6a2-4c7b-8702-460aa0ece579) for task
Task_Tracker_224 of framework 201307040929-252063498-5050-27411-0000
with checkpoint=false
I0708 23:36:46.144821 11006 status_update_manager.cpp:450] Creating
StatusUpdate stream for task Task_Tracker_224 of framework
201307040929-252063498-5050-27411-0000
I0708 23:36:46.145076 11006 status_update_manager.cpp:336] Forwarding
status update TASK_RUNNING (UUID: 364ee347-f6a2-4c7b-8702-460aa0ece579) for
task Task_Tracker_224 of framework 201307040929-252063498-5050-27411-0000
to master@10.47.6.15:5050
I0708 23:36:46.145882 10997 slave.cpp:1809] Sending acknowledgement for
status update TASK_RUNNING (UUID: 364ee347-f6a2-4c7b-8702-460aa0ece579) for
task Task_Tracker_224 of framework 201307040929-252063498-5050-27411-0000
to executor(1)@10.47.6.21:2310
I0708 23:36:46.146870 10993 status_update_manager.cpp:360] Received status
update acknowledgement 364ee347-f6a2-4c7b-8702-460aa0ece579 for task
Task_Tracker_224 of framework 201307040929-252063498-5050-27411-0000
I0708 23:36:50.258347 11005 slave.cpp:739] Got assigned task
Task_Tracker_230 for framework 201307040929-252063498-5050-27411-0000
I0708 23:36:50.259472 11005 slave.cpp:837] Launching task Task_Tracker_230
for framework 201307040929-252063498-5050-27411-0000
I0708 23:36:50.261641 11005 paths.hpp:303] Created executor directory
'/data/mesos-slave-work-dir//slaves/201307041648-252063498-5050-8038-5/frameworks/201307040929-252063498-5050-27411-0000/executors/executor_Task_Tracker_230/runs/1dc87acc-d090-469f-ba30-0477139ee7fd'
I0708 23:36:50.262265 11005 slave.cpp:948] Queuing task 'Task_Tracker_230'
for executor executor_Task_Tracker_230 of framework
'201307040929-252063498-5050-27411-0000
I0708 23:36:50.262557 11005 process_isolator.cpp:99] Launching
executor_Task_Tracker_230 (cd hadoop && ./bin/mesos-executor) in
/data/mesos-slave-work-dir//slaves/201307041648-252063498-5050-8038-5/frameworks/201307040929-252063498-5050-27411-0000/executors/executor_Task_Tracker_230/runs/1dc87acc-d090-469f-ba30-0477139ee7fd
with resources cpus=1; mem=1280' for framework
201307040929-252063498-5050-27411-0000
I0708 23:36:50.265396 10999 slave.cpp:511] Successfully attached file
'/data/mesos-slave-work-dir//slaves/201307041648-252063498-5050-8038-5/frameworks/201307040929-252063498-5050-27411-0000/executors/executor_Task_Tracker_230/runs/1dc87acc-d090-469f-ba30-0477139ee7fd'
I0708 23:36:50.265419 11005 process_isolator.cpp:161] Forked executor at
2851
I0708 23:36:50.835607 10995 slave.cpp:1395] Got registration for executor
'executor_Task_Tracker_230' of framework
201307040929-252063498-5050-27411-0000
I0708 23:36:50.836174 10995 slave.cpp:1510] Flushing queued task
Task_Tracker_230 for executor 'executor_Task_Tracker_230' of framework
201307040929-252063498-5050-27411-0000
I0708 23:36:54.617856 10994 slave.cpp:1692] Handling status update
TASK_RUNNING (UUID: 7753252d-c90b-4b0d-adca-7c97f38f692e) for task
Task_Tracker_230 of framework 201307040929-252063498-5050-27411-0000
from executor(1)@10.47.6.21:27786
I0708 23:36:54.618275 10994 status_update_manager.cpp:290] Received status
update TASK_RUNNING (UUID: 7753252d-c90b-4b0d-adca-7c97f38f692e) for task
Task_Tracker_230 of framework 201307040929-252063498-5050-27411-0000
with checkpoint=false
I0708 23:36:54.618326 10994 status_update_manager.cpp:450] Creating
StatusUpdate stream for task Task_Tracker_230 of framework
201307040929-252063498-5050-27411-0000
I0708 23:36:54.618443 10994 status_update_manager.cpp:336] Forwarding
status update TASK_RUNNING (UUID: 7753252d-c90b-4b0d-adca-7c97f38f692e) for
task Task_Tracker_230 of framework 201307040929-252063498-5050-27411-0000
to master@10.47.6.15:5050
I0708 23:36:54.619137 10994 slave.cpp:1809] Sending acknowledgement for
status update TASK_RUNNING (UUID: 7753252d-c90b-4b0d-adca-7c97f38f692e) for
task Task_Tracker_230 of framework 201307040929-252063498-5050-27411-0000
to executor(1)@10.47.6.21:27786
I0708 23:36:54.637682 10994 status_update_manager.cpp:360] Received status
update acknowledgement 7753252d-c90b-4b0d-adca-7c97f38f692e for task
Task_Tracker_230 of framework 201307040929-252063498-5050-27411-0000
I0708 23:37:44.583014 11002 slave.cpp:2511] Current usage 57.23%. Max
allowed age: 2.293704423241597days
I0708 23:38:44.585233 11003 slave.cpp:2511] Current usage 57.23%. Max
allowed age: 2.293703916528542days
I0708 23:39:44.599442 11006 slave.cpp:2511] Current usage 57.23%. Max
allowed age: 2.293639867998055days
I0708 23:40:44.603996 10997 slave.cpp:2511] Current usage 57.24%. Max
allowed age: 2.292921551567535days
I0708 23:41:44.608608 11006 slave.cpp:2511] Current usage 57.26%. Max
allowed age: 2.291521098018820days
I0708 23:42:44.609956 10992 slave.cpp:2511] Current usage 57.23%. Max
allowed age: 2.293668041244063days
I0708 23:43:44.682621 11000 slave.cpp:2511] Current usage 57.24%. Max
allowed age: 2.292935638190544days
I0708 23:44:44.684306 10993 slave.cpp:2511] Current usage 57.24%. Max
allowed age: 2.292916079066516days
I0708 23:45:44.686172 11001 slave.cpp:2511] Current usage 57.26%. Max
allowed age: 2.291485324076945days
I0708 23:46:44.699095 10995 slave.cpp:2511] Current usage 57.23%. Max
allowed age: 2.293641894850289days
I0708 23:47:44.721156 10998 slave.cpp:2511] Current usage 57.23%. Max
allowed age: 2.293629429709074days
I0708 23:48:44.779767 10992 slave.cpp:2511] Current usage 57.24%. Max
allowed age: 2.293525350847025days
I0708 23:49:44.812389 11004 slave.cpp:2511] Current usage 57.24%. Max
allowed age: 2.292909289111539days
I0708 23:50:44.814146 10999 slave.cpp:2511] Current usage 57.27%. Max
allowed age: 2.291438098419977days
I0708 23:51:44.814877 11005 slave.cpp:2511] Current usage 57.23%. Max
allowed age: 2.293635104895313days
I0708 23:52:44.818620 10998 slave.cpp:2511] Current usage 57.24%. Max
allowed age: 2.292983775931019days
I0708 23:53:44.829911 10997 slave.cpp:2511] Current usage 57.33%. Max
allowed age: 2.286910009194236days
I0708 23:54:44.831307 10999 slave.cpp:2511] Current usage 57.33%. Max
allowed age: 2.286909502481169days
I0708 23:55:44.902858 10994 slave.cpp:2511] Current usage 57.37%. Max
allowed age: 2.284414244700093days
I0708 23:56:44.905398 11002 slave.cpp:2511] Current usage 57.42%. Max
allowed age: 2.280636901540567days
I0708 23:57:44.933673 10991 slave.cpp:2511] Current usage 57.44%. Max
allowed age: 2.279481899796968days
I0708 23:58:44.934840 11004 slave.cpp:2511] Current usage 57.48%. Max
allowed age: 2.276566475548496days
I0708 23:59:44.936063 11001 slave.cpp:2511] Current usage 57.49%. Max
allowed age: 2.275690368671817days
I0709 00:00:44.937433 11004 slave.cpp:2511] Current usage 57.50%. Max
allowed age: 2.275057180034989days
I0709 00:01:44.938940 11001 slave.cpp:2511] Current usage 57.51%. Max
allowed age: 2.273999467198449days
I0709 00:02:44.955103 10996 slave.cpp:2511] Current usage 57.52%. Max
allowed age: 2.273472384275891days
I0709 00:03:44.956354 10993 slave.cpp:2511] Current usage 57.39%. Max
allowed age: 2.282894612240220days
I0709 00:04:44.957926 10997 slave.cpp:2511] Current usage 57.40%. Max
allowed age: 2.281966516603831days
I0709 00:05:44.969205 10996 slave.cpp:2511] Current usage 57.40%. Max
allowed age: 2.281962260214144days
I0709 00:06:44.969987 10992 slave.cpp:2511] Current usage 57.40%. Max
allowed age: 2.281791801941551days
I0709 00:07:44.977504 11004 slave.cpp:2511] Current usage 57.40%. Max
allowed age: 2.281715288269849days
I0709 00:08:44.982868 10998 slave.cpp:2511] Current usage 57.40%. Max
allowed age: 2.281699782850289days
I0709 00:09:44.997082 11000 slave.cpp:2511] Current usage 57.42%. Max
allowed age: 2.280776044946192days
I0709 00:10:44.998754 10994 slave.cpp:2511] Current usage 57.42%. Max
allowed age: 2.280772193926956days
I0709 00:11:44.999949 11002 slave.cpp:2511] Current usage 57.44%. Max
allowed age: 2.279204525069213days
I0709 00:12:45.001539 10995 slave.cpp:2511] Current usage 57.47%. Max
allowed age: 2.277132676719109days
I0709 00:13:45.002728 10992 slave.cpp:2511] Current usage 57.43%. Max
allowed age: 2.280012428368322days
I0709 00:14:45.009699 10998 slave.cpp:2511] Current usage 57.48%. Max
allowed age: 2.276733690857512days
I0709 00:15:45.013483 10996 slave.cpp:2511] Current usage 57.53%. Max
allowed age: 2.272715152282546days
I0709 00:16:45.015496 10998 slave.cpp:2511] Current usage 57.57%. Max
allowed age: 2.270354274804352days
I0709 00:17:45.016628 11000 slave.cpp:2511] Current usage 57.62%. Max
allowed age: 2.266927678423322days
I0709 00:18:45.032670 11002 slave.cpp:2511] Current usage 57.65%. Max
allowed age: 2.264218182361482days
I0709 00:19:45.043442 10998 slave.cpp:2511] Current usage 57.69%. Max
allowed age: 2.261509598383137days
I0709 00:20:45.080648 10992 slave.cpp:2511] Current usage 57.72%. Max
allowed age: 2.259379478031400days
I0709 00:21:45.081632 10995 slave.cpp:2511] Current usage 57.77%. Max
allowed age: 2.255819920144039days
I0709 00:22:45.082593 11005 slave.cpp:2511] Current usage 57.81%. Max
allowed age: 2.253314528101817days
I0709 00:23:45.193588 10997 slave.cpp:2511] Current usage 57.85%. Max
allowed age: 2.250524870034248days
I0709 00:24:45.220617 10994 slave.cpp:2511] Current usage 57.90%. Max
allowed age: 2.246784618270532days
I0709 00:25:45.241602 10992 slave.cpp:2511] Current usage 57.97%. Max
allowed age: 2.242399422127049days
I0709 00:26:45.248977 11000 slave.cpp:2511] Current usage 58.00%. Max
allowed age: 2.240250654734792days
I0709 00:27:45.250953 10993 slave.cpp:2511] Current usage 57.99%. Max
allowed age: 2.240516983117894days
I0709 00:28:45.252694 10996 slave.cpp:2511] Current usage 58.06%. Max
allowed age: 2.235834143724352days
I0709 00:29:45.254992 11003 slave.cpp:2511] Current usage 58.10%. Max
allowed age: 2.233297436815162days
W0709 00:30:06.753098 10999 monitor.cpp:186] Failed to collect resource
usage for executor 'executor_Task_Tracker_230' of framework
'201307040929-252063498-5050-27411-0000': Future discarded
W0709 00:30:10.715373 10996 monitor.cpp:186] Failed to collect resource
usage for executor 'executor_Task_Tracker_224' of framework
'201307040929-252063498-5050-27411-0000': Future discarded
W0709 00:30:11.754446 11003 monitor.cpp:186] Failed to collect resource
usage for executor 'executor_Task_Tracker_230' of framework
'201307040929-252063498-5050-27411-0000': Future discarded
W0709 00:30:15.719880 11003 monitor.cpp:186] Failed to collect resource
usage for executor 'executor_Task_Tracker_224' of framework
'201307040929-252063498-5050-27411-0000': Future discarded
W0709 00:30:16.755473 11003 monitor.cpp:186] Failed to collect resource
usage for executor 'executor_Task_Tracker_230' of framework
'201307040929-252063498-5050-27411-0000': Future discarded
W0709 00:30:20.720330 11003 monitor.cpp:186] Failed to collect resource
usage for executor 'executor_Task_Tracker_224' of framework
'201307040929-252063498-5050-27411-0000': Future discarded
W0709 00:30:21.766019 11003 monitor.cpp:186] Failed to collect resource
usage for executor 'executor_Task_Tracker_230' of framework
'201307040929-252063498-5050-27411-0000': Future discarded
W0709 00:30:25.721364 11003 monitor.cpp:186] Failed to collect resource
usage for executor 'executor_Task_Tracker_224' of framework
'201307040929-252063498-5050-27411-0000': Future discarded
W0709 00:30:26.768874 11003 monitor.cpp:186] Failed to collect resource
usage for executor 'executor_Task_Tracker_230' of framework
'201307040929-252063498-5050-27411-0000': Future discarded
W0709 00:30:30.722605 11003 monitor.cpp:186] Failed to collect resource
usage for executor 'executor_Task_Tracker_224' of framework
'201307040929-252063498-5050-27411-0000': Future discarded
W0709 00:30:31.770354 11003 monitor.cpp:186] Failed to collect resource
usage for executor 'executor_Task_Tracker_230' of framework
'201307040929-252063498-5050-27411-0000': Future discarded
W0709 00:30:35.724455 10992 monitor.cpp:186] Failed to collect resource
usage for executor 'executor_Task_Tracker_224' of framework
'201307040929-252063498-5050-27411-0000': Future discarded
W0709 00:30:36.788751 10992 monitor.cpp:186] Failed to collect resource
usage for executor 'executor_Task_Tracker_230' of framework
'201307040929-252063498-5050-27411-0000': Future discarded
W0709 00:30:40.745380 10992 monitor.cpp:186] Failed to collect resource
usage for executor 'executor_Task_Tracker_224' of framework
'201307040929-252063498-5050-27411-0000': Future discarded
W0709 00:30:41.789358 10992 monitor.cpp:186] Failed to collect resource
usage for executor 'executor_Task_Tracker_230' of framework
'201307040929-252063498-5050-27411-0000': Future discarded
I0709 00:30:45.256590 11004 slave.cpp:2511] Current usage 58.11%. Max
allowed age: 2.232469873049410days
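
To make the warning pattern easier to see, here is a rough sketch (plain
Python, not part of Mesos) that counts the 'Failed to collect resource
usage' warnings per executor. It assumes the usual glog line prefix seen in
the log above and that wrapped lines were broken at a space; the names
join_wrapped and count_failed_usage are only illustrative, and the sample is
the first three warnings copied from the log.

```python
import re
from collections import Counter

# Start of a glog entry: severity letter, MMDD, time, thread id.
ENTRY_START = re.compile(r"^[IWEF]\d{4} \d{2}:\d{2}:\d{2}\.\d{6}\s+\d+ ")
FAILED_USAGE = re.compile(
    r"Failed to collect resource usage for executor '([^']+)'"
)


def join_wrapped(lines):
    """Merge mail-wrapped continuation lines back into single log entries.

    Assumption: each wrap happened at a space, so pieces are rejoined with
    one space. Lines that were cut in the middle of a token are not repaired.
    """
    entries = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if ENTRY_START.match(line) or not entries:
            entries.append(line)
        else:
            entries[-1] += " " + line
    return entries


def count_failed_usage(lines):
    """Count 'Failed to collect resource usage' warnings per executor."""
    counts = Counter()
    for entry in join_wrapped(lines):
        match = FAILED_USAGE.search(entry)
        if match:
            counts[match.group(1)] += 1
    return counts


sample = [
    "W0709 00:30:06.753098 10999 monitor.cpp:186] Failed to collect resource",
    "usage for executor 'executor_Task_Tracker_230' of framework",
    "'201307040929-252063498-5050-27411-0000': Future discarded",
    "W0709 00:30:10.715373 10996 monitor.cpp:186] Failed to collect resource",
    "usage for executor 'executor_Task_Tracker_224' of framework",
    "'201307040929-252063498-5050-27411-0000': Future discarded",
    "W0709 00:30:11.754446 11003 monitor.cpp:186] Failed to collect resource",
    "usage for executor 'executor_Task_Tracker_230' of framework",
    "'201307040929-252063498-5050-27411-0000': Future discarded",
]

for executor, count in count_failed_usage(sample).most_common():
    print(executor, count)
```

Running the same function over the full slave log (read with
open(path).readlines()) should show whether the warnings are limited to the
two Task_Tracker executors or affect others as well.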


Guodong


On Tue, Jul 9, 2013 at 4:55 AM, Benjamin Mahler
<be...@gmail.com> wrote:

> Are these the un-edited logs? I'm expecting to see some logs from the
> process_isolator or cgroups_isolator in there.

Re: Tasks stuck in 'STAGING'

Posted by Benjamin Mahler <be...@gmail.com>.
Are these the un-edited logs? I'm expecting to see some logs from the
process_isolator or cgroups_isolator in there.
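
One quick way to check for those lines is to filter the slave log by source file name. What follows is only a minimal sketch, assuming the log uses the glog-style prefix seen elsewhere in this thread (level and date, time, thread id, then file:line]) and that isolator messages come from a source file whose name contains "isolator". The function name isolator_lines and the sample isolator line are hypothetical, not taken from a real Mesos log.

```python
import re

# glog-style prefix as it appears in the logs in this thread, e.g.
# "I0705 16:19:51.552150  9706 slave.cpp:837] Launching task ..."
GLOG_LINE = re.compile(
    r"^(?P<level>[IWEF])(?P<date>\d{4}) (?P<time>\d{2}:\d{2}:\d{2}\.\d+)\s+"
    r"(?P<thread>\d+) (?P<file>[^:\s]+):(?P<line>\d+)\] (?P<msg>.*)$"
)


def isolator_lines(lines, needle="isolator"):
    """Return the log lines whose source file name contains `needle`.

    Assumption: isolator messages are emitted from a file such as
    process_isolator.cpp or cgroups_isolator.cpp. Lines that do not start
    with a glog prefix (for example wrapped continuation lines) are skipped.
    """
    hits = []
    for raw in lines:
        text = raw.rstrip("\n")
        match = GLOG_LINE.match(text)
        if match and needle in match.group("file"):
            hits.append(text)
    return hits


if __name__ == "__main__":
    sample = [
        "I0705 16:19:51.552150  9706 slave.cpp:837] Launching task example",
        # Hypothetical line, only to show what a match looks like.
        "I0705 16:19:51.553000  9706 process_isolator.cpp:100] example message",
    ]
    for hit in isolator_lines(sample):
        print(hit)
```

If this returns nothing for the full, unwrapped slave log, the isolator never logged anything for that executor. Note that it has to run on the original log file, since the mail-wrapped copy splits messages across lines.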


On Fri, Jul 5, 2013 at 2:38 PM, Brenden Matthews <
brenden.matthews@airbedandbreakfast.com> wrote:

> Hey guys,
>
> I'm currently having a problem where tasks will get stuck in the staging
> state, though according to the logs they should have been terminated.  They
> hang indefinitely, or until I restart the slave.  Below is a screenshot +
> logs.  Also interesting is the 'Failed to collect resource usage ...'
> messages.
>
> [image: Inline image 2]
>
> I0705 16:19:51.551512  9706 slave.cpp:739] Got assigned task ct:1373041190990:0:add_latest_reservation_survey_events_partition for framework chronos
>> I0705 16:19:51.552150  9706 slave.cpp:837] Launching task ct:1373041190990:0:add_latest_reservation_survey_events_partition for framework chronos
>> I0705 16:19:51.553956  9706 paths.hpp:303] Created executor directory '/tmp/mesos/slaves/201307030043-2037266954-5050-15277-1517/frameworks/chronos/executors/ct:1373041190990:0:add_latest_reservation_survey_events_partition/runs/611ba128-557f-4b5e-8cf2-4d1ce60d618f'
>> I0705 16:19:51.554576  9706 slave.cpp:948] Queuing task 'ct:1373041190990:0:add_latest_reservation_survey_events_partition' for executor ct:1373041190990:0:add_latest_reservation_survey_events_partition of framework 'chronos
>> I0705 16:19:51.555027  9706 slave.cpp:511] Successfully attached file '/tmp/mesos/slaves/201307030043-2037266954-5050-15277-1517/frameworks/chronos/executors/ct:1373041190990:0:add_latest_reservation_survey_events_partition/runs/611ba128-557f-4b5e-8cf2-4d1ce60d618f'
>> I0705 16:19:54.048754  9724 slave.cpp:2530] Current usage 42.18%. Max
>> allowed age: 22.955009563956388hrs
>> W0705 16:19:54.108963  9724 monitor.cpp:186] Failed to collect resource
>> usage for executor 'executor_Task_Tracker_8023' of framework
>> '201307030043-2037266954-5050-15277-0006': Future discarded
>> W0705 16:19:59.110787  9729 monitor.cpp:186] Failed to collect resource
>> usage for executor 'executor_Task_Tracker_8023' of framework
>> '201307030043-2037266954-5050-15277-0006': Future discarded
>> W0705 16:20:04.112406  9704 monitor.cpp:186] Failed to collect resource
>> usage for executor 'executor_Task_Tracker_8023' of framework
>> '201307030043-2037266954-5050-15277-0006': Future discarded
>> W0705 16:20:09.114367  9705 monitor.cpp:186] Failed to collect resource
>> usage for executor 'executor_Task_Tracker_8023' of framework
>> '201307030043-2037266954-5050-15277-0006': Future discarded
>> W0705 16:20:14.116312  9706 monitor.cpp:186] Failed to collect resource
>> usage for executor 'executor_Task_Tracker_8023' of framework
>> '201307030043-2037266954-5050-15277-0006': Future discarded
>> W0705 16:20:19.118370  9699 monitor.cpp:186] Failed to collect resource
>> usage for executor 'executor_Task_Tracker_8023' of framework
>> '201307030043-2037266954-5050-15277-0006': Future discarded
>> W0705 16:20:24.120311  9701 monitor.cpp:186] Failed to collect resource
>> usage for executor 'executor_Task_Tracker_8023' of framework
>> '201307030043-2037266954-5050-15277-0006': Future discarded
>> W0705 16:20:29.122355  9700 monitor.cpp:186] Failed to collect resource
>> usage for executor 'executor_Task_Tracker_8023' of framework
>> '201307030043-2037266954-5050-15277-0006': Future discarded
>> W0705 16:20:34.123443  9722 monitor.cpp:186] Failed to collect resource
>> usage for executor 'executor_Task_Tracker_8023' of framework
>> '201307030043-2037266954-5050-15277-0006': Future discarded
>> W0705 16:20:39.125660  9718 monitor.cpp:186] Failed to collect resource
>> usage for executor 'executor_Task_Tracker_8023' of framework
>> '201307030043-2037266954-5050-15277-0006': Future discarded
>> W0705 16:20:44.127464  9724 monitor.cpp:186] Failed to collect resource
>> usage for executor 'executor_Task_Tracker_8023' of framework
>> '201307030043-2037266954-5050-15277-0006': Future discarded
>> W0705 16:20:49.129385  9725 monitor.cpp:186] Failed to collect resource
>> usage for executor 'executor_Task_Tracker_8023' of framework
>> '201307030043-2037266954-5050-15277-0006': Future discarded
>> I0705 16:20:51.555174  9703 slave.cpp:2482] Terminating executor
>> ct:1373041190990:0:add_latest_reservation_survey_events_partition of
>> framework chronos because it did not register within 1mins
>> I0705 16:20:54.050434  9717 slave.cpp:2530] Current usage 42.18%. Max
>> allowed age: 22.955009342481944hrs
>> W0705 16:20:54.130730  9699 monitor.cpp:186] Failed to collect resource
>> usage for executor 'executor_Task_Tracker_8023' of framework
>> '201307030043-2037266954-5050-15277-0006': Future discarded
>> W0705 16:20:59.132472  9702 monitor.cpp:186] Failed to collect resource
>> usage for executor 'executor_Task_Tracker_8023' of framework
>> '201307030043-2037266954-5050-15277-0006': Future discarded
>> W0705 16:21:04.134557  9713 monitor.cpp:186] Failed to collect resource
>> usage for executor 'executor_Task_Tracker_8023' of framework
>> '201307030043-2037266954-5050-15277-0006': Future discarded
>> W0705 16:21:09.135619  9701 monitor.cpp:186] Failed to collect resource
>> usage for executor 'executor_Task_Tracker_8023' of framework
>> '201307030043-2037266954-5050-15277-0006': Future discarded
>
>
>